CN113223639A - Method for exploring structure, composition and property of perovskite oxide - Google Patents
- Publication number
- CN113223639A
- Authority
- CN
- China
- Legal status
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C60/00—Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation
Landscapes
- Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method for exploring the relationships among the structure, composition and properties of perovskite oxides. The method comprises the following steps: 1) collecting a data set of perovskite oxides as the data set sample; 2) preprocessing the data set and analyzing the sample values with the Pearson correlation coefficient to identify features strongly correlated with the target variable; 3) expanding the features selected in step 2) to obtain new descriptors; 4) performing feature selection on the new descriptors to find an optimal descriptor subset; 5) performing linear regression fitting on the optimal descriptor subset to find the optimal descriptor; and 6) obtaining the corresponding relationship among structure, composition and properties through linear regression. The method does not depend on prior knowledge or a predefined model. It can generate a large number of descriptors, identify the optimal descriptor among them and obtain the relationship among the structure, composition and properties of perovskite oxides, providing a new approach to materials exploration and research.
Description
Technical Field
The invention relates to the field of designing the structure, composition and properties of perovskite materials, and in particular to a method for exploring the relationships among the structure, composition and properties of perovskite oxides. The method combines feature engineering with a linear regression algorithm. A large number of descriptors are created through feature engineering, and an important descriptor subset is obtained through feature selection. The optimal descriptor is then found with the linear regression algorithm, yielding the relationship among the structure, composition and properties of the perovskite oxide.
Background Art
In materials discovery, exploring the relationships among the structure, composition and properties of a material is critical. The vast number of possible compositions of material compounds makes these relationships difficult to understand. Establishing them has relied on large numbers of laboratory experiments, which require a wide variety of equipment and considerable time and effort. As a result, exploring the structure, composition and property relationships of materials is very difficult.
Over the past five years, machine learning has been widely used in materials discovery, where it can greatly reduce computational cost and shorten the development cycle. It has therefore become an effective alternative to laboratory experiments. However, the complexity of the machine learning process and the limited interpretability of its models make it difficult to derive straightforward formulas describing the relationship between material structure and composition. It is therefore important to find a sound method for investigating the relationships among the structure, composition and properties of materials.
Perovskite oxides have received much attention for electrocatalysts and fuel cells because of their excellent electrocatalytic properties. A perovskite oxide is a compound of the general formula ABO₃ with a perovskite structure, in which the A site is occupied by a large-radius cation and the B site by a small-radius cation. The most important characteristic of the perovskite structure is that ions with very different radii can coexist stably in the same structure. The A and B sites can accommodate a very wide range of element types and amounts, so the number of compounds with the perovskite structure is very large. This makes exploring the relationships among the structure, composition and properties of ABO₃ perovskite oxides a significant challenge.
To date, several methods for finding effective descriptors have been reported, such as symbolic regression algorithms. These methods aim to find, within a given feature space, either a few important descriptors of the target property or a hidden mathematical formula, and then use those descriptors to predict the target variable. Although effective, these methods depend on several conditions, such as large amounts of data and an appropriate algorithm. Such conditions are difficult to satisfy for materials scientists unfamiliar with computer algorithms, so in practice these algorithms are inefficient.
Disclosure of Invention
To overcome the shortcomings of the prior art, the present invention provides a method for exploring the structure, composition and property relationships of perovskite oxides. Without requiring prior knowledge, the method helps explore these relationships and find expressions that describe them. It saves experimental time and resources and accelerates the design and optimization of materials. The invention is realized by the following technical scheme:
A method for exploring the structure, composition and property relationships of perovskite oxides, comprising the steps of:
1) collecting a data set of perovskite oxides from the literature as the data set sample for the method;
2) preprocessing the data set by deleting samples with missing values, and analyzing the Pearson correlation coefficients of the complete sample values to find the features strongly correlated with the target variable;
3) constructing new descriptors through feature engineering from the features selected in step 2);
4) performing feature selection on the generated descriptors to find an optimal descriptor subset;
5) performing linear regression fitting on the optimal descriptor subset to find the optimal descriptor;
6) using the optimal descriptor found as the input variable, obtaining the corresponding structure, composition and property relationship through a linear regression algorithm.
Each step of the above technical scheme is described further below.
In the feature engineering of step 3), new descriptors are constructed through transformation functions and feature combinations. Transformation functions help rescale features or convert a nonlinear relationship between a feature and the target variable into a linear one. Feature combination is one of the most useful techniques in feature engineering; it merges two or more features into a single feature. The combinations are cross-products over all possible feature values, so each combined feature represents the synergy of the information it contains. Nine transformation functions are used, including x, x⁻¹, x², x³, eˣ, ln|x|, ln(1+|x|) and log|x|, and the transformed features are combined nonlinearly to obtain more descriptors.
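A minimal Python sketch of this descriptor expansion follows, assuming NumPy. It applies the eight transformation functions named above to each feature. It then forms pairwise products of transformed features that come from different base features, which is one plausible reading of the "cross-multiplication" described in the text. Several details are illustrative assumptions, not taken from the patent: the names `TRANSFORMS` and `expand_features`, the base-10 reading of log|x|, the restriction to pairwise products, and the rule of discarding any descriptor that is undefined for some sample.

```python
import itertools

import numpy as np

# Eight of the transformation functions named in the text; "{}" is replaced by
# the feature name. log|x| is assumed here to be base 10 (an assumption).
TRANSFORMS = {
    "{}": lambda x: x,
    "{}^-1": lambda x: 1.0 / x,
    "{}^2": lambda x: x ** 2,
    "{}^3": lambda x: x ** 3,
    "exp({})": np.exp,
    "ln|{}|": lambda x: np.log(np.abs(x)),
    "ln(1+|{}|)": lambda x: np.log1p(np.abs(x)),
    "log10|{}|": lambda x: np.log10(np.abs(x)),
}


def expand_features(X, names):
    """Hypothetical descriptor generator: unary transforms plus pairwise products.

    Returns (descriptor_matrix, descriptor_labels).
    """
    X = np.asarray(X, dtype=float)
    unary = {}
    with np.errstate(all="ignore"):  # 1/0, ln 0 and overflow become inf/NaN, filtered below
        for j, name in enumerate(names):
            for template, func in TRANSFORMS.items():
                unary[(j, template.format(name))] = func(X[:, j])
        descriptors = {label: col for (_, label), col in unary.items()}
        # Products of transformed features that come from two different base features.
        for ((j1, l1), c1), ((j2, l2), c2) in itertools.combinations(unary.items(), 2):
            if j1 != j2:
                descriptors[f"{l1}*{l2}"] = c1 * c2
    # Keep only descriptors that are finite for every sample.
    labels = [k for k, v in descriptors.items() if np.all(np.isfinite(v))]
    return np.column_stack([descriptors[k] for k in labels]), labels
```

With two features, for example, this yields 16 unary descriptors and 64 cross-products. Any descriptor that divides by zero or takes the logarithm of zero for some sample is dropped rather than imputed.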
Further, the feature selection for descriptors in step 4) uses a sequential forward feature selection method, as follows:
A feature subset is selected and used to train the model. Here the model is the underlying algorithm of the sequential forward feature selection method, namely a gradient boosting regression algorithm. The model is evaluated on a validation data set. This is repeated for different feature subsets according to the sequential forward search algorithm, and the optimal feature subset is selected according to the evaluation results.
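The sequential forward search can be sketched as follows, assuming NumPy. The patent uses gradient boosting regression as the underlying model. To keep the sketch self-contained, the default `fit_predict` here is ordinary least squares instead, a swapped-in stand-in that any regressor (for example a gradient boosting model) can replace. The following are hypothetical choices: validation mean squared error as the evaluation criterion, the `max_features` cap, and all function names.

```python
import numpy as np


def least_squares_fit_predict(X_train, y_train, X_val):
    """Stand-in model: ordinary least squares with an intercept (the patent uses gradient boosting)."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_val)), X_val]) @ coef


def sequential_forward_selection(X_train, y_train, X_val, y_val, max_features,
                                 fit_predict=least_squares_fit_predict):
    """Greedy forward search scored by validation MSE; returns (best_subset, best_mse)."""
    selected, remaining = [], list(range(X_train.shape[1]))
    best_subset, best_mse = [], np.inf
    while remaining and len(selected) < max_features:
        # Try adding each remaining descriptor to the current subset.
        scores = []
        for j in remaining:
            cols = selected + [j]
            pred = fit_predict(X_train[:, cols], y_train, X_val[:, cols])
            scores.append((float(np.mean((y_val - pred) ** 2)), j))
        mse, j_best = min(scores)  # keep the addition with the lowest validation error
        selected.append(j_best)
        remaining.remove(j_best)
        if mse < best_mse:  # remember the best subset of any size seen so far
            best_mse, best_subset = mse, list(selected)
    return best_subset, best_mse
```

The search keeps the best subset seen at any size. It therefore returns a smaller subset whenever adding further descriptors stops improving the validation score.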
Further, the sequential forward feature selection method of step 4) uses a gradient boosting regression algorithm as its underlying algorithm, as follows:
The gradient boosting regression algorithm minimizes an objective function by adding a new regression tree in each iteration. Each new tree is fitted to the residuals of the preceding model and is trained along the negative gradient direction of the loss function. Many weak learners are trained over multiple iterations and are finally combined linearly into a strong learner.
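The boosting loop can be illustrated with a small NumPy sketch built on three assumptions. The loss is squared error, for which the negative gradient equals the residual. The weak learners are depth-1 regression trees (stumps). A fixed learning rate scales each tree's contribution. The patent does not specify the loss, tree depth or learning rate, and the function names are hypothetical. In practice a library implementation such as scikit-learn's `GradientBoostingRegressor` would normally be used instead.

```python
import numpy as np


def fit_stump(X, r):
    """Depth-1 regression tree minimising squared error on residuals r.

    Returns (feature, threshold, left_value, right_value).
    """
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # candidate thresholds between distinct values
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # every feature is constant, so no split exists
        return 0, np.inf, r.mean(), r.mean()
    return best[1:]


def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)


def gradient_boosting_fit(X, y, n_estimators=100, learning_rate=0.1):
    """Squared-loss gradient boosting: each stump is fitted to the current residuals."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    f0 = y.mean()  # initial constant model
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_estimators):
        residual = y - pred  # negative gradient of 0.5 * (y - F)^2 with respect to F
        stump = fit_stump(X, residual)
        pred = pred + learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return f0, learning_rate, stumps


def gradient_boosting_predict(model, X):
    """Strong learner = initial constant + learning-rate-weighted sum of all stumps."""
    f0, learning_rate, stumps = model
    X = np.asarray(X, dtype=float)
    return f0 + learning_rate * sum(predict_stump(s, X) for s in stumps)
```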
Further, the regression fitting of the optimal descriptor subset in step 5) uses a linear regression algorithm, as follows:
Linear regression uses the least squares method or the gradient descent method to find the optimal model coefficients for a given data set. The least squares method minimizes the sum of squared errors between the fitted and actual values over all observations in the training data. The gradient descent method updates the coefficients in each iteration, and each update reduces the sum of squared errors between the fitted values and the actual values of the training data. By repeatedly moving in the direction of the negative gradient, it converges to a minimum. Because the squared-error loss of linear regression is convex, this minimum is also the global one, so the method finds the optimal model coefficients for the given data set.
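Both fitting routes can be compared in a short NumPy sketch. `np.linalg.lstsq` solves the least-squares problem directly, while the gradient-descent loop repeatedly steps against the gradient of the mean squared error. The learning rate, iteration count and function names are illustrative assumptions. The learning rate must be small enough for the iteration to converge.

```python
import numpy as np


def fit_least_squares(X, y):
    """Closed-form least squares; returns [intercept, coef_1, ..., coef_p]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def fit_gradient_descent(X, y, learning_rate=0.1, n_iter=5000):
    """Batch gradient descent on the mean squared error; same coefficient layout."""
    A = np.column_stack([np.ones(len(X)), X])
    coef = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 / len(y) * A.T @ (A @ coef - y)  # gradient of mean((A @ coef - y)^2)
        coef = coef - learning_rate * grad  # step along the negative gradient
    return coef
```

On noise-free data generated by y = 1 + 2x, both routines recover the intercept 1 and slope 2. The closed form gets there in one solve, and gradient descent gets there after enough iterations.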
Compared with the prior art, the invention has the following prominent substantive features and notable advantages:
1. Without prior knowledge, the method helps explore the relationships among the structure, composition and properties of ABO₃ perovskite oxides and finds expressions that describe them. This guides the exploration and avoids blind trial and error. It also accelerates the design and optimization of materials and provides a new approach to materials exploration and research;
2. The whole process involves no experiments and no chemical products, so it generates no chemical pollution and is environmentally friendly;
3. The method is simple and easy to implement, saves experimental time and resources, is low in cost, and is suitable for popularization and application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
Fig. 1 is a heat map of the Pearson correlation coefficient analysis of the data sample values in Embodiment 1 of the present invention; the fill of each cell corresponds to the value of the associated Pearson correlation coefficient.
Fig. 2 is a process diagram of descriptor generation based on feature engineering in embodiment 1 of the present invention.
Fig. 3 is a scatter plot of the optimal descriptor against the target variable, the stability of the perovskite oxides, in Embodiment 1 of the present invention; the line represents the predicted stability and the points represent the actual stability.
Detailed Description
To further clarify the objects, technical solutions and advantages of the present invention, the following detailed description of the present invention is given with reference to specific examples and drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not intended to limit the present invention.
The first embodiment is as follows:
In this embodiment, referring to Fig. 1, a method for recursively and automatically acquiring descriptors to explore the structure, composition and property relationships of perovskite oxides comprises the following steps:
1) A data set of 4912 perovskite oxide samples is collected from the literature as the data set sample for the method.
The perovskite oxide data set is described in Table 1:
Table 1 Description of the perovskite oxide material data set
2) The data set is preprocessed by deleting samples with missing values, and the Pearson correlation coefficients of the complete sample values are analyzed to find the features strongly correlated with the target variable. Fig. 1 shows how several features of the perovskites correlate with the target variable, stability. All of them are strongly correlated with stability, so all are taken as effective features.
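Step 2) can be sketched as follows, assuming NumPy and missing values encoded as NaN. The cut-off `threshold=0.3` on |r| is a hypothetical value: the patent does not state a threshold, and in this embodiment all features were retained. The function name `pearson_filter` is likewise illustrative.

```python
import numpy as np


def pearson_filter(X, y, threshold=0.3):
    """Drop incomplete samples, then keep features with |Pearson r| >= threshold (assumed cut-off).

    Returns (X_kept, y_complete, r_all_features, kept_feature_indices).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # A sample is complete only if none of its features nor its target is NaN.
    complete = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    X, y = X[complete], y[complete]
    # Pearson correlation of each feature column with the target variable.
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    keep = np.flatnonzero(np.abs(r) >= threshold)
    return X[:, keep], y, r, keep
```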
3) New descriptors are constructed through feature engineering from the features selected in step 2). Feature engineering builds new descriptors through transformation functions and feature combinations. The transformation functions apply arithmetic and aggregation operators to the given features to generate new descriptors; they help rescale features or convert a nonlinear relationship between a feature and the target variable into a linear one. Feature combination is one of the most useful techniques in feature engineering; it merges two or more features into a single feature. The combinations are cross-products over all possible feature values, so each combined feature represents the synergy of the information it contains. Fig. 2 shows the generation of descriptors based on feature engineering: xᵢ (i = 1, 2, ..., n) denotes a selected feature, and the expressions following the arrows denote the generated descriptors. Nine transformation functions are used, including x, x⁻¹, x², x³, eˣ, ln|x|, ln(1+|x|) and log|x|, and the transformed features are combined nonlinearly to obtain more descriptors.
4) Feature selection is performed on the generated descriptors to find an optimal descriptor subset.
Further, the feature selection for descriptors in step 4) uses a sequential forward feature selection method, as follows:
A feature subset is selected and used to train the model. Here the model is the underlying algorithm of the sequential forward feature selection method, namely a gradient boosting regression algorithm. The model is evaluated on a validation data set. This is repeated for different feature subsets according to the sequential forward search algorithm, and the optimal feature subset is selected according to the evaluation results.
Further, the sequential forward feature selection method of step 4) uses a gradient boosting regression algorithm as its underlying algorithm, as follows:
The gradient boosting regression algorithm minimizes an objective function by adding a new regression tree in each iteration. Each new tree is fitted to the residuals of the preceding model and is trained along the negative gradient direction of the loss function. Many weak learners are trained over multiple iterations and are finally combined linearly into a strong learner.
5) Linear regression fitting is performed on the optimal descriptor subset to find the optimal descriptor. The optimal descriptors selected by the linear regression model and the corresponding evaluation indices are shown in Table 2.
Further, the regression fitting of the optimal descriptor subset in step 5) uses a linear regression algorithm, as follows:
Linear regression uses the least squares method or the gradient descent method to find the optimal model coefficients for a given data set. The least squares method minimizes the sum of squared errors between the fitted and actual values over all observations in the training data. The gradient descent method updates the coefficients in each iteration, and each update reduces the sum of squared errors between the fitted values and the actual values of the training data. By repeatedly moving in the direction of the negative gradient, it converges to a minimum. Because the squared-error loss of linear regression is convex, this minimum is also the global one, so the method finds the optimal model coefficients for the given data set.
Table 2 Optimal descriptors selected by the linear regression model and the corresponding evaluation indices
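The descriptor ranking of step 5) can be sketched as a one-variable linear fit per candidate descriptor, assuming NumPy. The patent does not name its evaluation indices, so the coefficient of determination R² and the root-mean-square error used here are assumptions. The function name `rank_descriptors` is also an assumption.

```python
import numpy as np


def rank_descriptors(D, y, labels):
    """Fit y = a*d + b for each descriptor column d of D.

    Returns a list of (label, R2, RMSE, a, b) sorted by R2, best first.
    """
    D, y = np.asarray(D, dtype=float), np.asarray(y, dtype=float)
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares for R2
    results = []
    for j, label in enumerate(labels):
        a, b = np.polyfit(D[:, j], y, 1)  # slope first, then intercept
        resid = y - (a * D[:, j] + b)
        r2 = 1.0 - np.sum(resid ** 2) / ss_tot
        rmse = float(np.sqrt(np.mean(resid ** 2)))
        results.append((label, float(r2), rmse, float(a), float(b)))
    return sorted(results, key=lambda t: t[1], reverse=True)
```

The top entry gives both the chosen descriptor and the linear relation y = a·d + b that step 6) reports.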
6) Using the optimal descriptor found as the input variable, the corresponding relationship among structure, composition and properties is obtained through the linear regression algorithm. Fig. 3 is a scatter plot of the optimal descriptor against the target variable, stability; the line represents the predicted stability and the points represent the actual stability.
The above-mentioned embodiments are described in further detail with reference to the accompanying drawings, and it should be understood that the present invention is not limited to the above-mentioned embodiments, and various changes and modifications can be made according to the purpose of the invention, and any modifications, substitutions, combinations, or simplifications made within the spirit and principle of the technical solution of the present invention shall be included in the scope of the present invention.
Claims (5)
1. A method for exploring the structure, composition and property relationships of perovskite oxides, characterized in that it comprises the following steps:
1) collecting a data set of perovskite materials from the literature as the data set sample for the method;
2) preprocessing the collected data set by deleting samples with missing values, and analyzing the Pearson correlation coefficients of the complete sample values to find the features strongly correlated with the target variable;
3) constructing new descriptors through feature engineering from the features selected in step 2);
4) performing feature selection on the generated descriptors to find an optimal descriptor subset;
5) performing linear regression fitting on the optimal descriptor subset to find the optimal descriptor;
6) using the optimal descriptor found as the input variable, obtaining the corresponding structure, composition and property relationship through a linear regression algorithm.
2. The method for exploring the structure, composition and property relationships of perovskite oxides according to claim 1, wherein the feature engineering in step 3) constructs new descriptors through transformation functions and feature combinations; the transformation functions help rescale features or convert a nonlinear relationship between a feature and the target variable into a linear one; feature combination, one of the most useful techniques in feature engineering, merges two or more features into a single feature; the combinations are cross-products over all possible feature values, so each combined feature represents the synergy of the information it contains; nine transformation functions are used, including x, x⁻¹, x², x³, eˣ, ln|x|, ln(1+|x|) and log|x|, and the transformed features are combined nonlinearly to obtain more descriptors.
3. The method for exploring the structure, composition and property relationships of perovskite oxides according to claim 1, wherein the feature selection for descriptors in step 4) uses a sequential forward feature selection method, as follows:
a feature subset is selected and used to train the model, where the model is the underlying algorithm of the sequential forward feature selection method, namely a gradient boosting regression algorithm; the model is evaluated on a validation data set; these operations are repeated for different feature subsets according to the sequential forward search algorithm; and the optimal feature subset is selected according to the evaluation results.
4. The method for exploring the structure, composition and property relationships of perovskite oxides according to claim 1, wherein the sequential forward feature selection method of step 4) uses a gradient boosting regression algorithm as its underlying algorithm, as follows: the gradient boosting regression algorithm minimizes an objective function by adding a new regression tree in each iteration; each new tree is fitted to the residuals of the preceding model and trained along the negative gradient direction of the loss function; many weak learners are trained over multiple iterations and finally combined linearly into a strong learner.
5. The method for exploring the structure, composition and property relationships of perovskite oxides according to claim 1, wherein the regression fitting of the optimal descriptor subset in step 5) uses a linear regression algorithm, as follows:
linear regression uses the least squares method or the gradient descent method to find the optimal model coefficients for a given data set; the least squares method minimizes the sum of squared errors between the fitted and actual values over all observations in the training data; the gradient descent method updates the coefficients in each iteration, each update reducing the sum of squared errors between the fitted values and the actual values of the training data; by repeatedly moving in the direction of the negative gradient it converges to a minimum, and thus finds the optimal model coefficients for the given data set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110274280.5A CN113223639A (en) | 2021-03-15 | 2021-03-15 | Method for exploring structure, composition and property of perovskite oxide |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110274280.5A CN113223639A (en) | 2021-03-15 | 2021-03-15 | Method for exploring structure, composition and property of perovskite oxide |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113223639A | 2021-08-06 |
Family
ID=77083718
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110274280.5A Pending CN113223639A (en) | 2021-03-15 | 2021-03-15 | Method for exploring structure, composition and property of perovskite oxide |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113223639A (en) |
- 2021-03-15: CN202110274280.5A (CN113223639A), Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116825227A (en) * | 2023-08-31 | 2023-09-29 | 桑若(厦门)光伏产业有限公司 | Perovskite component proportion analysis method and device based on depth generation model |
CN116825227B (en) * | 2023-08-31 | 2023-11-14 | 桑若(厦门)光伏产业有限公司 | Perovskite component proportion analysis method and device based on depth generation model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication |