CN115579089A - Method for screening ideal band gap perovskite material based on machine learning - Google Patents

Info

Publication number
CN115579089A
Authority
CN
China
Prior art keywords
band gap
perovskite material
machine learning
feature
screening
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211397291.3A
Other languages
Chinese (zh)
Inventor
冯晶
杨超
种晓宇
何京津
余威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN202211397291.3A
Publication of CN115579089A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C60/00 Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/60 In silico combinatorial chemistry
    • G16C20/64 Screening of libraries
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E10/00 Energy generation through renewable energy sources
    • Y02E10/50 Photovoltaic [PV] energy
    • Y02E10/549 Organic PV cells

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medicinal Chemistry (AREA)
  • Library & Information Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a machine-learning-based method for screening perovskite materials with an ideal band gap. The method collects experimental band gap data for organic-inorganic hybrid perovskite materials, constructs a feature pool from intrinsic elemental features, calculates the Pearson correlation coefficients between features and eliminates strongly correlated redundant features, ranks the remaining features by importance with a gradient boosting regression tree (GBRT) algorithm, iterates over the ranked features to find the sub-feature combination that gives the highest model accuracy, and uses that optimal sub-feature set to construct band gap prediction models based on the GBRT algorithm and on a symbolic regression algorithm. Because intrinsic elemental features serve as an intermediate input between composition and band gap, the feature dimension and model complexity are lower than when the composition is used directly as input. Combined with the proposed sub-feature screening and the symbolic regression algorithm, the model is reduced to a single input dimension, so it remains simple and convenient to use without sacrificing accuracy, which favors large-scale predictive screening.

Description

Method for screening ideal band gap perovskite material based on machine learning
Technical Field
The invention relates to the field of perovskite solar cells, in particular to a method for screening an ideal band gap perovskite material based on machine learning.
Background
Perovskite solar cells have attracted wide attention because of their low fabrication cost, high photoelectric conversion efficiency and suitability for flexible devices. As the light-absorbing layer of a perovskite solar cell, organic-inorganic hybrid perovskites are notable for a widely tunable band gap (forbidden band width) and high carrier mobility. According to the Shockley-Queisser (SQ) theory, the ideal band gap of the light-absorbing layer of a solar cell is 1.3-1.4 eV, the range in which the photoelectric conversion efficiency reaches its upper limit. However, because hybrid perovskites have many constituent elements, their chemical composition space is very large, and tuning the band gap by adjusting the composition through experimental trial and error is slow and costly.
Machine learning, regarded as the fourth paradigm of materials research, builds effective material feature inputs and quickly establishes a mapping between those inputs and one or more output properties, so that the properties of a new material can be predicted from its features.
Existing methods that screen other materials with machine learning models predict material properties directly from the composition. When a material has many constituents, the input dimension of the model becomes very high, which makes the model complex and hinders large-scale rapid screening of candidate compositions.
Disclosure of Invention
The invention aims to provide a machine-learning-based method for screening perovskite materials with an ideal band gap. The method yields a high-accuracy machine learning model that rapidly predicts the band gap of organic-inorganic hybrid perovskites and screens perovskite compositions.
The method for screening an ideal band gap perovskite material based on machine learning comprises the following steps:
Step 1, collecting perovskite material data and the experimental band gap value of each perovskite material, wherein every perovskite material has the composition ABX3, with the stoichiometric ratios of the A, B and X sites summing to 1:1:3, A represents any one, any two or all three of Cs, FA and MA, FA is HC(NH2)2, MA is CH3NH3, B represents either or both of Pb and Sn, and X represents any one, any two or all three of Br, Cl and I;
Step 2, taking the stoichiometric ratios of the elements as weights, computing weighted averages of the intrinsic features of the A-, B- and X-site elements to obtain weighted-average features, then applying addition, subtraction and division to the weighted-average features to obtain operation features, and taking the weighted-average features and the operation features together as the initial features;
Step 3, calculating the Pearson correlation coefficients between the initial features, eliminating redundant initial features whose correlation is greater than 0.95, and constructing a feature pool;
Step 4, ranking the features in the feature pool obtained in step 3 by importance using the GBRT algorithm;
Step 5, performing iterative sub-feature screening with the test-set prediction accuracy of the GBRT algorithm as the objective function, and selecting the sub-feature set for which the GBRT model is most accurate;
Step 6, constructing a band gap prediction model of the perovskite material with the sub-feature set selected in step 5 as the input independent variables and the experimental perovskite band gap value as the output dependent variable;
Step 7, constructing, according to the element composition, a data set of perovskite compositions to be screened on a composition grid with stoichiometric ratios from 0 to 1 in steps of 0.01, and using the band gap prediction model to predict and screen the perovskite materials that have the ideal band gap.
The invention has the beneficial effects that:
1. In the prior art, perovskite compositions with an ideal band gap for solar cells are found by experimental trial and error, repeatedly adjusting the composition ratio until the ideal band gap is obtained. The present method screens compositions with a machine learning model, which gives the experiments a clearer direction and reduces the cost and time of trial and error.
2. Intrinsic elemental features are introduced as an intermediate input between composition and band gap, replacing the direct use of composition as input in conventional machine learning methods. This reduces the input dimension and complexity of the model and enables rapid screening of tens of millions of candidate compositions.
3. Conventional machine-learning-based material screening and modeling lacks a systematic feature screening procedure. The invention reduces high-dimensional features to low-dimensional features through a progressive, layer-by-layer procedure that combines Pearson correlation screening, feature importance ranking and iterative sub-feature screening, further reducing the complexity of the model input.
In a preferred embodiment of the invention, the band gap prediction model in step 6 is constructed based on the GBRT algorithm.
Explanation: the gradient boosting regression tree (GBRT) algorithm is a classic ensemble algorithm in machine learning. It is carried out jointly by multiple weak learners (decision trees) that are invoked in a chain, and the predictions of the individual weak learners are accumulated to give the final prediction. Each weak learner is a decision tree, a model that makes predictions with a tree structure (binary or multi-branch): the independent variables are split according to information entropy, and the prediction is obtained at a leaf of the tree.
With a band gap prediction model built on the GBRT algorithm, organic-inorganic hybrid perovskite compositions with a target band gap can be screened rapidly, which avoids the blindness of experimental trial and error and significantly reduces experimental cost.
In a preferred embodiment of the invention, the band gap prediction model in step 6 is an empirical band gap prediction formula constructed with a symbolic regression algorithm based on a genetic algorithm.
In a preferred embodiment of the invention, the intrinsic features of the elements in step 2 include: the Goldschmidt tolerance factor, the octahedral factor, the average Pauling electronegativity, the average Shannon ionic radius, the average electron affinity, the average numbers of electrons in the s, p, d and f orbitals, the average atomic polarizability and the average atomic radius.
In a preferred embodiment of the invention, in step 4 the GBRT algorithm derives an importance index for each feature from the information gain produced when the feature is added or removed, and the importance indices of all features sum to 1.
The gradient boosting regression tree algorithm can quantify how strongly each independent-variable feature influences the output variable, so it can be used to rank the features by importance, retain the main features and eliminate redundant ones.
In a preferred embodiment of the invention, the iterative sub-feature screening in step 5 proceeds as follows: following the feature importance ranking from step 4, the last-ranked feature is deleted in each iteration and the remaining features are used to train the GBRT model; the sub-feature set for which the GBRT model is most accurate is selected.
In a preferred embodiment of the invention, the iterative sub-feature screening in step 5 uses ten-fold cross-validation, with the root mean square error and the coefficient of determination between the band gap values predicted by the GBRT model and the collected experimental band gap values as the evaluation criteria.
In a preferred embodiment of the invention, the empirical band gap prediction formula is as follows:
Figure BDA0003933700830000031
where χ_B-X is the difference between the weighted-average Pauling electronegativities of the B-site and X-site elements.
Symbolic regression based on a genetic algorithm is well suited to searching for an optimal mathematical formula. It imitates biological evolution by iteratively screening candidates in a preset formula pool, randomly modifies the formula trees with mutation operations, and finally selects the best formula that meets the accuracy threshold.
Previous applications of machine learning to materials modeling focus only on producing a complex model file and neglect an intuitive mathematical expression. In addition to building the model file, the invention uses symbolic regression to construct a mathematical relation between the elemental features of a composition and its band gap. The formula is accurate and convenient to use, and experimenters can use it to tune the band gap quickly and guide their experiments.
In a preferred embodiment of the invention, step 2 further comprises normalizing the initial features according to the following formula:
$$x_{\mathrm{normalization}} = \frac{x - \mu}{\sigma}$$
where x_normalization is the initial feature after normalization, x is the initial feature before normalization, μ is the mean of x, and σ is the standard deviation of x.
Normalization scales the initial features to the same range and avoids the loss of model accuracy that occurs when the initial feature values differ greatly in magnitude.
In a preferred embodiment of the invention, the perovskite materials with the ideal band gap screened in step 7 have the chemical formulas:
MA0.61FA0.07Cs0.32Pb0.68Sn0.32(Br0.1I0.9)3, MA0.68FA0.03Cs0.29Pb0.66Sn0.34(Br0.24I0.76)3 and MA0.02FA0.08Cs0.9Pb0.5Sn0.5(Br0.3I0.7)3.
Drawings
FIG. 1 is a flow chart of the method for screening the ideal band gap perovskite material based on machine learning;
FIG. 2 shows the feature importance ranking produced by the gradient boosting regression tree algorithm;
FIG. 3 shows the results of the sub-feature screening;
FIG. 4 shows the accuracy of the gradient boosting regression tree model;
FIG. 5 shows the formula screening result and the formula prediction accuracy of the symbolic regression algorithm in the second embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it should be understood that the preferred embodiments described below are only for illustrating the present invention and do not limit the scope of the present invention.
Example one
As shown in FIG. 1, the method for screening an ideal band gap perovskite material based on machine learning comprises the following steps:
Step 1, perovskite material data and the experimental band gap value of each perovskite material are collected to form a data set. Every perovskite material has the composition ABX3, with the stoichiometric ratios of the A, B and X sites summing to 1:1:3, where A represents any one, any two or all three of Cs, FA and MA, FA is HC(NH2)2, MA is CH3NH3, B represents either or both of Pb and Sn, and X represents any one, any two or all three of Br, Cl and I. For example, in the composition MA0.3FA0.4Cs0.3Pb0.6Sn0.4(Br0.2I0.8)3, the ratios of MA, FA and Cs sum to 1, the ratios of Pb and Sn sum to 1, and the ratios of Br and I sum to 3. A total of 600 data points were obtained, and the data set was randomly divided into a training set and a test set according to a proportion of 5.
Step 2, taking the stoichiometric ratios of the elements as weights, weighted averages of the intrinsic features of the A-, B- and X-site elements are computed to obtain weighted-average features; addition, subtraction and division are then applied to the weighted-average features to obtain operation features, and the weighted-average features and operation features together are taken as the initial features.
In step 2, the intrinsic features of the A-, B- and X-site elements comprise the Goldschmidt tolerance factor, the octahedral factor, the average Pauling electronegativity, the average Shannon ionic radius, the average electron affinity, the average numbers of electrons in the s, p, d and f orbitals, the average atomic polarizability and the average atomic radius. For the A-site elements only the Shannon ionic radius is used, whereas all of the above intrinsic features are used for the B-site and X-site elements. The intrinsic features, their calculation formulas and the explanation of the formulas are shown in Table 1:
Figure BDA0003933700830000051
TABLE 1
For example, the Shannon ionic radius corresponds to the symbol IR in the table. The weighted-average IR of the A-, B- and X-site elements is calculated with the corresponding formula to give three features, IR_A, IR_B and IR_X. Addition, subtraction and division are then applied between these three features, so the final features derived from IR are IR_A, IR_B, IR_X, IR_A-B, IR_A-X, IR_A+B, IR_A+X, IR_A/B, IR_A/X, IR_B-X, IR_B+X and IR_B/X. Except for the Goldschmidt tolerance factor and the octahedral factor, the initial features for all other intrinsic features are constructed in the same way. The Goldschmidt tolerance factor and the octahedral factor do not distinguish between the A, B and X sites; both are generated directly from the average Shannon ionic radii of the A and B elements through the corresponding formulas in the table. Following these steps, 50 features are generated for each data point.
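The feature construction described for the ionic radius can be sketched as follows. This is a minimal illustration, not the patent's implementation: the radius values in `IONIC_RADIUS` are placeholder numbers chosen for the example (Table 1 is not reproduced in the text), the function names are hypothetical, and the tolerance and octahedral factors use their standard textbook definitions, t = (r_A + r_X) / (√2 (r_B + r_X)) and μ = r_B / r_X.

```python
import math
from itertools import combinations

# Illustrative ionic radii in angstroms. These values are assumptions made
# for this sketch and are not taken from the patent's Table 1.
IONIC_RADIUS = {
    "MA": 2.17, "FA": 2.53, "Cs": 1.67,   # A site
    "Pb": 1.19, "Sn": 1.10,               # B site
    "Cl": 1.81, "Br": 1.96, "I": 2.20,    # X site
}

def weighted_average(site, table):
    """Ratio-weighted mean of an elemental property over one lattice site."""
    total = sum(site.values())
    return sum(ratio * table[el] for el, ratio in site.items()) / total

def radius_features(a_site, b_site, x_site):
    """Build the 12 radius features plus the tolerance and octahedral factors."""
    base = {
        "A": weighted_average(a_site, IONIC_RADIUS),
        "B": weighted_average(b_site, IONIC_RADIUS),
        "X": weighted_average(x_site, IONIC_RADIUS),
    }
    feats = {f"IR_{k}": v for k, v in base.items()}
    for p, q in combinations("ABX", 2):   # pairs (A,B), (A,X), (B,X)
        feats[f"IR_{p}+{q}"] = base[p] + base[q]
        feats[f"IR_{p}-{q}"] = base[p] - base[q]
        feats[f"IR_{p}/{q}"] = base[p] / base[q]
    # Standard definitions of the Goldschmidt tolerance factor and the
    # octahedral factor, computed from the site-averaged radii.
    feats["t"] = (base["A"] + base["X"]) / (math.sqrt(2) * (base["B"] + base["X"]))
    feats["mu"] = base["B"] / base["X"]
    return feats

# Composition MA0.3FA0.4Cs0.3Pb0.6Sn0.4(Br0.2I0.8)3 from step 1.
feats = radius_features(
    {"MA": 0.3, "FA": 0.4, "Cs": 0.3},
    {"Pb": 0.6, "Sn": 0.4},
    {"Br": 0.2, "I": 0.8},
)
```

The same `weighted_average` helper would apply to any other elemental property table (electronegativity, electron affinity and so on), which is how a larger feature set could be assembled from a small number of lookups.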
In this embodiment, step 2 further comprises normalizing the initial features according to the following formula:
$$x_{\mathrm{normalization}} = \frac{x - \mu}{\sigma}$$
where x_normalization is the initial feature after normalization, x is the initial feature before normalization, μ is the mean of x, and σ is the standard deviation of x.
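As a small sketch of this normalization, the function below subtracts the column mean and divides by the column standard deviation. The use of the population standard deviation and the handling of constant columns are assumptions of this example, and the sample values are hypothetical.

```python
import statistics

def standardize(values):
    """Return z-scores, (x - mean) / standard deviation, for one feature column."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)   # population standard deviation (assumption)
    if std == 0:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - mean) / std for v in values]

column = [1.19, 1.15, 1.10, 1.12]     # hypothetical IR_B values for four samples
z = standardize(column)
```

After this step every feature column has zero mean and unit spread, so features measured in different units contribute on a comparable scale.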
Step 3, the Pearson correlation coefficients between the initial features are calculated, redundant initial features with a correlation greater than 0.95 are eliminated, and a feature pool is constructed. The eliminated and retained initial features are shown in Table 2:
Figure BDA0003933700830000062
TABLE 2
In Table 2, IR_A represents the average Shannon ionic radius of the A-site elements, and IR_A-B represents the difference between the average Shannon ionic radii of the A-site and B-site elements.
IR_X represents the average Shannon ionic radius of the X-site elements, IR_B+X the sum of the average Shannon ionic radii of the B-site and X-site elements, IR_B-X the difference between the average ionic radii of the B-site and X-site elements, IR_B/X the average ionic radius of the B-site elements divided by that of the X-site elements, EA_X the average electron affinity of the X-site elements, DP_X the average atomic polarizability of the X-site elements, AR_X the average atomic radius of the X-site elements, χ_X the average Pauling electronegativity of the X-site elements, χ_B the average Pauling electronegativity of the B-site elements, and χ_B-X the difference between the weighted-average Pauling electronegativities of the B-site and X-site elements.
IR_B represents the average Shannon ionic radius of the B-site elements, EA_B the average electron affinity of the B-site elements, DP_B the average atomic polarizability of the B-site elements, AR_B the average atomic radius of the B-site elements, f_B the average number of f-orbital electrons of the B-site elements, and f_B-X the difference between the average numbers of f-orbital electrons of the B-site and X-site elements.
IR_A+X represents the sum of the average Shannon ionic radii of the A-site and X-site elements, IR_A-X the difference between the average ionic radii of the A-site and X-site elements, IR_A/X the average ionic radius of the A-site elements divided by that of the X-site elements, EA_B-X the difference between the average electron affinities of the B-site and X-site elements, DP_B-X the difference between the average atomic polarizabilities of the B-site and X-site elements, AR_B-X the difference between the average atomic radii of the B-site and X-site elements, s_B the average number of s-orbital electrons of the B-site elements, p_B the average number of p-orbital electrons of the B-site elements, d_B the average number of d-orbital electrons of the B-site elements, s_X the average number of s-orbital electrons of the X-site elements, s_B-X the difference between the average numbers of s-orbital electrons of the B-site and X-site elements, p_X the average number of p-orbital electrons of the X-site elements, p_B-X the difference between the average numbers of p-orbital electrons of the B-site and X-site elements, d_X the average number of d-orbital electrons of the X-site elements, d_B-X the difference between the average numbers of d-orbital electrons of the B-site and X-site elements, and f_X the average number of f-orbital electrons of the X-site elements.
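The correlation filter of step 3 can be illustrated with a short sketch. Two points here are assumptions of the example rather than statements from the text: the threshold is applied to the absolute value of the Pearson coefficient, and of two strongly correlated features the one that appears first is retained. The column values are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0                    # constant column: treat as uncorrelated
    return cov / (sx * sy)

def drop_redundant(columns, threshold=0.95):
    """Keep a feature only if |r| <= threshold against every feature kept so far."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature columns: AR_B nearly duplicates IR_B, chi_X does not.
columns = {
    "IR_B": [1.0, 2.0, 3.0, 4.0],
    "AR_B": [2.0, 4.1, 6.0, 8.0],
    "chi_X": [3.0, 1.0, 4.0, 1.5],
}
pool = drop_redundant(columns)
```

Because the outcome depends on the order in which features are visited, a real pipeline would need an explicit rule for which member of a correlated pair to keep.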
Step 4, the features in the feature pool obtained in step 3 are ranked by importance using the GBRT algorithm. Specifically, the 11 features are taken as input, and the GBRT algorithm derives an importance index for each feature from the information gain produced when the feature is added or removed; the importance indices of all features sum to 1. The importance ranking is shown in FIG. 2.
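To make the idea of an importance index that sums to 1 concrete, here is a deliberately small gradient boosting sketch in pure Python. It is not the patent's model: the weak learners are depth-1 trees (stumps), the loss is squared error, and the reduction in squared error at each split stands in for the information gain mentioned above. All names, hyperparameters and data are hypothetical.

```python
def best_stump(X, residuals):
    """Find the single split (feature, threshold) that most reduces squared error."""
    mean = sum(residuals) / len(residuals)
    base_sse = sum((r - mean) ** 2 for r in residuals)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            gain = base_sse - sse
            if best is None or gain > best[0]:
                best = (gain, j, thr, lm, rm)
    return best

def fit_gbrt(X, y, n_trees=50, learning_rate=0.1):
    """Boosted stumps for squared loss; returns (predict, importances)."""
    init = sum(y) / len(y)
    pred = [init] * len(y)
    stumps, gains = [], [0.0] * len(X[0])
    for _ in range(n_trees):
        residuals = [t - p for t, p in zip(y, pred)]   # negative gradient
        found = best_stump(X, residuals)
        if found is None or found[0] <= 1e-12:
            break
        gain, j, thr, lm, rm = found
        stumps.append((j, thr, lm, rm))
        gains[j] += gain                               # credit the split feature
        pred = [p + learning_rate * (lm if row[j] <= thr else rm)
                for p, row in zip(pred, X)]
    total = sum(gains) or 1.0
    importances = [g / total for g in gains]           # normalized to sum to 1

    def predict(row):
        return init + sum(learning_rate * (lm if row[j] <= thr else rm)
                          for j, thr, lm, rm in stumps)
    return predict, importances

# Synthetic data: the target depends on feature 0 only; feature 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.4, 2.0], [0.5, 3.0], [0.6, 6.0]]
y = [1.2, 1.3, 1.4, 1.5, 1.6, 1.7]
predict, importances = fit_gbrt(X, y)
ranking = sorted(range(len(importances)), key=lambda j: -importances[j])
```

In practice a library implementation would normally be used; scikit-learn's `GradientBoostingRegressor`, for instance, exposes normalized importances through its `feature_importances_` attribute.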
Step 5, iterative sub-feature screening is performed with the test-set prediction accuracy of the GBRT algorithm as the objective function, and the sub-feature set for which the GBRT model is most accurate is selected. The screening proceeds as follows: following the feature importance ranking from step 4, the last-ranked feature is deleted in each iteration and the remaining features are used to train the GBRT model; the sub-feature set giving the highest GBRT model accuracy is selected. The screening result is shown in FIG. 3.
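The iterative screening loop can be sketched independently of the model. In the example below, `evaluate` is a hypothetical callback that would train a model on the given features and return its test-set error; here it is replaced by a lookup of made-up error values so that the control flow can be followed.

```python
def iterative_subset_screening(ranked_features, evaluate):
    """Drop the least important feature each round; keep the best-scoring subset.

    ranked_features: names sorted from most to least important.
    evaluate: hypothetical callable returning a test-set error (lower is better).
    """
    best_subset, best_error = None, float("inf")
    history = []
    current = list(ranked_features)
    while current:
        error = evaluate(current)
        history.append((tuple(current), error))
        if error < best_error:
            best_subset, best_error = list(current), error
        current = current[:-1]            # remove the last-ranked feature
    return best_subset, best_error, history

# Made-up errors (eV) keyed by subset size, for illustration only.
toy_errors = {4: 0.080, 3: 0.060, 2: 0.055, 1: 0.070}
ranked = ["chi_B-X", "IR_X", "EA_X", "DP_X"]
subset, error, history = iterative_subset_screening(
    ranked, lambda feats: toy_errors[len(feats)]
)
```

Keeping the full `history` makes it possible to plot error against subset size, which is the kind of curve a sub-feature screening figure would show.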
Step 6, a band gap prediction model of the perovskite material is constructed with the sub-feature set selected in step 5 as the input independent variables and the experimental perovskite band gap value as the output dependent variable; the model is based on the GBRT algorithm. The construction process is as follows: after the features are input into the program, the hyperparameters of the GBRT model are tuned until the model reaches its highest accuracy (the minimum root mean square error between the band gap values predicted by the GBRT model and the collected experimental band gap values). The hyperparameters at that point are kept and a model file is output, which completes the construction of the band gap prediction model. The values and descriptions of the optimal hyperparameters are shown in Table 3, and the final accuracy of the band gap prediction model is shown in FIG. 4.
Figure BDA0003933700830000081
TABLE 3
Step 7, according to the element composition, a data set of perovskite compositions to be screened is constructed on a composition grid with stoichiometric ratios from 0 to 1 in steps of 0.01, giving 5×10^7 candidate compositions. The band gap prediction model is then used to predict and screen the perovskite materials with the ideal band gap, where the ideal band gap range is 1.3-1.4 eV.
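A composition grid with a step of 0.01 can be enumerated with integer hundredths, which avoids floating-point drift when the ratios on a site must sum to exactly 1. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and `predict_gap` stands for whatever trained model is available. As a consistency check, if the A site mixes three species while the B and X sites each mix two (an assumption about how the grid was restricted, not something the text states), the grid has 5151 × 101 × 101 = 52,545,351 points, the same order of magnitude as the candidate count given for step 7.

```python
from itertools import product
from math import comb

STEPS = 100  # ratios 0.00 .. 1.00 in increments of 0.01, stored as integers

def site_mixes(n_species, steps=STEPS):
    """Yield integer tuples (in hundredths) that sum to `steps`."""
    if n_species == 1:
        yield (steps,)
        return
    for first in range(steps + 1):
        for rest in site_mixes(n_species - 1, steps - first):
            yield (first,) + rest

def count_site_mixes(n_species, steps=STEPS):
    """Stars-and-bars count of the tuples produced by site_mixes."""
    return comb(steps + n_species - 1, n_species - 1)

def candidates(n_a, n_b, n_x):
    """Lazily enumerate (A, B, X) ratio tuples as fractions of 1."""
    sites = [list(site_mixes(n)) for n in (n_a, n_b, n_x)]
    for mixes in product(*sites):
        yield tuple(tuple(v / STEPS for v in mix) for mix in mixes)

def screen(compositions, predict_gap, low=1.3, high=1.4):
    """Keep compositions whose predicted band gap (eV) lies in [low, high]."""
    return [c for c in compositions if low <= predict_gap(c) <= high]

grid_size = count_site_mixes(3) * count_site_mixes(2) * count_site_mixes(2)
first_candidate = next(candidates(3, 2, 2))
```

Because `candidates` is a generator over the Cartesian product, the grid can be streamed through a model in batches rather than held in memory all at once.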
In step 7, the screened perovskite materials with the ideal band gap have the chemical formulas:
MA0.61FA0.07Cs0.32Pb0.68Sn0.32(Br0.1I0.9)3, MA0.68FA0.03Cs0.29Pb0.66Sn0.34(Br0.24I0.76)3 and MA0.02FA0.08Cs0.9Pb0.5Sn0.5(Br0.3I0.7)3.
The predicted band gap of MA0.61FA0.07Cs0.32Pb0.68Sn0.32(Br0.1I0.9)3 is 1.36 eV, that of MA0.68FA0.03Cs0.29Pb0.66Sn0.34(Br0.24I0.76)3 is 1.39 eV, and that of MA0.02FA0.08Cs0.9Pb0.5Sn0.5(Br0.3I0.7)3 is 1.39 eV.
This embodiment further includes a verification step: perovskite thin films of the 3 screened compositions are prepared by a one-step spin-coating method, specifically as follows:
According to the proportions of the perovskite compositions screened in step 7, CH3NH3I, HC(NH2)2I, CsI, CH3NH3Br, HC(NH2)2Br, CsBr, CH3NH3Cl, HC(NH2)2Cl and CsCl are dissolved in 1 mL of a mixed solvent with a DMF to DMSO volume ratio of 7.5, stirred at room temperature for 5 h and filtered to obtain the perovskite precursor solution. In a glove box under a nitrogen atmosphere, the precursor solution is spin-coated onto a glass substrate at 3000 rpm (revolutions per minute) with chlorobenzene as the anti-solvent to form a perovskite thin film. After spin-coating, the glass substrate is annealed on a 150 °C hot plate for 30 min under 254 nm ultraviolet irradiation and then cooled to room temperature. The resulting perovskite thin film is about 400 nm thick. The absorption spectrum of the prepared ten-component film is measured with an ultraviolet spectrophotometer and the corresponding band gap value is calculated. The mean absolute error between the measured and predicted band gap values is only 0.02 eV.
In this embodiment, a machine learning algorithm is used to build a band gap prediction model for organic-inorganic hybrid perovskites, enabling rapid prediction from composition to band gap and guiding the synthesis of perovskite materials with an ideal band gap. Compositions suitable for the light-absorbing layer of perovskite solar cells were successfully screened, and the experimental results agree closely with the predicted values, which greatly reduces experimental cost and avoids the "blindness" of experimental trial and error.
Example two
This embodiment differs from the first embodiment as follows:
In step 5, the iterative sub-feature screening uses ten-fold cross-validation, with the root mean square error (RMSE) and the coefficient of determination (R^2) between the band gap values predicted by the GBRT model and the collected experimental band gap values as the evaluation criteria. Specifically, following the feature importance ranking from step 4, the last-ranked feature is deleted in each iteration and the remaining features are used to train the machine learning model; the sub-feature set for which the RMSE between predicted and experimental band gap values is lowest and the coefficient of determination is highest is selected.
In ten-fold cross-validation the training set is divided into ten parts and the model is trained ten times; in each round nine parts serve as the training set and the remaining part as the test set, and the model accuracy is expressed by the RMSE and R^2 between the predicted and experimental values over the ten test sets.
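The ten-fold procedure and the two metrics can be sketched as below. Several choices are assumptions of this example: the held-out predictions of all ten folds are pooled before RMSE and R^2 are computed, the `fit` callback interface is hypothetical, and a one-feature least-squares line on synthetic data stands in for the GBRT model.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def k_fold_scores(X, y, fit, k=10, seed=0):
    """fit(X_train, y_train) must return a predict(row) callable (hypothetical API)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k disjoint index sets
    truths, preds = [], []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        truths += [y[i] for i in held_out]
        preds += [predict(X[i]) for i in held_out]
    return rmse(truths, preds), r2(truths, preds)

def fit_line(X_train, y_train):
    """Toy one-feature least-squares model standing in for the GBRT model."""
    xs = [row[0] for row in X_train]
    mx, my = sum(xs) / len(xs), sum(y_train) / len(y_train)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (t - my) for x, t in zip(xs, y_train)) / sxx
    return lambda row: my + slope * (row[0] - mx)

X = [[i / 10] for i in range(30)]
y = [1.2 + 0.5 * row[0] for row in X]              # synthetic, noise-free data
score_rmse, score_r2 = k_fold_scores(X, y, fit_line)
```

Any model can be plugged in through `fit`, so the same function could score each candidate sub-feature set during the iterative screening.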
In this embodiment, a ten-fold cross-validation method is used to train the machine learning model, and the root mean square error and the decision coefficient R are used 2 The value is expressed by precision, a high-precision machine prediction model is obtained by training, the root mean square error is 0.05eV, and the determination coefficient is 0.99. The constructed machine learning model can quickly and accurately predict the perovskite band gap value.
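The ten-fold cross-validation described in this embodiment can be sketched in plain Python. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the held-out predictions of all ten folds are pooled before RMSE and R² are computed (the text does not say whether the scores are pooled or averaged), the regressor is a one-feature least-squares line standing in for the GBRT model, and the data are synthetic.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Train k times on k-1 folds, predict the held-out fold, score pooled predictions."""
    y_true, y_pred = [], []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        y_pred.extend(predict(model, [X[i] for i in held_out]))
        y_true.extend(y[i] for i in held_out)
    return rmse(y_true, y_pred), r_squared(y_true, y_pred)


def fit_line(X, y):
    """Stand-in regressor (NOT GBRT): least-squares line on the first feature."""
    xs = [row[0] for row in X]
    mean_x, mean_y = sum(xs) / len(xs), sum(y) / len(y)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, y)) / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, X):
    slope, intercept = model
    return [slope * row[0] + intercept for row in X]


# Synthetic, exactly linear data (illustrative only, not the patent's data set).
X = [[i / 10] for i in range(50)]
y = [1.2 + 0.5 * row[0] for row in X]

score_rmse, score_r2 = cross_validate(X, y, fit_line, predict_line)
print(f"RMSE = {score_rmse:.3g} eV, R2 = {score_r2:.4f}")
```

In practice the `fit` and `predict` callbacks would wrap a gradient boosted regression tree implementation; they are kept as parameters here so the splitting and scoring logic stays independent of the model.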
This embodiment differs from the first embodiment in that: in step 6, the band gap prediction model is an empirical band gap prediction formula constructed by a symbolic regression algorithm based on a genetic algorithm.
The band gap empirical prediction formula is as follows:
Figure BDA0003933700830000091
where x_{B-X} is the difference between the weighted-average Pauling electronegativities of the B-site and X-site elements.
The formula screening results and formula prediction accuracy of the symbolic regression algorithm are shown in figure 5.
The preferred embodiments of the present application have been described in detail above with reference to the drawings. Well-known structures and common general knowledge in the art are not described at length here; a person skilled in the art can complete and implement the technical solution of the present invention based on the teaching of the embodiments, and such well-known structures, known methods or common general knowledge should not be regarded as obstacles to implementing the present application.
The scope of protection claimed by the present application shall be determined by the content of the claims, and the summary of the invention, the detailed description and the drawings in the specification may be used to interpret the claims.
Modifications may be made to the embodiments of the present application within the scope of its technical idea, and the modified embodiments shall also be considered to fall within the scope of the present application.

Claims (10)

1. A method for screening ideal band gap perovskite materials based on machine learning, characterized by comprising the following steps:
step 1, collecting perovskite material data and the experimental band gap value corresponding to each perovskite material, wherein all the perovskite materials have the elemental composition ABX3, the summed stoichiometric ratios of the elements at the A, B and X sites being 1:1:3, wherein A represents any one of, or a combination of any two or all three of, Cs, FA and MA, FA being HC(NH2)2 and MA being CH3NH3, B represents either or both of Pb and Sn, and X represents any one of, or a combination of any two or all three of, Br, Cl and I;
step 2, using the element stoichiometric ratios as weights, performing a weighted calculation on the intrinsic features of the A-, B- and X-site elements to obtain weighted-average features, then performing addition, subtraction and division on the weighted-average features to obtain operation features, and taking the weighted-average features and the operation features together as the initial features;
step 3, calculating the Pearson correlation coefficients among the initial features, removing redundant initial features whose correlation is greater than 0.95, and constructing a feature pool;
step 4, based on the feature pool obtained in step 3, ranking the features by importance using the GBRT algorithm;
step 5, performing sub-feature iterative screening with the test-set prediction accuracy of the GBRT algorithm as the objective function, and selecting the sub-feature set corresponding to the GBRT model with the highest accuracy;
step 6, constructing a band gap prediction model for the perovskite material, with the sub-feature set selected in step 5 as the input independent variables and the experimental perovskite band gap value as the output dependent variable;
step 7, according to the elemental composition, constructing a data set of candidate perovskite compositions on a composition grid in which each element stoichiometric ratio ranges from 0 to 1 with a step size of 0.01, and using the band gap prediction model to predict and screen the perovskite materials having an ideal band gap.
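Step 3 of the method above, the removal of redundant features by Pearson correlation, can be sketched as follows. This is a minimal illustration under assumptions not stated in the claim: the absolute value of the coefficient is compared with 0.95, the first feature of a correlated pair is the one kept, and the feature names and values are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / math.sqrt(var_a * var_b)


def build_feature_pool(features, threshold=0.95):
    """Keep a feature only if |r| with every already-kept feature is <= threshold.

    `features` maps a feature name to its list of values over all samples.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns, for illustration only.
initial_features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.0, 6.0, 8.0, 10.1],  # nearly proportional to f1, so redundant
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly correlated with f1 (r = -0.3)
}

pool = build_feature_pool(initial_features)
print(pool)
```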
2. The machine learning-based ideal bandgap perovskite material screening method according to claim 1, wherein: in step 6, the band gap prediction model is constructed based on the GBRT algorithm.
3. The machine learning-based ideal bandgap perovskite material screening method according to claim 1, wherein: in step 6, the band gap prediction model is an empirical band gap prediction formula constructed by a symbolic regression algorithm based on a genetic algorithm.
4. The machine learning-based ideal bandgap perovskite material screening method of claim 1, wherein:
in step 2, the intrinsic features of the elements include: the Goldschmidt tolerance factor, the octahedral factor, the average Pauling electronegativity, the average Shannon ionic radius, the average electron affinity, the average number of electrons in the s, p, d and f orbitals, the average atomic polarizability and the average atomic radius.
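As an illustration of how a weighted-average feature and the derived operation features of step 2 might be computed, the sketch below uses the B-site and X-site fractions of one composition from claim 10. The electronegativity table holds commonly tabulated Pauling values and is an assumption, since the patent does not state which table it used. The function names and the B-minus-X sign convention are also assumptions.

```python
# Commonly tabulated Pauling electronegativities (assumed values; the source
# table used in the patent is not given).
PAULING_EN = {"Pb": 2.33, "Sn": 1.96, "Cl": 3.16, "Br": 2.96, "I": 2.66}


def weighted_average(fractions, table):
    """Weighted average of an elemental property over one lattice site.

    `fractions` maps each element on the site to its fraction; fractions must sum to 1.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("site fractions must sum to 1")
    return sum(frac * table[element] for element, frac in fractions.items())


def operation_features(a, b):
    """Addition, subtraction and division of two weighted-average features."""
    return {"sum": a + b, "difference": a - b, "ratio": a / b}


# B-site and X-site fractions of MA0.61FA0.07Cs0.32Pb0.68Sn0.32(Br0.1I0.9)3.
b_site = {"Pb": 0.68, "Sn": 0.32}
x_site = {"Br": 0.1, "I": 0.9}

en_b = weighted_average(b_site, PAULING_EN)
en_x = weighted_average(x_site, PAULING_EN)
ops = operation_features(en_b, en_x)  # ops["difference"] plays the role of x_{B-X}
print(en_b, en_x, ops)
```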
5. The machine learning-based ideal bandgap perovskite material screening method of claim 1, wherein:
in step 4, the GBRT algorithm derives the importance index of each feature from the information gain produced when that feature is added or removed, and the importance indices of all features sum to 1.
6. The machine learning-based ideal bandgap perovskite material screening method of claim 1, wherein:
in step 5, the sub-feature iterative screening is performed as follows: following the feature importance ranking from step 4, the lowest-ranked feature is deleted in each iteration, the remaining features are used to train the GBRT model, and the sub-feature set for which the GBRT model has the highest accuracy is selected.
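The iterative screening in this claim amounts to a backward elimination loop over the importance-ranked features. The sketch below is a hedged illustration: `backward_elimination` and the `score` callback are hypothetical names, the score is an error where lower is better, and the toy error table stands in for a cross-validated GBRT error.

```python
def backward_elimination(ranked_features, score):
    """Drop the lowest-ranked feature each iteration and keep the best subset.

    `ranked_features` is ordered from most to least important.
    `score(subset)` returns an error for a model trained on that subset (lower is better).
    """
    current = list(ranked_features)
    best_subset, best_error = list(current), score(current)
    while len(current) > 1:
        current = current[:-1]  # delete the last (least important) feature
        error = score(current)
        if error < best_error:
            best_subset, best_error = list(current), error
    return best_subset, best_error


# Hypothetical errors (eV) indexed by subset size, for illustration only.
toy_errors = {5: 0.09, 4: 0.07, 3: 0.05, 2: 0.08, 1: 0.15}
ranked = ["f1", "f2", "f3", "f4", "f5"]

subset, error = backward_elimination(ranked, lambda s: toy_errors[len(s)])
print(subset, error)
```

In a real workflow the callback would train the model on the given subset and return, for example, its cross-validated RMSE.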
7. The machine learning-based ideal bandgap perovskite material screening method of claim 1, wherein:
in step 5, the sub-feature iterative screening is performed as follows: ten-fold cross-validation is used, and the root mean square error and the coefficient of determination between the band gap values predicted by the GBRT model and the collected experimental band gap values are used as the criteria for the sub-feature iterative screening.
8. The machine learning-based ideal bandgap perovskite material screening method of claim 3, wherein:
the band gap empirical prediction formula is as follows:
Figure FDA0003933700820000021
where x_{B-X} is the difference between the weighted-average Pauling electronegativities of the B-site and X-site elements.
9. The machine learning-based ideal bandgap perovskite material screening method according to claim 1, wherein:
in step 2, the method further comprises normalizing the initial features according to the following formula:
$$x_{normalization} = \frac{x - \mu}{\sigma}$$
where x_{normalization} is the initial feature after normalization, x is the initial feature, σ is the standard deviation of x, and μ is the mean of x.
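The normalization in this claim refers to the mean and standard deviation of each feature, which is read here as z-score standardization. The sketch below rests on that reading. The function name is hypothetical, the population standard deviation is assumed (the claim does not say whether the population or sample form is used), and the values are illustrative.

```python
import math


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # assumption: population form
    if std == 0:
        raise ValueError("a constant feature cannot be standardized")
    return [(v - mean) / std for v in values]


# Hypothetical values of one initial feature, for illustration only.
feature = [2.0, 2.2, 2.4, 2.6]
normalized = standardize(feature)
print(normalized)
```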
10. The machine learning-based ideal bandgap perovskite material screening method according to any one of claims 1 to 9, wherein:
in step 7, the screened perovskite materials corresponding to the ideal band gaps have the chemical formulas:
MA0.61FA0.07Cs0.32Pb0.68Sn0.32(Br0.1I0.9)3, MA0.68FA0.03Cs0.29Pb0.66Sn0.34(Br0.24I0.76)3 and MA0.02FA0.08Cs0.9Pb0.5Sn0.5(Br0.3I0.7)3.
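The composition grid of step 7 (fractions from 0 to 1 in steps of 0.01) and the three compositions listed above can be related with a short sketch. It is a hedged illustration: it assumes the fractions on each site sum to 1 and that both endpoints are included, and it works in integer hundredths to avoid floating-point drift. The helper names are hypothetical.

```python
def site_grid(n_components, total=100):
    """Yield every way to split `total` hundredths among n_components species."""
    if n_components == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in site_grid(n_components - 1, total - first):
            yield (first,) + rest


def on_grid(fractions, total=100):
    """True if every fraction is a multiple of 1/total and the fractions sum to 1."""
    units = [round(f * total) for f in fractions]
    exact = all(abs(u / total - f) < 1e-9 for u, f in zip(units, fractions))
    return exact and sum(units) == total


# Number of grid points per site: A = (MA, FA, Cs), B = (Pb, Sn), X = (Br, Cl, I).
n_a = sum(1 for _ in site_grid(3))
n_b = sum(1 for _ in site_grid(2))
n_x = sum(1 for _ in site_grid(3))
n_candidates = n_a * n_b * n_x

# The three compositions of claim 10, written as site fractions in the order above.
CLAIMED = [
    {"A": (0.61, 0.07, 0.32), "B": (0.68, 0.32), "X": (0.10, 0.00, 0.90)},
    {"A": (0.68, 0.03, 0.29), "B": (0.66, 0.34), "X": (0.24, 0.00, 0.76)},
    {"A": (0.02, 0.08, 0.90), "B": (0.50, 0.50), "X": (0.30, 0.00, 0.70)},
]
all_on_grid = all(on_grid(site) for comp in CLAIMED for site in comp.values())
print(n_a, n_b, n_x, n_candidates, all_on_grid)
```

Counting the grid points rather than materializing them keeps the sketch cheap; a real screening run would stream candidates from the generator into the prediction model.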
CN202211397291.3A 2022-11-09 2022-11-09 Method for screening ideal band gap perovskite material based on machine learning Pending CN115579089A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211397291.3A CN115579089A (en) 2022-11-09 2022-11-09 Method for screening ideal band gap perovskite material based on machine learning

Publications (1)

Publication Number Publication Date
CN115579089A true CN115579089A (en) 2023-01-06

Family

ID=84589183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211397291.3A Pending CN115579089A (en) 2022-11-09 2022-11-09 Method for screening ideal band gap perovskite material based on machine learning

Country Status (1)

Country Link
CN (1) CN115579089A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116825227A (en) * 2023-08-31 2023-09-29 桑若(厦门)光伏产业有限公司 Perovskite component proportion analysis method and device based on depth generation model
CN116825227B (en) * 2023-08-31 2023-11-14 桑若(厦门)光伏产业有限公司 Perovskite component proportion analysis method and device based on depth generation model
CN117275634A (en) * 2023-11-20 2023-12-22 桑若(厦门)光伏产业有限公司 Perovskite solar cell design method and device based on machine learning
CN117275634B (en) * 2023-11-20 2024-05-28 桑若(厦门)光伏产业有限公司 Perovskite solar cell design method and device based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination