CN115579089A - Method for screening ideal band gap perovskite material based on machine learning

Info

Publication number: CN115579089A
Application number: CN202211397291.3A
Authority: CN
Inventors: 冯晶; 杨超; 种晓宇; 何京津; 余威
Original assignee: Kunming University of Science and Technology
Current assignee: Kunming University of Science and Technology
Priority date: 2022-11-09
Filing date: 2022-11-09
Publication date: 2023-01-06

Abstract

The invention relates to a machine-learning-based method for screening perovskite materials with an ideal band gap. The method comprises: collecting experimental band gap data for organic-inorganic hybrid perovskite materials; constructing a feature pool from intrinsic perovskite features; calculating Pearson correlation coefficients among the features and eliminating strongly correlated redundant features; ranking the remaining features by importance using a gradient boosting regression tree (GBRT) algorithm; iteratively training models in ranking order to select the sub-feature combination that gives the highest model accuracy; and constructing machine learning band gap prediction models from the optimal sub-features using the gradient boosting regression algorithm and a symbolic regression algorithm. The invention uses the intrinsic features of the elements as an intermediate input between composition and band gap, which reduces the feature dimension and the model complexity compared with using the composition directly as input. By combining the proposed sub-feature screening method with the symbolic regression algorithm, the model is reduced to one dimension; the resulting model is simple and convenient to use while maintaining accuracy, which facilitates large-scale predictive screening.

Description

Method for screening ideal band gap perovskite material based on machine learning

Technical Field

The invention relates to the field of perovskite solar cells, in particular to a method for screening an ideal band gap perovskite material based on machine learning.

Background

Perovskite solar cells have attracted wide attention because of their low preparation cost, high photoelectric conversion efficiency and suitability for flexible devices. As the light absorption layer of a perovskite solar cell, organic-inorganic hybrid perovskites are characterized mainly by a widely tunable band gap (forbidden band width) and high carrier mobility. According to the Shockley-Queisser (SQ) theory, the ideal band gap of the light absorption layer of a solar cell is 1.3-1.4 eV, and the photoelectric conversion efficiency of the solar cell reaches its upper limit in this range. However, because hybrid perovskites contain many constituent elements, their chemical composition space is very large, and tuning the band gap by adjusting the chemical composition through experimental trial and error is time-consuming and costly.

Machine learning is regarded as the fourth paradigm of materials research. By constructing effective material feature inputs, it can quickly establish a mapping between those inputs and one or more output properties, so that the properties of a new material can be predicted from its features.

Existing methods that screen other materials with a machine learning model predict the material properties directly from the material composition. When a material has many constituents, the input dimension of the model becomes very high, which makes the model very complex and hinders subsequent large-scale rapid screening of compositions.

Disclosure of Invention

The invention aims to provide an ideal band gap perovskite material screening method based on machine learning, and by the method, a high-precision machine learning model can be constructed, and the rapid prediction of the band gap value of organic and inorganic hybrid perovskite and the screening of perovskite material components are realized.

The invention discloses a method for screening an ideal band gap perovskite material based on machine learning, which comprises the following steps of:

step 1, perovskite material data and the experimental band gap value corresponding to each perovskite material are collected; the elemental composition of all perovskite materials is ABX₃, with the stoichiometric ratio of the A, B and X sites being 1:1:3, wherein A represents any one, any two or all three of Cs, FA and MA, FA being HC(NH₂)₂ and MA being CH₃NH₃; B represents either or both of Pb and Sn; and X represents any one, any two or all three of Br, Cl and I;

step 2, taking the stoichiometric ratios of the elements as weights, a weighted average is computed over the intrinsic features of the elements at the A, B and X sites to obtain weighted average features; addition, subtraction and division are then applied to the weighted average features to obtain operation features; the weighted average features and the operation features together serve as the initial features;

step 3, redundant initial features with correlation greater than 0.95 are eliminated by calculating a Pearson correlation coefficient among the initial features, and a feature pool is constructed;

step 4, based on the feature pool obtained in step 3, the features are ranked by importance using a GBRT algorithm;

step 5, performing sub-feature iterative screening by taking the prediction accuracy of the GBRT algorithm test set as an objective function, and screening out a sub-feature set corresponding to the GBRT model with the highest accuracy;

step 6, constructing a band gap prediction model of the perovskite material by taking the sub-feature set screened out in the step 5 as an input independent variable and the perovskite experiment band gap value as an output dependent variable;

step 7, according to the elemental composition, a data set of perovskite compositions to be screened is constructed on a composition grid with stoichiometric ratios from 0 to 1 and a step of 0.01, and the band gap prediction model is used to predict and screen the perovskite materials corresponding to the ideal band gap.

The invention has the beneficial effects that:

1. In the prior art, perovskite compositions with an ideal band gap for solar cells are screened by experimental trial and error, in which the ideal band gap value is reached by repeatedly adjusting the composition ratio. The present method uses a machine learning model to screen the optimal compositions; compared with traditional trial and error, it makes the experimental direction more certain and reduces the cost and time of experiments.

2. The intrinsic features of the elements are introduced as an intermediate input between composition and band gap, replacing the direct use of composition as input in traditional machine learning methods. This reduces the input dimension and the complexity of the model and enables rapid screening of tens of millions of potential compositions.

3. Traditional machine-learning-based material screening and modeling lacks a systematic feature screening process. The invention achieves rapid dimension reduction from high-dimensional to low-dimensional features through a progressive, layer-by-layer screening process that combines Pearson correlation coefficient screening, feature importance ranking and iterative sub-feature screening, further reducing the complexity of the model input.

The preferred embodiments of the invention are: in step 6, the band gap prediction model is constructed based on a GBRT algorithm.

Explanation: the gradient boosting regression tree algorithm (GBRT) is a classic ensemble algorithm in machine learning. It is built from multiple weak learners (decision trees) that are called in a chain, and their individual predictions are accumulated to give the final prediction. Each weak learner (decision tree) is a model that makes predictions with a tree structure (binary or multi-branch); the independent variables are split according to information entropy, and the prediction is obtained at the leaves of the tree.
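The chained accumulation of weak learners described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: it assumes a squared-error loss, uses scikit-learn's `DecisionTreeRegressor` as the weak learner, and runs on synthetic data. The names `fit_gbrt`, `predict_gbrt` and `learning_rate` are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gbrt(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Fit a toy gradient-boosted ensemble for squared-error loss."""
    base = float(np.mean(y))                 # initial constant prediction
    prediction = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residual = y - prediction            # negative gradient of squared error
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)                # each weak learner fits what is left
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return base, trees


def predict_gbrt(model, X, learning_rate=0.1):
    """Sum the base value and the scaled output of every weak learner."""
    base, trees = model
    out = np.full(X.shape[0], base)
    for tree in trees:
        out += learning_rate * tree.predict(X)
    return out


# Synthetic data, for illustration only (not perovskite measurements).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(200, 3))
y_demo = 1.2 + 0.8 * X_demo[:, 0] - 0.5 * X_demo[:, 1] ** 2

model = fit_gbrt(X_demo, y_demo)
rmse_baseline = float(np.sqrt(np.mean((y_demo - y_demo.mean()) ** 2)))
rmse_boosted = float(np.sqrt(np.mean((y_demo - predict_gbrt(model, X_demo)) ** 2)))
```

Comparing `rmse_boosted` with `rmse_baseline` shows how much the accumulated trees improve on a constant prediction. In practice a library implementation such as scikit-learn's `GradientBoostingRegressor` would normally be used instead of this hand-written loop.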

By adopting a band gap prediction model constructed by a GBRT algorithm, organic and inorganic hybrid perovskite material components with target band gap values can be rapidly screened, the blindness of an experimental trial and error method is avoided, and the experimental cost is remarkably reduced.

The preferred embodiments of the invention are: in step 6, the band gap prediction model is an empirical band gap prediction formula constructed by a symbolic regression algorithm based on a genetic algorithm.

The preferred embodiments of the invention are: in step 2, the intrinsic features of the elements include: the Goldschmidt tolerance factor, the octahedral factor, the average Pauling electronegativity, the average Shannon ionic radius, the average electron affinity, the average numbers of electrons in the s, p, d and f orbitals, the average atomic polarizability and the average atomic radius.

The preferred embodiments of the invention are: in step 4, the GBRT algorithm indirectly calculates the importance index of each feature by calculating the information gain generated when the features are increased or decreased, and the sum of the importance indexes of all the features is 1.

The gradient boosting regression tree algorithm can calculate the degree of influence (importance) of each independent variable feature on the output variable, so the GBRT algorithm can be used to rank the features by importance in order to select the main features and eliminate redundant ones.

The preferred embodiments of the present invention are: in step 5, the method for iterative sub-feature screening is as follows: according to the feature importance ranking from step 4, the last-ranked feature is deleted in each iteration and the remaining features are retained for GBRT model training; the sub-feature set for which the GBRT model has the highest accuracy is selected.

The preferred embodiments of the invention are: in step 5, the method for iterative sub-feature screening is as follows: ten-fold cross validation is adopted, and the root mean square error and the coefficient of determination between the band gap values predicted by the GBRT model and the collected experimental band gap values are used as the judgment criteria for iterative sub-feature screening.

The preferred embodiments of the present invention are: the band gap empirical prediction formula is as follows:

wherein χ_B-X is the difference between the weighted average Pauling electronegativities of the B-site and X-site elements.

The genetic-algorithm-based symbolic regression algorithm is well suited to searching for an optimal mathematical formula: it searches iteratively within a preset formula pool by mimicking biological genetic evolution, randomly modifies the formula trees using mutation operations, and finally selects the optimal formula that meets the accuracy threshold.

Existing applications of machine learning to materials science modeling focus only on generating a complex model file and neglect an intuitive mathematical expression. While building the complex model file, the invention also uses a symbolic regression algorithm to construct a mathematical relation between the elemental features of the composition and the band gap. The formula is accurate and convenient to use, and experimenters can quickly tune band gap values based on it to guide experiments.

The preferred embodiments of the present invention are: in step 2, the method further comprises normalizing the initial features according to the following formula,

$$x_{\mathrm{normalization}} = \frac{x - \mu}{\sigma}$$

wherein x_normalization is the initial feature after normalization, x is the initial feature before normalization, μ is the mean of x, and σ is the standard deviation of x.

By normalizing the initial features, the initial features are scaled to the same range, and negative effects on model accuracy when the initial feature values are different greatly are avoided.

The preferred embodiments of the present invention are: in step 7, the screened perovskite materials corresponding to the ideal band gaps have the chemical formulas:

MA₀.₆₁FA₀.₀₇Cs₀.₃₂Pb₀.₆₈Sn₀.₃₂(Br₀.₁I₀.₉)₃, MA₀.₆₈FA₀.₀₃Cs₀.₂₉Pb₀.₆₆Sn₀.₃₄(Br₀.₂₄I₀.₇₆)₃ and MA₀.₀₂FA₀.₀₈Cs₀.₉Pb₀.₅Sn₀.₅(Br₀.₃I₀.₇)₃.

Drawings

FIG. 1 is a flow chart of the method for screening the ideal band gap perovskite material based on machine learning;

FIG. 2 is a graph of the feature importance ranking produced by the gradient boosting regression tree algorithm;

FIG. 3 is a graph showing the results of the sub-feature screening;

FIG. 4 is a graph of the accuracy of the gradient boosting regression tree model;

FIG. 5 is a graph showing the formula screening result and the formula prediction accuracy of the symbolic regression algorithm in the second embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it should be understood that the preferred embodiments described below are only for illustrating the present invention and do not limit the scope of the present invention.

Example one

As shown in figure 1: the method for screening the ideal band gap perovskite material based on machine learning comprises the following steps:

Step 1, perovskite material data and the experimental band gap value corresponding to each perovskite material are collected to form a data set; the elemental composition of all perovskite materials is ABX₃, with the stoichiometric ratio of the A, B and X sites being 1:1:3, wherein A represents any one, any two or all three of Cs, FA and MA, FA being HC(NH₂)₂ and MA being CH₃NH₃; B represents either or both of Pb and Sn; and X represents any one, any two or all three of Br, Cl and I. For example, in the composition MA₀.₃FA₀.₄Cs₀.₃Pb₀.₆Sn₀.₄(Br₀.₂I₀.₈)₃, the ratios of MA, FA and Cs sum to 1, the ratios of Pb and Sn sum to 1, and the ratios of Br and I sum to 3. A total of 600 data entries are obtained, and the data set is randomly divided into a training set and a test set according to a proportion of 5.
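The site constraints in step 1 can be checked mechanically before a composition enters the data set. The following is a minimal sketch under stated assumptions: compositions are represented as per-site dictionaries of fractions, with the X-site fractions given inside the parentheses (so they sum to 1 before multiplication by 3). The function name `check_composition` and this representation are hypothetical.

```python
from math import isclose

# Species allowed on each site, as listed in step 1.
ALLOWED = {"A": {"Cs", "FA", "MA"}, "B": {"Pb", "Sn"}, "X": {"Br", "Cl", "I"}}


def check_composition(a_site, b_site, x_site, tol=1e-9):
    """Return True when every site uses allowed species and its fractions sum to 1."""
    for name, site in (("A", a_site), ("B", b_site), ("X", x_site)):
        if not set(site) <= ALLOWED[name]:
            return False                      # unknown species on this site
        if any(frac < 0 for frac in site.values()):
            return False                      # negative fractions are not physical
        if not isclose(sum(site.values()), 1.0, abs_tol=tol):
            return False                      # fractions must sum to 1 per site
    return True


# The example composition from the text: MA0.3 FA0.4 Cs0.3 Pb0.6 Sn0.4 (Br0.2 I0.8)3
example = dict(
    a_site={"MA": 0.3, "FA": 0.4, "Cs": 0.3},
    b_site={"Pb": 0.6, "Sn": 0.4},
    x_site={"Br": 0.2, "I": 0.8},
)
is_valid = check_composition(**example)
x_total = 3 * sum(example["x_site"].values())   # total X content per formula unit
```

Here `x_total` evaluates to 3 within floating-point tolerance, matching the 1:1:3 ratio of the A, B and X sites.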

Step 2, taking the stoichiometric ratios of the elements as weights, a weighted average is computed over the intrinsic features of the elements at the A, B and X sites to obtain weighted average features; addition, subtraction and division are then applied to the weighted average features to obtain operation features; the weighted average features and the operation features together serve as the initial features.

In step 2, the intrinsic features corresponding to the elements at the A, B and X sites comprise the Goldschmidt tolerance factor, the octahedral factor, the average Pauling electronegativity, the average Shannon ionic radius, the average electron affinity, the average numbers of electrons in the s, p, d and f orbitals, the average atomic polarizability and the average atomic radius. For the A-site element only the Shannon ionic radius is used, whereas all of the above intrinsic features are used for the B-site and X-site elements. The intrinsic features, their calculation formulas and the explanation of the formulas are shown in Table 1:

TABLE 1

For example: the intrinsic feature Shannon ionic radius corresponds to the symbol IR in the table. For the A-, B- and X-site elements, the weighted average of the Shannon ionic radius is calculated with the corresponding formula, generating three features, IR_A, IR_B and IR_X. Addition, subtraction and division are then applied among these three features, so that the final features derived from IR are IR_A, IR_B, IR_X, IR_A-B, IR_A-X, IR_A+B, IR_A+X, IR_A/B, IR_A/X, IR_B-X, IR_B+X and IR_B/X. Apart from the Goldschmidt tolerance factor and the octahedral factor, the initial features for all other intrinsic features are constructed in this way. The Goldschmidt tolerance factor and the octahedral factor do not distinguish among the A-, B- and X-site elements; both are generated directly from the average Shannon ionic radii of the A- and B-site elements through the corresponding formulas in the table. Following these steps, 50 features are generated for each data entry.
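The construction of the twelve IR features can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of Table 1: the ionic radii in `RADIUS` are placeholder values chosen for demonstration (they are not taken from this document), and the function names `weighted_average` and `build_features` are hypothetical.

```python
from itertools import combinations

# Illustrative ionic radii in angstroms (assumed values, for demonstration only).
RADIUS = {"Cs": 1.88, "FA": 2.53, "MA": 2.17, "Pb": 1.19, "Sn": 1.10,
          "Br": 1.96, "Cl": 1.81, "I": 2.20}


def weighted_average(site, table):
    """Ratio-weighted mean of an intrinsic property over one lattice site."""
    total = sum(site.values())
    return sum(frac * table[element] for element, frac in site.items()) / total


def build_features(prefix, a_site, b_site, x_site, table):
    """Weighted averages for A, B, X plus pairwise sum, difference and quotient."""
    base = {"A": weighted_average(a_site, table),
            "B": weighted_average(b_site, table),
            "X": weighted_average(x_site, table)}
    features = {f"{prefix}_{site}": value for site, value in base.items()}
    for s1, s2 in combinations("ABX", 2):     # pairs (A,B), (A,X), (B,X)
        features[f"{prefix}_{s1}+{s2}"] = base[s1] + base[s2]
        features[f"{prefix}_{s1}-{s2}"] = base[s1] - base[s2]
        features[f"{prefix}_{s1}/{s2}"] = base[s1] / base[s2]
    return features


ir_features = build_features(
    "IR",
    {"MA": 0.3, "FA": 0.4, "Cs": 0.3},
    {"Pb": 0.6, "Sn": 0.4},
    {"Br": 0.2, "I": 0.8},
    RADIUS,
)
```

The resulting dictionary has three weighted averages and nine operation features, twelve in total, with the same names as the IR list in the text. Passing a different property table and prefix would build the corresponding features for another intrinsic feature.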

In this embodiment, step 2 further comprises normalizing the initial features according to the following formula,

$$x_{\mathrm{normalization}} = \frac{x - \mu}{\sigma}$$

wherein x_normalization is the initial feature after normalization, x is the initial feature before normalization, μ is the mean of x, and σ is the standard deviation of x.
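Assuming the normalization described here is the usual z-score (subtract the mean, divide by the standard deviation, which is what the reference to a mean and a standard deviation suggests), it can be sketched per feature column as follows. The function name `standardize`, the use of the population standard deviation, and the sample numbers are assumptions made for illustration.

```python
import numpy as np


def standardize(features):
    """Column-wise z-score: subtract the mean, divide by the standard deviation."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)                # population standard deviation (ddof=0)
    std[std == 0] = 1.0                       # constant columns become all zeros
    return (features - mean) / std


# Two made-up feature columns on very different scales.
raw = np.array([[2.20, 0.10],
                [2.10, 0.30],
                [1.96, 0.50],
                [1.81, 0.90]])
scaled = standardize(raw)
```

After scaling, each column has zero mean and unit standard deviation, so features with large raw values no longer dominate those with small ones.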

Step 3, redundant initial features with a correlation greater than 0.95 are eliminated by calculating the Pearson correlation coefficients among the initial features, and a feature pool is constructed; the redundant initial features and the retained initial features are shown in Table 2:

TABLE 2

In Table 2, IR_A represents the average Shannon ionic radius of the A-site element, and IR_A-B represents the difference between the average Shannon ionic radii of the A-site and B-site elements.

IR_X represents the average Shannon ionic radius of the X-site element, IR_B+X the sum of the average Shannon ionic radii of the B-site and X-site elements, IR_B-X the difference between the average Shannon ionic radii of the B-site and X-site elements, IR_B/X the average Shannon ionic radius of the B-site element divided by that of the X-site element, EA_X the average electron affinity of the X-site element, DP_X the average atomic polarizability of the X-site element, AR_X the average atomic radius of the X-site element, χ_X the average Pauling electronegativity of the X-site element, χ_B the average Pauling electronegativity of the B-site element, and χ_B-X the difference between the weighted average Pauling electronegativities of the B-site and X-site elements.

IR_B represents the average Shannon ionic radius of the B-site element, EA_B the average electron affinity of the B-site element, DP_B the average atomic polarizability of the B-site element, AR_B the average atomic radius of the B-site element, f_B the average number of f-orbital electrons of the B-site element, and f_B-X the difference between the average numbers of f-orbital electrons of the B-site and X-site elements.

IR_A+X represents the sum of the average Shannon ionic radii of the A-site and X-site elements, IR_A-X the difference between the average Shannon ionic radii of the A-site and X-site elements, IR_A/X the average Shannon ionic radius of the A-site element divided by that of the X-site element, EA_B-X the difference between the average electron affinities of the B-site and X-site elements, DP_B-X the difference between the average atomic polarizabilities of the B-site and X-site elements, AR_B-X the difference between the average atomic radii of the B-site and X-site elements, s_B the average number of s-orbital electrons of the B-site element, p_B the average number of p-orbital electrons of the B-site element, d_B the average number of d-orbital electrons of the B-site element, s_X the average number of s-orbital electrons of the X-site element, s_B-X the difference between the average numbers of s-orbital electrons of the B-site and X-site elements, p_X the average number of p-orbital electrons of the X-site element, p_B-X the difference between the average numbers of p-orbital electrons of the B-site and X-site elements, d_X the average number of d-orbital electrons of the X-site element, d_B-X the difference between the average numbers of d-orbital electrons of the B-site and X-site elements, and f_X the average number of f-orbital electrons of the X-site element.
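The redundancy filter of step 3 can be sketched as follows. This is a minimal sketch, not the procedure that produced Table 2: it assumes that the absolute value of the Pearson coefficient is compared with the 0.95 threshold and that, within a correlated pair, the feature appearing later in the column order is dropped. The function name `drop_correlated` and the synthetic data are hypothetical.

```python
import numpy as np


def drop_correlated(X, names, threshold=0.95):
    """Greedy filter: keep a feature only if |r| with every kept feature <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))   # feature-by-feature |Pearson r|
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


# Synthetic example: "f1_copy" is almost a linear function of "f1".
rng = np.random.default_rng(1)
a = rng.normal(size=300)
b = rng.normal(size=300)
X_demo = np.column_stack([a, 2.0 * a + 0.01 * rng.normal(size=300), b])
kept_names = drop_correlated(X_demo, ["f1", "f1_copy", "f2"])
```

In this example the near-duplicate column is removed and the two independent columns are retained. The order in which features are visited decides which member of a correlated pair survives, so a real pipeline would fix that order deliberately.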

Step 4, based on the feature pool obtained in step 3, the features are ranked by importance using the GBRT algorithm. Specifically, the 11 features are taken as input, and the GBRT algorithm indirectly calculates the importance index of each feature from the information gain produced when features are added or removed; the importance indexes of all features sum to 1. The importance ranking is shown in FIG. 2.
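A feature importance ranking of this kind can be obtained, for example, from scikit-learn's `GradientBoostingRegressor`, whose `feature_importances_` attribute is normalized to sum to 1. The sketch below assumes that library and default hyperparameters, and it uses synthetic data with neutral feature names; it does not reproduce the ranking in FIG. 2.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data: the target depends strongly on f0, weakly on f1, not on f2 or f3.
rng = np.random.default_rng(0)
names = ["f0", "f1", "f2", "f3"]
X_demo = rng.normal(size=(300, 4))
y_demo = 1.5 + 0.9 * X_demo[:, 0] + 0.2 * X_demo[:, 1] + 0.05 * rng.normal(size=300)

gbrt = GradientBoostingRegressor(random_state=0)
gbrt.fit(X_demo, y_demo)

importances = gbrt.feature_importances_       # impurity-based, sums to 1
ranking = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
```

`ranking` lists the features from most to least important; this ordering is the input to the iterative sub-feature screening of step 5.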

Step 5, iterative sub-feature screening is performed with the prediction accuracy of the GBRT algorithm on the test set as the objective function, and the sub-feature set corresponding to the highest GBRT model accuracy is selected. The method for iterative sub-feature screening is as follows: according to the feature importance ranking from step 4, the last-ranked feature is deleted in each iteration and the remaining features are retained for GBRT model training; the sub-feature set for which the GBRT model has the highest accuracy is selected. The screening result is shown in FIG. 3.
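The iterative screening loop can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the invention: it uses scikit-learn's `GradientBoostingRegressor`, takes the test-set RMSE as the accuracy measure, and holds out 20% of the data as the test set (an assumed split). The function name `select_subfeatures` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def select_subfeatures(X, y, names, random_state=0):
    """Drop the last-ranked feature each round; return the best-scoring subset."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=random_state)
    full = GradientBoostingRegressor(random_state=random_state).fit(X_tr, y_tr)
    order = list(np.argsort(full.feature_importances_)[::-1])  # most important first
    best_rmse, best_subset = float("inf"), None
    while order:
        model = GradientBoostingRegressor(random_state=random_state)
        model.fit(X_tr[:, order], y_tr)
        pred = model.predict(X_te[:, order])
        rmse = float(np.sqrt(mean_squared_error(y_te, pred)))
        if rmse < best_rmse:                  # keep the most accurate subset so far
            best_rmse, best_subset = rmse, [names[i] for i in order]
        order = order[:-1]                    # delete the last-ranked feature
    return best_subset, best_rmse


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
names = ["f0", "f1", "f2", "f3"]
X_demo = rng.normal(size=(300, 4))
y_demo = 1.5 + 0.9 * X_demo[:, 0] + 0.2 * X_demo[:, 1] + 0.05 * rng.normal(size=300)
best_subset, best_rmse = select_subfeatures(X_demo, y_demo, names)
```

The importance ranking is computed once on the full feature set and then reused for every round, which follows the text's description of deleting the last-ranked feature at each iteration.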

Step 6, a band gap prediction model of the perovskite material is constructed with the sub-feature set screened in step 5 as the input independent variables and the experimental perovskite band gap value as the output dependent variable; the band gap prediction model is constructed based on the GBRT algorithm. The construction process is as follows: after the features are input into the program, the hyperparameters required by the GBRT model are adjusted so that the model reaches its maximum accuracy (that is, the root mean square error between the band gap values predicted by the GBRT model and the collected experimental values is minimized); the hyperparameters at that point are retained, a model file is output, and construction of the band gap prediction model is complete. The specific values and descriptions of the optimal hyperparameters are shown in Table 3, and the final accuracy of the band gap prediction model is shown in FIG. 4.

TABLE 3

Step 7, according to the elemental composition, a data set of perovskite compositions to be screened is constructed on a composition grid with stoichiometric ratios from 0 to 1 and a step of 0.01, yielding 5×10⁷ candidate compositions; the band gap prediction model is then used to predict and screen the perovskite materials corresponding to the ideal band gap, the ideal band gap range being 1.3-1.4 eV.
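The size of such a composition grid can be worked out by counting the ways each site's fractions can sum to 1 in steps of 0.01. The sketch below is a worked example under stated assumptions, not a statement of how the data set was built: it assumes that the A site varies over Cs, FA and MA, the B site over Pb and Sn, and the X site over Br and I only. Under that assumption the count is 5151 × 101 × 101 = 52,545,351, which is of the same order as the stated 5×10⁷; if Cl were also varied on the X site, the count would instead be about 2.7×10⁹. The function names are hypothetical.

```python
from math import comb


def simplex_grid(n_components, steps):
    """Yield integer tuples of length n_components that sum to `steps`."""
    if n_components == 1:
        yield (steps,)
        return
    for first in range(steps + 1):
        for rest in simplex_grid(n_components - 1, steps - first):
            yield (first,) + rest


def grid_size(n_components, steps):
    """Closed-form number of grid points (stars-and-bars count)."""
    return comb(steps + n_components - 1, n_components - 1)


STEPS = 100                      # step length 0.01 over the range 0-1
n_a = grid_size(3, STEPS)        # Cs / FA / MA
n_b = grid_size(2, STEPS)        # Pb / Sn
n_x = grid_size(2, STEPS)        # Br / I only (assumption, see the lead-in)
n_total = n_a * n_b * n_x

# Convert one integer grid point into fractional ratios.
first_a = tuple(k / STEPS for k in next(simplex_grid(3, STEPS)))
```

Working with integer steps and dividing by `STEPS` only at the end avoids accumulating floating-point error when checking that the fractions sum to 1. Because `simplex_grid` is a generator, candidate compositions can be passed to the prediction model in batches rather than held in memory all at once.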

In step 7, the screened perovskite materials corresponding to the ideal band gaps have the chemical formulas:

MA₀.₆₁FA₀.₀₇Cs₀.₃₂Pb₀.₆₈Sn₀.₃₂(Br₀.₁I₀.₉)₃, MA₀.₆₈FA₀.₀₃Cs₀.₂₉Pb₀.₆₆Sn₀.₃₄(Br₀.₂₄I₀.₇₆)₃ and MA₀.₀₂FA₀.₀₈Cs₀.₉Pb₀.₅Sn₀.₅(Br₀.₃I₀.₇)₃.

Wherein the predicted band gap of MA₀.₆₁FA₀.₀₇Cs₀.₃₂Pb₀.₆₈Sn₀.₃₂(Br₀.₁I₀.₉)₃ is 1.36 eV, the predicted band gap of MA₀.₆₈FA₀.₀₃Cs₀.₂₉Pb₀.₆₆Sn₀.₃₄(Br₀.₂₄I₀.₇₆)₃ is 1.39 eV, and the predicted band gap of MA₀.₀₂FA₀.₀₈Cs₀.₉Pb₀.₅Sn₀.₅(Br₀.₃I₀.₇)₃ is 1.39 eV.

In this embodiment, the method further includes the step of verifying: according to the screened 3 perovskite material components, preparing the perovskite thin film by using a one-step spin coating method, which specifically comprises the following steps:

According to the proportions of the perovskite material compositions screened in step 7, CH₃NH₃I, HC(NH₂)₂I, CsI, CH₃NH₃Br, HC(NH₂)₂Br, CsBr, CH₃NH₃Cl, HC(NH₂)₂Cl and CsCl are dissolved in 1 mL of a mixed solvent of DMF and DMSO with a volume ratio of 7.5, stirred for 5 h at room temperature, and filtered to obtain a perovskite precursor solution. In a glove box under a nitrogen atmosphere, the perovskite precursor solution is spin-coated onto a glass substrate to prepare a perovskite thin film; the spin-coating speed is 3000 rpm (revolutions per minute) and chlorobenzene is used as the anti-solvent. After spin coating, the glass substrate is placed on a hot plate at 150 °C and annealed for 30 min under 254 nm ultraviolet irradiation, then cooled to room temperature. The thickness of the resulting perovskite thin film is about 400 nm. The absorption spectra of the prepared ten-component films are measured with an ultraviolet spectrophotometer, and the corresponding band gap values are calculated. The mean absolute error between the measured and predicted band gap values is only 0.02 eV.

In this embodiment, a machine learning algorithm is used to establish a band gap prediction model for organic-inorganic hybrid perovskites, enabling rapid prediction from composition to band gap value and guiding the synthesis of perovskite materials with an ideal band gap. Ideal compositions suitable for the light absorption layer of perovskite solar cells are successfully screened out, and the experimental test results agree closely with the predicted values (mean absolute error of 0.02 eV), which greatly reduces the experimental cost and avoids the blindness of experimental trial and error.

Example two

The difference between the present embodiment and the first embodiment is:

In step 5, iterative sub-feature screening is performed as follows: ten-fold cross validation is adopted, and the root mean square error (RMSE) and the coefficient of determination (R²) between the band gap values predicted by the GBRT model and the collected experimental band gap values are used as the judgment criteria. Specifically, according to the feature importance ranking from step 4, the last-ranked feature is deleted in each iteration and the remaining features are retained for machine learning model training; the sub-feature set for which the RMSE between the predicted and experimental band gap values is lowest and R² is highest is selected;

with ten-fold cross validation, the training set is divided into ten parts and training is performed ten times; in each run, nine parts serve as the training set and the remaining part serves as the test set, and the model accuracy is expressed by the RMSE and R² between the predicted and experimental values over the ten test sets.
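The ten-fold procedure can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions: it uses `KFold` with shuffling, pools the out-of-fold predictions from `cross_val_predict`, and computes a single RMSE and R² over them (averaging per-fold scores would be an equally reasonable reading of the text). The data are synthetic and the scores do not correspond to those reported for the invention.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = 1.5 + 0.6 * X_demo[:, 0] - 0.3 * X_demo[:, 1] + 0.05 * rng.normal(size=300)

# Ten folds: each sample is predicted once by a model that never saw it.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
y_pred = cross_val_predict(
    GradientBoostingRegressor(random_state=0), X_demo, y_demo, cv=cv)

rmse = float(np.sqrt(mean_squared_error(y_demo, y_pred)))
r2 = float(r2_score(y_demo, y_pred))
```

Repeating this evaluation for each candidate sub-feature set, and keeping the set with the lowest `rmse` and highest `r2`, gives the cross-validated variant of the screening loop.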

In this embodiment, the machine learning model is trained with ten-fold cross validation, and its accuracy is expressed by the root mean square error and the coefficient of determination R². Training yields a high-accuracy prediction model with a root mean square error of 0.05 eV and a coefficient of determination of 0.99. The constructed machine learning model can predict perovskite band gap values quickly and accurately.

This embodiment further differs from the first embodiment in that: in step 6, the band gap prediction model is an empirical band gap prediction formula constructed by a symbolic regression algorithm based on a genetic algorithm.

The band gap empirical prediction formula is as follows:

wherein χ_B-X is the difference between the weighted average Pauling electronegativities of the B-site and X-site elements.

The formula screening results and formula prediction accuracy of the symbolic regression algorithm are shown in figure 5.

The preferred embodiments of the present application are described in detail above with reference to the drawings, and typical known structures and common general knowledge in the art are not described herein too much, so that a person skilled in the art can complete and implement the technical solution of the present invention based on the teaching of the embodiments, and some typical known structures, known methods or common general knowledge in the art should not be considered as obstacles for the person skilled in the art to implement the present application.

The scope of protection of the present application shall be determined by the content of the claims, and the disclosure of the invention, the detailed description and the drawings in the specification may be used to interpret the claims.

Several modifications may be made to the embodiments of the present application within the scope of the technical idea of the present application, and the embodiments after such modifications should also be considered within the scope of the present application.

Claims

1. A method for screening an ideal band gap perovskite material based on machine learning is characterized by comprising the following steps:

step 1, perovskite material data and the experimental band gap value corresponding to each perovskite material are collected; the elemental composition of all perovskite materials is ABX₃, with the stoichiometric ratio of the A, B and X sites being 1:1:3, wherein A represents any one, any two or all three of Cs, FA and MA, FA being HC(NH₂)₂ and MA being CH₃NH₃; B represents either or both of Pb and Sn; and X represents any one, any two or all three of Br, Cl and I;

2. The machine learning-based ideal bandgap perovskite material screening method according to claim 1, wherein: in step 6, the band gap prediction model is constructed based on a GBRT algorithm.

3. The machine learning-based ideal bandgap perovskite material screening method according to claim 1, wherein: in step 6, the band gap prediction model is an empirical band gap prediction formula constructed by a symbolic regression algorithm based on a genetic algorithm.

4. The machine learning-based ideal bandgap perovskite material screening method of claim 1, wherein:

in step 2, the intrinsic features of the elements include: the Goldschmidt tolerance factor, the octahedral factor, the average Pauling electronegativity, the average Shannon ionic radius, the average electron affinity, the average numbers of electrons in the s, p, d and f orbitals, the average atomic polarizability and the average atomic radius.

5. The machine learning-based ideal bandgap perovskite material screening method of claim 1, wherein:

in step 4, the GBRT algorithm indirectly calculates the importance index of each feature by calculating the information gain generated when the features are increased or decreased, and the sum of the importance indexes of all the features is 1.

6. The machine learning-based ideal bandgap perovskite material screening method of claim 1, wherein:

in step 5, the method for iterative sub-feature screening is as follows: according to the feature importance ranking from step 4, the last-ranked feature is deleted in each iteration and the remaining features are retained for GBRT model training; the sub-feature set for which the GBRT model has the highest accuracy is selected.

7. The machine learning-based ideal bandgap perovskite material screening method of claim 1, wherein:

in step 5, the method for iterative sub-feature screening is as follows: ten-fold cross validation is adopted, and the root mean square error and the coefficient of determination between the band gap values predicted by the GBRT model and the collected experimental band gap values are used as the judgment criteria for iterative sub-feature screening.

8. The machine learning-based ideal bandgap perovskite material screening method of claim 3, wherein:

the band gap empirical prediction formula is as follows:

9. The machine learning-based ideal bandgap perovskite material screening method according to claim 1, wherein:

in step 2, the method also comprises the step of carrying out normalization processing on the initial characteristics according to the following formula,

10. The machine learning-based ideal bandgap perovskite material screening method according to any one of claims 1 to 9, wherein: