CN113326664A

CN113326664A - Method for predicting dielectric constant of glass based on M5P algorithm

Info

Publication number: CN113326664A
Application number: CN202110717315.8A
Authority: CN
Inventors: 赵谦; 赵明; 刘鑫; 陈阳; 匡宁
Original assignee: Nanjing Fiberglass Research and Design Institute Co Ltd
Current assignee: Nanjing Fiberglass Research and Design Institute Co Ltd
Priority date: 2021-06-28
Filing date: 2021-06-28
Publication date: 2021-08-31
Anticipated expiration: 2041-06-28
Also published as: CN113326664B

Abstract

The invention discloses a method for predicting the dielectric constant of glass based on the M5P algorithm, and belongs to the technical field of glass performance prediction. The method provides a new way of constructing atomic structure models of oxide clusters with different symmetries and takes the electronic structure into account on the basis of first principles. Because only clusters are constructed rather than crystals, the state of the system is reflected without increasing the calculation cost, and the prediction accuracy of the dielectric constant is ensured. In addition, the dielectric constant prediction model is built with an M5P model tree, so a global linear regression is not required: the M5P model tree divides the sample features into several piecewise linear regressions, the tree is more interpretable, and the model trains quickly.

Description

Method for predicting dielectric constant of glass based on M5P algorithm

Technical Field

The invention relates to a method for predicting a glass dielectric constant based on an M5P algorithm, and belongs to the technical field of glass performance prediction.

Background

The dielectric constant, also known as permittivity or relative permittivity, is an important parameter characterizing the electrical properties of a dielectric or insulating material; the smaller the dielectric constant, the faster signals propagate. With the development of electronic technology and the rapid rise of 5G communication, miniaturization of electronic devices has become a mainstream trend, and the electromagnetic wave frequencies used by electronic devices have reached the GHz level. This requires sealing glass with a low dielectric constant, which in an electronic package serves to protect the circuitry, provide insulation and isolation, and prevent signal distortion. Low dielectric constant glass can also reduce signal relaxation and crosstalk, so it has broad application prospects.

Publication No. CN110648727A, entitled "preparation method of glass material with specific physical properties", discloses a method that takes the product of the property and the content of the cation elements of the glass network modifier and network intermediate oxides as the input-layer variables and the glass properties as the output variables, and combines them with a neural network algorithm to build an intelligent composition design model. However, this method has two disadvantages. First, the amount of data on glass material performance is currently small, so machine learning methods such as neural networks easily overfit. Second, the descriptors fed to the machine learning model do not reflect the basic physical properties of the chemical components of the glass well, so a well-fitted model is accurate only within a limited local region of chemical composition space, such as the region covered by the training data set. Such machine learning models can only interpolate; they cannot find optimal compositions in a broader composition space, that is, they cannot extrapolate.

Publication No. CN110364231B, entitled "method for predicting the properties of a multi-component glass system", discloses a method that performs structure screening based on first principles and calculates the properties of the target glass according to a lever model of the multi-component glass system. However, this method does not reflect the basic physical properties of the chemical components of the glass material well, and it does not use machine learning algorithms to predict glass performance, so it cannot accurately predict the dielectric constant of glass.

Disclosure of Invention

In order to accurately predict the dielectric constant of glass without being limited by chemical composition, the invention provides a method for predicting the dielectric constant of glass based on the M5P algorithm, which comprises the following steps:

step 1, collecting dielectric constant data of glass materials composed of different components, and constructing a dielectric constant database, wherein the database comprises glass components mapped one by one and dielectric constants corresponding to the glass components;

step 2, constructing, based on first principles, atomic structure models of oxide clusters with different symmetries in the oxide glass material, and constructing descriptors containing the "material genes" of the dielectric constant, using as performance parameters the binding energy per cation i of each cluster, the equilibrium bond length between the cation and the oxygen ion in each cluster containing cation i, the Bader charge of cation i in each cluster, and the HOMO-LUMO gap of each cluster containing cation i; the HOMO-LUMO gap of a cluster containing cation i is the difference in energy between the highest occupied molecular orbital and the lowest unoccupied molecular orbital of that cluster structure;

step 3, constructing a training set, a verification set and a test set based on the dielectric constant database constructed in the step 1 and the descriptor constructed in the step 2;

step 4, constructing a dielectric constant prediction model based on the M5P model tree, and training the constructed dielectric constant prediction model according to the training set, the verification set and the test set constructed in the step 3 to obtain a trained dielectric constant prediction model;

step 5, for the glass material to be predicted, predicting its dielectric constant using the trained dielectric constant prediction model.

Optionally, step 2 includes:

step 2-1, constructing atomic structure models of oxide clusters with different symmetries as unit cells calculated by a first principle;

step 2-2, performing first-principles calculations on the unit cell structure of each type of oxide cluster constructed in step 2-1 to obtain the cluster energy $E_{cluster}$ and the structural constants of each unit cell;

step 2-3, for each unit cell structure constructed in step 2-1, obtaining a set of performance parameters by further first-principles calculation and constructing the descriptors for machine learning:

where n takes every non-zero integer between -3 and +3, $C_i$ is the proportion of the corresponding cation i, Cation is the set of cations i, and $x_i$ is a performance parameter of cation i obtained by first-principles calculation; the performance parameters include the binding energy per cation i of each cluster, the equilibrium bond length between the cation and the oxygen ion in each cluster containing cation i, the Bader charge of cation i in each cluster, and the HOMO-LUMO gap of each cluster containing cation i; the HOMO-LUMO gap of a cluster containing cation i is the difference in energy between the highest occupied molecular orbital and the lowest unoccupied molecular orbital of that cluster structure.

Optionally, when the atomic structure model of the oxide clusters with different symmetries is constructed in step 2-1, the construction is performed according to the following rules:

(1) each cluster is located in one cubic unit cell;

(2) for each cation present in the glass composition, the atom corresponding to that cation is placed in the center of the unit cell, around which 2 oxygen atoms are added in a linear molecular manner; simultaneously adding a hydrogen atom to each oxygen atom along the extension direction of the atomic bond from the central atom to the oxygen atom, wherein each oxygen atom and each hydrogen atom form a hydroxyl group;

(3) for each cation present in the glass composition, placing the atom corresponding to the cation in the center of the unit cell, adding 3 oxygen atoms around it in the same plane with three-fold rotational symmetry, and simultaneously adding a hydrogen atom to each oxygen atom along the extension direction of the atomic bond from the center atom to the oxygen atom, wherein each oxygen atom and its hydrogen atom form a hydroxyl group;

(4) for each cation present in the glass composition, placing the atom corresponding to the cation in the center of the unit cell, adding 4 oxygen atoms around it in a tetrahedrally symmetric manner, while adding one hydrogen atom on each oxygen atom along the extension of the atomic bond from the center atom to the oxygen atom, each oxygen atom and hydrogen atom constituting one hydroxyl group;

(5) for each cation present in the glass composition, the atom corresponding to the cation is placed in the center of the unit cell, 6 oxygen atoms are added around it in an octahedral symmetry, while on each oxygen atom a hydrogen atom is added along the extension of the atomic bond from the central atom to the oxygen atom, each oxygen atom and hydrogen atom constituting a hydroxyl group.
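The construction rules (2) to (5) above can be sketched in a few lines of code. The following is a minimal illustration only: the function name build_cluster, the cell edge length box, and the starting bond lengths d_mo and d_oh are hypothetical values chosen for the example and are not taken from the method, which leaves the final bond lengths to the first-principles optimization.

```python
import math

# Unit direction vectors for the four coordination geometries in rules (2)-(5).
_S3 = 1.0 / math.sqrt(3.0)
DIRECTIONS = {
    2: [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],                       # linear
    3: [(math.cos(a), math.sin(a), 0.0)                            # three-fold, planar
        for a in (0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0)],
    4: [(_S3, _S3, _S3), (_S3, -_S3, -_S3),
        (-_S3, _S3, -_S3), (-_S3, -_S3, _S3)],                     # tetrahedral
    6: [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
        (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)],      # octahedral
}


def build_cluster(cation, n_oxygen, box=20.0, d_mo=1.7, d_oh=0.97):
    """Return [(symbol, x, y, z), ...] for a cation(OH)_n cluster.

    box, d_mo and d_oh (all in angstrom) are hypothetical starting values;
    the relaxation step is expected to refine the bond lengths.
    """
    c = box / 2.0
    atoms = [(cation, c, c, c)]  # cation at the center of the cubic cell
    for ux, uy, uz in DIRECTIONS[n_oxygen]:
        # Oxygen sits at distance d_mo from the cation.
        atoms.append(("O", c + d_mo * ux, c + d_mo * uy, c + d_mo * uz))
        # Hydrogen continues along the same cation-to-oxygen direction.
        r = d_mo + d_oh
        atoms.append(("H", c + r * ux, c + r * uy, c + r * uz))
    return atoms


# One cluster per symmetry for a given cation, here Si as an example.
clusters = {n: build_cluster("Si", n) for n in (2, 3, 4, 6)}
```

Each cation therefore yields four starting structures, with 5, 7, 9 and 13 atoms respectively.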

Optionally, the binding energy per cation i of each cluster is calculated as follows:

the sum of the energies of the isolated constituents of the same number and type (the single cation and the single hydroxyl groups) is subtracted from the cluster energy $E_{cluster}$ of the oxide cluster in the unit cell structure constructed in step 2-1; the calculation formula is as follows:

wherein l is the number of oxygen atoms in the oxide cluster, and $E_i$ and $E_{OH}$ are respectively the energies of a single cation and a single hydroxyl group in one cubic unit cell.
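The verbal description of the binding energy can be turned into a short calculation. The sketch below follows that description directly (cluster energy minus the energy of one isolated cation and l isolated hydroxyl groups); the function name, the absence of any further normalization, and the numerical energies are assumptions made for illustration and are not values from the method.

```python
def binding_energy(e_cluster, e_cation, e_oh, n_oxygen):
    """Cluster energy minus the summed energies of its isolated constituents.

    All energies must share one unit (e.g. eV); n_oxygen is l, the number of
    oxygen atoms, which equals the number of hydroxyl groups in the cluster.
    """
    return e_cluster - (e_cation + n_oxygen * e_oh)


# Made-up energies (eV) for a hypothetical 4-coordinated cluster.
e_b = binding_energy(e_cluster=-120.0, e_cation=-10.0, e_oh=-25.0, n_oxygen=4)
```

With this sign convention a bound cluster gives a negative value.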

Optionally, when the dielectric constant database is constructed in step 1, the method further includes preprocessing the acquired dielectric constant data, where the preprocessing includes:

for any two glass compositions, judging whether the following two conditions are satisfied simultaneously:

condition 1: the difference in mole percentage of every oxide component is less than or equal to a first preset threshold, expressed as a percentage;

condition 2: the difference between the dielectric constants is greater than a second preset threshold, expressed as a percentage;

if both conditions are satisfied, the corresponding glass compositions and their dielectric constant data are removed from the database.

Optionally, the first preset threshold is 2%, and the second preset threshold is 10%.
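The consistency check described above can be sketched as a pairwise scan of the database. This is an illustrative sketch under stated assumptions: the dielectric constant difference is taken relative to the smaller of the two values, both records of a conflicting pair are removed, and the function names and sample data are hypothetical.

```python
def find_conflicts(records, comp_tol=2.0, eps_tol=10.0):
    """Return indices of records that belong to at least one conflicting pair.

    records: list of (composition, dielectric_constant), where composition is
    a dict mapping oxide name to mol %. comp_tol is in percentage points;
    eps_tol is a relative difference in percent (assumed here to be taken
    relative to the smaller of the two dielectric constants).
    """
    flagged = set()
    for a in range(len(records)):
        comp_a, eps_a = records[a]
        for b in range(a + 1, len(records)):
            comp_b, eps_b = records[b]
            oxides = set(comp_a) | set(comp_b)
            # Condition 1: every oxide differs by at most comp_tol.
            close = all(abs(comp_a.get(o, 0.0) - comp_b.get(o, 0.0)) <= comp_tol
                        for o in oxides)
            # Condition 2: dielectric constants differ by more than eps_tol %.
            rel_diff = 100.0 * abs(eps_a - eps_b) / min(eps_a, eps_b)
            if close and rel_diff > eps_tol:
                flagged.update((a, b))
    return flagged


def remove_conflicts(records, **kw):
    """Drop every record involved in a conflicting pair."""
    bad = find_conflicts(records, **kw)
    return [r for i, r in enumerate(records) if i not in bad]


data = [
    ({"SiO2": 70.0, "B2O3": 20.0, "Na2O": 10.0}, 4.0),
    ({"SiO2": 71.0, "B2O3": 19.5, "Na2O": 9.5}, 5.0),   # near-duplicate, 25 % apart
    ({"SiO2": 55.0, "B2O3": 30.0, "Na2O": 15.0}, 5.2),
]
cleaned = remove_conflicts(data)
```

The pairwise scan is quadratic in the number of records, which is acceptable for the small data sets typical of glass property databases.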

Optionally, in step 4, constructing a dielectric constant prediction model based on the M5P model tree, including:

step 4-1, setting the set of maximum numbers of layers of the M5P model tree as $h = \{h_1, h_2, h_3, \ldots, h_g\}$ and the set of minimum sample numbers for node splitting as $f = \{f_1, f_2, f_3, \ldots, f_j\}$;

step 4-2, for $(h_1, f_1)$, partitioning based on the standard deviation of the sample data to select the feature parameters of the splitting nodes of the binary tree;

step 4-3, for each leaf node, using a multivariate linear regression model to establish a linear regression model between the dielectric constant and the remaining undivided feature parameters Unpar of the branch:

in the formula, Unpar is the set of the remaining undivided descriptors in the branch, θ is the set of regression parameters, D is the set of all descriptors, and I is the indicator function;

step 4-4, using the training set $D_l$ from step 3, determining the coefficients $[\theta_{0l}, \theta_l]$ of the linear model corresponding to each leaf node by the least squares method, wherein $[\theta_{0l}, \theta_l]$ are the regression parameters for the data set $D_l$;

step 4-5, using the verification set from step 3, calculating the mean square error of the linear regression model on the verification set:

$$MSE = \frac{1}{N^*}\sum_{a=1}^{N^*}\left(\hat{y}_a - y_a\right)^2$$

in the formula, $N^*$ is the amount of data in the verification set, $\hat{y}_a$ is the predicted value for the verification set, and $y_a$ is the actual value for the verification set;

step 4-6, using k-fold cross-validation, repeating steps 4-3 to 4-5 k times, and calculating the average of the k-fold cross-validation mean square errors under the $(h_1, f_1)$ condition;

step 4-7, adjusting the hyper-parameters (h, f) of the M5P tree model by setting h in turn to $h_1, h_2, h_3, \ldots, h_g$ and f in turn to $f_1, f_2, f_3, \ldots, f_j$, repeating steps 4-2 to 4-6, and calculating the corresponding average mean square error for each combination in turn;

step 4-8, selecting the (h, f) value corresponding to the smallest average mean square error as the optimal hyper-parameters of the M5P model;

step 4-9, after parameter tuning is finished, taking the union of all training sets {D} and verification sets {V}, namely $\{S_1, S_2, S_3, \ldots, S_k\}$, as the training set of the final model;

step 4-10, training on the training set $\{S_1, S_2, S_3, \ldots, S_k\}$ according to steps 4-2 to 4-4, and obtaining the regression coefficients of the leaf nodes of the trained M5P model, which form the dielectric constant prediction model.
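Steps 4-3 to 4-5 can be sketched for a single leaf node as follows. This is a minimal illustration rather than the implementation of the method: it assumes a single descriptor per leaf for brevity (the method uses a multivariate regression over the undivided descriptors), and the function names fit_leaf and mse as well as the sample numbers are hypothetical.

```python
def fit_leaf(xs, ys):
    """Least-squares fit y = theta0 + theta1 * x for the samples in one leaf."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    return my - theta1 * mx, theta1


def mse(theta, xs, ys):
    """Mean square error of the fitted line on a held-out set."""
    theta0, theta1 = theta
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Training samples lying exactly on y = 1 + 2x, then two held-out samples.
theta = fit_leaf([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
val_error = mse(theta, [4.0, 5.0], [9.0, 12.0])
```

In the cross-validation of step 4-6 this pair of calls would be repeated once per fold, with the fold's training data passed to fit_leaf and its held-out data passed to mse.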

Optionally, the training set { D } and the verification set { V } are constructed by a k-fold cross-validation method.

The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.

The invention has the beneficial effects that:

The invention provides a new way of constructing atomic structure models of oxide clusters with different symmetries and takes the electronic structure into account on the basis of first principles. Because only clusters are constructed rather than crystals, the state of the system is reflected without increasing the calculation cost, and the prediction accuracy of the dielectric constant is ensured. In addition, the dielectric constant prediction model is built with an M5P model tree, so a global linear regression is not required: the M5P model tree divides the sample features into several piecewise linear regressions, the tree is more interpretable, and the model trains quickly.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of a method for predicting the dielectric constant of glass based on the first principle in an embodiment of the present invention.

Fig. 2 is a flow chart of a first principle-based descriptor construction in an embodiment of the invention.

Fig. 3 is a schematic view of an atomic structural model of an oxide cluster in a tetrahedrally symmetric manner.

Fig. 4 is a schematic view of an atomic structure model of an oxide cluster in an octahedral symmetry manner.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Embodiment 1:

the present embodiment provides a method for predicting the dielectric constant of glass based on M5P algorithm, referring to fig. 1, the method includes:

step 2, constructing a descriptor containing a material gene of the dielectric constant of the oxide glass material based on a first principle, wherein the descriptor comprises:

step 2-2, performing first-principles calculations on the unit cell structure of each type of oxide cluster constructed in step 2-1 to obtain the cluster energy $E_{cluster}$ and the structural constants of each unit cell;

Embodiment 2:

for the acquired dielectric constant data, in practical application, the method further comprises the step of preprocessing the acquired dielectric constant data, wherein the preprocessing comprises the following steps:

In the above two conditions, the first preset threshold and the second preset threshold may be determined by those skilled in the art according to prior knowledge, and in the present application, the first preset threshold is 2% and the second preset threshold is 10%.

Step 2, constructing descriptors containing 'material genes' of the dielectric constant of the oxide glass material based on a first principle, specifically, see fig. 2;

specifically, the method comprises the following steps:

(1) each cluster is located in one cubic unit cell;

(4) for each cation present in the glass composition, the atom corresponding to the cation is placed in the center of the unit cell, 4 oxygen atoms are added around the cation in a tetrahedrally symmetric manner as shown in fig. 3, and simultaneously, a hydrogen atom is added to each oxygen atom along the extension direction of the atomic bond from the center atom to the oxygen atom, and each oxygen atom and hydrogen atom form a hydroxyl group;

(5) for each cation present in the glass composition, the atom corresponding to the cation is placed in the center of the unit cell, 6 oxygen atoms are added around the cation in an octahedral symmetrical manner as shown in fig. 4, and simultaneously, one hydrogen atom is added to each oxygen atom along the extension direction of the atomic bond from the central atom to the oxygen atom, and each oxygen atom and hydrogen atom form a hydroxyl group;

that is, the above structures (2) to (5) are constructed once for each cation.

step 4a: the first-principles calculation software Quantum ESPRESSO is used;

step 4b: the constructed unit cells are optimized with pw.x of Quantum ESPRESSO, using the following parameters, to obtain the cluster energy $E_{cluster}$ and the structural constants:

the pseudopotential library distributed with Quantum ESPRESSO is used, with pseudopotential type PAW, functional type PBE, and nonlinear core correction;

the cutoff energy is 45 Ry (about 612 eV), and the self-consistent field convergence criterion is $10^{-5}$ Ry;

the calculation mode is structural optimization, which yields the equilibrium bond lengths of the cluster inside the unit cell (calculation = 'relax');

the unit cell structure keeps its original symmetry during optimization (nosym = .FALSE.);

the electron orbital occupation near the Fermi level is treated with the method appropriate for insulators (occupations = 'fixed');

all calculations are non-spin-polarized (nspin = 1);

for all cluster unit cells, the k-space grid is 1 × 1 × 1;
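The parameter list above maps onto a pw.x input file roughly as sketched below. This is an illustration under assumptions, not the input used in the method: it assumes the insulator occupation setting corresponds to occupations = 'fixed' and that the k-point grid is 1 × 1 × 1; the cell edge length, the pseudopotential directory and the pseudopotential file names are hypothetical placeholders, and pw_relax_input is a helper name introduced only for this example.

```python
def pw_relax_input(atoms, species, box_angstrom=20.0, prefix="cluster"):
    """Build a pw.x input string for relaxing one cluster in a cubic cell.

    atoms:   [(symbol, x, y, z), ...] in angstrom.
    species: {symbol: (mass, pseudopotential_file)}; file names are placeholders.
    box_angstrom is a hypothetical cell edge length.
    """
    bohr = box_angstrom / 0.529177  # angstrom -> bohr for celldm(1)
    lines = [
        "&CONTROL",
        "  calculation = 'relax'",      # optimize atomic positions
        f"  prefix = '{prefix}'",
        "  pseudo_dir = './pseudo'",    # hypothetical location of the PAW/PBE files
        "/",
        "&SYSTEM",
        "  ibrav = 1",                  # simple cubic cell
        f"  celldm(1) = {bohr:.4f}",
        f"  nat = {len(atoms)}",
        f"  ntyp = {len(species)}",
        "  ecutwfc = 45.0",             # cutoff energy in Ry
        "  occupations = 'fixed'",      # insulator-type occupation
        "  nspin = 1",                  # non-spin-polarized
        "  nosym = .false.",
        "/",
        "&ELECTRONS",
        "  conv_thr = 1.0d-5",          # SCF convergence criterion in Ry
        "/",
        "&IONS",
        "/",
        "ATOMIC_SPECIES",
    ]
    lines += [f"  {s} {m:.3f} {pp}" for s, (m, pp) in species.items()]
    lines.append("ATOMIC_POSITIONS angstrom")
    lines += [f"  {s} {x:.6f} {y:.6f} {z:.6f}" for s, x, y, z in atoms]
    lines += ["K_POINTS automatic", "  1 1 1 0 0 0"]
    return "\n".join(lines) + "\n"


# Linear Si(OH)2 starting geometry with made-up bond lengths.
atoms = [("Si", 10.0, 10.0, 10.0), ("O", 11.7, 10.0, 10.0), ("H", 12.67, 10.0, 10.0),
         ("O", 8.3, 10.0, 10.0), ("H", 7.33, 10.0, 10.0)]
species = {"Si": (28.086, "Si.UPF"), "O": (15.999, "O.UPF"), "H": (1.008, "H.UPF")}
text = pw_relax_input(atoms, species)
```

The resulting string would be written to a file and passed to pw.x; one such input is needed per cation and per cluster symmetry.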

step 2-3, performing first-principles calculations on the optimized structures of the various clusters, obtaining the set of performance parameters by further first-principles calculation, and constructing the descriptors for machine learning:

where n takes every non-zero integer between -3 and +3, $C_i$ is the proportion of the corresponding cation i, and $x_i$ is a performance parameter of cation i obtained by first-principles calculation.

The set of performance parameters includes the following performance parameters:

(1) binding energy per cation i of each cluster, unit: eV/atom;

the binding energy is calculated by subtracting, from the optimized cluster energy $E_{cluster}$, the sum of the energies of the isolated constituents of the same number and type; the calculation formula is as follows:

wherein l is the number of oxygen atoms in the cluster, and $E_i$ and $E_{OH}$ are respectively the energies of a single cation and a single hydroxyl group in one cubic unit cell;

(2) the equilibrium bond length of the cation and oxygen ion corresponding to the cluster containing the cation i;

(3) Bader charge of cation i in each cluster;

after the first-principles optimization is completed, the Bader charge of cation i in each cluster structure is calculated from the electron density file output by Quantum ESPRESSO;

(4) HOMO-LUMO gap of each cluster containing cation i, calculated directly with Quantum ESPRESSO;

after the first-principles optimization, the energies of the highest occupied molecular orbital and the lowest unoccupied molecular orbital are obtained directly for each of the 4 cluster structures corresponding to cation i, and their difference is the HOMO-LUMO gap.
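The HOMO-LUMO gap in item (4) is a simple difference of two eigenvalues once the orbital energies are available. The sketch below assumes the eigenvalues have already been extracted from the calculation output into a list, and that the calculation is non-spin-polarized with fixed occupations so that each level holds two electrons; the function name and the sample eigenvalues are hypothetical.

```python
def homo_lumo_gap(eigenvalues_ev, n_electrons):
    """Gap between the highest occupied and lowest unoccupied level.

    Assumes a non-spin-polarized calculation with fixed occupations, so each
    level holds two electrons and n_electrons is even.
    """
    levels = sorted(eigenvalues_ev)
    n_occ = n_electrons // 2
    if n_electrons % 2 or not 0 < n_occ < len(levels):
        raise ValueError("need an even electron count and at least one empty level")
    homo, lumo = levels[n_occ - 1], levels[n_occ]
    return lumo - homo


# Made-up eigenvalues (eV): three doubly occupied levels and two empty ones.
gap = homo_lumo_gap([-12.1, -8.4, -6.0, -1.5, 0.7], n_electrons=6)
```

The eigenvalue list must include at least one unoccupied level, otherwise the gap cannot be evaluated.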

Here, the calculation of the descriptor is further elaborated, exemplarily in connection with the above-mentioned performance parameters:

one set of data collected from the database is as follows: the glass contains A mol SiO₂, B mol B₂O₃ and C mol Na₂O, and the dielectric constant of glass of this composition is y.

The total number of ions is (3A + 5B + 3C) mol, so the proportion of Si atoms, $C_{Si}$, is:

$$C_{Si} = \frac{A}{3A + 5B + 3C}$$

the proportion of B³⁺ is:

$$C_{B} = \frac{2B}{3A + 5B + 3C}$$

and the proportion of Na⁺ is:

$$C_{Na} = \frac{2C}{3A + 5B + 3C}$$

in predicting the dielectric constant, for each cation i (Si, B, Na), the corresponding performance parameters x consist of the following 4 classes of parameters (16 parameters in total): the binding energy (4 kinds, $ET_1$ to $ET_4$), the cluster average bond length (4 kinds, $LT_1$ to $LT_4$), the Bader charge (4 kinds, $Q_1$ to $Q_4$) and the HOMO-LUMO gap (4 kinds, $EG_1$ to $EG_4$).

When n = 1, the descriptors (4 kinds) are as follows:

By analogy, the descriptors for the remaining values of n (n = -3, -2, -1, 2, 3) can be constructed (4 descriptors for each value of n).
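The ion-counting step of the worked example above (3 ions per SiO₂, 5 per B₂O₃ and 3 per Na₂O) can be checked numerically. The sketch below covers only the cation proportions $C_i$; the table OXIDES, the function name and the sample composition are hypothetical and introduced for illustration.

```python
# Ions per formula unit: (cation symbol, number of cations, number of oxygens).
OXIDES = {"SiO2": ("Si", 1, 2), "B2O3": ("B", 2, 3), "Na2O": ("Na", 2, 1)}


def cation_fractions(moles):
    """Map oxide amounts (mol) to C_i = cations of type i / all ions."""
    total_ions = sum(n * (OXIDES[ox][1] + OXIDES[ox][2]) for ox, n in moles.items())
    fractions = {}
    for ox, n in moles.items():
        symbol, n_cat, _ = OXIDES[ox]
        fractions[symbol] = fractions.get(symbol, 0.0) + n * n_cat / total_ions
    return fractions


# A = 70, B = 20, C = 10 mol gives 3A + 5B + 3C = 340 mol of ions.
c = cation_fractions({"SiO2": 70.0, "B2O3": 20.0, "Na2O": 10.0})
```

Because oxygen ions are counted in the denominator, the cation proportions do not sum to one.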

the specific process comprises the following steps:

step 3-1, randomly drawing $p_1\%$ of the total data amount N from the dielectric constant database as the first test subset $\{T_1\}$;

step 3-2, from the remaining $(1 - p_1\%)$ of the data set, selecting the data whose dielectric constant is less than a third preset threshold, and randomly selecting $p_2\% \cdot N$ data from them as the second test subset $\{T_2\}$;

step 3-3, from the data outside the test subsets $\{T_1\}$ and $\{T_2\}$, obtaining the glass compositions in which the specific components of interest lie within a preset interval, and randomly selecting $p_3\% \cdot N$ glass data from them as the third test subset $\{T_3\}$;

step 3-4, combining the three test subsets to form the test set of the model, $\{T\} = \{T_1, T_2, T_3\}$, and taking the remaining data as the training set and verification set of the model, wherein the values of $p_1$, $p_2$ and $p_3$ must ensure that the data ratio of the training set plus verification set ({D} + {V}) to the test set {T} is 9:1.
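The three-stage test set construction of steps 3-1 to 3-4 can be sketched as follows. The sketch is illustrative only: the function name, the threshold eps_max standing in for the third preset threshold, the predicate in_interval selecting the compositions of interest, and the synthetic records are all assumptions made for the example.

```python
import random


def split_test_set(records, p1, p2, p3, eps_max, in_interval, seed=0):
    """Return (test, rest). records: list of (composition, dielectric_constant).

    p1, p2, p3 are percentages of the total count N; eps_max plays the role of
    the third preset threshold; in_interval(composition) -> bool marks the
    compositions of interest. All of these are chosen by the user.
    """
    rng = random.Random(seed)
    n = len(records)
    pool = list(range(n))  # indices not yet assigned to a test subset

    def draw(candidates, pct):
        k = min(len(candidates), round(pct / 100.0 * n))
        chosen = set(rng.sample(candidates, k))
        pool[:] = [i for i in pool if i not in chosen]
        return chosen

    t1 = draw(pool, p1)                                             # purely random
    t2 = draw([i for i in pool if records[i][1] < eps_max], p2)     # low dielectric constant
    t3 = draw([i for i in pool if in_interval(records[i][0])], p3)  # compositions of interest
    test = [records[i] for i in sorted(t1 | t2 | t3)]
    rest = [records[i] for i in pool]
    return test, rest


# Synthetic database of 100 records; 4 % + 3 % + 3 % gives a 9:1 split.
records = [({"SiO2": 50.0 + 0.3 * i}, 3.0 + 0.05 * i) for i in range(100)]
test, rest = split_test_set(records, 4, 3, 3, eps_max=4.0,
                            in_interval=lambda comp: comp["SiO2"] >= 70.0)
```

Drawing the second and third subsets only from the data not yet selected keeps the three subsets disjoint.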

The rest data are used as a training set and a verification set of the model, and the specific division process comprises the following steps:

constructing a training set { D } and a verification set { V } by adopting a k-fold cross validation method, and specifically comprising the following steps:

step 3-4-1, sorting the remaining 90% N data in the database in ascending order of dielectric constant, and then dividing them evenly into k disjoint subsets $\{S_1, S_2, S_3, \ldots, S_k\}$;

step 3-4-2, taking one subset $S_l$ at a time as the verification set $\{V_l\}$, l = 1, 2, 3, ..., k, with the remaining k-1 subsets serving as the training set $\{D_l\}$, and using the training set $\{D_l\}$ and verification set $\{V_l\}$ as the cross-validation data.
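Steps 3-4-1 and 3-4-2 can be sketched as follows. The text states only that the sorted data are divided evenly into k disjoint subsets; dealing the sorted records into the folds in turn, so that every fold spans the whole range of dielectric constants, is an assumption of this sketch, and the function names and records are hypothetical.

```python
def stratified_folds(records, k):
    """Sort by dielectric constant, then deal records into k folds in turn."""
    ordered = sorted(records, key=lambda r: r[1])
    return [ordered[i::k] for i in range(k)]


def cross_validation_pairs(folds):
    """Yield (training set D_l, verification set V_l) for l = 1..k."""
    for l, validation in enumerate(folds):
        training = [r for m, fold in enumerate(folds) if m != l for r in fold]
        yield training, validation


# Ten made-up records (name, dielectric constant) split into k = 5 folds.
records = [("glass%d" % i, 3.0 + 0.1 * i) for i in range(10)]
folds = stratified_folds(records, k=5)
pairs = list(cross_validation_pairs(folds))
```

Each of the k pairs is then used once for fitting and once for scoring during hyper-parameter selection.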

the specific process comprises the following steps:

step 4-1, setting the set of maximum numbers of layers of the tree as $h = \{h_1, h_2, h_3, \ldots, h_g\}$ and the set of minimum sample numbers for node splitting as $f = \{f_1, f_2, f_3, \ldots, f_j\}$;

The specific process comprises the following steps:

step 4-2-1, selecting each feature parameter in turn as the partitioning node of the binary tree, and calculating the standard deviation reduction SDR of the binary tree before and after partitioning:

$$SDR = sd(T) - \sum_{b=1}^{n_{class}} \frac{|T_b|}{|T|}\, sd(T_b)$$

where sd(T) is the standard deviation of the total sample data, $|T_b|$ is the number of samples in the b-th subset obtained by partitioning on the selected feature parameter, b = 1, 2, ..., $n_{class}$, $n_{class}$ is the number of subsets, and $|T|$ is the total number of samples;

step 4-2-2, selecting the feature parameter corresponding to the maximum SDR value as the splitting node;

step 4-2-3, repeating step 4-2-1 within each partitioned subset to select a series of feature parameters for splitting the nodes, and stopping the splitting when the number of tree layers exceeds the set maximum $h_1$ or the number of samples at a node falls below the set minimum $f_1$, so that eventually all branches reach leaf nodes.
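The split selection of steps 4-2-1 and 4-2-2 can be sketched for one feature as follows. The sketch assumes a binary split on a threshold and the population standard deviation; the text does not specify either detail, and the function names and sample values are hypothetical.

```python
from statistics import pstdev


def sdr(targets, subsets):
    """Standard deviation reduction of splitting `targets` into `subsets`."""
    total = len(targets)
    weighted = sum(len(s) / total * pstdev(s) for s in subsets)
    return pstdev(targets) - weighted


def best_threshold(feature, targets, min_samples=1):
    """Scan midpoints of one feature and return (threshold, SDR) of the best split."""
    pairs = sorted(zip(feature, targets))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold separates equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        if min(len(left), len(right)) < min_samples:
            continue  # respect the minimum sample number per node
        gain = sdr(targets, [left, right])
        if gain > best[1]:
            best = ((pairs[i][0] + pairs[i - 1][0]) / 2.0, gain)
    return best


# Two well-separated groups: the best split lies between 0.2 and 0.8.
thr, gain = best_threshold([0.1, 0.2, 0.8, 0.9], [1.0, 1.0, 5.0, 5.0])
```

Running best_threshold over every descriptor and keeping the one with the largest SDR gives the splitting node; the same search is then repeated inside each resulting subset until a stopping condition is met.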

step 4-5, using the verification set $V_l$ from step 3, calculating the mean square error of the linear regression model on the verification set $V_l$:

$$MSE = \frac{1}{N^*}\sum_{a=1}^{N^*}\left(\hat{y}_a - y_a\right)^2$$

in the formula, $N^*$ is the amount of data in the verification set, $\hat{y}_a$ is the predicted value for the verification set, and $y_a$ is the actual value for the verification set;

step 4-7, adjusting the hyper-parameters (h, f) of the M5P tree model by setting h in turn to $h_1, h_2, h_3, \ldots, h_g$ and f in turn to $f_1, f_2, f_3, \ldots, f_j$, repeating steps 4-2 to 4-6, and calculating the corresponding average mean square error for each combination in turn;

step 4-8, selecting the (h, f) value corresponding to the smallest average mean square error as the optimal hyper-parameters of the M5P model;
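The hyper-parameter search of steps 4-7 and 4-8 amounts to a grid search over (h, f). The sketch below is a minimal illustration: select_hyperparameters is a hypothetical helper, and toy_cv_mse is a made-up stand-in for the real routine that would build the tree and return the per-fold mean square errors.

```python
def select_hyperparameters(h_values, f_values, cv_mse):
    """Return ((h, f), score) with the smallest mean cross-validation MSE.

    cv_mse(h, f) must return the list of k per-fold MSE values obtained by
    running steps 4-2 to 4-5 on each fold; it is supplied by the caller.
    """
    best, best_score = None, float("inf")
    for h in h_values:
        for f in f_values:
            fold_errors = cv_mse(h, f)
            score = sum(fold_errors) / len(fold_errors)
            if score < best_score:
                best, best_score = (h, f), score
    return best, best_score


# Toy stand-in for the real cross-validation routine (hypothetical numbers).
def toy_cv_mse(h, f):
    return [(h - 4) ** 2 + (f - 10) ** 2 / 100.0 + 0.1 * fold for fold in range(5)]


best, score = select_hyperparameters([2, 4, 6], [5, 10, 20], toy_cv_mse)
```

After the best pair is found, the final model is retrained on the union of all folds, as described in steps 4-9 and 4-10.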

Step 5, for the glass material to be predicted, predicting its dielectric constant using the dielectric constant prediction model.

The specific process comprises the following steps:

step 5-1, constructing the descriptors of the glass material to be predicted according to the process of step 2;

step 5-2, substituting the descriptors into the dielectric constant prediction model to obtain the predicted dielectric constant.

To verify the effectiveness of the method of the present application, 10 groups of glass materials with known dielectric constants were predicted by the method of the present invention, and the prediction results are shown in table 1 below.

Table 1. Comparison of the glass dielectric constants predicted by the method with the true values

As can be seen from Table 1, the average error between the glass dielectric constant predicted by the method of the invention and the true value is only 2.83%, so compared with existing methods the method of the invention predicts the dielectric constant relatively accurately, which verifies its effectiveness. Moreover, using the method on glasses of unknown dielectric constant allows the dielectric constant for different composition ratios to be estimated quickly, which can greatly reduce the trial-and-error cost of glass research and development and is of great significance for development targets with strict dielectric constant requirements.

Embodiment 3:

The embodiment provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the following steps:

step 2, constructing a descriptor containing a material gene of the dielectric constant of the oxide glass material based on a first principle;

In one embodiment, a computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, performs the steps of:

step 1, collecting dielectric constant data of a glass material, and constructing a dielectric constant database, wherein the database comprises glass components mapped one by one and dielectric constants corresponding to the glass components;

step 3, constructing a training set, a verification set and a test set based on the dielectric constant database and the descriptors constructed in the step 2;

step 4, constructing a dielectric constant prediction model based on the M5P model tree;

For the specific definition of each step, see the definition of the method for predicting the dielectric constant of the glass by using the M5P algorithm, which is not described herein again.

Some steps in the embodiments of the present invention may be implemented by software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A method for predicting the dielectric constant of glass based on an M5P algorithm, the method comprising:

2. The method of claim 1, wherein the step 2 comprises:

3. The method according to claim 2, wherein the step 2-1 constructs the atomic structure model of the oxide clusters having different symmetries according to the following rules:

(1) each cluster is located in one cubic unit cell;

4. The method of claim 3, wherein the binding energy per cation i of each cluster is calculated as follows:

wherein the energies of the single cation and the single hydroxyl group are each those obtained in one cubic unit cell.

5. The method according to claim 4, wherein the step 1 of constructing the dielectric constant database further comprises preprocessing the collected dielectric constant data, wherein the preprocessing comprises:

6. The method according to claim 5, wherein the first predetermined threshold is 2% and the second predetermined threshold is 10%.

7. The method of claim 6, wherein the step 4 of constructing the dielectric constant prediction model based on the M5P model tree comprises:

step 4-1, setting the set of maximum numbers of layers of the M5P model tree as $h = \{h_1, h_2, h_3, \ldots, h_g\}$ and the set of minimum sample numbers for node splitting as $f = \{f_1, f_2, f_3, \ldots, f_j\}$;

step 4-4, using the training set $D_l$ from step 3, determining the coefficients $[\theta_{0l}, \theta_l]$ of the linear model corresponding to each leaf node by the least squares method, wherein $[\theta_{0l}, \theta_l]$ are the regression parameters for the data set $D_l$;

in the formula, $N^*$ is the amount of data in the verification set;

step 4-8, selecting the (h, f) value corresponding to the smallest average mean square error as the optimal hyper-parameters of the M5P model;

8. The method of claim 7, wherein the training set { D } and validation set { V } are constructed using a k-fold cross validation method.

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 8 are implemented when the computer program is executed by the processor.