Crystal property prediction and classification method based on an attention mechanism and a crystal graph convolutional neural network
Technical Field
The invention relates to crystal property prediction and classification technology, in particular to a crystal property prediction and classification method based on an attention mechanism and a crystal graph convolutional neural network.
Background
Simulation of crystal properties is usually carried out by first-principles calculation based on DFT (density functional theory), but screening crystal materials with ideal properties using first-principles methods is very time-consuming and computationally expensive. How to realize large-scale screening of crystalline materials has therefore become a difficult problem. With the development of computers, machine learning has become an important topic in academia, and attempts have been made to use machine learning methods for large-scale crystal property simulation. With the continuous optimization of machine learning algorithms, the simulation accuracy gradually approaches that of first-principles calculation. The combination of machine learning and crystal simulation helps realize large-scale crystal research simulation and accelerates the development and research of new crystal materials, and has therefore received wide attention.
The difficulty in using machine learning methods for the simulation of crystal properties lies in how to correctly encode the chemical information (such as atomic information and crystal topology) in crystals of any size in a form compatible with machine learning models, and how to train sufficiently accurate models from the limited available data.
The crystal graph convolutional neural network (abbreviated CGCNN) is a machine learning algorithm for crystal property research; it learns crystal properties directly from the connections of atoms in the crystal and provides a universal and interpretable way of encoding crystal chemical information. Various physical properties of crystals can be predicted by a CGCNN based on graph convolution (GCN). The crystal structure graph is an undirected multigraph in which atoms are represented by nodes and edges represent atomic bonds between the atoms. In CGCNN, node i is represented by a feature vector v_i, which encodes the properties of atom i. The undirected edge (i,j)_k represents the k-th bond between atoms i and j, and u_(i,j)k is the feature vector of that bond. To account for differences in interaction strength between neighbors, CGCNN designs a new convolution function,
v_i^(t+1) = v_i^(t) + Σ_(j,k) σ(z_(i,j)k^(t)·W_f^(t) + b_f^(t)) ⊙ g(z_(i,j)k^(t)·W_s^(t) + b_s^(t))
wherein z_(i,j)k^(t) = v_i^(t) ⊕ v_j^(t) ⊕ u_(i,j)k represents the concatenation of the feature vectors of the two atoms and of the atomic bond; W_f^(t), W_s^(t), b_f^(t) and b_s^(t) are respectively the convolution weight matrix, the self weight matrix and the biases of the t-th layer; σ(·) denotes the sigmoid function and g(·) represents the softplus activation function between the layers.
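For illustration, this CGCNN convolution can be sketched in numpy; the function and variable names below are illustrative, and the explicit double loop is a minimal sketch rather than an optimized implementation:

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cgcnn_conv(v, u, nbr_idx, Wf, Ws, bf, bs):
    """One CGCNN convolution: v[i] += sigma(z Wf + bf) * g(z Ws + bs), summed
    over neighbors, where z concatenates node, neighbor and bond features."""
    n, m = nbr_idx.shape            # n atoms, m neighbors per atom
    out = v.copy()                  # residual: start from the old node features
    for i in range(n):
        for k in range(m):
            j = nbr_idx[i, k]
            z = np.concatenate([v[i], v[j], u[i, k]])
            gate = sigmoid(z @ Wf + bf)     # filter: learned interaction strength
            core = softplus(z @ Ws + bs)    # self/update term
            out[i] += gate * core
    return out

# toy sizes: 3 atoms, 2 neighbors each, atom feature dim 4, bond feature dim 2
rng = np.random.default_rng(0)
F, B = 4, 2
v = rng.normal(size=(3, F))
u = rng.normal(size=(3, 2, B))
nbr = np.array([[1, 2], [0, 2], [0, 1]])
Wf = rng.normal(size=(2 * F + B, F))
Ws = rng.normal(size=(2 * F + B, F))
bf = np.zeros(F)
bs = np.zeros(F)
v1 = cgcnn_conv(v, u, nbr, Wf, Ws, bf, bs)
```

Because the gated term is added onto v_i^(t), the layer is residual: each atom keeps its own features and accumulates weighted contributions from its bonded neighbors.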
However, as a fast machine learning method capable of large-scale screening of crystalline materials, the CGCNN method has limited prediction accuracy. This is because CGCNN reduces the complexity of the network to improve the efficiency of the machine learning algorithm; although the running speed is increased, prediction accuracy may be reduced. In addition, the default number of training cycles (epochs) of the CGCNN method is 30, which shortens the model-building process but can also affect the fitting of the network and reduce the accuracy of the prediction model.
Disclosure of Invention
The purpose of the invention is as follows: to overcome the defects in the prior art, a crystal property prediction and classification method based on an attention mechanism and a crystal graph convolutional neural network is provided. A new convolution function is designed for the crystal graph convolutional neural network, which improves the ability of graph convolution to fuse the topological structure and node features and improves calculation accuracy; a new normalization method is introduced to regularize the deep graph convolutional network, improving the fitting of the network and producing a better model. The improved network retains the ability to screen crystal materials rapidly and at large scale.
The technical scheme is as follows: to achieve the above object, the present invention provides a crystal property prediction and classification method based on an attention mechanism and a crystal graph convolutional neural network, comprising the following steps:
s1: acquiring a crystallography information file (crystal structure data) and DFT calculation data, and dividing the crystallography information file and the DFT calculation data into a training set, a verification set and a test set;
s2: extracting crystal characteristics from the crystallography information file, inputting the crystal characteristics into a neural network, and acquiring neural network output;
s3: training and verifying the constructed neural network model by adopting a training set and a verification set respectively to obtain a prediction model and a classification model, and completing prediction of crystal properties through the prediction model; classification of the crystal properties is accomplished by a classification model.
Further, the method for acquiring the crystal structure data and the DFT calculation data in step S1 includes:
a1: connecting to the Materials Project database through the pymatgen package in python, and exporting the id number of each crystal and the DFT calculation data of physical properties such as formation energy, absolute energy, band gap and Fermi energy to a csv file;
a2: connecting to the Materials Project database through the pymatgen package in python, reading the crystal id numbers from the exported csv file, and exporting the corresponding cif files (crystallography information files);
a3: preparing an atom_init.json file, a JSON file storing the initialization vector of each element.
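As an illustrative sketch of steps A1–A2 (not part of the claimed method), the export can be written with the legacy pymatgen `MPRester` interface; `fetch_and_export`, the query criteria and the property names are assumptions, and a valid Materials Project API key plus network access are required for the database calls:

```python
import csv
import os

# assumed property keys from the legacy Materials Project API
PROPS = ["material_id", "formation_energy_per_atom", "energy",
         "band_gap", "efermi"]

def export_property_csv(rows, path):
    """A1: write crystal id numbers and DFT-computed properties to a csv file."""
    with open(path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=PROPS)
        w.writeheader()
        for r in rows:
            w.writerow({k: r.get(k) for k in PROPS})

def fetch_and_export(api_key, csv_path, cif_dir):
    """A1 + A2: query the Materials Project, then export csv and cif files.
    Assumes the legacy pymatgen MPRester; requires an API key and network."""
    from pymatgen.ext.matproj import MPRester  # imported lazily: network-only path
    with MPRester(api_key) as m:
        rows = m.query(criteria={"band_gap": {"$gte": 0}}, properties=PROPS)
        export_property_csv(rows, csv_path)
        for r in rows:  # A2: one cif (crystallography information file) per id
            s = m.get_structure_by_material_id(r["material_id"])
            s.to(filename=os.path.join(cif_dir, r["material_id"] + ".cif"))
```

The csv produced this way pairs each crystal id with its DFT target values, matching the id_prop layout that CGCNN-style data loaders expect.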
Further, the obtaining process of the neural network output in step S2 is as follows:
b1: extracting from the cif file the atom features, the bond features between each atom and its neighbor atoms, the index of each atom's neighbor atoms and the index mapping the crystal to its atoms, and using them as the input of the neural network;
b2: passing the atom features input to the network through the embedding layer to generate a new vector, and then inputting the new atom feature vector, the bond feature vector and the neighbor-atom index vector into the convolution layer;
b3: in the convolutional layer, regarding an atom as a node and an atomic bond as an edge, connecting the node vector, the neighbor node vectors and the edge vectors through the index vector to form a new embedded vector, passing the new vector through fully connected layer 1, and then applying node normalization to the output;
b4: the node vector h^(t) obtained after node normalization and softplus activation consists of M hidden vectors Z_T ∈ R^(1×F) that merge neighbor features; these are transformed by a nonlinear transformation, and a shared attention vector q ∈ R^(F′×1) is then used to obtain the attention value ω_T;
b5: normalizing the attention values ω_1, ω_2, …, ω_M with the softmax function to obtain the final weights;
b6: combining the M hidden vectors merged with neighbor features and their attention values to obtain the final node embedding H^(t), carrying out batch normalization, adding the normalized original node feature vector input to the convolutional layer, activating with the softplus function and outputting;
b7: after 3 convolutional layers, a new vector fused with the local chemical environment is generated; the pooling layer then produces a vector representing the whole crystal, which is activated by the softplus function, connected to fully connected layer 2, activated by the same function, and input into fully connected layer 3 for output.
Further, the formula of the node normalization processing in step B3 is as follows:
Nodenorm(h^(t)) = (h^(t) − μ^(t)) / σ^(t)   (1)
wherein h^(t) is the newly generated node embedding vector, μ^(t) is the mean value of node h^(t), and σ^(t) is the standard deviation of the node;
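A minimal numpy sketch of the node normalization of step B3, assuming the per-node mean and standard deviation are taken across the feature dimension (the small `eps` term is added only for numerical safety and is not part of the formula):

```python
import numpy as np

def node_norm(h, eps=1e-5):
    """Center each node embedding by its mean and scale by its standard
    deviation, both computed over the feature axis (per node)."""
    mu = h.mean(axis=-1, keepdims=True)
    sigma = h.std(axis=-1, keepdims=True)
    return (h - mu) / (sigma + eps)

h = np.array([[1.0, 2.0, 3.0],
              [10.0, 10.0, 13.0]])  # 2 nodes, 3 hidden features each
hn = node_norm(h)
```

Unlike batch normalization, the statistics here are per node rather than per feature across a batch, which is what lets it suppress feature correlation within each hidden embedding.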
the node vector h^(t) in step B4 is expressed as:
h^(t) = [Z_1; Z_2; …; Z_M]   (2)
wherein Z_T ∈ R^(1×F) is the T-th row of h^(t), T ∈ {1, …, M}, M represents the maximum number of neighbor atoms, and F is the number of hidden atom features;
the attention value ω_T in step B4 is expressed as:
ω_T = q^T·tanh(W·(Z_T)^T + b)   (3)
wherein W ∈ R^(F′×F) is a weight matrix and b ∈ R^(F′×1) is a bias vector;
the final weight expression in step B5 is:
a_T = exp(ω_T) / Σ_(i=1)^M exp(ω_i)   (4)
the node embedding H^(t) in step B6 is expressed as:
H^(t) = a_1Z_1 + a_2Z_2 + … + a_MZ_M.   (5)
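The attention steps B4–B6 can be sketched together in numpy; the shapes follow the text above (Z ∈ R^(M×F), W ∈ R^(F′×F), q and b ∈ R^(F′×1)), and the subtraction of the maximum inside the softmax is only for numerical stability:

```python
import numpy as np

def attention_combine(Z, q, W, b):
    """Score each hidden vector Z_T with q^T tanh(W Z_T^T + b), normalize the
    scores with softmax, and return the weighted sum of the rows of Z."""
    omega = (q.T @ np.tanh(W @ Z.T + b)).ravel()   # (M,) attention values
    a = np.exp(omega - omega.max())
    a = a / a.sum()                                # softmax -> final weights
    H = a @ Z                                      # weighted node embedding
    return H, a

rng = np.random.default_rng(1)
M, F, Fp = 4, 3, 5                   # M neighbors, F hidden feats, F' proj dim
Z = rng.normal(size=(M, F))
q = rng.normal(size=(Fp, 1))         # shared attention vector
W = rng.normal(size=(Fp, F))
b = rng.normal(size=(Fp, 1))
H, a = attention_combine(Z, q, W, b)
```

Because q, W and b are shared across all M hidden vectors, the number of attention parameters is independent of the neighbor count, and the learned weights a_T adapt per node to the local chemical environment.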
further, the convolution formula of the neural network in step S2 is:
wherein, Nodnorm (cndot.) represents node normalization, g (cndot.) represents a softplus activation function, Attention (cndot.) represents an Attention mechanism, and BN (cndot.) represents batch normalization.
Further, the training method of the prediction model in step S3 includes:
using mean square loss as the loss function and stochastic gradient descent as the optimizer; the mean square loss is shown in equation (7),
loss(x_i, y_i) = (x_i − y_i)^2   (7)
where x_i is the input (predicted) value and y_i is the target attribute value, namely the DFT calculated value; the prediction model uses the mean absolute error (MAE) as the index for evaluating model performance.
Further, the mean absolute error (MAE) in step S3: MAE represents the average of the absolute errors between the predicted values and the test values, and is the evaluation index of the prediction model, as shown in equation (8),
MAE = (1/n)·Σ_(i=1)^n |x_i − y_i|   (8)
wherein x_i denotes the predicted value and y_i denotes the test value.
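As a sketch, the mean square loss and the MAE metric can be written directly in numpy (the toy prediction and DFT target arrays are illustrative):

```python
import numpy as np

def mse_loss(x, y):
    """Mean square loss, averaged over a batch: mean of (x_i - y_i)^2."""
    return np.mean((x - y) ** 2)

def mae(x, y):
    """Mean absolute error between predicted and test values."""
    return np.mean(np.abs(x - y))

pred = np.array([0.10, -0.25, 0.40])   # illustrative network predictions
dft = np.array([0.12, -0.20, 0.35])    # illustrative DFT target values
```

The squared loss drives training because it is smooth and penalizes large errors strongly, while MAE is reported for evaluation because it stays in the physical units of the target property (e.g. eV/atom).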
Further, the classification process of the classification model in the step S3 for the crystal property is as follows:
under the framework of the same neural network, the activation function of the output layer is changed into a logsoftmax activation function and matched with a negative log-likelihood loss function to realize the classification of crystal properties.
Further, the logsoftmax activation function in step S3 is shown in equation (9) and the negative log-likelihood loss function in equation (10),
logsoftmax(x_i) = log(exp(x_i) / Σ_j exp(x_j))   (9)
loss(x, class) = −x[class]   (10)
where x in equation (10) is the vector of log-probabilities output by the logsoftmax layer and class is the target class;
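The classification head can be sketched in numpy; the per-sample negative log-likelihood below (loss equals minus the log-probability of the target class) is the convention used by common deep-learning frameworks, and the two-class scores are illustrative:

```python
import numpy as np

def log_softmax(x):
    """Log of the softmax, computed in a numerically stable way."""
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class, given the
    log-probabilities produced by log_softmax."""
    return -log_probs[target]

scores = np.array([2.0, 0.5])        # e.g. [positive-class, negative-class] logits
lp = log_softmax(scores)
loss = nll_loss(lp, target=0)
```

Pairing logsoftmax with the negative log-likelihood in this way is equivalent to a cross-entropy loss on the raw scores, but the explicit log-probabilities are numerically better behaved.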
the classification model takes accuracy and the area under the ROC curve (AUC) as the indexes for evaluating model performance.
Further, the area under the ROC curve (AUC) is obtained by summing the areas of the portions under the ROC curve. The abscissa of the ROC curve is the false positive rate (FPR), i.e., the proportion of actual negative cases that are judged positive; the ordinate is the true positive rate (TPR), i.e., the proportion of actual positive cases that are judged positive. An AUC close to 1 indicates a better classification model.
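The AUC can be sketched with the rank (Mann-Whitney) formulation, which equals the area under the ROC curve: the fraction of positive-negative pairs in which the positive case receives the higher score (the labels and scores below are illustrative):

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case; ties count as half."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

y = np.array([1, 1, 0, 0, 1])
s = np.array([0.9, 0.7, 0.4, 0.2, 0.3])
```

Because AUC depends only on the ranking of the scores, it is insensitive to the 0.5 decision threshold used for accuracy, which is why the two metrics are reported together.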
The attention mechanism is a special structure embedded in a machine learning model to automatically learn and calculate the contribution of the input data to the output data. By learning adaptive importance weights for the embeddings, it can improve the ability of graph convolution to fuse the topological structure and the node features.
The invention provides a crystal property prediction and classification method based on an attention mechanism and a crystal graph convolutional neural network. In addition, node normalization is introduced to reduce the risk of overfitting: by suppressing the feature correlation of the hidden embeddings and improving the smoothness of the model with respect to the input node features, the deep graph convolutional network is regularized and its overfitting risk is reduced.
Has the advantages that: compared with the prior art, the invention takes crystal data collection, crystal property prediction and crystal property classification as one complete system and fully combines the crystal graph convolutional neural network with the attention mechanism. It can effectively improve the precision of crystal property prediction and classification at low time cost, has practical engineering value, facilitates accurate large-scale crystal research simulation, and provides a methodological guarantee for the development and research of new crystal materials.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a diagram of the convolution structure of a convolutional layer in the present invention;
fig. 3 is a structural diagram of a neural network in the present invention.
Detailed Description
The present invention is further illustrated by the following figures and specific examples, which are to be understood as illustrative only and not as limiting the scope of the invention, which is to be given the full breadth of the appended claims and any and all equivalent modifications thereof which may occur to those skilled in the art upon reading the present specification.
The invention provides a crystal property prediction and classification method based on an attention mechanism and a crystal graph convolutional neural network, which mainly comprises two stages: crystal property prediction and crystal property classification. In the first stage, mean square loss is used as the loss function and stochastic gradient descent as the optimizer, and the crystal formation energy, absolute energy, band gap and Fermi energy are predicted and compared with DFT calculation data. In the second stage, the activation function of the output layer is changed to the logsoftmax activation function and the loss function to the negative log-likelihood loss function; crystals are classified against a total magnetic moment threshold of 0.5 μB, and wide-bandgap semiconductor crystals are classified against a bandgap threshold of 2.3 eV.
As shown in fig. 1, the method for predicting and classifying crystal properties based on an attention mechanism and a crystal graph convolutional neural network provided by the present invention specifically includes the following steps:
s1: acquiring structure data and DFT calculation data of the crystal, and dividing the structure data and the DFT calculation data into a training set, a verification set and a test set;
the method for acquiring the structure data and DFT calculation data of the crystal comprises the following steps:
a1: connecting to the Materials Project database through the pymatgen package in python, and exporting the id number of each crystal and the DFT calculation data of physical properties such as formation energy, absolute energy, band gap and Fermi energy to a csv file;
a2: connecting to the Materials Project database through the pymatgen package in python, reading the crystal id numbers from the exported csv file, and exporting the corresponding cif files (crystallography information files);
a3: preparing an atom_init.json file, a JSON file storing the initialization vector of each element.
S2: collecting a crystallography information file, extracting crystal characteristics, inputting the crystal characteristics into a neural network, and acquiring neural network output;
the acquisition process of the neural network output is as follows:
b1: extracting from the cif file the atom features, the bond features between each atom and its neighbor atoms, the index of each atom's neighbor atoms and the index mapping the crystal to its atoms, and using them as the input of the neural network;
b2: passing the atom features input to the network through the embedding layer to generate a new vector, and then inputting the new atom feature vector, the bond feature vector and the neighbor-atom index vector into the convolution layer;
b3: in the convolutional layer, an atom is regarded as a node and an atomic bond as an edge, as shown in the convolution structure diagram of fig. 2; the node vector, the neighbor node vectors and the edge vectors are connected through the index vector to form a new embedded vector, which passes through fully connected layer 1, after which node normalization is applied to the output;
the formula of the node normalization processing is as follows:
Nodenorm(h^(t)) = (h^(t) − μ^(t)) / σ^(t)   (1)
wherein h^(t) is the newly generated node embedding vector, μ^(t) is the mean value of node h^(t), and σ^(t) is the standard deviation of the node;
b4: the node vector h^(t) obtained after node normalization and softplus activation consists of M hidden vectors Z_T ∈ R^(1×F) that merge neighbor features; these are transformed by a nonlinear transformation, and a shared attention vector q ∈ R^(F′×1) is then used to obtain the attention value ω_T;
the node vector h^(t) is expressed as:
h^(t) = [Z_1; Z_2; …; Z_M]   (2)
wherein Z_T ∈ R^(1×F) is the T-th row of h^(t), T ∈ {1, …, M}, M represents the maximum number of neighbor atoms, and F is the number of hidden atom features;
the attention value ω_T is expressed as:
ω_T = q^T·tanh(W·(Z_T)^T + b)   (3)
wherein W ∈ R^(F′×F) is a weight matrix and b ∈ R^(F′×1) is a bias vector;
b5: normalizing the attention values ω_1, ω_2, …, ω_M with the softmax function to obtain the final weights;
the expression for the weights is:
a_T = exp(ω_T) / Σ_(i=1)^M exp(ω_i)   (4)
b6: combining the M hidden vectors merged with neighbor features and their attention values to obtain the final node embedding H^(t), carrying out batch normalization, adding the normalized original node feature vector input to the convolutional layer, activating with the softplus function and outputting;
the node embedding H^(t) is expressed as:
H^(t) = a_1Z_1 + a_2Z_2 + … + a_MZ_M   (5)
b7: referring to fig. 3, after 3 convolutional layers, a new vector fused with the local chemical environment is generated; the pooling layer then produces a vector representing the whole crystal, which is activated by the softplus function, connected to fully connected layer 2, activated by the same function, and input into fully connected layer 3 for output.
Based on the above process, the convolution formula of the neural network is:
v_i^(t+1) = g(BN(Attention(g(Nodenorm(z_(i,j)k^(t)·W^(t) + b^(t))))) + Nodenorm(v_i^(t)))   (6)
wherein Nodenorm(·) represents node normalization, g(·) represents the softplus activation function, Attention(·) represents the attention mechanism, and BN(·) represents batch normalization.
S3: training and verifying the constructed neural network model by adopting a training set and a verification set respectively to obtain a prediction model and a classification model, and completing prediction of crystal properties through the prediction model according to the neural network output;
here the prediction model uses mean square loss as the loss function and stochastic gradient descent as the optimizer; the mean square loss is shown in equation (7),
loss(x_i, y_i) = (x_i − y_i)^2   (7)
where x_i is the input (predicted) value and y_i is the target attribute value, namely the DFT calculated value; the prediction model uses the mean absolute error (MAE) as the index for evaluating model performance;
mean absolute error (MAE): MAE represents the average of the absolute errors between the predicted values and the test values, and is the evaluation index of the prediction model, as shown in equation (8),
MAE = (1/n)·Σ_(i=1)^n |x_i − y_i|   (8)
wherein x_i denotes the predicted value and y_i denotes the test value.
S4: classification of the crystal properties is accomplished by a classification model.
The classification process of the classification model for the crystal properties here is:
under the framework of the same neural network, the activation function of the output layer is changed into a logsoftmax activation function and matched with a negative log-likelihood loss function to realize the classification of crystal properties.
The logsoftmax activation function is shown in equation (9) and the negative log-likelihood loss function in equation (10),
logsoftmax(x_i) = log(exp(x_i) / Σ_j exp(x_j))   (9)
loss(x, class) = −x[class]   (10)
where x in equation (10) is the vector of log-probabilities output by the logsoftmax layer and class is the target class;
the classification model takes accuracy (accuracuracy) and area under ROC curve (AUC) as indexes for evaluating model performance.
The area under the ROC curve (AUC) is the sum of the areas of the portions under the ROC curve. The abscissa of the ROC curve is the false positive rate (FPR), i.e., the proportion of actual negative cases that are judged positive; the ordinate is the true positive rate (TPR), i.e., the proportion of actual positive cases that are judged positive. An AUC close to 1 indicates a better classification model.
In this step, positive and negative samples are prepared as the classification basis, specifically: crystals with a total magnetic moment greater than 0.5 μB are labeled 1, and crystals with a total magnetic moment less than 0.5 μB are labeled 0. The test outputs are values between 0 and 1; crystals with an output value greater than 0.5 are considered to have a total magnetic moment greater than 0.5 μB, and crystals with a value less than 0.5 a total magnetic moment less than 0.5 μB. Similarly, crystals with a band gap greater than 2.3 eV are labeled 1 and crystals with a band gap less than 2.3 eV are labeled 0; crystals with an output value greater than 0.5 are considered to have a band gap greater than 2.3 eV, and those with a value less than 0.5 a band gap less than 2.3 eV.
In this embodiment, the above scheme is applied in an example experiment. Data for 30,000 crystals were collected, of which 80% served as training data, 10% as validation data and 10% as test data. In the prediction experiments, the errors between the predicted values and the DFT calculated values are smallest for absolute energy and formation energy, with MAEs of 0.103 eV/atom and 0.060 eV/atom; the band gap and Fermi energy predictions deviate most from the DFT calculations, with MAEs of 0.312 eV and 0.343 eV. In the classification experiment for crystals with a total magnetic moment greater than 0.5 μB, the classification accuracy of the model reaches 87.9% with an AUC of 0.919. In the classification experiment for wide-bandgap semiconductor crystals with a band gap greater than 2.3 eV, the classification accuracy reaches 93.9% with an AUC of 0.981. The running environment is Win10 with an i7-10700k CPU and an RTX 3080 GPU.
Secondly, to better demonstrate the effect of the method of the present invention, this example performs a comparative experiment between the method of the invention and the CGCNN method. Under the same data and hyper-parameter conditions, the most obvious improvement of the method of the invention is in the band gap, whose MAE is reduced by 8.8%. In addition, the MAEs of formation energy and absolute energy are also reduced by 4.8% and 3.7%, respectively. Although the prediction error of the Fermi energy is the largest, its MAE still decreases by 1.4% in the comparison. These results further show that, compared with the CGCNN method, the method of the invention, which introduces an attention mechanism and node normalization, offers a clear improvement in prediction precision.
When classifying by total magnetic moment, CGCNN uses 80% of the collected data as training data and achieves 86.9% accuracy, whereas the method of the present invention requires only 60% of the data as training data to achieve the same accuracy. Moreover, when 80% of the data is likewise used as the training set, the accuracy of the method of the present invention is 1% higher. In classifying wide-bandgap semiconductor crystals, CGCNN achieves 92.1% accuracy using 80% of the data as training data, while the new method achieves almost the same accuracy using only 40% of the data as training data; when likewise training with 80% of the data, the accuracy of the method of the invention improves by 1.8%. Therefore, the accuracy level of CGCNN can be reached with less training data, and accuracy can be improved with a training set of the same size.