CN115691695A - Material component generation method and evaluation method based on GAN and VAE - Google Patents

Material component generation method and evaluation method based on GAN and VAE

Info

Publication number
CN115691695A
Authority
CN
China
Prior art keywords
network, vector, condition, gan, vae
Prior art date
Legal status
Granted
Application number
CN202211412749.8A
Other languages
Chinese (zh)
Other versions
CN115691695B (en)
Inventor
鲁鸣鸣
姚艺峰
王超
Current Assignee
Central South University
Original Assignee
Central South University
Priority date
Filing date
Publication date
Application filed by Central South University
Priority to CN202211412749.8A
Publication of CN115691695A
Application granted
Publication of CN115691695B
Legal status: Active

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 - Computing systems specially adapted for manufacturing

Abstract

The invention discloses a material composition generation method based on GAN and VAE. The method encodes the composition of each material in an original data set to obtain real data samples; one-hot encodes specific attributes of each material in the original data set to express them as condition vectors; constructs a preliminary conditional adversarial autoencoder model; trains the preliminary conditional adversarial autoencoder model with the real data samples, the condition vectors, randomly sampled latent variables and the corresponding condition information to obtain a conditional adversarial autoencoder model; and generates material compositions with the conditional adversarial autoencoder model. The invention also discloses an evaluation method comprising the GAN- and VAE-based material composition generation method. The invention can not only generate chemically valid material molecules, but also generate materials with specific attributes from controllable condition information, while maintaining high novelty and high uniqueness; it therefore offers high reliability, good accuracy and a wide application range.

Description

Material component generation method and evaluation method based on GAN and VAE
Technical Field
The invention belongs to the technical field of material informatics, and particularly relates to a material composition generation method and an evaluation method based on GAN and VAE.
Background
With the development of the economy and technology and the improvement of living standards, various new materials are widely used in production and daily life and bring great convenience. The search for new materials has therefore become one of the focuses of researchers.
However, because the space of possible material compositions is huge, efficiently exploring useful materials within it is a very challenging task. In most traditional approaches, scientists design theoretically feasible candidate materials within a given chemical system according to their intuition and experience, and then verify the feasibility of the designed materials through experiments. This approach is not only inefficient and highly dependent on the scientists' level of expertise, but also very costly.
Currently, with the rapid development of deep learning, more and more methods incorporate it, and deep learning has made breakthrough progress in the field of material informatics. Schemes that use deep generative models for material generation and conditional material generation outperform traditional material-space search schemes. However, in the current field of material generation, the application of deep generative models still suffers from three problems: (1) although deep generative models such as GAN or VAE have been applied to material generation, these works target only specific material systems, such as alloy materials, fixed chemical systems, or hydrides, so their range of application is narrow; (2) although some technical solutions are not limited to a particular material system and can generate candidate materials across systems, the materials they generate are either insufficiently novel or chemically invalid, so their reliability is low; (3) some solutions cannot generate candidate materials conditioned on specific target material properties, i.e., they do not support conditional generation.
Disclosure of Invention
One of the purposes of the present invention is to provide a method for generating a material component based on GAN and VAE, which has high reliability, good accuracy and a wide application range.
Another object of the present invention is to provide an evaluation method including the method for producing a material composition based on GAN and VAE.
The invention provides a method for generating a material component based on GAN and VAE, which comprises the following steps:
S1, encoding the composition of each material in an original data set to obtain real data samples;
S2, one-hot encoding the specific attribute of each material in the original data set and expressing it as a condition vector;
S3, constructing a preliminary conditional adversarial autoencoder model based on a GAN network and a VAE network;
S4, training the preliminary conditional adversarial autoencoder model constructed in step S3 with the real data samples obtained in step S1, the condition vectors obtained in step S2, randomly sampled latent variables and the condition information corresponding to the latent variables, to obtain a conditional adversarial autoencoder model;
S5, generating the final material compositions with the conditional adversarial autoencoder model obtained in step S4.
The step S1 of encoding the composition of each material in the original data set to obtain real data samples specifically comprises the following steps:
counting and analyzing the data in the open OQMD and MP data sets, and selecting e chemical elements as the atom types; stipulating that the number of elements in the composition of each material in the real data samples does not exceed n;
finally, expressing the composition of each material as a matrix M, where M ∈ R^(e×n), e is the total number of chemical elements and n is the maximum number of elements in a composition.
The step S2 of one-hot encoding the specific attribute of each material in the original data set and expressing it as a condition vector specifically comprises the following steps:
selecting three material characteristics, namely chemical validity, per-atom formation energy and band gap, as the generation condition information;
encoding the generation condition information as follows:
if the material satisfies charge neutrality and electronegativity balance, setting the chemical validity flag Vflag to 1; otherwise, setting the chemical validity flag Vflag to 0;
if the per-atom formation energy of the material is not greater than 0, setting the formation energy flag Fflag to 1; otherwise, setting the formation energy flag Fflag to 0;
if the band gap of the material is not less than 0, setting the band gap flag Bflag to 1; otherwise, setting the band gap flag Bflag to 0;
combining the three flags according to the rules of permutation and combination gives the code of the condition information of the material;
converting the code of the condition information of the material into a one-hot vector gives the condition vector of the material.
The step S3 of constructing a preliminary conditional adversarial autoencoder model based on the GAN network and the VAE network specifically comprises the following steps:
the constructed preliminary conditional adversarial autoencoder model comprises a mapping network F, a generation network G, an encoding network E and a discriminator network D;
the mapping network F is used to map data samples into the embedding space; the mapping network F comprises eight fully connected layers, the first fully connected layer mapping (128 + 8) dimensions to 512 dimensions and the second to eighth fully connected layers maintaining a 512-dimensional mapping transformation;
the generation network G is used to generate candidate materials from the embedding space; the generation network G comprises a fully connected layer and four deconvolution layers: the fully connected layer maps (512 + 8) dimensions to 32 + 8 dimensions, and the four deconvolution layers all have 3 × 3 convolution kernels with strides of (2,2), (2,2), (2,2) and (2,1), respectively;
the structure of the encoding network E mirrors that of the generation network G, and the encoding network E is used to encode candidate materials into the embedding space; the encoding network E comprises four convolutional layers and a fully connected layer, each layer mirroring the corresponding layer of the generation network G; the four convolutional layers all have 3 × 3 convolution kernels with strides of (2,1), (2,2), (2,2) and (2,2), respectively, and the fully connected layer performs a (512 + 8)-dimensional mapping transformation;
the discriminator network D is used to obtain a real/fake decision for the corresponding material from the embedding-space input; the discriminator network D comprises six fully connected layers, the first to fifth fully connected layers converting (512 + 8) dimensions to 512 dimensions and the sixth fully connected layer outputting the real/fake decision for the candidate material;
the input distribution of the generation network G matches the output distribution of the encoding network E.
The step S4 of training the preliminary conditional adversarial autoencoder model constructed in step S3 with the real data samples obtained in step S1, the condition vectors obtained in step S2, the randomly sampled latent variables and the condition information corresponding to the latent variables specifically comprises the following steps:
splicing the randomly sampled latent variable z with its randomly generated condition information c̃ to obtain a first spliced vector (z, c̃);
inputting the first spliced vector (z, c̃) into the mapping network F to obtain the mapping network output w_z;
splicing the mapping network output w_z with the condition vector c̃ to obtain a second spliced vector (w_z, c̃);
inputting the second spliced vector (w_z, c̃) into the generation network G to obtain a candidate material x̃ that satisfies the condition c̃;
inputting the candidate material x̃ and the real data sample x into the encoding network E to obtain the latent variable w_x̃ of the candidate material x̃ in the embedding space and the latent variable w_x of the real data sample x in the embedding space;
combining the latent variable w_x̃ of the candidate material x̃ in the embedding space with the condition vector c̃ to obtain a third spliced vector (w_x̃, c̃);
combining the latent variable w_x of the real data sample x in the embedding space with the condition vector c to obtain a fourth spliced vector (w_x, c);
inputting the third spliced vector (w_x̃, c̃) and the fourth spliced vector (w_x, c) into the discriminator network D to finally obtain the real/fake decision and the classification result of the corresponding material;
during training, the encoding network E and the discriminator network D are first updated with the following loss function Loss_ED:
[formula for Loss_ED provided as an image in the original publication]
in the formula, the connection symbol denotes the concatenation (splicing) of vectors; D is the discriminator network; E is the encoding network; P_x is the prior distribution of x; P_x̃ is the prior distribution of x̃; one term is the matching loss between the probability distribution of x and the prior distribution of x; another term is the matching loss between the probability distribution of x̃ and the prior distribution of x̃; λ is a constant; || · || is the 1-norm; ∇ is the gradient operator; BCE( ) is the binary cross-entropy loss; cls_x is the classification output obtained by passing x through the D network; cls_x̃ is the classification output obtained by passing x̃ through the D network; c is the condition vector corresponding to x; c̃ is the condition vector corresponding to x̃;
then, the mapping network F and the generation network G are updated with the following loss function Loss_FG:
[formula for Loss_FG provided as an image in the original publication]
finally, the encoding network E and the generation network G are updated with the following matching loss function Loss_EG:
[formula for Loss_EG provided as an image in the original publication]
where F is the mapping network; G is the generation network; E is the encoding network; z is the latent variable; P_z is the prior distribution of z; the matching-loss term is the matching loss between the probability distribution of the encoded latent variable and the prior distribution of z; || · ||² is the square of the 2-norm.
The step S5 of generating the final material compositions with the conditional adversarial autoencoder model obtained in step S4 specifically comprises: randomly sampling a latent variable z from the prior distribution p(z) and using the conditional adversarial autoencoder model obtained in step S4 to generate the final material compositions.
The invention also provides an evaluation method comprising the GAN- and VAE-based material composition generation method, which further comprises the following step:
S6, evaluating the material compositions obtained in step S5 in terms of chemical validity, condition generation, novelty, per-atom formation energy and band gap.
The invention provides a material composition generation model based on GAN and VAE models that can generate chemically valid material molecules, can generate materials with specific properties from controllable condition information, and maintains high novelty and high uniqueness during generation; the invention therefore offers high reliability, good accuracy and a wide application range.
Drawings
FIG. 1 is a schematic method flow diagram of the generation method of the present invention.
Fig. 2 is a schematic structural diagram of a generative model in the generation method of the present invention.
FIG. 3 is a schematic diagram of the comparison result analysis of the novelty of material generation by the generation method of the present invention and other methods.
FIG. 4 is a schematic method flow diagram of the evaluation method of the present invention.
Detailed Description
Fig. 1 is a schematic flow chart of the generation method of the present invention. The invention provides a GAN- and VAE-based material composition generation method comprising the following steps:
S1, encoding the composition of each material in an original data set to obtain real data samples; this specifically comprises the following steps:
counting and analyzing the data in the open OQMD and MP data sets, and selecting the e (preferably 86) chemical elements involved in most of the materials in the data sets as the atom types; stipulating that the number of elements in the composition of each material in the real data samples does not exceed n; n is preferably 8, because most material compositions contain no more than 8 elements;
finally, expressing the composition of each material as a matrix M, where M ∈ R^(e×n), e is the total number of chemical elements and n is the maximum number of elements in a composition; each row of the matrix M represents one element (the elements are arranged in the order of the periodic table, 86 elements in total), and each column of the matrix M represents the number of atoms of the element in the composition;
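For illustration, a minimal sketch of this encoding is given below. It assumes the convention that column j of M marks a count of j + 1 atoms of the element in row i (the text only states that columns represent the atom count); the element-index lookup and the function name are hypothetical, and only a few elements are listed.

```python
import numpy as np

E_TOTAL = 86   # number of chemical elements selected from the OQMD/MP data (atom types)
N_MAX = 8      # maximum number of atoms of one element considered in a composition

# Hypothetical periodic-table-ordered index; only a few entries are shown.
ELEMENT_INDEX = {"H": 0, "Li": 2, "O": 7, "Fe": 25}

def encode_composition(atom_counts):
    """Return the e x n matrix M for a composition given as {element symbol: atom count}.

    Row i corresponds to element i; M[i, j] = 1 records that element i occurs
    with (j + 1) atoms (an assumed convention, see the note above).
    """
    M = np.zeros((E_TOTAL, N_MAX), dtype=np.float32)
    for symbol, count in atom_counts.items():
        if not 1 <= count <= N_MAX:
            raise ValueError(f"atom count {count} is outside the supported range 1..{N_MAX}")
        M[ELEMENT_INDEX[symbol], count - 1] = 1.0
    return M

# Example: Fe2O3 -> two non-zero entries, one in the Fe row and one in the O row.
M = encode_composition({"Fe": 2, "O": 3})
print(M.shape, int(M.sum()))   # (86, 8) 2
```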
S2, one-hot encoding the specific attribute of each material in the original data set and expressing it as a condition vector; this specifically comprises the following steps:
selecting three material characteristics, namely chemical validity, per-atom formation energy and band gap, as the generation condition information;
encoding the generation condition information as follows:
if the material satisfies charge neutrality and electronegativity balance, setting the chemical validity flag Vflag to 1; otherwise, setting the chemical validity flag Vflag to 0;
if the per-atom formation energy of the material is not greater than 0, setting the formation energy flag Fflag to 1; otherwise, setting the formation energy flag Fflag to 0;
if the band gap of the material is not less than 0, setting the band gap flag Bflag to 1; otherwise, setting the band gap flag Bflag to 0;
according to the rules of permutation and combination, any material belongs to one of 8 (2 × 2 × 2) categories, which gives the code of the condition information of the material;
converting the code of the condition information of the material into a one-hot vector gives the condition vector of the material;
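A minimal sketch of this condition encoding follows; the ordering of the three flag bits within the 8-way code is not specified in the text and is an assumption here.

```python
import numpy as np

def condition_vector(vflag: int, fflag: int, bflag: int) -> np.ndarray:
    """Map the three binary flags to one of 2 x 2 x 2 = 8 categories and one-hot encode it."""
    category = (vflag << 2) | (fflag << 1) | bflag   # assumed bit order: Vflag, Fflag, Bflag
    c = np.zeros(8, dtype=np.float32)
    c[category] = 1.0
    return c

# A chemically valid material with non-positive per-atom formation energy and a
# non-negative band gap falls into the last category under this assumed ordering.
print(condition_vector(1, 1, 1))   # [0. 0. 0. 0. 0. 0. 0. 1.]
```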
S3, constructing a preliminary conditional adversarial autoencoder model (the ConALAE preliminary model, whose structure is shown in Fig. 2) based on the GAN network and the VAE network; this specifically comprises the following steps:
the constructed preliminary conditional adversarial autoencoder model comprises a mapping network F, a generation network G, an encoding network E and a discriminator network D; the mapping network F and the generation network G together play the role of the generator in the GAN, and the encoding network E and the discriminator network D together play the role of the discriminator in the GAN;
the mapping network F is used to map data samples into the embedding space; the mapping network F comprises eight fully connected layers, the first fully connected layer mapping (128 + 8) dimensions to 512 dimensions and the second to eighth fully connected layers maintaining a 512-dimensional mapping transformation;
the generation network G is used to generate candidate materials from the embedding space; the generation network G comprises a fully connected layer and four deconvolution layers: the fully connected layer maps (512 + 8) dimensions to 32 + 8 dimensions, and the four deconvolution layers all have 3 × 3 convolution kernels with strides of (2,2), (2,2), (2,2) and (2,1), respectively;
the structure of the encoding network E mirrors that of the generation network G, and the encoding network E is used to encode candidate materials into the embedding space; the encoding network E comprises four convolutional layers and a fully connected layer, each layer mirroring the corresponding layer of the generation network G; the four convolutional layers all have 3 × 3 convolution kernels with strides of (2,1), (2,2), (2,2) and (2,2), respectively, and the fully connected layer performs a (512 + 8)-dimensional mapping transformation;
the discriminator network D is used to obtain a real/fake decision for the corresponding material from the embedding-space input; the discriminator network D comprises six fully connected layers, the first to fifth fully connected layers converting (512 + 8) dimensions to 512 dimensions and the sixth fully connected layer outputting the real/fake decision for the candidate material;
the input distribution of the generation network G is matched with the output distribution of the encoding network E, i.e., the latent-space distribution produced by the encoding network E is constrained by a prior distribution, which embodies the VAE idea;
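The two fully specified sub-networks can be sketched as follows; this is an illustrative PyTorch sketch rather than the patent's implementation. The 128-dimensional latent, 512-dimensional embedding and 8-dimensional condition vector are read off the (128 + 8) and (512 + 8) dimensions quoted above, while the activation functions and the combined real/fake-plus-classification head of D are assumptions. The generation network G and the encoding network E form a mirrored deconvolution/convolution pair whose intermediate feature-map sizes are not fully specified in the text, so they are omitted here.

```python
import torch
import torch.nn as nn

LATENT, EMBED, COND = 128, 512, 8

class MappingNetwork(nn.Module):
    """F: eight fully connected layers, (128 + 8) -> 512, then seven 512 -> 512 layers."""
    def __init__(self):
        super().__init__()
        layers = [nn.Linear(LATENT + COND, EMBED), nn.LeakyReLU(0.2)]
        for _ in range(7):
            layers += [nn.Linear(EMBED, EMBED), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, z_and_cond):          # expects the spliced vector (z, c)
        return self.net(z_and_cond)         # w_z

class Discriminator(nn.Module):
    """D: five fully connected layers (512 + 8) -> 512, plus a final head that
    outputs a real/fake score together with an 8-way classification output (assumed)."""
    def __init__(self):
        super().__init__()
        layers = [nn.Linear(EMBED + COND, EMBED), nn.LeakyReLU(0.2)]
        for _ in range(4):
            layers += [nn.Linear(EMBED, EMBED), nn.LeakyReLU(0.2)]
        self.trunk = nn.Sequential(*layers)
        self.head = nn.Linear(EMBED, 1 + COND)   # sixth fully connected layer

    def forward(self, w_and_cond):               # expects the spliced vector (w, c)
        out = self.head(self.trunk(w_and_cond))
        return out[:, :1], out[:, 1:]            # real/fake score, classification logits

z = torch.randn(4, LATENT)
c = torch.zeros(4, COND); c[:, 3] = 1.0
w_z = MappingNetwork()(torch.cat([z, c], dim=1))
score, cls = Discriminator()(torch.cat([w_z, c], dim=1))
print(w_z.shape, score.shape, cls.shape)          # (4, 512) (4, 1) (4, 8)
```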
S4, training the preliminary conditional adversarial autoencoder model constructed in step S3 with the real data samples obtained in step S1, the condition vectors obtained in step S2, the randomly sampled latent variables and the condition information corresponding to the latent variables to obtain a conditional adversarial autoencoder model; this specifically comprises the following steps:
splicing the randomly sampled latent variable z with its corresponding condition information c̃ to obtain a first spliced vector (z, c̃);
inputting the first spliced vector (z, c̃) into the mapping network F to obtain the mapping network output w_z;
splicing the mapping network output w_z with the condition vector c̃ to obtain a second spliced vector (w_z, c̃);
inputting the second spliced vector (w_z, c̃) into the generation network G to obtain a candidate material x̃ that satisfies the condition c̃;
inputting the candidate material x̃ and the real data sample x into the encoding network E to obtain the latent variable w_x̃ of the candidate material x̃ in the embedding space and the latent variable w_x of the real data sample x in the embedding space;
combining the latent variable w_x̃ of the candidate material x̃ in the embedding space with the condition vector c̃ to obtain a third spliced vector (w_x̃, c̃);
combining the latent variable w_x of the real data sample x in the embedding space with the condition vector c to obtain a fourth spliced vector (w_x, c);
inputting the third spliced vector (w_x̃, c̃) and the fourth spliced vector (w_x, c) into the discriminator network D to obtain the real/fake decision and the classification result of the corresponding material;
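The data flow just described can be summarized in the following sketch, written against generic callables F, G, E and D (for instance, modules like the ones sketched earlier) so that it does not depend on any particular layer implementation; the variable names mirror the symbols used above and are assumptions.

```python
import torch

def conalae_forward(F, G, E, D, x, c, z, c_tilde):
    """One forward pass of the conditional adversarial autoencoder described above."""
    w_z = F(torch.cat([z, c_tilde], dim=1))                            # first spliced vector -> F
    x_tilde = G(torch.cat([w_z, c_tilde], dim=1))                      # second spliced vector -> G
    w_x_tilde, w_x = E(x_tilde), E(x)                                  # encode candidate and real samples
    fake_score, fake_cls = D(torch.cat([w_x_tilde, c_tilde], dim=1))   # third spliced vector -> D
    real_score, real_cls = D(torch.cat([w_x, c], dim=1))               # fourth spliced vector -> D
    return (real_score, real_cls), (fake_score, fake_cls), (w_z, w_x_tilde)
```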
during training, the encoding network E and the discriminator network D are first updated with the following loss function Loss_ED:
[formula for Loss_ED provided as an image in the original publication]
in the formula, the connection symbol denotes the concatenation (splicing) of vectors; D is the discriminator network; E is the encoding network; P_x is the prior distribution of x; P_x̃ is the prior distribution of x̃; one term is the matching loss between the probability distribution of x and the prior distribution of x; another term is the matching loss between the probability distribution of x̃ and the prior distribution of x̃; λ is a constant; || · || is the 1-norm; ∇ is the gradient operator; BCE( ) is the binary cross-entropy loss; cls_x is the classification output obtained by passing x through the D network; cls_x̃ is the classification output obtained by passing x̃ through the D network; c is the condition vector corresponding to x; c̃ is the condition vector corresponding to x̃;
then, the mapping network F and the generation network G are updated with the following loss function Loss_FG:
[formula for Loss_FG provided as an image in the original publication]
finally, the encoding network E and the generation network G are updated with the following matching loss function Loss_EG:
[formula for Loss_EG provided as an image in the original publication]
where F is the mapping network; G is the generation network; E is the encoding network; z is the latent variable; P_z is the prior distribution of z; the matching-loss term is the matching loss between the probability distribution of the encoded latent variable and the prior distribution of z; || · ||² is the square of the 2-norm.
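Since the three loss expressions above are reproduced only as images in the original publication, the following is a hedged reconstruction, assuming the standard three-phase ALAE objectives extended with the condition concatenation (written here as ⊕) and the BCE classification terms listed in the definitions; the exact form of the matching-loss terms (written abstractly as L_match) and the placement of the constant λ are assumptions and may differ from the patent's formulas.

```latex
% Hedged reconstruction (not verbatim from the patent), with \oplus denoting concatenation
% and \tilde{x} = G(F(z \oplus \tilde{c}) \oplus \tilde{c}) the generated candidate material.
\begin{aligned}
\mathrm{Loss}_{ED} \approx{}& \mathbb{E}_{z \sim P_z}\!\left[\operatorname{softplus}\!\big(D(E(\tilde{x}) \oplus \tilde{c})\big)\right]
  + \mathbb{E}_{x \sim P_x}\!\left[\operatorname{softplus}\!\big(-D(E(x) \oplus c)\big)\right]
  + \lambda\, \mathbb{E}_{x \sim P_x}\!\left[\big\|\nabla D(E(x) \oplus c)\big\|_1\right] \\
 &+ \mathcal{L}_{\mathrm{match}}(x)
  + \mathcal{L}_{\mathrm{match}}(\tilde{x})
  + \mathrm{BCE}(cls_x, c)
  + \mathrm{BCE}(cls_{\tilde{x}}, \tilde{c}), \\[4pt]
\mathrm{Loss}_{FG} \approx{}& \mathbb{E}_{z \sim P_z}\!\left[\operatorname{softplus}\!\big(-D(E(\tilde{x}) \oplus \tilde{c})\big)\right]
  + \mathrm{BCE}(cls_{\tilde{x}}, \tilde{c}), \\[4pt]
\mathrm{Loss}_{EG} \approx{}& \mathcal{L}_{\mathrm{match}}(z)
  + \big\|F(z \oplus \tilde{c}) - E(\tilde{x})\big\|_2^2 .
\end{aligned}
```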
S5, generating the final material compositions with the conditional adversarial autoencoder model obtained in step S4; specifically, randomly sampling latent variables z from the prior distribution p(z) and using the conditional adversarial autoencoder model obtained in step S4 to generate the final material compositions.
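A minimal sketch of this generation step, under the same assumptions as the earlier snippets (F and G stand for the trained mapping and generation networks, and a 128-dimensional standard-normal prior is assumed):

```python
import torch

def generate_candidates(F, G, cond_vec, n_samples=16, latent_dim=128):
    """Sample z ~ p(z), splice it with the target condition vector, and decode via F and G."""
    z = torch.randn(n_samples, latent_dim)                      # latent variables from the prior
    c = torch.as_tensor(cond_vec, dtype=torch.float32).repeat(n_samples, 1)
    w_z = F(torch.cat([z, c], dim=1))
    return G(torch.cat([w_z, c], dim=1))                        # candidate material matrices
```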
The effects of the process of the present invention will be further described with reference to examples.
The examples were carried out on two large public data sets, OQMD and MP. The effect of a generative model is generally difficult to evaluate, and evaluation indicators usually need to be proposed for the specific application field. In this application, several important characteristics in the materials field and several standard generative-model indicators are used to evaluate the generation quality of the proposed ConALAE model and of the baseline model MatGAN. On the one hand, three material characteristics are selected: chemical validity (charge neutrality and electronegativity balance), per-atom formation energy and band gap, where the per-atom formation energy is related to the thermal stability of the material and the band gap is an important characteristic of materials such as solar cells. On the other hand, the uniqueness rate and the novelty rate are used to evaluate the quality of the samples generated by the ConALAE model and the baseline model MatGAN.
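The uniqueness and novelty rates are not given explicit formulas in the text; the sketch below uses the common definitions (fraction of distinct generated compositions, and fraction of distinct generated compositions absent from the training set), which are assumptions.

```python
def uniqueness_rate(generated):
    """Fraction of generated compositions that are distinct."""
    return len(set(generated)) / len(generated)

def novelty_rate(generated, training_set):
    """Fraction of distinct generated compositions that do not appear in the training set."""
    unique = set(generated)
    training = set(training_set)
    return sum(1 for g in unique if g not in training) / len(unique)

print(uniqueness_rate(["Fe2O3", "Fe2O3", "LiCoO2"]))   # 0.666...
print(novelty_rate(["Fe2O3", "LiCoO2"], ["Fe2O3"]))    # 0.5
```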
Evaluation of the chemical validity of the generated materials: the evaluation data are shown in Table 1.
Table 1. Chemical validity evaluation data of the generated materials
[Table 1 is provided as an image in the original publication and is not reproduced here.]
On both data sets, the ConALAE method (the method of the invention) achieves higher chemical validity than the MatGAN method, and is only slightly lower in uniqueness rate. In particular, on the OQMD data set the ConALAE model of the invention reaches a chemical validity of 76.7% and is not limited by the chemical validity of the original material data set (whose chemical validity is 41.6%). The MatGAN model cannot break through this limit and reaches only 38.4% chemical validity in the experiments.
Analysis of the material condition-generation results: the analysis data are shown in Table 2.
Table 2. Analysis data of the material condition-generation results
[Table 2 is provided as an image in the original publication and is not reproduced here.]
Experimental analysis of material-generation novelty: the analysis data are shown in Table 3. Fig. 3 compares the novelty of materials generated by the method of the invention and by other methods; Fig. 3(a) shows the comparison on the MP data set and Fig. 3(b) the comparison on the OQMD data set.
Table 3. Summary of the material-generation novelty experiment data

Method          OQMD      MP
MatGAN          70.10%    97.50%
The invention   96.10%    99.20%
As can be seen from Table 2, Table 3 and Fig. 3, the novelty of the materials generated by the proposed ConALAE model is higher than that of MatGAN, the current best-performing method, on both data sets. The method of the invention reaches a novelty rate of 99.2% on the MP data set and 96.1% on the OQMD data set. This demonstrates that the method of the invention keeps producing highly novel candidate materials even when a large number of material compositions are generated.
Fig. 4 is a schematic flow chart of the evaluation method of the present invention. The evaluation method comprising the GAN- and VAE-based material composition generation method provided by the invention comprises the following steps:
S1, encoding the composition of each material in an original data set to obtain real data samples;
S2, one-hot encoding the specific attribute of each material in the original data set and expressing it as a condition vector;
S3, constructing a preliminary conditional adversarial autoencoder model based on the GAN network and the VAE network;
S4, training the preliminary conditional adversarial autoencoder model constructed in step S3 with the real data samples obtained in step S1, the condition vectors obtained in step S2, the randomly sampled latent variables and the condition information corresponding to the latent variables, to obtain a conditional adversarial autoencoder model;
S5, generating the final material compositions with the conditional adversarial autoencoder model obtained in step S4;
S6, evaluating the material compositions obtained in step S5 in terms of chemical validity, condition generation, novelty, per-atom formation energy and band gap.

Claims (7)

1. A GAN- and VAE-based material composition generation method, comprising the following steps:
S1, encoding the composition of each material in an original data set to obtain real data samples;
S2, one-hot encoding the specific attribute of each material in the original data set and expressing it as a condition vector;
S3, constructing a preliminary conditional adversarial autoencoder model based on a GAN network and a VAE network;
S4, training the preliminary conditional adversarial autoencoder model constructed in step S3 with the real data samples obtained in step S1, the condition vectors obtained in step S2, randomly sampled latent variables and the condition information corresponding to the latent variables, to obtain a conditional adversarial autoencoder model;
S5, generating the final material compositions with the conditional adversarial autoencoder model obtained in step S4.
2. The GAN- and VAE-based material composition generation method according to claim 1, wherein the step S1 of encoding the composition of each material in the original data set to obtain real data samples comprises the following steps:
counting and analyzing the data in the open OQMD and MP data sets, and selecting e chemical elements as the atom types; stipulating that the number of elements in the composition of each material in the real data samples does not exceed n;
finally, expressing the composition of each material as a matrix M, where M ∈ R^(e×n), e is the total number of chemical elements and n is the maximum number of elements in a composition.
3. The GAN- and VAE-based material composition generation method according to claim 2, wherein the step S2 of one-hot encoding the specific attribute of each material in the original data set and expressing it as a condition vector comprises the following steps:
selecting three material characteristics, namely chemical validity, per-atom formation energy and band gap, as the generation condition information;
encoding the generation condition information as follows:
if the material satisfies charge neutrality and electronegativity balance, setting the chemical validity flag Vflag to 1; otherwise, setting the chemical validity flag Vflag to 0;
if the per-atom formation energy of the material is not greater than 0, setting the formation energy flag Fflag to 1; otherwise, setting the formation energy flag Fflag to 0;
if the band gap of the material is not less than 0, setting the band gap flag Bflag to 1; otherwise, setting the band gap flag Bflag to 0;
combining the three flags according to the rules of permutation and combination gives the code of the condition information of the material;
converting the code of the condition information of the material into a one-hot vector gives the condition vector of the material.
4. The GAN- and VAE-based material composition generation method according to claim 3, wherein the step S3 of constructing a preliminary conditional adversarial autoencoder model based on the GAN network and the VAE network comprises the following steps:
the constructed preliminary conditional adversarial autoencoder model comprises a mapping network F, a generation network G, an encoding network E and a discriminator network D;
the mapping network F is used to map data samples into the embedding space; the mapping network F comprises eight fully connected layers, the first fully connected layer mapping (128 + 8) dimensions to 512 dimensions and the second to eighth fully connected layers maintaining a 512-dimensional mapping transformation;
the generation network G is used to generate candidate materials from the embedding space; the generation network G comprises a fully connected layer and four deconvolution layers: the fully connected layer maps (512 + 8) dimensions to 32 + 8 dimensions, and the four deconvolution layers all have 3 × 3 convolution kernels with strides of (2,2), (2,2), (2,2) and (2,1), respectively;
the encoding network E is used to encode candidate materials into the embedding space; the encoding network E comprises four convolutional layers and a fully connected layer, the four convolutional layers all having 3 × 3 convolution kernels with strides of (2,1), (2,2), (2,2) and (2,2), respectively, and the fully connected layer performing a (512 + 8)-dimensional mapping transformation;
the discriminator network D is used to obtain a real/fake decision for the corresponding material from the embedding-space input; the discriminator network D comprises six fully connected layers, the first to fifth fully connected layers converting (512 + 8) dimensions to 512 dimensions and the sixth fully connected layer outputting the real/fake decision for the candidate material;
the input distribution of the generation network G matches the output distribution of the encoding network E.
5. The GAN- and VAE-based material composition generation method according to claim 4, wherein the step S4 of training the preliminary conditional adversarial autoencoder model constructed in step S3 with the real data samples obtained in step S1, the condition vectors obtained in step S2, the randomly sampled latent variables and the condition information corresponding to the latent variables comprises the following steps:
splicing the randomly sampled latent variable z with its corresponding condition information c̃ to obtain a first spliced vector (z, c̃);
inputting the first spliced vector (z, c̃) into the mapping network F to obtain the mapping network output w_z;
splicing the mapping network output w_z with the condition vector c̃ to obtain a second spliced vector (w_z, c̃);
inputting the second spliced vector (w_z, c̃) into the generation network G to obtain a candidate material x̃ that satisfies the condition c̃;
inputting the candidate material x̃ and the real data sample x into the encoding network E to obtain the latent variable w_x̃ of the candidate material x̃ in the embedding space and the latent variable w_x of the real data sample x in the embedding space;
combining the latent variable w_x̃ of the candidate material x̃ in the embedding space with the condition vector c̃ to obtain a third spliced vector (w_x̃, c̃);
combining the latent variable w_x of the real data sample x in the embedding space with the condition vector c to obtain a fourth spliced vector (w_x, c);
inputting the third spliced vector (w_x̃, c̃) and the fourth spliced vector (w_x, c) into the discriminator network D to obtain the real/fake decision and the classification result of the corresponding material;
during training, the encoding network E and the discriminator network D are first updated with the following loss function Loss_ED:
[formula for Loss_ED provided as an image in the original publication]
in the formula, the connection symbol denotes the concatenation (splicing) of vectors; D is the discriminator network; E is the encoding network; P_x is the prior distribution of x; P_x̃ is the prior distribution of x̃; one term is the matching loss between the probability distribution of x and the prior distribution of x; another term is the matching loss between the probability distribution of x̃ and the prior distribution of x̃; λ is a constant; || · || is the 1-norm; ∇ is the gradient operator; BCE( ) is the binary cross-entropy loss; cls_x is the classification output obtained by passing x through the D network; cls_x̃ is the classification output obtained by passing x̃ through the D network; c is the condition vector corresponding to x; c̃ is the condition vector corresponding to x̃;
then, the mapping network F and the generation network G are updated with the following loss function Loss_FG:
[formula for Loss_FG provided as an image in the original publication]
finally, the encoding network E and the generation network G are updated with the following matching loss function Loss_EG:
[formula for Loss_EG provided as an image in the original publication]
where F is the mapping network; G is the generation network; E is the encoding network; z is the latent variable; P_z is the prior distribution of z; the matching-loss term is the matching loss between the probability distribution of the encoded latent variable and the prior distribution of z; || · ||² is the square of the 2-norm.
6. The GAN- and VAE-based material composition generation method according to claim 5, wherein the step S5 of generating the final material compositions with the conditional adversarial autoencoder model obtained in step S4 specifically comprises randomly sampling the latent variable z from the prior distribution p(z) and generating the final material compositions with the conditional adversarial autoencoder model obtained in step S4.
7. An evaluation method comprising the GAN- and VAE-based material composition generation method according to any one of claims 1 to 6, further comprising the following step:
S6, evaluating the material compositions obtained in step S5 in terms of chemical validity, condition generation, novelty, per-atom formation energy and band gap.
CN202211412749.8A 2022-11-11 2022-11-11 Material component generation method and evaluation method based on GAN and VAE Active CN115691695B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211412749.8A CN115691695B (en) 2022-11-11 2022-11-11 Material component generation method and evaluation method based on GAN and VAE

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211412749.8A CN115691695B (en) 2022-11-11 2022-11-11 Material component generation method and evaluation method based on GAN and VAE

Publications (2)

Publication Number Publication Date
CN115691695A (en) 2023-02-03
CN115691695B (en) 2023-06-30

Family

ID=85052744

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211412749.8A Active CN115691695B (en) 2022-11-11 2022-11-11 Material component generation method and evaluation method based on GAN and VAE

Country Status (1)

Country Link
CN (1) CN115691695B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190279075A1 (en) * 2018-03-09 2019-09-12 Nvidia Corporation Multi-modal image translation using neural networks
CN109543745A (en) * 2018-11-20 2019-03-29 江南大学 Feature learning method and image-recognizing method based on condition confrontation autoencoder network
US20200294630A1 (en) * 2019-03-12 2020-09-17 California Institute Of Technology Systems and Methods for Determining Molecular Structures with Molecular-Orbital-Based Features
CN112599208A (en) * 2019-10-02 2021-04-02 三星电子株式会社 Machine learning system and method for generating material structure of target material attributes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LITAO CHEN et al.: "Generative models for inverse design of inorganic solid materials", JOURNAL OF MATERIALS INFORMATICS *

Also Published As

Publication number Publication date
CN115691695B (en) 2023-06-30

Similar Documents

Publication Publication Date Title
Kodi Ramanah et al. Super-resolution emulator of cosmological simulations using deep physical models
JP2021060992A (en) Machine learning system and method
Vieira et al. Improved efficient, nearly orthogonal, nearly balanced mixed designs
CN111782768B (en) Fine-grained entity identification method based on hyperbolic space representation and label text interaction
CN111428848B (en) Molecular intelligent design method based on self-encoder and 3-order graph convolution
Luck et al. Number Theory and Physics: Proceedings of the Winter School, Les Houches, France, March 7–16, 1989
CN112560966B (en) Polarized SAR image classification method, medium and equipment based on scattering map convolution network
US11455440B2 (en) Graphic user interface assisted chemical structure generation
CN114359582A (en) Small sample feature extraction method based on neural network and related equipment
CN112598039A (en) Method for acquiring positive sample in NLP classification field and related equipment
Flaut et al. Models and Theories in Social Systems
Maekawa et al. General generator for attributed graphs with community structure
Nousi et al. Autoencoder-driven spiral representation learning for gravitational wave surrogate modelling
CN115691695A (en) Material component generation method and evaluation method based on GAN and VAE
Cui et al. On robustness of neural odes image classifiers
Zheng et al. Variant map construction to detect symmetric properties of genomes on 2D distributions
CN114722920A (en) Deep map convolution model phishing account identification method based on map classification
Duan et al. Pre-trained bidirectional temporal representation for crowd flows prediction in regular region
Kekre et al. Discrete Sine Transform Sectorization for Feature Vector Generation in CBIR
Siregar Learning human insight by cooperative AI: Shannon-Neumann measure
Basu et al. Guest editors' introduction to the special section on syntactic and structural pattern recognition
Fabregat-Hernández et al. Exploring explainable AI: category theory insights into machine learning algorithms
Liu et al. A Click-through Rate Prediction Method Based on Interaction Features Extraction for High-dimensional Sparse Data
Knaute Tensor Networks: From Holography to Quantum Field Theory
CN111177557B (en) Interpretable nerve factor recommendation system and method based on inter-domain explicit interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant