CN111755080B - Method for predicting methane gas adsorption performance of MOF (metal-organic framework) based on deep convolutional neural network - Google Patents

Publication number
CN111755080B
Authority
CN
China
Prior art keywords
mof
classifier
neural network
data
convolutional neural
Legal status
Active
Application number
CN202010374618.XA
Other languages
Chinese (zh)
Other versions
CN111755080A (en)
Inventor
卢罡
赵正阳
阳庆元
李睿琪
Current Assignee
Beijing University of Chemical Technology
Original Assignee
Beijing University of Chemical Technology

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C60/00 Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00 Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
    • G01N15/08 Investigating permeability, pore-volume, or surface area of porous materials
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00 Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
    • G01N15/08 Investigating permeability, pore-volume, or surface area of porous materials
    • G01N2015/0866 Sorption

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Dispersion Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Separation Of Gases By Adsorption (AREA)
  • Investigating Or Analyzing Materials By The Use Of Fluid Adsorption Or Reactions (AREA)

Abstract

The invention discloses a method for predicting the methane gas adsorption performance of metal-organic frameworks (MOFs) based on a deep convolutional neural network. A classifier is designed using a convolutional neural network; the basic three-dimensional structure of the MOF stored in a CIF file is converted into features that the classifier can accept; the adsorption capacity of MOFs for methane gas is divided into several intervals; and the model is trained to output the predicted performance category of a MOF.

Description

Method for predicting methane gas adsorption performance of MOF (metal-organic framework) based on deep convolutional neural network
Technical Field
The invention belongs to the technical field of intelligent prediction of functional materials, and particularly relates to a method for predicting the methane gas adsorption performance of metal-organic frameworks (MOFs) based on a deep convolutional neural network.
Background
Metal-organic frameworks (MOFs) are organic-inorganic hybrid materials with intramolecular pores, formed by the self-assembly of organic ligands and metal ions or clusters through coordination bonds; they belong to the class of coordination polymers. A MOF consists of a three-dimensional periodic network of molecular building blocks such as metal clusters and organic linkers. Because these many building blocks can be combined under different topologies, the number of potential MOFs is almost unlimited. In MOFs, the arrangement of organic ligands and metal ions or clusters is strongly directional, so different framework pore structures can form, giving rise to different adsorption, optical and electromagnetic properties, among others. MOFs therefore show great development potential and attractive prospects in modern materials science.
Maryam Pardakhti et al. used the physical structure, the chemical structure, and the combination of physical and chemical structure of metal-organic frameworks as inputs, compared the predictive performance of decision trees, Poisson regression, support vector machines and random forests, and concluded that the random forest algorithm with the combined physical and chemical structure as input predicts best. In practice, however, the number of available metal-organic frameworks is very large, and their chemical structures and some of their physical structures must be obtained through calculation and many experiments, so the method is of limited practical use.
Disclosure of Invention
To address the problem that the prior art cannot predict the methane gas adsorption value from the basic three-dimensional structure information of a MOF stored in a CIF file, the invention provides a method for predicting the methane gas adsorption performance of MOFs based on a deep convolutional neural network (CNN). The method uses a deep learning model to mine the feature information in the basic spatial structure of the MOF, extracting features from the CIF file that stores that structure.
The technical solution is as follows:
A method for predicting the methane gas adsorption performance of MOFs based on a deep convolutional neural network comprises the following steps:
1) Data preprocessing: expanding the original CIF file database and converting the basic three-dimensional structure of the MOF in the CIF file into features that the classifier can accept;
2) Constructing a classifier with a convolutional neural network, dividing the methane gas adsorption capacity of the MOFs into intervals, and training the classifier iteratively;
3) Inputting the preprocessed data into the trained classifier to output the predicted adsorption value category of the MOF for methane.
Further, the specific process of step 1) is as follows:
The original data set is classified according to adsorption value and randomly divided into a training set and a test set, and the data in the other categories are expanded, taking the category with the most data as the reference. Expansion works by rotating the MOF unit cell in the CIF file around a coordinate axis, where the rotation angle is the cell angle lying in the plane perpendicular to that axis. After rotation, the side lengths of the unit cell (the sides a, b and c recorded in the CIF file) are exchanged, and the atomic coordinates and the angles associated with a, b and c change accordingly, so that several CIF files are generated from a MOF of the same structure and the data are expanded. In addition, the relative atomic coordinates stored in the data are converted into absolute coordinates in space, the ranges of the absolute atomic coordinates over all MOF files are collected, the coordinates are normalized by min-max normalization, and the normalized unit-cell coordinates are then scaled up and projected to obtain xy-plane, xz-plane and yz-plane projection matrices.
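One possible reading of the expansion step is a relabelling of the cell axes, which exchanges a, b and c and changes the stored angles and coordinates while leaving the physical structure untouched. The sketch below assumes the standard CIF convention (alpha is the angle between b and c, beta between a and c, gamma between a and b). `Cell` and `cycle_axes` are hypothetical names, and the exact rotation procedure of the invention may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cell:
    """Unit cell as stored in a CIF file (hypothetical container)."""
    a: float
    b: float
    c: float
    alpha: float  # angle between b and c, in degrees (assumed CIF convention)
    beta: float   # angle between a and c, in degrees
    gamma: float  # angle between a and b, in degrees
    frac: List[Tuple[float, float, float]]  # relative (fractional) coordinates


def cycle_axes(cell: Cell) -> Cell:
    """Relabel the axes (a, b, c) -> (b, c, a).

    An atom at x*a + y*b + z*c sits at y*a' + z*b' + x*c' in the new
    labelling, so the structure is identical but every stored number moves.
    """
    return Cell(
        a=cell.b, b=cell.c, c=cell.a,
        alpha=cell.beta,   # angle between new b (old c) and new c (old a)
        beta=cell.gamma,   # angle between new a (old b) and new c (old a)
        gamma=cell.alpha,  # angle between new a (old b) and new b (old c)
        frac=[(y, z, x) for (x, y, z) in cell.frac],
    )
```

Applying `cycle_axes` once or twice yields two additional descriptions of the same structure; applying it three times returns the original cell.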
Furthermore, in step 2), a classifier is built from a convolutional neural network and residual blocks and trained iteratively, and is finally evaluated on the test set to obtain the classifier with the best classification performance. A cross-entropy function is used as the loss function of the classifier, and a LeakyReLU function is used as the activation function after each convolutional layer to prevent overfitting.
The beneficial effects of the invention are as follows:
The method for predicting the methane gas adsorption performance of MOFs based on a deep convolutional neural network uses the angles and other information in the CIF file to expand the MOF database. A convolutional neural network extracts features from the basic three-dimensional spatial structure of the MOF stored in the CIF file, the extracted features are used to train the classifier iteratively, and the trained classifier is finally applied to predict the adsorption value of a MOF for methane gas. The invention thus handles CIF files storing the basic three-dimensional structure information of MOFs well, extracts features from them, and from those features predicts the methane adsorption performance of a MOF with higher accuracy.
Drawings
FIG. 1 is a flow chart of the method of the invention for predicting the methane gas adsorption performance of MOFs;
FIG. 2 is the ROC curve obtained by the classifier of the invention;
FIG. 3 is the ROC curve obtained with the prior art.
Detailed Description
To enable those skilled in the art to better understand the technical solution of the invention, the method for predicting the methane gas adsorption performance of MOFs based on a deep convolutional neural network is described in detail below with reference to the embodiments. The following examples only illustrate the invention and are not intended to limit its scope.
Step one, data preprocessing:
The raw data set is divided into five classes according to adsorption value, and each class is randomly split into a training set and a test set at a ratio of 9:1. The data in the remaining classes are then expanded, taking the class with the most data as the reference. The CIF file stores the three angles alpha, beta and gamma of the MOF unit cell in the three coordinate planes xy, xz and yz, together with the relative coordinates of the atoms within the unit cell. The rotated unit cell must still be periodically extendable in the x, y and z directions in a way that leaves the resulting MOF unchanged, so it cannot be rotated by an arbitrary angle. The cell is therefore rotated counterclockwise or clockwise about the x, y or z axis by the alpha, beta and gamma values stored in the CIF file. The MOF generated by periodically extending the rotated cell in the x, y and z directions is identical to the original MOF structure, so its methane gas adsorption performance is unchanged, while the side lengths, angles and coordinates in the CIF file change. Several CIF files are thus generated for a MOF of the same structure, which expands the original data set. Because the CIF file stores the relative coordinates of the atoms in the MOF, these relative coordinates are converted into absolute coordinates in space after the data expansion.
In the conversion formula, x, y and z are the resulting absolute coordinates of an atom, x', y' and z' are its relative coordinates, a, b and c are the three side lengths of a single MOF unit cell, and alpha, beta and gamma are the angles between the three sides of the cell.
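The conversion from relative (fractional) to absolute (Cartesian) coordinates is commonly done as sketched below. This is a minimal sketch assuming the usual crystallographic orientation (a along x, b in the xy plane) and the CIF convention that alpha is the angle between b and c, beta between a and c, and gamma between a and b. `frac_to_cart` is a hypothetical helper name, and the formula used in the invention may be written differently.

```python
import math
from typing import Tuple


def frac_to_cart(frac: Tuple[float, float, float],
                 a: float, b: float, c: float,
                 alpha: float, beta: float, gamma: float
                 ) -> Tuple[float, float, float]:
    """Convert relative coordinates (x', y', z') to absolute (x, y, z).

    Angles are given in degrees, as they are in a CIF file.
    """
    al, be, ga = (math.radians(v) for v in (alpha, beta, gamma))
    cos_a, cos_b, cos_g = math.cos(al), math.cos(be), math.cos(ga)
    sin_g = math.sin(ga)
    # Volume of the cell divided by a*b*c.
    omega = math.sqrt(1.0 - cos_a ** 2 - cos_b ** 2 - cos_g ** 2
                      + 2.0 * cos_a * cos_b * cos_g)
    xf, yf, zf = frac
    x = a * xf + b * cos_g * yf + c * cos_b * zf
    y = b * sin_g * yf + c * (cos_a - cos_b * cos_g) / sin_g * zf
    z = c * omega / sin_g * zf
    return (x, y, z)
```

For a cubic cell with all angles equal to 90 degrees the expression reduces to multiplying each relative coordinate by the matching side length.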
The ranges of the absolute atomic coordinates over all MOF files are then collected, the atomic coordinates are normalized by min-max normalization, and each normalized MOF unit cell is placed in a space of size 100x100x100.
In the normalization formula, x_new, y_new and z_new are the atomic coordinates in the 100x100x100 space; x, y and z are the absolute atomic coordinates; x_max, y_max and z_max are the maximum x, y and z coordinate values of the atoms over all data; and x_min, y_min and z_min are the corresponding minimum values.
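Min-max normalization into a 100x100x100 space can be sketched as x_new = (x - x_min) / (x_max - x_min) * 100, and likewise for y and z. The sketch below assumes that the minimum and maximum are taken per axis over every atom of every structure, as the definitions above suggest; `global_bounds` and `scale_to_grid` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def global_bounds(structures: Sequence[Sequence[Point]]) -> Tuple[Point, Point]:
    """Per-axis minimum and maximum over all atoms in all structures."""
    atoms: List[Point] = [p for s in structures for p in s]
    lo = tuple(min(p[i] for p in atoms) for i in range(3))
    hi = tuple(max(p[i] for p in atoms) for i in range(3))
    return lo, hi


def scale_to_grid(point: Point, lo: Point, hi: Point,
                  size: float = 100.0) -> Point:
    """Min-max normalize one atom and stretch it to the [0, size] range."""
    return tuple((v - l) / (h - l) * size for v, l, h in zip(point, lo, hi))
```

Because the bounds are global, every structure is mapped with the same scale, so relative cell sizes are preserved across the data set.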
Finally, the MOF in the 100x100x100 space is projected onto the xy plane, the yz plane and the xz plane to obtain three projection matrices.
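The projection step can be sketched as follows. The text does not state what each matrix entry holds, so this sketch assumes that an entry counts the atoms falling into that grid cell; other encodings (for example a binary mask or an atom-type value) are equally possible. `project` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

Matrix = List[List[int]]


def project(points: Iterable[Tuple[float, float, float]],
            size: int = 100) -> Tuple[Matrix, Matrix, Matrix]:
    """Project normalized atoms onto the xy, xz and yz planes.

    Coordinates are expected in the range [0, size]; a value of exactly
    `size` is clamped into the last cell.
    """
    xy = [[0] * size for _ in range(size)]
    xz = [[0] * size for _ in range(size)]
    yz = [[0] * size for _ in range(size)]
    for x, y, z in points:
        i, j, k = (min(int(v), size - 1) for v in (x, y, z))
        xy[i][j] += 1  # drop z
        xz[i][k] += 1  # drop y
        yz[j][k] += 1  # drop x
    return xy, xz, yz
```

The three matrices can then be stacked as the channels of a single image-like input for the convolutional classifier, which is again an assumption rather than something the text specifies.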
Step two, designing a classifier: a classifier is designed from a convolutional neural network and residual blocks and trained iteratively, and is finally evaluated on the test set to obtain the classifier with the best classification performance. A cross-entropy function is used as the loss function of the classifier, and a LeakyReLU function is used as the activation function after each convolutional layer to prevent overfitting.
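The two functions named in this step have standard definitions, sketched below in plain Python for a single sample. The negative slope of 0.01 is an assumed default, since the text gives no value, and the network layout itself (number of layers, residual block design) is not specified here, so it is not reproduced.

```python
import math
from typing import Sequence


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """LeakyReLU: identity for x >= 0, a small linear slope for x < 0."""
    return x if x >= 0 else negative_slope * x


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy loss of softmax(logits) against the true class index.

    Uses the log-sum-exp trick so that large logits do not overflow.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]
```

With five adsorption classes, a classifier that outputs equal logits for every class has a loss of ln 5, about 1.61, which is a useful baseline when reading training curves.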
Step three, obtaining the predicted methane adsorption category of the MOF with the classifier: the preprocessed data are input into the classifier, and the predicted adsorption value category of the MOF for methane is obtained from the classification result that the classifier outputs.
Example 1
As shown in FIG. 1, the method for predicting the methane gas adsorption performance of MOFs with the deep convolutional neural network comprises the following specific steps:
1. Data preprocessing: the data set used in the example comes from the paper "In silico discovery of metal-organic frameworks for precombustion CO2 capture using a genetic algorithm", published by Chung Y G et al. in Science Advances in 2016. It contains 51163 MOF samples together with their adsorption values for methane gas; each sample records the atom names, the relative atomic coordinates, the three side lengths and the three angles of the MOF unit cell. Statistics show that the maximum adsorption value is 530 cm³/g, and the MOFs are divided into five categories by adsorption value. After the 9:1 split into training and test sets, the 0-106 cm³/g interval has 7717 entries, the 106-212 cm³/g interval has 13056, the 212-318 cm³/g interval has 16955, the 318-424 cm³/g interval has 7274, and the 424-530 cm³/g interval has 742. Taking the data volume of the 212-318 cm³/g interval as the reference, the training data of the other intervals are expanded. The expansion method rotates the MOF by the cell angle lying in the plane perpendicular to a coordinate axis; the relative atomic coordinates of the rotated MOF change but its spatial structure does not, so several CIF files are generated for a MOF of the same structure and the data are expanded. After the data set is expanded, the relative atomic coordinates stored in the data are converted into absolute coordinates in space, min-max normalization is applied, and the normalized coordinates are scaled up by a factor of 100 and projected onto the xy, xz and yz planes, giving 100x100 projection matrices.
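The interval assignment in this example can be sketched as five equal bins of width 530 / 5 = 106 cm³/g. The sketch assumes left-closed intervals with the maximum value assigned to the top class, which the text does not specify, and it estimates how many versions of each structure would be needed to reach the reference class size from the per-interval counts quoted above. `adsorption_class` and `copies_needed` are hypothetical names.

```python
import math
from typing import List, Sequence


def adsorption_class(value: float, max_value: float = 530.0,
                     n_classes: int = 5) -> int:
    """Map an adsorption value in cm3/g to a class index 0..n_classes-1."""
    width = max_value / n_classes
    # Clamp so that the maximum value itself lands in the top class.
    return min(int(value // width), n_classes - 1)


def copies_needed(counts: Sequence[int]) -> List[int]:
    """Versions per structure needed for each class to reach the largest one."""
    reference = max(counts)
    return [math.ceil(reference / n) for n in counts]


# Per-interval counts quoted in the example above.
counts = [7717, 13056, 16955, 7274, 742]
factors = copies_needed(counts)
```

On these counts the smallest class (424-530 cm³/g) would need roughly 23 versions of each structure to match the reference class, which is more than a handful of axis-aligned rotations can supply; how the invention closes that gap is not stated.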
2. Designing a classifier: a classifier is designed from a convolutional neural network and residual blocks and trained iteratively, and is finally evaluated on the test set to obtain the classifier with the best classification performance. A cross-entropy function is used as the loss function of the classifier, and a LeakyReLU function is used as the activation function after each convolutional layer to prevent overfitting.
3. Obtaining the predicted methane adsorption category of the MOF with the classifier: the preprocessed data are input into the classifier, and the predicted adsorption value category of the MOF for methane is obtained from the classification result that the classifier outputs.
To demonstrate the effectiveness of the proposed method, the ROC curve obtained by the classifier on the data set (FIG. 2) is compared with the ROC curves obtained by the prior-art methods (FIG. 3).
The ACC and AUC values of the CIF-CNN classifier model created in the invention and of the comparison methods are shown in Table 1:
TABLE 1
Method     ACC    AUC
CIF-CNN    82%    0.94
SVM        29%    0.54
DT         40%    0.63
RF         46%    0.76
As can be seen from Table 1, on this data set the proposed prediction method based on a deep convolutional network obtains the best ACC and AUC values by a large margin (82% ACC and 0.94 AUC, against 46% and 0.76 for the best comparison method, RF). This indicates that the deep convolutional network is better suited to extracting features from the basic three-dimensional spatial structure of MOFs and to predicting the gas adsorption value.
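For reference, the two metrics in Table 1 can be computed as sketched below. The text does not say how AUC was aggregated over the five classes, so the sketch only shows the binary case, using the rank definition (the probability that a positive sample scores higher than a negative one, with ties counted as one half). Applying it per class in a one-vs-rest fashion and averaging would be one possible multi-class extension, which is an assumption. `accuracy` and `binary_auc` are hypothetical names.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve for one class against the rest.

    Quadratic in the number of samples, which is fine for a sketch.
    """
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking, which puts the 0.54 reported for SVM in Table 1 close to chance level.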
Although the invention has been described in detail with reference to the embodiments, it is not limited to them. Various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the invention, and such changes shall also fall within the scope of the invention.

Claims (2)

1. A method for predicting the methane gas adsorption performance of MOFs based on a deep convolutional neural network, characterized by comprising the following steps:
1) Data preprocessing: expanding the original CIF file database and converting the basic three-dimensional structure of the MOF in the CIF file into features that the classifier can accept;
2) Constructing a classifier with a convolutional neural network, dividing the methane gas adsorption capacity of the MOFs into intervals, and training the classifier iteratively;
3) Inputting the preprocessed data into the trained classifier to output the predicted adsorption value category of the MOF for methane;
the specific process of step 1) is as follows:
classifying the original data set according to adsorption value, randomly dividing it into a training set and a test set, and expanding the data in the other categories, taking the category with the most data as the reference; rotating a single MOF unit cell in the CIF file around a coordinate axis, wherein the rotation angle is the cell angle lying in the plane perpendicular to that axis, so that after rotation the side lengths of the MOF unit cell, namely the sides a, b and c recorded in the CIF file, are exchanged, and the atomic coordinates and the angles associated with the sides a, b and c change accordingly, whereby several CIF files are generated from a MOF of the same structure and the data are expanded; in addition, converting the relative atomic coordinates stored in the data into absolute coordinates in space, collecting the ranges of the absolute atomic coordinates over all MOF files, normalizing the coordinates by min-max normalization, and then scaling up the normalized MOF unit-cell coordinates to obtain an xy-plane projection matrix, an xz-plane projection matrix and a yz-plane projection matrix.
2. The method for predicting the methane gas adsorption performance of MOFs based on a deep convolutional neural network according to claim 1, wherein in step 2) a classifier is constructed from a convolutional neural network and residual blocks and trained iteratively, and is finally evaluated on the test set to obtain the classifier with the best classification performance, wherein a cross-entropy function is used as the loss function of the classifier, and a LeakyReLU function is used as the activation function after each convolutional layer to prevent overfitting.
CN202010374618.XA 2020-05-06 2020-05-06 Method for predicting methane gas adsorption performance of MOF (metal-organic framework) based on deep convolutional neural network Active CN111755080B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010374618.XA CN111755080B (en) 2020-05-06 2020-05-06 Method for predicting methane gas adsorption performance of MOF (metal-organic framework) based on deep convolutional neural network

Publications (2)

Publication Number Publication Date
CN111755080A CN111755080A (en) 2020-10-09
CN111755080B CN111755080B (en) 2023-07-28

Family

ID=72673817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010374618.XA Active CN111755080B (en) 2020-05-06 2020-05-06 Method for predicting methane gas adsorption performance of MOF (metal-organic framework) based on deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN111755080B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110542710A (en) * 2019-09-16 2019-12-06 中国石油大学(华东) Preparation method of tungsten disulfide-based formaldehyde gas sensor and application of gas sensor in vehicle-mounted microenvironment detection

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9880142B2 (en) * 2015-05-15 2018-01-30 General Electric Company Photonic sensor for in situ selective detection of components in a fluid
US10423861B2 (en) * 2017-10-16 2019-09-24 Illumina, Inc. Deep learning-based techniques for training deep convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Attainable Volumetric Targets for Adsorption-Based Hydrogen Storage in Porous Crystals: Molecular Simulation and Machine Learning; Grace Anderson et al.; The Journal of Physical Chemistry; pp. 120-129 *

Also Published As

Publication number Publication date
CN111755080A (en) 2020-10-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant