CN112382350B - Machine learning estimation method for sensitivity and mechanical property of energetic substance and relation of energetic substance - Google Patents


Info

Publication number
CN112382350B
Authority
CN
China
Prior art keywords
descriptors
model
dataset
molecular
energetic
Legal status
Active
Application number
CN202011311694.2A
Other languages
Chinese (zh)
Other versions
CN112382350A (en)
Inventor
Pu Xuemei (蒲雪梅)
Deng Qianqian (邓倩倩)
Guo Yanzhi (郭延芝)
Xu Tao (徐涛)
Liu Jian (刘建)
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Application filed by Sichuan University
Priority to CN202011311694.2A
Publication of CN112382350A
Application granted
Publication of CN112382350B


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G16C20/30 Prediction of properties of chemical compounds, compositions or mixtures
    • G16C60/00 Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Investigating Strength Of Materials By Application Of Mechanical Stress (AREA)

Abstract

The invention belongs to the technical field of compound performance evaluation and discloses a machine learning estimation method for the sensitivity and mechanical properties of energetic substances and the relationship between them. The invention establishes seven QSPR models of the impact sensitivity and bulk modulus of nitro energetic compounds, using molecular descriptors calculated with E-Dragon together with several common pieces of molecular structure information as features. This helps shorten the experimental research cycle for energetic materials and supports the design and comprehensive evaluation of new energetic compounds.

Description

Technical Field
The invention belongs to the technical field of compound performance evaluation, and particularly relates to a machine learning estimation method for the sensitivity and mechanical properties of energetic substances and the relationship between them.
Background
Energetic materials are a class of compounds or mixtures that contain explosive groups, or oxidants and fuels, and that can react chemically on their own and release energy; they are important components of military explosives, propellants, and rocket propellant formulations. They are widely used in national defense, science and industry, aerospace, and civilian applications, so research on these compounds has great academic significance and application value. However, experimental research suffers from long cycles, high cost, high risk, and poor reproducibility caused by many influencing factors, and it cannot provide performance data for energetic materials that have not yet been synthesized. Practical applications also place high demands on material performance (high detonation performance, good thermal stability, low sensitivity, excellent mechanical properties, environmental friendliness, and so on), so development has been relatively slow, and in-depth theoretical research offers positive guidance for accelerating the development of energetic materials.
From the mid-20th century, simulating scientific experiments on electronic computers spread rapidly, and researchers could reproduce increasingly complex phenomena by simulating the structure and motion of matter. Computational simulation greatly accelerated research on energetic materials: by screening designed energetic materials with computational models, experimentalists can concentrate their resources on molecules expected to improve performance, reduce sensitivity, and lessen environmental hazards. Although computational simulation is accurate and reliable, it has limitations, such as complex calculation procedures, demanding model requirements, and long run times, so high-precision calculations are often feasible only for small batches of substances. Since the start of this century, technology has developed rapidly: computing power has increased greatly and data volumes have grown explosively, and the combination of big data with artificial intelligence (including data mining, machine/statistical learning, deep learning, compressed sensing, and so on) has given rise to a fourth paradigm of scientific research, known in the materials field as Materials 4.0. Research under this fourth paradigm is driven by large amounts of data, making it possible to discover previously unknown yet reliable theories. Artificial intelligence methods have the potential to greatly change and enhance the role of computers in science and engineering. Machine learning is one of the fastest-developing branches of artificial intelligence in recent years; its core statistical algorithms improve continuously through training.
Such techniques are well suited to complex problems involving large combinatorial spaces or nonlinear processes, which conventional approaches either cannot address or can handle only at great computational expense. Applications in chemistry are growing at a remarkable rate, and machine learning is widely used to guide materials synthesis, for molecular design and drug discovery, and to predict the properties of many kinds of substances. Machine learning has also long been used to predict important properties of energetic materials. For example, prior art 1 used multiple linear regression (MLR) and 20 topological descriptors to build two-dimensional quantitative structure-activity relationships between the toxicity of 148 aromatic nitro compounds toward 9 different targets and their molecular structural features; the R² values of the nine models ranged from 0.71 to 0.92. In 2014, prior art 2 used the heat of explosion, density, and orbital energy gap as inputs to predict the detonation velocity of 54 high-nitrogen compounds with MLR and a least-squares support vector machine (LS-SVM), obtaining test-set R² values of 0.921 and 0.971, respectively. In 2019, prior art 3 extracted 104 data points from 65 CHNO high-energy explosives and predicted detonation velocity with an artificial neural network (ANN), using the composition, structure, heat of formation, and loading density of the explosives as features. In 2016, prior art 4 collected experimental densities of 170 nitro energetic compounds; the resulting MLR and ANN models were robust, with test-set R² values of 0.886 and 0.931, respectively, offering a new route to fast and efficient prediction of crystal density and to the design of new high-performance energetic materials.
In 2017 and 2018, prior art 5 used MLR to study the quantitative structure-activity relationships of 100 azole compounds and 36 tetrazolium oxynitride salts with their molecular structures, with model R² values of 0.923 and 0.9321, respectively. In 2018, prior art 6 characterized molecular structure and, on the basis of 111 experimental data points for 54 energetic substances, used MLR to model the autoignition temperature; the QSPR model had an RMS error of 47.45 K, simplifying the determination of the autoignition temperature of energetic compounds. In the same year, prior art 7 used a hybrid SVR-GSA model with the molecular weight and the numbers of C, H, O and N atoms as descriptors to predict the autoignition temperature of 53 organic energetic compounds, improving performance by 37.34% and 79.05%, respectively, over two earlier models. In 2018, prior art 8 built a series of quantitative structure-activity relationship models for the detonation performance, heat of formation, density, and other properties of 109 CHONF energetic molecules, systematically comparing various molecular featurizations and machine learning methods; the best featurization and model were sum over bonds and kernel ridge regression (KRR), respectively, providing guidance for further applications of machine learning in this field. In addition, many researchers have used machine learning to study the melting point, lattice energy, density, and decomposition temperature of energetic materials, the detonation velocity of energetic co-crystals, and the detonation properties and melting points of energetic ionic liquids.
Although properties such as density, detonation performance, and stability have been studied extensively, many problems in the field of energetic materials remain that deserve investigation.
1. Research on the relationship between mechanical properties and molecular structure is still very scarce. Mechanical properties are important practical properties of energetic materials and matter greatly for their preparation and use; they describe a material's ability to resist deformation and fracture under given temperature conditions and external forces, and they bear directly on the energy output and safe use of the substance. Scientists have paid more attention to the mechanical properties of energetic composites such as PBX explosives, and studies show that the type, content, and proportion of the components have a substantial influence on the mechanical properties of such composites. Because the monomer energetic material, as the main explosive, accounts for 90%-95% of the total content (the polymer binder accounts for 5%-10%), it directly affects the overall mechanical properties of the composite system; selecting a main energetic material with excellent properties is therefore key to the formulation design of energetic composites. Research on the mechanical properties of energetic materials is important for guiding formulation and structural-component design and for safety evaluation and life prediction, yet no quantitative structure-property relationship study linking the mechanical properties of energetic materials to their structure exists at present.
2. The accuracy of QSPR models for predicting sensitivity still leaves room for improvement. Sensitivity is one of the most important properties of energetic materials and a major indicator of their stability and reliability in use. Impact sensitivity, one of the most common forms, describes how easily a material explodes or burns after mechanical impact; it is generally expressed as the explosion probability of a sample in a drop-weight test or as the characteristic drop height $h_{50}$ corresponding to a 50% probability of explosion. In 2009, prior art 9 used 16 electrotopological state indices to build QSPR models of 156 nitro energetic compounds with a back-propagation neural network (BPNN), multiple linear regression (MLR), and partial least squares (PLS), obtaining test-set R² values of 0.740, 0.715, and 0.718, respectively. In 2012, prior art 10 used 10 molecular descriptors as features and studied the same dataset with ANN and MLR, obtaining test-set R² values of 0.8658 and 0.7222, respectively. This improved prediction accuracy considerably, but further work is still needed to build models with higher accuracy, better generalization, and greater representativeness.
3. Few studies have addressed the correlations between different properties, such as sensitivity and mechanical properties. It has been pointed out that the properties of energetic materials are highly correlated; for example, mechanical properties are closely related to safety (under external force, energetic materials readily develop stress concentrations, which can form hot spots and cause accidental detonation), and impact sensitivity is an outward manifestation of the instability of an explosive's molecular structure. Many researchers have studied the relationships between the sensitivity of energetic materials and their other properties, and between different forms of sensitivity. In 2004, prior art 11 used quantum mechanical methods to study the heat of explosion and the sensitivity of nitramines under 30 different predicted reaction paths, and found that the heat of detonation is linearly related to the natural logarithm of sensitivity. In 2017, prior art 12 demonstrated theoretically a correlation between the detonation performance and sensitivity of explosives: $\log(h_{50})$ increases linearly with $D^{-4}$, $P^{-2}$ or $E_G^{-1}$, with coefficients of determination R² close to 0.8. In 2014, prior art 13 proposed a new relationship between impact sensitivity and the activation energy of thermal decomposition based on 40 CHNO nitroaromatic energetic compounds, showing that impact sensitivity is a function of the thermal decomposition activation energy, the ratio of H to O atoms, and specific molecular structural parameters.
In 2016, prior art 14 proposed a simple general model for predicting friction sensitivity, using the thermal decomposition activation energy and molecular structural parameters as features, and used MLR to build friction sensitivity models for 21 cyclic and acyclic nitrosamines with an RMS error of 14.2 N. In 2016, prior art 15 studied the spark sensitivity and impact sensitivity of 28 CHNO nitroaromatic energetic compounds, and the impact sensitivity and electrostatic sensitivity of 27 nitroaromatic compounds, finding clear correlations between the two types of sensitivity; the resulting linear models had RMS errors of 1.55 J and 2.4%, respectively. In 2018, prior art 16 likewise used linear modeling to construct, in turn, relationship models between impact sensitivity and spark sensitivity for 11 explosives, and between spark sensitivity and impact sensitivity for 31 nitroaromatic compounds and 14 nitramines, with RMS errors of 2.38 kbar and 1.31 J, respectively. Understanding how different properties influence and constrain one another provides very important guidance for experimental research, but the relationship between sensitivity and mechanical properties has not yet been studied. In addition, ANN, the most commonly used machine learning method, is good at extracting abstract features, yields highly accurate quantitative and classification models, and can automatically generate and optimize entirely new structures, but its models are difficult to interpret.
With the rapid development of artificial intelligence and computer hardware, the impact sensitivity of a substance can now be obtained by quantitative calculation or from empirical formulas, but these methods still have limitations. The quantitative calculation method established in prior art 17 is highly accurate, but the procedure is complex and time-consuming, so it suits only a few specific molecules; calculating large numbers of samples consumes enormous computational resources and time. QSPR models, by contrast, allow simple and fast prediction for large numbers of samples, but, as noted above, their accuracy can still be improved. For mechanical properties, evaluating the mechanical behavior (elasticity, plasticity, and fracture) of molecular crystals experimentally is often complicated by the difficulty of preparing samples. Monomers such as HMX and energetic composites such as energetic co-crystals are therefore now generally studied by molecular dynamics (MD) simulation, with the various mechanical properties usually calculated from the elastic constants. However, MD simulation also has drawbacks: it is time-consuming, cannot handle large numbers of samples in a short time, and cannot be applied to materials that have not been synthesized experimentally. It is therefore also necessary to establish structure-property relationships that reduce the time cost of predicting unknown samples and allow new energetic materials to be evaluated.
Through the above analysis, the problems and defects existing in the prior art are as follows:
(1) At present, there is no research on quantitative structure-property relationships between the mechanical properties of energetic materials and their structure, nor on the relationship between sensitivity and mechanical properties;
(2) Existing sensitivity prediction models still leave room for improvement in accuracy; their generalization ability is insufficient, their representativeness is limited, and they are hard to interpret;
(3) Calculating the mechanical properties of a substance is complex and time-consuming, is suitable only for a few specific molecules, cannot handle large numbers of samples in a short time, and cannot be applied to substances that have not been synthesized experimentally.
The difficulties in solving these problems and defects are as follows:
(1) Data acquisition is difficult: experimental mechanical-property data for large numbers of monomer energetic materials are hard to obtain, and generating computational data consumes large amounts of computing resources, so the cost is high;
(2) Alternative methods are limited: although methods more powerful than conventional machine learning, such as deep learning, have been developed, they cannot be applied here because of the limited amount of available data, and common machine learning methods often lack interpretability.
The significance of solving these problems and defects is as follows:
(1) Research on the mechanical properties of energetic materials is important for guiding formulation and structural-component design and for safety evaluation and life prediction of energetic materials;
(2) Research on how sensitivity and mechanical properties relate to molecular structure supports the comprehensive evaluation of new energetic materials and so accelerates their research and development;
(3) Understanding how mechanical properties and sensitivity influence and constrain each other provides very important guidance for experimental research.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a machine learning estimation method for the sensitivity and mechanical properties of energetic substances and the relationship between them.
The invention is realized as follows: the machine learning estimation method for the sensitivity and mechanical properties of energetic substances and the relationship between them comprises taking the molecular descriptors calculated by E-Dragon and molecular structure information as features; constructing seven quantitative structure-property relationship (QSPR) models of the impact sensitivity and bulk modulus of nitro energetic substances based on an artificial neural network (ANN) and the sure independence screening and sparsifying operator (SISSO) method; and using the constructed models to determine, respectively, the relationship between the impact sensitivity and the mechanical properties of nitro energetic substances and the quantitative relationships between each of these properties and molecular structure.
Further, the machine learning estimation method for the sensitivity and the mechanical property of the energetic substance and the relation thereof comprises the following steps:
step one, data collection and processing: obtain the impact sensitivity and bulk modulus values of 240 nitro compounds based on nitroaromatics, collect 7 common features such as molecular weight and crystal density, and calculate the Dragon molecular descriptors of the 240 substances using their SMILES strings as input;
step two, taking impact sensitivity and bulk modulus as the target properties, screen the corresponding final features stepwise, and establish QSPR models between the impact sensitivity and bulk modulus of the energetic materials and their molecular structure using the ANN and SISSO methods;
step three, taking the impact sensitivity of the 240 nitro energetic compounds as output and the numbers of C, H, O and N atoms, molecular weight, crystal density, oxygen balance, and bulk modulus as features, establish the correlation between the sensitivity and the mechanical properties of the energetic materials and analyze the relationship between the impact sensitivity of nitro energetic materials and their mechanical properties;
and step four, compare the model performance of the two methods, ANN and SISSO, combined with the two types of features; after finding the optimal parameters for each model, calculate its 5-fold cross-validation results, and compare the suitability of the two types of features, namely molecular descriptors calculated from SMILES strings and features selected from experience.
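Step four relies on 5-fold cross-validation. The sketch below is a minimal, hypothetical illustration in Python with NumPy, not the inventors' code: `kfold_indices`, `cross_validate` and the least-squares stand-in `linear_fit_predict` are names introduced here, and drawing the folds from a single random shuffle with a fixed seed is an assumption, since the source does not say how the folds were formed.

```python
import numpy as np


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices once and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


def cross_validate(X, y, fit_predict, k=5, seed=0):
    """Return the RMSE on each held-out fold.

    fit_predict(X_train, y_train, X_test) is a hypothetical callable that trains
    a model (for example an ANN or a SISSO fit) and returns predictions for X_test.
    """
    folds = kfold_indices(len(y), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th one.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(float(np.sqrt(np.mean((y[test_idx] - y_pred) ** 2))))
    return scores


def linear_fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept; a placeholder model only."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef
```

Passing the model as a callable keeps the fold logic independent of the learner, so the same loop could score an ANN and a SISSO model on identical folds, which is what a like-for-like comparison of the two methods needs.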
Further, the data acquisition method includes:
acquiring the crystal structure files and SMILES strings of 240 nitro energetic compounds, and obtaining their mechanical property data by molecular dynamics simulation with the Forcite module of Materials Studio (MS). After structure optimization, the COMPASS force field is used in the NPT ensemble at 295 K with the Andersen thermostat and the Parrinello barostat; the pressure is set to 0.0001 GPa; the atom-based and Ewald summation methods are used for the van der Waals and electrostatic interactions, respectively; the cutoff radius is 0.95 nm, and a tail correction is applied beyond the cutoff. Initial atomic velocities are assigned from the Maxwell-Boltzmann distribution; Newton's equations of motion are solved under basic assumptions such as periodic boundary conditions and the equivalence of time averages and ensemble averages; integration uses the Verlet method with a time step of 1 fs, and a frame is saved every 10 fs. After the system has equilibrated, a 1 ns simulation trajectory is used for mechanical property analysis to obtain the elastic constants $C_{ij}$ (i, j = 1 to 6), from which the available mechanical performance parameters are calculated. The impact sensitivity values of the 240 nitro energetic materials are calculated with the empirical impact sensitivity prediction formula for nitro energetic materials proposed by W.-P. Lai et al. in 2010.
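The text derives the mechanical parameters from the elastic constants $C_{ij}$ but does not say which averaging scheme yields the bulk modulus. As an assumption, the sketch below computes the standard Voigt, Reuss and Hill estimates from a 6×6 stiffness matrix in Voigt notation: $K_V = [(C_{11}+C_{22}+C_{33}) + 2(C_{12}+C_{13}+C_{23})]/9$, $1/K_R = (S_{11}+S_{22}+S_{33}) + 2(S_{12}+S_{13}+S_{23})$ with $S = C^{-1}$, and $K_H = (K_V + K_R)/2$. The function name `bulk_modulus_vrh` is hypothetical, and this is not Materials Studio's own routine.

```python
import numpy as np


def bulk_modulus_vrh(C):
    """Voigt, Reuss and Hill bulk moduli from a 6x6 elastic-constant matrix.

    C is in Voigt notation (for example in GPa); the results share its units.
    """
    C = np.asarray(C, dtype=float)
    if C.shape != (6, 6):
        raise ValueError("C must be a 6x6 matrix in Voigt notation")
    S = np.linalg.inv(C)  # elastic compliance matrix
    # Voigt: upper bound, averages the stiffnesses.
    k_voigt = ((C[0, 0] + C[1, 1] + C[2, 2])
               + 2.0 * (C[0, 1] + C[0, 2] + C[1, 2])) / 9.0
    # Reuss: lower bound, averages the compliances.
    k_reuss = 1.0 / ((S[0, 0] + S[1, 1] + S[2, 2])
                     + 2.0 * (S[0, 1] + S[0, 2] + S[1, 2]))
    # Hill: arithmetic mean of the two bounds.
    return k_voigt, k_reuss, 0.5 * (k_voigt + k_reuss)
```

For an isotropic solid the Voigt and Reuss values coincide; for anisotropic molecular crystals they bracket the true value, which is why the Hill average is a common single-number choice.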
Further, the molecular descriptor calculation method includes:
(1) Calculating a molecular descriptor:
1666 molecular descriptors per molecule are computed online with the E-Dragon 1.0 software from the SMILES strings; the crystal densities of the nitro energetic compounds are obtained from the CSD (Cambridge Structural Database); and the numbers of C, H, O and N atoms and the molecular weight of each substance are extracted from its molecular formula $\mathrm{C}_a\mathrm{H}_b\mathrm{O}_c\mathrm{N}_d$;
where a, b, c and d are the numbers of C, H, O and N atoms in the molecule, M is the relative molecular mass in g/mol, and OB is the oxygen balance of the energetic molecule in g/g, calculated as:
$$\mathrm{OB} = \frac{16\,(c - 2a - b/2)}{M}$$
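As a worked check of the oxygen balance for $\mathrm{C}_a\mathrm{H}_b\mathrm{O}_c\mathrm{N}_d$, the sketch below assumes the conventional definition $\mathrm{OB} = 16(c - 2a - b/2)/M$ in g/g (complete oxidation of C to CO2 and H to H2O), and computes M from standard atomic masses when it is not supplied. The helper names `molecular_weight` and `oxygen_balance` are hypothetical.

```python
# Standard atomic masses in g/mol.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molecular_weight(a, b, c, d):
    """Relative molecular mass M (g/mol) of CaHbOcNd."""
    return (a * ATOMIC_MASS["C"] + b * ATOMIC_MASS["H"]
            + c * ATOMIC_MASS["O"] + d * ATOMIC_MASS["N"])


def oxygen_balance(a, b, c, d, M=None):
    """Oxygen balance OB (g/g) of CaHbOcNd: 16 * (c - 2a - b/2) / M.

    Nitrogen (d) enters only through the molecular mass M.
    """
    if M is None:
        M = molecular_weight(a, b, c, d)
    return 16.0 * (c - 2 * a - b / 2) / M


# TNT, C7H5N3O6: a=7, b=5, c=6, d=3; OB is about -0.74 g/g (about -74%).
tnt_ob = oxygen_balance(7, 5, 6, 3)
```

A negative OB means the molecule lacks the oxygen needed to oxidize all of its carbon and hydrogen, which is why most nitro energetic compounds fall below zero.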
(2) The number of descriptors is reduced by statistical methods to obtain the descriptors required for building the QSPR models.
Further, in step (2), reducing the number of descriptors by statistical methods to obtain the descriptors required for constructing the QSPR models comprises the following steps:
1) Eliminate all descriptors that contain error information or whose exact values cannot be calculated;
2) Remove descriptors for which more than 75% of the samples have the same value;
3) Omit descriptors with a relative standard deviation (RSD) less than 0.05;
4) For each pair of descriptors whose Pearson correlation coefficient r exceeds 0.75, remove the descriptor that is less correlated with the target value;
5) Using MLR forward stepwise regression, remove descriptors whose p value (probability value) is greater than 0.005.
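Filters 2) to 5) can be sketched as below, using NumPy and SciPy. This is an illustrative reading, not the inventors' implementation, and several details are assumptions: RSD is taken as sample standard deviation over the absolute mean, the correlation filter uses |r| and is applied greedily (descriptors are visited in decreasing |r| with the target and kept only if no already-kept descriptor exceeds the threshold), and the forward stepwise step adds, at each round, the candidate whose OLS coefficient has the smallest two-sided t-test p value, stopping once that p value exceeds 0.005. Filter 1) depends on the descriptor software's error flags and is not modelled. `screen_descriptors` and `_pvalues` are hypothetical names.

```python
import numpy as np
from scipy import stats


def _pvalues(A, y):
    """Two-sided t-test p values of OLS coefficients (A includes an intercept column)."""
    n, k = A.shape
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = n - k
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return 2.0 * stats.t.sf(np.abs(coef / se), dof)


def screen_descriptors(X, y, same_frac=0.75, min_rsd=0.05, max_r=0.75, p_in=0.005):
    """Return the column indices of X that survive filters 2) to 5)."""
    n, p = X.shape
    # 2) and 3): drop near-constant descriptors.
    cand = []
    for j in range(p):
        col = X[:, j]
        _, counts = np.unique(col, return_counts=True)
        mean = col.mean()
        rsd = np.inf if mean == 0 else col.std(ddof=1) / abs(mean)
        if counts.max() / n <= same_frac and rsd >= min_rsd:
            cand.append(j)
    # 4): greedy removal of inter-correlated descriptors.
    r_target = {j: abs(np.corrcoef(X[:, j], y)[0, 1]) for j in cand}
    kept = []
    for j in sorted(cand, key=r_target.get, reverse=True):
        if all(abs(np.corrcoef(X[:, j], X[:, m])[0, 1]) <= max_r for m in kept):
            kept.append(j)
    # 5): forward stepwise MLR with a p-value entry criterion.
    selected, remaining = [], list(kept)
    while remaining:
        best_j, best_p = None, np.inf
        for j in remaining:
            A = np.column_stack([np.ones(n)] + [X[:, m] for m in selected + [j]])
            p_j = _pvalues(A, y)[-1]  # p value of the newly added descriptor
            if p_j < best_p:
                best_j, best_p = j, p_j
        if best_p > p_in:
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

Ordering filter 4) by correlation with the target means that, of any highly correlated pair, the descriptor more related to the property is the one that survives, matching the rule stated in step 4).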
Further, the descriptors and target properties used to construct the QSPR models after screening are organized into the following 4 datasets:
Dataset-1: 14 molecular descriptors selected with bulk modulus as the target property, covering 9 descriptor classes: structural descriptors, information indices, edge adjacency indices, BCUT descriptors, geometrical descriptors, 3D-MoRSE descriptors, GETAWAY descriptors, atom-centred fragments, and molecular properties;
Dataset-2: 17 molecular descriptors selected with impact sensitivity as the target property, covering 8 descriptor classes, including 2D autocorrelation descriptors, geometrical descriptors, RDF descriptors, 3D-MoRSE descriptors, WHIM descriptors, GETAWAY descriptors, and molecular properties;
Dataset-3: impact sensitivity as the target property, with 8 features: bulk modulus, crystal density, oxygen balance, molecular weight, and the numbers of C, H, O and N atoms;
Dataset-4: the descriptors screened separately with impact sensitivity and with bulk modulus as target properties, combined and de-duplicated, giving 26 descriptors covering 10 descriptor classes.
Further, constructing the QSPR models of the impact sensitivity and bulk modulus of nitro energetic substances comprises:
each of Dataset-1, Dataset-2, Dataset-3 and Dataset-4 is randomly divided into two subsets, with 80% of the data used as the training set and 20% as the test set; the four datasets are modeled with the SISSO and ANN methods to obtain seven QSPR models of the impact sensitivity and bulk modulus of nitro energetic substances;
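A minimal sketch of the random 80/20 split follows. Feature standardization is not mentioned in the source; it is added here as an assumption because ANN inputs are usually scaled, and the scaling statistics are taken from the training set only so that no test-set information leaks into the model. `split_and_standardize` is a hypothetical name.

```python
import numpy as np


def split_and_standardize(X, y, test_frac=0.2, seed=0):
    """Random train/test split; features z-scored with training-set statistics."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_frac * len(y)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    mu = X[train_idx].mean(axis=0)
    sd = X[train_idx].std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    X_train = (X[train_idx] - mu) / sd
    X_test = (X[test_idx] - mu) / sd  # same transform, training statistics
    return X_train, X_test, y[train_idx], y[test_idx]
```

With 240 compounds this yields 192 training and 48 test samples per dataset.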
the seven QSPR models of the impact sensitivity and bulk modulus of nitro energetic substances are as follows:
Model-1: ANN model of bulk modulus with the corresponding 14 molecular descriptors (Dataset-1);
Model-2: ANN model of impact sensitivity with the corresponding 17 molecular descriptors (Dataset-2);
Model-3: ANN model of impact sensitivity with the 8 features bulk modulus, crystal density, oxygen balance, molecular weight, and the numbers of C, H, O and N atoms (Dataset-3);
Model-4: ANN model of impact sensitivity and bulk modulus with the corresponding 26 molecular descriptors (Dataset-4);
Model-5: SISSO model of bulk modulus with the 14 molecular descriptors (Dataset-1);
Model-6: SISSO model of impact sensitivity with the 17 molecular descriptors (Dataset-2);
Model-7: SISSO model of impact sensitivity with the 8 features bulk modulus, crystal density, oxygen balance, molecular weight, and the numbers of C, H, O and N atoms (Dataset-3).
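Model-5 to Model-7 use SISSO, which builds a large space of candidate formulas from the primary features, keeps the candidates most correlated with the target (sure independence screening, SIS), and then searches exhaustively for the best low-dimensional linear model among them (the sparsifying operator, an ℓ0 search). The sketch below is a deliberately simplified, hypothetical illustration of that idea, not the full SISSO method: it applies only one round of the operators +, −, × and ÷ to feature pairs and a single SIS pass, whereas SISSO proper uses richer, iterated operator sets and re-screens against residuals for each added dimension. `build_features` and `sisso_sketch` are names introduced here.

```python
import itertools

import numpy as np


def build_features(X, names):
    """Primary features plus one round of +, -, * and / on every feature pair."""
    feats = [X[:, j] for j in range(X.shape[1])]
    labels = list(names)
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        a, b = X[:, i], X[:, j]
        ni, nj = names[i], names[j]
        feats += [a + b, a - b, a * b]
        labels += [f"({ni}+{nj})", f"({ni}-{nj})", f"({ni}*{nj})"]
        if np.all(b != 0):
            feats.append(a / b)
            labels.append(f"({ni}/{nj})")
        if np.all(a != 0):
            feats.append(b / a)
            labels.append(f"({nj}/{ni})")
    return np.column_stack(feats), labels


def sisso_sketch(X, y, names, n_sis=20, dim=2):
    """Return (rmse, descriptor labels, coefficients) of the best dim-D linear model."""
    F, labels = build_features(X, names)
    # SIS step: rank every candidate feature by |Pearson r| with the target.
    Fc = F - F.mean(axis=0)
    yc = y - y.mean()
    sd = Fc.std(axis=0)
    score = np.zeros(F.shape[1])
    ok = sd > 0
    score[ok] = np.abs(Fc[:, ok].T @ yc) / (len(y) * sd[ok] * yc.std())
    top = np.argsort(score)[::-1][:n_sis]
    # SO step (l0): exhaustive least-squares search over all dim-sized subsets.
    best_rmse, best_labels, best_coef = np.inf, None, None
    for subset in itertools.combinations(top, dim):
        A = np.column_stack([np.ones(len(y)), F[:, list(subset)]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rmse = float(np.sqrt(np.mean((y - A @ coef) ** 2)))
        if rmse < best_rmse:
            best_rmse, best_labels, best_coef = rmse, [labels[k] for k in subset], coef
    return best_rmse, best_labels, best_coef
```

The output is an explicit formula (intercept plus weighted constructed descriptors), which is the interpretability advantage SISSO offers over an ANN.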
Further, in the fourth step, the two feature types are:
oxygen balance and related features selected according to chemical experience (Dataset-3);
molecular descriptors screened step by step with statistical methods according to the target property (Dataset-1, Dataset-2 and Dataset-4).
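Further, in the fourth step, comparing the model performance of the ANN and SISSO methods combined with the two feature types, comparing how well each feature type suits each model, and obtaining the best QSPR model architecture for each dataset comprises:
evaluating the training-set and test-set performance of the 7 QSPR models comprehensively with the root mean square error (RMSE), the Pearson correlation coefficient (R) and the coefficient of determination (R²);
the formulas are as follows:
$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i^{\mathrm{true}}-y_i^{\mathrm{pred}}\right)^2}$$
$$R=\frac{\sum_{i=1}^{N}\left(y_i^{\mathrm{true}}-\bar{y}^{\mathrm{true}}\right)\left(y_i^{\mathrm{pred}}-\bar{y}^{\mathrm{pred}}\right)}{\sqrt{\sum_{i=1}^{N}\left(y_i^{\mathrm{true}}-\bar{y}^{\mathrm{true}}\right)^2}\sqrt{\sum_{i=1}^{N}\left(y_i^{\mathrm{pred}}-\bar{y}^{\mathrm{pred}}\right)^2}}$$
$$R^2=1-\frac{\sum_{i=1}^{N}\left(y_i^{\mathrm{true}}-y_i^{\mathrm{pred}}\right)^2}{\sum_{i=1}^{N}\left(y_i^{\mathrm{true}}-\bar{y}^{\mathrm{true}}\right)^2}$$
where N is the number of compounds in each dataset, y_i^true is the true value, y_i^pred is the value predicted by the model, \bar{y}^true is the mean of the true values of the samples, and \bar{y}^pred is the mean of the predicted values.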
Further, the best QSPR model for each dataset is:
with the 14 descriptors as features and bulk modulus (a mechanical property) as the target property, the best model is the ANN model (Model-1); with the 17 descriptors as features and impact sensitivity as the target property, the best model is also an ANN model (Model-2).
Further, the relation between the impact sensitivity and the mechanical properties of nitro energetic compounds is given by the following formula:
where h50 is the impact sensitivity, a, b, c and d are the numbers of C, H, O and N atoms in a molecule, M is the molecular weight, OB is the oxygen balance, and K is the bulk modulus.
It is a further object of the present invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps: taking the molecular descriptors calculated with E-Dragon and molecular structure information as features, constructing 7 QSPR models of the impact sensitivity and bulk modulus of nitro energetic compounds based on an artificial neural network (ANN) and the sure independence screening and sparsifying operator (SISSO) method, and using the constructed models to determine the relation between the impact sensitivity and mechanical properties of nitro energetic compounds and the quantitative relation between each of these properties and the molecular structure.
Another object of the present invention is to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps: taking the molecular descriptors calculated with E-Dragon and molecular structure information as features, constructing 7 QSPR models of the impact sensitivity and bulk modulus of nitro energetic compounds based on an artificial neural network (ANN) and the sure independence screening and sparsifying operator (SISSO) method, and using the constructed models to determine the relation between the impact sensitivity and mechanical properties of nitro energetic compounds and the quantitative relation between each of these properties and the molecular structure.
Another object of the present invention is to provide a machine learning estimation system for performing the machine learning estimation method for the sensitivity and mechanical properties of energetic substances and their relation, the system comprising:
a QSPR model building module, configured to take the molecular descriptors calculated with E-Dragon as features and to build, based on ANN and SISSO, 5 QSPR models relating the impact sensitivity and bulk modulus of nitro energetic compounds to their molecular structures;
a relation determining module for impact sensitivity and mechanical properties, configured to determine the relation between the impact sensitivity and mechanical properties of nitro energetic compounds using the 2 QSPR models constructed with molecular structure information as features.
Taken together, the technical solutions of the invention have the following advantages and positive effects: based on the molecular descriptors calculated with E-Dragon and several common, readily available pieces of molecular structure information, the invention establishes 7 QSPR models (Model-1 to Model-7) of the impact sensitivity and bulk modulus (mechanical property) of nitro energetic compounds, which helps shorten experimental research on energetic materials and supports the design and comprehensive evaluation of new energetic compounds. Tests also show that the models have certain advantages and high accuracy in predicting impact sensitivity.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the following description will briefly explain the drawings needed in the embodiments of the present application, and it is obvious that the drawings described below are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a machine learning estimation method for the sensitivity and mechanical properties of energetic materials and their relationships according to an embodiment of the present invention.
FIG. 2 is a flow chart of a machine learning estimation method for the sensitivity and mechanical properties of energetic materials and their relationships according to an embodiment of the present invention.
FIG. 3 is a schematic comparison of the models built on Dataset-1 according to an embodiment of the present invention;
In the figure: (a) R² of the Dataset-1 models; (b) R of the Dataset-1 models; (c) RMSE of the Dataset-1 models.
FIG. 4 is a schematic comparison of the models built on Dataset-2 according to an embodiment of the present invention;
In the figure: (a) R² of the Dataset-2 models; (b) R of the Dataset-2 models; (c) RMSE of the Dataset-2 models.
FIG. 5 is a schematic comparison of the models built on Dataset-3 according to an embodiment of the present invention;
In the figure: (a) R² of the Dataset-3 models; (b) R of the Dataset-3 models; (c) RMSE of the Dataset-3 models.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Aiming at the problems existing in the prior art, the invention provides a machine learning estimation method for the sensitivity and mechanical properties of energetic materials and the relation thereof, and the invention is described in detail below with reference to the accompanying drawings.
As shown in FIG. 1, the machine learning estimation method for the sensitivity and mechanical properties of energetic materials and their relation provided by the embodiment of the invention comprises: taking the molecular descriptors calculated with E-Dragon and molecular structure information as features, constructing 7 QSPR models of the impact sensitivity and bulk modulus of nitro energetic compounds based on an artificial neural network (ANN) and the sure independence screening and sparsifying operator (SISSO) method, and using the constructed models to determine the relation between the impact sensitivity and mechanical properties of nitro energetic compounds and the quantitative relation between each of them and the molecular structure.
As shown in fig. 2, the machine learning estimation method for the sensitivity and the mechanical property of the energetic material and the relation thereof provided by the embodiment of the invention comprises the following steps:
S101, collecting and processing data: obtaining the impact sensitivity and bulk modulus values of 240 nitro compounds, mostly nitroaromatics, collecting 7 common features such as molecular weight and crystal density, and calculating the Dragon molecular descriptors of the 240 compounds with their SMILES strings as input;
S102, with impact sensitivity and bulk modulus as target properties, screening out the corresponding final features step by step, and establishing QSPR models between the impact sensitivity and bulk modulus of energetic materials and their molecular structures with the ANN and SISSO methods;
S103, with the impact sensitivity of the 240 nitro energetic compounds as output and the numbers of C, H, O and N atoms, molecular weight, crystal density, oxygen balance and bulk modulus as features, establishing the correlation between the sensitivity and mechanical properties of energetic materials and analyzing the relation between the impact sensitivity of nitro energetic compounds and their mechanical properties;
S104, comparing the performance of the ANN and SISSO methods combined with the two feature types: searching for the optimal hyperparameters of each model, computing 5-fold cross-validation results, and comparing how well the empirically selected features and the molecular descriptors calculated from SMILES strings suit each model.
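Step S104 mentions 5-fold cross-validation without giving details. The sketch below shows one common way to do it, assuming NumPy, a single shuffle with a fixed seed, and a generic fit/predict pair. The ordinary-least-squares model is a hypothetical stand-in for the ANN or SISSO models, and all function names are illustrative.

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices once and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    splits = []
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train_idx, folds[i]))  # fold i is held out as the test set
    return splits

def cross_val_rmse(X, y, fit, predict, k=5, seed=0):
    """Return the test RMSE of each fold for any fit/predict pair."""
    scores = []
    for train_idx, test_idx in kfold_indices(len(y), k, seed):
        model = fit(X[train_idx], y[train_idx])
        residual = predict(model, X[test_idx]) - y[test_idx]
        scores.append(float(np.sqrt(np.mean(residual ** 2))))
    return scores

# Hypothetical stand-in model: ordinary least squares with an intercept.
def ols_fit(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def ols_predict(coef, X):
    return coef[0] + X @ coef[1:]
```

With 240 compounds and k = 5, each fold holds out 48 compounds; the mean and spread of the five RMSE values summarize how stable a model is across splits.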
The data acquisition method provided by the embodiment of the invention comprises the following steps:
obtaining the crystal structure files and SMILES strings of the 240 nitro energetic compounds, and obtaining their mechanical property data by molecular dynamics (MD) simulation with the Forcite module of Materials Studio (MS); after structure optimization, the COMPASS force field is used in the NPT ensemble at 295 K with the Andersen thermostat and the Parrinello barostat, the pressure is set to 0.0001 GPa, van der Waals and electrostatic interactions are treated with the atom-based and Ewald summation methods respectively, the cutoff radius is 0.95 nm, and a tail correction is applied beyond the cutoff. The initial atomic velocities follow the Maxwell-Boltzmann distribution; the solution of Newton's equations of motion rests on basic assumptions such as periodic boundary conditions and the equivalence of time and ensemble averages; integration uses the Verlet algorithm with a time step of 1 fs, and the trajectory is saved every 10 fs. After the system reaches equilibrium, a 1 ns trajectory is used for mechanical property analysis to obtain the elastic constants C_ij (i, j = 1 to 6), from which the mechanical property parameters are calculated. The impact sensitivity values of the 240 nitro energetic compounds are calculated with the empirical formula for predicting the impact sensitivity of nitro energetic compounds proposed by W. P. Lai et al. in 2010.
The molecular descriptor calculation method provided by the embodiment of the invention comprises the following steps:
(1) Calculating molecular descriptors:
1666 molecular descriptors were calculated online for each molecule with E-Dragon 1.0, based on its SMILES string; the crystal densities of the nitro energetic compounds were obtained from the CSD (Cambridge Structural Database); and the numbers of C, H, O and N atoms and the molecular weight of each compound were extracted from its molecular formula C_aH_bO_cN_d;
where a, b, c and d are the numbers of C, H, O and N atoms in a molecule, M is the relative molecular mass (g/mol), and OB is the oxygen balance of an energetic molecule (g/g); the oxygen balance is calculated as follows:
(2) Reducing the number of descriptors with statistical methods to obtain the descriptors required for building the QSPR models.
In step (2), the statistical reduction of the descriptors to those required for building the QSPR models, provided by the embodiment of the invention, comprises:
1) Eliminating all descriptors that contain erroneous information, i.e. for which exact values cannot be calculated;
2) Removing descriptors for which more than 75% of the samples share the same value;
3) Omitting descriptors with a relative standard deviation (RSD) below 0.05;
4) Removing highly intercorrelated descriptors: when the Pearson correlation coefficient r between two descriptors exceeds 0.75, the one less correlated with the target value is removed;
5) Removing descriptors with a p-value (probability value) greater than 0.005 by multiple linear regression (MLR) forward stepwise regression.
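Steps 1) to 4) are simple column filters. A minimal NumPy sketch follows, assuming that undefined Dragon values are encoded as NaN, that RSD is the sample standard deviation divided by the absolute mean, and that step 4) is applied greedily: descriptors are visited from most to least correlated with the target, and each is kept only if its |r| with every already-kept descriptor is at most 0.75. The function name and the greedy ordering are assumptions, not the patent's stated procedure; step 5) is not covered here.

```python
import numpy as np

def prefilter_descriptors(X, y, names, same_frac=0.75, rsd_min=0.05, r_max=0.75):
    """Apply screening steps 1)-4) to a (samples x descriptors) matrix; return kept names."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    candidates = []
    for j in range(X.shape[1]):
        col = X[:, j]
        # 1) drop descriptors with undefined values (assumed encoded as NaN/inf)
        if not np.all(np.isfinite(col)):
            continue
        # 2) drop descriptors where more than 75% of samples share a single value
        _, counts = np.unique(col, return_counts=True)
        if counts.max() / len(col) > same_frac:
            continue
        # 3) drop descriptors whose relative standard deviation is below 0.05
        mean = col.mean()
        rsd = np.inf if mean == 0 else col.std(ddof=1) / abs(mean)
        if rsd < rsd_min:
            continue
        candidates.append(j)
    # 4) greedy pruning: visit descriptors from most to least correlated with the target
    #    and keep one only if |r| with every descriptor already kept is <= r_max
    target_r = {j: abs(np.corrcoef(X[:, j], y)[0, 1]) for j in candidates}
    kept = []
    for j in sorted(candidates, key=lambda k: target_r[k], reverse=True):
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= r_max for k in kept):
            kept.append(j)
    return [names[j] for j in sorted(kept)]
```

The greedy order is one way to honour the rule "remove the descriptor less correlated with the target" when several descriptors are mutually correlated; a strictly pairwise pass can give a different result when correlations chain across three or more descriptors.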
The descriptors and target properties used to construct the QSPR models provided by the embodiment of the invention are organized into the following 4 datasets:
Dataset-1: 14 molecular descriptors obtained with bulk modulus as the target property, covering 9 descriptor classes: constitutional descriptors, information indices, edge adjacency indices, BCUT descriptors, geometrical descriptors, 3D-MoRSE descriptors, GETAWAY descriptors, atom-centred fragments and molecular properties;
Dataset-2: 17 molecular descriptors obtained with impact sensitivity as the target property, belonging to 8 descriptor classes, including 2D autocorrelations, geometrical descriptors, RDF descriptors, 3D-MoRSE descriptors, WHIM descriptors, GETAWAY descriptors and molecular properties;
Dataset-3: impact sensitivity as the target, with 8 features: bulk modulus, crystal density, oxygen balance, molecular weight and the numbers of C, H, O and N atoms;
Dataset-4: the descriptors screened separately with impact sensitivity and with bulk modulus as target properties are merged and de-duplicated, giving 26 descriptors from 10 descriptor classes.
The method for constructing the QSPR models of the impact sensitivity and bulk modulus of nitro energetic compounds provided by the embodiment of the invention comprises the following steps:
Dataset-1, Dataset-2, Dataset-3 and Dataset-4 are each randomly divided into two subsets, with 80% of the data used as the training set and 20% as the test set; the 4 datasets are modeled with two methods, SISSO and ANN, yielding 7 QSPR models of the impact sensitivity and bulk modulus of nitro energetic compounds;
the 7 QSPR models are as follows:
Model-1: ANN model of bulk modulus with the corresponding 14 molecular descriptors (Dataset-1);
Model-2: ANN model of impact sensitivity with the corresponding 17 molecular descriptors (Dataset-2);
Model-3: ANN model of impact sensitivity with 8 features: bulk modulus, crystal density, oxygen balance, molecular weight and the numbers of C, H, O and N atoms (Dataset-3);
Model-4: ANN model of impact sensitivity and bulk modulus with the corresponding 26 molecular descriptors (Dataset-4);
Model-5: SISSO model of bulk modulus with the 14 molecular descriptors (Dataset-1);
Model-6: SISSO model of impact sensitivity with the 17 molecular descriptors (Dataset-2);
Model-7: SISSO model of impact sensitivity with 8 features: bulk modulus, crystal density, oxygen balance, molecular weight and the numbers of C, H, O and N atoms (Dataset-3).
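The random 80%/20% division applied to each dataset can be sketched as follows. The seed, the rounding rule and the function name are hypothetical; the text states only the proportions.

```python
import numpy as np

def split_80_20(n_samples, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) for one random train/test division."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)          # shuffle all sample indices once
    n_test = int(round(test_frac * n_samples))  # 20% of the samples go to the test set
    return order[n_test:], order[:n_test]

train_idx, test_idx = split_80_20(240, seed=42)
print(len(train_idx), len(test_idx))  # 192 48
```

For the 240 compounds this gives 192 training and 48 test samples. Fixing the seed makes the split reproducible, so the ANN and SISSO models built on the same dataset can be compared on identical test compounds.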
In step S104, the two feature types provided in the embodiment of the present invention are:
oxygen balance and related features selected according to chemical experience (Dataset-3);
molecular descriptors screened step by step with statistical methods according to the target property (Dataset-1, Dataset-2 and Dataset-4).
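In step S104, comparing the model performance of the ANN and SISSO methods combined with the two feature types, comparing how well each feature type suits each model, and obtaining the best QSPR model for each dataset comprises:
evaluating the training-set and test-set performance of the 7 QSPR models comprehensively with the root mean square error (RMSE), the Pearson correlation coefficient (R) and the coefficient of determination (R²);
the formulas are as follows:
$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i^{\mathrm{true}}-y_i^{\mathrm{pred}}\right)^2}$$
$$R=\frac{\sum_{i=1}^{N}\left(y_i^{\mathrm{true}}-\bar{y}^{\mathrm{true}}\right)\left(y_i^{\mathrm{pred}}-\bar{y}^{\mathrm{pred}}\right)}{\sqrt{\sum_{i=1}^{N}\left(y_i^{\mathrm{true}}-\bar{y}^{\mathrm{true}}\right)^2}\sqrt{\sum_{i=1}^{N}\left(y_i^{\mathrm{pred}}-\bar{y}^{\mathrm{pred}}\right)^2}}$$
$$R^2=1-\frac{\sum_{i=1}^{N}\left(y_i^{\mathrm{true}}-y_i^{\mathrm{pred}}\right)^2}{\sum_{i=1}^{N}\left(y_i^{\mathrm{true}}-\bar{y}^{\mathrm{true}}\right)^2}$$
where N is the number of compounds in each dataset, y_i^true is the true value, y_i^pred is the value predicted by the model, \bar{y}^true is the mean of the true values of the samples, and \bar{y}^pred is the mean of the predicted values.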
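The three metrics can be computed directly from their definitions. The sketch below assumes NumPy and takes R² as the coefficient of determination, 1 - SS_res/SS_tot, rather than the square of R, which is the usual reading when R and R² are reported separately; the function name is illustrative.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (RMSE, Pearson R, coefficient of determination R^2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    dt = y_true - y_true.mean()   # deviations of true values from their mean
    dp = y_pred - y_pred.mean()   # deviations of predictions from their mean
    r = float(np.sum(dt * dp) / np.sqrt(np.sum(dt ** 2) * np.sum(dp ** 2)))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum(dt ** 2))
    return rmse, r, r2

rmse, r, r2 = regression_metrics([1, 2, 3, 4], [1, 2, 3, 5])
```

Unlike R, R² penalizes systematic bias: a prediction that is perfectly correlated with the true values but shifted by a constant gives R = 1 and R² < 1, which is why reporting both is informative.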
The best QSPR model for each dataset provided by the embodiment of the invention is:
with the 14 descriptors as features and bulk modulus (a mechanical property) as the target property, the best model is the ANN model (Model-1); with the 17 descriptors as features and impact sensitivity as the target property, the best model is also an ANN model (Model-2).
The relation between the impact sensitivity and the mechanical properties of nitro energetic compounds is given by the following formula:
where h50 is the impact sensitivity, a, b, c and d are the numbers of C, H, O and N atoms in a molecule, M is the molecular weight, OB is the oxygen balance, and K is the bulk modulus.
The technical scheme of the invention is further described below with reference to specific embodiments.
Example 1:
1. Introduction
The invention uses the ANN and SISSO methods to study the relation between the impact sensitivity and mechanical properties of nitro energetic compounds, and the quantitative relation between these properties and the molecular structure. The study comprises the following four parts.
1) Establishing a QSPR model between the mechanical properties of energetic materials and their molecular structure. Nitro compounds, as high energy density materials (HEDMs) widely used in civilian and military applications, remain the most dominant and important class of explosives today. Almost all energetic materials contain nitro groups (X-NO2, X = C, N or O): the nitro group supplies nitrogen, which ensures that the energetic molecule decomposes into N2 and releases a large amount of energy, and it also supplies the oxygen that is indispensable for combustion or detonation. In previous work, besides many studies on the properties of various nitro energetic compounds (nitroaromatics, nitramines, aliphatic nitro compounds, nitrate esters, etc.), many researchers have studied nitroaromatics alone, which have received great attention as an important component of nitro energetic compounds. Therefore, 240 nitro compounds, mainly nitroaromatics, were collected from the Cambridge Structural Database; besides nitro groups, they also contain amino, hydroxyl, carboxyl, alkoxy, amide and other N- or O-containing groups. Using Dragon molecular descriptors as model features, models relating the mechanical properties of nitro energetic compounds to their molecular structures were constructed with the ANN and SISSO methods.
2) Establishing a QSPR model between the impact sensitivity of energetic materials and their molecular structure. As for the mechanical-property QSPR model, the descriptors of the 240 molecules are first calculated, the final features are then screened step by step according to the target property, and finally the ANN and SISSO methods are used to build models relating the descriptor features to impact sensitivity.
3) Establishing the correlation between the sensitivity and mechanical properties of energetic materials. With the impact sensitivity of the 240 nitro energetic compounds as output and the numbers of C, H, O and N atoms, molecular weight, crystal density, oxygen balance and bulk modulus as features, the invention establishes a formula relating sensitivity to mechanical properties, so that the relation between the two can be analyzed theoretically. The features other than bulk modulus were selected based on previous research experience. The numbers of C, H, O and N atoms and the molecular weight are the most basic properties of a substance and are often used as features to predict detonation performance, density, self-ignition temperature and the like. Crystal density is one of the most readily available properties of energetic materials and is also often used as a feature to predict related properties such as detonation velocity. As for oxygen balance, Kamlet and Adolph found in 1979 that, for energetic materials sharing the same decomposition mechanism, the logarithm of the impact sensitivity is linearly related to the oxygen balance. Note that the impact sensitivity and bulk modulus data used by the invention are calculated values, not experimental values. With the rapid development of artificial intelligence and computer hardware, impact sensitivity values can be obtained by quantitative calculation or empirical formulas; the predictions of various models are accurate and reliable, greatly reduce the time and cost of experimental research, and are widely used in the design of new energetic materials.
Secondly, because the impact initiation process is extremely complex, experimental impact sensitivity data vary greatly with test equipment, sample size, configuration and so on; measurements are generally poorly reproducible and widely scattered, so experimental data serve only as approximate indications of sensitivity, and integrating data measured under different conditions is difficult. The invention therefore uses calculated values, given the lack of experimental data. Prior art 18 studied CHON polynitroaromatic compounds and derived a correlation for impact sensitivity from the molecular weight of the explosive molecule and the number of atoms of each element; although its results agree well with experimental data, it does not account for the influence of different groups. The empirical formula adopted by the invention improves on this by introducing correction factors for group positions, which further improves prediction accuracy.
Although several methods can now calculate impact sensitivity values, each has limitations. Existing quantitative calculation methods are accurate but involved and time-consuming, and are suitable only for a few specific molecules; calculating a large number of samples consumes huge computational resources and time. The empirical-formula method adopted by the invention usually requires dividing nitro compounds into several classes by structural features (e.g. polynitroarenes, nitramines, aliphatic polynitro compounds and nitro heterocycles) and constructing a separate formula for each; in application, the correction terms are usually obtained from lookup tables after tallying each substance's structural information (number of groups, relative positions of groups, etc.), which is time-consuming and laborious at large scale. QSPR models, by contrast, can predict large numbers of samples simply and quickly; however, as mentioned above, their accuracy leaves room for improvement, which is the goal of the sensitivity models built in the invention. For mechanical properties, experimental evaluation of the mechanical behavior (elasticity, plasticity and fracture) of molecular crystals is often complicated by the difficulty of preparing samples. Therefore, single-component explosives such as HMX and energetic composites such as energetic co-crystals are now generally studied by molecular dynamics (MD) simulation, and the various mechanical properties are usually calculated from the elastic constants.
However, MD simulation also has drawbacks: it is computationally slow, cannot handle large numbers of samples in a short time, and cannot be applied to materials that have not yet been synthesized experimentally. It is therefore also necessary to establish structure-property relationships, both to reduce the time needed to predict unknown samples and to evaluate new energetic materials.
4) Comparing the model performance of the ANN and SISSO methods combined with the two feature types. One feature type is oxygen balance and the other features selected according to chemical experience (Dataset-3); the other is molecular descriptors screened step by step with statistical methods according to the target property (Dataset-1, Dataset-2 and Dataset-4), which requires no manual, experience-based screening. While building QSPR models with ANN and SISSO to predict sensitivity and mechanical properties, the invention also discusses and compares how well each feature type suits each model.
2. Methods
2.1 Database collection
In the invention, the crystal structure files and SMILES strings of 240 nitro energetic compounds were obtained from the Cambridge Structural Database (CSD), and their mechanical property data were obtained by molecular dynamics simulation with the Forcite module of Materials Studio (MS). After structure optimization, the COMPASS force field was used in the NPT ensemble at 295 K with the Andersen thermostat and the Parrinello barostat, the pressure was set to 0.0001 GPa, van der Waals (vdW) and electrostatic (Coulomb) interactions were treated with the atom-based and Ewald summation methods respectively, the cutoff radius was 0.95 nm, and a tail correction was applied beyond the cutoff. The initial atomic velocities followed the Maxwell-Boltzmann distribution; the solution of Newton's equations of motion rests on basic assumptions such as periodic boundary conditions and the equivalence of time and ensemble averages; integration used the Verlet algorithm with a time step of 1 fs, and the trajectory was saved every 10 fs. After the system reached equilibrium, a 1 ns trajectory was used for mechanical property analysis to obtain the matrix of elastic constants C_ij (i, j = 1 to 6), from which the mechanical property parameters were calculated. The tensile modulus (E), shear modulus (G) and bulk modulus (K), collectively referred to as engineering moduli, are commonly used to evaluate the stiffness or hardness of a material; the bulk modulus K studied in the invention is an important indicator of fracture strength.
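The text states that K is derived from the 6x6 elastic-constant matrix C_ij but does not say which averaging scheme is used. The sketch below assumes the standard Voigt, Reuss and Hill estimates of the bulk modulus, with C in Voigt notation and in GPa; the function name is illustrative, and which estimate the invention reports is not stated.

```python
import numpy as np

def bulk_moduli(C):
    """Voigt, Reuss and Hill bulk moduli from a 6x6 elastic-constant matrix (Voigt notation)."""
    C = np.asarray(C, dtype=float)
    if C.shape != (6, 6):
        raise ValueError("expected a 6x6 matrix of elastic constants C_ij")
    # Voigt (uniform strain) estimate, an upper bound
    k_voigt = ((C[0, 0] + C[1, 1] + C[2, 2])
               + 2.0 * (C[0, 1] + C[1, 2] + C[2, 0])) / 9.0
    # Reuss (uniform stress) estimate from the compliance matrix S = C^-1, a lower bound
    S = np.linalg.inv(C)
    k_reuss = 1.0 / ((S[0, 0] + S[1, 1] + S[2, 2])
                     + 2.0 * (S[0, 1] + S[1, 2] + S[2, 0]))
    # Hill estimate: arithmetic mean of the two bounds
    k_hill = 0.5 * (k_voigt + k_reuss)
    return k_voigt, k_reuss, k_hill

# Consistency check on an isotropic solid with Lame constants lam, mu (GPa),
# for which all three estimates equal lam + 2*mu/3.
lam, mu = 10.0, 5.0
C_iso = np.zeros((6, 6))
C_iso[:3, :3] = lam
C_iso[[0, 1, 2], [0, 1, 2]] = lam + 2 * mu
C_iso[[3, 4, 5], [3, 4, 5]] = mu
print(bulk_moduli(C_iso))
```

For the low-symmetry crystals typical of nitroaromatics the Voigt and Reuss values differ, and the Hill average is a common compromise.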
Because experimental impact sensitivity data are scarce and, owing to poor reproducibility and similar issues, serve only as approximate indications of sensitivity, the invention adopts the reliable empirical formula proposed by Lai et al. and calculates the sensitivity data from the basic structural information of the nitro compounds.
2.2 Molecular descriptor computation
An important step in constructing a QSPR model is to quantify the structural information of the molecules under investigation. Molecular descriptors are mathematical representations of molecules that convert the chemical information in a structure into useful numbers, each capturing a small part of the total chemical information contained in the real molecule. The invention uses E-Dragon 1.0 to calculate 1666 molecular descriptors online from SMILES strings. Table 1 shows the 20 classes of descriptors.
The crystal densities of the nitro energetic compounds are obtained from the CSD, and the oxygen balance is calculated with equation (1). In addition, the numbers of C, H, O and N atoms and the molecular weight of each compound are extracted from its molecular formula.
The molecular formula of the CHON explosives studied in the invention can be written as C_aH_bO_cN_d, where a, b, c and d are the numbers of C, H, O and N atoms in a molecule, M is the relative molecular mass (g/mol), and OB is the oxygen balance of an energetic molecule (g/g).
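Equation (1) itself is not reproduced in this text. Assuming the conventional oxygen-balance definition for C_aH_bO_cN_d explosives, OB = 16(c - 2a - b/2)/M in g/g (the oxygen surplus or deficit relative to complete oxidation to CO2 and H2O), a minimal sketch of extracting a, b, c, d, M and OB from a molecular formula is given below. The atomic masses, the regular-expression parser and the function names are assumptions for illustration; the parser handles only simple CHON formulas such as "C7H5N3O6".

```python
import re

# Standard atomic masses (g/mol) for the four elements of CHON explosives.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007}

def chon_counts(formula):
    """Parse a simple CHON formula such as 'C7H5N3O6' into (a, b, c, d) = (nC, nH, nO, nN)."""
    counts = {"C": 0, "H": 0, "O": 0, "N": 0}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in counts:
            raise ValueError(f"unsupported element {element!r} in {formula!r}")
        counts[element] += int(number) if number else 1
    return counts["C"], counts["H"], counts["O"], counts["N"]

def molecular_weight(a, b, c, d):
    """Relative molecular mass M (g/mol) of C_aH_bO_cN_d."""
    return (a * ATOMIC_MASS["C"] + b * ATOMIC_MASS["H"]
            + c * ATOMIC_MASS["O"] + d * ATOMIC_MASS["N"])

def oxygen_balance(a, b, c, d):
    """Assumed conventional oxygen balance in g/g: M_O * (c - 2a - b/2) / M."""
    return ATOMIC_MASS["O"] * (c - 2 * a - b / 2) / molecular_weight(a, b, c, d)

a, b, c, d = chon_counts("C7H5N3O6")  # TNT
print(round(molecular_weight(a, b, c, d), 2), round(oxygen_balance(a, b, c, d), 3))
# expected: 227.13 -0.74
```

The TNT result, about -0.74 g/g (i.e. roughly -74%), matches the commonly quoted oxygen balance of TNT, which is a quick sanity check on the assumed definition.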
2.3 Reduction of Dragon descriptors
The invention uses statistical methods to reduce the number of descriptors and obtain those required for constructing the QSPR models, in the following 5 steps:
(1) Eliminating all descriptors that contain erroneous information (i.e. for which exact values cannot be calculated);
(2) Removing descriptors for which more than 75% of the samples share the same value;
(3) Omitting descriptors with a relative standard deviation (RSD) below 0.05;
(4) Removing highly intercorrelated descriptors: when the Pearson correlation coefficient r between two descriptors exceeds 0.75, the one less correlated with the target value is removed.
(5) Finally, removing descriptors with a p-value (probability value) greater than 0.005 by MLR forward stepwise regression. Table 1 shows the screening results of each step with bulk modulus and impact sensitivity, respectively, as the target property.
TABLE 1 Statistics of the molecular-descriptor reduction process with bulk modulus and impact sensitivity, respectively, as target properties
14 molecular descriptors were obtained targeting bulk modulus as the property, including composition descriptor, information index, edge adjacency index, BCUT descriptor, geometry descriptor, 3D-MoRSE descriptor, GETAWAY descriptor, atomic center fragment, molecular property, 9 classes of descriptors as shown in table 2. 17 molecular descriptors were obtained targeting impact sensitivity, including 2D autocorrelation descriptors, geometric descriptors, RDF descriptors, 3D-MoRSE descriptors, WHIM descriptors, GETAWAY descriptors, and 8 types of molecular descriptors as shown in table 3. The descriptors screened out twice respectively were combined and de-duplicated using the impact sensitivity and bulk modulus together as target properties to obtain 26 kinds of descriptors including 10 kinds of descriptors as shown in table 4.
In total, four datasets were thus obtained:
Dataset-1: bulk modulus K and the 14 molecular descriptors screened for it;
Dataset-2: impact sensitivity $h_{50}$ and the 17 molecular descriptors screened for it;
Dataset-3: impact sensitivity and 8 features: bulk modulus, crystal density, oxygen balance, molecular weight, and the numbers of C, H, O and N atoms;
Dataset-4: bulk modulus and impact sensitivity, and the 26 corresponding molecular descriptors.
Table 2 Descriptor features obtained with K as the target property
Table 3 Descriptor features obtained with $h_{50}$ as the target property
Table 4 Descriptor features obtained with K and $h_{50}$ together as target properties
2.4 Construction of ANN and SISSO models
The sure independence screening and sparsifying operator (SISSO) is a data-analysis method based on compressed sensing. It aims to identify low-dimensional descriptors that capture the underlying physical mechanisms governing a target property, and it has been successfully applied to many materials-science problems. SISSO can identify the best descriptor from combinations of a large number of primary features (physical properties), reduce the dimension of a huge feature space, and discard features irrelevant to the problem, so that the feature space is further refined and an explicit analytic function of the basic physical properties is finally obtained. The method is also suitable for small datasets. For regression problems, SISSO can potentially convert a complex nonlinear problem into a linear one. For classification problems, if two combined features are sufficient to separate the classes, an intuitive two-dimensional materials map can be drawn directly, which helps to explore the underlying principles or mechanisms.
SISSO consists of two main steps. 1) Construction of the feature space (candidate descriptors): algebraic/functional operators (e.g., addition, multiplication, exponentiation, roots) are applied iteratively to the primary features; in each iteration, every feature (or pair of features) is combined with every unary (or binary) operator. Addition and subtraction can be restricted to features of the same class, and the complexity of the combined features can also be limited; in this way an arbitrarily large feature space can be constructed. 2) Descriptor identification: one or more of the best combined features are selected from the constructed feature space. In the first step, the size of the feature space depends on the number of operators, the number of primary features and the number of iterations. In the second step, the computational cost depends on the dimension of the final descriptor, the size of the screened feature subspace and the model type. During modeling, hyperparameters such as the number of feature-construction iterations q, the final descriptor dimension Ω and the feature-subspace size SIS must be optimized with regard to model performance, time cost and other factors. In this invention, the models are optimized mainly with respect to q and Ω.
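To make the two SISSO steps concrete, the following NumPy sketch builds a small feature space and then alternates sure independence screening (SIS) against the current residual with an exhaustive least-squares search over descriptor subsets (the sparsifying operator, SO). It is a toy illustration under our own assumptions (operator set {+, −, ×, ÷, x²}, no unit checking, only one division direction per pair, the top `sis` features added per dimension); it is not the reference SISSO implementation, and the function names are hypothetical.

```python
import itertools

import numpy as np


def build_feature_space(X, names, q=1):
    """Grow the feature space q times with x^2 (unary) and +, -, *, / (binary)."""
    feats = {n: X[:, i].astype(float) for i, n in enumerate(names)}
    for _ in range(q):
        items = list(feats.items())
        new = {}
        for n, v in items:
            new[f"({n})^2"] = v ** 2
        for (n1, v1), (n2, v2) in itertools.combinations(items, 2):
            new[f"({n1}+{n2})"] = v1 + v2
            new[f"({n1}-{n2})"] = v1 - v2
            new[f"({n1}*{n2})"] = v1 * v2
            if np.all(v2 != 0):
                new[f"({n1}/{n2})"] = v1 / v2
        feats.update(new)
    # discard non-finite or constant features
    return {n: v for n, v in feats.items() if np.all(np.isfinite(v)) and v.std() > 0}


def _fit(F, y):
    """Least-squares fit with intercept; returns (rmse, coefficients, residual)."""
    A = np.column_stack([F, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(np.sqrt(np.mean(resid ** 2))), coef, resid


def sisso(feats, y, dim=2, sis=5):
    """SIS on the residual plus exhaustive SO; returns (descriptor names, coef, rmse)."""
    names = list(feats)
    F = np.column_stack([feats[n] for n in names])
    # standardized columns, so |Fz.T @ r| ranks features by |correlation| with r
    Fz = (F - F.mean(axis=0)) / F.std(axis=0)
    y = np.asarray(y, dtype=float)
    resid = y - y.mean()
    subspace, best = [], None
    for d in range(1, dim + 1):
        # SIS: add the sis features most correlated with the current residual
        score = np.abs(Fz.T @ (resid - resid.mean()))
        fresh = [int(i) for i in np.argsort(-score) if int(i) not in subspace]
        subspace += fresh[:sis]
        # SO: best d-feature linear model over all subsets of the subspace
        best = None
        for combo in itertools.combinations(subspace, d):
            rmse, coef, res = _fit(F[:, list(combo)], y)
            if best is None or rmse < best[0]:
                best = (rmse, combo, coef, res)
        resid = best[3]
    rmse, combo, coef, _ = best
    return [names[i] for i in combo], coef, rmse
```

On synthetic data with y = x0·x1 + 0.5·x2, one construction iteration (q = 1) and Ω = 2 recover the descriptors `(x0*x1)` and `x2` with a near-zero RMSE, illustrating how a nonlinear target becomes linear in the constructed features.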
Artificial neural networks (ANNs), information-processing systems inspired by the structure and intelligent behavior of the human brain, have attracted wide attention in recent years. They offer parallel distributed processing and storage, high fault tolerance, and self-organizing and self-adaptive capabilities. ANN is one of the most commonly used methods for QSPR modeling; it is often used to explore and summarize patterns in data and to build quantitative models from "cause" to "effect", and it has made great progress in biology, medicine, economics and other fields. The present invention uses a multilayer perceptron (MLP) ANN. The following parameter ranges were searched, and the optimal parameters were selected after optimization: training algorithm: 'lbfgs', 'sgd', 'adam'; activation function: 'identity', 'logistic', 'tanh', 'relu'; number of hidden-layer neurons: 2 to 20.
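The MLP search space above maps directly onto scikit-learn's `MLPRegressor`, whose `solver` and `activation` options use exactly these names. The sketch below assumes a single hidden layer of 2 to 20 neurons, standardized inputs, and RMSE-based grid search with cross-validation; the patent does not state which library, input scaling or selection criterion was used, and `tune_mlp` is a hypothetical helper.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Search space from the text; a single hidden layer is assumed.
PARAM_GRID = {
    "mlpregressor__solver": ["lbfgs", "sgd", "adam"],
    "mlpregressor__activation": ["identity", "logistic", "tanh", "relu"],
    "mlpregressor__hidden_layer_sizes": [(n,) for n in range(2, 21)],
}


def tune_mlp(X, y, param_grid=PARAM_GRID, cv=5, max_iter=2000, seed=0):
    """Grid-search an MLP regressor; return (best parameters, refitted pipeline)."""
    pipe = make_pipeline(
        StandardScaler(),  # assumed: descriptors are standardized before the MLP
        MLPRegressor(max_iter=max_iter, random_state=seed),
    )
    search = GridSearchCV(pipe, param_grid, cv=cv,
                          scoring="neg_root_mean_squared_error")
    search.fit(X, y)
    return search.best_params_, search.best_estimator_
```

The full grid has 3 × 4 × 19 = 228 configurations, so each cross-validated search trains several hundred networks; a reduced grid is convenient for quick checks.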
The invention models the four datasets with two methods, SISSO and ANN. Each dataset is randomly divided into two subsets, with 80% of the data used as the training set and 20% as the test set. The overall workflow is shown in Fig. 2.
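A reproducible 80/20 random split can be sketched as below. The seed and function name are illustrative assumptions, since the patent does not report how the random partition was generated.

```python
import numpy as np


def random_split(n_samples, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) with test_frac of the samples held out."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(round(test_frac * n_samples))
    return idx[n_test:], idx[:n_test]
```

For the 240 compounds mentioned in the claims, this yields 192 training and 48 test samples; the index arrays can then be used to slice the feature matrix and target vector of any of the four datasets.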
3. Results
3.1 Optimal QSPR model for each dataset
After data curation, feature selection and model construction, only the best model in each case was retained for performance testing. Seven models were built on the four datasets:
Model-1: an ANN model of bulk modulus with the 14 molecular descriptors (Dataset-1),
Model-2: an ANN model of impact sensitivity with the 17 molecular descriptors (Dataset-2),
Model-3: an ANN model of impact sensitivity with 8 features: bulk modulus, crystal density, oxygen balance, molecular weight, and the numbers of C, H, O and N atoms (Dataset-3),
Model-4: an ANN model of impact sensitivity and bulk modulus with the 26 molecular descriptors (Dataset-4),
Model-5: a SISSO model of bulk modulus with the 14 molecular descriptors (Dataset-1),
Model-6: a SISSO model of impact sensitivity with the 17 molecular descriptors (Dataset-2),
Model-7: a SISSO model of impact sensitivity with 8 features: bulk modulus, crystal density, oxygen balance, molecular weight, and the numbers of C, H, O and N atoms (Dataset-3).
The training-set and test-set performance of each QSPR model is evaluated comprehensively with the root mean square error (RMSE), the Pearson correlation coefficient (R) and the coefficient of determination ($R^2$), defined by formulas (2) to (4), respectively. The model parameters and evaluation statistics of the seven models are given in Tables 5 and 6.
where N is the number of compounds in each dataset, $y_i^{\mathrm{true}}$ is the true value, $y_i^{\mathrm{pred}}$ is the value predicted by the model, $\bar{y}^{\mathrm{true}}$ is the mean of the true values of the samples, and $\bar{y}^{\mathrm{pred}}$ is the mean of the predicted values of the samples.
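Formulas (2) to (4) are not reproduced in the text. Assuming the usual definitions, RMSE $= \sqrt{\frac{1}{N}\sum_i (y_i^{\mathrm{true}} - y_i^{\mathrm{pred}})^2}$, R as the Pearson correlation between true and predicted values, and $R^2 = 1 - \sum_i (y_i^{\mathrm{true}} - y_i^{\mathrm{pred}})^2 / \sum_i (y_i^{\mathrm{true}} - \bar{y}^{\mathrm{true}})^2$, the three statistics can be computed as in the sketch below. Some QSPR studies instead report $R^2$ as the square of R; which convention the patent follows is not stated.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def pearson_r(y_true, y_pred):
    """Pearson correlation between true and predicted values (uses both means)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    dt, dp = y_true - y_true.mean(), y_pred - y_pred.mean()
    return float(np.sum(dt * dp) / np.sqrt(np.sum(dt ** 2) * np.sum(dp ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

For example, with true values [1, 2, 3, 4] and predictions [1, 2, 3, 5], RMSE is 0.5, $R^2$ is 0.8, and R is about 0.983.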
Table 5 Parameters of the best QSPR model for each dataset
Table 6 Statistical parameters of the best QSPR model for each dataset
Using 14 molecular descriptors (9 classes in total) as features, a model relating the bulk modulus (a mechanical property) of nitro energetic compounds to their molecular structure was built with ANN (Dataset-1, Model-1). The training-set and test-set $R^2$ values are 0.92 and 0.81, respectively, indicating good predictive performance. A SISSO model was also built on this dataset (Dataset-1, Model-5), but the result was less satisfactory: the training-set and test-set $R^2$ values are only 0.71 and 0.63, respectively, indicating underfitting.
Using 17 molecular descriptors (8 classes in total) as features, a model relating the impact sensitivity of nitro energetic compounds to their molecular structure was built with ANN (Dataset-2, Model-2). The training-set and test-set $R^2$ values are 0.93 and 0.91, respectively, indicating good predictive performance. A SISSO model was also built on this dataset (Dataset-2, Model-6), with training-set and test-set $R^2$ values of 0.91 and 0.85, respectively; its performance is still good, although its test-set result is slightly worse than that of Model-2. The results are compared with previous studies in Table 7: overall, the dataset used in this invention is larger, and its model achieves the highest prediction accuracy.
Table 7 Performance comparison of different QSPR models for predicting $h_{50}$
Although impact sensitivity is influenced by many factors, this study shows that the impact sensitivity of nitro energetic compounds can be correlated with their mechanical properties (Dataset-3, Model-7). The training-set and test-set $R^2$ values of this model are 0.90 and 0.91, respectively. Analysis of the relationship between the two yields an analytical formula of quasi-linear form, given as formula (5). In this formula, the impact sensitivity is a function only of the numbers of C, H, O and N atoms, the oxygen balance, the molecular weight and the bulk modulus, so it can be computed directly from these quantities. The formula shows that, for isomers, the smaller K is, the larger $h_{50}$ is, and $h_{50}$ is strongly affected by fluctuations in K. This indirectly explains why additives such as binders are added to PBX explosives to modify their mechanical properties: if the mechanical properties (e.g., the bulk modulus) of the main explosive are high, its sensitivity is high and its safety is low. The ANN model built on the same dataset (Dataset-3, Model-3) performs comparably to Model-7, with training-set and test-set $R^2$ values of 0.91 and 0.89, respectively.
where $h_{50}$ is the impact sensitivity, a, b, c and d are the numbers of C, H, O and N atoms in a molecule, respectively, M is the molecular weight, OB is the oxygen balance, and K is the bulk modulus.
In addition, a simple comparison of Model-1, Model-2 and Model-4 shows that the ANN model with both sensitivity and mechanical properties as outputs (Model-4) performs worse than Model-1 and Model-2, each of which predicts a single property, because during tuning the model must balance the prediction accuracy of both target properties. Although a combined-feature model that predicts several properties at once is useful to some extent, separate single-output models are preferable when accuracy is the priority.
3.2 Comparison of ANN and SISSO models
To compare the sensitivity and mechanical-property models of the two methods objectively, the parameters of SISSO and ANN were optimized separately, and the two methods were then compared using 5-fold cross-validation with identical sample partitions for each dataset. ANN was again optimized over the following parameter ranges: training algorithm: 'lbfgs', 'sgd', 'adam'; activation function: 'identity', 'logistic', 'tanh', 'relu'; number of hidden-layer neurons: 2 to 20. For the SISSO model, two parameters were optimized: the number of iterations q and the final descriptor dimension Ω. For Dataset-1, Dataset-2 and Dataset-3, Figs. 3 to 5 compare the root mean square error (RMSE), Pearson correlation coefficient (R) and coefficient of determination ($R^2$) of SISSO models with different numbers of iterations and final dimensions. In Fig. 3, training-set performance improves slightly as the final dimension and the number of iterations increase, but the test-set results show little difference between 0 and 1 iterations and overfitting at 2 iterations. Fig. 4 shows a clear trend: model accuracy increases with the number of iterations at a fixed dimension, and with the dimension. As Fig. 5A shows, training-set performance changes little as the final dimension and the number of iterations increase, whereas test-set performance improves gradually and its gap to the training set narrows. Taking into account model performance, the complexity of the SISSO analytic formula and the model-construction time, the cross-validation model parameters were selected for the three datasets. The final parameters selected for SISSO and ANN are shown in Tables 8 to 10.
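The essential point of this comparison is that SISSO and ANN are scored on the same five folds. A minimal NumPy sketch of that protocol follows; the two `fit_predict` callables (ordinary least squares and a mean predictor) are hypothetical stand-ins for the SISSO and ANN models, and averaging the per-fold RMSE is an assumption about how fold results are summarized.

```python
import numpy as np


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle once and split into k folds; reuse the same folds for every model."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(idx, k)


def cv_rmse(fit_predict, X, y, folds):
    """Mean test RMSE over folds; fit_predict(X_train, y_train, X_test) -> predictions."""
    scores = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = fit_predict(X[train], y[train], X[test])
        scores.append(np.sqrt(np.mean((y[test] - pred) ** 2)))
    return float(np.mean(scores))


def ols_fit_predict(X_train, y_train, X_test):
    """Stand-in model 1: linear least squares with an intercept."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ coef


def mean_fit_predict(X_train, y_train, X_test):
    """Stand-in model 2: always predict the training mean."""
    return np.full(len(X_test), y_train.mean())
```

Because `kfold_indices` is called once and its result is passed to every model, differences in cross-validated RMSE reflect the models rather than different random partitions.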
Table 8 Model structures and statistical parameters of the 5-fold cross-validation results for Dataset-1
For SISSO, the parameter a is q, b is SIS, c is Ω. For ANN, the parameter a is the number of hidden layer neurons, b is the training algorithm, and c is the activation function.
Table 9 Model structures and statistical parameters of the 5-fold cross-validation results for Dataset-2
For SISSO, the parameter a is q, b is SIS, c is Ω. For ANN, the parameter a is the number of hidden layer neurons, b is the training algorithm, and c is the activation function.
Table 10 Model structures and statistical parameters of the 5-fold cross-validation results for Dataset-3
For SISSO, the parameter a is q, b is SIS, c is Ω. For ANN, the parameter a is the number of hidden layer neurons, b is the training algorithm, and c is the activation function.
Comparing the performance of the SISSO and ANN models on Dataset-1, Dataset-2 and Dataset-3 (Tables 8 to 10) shows that, for the mechanical-property and sensitivity models featuring molecular descriptors (Dataset-1 and Dataset-2), ANN outperforms SISSO, with a particularly pronounced gap on Dataset-1. For the sensitivity model featuring mechanical properties and related quantities (Dataset-3), SISSO outperforms ANN. This may be due to differences in the amount and transparency of the information contained in the two kinds of features. Deep learning models trained on big data can extract useful information from abstract, generic representations such as SMILES strings, whereas for small data higher accuracy is often obtained with more explicit features selected manually on the basis of chemical intuition and domain expertise. The physical meaning of descriptors calculated from SMILES is less clear than that of parameters such as element composition, crystal density, mechanical properties, oxygen balance and molecular weight, and the useful information in parameters such as oxygen balance is more transparent than that in the descriptors. ANN has a strong capacity for nonlinear modeling and is good at mining the inherent relationships in data, so it suits complex nonlinear problems whose features, such as descriptors, have a less well-defined physical meaning. SISSO is relatively simple in principle and places high demands on the screening of primary features, so it suits model construction with more definite feature information. Notably, SISSO can potentially convert nonlinear problems into linear ones. When the feature space is huge or highly correlated, SISSO is not subject to the limitations of the conventional LASSO method and retains the effectiveness of compressed sensing while handling the huge space.
Its main limitation is that handling large feature spaces requires considerable computational resources. In addition, because the equations are approximate and correlations between features are unavoidable (one or more features may be accurately described by a nonlinear function of a subset of the remaining features), the equations found by SISSO are not necessarily unique, and the components of the descriptor may change as the final descriptor dimension changes.
Using molecular descriptors calculated with E-Dragon together with several common and easily obtained items of molecular-structure information, the invention establishes seven QSPR models (Model-1 to Model-7) for the impact sensitivity and bulk modulus (mechanical property) of nitro energetic compounds, which can help shorten the experimental research cycle of energetic materials and support the design and comprehensive evaluation of novel energetic compounds. With 14 descriptors as features, the best model for bulk modulus (mechanical property) is an ANN model (Dataset-1, Model-1), with $R^2$ of 0.92 on the training set and 0.81 on the test set. With 17 descriptors as features, the best model for impact sensitivity is also an ANN model (Dataset-2, Model-2), with $R^2$ of 0.93 on the training set and 0.91 on the test set. Comparison with similar models in the literature shows that the ANN sensitivity model of the present invention has an advantage in predicting impact sensitivity, since it achieves the highest accuracy.
In addition, the invention shows that a relationship exists between impact sensitivity and mechanical properties, and expresses this relationship for nitro energetic compounds as a mathematical formula. Analysis of the formula shows that, for the same molecular formula, a smaller bulk modulus corresponds to lower sensitivity, and that once the bulk modulus is sufficiently small, even a slight further decrease in its value can greatly reduce the sensitivity, which underlines the necessity of improving the mechanical properties of energetic materials. Comparing the input features and performance of the models leads to the conclusion that the more powerful ANN model is good at mining internal relationships in data and can extract and learn the needed information from less informative, ambiguous features (such as molecular descriptors), whereas the SISSO method, which is relatively simple in principle, is better suited to features with a definite physical meaning.
These results demonstrate that the method adopted by the present invention is useful and effective.
It should be noted that the embodiments of the present invention can be realized in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or special purpose design hardware. Those of ordinary skill in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such as provided on a carrier medium such as a magnetic disk, CD or DVD-ROM, a programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The device of the present invention and its modules may be implemented by hardware circuitry, such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, etc., or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., as well as software executed by various types of processors, or by a combination of the above hardware circuitry and software, such as firmware.
The foregoing describes only specific embodiments of the present invention, and the scope of protection of the invention is not limited thereto; any modifications, equivalent substitutions, improvements and alternatives made by those skilled in the art within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (8)

1. A machine learning estimation method for the sensitivity and mechanical properties of energetic substances and their relationship, characterized in that the method takes molecular descriptors calculated by E-Dragon and several items of basic molecular-structure information as features; constructs seven quantitative structure-activity relationship models for the impact sensitivity and bulk modulus of nitro energetic substances based on an artificial neural network and the sure independence screening and sparsifying operator (SISSO) method; and uses the constructed models to determine the relationship between the impact sensitivity and the mechanical properties of nitro energetic substances, and the quantitative relationships between each of these properties and the molecular structure;
the molecular descriptor calculation method comprises the following steps:
(1) Calculating a molecular descriptor:
1666 molecular descriptors per molecule were computed online from SMILES strings using E-Dragon 1.0; the crystal densities of the nitro energetic compounds were obtained from the Cambridge Structural Database (CSD), and the numbers of C, H, O and N atoms and the molecular weight of each substance were extracted from the molecular formula $\mathrm{C}_a\mathrm{H}_b\mathrm{O}_c\mathrm{N}_d$ of the energetic material;
wherein a, b, c and d represent the numbers of C, H, O and N atoms in a molecule, respectively, M is the relative molecular mass in g/mol, and OB is the oxygen balance of the energetic molecule in g/g; the oxygen balance is calculated as follows:
(2) Reducing the number of descriptors using a statistical method to obtain the required descriptors for constructing the QSPR model;
the feature descriptors and the target properties used for constructing the QSPR model after screening are arranged into the following 4 data sets;
dataset-1: 14 molecular descriptors obtained with bulk modulus as the target property, belonging to 9 classes: structure descriptors, information indices, edge adjacency indices, BCUT descriptors, geometrical descriptors, 3D-MoRSE descriptors, GETAWAY descriptors, atom-centred fragments and molecular properties;
dataset-2: 17 molecular descriptors obtained with impact sensitivity as the target property, belonging to 8 classes including 2D autocorrelation descriptors, geometrical descriptors, RDF descriptors, 3D-MoRSE descriptors, WHIM descriptors, GETAWAY descriptors and molecular properties;
Dataset-3: impact sensitivity and 8 features: bulk modulus, crystal density, oxygen balance, molecular weight, and the numbers of C, H, O and N atoms;
dataset-4: with impact sensitivity and bulk modulus together as target properties, the descriptors from the two separate screenings are merged and de-duplicated to obtain 26 descriptors belonging to 10 classes;
the method for constructing the quantitative structure-activity relation model of the impact sensitivity and the bulk modulus of the nitro energetic comprises the following steps: using Dataset-1, dataset-2, dataset-3 and Dataset-4 as data sets, randomly dividing the data sets into two subsets, using 80% of data as training sets and using 20% of data as test sets; modeling 4 data sets of Dataset-1, dataset-2, dataset-3 and Dataset-4 by using two methods of SISSO and ANN to obtain quantitative structure-activity relation models of impact sensitivity and bulk modulus of 7 nitro energetic substances;
the quantitative structure-activity relation model of the impact sensitivity and the bulk modulus of the nitro energetic is respectively as follows:
model-1: an ANN model of bulk modulus and corresponding 14 molecular descriptors, dataset-1;
model-2: an ANN model of the impact sensitivity and 17 corresponding molecular descriptors, dataset-2;
model-3: an ANN model with 8 characteristics of impact sensitivity, bulk modulus, crystal density, oxygen balance, molecular weight and CHON atomic number, dataset-3;
Model-4: an ANN model of impact sensitivity and bulk modulus and corresponding 26 molecular descriptors, dataset-4;
model-5: SISSO model of bulk modulus and 14 molecular descriptors, dataset-1;
model-6: a SISSO model of impact sensitivity and 17 molecular descriptors, dataset-2;
model-7: a SISSO model of impact sensitivity with 8 features (bulk modulus, crystal density, oxygen balance, molecular weight, and the numbers of C, H, O and N atoms), Dataset-3.
2. The machine learning estimation method of energetic material sensitivity and mechanical properties and their relationships according to claim 1, wherein the machine learning estimation method of energetic material sensitivity and mechanical properties and their relationships comprises:
step one, collecting and processing data, obtaining impact sensitivity and bulk modulus values of 240 nitroaromatic-based nitro compounds, collecting molecular weight and crystal density characteristics, and calculating Dragon molecular descriptors of 240 substances by taking SMILES character strings as inputs;
step two, gradually screening out corresponding final characteristics by taking the impact sensitivity and the bulk modulus as target properties, and establishing QSPR relation models between the impact sensitivity and the bulk modulus of the energetic material and the molecular structure of the energetic material by adopting an ANN method and an SISSO method;
Step three, taking the impact sensitivity of 240 nitro energetic compounds as output, taking the atomic number, the molecular weight, the crystal density, the oxygen balance and the bulk modulus of CHON elements as characteristics, establishing the correlation between the sensitivity of the energetic material and the mechanical property, and analyzing the relation between the impact sensitivity of the nitro energetic material and the mechanical property thereof;
Step four, comparing the performance of the ANN and SISSO methods combined with the two kinds of features, searching for the optimal parameters of each model, calculating the 5-fold cross-validation results of each model, and comparing the suitability of the models for the two kinds of features, namely features selected from experience and molecular descriptors calculated from SMILES strings; the two kinds of features are:
oxygen balance and related characteristics selected according to chemical experience in Dataset-3;
molecular descriptors which are screened out from Dataset-1, dataset-2 and Dataset-4 step by adopting a statistical method according to the target property.
3. The machine learning estimation method of energetic material sensitivity and mechanical properties and their relationship according to claim 2, wherein the data acquisition method comprises: acquiring the crystal structure files and SMILES strings of 240 nitro energetic compounds, and obtaining their mechanical property data by molecular dynamics simulation with the Forcite module of the Materials Studio (MS) software; after structure optimization, the COMPASS force field is used under the NPT ensemble at 295 K, with Andersen temperature control and Parrinello pressure control at a pressure of 0.0001 GPa; van der Waals and electrostatic interactions are treated with the atom-based and Ewald summation methods, respectively, with a cutoff radius of 0.95 nm and a tail correction for truncation; initial atomic velocities are assigned from the Maxwell-Boltzmann distribution; Newton's equations of motion are solved under periodic boundary conditions on the basic assumption that time averages are equivalent to ensemble averages, using the Verlet integrator with a time step of 1 fs and saving the trajectory every 10 fs; after the system reaches equilibrium, a 1 ns simulation trajectory is used for mechanical property analysis to obtain the elastic constant matrix $C_{ij}$ (i, j = 1 to 6), from which the mechanical property parameters are calculated.
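The claim stops at the elastic constant matrix $C_{ij}$ and does not state how the bulk modulus K is derived from it. A common choice, assumed here, is the Voigt bound $K_V = [(C_{11}+C_{22}+C_{33}) + 2(C_{12}+C_{23}+C_{31})]/9$, the Reuss bound $K_R = 1/[(S_{11}+S_{22}+S_{33}) + 2(S_{12}+S_{23}+S_{31})]$ with compliance $S = C^{-1}$, and their Hill average. The sketch uses 0-based indices in Voigt order (xx, yy, zz, yz, xz, xy); the function name is hypothetical.

```python
import numpy as np


def bulk_moduli(C):
    """Return (K_Voigt, K_Reuss, K_Hill) from a 6x6 stiffness matrix in Voigt notation.

    Units follow the input (e.g. GPa).
    """
    C = np.asarray(C, dtype=float)
    # Voigt bound: uniform strain average of the stiffness tensor
    k_voigt = (C[0, 0] + C[1, 1] + C[2, 2]
               + 2.0 * (C[0, 1] + C[1, 2] + C[2, 0])) / 9.0
    # Reuss bound: uniform stress average, using the compliance matrix
    S = np.linalg.inv(C)
    k_reuss = 1.0 / (S[0, 0] + S[1, 1] + S[2, 2]
                     + 2.0 * (S[0, 1] + S[1, 2] + S[2, 0]))
    return k_voigt, k_reuss, 0.5 * (k_voigt + k_reuss)
```

For an isotropic solid with Lamé constants λ = 10 and μ = 5 (C11 = 20, C12 = 10, C44 = 5, arbitrary units), both bounds coincide at λ + 2μ/3 ≈ 13.33; for anisotropic molecular crystals they differ, and which average the patent used is not stated.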
4. The machine learning estimation method of energetic substance sensitivity and mechanical properties and their relationships according to claim 1, wherein in step (2), the use of statistical methods to reduce the number of descriptors, obtaining the required descriptors for constructing the QSPR model comprises the steps of:
1) Eliminating all descriptors containing erroneous values, i.e., descriptors whose exact values cannot be calculated;
2) Removing descriptors for which more than 75% of the samples have the same value;
3) Omitting descriptors with a relative standard deviation RSD of less than 0.05;
4) Deleting highly intercorrelated descriptors: when the Pearson correlation coefficient r between two descriptors is greater than 0.75, removing the one less correlated with the target value;
5) Removing descriptors with a p-value (probability value) greater than 0.005 by MLR forward stepwise regression.
5. The method for machine learning estimation of energetic material sensitivity and mechanical properties and their relationships as defined in claim 2,
in step four, the performance of the ANN and SISSO methods combined with the two kinds of features is compared, together with the suitability of each model for the two kinds of features, which are: oxygen balance and related features selected from chemical experience in Dataset-3; and molecular descriptors screened step by step by a statistical method according to the target property in Dataset-1, Dataset-2 and Dataset-4; determining the best QSPR model for each dataset comprises:
the root mean square error RMSE, the Pearson correlation coefficient R and the coefficient of determination $R^2$ are used to comprehensively evaluate the training-set and test-set performance of the 7 QSPR models;
the formula is as follows:
wherein N is the number of compounds in each dataset, $y_i^{\mathrm{true}}$ is the true value, $y_i^{\mathrm{pred}}$ is the value predicted by the model, $\bar{y}^{\mathrm{true}}$ is the mean of the true values of the samples, and $\bar{y}^{\mathrm{pred}}$ is the mean of the predicted values of the samples;
the optimal QSPR model corresponding to the data set is as follows:
the best model with 14 descriptors and bulk modulus as target property is ANN model-1; characterized by 17 descriptors, the best model with the impact sensitivity as the target property is also an ANN model-2;
The formula of the relation between the impact sensitivity and the mechanical property of the nitro energetic is as follows:
wherein $h_{50}$ is the impact sensitivity, a, b, c and d are the numbers of C, H, O and N atoms in a molecule, respectively, M is the molecular weight, OB is the oxygen balance, and K is the bulk modulus.
6. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps: taking molecular descriptors calculated by E-Dragon and molecular-structure information as features; constructing seven quantitative structure-activity relationship models for the impact sensitivity and bulk modulus of nitro energetic substances based on an artificial neural network and the sure independence screening and sparsifying operator (SISSO) method; and using the constructed models to determine the relationship between the impact sensitivity and the mechanical properties of nitro energetic substances, and the quantitative relationships between each of these properties and the molecular structure;
the molecular descriptor calculation method comprises the following steps:
(1) Calculating a molecular descriptor:
1666 molecular descriptors per molecule were computed online from SMILES strings using E-Dragon 1.0; the crystal densities of the nitro energetic compounds were obtained from the Cambridge Structural Database (CSD), and the numbers of C, H, O and N atoms and the molecular weight of each substance were extracted from the molecular formula $\mathrm{C}_a\mathrm{H}_b\mathrm{O}_c\mathrm{N}_d$ of the energetic material;
wherein a, b, c and d represent the numbers of C, H, O and N atoms in a molecule, respectively, M is the relative molecular mass in g/mol, and OB is the oxygen balance of the energetic molecule in g/g; the oxygen balance is calculated as follows:
(2) Reducing the number of descriptors using a statistical method to obtain the required descriptors for constructing the QSPR model;
after screening, the descriptors (features) and target properties used to construct the QSPR models are organized into the following 4 datasets:
Dataset-1: bulk modulus as the target property, with 14 molecular descriptors drawn from 9 descriptor classes: structural descriptors, information indices, edge adjacency indices, BCUT descriptors, geometrical descriptors, 3D-MoRSE descriptors, GETAWAY descriptors, atom-centred fragments and molecular properties;
Dataset-2: impact sensitivity as the target property, with 17 molecular descriptors drawn from 8 descriptor classes in total, including 2D autocorrelation descriptors, geometrical descriptors, RDF descriptors, 3D-MoRSE descriptors, WHIM descriptors, GETAWAY descriptors and molecular properties;
Dataset-3: impact sensitivity as the target property, with 8 features: bulk modulus, crystal density, oxygen balance, molecular weight, and the numbers of C, H, O and N atoms;
Dataset-4: impact sensitivity and bulk modulus as joint target properties, with 26 descriptors (from 10 descriptor classes) obtained by merging and de-duplicating the two descriptor sets screened separately for impact sensitivity and for bulk modulus;
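Suppose Dataset-4 is exactly the de-duplicated union of the two screened sets. The counts then imply that the 14 bulk-modulus descriptors and the 17 impact-sensitivity descriptors share 5 descriptors (14 + 17 - 26 = 5). A minimal order-preserving merge is sketched below. The descriptor names are placeholders, not the actual E-Dragon descriptors, and the helper name is hypothetical.

```python
def merge_descriptor_sets(*descriptor_sets):
    """Order-preserving union: keep the first occurrence of each descriptor name."""
    merged, seen = [], set()
    for names in descriptor_sets:
        for name in names:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


# Placeholder names: 9 bulk-modulus-only, 5 shared, 12 impact-sensitivity-only
bulk_modulus_set = [f"B{i}" for i in range(9)] + [f"S{i}" for i in range(5)]   # 14 names
impact_set = [f"S{i}" for i in range(5)] + [f"I{i}" for i in range(12)]        # 17 names
dataset4 = merge_descriptor_sets(bulk_modulus_set, impact_set)
print(len(dataset4))  # 26
```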
the QSPR models of the impact sensitivity and bulk modulus of nitro energetic substances are constructed as follows: each of Dataset-1, Dataset-2, Dataset-3 and Dataset-4 is randomly divided into two subsets, with 80% of the data used as the training set and 20% as the test set; the four datasets are then modelled with the SISSO and ANN methods, yielding 7 QSPR models of the impact sensitivity and bulk modulus of nitro energetic substances;
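A minimal sketch of the 80/20 random split and of a small ANN regressor follows. The patent does not give the network architecture, activation function, optimiser or random seed. The one-hidden-layer tanh network below, trained by full-batch gradient descent on mean squared error, is an assumption chosen only to make the workflow concrete. `train_test_split_80_20` and `fit_mlp` are hypothetical helpers.

```python
import numpy as np


def train_test_split_80_20(X, y, seed=0):
    """Randomly assign 80% of the samples to training and 20% to testing."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(y))
    n_train = int(round(0.8 * len(y)))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], X[test], y[train], y[test]


def fit_mlp(X, y, hidden=8, lr=0.05, epochs=3000, seed=0):
    """Fit a one-hidden-layer tanh network by full-batch gradient descent
    on 0.5 * mean squared error; returns a predict(X_new) function."""
    rng = np.random.default_rng(seed)
    x_mu, x_sd = X.mean(axis=0), X.std(axis=0) + 1e-12  # feature scaling
    y_mu, y_sd = y.mean(), y.std() + 1e-12              # target scaling
    Z, t = (X - x_mu) / x_sd, (y - y_mu) / y_sd
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    n = len(t)
    for _ in range(epochs):
        H = np.tanh(Z @ W1 + b1)                  # hidden activations
        err = H @ w2 + b2 - t                     # output residuals
        g_w2, g_b2 = H.T @ err / n, err.mean()
        dH = np.outer(err, w2) * (1.0 - H ** 2)   # backprop through tanh
        g_W1, g_b1 = Z.T @ dH / n, dH.mean(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2

    def predict(X_new):
        Z_new = (np.asarray(X_new, dtype=float) - x_mu) / x_sd
        return (np.tanh(Z_new @ W1 + b1) @ w2 + b2) * y_sd + y_mu

    return predict
```

In use, each dataset would be split once with `train_test_split_80_20`, fitted on the training part, and scored (for example by R²) on the held-out 20%.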
the 7 QSPR models of the impact sensitivity and bulk modulus of nitro energetic substances are as follows:
Model-1: ANN model of bulk modulus using the corresponding 14 molecular descriptors (Dataset-1);
Model-2: ANN model of impact sensitivity using the corresponding 17 molecular descriptors (Dataset-2);
Model-3: ANN model of impact sensitivity using 8 features: bulk modulus, crystal density, oxygen balance, molecular weight and the numbers of C, H, O and N atoms (Dataset-3);
Model-4: ANN model of both impact sensitivity and bulk modulus using the corresponding 26 molecular descriptors (Dataset-4);
Model-5: SISSO model of bulk modulus using 14 molecular descriptors (Dataset-1);
Model-6: SISSO model of impact sensitivity using 17 molecular descriptors (Dataset-2);
Model-7: SISSO model of impact sensitivity using 8 features: bulk modulus, crystal density, oxygen balance, molecular weight and the numbers of C, H, O and N atoms (Dataset-3).
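SISSO, introduced by Ouyang et al., works in three stages:

1. It builds a large space of candidate descriptors by applying mathematical operators to primary features.
2. It screens that space with sure independence screening (SIS).
3. It picks a low-dimensional descriptor by ℓ0-regularised, that is exhaustive, least squares.

The sketch below is a heavily simplified, hypothetical illustration of that idea. It uses a single round of pairwise +, -, *, / combinations, one SIS pass by absolute correlation, and an exhaustive search over `dim`-tuples. The actual SISSO code iterates SIS on the residuals of lower-dimensional fits and supports richer operator sets. None of the names here come from the patent.

```python
import itertools

import numpy as np


def build_features(X, names):
    """Primary features plus one round of pairwise +, -, *, / combinations."""
    cols = [X[:, j] for j in range(X.shape[1])]
    labels = list(names)
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        a, b = X[:, i], X[:, j]
        ni, nj = names[i], names[j]
        cols += [a + b, a - b, a * b]
        labels += [f"({ni}+{nj})", f"({ni}-{nj})", f"({ni}*{nj})"]
        if np.all(b != 0):  # skip ratios that would divide by zero
            cols.append(a / b)
            labels.append(f"({ni}/{nj})")
        if np.all(a != 0):
            cols.append(b / a)
            labels.append(f"({nj}/{ni})")
    return np.column_stack(cols), labels


def sis(F, y, n_keep):
    """Sure independence screening: indices of the n_keep columns of F
    with the largest absolute Pearson correlation with y."""
    Fc = F - F.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Fc, axis=0) * np.linalg.norm(yc)
    score = np.abs(Fc.T @ yc) / np.where(denom == 0, np.inf, denom)
    return np.argsort(-score)[:n_keep]


def l0_search(F, y, candidates, dim):
    """Exhaustive l0 search: the dim-tuple of candidate columns whose
    least-squares fit (with intercept) has the lowest training RMSE."""
    best_rmse, best_combo, best_coef = np.inf, None, None
    for combo in itertools.combinations(candidates, dim):
        A = np.column_stack([F[:, list(combo)], np.ones(len(y))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rmse = np.sqrt(np.mean((A @ coef - y) ** 2))
        if rmse < best_rmse:
            best_rmse, best_combo, best_coef = rmse, combo, coef
    return best_rmse, best_combo, best_coef


def sisso_sketch(X, y, names, n_sis=20, dim=2):
    """Build the feature space, screen it with SIS, then run the l0 search."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    F, labels = build_features(X, names)
    candidates = sis(F, y, n_sis)
    rmse, combo, coef = l0_search(F, y, candidates, dim)
    return rmse, [labels[k] for k in combo], coef
```

As a check, take primary features x0, x1 and x2 sampled from a positive range and set y = 3·x0·x1 - 2·x2 + 1. A two-dimensional search then recovers the descriptor pair (x0*x1, x2) with an essentially zero training error.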
7. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps: taking the molecular descriptors calculated by E-Dragon and molecular structure information as features, constructing 7 quantitative structure-activity relationship (QSPR) models of the impact sensitivity and bulk modulus of nitro energetic substances based on an artificial neural network (ANN) and the sure independence screening and sparsifying operator (SISSO) method, and using the constructed QSPR models to determine, respectively, the relationship between the impact sensitivity and the mechanical properties of nitro energetic substances and the quantitative relationship between these properties and the molecular structure;
the molecular descriptor calculation method comprises the following steps:
(1) Calculating a molecular descriptor:
1666 molecular descriptors were computed online for each molecule from its SMILES string using E-Dragon 1.0; the crystal densities of the nitro energetic compounds were obtained from the Cambridge Structural Database (CSD); and the numbers of C, H, O and N atoms and the molecular weight of each substance were extracted from the molecular formula C_aH_bO_cN_d of the energetic material;
where a, b, c and d are the numbers of C, H, O and N atoms in the molecule, M is the relative molecular mass (g/mol), and OB is the oxygen balance of the energetic molecule (g/g); the oxygen balance is calculated as follows:
(2) Reducing the number of descriptors by statistical methods to obtain the descriptors required to construct the QSPR models;
after screening, the descriptors (features) and target properties used to construct the QSPR models are organized into the following 4 datasets:
Dataset-1: bulk modulus as the target property, with 14 molecular descriptors drawn from 9 descriptor classes: structural descriptors, information indices, edge adjacency indices, BCUT descriptors, geometrical descriptors, 3D-MoRSE descriptors, GETAWAY descriptors, atom-centred fragments and molecular properties;
Dataset-2: impact sensitivity as the target property, with 17 molecular descriptors drawn from 8 descriptor classes in total, including 2D autocorrelation descriptors, geometrical descriptors, RDF descriptors, 3D-MoRSE descriptors, WHIM descriptors, GETAWAY descriptors and molecular properties;
Dataset-3: impact sensitivity as the target property, with 8 features: bulk modulus, crystal density, oxygen balance, molecular weight, and the numbers of C, H, O and N atoms;
Dataset-4: impact sensitivity and bulk modulus as joint target properties, with 26 descriptors (from 10 descriptor classes) obtained by merging and de-duplicating the two descriptor sets screened separately for impact sensitivity and for bulk modulus;
the QSPR models of the impact sensitivity and bulk modulus of nitro energetic substances are constructed as follows: each of Dataset-1, Dataset-2, Dataset-3 and Dataset-4 is randomly divided into two subsets, with 80% of the data used as the training set and 20% as the test set; the four datasets are then modelled with the SISSO and ANN methods, yielding 7 QSPR models of the impact sensitivity and bulk modulus of nitro energetic substances;
the 7 QSPR models of the impact sensitivity and bulk modulus of nitro energetic substances are as follows:
Model-1: ANN model of bulk modulus using the corresponding 14 molecular descriptors (Dataset-1);
Model-2: ANN model of impact sensitivity using the corresponding 17 molecular descriptors (Dataset-2);
Model-3: ANN model of impact sensitivity using 8 features: bulk modulus, crystal density, oxygen balance, molecular weight and the numbers of C, H, O and N atoms (Dataset-3);
Model-4: ANN model of both impact sensitivity and bulk modulus using the corresponding 26 molecular descriptors (Dataset-4);
Model-5: SISSO model of bulk modulus using 14 molecular descriptors (Dataset-1);
Model-6: SISSO model of impact sensitivity using 17 molecular descriptors (Dataset-2);
Model-7: SISSO model of impact sensitivity using 8 features: bulk modulus, crystal density, oxygen balance, molecular weight and the numbers of C, H, O and N atoms (Dataset-3).
8. A machine learning estimation system for implementing the machine learning estimation method for the sensitivity and mechanical properties of energetic substances and their relationship according to any one of claims 1 to 5, characterized in that the system comprises:
a QSPR model construction module, configured to take the molecular descriptors calculated by E-Dragon as features and to construct, based on an ANN and the SISSO method, 5 QSPR models relating the impact sensitivity and bulk modulus of nitro energetic substances to their molecular structures;
an impact sensitivity and mechanical property relationship determination module, configured to determine the relationship between the impact sensitivity and the mechanical properties of nitro energetic substances using the 2 QSPR models of impact sensitivity and bulk modulus constructed with molecular structure information as features.
CN202011311694.2A 2020-11-20 2020-11-20 Machine learning estimation method for sensitivity and mechanical property of energetic substance and relation of energetic substance Active CN112382350B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011311694.2A CN112382350B (en) 2020-11-20 2020-11-20 Machine learning estimation method for sensitivity and mechanical property of energetic substance and relation of energetic substance

Publications (2)

Publication Number Publication Date
CN112382350A CN112382350A (en) 2021-02-19
CN112382350B true CN112382350B (en) 2023-07-28

Family

ID=74585953

Country Status (1)

Country Link
CN (1) CN112382350B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113312853B (en) * 2021-06-28 2022-10-21 Nanjing Fiberglass Research & Design Institute Co., Ltd. Density prediction method based on molecular dynamics and ridge regression algorithm
CN114049922B (en) * 2021-11-09 2022-06-03 Sichuan University Molecular design method based on small-scale data set and generative model
CN114397420B (en) * 2021-12-17 2023-12-12 Xi'an Modern Chemistry Research Institute Determination method for compression potential energy of layered stacked energetic compound molecular crystals
CN115169083B (en) * 2022-06-17 2024-03-19 Shandong University of Science and Technology Method for calculating pyrolysis kinetic parameters of polymer matrix composite
CN115169111B (en) * 2022-07-04 2023-04-18 North University of China Random forest based energetic material mechanical property prediction method and storage device
CN115762658B (en) * 2022-11-17 2023-07-21 Sichuan University Eutectic density prediction method based on graph convolutional neural network

Citations (11)

Publication number Priority date Publication date Assignee Title
CN101339181A (en) * 2008-08-14 2009-01-07 Nanjing Tech University Organic compound explosive characteristic prediction method based on genetic algorithm
CN105601457A (en) * 2016-02-17 2016-05-25 North University of China ETN-DNT eutectic energetic material and preparation method thereof
CN106631639A (en) * 2017-01-06 2017-05-10 Institute of Chemical Materials, China Academy of Engineering Physics Method for improving the surface wettability of energetic material and the mechanical property of explosive
CN106886615A (en) * 2015-12-10 2017-06-23 Nanjing University of Science and Technology Simulation method for RDX-based multi-component energetic compounds
CN109283104A (en) * 2018-11-15 2019-01-29 Beijing Institute of Technology Online monitoring method for product particle size distribution in crystal solution during RDX preparation
CN109411029A (en) * 2018-10-10 2019-03-01 Xi'an Modern Chemistry Research Institute Performance prediction system for energetic compounds
CN109581870A (en) * 2018-11-27 2019-04-05 Institute of Chemical Materials, China Academy of Engineering Physics Dynamic matrix control method for the in-kettle temperature of an energetic material reaction kettle
CN110728047A (en) * 2019-10-08 2020-01-24 Institute of Chemical Materials, China Academy of Engineering Physics Computer-aided design system for energetic molecules based on machine-learning performance prediction
CN110867217A (en) * 2019-11-18 2020-03-06 Xi'an Modern Chemistry Research Institute Method for calculating crystallization morphology of energetic material in solution
CN110890135A (en) * 2019-11-18 2020-03-17 Xi'an Modern Chemistry Research Institute Prediction method of energetic N-oxide crystal structure
CN111429980A (en) * 2020-04-14 2020-07-17 Beijing Maigao Caiyun Technology Co., Ltd. Automatic acquisition method for material crystal structure characteristics

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CA2351002C (en) * 2000-06-27 2009-04-07 The Minister Of National Defence Insensitive melt cast explosive compositions containing energetic thermoplastic elastomers

Non-Patent Citations (6)

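Title
Probing impact of molecular structure on bulk modulus and impact sensitivity of energetic materials by machine learning methods; Qianqian Deng et al.; Chemometrics and Intelligent Laboratory Systems; 104331 *
QSPR studies of impact sensitivity of nitro energetic compounds using three-dimensional descriptors; Jie Xu et al.; Journal of Molecular Graphics and Modelling; 10-19 *
Quantitative structure-activity relationship study of impact sensitivity and other safety parameters of energetic materials; Qian Bowen; China Master's Theses Full-text Database, Engineering Science and Technology I; B017-41 *
Theoretical study of the sensitivity of solid-phase nitromethane and its regulation; Zhong Mi; China Master's Theses Full-text Database, Engineering Science and Technology I; B017-11 *
Study of the interfacial assembly behaviour and properties of explosives induced by solvent and heat; Zhang Menghua; China Master's Theses Full-text Database, Engineering Science and Technology I; B017-12 *
Computational simulation study of the structure and properties of polymer-bonded explosives; Ma Xiufang; China Doctoral Dissertations Full-text Database, Engineering Science and Technology I; B017-4 *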

Similar Documents

Publication Publication Date Title
CN112382350B (en) Machine learning estimation method for sensitivity and mechanical property of energetic substance and relation of energetic substance
Ji et al. Autonomous discovery of unknown reaction pathways from data by chemical reaction neural network
Nagel et al. Logic, methodology and philosophy of science, Proceedings of the 1960 International Congress
Liu et al. Fold-LTR-TCP: protein fold recognition based on triadic closure principle
Schmidt et al. Distilling free-form natural laws from experimental data
Ko et al. Collaborative recurrent neural networks for dynamic recommender systems
Geweke et al. Inference and prediction in a multiple-structural-break model
Simine et al. Predicting optical spectra for optoelectronic polymers using coarse-grained models and recurrent neural networks
Yu et al. The applications of deep learning algorithms on in silico druggable proteins identification
Barta Identifying biological pathway interrupting toxins using multi-tree ensembles
Pannell et al. Application of transfer learning for the prediction of blast impulse
Thakker et al. Pushing the limits of rnn compression
Hasic et al. Single-step retrosynthesis prediction based on the identification of potential disconnection sites using molecular substructure fingerprints
Jaume-Santero et al. Transformer performance for chemical reactions: Analysis of different predictive and evaluation scenarios
Hu et al. TargetDBP+: enhancing the performance of identifying DNA-binding proteins via weighted convolutional features
Leelaprute et al. Does coding in pythonic zen peak performance? preliminary experiments of nine pythonic idioms at scale
Zhou et al. TransVAE-DTA: Transformer and variational autoencoder network for drug-target binding affinity prediction
Debusschere et al. Computational singular perturbation with non-parametric tabulation of slow manifolds for time integration of stiff chemical kinetics
Stoehr et al. An ordinal latent variable model of conflict intensity
Song et al. Missing value imputation using XGboost for label-free mass spectrometry-based proteomics data
Li et al. Additive Multi-Index Gaussian process modeling, with application to multi-physics surrogate modeling of the quark-gluon plasma
Jiang et al. Does Deep Learning improve the performance of duplicate bug report detection? An empirical study
Wei et al. Toward efficient chemistry calculations in engine simulations through static adaptive acceleration
Hu et al. A quantitative analysis of determinants of non-citation using a panel data model
Jadidi et al. A long short-term memory neural network for the low-cost prediction of soot concentration in a time-dependent flame

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant