CN111522867B - Quick screening and recommending method and system for explosive formula - Google Patents
- Publication number: CN111522867B (application number CN202010210459.XA)
- Authority: CN (China)
- Legal status: Active (as listed by Google Patents; an assumption, not a legal conclusion)
Classifications
- G06F16/248: Presentation of query results (under G06F16/24 Querying)
- G06F16/21: Design, administration or maintenance of databases
- G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries (under G06F16/245 Query processing)
- G06F16/285: Clustering or classification (under G06F16/284 Relational databases, G06F16/28 Databases characterised by their database models, e.g. relational or object models)
- All four fall under G (Physics) > G06 (Computing; calculating or counting) > G06F (Electric digital data processing) > G06F16/00 (Information retrieval; database structures therefor; file system structures therefor) > G06F16/20 (of structured data, e.g. relational data)
Abstract
The explosive formula rapid screening and recommending system comprises an explosive formula rapid screening unit and an explosive similar-formula recommending unit. The rapid screening unit comprises a basic resource database, a data processing module, a MySQL database, a parameter introducing module and a visual analysis module. The similar-formula recommending unit comprises a formula sample data preprocessing module, a formula clustering model building module and a visual interaction module. The data preprocessing module analyzes and preprocesses the component proportions, molecular composition, performance data and sensitivity data of each formula; the formula clustering model building module determines the cluster centers and obtains the clusters. Using high-dimensional multivariate parallel coordinate visual interaction together with data mining and visual analysis, the application designs and implements a method and system for rapidly screening and recommending explosive formulas, and provides several convenient, flexible interaction modes that help users analyze and understand the data.
Description
Technical Field
The application relates to the field of screening and recommending explosive formulations of energetic materials, in particular to a rapid screening and recommending method and a rapid screening and recommending system for explosive formulations.
Background
In 2016, China proposed an energetic materials genome research program (Energy Materials Genome Initiative, EMGI). Its goal is to make full use of the interplay between databases, computation and experiments, applying existing computing and big-data analysis techniques to identify the "genes" that determine the performance of energetic materials and to use them to design and synthesize new energetic materials. Future research and development of energetic materials emphasizes innovation through multidisciplinary cross-fusion and encourages exchange between military and civilian technology.
Screening and recommending explosive formulas involves high-dimensional multivariate data such as the chemical properties, thermal properties, detonation properties and sensitivity of each formula. Processing and visually analyzing such data has long been a research focus in information visualization. In the mid-1980s and early 1990s, Inselberg et al. proposed a visualization technique for high-dimensional multivariate data known as parallel coordinates, which is currently the dominant technique for this kind of analysis. In parallel coordinates, each attribute is assigned its own axis and the axes are drawn parallel to one another; each N-dimensional data object is plotted as a polyline that connects its N attribute values, with one segment between each pair of adjacent axes. Screening and recommending explosive formulas quickly and on a sound scientific basis therefore requires deep human-computer interaction and cooperative processing. In the prior art, the large volume of such data and the noise it contains cause considerable difficulty for researchers. Most traditional screening and recommending methods rely on simple displays of the data and on researchers' experience, seldom combine visualization with data mining, and make it hard to screen and recommend formulas quickly and conveniently.
In order to solve the problems, the application provides a method and a system for rapidly screening and recommending an explosive formula.
Disclosure of Invention
(I) Object of the application
In the prior art, screening and recommending explosive formulas involves high-dimensional multivariate data such as the chemical properties, thermal properties, detonation properties and sensitivity of each formula, and the large volume and noise of such data cause considerable difficulty for researchers. To solve this problem, the application provides a method and a system for rapidly screening and recommending explosive formulas. They are built on high-dimensional multivariate parallel coordinate visual interaction together with data mining and visual analysis, provide several convenient and flexible interaction modes, and help users analyze and understand the data, thereby addressing both formula screening and formula recommendation.
(II) Technical scheme
To solve the above problems, the application provides an explosive formula rapid screening and recommending system, which comprises an explosive formula rapid screening unit and an explosive similar-formula recommending unit;
the explosive formula rapid screening unit comprises a basic resource database, a data processing module, a MySQL database, a parameter introducing module and a visual analysis module;
the basic resource database stores basic parameter information of explosive formulas;
the data processing module cleans the data, collects energetic-materials knowledge and segments the data, and sends the result to the MySQL database;
after the data are read from the MySQL database, the corresponding formula data are displayed in high-dimensional multivariate parallel coordinates;
the parameter introducing module introduces the different parameters of the explosive formulas;
the visual analysis module analyzes the data and obtains the screening result;
the explosive similar-formula recommending unit comprises a formula sample data preprocessing module, a formula clustering model building module and a visual interaction module;
the formula sample data preprocessing module analyzes and preprocesses the component proportions, molecular composition, performance data and sensitivity data of each formula;
the formula clustering model building module determines the cluster centers and obtains the clusters;
the visual interaction module displays the screened data.
Preferably, a user can screen formulas by molecular formula, composition, density, detonation velocity, detonation pressure, heat of detonation, friction sensitivity and impact sensitivity, and the screening result is displayed in a high-dimensional multivariate parallel coordinate plot.
Preferably, the formula sample data preprocessing module performs data format standardization, computation of substitute data, normalization, and random generation of missing data.
Preferably, the formula clustering model building module is implemented by combining density peak clustering with the K-Means algorithm.
Preferably, the visual interaction module comprises a t-SNE projection view, a pie chart, a scatter plot, a parallel coordinate plot and a line chart.
Preferably, the visual interaction module supports box selection, clicking and linked operations.
The operating method of the explosive formula rapid screening and recommending system comprises the following steps:
S1, screening the explosive formula basic resource database and processing the screened data, where processing comprises data cleaning, energetic-materials knowledge collection and data segmentation, and sending the data to the MySQL database;
S2, storing the processed data in the MySQL database. After the data are read from the database, the corresponding formula data are displayed in high-dimensional multivariate parallel coordinates, and the number of axes changes with the dimensionality of the formula data;
S3, performing visual analysis in the high-dimensional multivariate parallel coordinates to obtain the screened explosive formulas, including linked interactive display across parallel coordinates and interactive range screening;
S4, preprocessing the formula sample data, including analysis of the formula component proportion data, component molecular formula data, performance data and sensitivity data;
S5, building the formula clustering model;
S6, displaying the screened data in the visual interaction module, storing the cluster labels and displaying the multidimensional energetic-material data so that the user can set the formula cluster labels, thereby completing the similar-formula recommendation.
Preferably, in S5, the clustering model is built by combining density peak clustering with the K-Means algorithm.
Preferably, building the clustering model comprises calculating the local density ρ_i of each point i, calculating the distance δ_i of each point i, determining the cluster centers, and obtaining the clusters.
Preferably, the density peak algorithm defines as cluster centers the points that have both a large distance δ_i and a large local density ρ_i, and the number of formula clusters is determined with this algorithm;
the number of formula clusters obtained in the previous step is used as the number of initial cluster centers for the K-Means algorithm. The distance from each object to every cluster center is calculated, and each object is assigned to the cluster whose center is nearest; the cluster centers are then recomputed, and this repeats until the centers no longer change or a set number of iterations is reached, which yields the clustering result.
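The K-Means stage described above can be sketched in Python as follows. This is a minimal illustration, not the application's implementation: the function name `kmeans` and the `max_iter` default are hypothetical, and seeding K-Means with the density-peak points themselves (rather than only using their count) is an assumption made for the sketch.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: Sequence[Point], initial_centers: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style K-Means; k is fixed by the number of initial centers.

    Assumption: the initial centers are the density-peak points, so k equals
    the number of formula clusters found by the density peak step.
    """
    centers = [list(c) for c in initial_centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each formula joins the cluster with the nearest center.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # keep an empty cluster's center unchanged
        if new_centers == centers:  # centers no longer change: converged
            break
        centers = new_centers
    return labels, centers


# Hypothetical 2-D formula vectors forming two obvious groups.
formulas = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
peaks = [formulas[0], formulas[3]]  # pretend these were found as density peaks
labels, centers = kmeans(formulas, peaks)
```

Because the density peak step already fixes k, the usual difficulty of choosing k for K-Means is avoided; the result can still depend on where the initial centers are placed.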
The technical scheme of the application has the following beneficial effects. Using high-dimensional parallel coordinate visual interaction, the application provides a flexible way to set the range of each attribute of the high-dimensional multivariate explosive formula data: through interaction, the user selects the formula performance attribute axes of interest and sets numerical ranges, and the rapid screening result is displayed in multiple views. On this basis, the multidimensional formula data are analyzed and processed, a formula clustering model is built by combining data mining with visual analysis, and rich visual displays and interactions are provided for the different clustering results, addressing the similar-formula recommendation problem. Traditional screening and recommending methods mostly rely on simple displays of the data and on researchers' experience, which makes it hard to complete research tasks quickly and conveniently. By combining high-dimensional multivariate parallel coordinate visual interaction with data mining and visual analysis, and by providing several convenient and flexible interaction modes, the application helps users analyze and understand the data and solves the problems of formula screening and recommendation.
Drawings
FIG. 1 is an overall flow chart of the explosive formulation rapid screening and recommendation system of the present application.
FIG. 2 is an overall flow chart of the explosive similar-formula recommending unit in the explosive formula rapid screening and recommending system provided by the application.
FIG. 3 is a flow chart of building the formula clustering model in the method and system for rapidly screening and recommending explosive formulas.
Detailed Description
The objects, technical solutions and advantages of the present application will become more apparent by the following detailed description of the present application with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the application. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present application.
As shown in FIGS. 1-3, the explosive formula rapid screening and recommending system provided by the application comprises an explosive formula rapid screening unit and an explosive similar-formula recommending unit;
the explosive formula rapid screening unit comprises a basic resource database, a data processing module, a MySQL database, a parameter introducing module and a visual analysis module;
the basic resource database stores basic parameter information of explosive formulas;
the data processing module cleans the data, collects energetic-materials knowledge and segments the data, and sends the result to the MySQL database;
after the data are read from the MySQL database, the corresponding formula data are displayed in high-dimensional multivariate parallel coordinates;
the parameter introducing module introduces the different parameters of the explosive formulas;
the visual analysis module analyzes the data and obtains the screening result;
the explosive similar-formula recommending unit comprises a formula sample data preprocessing module, a formula clustering model building module and a visual interaction module;
the formula sample data preprocessing module analyzes and preprocesses the component proportions, molecular composition, performance data and sensitivity data of each formula;
the formula clustering model building module determines the cluster centers and obtains the clusters;
the visual interaction module displays the screened data.
In an alternative embodiment, the user can screen formulas by molecular formula, composition, density, detonation velocity, detonation pressure, heat of detonation, friction sensitivity and impact sensitivity, and the screening result is presented in a high-dimensional multivariate parallel coordinate plot.
In an alternative embodiment, the formula sample data preprocessing module performs data format standardization, computation of substitute data, normalization, and random generation of missing data.
In an alternative embodiment, the formula clustering model building module is implemented by combining density peak clustering with the K-Means algorithm.
In an alternative embodiment, the visual interaction module comprises a t-SNE projection view, a pie chart, a scatter plot, a parallel coordinate plot and a line chart.
In an alternative embodiment, the visual interaction module supports box selection, clicking and linked operations.
The operating method of the explosive formula rapid screening and recommending system comprises the following steps:
S1, screening the explosive formula basic resource database and processing the screened data, where processing comprises data cleaning, energetic-materials knowledge collection and data segmentation, and sending the data to the MySQL database;
S2, storing the processed data in the MySQL database. After the data are read from the database, the corresponding formula data are displayed in high-dimensional multivariate parallel coordinates, and the number of axes changes with the dimensionality of the formula data;
S3, performing visual analysis in the high-dimensional multivariate parallel coordinates to obtain the screened explosive formulas, including linked interactive display across parallel coordinates and interactive range screening;
S4, preprocessing the formula sample data, including analysis of the formula component proportion data, component molecular formula data, performance data and sensitivity data;
S5, building the formula clustering model;
S6, displaying the screened data in the visual interaction module, storing the cluster labels and displaying the multidimensional energetic-material data so that the user can set the formula cluster labels, thereby completing the similar-formula recommendation.
In an alternative embodiment, in S5 the clustering model is built by combining density peak clustering with the K-Means algorithm.
In an alternative embodiment, building the clustering model comprises calculating the local density ρ_i of each point i, calculating the distance δ_i of each point i, determining the cluster centers, and obtaining the clusters.
In an alternative embodiment, the density peak algorithm defines as cluster centers the points that have both a large distance δ_i and a large local density ρ_i, and the number of formula clusters is determined with this algorithm;
the number of formula clusters obtained in the previous step is used as the number of initial cluster centers for the K-Means algorithm. The distance from each object to every cluster center is calculated, and each object is assigned to the cluster whose center is nearest; the cluster centers are then recomputed, and this repeats until the centers no longer change or a set number of iterations is reached, which yields the clustering result.
Based on the explosive formula basic resource database, the application uses high-dimensional multivariate parallel coordinates and multi-view, multi-dimensional linked visual interaction to screen energetic material formulas rapidly, which speeds up explosive formula selection. Energetic material data differ from ordinary high-dimensional multivariate data: they are hard to collect, and non-numerical discrete data do not map well onto parallel coordinate axes. The method therefore uses a portion of test data alongside real data, and processes and filters the data so that multidimensional data can also be displayed when values are discrete. The processed data are stored in the MySQL database; after they are read from the database, the corresponding formula data are displayed in high-dimensional multivariate parallel coordinates, and the number of axes can be expanded or reduced according to the dimensionality of the formula data.
Parallel coordinates are a typical geometry-based multidimensional visualization technique that can clearly present all dimensions of the data in a single view. The main idea is to map an N-dimensional attribute space onto a two-dimensional plane using N equally spaced parallel axes. Each axis represents one attribute, and its range is spread evenly from the minimum to the maximum value of that attribute. Each data record is drawn as a polyline of N-1 segments joining its attribute values on the N axes, and the N points where the polyline meets the N axes represent the record's N coordinates. In Inselberg's analysis, a line in N-dimensional space passing through the point (a_1, a_2, ..., a_N) with direction (k_1, k_2, ..., k_N) is described by N-1 linearly independent equations:

$$\frac{x_1 - a_1}{k_1} = \frac{x_2 - a_2}{k_2} = \cdots = \frac{x_N - a_N}{k_N} \qquad (1)$$

From equation (1) it follows that

$$x_{i+1} = m_i x_i + b_i, \quad i = 1, 2, \ldots, N-1 \qquad (2)$$

where m_i = k_{i+1}/k_i is the slope and b_i = a_{i+1} - m_i a_i is the intercept on the x_{i+1} axis in the x_i x_{i+1} plane;
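The axis mapping and equations (1)-(2) can be illustrated with a short Python sketch. It assumes each axis is scaled linearly from the attribute's minimum to its maximum, as the text describes; the function names and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def axis_ranges(records: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """(min, max) of every attribute; one pair per parallel axis."""
    return [(min(col), max(col)) for col in zip(*records)]


def to_polyline(record: Sequence[float], ranges: Sequence[Tuple[float, float]],
                spacing: float = 1.0) -> List[Tuple[float, float]]:
    """Map an N-dimensional record to N vertices; consecutive vertices form N-1 segments.

    Axis d sits at x = d * spacing; the value is scaled to y in [0, 1].
    """
    vertices = []
    for d, (value, (lo, hi)) in enumerate(zip(record, ranges)):
        y = 0.5 if hi == lo else (value - lo) / (hi - lo)  # constant attribute: mid-height
        vertices.append((d * spacing, y))
    return vertices


def line_coefficients(a: Sequence[float], k: Sequence[float]) -> List[Tuple[float, float]]:
    """Slopes m_i = k_{i+1}/k_i and intercepts b_i = a_{i+1} - m_i*a_i of equation (2).

    Assumes every direction component k_i is non-zero, as equation (1) requires.
    """
    coeffs = []
    for i in range(len(a) - 1):
        m = k[i + 1] / k[i]
        coeffs.append((m, a[i + 1] - m * a[i]))
    return coeffs


# Hypothetical records: detonation velocity (km/s), pressure (GPa), heat (MJ/kg).
records = [[8.2, 30.1, 5.6], [7.9, 27.5, 6.1], [8.6, 33.0, 5.2]]
ranges = axis_ranges(records)
polyline = to_polyline(records[1], ranges)  # [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
```

The second record has the lowest velocity and pressure and the highest heat, so its polyline runs along the bottom of the first two axes and the top of the third. Any point a + t*k on the line satisfies every relation returned by `line_coefficients`.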
Although parallel coordinates can show every dimension and every record, they treat all dimensions equally, so multiple groups of data become interleaved. As the data volume grows, the number of polylines increases and they overlap heavily, causing visual clutter that makes it hard to complete analysis and visualization tasks with parallel coordinates alone;
the application therefore adopts multi-view collaborative visual analysis, which visualizes the same data objects with several visualization techniques in parallel views and integrates them through interaction. Compared with parallel coordinates alone, it presents the data more intuitively and from more angles, overcomes the limited visual bandwidth of a single view, and gives the visual analysis workflow a clearer logic. It combines the strengths of parallel coordinates, pie charts and radar charts, supports interpretation of the raw data, and allows local display and comparison on demand. The parallel coordinate interaction supports an iterative screening process that moves from overview to detail and back again. The user selects explosive formula performance attributes interactively; each performance attribute is represented by one of the mutually parallel axes, and each explosive formula is drawn as a polyline that crosses all of them. Parallel coordinates thus make it easy to see how formula entries are distributed over the performance attributes, and interaction makes it easy to switch the objects being screened;
the user can brush any of the six axes (density, detonation velocity, detonation pressure, heat of detonation, friction sensitivity and impact sensitivity), or several axes at once, restricting the range of each attribute. The brushed range is highlighted, and clicking again below the axis cancels the brush. At the same time, the main-component chart and the detailed information table are updated. The user can screen formulas by molecular formula, composition, density, detonation velocity, detonation pressure, heat of detonation, friction sensitivity and impact sensitivity; the result is shown in the high-dimensional multivariate parallel coordinate plot, together with the number of data items that meet the screening conditions and the proportions of the main components in the screened data.
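Brushing several axes amounts to a conjunction of range filters. The sketch below assumes each formula is a dictionary with one key per attribute axis and a `components` mapping of mass fractions; these names, the sample formulas, and the reading of "main component" as the ingredient with the largest mass fraction are all hypothetical.

```python
from typing import Any, Dict, List, Tuple

Formula = Dict[str, Any]


def brush_filter(formulas: List[Formula],
                 brushes: Dict[str, Tuple[float, float]]) -> List[Formula]:
    """Keep formulas whose value lies in [lo, hi] on every brushed axis.

    An empty `brushes` dict (no axis brushed) keeps every formula.
    """
    return [f for f in formulas
            if all(lo <= f[axis] <= hi for axis, (lo, hi) in brushes.items())]


def main_component_share(selected: List[Formula]) -> Dict[str, float]:
    """Share of selected formulas whose largest component is each ingredient."""
    counts: Dict[str, int] = {}
    for f in selected:
        components = f["components"]
        main = max(components, key=components.get)
        counts[main] = counts.get(main, 0) + 1
    total = len(selected)
    return {name: n / total for name, n in counts.items()} if total else {}


# Invented example data for illustration only.
formulas = [
    {"name": "F1", "density": 1.80, "detonation_velocity": 8700,
     "components": {"HMX": 0.95, "binder": 0.05}},
    {"name": "F2", "density": 1.65, "detonation_velocity": 7900,
     "components": {"TNT": 0.60, "Al": 0.40}},
    {"name": "F3", "density": 1.75, "detonation_velocity": 8300,
     "components": {"RDX": 0.60, "TNT": 0.40}},
]
brushes = {"density": (1.70, 1.90), "detonation_velocity": (8000, 9000)}
selected = brush_filter(formulas, brushes)
```

Cancelling a brush corresponds to deleting its key from `brushes` and re-running the filter, after which the linked chart and table would be refreshed from the new selection.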
After the user brushes an axis or applies a range filter, a table of detailed information is generated below the parallel coordinates, as shown, and checking the check boxes in front of the table rows displays a comparison of the checked formulas as a radar chart. Range screening, main-component display and comparative analysis let researchers quickly pick qualifying explosives out of high-dimensional multivariate data, saving energetic-materials researchers considerable time.
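For the radar-chart comparison, each checked formula's normalized attribute values are placed on equally spaced spokes. The sketch below is a hypothetical helper that computes the polygon vertices; it assumes the values are already scaled to [0, 1] (for example by min-max normalization) and that the first spoke points straight up.

```python
import math
from typing import List, Sequence, Tuple


def radar_vertices(values: Sequence[float],
                   radius: float = 1.0) -> List[Tuple[float, float]]:
    """Vertices of a radar-chart polygon, one spoke per attribute, clockwise from 12 o'clock."""
    n = len(values)
    vertices = []
    for k, v in enumerate(values):
        theta = 2 * math.pi * k / n  # angle of spoke k, measured clockwise from vertical
        vertices.append((radius * v * math.sin(theta), radius * v * math.cos(theta)))
    return vertices


# Two hypothetical formulas compared on six normalized attributes
# (density, detonation velocity, pressure, heat, friction and impact sensitivity).
formula_a = radar_vertices([0.9, 0.8, 0.85, 0.6, 0.3, 0.4])
formula_b = radar_vertices([0.7, 0.6, 0.55, 0.9, 0.5, 0.2])
```

Drawing both polygons on the same spokes makes it easy to see on which attributes one formula dominates the other.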
After the screened explosive formulas are obtained through visual analysis in the high-dimensional multivariate parallel coordinates, the application analyzes and preprocesses the component proportions, molecular composition, performance data and sensitivity data of the formulas by combining data mining with visual analysis, and provides a similar-formula recommendation method, supported by several visual interaction techniques, that helps researchers find similar formulas.
1. Formula sample data preprocessing
The application processes four kinds of multi-dimensional energetic material data, namely formula component proportion data, formula component molecular formula data, formula performance data and formula sensitivity data.
The raw component proportion and molecular formula data must first be converted into a uniform format; part of the processed data is shown in Table 1:
Table 1: Formula component proportion data after processing (partial)
For the component proportions, suppose the sample data contain the components C_1, C_2, ..., C_n, where n is the total number of components. The attribute dimension set of the component proportions is then C = {C_1, C_2, ..., C_n}, and the value of each dimension is the mass fraction of the corresponding component in the formula.
For the component molecular formulas, the elements of the atom set A = {C, H, O, N, Al, F, Cl} are used as attribute dimensions. The value of each dimension in a formula is the sum, over all components, of the component's proportion multiplied by the number of atoms of that element in the component's molecular formula.
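The component-proportion and atom-count attributes can be computed as follows. This sketch assumes each formula is given as a mapping from component name to proportion and that molecular formulas are simple strings without brackets or charges (e.g. `C3H6N6O6`); the helper names and sample data are hypothetical, and elements outside the atom set A are ignored.

```python
import re
from typing import Dict, List

ATOM_SET = ["C", "H", "O", "N", "Al", "F", "Cl"]  # atom set A from the text


def parse_formula(formula: str) -> Dict[str, int]:
    """Count atoms in a bracket-free molecular formula such as 'C3H6N6O6'."""
    counts: Dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def component_vector(recipe: Dict[str, float], components: List[str]) -> List[float]:
    """Values on the dimensions C_1..C_n; components absent from the recipe get 0."""
    return [recipe.get(c, 0.0) for c in components]


def atom_vector(recipe: Dict[str, float], molecular_formulas: Dict[str, str]) -> List[float]:
    """For each element of A: sum of (component proportion x atoms of that element)."""
    totals = dict.fromkeys(ATOM_SET, 0.0)
    for component, proportion in recipe.items():
        for element, n_atoms in parse_formula(molecular_formulas[component]).items():
            if element in totals:  # elements outside A are ignored (assumption)
                totals[element] += proportion * n_atoms
    return [totals[a] for a in ATOM_SET]


molecular_formulas = {"RDX": "C3H6N6O6", "TNT": "C7H5N3O6", "Al": "Al"}
recipe = {"RDX": 0.6, "TNT": 0.4}  # hypothetical two-component formula
print(component_vector(recipe, ["RDX", "TNT", "Al"]))  # [0.6, 0.4, 0.0]
print(atom_vector(recipe, molecular_formulas))
```

For this recipe the carbon dimension, for example, is 0.6 x 3 + 0.4 x 7 = 4.6.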
Let the detonation velocity be $v$ (m/s), the detonation pressure be $p$ (GPa) and the detonation heat be $h$ (kJ/kg); the attribute dimension set of the formula performance data is then $P = \{v, p, h\}$. Because the value ranges of the different performance dimensions differ widely, the performance data are normalized with dispersion (min-max) normalization.
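Dispersion (min-max) normalization is a linear transformation of the original data that maps the values into the interval [0, 1], so that the model can compare and weight indicators with different units or orders of magnitude. The formula is:
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the dimension over all samples.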
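Dispersion (min-max) normalization, $x^{*} = (x - x_{\min}) / (x_{\max} - x_{\min})$ applied separately to each attribute dimension, can be sketched as follows. The sample rows are hypothetical illustrative values for detonation velocity, detonation pressure and detonation heat, and mapping a constant column to 0 is an assumption, since the text does not address that case.

```python
from typing import List


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale each column (attribute dimension) linearly into [0, 1].

    x* = (x - min) / (max - min), with min and max taken per column over
    all samples. A constant column is mapped to 0.0 to avoid division by
    zero (an assumption of this sketch).
    """
    if not rows:
        return []
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical rows: [detonation velocity, detonation pressure, detonation heat]
perf = [[7000.0, 20.0, 4000.0],
        [8000.0, 30.0, 5000.0],
        [9000.0, 25.0, 6000.0]]
print(min_max_normalize(perf))
# → [[0.0, 0.0, 0.0], [0.5, 1.0, 0.5], [1.0, 0.5, 1.0]]
```

After this step a dimension measured in m/s no longer dominates Euclidean distances over one measured in GPa.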
Let the impact sensitivity be $i$ (%) and the friction sensitivity be $f$ (%); the attribute dimension set of the formula sensitivity data is then $S = \{i, f\}$. Because sensitivity values are recorded in the literature for only a few formulas, missing sensitivity values are replaced with a set of randomly generated test values in the interval (0, 1).
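The random replacement of missing sensitivity values can be sketched like this. Marking missing values with `None` and seeding the generator for reproducibility are assumptions of this sketch; the patent only states that missing values are replaced by random test values in (0, 1).

```python
import random
from typing import List, Optional


def fill_missing_sensitivity(
    rows: List[List[Optional[float]]], seed: Optional[int] = None
) -> List[List[float]]:
    """Replace missing sensitivity values (None) with random values in (0, 1).

    Each row is [i, f] (impact and friction sensitivity). Recorded values
    are left unchanged; only the None entries are replaced.
    """
    rng = random.Random(seed)
    filled = []
    for row in rows:
        new_row = []
        for value in row:
            if value is None:
                r = rng.random()  # uniform in [0.0, 1.0)
                while r == 0.0:   # enforce the open interval (0, 1)
                    r = rng.random()
                value = r
            new_row.append(value)
        filled.append(new_row)
    return filled


print(fill_missing_sensitivity([[0.3, None], [None, None]], seed=1))
```

Because the substituted values are synthetic, clusters driven mainly by the sensitivity dimensions should be interpreted with care.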
2. Method for establishing the formula clustering model
After the formula sample data have been preprocessed, a clustering model for the formulas is established by fusing the density peak algorithm with the K-Means algorithm.
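In the following calculations, each formula is treated as a point, and the values of its multidimensional attributes are its coordinates in space. All distances between points are Euclidean distances:
$$d_{ij} = \sqrt{\sum_{k=1}^{m} \left(x_{ik} - x_{jk}\right)^{2}}$$
where $x_{ik}$ is the value of formula $i$ in attribute dimension $k$ and $m$ is the number of attribute dimensions.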
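A sketch of the Euclidean distance between two formula points, and of the pairwise distance matrix that the subsequent density-peak steps rely on. The two-point input is a hypothetical example.

```python
import math
from typing import List


def euclidean(a: List[float], b: List[float]) -> float:
    """d_ij = sqrt(sum_k (x_ik - x_jk)^2) over all attribute dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(points: List[List[float]]) -> List[List[float]]:
    """Pairwise Euclidean distances; points[i] is formula i's attribute vector."""
    n = len(points)
    return [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]


# Two hypothetical formulas with two normalized attributes each.
print(distance_matrix([[0.0, 0.0], [3.0, 4.0]]))  # → [[0.0, 5.0], [5.0, 0.0]]
```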
1. Calculate the local density $\rho_i$ of each point $i$
The local density of a point is obtained by drawing a small circle centered on that point with radius $dist_{cutoff}$; the number of points inside the circle is the local density of the point.
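The local density $\rho_i$ is defined as:
$$\rho_i = \sum_{j \neq i} \chi\left(d_{ij} - dist_{cutoff}\right), \qquad \chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \geq 0 \end{cases}$$
where $dist_{cutoff}$ denotes the cut-off distance.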
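The cut-off-kernel local density, where $\rho_i$ counts the other points lying within $dist_{cutoff}$ of point $i$, might be sketched as below. The text does not say how `dist_cutoff` is chosen, so here it is simply a parameter; the four points are hypothetical.

```python
import math
from typing import List


def local_density(points: List[List[float]], dist_cutoff: float) -> List[int]:
    """rho_i = number of other points j with d_ij < dist_cutoff (cut-off kernel)."""
    n = len(points)
    rho = [0] * n
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) < dist_cutoff:
                rho[i] += 1
    return rho


pts = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0]]
print(local_density(pts, dist_cutoff=0.2))  # → [2, 2, 2, 0]
```

The O(n²) double loop is fine for formula libraries of modest size; a spatial index would be needed for much larger data sets.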
2. Calculate the distance $\delta_i$ of each point $i$
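The distance $\delta_i$ is the distance from point $i$ to the nearest point of higher density:
$$\delta_i = \min_{j:\, \rho_j > \rho_i} d_{ij}$$
For the point with the highest density, $\delta_i = \max_{j} d_{ij}$.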
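A sketch of the distance $\delta_i$, assuming the usual density-peak convention: $\delta_i$ is the distance to the nearest point with strictly higher density, and a point with no higher-density neighbor gets its largest distance to any other point. Tie-breaking between equally dense peaks is not handled in this sketch.

```python
import math
from typing import List


def peak_distance(points: List[List[float]], rho: List[float]) -> List[float]:
    """delta_i = min distance to a point of higher density; for a point with
    no higher-density neighbor, delta_i = max distance to any other point."""
    n = len(points)
    delta = [0.0] * n
    for i in range(n):
        higher = [math.dist(points[i], points[j]) for j in range(n) if rho[j] > rho[i]]
        if higher:
            delta[i] = min(higher)
        else:
            delta[i] = max(
                (math.dist(points[i], points[j]) for j in range(n) if j != i),
                default=0.0,
            )
    return delta


print(peak_distance([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]], [2, 1, 0]))
# → [5.0, 1.0, 4.0]
```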
3. Determine the cluster centers
The density peak algorithm defines as cluster centers those points that have both a large distance $\delta_i$ and a large local density $\rho_i$.
The density peak algorithm thus locates the cluster centers in the formula data and fixes the number of formula clusters; the K-Means algorithm is then used to obtain a more accurate clustering result.
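Selecting centers as points whose $\rho_i$ and $\delta_i$ are both large can be sketched with two thresholds, and the number of selected centers then gives the cluster count for K-Means. The thresholds `rho_min` and `delta_min` are hypothetical parameters (in practice they might be read off a plot of $\delta$ against $\rho$); the patent does not specify how "large" is decided.

```python
from typing import List


def select_centers(rho: List[float], delta: List[float],
                   rho_min: float, delta_min: float) -> List[int]:
    """Indices of points whose local density AND peak distance both exceed
    the (hypothetical, user-chosen) thresholds."""
    return [i for i, (r, d) in enumerate(zip(rho, delta))
            if r > rho_min and d > delta_min]


rho = [5, 4, 1, 6, 0, 5]
delta = [1.8, 0.3, 0.1, 2.0, 1.5, 0.2]
centers = select_centers(rho, delta, rho_min=3, delta_min=1.0)
print(centers, len(centers))  # → [0, 3] 2
```

Point 4 has a large $\delta$ but almost no neighbors, so it is treated as an outlier rather than a center; requiring both conditions is what separates the two cases.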
4. Obtain the clusters
The number of formula clusters obtained in the previous step is used as the number of initial cluster centers of the K-Means algorithm. The distance from each object to every cluster center is calculated, and each object is assigned to the cluster whose center is nearest; the cluster centers are then updated, and this is repeated until the centers no longer change or a set number of iterations is reached, which yields the clustering result. This completes the establishment of the formula clustering model.
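The K-Means step can be sketched as standard Lloyd iterations with K equal to the density-peak cluster count. Seeding K-Means with the density-peak centers themselves is an assumption here; the text only says that the number of clusters found by the density peak algorithm is used as the number of initial cluster centers.

```python
import math
from typing import List, Tuple


def kmeans(points: List[List[float]], init_centers: List[List[float]],
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Standard K-Means (Lloyd iterations) with K = len(init_centers)."""
    centers = [list(c) for c in init_centers]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        if new_centers == centers:  # stop once the centers no longer change
            break
        centers = new_centers
    return labels, centers


pts = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels, centers = kmeans(pts, init_centers=[[0.0, 0.0], [5.0, 5.0]])
print(labels, centers)  # → [0, 0, 1, 1] [[0.0, 0.5], [5.0, 5.5]]
```

Starting from density peaks rather than random seeds is one common motivation for this kind of fusion, since it removes the need to guess K and reduces sensitivity to initialization.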
3. Optimizing the display of clustering results with t-distributed stochastic neighbor embedding (t-SNE)
The steps above yield preliminary clustering results for the formula samples under different objectives.
However, for multi-objective, multidimensional data such as energetic-material formulations, the "crowding problem" arises easily: high-dimensional data cannot be faithfully mapped into a low-dimensional space, and the distribution of distances between points becomes increasingly unbalanced as the dimension grows. The t-SNE (t-distributed stochastic neighbor embedding) algorithm is therefore used to alleviate the crowding problem and improve the visualization of the clustering model's results.
4. Diversified visual interaction technology
Charts are drawn with the open-source ECharts charting library. The multi-objective formula clustering process is shown as an animation, and the multidimensional energetic-material data are displayed with tables, pie charts, line charts, column charts and parallel-coordinate charts, combined with interaction techniques such as box selection, clicking and linking. This helps the user set the formula cluster labels and thereby enables the recommendation of similar formulas.
Multi-objective, multidimensional formula clustering only partitions the formulas into categories; to achieve similar-formula recommendation, the user must assign a label to each cluster. The similar-formula system therefore provides visual interaction means such as clicking, box selection and linking to help the user set these labels.
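One way the labeled clusters could drive similar-formula recommendation is sketched below: candidates are the other formulas in the query formula's cluster. Ranking candidates by Euclidean distance, the `top_n` parameter and the label strings are assumptions for illustration, not details from the patent.

```python
import math
from typing import Dict, List, Tuple


def recommend_similar(query: int, points: List[List[float]], labels: List[int],
                      cluster_names: Dict[int, str],
                      top_n: int = 3) -> List[Tuple[int, str]]:
    """Return up to top_n (formula index, cluster label) pairs from the
    query formula's cluster, nearest first (distance ranking is assumed)."""
    same = [j for j, lab in enumerate(labels) if lab == labels[query] and j != query]
    same.sort(key=lambda j: math.dist(points[query], points[j]))
    name = cluster_names.get(labels[query], "unlabelled")
    return [(j, name) for j in same[:top_n]]


pts = [[0.0, 0.0], [0.0, 1.0], [0.0, 3.0], [5.0, 5.0]]
print(recommend_similar(0, pts, [0, 0, 0, 1], {0: "label-A", 1: "label-B"}))
# → [(1, 'label-A'), (2, 'label-A')]
```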
Scatter diagram
When the user clicks a node in the scatter plot, the system shows the formulas in the same cluster as the clicked node alongside the data contained in the other three clusters, together with information about the clicked node's cluster, for example: detailed information on all formulas in that cluster, the cluster label, and the main component contents, atom counts, sensitivity and performance within the cluster. The basic information of the formulas is thus presented visually.
Parallel graph
The parallel-coordinate plot is well suited to exploring multidimensional data. When the user brushes a range on one dimension of the plot, the linked formula information table, atom-count line chart, component-proportion rose chart and sensitivity histogram are updated to show only the brushed data.
Through these interactions, researchers can explore the formulas of different clusters under different objectives from multiple perspectives and set labels for each cluster accordingly.
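Brushing on the parallel coordinates amounts to a range filter applied on every brushed dimension; the sketch below shows only that filtering step, assuming records are dictionaries keyed by dimension name. Redrawing the linked table and charts (done with ECharts in the described system) is not shown.

```python
from typing import Dict, List, Tuple


def brush(records: List[Dict[str, float]],
          ranges: Dict[str, Tuple[float, float]]) -> List[Dict[str, float]]:
    """Keep records whose value lies inside every brushed [low, high] range.

    Dimensions without a brush are unconstrained; the linked views would
    then be redrawn from the returned subset.
    """
    return [r for r in records
            if all(lo <= r[dim] <= hi for dim, (lo, hi) in ranges.items())]


# Hypothetical normalized records (dimension names are placeholders).
recs = [{"v": 0.2, "i": 0.5}, {"v": 0.6, "i": 0.1}, {"v": 0.9, "i": 0.7}]
print(brush(recs, {"v": (0.5, 1.0), "i": (0.0, 0.5)}))  # → [{'v': 0.6, 'i': 0.1}]
```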
It is to be understood that the above-described embodiments of the present application are merely illustrative of or explanation of the principles of the present application and are in no way limiting of the application. Accordingly, any modification, equivalent replacement, improvement, etc. made without departing from the spirit and scope of the present application should be included in the scope of the present application. Furthermore, the appended claims are intended to cover all such changes and modifications that fall within the scope and boundary of the appended claims, or equivalents of such scope and boundary.
Claims (6)
1. The explosive formula rapid screening and recommending system is characterized by comprising an explosive formula rapid screening unit and an explosive similar formula recommending unit;
the explosive formula rapid screening unit comprises a basic resource database, a data processing module, a Mysql database, a parameter introducing module and a visual analysis module;
the basic resource database stores basic parameter information of the explosive formula;
the data processing module performs data cleaning, energetic knowledge collection and data segmentation, and sends the data to the Mysql database;
after reading the data, the Mysql database displays corresponding formula data in the high-dimensional multi-element parallel coordinates;
the parameter introducing module introduces different parameters of the explosive formula;
the visual analysis module analyzes the data and obtains a screening result;
the explosive similar formula recommending unit comprises a formula sample data preprocessing module, a formula clustering model establishing module and a visual interaction module;
the data preprocessing module is used for analyzing and preprocessing the component proportion, the molecular composition, the performance data and the sensitivity data of the formula;
the formula clustering model building module determines a clustering center and acquires class clusters;
the visual interaction module displays the screened data;
the operation method of the rapid screening and recommending system based on the explosive formula comprises the following specific steps:
S1, screening the explosive formula basic resource database and processing the screened data, the processing comprising data cleaning, energetic knowledge collection and data segmentation; and sending the data to the Mysql database;
S2, storing the processed data in the Mysql database; after the data in the database are read, the corresponding formula data are displayed in the high-dimensional multi-element parallel coordinates, and the number of axes in the parallel coordinates changes according to the dimensionality of the formula data;
S3, performing visual analysis in the high-dimensional multi-element parallel coordinates to obtain the screened explosive formulas, including parallel-coordinate collaborative interaction display and range-screening interaction display;
S4, preprocessing the formula sample data, including analyzing the formula component proportion data, component molecular formula data, formula performance data and formula sensitivity data;
S5, establishing a formula clustering model;
S6, displaying the screened data by the visual interaction module; storing the cluster labels and displaying the multidimensional energetic-material data to help the user set the formula cluster labels, thereby completing the similar-formula recommendation function;
in step S5, the clustering model is established by fusing the density peak algorithm and the K-Means algorithm;
establishing the clustering model comprises calculating the local density $\rho_i$ of each point, calculating the distance $\delta_i$ of each point, determining the cluster centers and obtaining the clusters;
the formula clustering model building method comprises the following steps of:
the distances between points are all calculated as Euclidean distances, expressed as:
$$d_{ij} = \sqrt{\sum_{k=1}^{m} \left(x_{ik} - x_{jk}\right)^{2}}$$
where $x_{ik}$ is the value of point $i$ in attribute dimension $k$ and $m$ is the number of attribute dimensions;
taking each formula as a point and the values of its multidimensional attributes as its coordinates in space, the following calculations are performed;
1. calculating the local density $\rho_i$ of each point $i$
the local density of a point is obtained by drawing a small circle centered on the point with radius $dist_{cutoff}$; the number of points inside the circle is the local density of the point;
wherein the local density $\rho_i$ is defined as:
$$\rho_i = \sum_{j \neq i} \chi\left(d_{ij} - dist_{cutoff}\right), \qquad \chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \geq 0 \end{cases}$$
wherein $dist_{cutoff}$ denotes the cut-off distance;
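2. calculating the distance $\delta_i$ of each point $i$
the distance $\delta_i$ from point $i$ to the nearest point of higher density is defined as:
$$\delta_i = \min_{j:\, \rho_j > \rho_i} d_{ij}$$
and for the point with the highest density, $\delta_i = \max_{j} d_{ij}$;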
3. determining cluster centers
The density peak algorithm defines those points with a larger distance and at the same time a larger local density as cluster centers; confirming the number of formula clusters by using a density peak algorithm;
4. obtaining the clusters
Taking the number of formula clusters obtained in the previous step as the number of initial cluster centers of the K-Means algorithm, calculating the distance from each object to each cluster center, and assigning each object to the cluster whose center is nearest, repeating until the cluster centers no longer change or a set number of iterations is reached, so that the corresponding clustering result is obtained;
the multi-target formula clustering process is displayed as an animation, assisted by a table, a pie chart, a line chart, a histogram and a parallel-coordinate chart, and the multidimensional energetic-material data are displayed in combination with box-selection, clicking and linking interaction techniques, so as to help the user set the formula cluster labels and thereby complete the similar-formula recommendation function.
2. The rapid screening and recommendation system for explosives formulations of claim 1, wherein the user can perform formulation screening by molecular formula, composition, density, detonation velocity, detonation pressure, detonation heat, friction sensitivity, and impact sensitivity, and the screening results are presented in a high-dimensional multi-element parallel graph.
3. The rapid screening and recommendation system for explosives formulations of claim 1, wherein the data preprocessing module performs data-format normalization, substitution-data calculation, data normalization and random data generation.
4. The rapid screening and recommendation system for explosive formulations according to claim 1, wherein the formulation clustering model building module is implemented by fusing the density peak algorithm and the K-Means algorithm.
5. The rapid screening and recommendation system for explosives formulations of claim 1, wherein the visual interaction module comprises a t-SNE, pie chart, scatter chart, parallel graph, and line chart.
6. The rapid screening and recommendation system for explosives formulations of claim 1, wherein the visual interaction module is configured with frame selection, clicking and association operations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010210459.XA CN111522867B (en) | 2020-03-23 | 2020-03-23 | Quick screening and recommending method and system for explosive formula |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111522867A CN111522867A (en) | 2020-08-11 |
CN111522867B true CN111522867B (en) | 2023-11-10 |
Family
ID=71910534
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010210459.XA Active CN111522867B (en) | 2020-03-23 | 2020-03-23 | Quick screening and recommending method and system for explosive formula |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111522867B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1117954A * | 1995-05-24 | 1996-03-06 | Han Zhuliang | Safety explosive and its making method |
WO2000029202A1 (en) * | 1998-11-13 | 2000-05-25 | Therics, Inc. | A computer-aided fabrication process for rapid designing, prototyping and manufacturing |
US8224764B1 (en) * | 2009-06-01 | 2012-07-17 | Gregory Albert Ouzounian | Method to predict homemade explosive formulation outcomes |
CN103605718A * | 2013-11-15 | 2014-02-26 | Nanjing University | Hadoop improvement based goods recommendation method |
KR101738571B1 (en) * | 2016-02-18 | 2017-06-08 | 주식회사 네비웍스 | Platform apparatus for firework display and control method thereof |
CN108460087A * | 2018-01-22 | 2018-08-28 | Beijing University of Posts and Telecommunications | Heuristic high dimensional data visualization device and method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108846066B * | 2018-06-06 | 2020-01-24 | Shanghai Development Center of Computer Software Technology | Visual data analysis method and system |
Non-Patent Citations (3)
Title |
---|
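A novel recommendation method based on social network using matrix factorization technique; Xu Chonghuan; Information Processing & Management; Vol. 54, No. 3; 463-474 * |
A clustering method fusing K-means and the fast density peak search algorithm; Sheng Hua et al.; Computer Applications and Software; 2016-10-15; Vol. 33, No. 10; 260-264+269 * |
Optimal design of theoretical formulations of liquid propellant for high-energy gas fracturing; Wang Anshi et al.; Journal of Xi'an Petroleum Institute (Natural Science Edition); 4-6+12 * |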
Also Published As
Publication number | Publication date |
---|---|
CN111522867A (en) | 2020-08-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liu et al. | Visualizing high-dimensional data: Advances in the past decade | |
Seo et al. | A rank-by-feature framework for unsupervised multidimensional data exploration using low dimensional projections | |
Griffith | Introduction: the need for spatial statistics | |
US20030112234A1 (en) | Statistical comparator interface | |
TW201510932A (en) | System and method for interactive visual analytics of multi-dimensional temporal data | |
CN111639243B (en) | Space-time data progressive multi-dimensional mode extraction and anomaly detection visual analysis method | |
Vijayarani et al. | Research in big data: an overview | |
Sun et al. | A five-level design framework for bicluster visualizations | |
CN101622619A (en) | Be used for navigating and the method and system of the data of visual relational database and/or multi-dimensional database | |
CN105843842A (en) | Multi-dimensional gathering querying and displaying system and method in big data environment | |
CN106605222A (en) | Guided data exploration | |
Keim et al. | Visualization | |
Sarkar et al. | Visual discovery and model-driven explanation of time series patterns | |
Nusrat et al. | Visualizing cartograms: Goals and task taxonomy | |
CN111522867B (en) | Quick screening and recommending method and system for explosive formula | |
Usman et al. | A data mining approach to knowledge discovery from multidimensional cube structures | |
Iñiguez-Jarrín et al. | Defining interaction design patterns to extract knowledge from big data | |
Horvat et al. | Big Data Architecture for Cryptocurrency Real-time Data Processing | |
Ahmed et al. | Steerable Clustering for Visual Analysis of Ecosystems. | |
Wang et al. | Stac: Enhancing stacked graphs for time series analysis | |
Sun et al. | Visitpedia: Wiki article visit log visualization for event exploration | |
Alkathiri et al. | Kluster: Application of k-means clustering to multidimensional GEO-spatial data | |
CA2944612C (en) | Systems and methods for ranking data visualizations | |
Cui et al. | Enhancing scatterplot matrices for data with ordering or spatial attributes | |
Wagstaff | Data clustering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||