CN111522867B

CN111522867B - Quick screening and recommending method and system for explosive formula

Info

Publication number: CN111522867B
Application number: CN202010210459.XA
Authority: CN
Inventors: 彭莉娟; 吴亚东; 张建军; 吴毅; 薛炜; 杨甜; 周阳; 胡浩
Original assignee: Southwest University of Science and Technology
Current assignee: Southwest University of Science and Technology
Priority date: 2020-03-23
Filing date: 2020-03-23
Publication date: 2023-11-10
Anticipated expiration: 2040-03-23
Also published as: CN111522867A

Abstract

The rapid screening and recommending system for explosive formulas comprises an explosive formula rapid screening unit and an explosive similar formula recommending unit. The rapid screening unit comprises a basic resource database, a data processing module, a MySQL database, a parameter introducing module and a visual analysis module. The similar formula recommending unit comprises a formula sample data preprocessing module, a formula clustering model building module and a visual interaction module. The data preprocessing module analyzes and preprocesses the component proportions, molecular composition, performance data and sensitivity data of the formulas; the formula clustering model building module determines the cluster centers and obtains the clusters. The application designs and implements the method and system by combining high-dimensional multivariate parallel coordinate visual interaction with data mining and visual analysis, and provides several convenient and flexible interaction modes that help users analyze and understand the data.

Description

Quick screening and recommending method and system for explosive formula

Technical Field

The application relates to the field of screening and recommending explosive formulations of energetic materials, in particular to a rapid screening and recommending method and a rapid screening and recommending system for explosive formulations.

Background

In 2016, China proposed an energetic materials genome research program, the Energetic Materials Genome Initiative (EMGI). Its goal is to fully exploit the interplay of databases, computation and experiment, and to use existing computing and big-data analysis technology to find the "genes" that determine the performance of energetic materials, and then to design and synthesize novel energetic materials from those genes. Future research and development of energetic materials is expected to emphasize innovation driven by multidisciplinary integration and to encourage constructive exchange between military and civilian technology.

Screening and recommending an explosive formula involves high-dimensional multivariate data such as the chemical properties, thermal properties, detonation properties and sensitivity of the formula. In the field of information visualization, the processing and visual analysis of high-dimensional multivariate data has long been a research focus. Between the mid-1980s and the early 1990s, Inselberg et al. proposed a visualization technique for such data called parallel coordinates, which is currently the dominant technique for visual analysis of high-dimensional multivariate data. In this technique, each data object is mapped onto mutually parallel attribute axes, and its attribute values on each pair of adjacent axes are connected, so that each object becomes a polyline through N data points. Solving the screening and recommendation problem quickly and rigorously requires deep interaction and cooperation between the user and the computer. In the prior art, the high-dimensional multivariate data involved in screening and recommendation are large in volume and heterogeneous, which causes great difficulty for researchers. Most traditional explosive formula screening and recommending methods rely on directly viewing the data and on the experience of researchers, seldom combine visualization with data mining, and therefore struggle to complete rapid screening and recommendation quickly and conveniently.

In order to solve the problems, the application provides a method and a system for rapidly screening and recommending an explosive formula.

Disclosure of Invention

(I) Object of the application

In the prior art, screening and recommending an explosive formula involves high-dimensional multivariate data such as the chemical properties, thermal properties, detonation properties and sensitivity of the formula, and the large volume and heterogeneity of these data cause great difficulty for researchers. To solve this problem, the application provides a method and a system for rapidly screening and recommending explosive formulations. The method and system are designed and implemented by combining high-dimensional multivariate parallel coordinate visual interaction with data mining and visual analysis, and provide several convenient and flexible interaction modes that help the user analyze and understand the data, thereby solving the problems of formulation screening and recommendation.

(II) technical scheme

In order to solve the problems, the application provides an explosive formula rapid screening and recommending system, which comprises an explosive formula rapid screening unit and an explosive similar formula recommending unit;

the explosive formula rapid screening unit comprises a basic resource database, a data processing module, a MySQL database, a parameter introducing module and a visual analysis module;

the basic resource database stores basic parameter information of the explosive formula;

the data processing module performs data cleaning, energetic-material knowledge collection and data segmentation, and sends the processed data to the MySQL database;

after the data are read from the MySQL database, the corresponding formula data are displayed in the high-dimensional multivariate parallel coordinates;

the parameter introducing module introduces different parameters of the explosive formula;

the visual analysis module analyzes the data and obtains a screening result;

the explosive similar formula recommending unit comprises a formula sample data preprocessing module, a formula clustering model establishing module and a visual interaction module;

the data preprocessing module is used for analyzing and preprocessing the component proportion, the molecular composition, the performance data and the sensitivity data of the formula;

the formula clustering model building module determines a clustering center and acquires class clusters;

the visual interaction module displays the screened data.

Preferably, a user can screen formulas by molecular formula, composition, density, detonation velocity, detonation pressure, detonation heat, friction sensitivity and impact sensitivity, and the screening result is displayed in a high-dimensional multivariate parallel coordinate plot.

Preferably, the data preprocessing module performs data format standardization, calculation of replacement data, data normalization and random data generation.

Preferably, the formula clustering model building module is implemented by fusing the density peak algorithm and the K-Means algorithm.

Preferably, the visual interaction module comprises a t-SNE view, a pie chart, a scatter plot, a parallel coordinate plot and a line chart.

Preferably, the visual interaction module is provided with frame selection, clicking and association operations.

The operation method of the rapid screening and recommending system based on the explosive formula comprises the following specific steps:

S1, screening the explosive formula basic resource database and processing the screened data, wherein the data processing comprises data cleaning, energetic-material knowledge collection and data segmentation, and sending the data to a MySQL database;

S2, storing the processed data in the MySQL database; after the data are read from the database, the corresponding formula data can be displayed in the high-dimensional multivariate parallel coordinates, and the number of axes in the parallel coordinates can be changed according to the dimension of the formula data;

S3, performing visual analysis in the high-dimensional multivariate parallel coordinates to obtain the screened explosive formulas, including parallel coordinate collaborative interaction display and range screening interaction display;

S4, preprocessing the formula sample data, including analyzing the formula component proportion data, formula component molecular formula data, formula performance data and formula sensitivity data;

S5, establishing a formula clustering model;

S6, displaying the screened data in the visual interaction module; storing the clustering result labels and displaying the multidimensional energetic material data to help the user set the formula cluster labels, thereby completing the recommendation of similar formulas.

Preferably, in S5, the clustering model is built by fusing density peaks and K-Means algorithm.

Preferably, building the clustering model includes calculating the local density $\rho_i$ of each point, calculating the distance $\delta_i$ of each point, determining the cluster centers and obtaining the clusters.

Preferably, the density peak algorithm defines those points with a larger distance and at the same time a larger local density as cluster centers; confirming the number of formula clusters by using a density peak algorithm;

the number of formula clusters obtained in the previous step is taken as the number of initial cluster centers of the K-Means algorithm; the distance from each object to each cluster center is calculated and compared, and each object is assigned to the cluster whose center is closest to it; this is repeated until the cluster centers no longer change or a set number of iterations is reached, which yields the clustering result.

The technical scheme of the application has the following beneficial technical effects. Through high-dimensional parallel coordinate visual interaction, the application provides a flexible way to set the range of each attribute of the high-dimensional multivariate explosive formulation data: the user interactively selects the performance attributes of interest and sets a numerical range for each, and the result of the rapid screening is displayed in multiple views. On this basis, the multidimensional formula data are analyzed and processed, a formula clustering model is established by combining data mining with visual analysis, and rich visual display and interaction methods are provided for the different clustering results, so as to solve the problem of recommending similar explosive formulas. Most traditional explosive formula screening and recommending methods rely on directly viewing the data and on the experience of researchers, which makes it difficult to complete researchers' tasks quickly and conveniently. The application designs and implements the method and system for rapidly screening and recommending explosive formulations by combining high-dimensional multivariate parallel coordinate visual interaction with data mining and visual analysis, and provides several convenient and flexible interaction modes that help users analyze and understand the data.

Drawings

FIG. 1 is an overall flow chart of the explosive formulation rapid screening and recommendation system of the present application.

Fig. 2 is a flowchart of the whole recommending unit of the explosive similar formula in the explosive formula rapid screening and recommending system provided by the application.

FIG. 3 is a flow chart of the establishment of the formula clustering model in the method and system for rapidly screening and recommending explosive formulations provided by the application.

Detailed Description

The objects, technical solutions and advantages of the present application will become more apparent by the following detailed description of the present application with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the application. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present application.

As shown in fig. 1-3, the explosive formulation rapid screening and recommending system provided by the application comprises an explosive formulation rapid screening unit and an explosive similar formulation recommending unit;

the visual analysis module analyzes the data and obtains a screening result;

the visual interaction module displays the screened data.

In an alternative embodiment, the user may screen formulations by molecular formula, composition, density, detonation velocity, detonation pressure, detonation heat, friction sensitivity and impact sensitivity, and the screening result is presented in a high-dimensional multivariate parallel coordinate plot.

In an alternative embodiment, the data preprocessing module performs data format standardization, calculation of replacement data, data normalization and random data generation.

In an alternative embodiment, the formula clustering model building module is implemented by fusing the density peak algorithm and the K-Means algorithm.

In an alternative embodiment, the visual interaction module includes a t-SNE view, a pie chart, a scatter plot, a parallel coordinate plot and a line chart.

In an alternative embodiment, the visual interaction module is provided with box selection, click and association operations.

s5, establishing a formula clustering model;

In an alternative embodiment, the cluster model is built by fusing density peaks and the K-Means algorithm in S5.

In an alternative embodiment, building the clustering model includes calculating the local density $\rho_i$ of each point, calculating the distance $\delta_i$ of each point, determining the cluster centers and obtaining the clusters.

In an alternative embodiment, the density peak algorithm defines those points with a larger distance and at the same time a larger local density as cluster centers, and the number of formula clusters is confirmed by the density peak algorithm;

Based on the explosive formula basic resource database, the application realizes rapid screening of energetic material formulas by using high-dimensional multivariate parallel coordinates together with multi-view, multi-dimensional linked visual interaction, which speeds up the selection of explosive formulas. Energetic material data differ from ordinary high-dimensional multivariate data: they are difficult to collect, and non-numerical discrete data do not map well onto parallel coordinate axes. The application therefore uses a mixture of test data and real data, and processes and filters the data so that multidimensional display remains possible when discrete data are present. After processing, the data are stored in a MySQL database. Once the data are read from the database, the corresponding formula data can be displayed in the high-dimensional multivariate parallel coordinates. The number of axes in the parallel coordinates can be increased or reduced according to the dimension of the formula data.
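The store-then-read step can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in `sqlite3` module stands in for the MySQL database, the table name `formulation`, its column names and the row values are hypothetical placeholders (not measured data), and the axis list is derived from whatever attribute columns the table has, which is one way the number of axes could follow the dimension of the data.

```python
import sqlite3

# Hypothetical placeholder rows: (name, density, velocity, pressure).
ROWS = [
    ("F-001", 1.70, 7800.0, 27.5),
    ("F-002", 1.82, 8600.0, 33.0),
]


def load_axes_and_records(conn):
    """Read all stored rows and derive one parallel axis per attribute column."""
    cur = conn.execute("SELECT * FROM formulation ORDER BY name")
    columns = [d[0] for d in cur.description]
    axes = [c for c in columns if c != "name"]  # every non-name column is an axis
    records = [dict(zip(columns, row)) for row in cur.fetchall()]
    return axes, records


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
conn.execute(
    "CREATE TABLE formulation (name TEXT, density REAL, velocity REAL, pressure REAL)"
)
conn.executemany("INSERT INTO formulation VALUES (?, ?, ?, ?)", ROWS)

axes, records = load_axes_and_records(conn)
print(axes)  # ['density', 'velocity', 'pressure']
```

Adding or dropping an attribute column in the table changes the length of `axes` without any change to the reading code.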

Parallel coordinates are a typical geometry-based multidimensional visualization technique that can clearly present data of all dimensions in one view. The main idea is to map an N-dimensional attribute space onto a two-dimensional plane through N equidistant parallel axes. Each axis represents one attribute dimension, and the values on the axis run uniformly from the minimum to the maximum of the corresponding attribute. Each data item is drawn as a polyline of N-1 line segments connecting its attribute values on the N parallel axes, and the N vertices at which the polyline meets the N axes represent the N-dimensional data of the item. This polyline representing N-dimensional data can be described by N-1 linearly independent equations, as follows:

$$\frac{x_1 - a_1}{k_1} = \frac{x_2 - a_2}{k_2} = \cdots = \frac{x_n - a_n}{k_n} \qquad (1)$$

From equation (1) it can be derived:

$$x_{i+1} = m_i x_i + b_i, \quad i = 1, 2, \ldots, n-1 \qquad (2)$$

where $m_i = k_{i+1}/k_i$ is the slope and $b_i = a_{i+1} - m_i a_i$ is the intercept on the $x_{i+1}$ axis in the $x_i x_{i+1}$ plane.
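The mapping from one N-dimensional data item to its polyline can be sketched as below. The function name `polyline_vertices`, the unit-square layout and the sample values are assumptions made for illustration; the text only specifies equidistant axes scaled from each attribute's minimum to its maximum.

```python
def polyline_vertices(record, axes, ranges, width=1.0, height=1.0):
    """Return the N (x, y) vertices of the polyline for one data item.

    record: dict mapping attribute name -> value
    axes:   ordered attribute names, one per parallel axis
    ranges: dict mapping attribute name -> (minimum, maximum)
    """
    n = len(axes)
    gap = width / (n - 1) if n > 1 else 0.0  # equidistant axes
    vertices = []
    for i, name in enumerate(axes):
        lo, hi = ranges[name]
        # Position along the axis, 0 at the minimum and 1 at the maximum.
        t = 0.0 if hi == lo else (record[name] - lo) / (hi - lo)
        vertices.append((i * gap, t * height))
    return vertices


# Hypothetical placeholder values, not measured data.
axes = ["density", "velocity", "pressure"]
ranges = {"density": (1.0, 2.0), "velocity": (6000.0, 9000.0), "pressure": (20.0, 40.0)}
record = {"density": 1.5, "velocity": 9000.0, "pressure": 20.0}

vertices = polyline_vertices(record, axes, ranges)
print(vertices)  # [(0.0, 0.5), (0.5, 1.0), (1.0, 0.0)]
```

Three axes give three vertices and therefore two line segments, matching the N-1 segments described above.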

Although parallel coordinates can show all dimensions and all data, they treat every dimension equally, so multiple groups of data are interleaved. When the data set is large, the number of polylines increases and they overlap heavily, causing visual clutter, so parallel coordinates alone can hardly complete the data analysis and visualization task.

The application therefore adopts multi-view collaborative visual analysis, in which several visualization techniques display the same data object in parallel views and are fused through interaction. Compared with traditional parallel coordinates, multi-view collaborative visual analysis shows the data more intuitively and from more angles, overcomes the limited visual throughput of a single view, and makes the whole visual analysis flow more logical. It combines the advantages of parallel coordinates, pie charts and radar charts, supports interpretation of the original data, and allows the data to be shown locally or side by side for comparison as needed. The parallel coordinate visual interaction supports a cyclic screening process from overview to detail and from detail back to overview. Through interaction, the user selects the explosive formula performance attributes of interest. Each performance attribute is represented by one of the mutually parallel coordinate axes, and each explosive formulation is drawn as a polyline passing through all the axes. The parallel coordinates make it convenient to view how the explosive formulation entries are distributed over the performance attributes, and the interaction makes it easy to switch between screening targets.

The user can brush any of the six coordinate axes (density, detonation velocity, detonation pressure, detonation heat, friction sensitivity and impact sensitivity), or several axes at once, and the brushing limits the range of each attribute. After brushing, the brushed range is highlighted, and clicking below the coordinate axis cancels the brush. At the same time, the main component chart and the detailed information table are updated. The user can screen formulas by molecular formula, component, density, detonation velocity, detonation pressure, detonation heat, friction sensitivity and impact sensitivity; the screening result is displayed in a high-dimensional multivariate parallel coordinate plot, together with the number of data items that meet the screening conditions and the proportions of the main components in the screened data.
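The effect of brushing several axes can be sketched as a range filter. The function names, the `main_component` field and all sample values are hypothetical; the sketch only assumes that a brush is an inclusive (low, high) range on one axis and that a record is kept when it falls inside every active brush.

```python
def apply_brushes(records, brushes):
    """Keep the records whose value lies inside every brushed range (inclusive)."""
    return [
        r for r in records
        if all(lo <= r[axis] <= hi for axis, (lo, hi) in brushes.items())
    ]


def main_component_share(records):
    """Proportion of each main component among the selected records."""
    counts = {}
    for r in records:
        counts[r["main_component"]] = counts.get(r["main_component"], 0) + 1
    total = len(records)
    return {k: v / total for k, v in counts.items()} if total else {}


# Hypothetical placeholder records, not measured data.
records = [
    {"name": "F-001", "main_component": "A", "density": 1.70, "velocity": 7800.0},
    {"name": "F-002", "main_component": "B", "density": 1.82, "velocity": 8600.0},
    {"name": "F-003", "main_component": "A", "density": 1.90, "velocity": 8900.0},
]
brushes = {"density": (1.75, 2.00), "velocity": (8000.0, 9000.0)}

selected = apply_brushes(records, brushes)
share = main_component_share(selected)
print(len(selected), share)
```

Cancelling a brush corresponds to removing that axis from `brushes`; with an empty dictionary every record is selected again.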

After the user brushes a coordinate axis or screens by range, a table of detailed information is generated below the parallel coordinates. By ticking the check boxes at the front of the table rows, the user can compare the selected formulas in a radar chart. Through range screening, main component display and comparative analysis, explosive formulations that meet the conditions can be rapidly screened from the high-dimensional multivariate data, which saves energetic material researchers a great deal of time.

According to the application, after the screened explosive formula is obtained through visual analysis in the high-dimensional multi-element parallel coordinates, analysis pretreatment is carried out on the component proportion, molecular composition, performance data and sensitivity data of the formula based on fusion of data mining and visual analysis technology, and a similar formula recommendation method is provided by combining with various visual interaction technologies, so that researchers can find similar formulas.

1. Formula sample data preprocessing

The application processes four kinds of multi-dimensional energetic material data, namely formula component proportion data, formula component molecular formula data, formula performance data and formula sensitivity data.

The raw formula component proportion and molecular formula data must first be converted to a uniform data format; part of the processed data is shown in Table 1:

TABLE 1 Part of the processed formulation component proportion data

For the formula component proportions, assume that components $C_1, C_2, \ldots, C_n$ are present in the sample data, where $n$ is the total number of components. The attribute dimension set of the formula component proportions is then $C = \{C_1, C_2, \ldots, C_n\}$, and the value in each dimension is the mass fraction of the corresponding component in the formula.

For the formula component molecular formulas, the elements of the atom set $A = \{\mathrm{C}, \mathrm{H}, \mathrm{O}, \mathrm{N}, \mathrm{Al}, \mathrm{F}, \mathrm{Cl}\}$ are used as attribute dimensions. For each element, the proportion of each component in the formula is multiplied by the number of atoms of that element in the component's molecular formula, and the products are summed to give the value of that dimension.
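One reading of the molecular formula attributes is a proportion-weighted atom count per element, sketched below. The component names `comp_a` and `comp_b`, their atom counts and their proportions are invented placeholders, and the weighting rule itself is an interpretation of the description rather than a confirmed definition.

```python
ATOMS = ["C", "H", "O", "N", "Al", "F", "Cl"]


def atom_attributes(fractions, atom_counts):
    """For each element, sum (component proportion x atoms of that element in the component)."""
    values = {atom: 0.0 for atom in ATOMS}
    for component, fraction in fractions.items():
        for atom, count in atom_counts[component].items():
            values[atom] += fraction * count
    return values


# Invented placeholder components, not real formulation data.
atom_counts = {
    "comp_a": {"C": 2, "H": 4, "N": 2, "O": 2},
    "comp_b": {"Al": 1},
}
fractions = {"comp_a": 0.8, "comp_b": 0.2}

attrs = atom_attributes(fractions, atom_counts)
print(attrs)
```

Elements that do not occur in any component keep the value 0.0, so every formula yields a vector of the same length, which the later clustering step requires.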

Let the detonation velocity be $v$ (m/s), the detonation pressure be $p$ (GPa) and the detonation heat be $h$ (kJ/kg); the attribute dimension set of the formula performance data is then $P = \{v, p, h\}$. Because the value ranges of the different performance dimensions differ widely, the formula performance data are normalized by dispersion (min-max) normalization.

Dispersion normalization is a linear transformation of the original data that maps the data into the interval [0, 1], so that the model can compare and weight indexes with different units or orders of magnitude. The formula is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the attribute over the sample data.

Let the impact sensitivity be $i$ (%) and the friction sensitivity be $f$ (%); the attribute dimension set of the formula sensitivity data is then $S = \{i, f\}$. Because few sensitivity values are recorded in the literature, missing formula sensitivity data are replaced by randomly generated test data in the interval (0, 1).
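The two preprocessing steps just described can be sketched as follows. The function names are hypothetical, missing values are assumed to be represented as `None`, and the sample numbers are placeholders; the random replacement is test data by construction and carries no physical meaning.

```python
import random


def min_max_normalize(values):
    """Dispersion (min-max) normalization of one attribute into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to normalize
    return [(v - lo) / (hi - lo) for v in values]


def fill_missing_sensitivity(values, rng):
    """Replace None entries with test values drawn from the open interval (0, 1)."""
    filled = []
    for v in values:
        if v is None:
            u = rng.random()  # random() returns a value in [0, 1)
            while u == 0.0:   # exclude the endpoint 0 to stay inside (0, 1)
                u = rng.random()
            v = u
        filled.append(v)
    return filled


velocities = [6000.0, 7500.0, 9000.0]  # placeholder values
normalized = min_max_normalize(velocities)
print(normalized)  # [0.0, 0.5, 1.0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
filled = fill_missing_sensitivity([0.4, None, 0.9], rng)
```

Passing the random generator in explicitly keeps the replacement reproducible, which helps when the same clustering run needs to be repeated.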

2. Method for establishing the formula clustering model

After the pretreatment is carried out on the formula sample data, a clustering model establishment method aiming at the formula is realized through fusing density peaks and a K-Means algorithm.

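All point distances below are calculated as Euclidean distances, expressed as:

$$d_{ij} = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2}$$

where $x_{ik}$ is the value of the $k$-th attribute of point $i$ and $m$ is the number of attribute dimensions.

The following calculations treat each formula as a point whose coordinates in space are the values of its multidimensional attributes.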

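1. Calculating the local density $\rho_i$ of each point

The local density of a point is obtained by drawing a small circle centered on the point with radius $dist_{cutoff}$; the number of points inside the circle is the local density of the point.

The local density $\rho_i$ is defined as follows:

$$\rho_i = \sum_{j \neq i} \chi(d_{ij} - dist_{cutoff}), \qquad \chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \geq 0 \end{cases}$$

where $d_{ij}$ is the Euclidean distance between points $i$ and $j$, and $dist_{cutoff}$ is the cutoff distance.

2. Calculating the distance $\delta_i$ of each point

The distance $\delta_i$ from a point to the nearest point of higher density is defined as follows:

$$\delta_i = \min_{j:\, \rho_j > \rho_i} d_{ij}$$

For the point with the highest local density, $\delta_i = \max_j d_{ij}$ is used by convention.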
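The two quantities can be computed directly from their definitions, as in the sketch below. It assumes the cutoff-kernel form of the local density (a count of other points closer than the cutoff) and the usual convention that the densest point takes its largest distance; the sample points and the cutoff value are arbitrary illustrations.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_density(points, cutoff):
    """rho_i: number of other points closer to point i than the cutoff distance."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and euclidean(p, q) < cutoff)
        for i, p in enumerate(points)
    ]


def delta_distance(points, rho):
    """delta_i: distance to the nearest point of higher density.

    Points with no denser point (the density maximum, including ties)
    take their largest distance to any other point instead.
    """
    delta = []
    for i, p in enumerate(points):
        higher = [euclidean(p, q) for j, q in enumerate(points) if rho[j] > rho[i]]
        if higher:
            delta.append(min(higher))
        else:
            delta.append(max(euclidean(p, q) for q in points))
    return delta


# Two well-separated groups of arbitrary sample points.
points = [(0, 0), (1, 0), (-1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
rho = local_density(points, cutoff=1.2)
delta = delta_distance(points, rho)
print(rho)    # [3, 1, 1, 1, 2, 1, 1]
print(delta)  # [11.0, 1.0, 1.0, 1.0, 10.0, 1.0, 1.0]
```

Only the first and fifth points combine a high density with a large distance, which is the signature of a cluster center used in the next step.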

3. determining cluster centers

The density peak algorithm defines those points with a larger distance and at the same time a larger local density as cluster centers.

The density peak algorithm thus finds the number of cluster centers in the formula data and confirms the number of formula clusters; the K-Means algorithm is then needed to compute a more accurate clustering result.
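How "larger distance and larger local density" is turned into a decision is not specified in the text. The sketch below assumes a simple pair of thresholds, `rho_min` and `delta_min`, both hypothetical parameters that a user would choose (for example from a decision graph of density against distance); the input lists are illustrative values.

```python
def pick_centers(rho, delta, rho_min, delta_min):
    """Indices of points whose density and distance both reach the thresholds."""
    return [
        i for i, (r, d) in enumerate(zip(rho, delta))
        if r >= rho_min and d >= delta_min
    ]


# Illustrative density and distance values for seven points.
rho = [3, 1, 1, 1, 2, 1, 1]
delta = [11.0, 1.0, 1.0, 1.0, 10.0, 1.0, 1.0]

centers = pick_centers(rho, delta, rho_min=2, delta_min=5.0)
k = len(centers)  # number of clusters handed to K-Means
print(centers, k)  # [0, 4] 2
```

Raising either threshold yields fewer centers and lowering it yields more, so the thresholds directly control the cluster count passed to the K-Means stage.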

4. Obtaining the clusters

The number of formula clusters obtained in the previous step is taken as the number of initial cluster centers of the K-Means algorithm. The distance from each object to each cluster center is calculated and compared, and each object is assigned to the cluster whose center is closest to it. This is repeated until the cluster centers no longer change or a set number of iterations is reached, which yields the clustering result and completes the formula clustering model.
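The K-Means stage can be sketched in a few lines. The text only states that the number of clusters comes from the density peak stage; seeding K-Means with the density peak points themselves, as done here, is an additional assumption, and the sample points are arbitrary.

```python
import math


def kmeans(points, initial_centers, max_iter=100):
    """Plain K-Means: assign each point to its nearest center, then recompute centers."""
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the closest center for every point.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep the center of an empty cluster
        if new_centers == centers:  # stop when the centers no longer change
            break
        centers = new_centers
    return labels, centers


points = [(0, 0), (1, 0), (-1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
seeds = [points[0], points[4]]  # assumed: the density peak points serve as seeds
labels, centers = kmeans(points, seeds)
print(labels)  # [0, 0, 0, 0, 1, 1, 1]
```

The loop ends either when the centers stop changing or after `max_iter` passes, mirroring the two stopping conditions given in the description.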

3. Optimized visualization of results based on t-distributed stochastic neighbor embedding

After the steps, clustering results of the formula samples under different targets are preliminarily obtained.

However, multi-target, multidimensional data such as energetic material formulations are prone to the "crowding problem": high-dimensional data cannot be reliably mapped into a low-dimensional space, and the distribution of distances between points becomes increasingly unbalanced as the dimension grows. The t-SNE (t-distributed stochastic neighbor embedding) algorithm is therefore used to alleviate the crowding problem and improve the display of the clustering results.
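For reference, the standard t-SNE formulation (as published for the general algorithm, not taken from this application) shows where the relief from crowding comes from. The similarity of two embedded points $y_i$ and $y_j$ in the low-dimensional map uses a heavy-tailed Student t kernel with one degree of freedom:

$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$

and the map is found by minimizing the Kullback-Leibler divergence between the high-dimensional similarities $p_{ij}$ and the low-dimensional similarities $q_{ij}$:

$$\mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

Because the t kernel decays more slowly than a Gaussian, moderately distant points can be placed farther apart in the map without a large penalty, which leaves room for nearby points and reduces crowding.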

4. Diversified visual interaction technology

The data are visualized with charts drawn using the open-source ECharts library, and the multi-target formula clustering process is shown as an animation. Tables, pie charts, line charts, bar charts and parallel coordinate plots, combined with interactions such as box selection, clicking and linking, display the multidimensional energetic material data and help the user set the formula cluster labels, which in turn completes the recommendation of similar formulas.

Multi-target, multidimensional formula clustering only divides the formulas into categories; to recommend similar formulas, the user must also assign a label to each clustering result. The similar formula recommending unit provides visual interactions such as clicking, box selection and linking to help the user set these labels.

Scatter plot

Clicking a node in the scatter plot makes the system show the data of the clicked cluster alongside the data of the other three clusters, together with the formulas in the same class as the clicked node and summary information for that class, for example the detailed information of all formulas in the class, the cluster label, and the main component content, atom counts, sensitivity and performance within the class. The basic information of the formulas is thus presented visually.

Parallel coordinate plot

The parallel coordinate plot is well suited to exploring multidimensional data. Brushing a dimension in the parallel coordinate plot updates the corresponding formula information table, atom count line chart, component proportion rose chart and sensitivity bar chart so that they show the brushed data.

Through the interaction, researchers can search for different types of formulas under different targets in multiple angles, so that the labels of the different types of formulas are set.
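Once clusters have been labelled, recommending similar formulas reduces to looking up the members of the same cluster. The sketch below assumes the clustering result is stored as a mapping from formula name to cluster id and that user-set labels are stored per cluster; the function name, the formula names and the label text are all hypothetical.

```python
def recommend_similar(name, labels, cluster_tags=None):
    """Return the user-set tag of the formula's cluster and the other formulas in it."""
    cluster = labels[name]
    similar = sorted(n for n, c in labels.items() if c == cluster and n != name)
    tag = (cluster_tags or {}).get(cluster)  # None if the cluster is not yet labelled
    return tag, similar


# Hypothetical clustering result and user-set label.
labels = {"F-001": 0, "F-002": 1, "F-003": 0, "F-004": 0}
cluster_tags = {0: "group A"}

tag, similar = recommend_similar("F-001", labels, cluster_tags)
print(tag, similar)  # group A ['F-003', 'F-004']
```

A formula in an unlabelled cluster still gets its list of similar formulas, with `None` returned in place of the tag.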

It is to be understood that the above-described embodiments of the present application are merely illustrative of or explanation of the principles of the present application and are in no way limiting of the application. Accordingly, any modification, equivalent replacement, improvement, etc. made without departing from the spirit and scope of the present application should be included in the scope of the present application. Furthermore, the appended claims are intended to cover all such changes and modifications that fall within the scope and boundary of the appended claims, or equivalents of such scope and boundary.

Claims

1. The explosive formula rapid screening and recommending system is characterized by comprising an explosive formula rapid screening unit and an explosive similar formula recommending unit;

the visual analysis module analyzes the data and obtains a screening result;

the visual interaction module displays the screened data;

S2, after the data are processed, the processed data are stored in a MySQL database; after the data are read from the database, the corresponding formula data can be displayed in the high-dimensional multivariate parallel coordinates, and the number of axes in the parallel coordinates can be changed according to the dimension of the formula data;

S5, establishing a formula clustering model;

S6, the visual interaction module displays the screened data; the clustering result labels are stored and the multidimensional energetic material data are displayed to help the user set the formula cluster labels, thereby completing the recommendation of similar formulas;

in S5, the clustering model is established by fusing the density peak algorithm and the K-Means algorithm;

establishing the clustering model includes calculating the local density $\rho_i$ of each point, calculating the distance $\delta_i$ of each point, determining the cluster centers and obtaining the clusters;

the formula clustering model building method comprises the following steps of:

the point distances are all calculated as Euclidean distances, expressed as:

$$d_{ij} = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2}$$

where $x_{ik}$ is the value of the $k$-th attribute of point $i$ and $m$ is the number of attribute dimensions;

each formula is taken as a point whose coordinates in space are the values of its multidimensional attributes, and the following calculations are performed;

1. calculating the local density $\rho_i$ of each point

the local density of a point is obtained by drawing a small circle centered on the point with radius $dist_{cutoff}$, the number of points inside the circle being the local density of the point;

the local density $\rho_i$ is defined as follows:

$$\rho_i = \sum_{j \neq i} \chi(d_{ij} - dist_{cutoff}), \qquad \chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \geq 0 \end{cases}$$

where $d_{ij}$ is the Euclidean distance between points $i$ and $j$, and $dist_{cutoff}$ is the cutoff distance;

2. calculating the distance $\delta_i$ of each point

the distance $\delta_i$ from a point to the nearest point of higher density is defined as follows:

$$\delta_i = \min_{j:\, \rho_j > \rho_i} d_{ij}$$

and for the point with the highest local density, $\delta_i = \max_j d_{ij}$ is used by convention;

3. determining cluster centers

the density peak algorithm defines those points with a larger distance and at the same time a larger local density as cluster centers; the number of formula clusters is confirmed by the density peak algorithm;

4. obtaining the clusters

the number of formula clusters obtained in the previous step is taken as the number of initial cluster centers of the K-Means algorithm; the distance from each object to each cluster center is calculated and compared, and each object is assigned to the cluster whose center is closest to it; this is repeated until the cluster centers no longer change or a set number of iterations is reached, which yields the clustering result;

the multi-target formula clustering process is displayed in an animation mode, a table, a pie chart, a line chart, a histogram and a parallel coordinate chart are assisted, and the multi-dimensional data of the energetic materials are displayed by combining frame selection, clicking and correlation visualization interaction technologies, so that a user is helped to complete formula clustering label setting, and further the recommendation function of similar formulas is completed.

2. The rapid screening and recommendation system for explosive formulations of claim 1, wherein the user can screen formulations by molecular formula, composition, density, detonation velocity, detonation pressure, detonation heat, friction sensitivity and impact sensitivity, and the screening results are presented in a high-dimensional multivariate parallel coordinate plot.

3. The rapid screening and recommendation system for explosive formulations of claim 1, wherein the data preprocessing module performs data format standardization, calculation of replacement data, data normalization and random data generation.

4. The rapid screening and recommendation system for explosive formulations of claim 1, wherein the formulation clustering model building module is implemented by fusing the density peak algorithm and the K-Means algorithm.

5. The rapid screening and recommendation system for explosive formulations of claim 1, wherein the visual interaction module comprises a t-SNE view, a pie chart, a scatter plot, a parallel coordinate plot and a line chart.

6. The rapid screening and recommendation system for explosive formulations of claim 1, wherein the visual interaction module is configured with box selection, clicking and association operations.