CN112148834A - Graph embedding-based high-risk food and hazard visual analysis method and system - Google Patents


Info

Publication number
CN112148834A
Authority
CN
China
Prior art keywords
food
graph
node
nodes
radar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010868082.7A
Other languages
Chinese (zh)
Other versions
CN112148834B (en)
Inventor
陈谊
张梦录
张清慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Technology and Business University
Original Assignee
Beijing Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Technology and Business University
Publication of CN112148834A
Application granted
Publication of CN112148834B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities

Abstract

The invention discloses a graph embedding-based visual analysis method and system for high-risk food and hazards. A food association network is constructed by taking foods as nodes and adding an edge between two foods when a common hazard is detected in both. The nodes of the food association network are vectorized, and the vectorized nodes are clustered so that nodes with similar structural features fall into the same subgraph. A visual design named Force-Radar takes radar charts that display the characteristic indexes of the subgraphs as nodes and the associations between subgraphs as edges, and uses a force-directed layout to form an overview of the whole network. It thereby shows the overview of the food association network, the structural features of the subgraphs, and the relationships between subgraphs. With the method and system, the food association network can be analyzed and explored interactively, high-risk foods and hazards can be found, and support is provided for food safety supervision.

Description

Graph embedding-based high-risk food and hazard visual analysis method and system
Technical Field
The invention relates to the technical fields of information visualization, graph model information mining and food safety, and in particular to a graph embedding-based visual analysis and exploration method and system for high-risk food and hazards.
Background
Food safety concerns everyone in daily life. As living standards and awareness of consumer rights continue to rise, people demand ever higher food safety and quality, and ensuring that people can eat safe food with confidence has become an important livelihood issue that receives wide attention from government, academia and industry. For this reason, the national food safety supervision departments carry out spot checks on various foods in various regions and obtain a large amount of detection data, which contain entities such as food names, food types and detection item names, together with the association relationships among them. Existing statistical analysis methods can quickly find the distribution characteristics of the detection data, but they struggle to show all the nodes and relationships of a large-scale data association network. Existing methods for exploring high-risk foods and high-risk detection items compare each detection result with the limit standard; they rarely use the associations between foods to find foods and detection items with potential risk. As a result, users find it hard to discover the entities, relationships and implicit association patterns that deserve attention, and it is hard to analyze and give early warning of potential food safety hazards.
Disclosure of Invention
To overcome the problems in the prior art, the invention provides a graph embedding-based visual analysis and exploration method and system for high-risk food and hazards.
The invention uses a graph model (network model) to visually analyze high-risk foods and the hazards in them. A graph model represents the association relationships among many real-world entities well, and is simple, understandable and easy to perceive. However, in a large network with many entities and complex relationships, it is hard for a person to find subgraphs with a particular structural feature pattern, and then to find the entities and relationships that deserve attention. Graph visualization presents the graph to the user in an intuitive form and greatly reduces the cognitive burden. Even so, as the network grows it remains very hard to explore the structure of a complex network and discover key entities and relationships by eye alone, and it is also hard to show all the nodes and relationships of a large-scale network in limited screen space. Graph embedding techniques vectorize the network nodes and embed the entire network into a vector space, so that various network features can be calculated more easily. Subgraphs with similar structural characteristics can then be found by calculation and analyzed further.
Therefore, the food safety detection results are modeled as a graph to construct a food association network, and graph embedding-based graph analysis and visual analysis techniques are used to analyze and explore the subgraphs of the food association network and the associations among foods, so that existing or potential risk foods and detection items can be found effectively and early warning can be given in time.
The invention provides a graph embedding-based visual analysis and exploration method and system for high-risk food and hazards, which help a user analyze and discover high-risk foods and hazards from the hazard detection results for foods and the hazard detection associations among foods. The method first constructs a food association network, taking foods as nodes and adding an edge between two foods when a common hazard is detected in both. It then vectorizes the nodes of the graph with the graph embedding model Struc2vec, whose vectorization rule is that nodes with similar structural features in the original graph should have embedding vectors that are as close as possible. Finally, it clusters the vectorized nodes with k-means, so that nodes with similar structural features are grouped into the same subgraph. To show the overview of the whole food association network, the structural features of the subgraphs and the relationships between subgraphs, the invention proposes a new visual design, Force-Radar, which takes radar charts displaying 5 characteristic indexes of the subgraphs as nodes and the associations between subgraphs as edges, and uses a force-directed layout to form an overview of the whole network. Exploring the food association network in this way helps a user examine its subgraphs and dig out potential high-risk foods and risk detection items from the associations among foods. The system provides several linked views, including scatter plots, radar charts, node-link diagrams, word clouds, pie charts, theme rivers, histograms and tables.
In addition, the system provides several interaction modes, such as filtering and highlighting, to help an analyst understand and analyze the food association network.
The technical scheme provided by the invention is as follows:
A graph embedding-based visual analysis and exploration method for high-risk food and hazards, characterized in that a food association network is constructed, the network nodes are vectorized by a graph embedding technique, and the whole network is embedded into a vector space; subgraphs with similar structural characteristics are found by calculation, and high-risk foods and hazards are then analyzed visually.
The method first constructs a food association network in which nodes represent foods and an edge is added between two foods when a common hazard is detected in both. The food association network is then vectorized with the graph embedding model struc2vec, the distance between every pair of node vectors is calculated, and the nodes are clustered (with k-means) into several node clusters, namely subgraphs, each of which is represented by a radar chart. The subgraph structural feature indexes GDC, GCC, GBC, GPR and GANDD are defined and displayed on the radar chart to represent the features of the subgraph. The radar charts representing the subgraphs are connected according to whether edges exist between nodes in different subgraphs. With the radar charts displaying the five structural feature indexes as nodes and the associations between subgraphs as edges, a force-directed layout generates an overview of the whole food association network, namely the Force-Radar view. A series of auxiliary views is designed to show subgraph attributes, and association analysis across the views is realized through interaction. The method specifically comprises the following steps:
A: Construct a food association network, cluster it into a plurality of subgraphs, and calculate and visualize the structural feature index values of each subgraph.
A1: Construct a food association network.
The food association network constructed from the abstracted food entity association data is represented as a graph $G = (V, E)$. The node set is $V = \{v_1, v_2, \dots, v_i, \dots, v_n\}$, where $v_i$ denotes a tested food. The edge set is $E = \{e_1, e_2, \dots, e_i, \dots, e_m\}$, where $e_i$ denotes an edge added between two tested foods that share the same detection item. Here $n$ is the total number of nodes in the graph and $m$ is the number of edges.
In a specific implementation, foods are taken as nodes, and an edge connects two nodes if a common hazard is detected in both foods. In this way, a food association network with 938 nodes and 105012 edges was constructed.
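The construction rule in A1 can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function name `build_food_network`, the `(food, hazard)` record format and the sample records are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_food_network(records):
    """Build an undirected food association network.

    records: iterable of (food, hazard) pairs, one per detection of a
    hazard in a food (hypothetical input format).
    Returns (nodes, edges) where edges maps a sorted food pair to the
    set of hazards the two foods have in common.
    """
    foods_by_hazard = defaultdict(set)
    nodes = set()
    for food, hazard in records:
        nodes.add(food)
        foods_by_hazard[hazard].add(food)

    edges = {}
    for hazard, foods in foods_by_hazard.items():
        # Every pair of foods sharing this hazard gets an edge.
        for u, v in combinations(sorted(foods), 2):
            edges.setdefault((u, v), set()).add(hazard)
    return sorted(nodes), edges


# Made-up sample data, for illustration only.
records = [
    ("rice", "cadmium"),
    ("wheat flour", "cadmium"),
    ("rice", "lead"),
    ("spinach", "lead"),
    ("honey", "chloramphenicol"),
]
nodes, edges = build_food_network(records)
```

Keeping the shared hazards on each edge is a design choice of this sketch; it makes it possible to color edges by detection item, as FIG. 2 describes.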
A2: Vectorize the food association network from A1 and perform subgraph clustering.
First, the nodes of the food association network are vectorized with the graph embedding technique Struc2vec, which preserves the structural feature similarity of the nodes before and after vectorization. The vectorized nodes are then clustered with a clustering method (k-means) to form several node clusters, that is, a set of different subgraphs.
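The clustering half of A2 can be illustrated with a plain k-means (Lloyd's algorithm) over embedding vectors. The sketch assumes the Struc2vec embeddings have already been computed elsewhere; the function names, the seed-based initialization and the toy two-dimensional vectors are assumptions for illustration.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, centroids):
    """Index of the centroid closest to the vector."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(vector, centroids[c]))


def kmeans(vectors, k, iterations=100, seed=0):
    """Cluster embedding vectors into k groups; returns one label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy stand-ins for node embeddings (real embeddings are higher-dimensional).
embeddings = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [5.0, 5.1], [5.1, 5.0]]
labels, centroids = kmeans(embeddings, k=2)
```

Each label value identifies one subgraph; the number of clusters `k` corresponds to the number of radar charts shown later in the Force-Radar view.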
A3: Define the subgraph structural feature indexes GDC, GCC, GBC, GPR and GANDD, and calculate their values for each subgraph obtained in A2; the values are displayed on a radar chart to represent the features of the subgraph.
[The formulas defining DC, CC, BC, ANND, PR, GDC, GCC, GBC, GPR and GANDD are images in the original publication and are not reproduced here.]
Here GDC denotes the degree centrality of the graph and reflects the average DC value of the nodes in the subgraph; $g_i$ denotes the $i$-th subgraph; $n$ denotes the number of nodes in the graph; $DC(i)$ denotes the degree centrality of node $i$; and $k_i$ denotes the number of neighbor nodes of node $i$. GCC, GBC, GPR and GANDD are, respectively, the average CC, BC, PR and ANND values of the nodes in the subgraph. CC denotes the closeness centrality of node $i$, where $d_{ij}$ is the distance between node $i$ and node $j$. BC denotes the betweenness centrality of node $i$, where $g_{st}$ is the number of shortest paths from node $s$ to node $t$ and $g^{i}_{st}$ is the number of those shortest paths that pass through node $i$. ANND denotes the average nearest-neighbor degree of node $i$, where $d_i$ and $d_j$ are the degrees of node $i$ and node $j$, $N$ is the total number of neighbor nodes of node $i$, and $a_{ij}$ is the weight between node $i$ and node $j$. PR denotes the PageRank importance ranking of node $i$, where $k_i^{out}$ is the out-degree of node $i$ and $a_{ij}$ is the weight between node $i$ and node $j$.
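As an illustration of how a subgraph index averages a node index, the sketch below computes GDC and GCC for two clusters of a toy graph. It rests on assumptions that the text does not confirm: degree centrality is normalized as $k_i/(n-1)$, closeness as the number of reachable nodes divided by the sum of shortest-path distances, node values are computed on the whole network, and the average is taken over the members of each cluster. The adjacency data and cluster assignment are made up.

```python
from collections import deque


def degree_centrality(adj):
    """DC(i) = k_i / (n - 1): neighbor count normalized by the maximum possible."""
    n = len(adj)
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


def closeness_centrality(adj):
    """CC(i) = (reachable nodes) / (sum of shortest-path distances), via BFS."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def subgraph_average(node_values, members):
    """Average a node-level index over the nodes of one subgraph."""
    return sum(node_values[v] for v in members) / len(members)


# Toy undirected graph and a hypothetical clustering into two subgraphs.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
clusters = {0: ["a", "b", "c"], 1: ["d"]}

dc = degree_centrality(adj)
cc = closeness_centrality(adj)
gdc = {cid: subgraph_average(dc, members) for cid, members in clusters.items()}
gcc = {cid: subgraph_average(cc, members) for cid, members in clusters.items()}
```

The same `subgraph_average` helper would apply to BC, PR and ANND once node-level values are available, giving the five axes of each radar chart.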
B: Visualize the subgraphs in the graph and the relationships between them. This comprises the following steps:
B1: Subgraph visualization. Each subgraph is represented by a radar chart; the more nodes the subgraph contains, the larger the area of its radar chart.
B2: Visualization of the relationships between subgraphs. With the radar charts as nodes, two radar charts are connected by an edge if edges exist between nodes of the corresponding subgraphs. The more connections (edges) there are between two subgraphs, the wider the edge between their radar charts.
B3: A force-directed layout is applied with the radar charts as nodes and the connections between radar charts as edges, generating the Force-Radar view.
In the Force-Radar view, each radar chart is treated as a charge and each edge between radar charts as a spring. The larger the area of a radar chart, the larger its charge; the wider the edge between two radar charts, the higher its spring constant. Based on Hooke's law and Coulomb's law, the force on each radar chart can be expressed by formula (11):
[Formula (11): image in the original publication, not reproduced here.]
In formula (11), $f_{uv}$ denotes the spring force between node $u$ and node $v$, $g_{uv}$ denotes the charge force between node $v$ and node $u$, and $N(v)$ denotes the neighbor nodes of node $v$.
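A minimal sketch of the force on one radar-chart node, assuming the common spring-electrical form: Coulomb repulsion from every other node plus Hooke attraction along each incident edge. The exact expression of formula (11) is not reproduced in this text, so the function `net_force`, its parameters (`rest_length`, `coulomb_k`) and the two-node example are assumptions.

```python
import math


def net_force(v, positions, charge, stiffness, rest_length=1.0, coulomb_k=1.0):
    """Net 2-D force on node v.

    positions: node -> (x, y)
    charge:    node -> charge (assumed to grow with radar-chart area)
    stiffness: frozenset({u, v}) -> spring constant (assumed to grow with edge width)
    """
    fx = fy = 0.0
    xv, yv = positions[v]
    for u, (xu, yu) in positions.items():
        if u == v:
            continue
        dx, dy = xv - xu, yv - yu
        dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
        ux, uy = dx / dist, dy / dist      # unit vector pointing from u to v

        # Coulomb's law: repulsion pushes v away from u.
        repulsion = coulomb_k * charge[u] * charge[v] / dist ** 2
        fx += repulsion * ux
        fy += repulsion * uy

        # Hooke's law: a stretched spring pulls v back toward u.
        k = stiffness.get(frozenset((u, v)))
        if k is not None:
            pull = k * (dist - rest_length)
            fx -= pull * ux
            fy -= pull * uy
    return fx, fy


positions = {"A": (0.0, 0.0), "B": (2.0, 0.0)}
charge = {"A": 1.0, "B": 1.0}
stiffness = {frozenset(("A", "B")): 1.0}
force_on_b = net_force("B", positions, charge, stiffness)
```

A layout would repeatedly move each node a small step along its net force until the positions settle; in this example the spring stretch outweighs the repulsion, so B is pulled toward A.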
The invention also provides a graph embedding-based visual analysis and exploration system for high-risk food and hazards, which is used to compare attribute differences between nodes in structurally different subgraphs of the food association network and to analyze high-risk foods and high-risk detection items. The system comprises a food association network construction module, a food association network visualization module and a visualization interface module.
The food association network construction module builds the abstracted food entity association data into a food association network, represented as a graph and subgraphs, and establishes the graph structural feature indexes. It comprises a graph construction sub-module, a vectorization and subgraph clustering sub-module, and a subgraph structural feature index sub-module. The food association network visualization module comprises a subgraph visualization sub-module, a sub-module for visualizing the relationships between subgraphs, and a Force-Radar view generation sub-module. The visualization interface module consists of eight views and a table.
In the visualization interface module, the views are an embedding view, a radar chart, a Force-Radar view, a node-link diagram, a word cloud, a pie chart, a theme river and a histogram. The discrete and continuous color schemes of the ColorBrewer tool (Mark Harrower and Cynthia A. Brewer. 2003. ColorBrewer.org: An Online Tool for Selecting Colour Schemes for Maps. The Cartographic Journal 40, 1 (2003), 27-37. doi: https://doi.org/10.1179/000870403235002042) are used to visually map the hazard level of food products, and a table shows details of the nodes representing different food products. The embedding view uses t-SNE (Laurens van der Maaten and Geoffrey E. Hinton. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research 9 (Nov. 2008), 2579-2605. URL: http://www.jmlr.org/papers/v9/vandermaaten08a.html) to reduce the vectorized network nodes from the high-dimensional space to a two-dimensional plane and shows them as a scatter plot, which is used to set the number of clusters (i.e. the number of radar charts in the Force-Radar view). The radar chart is used to analyze the differences in structural feature indexes among subgraphs. The Force-Radar view serves as an overview for analyzing the overall structure of the network and the relationships between subgraphs. The node-link diagram shows the specific structure of a subgraph. The word cloud is used to analyze the keywords of the foods represented by the nodes in each subgraph. The pie chart is used to analyze the number of foods at each hazard level in a subgraph. The theme river shows, by time attribute, the proportional distribution of hazard levels in foods from 2018 to 2019 in each subgraph. The histogram shows, by detection item, the proportional distribution of hazard levels in foods in different subgraphs. The table shows detailed information about the nodes.
Further, to analyze the degree of contamination, foods are classified into five contamination levels. The specific judgment conditions are given by formula (12):
[Formula (12): image in the original publication, not reproduced here.]
where p represents the detected level of a hazard in the food, S represents the maximum level of the hazard permitted in the food, and max:2 represents a hazard rating of 2 above the maximum limit.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a graph embedding-based visual analysis and exploration method and system for high-risk food and hazards. In addition, a series of interaction modes is designed, so that different attributes of the nodes in the graph can be analyzed in combination with the auxiliary views. The method makes it possible to find potential risk foods and detection items from the associations between foods. It is a general method and can also be applied to social network analysis, academic paper network analysis and similar tasks.
The graph embedding-based visual analysis and exploration method and system designed and implemented by the invention comprise an embedding view, a radar chart, a Force-Radar view, a node-link diagram, a word cloud, a pie chart, a theme river, a histogram and a table. They allow an analyst to analyze and discover high-risk foods and hazards from the hazard detection results for foods and the hazard detection associations among foods. By analyzing food detection data from January 2017 to December 2018 in combination with the auxiliary views, high-risk foods and high-risk hazards can be found effectively.
Drawings
FIG. 1 shows the workflow of the graph embedding-based visual analysis and exploration method and system for high-risk food and hazards in an example of the invention, which can be summarized in four steps: (1) Build a food association network. (2) Vectorize the network nodes using the graph embedding technique Struc2vec; the vectorized nodes should preserve the structural feature similarity of the nodes. (3) Reduce the dimension of the vectorized nodes with t-SNE and visualize them as a scatter plot to support manually setting the number of clusters (the number of subgraphs); cluster the vectorized nodes into different subgraph sets with k-means; and lay out radar charts in node-link form using a new visual design called Force-Radar, in which each radar chart displays the structural features of a subgraph so that the differences between subgraphs can be compared. (4) Use auxiliary views to help the user explore the details of the subgraphs.
FIG. 2 is a schematic view of the food association network structure, in which foods are nodes and an edge connects two nodes if a common hazard is detected in both foods. Nodes of different colors represent different foods, and edges of different colors represent different detection items.
FIG. 3 is an overview of the graph embedding-based visual exploration system for high-risk food and hazards: (a) a view displaying the network node embedding result; (b) radar charts for comparing the structural indexes of each subgraph; (c) the Force-Radar view giving an overview of the whole network; (d) a node-link view visualizing the specific structure of a subgraph; (e) a word cloud composed of food name keywords; (f) a pie chart showing the percentage of each hazard level of a detection item in a food; (g) a theme river displaying the detection frequency and hazard level of detection items in foods at different times; (h) a bar chart showing the detection frequency of detection items with different risk degrees; (i) a table displaying detailed information about the food data.
FIG. 4 (auxiliary views generated for subgraph id 7) and FIG. 5 (auxiliary views generated for subgraph id 8) show the external attributes of the two subgraphs with the largest structural difference.
FIG. 6 (auxiliary views generated for subgraphs id 0, id 2 and id 6) analyzes the external attributes of the three subgraphs with the smallest structural differences.
Detailed Description
The invention will be further described, by way of example, with reference to the accompanying drawings, without in any way limiting the scope of the invention.
The invention provides a graph embedding-based visual analysis and exploration method for high-risk food and hazards. The method considers the structural features of the different subgraphs in the food association network and the relationships among them, and helps a supervisory body find high-risk foods and high-risk detection items through visual analysis and exploration. It can be used for food risk early warning in the food safety field, as well as for citation analysis of academic papers, exploratory analysis of social networks and similar tasks.
The method first constructs a food association network in which nodes represent foods and an edge is added between two foods when a common hazard is detected in both. The food association network is then vectorized with the graph embedding model struc2vec, the distance between every pair of node vectors is calculated, and the nodes are clustered (with k-means) into several node clusters, namely subgraphs, whose features are represented by radar charts. Five indexes representing the structural features of a subgraph, GDC, GCC, GBC, GPR and GANDD, are defined and displayed on the radar chart. The radar charts representing the subgraphs are connected according to whether edges exist between nodes in different subgraphs: the more edges there are between two subgraphs, the wider the edge between the two radar charts, and the more nodes a subgraph contains, the larger the area of its radar chart. With the radar charts displaying the five structural feature indexes as nodes and the associations between subgraphs as edges, a force-directed layout generates an overview of the whole food association network, named the Force-Radar view. A series of auxiliary views is designed to display subgraph attributes, and association analysis across the views is realized through interaction. The method specifically comprises the following steps:
A: Construct a food association network, cluster it into a plurality of subgraphs, and calculate and visualize the structural feature index values of each subgraph.
A1: Construct a food association network.
The abstracted food entity association data may be represented as a graph $G = (V, E)$. The node set is $V = \{v_1, v_2, \dots, v_i, \dots, v_n\}$, where $v_i$ denotes a tested food. The edge set is $E = \{e_1, e_2, \dots, e_i, \dots, e_m\}$, where $e_i$ denotes an edge added between two tested foods that share the same detection item. Here $n$ is the total number of nodes in the graph and $m$ is the number of edges.
A2: Vectorize the food association network from A1 and perform subgraph clustering.
First, the nodes of the food association network are vectorized with the graph embedding technique struc2vec; the vectorized nodes are then clustered with a clustering method (k-means) to form several node clusters, that is, different subgraph sets.
A3: Define five indexes for measuring subgraph structural features, GDC, GCC, GBC, GPR and GANDD, and calculate their values for each subgraph obtained in A2.
Five indexes for measuring node importance are extended into five indexes for measuring subgraph structural features. First, the five indexes reflecting the importance of a node in the graph are introduced. DC (degree centrality) in formula (1) judges the importance of a node by its number of neighbor nodes, where $k_i$ is the number of neighbor nodes of node $i$. CC (closeness centrality) in formula (2) is based on the average distance between the current node and the other nodes, where $d_{ij}$ is the distance between node $i$ and node $j$. BC (betweenness centrality) in formula (3) is determined by the number of shortest paths passing through a node, where $g_{st}$ is the number of shortest paths from node $s$ to node $t$ and $g^{i}_{st}$ is the number of those shortest paths that pass through node $i$. ANND (average nearest-neighbor degree) in formula (4) measures the correlation between nodes, where $d_i$ and $d_j$ are the degrees of node $i$ and node $j$, $N$ is the total number of neighbor nodes of node $i$, and $a_{ij}$ is the weight between node $i$ and node $j$. PR (PageRank) in formula (5) is a ranking algorithm originally applied to ranking the importance of web pages, where $k_i^{out}$ is the out-degree of node $i$ and $a_{ij}$ is the weight between node $i$ and node $j$.
Extending the node importance measures to measures of subgraph structure gives formula (6), where GDC denotes the degree centrality of the graph and reflects the average DC value of the nodes in the subgraph, $g_i$ denotes the $i$-th subgraph, $n$ denotes the number of nodes in the graph, and $DC(i)$ denotes the degree centrality of node $i$. The other four indexes, GCC, GBC, GPR and GANDD, follow the same definition in formulas (7) to (10) and reflect the average CC, BC, PR and ANND values, respectively, of the nodes in the subgraph.
[Formulas (1) to (10) are images in the original publication and are not reproduced here.]
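For orientation, commonly used textbook forms of three of the node indexes, and of the subgraph average built on them, can be written with the symbols defined in the text. These are assumptions: the normalizations in the original formulas may differ, the subgraph average is taken here over the $|g_i|$ nodes of subgraph $g_i$, and $v$ is used for a node to avoid clashing with the subgraph index $i$. ANND and PR are left out because their weighted forms vary between references.

```latex
\begin{align}
DC(i) &= \frac{k_i}{n - 1} \\
CC(i) &= \frac{n - 1}{\sum_{j \neq i} d_{ij}} \\
BC(i) &= \sum_{s \neq i \neq t} \frac{g^{i}_{st}}{g_{st}} \\
GDC(g_i) &= \frac{1}{|g_i|} \sum_{v \in g_i} DC(v)
\end{align}
```

Under the same assumption, GCC, GBC, GPR and GANDD are obtained by replacing $DC$ in the last line with $CC$, $BC$, $PR$ and $ANND$.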
A4: Visualize the subgraph structural features. Each subgraph is mapped to a radar chart that displays the values of the five indexes defined in A3, namely GDC, GCC, GBC, GPR and GANDD, which describe the structural features of the subgraph.
B: Visualize the subgraphs in the graph and the relationships between them. This comprises the following steps:
B1: Subgraph visualization. Each subgraph is represented by a radar chart whose area is determined by the number of nodes in the subgraph: the more nodes, the larger the area.
B2: Visualization of the relationships between subgraphs. With the radar charts displaying the 5 characteristic indexes of the subgraphs as nodes, two radar charts are connected if edges exist between nodes of the corresponding subgraphs. The more connections (edges) there are between two subgraphs, the wider the edge between their radar charts.
B3: and (4) taking the Radar maps as nodes, and taking edges between the Radar maps as edges to carry out Force-directed layout, so as to generate a Force-Radar view. It regards each radar chart as a charge, and the edges between the radar charts are regarded as springs, wherein the larger the area of the radar chart is, the larger the amount of charge is, the wider the edges between the radar charts are, the higher the elastic coefficient is, and the force to which each radar chart is subjected can be expressed by equation (11), based on hooke's law fuvExpressing the spring force received between the node u and the node v based on Coulomb's law guvRepresenting the charge force between node v and node u, and N (v) representing the neighbor nodes of node v.
[Equation (11): formula image]
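Steps B1 to B3 can be sketched as follows. The sketch assumes that the force on a radar chart is the sum of spring forces from its neighbors plus charge forces from all other radar charts, that charge is proportional to the node count, and that spring stiffness is proportional to the number of cross-subgraph edges. The function names, `rest_len` and `k_rep` are hypothetical and do not come from the patent.

```python
import math
from collections import Counter

def aggregate(edges, cluster_of):
    """Collapse a clustered graph: node count per cluster, edge count per cluster pair."""
    size = Counter(cluster_of.values())
    width = Counter()
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:
            width[frozenset((cu, cv))] += 1
    return size, width

def net_force(v, pos, size, width, rest_len=1.0, k_rep=1.0):
    """Net 2-D force on radar chart v (positions are assumed to be distinct)."""
    fx = fy = 0.0
    for u in pos:
        if u == v:
            continue
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        d = math.hypot(dx, dy)
        ux, uy = dx / d, dy / d
        # Coulomb-like repulsion: charge taken proportional to node count.
        rep = k_rep * size[v] * size[u] / (d * d)
        fx -= rep * ux
        fy -= rep * uy
        # Hooke-like spring: stiffness taken proportional to edge width.
        k = width.get(frozenset((u, v)), 0)
        if k:
            pull = k * (d - rest_len)
            fx += pull * ux
            fy += pull * uy
    return fx, fy

# Hypothetical toy data: four foods in two subgraphs, two cross-subgraph edges.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1}
size, width = aggregate(edges, cluster_of)
force_on_0 = net_force(0, {0: (0.0, 0.0), 1: (2.0, 0.0)}, size, width)
```

A layout would repeatedly move each radar chart a small step along its net force until the positions settle; that iteration is omitted here.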
The invention also provides a graph embedding-based visual analysis and exploration system for high-risk food and hazards, which is used to compare the attribute differences between nodes in food association network subgraphs of different structures and to analyze high-risk foods and high-risk detection items. The system comprises a food association network construction module, a food association network visualization module and a visualization interface module. The food association network construction module constructs the abstracted food entity association data into a food association network, represented as a graph and subgraphs, and establishes the graph structural feature indexes; it comprises a graph construction submodule for the food association network, a vectorization and subgraph clustering submodule, and a subgraph structural feature index establishment submodule. The food association network visualization module comprises a subgraph visualization submodule, a submodule for visualizing the relationships among different subgraphs, and a Force-Radar view generation submodule.
The interface module of the system consists of eight views and a table. The views are an embedded view, a radar chart, a Force-Radar view, a node-link view, a word cloud, a pie chart, a theme river and a bar chart; discrete and continuous color schemes from ColorBrewer are used to map the hazard level of foods visually, and the table shows detailed information about the nodes representing different foods. The embedded view reduces the vectorized network nodes from the high-dimensional space to a two-dimensional plane with t-SNE and displays them as a scatter plot, to help the user set the number of clusters (the number of radar charts in the Force-Radar view). The radar chart is used to analyze the differences in structural feature indexes among different subgraphs. The Force-Radar view serves as an overview for analyzing the overall structure of the network and the relationships between subgraphs. The node-link view shows the specific structure of a subgraph. The word cloud is used to analyze the keywords of the foods represented by the nodes in each subgraph. The pie chart is used to analyze the number of foods at each hazard level in the subgraph. The theme river shows, by time attribute, the proportional distribution of hazard levels in foods from 2018 to 2019 in each subgraph. The bar chart counts, by detection item, the proportional distribution of hazard levels in foods in different subgraphs. The table shows detailed information about the nodes.
The visualization results are generated by the following steps:
C: Determine the number of subgraph clusters from the embedded view and set the value of K; the vectorized network nodes are clustered into K subgraphs by the K-means clustering method, with K set to 10.
D: Generate the Force-Radar view of B3 and the radar charts showing the network indexes of the subgraphs from the clustering result of C and the network indexes of A2.
E: Clicking a radar chart node in the Force-Radar view generates a node-link view visualizing the structure of that node, together with the word cloud, pie chart, theme river and bar chart of the network node attribute information.
F: Clicking a node in the node-link view displays, in table form, information such as the food name, detection item and pollution level represented by the node.
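Step C can be illustrated with a plain Lloyd's-algorithm K-means. This is a minimal sketch, not the system's implementation: in the patent the input vectors are the node embeddings and K is 10, whereas the points below are hypothetical 2-D vectors, and starting the centres at the first k points is an assumption made to keep the example deterministic.

```python
def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; centres start at the first k points (an assumption)."""
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centres

# Hypothetical embedded nodes: two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centres = kmeans(points, k=2)
```

Each resulting label identifies the subgraph (and hence the radar chart) to which a node belongs.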
In addition, the system provides linked highlighting and multi-view coordinated interaction for the user. When the user hovers the mouse over a radar chart representing a subgraph, the number of the radar chart, the number of nodes in the subgraph and the five structural feature index values are displayed. When a radar chart is clicked for analysis, a node-link view of the subgraph structure and auxiliary views for analyzing the node attributes are generated, and these views can be explored further by mouse hovering.
To analyze the degree of contamination of food, contamination is divided into five levels; the specific conditions are given in formula (12), where p denotes the detected content of the hazard in the food, S denotes the maximum content of the hazard allowed in the food, and max:2 denotes that the hazard level for exceeding the maximum limit is 2.
[Equation (12): formula image]
For the graph embedding-based visual analysis and exploration system for high-risk food and hazards, the specific embodiment of the invention analyzes 31849 records covering 1571 meat products and 47 detection items, with production dates from January 2017 to December 2018. The data attributes used in this study are food name, date of manufacture, detection item and food hazard level.
To analyze and discover detection items in foods with high hazard levels, the invention considers only detection data with a hazard level greater than 1. After screening, 1098 records covering 23 detection items and 938 foods were analyzed. To obtain a one-to-one correspondence between nodes and foods, detection records for the same food are merged. For example, if the lead hazard level detected in beef A is 2 and that detected in beef B is 4, the lead hazard level in beef is defined as 3, where beef A and beef B are the same food and are represented by a single node in the food association network. That is, the average is taken and rounded.
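The merging rule in the beef example can be sketched as follows. The record layout and the name `merge_levels` are hypothetical, and rounding half up is an assumption, since the text says only that the average is rounded.

```python
import math
from collections import defaultdict

def merge_levels(records):
    """Average hazard levels per (food, detection item), rounding half up."""
    grouped = defaultdict(list)
    for food, item, level in records:
        grouped[(food, item)].append(level)
    return {key: math.floor(sum(v) / len(v) + 0.5) for key, v in grouped.items()}

# Hypothetical records: (food, detection item, hazard level).
records = [("beef", "lead", 2), ("beef", "lead", 4), ("duck neck", "nitrite", 2)]
merged = merge_levels(records)
```

With these records, the two beef entries collapse into a single (beef, lead) entry with level 3, matching the example in the text.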
Each food is taken as a node, and if a common hazard is detected in two foods, an edge connects the two nodes. In this way, a food association network of 938 nodes and 105012 edges was constructed.
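The construction rule can be sketched as follows, assuming the merged detection data is available as a mapping from each food to the set of hazards detected in it. The sample data and the name `build_food_network` are hypothetical.

```python
from itertools import combinations

def build_food_network(hazards_by_food):
    """Return the edges: two foods are linked when they share a detected hazard."""
    edges = {}
    for a, b in combinations(sorted(hazards_by_food), 2):
        shared = hazards_by_food[a] & hazards_by_food[b]
        if shared:
            edges[(a, b)] = shared  # keep the shared hazards as an edge attribute
    return edges

# Hypothetical sample; the foods and hazards are illustrative only.
hazards_by_food = {
    "duck neck": {"nitrite"},
    "spicy duck wings": {"nitrite", "sorbic acid"},
    "fish jujube": {"sorbic acid"},
    "dried pork slice": {"chromium"},
}
edges = build_food_network(hazards_by_food)
```

Comparing every pair is quadratic in the number of foods, which remains manageable for a network of 938 nodes.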
The following example applies the graph embedding-based visual analysis and exploration method for high-risk food and hazards to meat product detection data from a region of China, covering January 2017 to December 2018, in order to explore high-risk foods and the hazards in them. Part of the raw data is shown in Table 1:
Table 1. Test results (partial) for hazards in meat products, January 2017 to December 2018

Food name                         Date of manufacture   Detection item   Hazard classification
Preserved meat                    2017/1/26             Fat              max:3
Preserved chicken leg             2017/9/3              Fat              max:4
Dried pork slice                  2017/9/7              Chromium (III)   max:2
Duck neck                         2017/10/5             Nitrite          max:2
Fish meat jujube                  2017/10/7             Sorbic acid      max:2
Spicy duck wings                  2017/10/8             Nitrite          max:2
Farmhouse sausage                 2017/10/13            Fat              max:2
Sausage prepared from two lakes   2017/11/2             Lead (II)        max:2

The method for analyzing the meat product detection data comprises the following specific steps:
A: Construct the food association network and cluster its nodes into a plurality of subgraphs according to the similarity of their structural features. The abstracted data can be expressed as G = (V, E), where V = {bacon, duck neck, ..., ham leg} is the node set of the graph and E is the edge set, containing food pairs such as (farmhouse sausage, ham leg). The network nodes are vectorized and then reduced to a two-dimensional space, as shown in fig. 3(a). According to the distribution of the scatter plot, the number of clusters is set to 10, and the structural feature information of the 10 subgraphs and the relationships among them are visualized, as shown in fig. 3(c); it can be seen directly that subgraph id 1 has the most nodes and that the edges between id 1 and id 8 are denser. To compare the structural feature information of the subgraphs clearly, the structural feature indexes of the 10 subgraphs are shown on a radar chart, as shown in fig. 3(b).
B: Analyze the attributes contained in the two subgraphs whose structural feature indexes differ most, to find high-risk foods and high-risk detection items.
Two sub-graphs (id 7 and id 8) with the largest difference in structural features are selected from the 10 sub-graphs, as shown in fig. 3(b), and then the attributes of the food contained in the two sub-graphs are analyzed through an auxiliary view.
The node attributes of the two subgraphs are displayed in the auxiliary views, as shown in fig. 4 and fig. 5; subgraph id 7 contains more foods with high hazard levels than subgraph id 8. The structures of the two subgraphs also differ greatly: id 8 is a connected graph, whereas the network of id 7 consists of five parts, as shown in fig. 4(a) and fig. 5(a). Subgraph id 7 represents a food association network of 45 food nodes whose edges are formed by 5 detection items, while id 8 represents a food association network of 37 food nodes whose edges are formed by 7 detection items.
Analysis of the food node attributes of subgraph id 7 shows that the food hazard levels in this subgraph are generally high, with almost half of them at level 5, as shown in fig. 4(c). In the four connected components with the fewest nodes, the hazard levels of the detection items in the foods are all 5. In addition, the subgraph has five detection items, and for four of them the hazard level detected in the foods is 5, as shown in fig. 4(e). It can be concluded that this subgraph contains a number of high-risk detection items, such as preservatives, benzoic acid, colony counts and carmine. Notably, the tail of the theme river, which represents the level and number of food hazards, tends to widen. This indicates that more unqualified foods may be detected in the future.
Subgraph id 8 is analyzed next. The auxiliary views show that the hazard levels detected in the foods of this subgraph are relatively low; a food with a hazard level of 5 was detected only in April 2018, as shown in fig. 5(d). However, fig. 5(a) shows that the nodes in the node-link view are relatively large, meaning that they have more connections to other nodes. After some nodes are selected for analysis, it is found that multiple hazards are detected in the food corresponding to each node; for example, as shown in fig. 5(f), 5 hazards are detected in the duck wings. It can be concluded that this subgraph contains a large number of high-risk foods that need to be strictly monitored. Because the food names include some brand information, the names are not shown; instead, all the foods are counted and their keywords are extracted. The word cloud shows that the keywords extracted from the food names contain not only information about food ingredients but also some geographic information.
C: Analyze the attributes contained in the three subgraphs whose structural feature indexes differ least, to find high-risk detection items present in specific foods.
Three sub-graphs (id 0, id 2 and id 6) with very close structural feature indexes are selected and the generated auxiliary views are analyzed.
Analysis shows that the three subgraphs are very dense connected graphs containing many nodes, as shown in fig. 6(d-f). Subgraph id 0 contains 141 foods and 4 detection items, subgraph id 2 contains 133 foods and 2 detection items, and subgraph id 6 contains 80 foods and 1 detection item. According to the construction rule of the food association network, id 6 can be judged to be a complete graph, because all of its foods share the single detection item and every pair of nodes is therefore connected. Subgraphs id 0 and id 2 are not complete graphs, but their edges are formed mainly by one hazard, as shown in fig. 6(a-b). It can be inferred that this hazard is present in specific foods, or that it is the only hazard in the subgraph frequently detected in those foods; for example, only cadmium is detected in beef, Tujia preserved hoof and chicken stick. Regulatory agencies should be reminded to tightly control the use of specific hazards in these specific foods.
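The complete-graph judgement can be checked by simple counting: a complete undirected graph on n nodes has n(n - 1)/2 edges. The sketch below applies this to the 80-node subgraph and, for comparison, to the whole network of 938 nodes and 105012 edges. The helper names are hypothetical, and the edge count of id 6 is derived from the formula, not stated in the text.

```python
def max_edges(n_nodes):
    """Number of edges in a complete undirected graph on n_nodes nodes."""
    return n_nodes * (n_nodes - 1) // 2

def density(n_nodes, n_edges):
    """Fraction of all possible node pairs that are actually connected."""
    return n_edges / max_edges(n_nodes)

edges_id6 = max_edges(80)                     # edges implied if id 6 is complete
whole_network_density = density(938, 105012)  # density of the full network
```

A subgraph whose observed edge count equals `max_edges` of its node count is complete; any smaller count means at least one pair of foods shares no detected hazard.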
In terms of food hazard level, the hazard levels in these three subgraphs are concentrated at 2 and 3, and none is particularly high. This helps regulatory agencies judge the current state of the food safety environment. It is therefore reasonable to believe that the detection items in these three subgraphs pose a low level of risk in these foods.
The example shows that users can find high-risk foods and high-risk detection items from the differences in structural feature indexes among subgraphs. When the structural feature indexes of subgraphs are close, their structures are also close and certain properties of the foods they contain are similar, which helps users analyze the hazards in the foods.
Furthermore, the visualization method of the present invention can also be applied to analysis of social networks, protein networks, and the like.
It is noted that the disclosed embodiments are intended to aid in further understanding of the invention, but those skilled in the art will appreciate that: various substitutions and modifications are possible without departing from the spirit and scope of the invention and appended claims. Therefore, the present invention should not be limited to the disclosure of the embodiments, and the scope of the present invention is defined by the appended claims.

Claims (9)

1. A visual analysis method for high-risk food and hazards based on graph embedding is characterized in that a food association network is constructed, network nodes are vectorized through a graph embedding technology, and the whole network is embedded into a vector space; sub-graphs with similar structural characteristics are found out through calculation, and then high-risk food and hazard visual analysis is carried out; the method comprises the following steps:
1) constructing a food association network, wherein nodes represent foods, and an edge is established between two foods when a common hazard is detected in both;
the food association network constructed from the abstracted food entity association data is represented as a graph G = (V, E), with node set V = {v_1, v_2, ..., v_i, ..., v_n}, where v_i denotes a tested food, and edge set E = {e_1, e_2, ..., e_i, ..., e_m}, where e_i denotes an edge added between two tested foods that share a detection item; n denotes the total number of nodes in the graph, and m denotes the number of edges in the graph;
2) vectorizing a food association network, clustering nodes by calculating the distance between every two node vectors in the food association network to form a plurality of node clusters, namely subgraphs;
3) defining structural characteristic indexes GDC, GCC, GBC, GPR and GANDD of subgraphs to be respectively expressed as formulas (6) to (10), and calculating to obtain the value of the structural characteristic index of each subgraph;
[Equations (1) to (10): formula images]
where GDC denotes the degree centrality of the graph and reflects the average DC value of the nodes in a subgraph; g_i denotes the i-th subgraph; n denotes the number of nodes in the graph; DC(i) denotes the degree centrality of node i, and k_i denotes the number of neighbor nodes of node i; GCC, GBC, GPR and GANDD are respectively the average CC value, average BC value, average PR value and average ANND value of the nodes in the subgraph; CC denotes the closeness centrality of node i, and d_ij denotes the distance between node i and node j; BC denotes the betweenness centrality of node i, g_st denotes the number of shortest paths from node s to node t, and g_st^i denotes the number of those shortest paths that pass through node i; ANND denotes the average nearest-neighbor degree of node i, where d_i and d_j denote the degrees of node i and node j, N denotes the total number of neighbor nodes of node i, and a_ij denotes the weight between node i and node j; PR denotes the PageRank importance of node i, where k_i^out denotes the out-degree of node i and a_ij denotes the weight between node i and node j;
4) displaying the structural characteristic index value of each subgraph obtained by calculation in the step 3) on a radar graph corresponding to the subgraph for representing the characteristics of the subgraph; connecting each radar graph representing the subgraph according to whether edges exist among nodes in different subgraphs to form the edges of each radar graph;
5) using a Radar graph displaying the structural characteristic indexes of the subgraphs as nodes, using the association between the subgraphs as edges, and adopting a Force-oriented layout method to generate an overview graph, namely a Force-Radar view, of the whole food association network;
in the Force-Radar view, each radar chart is treated as a charge and each edge between radar charts is treated as a spring; based on Hooke's law and Coulomb's law, the force on each radar chart is expressed as formula (11):
[Equation (11): formula image]
in the formula (11), fuvRepresents the spring force, g, received between node u and node vuvRepresenting the charge force between the node v and the node u, and N (v) representing the neighbor node of the node v;
6) designing a series of auxiliary views for showing sub-graph attributes, and realizing association analysis among a plurality of views through interaction;
through the process, graph-embedding-based visual analysis of high-risk food and hazards is achieved.
2. The graph embedding-based visual analysis method for high-risk food and hazards, according to claim 1, characterized in that step 2) specifically applies the graph embedding model struc2vec method to vectorize the food association network.
3. The graph embedding-based visual analysis method for high-risk food and hazards according to claim 2, characterized in that step 2) specifically uses the k-means clustering method to cluster the vectorized nodes into a plurality of node clusters, thereby obtaining the different subgraphs.
4. The graph-embedding-based visual analysis method for high-risk food and hazards, according to claim 1, characterized in that in step 4), the subgraph is represented by a radar graph, and the larger the number of nodes in the subgraph, the larger the area of the radar graph; the more edges that are connected between different subgraphs, the wider the edges between radar maps.
5. The graph embedding-based visual analysis method for high-risk food and hazards according to claim 4, wherein in step 5), the larger the area of a radar chart, the larger its electric charge; and the wider the edge between radar charts, the higher its spring constant.
6. The graph embedding-based visual analysis method for high-risk food and hazards according to claim 5, wherein the degree of contamination of the food is further visually analyzed by dividing it into five levels of contamination, which are expressed by the following formula (12):
[Equation (12): formula image]
wherein p represents the detected content of the hazard in the food, S represents the maximum content of the hazard allowed in the food, and max:2 represents that the hazard rating of exceeding the maximum limit is 2.
7. A graph embedding-based visual analysis system for high-risk food and hazards, comprising a food association network construction module, a food association network visualization module and a visualization interface module, and used for comparing the attribute differences between nodes in food association network subgraphs of different structures and analyzing high-risk foods and high-risk detection items;
the food association network construction module of the system is used for constructing the abstracted food entity association data into a food association network, represented as a graph and subgraphs, and for establishing the graph structural feature indexes; it comprises a graph construction submodule for the food association network, a vectorization and subgraph clustering submodule, and a subgraph structural feature index establishment submodule; the food association network visualization module of the system comprises a subgraph visualization submodule, a submodule for visualizing the relationships among different subgraphs, and a Force-Radar view generation submodule; the visualization interface module of the system comprises eight views and a table;
in the visualization interface module, the views comprise an embedded view, a radar chart, a Force-Radar view, a node-link view, a word cloud, a pie chart, a theme river and a bar chart; discrete and continuous color schemes from the ColorBrewer tool are used to map the hazard level of foods visually, and the table is used to display detailed information about the nodes representing different foods.
8. The graph embedding-based visual analysis system for high-risk food and hazards according to claim 7, wherein the embedded view reduces the vectorized network nodes from the high-dimensional space to a two-dimensional plane with t-SNE, displays them as a scatter plot, and is used for setting the number of clusters, namely the number of radar charts in the Force-Radar view; the radar chart is used for analyzing the differences in structural feature indexes among different subgraphs; and the Force-Radar view serves as an overview for analyzing the overall structure of the network and the relationships between subgraphs.
9. The graph embedding-based visual analysis system for high-risk food and hazards according to claim 7, wherein the node-link view shows the specific structure of a subgraph; the word cloud is used for analyzing the keywords of the foods represented by the nodes in each subgraph; the pie chart is used for analyzing the number of foods at each hazard level in the subgraph; the theme river shows, by time attribute, the proportional distribution of hazard levels in the foods of each subgraph; the bar chart counts, by detection item, the proportional distribution of hazard levels in foods in different subgraphs; and the table is used for showing detailed information about the nodes.
CN202010868082.7A 2020-08-24 2020-08-26 Graph embedding-based high-risk food and hazard visual analysis method and system Active CN112148834B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010855911 2020-08-24
CN2020108559118 2020-08-24

Publications (2)

Publication Number Publication Date
CN112148834A true CN112148834A (en) 2020-12-29
CN112148834B CN112148834B (en) 2022-03-29

Family

ID=73887587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010868082.7A Active CN112148834B (en) 2020-08-24 2020-08-26 Graph embedding-based high-risk food and hazard visual analysis method and system

Country Status (1)

Country Link
CN (1) CN112148834B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant