CN114357311A

CN114357311A - Force-directed graph layout method based on community discovery and cluster optimization

Info

Publication number: CN114357311A
Application number: CN202210029705.0A
Authority: CN
Inventors: 高天寒; 韩林珊
Original assignee: Northeastern University China
Current assignee: Northeastern University China
Priority date: 2022-01-12
Filing date: 2022-01-12
Publication date: 2022-04-15

Abstract

The invention discloses a force-directed graph layout method based on community discovery and cluster optimization, and relates to the technical field of visual layout of graph data. The method comprises the following steps: converting the original data into corresponding graph data; dividing the nodes of the graph data into leaf nodes and non-leaf nodes, regarding each non-leaf node as a community, and compressing the leaf nodes to obtain compressed graph data; performing, on the compressed graph data, the modularity-maximizing community discovery process of the first stage of the traditional Louvain algorithm; replacing the iterative community merging process of the second stage of the traditional Louvain algorithm with selective community merging on the graph data updated in the previous step; and applying the ComboForce layout algorithm to the resulting community structure and its corresponding graph data to realize a force-directed graph layout based on cluster optimization. The method improves both the layout efficiency and the layout effect of force-directed graph layout when it is used to lay out graph data visually.

Description

Force-directed graph layout method based on community discovery and cluster optimization

Technical Field

The invention relates to the technical field of visual layout of graph data, in particular to a force-directed graph layout method based on community discovery and cluster optimization.

Background

The visual layout can convert the data into graphs or images to be displayed on a screen and provide interaction, so that effective and valuable information in the data can be visually displayed, and the visual layout plays an important role in analyzing and mining the data. For different types of data, the visualization layout needs to convert it into different categories of graphics to form different types of charts.

Graph data is data that represents complex data entities and the relationships between them; it occurs in fields such as social networks, biological networks, mobile device communication networks and financial transaction networks. The data model of graph data can be viewed as a collection of nodes and edges, generally formulated as G = <V, E>, where V (vertex) represents the nodes in the data model, which embody the data entities and may carry entity attributes (expressed as key-value pairs), and E (edge) represents the edges in the data model, each of which runs from a starting node (source) to an ending node (target), represents a relationship between entities, and may carry attributes and a direction. Data relationship models for a variety of scenarios can be constructed from graph data, such as character relationship networks in literary works or social networks, financial transaction or personnel management networks, and triple-set relationship networks in natural language processing. For graph data, a commonly used visual layout is the node-link diagram layout, in which data entities are represented by nodes and entity relationships by the links between nodes.

Force-directed graph layout is a common node-link diagram layout method. A force-directed graph consists of nodes and the edges linking them. By establishing a mechanical model, a repulsive force is made to act between any two nodes so that they do not come too close to each other, and an attractive force is made to act between two nodes joined by an edge so that they do not drift too far apart, which improves the appearance of the node-link diagram. However, traditional force-directed graph layout has limitations in both layout efficiency and layout effect. It needs a certain amount of time for the mechanical model to move from disorder to a stable state; when the graph data is large and its relationships are complex, the model converges to a stable state only after prolonged oscillation, which limits layout efficiency. Likewise, when the graph data is large and its relationships are complex, the repulsive and attractive forces between nodes alone cannot prevent nodes and edges from overlapping, which degrades the layout effect. Optimizing force-directed graph layout is therefore an important research direction.
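The mechanical model described above can be illustrated with a minimal sketch of one layout iteration. The inverse-square repulsion, the linear spring attraction, and the constants k_rep, k_att and step are assumptions chosen for illustration; the source does not specify a force law.

```python
import math

def force_step(pos, edges, k_rep=1.0, k_att=0.1, step=0.05):
    """One iteration of a basic repulsion/attraction model (illustrative constants)."""
    disp = {v: [0.0, 0.0] for v in pos}
    nodes = list(pos)
    # Repulsion between every pair of nodes keeps them from crowding together.
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            u, v = nodes[a], nodes[b]
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            f = k_rep / (d * d)
            disp[u][0] += f * dx / d
            disp[u][1] += f * dy / d
            disp[v][0] -= f * dx / d
            disp[v][1] -= f * dy / d
    # Attraction along each edge keeps linked nodes from drifting apart.
    for u, v in edges:
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        disp[u][0] -= k_att * dx
        disp[u][1] -= k_att * dy
        disp[v][0] += k_att * dx
        disp[v][1] += k_att * dy
    # Move every node a small step along its net force.
    return {v: (pos[v][0] + step * disp[v][0], pos[v][1] + step * disp[v][1])
            for v in pos}
```

Repeating such a step until the displacements become small is what takes a long time on large, densely connected graphs: the repulsion loop alone visits every pair of nodes in each iteration.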

At present, optimization of force-directed graph layout mainly targets its mechanical model. For example, multidimensional scaling (MDS) is used to obtain an approximate solution for the Euclidean distances between nodes and thereby complete the node layout; a multilevel iterative layout algorithm (Multilevel Algorithm) first draws simplified graphs of representative nodes, then gradually adds the original nodes to improve precision, and finally generates the desired layout; and Chinese patent CN 107818149B, "a graph data visualization layout optimization method based on force guidance algorithm", adds methods for adjusting and stabilizing the positions of nodes and edges during layout to the layout rules of the conventional force-directed graph. However, these optimization methods have the following problems: 1. usually only the mechanical model is adjusted, and the geometric distance between nodes in the resulting layout often deviates from the path length between those nodes in the graph data, which affects the understanding and judgment of the graph data; 2. usually only the basic attributes of the nodes and edges in the graph data are used to adjust the mechanical model, and the information contained in the graph data is not analyzed or mined, so the final layout does not adequately display the results of analyzing the graph data.

Disclosure of Invention

In view of the problems in the prior art, the invention provides a force-directed graph layout method based on community discovery and cluster optimization, which aims to improve both the layout efficiency and the layout effect of force-directed graph layout when it is used to lay out graph data visually.

The technical scheme of the invention is as follows:

A force-directed graph layout method based on community discovery and cluster optimization comprises the following steps:

step 1: converting original data that needs to be analyzed by means of visual layout into corresponding graph data;

step 2: dividing the nodes in the original graph data obtained in step 1 into leaf nodes and non-leaf nodes, regarding each non-leaf node as a community, and compressing the leaf nodes to obtain compressed graph data;

step 3: performing, on the compressed graph data obtained in step 2, the modularity-maximizing community discovery process of the first stage of the traditional Louvain algorithm;

step 4: replacing the iterative community merging process in the second stage of the traditional Louvain algorithm with selective community merging on the graph data updated in step 3, thereby reducing the amount of calculation incurred in the second stage of the traditional Louvain algorithm;

step 5: applying the ComboForce layout algorithm to the community structure and the corresponding graph data obtained in step 4 to realize a force-directed graph layout based on cluster optimization.

Further, according to the force directed graph layout method based on community discovery and cluster optimization, the graph data is composed of nodes and edges.

Further, according to the force directed graph layout method based on community discovery and cluster optimization, the step 2 includes the following steps:

step 2.1: traversing each node in the original graph data, defining each node whose degree is not 1 as a non-leaf node and regarding each non-leaf node as a community, defining each node whose degree is 1 as a leaf node, and assigning each traversed leaf node to the community to which its neighbor node belongs;

step 2.2: traversing the number of nodes contained in each community, compressing the communities that contain more than 1 node, and updating the original graph data to the new graph formed by the compression in this step;

step 2.3: repeating step 2.1 and step 2.2 on the new graph, iterating until the graph contains no leaf node that needs to be compressed, that is, until the community assignment result no longer changes, and returning the graph data and the community assignment of each node at that point.

Further, according to the force directed graph layout method based on community discovery and cluster optimization, the method for compressing communities that contain more than 1 node is as follows: all nodes in such a community are replaced by a single new node, which inherits the weights of the edges inside the community and the weights of the edges leading out of the community.

Further, according to the force directed graph layout method based on community discovery and cluster optimization, the step 4 includes the following steps:

step 4.1: carrying out graph compression pretreatment on the community structure obtained in the community discovery process in the step 3 by adopting the graph compression method in the step 2;

step 4.2: regarding each node in the preprocessed graph data as an independent community, traversing each node, and classifying each traversed node as a seed node or a non-seed node;

step 4.3: assigning the traversed non-seed node to the communities of its neighbor nodes that belong to the seed node set, and performing the community assignment calculation based on the modularity variation ΔQ of the first stage of the traditional Louvain algorithm; if the calculation yields a new community assignment, recording the community structure of that assignment and then jumping to step 4.5, otherwise continuing with step 4.4;

step 4.4: assigning the traversed non-seed node to the communities of its neighbor nodes that belong to the non-seed node set, and performing the community assignment calculation based on the modularity variation ΔQ of the first stage of the traditional Louvain algorithm; if the calculation yields a new community assignment, recording the community structure of that assignment, otherwise keeping the community assignment of the non-seed node unchanged;

step 4.5: continuing to traverse the next non-seed node and returning to step 4.3, until all non-seed nodes in the set have been traversed and a community structure is obtained whose modularity Q no longer increases and is approximately maximal, and storing that community structure.

Further, according to the force directed graph layout method based on community discovery and cluster optimization, the method for distinguishing each traversed node into a seed node and a non-seed node comprises the following steps: judging whether the node is a seed node or not according to the degree of the node, defining the node meeting the judgment formula as the seed node, and defining the node not meeting the judgment formula as a non-seed node, wherein the judgment formula is as follows:

deg(v)>g+p

wherein v represents a node in the traversed graph data, deg(v) represents the degree of v, g represents the mean degree of the nodes in the graph data, and p represents the standard deviation of the degrees of the nodes in the graph data.

Further, according to the force directed graph layout method based on community discovery and cluster optimization, the step 5 includes the following steps:

step 5.1: adding a community object combos, which represents the community structure, to the data structure of the graph data obtained in step 1, and then storing the community structure and the corresponding graph data obtained in step 4;

step 5.2: configuring the web page that displays the visual layout;

step 5.3: creating a graph object that encapsulates the visual layout, setting its attributes, and designating ComboForce as the visual layout mode;

step 5.4: creating the shapes used in the graph object to realize the ComboForce layout, and setting their attributes;

step 5.5: adding user interaction events to the graph object and its shapes, so that users can adjust and interact with them;

step 5.6: loading the graph data graph obtained in step 5.1 into the graph object Graph, then rendering and displaying it for inspection.
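Step 5.1 can be sketched as follows: a combos list is derived from the community id stored on each node. This is a minimal illustration, assuming the graph data is a dictionary with nodes and edges lists and that each node carries its community id in a combo attribute; the helper name attach_combos and the label choice are hypothetical, and the field name the rendering library expects for node membership should be checked against its documentation.

```python
def attach_combos(graph):
    """Return a copy of the graph data with a `combos` list built from node `combo` values."""
    combo_ids = []
    for node in graph["nodes"]:
        cid = node.get("combo")
        # Keep each community id once, in order of first appearance.
        if cid is not None and cid not in combo_ids:
            combo_ids.append(cid)
    return {
        "nodes": [dict(n) for n in graph["nodes"]],
        "edges": [dict(e) for e in graph["edges"]],
        # One combo object per community; using the id as label is an assumption.
        "combos": [{"id": cid, "label": str(cid)} for cid in combo_ids],
    }
```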

Further, according to the force directed graph layout method based on community discovery and cluster optimization, the step 5.3 includes the following steps:

step 5.3.1: creating and initializing a graph object named Graph through the G6 Graph() function and setting its attributes, including: default values of the width and height attributes, which control the size of Graph; the fitView and fitViewPadding attributes, which fit Graph to the canvas; and the minZoom and maxZoom attributes, which bound the canvas zoom interaction; at the same time, defining an outline color array and a fill color array, to be used as attributes when the shapes in Graph are subsequently drawn;

step 5.3.2: designating ComboForce as the visual layout method of Graph and setting the relevant parameters of the ComboForce layout.

Further, according to the force directed graph layout method based on community discovery and cluster optimization, the step 5.4 includes the following steps:

step 5.4.1: creating the node shape and setting its attributes; the attributes of the node shape include: the shape type, the size, the style attributes and the label of the node shape;

step 5.4.2: creating the edge shape and setting its attributes; the attributes of the edge shape include: the shape type, the style attributes, the hit-detection width and the label of the edge shape;

step 5.4.3: creating the group (combo) shape and setting its attributes; the attributes of the group shape include: the shape type, the size-related attributes, the style attributes and the label of the group shape.

Compared with the prior art, the invention has the following beneficial effects:

(1) The force-directed graph layout selected by the method is a common and reliable node-link diagram layout method for visually laying out graph data; by applying cluster optimization to it, the method further improves its layout effect, so that an observer can more easily understand and analyze the information contained in the graph data expressed by the layout.

(2) The cluster optimization of the force-directed graph layout does not merely adjust the mechanical model of the layout; the adjustment is also based on analysis and mining of the graph data, so that the layout effect is improved, the optimization direction is more consistent, and the information contained in the graph data is displayed more fully.

(3) The Louvain algorithm selected in the method is a classic and reliable community discovery algorithm that can analyze and mine the information contained in the graph data. Its output, the community to which each node belongs, matches the grouping-based clustering used in the cluster optimization of the force-directed graph layout, and can therefore guide that optimization well.

(4) The method improves the traditional Louvain algorithm by applying the optimization idea of pruning: unnecessary calculation is avoided and only the calculation required by the data analysis is performed. Graph compression of leaf nodes reduces the amount of calculation, and selective community merging avoids the iterative process of the algorithm while producing a community structure better suited to visual layout, thereby improving both the efficiency of the Louvain algorithm and its effect when applied to visual layout.

(5) The graph database neo4j, the json format and the AntV G6.js technology selected in the method serve as technical tools that meet the design requirements of each step and realize its design idea; as reliable and general-purpose tools, they allow the method to be implemented on all platforms.

Drawings

FIG. 1 is a flow diagram of a force directed graph layout method based on community discovery and cluster optimization in accordance with the present invention;

FIG. 2 is a schematic diagram of the logic for generating, storing and managing data according to the embodiment;

FIG. 3 is a schematic diagram of an implementation flow of a conventional Louvain algorithm;

FIG. 4 is a schematic diagram of an implementation flow of the improved Louvain algorithm of this embodiment;

FIG. 5 is a schematic diagram illustrating a graph compression flow for a leaf node according to this embodiment;

FIG. 6 is a schematic diagram of the selective community merging process according to the present embodiment;

FIG. 7 is a schematic flow chart illustrating an implementation process of the clustering-optimized force-directed graph layout method according to this embodiment.

Detailed Description

To facilitate an understanding of the present application, the present application will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present application are given in the accompanying drawings. This application may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

FIG. 1 is a flow chart of the force-directed graph layout method based on community discovery and cluster optimization according to the present invention. As shown in FIG. 1, the method includes the following steps:

Step 1: converting original data that needs to be analyzed by means of visual layout into corresponding graph data;

In this embodiment, the original data is converted into graph data by defining the data structure and the attributes of the graph data, and the generated graph data is stored and managed with a neo4j database and a json file. Specifically, as shown in FIG. 2, each node (node) and edge (edge) of the graph data is defined according to the data entities and data relationships in the original data, which yields graph data (graph) composed of and expressed by a node set (nodes) and an edge set (edges); a data file storing the graph data (graph) is then generated in json format.

A node object needs to define the relevant attributes of the node according to the specific attributes contained in the data entity, and the key attributes are listed in table 1.

TABLE 1 Key Attribute information relating to nodes

A community, as referred to in Table 1, is a collection of closely related data entities. In this step, when id and label are defined, they are assigned and stored with specific values taken from the attributes of the data entity that the node represents; when the combo attribute is defined, it is assigned an empty initial value, to be filled in later with the result of the community discovery algorithm of the subsequent steps.

An edge (edge) object represents a data relationship between data entities and appears as a link between 2 nodes, so edge objects are represented and stored in the graph data (graph) as node pairs. The attributes of an edge are defined according to the specific attributes contained in the data relationship; the key attributes are listed in Table 2.

TABLE 2 Key Attribute information related to edge (edge)

Together, the generated nodes and edges constitute the graph data (graph); that is, graph is composed of and expressed by a node set (nodes) containing all nodes and an edge set (edges) containing all edges.

The graph data graph of this step can be generated and stored in the graph database neo4j using CQL statements. neo4j can also manage and query the stored graph with CQL statements, and the query results can be written to data files in json, csv, gexf, gml and other formats for storage and transmission. In this step the json format is selected: each object in graph and its key attributes (id and combo of nodes; source, target and weight of edges) are queried with CQL statements, and the simplified graph obtained from the query is written to a json file, which serves as the input data file for the community discovery algorithm in the subsequent steps.

The specific format of the graph formed in this step and stored in the json file can be illustrated with the pseudo triple data "Wang XX, identity, presiding judge" as an example. Here "Wang XX" and "presiding judge" are data entities, which form 2 pieces of node data in nodes of the graph, and "identity" is a data relationship, which forms the edge data in edges of the graph. The graph data graph finally takes the following form:
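As an illustration only, graph data of this shape could be built and serialized as in the sketch below. The attribute names id, label, combo, source, target and weight follow the description above; the id values n1 and n2, the edge label and the weight value 1 are assumptions, not values taken from the embodiment.

```python
import json

# Hypothetical graph data for the example triple; ids and weight are assumed values.
graph = {
    "nodes": [
        {"id": "n1", "label": "Wang XX", "combo": None},          # combo is empty until community discovery runs
        {"id": "n2", "label": "presiding judge", "combo": None},
    ],
    "edges": [
        {"source": "n1", "target": "n2", "label": "identity", "weight": 1},
    ],
}

# Serialize to the json text that would be written to the data file.
text = json.dumps(graph, ensure_ascii=False, indent=2)
```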

FIG. 3 is a schematic flow chart of the implementation of the conventional Louvain algorithm. The Louvain algorithm is a community discovery algorithm based on modularity (Modularity). As shown in FIG. 3, its core is the modularity Q and the modularity increment ΔQ; its implementation can be divided into 2 stages of iterative computation, and its goal is to maximize the modularity Q of the community structure (community network) of the whole graph data. FIG. 4 is a schematic diagram of the implementation flow of the improved Louvain algorithm provided by the present invention, which aims to improve the computational efficiency of the conventional Louvain algorithm and to optimize its community discovery result. FIG. 5 is a schematic diagram of the graph compression flow for leaf nodes, and FIG. 6 is a schematic diagram of the selective community merging flow; FIG. 5 and FIG. 6 show the key improvements of the improved Louvain algorithm provided by the present invention.

Step 2: dividing nodes in original graph data into leaf nodes and non-leaf nodes, regarding each non-leaf node as a community, and compressing the leaf nodes to obtain compressed graph data;

This step belongs to the first stage of the improved Louvain algorithm, as shown in FIG. 4, and is a key improvement over the conventional Louvain algorithm. Because the traditional Louvain algorithm must treat every node as an independent community when performing its modularity calculations, this step adopts the optimization idea of pruning: nodes whose community can be determined without any modularity calculation are removed, which narrows the set of nodes that must be calculated, reduces the number of modularity calculations in the subsequent steps, and improves algorithm efficiency.

The core idea of compressing the leaf nodes is as follows: a node with degree 1 in the graph data graph is defined as a leaf node (leaf). Since a leaf node can only belong to the community of the single neighbor node (neighbor) it is linked to, the leaf node is assigned directly to the community of that neighbor node, which realizes graph compression of the original graph data. The implementation flow of the graph compression is shown in FIG. 5. This step is described in detail below with reference to the figure:

Step 2.1: traversing each node in the original graph data graph; defining each node whose degree is not 1 as a non-leaf node and regarding each non-leaf node as a community (that is, the combo of the node is set to a new community id); defining each node whose degree is 1 as a leaf node and placing each traversed leaf node in the same community as its neighbor node (that is, the combo value of the node is set equal to the combo value of its neighbor node), in other words assigning the traversed leaf node to the community to which its neighbor node belongs. During the traversal, each community assigned in this step and the set of nodes it contains are recorded.
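Step 2.1 can be sketched as below. This is a minimal illustration, assuming edges are given as (u, v, weight) tuples, that a community id is simply the id of its non-leaf node, and that self-loops left by an earlier compression do not count toward the degree; the source does not state how such self-loops are counted, so that choice is an assumption, as is the function name.

```python
from collections import defaultdict

def assign_leaf_communities(nodes, edges):
    """Non-leaf nodes start their own community; degree-1 nodes join their neighbor's."""
    neighbors = defaultdict(list)
    for u, v, _w in edges:
        if u != v:  # assumption: self-loops are ignored when counting degree
            neighbors[u].append(v)
            neighbors[v].append(u)
    combo = {}
    # First pass: every node whose degree is not 1 becomes its own community.
    for n in nodes:
        if len(neighbors[n]) != 1:
            combo[n] = n
    # Second pass: every leaf takes the community of its single neighbor.
    for n in nodes:
        if len(neighbors[n]) == 1:
            nb = neighbors[n][0]
            combo[n] = combo.get(nb, nb)  # two mutually linked leaves share one community
    members = defaultdict(set)
    for n, c in combo.items():
        members[c].add(n)
    return combo, dict(members)
```

The returned members mapping is the record of communities and their node sets that step 2.2 traverses.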

Step 2.2: traversing the number of nodes contained in each community, compressing the communities that contain more than 1 node, and updating the original graph data graph to the new graph formed by the compression in this step;

The method for compressing a community that contains more than 1 node is as follows: all nodes in the community are replaced by a single new node, which inherits the weights of the edges inside the community and the weights of the edges leading out of it. The new node has a self-loop edge whose weight equals the sum of the weights of the original internal edges of the community it replaces. The new node also inherits the links between that community and other communities, and the weight of the edge for each such link equals the sum of the weights of the original edges that make up the link.
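The compression rule can be sketched as follows. This is a minimal illustration, assuming an undirected graph given as (u, v, weight) tuples, comparable node ids, and a combo mapping from each node to its community id; the function name is hypothetical. Communities with a single node map to themselves, so applying the rule to every community leaves them unchanged.

```python
from collections import defaultdict

def compress_communities(edges, combo):
    """Collapse each community into one node, summing edge weights per community pair."""
    merged = defaultdict(float)
    for u, v, w in edges:
        cu, cv = combo[u], combo[v]
        # Undirected graph: store each community pair in a canonical order.
        key = (cu, cv) if cu <= cv else (cv, cu)
        # cu == cv accumulates internal weight into the self-loop of the new node.
        merged[key] += w
    new_nodes = sorted(set(combo.values()))
    new_edges = [(a, b, w) for (a, b), w in sorted(merged.items())]
    return new_nodes, new_edges
```

For a hub h with leaves a and b (edge weights 1 and 2) assigned to community h, the two leaf edges become a self-loop on h with weight 3, while the edges from h to other communities keep their summed weights.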

Step 2.3: repeating step 2.1 and step 2.2 on the updated graph data, iterating until the graph contains no leaf node (leaf) that needs to be compressed, that is, until the community assignment result no longer changes, and returning the graph data graph and the community assignment of each node at that point.

Step 3: performing, on the compressed graph obtained in step 2, the modularity-maximizing community discovery process of the first stage of the traditional Louvain algorithm;

In a community structure with high modularity, nodes inside a community are highly similar to one another and dissimilar to nodes outside it, which makes such a structure a good community discovery result. To obtain a better community structure, the Louvain algorithm seeks the community assignment that maximizes modularity: by calculating the variation ΔQ of the modularity Q of the graph data graph and finding the maximum modularity variation max ΔQ, the community discovery result gradually approaches the community structure with maximized modularity. In the improved Louvain algorithm, this step follows the idea of the traditional Louvain algorithm to complete the modularity-maximizing community discovery process of the first stage. The step is described in detail below:

First, each node in the compressed graph data graph is regarded as an independent community. The nodes in the compressed graph data are traversed; each traversed node is assigned in turn to the community of each of its neighbor nodes (neighbor), and the variation ΔQ of the modularity Q is calculated for each assignment, that is, ΔQ = Q₂ - Q₁, where Q₁ and Q₂ are the modularity before and after the assignment, respectively. The maximum value max ΔQ among the calculated ΔQ values is then found. If max ΔQ > 0, the assignment that achieves max ΔQ increases the modularity Q of the compressed graph data graph by the largest amount, so the node is assigned to the community of the neighbor node that achieves max ΔQ, and the community structure of this assignment is recorded; if max ΔQ ≤ 0, no assignment that increases the modularity Q of the compressed graph data graph can be found, so the community assignment of the node remains unchanged. The calculation formula of the modularity Q is as follows:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

In the formula, i and j represent arbitrary nodes in the graph; A_ij represents the edge weight between nodes i and j; k_i (k_j) represents the sum of the weights of the edges connected to node i (node j); c_i (c_j) represents the community to which node i (node j) belongs; δ(c_i, c_j) indicates whether c_i and c_j are the same community, taking the value 1 if they are and 0 otherwise; m is the total number of edges in the network (the total edge weight for a weighted graph), m = (1/2) Σ_{i,j} A_ij.

After all nodes in the graph have been traversed and the community to which each node belongs no longer changes, all communities are compressed into equivalent nodes by the above method for compressing communities that contain more than 1 node; the new graph thus formed is returned, so that the graph data graph and its community structure are updated.

Step 4: replacing the iterative community merging process in the second stage of the traditional Louvain algorithm with selective community merging on the graph data updated in step 3;

This step belongs to the second stage of the improved Louvain algorithm and is a key improvement over the traditional Louvain algorithm, as shown in FIG. 5. In its second stage, the traditional Louvain algorithm repeatedly iterates its first-stage algorithm until the modularity Q no longer increases, thereby approaching and realizing community discovery with maximized modularity. Although this achieves community discovery with maximized modularity, the large number of iterations reduces computational efficiency. In addition, community discovery with maximized modularity is not necessarily the community discovery that gives the best visual layout effect: during iteration, the second stage of the traditional Louvain algorithm may merge some communities excessively while leaving others insufficiently merged, so that in the resulting community structure too many nodes of the graph data graph are assigned to the same community while other nodes are scattered sporadically across other communities, which also degrades the visual layout effect. Therefore, this step adopts the optimization idea of pruning and provides a selective community merging algorithm to replace the iterative community merging process in the second stage of the traditional Louvain algorithm; the calculations for community merging modes that harm the visual layout effect are removed and the calculation incurred by the iterative process is avoided, which improves algorithm efficiency.

The core idea of the selective community merging algorithm is as follows. The first stage of the improved Louvain algorithm has already produced a community structure consisting of a few prototypes of large communities (i.e., communities with more nodes) and a majority of scattered small communities (i.e., communities with fewer nodes). When the nodes of the graph data graph formed from this community structure are traversed to calculate and select the maximum modularity variation max ΔQ: to prevent excessive merging among large communities, a node representing a large-community prototype is skipped; to merge the scattered small communities, a node representing a small community is assigned only to the communities of its neighbor nodes (neighbor) before the modularity variation ΔQ is calculated; and because a community that is already a large-community prototype is more likely to lead to the optimal community structure, a node representing a small community is first assigned to the communities of those neighbor nodes that represent large-community prototypes to calculate ΔQ, and only if no ΔQ greater than 0 (that is, none that increases the modularity Q) is found is it assigned to the communities of its other neighbor nodes to calculate ΔQ. The implementation flow is shown in FIG. 6. This step is described in detail below with reference to the figure:

Step 4.1: Apply the graph compression method of Step 2 to the community structure obtained from the community discovery process of Step 3, as a graph compression preprocessing step.
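As an illustration of this preprocessing, the sketch below collapses every community into a single new node while carrying over the edge weights inside and between communities, in the spirit of the compression described in claim 4. The function name `compress_communities`, the edge-list input format, and the choice to store intra-community weight as a self-loop are assumptions made for the example, not details taken from the method itself.

```python
from collections import defaultdict


def compress_communities(edges, community_of):
    """Collapse each community into one super-node (hypothetical helper).

    edges: iterable of (u, v, weight) tuples for an undirected graph.
    community_of: dict mapping each node to its community id.
    Returns a dict mapping (a, b) with a <= b to the summed edge weight.
    Entries of the form (c, c) are self-loops that carry the weight of
    the edges that were inside community c.
    """
    compressed = defaultdict(float)
    for u, v, w in edges:
        cu, cv = community_of[u], community_of[v]
        key = (cu, cv) if cu <= cv else (cv, cu)
        compressed[key] += w
    return dict(compressed)


# Small example: nodes a, b form community 0; nodes c, d form community 1.
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 2.0), ("a", "c", 1.0)]
community_of = {"a": 0, "b": 0, "c": 1, "d": 1}
compressed = compress_communities(edges, community_of)
```

In this example the two edges that cross between the communities are merged into one edge of weight 2.0, and each community keeps its internal weight as a self-loop.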

Step 4.2: Regard each node in the preprocessed graph data as an independent community, traverse every node, and classify each traversed node as either a seed node or a non-seed node.

A node representing a prototype large community is defined as a seed node. Seed nodes are identified as follows: since each node represents a community, the degree of a node can be regarded as the number of its associations with other communities; the larger the degree, the more associations the node has with other communities, and the more readily it forms a large community containing many nodes. Whether a node is a seed node is therefore judged from its degree, using the following formula:

deg(v)>g+p

where v denotes a node in the traversed graph data, deg(v) denotes the degree of v, g denotes the mean of the node degrees in the graph, and p denotes the standard deviation of the node degrees in the graph. A node that satisfies the formula is defined as a seed node and stored in the seed node set; a node that does not satisfy it is defined as a non-seed node and stored in the non-seed node set.
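The judgment formula can be sketched in a few lines. The helper name `split_seed_nodes` and the sample degrees are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say whether the population or the sample form is intended.

```python
from statistics import mean, pstdev


def split_seed_nodes(degree):
    """Split nodes into seed and non-seed sets using deg(v) > g + p.

    degree: dict mapping node -> degree in the compressed graph.
    g is the mean degree; p is the standard deviation of the degrees
    (population form, chosen here as an assumption).
    """
    g = mean(degree.values())
    p = pstdev(degree.values())
    seeds = {v for v, d in degree.items() if d > g + p}
    non_seeds = set(degree) - seeds
    return seeds, non_seeds


# Hypothetical degrees: c1 is far better connected than the rest.
degree = {"c1": 6, "c2": 1, "c3": 1, "c4": 2, "c5": 1, "c6": 1}
seeds, non_seeds = split_seed_nodes(degree)
```

Here the mean degree is 2 and the standard deviation is about 1.83, so only `c1`, with degree 6, exceeds the threshold and becomes a seed node.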

Step 4.3: Tentatively assign the traversed non-seed node to the communities of its neighbor nodes that belong to the seed node set, and perform the community assignment calculation based on the modularity change ΔQ as in the first stage of the traditional Louvain algorithm. If the calculation yields a new community assignment, record the resulting community structure and jump to Step 4.5; otherwise, continue with Step 4.4.

Step 4.4: Tentatively assign the traversed non-seed node to the communities of its neighbor nodes that belong to the non-seed node set, and perform the community assignment calculation based on the modularity change ΔQ as in the first stage of the traditional Louvain algorithm. If the calculation yields a new community assignment, record the resulting community structure; otherwise, keep the community assignment of the non-seed node unchanged.

Step 4.5: Continue to the next non-seed node and return to Step 4.3, until all non-seed nodes in the set have been traversed. At this point, a community structure is obtained whose modularity Q no longer increases and is approximately maximal. This community structure is saved.
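Steps 4.2 to 4.5 can be sketched as a single pass over the non-seed nodes. This is a minimal sketch under stated assumptions rather than the patented implementation: the function name `selective_merge`, the dict-of-dicts graph format, the sorted traversal order, and the use of the standard Louvain gain for inserting node v into community C, ΔQ = k_v,in / m − Σ_tot · k_v / (2m²), are choices made for illustration. The graph is assumed to contain at least one edge.

```python
from collections import defaultdict


def selective_merge(adj, seeds):
    """One selective merging pass (illustrative sketch).

    adj: symmetric dict of dicts, adj[u][v] = edge weight.
    seeds: set of seed nodes, which are skipped during traversal.
    Returns a dict mapping each node to its community label.
    """
    # Weighted degree; a self-loop counts twice, as is conventional.
    k = {v: sum(w for u, w in nbrs.items() if u != v) + 2 * nbrs.get(v, 0.0)
         for v, nbrs in adj.items()}
    m = sum(k.values()) / 2.0          # total edge weight
    comm = {v: v for v in adj}         # Step 4.2: one community per node
    tot = dict(k)                      # total degree of each community

    for v in sorted(adj):
        if v in seeds:
            continue                   # prototype large communities are skipped
        links = defaultdict(float)     # weight from v to each neighboring community
        for u, w in adj[v].items():
            if u != v:
                links[comm[u]] += w
        old = comm[v]
        tot[old] -= k[v]               # take v out of its current community

        def gain(c):
            # Modularity gain of inserting v into community c.
            return links[c] / m - tot[c] * k[v] / (2.0 * m * m)

        base = gain(old)               # gain of simply staying put
        seed_side = {comm[u] for u in adj[v] if u in seeds and u != v}
        other_side = {comm[u] for u in adj[v] if u not in seeds and u != v}
        # Step 4.3 first (seed neighbors), then Step 4.4 (other neighbors).
        for candidates in (seed_side, other_side):
            candidates = candidates - {old}
            if not candidates:
                continue
            best = max(sorted(candidates), key=gain)
            if gain(best) - base > 0:
                comm[v] = best
                break
        tot[comm[v]] += k[v]
    return comm


# Hub h with four neighbors, plus a short tail a - e - f.
adj = {
    "h": {"a": 1, "b": 1, "c": 1, "d": 1},
    "a": {"h": 1, "e": 1},
    "b": {"h": 1}, "c": {"h": 1}, "d": {"h": 1},
    "e": {"a": 1, "f": 1},
    "f": {"e": 1},
}
comm = selective_merge(adj, seeds={"h"})
```

In this toy graph the nodes adjacent to the seed `h` join its community, while `e` has no seed neighbor and therefore falls through to the second rule, where it merges with `f` instead.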

Step 5: Apply the ComboForce layout algorithm to the community structure and corresponding graph data obtained in Step 4 to realize a force-directed graph layout based on cluster optimization.

Step 5.1: Add community objects (combos) representing the community structure to the data structure of the graph data obtained in Step 1, and store the community structure and corresponding graph data obtained in Step 4.

According to the community structure obtained by the improved Louvain algorithm, each node in the original graph data is assigned a specific value for its combo attribute, and a community set (combos) representing the community structure is added to the original graph data. The combos set contains the communities (combo objects) that represent node categories; the key attributes of a combo are listed in Table 3.
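The assignment of combo values to nodes can be sketched as follows. The field names `comboId`, `combos`, `id`, and `label` follow the data convention commonly used with G6 combos and are assumptions here, as are the helper name `attach_combos`, the `combo-` prefix, and the sample graph; the attributes actually used by the method are those of Table 3.

```python
import json


def attach_combos(graph, community_of):
    """Add combo membership to nodes and a combos list to the graph data.

    graph: dict with "nodes" (each having an "id") and "edges".
    community_of: dict mapping node id -> community id.
    """
    nodes = [dict(n, comboId=f"combo-{community_of[n['id']]}")
             for n in graph["nodes"]]
    combo_ids = sorted({n["comboId"] for n in nodes})
    combos = [{"id": cid, "label": cid} for cid in combo_ids]
    return {"nodes": nodes, "edges": list(graph["edges"]), "combos": combos}


graph = {
    "nodes": [{"id": "n1", "label": "n1"},
              {"id": "n2", "label": "n2"},
              {"id": "n3", "label": "n3"}],
    "edges": [{"source": "n1", "target": "n2"},
              {"source": "n2", "target": "n3"}],
}
community_of = {"n1": 0, "n2": 0, "n3": 1}
data = attach_combos(graph, community_of)
text = json.dumps(data, indent=2)  # string that could be written to a JSON file
```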

TABLE 3 Key attribute information contained in a Combo

The graph data updated in this step is stored in a file in JSON format, producing the data file that is read in subsequent steps by the cluster-optimized force-directed graph layout. The JSON object representing the graph data in the file finally takes the following form:

FIG. 7 is a flowchart of the implementation of the cluster-optimized force-directed graph layout method according to the embodiment. As shown in FIG. 7, this part includes the following steps:

Step 5.2: Configure the web page that displays the visual layout image.

In this step, the basic attributes of the web page are set, including the resources to be loaded, the page parsing and rendering mode, and the canvas region in which the visual layout image is drawn. The attributes of that canvas, such as width, height, and padding, are also set so that it fits the web page and the visual layout image is displayed properly.

Step 5.3: Create the image that encapsulates the visual layout, set its related attributes, and specify ComboForce as the visual layout mode.

In this step, the cluster-optimized force-directed graph layout and the visual layout image are implemented in the canvas using AntV G6.js. This step is described in detail below:

Step 5.3.1: Create and initialize an image object named Graph through the G6.Graph() constructor and set its attributes: the default values of the width and height attributes, which control the size of Graph; the fitView and fitViewPadding attributes, which adapt Graph to the canvas; and the minZoom and maxZoom attributes, which bound the canvas zoom interaction. At the same time, define an outline color array and a fill color array, which are used to set graphic attributes when the graphics in Graph are drawn later.

ComboForce is a force-directed graph layout method in G6.js. On top of the node elements and the edge elements linking them that a traditional force-directed layout contains, it adds grouping elements (Combos) representing community sets, and it optimizes the mechanical model and rendering mode of the force-directed layout accordingly. Within each Combo there is a gravity that controls the compactness of the cluster, working on the same principle as the force that controls the overall compactness of the layout (called layout gravity), and between Combos there is a force that prevents overlap. As a result, nodes in the same Combo are gathered together as closely as possible, different Combos overlap as little as possible, and rendering is grouped according to the Combo hierarchy, which realizes the cluster optimization of the force-directed graph.

To realize the ComboForce layout and ensure its effect, the layout parameters must be set. The center and the maximum number of iterations are set to control the layout result and the efficiency with which the mechanical model reaches a stable state. The edge length, node force, edge force, layout gravity, Combo inner padding, intra-Combo gravity, whether overlap detection is enabled, the force preventing node overlap, and the force preventing Combo overlap are set to adjust the overlap among nodes, edges, and Combos and the compactness of the layout. Attributes that are not set use the default values preset in G6.js.

Step 5.4: Create the graphics in the image that realize the ComboForce layout, and set their related attributes.

In this step, node graphics, edge graphics, and grouping graphics are created to represent the node elements, edge elements, and grouping elements of the mechanical model of the ComboForce layout, and their related attributes are set, as described in detail below:

Step 5.4.1: Create the node graphic and set its attributes, including: the shape of the node graphic (a circle by default); the size of the node graphic (a default size is set); the style attributes of the node graphic (default fill color, stroke color, shadow, and transparency are set); and the label of the node graphic (taken from the label attribute of the node element that the node graphic represents).

Step 5.4.2: Create the edge graphic and set its attributes, including: the shape of the edge graphic (a straight line by default); the style attributes of the edge graphic (default width, color, shadow, and transparency are set, and the arrow attribute of the edge graphic is disabled by default); the hit-detection width of the edge graphic, which controls the range within which mouse events are captured, because a thin edge graphic is otherwise difficult to click during user interaction; and the label of the edge graphic (taken from the label attribute of the edge element that the edge graphic represents).

Step 5.4.3: Create the grouping graphic and set its attributes, including: the shape of the grouping graphic (a circle by default); the relative size of the grouping graphic (a default minimum size is set; the default fixed size, fixSize, is left unset so that the actual size of the grouping graphic adapts to the distribution and size of the graphic elements inside it; and the default fixed collapsed size, fixCollapseSize, is set to control the size of the grouping graphic when it is collapsed); the style attributes of the grouping graphic (default shadow and transparency are set, and the outline color array and fill color array defined in Step 5.3.1 are assigned to the default stroke color and fill color, so that each grouping graphic takes the color at the array index given by the id attribute of the community combo that it represents); and the label of the grouping graphic (taken from the label attribute of the community combo that the grouping graphic represents).

Step 5.5: user interaction events are added to the images and the graphics, so that the user can adjust and interact with the images or the graphics. The various user interaction events added in this step will be described in detail below:

Text tooltip event: by setting the tooltip attribute of each graphic in Graph, the corresponding text label (i.e., the string data of the label attribute of the node graphic, edge graphic, or grouping graphic) is displayed when the mouse moves over that graphic.

Graphic drag event: by setting the drag-combo attribute for grouping graphics and the drag-node attribute for node graphics, node graphics and grouping graphics can be dragged with the mouse, so that the user can adjust the positions of graphics as needed.

Collapse and expand event of grouping graphics: by setting the collapse-expand-combo attribute of grouping graphics, the user can collapse and expand a grouping graphic, and can thereby filter out and restore the contribution of a grouping graphic and the node graphics inside it to the visual layout image.

Canvas zoom event: by setting the zoom-canvas attribute of the canvas, the user can control and adjust the zoom scale of the visual layout image in the canvas.

Highlight event: by setting the activate-relations attribute of node graphics, when the mouse moves over a node graphic, the opacity of that node graphic, of the node graphics directly linked to it, and of the edge graphics connecting them remains unchanged, while the opacity of all other graphics is reduced, producing a highlight effect.

Read the JSON file generated in Step 5.1, using the fetch() function as the resource acquisition interface to load the graph data from the file. After the graph data is obtained, use the data() function of Graph to map the nodes, edges, and combos in the graph data to the node graphics, edge graphics, and grouping graphics of the image Graph, respectively. Then call the render() function of Graph to render the image, so that the image generated by the cluster-optimized force-directed graph layout is displayed in the canvas region of the web page. Finally, judge from the drawing effect of each graphic element and of the overall layout structure of the image Graph whether optimization is needed: if so, return to the corresponding step to adjust the parameters that were set; if not, save and submit the layout.

It should be understood that various changes and modifications can be made by those skilled in the art without departing from the spirit and scope of the invention.

Claims

1. A force directed graph layout method based on community discovery and cluster optimization is characterized by comprising the following steps:

2. The method of claim 1, wherein the graph data is composed of nodes and edges.

3. The force directed graph layout method based on community discovery and cluster optimization according to claim 2, wherein the step 2 comprises the following steps:

4. The force directed graph layout method based on community discovery and cluster optimization according to claim 3, wherein the method for compressing communities with a number of nodes greater than 1 is as follows: all nodes in a community with more than 1 node are replaced by one new node, which inherits the edge weights inside the community and the edge weights between the community and the outside.

5. The force directed graph layout method based on community discovery and cluster optimization according to claim 3, wherein the step 4 comprises the steps of:

6. The force directed graph layout method based on community discovery and cluster optimization of claim 5, wherein the method of classifying each traversed node as a seed node or a non-seed node is: judging whether the node is a seed node according to the degree of the node, defining a node that satisfies the judgment formula as a seed node, and defining a node that does not satisfy the judgment formula as a non-seed node, wherein the judgment formula is as follows:

deg(v)>g+p

7. The method for force directed graph layout based on community discovery and cluster optimization according to claim 2, wherein said step 5 comprises the steps of:

8. The force directed graph layout method based on community discovery and cluster optimization according to claim 7, wherein said step 5.3 comprises the steps of:

9. The method for force directed graph layout based on community discovery and cluster optimization according to claim 8, wherein said step 5.4 comprises the steps of: