CN108052743B - Method and system for determining step approach centrality - Google Patents

Method and system for determining step approach centrality Download PDF

Info

Publication number
CN108052743B
CN108052743B CN201711349361.7A CN201711349361A CN108052743B CN 108052743 B CN108052743 B CN 108052743B CN 201711349361 A CN201711349361 A CN 201711349361A CN 108052743 B CN108052743 B CN 108052743B
Authority
CN
China
Prior art keywords
node
nodes
subgraph
pointer
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711349361.7A
Other languages
Chinese (zh)
Other versions
CN108052743A (en
Inventor
金海�
钱辰
于东晓
谢夏
王娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201711349361.7A priority Critical patent/CN108052743B/en
Publication of CN108052743A publication Critical patent/CN108052743A/en
Application granted granted Critical
Publication of CN108052743B publication Critical patent/CN108052743B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for determining the step approach centrality, wherein the method comprises the following steps: constructing a data graph of a target database; all nodes in the data graph are used as a residual node set; calculating the distances between all node pairs in the residual node set, and calculating the proximity centrality of all nodes through the distances and the weighting functions; selecting a node with the maximum approaching centrality from the residual node set, wherein the step approaching centrality of the selected node is the approaching centrality calculated in the current sub-graph, deleting the selected node from the residual node set, and deleting the edge associated with the selected node from the data graph to generate a new sub-graph; and judging whether the deleted residual node set is empty or not, if not, repeating the steps to continue the calculation until the residual node set is empty, and at the moment, calculating all the nodes to obtain the self step approach centrality. The step approach centrality index provided by the invention has better locality and anti-interference capability.

Description

Method and system for determining step approach centrality
Technical Field
The invention belongs to the field of graph data analysis in the field of computers, and particularly relates to a method and a system for determining step proximity centrality for large-scale data graph analysis.
Background
At present, a wide variety of graph data and graph structures are abundantly present in our lives, such as social networks, traffic network graphs, biological networks, financial networks, scientific data network graphs, and so on. With the continuous development of society, the scale of the data graphs is rapidly expanded, and the scale of the data graphs is increased day by day, so that the difficulty of directly analyzing the structure of the whole graph is very high. Recently, researchers have proposed many methods for detecting and evaluating communities and centrality measures in large-scale graphs. The nodes in the graph are sorted by detecting the key elements in the graph and according to the importance of the nodes, so that the important candidate nodes are tracked. The methods provide powerful tools for researchers in various fields to understand the composition, the function and the dynamic evolution process of a real system. However, the diversity of large-scale data graph structures determines that the centrality measurement itself is a very difficult problem, and therefore, how to evaluate and detect the centrality of nodes in the graph, to provide more accurate measurement indexes and calculation algorithms, and to perform functional explanation on the detected modules and the centrality nodes is still a current very challenging task.
Various classical node centrality indexes and calculation methods have been designed at present, and the application of the classical node centrality indexes and calculation methods in large-scale graph data analysis is shown. However, most data graphs have small world characteristics, and the centrality index obtained by global calculation is often interfered by a small part of nodes with larger centrality. Therefore, how to describe the characteristics of the nodes in a local scope becomes a problem to be solved urgently.
Disclosure of Invention
Aiming at the defects or improvement requirements of the prior art, the invention provides a method and a system for determining the step approach centrality, so that the technical problem that the centrality index obtained by global calculation of the traditional node centrality index and calculation method is often interfered by a small part of nodes with larger centrality is solved.
To achieve the above object, according to one aspect of the present invention, there is provided a step approach centrality determining method including:
(1) establishing a data graph G (V, E) or G (V, E, W) by taking all data units in a target database as nodes, taking the association relation among the data units as edges and taking the strength of the association among the data units as a weight, wherein an edge set E represents a set of all edges in the data graph, a point set V represents a set of all nodes in the data graph, and a weight set W represents the weight of each edge in the edge set E;
(2) initializing current subgraph number l-0 and current residual node set V'lV, current remainder set E'lE, by V'lAnd E'lConstruction of a Generation subgraph Gl(V′l,E′l) Or Gl(V′l,E′l,W′l) Wherein, W'lRepresents E'lA weight set of the middle edges;
(3) generating subgraph GlCalculating to obtain the current residual node set V'lDistance d between each node pairl(u,v);
(4) According to the weighting function and the distance d between each pair of nodes obtained by calculationl(u, v) generating subgraph GlCalculating to obtain the current residual node set V'lThe approximate centrality of each node in the cluster;
(5) finding out the current residual node set V'lMiddle-approach most centrality node set Vl *
(6) From the current set V of nodes remaining'lTo be contained in the node set Vl *Deleting the nodes in the node group to obtain a next-level residual node set V'l+1
(7) Judging the residual node set V 'of the next level'l+1If the current state is not null, executing the step (8), otherwise, executing the step (9);
(8) from the current remaining edge set E'lWill be included in the edge set
Figure BDA0001509939240000021
Deleting the edges to obtain a remaining edge set E 'of the next level'l+1Then through the remaining node set V 'of the next stage'l+1And the remaining edge set E 'of the next stage'l+1Construction of a novel generative subgraph Gl+1Generating subgraph G at said new generatorl+1In the method, the residual node set V 'of the next stage is obtained through calculation'l+1Distance d between each node pairl+1(u, v) when the subgraph number l is l +1, and returningThe step (4) is executed in which,
Figure BDA0001509939240000032
representing the generated subgraph GlMiddle Vl *The adjacent side of (2);
(9) and obtaining the step approach centrality of all the nodes in the data graph G.
Preferably, in step (3), if the generation subgraph GlAs an authorized graph Gl(Vl',E'l,Wl') calculating the current residual node set V by adopting Dijkstra algorithml' distance d between each node pair inl(u, v) if said generating subgraph GlIs a no-right graph Gl(Vl',E'l) Respectively constructing the generated subgraph G by adopting a breadth-first search algorithm BFSlBFS spanning tree with each node as root node for calculating and maintaining distance d between each node pairl(u,v)。
Preferably, in step (4), the distance d between each pair of nodes is calculated according to the weighting functionl(u, v) generating subgraph GlThe current residual node set V is obtained by calculationl' the approximate centrality of each node v is:
Figure BDA0001509939240000031
wherein, α (d)l(u, v)) represents a weighting function, Cc(v) Represents the approximate centrality of the node V, and u represents the current set of nodes Vl' the nodes remaining after node v are removed.
Preferably, step (8) specifically comprises:
(8.1) from the current residual edge set El' in will be included in the edge set
Figure BDA0001509939240000033
Deleting the edges to obtain a remaining edge set E 'of the next level'l+1(ii) a Then passes through a residual node set V 'of the next stage'l+1And the remaining edge set E 'of the next stage'l+1Construction of a novel generative subgraph Gl+1
(8.2) if said new generation subgraph Gl+1And recalculating the residual node set V 'of the next stage by adopting Dijkstra algorithm as a weighted graph'l+1The distance between each pair of nodes in the set;
(8.3) if said new generation subgraph Gl+1And (4) maintaining the residual node set V 'of the next stage in a mode of incremental updating of the BFS spanning tree structure calculated in the step (3) as an unauthorized graph'l+1The distance between each pair of nodes in the array.
Preferably, step (8.3) comprises in particular:
(8.3.1) finding out the residual node set V 'of the next stage'l+1Any node v in0Setting a root node of a target BFS spanning tree as a node r for the target BFS spanning tree of the root node, adding the node r into a queue to be modified, and creating a pointer a to point to a head node of the queue to be modified;
(8.3.2) inserting the child node x of the node pointed by the pointer a into the tail of the queue to be modified if the inserted child node x belongs to the node to be modified
Figure BDA0001509939240000041
Adding all brother nodes of the child node x in the target BFS spanning tree structure into an anchor point queue, creating a pointer b to point to a head node s of the anchor point queue, simultaneously adding all child nodes of the child node x into a collapse linked list, and deleting the connection between the node pointed by the pointer a and the child node x;
(8.3.3) look-up on said generated subgraph GlIf yes, executing the step (8.3.4), and if not, executing the step (8.3.6);
(8.3.4) creating a pointer to node t at node s, and then deleting node t from the collapse linked list;
(8.3.5) judging whether the collapse linked list after the deletion operation is empty, if not, executing the step (8.3.6), and if so, executing the step (8.3.9);
(8.3.6) judging whether a child node exists in the node pointed by the pointer b, if so, inserting the child node of the node pointed by the pointer b into the tail of the anchor point queue;
(8.3.7) judging whether the node pointed by the pointer b has a subsequent node, if so, executing the step (8.3.8), and if not, executing the step (8.3.9);
(8.3.8) pointing the pointer b to the node subsequent to the node pointed to currently, and returning to execute the step (8.3.3);
(8.3.9) judging whether the node pointed by the pointer a has a child node, if so, inserting the child node of the node pointed by the pointer a into the tail of the queue to be modified;
(8.3.10) judging whether the node pointed by the pointer a has a subsequent node, if so, executing the step (8.3.11), and if not, executing the step (8.3.12);
(8.3.11) pointing the pointer a to the node succeeding the node pointed to currently, and returning to execute the step (8.3.2);
(8.3.12) completing the incremental update to the target BFS spanning tree and completing for the set of nodes remaining at the next level V'l+1Middle division v0Incremental updating of BFS spanning tree with other nodes outside as root nodes to complete the new generated subgraph Gl+1The distance between each node pair in (1) is updated.
According to another aspect of the present invention, there is provided a step proximity centrality determining system including:
the data graph building module is used for building a data graph G (V, E) or G (V, E, W) by taking all data units in a target database as nodes, taking the association relations among the data units as edges and taking the strength of the association among the data units as a weight, wherein an edge set E represents a set of all edges in the data graph, a point set V represents a set of all nodes in the data graph, and a weight set W represents the weight of each edge in the edge set E;
a generated subgraph construction module for initializing the current subgraph number l equal to 0 and the current residual node set Vl'-V, current remainder set E'lBy V ═ El'and E'lConstruction of a Generation subgraph Gl(Vl',E'l) Or Gl(Vl',E'l,Wl') wherein, Wl'represents E'lA weight set of the middle edges;
a first node pair distance calculation module for generating subgraph GlThe current residual node set V is obtained by calculationl' distance d between each node pair inl(u,v);
A proximity center calculation module for calculating the distance d between each pair of nodes according to the weighting functionl(u, v) generating subgraph GlThe current residual node set V is obtained by calculationl' the approximate centrality of each node in the set;
a searching module for finding out the current node set Vl' node set having maximum nearness to center in center
Figure BDA0001509939240000051
A node set determination module for determining the current node set V from the current node set Vl' will be included in the set of nodes Vl *Deleting the nodes in the node group to obtain a next-level residual node set V'l+1
A judging module, configured to judge the remaining node set V of the next stage'l+1If the node set is empty, if the node set is the residual node set V of the next level'l+1If the data graph G is empty, the step approach centrality of all the nodes in the data graph G is obtained;
a newly generated subgraph construction module for set V 'of remaining nodes at the next stage'l+1Not empty, from the current remaining edge set E'lWill be included in the edge set
Figure BDA0001509939240000063
Deleting the edges to obtain a remaining edge set E 'of the next level'l+1Then through the remaining node set V 'of the next stage'l+1And the remaining edge set E 'of the next stage'l+1Construction of a novel generative subgraph Gl+1Generating subgraph G at said new generatorl+1In the method, the residual node set V 'of the next stage is obtained through calculation'l+1Distance d between each node pairl+1(u, v) when the subgraph number l is l +1, and returning to the approximate centrality calculation module, wherein,
Figure BDA0001509939240000064
representing the generated subgraph GlMiddle Vl *The adjacent side of (2).
Preferably, said node pair distance calculation module, in particular for generating subgraph GlAs an authorized graph Gl(Vl',E'l,Wl') calculating the current set of nodes remaining V by using Dijkstra algorithml' distance d between each node pair inl(u, v) generating subgraph GlIs a no-right graph Gl(Vl',E'l) Then, the generated subgraph G is respectively constructed by adopting a breadth-first search algorithm BFSlBFS spanning tree with each node as root node for calculating and maintaining distance d between each node pairl(u,v)。
Preferably, the approximate centrality calculating module is specifically configured to calculate a distance d between each pair of nodes according to the weighting function and the calculated distance dl(u, v) generating subgraph GlThe current residual node set V is obtained by calculationl' the approximate centrality of each node v is:
Figure BDA0001509939240000061
wherein, α (d)l(u, v)) represents a weighting function, Cc(v) Represents the approximate centrality of the node V, and u represents the current set of nodes Vl' the nodes remaining after node v are removed.
Preferably, the new generated subgraph construction module comprises:
a newly generated subgraph construction submodule for constructing from the current remaining edge set E'lWill be included in the edge set
Figure BDA0001509939240000062
Deleting the edges to obtain a remaining edge set E 'of the next level'l+1(ii) a Then passes through a residual node set V 'of the next stage'l+1And the remaining edge set E 'of the next stage'l+1Construction of a novel generative subgraph Gl+1
A second node pair distance calculation module for calculating the distance between the new generation subgraph Gl+1When the map is a weighted map, recalculating residual node set V 'of the next stage by adopting Dijkstra algorithm'l+1The distance between each pair of nodes in the set;
a third node-to-distance computation module for generating subgraph G at the new generation subgraph Gl+1And if the node is an unweighted graph, maintaining the residual node set V 'of the next level in a mode of updating the increment of the BFS spanning tree structure calculated by the distance calculation module by the first node'l+1The distance between each pair of nodes in the array.
Preferably, the third node-to-distance calculation module includes:
a first submodule for finding out a set V 'of nodes remaining with the next stage'l+1Any node v in0Setting a root node of a target BFS spanning tree as a node r for the target BFS spanning tree of the root node, adding the node r into a queue to be modified, and creating a pointer a to point to a head node of the queue to be modified;
a second sub-module, configured to insert a sub-node x of the node pointed by the pointer a into the tail of the queue to be modified, if the inserted sub-node x belongs to Vl *Adding all brother nodes of the child node x in the target BFS spanning tree structure into an anchor point queue, creating a pointer b to point to a head node s of the anchor point queue, simultaneously adding all child nodes of the child node x into a collapse linked list, and deleting the connection between the node pointed by the pointer a and the child node x;
a third sub-module for searching in the generated subgraph GlWhether an edge exists between the node s pointed by the pointer b and any node t in the collapse linked list to connect the node s and the node t;
a fourth sub-module, configured to create a pointer pointing to a node t at a node s when an edge exists between the node s pointed to by the pointer b and any node t in the collapsing chain table, and then delete the node t from the collapsing chain table;
the fifth submodule is used for judging whether the collapsing link table after the deleting operation is empty or not;
a sixth sub-module, configured to, when there is no edge connecting node s to a node t in the collapsing chain table between the node s pointed to by the pointer b and any node t in the collapsing chain table, or when the collapsing chain table after deletion is not empty, determine whether there is a child node in the node pointed to by the pointer b, and if there is a child node in the node pointed to by the pointer b, insert the child node of the node pointed to by the pointer b into the tail of the anchor point queue;
the seventh submodule is used for judging whether a node pointed by the pointer b has a successor node or not;
the eighth submodule is used for pointing the pointer b to a successor node of the current pointed node when the successor node exists in the pointed node of the pointer b, and returning to execute the third submodule;
a ninth sub-module, configured to determine whether a child node exists in a node pointed by the pointer a when the collapsing link table after the deletion operation is empty or when a successor node does not exist in the node pointed by the pointer b, and if the child node exists, insert the child node of the node pointed by the pointer a into the tail of the queue to be modified;
the tenth submodule is used for judging whether the node pointed by the pointer a has a successor node or not;
the eleventh submodule is used for pointing the pointer a to a successor node of the current pointed node when the successor node exists in the node pointed by the pointer a, and returning to execute the second submodule;
a twelfth submodule, configured to, when there is no successor node in the node pointed to by the pointer a, complete incremental update of the target BFS spanning tree, and complete set V 'of the remaining nodes at the next stage'l+1Middle division v0Incremental updating of BFS spanning tree with other nodes outside as root nodes to complete the new generated subgraph Gl+1The distance between each node pair in (1) is updated.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) the method for determining the step approach centrality is mainly used for describing and analyzing local centrality characteristics of large-scale data graphs. When the step-type approach centrality is calculated, the interference caused by the nodes with larger approach centrality is reduced by deleting the nodes with the largest approach centrality in the current sub-graph every time, and the step-type approach centrality with better locality is obtained, so that better hierarchical analysis is facilitated for the local condition of the data graph.
(2) In the calculation of the weightless graph, the existing calculation result is effectively utilized in an incremental updating mode, the dynamic updating of the shortest path between the nodes is realized, the high cost of complete recalculation is avoided, and the calculation of the step approach centrality provided by the invention can be realized more efficiently.
Drawings
Fig. 1 is a schematic flowchart of a method for determining a step approach centrality according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a method for incrementally updating a distance between node pairs according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides a method for determining the step approach centrality, which mainly comprises the following steps: firstly, establishing a data graph by taking data units of a database as nodes, taking association among the data units as edges and taking the strength of the association among the data units as weight; initializing a residual node set into all nodes in the current data graph, and selecting a proper weighting function; thirdly, calculating the distances between all node pairs in the residual node set, and calculating the approaching centrality of all nodes according to the distances; selecting a node set with the maximum approach centrality from the residual node sets, wherein the step approach centrality of the selected nodes is defined as the approach centrality calculated in the current subgraph, deleting the nodes from the residual node sets, and deleting the edges associated with the nodes from the data graph to generate a new subgraph; judging whether the residual node set is empty, if not, repeating the third to fifth steps until the residual node set is empty, and calculating all the points to obtain the step approach centrality of the point. According to the invention, the node with the maximum approaching centrality is deleted, and the approaching centrality of the node with the maximum approaching centrality calculated on the obtained sub-graph is defined as the step approaching centrality of the node; the step approach centrality of all the nodes is obtained by repeating the above process, so that the centrality of all the nodes is localized, the anti-interference capability of the centrality calculated by each node is improved, and the characteristics of the nodes in a local range can be better represented.
The invention provides a step proximity center (Hierarchical Closeness center) index for large-scale map data analysis, which comprises the following steps:
in graph G (V, E), the conventional approach centrality C of node Vc(v) Is defined as:
Figure BDA0001509939240000091
where α (d (u, V)) is the selected weighting function, and u is the other node in the node set V except for the node V.
In a sub-graph GlDegree of approaching traditional center Cc(v) The largest set of nodes is denoted Vl *Graph GlMiddle Vl *The set of associated edges is represented as
Figure BDA0001509939240000101
Defining a series of generations in graph GSubfigure GlThe following were used:
Figure BDA0001509939240000102
defining a set V of nodes remaining simultaneouslyl':
Figure BDA0001509939240000103
Residual edge set El':
Figure BDA0001509939240000104
Wherein A \ B represents the removal of a portion containing B from A.
When node u and node v exist in generation subgraph G at the same timelWhen up, a subgraph distance d is definedl(u, v) is node u and node v is in subgraph GlIs measured. The step approach center degree C proposed in the present invention is definedH(v) Comprises the following steps:
Figure BDA0001509939240000105
degree of approaching to the center of the tradition
Figure BDA0001509939240000106
Compared with the step approach centrality index provided by the invention, the step approach centrality index has better locality and anti-interference capability.
Fig. 1 is a schematic flow chart of a method for determining a step approach center degree according to an embodiment of the present invention, where the method shown in fig. 1 includes:
(1) establishing a data graph G (V, E) or G (V, E, W) by taking all data units in a target database as nodes, taking the association relation among the data units as edges and taking the strength of the association among the data units as a weight, wherein an edge set E represents a set of all edges in the data graph, a point set V represents a set of all nodes in the data graph, and a weight set W represents the weight of each edge in the edge set E;
(2) initializing current subgraph number l as 0, and current residual node set Vl'-V, current remainder set E'lBy V ═ El'and E'lConstruction of a Generation subgraph Gl(Vl',E'l) Or Gl(V'l,E'l,Wl') wherein, Wl'represents E'lA weight set of the middle edges;
(3) in generating subgraph GlThe current residual node set V is obtained by middle calculationl' distance d between each node pair inl(u,v);
In an alternative embodiment, in step (3), if sub-graph G is generatedlAs an authorized graph Gl(Vl',E'l,Wl') calculating the current residual node set V by adopting Dijkstra algorithml' distance d between each node pair inl(u, v) if subgraph G is generatedlIs a no-right graph Gl(Vl',E'l) Respectively constructing by using a Breadth-First-Search algorithm (BFS) to generate a subgraph GlBFS spanning tree with each node as root node for calculating and maintaining distance d between each node pairl(u,v)。
(4) According to the weighting function and the distance d between each pair of nodes obtained by calculationl(u, v) in generating subgraph GlThe current residual node set V is obtained by middle calculationl' the approximate centrality of each node in the set;
in an alternative embodiment, the weighting function may be selected by:
e.g. in the harmonious centrality
Figure BDA0001509939240000111
α (d (u, v)) -2 in the center of exponential decay-d(u,v)And the like. And according to the sensitive condition of the distance change, selecting a corresponding weighting function appropriately, thereby obtaining ideal approximate centrality.
In an alternative embodimentIn the embodiment, in the step (4), the distance d between each pair of nodes is calculated according to the weighting functionl(u, v) in generating subgraph GlThe current residual node set V is obtained by middle calculationl' the approximate centrality of each node v is:
Figure BDA0001509939240000112
wherein, α (d)l(u, v)) represents a weighting function, Cc(v) Represents the approximate centrality of the node V, and u represents the current set of nodes Vl' the nodes remaining after node v are removed.
(5) Finding out current residual node set Vl' node set V having the greatest degree of nearness in centerl *
Wherein, Vl *The step approach centrality C of the node u included in (1)H(u)=Cc(u)。
(6) From the current set of nodes remaining Vl' will be contained in a node set Vl *Deleting the nodes in the node group to obtain a next-level residual node set V'l+1
(7) Judging the residual node set V of the next level'l+1If the current state is not null, executing the step (8), otherwise, executing the step (9);
(8) from the current remaining edge set E'lWill be included in the edge set
Figure BDA0001509939240000121
Deleting the edges to obtain a remaining edge set E 'of the next level'l+1Then through the remaining node set V 'of the next stage'l+1And residual edge set E 'of the next stage'l+1Construction of a novel generative subgraph Gl+1In the new generation subgraph Gl+1In the method, the residual node set V 'of the next stage is obtained through calculation'l+1Distance d between each node pairl+1(u, v), when the subgraph number l is l +1, and returning to execute the step (4), wherein,
Figure BDA0001509939240000122
representing the generated subgraph GlMiddle Vl *The adjacent side of (2);
as shown in fig. 2, in an alternative embodiment, step (8) specifically includes:
(8.1) set E of edges left from the presentl' in will be included in the edge set
Figure BDA0001509939240000123
Deleting the edges to obtain a remaining edge set E 'of the next level'l+1(ii) a Then passes through a residual node set V 'of the next stage'l+1And residual edge set E 'of the next stage'l+1Construction of a novel generative subgraph Gl+1
(8.2) if new generation subgraph Gl+1For the weighted graph, recalculating residual node set V 'of the next stage by Dijkstra algorithm'l+1The distance between each pair of nodes in the set;
(8.3) if new generation subgraph Gl+1And (4) maintaining the residual node set V 'of the next level in a mode of updating the increment of the BFS spanning tree structure calculated in the step (3) as an unwarranted graph'l+1The distance between each pair of nodes in the array.
In an alternative embodiment, step (8.3) specifically includes:
(8.3.1) finding a residual node set V 'of the next stage'l+1Any node v in0Setting a root node of the target BFS spanning tree as a node r for the target BFS spanning tree of the root node, adding the node r into a queue to be modified, and creating a pointer a to point to a head node of the queue to be modified;
(8.3.2) inserting the child node x of the node pointed by the pointer a into the tail of the queue to be modified, if the inserted child node x belongs to Vl *Adding all brother nodes of the child node x in the target BFS spanning tree structure into an anchor point queue, creating a head node s of which the pointer b points to the anchor point queue, simultaneously adding all child nodes of the child node x into a collapse linked list, and deleting the connection between the node pointed by the pointer a and the child node x;
(8.3.3) search in generating subgraph GlWhether one node exists between the node s pointed to by the pointer b and any node t in the collapse chain table or notConnecting the node s and the node t by the edge, if the node s and the node t exist, executing the step (8.3.4), and if the node t does not exist, executing the step (8.3.6);
(8.3.4) creating a pointer to node t at node s, and then deleting node t from the collapsing list;
(8.3.5) judging whether the collapse linked list after the deletion operation is empty, if not, executing the step (8.3.6), and if so, executing the step (8.3.9);
(8.3.6) judging whether the node pointed by the pointer b has a child node, if so, inserting the child node of the node pointed by the pointer b into the tail of the anchor point queue;
(8.3.7) judging whether the node pointed by the pointer b has a subsequent node, if so, executing the step (8.3.8), and if not, executing the step (8.3.9);
(8.3.8) pointing the pointer b to the node subsequent to the node pointed to currently, and returning to execute the step (8.3.3);
(8.3.9) judging whether the node pointed by the pointer a has a child node, if so, inserting the child node pointed by the pointer a into the tail of the queue to be modified;
(8.3.10) judging whether the node pointed by the pointer a has a subsequent node, if so, executing the step (8.3.11), and if not, executing the step (8.3.12);
(8.3.11) pointing the pointer a to the node succeeding the node pointed to currently, and returning to execute the step (8.3.2);
(8.3.12) completing the incremental update to the target BFS spanning tree and completing the set of nodes remaining V 'for the next level'l+1Middle division v0Incremental updating of BFS spanning tree with other nodes outside as root nodes to complete new generation subgraph Gl+1The distance between each node pair in (1) is updated.
(9) And obtaining the step approach centrality of all the nodes in the data graph G.
The invention also provides a system for determining the step approach centrality, which comprises:
the data graph building module is used for building a data graph G (V, E) or G (V, E, W) by taking all data units in the target database as nodes, taking the association relationship among the data units as edges and taking the strength of the association among the data units as a weight, wherein an edge set E represents a set of all edges in the data graph, a point set V represents a set of all nodes in the data graph, and a weight set W represents the weight of each edge in the edge set E;
a generated subgraph construction module for initializing the current subgraph number l equal to 0 and the current residual node set Vl'-V, current remainder set E'lBy V ═ El'and E'lConstruction of a Generation subgraph Gl(Vl',E'l) Or Gl(Vl',E'l,Wl') wherein, Wl'represents E'lA weight set of the middle edges;
a first node pair distance calculation module for generating subgraph GlThe current residual node set V is obtained by middle calculationl' distance d between each node pair inl(u,v);
A proximity center calculation module for calculating the distance d between each pair of nodes according to the weighting functionl(u, v) in generating subgraph GlThe current residual node set V is obtained by middle calculationl' the approximate centrality of each node in the set;
a searching module for finding out the current residual node set Vl' node set V having the greatest degree of nearness in centerl *
A node set determination module for determining a node set V from a current node set V remainingl' will be included in the set of nodes Vl *Deleting the nodes in the node group to obtain a next-level residual node set V'l+1
A judging module for judging the residual node set V of the next level'l+1If the node set is empty, if the node set V is the residual node set V of the next stage'l+1If the data graph G is empty, the step approach centrality of all the nodes in the data graph G is obtained;
a newly generated subgraph construction module for the remaining node set V 'at the next stage'l+1Not empty, from the current remaining edge set El' in will be included in the edge set
Figure BDA0001509939240000142
Deleting the edges to obtain a remaining edge set E 'of the next level'l+1Then through the remaining node set V 'of the next stage'l+1And residual edge set E 'of the next stage'l+1Construction of a novel generative subgraph Gl+1In the new generation subgraph Gl+1In the method, the residual node set V 'of the next stage is obtained through calculation'l+1Distance d between each node pairl+1(u, v) when the subgraph number l is l +1, and returning to the approximate centrality calculation module, wherein,
Figure BDA0001509939240000141
representing the generated subgraph GlMiddle Vl *The adjacent side of (2).
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for determining a step approach centrality, comprising:
(1) establishing a data graph G (V, E) or G (V, E, W) by taking all data units in a target database as nodes, taking the association relation among the data units as edges and taking the strength of the association among the data units as a weight, wherein an edge set E represents a set of all edges in the data graph, a point set V represents a set of all nodes in the data graph, and a weight set W represents the weight of each edge in the edge set E;
(2) initializing current subgraph number l as 0, and current residual node set Vl'-V, current remainder set E'lBy V ═ El'and E'lConstruction of a Generation subgraph Gl(Vl',E'l) Or Gl(Vl',E'l,Wl') wherein, Wl' represents El' weight set of the middle edge;
(3) generating subgraph GlThe current remaining node is obtained by calculationCollection Vl' distance d between each node pair inl(u, v); v represents the current set of nodes remaining Vl' Each node in u represents the current set of nodes remaining Vl' the nodes remaining after node v is removed;
(4) according to the weighting function and the distance d between each pair of nodes obtained by calculationl(u, v) generating subgraph GlThe current residual node set V is obtained by calculationl' the approximate centrality of each node in the set;
(5) finding out the current residual node set Vl' node set V having the greatest degree of nearness in centerl *
(6) From the current set of nodes remaining Vl' will be included in the set of nodes Vl *Deleting the nodes in the node group to obtain a next-level residual node set V'l+1
(7) Judging the residual node set V 'of the next level'l+1If the current state is not null, executing the step (8), otherwise, executing the step (9);
(8) from the current remaining edge set E'lWill be included in the edge set
Figure FDA0002608738520000011
Deleting the edges to obtain a remaining edge set E 'of the next level'l+1Then through the remaining node set V 'of the next stage'l+1And the remaining edge set E 'of the next stage'l+1Construction of a novel generative subgraph Gl+1Generating subgraph G at said new generatorl+1In the method, the residual node set V 'of the next stage is obtained through calculation'l+1Distance d between each node pairl+1(u, v), when the subgraph number l is l +1, and returning to execute the step (4), wherein,
Figure FDA0002608738520000012
representing the generated subgraph GlMiddle Vl *The adjacent side of (2);
(9) and obtaining the step approach centrality of all the nodes in the data graph G.
2. The method of claim 1, wherein in step (3), if the generation subgraph G islAs an authorized graph Gl(Vl',E'l,Wl') calculating the current residual node set V by adopting Dijkstra algorithml' distance d between each node pair inl(u, v) if said generating subgraph GlIs a no-right graph Gl(Vl',E'l) Respectively constructing the generated subgraph G by adopting a breadth-first search algorithm BFSlBFS spanning tree with each node as root node for calculating and maintaining distance d between each node pairl(u,v)。
3. The method of claim 1, wherein in step (4), the distance d between each pair of nodes is calculated based on the weighting function andl(u, v) generating subgraph GlThe current residual node set V is obtained by calculationl' the approximate centrality of each node v is:
Figure FDA0002608738520000021
wherein, α (d)l(u, v)) represents a weighting function, Cc(v) Represents the approximate centrality of the node V, and u represents the current set of nodes Vl' the nodes remaining after node v are removed.
4. The method according to any one of claims 1 to 3, characterized in that step (8) comprises in particular:
(8.1) from the current remaining edge set E'lWill be included in the edge set
Figure FDA0002608738520000022
Deleting the edges to obtain a remaining edge set E 'of the next level'l+1(ii) a Then passes through a residual node set V 'of the next stage'l+1And the remaining edge set E 'of the next stage'l+1Construction of a novel generative subgraph Gl+1
(8.2) if said new generation subgraph Gl+1And recalculating the residual node set V 'of the next stage by adopting Dijkstra algorithm as a weighted graph'l+1The distance between each pair of nodes in the set;
(8.3) if said new generation subgraph Gl+1And (4) maintaining the residual node set V 'of the next stage in a mode of incremental updating of the BFS spanning tree structure calculated in the step (3) as an unauthorized graph'l+1The distance between each pair of nodes in the array.
5. The method according to claim 4, characterized in that step (8.3) comprises in particular:
(8.3.1) finding out the residual node set V 'of the next stage'l+1Any node v in0Setting a root node of a target BFS spanning tree as a node r for the target BFS spanning tree of the root node, adding the node r into a queue to be modified, and creating a pointer a to point to a head node of the queue to be modified;
(8.3.2) inserting the child node x of the node pointed by the pointer a into the tail of the queue to be modified, if the inserted child node x belongs to Vl *Adding all brother nodes of the child node x in the target BFS spanning tree structure into an anchor point queue, creating a pointer b to point to a head node s of the anchor point queue, simultaneously adding all child nodes of the child node x into a collapse linked list, and deleting the connection between the node pointed by the pointer a and the child node x;
(8.3.3) look-up on said generated subgraph GlIf yes, executing the step (8.3.4), and if not, executing the step (8.3.6);
(8.3.4) creating a pointer to node t at node s, and then deleting node t from the collapse linked list;
(8.3.5) judging whether the collapse linked list after the deletion operation is empty, if not, executing the step (8.3.6), and if so, executing the step (8.3.9);
(8.3.6) judging whether a child node exists in the node pointed by the pointer b, if so, inserting the child node of the node pointed by the pointer b into the tail of the anchor point queue;
(8.3.7) judging whether the node pointed by the pointer b has a subsequent node, if so, executing the step (8.3.8), and if not, executing the step (8.3.9);
(8.3.8) pointing the pointer b to the node subsequent to the node pointed to currently, and returning to execute the step (8.3.3);
(8.3.9) judging whether the node pointed by the pointer a has a child node, if so, inserting the child node of the node pointed by the pointer a into the tail of the queue to be modified;
(8.3.10) judging whether the node pointed by the pointer a has a subsequent node, if so, executing the step (8.3.11), and if not, executing the step (8.3.12);
(8.3.11) pointing the pointer a to the node succeeding the node pointed to currently, and returning to execute the step (8.3.2);
(8.3.12) completing the incremental update to the target BFS spanning tree and completing for the set of nodes remaining at the next level V'l+1Middle division v0Incremental updating of BFS spanning tree with other nodes outside as root nodes to complete the new generated subgraph Gl+1The distance between each node pair in (1) is updated.
6. A step proximity centrality determination system, comprising:
the data graph building module is used for building a data graph G (V, E) or G (V, E, W) by taking all data units in a target database as nodes, taking the association relations among the data units as edges and taking the strength of the association among the data units as a weight, wherein an edge set E represents a set of all edges in the data graph, a point set V represents a set of all nodes in the data graph, and a weight set W represents the weight of each edge in the edge set E;
a generated subgraph construction module for initializing the current subgraph number l equal to 0 and the current residual node set Vl'-V, current remainder set E'lBy V ═ El'and E'lConstruction of a Generation subgraph Gl(Vl',E'l) Or Gl(Vl',E'l,Wl') wherein, Wl'represents E'lA weight set of the middle edges;
a first node pair distance calculation module for generating subgraph GlThe current residual node set V is obtained by calculationl' distance d between each node pair inl(u, v); v represents the current set of nodes remaining Vl' Each node in u represents the current set of nodes remaining Vl' the nodes remaining after node v is removed;
a proximity center calculation module for calculating the distance d between each pair of nodes according to the weighting functionl(u, v) generating subgraph GlThe current residual node set V is obtained by calculationl' the approximate centrality of each node in the set;
a searching module for finding out the current node set Vl' node set V having the greatest degree of nearness in centerl *
A node set determination module for determining the current node set V from the current node set Vl' will be included in the set of nodes Vl *Deleting the nodes in the node group to obtain a next-level residual node set V'l+1
A judging module, configured to judge the remaining node set V of the next stage'l+1If the node set is empty, if the node set is the residual node set V of the next level'l+1If the data graph G is empty, the step approach centrality of all the nodes in the data graph G is obtained;
a newly generated subgraph construction module for set V 'of remaining nodes at the next stage'l+1Not empty, from the current remaining edge set E'lWill be included in the edge set
Figure FDA0002608738520000051
Deleting the edges to obtain a remaining edge set E 'of the next level'l+1Then through the remaining node set V 'of the next stage'l+1And the remaining edge set E 'of the next stage'l+1Construction of a novel generative subgraph Gl+1Generating subgraph G at said new generatorl+1In, calculateSet of nodes remaining to the next stage V'l+1Distance d between each node pairl+1(u, v) when the subgraph number l is l +1, and returning to the approximate centrality calculation module, wherein,
Figure FDA0002608738520000052
representing the generated subgraph GlMiddle Vl *The adjacent side of (2).
7. System according to claim 6, characterized in that said node pair distance calculation module, in particular for generating sub-graph G, is adapted to calculate said distance between said node pairslAs an authorized graph Gl(Vl',E'l,Wl') calculating the current set of nodes remaining V by using Dijkstra algorithml' distance d between each node pair inl(u, v) generating subgraph GlIs a no-right graph Gl(Vl',E'l) Then, the generated subgraph G is respectively constructed by adopting a breadth-first search algorithm BFSlBFS spanning tree with each node as root node for calculating and maintaining distance d between each node pairl(u,v)。
8. The system according to claim 6, wherein the approximate centrality calculation module is specifically configured to calculate the distance d between each pair of nodes according to the weighting functionl(u, v) generating subgraph GlThe current residual node set V is obtained by calculationl' the approximate centrality of each node v is:
Figure FDA0002608738520000053
wherein, α (d)l(u, v)) represents a weighting function, Cc(v) Represents the approximate centrality of the node V, and u represents the current set of nodes Vl' the nodes remaining after node v are removed.
9. The system of any of claims 6 to 8, wherein the new generation subgraph construction module comprises:
a newly generated subgraph construction submodule for constructing from the current remaining edge set E'lWill be included in the edge set
Figure FDA0002608738520000054
Deleting the edges to obtain a remaining edge set E 'of the next level'l+1(ii) a Then passes through a residual node set V 'of the next stage'l+1And the remaining edge set E 'of the next stage'l+1Construction of a novel generative subgraph Gl+1
A second node pair distance calculation module for calculating the distance between the new generation subgraph Gl+1When the map is a weighted map, recalculating residual node set V 'of the next stage by adopting Dijkstra algorithm'l+1The distance between each pair of nodes in the set;
a third node-to-distance computation module for generating subgraph G at the new generation subgraph Gl+1And if the node is an unweighted graph, maintaining the residual node set V 'of the next level in a mode of updating the increment of the BFS spanning tree structure calculated by the distance calculation module by the first node'l+1The distance between each pair of nodes in the array.
10. The system of claim 9, wherein the third node-to-distance calculation module comprises:
a first submodule for finding out a set V 'of nodes remaining with the next stage'l+1Any node v in0Setting a root node of a target BFS spanning tree as a node r for the target BFS spanning tree of the root node, adding the node r into a queue to be modified, and creating a pointer a to point to a head node of the queue to be modified;
a second sub-module, configured to insert a sub-node x of the node pointed by the pointer a into the tail of the queue to be modified, if the inserted sub-node x belongs to Vl *Adding all brother nodes of the child node x in the target BFS spanning tree structure into an anchor point queue, creating a pointer b to point to a head node s of the anchor point queue, simultaneously adding all child nodes of the child node x into a collapse linked list, and deleting the pointera connection of the pointed-to node to child node x;
a third sub-module for searching in the generated subgraph GlWhether an edge exists between the node s pointed by the pointer b and any node t in the collapse linked list to connect the node s and the node t;
a fourth sub-module, configured to create a pointer pointing to a node t at a node s when an edge exists between the node s pointed to by the pointer b and any node t in the collapsing chain table, and then delete the node t from the collapsing chain table;
the fifth submodule is used for judging whether the collapsing link table after the deleting operation is empty or not;
a sixth sub-module, configured to, when there is no edge connecting node s to a node t in the collapsing chain table between the node s pointed to by the pointer b and any node t in the collapsing chain table, or when the collapsing chain table after deletion is not empty, determine whether there is a child node in the node pointed to by the pointer b, and if there is a child node in the node pointed to by the pointer b, insert the child node of the node pointed to by the pointer b into the tail of the anchor point queue;
the seventh submodule is used for judging whether a node pointed by the pointer b has a successor node or not;
the eighth submodule is used for pointing the pointer b to a successor node of the current pointed node when the successor node exists in the pointed node of the pointer b, and returning to execute the third submodule;
a ninth sub-module, configured to determine whether a child node exists in a node pointed by the pointer a when the collapsing link table after the deletion operation is empty or when a successor node does not exist in the node pointed by the pointer b, and if the child node exists, insert the child node of the node pointed by the pointer a into the tail of the queue to be modified;
the tenth submodule is used for judging whether the node pointed by the pointer a has a successor node or not;
the eleventh submodule is used for pointing the pointer a to a successor node of the current pointed node when the successor node exists in the node pointed by the pointer a, and returning to execute the second submodule;
a twelfth submodule for pointing at the pointer aWhen the node has no successor node, finishing the incremental updating of the target BFS spanning tree and finishing the residual node set V 'of the next level'l+1Middle division v0Incremental updating of BFS spanning tree with other nodes outside as root nodes to complete the new generated subgraph Gl+1The distance between each node pair in (1) is updated.
CN201711349361.7A 2017-12-15 2017-12-15 Method and system for determining step approach centrality Active CN108052743B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711349361.7A CN108052743B (en) 2017-12-15 2017-12-15 Method and system for determining step approach centrality

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711349361.7A CN108052743B (en) 2017-12-15 2017-12-15 Method and system for determining step approach centrality

Publications (2)

Publication Number Publication Date
CN108052743A CN108052743A (en) 2018-05-18
CN108052743B true CN108052743B (en) 2021-01-05

Family

ID=62133256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711349361.7A Active CN108052743B (en) 2017-12-15 2017-12-15 Method and system for determining step approach centrality

Country Status (1)

Country Link
CN (1) CN108052743B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046330B (en) * 2018-10-12 2022-08-16 中国电子科技集团公司第三十八研究所 Method for generating effective points based on variable value space
CN112380267B (en) * 2020-10-21 2022-08-05 山东大学 Community discovery method based on privacy graph
CN112989079B (en) * 2021-04-22 2021-08-03 北京电信易通信息技术股份有限公司 Novel image data retrieval method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101799814A (en) * 2009-12-31 2010-08-11 茂名学院 Method for gathering free classification label into reticular classification structure
CN102196502A (en) * 2011-04-06 2011-09-21 东南大学 Congestion control method for wireless sensor network
CN103970856A (en) * 2014-05-07 2014-08-06 中国人民大学 Shortest path estimation method on authorized and directed dynamic network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060112146A1 (en) * 2004-11-22 2006-05-25 Nec Laboratories America, Inc. Systems and methods for data analysis and/or knowledge management
US9256706B2 (en) * 2013-09-03 2016-02-09 Synopsys Taiwan Co., LTD. Knowledge-based analog layout generator

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101799814A (en) * 2009-12-31 2010-08-11 茂名学院 Method for gathering free classification label into reticular classification structure
CN102196502A (en) * 2011-04-06 2011-09-21 东南大学 Congestion control method for wireless sensor network
CN103970856A (en) * 2014-05-07 2014-08-06 中国人民大学 Shortest path estimation method on authorized and directed dynamic network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Parallel Algorithm for Core Maintenance in Dynamic Graphs;Na Wang 等;《2017 IEEE 37th International Conference on Distributed Computing Systems》;20170717;第2366-2371页 *
大规模图中低复杂度分布式算法浅析;华强胜 等;《南京信息工程大学学报(自然科学版)》;20171031(第5期);第533-543页 *
社会网络分析在专利引用中的实证研究;侯筱蓉 等;《情报科学》;20130228;第31卷(第2期);第63-71页 *

Also Published As

Publication number Publication date
CN108052743A (en) 2018-05-18

Similar Documents

Publication Publication Date Title
Song et al. Efficient routing on large road networks using hierarchical communities
CN108052743B (en) Method and system for determining step approach centrality
US20040088307A1 (en) Data structure for information systems
CN102176223B (en) Protein complex identification method based on key protein and local adaptation
Xie et al. Type-based exploration with multiple search queues for satisficing planning
CN105183796A (en) Distributed link prediction method based on clustering
CN106462620A (en) Distance queries on massive networks
CN107121146B (en) Optimum path planning method based on road chain depth
Azis et al. Comparison of Floyd-Warshall algorithm and greedy algorithm in determining the shortest route
CN109656798B (en) Vertex reordering-based big data processing capability test method for supercomputer
CN109992786A (en) A kind of semantic sensitive RDF knowledge mapping approximate enquiring method
CN109614520B (en) Parallel acceleration method for multi-pattern graph matching
CN110889530A (en) Destination prediction method based on recurrent neural network and server
Zhu et al. Transportation routing map abstraction approach: Algorithm and numerical analysis
CN111553509B (en) Rail transit route selection evaluation and cost optimization method aiming at geological environment risk
Liu et al. FHL-cube: multi-constraint shortest path querying with flexible combination of constraints
CN104899283A (en) Frequent sub-graph mining and optimizing method for single uncertain graph
CN111899117A (en) K-edge connected component mining system and mining method applied to social network
Baidoo et al. Solving the TSP using traditional computing approach
CN108197186B (en) Dynamic graph matching query method applied to social network
CN110807061A (en) Method for searching frequent subgraphs of uncertain graphs based on layering
JP5991052B2 (en) Update apparatus, update method, and update program
CN107564289B (en) Road network preprocessing method for merging traffic nodes
CN104899885B (en) A kind of Frequent tree mining for single uncertain figure excavates and optimization method
Jackovich et al. Comparing greedy constructive heuristic subtour elimination methods for the traveling salesman problem

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant