CN112417247A - Dynamic flow graph data vertex importance updating method and device based on random walk - Google Patents


Info

Publication number
CN112417247A
Authority
CN
China
Prior art keywords
vertex
random walk
updating
flow graph
dynamic flow
Legal status
Granted
Application number
CN202011315919.1A
Other languages
Chinese (zh)
Other versions
CN112417247B (en)
Inventor
曾国荪
丁春玲
孙志鹏
Current Assignee
Tongji University
Original Assignee
Tongji University
Application filed by Tongji University
Priority to CN202011315919.1A
Publication of CN112417247A
Application granted
Publication of CN112417247B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to a random walk-based method and device for updating the importance of vertices in dynamic flow graph data. The method comprises: acquiring associated data in real time in time order and updating the dynamic flow graph data in real time; identifying, at each moment, the vertices affected by the update and the newly added vertices; generating random walk paths from each vertex of the dynamic flow graph data in a preset random walk mode; calculating or updating the PageRank value of each affected vertex from the total number of times the random walk paths pass through it; and aggregating the original vertices of the dynamic flow graph data into a hyper-vertex, retaining all edges incident to the newly added vertices and connecting their other ends to the hyper-vertex to obtain a new graph, in which the PageRank value of each newly added vertex is calculated or updated by the same method. Compared with the prior art, the method and device keep the results accurate while computing them in real time.

Description

Dynamic flow graph data vertex importance updating method and device based on random walk
Technical Field
The invention relates to the field of updating the importance of vertices in dynamic flow graph data, and in particular to a random walk-based method for updating the importance of vertices in dynamic flow graph data.
Background
PageRank originally denoted a ranking value of Web page importance; it is now widely used to denote the importance ranking of vertices in a graph, and is usually obtained by iterating on the graph's connection matrix until its eigenvector converges. In the big data era, the rapid growth of social networks has produced many large-scale dynamic flow graphs, and domain applications need the importance, i.e., the PageRank, of every vertex in them. For example, in a dynamic social network one may need to find circles of friends on the fly, or quickly discover criminal groups, based on vertex PageRank.
Traditional approaches to computing PageRank mainly include static-graph PageRank computation, incremental power iteration, and aggregation-based incremental computation. They have the following shortcomings:
1. Static-graph PageRank recomputes the changed graph from scratch by power iteration over the global graph data. This consumes a large amount of time and computing resources and can hardly meet the real-time requirements of graph applications.
2. Incremental power iteration provides an incremental update model, but it needs substantial time to keep the PageRank update accurate, and the update errors accumulate as the flow graph keeps arriving. Aggregation-based incremental methods find it hard to decide which vertices to aggregate, and the degree of aggregation directly affects both the quality and the computational complexity of the PageRank update.
In summary, traditional methods either pursue PageRank accuracy and therefore struggle with graph data that change continuously and rapidly, or sacrifice accuracy for less computation and faster updates, in which case the PageRank error keeps accumulating as the flow graph changes. Traditional methods therefore fail to strike a reasonable balance between accuracy and real-time performance, are poorly suited to continuously changing dynamic flow graphs, and cannot update PageRank values quickly and effectively, which is especially problematic in application fields that require real-time processing.
Disclosure of Invention
The invention aims to overcome the inability of prior-art PageRank computation to achieve both accuracy and real-time performance, by providing a random walk-based method for updating vertex importance in dynamic flow graph data.
The purpose of the invention can be realized by the following technical scheme:
A random walk-based method for updating vertex importance in dynamic flow graph data comprises the following steps:
acquiring associated data in real time in time order, and updating the dynamic flow graph data in real time;
identifying the affected vertices and the newly added vertices during the update of the dynamic flow graph data at each moment;
generating random walk paths from each vertex of the dynamic flow graph data in a preset random walk mode;
calculating or updating the PageRank value of each affected vertex according to the total number of times the random walk paths pass through it;
aggregating the original vertices of the dynamic flow graph data into a hyper-vertex, retaining all edges incident to the newly added vertices and connecting their other ends to the hyper-vertex to obtain a new graph, generating random walk paths from each vertex of the new graph in the preset random walk mode, and calculating or updating the PageRank value of each newly added vertex according to the total number of times the random walk paths pass through it;
wherein, in the random walk mode, a walk at a vertex continues to a successor vertex with probability α, choosing each outgoing edge with probability equal to the reciprocal of the out-degree, and stops with probability 1 - α; a walk that continues takes at most R steps, and a round of walking stops immediately upon reaching a vertex with no outgoing edges.
Further, the random walk starting from any vertex is repeated for a preset number of rounds.
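The random walk mode and its repetition from every vertex can be illustrated with a short Python sketch. It is a minimal Monte Carlo sketch under our own assumptions: the graph is a dict that maps every vertex to its list of successors, `rounds` plays the role of the preset number of rounds (M in the detailed description), `max_steps` is R, and PageRank is estimated by normalizing the visit counts to sum to 1; the text does not state its exact normalization, and the function names are hypothetical.

```python
import random
from collections import Counter


def random_walk(graph, start, alpha, max_steps, rng):
    """One round of the walk mode; returns the visited vertices in order."""
    path = [start]
    current = start
    for _ in range(max_steps):             # at most R steps
        successors = graph.get(current, [])
        if not successors:                 # no outgoing edge: stop this round at once
            break
        if rng.random() >= alpha:          # stop with probability 1 - alpha
            break
        current = rng.choice(successors)   # each out-edge has probability 1/out-degree
        path.append(current)
    return path


def estimate_pagerank(graph, alpha=0.85, max_steps=20, rounds=100, seed=0):
    """Run `rounds` walks from every vertex and normalize the visit counts."""
    rng = random.Random(seed)
    visits = Counter()
    for vertex in graph:
        for _ in range(rounds):
            visits.update(random_walk(graph, vertex, alpha, max_steps, rng))
    total = sum(visits.values())
    return {v: visits[v] / total for v in graph}, visits
```

On the three-vertex cycle `{0: [1], 1: [2], 2: [0]}`, `estimate_pagerank(..., rounds=2000)` gives each vertex roughly 1/3. The sketch returns the raw `visits` counter as well, because the incremental steps adjust these counts rather than recomputing them.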
Further, during the update of the dynamic flow graph data at each moment, the PageRank value of each vertex is updated in a preset incremental calculation mode; the changes produced by the update comprise adding edges, adding vertices, deleting vertices, and deleting edges;
the incremental calculation mode comprises the following steps:
a first processing step for added edges: if both vertex u and vertex v of the newly added edge e = (u, v) are existing vertices, calculate the increase in the number of walks passing through vertex u [formula image], and, starting from vertex u, perform that many rounds of random walk in the above random walk mode to obtain the random walk paths of vertex u;
for each such path that passes through the newly added edge e, add 1 to the original total count of every vertex i it passes through; for each such path that does not pass through e, subtract 1 from the original count of every vertex i it passes through; the PageRank value of each vertex is then calculated and updated from the new counts.
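A literal Python sketch of the counting rule in the first processing step for added edges follows. The formula for the number of re-walk rounds is not reproduced in this text, so here it is a caller-supplied argument `rounds`; the names `add_edge_update` and `uses_edge`, and the optional `affected` restriction (taken from Step 2.1 of the detailed description), are hypothetical. Clamping counts at zero is our own safeguard, not something the text specifies.

```python
import random
from collections import Counter


def random_walk(graph, start, alpha, max_steps, rng):
    """One round of the walk mode defined in the text."""
    path, current = [start], start
    for _ in range(max_steps):
        successors = graph.get(current, [])
        if not successors or rng.random() >= alpha:
            break
        current = rng.choice(successors)
        path.append(current)
    return path


def uses_edge(path, u, v):
    """True if the path traverses the edge u -> v at some step."""
    return any(a == u and b == v for a, b in zip(path, path[1:]))


def add_edge_update(graph, visits, u, v, rounds, alpha, max_steps, rng, affected=None):
    """Insert e = (u, v) and adjust visit counts with `rounds` walks from u."""
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, [])
    for _ in range(rounds):
        path = random_walk(graph, u, alpha, max_steps, rng)
        if uses_edge(path, u, v):
            for i in path:                  # walk went through e: +1 per visit
                visits[i] += 1
        else:
            for i in path:                  # walk avoided e: -1 per visit
                if affected is None or i in affected:
                    visits[i] = max(0, visits[i] - 1)
    return visits
```

The `visits` argument is expected to be a `Counter` of existing visit counts, so vertices never seen before start at zero.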
Further, the incremental calculation method further includes:
a second processing step for added edges: if the newly added edge e connects a newly added vertex and an existing vertex, increase the total number of vertices of the dynamic flow graph data by one, and update the PageRank value of each vertex using the first processing step for added edges.
Further, the incremental calculation method further includes:
a first processing step for deleted edges: if both vertex u and vertex v of the deleted edge e = (u, v) are existing vertices, calculate the decrease in the number of walks passing through vertex u [formula image]; then, in the above random walk mode, perform that many rounds of random walk starting from a randomly selected out-neighbor of vertex u and from vertex v, to obtain the random walk paths of vertex u;
if a path starting from the out-neighbor of vertex u passes through any vertex i, add 1 to the original total count of vertex i; if a path starting from vertex v passes through any vertex i, subtract 1 from the original total count of vertex i; the PageRank value of each vertex is then calculated and updated.
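The first processing step for deleted edges can be sketched in Python in the same style. Here `rounds` stands in for the formula that is not reproduced in this text, the function name `delete_edge_update` is hypothetical, the graph is a successor-list dict, and counts are clamped at zero as our own safeguard. When u has no remaining out-neighbors, the sketch skips the incrementing walk, a case the text does not address.

```python
import random
from collections import Counter


def random_walk(graph, start, alpha, max_steps, rng):
    """One round of the walk mode defined in the text."""
    path, current = [start], start
    for _ in range(max_steps):
        successors = graph.get(current, [])
        if not successors or rng.random() >= alpha:
            break
        current = rng.choice(successors)
        path.append(current)
    return path


def delete_edge_update(graph, visits, u, v, rounds, alpha, max_steps, rng):
    """Remove e = (u, v) and adjust visit counts with `rounds` paired walks."""
    graph[u].remove(v)
    for _ in range(rounds):
        if graph[u]:                           # u still has out-neighbors
            w = rng.choice(graph[u])           # randomly selected out-neighbor of u
            for i in random_walk(graph, w, alpha, max_steps, rng):
                visits[i] += 1                 # rerouted walk: +1 per visit
        for i in random_walk(graph, v, alpha, max_steps, rng):
            visits[i] = max(0, visits[i] - 1)  # walk that used to continue via v: -1
    return visits
```

Pairing one incrementing walk with one decrementing walk per round mirrors the text: visits that used to flow from u to v are moved to the paths through u's remaining out-neighbors.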
Further, the incremental calculation method further includes:
a second processing step for deleted edges: if the deleted edge e was removed because one of its vertices was deleted, decrease the total number of vertices of the dynamic flow graph data by one, and update the PageRank value of each vertex using the first processing step for deleted edges.
Further, the incremental calculation mode also processes the changes produced by the update in sequence until all changes have been traversed.
Further, vertices not affected by the update keep their previous PageRank values.
The invention also provides a random walk-based device for updating vertex importance in dynamic flow graph data, comprising a memory and a processor, wherein the memory stores a computer program and the processor invokes the computer program to execute the steps of the method described above, including the random walk mode, the incremental calculation mode applied at each moment to the changes in the dynamic flow graph data, and the first processing step for added edges.
Compared with the prior art, the invention has the following advantages:
(1) The method calculates the PageRank values of graph vertices from the paths generated by random walks, which fully reflects the change in the probability that each vertex is visited, so vertex PageRank can be updated accurately;
the region of the existing graph affected by a dynamic change in the flow graph is determined using the acoustic wave propagation principle, the number of additional random walk rounds is determined from the existing random walk path information, and PageRank is updated from the walk probabilities of the affected vertices; for newly added vertices, the idea of aggregation-based incremental computation is used to form a new graph with a hyper-vertex, on which random walks complete the PageRank calculation of the new vertices;
the calculation reuses existing results as far as possible, which greatly speeds up PageRank updating; the incrementally updated PageRank of the dynamic flow graph is accurate and effective, providing a reliable and fast PageRank update method for dynamic flow graph applications with strict real-time requirements.
(2) Using incremental computation, the method updates the PageRank only of the vertices affected by changes in the dynamic flow graph, while retaining the PageRank of unaffected vertices. This greatly reduces the computation needed, so PageRank is obtained quickly while the correctness of vertex PageRank is preserved.
(3) The invention uses the number of changed random walk paths as the number of rounds for re-walking, which reflects as closely as possible the true change in the number of walks passing through vertex i, and improves the accuracy of the incremental PageRank calculation for every affected vertex i.
(4) To reflect the true vertex PageRank values as closely as possible, the invention repeats M rounds of random walk starting from each vertex.
Drawings
FIG. 1 is a schematic diagram of the incremental computation scheme for updating PageRank in a dynamic flow graph;
FIG. 2 is a schematic diagram of changes in dynamic flow graph data.
Detailed Description
The invention is described in detail below with reference to the drawings and a specific embodiment. The embodiment is implemented on the basis of the technical solution of the invention and gives a detailed implementation and specific operating process, but the scope of protection of the invention is not limited to the following embodiment.
Example 1
This embodiment provides a random walk-based method for updating vertex importance in dynamic flow graph data. It comprises the steps stated in the Disclosure of Invention above: acquiring associated data in real time in time order and updating the dynamic flow graph data; identifying the affected and newly added vertices at each moment; generating random walk paths in the preset random walk mode (continue with probability α to an out-neighbor chosen with probability 1/out-degree, stop with probability 1 - α, take at most R steps, and stop at any vertex with no outgoing edges); updating the PageRank of affected vertices from the path counts; and computing the PageRank of newly added vertices on a new graph in which the original vertices are aggregated into a hyper-vertex.
As a preferred embodiment, the random walk from any vertex is repeated for a preset number of rounds, and the PageRank value of each vertex is updated at each moment in the incremental calculation mode, which processes added edges, added vertices, deleted vertices, and deleted edges in sequence, using the first and second processing steps for added and deleted edges described above, until all changes have been traversed; vertices not affected by the update keep their previous PageRank values.
The invention also provides a random walk-based device for updating vertex importance in dynamic flow graph data, comprising a memory and a processor, wherein the memory stores a computer program and the processor invokes the computer program to execute the above method.
The key parts of the process are described in detail below.
1) General description
In FIG. 1, given the graph change information $\Delta G_t$, the vertices affected by $\Delta G_t$ are determined using the acoustic wave propagation principle; the affected vertices update their PageRank incrementally by the random walk method, while unaffected vertices keep the PageRank computed previously. Also in FIG. 1, when new vertices arrive, a new graph with a hyper-vertex is constructed and the PageRank of the new vertices is computed by the random walk method. This completes the real-time update of all vertices.
2) Dynamic flow graph modeling and random walk mode definition
The associated data arrive continuously in time order 1, 2, 3, …, t, …, which yields the dynamic flow graph data $G_1, G_2, G_3, \ldots, G_t, \ldots$. Because the data do not change abruptly, $G_t = G_{t-1} + \Delta G_t$, where $\Delta G_t$ is the graph change information, i.e., the vertices and edges added to or deleted from the flow graph at time t, and $G_t$ is the graph obtained by applying $\Delta G_t$, containing all graph data accumulated from the beginning up to time t. In general, $\Delta G_t$ is a very small graph. Write $\Delta G_t = (\Delta V, \Delta E) = (\Delta V_0 + \Delta V^{+} + \Delta V^{-}, \Delta E^{+} + \Delta E^{-})$, where $\Delta V_0$ is the set of vertices of $G_{t-1}$ connected by added or deleted edges, $\Delta V^{+}$ is the set of vertices added at time t, $\Delta V^{-}$ is the set of vertices deleted at time t, $\Delta E^{+}$ is the set of edges added at time t, and $\Delta E^{-}$ is the set of edges deleted at time t.
The random walk mode is defined as follows: in the graph $G = (V, E)$, a walk at any vertex moves to one of its successor vertices with probability α, also called the random walk coefficient, choosing each outgoing edge with probability equal to the reciprocal of the out-degree; at each vertex the walk stops with probability 1 - α. A walk that continues takes at most R steps, and in particular a round of walking stops immediately upon reaching a vertex with no outgoing edges. To reflect the true vertex PageRank values as closely as possible, M rounds of random walk are repeated from every vertex.
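The decomposition of $\Delta G_t$ can be made concrete with a small Python sketch on plain vertex and edge sets. The class `GraphDelta` and the function `apply_delta` are hypothetical names; dropping edges whose endpoint was deleted is our assumption about how vertex deletions carry over to edges, consistent with the second processing step for deleted edges.

```python
from dataclasses import dataclass, field


@dataclass
class GraphDelta:
    """Change information Delta G_t (hypothetical container)."""
    added_vertices: set = field(default_factory=set)    # Delta V+
    deleted_vertices: set = field(default_factory=set)  # Delta V-
    added_edges: set = field(default_factory=set)       # Delta E+
    deleted_edges: set = field(default_factory=set)     # Delta E-

    def touched_old_vertices(self, old_vertices):
        """Delta V0: vertices of G_{t-1} incident to an added or deleted edge."""
        endpoints = {x for edge in self.added_edges | self.deleted_edges for x in edge}
        return endpoints & set(old_vertices)


def apply_delta(vertices, edges, delta):
    """Return (V_t, E_t) for G_t = G_{t-1} + Delta G_t."""
    new_vertices = (set(vertices) | delta.added_vertices) - delta.deleted_vertices
    new_edges = (set(edges) | delta.added_edges) - delta.deleted_edges
    # an edge whose endpoint was deleted disappears with it
    new_edges = {(a, b) for a, b in new_edges if a in new_vertices and b in new_vertices}
    return new_vertices, new_edges
```

Edges are directed `(tail, head)` tuples, matching e = (u, v) in the text.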
3) Determining the original graph vertices affected by $\Delta G_t$
Because $\Delta G_t$ represents the graph change information generated by the flow graph arriving at time t, it influences the PageRank of vertices in $G_{t-1}$ through the vertices of $\Delta V_0$. This influence weakens as it spreads outward, like a sound wave, and its propagation stops when it becomes vanishingly small (below the threshold δ, with the decay governed by the attenuation coefficient β) or when there are no further successor vertices. The vertices reached in this way are the vertices affected by $\Delta G_t$, whose PageRank must be recomputed; together they form the affected vertex set.
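One way to read the acoustic-wave analogy is as a breadth-first propagation in which the influence starts at 1 on each vertex of $\Delta V_0$, is multiplied by the attenuation coefficient β at every hop along outgoing edges, and stops once it falls below the threshold δ. This decay rule is our assumption: the text names β and δ but does not give the formula. The function name `affected_vertices` is hypothetical.

```python
from collections import deque


def affected_vertices(graph, sources, beta, delta):
    """Vertices reached while the decayed influence stays >= delta.

    graph: dict vertex -> list of successors; sources: the vertices of Delta V0.
    """
    if not 0 < beta < 1:
        raise ValueError("beta must lie in (0, 1) for the influence to decay")
    strength = {s: 1.0 for s in sources}   # strongest influence seen per vertex
    queue = deque(strength)
    while queue:
        x = queue.popleft()
        passed_on = strength[x] * beta      # influence after one more hop
        if passed_on < delta:
            continue                        # too weak: stop propagating from x
        for y in graph.get(x, []):          # no successors: propagation ends here
            if passed_on > strength.get(y, 0.0):
                strength[y] = passed_on
                queue.append(y)
    return set(strength)
```

Keeping only the strongest influence per vertex guarantees termination on cyclic graphs, since a vertex is re-queued only when its influence strictly increases.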
4) Incremental PageRank computation for affected vertices
$\Delta G_t$ changes the number of times the original random walks passed through the affected vertices, so the number of random walk paths that change at the location of the graph change must be calculated first. To reflect as closely as possible the true change in the number of walks passing through vertex i, and thus to improve the accuracy of the incremental PageRank calculation for every affected vertex i, the number of changed random walk paths is used as the number of rounds for re-walking. After all rounds of random walk are finished, the total visit counts of the vertices in the affected region of $G_t$ are updated, which yields the PageRank of the affected vertices.
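The bookkeeping at the end of this step, where affected vertices get new values from the updated visit counts and all other vertices keep their old ones, might look like the Python sketch below. Dividing each count by the total number of visits is our assumed normalization, since the text does not specify one; the sketch also does not renormalize the retained values, and the name `merge_pagerank` is hypothetical.

```python
def merge_pagerank(old_pagerank, visits, affected):
    """Recompute PageRank for affected vertices only; keep all others."""
    total = sum(visits.values())
    merged = dict(old_pagerank)            # unaffected vertices keep old values
    for i in affected:
        merged[i] = visits.get(i, 0) / total if total else 0.0
    return merged
```

For example, with old values `{"a": 0.5, "b": 0.5}`, counts `{"a": 3, "b": 1}`, and only `"b"` affected, vertex `"b"` becomes 0.25 while `"a"` stays at 0.5.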
5) PageRank computation for new vertices in $\Delta G_t$
Since $\Delta G_t$ is a subgraph, its vertices include original, newly added, and deleted vertices, and its edges include added and deleted edges. To compute the PageRank of the newly added vertices in $\Delta V^{+}$, note that the changes in $\Delta G_t$ mainly occur at the vertices of $\Delta V_0$. All vertices of $\Delta V_0$ are therefore aggregated into a hyper-vertex, all edges of the newly added vertices in $\Delta V^{+}$ are retained, and their other ends are connected to the hyper-vertex, forming a new, smaller graph denoted $G'_t$. The same random walk mode can then be applied to compute the PageRank of the new vertices in $\Delta G_t$.
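Building $G'_t$ can be sketched in Python as below. Every endpoint that is not a newly added vertex is mapped to a single hyper-vertex; this matches both the claim wording (all original vertices) and this embodiment ($\Delta V_0$), because every retained edge touches a new vertex and its original endpoint therefore lies in $\Delta V_0$. The label `HYPER` and the function name are hypothetical, and keeping parallel edges to the hyper-vertex, so that walk probabilities reflect how many original edges were merged, is our design choice.

```python
HYPER = "HYPER"  # hypothetical label for the aggregated hyper-vertex


def build_hyper_graph(edges, new_vertices):
    """Return successor lists of G'_t from the directed edges of G_t."""
    successors = {HYPER: []}
    for v in new_vertices:
        successors.setdefault(v, [])
    for a, b in edges:
        if a not in new_vertices and b not in new_vertices:
            continue                         # edge among original vertices: absorbed
        tail = a if a in new_vertices else HYPER
        head = b if b in new_vertices else HYPER
        successors[tail].append(head)        # parallel edges are kept
    return successors
```

The resulting dict has the same successor-list shape as the original graph, so the random walk mode can run on it unchanged and the new vertices' PageRank can be read from their visit counts.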
The method of this embodiment is implemented as follows. On the basis of the existing graph $G_{t-1}$, when the graph change information $\Delta G$ generated by the dynamic flow graph arrives, the vertices affected by $\Delta G$ are determined from its position information using the acoustic wave propagation principle; the change in walk probability of the affected vertices is computed by the random walk algorithm, and part of the random walk paths are changed so that vertex PageRank is updated incrementally. Unaffected original vertices keep the PageRank computed on $G_{t-1}$. Newly added vertices, if any, obtain their PageRank through the aggregation-based incremental idea and the random walk algorithm. The specific steps of the algorithm are as follows:
Input: $G_{t-1} = (V, E)$; $\Delta G_t = (\Delta V, \Delta E) = (\Delta V_0 + \Delta V^{+} + \Delta V^{-}, \Delta E^{+} + \Delta E^{-})$; attenuation coefficient β; threshold δ at which the influence stops propagating; random walk coefficient α; and the preset number M of random walk rounds.
Output: the set of PageRank values of all vertices in $G_t$.
Step 1: when $\Delta G_t$ arrives, determine the set of vertices it affects; unaffected vertices keep their original PageRank values, which form the set of retained values. If the set of affected vertices is non-empty, go to Step 2; if it is empty, go to Step 3;
Step 2: from $\Delta G_t$, obtain $\Delta V^{+}$ and $\Delta V^{-}$, $\Delta E^{+}$ and $\Delta E^{-}$, and determine whether each change adds or deletes vertices and edges; for additions, execute Step 2.1; for deletions, execute Step 2.2;
Step 2.1: (1) Traverse the set of newly added edges. When a newly added edge e = (u, v) lies between two existing vertices, calculate the increase in the number of walks passing through vertex u [formula image]. Using this number as the number of rounds, perform that many rounds of random walk from vertex u in the walk mode defined above. If a random walk path that passes through the new edge e passes through any vertex i, add 1 to the original total count of vertex i; if a path passes through a vertex i of the affected region without passing through e, subtract 1 from the original total count of vertex i. After all rounds of random walk are finished, tally the total number of visits to each vertex i of the affected region of $G_t$, compute from it the PageRank of the affected vertices, and collect these values into a set;
(2) Traverse $\Delta E^{+}$. When the newly added edge e connects a newly added vertex and an existing vertex, the total number of vertices in $G_t$ becomes |V| + 1; process the edge as in Step 2.1 (1), compute the PageRank of the affected vertices, and update the set;
(3) In general, $\Delta G_t$ contains information on adding multiple vertices and edges; traverse $\Delta G_t$, repeating Step 2 until all additions are processed, then go to Step 3;
Step 2.2: (1) Traverse the set ΔE⁻. When the deleted edge is e = (u, v), calculate the decrease in the number of passes through vertex u, [formula]. Using the walk mode defined above, take randomly selected out-neighbors of vertex u and vertex v as starting points and perform the required [formula] rounds of random walk. If a walk starting from an out-neighbor of u visits any vertex i, add 1 to the existing total pass count of vertex i; if a walk starting from v visits any vertex i, subtract 1 from the existing total pass count of vertex i. After all rounds of random walk are finished, count the total number of passes [formula] through each vertex i of the affected area of G_t, then calculate the PageRank of the affected vertices [formula] and update the set [formula].
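A matching sketch for step 2.2(1), under the same assumptions as before (dict of successor lists, `counts` as pass totals, `rounds` standing in for the unrecoverable round-count formula). Pairing one +1 walk from a random remaining out-neighbor of u with one -1 walk from v in each round, and skipping the +1 walk when u has no out-neighbors left, are interpretations of the wording above rather than details confirmed by the source.

```python
import random

def random_walk(graph, start, alpha, max_steps, rng):
    """One round of the walk mode of claim 1; returns the visited vertices."""
    path = [start]
    node = start
    for _ in range(max_steps):              # at most R steps per round
        successors = graph.get(node, [])
        if not successors:                  # no out-edge: stop this round at once
            break
        if rng.random() >= alpha:           # stop with probability 1 - alpha
            break
        node = rng.choice(successors)       # each out-edge with probability 1/out-degree
        path.append(node)
    return path

def apply_edge_deletion(graph, counts, u, v, rounds, alpha, max_steps, rng):
    """Step 2.2(1) as worded: after removing e = (u, v), run `rounds` paired walks.

    Each round adds 1 along a walk from a random remaining out-neighbor of u
    and subtracts 1 along a walk from v.
    """
    graph[u].remove(v)                      # delete the edge e = (u, v)
    others = graph[u]                       # out-neighbors of u that remain
    for _ in range(rounds):
        if others:                          # u may have become a vertex without out-edges
            for i in random_walk(graph, rng.choice(others), alpha, max_steps, rng):
                counts[i] = counts.get(i, 0) + 1
        for i in random_walk(graph, v, alpha, max_steps, rng):
            counts[i] = counts.get(i, 0) - 1
    return counts
```

Unlike the addition sketch, the start vertices are counted here, because the walks already begin at the vertex that follows u on the rerouted path.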
(2) Traverse ΔE⁻. When the deleted edge e is removed because one of its vertices is deleted, the total number of vertices in G_t becomes |V| - 1. The deleted edge is processed as in step 2.2(1), and the PageRank values of the affected vertices [formula] are calculated to update the set [formula].
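For case 2.2(2), what changes relative to 2.2(1) is which edges disappear and the vertex total. A minimal sketch, using the same assumed dict-of-successor-lists representation and a hypothetical helper name, that lists the edges removed together with a vertex x (each of which would then go through the step 2.2(1) update) and returns the new total |V| - 1:

```python
def plan_vertex_deletion(graph, x, n):
    """Case 2.2(2): edges removed together with vertex x, plus the new vertex total."""
    incoming = [(u, x) for u, succ in graph.items() if u != x and x in succ]
    outgoing = [(x, w) for w in graph.get(x, [])]
    return incoming + outgoing, n - 1
```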
(3) In general, ΔG_t contains the deletion of multiple vertices and edges. Traverse ΔG_t, repeating step 2 until every change has been processed, and then go to step 3.
Step 3: For the newly added vertices ΔV⁺, first aggregate the original vertices ΔV_0 into a hyper-vertex and keep the edges between the hyper-vertex and the newly added vertices, forming a new graph that contains the hyper-vertex. Solve for the PageRank of the newly added vertices on this graph and collect the results into the set [formula].
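The hyper-vertex construction of step 3 (and claim 1) can be sketched as a graph rewrite: every original vertex is relabelled as one hyper-vertex, only edges with at least one newly added endpoint are kept, and the other endpoint of each kept edge is redirected to the hyper-vertex. The `HYPER` label and the dict-of-successor-lists representation are assumptions; walks on the resulting small graph would then use the walk mode of claim 1.

```python
HYPER = "__hyper__"   # hypothetical label for the aggregated hyper-vertex

def build_hyper_graph(graph, new_vertices):
    """Collapse all original vertices into HYPER, keeping edges that touch a new vertex."""
    def label(x):
        return x if x in new_vertices else HYPER

    hyper_graph = {}
    for u, successors in graph.items():
        for v in successors:
            if u in new_vertices or v in new_vertices:   # edge incident to a new vertex
                hyper_graph.setdefault(label(u), []).append(label(v))
    return hyper_graph
```

Edges between two original vertices are dropped rather than turned into a self-loop on the hyper-vertex, since the step only retains edges incident to newly added vertices.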
Step 4: Output the set [formula].
The method has practical value. In web search, when web pages and their links are added, the method can quickly determine the ranking of search results from the PageRank values of all web pages and display the top-K ranked pages, so that users can quickly find the most important and relevant information. In an e-commerce system, the method uses the goods a user has purchased or browsed, and changes in the goods the user follows, to update commodity PageRank values quickly, and recommends the higher-ranked goods to the user. In a social network, the PageRank values of the other users connected to a given user reveal the circle of friends closely connected to that user; if the user is a criminal, a criminal group can be found in the same way, and when the criminal's circle of friends changes, PageRank can be updated incrementally with the method to analyze the criminal's activity and so help combat criminal groups effectively.
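Selecting the top-K pages (or goods) to display after an update is a partial sort. A short sketch with Python's `heapq.nlargest`; the function name is hypothetical, and ties keep their input order because `heapq.nlargest` is documented to be equivalent to `sorted(..., reverse=True)[:k]`.

```python
import heapq

def top_k(pagerank, k):
    """Return the k vertex ids with the largest PageRank values, highest first."""
    best = heapq.nlargest(k, pagerank.items(), key=lambda item: item[1])
    return [vertex for vertex, _ in best]
```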
The embodiment also provides a web page search ranking method which, based on the above random-walk-based method for updating vertex importance in dynamic flow graph data, calculates the PageRank values of all web pages to determine the ranking of search results and displays the web pages according to that ranking.
The embodiment also provides a commodity recommendation method which, based on the above random-walk-based method for updating vertex importance in dynamic flow graph data, uses the goods purchased or browsed by a user and changes in the goods the user follows to update commodity PageRank values, and recommends the higher-ranked goods to the user.
The embodiment also provides a user friend prediction method which, based on the above random-walk-based method for updating vertex importance in dynamic flow graph data, uses social information from a social network to calculate the PageRank values of the other users connected to a given user, and thereby finds the circle of friends closely connected to that user.
The preferred embodiments of the invention have been described in detail above. It should be understood that those skilled in the art can make numerous modifications and variations in light of the present teachings without departing from the inventive concept. Therefore, any technical solution that those skilled in the art can obtain through logical analysis, reasoning, or limited experimentation on the basis of the prior art and in accordance with the concept of the present invention should fall within the scope of protection defined by the claims.

Claims (10)

1. A method for updating the importance of vertices in dynamic flow graph data based on random walk, characterized by comprising the following steps:
acquiring associated data in real time in time order, and updating the dynamic flow graph data in real time;
acquiring the affected vertices and the newly added vertices during the update of the dynamic flow graph data at each moment;
generating random walk paths from each vertex in the dynamic flow graph data in a preset random walk mode;
calculating or updating the PageRank value of each affected vertex according to the total number of times the random walk paths pass through that affected vertex;
aggregating the original vertices of the dynamic flow graph data into a hyper-vertex, retaining all edges incident to the newly added vertices in the dynamic flow graph data and connecting the other end of each such edge to the hyper-vertex to obtain a new graph, generating random walk paths from each vertex of the new graph in the preset random walk mode, and calculating or updating the PageRank value of each newly added vertex according to the total number of times the random walk paths pass through that newly added vertex;
wherein, in the random walk mode, a walk at a vertex continues to a successor vertex along one of its out-edges with probability α, each out-edge being chosen with probability equal to the reciprocal of the out-degree, and stops with probability 1 - α; a walk that continues to successor vertices takes at most R steps, and a round of walking stops immediately when it reaches a vertex with no out-edges.
2. The method for updating the importance of vertices in dynamic flow graph data based on random walk according to claim 1, wherein the random walk starting from any vertex is repeated for a preset first number of rounds.
3. The method for updating the importance of vertices in dynamic flow graph data based on random walk according to claim 2, wherein the PageRank value of each vertex is updated in a preset incremental calculation mode during the update of the dynamic flow graph data at each moment; the changes formed during the update of the dynamic flow graph data comprise adding edges, adding vertices, deleting vertices and deleting edges;
the incremental calculation mode comprises the following steps:
a first processing step for newly added edges: if vertex u and vertex v of the newly added edge e = (u, v) are both existing vertices, calculating the increased number of passes through vertex u, [formula], and performing [formula] rounds of random walk from vertex u in the random walk mode to obtain the random walk paths of vertex u;
if a random walk path of vertex u passes through the newly added edge e and through any vertex i, adding 1 to the original total number of passes through vertex i, and if a random walk path of vertex u does not pass through the newly added edge e but passes through any vertex i, subtracting 1 from the original number of passes through vertex i, thereby calculating and updating the PageRank value of each vertex.
4. The method for updating the importance of vertices in dynamic flow graph data based on random walk according to claim 3, wherein the incremental calculation mode further comprises:
a second processing step for newly added edges: if the newly added edge e joins a newly added vertex and an existing vertex, increasing the total number of vertices of the dynamic flow graph data by one, and updating the PageRank value of each vertex by the first processing step for newly added edges.
5. The method for updating the importance of vertices in dynamic flow graph data based on random walk according to claim 3, wherein the incremental calculation mode further comprises:
a first processing step for deleted edges: if vertex u and vertex v of the deleted edge e = (u, v) are both existing vertices, calculating the reduced number of passes through vertex u, [formula], and, in the random walk mode, taking randomly selected out-neighbors of vertex u and vertex v as starting points and performing [formula] rounds of random walk to obtain random walk paths;
if a random walk path starting from an out-neighbor of vertex u passes through any vertex i, adding 1 to the original total number of passes through vertex i; and if a random walk path starting from vertex v passes through any vertex i, subtracting 1 from the original total number of passes through vertex i, thereby calculating and updating the PageRank value of each vertex.
6. The method for updating the importance of vertices in dynamic flow graph data based on random walk according to claim 5, wherein the incremental calculation mode further comprises:
a second processing step for deleted edges: if the deleted edge e is deleted because a vertex is deleted, decreasing the total number of vertices of the dynamic flow graph data by one, and updating the PageRank value of each vertex by the first processing step for deleted edges.
7. The method for updating the importance of vertices in dynamic flow graph data based on random walk according to claim 3, wherein the incremental calculation mode further comprises: processing the changes formed during the update of the dynamic flow graph data in sequence until all changes have been traversed.
8. The method for updating the importance of vertices in dynamic flow graph data based on random walk according to claim 1, wherein a vertex that is not affected during the update of the dynamic flow graph data keeps its original PageRank value.
9. An apparatus for updating the importance of vertices in dynamic flow graph data based on random walk, comprising a memory and a processor, wherein the memory stores a computer program and the processor calls the computer program to execute the following steps:
acquiring associated data in real time in time order, and updating the dynamic flow graph data in real time;
acquiring the affected vertices and the newly added vertices during the update of the dynamic flow graph data at each moment;
generating random walk paths from each vertex in the dynamic flow graph data in a preset random walk mode;
calculating or updating the PageRank value of each affected vertex according to the total number of times the random walk paths pass through that affected vertex;
aggregating the original vertices of the dynamic flow graph data into a hyper-vertex, retaining all edges incident to the newly added vertices in the dynamic flow graph data and connecting the other end of each such edge to the hyper-vertex to obtain a new graph, generating random walk paths from each vertex of the new graph in the preset random walk mode, and calculating or updating the PageRank value of each newly added vertex according to the total number of times the random walk paths pass through that newly added vertex;
wherein, in the random walk mode, a walk at a vertex continues to a successor vertex along one of its out-edges with probability α, each out-edge being chosen with probability equal to the reciprocal of the out-degree, and stops with probability 1 - α; a walk that continues to successor vertices takes at most R steps, and a round of walking stops immediately when it reaches a vertex with no out-edges.
10. The apparatus for updating the importance of vertices in dynamic flow graph data based on random walk according to claim 9, wherein the PageRank value of each vertex is updated in a preset incremental calculation mode during the update of the dynamic flow graph data at each moment; the changes formed during the update of the dynamic flow graph data comprise adding edges, adding vertices, deleting vertices and deleting edges;
the incremental calculation mode comprises the following steps:
a first processing step for newly added edges: if vertex u and vertex v of the newly added edge e = (u, v) are both existing vertices, calculating the increased number of passes through vertex u, [formula], and performing [formula] rounds of random walk from vertex u in the random walk mode to obtain the random walk paths of vertex u;
if a random walk path of vertex u passes through the newly added edge e and through any vertex i, adding 1 to the original total number of passes through vertex i, and if a random walk path of vertex u does not pass through the newly added edge e but passes through any vertex i, subtracting 1 from the original number of passes through vertex i, thereby calculating and updating the PageRank value of each vertex.
CN202011315919.1A 2020-11-22 2020-11-22 Dynamic flow graph data vertex importance updating method and device based on random walk Active CN112417247B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011315919.1A CN112417247B (en) 2020-11-22 2020-11-22 Dynamic flow graph data vertex importance updating method and device based on random walk

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011315919.1A CN112417247B (en) 2020-11-22 2020-11-22 Dynamic flow graph data vertex importance updating method and device based on random walk

Publications (2)

Publication Number Publication Date
CN112417247A true CN112417247A (en) 2021-02-26
CN112417247B CN112417247B (en) 2022-04-05

Family

ID=74776993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011315919.1A Active CN112417247B (en) 2020-11-22 2020-11-22 Dynamic flow graph data vertex importance updating method and device based on random walk

Country Status (1)

Country Link
CN (1) CN112417247B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023241641A1 (en) * 2022-06-15 2023-12-21 华为技术有限公司 Graph processing method and apparatus

Citations (6)

Publication number Priority date Publication date Assignee Title
WO2007100834A2 (en) * 2006-02-27 2007-09-07 The Regents Of The University Of California Graph querying, graph motif mining and the discovery of clusters
US20080275902A1 (en) * 2007-05-04 2008-11-06 Microsoft Corporation Web page analysis using multiple graphs
CN108009933A (en) * 2016-10-27 2018-05-08 中国科学技术大学先进技术研究院 Figure centrality computational methods and device
CN109255073A (en) * 2018-08-28 2019-01-22 麒麟合盛网络技术股份有限公司 A kind of personalized recommendation method, device and electronic equipment
CN110011838A (en) * 2019-03-25 2019-07-12 武汉大学 A kind of method for real time tracking of dynamic network PageRank value
CN110019989A (en) * 2019-04-08 2019-07-16 腾讯科技(深圳)有限公司 A kind of data processing method and device

Non-Patent Citations (3)

Title
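周芝民 et al.: "Friend recommendation algorithm based on connectivity and random walk", 《信息技术》 (Information Technology) *
章讯 et al.: "Research on improving social network friend recommendation algorithms based on network structure", 《信息技术》 (Information Technology) *
赖斯: "PageRank value computation based on GPGPU", 《中国优秀博硕士学位论文全文数据库(博士)信息科技辑》 (China Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology) *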

Also Published As

Publication number Publication date
CN112417247B (en) 2022-04-05

Similar Documents

Publication Publication Date Title
US20230252327A1 (en) Neural architecture search for convolutional neural networks
US11070643B2 (en) Discovering signature of electronic social networks
CN110837602B (en) User recommendation method based on representation learning and multi-mode convolutional neural network
US8903824B2 (en) Vertex-proximity query processing
Fan et al. Querying big graphs within bounded resources
CN113420190A (en) Merchant risk identification method, device, equipment and storage medium
US8438189B2 (en) Local computation of rank contributions
CN110213164B (en) Method and device for identifying network key propagator based on topology information fusion
Bergamini et al. Approximating betweenness centrality in fully dynamic networks
CN105224959A (en) The training method of order models and device
CN107563653A (en) Multi-robot full-coverage task allocation method
CN112053176B (en) Method, device, equipment and storage medium for analyzing information delivery data
CN109740106A (en) Large-scale network betweenness approximation method based on graph convolution neural network, storage device and storage medium
Xie et al. Dynamic interaction graphs with probabilistic edge decay
CN112417247B (en) Dynamic flow graph data vertex importance updating method and device based on random walk
CN106844736B (en) Time-space co-occurrence mode mining method based on time-space network
CN114385930A (en) Interest point recommendation method and system
CN113689270A (en) Method for determining black product device, electronic device, storage medium, and program product
WO2024098682A1 (en) Xai model evaluation method and apparatus, device, and medium
Song et al. Accurate and fast path computation on large urban road networks: A general approach
Lai et al. Parallel computations of local PageRank problem based on Graphics Processing Unit
CN114329231A (en) Object feature processing method and device, electronic equipment and storage medium
CN113627513A (en) Training data generation method and system, electronic device and storage medium
CN114154046B (en) Website search ranking method and system
CN106295844A (en) A kind of data processing method, device, system and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant