CN112417247A

CN112417247A - Dynamic flow graph data vertex importance updating method and device based on random walk

Info

Publication number: CN112417247A
Application number: CN202011315919.1A
Authority: CN
Inventors: 曾国荪; 丁春玲; 孙志鹏
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2020-11-22
Filing date: 2020-11-22
Publication date: 2021-02-26
Anticipated expiration: 2040-11-22
Also published as: CN112417247B

Abstract

The invention relates to a random-walk-based method and device for updating the importance of vertices in dynamic flow graph data. The method comprises: acquiring associated data in real time in chronological order and updating the dynamic flow graph data in real time; identifying, at each moment, the affected vertices and the newly added vertices produced by the update; generating random walk paths from each vertex of the dynamic flow graph data in a preset random walk mode; calculating or updating the PageRank value of each affected vertex from the total number of times the random walk paths pass through it; and aggregating the original vertices of the dynamic flow graph data into a hyper-vertex, retaining all edges incident to the newly added vertices and connecting their other ends to the hyper-vertex to obtain a new graph, in which the PageRank value of each newly added vertex is calculated or updated by the same random walk procedure. Compared with the prior art, the method and device maintain both the accuracy of the result and the real-time performance of the calculation.

Description

Dynamic flow graph data vertex importance updating method and device based on random walk

Technical Field

The invention relates to the field of updating the importance of vertices in dynamic flow graph data, and in particular to a random-walk-based method for updating the importance of vertices in dynamic flow graph data.

Background

PageRank originally referred to a ranking value for the importance of Web pages. It is now widely used to mean a ranking value for the importance of vertices in a graph, and it is usually obtained by iterating on the connection matrix of the graph until the rank vector, an eigenvector of that matrix, converges. In the big-data era, the rapid development of social networks has produced many large-scale dynamic flow graphs, and the importance of each vertex in such a graph, namely its PageRank, must be calculated to support applications. For example, in a dynamic social network there is a need to find circles of friends in real time, or to quickly discover criminal groups, based on the PageRank of the vertices.
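For orientation, the static iterative computation mentioned above can be sketched as follows. This is a minimal illustration of textbook power iteration, not code from the patent; the function name, the damping value 0.85, the tolerance, and the uniform treatment of vertices without outgoing edges are all assumptions made for the example.

```python
def pagerank_power_iteration(out_edges, damping=0.85, tol=1e-10, max_iter=200):
    """Textbook static PageRank by power iteration on an adjacency-list graph.

    `out_edges` maps every vertex to a list of its successors (assumed format).
    """
    vertices = list(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Rank held by vertices with no outgoing edge is spread uniformly.
        dangling = sum(rank[v] for v in vertices if not out_edges[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in vertices}
        for u in vertices:
            if out_edges[u]:
                share = damping * rank[u] / len(out_edges[u])
                for v in out_edges[u]:
                    new_rank[v] += share
        change = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if change < tol:
            break
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_power_iteration(graph)
```

Every iteration touches the whole graph, which is why rerunning it after each small change is costly on a large flow graph.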

The traditional methods for computing PageRank mainly include the PageRank method based on static graph calculation, the incremental power iteration method and the aggregation incremental calculation method. They have the following defects:

1. The PageRank method based on static graph calculation uses the global graph data to rerun power iteration on the changed graph in order to update PageRank. This consumes a large amount of time and computing resources and can hardly meet the real-time requirements of graph applications.

2. The incremental power iteration method provides an incrementally updated iteration model, but the model needs a large time overhead to guarantee the accuracy of the PageRank update, and the update errors accumulate as the flow graph keeps arriving.

3. The aggregation incremental method has difficulty determining which vertices need to be aggregated, and the degree of aggregation directly affects the quality and the computational complexity of the PageRank update.

In summary, traditional methods either pursue the accuracy of the PageRank calculation and therefore struggle with graph data that change continuously and rapidly, or sacrifice accuracy to reduce computation and speed up updating, in which case the PageRank error accumulates as the flow graph keeps changing. Traditional methods therefore struggle to strike a reasonable balance between accuracy and real-time performance when updating PageRank, are poorly suited to a continuously changing dynamic flow graph environment, and cannot update PageRank values effectively and quickly, which makes them particularly unsuitable for applications that need real-time processing.

Disclosure of Invention

The invention aims to overcome the defect that the PageRank calculation cannot have both accuracy and real-time performance in the prior art, and provides a dynamic flow graph data vertex importance updating method based on random walk.

The purpose of the invention can be realized by the following technical scheme:

A dynamic flow graph data vertex importance updating method based on random walk comprises the following steps:

acquiring associated data in real time in chronological order, and updating the dynamic flow graph data in real time;

acquiring the affected vertices and the newly added vertices in the dynamic flow graph data update at each moment;

generating random walk paths from each vertex in the dynamic flow graph data in a preset random walk mode;

calculating or updating the PageRank value of each affected vertex according to the total number of times the random walk paths pass through that vertex;

aggregating the original vertices of the dynamic flow graph data into a hyper-vertex, retaining all edges incident to the newly added vertices in the dynamic flow graph data and connecting the other ends of those edges to the hyper-vertex to obtain a new graph, generating random walk paths from each vertex of the new graph in the preset random walk mode, and calculating or updating the PageRank value of each newly added vertex according to the total number of times the random walk paths pass through that vertex;

wherein, in the random walk mode, a vertex walks to a successor vertex along an outgoing edge with probability α, the outgoing edge being selected at random with probability equal to the reciprocal of the out-degree; at each vertex the walk stops with probability 1-α; a walk that continues to successor vertices takes at most R steps; and a round of walking stops immediately when it reaches a vertex with no outgoing edge.

Further, the random walk starting from any vertex is repeated for a preset first number of rounds.

Furthermore, in the dynamic flow graph data update at each moment, the PageRank value of each vertex is updated by a preset incremental calculation mode; the changes arising in the dynamic flow graph data update comprise adding edges, adding vertices, deleting vertices and deleting edges;

the incremental calculation mode comprises the following steps:

a first processing step for a newly added edge: if the vertices $u$ and $v$ of the newly added edge $e = (u, v)$ are both existing vertices, the increase in the number of times vertex $u$ is passed is calculated; using this number as the number of rounds, random walks are performed from vertex $u$ in the random walk mode to obtain the random walk paths of vertex $u$;

if a random walk path of vertex $u$ passes through the newly added edge $e$ and through any vertex $i$, 1 is added to the original total number of times random walk paths pass through vertex $i$; if it does not pass through the newly added edge $e$ but passes through any vertex $i$, 1 is subtracted from that total; the PageRank value of each vertex is thereby calculated and updated.
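The counting rule for a newly added edge can be sketched in code as below. This is only an illustrative reading of the step above, not the patented implementation: the number of re-walk rounds is passed in as `rounds` because the expression for it is not available in this text, each path adjusts a vertex's count at most once, and all function and variable names are hypothetical.

```python
import random
from collections import Counter


def walk_edges(out_edges, start, alpha, max_steps, rng):
    """One walk from `start`; returns the visited vertices and the edges traversed."""
    path, used = [start], []
    current = start
    for _ in range(max_steps):
        successors = out_edges.get(current, [])
        # Stop at a vertex with no outgoing edge, or with probability 1 - alpha.
        if not successors or rng.random() >= alpha:
            break
        nxt = rng.choice(successors)
        used.append((current, nxt))
        path.append(nxt)
        current = nxt
    return path, used


def update_counts_for_added_edge(counts, out_edges, edge, rounds, alpha, max_steps, rng):
    """Apply the +1 / -1 rule over `rounds` re-walks from u.

    `out_edges` is assumed to already contain the new edge `edge` = (u, v).
    """
    u, _v = edge
    for _ in range(rounds):
        path, used = walk_edges(out_edges, u, alpha, max_steps, rng)
        # +1 for vertices on a path that crosses the new edge, -1 otherwise.
        step = 1 if edge in used else -1
        for i in set(path):
            counts[i] += step
    return counts


rng = random.Random(0)
graph_after = {"a": ["b", "c"], "b": ["c"], "c": []}  # edge ("a", "c") was just added
counts = Counter({"a": 40, "b": 30, "c": 50})         # made-up prior visit totals
update_counts_for_added_edge(counts, graph_after, ("a", "c"),
                             rounds=10, alpha=0.85, max_steps=20, rng=rng)
```

Only walks restarted at `u` are simulated, so the cost depends on `rounds` and the walk length rather than on the size of the whole graph.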

Further, the incremental calculation method further includes:

a second processing step for a newly added edge: if the newly added edge $e$ joins a newly added vertex and an existing vertex, the total number of vertices of the dynamic flow graph data is increased by one, and the PageRank value of each vertex is updated using the first processing step for a newly added edge.

Further, the incremental calculation method further includes:

a first processing step for a deleted edge: if the vertices $u$ and $v$ of the deleted edge $e = (u, v)$ are both existing vertices, the decrease in the number of times vertex $u$ is passed is calculated; using this number as the number of rounds, random walks are performed in the random walk mode, starting from a randomly selected successor (out-neighbour) of vertex $u$ and from vertex $v$, to obtain the random walk paths;

if a random walk path starting from a successor of vertex $u$ passes through any vertex $i$, 1 is added to the original total number of times vertex $i$ is passed; if a random walk path starting from vertex $v$ passes through any vertex $i$, 1 is subtracted from that total; the PageRank value of each vertex is thereby calculated and updated.
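The rule for a deleted edge can be sketched in the same spirit. The interpretation here is an assumption: each round runs one walk from a randomly chosen remaining successor of `u` (counted +1) and one walk from `v` (counted -1). The number of rounds is again supplied by the caller, and all names are hypothetical.

```python
import random
from collections import Counter


def walk(out_edges, start, alpha, max_steps, rng):
    """One walk from `start`: continue with probability alpha, at most max_steps steps."""
    path, current = [start], start
    for _ in range(max_steps):
        successors = out_edges.get(current, [])
        if not successors or rng.random() >= alpha:
            break
        current = rng.choice(successors)
        path.append(current)
    return path


def update_counts_for_deleted_edge(counts, out_edges, edge, rounds, alpha, max_steps, rng):
    """`out_edges` is assumed to be the graph after `edge` = (u, v) has been removed."""
    u, v = edge
    remaining = out_edges.get(u, [])
    for _ in range(rounds):
        if remaining:
            # Walks that left u via the deleted edge now leave via another out-edge.
            start = rng.choice(remaining)
            for i in set(walk(out_edges, start, alpha, max_steps, rng)):
                counts[i] += 1
        # Walks that continued from v after crossing the deleted edge are withdrawn.
        for i in set(walk(out_edges, v, alpha, max_steps, rng)):
            counts[i] -= 1
    return counts


rng = random.Random(0)
graph_after = {"u": ["w"], "v": ["w"], "w": []}       # edge ("u", "v") was just deleted
counts = Counter({"u": 40, "v": 35, "w": 60})         # made-up prior visit totals
update_counts_for_deleted_edge(counts, graph_after, ("u", "v"),
                               rounds=10, alpha=0.85, max_steps=20, rng=rng)
```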

Further, the incremental calculation method further includes:

a second processing step for a deleted edge: if the edge $e$ is deleted as a result of deleting a vertex, the total number of vertices of the dynamic flow graph data is reduced by one, and the PageRank value of each vertex is updated using the first processing step for a deleted edge.

Further, the incremental calculation method further includes: processing the changes arising in the dynamic flow graph data update one by one until all changes have been traversed.

Further, a vertex that is not affected by the dynamic flow graph data update retains its original PageRank value.

The invention also provides a device for updating the importance of data vertices of a dynamic flow graph based on random walk, comprising a memory and a processor, wherein the memory stores a computer program and the processor calls the computer program to execute the steps of the method described above.

Compared with the prior art, the invention has the following advantages:

(1) The method calculates the PageRank values of graph vertices from the random walk paths generated by random walks, which fully reflects the change in the probability that a vertex is visited, so the PageRank of the vertices can be updated accurately.

The region of the existing graph influenced by a dynamic change in the flow graph is determined using the sound wave propagation principle; the additional rounds of random walk that are needed are determined on the basis of the existing random walk path information; and PageRank is updated from the walk probability of passing through the affected vertices. For newly added vertices, a new graph with a hyper-vertex is formed using the idea of aggregation increment, and the PageRank of the newly added vertices is then calculated by random walk.

The calculation reuses existing results as far as possible, which greatly speeds up the PageRank update. The PageRank obtained by incremental updating is accurate and effective, providing a reliable and fast PageRank update method for dynamic flow graph applications with demanding real-time requirements.

(2) The method uses the idea of incremental calculation to update the PageRank of only the changed and affected vertices of the dynamic flow graph, while retaining and reusing the PageRank of the unaffected vertices. This greatly reduces the amount of computation, yields PageRank quickly, and preserves the correctness of the vertex PageRank values.

(3) The invention uses the number of changed random walk paths as the number of rounds of re-walking, which reflects as closely as possible the true change in the number of times random walks pass through vertex $i$ and improves the accuracy of the incremental PageRank calculation for any vertex.

(4) To reflect the true vertex PageRank values as closely as possible, the invention repeats $M$ rounds of random walk starting from any vertex.

Drawings

FIG. 1 is a schematic diagram of the incremental computation scheme for updating PageRank on a dynamic flow graph;

FIG. 2 is a schematic diagram of a change in dynamic flow graph data.

Detailed Description

The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.

Example 1

The embodiment provides a method for updating the importance of data vertices of a dynamic flow graph based on random walk, comprising the steps set out in the Disclosure of Invention above.

As a preferred embodiment, the random walk starting from any vertex is repeated for a preset first number of rounds.

As a preferred embodiment, in the dynamic flow graph data update at each moment, the PageRank value of each vertex is updated by a preset incremental calculation mode; the changes arising in the dynamic flow graph data update comprise adding edges, adding vertices, deleting vertices and deleting edges. The incremental calculation mode comprises the processing steps for newly added edges and for deleted edges described in the Disclosure of Invention above.

The invention also provides a device for updating the importance of the data vertex of the dynamic flow graph based on the random walk, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor calls the computer program to execute the method for updating the importance of the data vertex of the dynamic flow graph based on the random walk.

The key parts of the process are described in detail below.

1) General description

In FIG. 1, for the graph change information $\Delta G^t$, the vertices affected by $\Delta G^t$ are determined by the principle of sound wave propagation. The PageRank of the affected vertices is updated incrementally by the random walk method, while the unaffected vertices retain the PageRank calculated previously. Also in FIG. 1, when a new vertex arrives, a new graph with a hyper-vertex is constructed and the PageRank of the new vertex is calculated by the random walk method. This completes the real-time update of all vertices.

2) Dynamic flow graph modeling and random walk mode definition

The associated data arrive continuously in chronological order $1, 2, 3, \ldots, t, \ldots$, giving the dynamic flow graph data $G^1, G^2, G^3, \ldots, G^t, \ldots$. Because the data do not change abruptly, $G^t = G^{t-1} + \Delta G^t$, where $\Delta G^t$ is the graph change information describing the vertices and edges added or deleted in the flow graph at time $t$, and $G^t$ is all the graph data accumulated from the start up to time $t$, that is, $G^{t-1}$ modified according to $\Delta G^t$. In general $\Delta G^t$ is a very small graph. Without loss of generality, write $\Delta G^t = (\Delta V, \Delta E) = (\Delta V^0 + \Delta V^+ + \Delta V^-, \Delta E^+ + \Delta E^-)$, where $\Delta V^0$ is the set of original vertices in $G^{t-1}$ to which the added or deleted edges are connected, $\Delta V^+$ is the set of vertices newly added at time $t$, $\Delta V^-$ is the set of vertices deleted at time $t$, $\Delta E^+$ is the set of edges newly added at time $t$, and $\Delta E^-$ is the set of edges deleted at time $t$.
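One way to hold the change information in code is sketched below. It is an illustrative data structure only: the class `GraphDelta`, the function `apply_delta` and the adjacency format (vertex mapped to a set of successors) are assumptions for the example and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class GraphDelta:
    """Change information Delta G^t = (Delta V, Delta E) for one time step."""
    added_vertices: set = field(default_factory=set)    # Delta V^+
    removed_vertices: set = field(default_factory=set)  # Delta V^-
    added_edges: set = field(default_factory=set)       # Delta E^+, as (u, v) pairs
    removed_edges: set = field(default_factory=set)     # Delta E^-, as (u, v) pairs

    def touched_original_vertices(self, old_vertices):
        """Delta V^0: vertices of G^{t-1} that are endpoints of an added or deleted edge."""
        endpoints = {x for e in self.added_edges | self.removed_edges for x in e}
        return endpoints & set(old_vertices)


def apply_delta(out_edges, delta):
    """Return G^t = G^{t-1} + Delta G^t as a new adjacency dict; the input is not modified."""
    new_graph = {u: set(vs) for u, vs in out_edges.items()}
    for v in delta.added_vertices:
        new_graph.setdefault(v, set())
    for u, v in delta.added_edges:
        new_graph.setdefault(u, set()).add(v)
        new_graph.setdefault(v, set())
    for u, v in delta.removed_edges:
        new_graph.get(u, set()).discard(v)
    for v in delta.removed_vertices:
        new_graph.pop(v, None)
        for successors in new_graph.values():
            successors.discard(v)
    return new_graph


g_prev = {"a": {"b"}, "b": {"c"}, "c": set()}
delta = GraphDelta(added_vertices={"d"},
                   added_edges={("d", "a"), ("c", "a")},
                   removed_edges={("a", "b")})
g_now = apply_delta(g_prev, delta)
touched = delta.touched_original_vertices(g_prev)
```

Keeping the delta separate from the accumulated graph makes it straightforward to process each change in turn while the previous graph is still available.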

The random walk mode is defined as follows. In the graph $G = (V, E)$, every vertex starts walking to its successor vertices along outgoing edges with probability $\alpha$, which is also called the random walk coefficient; an outgoing edge is selected at random with probability equal to the reciprocal of the out-degree, and at each vertex the walk stops with probability $1 - \alpha$. A walk that continues to successor vertices takes at most $R$ steps, and in particular a round of walking stops immediately when it reaches a vertex with no outgoing edge. To reflect the true vertex PageRank values as closely as possible, $M$ rounds of random walk are run starting from any vertex.
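The walk definition above maps fairly directly onto code. The sketch below is illustrative: the parameter names mirror $\alpha$, $R$ and $M$, but the function names are hypothetical, and the final normalisation (a vertex's share of all recorded visits) is an assumption, since the text only states that PageRank is derived from the total number of passes.

```python
import random
from collections import Counter


def random_walk(out_edges, start, alpha, max_steps, rng):
    """One round of the walk: continue with probability alpha, at most max_steps (R) steps."""
    path = [start]
    current = start
    for _ in range(max_steps):
        successors = out_edges.get(current, [])
        # Stop at a vertex with no outgoing edge, or with probability 1 - alpha.
        if not successors or rng.random() >= alpha:
            break
        # Each outgoing edge is chosen with probability 1 / out-degree.
        current = rng.choice(successors)
        path.append(current)
    return path


def visit_counts(out_edges, alpha, max_steps, rounds, rng):
    """Run `rounds` (M) walks from every vertex and count how often each vertex is visited."""
    counts = Counter()
    for v in out_edges:
        for _ in range(rounds):
            counts.update(random_walk(out_edges, v, alpha, max_steps, rng))
    return counts


def estimate_pagerank(counts):
    """Assumed normalisation: each vertex's share of all recorded visits."""
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


rng = random.Random(0)
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}
counts = visit_counts(graph, alpha=0.85, max_steps=20, rounds=500, rng=rng)
ranks = estimate_pagerank(counts)
```

Storing the raw counts rather than only the normalised values is what allows later updates to add or subtract passes without rerunning every walk.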

3) Determining the original graph vertices affected by $\Delta G$

Because $\Delta G^t$ represents the graph change information generated and arriving in the flow graph at time $t$, $\Delta G^t$ influences, through the vertices in $\Delta V^0$, the PageRank of the vertices in $G^{t-1}$. This influence keeps weakening as it spreads outward, and it stops spreading once it approaches 0 or there is no further successor vertex. The vertices affected by $\Delta G^t$ obtained in this way are the vertices whose PageRank must be recalculated, and together they form the affected vertex set.
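A possible reading of this decaying propagation is sketched below. The exact decay formula is not given in the text, so the sketch assumes an initial influence of 1.0 at each source vertex, multiplication by an attenuation coefficient `beta` (with 0 < beta < 1) per hop, and a cut-off threshold `delta`; the function name is hypothetical.

```python
from collections import deque


def affected_vertices(out_edges, sources, beta, delta):
    """Spread influence from `sources` along outgoing edges, decaying by `beta` per hop.

    Assumes 0 < beta < 1 and an initial influence of 1.0 (both illustrative choices).
    """
    influence = {s: 1.0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        passed_on = influence[u] * beta
        if passed_on < delta:
            continue  # influence is too weak to propagate further
        for v in out_edges.get(u, ()):
            # Keep the strongest influence seen; re-queue only when it improves.
            if passed_on > influence.get(v, 0.0):
                influence[v] = passed_on
                queue.append(v)
    return set(influence)


graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": [], "e": ["a"]}
hit = affected_vertices(graph, sources=["a"], beta=0.5, delta=0.2)
```

In this example the influence reaches `b` (0.5) and `c` (0.25) but would arrive at `d` as 0.125, below the threshold, so `d` keeps its previous PageRank.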

4) PageRank incremental computation of affected vertices

$\Delta G$ changes the number of times the original random walks pass through the affected vertices, so the number of random walk paths that change at the position of the graph change must be calculated first. To reflect as closely as possible the true change in the number of times random walks pass through vertex $i$, and to improve the accuracy of the incremental PageRank calculation for any vertex, the number of changed random walk paths is used as the number of rounds of re-walking. After all rounds of random walk are finished, the total number of times random walks pass through each vertex of the affected region of $G^t$ is updated, and the PageRank of the affected vertices is obtained from these totals.

5) Calculation of the PageRank of newly added vertices in $\Delta G$

Because $\Delta G^t$ is a subgraph, its vertices comprise original vertices, newly added vertices and deleted vertices, and its edges comprise newly added edges and deleted edges. To compute the PageRank of the newly added vertices in the set $\Delta V^+$, note that the graph changes in $\Delta G^t$ occur mainly at the vertices of the original vertex set $\Delta V^0$. All vertices in $\Delta V^0$ are therefore aggregated into one hyper-vertex, and all edges incident to the newly added vertices in $\Delta V^+$ are retained, with their other ends connected to the hyper-vertex. This forms a new, smaller graph denoted $G'^t$, on which the random walk mode can still be used to calculate the PageRank of the newly added vertices in $\Delta G^t$.
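The construction of the smaller graph $G'^t$ can be sketched as follows. This is an illustration under assumptions: edges are given as `(u, v)` pairs, every endpoint that is not a newly added vertex is redirected to a single hyper-vertex, and the label `HYPER` and the function name are hypothetical.

```python
HYPER = "__hyper__"  # hypothetical label for the aggregated hyper-vertex


def build_hypervertex_graph(new_vertices, edges):
    """Keep only edges touching a new vertex; redirect every other endpoint to HYPER."""
    new_vertices = set(new_vertices)
    graph = {v: set() for v in new_vertices}
    graph[HYPER] = set()
    for u, v in edges:
        if u not in new_vertices and v not in new_vertices:
            continue  # edge between two original vertices: absorbed by the hyper-vertex
        src = u if u in new_vertices else HYPER
        dst = v if v in new_vertices else HYPER
        graph[src].add(dst)
    return graph


edges = [("a", "b"), ("x", "a"), ("b", "x"), ("x", "y"), ("y", "c")]
small = build_hypervertex_graph({"x", "y"}, edges)
```

The resulting graph has one vertex per newly added vertex plus the hyper-vertex, so random walks on it are cheap regardless of the size of the accumulated graph.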

The specific implementation process of the dynamic flow graph data vertex importance updating method of this embodiment is as follows. On the basis of the existing graph $G^{t-1}$, when the graph change information $\Delta G$ generated by the dynamic flow graph arrives, the vertices affected by $\Delta G$ are determined from the position information of $\Delta G$ using the sound wave propagation principle; the change in the walk probability of the affected vertices is calculated by the random walk algorithm, and part of the random walk paths are changed, so that the PageRank of those vertices is updated incrementally. For the unaffected original vertices, the vertex PageRank already calculated on $G^{t-1}$ continues to be used. If there are newly added vertices, their PageRank is calculated using the aggregation increment idea and the random walk algorithm. The specific steps of the algorithm are as follows:

Input: $G^{t-1} = (V, E)$; $\Delta G^t = (\Delta V, \Delta E) = (\Delta V^0 + \Delta V^+ + \Delta V^-, \Delta E^+ + \Delta E^-)$; attenuation coefficient $\beta$; threshold $\delta$ at which the influence stops propagating; random walk coefficient $\alpha$; and the preset number of rounds $M$ of random walks;

Output: the set of PageRank values corresponding to all vertices in $G^t$.

Step 1: When $\Delta G^t$ arrives, determine the set of vertices affected by it; the unaffected vertices keep using their original PageRank values, which form a set. If there are affected vertices, that is, the affected vertex set is not empty, go to step 2; if it is the empty set, go to step 3;

Step 2: From $\Delta G^t$, the sets $\Delta V^+$ and $\Delta V^-$, $\Delta E^+$ and $\Delta E^-$ are known, which determines whether the change adds or deletes vertices and edges. If the change adds vertices or edges, execute step 2.1; if it deletes vertices or edges, execute step 2.2;

Step 2.1: (1) Traverse the set $\Delta E^+$. When the newly added edge $e = (u, v)$ is an edge between two existing vertices, calculate the increase in the number of times vertex $u$ is passed. Using this number as the number of rounds, perform random walks from vertex $u$ in the walk mode defined above. If a random walk path passes through the newly added edge $e$ and through any vertex $i$, add 1 to the original total number of passes through vertex $i$; if it does not pass through $e$ but passes through any vertex $i$ of the affected region of the graph, subtract 1 from the original total number of passes through vertex $i$. After all rounds of random walk are finished, count the total number of passes through each vertex $i$ of the affected region of $G^t$, calculate from it the PageRank of the affected vertices, and collect these values into a set.

(2) Traverse $\Delta E^+$. When the newly added edge $e$ joins a newly added vertex and an existing vertex, the total number of vertices in $G^t$ becomes $|V| + 1$. The newly added edge is processed as in (1) of step 2.1, the PageRank of the affected vertices is calculated, and the set is updated.

(3) In general, $\Delta G^t$ contains information on the addition of multiple vertices and edges. Traverse $\Delta G^t$, repeating step 2 until the traversal ends, then go to step 3;

Step 2.2: (1) Traverse the set $\Delta E^-$. When the deleted edge is $e = (u, v)$, calculate the decrease in the number of times vertex $u$ is passed. Using this number as the number of rounds, perform random walks in the walk mode defined above, starting from a randomly selected successor (out-neighbour) of vertex $u$ and from vertex $v$. If a random walk path starting from a successor of vertex $u$ passes through any vertex $i$, add 1 to the original total number of passes through vertex $i$; if a random walk path starting from vertex $v$ passes through any vertex $i$, subtract 1 from the original total number of passes through vertex $i$. After all rounds of random walk are finished, count the total number of passes through each vertex $i$ of the affected region of $G^t$, calculate from it the PageRank of the affected vertices, and update the set.

(2) Traverse $\Delta E^-$. When the edge $e$ is deleted because a vertex is deleted, the total number of vertices in $G^t$ becomes $|V| - 1$. The deleted edge is processed as in (1) of step 2.2, the PageRank values of the affected vertices are calculated, and the set is updated.

(3) In general, $\Delta G^t$ contains information on the deletion of multiple vertices and edges. Traverse $\Delta G^t$, repeating step 2 until the traversal ends, then go to step 3;

Step 3: For the newly added vertices in $\Delta V^+$, first aggregate $\Delta V^0$ into a hyper-vertex and retain the edges connecting the hyper-vertex and the newly added vertices, forming a new graph with a hyper-vertex; then solve for the PageRank of the newly added vertices, which form a set;

Step 4: Output the set of PageRank values.

The method has practical application value. In Web search, for example, when web pages and their links are newly added, the method can quickly determine the ordering of search results from the PageRank values of all web pages and display the top-K ranked pages to users, so that users can quickly find the most important and relevant information. In an electronic commerce system, the PageRank values of commodities can be updated quickly by the method from the commodities a user purchases or browses and from changes in the commodities the user follows, and the updated higher-ranked commodities are recommended to the user. On a social network, a circle of friends closely connected with a user can be found from the PageRank values of the other users connected with that user. If the user is a criminal, a criminal group can be found in the same way; when the criminal's circle of friends changes, PageRank can be updated incrementally by the method, the criminal's movements can be analyzed, and criminal groups can then be combated effectively.

The embodiment also provides a web page search ranking method, which is based on the above dynamic flow graph data vertex importance updating method based on random walks, calculates PageRank values of all web pages to determine ranking of search results, and displays the web pages according to the ranking results.

The embodiment also provides a commodity recommendation method, which updates the PageRank value of a commodity and recommends the updated commodity with higher rank to a user based on the dynamic flow graph data vertex importance updating method based on random walk by using commodity information purchased or browsed by the user and the change of the concerned commodity.

The embodiment also provides a user friend prediction method which, from the social information on a social network, uses the above random-walk-based dynamic flow graph data vertex importance updating method to calculate the PageRank values of the other users connected with a user and thereby finds a circle of friends closely connected with that user.

The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims

1. A method for updating importance of data vertex of a dynamic flow graph based on random walk is characterized by comprising the following steps:

2. The method for updating the importance of the data vertex of the dynamic flow graph based on the random walk according to claim 1, wherein the random walk starting from any vertex is repeated for a preset first number of rounds.

3. The method for updating the importance of the data vertex of the dynamic flow graph based on the random walk according to claim 2, wherein the PageRank value of each vertex is updated in a preset incremental calculation mode in the process of updating the data of the dynamic flow graph at each moment; the changes formed in the dynamic flow graph data updating process comprise adding edges, adding vertices, deleting vertices and deleting edges;

the incremental calculation mode comprises the following steps:

performing rounds of random walk from the vertex u in the random walk mode to obtain the random walk paths of the vertex u;

4. The method for updating the importance of the data vertex of the dynamic flow graph based on the random walk according to claim 3, wherein the incremental calculation mode further comprises:

5. The method for updating the importance of the data vertex of the dynamic flow graph based on the random walk according to claim 3, wherein the incremental calculation mode further comprises:

performing rounds of random walk to obtain the random walk paths;

if a random walk path starting from a successor (out-neighbour) of the vertex u passes through any vertex i, adding 1 to the original total number of times the vertex i is passed; and if a random walk path starting from the vertex v passes through any vertex i, subtracting 1 from the original total number of times the vertex i is passed, thereby calculating and updating the PageRank value of each vertex.

6. The method for updating the importance of the data vertex of the dynamic flow graph based on the random walk according to claim 5, wherein the incremental calculation further comprises:

7. The method for updating the importance of the data vertex of the dynamic flow graph based on the random walk according to claim 3, wherein the incremental calculation mode further comprises: and sequentially processing the changes formed in the data updating process of the dynamic flow graph until all the changes are traversed.

8. The method for updating the importance of the data vertex of the dynamic flow graph based on the random walk according to claim 1, wherein a vertex which is not affected in the process of updating the data of the dynamic flow graph retains its original PageRank value.

9. An apparatus for updating importance of data vertex of a dynamic flow graph based on random walk, comprising a memory and a processor, wherein the memory stores a computer program, and the processor calls the computer program to execute the following steps:

10. The apparatus for updating importance of data vertex of dynamic flow graph based on random walk according to claim 9, wherein the PageRank value of each vertex is updated in a preset incremental calculation manner in the process of updating data of dynamic flow graph at each moment; the changes formed in the dynamic flow graph data updating process comprise adding edges, adding vertices, deleting vertices and deleting edges;

the incremental calculation mode comprises the following steps:

performing rounds of random walk from the vertex u in the random walk mode to obtain the random walk paths of the vertex u;