CN112417247A - Dynamic flow graph data vertex importance updating method and device based on random walk - Google Patents
- Publication number
- CN112417247A (application CN202011315919.1A)
- Authority
- CN
- China
- Prior art keywords
- vertex
- random walk
- updating
- flow graph
- dynamic flow
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9537—Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The invention relates to a random-walk-based method and device for updating the importance of vertices in dynamic flow graph data. The method comprises: acquiring associated data in real time in time order and updating the dynamic flow graph data in real time; at each moment, identifying the affected vertices and the newly added vertices in the update of the dynamic flow graph data; generating random walk paths from each vertex of the dynamic flow graph data in a preset random walk mode; calculating or updating the PageRank value of each affected vertex from the total number of times the random walk paths pass through it; and aggregating the original vertices of the dynamic flow graph data into a hyper-vertex, retaining all edges of the newly added vertices, and connecting the other ends of those edges to the hyper-vertex to obtain a new graph, in which the PageRank value of each newly added vertex is calculated or updated in the same way. Compared with the prior art, the method and device ensure both the accuracy of the calculation result and the real-time performance of the calculation.
Description
Technical Field
The invention relates to the field of importance updating of data vertexes of dynamic flow graphs, in particular to a random walk-based importance updating method of data vertexes of a dynamic flow graph.
Background
PageRank originally referred to a ranking value for the importance of Web pages; it is now widely used to mean the ranking value of vertex importance in a graph, and is usually obtained by iterating with the connection matrix of the graph until its eigenvector converges. In the big data era, the rapid development of social networks has produced many large-scale dynamic flow graphs, and the importance of each vertex in such a graph, namely its PageRank, must be calculated to support applications. For example, in a dynamic social network there is a need to find circles of friends in real time, or to quickly discover criminal groups, based on the PageRank of vertices.
The traditional methods for solving PageRank mainly comprise the PageRank method based on static graph calculation, the incremental power iteration method and the aggregation incremental calculation method, and they have the following defects:
1. The PageRank method based on static graph calculation uses the global graph data to rerun the power iteration on the changed graph in order to update the PageRank. It consumes a large amount of time and computing resources and can hardly meet the real-time requirements of graph applications.
2. The incremental power iteration method provides an incremental update iteration model, but the model needs a large time overhead to ensure the accuracy of the PageRank update, and the update error keeps accumulating as the flow graph continues to arrive.
3. The aggregation incremental method has difficulty determining which vertices need to be aggregated, and the degree of aggregation directly affects the quality and computational complexity of the PageRank update.
In summary, traditional methods either pursue PageRank accuracy and therefore cope poorly with graph data that change continuously and rapidly, or sacrifice accuracy for a smaller computation and faster updates, so that the PageRank error keeps accumulating as the flow graph changes. They therefore struggle to balance accuracy against real-time performance, are ill-suited to a continuously changing dynamic flow graph environment, and cannot update PageRank values effectively and quickly, especially in application fields that require real-time processing.
Disclosure of Invention
The invention aims to overcome the defect that the PageRank calculation cannot have both accuracy and real-time performance in the prior art, and provides a dynamic flow graph data vertex importance updating method based on random walk.
The purpose of the invention can be realized by the following technical scheme:
a dynamic flow graph data vertex importance updating method based on random walk comprises the following steps:
acquiring associated data in real time according to the time sequence, and updating the data of the dynamic flow graph in real time;
acquiring an affected vertex and a newly added vertex in the data updating process of the dynamic flow graph at each moment;
generating a random walk path by each vertex in the dynamic flow graph data in a preset random walk mode;
calculating or updating the PageRank value of each affected vertex according to the total times of the random walk path passing through each affected vertex;
aggregating original vertexes of the dynamic flow graph data into a hyper-vertex, reserving all connecting edges of newly-added vertexes in the dynamic flow graph data, connecting the other ends of the connecting edges with the hyper-vertex to obtain a new graph, generating a random walk path for each vertex of the new graph in a preset random walk mode, and calculating or updating the PageRank value of each newly-added vertex according to the total times that the random walk path passes through each newly-added vertex;
the random walk mode is as follows: a walk at any vertex moves on to a successor vertex along an outgoing edge with probability alpha, choosing among the outgoing edges at random with probability equal to the reciprocal of the out-degree, and stops with probability 1-alpha; a walk that continues takes at most R steps, and the round stops immediately when a vertex with no outgoing edge is reached.
Further, the random walk starting from any vertex is repeated for a preset first number of rounds.
Furthermore, during the update of the dynamic flow graph data at each moment, the PageRank value of each vertex is updated in a preset incremental calculation mode; the changes arising during the update include added edges, added vertices, deleted vertices and deleted edges;
the incremental calculation mode comprises the following steps:
a first processing step for a newly added edge: if vertex u and vertex v of the newly added edge e = (u, v) are both existing vertices, the increase in the number of times random walks pass through vertex u is calculated, and that number of rounds of random walk is run from vertex u in the random walk mode to obtain the random walk paths of vertex u;
if a random walk path of vertex u passes through the newly added edge e and through any vertex i, 1 is added to the original total number of times random walk paths pass through vertex i; if it passes through vertex i without passing through the newly added edge e, 1 is subtracted from that total; the PageRank value of each vertex is then calculated and updated.
Further, the incremental calculation method further includes:
a second processing step for a newly added edge: if the newly added edge e connects a newly added vertex and an existing vertex, the total number of vertices of the dynamic flow graph data is increased by one, and the PageRank value of each vertex is updated using the first processing step for a newly added edge.
Further, the incremental calculation method further includes:
a first processing step for a deleted edge: if vertex u and vertex v of the deleted edge e = (u, v) are both existing vertices, the decrease in the number of times random walks pass through vertex u is calculated, and that number of rounds of random walk is run in the random walk mode, with the starting point of each round selected at random from the successor vertices (out-neighbours) of vertex u and vertex v, to obtain the random walk paths;
if a random walk path starting from a successor vertex of vertex u passes through any vertex i, 1 is added to the original total number of times vertex i is passed; if a random walk path starting from vertex v passes through any vertex i, 1 is subtracted from that total; the PageRank value of each vertex is then calculated and updated.
Further, the incremental calculation method further includes:
a second processing step for a deleted edge: if the edge e was deleted because one of its vertices was deleted, the total number of vertices of the dynamic flow graph data is reduced by one, and the PageRank value of each vertex is updated using the first processing step for a deleted edge.
Further, the incremental calculation method further includes: and sequentially processing the changes formed in the dynamic flow graph data updating process until all changes are traversed.
Further, a vertex that is unaffected by the update of the dynamic flow graph data retains its original PageRank value.
The invention also provides a device for updating the importance of the data vertex of the dynamic flow graph based on random walk, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor calls the computer program to execute the following steps:
acquiring associated data in real time according to the time sequence, and updating the data of the dynamic flow graph in real time;
acquiring an affected vertex and a newly added vertex in the data updating process of the dynamic flow graph at each moment;
generating a random walk path by each vertex in the dynamic flow graph data in a preset random walk mode;
calculating or updating the PageRank value of each affected vertex according to the total times of the random walk path passing through each affected vertex;
aggregating original vertexes of the dynamic flow graph data into a hyper-vertex, reserving all connecting edges of newly-added vertexes in the dynamic flow graph data, connecting the other ends of the connecting edges with the hyper-vertex to obtain a new graph, generating a random walk path for each vertex of the new graph in a preset random walk mode, and calculating or updating the PageRank value of each newly-added vertex according to the total times that the random walk path passes through each newly-added vertex;
the random walk mode is as follows: a walk at any vertex moves on to a successor vertex along an outgoing edge with probability alpha, choosing among the outgoing edges at random with probability equal to the reciprocal of the out-degree, and stops with probability 1-alpha; a walk that continues takes at most R steps, and the round stops immediately when a vertex with no outgoing edge is reached.
Furthermore, during the update of the dynamic flow graph data at each moment, the PageRank value of each vertex is updated in a preset incremental calculation mode; the changes arising during the update include added edges, added vertices, deleted vertices and deleted edges;
the incremental calculation mode comprises the following steps:
a first processing step for a newly added edge: if vertex u and vertex v of the newly added edge e = (u, v) are both existing vertices, the increase in the number of times random walks pass through vertex u is calculated, and that number of rounds of random walk is run from vertex u in the random walk mode to obtain the random walk paths of vertex u;
if a random walk path of vertex u passes through the newly added edge e and through any vertex i, 1 is added to the original total number of times random walk paths pass through vertex i; if it passes through vertex i without passing through the newly added edge e, 1 is subtracted from that total; the PageRank value of each vertex is then calculated and updated.
Compared with the prior art, the invention has the following advantages:
(1) The method calculates the PageRank value of graph vertices from the random walk paths generated by random walks, which fully reflects the change in the probability of a vertex being visited and allows the vertex PageRank to be updated accurately.
The region of the existing graph influenced by a dynamic change in the flow graph is determined by the sound wave propagation principle; the additional rounds of random walk required are determined on the basis of the existing random walk path information, and the PageRank is updated from the walk probabilities of the affected vertices. For newly added vertices, a new graph with a hyper-vertex is formed using the idea of aggregation increments, and their PageRank is then calculated by random walk.
The calculation reuses existing results as far as possible, which greatly speeds up the PageRank update; the incrementally updated PageRank of the dynamic flow graph is accurate and effective, providing a reliable and fast PageRank update method for dynamic flow graph applications with strict real-time requirements.
(2) The method uses the idea of incremental calculation to update the PageRank of only those vertices of the dynamic flow graph that have changed or been affected, while the unaffected vertices retain their existing PageRank. This greatly reduces the amount of computation, so the PageRank can be obtained quickly while the correctness of the vertex PageRank is preserved.
(3) The invention uses the number of changed random walk paths as the number of rounds of renewed random walk, which reflects as closely as possible the real change in the number of times random walks pass through a vertex i and improves the accuracy of the incremental PageRank calculation for any vertex.
(4) In order to reflect the true vertex PageRank value as much as possible, the invention repeats M rounds of random walk starting from any vertex.
Drawings
FIG. 1 is a schematic diagram of a dynamic flow graph PageRank update incremental computation scheme change;
fig. 2 is a schematic diagram of a change in dynamic flow graph data.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
Example 1
The embodiment provides a method for updating importance of data vertex of a dynamic flow graph based on random walk, which comprises the following steps:
acquiring associated data in real time according to the time sequence, and updating the data of the dynamic flow graph in real time;
acquiring an affected vertex and a newly added vertex in the data updating process of the dynamic flow graph at each moment;
generating a random walk path by each vertex in the dynamic flow graph data in a preset random walk mode;
calculating or updating the PageRank value of each affected vertex according to the total times of the random walk path passing through each affected vertex;
aggregating original vertexes of the dynamic flow graph data into a hyper-vertex, reserving all connecting edges of newly-added vertexes in the dynamic flow graph data, connecting the other ends of the connecting edges with the hyper-vertex to obtain a new graph, generating a random walk path for each vertex of the new graph in a preset random walk mode, and calculating or updating the PageRank value of each newly-added vertex according to the total times that the random walk path passes through each newly-added vertex;
the random walk mode is as follows: a walk at any vertex moves on to a successor vertex along an outgoing edge with probability alpha, choosing among the outgoing edges at random with probability equal to the reciprocal of the out-degree, and stops with probability 1-alpha; a walk that continues takes at most R steps, and the round stops immediately when a vertex with no outgoing edge is reached.
As a preferred embodiment, the random walk starting from any vertex is repeated for a preset first number of rounds.
As a preferred embodiment, during the update of the dynamic flow graph data at each moment, the PageRank value of each vertex is updated in a preset incremental calculation mode; the changes arising during the update include added edges, added vertices, deleted vertices and deleted edges;
the incremental calculation mode comprises the following steps:
a first processing step for a newly added edge: if vertex u and vertex v of the newly added edge e = (u, v) are both existing vertices, the increase in the number of times random walks pass through vertex u is calculated, and that number of rounds of random walk is run from vertex u in the random walk mode to obtain the random walk paths of vertex u;
if a random walk path of vertex u passes through the newly added edge e and through any vertex i, 1 is added to the original total number of times random walk paths pass through vertex i; if it passes through vertex i without passing through the newly added edge e, 1 is subtracted from that total; the PageRank value of each vertex is then calculated and updated.
Further, the incremental calculation method further includes:
a second processing step for a newly added edge: if the newly added edge e connects a newly added vertex and an existing vertex, the total number of vertices of the dynamic flow graph data is increased by one, and the PageRank value of each vertex is updated using the first processing step for a newly added edge.
Further, the incremental calculation method further includes:
a first processing step for a deleted edge: if vertex u and vertex v of the deleted edge e = (u, v) are both existing vertices, the decrease in the number of times random walks pass through vertex u is calculated, and that number of rounds of random walk is run in the random walk mode, with the starting point of each round selected at random from the successor vertices (out-neighbours) of vertex u and vertex v, to obtain the random walk paths;
if a random walk path starting from a successor vertex of vertex u passes through any vertex i, 1 is added to the original total number of times vertex i is passed; if a random walk path starting from vertex v passes through any vertex i, 1 is subtracted from that total; the PageRank value of each vertex is then calculated and updated.
Further, the incremental calculation method further includes:
a second processing step for a deleted edge: if the edge e was deleted because one of its vertices was deleted, the total number of vertices of the dynamic flow graph data is reduced by one, and the PageRank value of each vertex is updated using the first processing step for a deleted edge.
Further, the incremental calculation method further includes: and sequentially processing the changes formed in the dynamic flow graph data updating process until all changes are traversed.
Further, a vertex that is unaffected by the update of the dynamic flow graph data retains its original PageRank value.
The invention also provides a device for updating the importance of the data vertex of the dynamic flow graph based on the random walk, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor calls the computer program to execute the method for updating the importance of the data vertex of the dynamic flow graph based on the random walk.
The key parts of the process are described in detail below.
1) General description
In FIG. 1, when the graph change information $\Delta G_t$ arrives, the vertices affected by $\Delta G_t$ are determined by the sound wave propagation principle. The PageRank of the affected vertices is updated incrementally by the random walk method, while the unaffected vertices keep their previously calculated PageRank. When new vertices arrive, a new graph with a hyper-vertex is constructed and the PageRank of the new vertices is calculated by the random walk method. This completes the real-time update of all vertices.
2) Dynamic flow graph modeling and random walk mode definition
The associated data arrive continuously in time order 1, 2, 3, …, t, …, yielding the dynamic flow graph data $G_1, G_2, G_3, \ldots, G_t, \ldots$. Because the data do not change abruptly, $G_t = G_{t-1} + \Delta G_t$, where $\Delta G_t$ is the graph change information for vertices and edges added or deleted in the flow graph at time t, and $G_t$ is all graph data accumulated from the start up to time t after the changes given by $\Delta G_t$ have been applied. In general $\Delta G_t$ is a very small graph. Without loss of generality, write $\Delta G_t = (\Delta V, \Delta E) = (\Delta V^{0} + \Delta V^{+} + \Delta V^{-}, \Delta E^{+} + \Delta E^{-})$, where $\Delta V^{0}$ is the set of original vertices in $G_{t-1}$ to which added or deleted edges connect, $\Delta V^{+}$ is the set of vertices added at time t, $\Delta V^{-}$ is the set of vertices deleted at time t, $\Delta E^{+}$ is the set of edges added at time t, and $\Delta E^{-}$ is the set of edges deleted at time t.
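The relation between consecutive graphs and the change information can be sketched in code. This is a minimal illustration, not the patent's implementation: it assumes a graph stored as a dictionary from each vertex to its set of successors, and the names `GraphDelta`, `apply_delta` and `touched_original_vertices` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GraphDelta:
    """Change information for one time step (field names are illustrative)."""
    added_vertices: set = field(default_factory=set)    # ΔV+
    deleted_vertices: set = field(default_factory=set)  # ΔV-
    added_edges: set = field(default_factory=set)       # ΔE+, as (u, v) pairs
    deleted_edges: set = field(default_factory=set)     # ΔE-, as (u, v) pairs


def apply_delta(graph, delta):
    """Build the graph at time t from the graph at time t-1 and the delta.

    `graph` maps each vertex to the set of its successors; it is not modified.
    """
    new_graph = {u: set(succ) for u, succ in graph.items()}
    for v in delta.added_vertices:
        new_graph.setdefault(v, set())
    for u, v in delta.added_edges:
        new_graph.setdefault(u, set()).add(v)
        new_graph.setdefault(v, set())
    for u, v in delta.deleted_edges:
        new_graph.get(u, set()).discard(v)
    for v in delta.deleted_vertices:
        new_graph.pop(v, None)
        for succ in new_graph.values():
            succ.discard(v)
    return new_graph


def touched_original_vertices(graph, delta):
    """ΔV0: vertices already in the old graph that are endpoints of changed edges."""
    endpoints = {x for edge in delta.added_edges | delta.deleted_edges for x in edge}
    return endpoints & set(graph)
```

Keeping the old graph unmodified makes it easy to compare the state before and after a time step; an in-place update would be the natural choice when memory matters.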
The random walk mode is defined as follows. In the graph G = (V, E), a walk at any vertex moves on to one of its successor vertices with probability α (α is also called the random walk coefficient), choosing among the outgoing edges with probability equal to the reciprocal of the out-degree, and stops with probability 1-α. A walk that continues takes at most R steps; in particular, when a vertex with no outgoing edge is reached, the round stops immediately. To reflect the true vertex PageRank values as closely as possible, M rounds of random walk are run from every vertex.
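The walk definition above can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is a dictionary from vertex to successor set, the function names are hypothetical, and normalising visit counts by the total number of visits is an assumption, since this text does not give the normalisation formula.

```python
import random
from collections import Counter


def random_walk(graph, start, alpha, max_steps, rng):
    """One round of walk; returns the visited vertices, start included."""
    path = [start]
    current = start
    for _ in range(max_steps):            # at most R steps
        successors = graph.get(current)
        if not successors:                # no outgoing edge: stop the round immediately
            break
        if rng.random() >= alpha:         # stop with probability 1 - alpha
            break
        # every outgoing edge is chosen with probability 1 / out-degree
        current = rng.choice(sorted(successors))
        path.append(current)
    return path


def estimate_pagerank(graph, alpha, max_steps, rounds, seed=0):
    """Run `rounds` walks (M) from every vertex and normalise the visit counts.

    Assumption: PageRank is taken as visits / total visits.
    """
    rng = random.Random(seed)
    visits = Counter()
    for start in graph:
        for _ in range(rounds):
            visits.update(random_walk(graph, start, alpha, max_steps, rng))
    total = sum(visits.values())
    return {v: visits[v] / total for v in graph}, visits
```

Returning the raw visit counts alongside the normalised values matters for the incremental steps, which adjust the counts rather than the PageRank values directly.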
3) Determining the original graph vertices affected by $\Delta G$
$\Delta G_t$ is the graph change information generated and arriving in the flow graph at time t. Through the vertices in $\Delta V^{0}$, it influences the PageRank of the vertices in $G_{t-1}$; this influence weakens as it spreads outward, and stops spreading once it approaches 0 or there is no successor vertex to pass it to. The vertices affected by $\Delta G_t$ obtained in this way are the vertices whose PageRank must be recalculated; together they form the affected vertex set.
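One way to picture the decaying spread of influence is a breadth-first propagation. The sketch below is an assumption-heavy illustration: this text names an attenuation coefficient β and a stop threshold δ but not the propagation formula, so the code assumes influence starts at 1.0 at each source vertex and is multiplied by β on every hop along an outgoing edge. The function name is hypothetical.

```python
from collections import deque


def affected_vertices(graph, sources, beta, delta):
    """Spread influence from `sources` along outgoing edges.

    Assumptions: influence is 1.0 at each source and is multiplied by
    beta (0 < beta < 1) per hop; a value below delta is not passed on.
    Returns the set of vertices that received influence.
    """
    influence = {s: 1.0 for s in sources if s in graph}
    queue = deque(influence)
    while queue:
        u = queue.popleft()
        passed_on = influence[u] * beta
        if passed_on < delta:              # decayed below the threshold: stop here
            continue
        for v in graph.get(u, ()):         # no successors: nothing to spread to
            if passed_on > influence.get(v, 0.0):
                influence[v] = passed_on   # keep the strongest influence seen
                queue.append(v)
    return set(influence)
```

With β = 0.5 and δ = 0.2, for example, influence reaches vertices up to two hops from a source and no further.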
4) PageRank incremental computation of affected vertices
$\Delta G$ changes the number of times the original random walks pass through the affected vertices, so the number of random walk paths that change at the position of the graph change must be calculated first. To reflect as closely as possible the real change in the number of times random walks pass through a vertex i, and to improve the accuracy of the incremental PageRank calculation for any vertex, the number of changed random walk paths is used as the number of rounds of renewed random walk. After all rounds finish, the total number of times random walks pass through each vertex of the affected region of $G_t$ is updated, which gives the PageRank of the affected vertices.
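The count adjustment for a newly added edge between two existing vertices can be sketched as below. This is an illustration only. The formula for the number of changed paths is missing from this text, so the code takes it as the caller-supplied parameter `rounds`. Skipping the starting vertex u when adjusting counts and clamping counts at zero are assumptions, and the function names are hypothetical.

```python
import random


def walk(graph, start, alpha, max_steps, rng):
    """One round of the walk defined earlier; returns the visited vertices in order."""
    path = [start]
    for _ in range(max_steps):
        successors = sorted(graph.get(path[-1], ()))
        if not successors or rng.random() >= alpha:
            break
        path.append(rng.choice(successors))
    return path


def update_for_added_edge(graph, visits, edge, rounds, affected, alpha, max_steps, rng):
    """Adjust visit counts in place after `edge` = (u, v) was added to `graph`.

    `rounds` stands for the number of changed paths (formula not given here).
    """
    u, v = edge
    for _ in range(rounds):
        path = walk(graph, u, alpha, max_steps, rng)
        uses_edge = (u, v) in zip(path, path[1:])
        step = 1 if uses_edge else -1          # +1 via the new edge, -1 otherwise
        for i in path[1:]:                     # assumption: skip the start vertex u
            if i in affected:
                # assumption: a count is never allowed to drop below zero
                visits[i] = max(0, visits.get(i, 0) + step)
    return visits
```

Restricting the adjustment to the `affected` set mirrors the rule that unaffected vertices keep their earlier PageRank.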
5) PageRank calculation for newly added vertices in $\Delta G$
$\Delta G_t$ is a subgraph whose vertices comprise original vertices, newly added vertices and deleted vertices, and whose edges comprise newly added edges and deleted edges. To calculate the PageRank of the newly added vertices in the set $\Delta V^{+}$, note that the graph changes in $\Delta G_t$ arise mainly at the positions of the vertices in the original vertex set $\Delta V^{0}$. All vertices of $\Delta V^{0}$ are therefore aggregated into one hyper-vertex, all edges of the newly added vertices $\Delta V^{+}$ are retained, and the other ends of these edges are connected to the hyper-vertex, forming a new, smaller graph denoted $G'_t$. The random walk mode can then still be used to calculate the PageRank of the newly added vertices in $\Delta G_t$.
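The construction of the smaller graph with a hyper-vertex can be sketched as follows. This is a minimal illustration: the label `SUPER` and the function name are hypothetical, the graph is a dictionary from vertex to successor set, and every vertex that is not newly added is mapped to the hyper-vertex.

```python
SUPER = "__super__"  # hypothetical label for the hyper-vertex


def build_super_vertex_graph(graph, new_vertices):
    """Collapse every non-new vertex into SUPER, keeping edges that touch a new vertex.

    `new_vertices` is the set of newly added vertices (ΔV+).
    """
    def image(x):
        return x if x in new_vertices else SUPER

    collapsed = {SUPER: set()}
    for v in new_vertices:
        collapsed[v] = set()
    for u, successors in graph.items():
        for v in successors:
            if u in new_vertices or v in new_vertices:   # edge touches a new vertex
                collapsed[image(u)].add(image(v))
    return collapsed
```

The resulting graph has one vertex per new vertex plus the hyper-vertex, so random walks on it are far cheaper than walks on the full graph.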
The specific implementation process of the dynamic flow graph data vertex importance updating method of this embodiment is as follows. On the basis of the existing graph $G_{t-1}$, when the graph change information $\Delta G$ generated by the dynamic flow graph arrives, the vertices affected by $\Delta G$ are determined from the position information of $\Delta G$ using the sound wave propagation principle; the change in the walk probability of the affected vertices is calculated by the random walk algorithm, and part of the random walk paths are changed, so that the PageRank of these vertices is updated incrementally. Unaffected original vertices continue to use the vertex PageRank calculated for $G_{t-1}$. If there are newly added vertices, their PageRank is calculated using the aggregation increment idea and the random walk algorithm. The specific steps of the algorithm are as follows:
Input: $G_{t-1} = (V, E)$, $\Delta G_t = (\Delta V, \Delta E) = (\Delta V^{0} + \Delta V^{+} + \Delta V^{-}, \Delta E^{+} + \Delta E^{-})$, attenuation coefficient $\beta$, threshold $\delta$ at which the influence stops propagating, random walk coefficient $\alpha$, and preset number of random walk rounds $M$;
Step 1: when Δ GtAfter arrival, the set of vertices affected by the arrival is determinedWhile unaffected vertex edges use the original PageRank, which constitutes a setIf there are affected vertices, i.e.If not, turning to the step 2; if it isTurning to step 3 if the empty set is obtained;
step 2: by Δ GtCan know Δ V+And Δ V-,ΔE+And Δ E-Determining whether the change is adding or deleting vertexes and edges, if the change is adding vertexes or edges, executing the step 2.1, and if the change is deleting vertexes or edges, executing the step 2.2;
Step 2.1:
(1) Traverse the newly added edges. When the newly added edge $e = (u, v)$ is an edge between two existing vertices, calculate the increase in the number of times random walks pass through vertex u and, in the walk mode defined above, use that number as the number of rounds of random walk started from vertex u. If a random walk path that passes through the newly added edge e passes through any vertex i, add 1 to the original total number of times vertex i is passed; if a path passes through any vertex i of the affected region without passing through e, subtract 1 from that total. After all rounds of random walk finish, count the total number of times each vertex i of the affected region of $G_t$ is passed, and from it calculate the PageRank of the affected vertices, which forms the result set for the affected vertices.
(2) Traverse $\Delta E^{+}$. When the newly added edge e connects a newly added vertex and an existing vertex, the total number of vertices in $G_t$ becomes $|V| + 1$; the edge is processed as in (1) of step 2.1, the PageRank of the affected vertices is calculated, and the result set is updated.
(3) In general $\Delta G_t$ contains information on several added vertices and edges; traverse $\Delta G_t$, repeating step 2 until the traversal ends, then go to step 3;
Step 2.2:
(1) Traverse the set $\Delta E^{-}$. When the deleted edge is $e = (u, v)$, calculate the decrease in the number of times random walks pass through vertex u and, in the walk mode defined above, use that number as the number of rounds of random walk, with the starting point of each round selected at random from the successor vertices (out-neighbours) of vertex u and vertex v. If a random walk path starting from a successor vertex of vertex u passes through any vertex i, add 1 to the original total number of times vertex i is passed; if a random walk path starting from vertex v passes through any vertex i, subtract 1 from that total. After all rounds of random walk finish, count the total number of times each vertex i of the affected region of $G_t$ is passed, calculate the PageRank of the affected vertices, and update the result set.
(2) Traverse $\Delta E^{-}$. When the edge e was deleted because one of its vertices was deleted, the total number of vertices in $G_t$ becomes $|V| - 1$; the edge is processed as in (1) of step 2.2, the PageRank values of the affected vertices are calculated, and the result set is updated.
(3) In general $\Delta G_t$ contains information on several deleted vertices and edges; traverse $\Delta G_t$, repeating step 2 until the traversal ends, then go to step 3;
Step 3: For the newly added vertices $\Delta V^{+}$, first aggregate $\Delta V^{0}$ into a hyper-vertex and retain the edges connecting the hyper-vertex and the newly added vertices, forming a new graph with the hyper-vertex; then solve for the PageRank of the newly added vertices, which forms the result set for the new vertices.
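The edge deletion case of step 2.2 can be sketched in the same style. This is an illustration only. The number of rounds is taken as the caller-supplied parameter `rounds` because its formula is missing from this text. Clamping counts at zero is an assumption, and the function names are hypothetical. The graph passed in is the one after the edge has been removed.

```python
import random


def walk(graph, start, alpha, max_steps, rng):
    """One round of the walk defined earlier; returns the visited vertices in order."""
    path = [start]
    for _ in range(max_steps):
        successors = sorted(graph.get(path[-1], ()))
        if not successors or rng.random() >= alpha:
            break
        path.append(rng.choice(successors))
    return path


def update_for_deleted_edge(graph, visits, edge, rounds, affected, alpha, max_steps, rng):
    """Adjust visit counts in place after `edge` = (u, v) was removed from `graph`."""
    u, v = edge
    # candidate starting points: the remaining successors of u, plus v
    starts = sorted(set(graph.get(u, ())) | {v})
    for _ in range(rounds):
        start = rng.choice(starts)
        step = -1 if start == v else 1     # walks from v subtract, the others add
        for i in walk(graph, start, alpha, max_steps, rng):
            if i in affected:
                # assumption: a count is never allowed to drop below zero
                visits[i] = max(0, visits.get(i, 0) + step)
    return visits
```

The intuition is that traffic that used to flow from u into v is redistributed over the remaining outgoing edges of u, so vertices downstream of v lose visits while vertices downstream of the other successors gain them.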
The method has practical application value. In web page search, when web pages and their links are newly added, the method can quickly determine the ranking of search results from the PageRank values of all web pages and display the top-K ranked pages to users, so that users can quickly find the most important and relevant pages. In an e-commerce system, the method quickly updates the PageRank values of commodities from the commodities a user purchases or browses and from changes in the commodities the user follows, and recommends the higher-ranked commodities to the user. On a social network, the PageRank values of the other users connected to a user can reveal the circle of friends closely connected to that user; if the user is a criminal, a criminal group can also be found, and when the criminal's circle of friends changes, the PageRank can be updated incrementally by the method to analyse the criminal's movements and so combat criminal groups effectively.
This embodiment also provides a web page search ranking method which, based on the above random-walk-based method for updating dynamic flow graph data vertex importance, calculates the PageRank values of all web pages to determine the ranking of search results and displays the web pages according to that ranking.
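As a small illustration of the ranking step, the following Python sketch selects the top-K pages from a table of PageRank values. The function name and the page identifiers are hypothetical; how the values are computed is outside this snippet.

```python
import heapq


def top_k_pages(pagerank, k):
    """Return the k (page, score) pairs with the highest PageRank, best first."""
    return heapq.nlargest(k, pagerank.items(), key=lambda item: item[1])


# Hypothetical scores for three pages.
scores = {"page_a": 0.1, "page_b": 0.5, "page_c": 0.3}
print(top_k_pages(scores, 2))
```

Using `heapq.nlargest` avoids sorting the whole table when K is much smaller than the number of pages.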
This embodiment also provides a commodity recommendation method which, based on the above random-walk-based method for updating dynamic flow graph data vertex importance, uses the commodity information purchased or browsed by a user and changes in the commodities the user follows to update the PageRank values of commodities, and recommends the higher-ranked updated commodities to the user.
This embodiment also provides a user friend prediction method which, based on the above random-walk-based method for updating dynamic flow graph data vertex importance and on the social information of a social network, calculates the PageRank values of the other users connected to a given user and finds the circle of friends closely connected to that user.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.
Claims (10)
1. A method for updating importance of data vertex of a dynamic flow graph based on random walk is characterized by comprising the following steps:
acquiring associated data in real time according to the time sequence, and updating the data of the dynamic flow graph in real time;
acquiring an affected vertex and a newly added vertex in the data updating process of the dynamic flow graph at each moment;
generating a random walk path by each vertex in the dynamic flow graph data in a preset random walk mode;
calculating or updating the PageRank value of each affected vertex according to the total times of the random walk path passing through each affected vertex;
aggregating original vertexes of the dynamic flow graph data into a hyper-vertex, reserving all connecting edges of newly-added vertexes in the dynamic flow graph data, connecting the other ends of the connecting edges with the hyper-vertex to obtain a new graph, generating a random walk path for each vertex of the new graph in a preset random walk mode, and calculating or updating the PageRank value of each newly-added vertex according to the total times that the random walk path passes through each newly-added vertex;
the random walk mode is as follows: from a given vertex, the walk continues to a successor vertex along an outgoing edge with probability alpha, the outgoing edge being selected at random with probability equal to the reciprocal of the out-degree; at each vertex the walk stops with probability 1-alpha; a walk that continues to successor vertexes takes at most R steps; and when a vertex without outgoing edges is encountered, that round of walking stops immediately.
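The random walk mode recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names and the adjacency-dictionary representation are hypothetical, and the estimator `visits * (1 - alpha) / (n * rounds)` is the common Monte Carlo PageRank normalization, assumed here rather than taken from this text.

```python
import random


def random_walk(out_edges, start, alpha, max_steps, rng):
    """Return the vertices visited by one round of walking from `start`."""
    path = [start]
    current = start
    for _ in range(max_steps):            # at most R = max_steps steps
        successors = out_edges.get(current, [])
        if not successors:                # no outgoing edge: stop immediately
            break
        if rng.random() >= alpha:         # stop with probability 1 - alpha
            break
        current = rng.choice(successors)  # probability 1/out-degree per edge
        path.append(current)
    return path


def visit_counts(out_edges, rounds, alpha=0.85, max_steps=20, seed=0):
    """Run `rounds` walks from every vertex and count visits per vertex."""
    rng = random.Random(seed)
    counts = {v: 0 for v in out_edges}
    for start in out_edges:
        for _ in range(rounds):
            for i in random_walk(out_edges, start, alpha, max_steps, rng):
                counts[i] += 1
    return counts


def pagerank_estimate(counts, rounds, alpha):
    """Assumed estimator: visits * (1 - alpha) / (n * rounds)."""
    n = len(counts)
    return {v: c * (1 - alpha) / (n * rounds) for v, c in counts.items()}
```

Keeping the raw visit counts, rather than only the final scores, is what allows later edge insertions and deletions to be handled by adding to or subtracting from the counts instead of recomputing every walk.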
2. The method for updating the importance of the data vertex of the dynamic flow graph based on the random walk according to claim 1, wherein the random walk starting from any vertex is repeated for a preset first number of rounds.
3. The method for updating the importance of the data vertex of the dynamic flow graph based on the random walk according to claim 2, wherein the PageRank value of each vertex is updated in a preset incremental calculation mode in the process of updating the data of the dynamic flow graph at each moment; the changes formed in the dynamic flow graph data updating process comprise adding edges, adding vertices, deleting vertices and deleting edges;
the incremental calculation mode comprises the following steps:
a first processing step of newly adding edges: if vertex u and vertex v of the newly added edge e = (u, v) are both existing vertices, the number of additional passes through vertex u is calculated, and the corresponding number of rounds of random walk is performed starting from vertex u in the random walk mode, to obtain the vertex u random walk paths;
and if a vertex u random walk path passes through the newly added edge e and through any vertex i, 1 is added to the original total number of times vertex i has been passed; if a vertex u random walk path does not pass through the newly added edge e but passes through any vertex i, 1 is subtracted from the original total number of times vertex i has been passed, thereby calculating and updating the PageRank value of each vertex.
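The first processing step for a newly added edge can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the names are hypothetical, the number of rounds is passed in as `rounds` because its formula is not reproduced in this text, and leaving the count of the start vertex u itself unchanged is an interpretive choice of this sketch.

```python
import random


def random_walk(out_edges, start, alpha, max_steps, rng):
    """One round of walking: continue with probability alpha, at most max_steps steps."""
    path = [start]
    current = start
    for _ in range(max_steps):
        successors = out_edges.get(current, [])
        if not successors or rng.random() >= alpha:
            break
        current = rng.choice(successors)
        path.append(current)
    return path


def apply_edge_insertion(out_edges, counts, u, v, rounds,
                         alpha=0.85, max_steps=20, seed=0):
    """Insert edge (u, v) and adjust the per-vertex visit counts in place.

    Walks that use the new edge add 1 to every vertex they reach;
    walks that avoid it subtract 1. Returns the set of adjusted vertices.
    """
    rng = random.Random(seed)
    out_edges.setdefault(u, []).append(v)
    out_edges.setdefault(v, [])
    affected = set()
    for _ in range(rounds):
        path = random_walk(out_edges, u, alpha, max_steps, rng)
        # The walk uses e if it steps directly from u to v at any point.
        uses_new_edge = any(a == u and b == v for a, b in zip(path, path[1:]))
        delta = 1 if uses_new_edge else -1
        for i in path[1:]:  # assumption: the start vertex u is not adjusted
            counts[i] = counts.get(i, 0) + delta
            affected.add(i)
    return affected
```

The returned set identifies the affected vertices whose PageRank values need to be recomputed from the adjusted counts.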
4. The method for updating the importance of the data vertex of the dynamic flow graph based on the random walk according to claim 3, wherein the incremental calculation mode further comprises:
a second processing step of newly adding edges: if the newly added edge e connects a newly added vertex and an existing vertex, the total number of vertices of the dynamic flow graph data is increased by one, and the PageRank value of each vertex is updated by the first processing step of newly adding edges.
5. The method for updating the importance of the data vertex of the dynamic flow graph based on the random walk according to claim 3, wherein the incremental calculation mode further comprises:
a first processing step of deleting edges: if vertex u and vertex v of the deleted edge e = (u, v) are both existing vertices, the number of times by which the passes through vertex u are reduced is calculated, and, in the random walk mode, a randomly selected out-neighbor of vertex u and vertex v are each taken as starting points for the corresponding number of rounds of random walk, to obtain the vertex u random walk paths;
if a vertex u random walk path starting from the out-neighbor of vertex u passes through any vertex i, 1 is added to the original total number of times vertex i has been passed; and if a vertex u random walk path starting from vertex v passes through any vertex i, 1 is subtracted from the original total number of times vertex i has been passed, thereby calculating and updating the PageRank value of each vertex.
6. The method for updating the importance of the data vertex of the dynamic flow graph based on the random walk according to claim 5, wherein the incremental calculation further comprises:
a second processing step of deleting edges: if the deleted edge e is removed as a result of deleting a vertex, the total number of vertices of the dynamic flow graph data is reduced by one, and the PageRank value of each vertex is updated by the first processing step of deleting edges.
7. The method for updating the importance of the data vertex of the dynamic flow graph based on the random walk according to claim 3, wherein the incremental calculation mode further comprises: and sequentially processing the changes formed in the data updating process of the dynamic flow graph until all the changes are traversed.
8. The method for updating the importance of the data vertex of the dynamic flow graph based on the random walk according to claim 1, wherein a vertex that is not affected in the process of updating the data of the dynamic flow graph retains its original PageRank value.
9. An apparatus for updating importance of data vertex of a dynamic flow graph based on random walk, comprising a memory and a processor, wherein the memory stores a computer program, and the processor calls the computer program to execute the following steps:
acquiring associated data in real time according to the time sequence, and updating the data of the dynamic flow graph in real time;
acquiring an affected vertex and a newly added vertex in the data updating process of the dynamic flow graph at each moment;
generating a random walk path by each vertex in the dynamic flow graph data in a preset random walk mode;
calculating or updating the PageRank value of each affected vertex according to the total times of the random walk path passing through each affected vertex;
aggregating original vertexes of the dynamic flow graph data into a hyper-vertex, reserving all connecting edges of newly-added vertexes in the dynamic flow graph data, connecting the other ends of the connecting edges with the hyper-vertex to obtain a new graph, generating a random walk path for each vertex of the new graph in a preset random walk mode, and calculating or updating the PageRank value of each newly-added vertex according to the total times that the random walk path passes through each newly-added vertex;
the random walk mode is as follows: from a given vertex, the walk continues to a successor vertex along an outgoing edge with probability alpha, the outgoing edge being selected at random with probability equal to the reciprocal of the out-degree; at each vertex the walk stops with probability 1-alpha; a walk that continues to successor vertexes takes at most R steps; and when a vertex without outgoing edges is encountered, that round of walking stops immediately.
10. The apparatus for updating importance of data vertex of dynamic flow graph based on random walk according to claim 9, wherein the PageRank value of each vertex is updated in a preset incremental calculation manner in the process of updating data of dynamic flow graph at each moment; the changes formed in the dynamic flow graph data updating process comprise adding edges, adding vertices, deleting vertices and deleting edges;
the incremental calculation mode comprises the following steps:
a first processing step of newly adding edges: if vertex u and vertex v of the newly added edge e = (u, v) are both existing vertices, the number of additional passes through vertex u is calculated, and the corresponding number of rounds of random walk is performed starting from vertex u in the random walk mode, to obtain the vertex u random walk paths;
and if a vertex u random walk path passes through the newly added edge e and through any vertex i, 1 is added to the original total number of times vertex i has been passed; if a vertex u random walk path does not pass through the newly added edge e but passes through any vertex i, 1 is subtracted from the original total number of times vertex i has been passed, thereby calculating and updating the PageRank value of each vertex.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011315919.1A CN112417247B (en) | 2020-11-22 | 2020-11-22 | Dynamic flow graph data vertex importance updating method and device based on random walk |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011315919.1A CN112417247B (en) | 2020-11-22 | 2020-11-22 | Dynamic flow graph data vertex importance updating method and device based on random walk |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112417247A (en) | 2021-02-26 |
CN112417247B (en) | 2022-04-05 |
Family
ID=74776993
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011315919.1A Active CN112417247B (en) | 2020-11-22 | 2020-11-22 | Dynamic flow graph data vertex importance updating method and device based on random walk |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112417247B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023241641A1 (en) * | 2022-06-15 | 2023-12-21 | 华为技术有限公司 | Graph processing method and apparatus |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007100834A2 (en) * | 2006-02-27 | 2007-09-07 | The Regents Of The University Of California | Graph querying, graph motif mining and the discovery of clusters |
US20080275902A1 (en) * | 2007-05-04 | 2008-11-06 | Microsoft Corporation | Web page analysis using multiple graphs |
CN108009933A (en) * | 2016-10-27 | 2018-05-08 | 中国科学技术大学先进技术研究院 | Figure centrality computational methods and device |
CN109255073A (en) * | 2018-08-28 | 2019-01-22 | 麒麟合盛网络技术股份有限公司 | A kind of personalized recommendation method, device and electronic equipment |
CN110011838A (en) * | 2019-03-25 | 2019-07-12 | 武汉大学 | A kind of method for real time tracking of dynamic network PageRank value |
CN110019989A (en) * | 2019-04-08 | 2019-07-16 | 腾讯科技(深圳)有限公司 | A kind of data processing method and device |
Non-Patent Citations (3)
Title |
---|
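Zhou Zhimin et al.: "Friend recommendation algorithm based on connectivity and random walk" (基于连通性和随机游走的好友推荐算法), Information Technology (《信息技术》) * |
Zhang Xun et al.: "Research on improving social network friend recommendation algorithms based on network structure" (基于网络结构改进社交网络好友推荐算法研究), Information Technology (《信息技术》) * |
Lai Si: "PageRank value computation based on GPGPU" (基于GPGPU的PageRank值计算), China Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology (《中国优秀博硕士学位论文全文数据库(博士)信息科技辑》) * |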
Also Published As
Publication number | Publication date |
---|---|
CN112417247B (en) | 2022-04-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||