CN105808779B - Pruning-based parallel computation method for graph roaming and application thereof - Google Patents

Pruning-based parallel computation method for graph roaming and application thereof

Info

Publication number
CN105808779B
Authority
CN
China
Prior art keywords
vertex
class
page
boundary point
pruning
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610192758.9A
Other languages
Chinese (zh)
Other versions
CN105808779A (en
Inventor
余华山
王娜
孟佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Application filed by Peking University
Priority to CN201610192758.9A
Publication of CN105808779A
Application granted
Publication of CN105808779B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a pruning-based parallel computation method for graph roaming and its application, in the technical field of parallel computation and processing of large-scale data. The method implements efficient pruning-based graph roaming on a shared-memory system: in a pre-processing stage before the computation, the boundary points of the graph are identified from its topological features, and the corresponding pruning operations are then carried out during the computation. By pruning the boundary points, the invention lets vertex updates propagate faster and speeds up convergence of the algorithm, which reduces the number of iteration rounds. When tasks are partitioned for parallel computation, pruning the boundary points also makes the partition more balanced, which reduces computation overhead and improves performance.

Description

Pruning-based parallel computation method for graph roaming and application thereof
Technical field
The present invention relates to the technical field of parallel computation and processing of large-scale data, and in particular to a pruning-based parallel computation method for graph roaming and its application.
Background art
In recent years, with the continuing development of the Internet, more and more information and data need to be stored and processed as graph structures, for example friend relationships in online social networks, citation relationships in paper repositories, the link relationships between websites recorded by search engines, and the knowledge graph in Wikipedia. Such graph computations often involve huge amounts of data: the numbers of vertices and edges reach the GB/TB/ZB scale or even larger. Large-scale graph computation is difficult because of its low computational density and random data access. Parallel computing is a common technique for large-scale graph problems.
A graph is a data structure. A graph G usually consists of a set V of vertices and a set E of edges between vertices. Any edge e = (u, v) denotes an edge from vertex u to vertex v; u is called the parent node of v, v is called the child node of u, and e is called an out-edge of u and an in-edge of v. In the graph computation model, V and E carry the state data of the computation. States are updated by running a series of iterative computations on each vertex, and the result of the computation is the aggregate of the final states of all vertices and edges in the graph. A vertex computation depends on the states of the vertex itself, its neighboring vertices and its edges, and may update these states.
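As a minimal sketch of this vocabulary, a directed graph can be held as two adjacency maps, one listing the child nodes reached through out-edges and one listing the parent nodes at the start of in-edges. The function name `build_graph` and the three-vertex edge list are illustrative assumptions, not part of the patent.

```python
from collections import defaultdict


def build_graph(edges):
    """Return (children, parents) adjacency maps for a directed edge list."""
    children = defaultdict(list)  # u -> child nodes reached by u's out-edges
    parents = defaultdict(list)   # v -> parent nodes at the start of v's in-edges
    for u, v in edges:
        children[u].append(v)
        parents[v].append(u)
    return children, parents


# Hypothetical graph: u -> v, u -> w, v -> w
edges = [("u", "v"), ("u", "w"), ("v", "w")]
children, parents = build_graph(edges)
print(children["u"])  # ['v', 'w']
print(parents["w"])   # ['u', 'v']
```

Keeping both maps makes the number of in-edges and out-edges of a vertex a simple `len()` call, which is what the boundary-point definitions later in the description rely on.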
Graph roaming means the following: for a graph G and a given constraint, whenever some vertex v in G does not satisfy the constraint, an update operation is applied to v; the updated state of v can affect its neighboring vertices through its out-edges and trigger updates of those neighbors. This continues until every vertex in G satisfies the constraint and no vertex needs to be updated any more. Here, an update operation means executing the computation prescribed by the algorithm on a vertex of the graph.
Among existing graph computation frameworks, Pregel is a typical framework for distributed graph computation (Grzegorz Malewicz et al. Pregel: A System for Large-Scale Graph Processing. ACM SIGMOD, 2010), used mainly for computations such as graph traversal, shortest paths and PageRank. It is a vertex-centric framework: vertices loop over their edges and simultaneously send messages to all of their out-neighbors, and each target vertex gathers these messages using an associated combiner. The system is bulk synchronous, so received values become visible only in the next round. Pregel is relatively slow, possibly because of framework overhead and its use of machines with distributed storage. On the one hand, although a distributed system has considerable computing and storage capacity, in a distributed system with a very large number of nodes the low computational density and random data access lead to very high communication overhead during computation; the performance gain in graph processing does not match the increase in system cost, and the computing capacity cannot be fully exploited. On the other hand, Pregel computes directly on the input graph and has no pre-processing mechanism, so the actual computation incurs some unnecessary overhead.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention provides a pruning-based parallel computation method for graph roaming and its application. Efficient pruning-based graph roaming is implemented on a shared-memory system: the topological features of the graph are identified in a pre-processing stage before the computation, and the corresponding pruning operations are then carried out during the computation, which reduces computation overhead.
The principle of the invention is as follows. The pruned graph roaming model provided by the invention applies pruning operations to the "boundary point" vertices of the graph, which optimizes the computation, reduces its overhead and improves its efficiency. During graph computation, a change in the state of a vertex v depends on changes in the states of v's parent nodes. The vertices of a graph can be classified by their in-edges and out-edges: a vertex with no in-edges is called an "input point", a vertex with no out-edges is called an "output point", and all other vertices are called "intermediate points". This classification can be regarded as a topological feature of the graph. "Input points" and "output points" act as the "boundary points" of the graph, and they behave differently from intermediate vertices during computation. An "input point" has no in-edges, so after its initial change its state is never updated again. An "output point" may be updated repeatedly throughout the computation, but because it has no out-edges its updated state is never passed on to other vertices; only its final state is meaningful, and the intermediate updates are unnecessary. Benchmark graphs and natural graphs contain large numbers of "boundary points", in proportions that depend on the characteristics of the graph: for example, 44% of the vertices in the Friendster social network graph are "boundary points", as are 22% in the Twitter social network and 14% in the Wikipedia relationship graph, so there is room for optimization. The pruned graph roaming model provided by the invention prunes these "boundary point" vertices, thereby optimizing the computation, reducing its overhead and improving its efficiency.
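The three-way classification described above can be sketched as follows. This is only an illustration: the function name `classify_by_degree`, the four-vertex graph, and the choice to count an isolated vertex (no in-edges and no out-edges) as an "input point" are assumptions, not details given in the patent.

```python
def classify_by_degree(vertices, edges):
    """Label each vertex 'input', 'output' or 'intermediate' by its edge counts."""
    in_deg = {v: 0 for v in vertices}
    out_deg = {v: 0 for v in vertices}
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    kinds = {}
    for v in vertices:
        if in_deg[v] == 0:
            kinds[v] = "input"         # no in-edges (assumed to win for isolated vertices)
        elif out_deg[v] == 0:
            kinds[v] = "output"        # no out-edges
        else:
            kinds[v] = "intermediate"
    return kinds


# Hypothetical graph: p -> q, q -> r, r -> q, r -> s
vertices = ["p", "q", "r", "s"]
edges = [("p", "q"), ("q", "r"), ("r", "q"), ("r", "s")]
kinds = classify_by_degree(vertices, edges)
boundary_share = sum(k != "intermediate" for k in kinds.values()) / len(vertices)
print(kinds)           # {'p': 'input', 'q': 'intermediate', 'r': 'intermediate', 's': 'output'}
print(boundary_share)  # 0.5
```

The `boundary_share` value is the kind of proportion quoted above for the Friendster, Twitter and Wikipedia graphs; here it is computed only for the toy graph.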
The technical solution provided by the invention is:
A pruning-based parallel computation method for graph roaming classifies the vertices of the graph, applies pruning operations to the boundary points of the graph, and optimizes the computation by pruning the boundary points. It comprises a pre-processing stage, a pruned computation stage and a finishing supplementary computation stage, and specifically includes the following steps:
1) In the pre-processing stage, identify the topological features of the graph and pre-process the graph, as follows:
11) Read in the graph data and search according to the topological features of the vertices in the graph to obtain boundary points and non-boundary points. Boundary points are divided into class a, class b, class c and class d:
Class a: vertices with 0 in-edges;
Class b: vertices whose numbers of in-edges and out-edges are both 1 and whose parent node u1 satisfies condition b1 or b2:
b1: u1 has 0 in-edges and 1 out-edge;
b2: u1 is a class-b vertex;
Class c: vertices with 0 out-edges;
Class d: vertices whose numbers of in-edges and out-edges are both 1 and whose child node u2 satisfies condition d1 or d2:
d1: u2 has 0 out-edges and 1 in-edge;
d2: u2 is a class-d vertex.
12) Initialize the class-a and class-b boundary points, i.e. compute them with the chosen computation method to obtain the results for these boundary points;
13) For each class-a or class-b boundary point whose child node is a non-boundary point, pass the result of step 12) to that child node;
14) Delete from the graph structure the edges between class-a and class-b boundary points and their child nodes;
Steps 13) and 14) mean that the child nodes of class-a and class-b boundary points no longer need to access the corresponding parent nodes (the class-a and class-b boundary points), which reduces the cost of edge accesses.
2) In the computation stage, perform the pruned iterative computation on the pre-processed graph:
21) Visit the vertices of the graph in turn. If a vertex is a boundary point, skip it; otherwise recompute it with the computation method of step 12) and take the new result as the result for that vertex;
22) After all vertices have been visited, check the convergence condition defined by the computation method of step 12). If all vertices in the current graph satisfy the convergence condition, the pruned computation ends; otherwise, repeat steps 21) and 22);
3) In the finishing stage, perform the supplementary computation:
Compute the class-c and class-d boundary points with the computation method of step 12), and take the results as the results for these boundary points.
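The boundary-point search of step 11) can be sketched as below. This is one possible reading, not the patent's implementation: the function name `classify_boundary` is hypothetical, and where the definitions overlap the sketch assumes that class a takes precedence over class c and class b over class d. The edge list is the seven-vertex example graph used later in the description, reconstructed from the text around Fig. 2.

```python
def classify_boundary(vertices, edges):
    """Map each boundary vertex to 'a', 'b', 'c' or 'd'; other vertices are omitted."""
    children = {v: [] for v in vertices}
    parents = {v: [] for v in vertices}
    for u, v in edges:
        children[u].append(v)
        parents[v].append(u)

    def is_chain_link(v):
        # exactly one in-edge and exactly one out-edge
        return len(parents[v]) == 1 and len(children[v]) == 1

    label = {}
    for v in vertices:
        if not parents[v]:
            label[v] = "a"            # 0 in-edges
        elif not children[v]:
            label[v] = "c"            # 0 out-edges
    # Class b: start at a class-a vertex with one out-edge (b1) and walk
    # forward while the next vertex is a chain link (b2).
    for s in vertices:
        if label.get(s) == "a" and len(children[s]) == 1:
            v = children[s][0]
            while v not in label and is_chain_link(v):
                label[v] = "b"
                v = children[v][0]
    # Class d: start at a class-c vertex with one in-edge (d1) and walk
    # backward while the previous vertex is a chain link (d2).
    for s in vertices:
        if label.get(s) == "c" and len(parents[s]) == 1:
            v = parents[s][0]
            while v not in label and is_chain_link(v):
                label[v] = "d"
                v = parents[v][0]
    return label


vertices = ["v1", "v2", "v3", "v4", "v5", "v6", "v7"]
edges = [("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v3", "v5"),
         ("v4", "v5"), ("v5", "v6"), ("v6", "v7")]
labels = classify_boundary(vertices, edges)
print(sorted(labels.items()))
# [('v1', 'a'), ('v2', 'b'), ('v6', 'd'), ('v7', 'c')]
```

Walking the chains from their ends resolves the recursive conditions b2 and d2 without repeated passes over the graph, since every class-b chain must start at a class-a vertex and every class-d chain must end at a class-c vertex.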
In the above pruning-based parallel computation method for graph roaming, further, the computation method of step 12) is the PageRank algorithm or a single-source shortest path algorithm. The single-source shortest path algorithm is Dijkstra's algorithm, the Bellman-Ford algorithm or the SPFA algorithm.
The above parallel computation method based on the pruned graph roaming model can be applied to PageRank (page ranking), register allocation based on graph coloring, shortest-path computation and so on. PageRank expresses the rank or importance of web pages: the importance of each website is computed from the link relationships between websites. Register allocation based on graph coloring takes a given undirected graph G and produces a coloring in which any two adjacent vertices have different colors; applied to register allocation, this can speed up program execution. Shortest-path computation finds the shortest path between two vertices of a graph and can be used to obtain shortest paths in practical problems such as pipeline laying, line installation, plant layout and equipment replacement.
The invention also provides a method for ranking web pages by importance, implemented with the above pruning-based parallel computation method for graph roaming. The importance ranking of the web pages is computed with the PageRank method, specifically as follows:
61) Represent the web pages and the link relationships between them as a graph G, in which each vertex vi represents a distinct page and an arrow from vertex vi to vj indicates that page i contains a hyperlink to page j;
62) Search according to the topological features of the vertices in the graph to obtain four types of boundary point, classes a to d;
63) Initialize the class-a and class-b boundary points with the PageRank method to obtain their results;
64) For each class-a or class-b boundary point whose child node is a non-boundary point, pass the corresponding result to that child node;
65) Prune the edges between class-a and class-b boundary points and their child nodes;
66) Iterate with the PageRank method over all remaining vertices that are not boundary points, and stop iterating when all of these vertices reach the configured convergence threshold;
67) Compute the class-c and class-d boundary points with the PageRank method to obtain their results;
68) Sort by the results of all vertices to obtain the importance ranking of the web pages.
In the above web page importance ranking method, further, the PageRank method is specifically:
Let the PageRank value of page x be P(x), let the number of hyperlinks contained in page x be N(x), and let B(x) be the set of all pages that link to page x. The PageRank value of any page i is computed by Formula 1:
$$P(i) = C_1 \sum_{j \in B(i)} \frac{P(j)}{N(j)} + C_2 \qquad \text{(Formula 1)}$$
In Formula 1, i is any page; P(i) is the PageRank value of page i; C1 and C2 are constants, with C1 = 0.85 and C2 = 0.15 in this embodiment; B(i) is the set of all pages that link to page i; j is any page in B(i); P(j) is the PageRank value of page j; and N(j) is the number of hyperlinks contained in page j.
In the above web page importance ranking method, further, the convergence condition of the PageRank iteration is that the PageRank values of all pages in the graph no longer change, expressed as Formula 2:
$$\sum_{i \in V(G)} \left| P_{new}(i) - P_{old}(i) \right| < \varepsilon \qquad \text{(Formula 2)}$$
In Formula 2, V(G) is the set of all pages in graph G, i is any page in V(G), P_new(i) is the PageRank value of page i after the current round of iteration, P_old(i) is the PageRank value of page i after the previous round of iteration, and ε is an arbitrarily small positive number in the mathematical sense. In the embodiments of the invention, ε = 0.0000001.
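Formulas 1 and 2 translate almost directly into code. In the sketch below, B(i) is taken to be the set of pages that link to page i, which is how the worked example in the description uses it; the function names, the dictionary layout and the two-page graph are illustrative assumptions.

```python
C1, C2 = 0.85, 0.15  # constants used in the embodiment


def pagerank_value(i, rank, parents, out_degree):
    """Formula 1: P(i) = C1 * sum over j in B(i) of P(j)/N(j), plus C2."""
    return C1 * sum(rank[j] / out_degree[j] for j in parents[i]) + C2


def has_converged(new_rank, old_rank, eps=1e-7):
    """Formula 2: the summed absolute change over all pages is below eps."""
    return sum(abs(new_rank[i] - old_rank[i]) for i in new_rank) < eps


# Hypothetical two-page graph: x links to y.
parents = {"x": [], "y": ["x"]}   # B(i): pages that link to i
out_degree = {"x": 1, "y": 0}     # N(j): hyperlinks contained in j
old = {"x": 1.0, "y": 1.0}        # every page starts at 1
new = {i: pagerank_value(i, old, parents, out_degree) for i in old}
print(has_converged(new, old))    # False: P(x) moved from 1.0 to 0.15
print(has_converged(new, new))    # True: no page changed
```

Each round builds `new` entirely from `old`, so a value computed in one round becomes visible to other pages only in the next round.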
Compared with the prior art, the beneficial effects of the invention are:
The invention provides a parallel computation method for shared-memory systems. With the technical solution of the invention, pruning reduces the numbers of vertices and edges to be computed, so computation overhead falls and performance improves. Because the "boundary points" are pruned, vertex updates propagate faster and the algorithm converges faster, which reduces the number of iteration rounds. When tasks are partitioned for parallel computation, pruning the "boundary points" makes the partition more balanced, which reduces computation overhead and improves performance.
The above parallel computation method based on the pruned graph roaming model can be applied to PageRank (page ranking), register allocation based on graph coloring, shortest-path computation and so on. When the method is applied to web page importance ranking, with the ranking computed by the PageRank method, both the number of iteration rounds and the number of vertices whose PageRank value must be computed in each round are greatly reduced compared with the unpruned computation, which greatly lowers computation overhead and improves efficiency.
Brief description of the drawings
Fig. 1 is a flow diagram of the graph roaming method provided by the invention.
Fig. 2 is the example graph G for the PageRank algorithm in an embodiment of the invention;
Here, circles represent vertices, and G contains seven vertices v1-v7; arrows between vertices represent edges, and an arrow from vi to vj represents the edge (vi, vj).
Fig. 3 is the graph G' obtained by pre-processing the example graph G in the embodiment of the invention;
Here, dashed circles and dashed lines denote the pruned vertices v1, v2, v6, v7 and the pruned edges (v1, v2), (v2, v3), (v5, v6), (v6, v7).
Detailed description of the embodiments
The invention is further described below through embodiments with reference to the drawings, without limiting the scope of the invention in any way.
The invention provides a pruning-based parallel computation method for graph roaming, which classifies the vertices of the graph, applies pruning operations to the boundary points of the graph, and optimizes the computation by pruning the boundary points. It includes the following steps:
1) Pre-processing stage: identify the topological features of the graph, including classifying the boundary points and performing the initialization computation, and pre-process the graph, as follows:
11) Read in the graph data and search according to the topological features of the vertices to obtain the boundary points. Boundary points are divided into four classes:
Class a: vertices with 0 in-edges;
Class b: vertices whose numbers of in-edges and out-edges are both 1 and for which the start point u1 of the in-edge (the parent node) satisfies condition b1 or b2:
b1: u1 has 0 in-edges and 1 out-edge;
b2: u1 is a class-b vertex;
Class c: vertices with 0 out-edges;
Class d: vertices whose numbers of in-edges and out-edges are both 1 and for which the end point u2 of the out-edge (the child node) satisfies condition d1 or d2:
d1: u2 has 0 out-edges and 1 in-edge;
d2: u2 is a class-d vertex.
12) Mark the above class-a, class-b, class-c and class-d vertices as boundary points. Initialize the class-a and class-b boundary points, i.e. compute these points with the chosen computation method (for example, the PageRank algorithm or a single-source shortest path algorithm) to obtain the initialization results;
Dijkstra's algorithm is one of the classic algorithms for the single-source shortest path problem; existing algorithms include Dijkstra's algorithm, the Bellman-Ford algorithm and the SPFA algorithm.
13) For each class-a or class-b boundary point whose child node is not marked as a boundary point, pass the result of step 12) to that child node for use in the computation stage. A data structure must therefore be created for each such child node to store the results of its parent nodes;
A child node that is marked as a boundary point is not one of the vertices computed in step 2), so no value needs to be passed to it;
14) Delete from the graph structure the edges between class-a and class-b boundary points and their child nodes;
Steps 13) and 14) mean that the child nodes of class-a and class-b boundary points no longer need to access the corresponding parent nodes (the class-a and class-b boundary points), which reduces the cost of edge accesses.
2) Computation stage: perform the pruned computation on the pre-processed graph:
21) Visit the vertices of the graph in turn. If a vertex is marked as a "boundary point", skip it; otherwise recompute the vertex and replace its previous result with the new result;
22) After all vertices have been visited, use the convergence condition defined by the chosen computation method to judge whether all vertices in the current graph have converged. If so, the computation stage ends; otherwise, repeat steps 21) and 22);
3) Finishing stage: perform the supplementary computation:
In the finishing stage, which follows the computation stage, compute the class-c and class-d vertices with the algorithm being used, and take the results as the results for these vertices.
This yields the final states of all vertices.
The invention is further described below through an embodiment.
The following embodiment ranks web pages by importance using the PageRank algorithm. In this embodiment there are seven web pages with link relationships between them. The seven pages and their link relationships can be represented as the graph G in Fig. 2, where v1-v7 represent the seven pages and an arrow from vi to vj indicates that page i contains a hyperlink to page j. Using the graph roaming method based on the pruned graph model provided by the invention, the PageRank algorithm is executed on the graph G in Fig. 2 to rank the web pages by importance and obtain the ranking.
The PageRank algorithm is computed as follows:
Let the PageRank value of page x be P(x), let the number of hyperlinks contained in page x (i.e. the number of out-edges of vertex x) be N(x), and let B(x) be the set of all pages that link to page x (i.e. the parent nodes of vertex x). Then for any page i:
$$P(i) = C_1 \sum_{j \in B(i)} \frac{P(j)}{N(j)} + C_2 \qquad \text{(Formula 1)}$$
In Formula 1, i is any page; P(i) is the PageRank value of page i; C1 and C2 are constants, with C1 = 0.85 and C2 = 0.15 in this embodiment; B(i) is the set of all pages that link to page i; j is any page in B(i); P(j) is the PageRank value of page j; and N(j) is the number of hyperlinks contained in page j;
Initially, the PageRank value of every page is set to 1, and the iterative computation then proceeds according to Formula 1. The algorithm converges when the PageRank values of all pages in the graph no longer change, expressed as Formula 2:
$$\sum_{i \in V(G)} \left| P_{new}(i) - P_{old}(i) \right| < \varepsilon \qquad \text{(Formula 2)}$$
In Formula 2, V(G) is the set of all pages in graph G, i is any page in V(G), P_new(i) is the PageRank value of page i after the current round of iteration, P_old(i) is the PageRank value of page i after the previous round of iteration, and ε is an arbitrarily small positive number in the mathematical sense. In this embodiment, ε = 0.0000001.
Using the graph roaming method based on the pruned graph model provided by the invention, the PageRank algorithm is executed on the graph G in Fig. 2 to rank the web pages by importance. The ranking is obtained by the following computation:
1) Pre-processing stage: identify the topological features of the graph:
11) Read in the data of graph G, which represents the seven pages and the link relationships between them, and search for and mark the "boundary points" by their topological features. The "boundary points" in Fig. 2 are the pages (vertices) v1, v2, v6 and v7, where v1 is a class-a boundary point, v2 is a class-b boundary point, v6 is a class-d boundary point and v7 is a class-c boundary point;
12) Mark the boundary points v1, v2, v6 and v7. For the class-a and class-b boundary points v1 and v2, perform the initialization computation with Formula 1, which gives:
P(v1) = 0.15;
P(v2) = 0.85 * P(v1)/1 + 0.15 = 0.28;
13) The child node v3 of v2 is not a boundary point, so the result for v2 must be passed to v3.
As Formula 1 shows, computing the PageRank value of v3 first requires the sum of P(j)/N(j) over all parent nodes of v3. After v2 is removed, the P(j)/N(j) value of v2 must be stored in the data structure of v3 so that it is available when v3 is computed; the stored value is P(v2)/N(v2) = 0.28/1 = 0.28.
14) Remove the edge (v2, v3) from the graph structure, so that vertex v2 is no longer visited when the parent nodes of v3 are accessed.
2) Main computation stage: perform the pruned computation on the graph:
21) Visit the vertices v1-v7 in turn. v1, v2, v6 and v7 are marked, so they are skipped. v3, v4 and v5 are each computed with Formula 1 (results kept to two decimal places):
As Fig. 3 shows, v3 has no parent node in the pre-processed graph G', and the result 0.28 from v2 is stored in v3, so in the first round of iteration P(v3) = 0.85 * P(v2) + 0.15 = 0.39. After that, no new value reaches v3 from a parent node, so P(v3) remains unchanged;
The parent node of v4 is v3; in each round of iteration, P(v4) is computed with Formula 1 from the previous round's P(v3). The parent nodes of v5 are v3 and v4; in each round of iteration, P(v5) is computed with Formula 1 from the previous round's P(v3) and P(v4). The results are listed in Table 1:
Table 1. PageRank values of the non-boundary points during the pruned iteration
                 P(v3)   P(v4)   P(v5)
Initial value    1.00    1.00    1.00
Round 1          0.39    0.58    1.42
Round 2          0.39    0.31    0.56
Round 3          0.39    0.31    0.58
Round 4          0.39    0.31    0.58
Whether the convergence condition is met is judged by Formula 2, with ε = 0.0000001 in this embodiment. After the 4th round of iteration, |P_new(v3) - P_old(v3)| + |P_new(v4) - P_old(v4)| + |P_new(v5) - P_old(v5)| = 0 < ε, so the convergence condition is met and the iteration ends.
3) Supplementary computation in the finishing stage:
Compute v6 and v7 with Formula 1:
P(v6) = 0.85 * P(v5)/1 + 0.15 = 0.64
P(v7) = 0.85 * P(v6)/1 + 0.15 = 0.69
The final results are therefore:
Table 2. Final PageRank values obtained by the pruned computation
P(v1)   P(v2)   P(v3)   P(v4)   P(v5)   P(v6)   P(v7)
0.15    0.28    0.39    0.31    0.58    0.64    0.69
Thus, using the graph roaming method based on the pruned graph model provided by the invention, the PageRank algorithm is executed on the graph G in Fig. 2 to rank the web pages by importance, and the resulting PageRank ranking is: v7 > v6 > v5 > v3 > v4 > v2 > v1.
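The worked example above can be approximated with the following sketch of the three stages. It is a reading of the description rather than the patent's own code: the function name `pruned_pagerank` and its helpers are hypothetical, the edge list is reconstructed from the text around Fig. 2 and Fig. 3, the boundary classes are passed in as given in step 11) of the example, and each round is assumed to read only the previous round's values.

```python
def pruned_pagerank(vertices, edges, label, c1=0.85, c2=0.15, eps=1e-7):
    """Three-stage pruned PageRank; `label` maps boundary vertices to 'a'..'d'."""
    parents = {v: [] for v in vertices}
    out_degree = {v: 0 for v in vertices}
    for u, v in edges:
        parents[v].append(u)
        out_degree[u] += 1

    def formula_1(rank, sources, extra=0.0):
        # `extra` holds P(j)/N(j) shares stored from pruned parents
        return c1 * (extra + sum(rank[p] / out_degree[p] for p in sources)) + c2

    def settle(classes, rank):
        # Compute each vertex of the given classes exactly once, parents first.
        pending = [v for v in vertices if label.get(v) in classes]
        while pending:
            v = next(x for x in pending
                     if not any(p in pending for p in parents[x]))
            rank[v] = formula_1(rank, parents[v])
            pending.remove(v)

    rank = {v: 1.0 for v in vertices}  # every page starts at 1

    # Stage 1 (steps 12-14): initialize classes a/b, store their shares in the
    # non-boundary children, and drop those edges. A boundary parent of a
    # non-boundary vertex can only be class a or b.
    settle({"a", "b"}, rank)
    inner = [v for v in vertices if v not in label]
    stored = {v: sum(rank[p] / out_degree[p] for p in parents[v] if p in label)
              for v in inner}
    live = {v: [p for p in parents[v] if p not in label] for v in inner}

    # Stage 2 (steps 21-22): iterate over non-boundary vertices only.
    rounds = 0
    while True:
        rounds += 1
        new = {v: formula_1(rank, live[v], stored[v]) for v in inner}
        change = sum(abs(new[v] - rank[v]) for v in inner)
        rank.update(new)
        if change < eps:
            break

    # Stage 3: compute classes c/d once from the converged values.
    settle({"c", "d"}, rank)
    return rank, rounds


vertices = ["v1", "v2", "v3", "v4", "v5", "v6", "v7"]
edges = [("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v3", "v5"),
         ("v4", "v5"), ("v5", "v6"), ("v6", "v7")]
label = {"v1": "a", "v2": "b", "v6": "d", "v7": "c"}
rank, rounds = pruned_pagerank(vertices, edges, label)
order = sorted(vertices, key=rank.get, reverse=True)
print(rounds)             # 4
print(" > ".join(order))  # v7 > v6 > v5 > v3 > v4 > v2 > v1
```

The sketch keeps full floating-point precision, whereas the worked example rounds intermediate values to two decimal places, so individual values can differ in the second decimal (P(v7) comes out at about 0.697 here). The ranking and the number of rounds are the same.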
To illustrate that the method of the invention reduces computation overhead, the iteration of a PageRank computation performed directly on Fig. 2 with the existing unpruned method is listed below for comparison. It is shown in Table 3:
Table 3
                 P(v1)   P(v2)   P(v3)   P(v4)   P(v5)   P(v6)   P(v7)
Initial value    1.00    1.00    1.00    1.00    1.00    1.00    1.00
Round 1          0.15    1.00    1.00    0.58    1.43    1.00    1.00
Round 2          0.15    0.28    1.00    0.58    1.06    1.36    0.58
Round 3          0.15    0.28    0.39    0.58    1.06    1.05    1.31
Round 4          0.15    0.28    0.39    0.31    0.81    0.58    1.05
Round 5          0.15    0.28    0.39    0.31    0.58    0.84    1.05
Round 6          0.15    0.28    0.39    0.31    0.58    0.64    0.86
Round 7          0.15    0.28    0.39    0.31    0.58    0.64    0.69
Round 8          0.15    0.28    0.39    0.31    0.58    0.64    0.69
As can be seen, the final results are the same. The existing unpruned method takes 8 rounds of iteration, whereas with the method of the invention the pruned iteration takes only 4 rounds, and only three vertices' PageRank values need to be computed in each round, which greatly reduces computation overhead.
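The round count of the unpruned baseline can be checked with a short sketch. It assumes the same synchronous update as the worked example (each round reads the previous round's values), the same constants and ε, and an edge list reconstructed from the text around Fig. 2; the function name `plain_pagerank` is hypothetical.

```python
def plain_pagerank(vertices, edges, c1=0.85, c2=0.15, eps=1e-7):
    """Unpruned PageRank: every vertex is recomputed in every round."""
    parents = {v: [] for v in vertices}
    out_degree = {v: 0 for v in vertices}
    for u, v in edges:
        parents[v].append(u)
        out_degree[u] += 1
    rank = {v: 1.0 for v in vertices}  # every page starts at 1
    rounds = 0
    while True:
        rounds += 1
        new_rank = {v: c1 * sum(rank[p] / out_degree[p] for p in parents[v]) + c2
                    for v in vertices}
        change = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if change < eps:
            return rank, rounds


vertices = ["v1", "v2", "v3", "v4", "v5", "v6", "v7"]
edges = [("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v3", "v5"),
         ("v4", "v5"), ("v5", "v6"), ("v6", "v7")]
rank, rounds = plain_pagerank(vertices, edges)
print(rounds)                  # 8
print(rounds * len(vertices))  # 56 vertex updates in total
```

For comparison, the pruned iteration described above performs 4 rounds over 3 vertices, i.e. 12 updates, plus one computation each for the four boundary points.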
It should be noted that the embodiments are disclosed to aid further understanding of the invention, but those skilled in the art will understand that various substitutions and modifications are possible without departing from the spirit and scope of the invention and the appended claims. The invention should therefore not be limited to what is disclosed in the embodiments; the scope of protection claimed is defined by the claims.

Claims (6)

1. A web page importance ranking method using pruning-based parallel graph traversal computation, in which web pages and the link relationships between them are represented as a graph G, each vertex v_i in the graph represents a distinct page, and an arrow from vertex v_i to vertex v_j indicates that page i contains a hyperlink pointing to page j; the vertices in the graph are classified, a pruning operation is performed on the boundary points of the graph, and the computation is optimized by pruning away the boundary points; the method comprises a preprocessing stage, a pruned computation stage and a final supplementary computation stage, and specifically comprises the following steps:
1) in the preprocessing stage, identifying the topological features of the graph and preprocessing the graph, with the following specific steps:
11) reading in the graph data and searching according to the topological features of the vertices in the graph to obtain boundary points and non-boundary points; the boundary points are divided into class a, class b, class c and class d, specifically:
Class a: vertices whose in-degree (number of incoming edges) is 0;
Class b: vertices whose in-degree and out-degree are both 1 and whose parent node u1 satisfies condition b1 or b2:
b1: the in-degree of u1 is 0 and its out-degree is 1;
b2: u1 is a class b vertex;
Class c: vertices whose out-degree (number of outgoing edges) is 0;
Class d: vertices whose in-degree and out-degree are both 1 and whose child node u2 satisfies condition d1 or d2:
d1: the out-degree of u2 is 0 and its in-degree is 1;
d2: u2 is a class d vertex;
12) initializing the class a and class b boundary points, that is, computing them with the selected calculation method to obtain the results for the corresponding boundary points;
13) where a child node of a class a or class b boundary point is a non-boundary point, passing the result computed in step 12) from that boundary point to the child node;
14) deleting from the graph structure the edges between the class a and class b boundary points and their child nodes;
2) in the computation stage, performing pruned iterative computation on the preprocessed graph:
21) visiting the vertices in the graph in turn; if a vertex is a boundary point, skipping it; otherwise recomputing the vertex with the calculation method of step 12) and taking the new result as the result for that vertex;
22) after all vertices have been visited, checking the convergence condition set by the calculation method of step 12); when all vertices in the current graph satisfy the convergence condition, ending the pruned computation; otherwise repeating steps 21) to 22) to continue the pruned iterative computation;
3) in the final stage, performing supplementary computation:
computing the class c and class d boundary points with the calculation method of step 12) and taking the results as the results for the corresponding boundary points;
4) sorting by the computed results of all vertices to obtain the importance ranking of the web pages.
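The boundary-point classification of step 11) can be illustrated with a minimal Python sketch. This is only one possible reading of the class definitions, not the patented implementation: the function name `classify_vertices`, the edge-list input format, and the precedence a, b, c, d used when a vertex satisfies more than one definition are all assumptions made for illustration.

```python
from collections import defaultdict

def classify_vertices(edges):
    """Label every vertex 'a', 'b', 'c', 'd', or None (non-boundary point).

    edges: iterable of (u, v) pairs, meaning page u links to page v.
    Assumption: if a vertex matches several classes, the first match in
    the order a, b, c, d wins (the claim does not state a precedence).
    """
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))
    indeg = {v: len(pred[v]) for v in nodes}
    outdeg = {v: len(succ[v]) for v in nodes}
    label = {v: None for v in nodes}

    # Class a: in-degree 0.
    for v in nodes:
        if indeg[v] == 0:
            label[v] = "a"

    # Class b: walk forward along a chain that starts at a parent with
    # in-degree 0 and out-degree 1 (b1); each labelled vertex then acts
    # as the class b parent of the next one (b2).
    for root in [v for v in nodes if indeg[v] == 0 and outdeg[v] == 1]:
        cur = succ[root][0]
        while indeg[cur] == 1 and outdeg[cur] == 1 and label[cur] is None:
            label[cur] = "b"
            cur = succ[cur][0]

    # Class c: out-degree 0.
    for v in nodes:
        if outdeg[v] == 0 and label[v] is None:
            label[v] = "c"

    # Class d: walk backward along a chain that ends at a child with
    # out-degree 0 and in-degree 1 (d1); each labelled vertex then acts
    # as the class d child of the previous one (d2).
    for sink in [v for v in nodes if outdeg[v] == 0 and indeg[v] == 1]:
        cur = pred[sink][0]
        while indeg[cur] == 1 and outdeg[cur] == 1 and label[cur] is None:
            label[cur] = "d"
            cur = pred[cur][0]

    return label

# Hypothetical example: r -> x -> h1 <-> h2 -> t -> s
example_edges = [("r", "x"), ("x", "h1"), ("h1", "h2"),
                 ("h2", "h1"), ("h2", "t"), ("t", "s")]
labels = classify_vertices(example_edges)
```

In this example only h1 and h2 remain non-boundary points, so they are the only vertices the iterative stage of step 2) would revisit in each round.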
2. The web page importance ranking method using pruning-based parallel graph traversal computation according to claim 1, characterized in that the calculation method of step 12) is the PageRank algorithm or a single-source shortest path algorithm.
3. The web page importance ranking method using pruning-based parallel graph traversal computation according to claim 2, characterized in that the single-source shortest path algorithm is Dijkstra's algorithm, the Bellman-Ford algorithm or the SPFA algorithm.
4. The web page importance ranking method using pruning-based parallel graph traversal computation according to claim 2, characterized in that the PageRank method is specifically:
Let the PageRank value of page x be P(x), the number of hyperlinks contained in page x be N(x), and the set of all pages that link to page x be B(x); the PageRank value of any page i is then calculated by formula 1:
$$P(i) = C_1 \sum_{j \in B(i)} \frac{P(j)}{N(j)} + C_2 \qquad \text{(formula 1)}$$
In formula 1, i is any page; P(i) is the PageRank value of page i; C_1 and C_2 are constants; B(i) is the set of all pages that link to page i; j is any page in the set B(i); P(j) is the PageRank value of page j; N(j) is the number of hyperlinks contained in page j.
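One round of the update in formula 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the claimed implementation: B(i) is read as the set of pages linking to page i (the usual PageRank convention), the function name `pagerank_round` and the dictionary inputs are hypothetical, the default values 0.85 and 0.15 for the constants C1 and C2 are placeholders not taken from the patent, and every page is updated from the values of the previous round.

```python
def pagerank_round(in_links, out_count, rank, c1=0.85, c2=0.15):
    """Apply formula 1 once to every page and return the new values.

    in_links:  dict page -> list of pages j that link to it, i.e. B(i)
    out_count: dict page -> number of hyperlinks on that page, i.e. N(j)
    rank:      dict page -> PageRank value from the previous round, P(j)
    c1, c2:    the constants C1 and C2 (placeholder defaults, assumed)
    """
    new_rank = {}
    for i in rank:
        # Sum P(j) / N(j) over every page j that links to page i.
        total = sum(rank[j] / out_count[j] for j in in_links.get(i, ()))
        new_rank[i] = c1 * total + c2
    return new_rank

# Hypothetical three-page graph: A -> B, A -> C, B -> C
in_links = {"A": [], "B": ["A"], "C": ["A", "B"]}
out_count = {"A": 2, "B": 1, "C": 0}
rank = {"A": 1.0, "B": 1.0, "C": 1.0}
new_rank = pagerank_round(in_links, out_count, rank)
```

A page with no incoming links, such as A here, always evaluates to C2, which is why class a boundary points can be computed once and then left out of the iteration.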
5. The web page importance ranking method using pruning-based parallel graph traversal computation according to claim 2, characterized in that the iteration convergence condition of the PageRank method is that the PageRank values of all pages in the graph no longer change, expressed as formula 2:
$$\left| P_{new}(i) - P_{old}(i) \right| < \varepsilon, \quad i \in V(G) \qquad \text{(formula 2)}$$
In formula 2, V(G) is the set of all pages in graph G, i is any page in V(G), P_new(i) is the PageRank value of page i after the current round of iteration, P_old(i) is the PageRank value of page i after the previous round of iteration, and ε is an infinitesimally small quantity in the mathematical sense.
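The convergence test of formula 2 can be sketched as a small Python helper. The sketch assumes the per-page reading of the condition, in which every page must change by less than ε between two rounds; a reading that sums the differences over all pages would need `sum` in place of `all`. The function name `has_converged` is hypothetical, and the default tolerance is the value 0.0000001 given for ε in claim 6.

```python
def has_converged(new_rank, old_rank, eps=1e-7):
    """Return True when every page changed by less than eps in this round.

    new_rank: dict page -> PageRank value after the current round
    old_rank: dict page -> PageRank value after the previous round
    eps:      tolerance epsilon (0.0000001 as in claim 6)
    """
    return all(abs(new_rank[i] - old_rank[i]) < eps for i in new_rank)

# Hypothetical values for two pages over two consecutive rounds
still_moving = has_converged({"A": 0.64, "B": 0.86}, {"A": 0.84, "B": 1.05})
settled = has_converged({"A": 0.64, "B": 0.69}, {"A": 0.64, "B": 0.69})
```

When pruning is used, only the non-boundary vertices need to be passed to such a check, since the boundary points are not recomputed during the iterative stage.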
6. The web page importance ranking method using pruning-based parallel graph traversal computation according to claim 5, characterized in that the value of ε is 0.0000001.
CN201610192758.9A 2016-03-30 2016-03-30 Figure roaming parallel calculating method and application based on beta pruning Expired - Fee Related CN105808779B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610192758.9A CN105808779B (en) 2016-03-30 2016-03-30 Figure roaming parallel calculating method and application based on beta pruning

Publications (2)

Publication Number Publication Date
CN105808779A CN105808779A (en) 2016-07-27
CN105808779B true CN105808779B (en) 2019-06-07

Family

ID=56459392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610192758.9A Expired - Fee Related CN105808779B (en) 2016-03-30 2016-03-30 Figure roaming parallel calculating method and application based on beta pruning

Country Status (1)

Country Link
CN (1) CN105808779B (en)

Also Published As

Publication number Publication date
CN105808779A (en) 2016-07-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190607
