CN105808779B - Pruning-based parallel graph traversal computation method and application - Google Patents
- Publication number
- CN105808779B CN201610192758.9A
- Authority
- CN
- China
- Prior art keywords
- vertex
- class
- page
- boundary point
- pruning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a pruning-based parallel graph traversal computation method and its application, relating to the field of large-scale parallel data computation and processing. The method implements efficient pruning-based graph traversal on a shared-memory system. In a preprocessing stage before computation, the boundary points of the graph are identified from topological features, and the corresponding pruning operations are then performed during computation. By pruning the boundary points, the invention lets vertex updates propagate faster and accelerates convergence of the algorithm, which reduces the number of iteration rounds. When tasks are partitioned for parallel computation, pruning the boundary points also makes the partition more balanced, which reduces computational overhead and improves performance.
Description
Technical field
The present invention relates to the field of large-scale parallel data computation and processing, and in particular to a pruning-based parallel graph traversal computation method and its application.
Background art
In recent years, with the continued development of the Internet, more and more information and data need to be stored and processed as graph structures, for example friend relationships in online social networks, citation relationships in paper repositories, link relationships between websites recorded by search engines, and knowledge graphs such as the one in Wikipedia. The data behind these graph computations is often huge: the volume of vertices and edges reaches the GB/TB/ZB scale or even larger. Large-scale graph computation is difficult because of its low computational density and random data access. Parallel computing is a common technique for large-scale graph computation problems.
A graph is a data structure. A graph G usually consists of a set V of vertices and a set E of edges between vertices. Any edge e = (u, v) denotes an edge from vertex u to vertex v; u is called the parent node of v, v is called the child node of u, and e is called an out-edge of u and an in-edge of v. In the graph computation model, V and E carry the computation state data. The state is updated by running a series of iterative computations on each vertex, and the result of the computation is the aggregate of the final states of all vertices and edges in the graph. A vertex computation depends on the state of the vertex itself, its neighboring vertices and its edges, and may update these states.
Graph traversal (graph roaming) means that, for a graph G and a given constraint condition, whenever some vertex v in G does not satisfy the constraint, an update operation is performed on v. The updated state of v can affect its neighboring vertices through its out-edges and trigger updates of those neighbors. This continues until every vertex in G satisfies the constraint and no vertex needs a further update. Here, an update operation means executing the algorithm's computation on a vertex of the graph.
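The roaming loop described above can be sketched as a worklist computation. The following is a minimal illustration, not the patented method: it assumes the constraint is the shortest-path condition dist[v] <= dist[u] + w for every edge, non-negative edge weights, and a hypothetical four-vertex graph; the function name `roam_shortest_paths` is invented for this example.

```python
from collections import deque

def roam_shortest_paths(out_edges, source):
    """Worklist-style roaming: keep updating vertices until no constraint is violated.

    out_edges maps every vertex to a list of (child, weight) pairs; weights are
    assumed non-negative so that the loop terminates.
    """
    dist = {v: float("inf") for v in out_edges}
    dist[source] = 0
    work = deque([source])              # vertices whose state changed recently
    while work:
        u = work.popleft()
        for v, w in out_edges[u]:
            if dist[u] + w < dist[v]:   # v violates dist[v] <= dist[u] + w
                dist[v] = dist[u] + w   # update operation on v
                work.append(v)          # the change may affect v's own children
    return dist

# Hypothetical example graph, not taken from the source text.
graph = {"s": [("a", 4), ("b", 1)], "b": [("a", 2)], "a": [("t", 1)], "t": []}
result = roam_shortest_paths(graph, "s")
```

The loop stops exactly when no vertex needs a further update, which mirrors the termination condition in the definition.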
Among existing graph computation frameworks, Pregel is a typical framework for distributed graph computation (Grzegorz Malewicz et al. Pregel: A System for Large-Scale Graph Processing. ACM SIGMOD, 2010), used mainly for graph traversal, shortest paths, PageRank and similar computations. It is a vertex-centric framework: vertices iterate over their edges and send messages to all of their out-neighbors at the same time, and each target vertex collects these messages with an associated combiner. The system is bulk synchronous, so received values only become visible in the next round. Pregel is relatively slow, possibly because of framework overhead and its use of distributed-storage machines. On the one hand, although a distributed system offers strong computing and storage capacity, in a distributed system with a very large number of nodes the low computational density and random data access cause very high communication overhead during computation; the performance gained when processing graphs does not match the increase in system cost, and the computing capacity cannot be fully exploited. On the other hand, Pregel computes directly on the input graph without any preprocessing mechanism, so the actual computation incurs some unnecessary overhead.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention provides a pruning-based parallel graph traversal computation method and its application. It implements efficient pruning-based graph traversal on a shared-memory system: the topological features of the graph are identified in a preprocessing stage before computation, and the corresponding pruning operations are then performed during computation, which reduces computational overhead.
The principle of the invention is as follows. The pruned graph traversal model provided by the invention prunes the "boundary point" vertices of the graph in order to optimize the computation, reduce computational overhead and improve efficiency. During graph computation, a change in the state of a vertex v depends on changes in the states of v's parent nodes. Vertices can be classified by their in-edges and out-edges: a vertex with no in-edges is called an "input point", a vertex with no out-edges is called an "output point", and all other vertices are called "intermediate points". This classification can serve as a topological feature of the graph. Input points and output points are in effect the "boundary points" of the graph, and they behave differently from intermediate vertices during computation. An input point, having no in-edges, never updates its state again after the initial change. An output point, having no out-edges, may be updated repeatedly throughout the computation, but its updated state is never propagated to other vertices, so only its final state is meaningful and the intermediate updates are unnecessary. Standard graphs and natural graphs contain large numbers of boundary points, in a proportion that depends on the characteristics of the graph itself: for example, 44% of the vertices in the Friendster social network graph, 22% in the Twitter social network and 14% in the Wikipedia relation graph are boundary points, which leaves room for optimization. The pruned graph traversal model provided by the invention prunes these boundary-point vertices, and by removing them optimizes the computation, reduces computational overhead and improves efficiency.
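The input/output/intermediate classification can be sketched in a few lines. This is an illustrative sketch only: the function name `classify_by_degree`, the five-vertex graph and the choice to label an isolated vertex as "input" are assumptions made for the example, not details from the source.

```python
from collections import Counter

def classify_by_degree(vertices, edges):
    """Label each vertex 'input' (no in-edges), 'output' (no out-edges) or 'intermediate'."""
    in_deg = Counter(v for _, v in edges)
    out_deg = Counter(u for u, _ in edges)
    labels = {}
    for x in vertices:
        if in_deg[x] == 0:          # assumption: an isolated vertex counts as input
            labels[x] = "input"
        elif out_deg[x] == 0:
            labels[x] = "output"
        else:
            labels[x] = "intermediate"
    return labels

# Hypothetical 5-vertex example graph (not taken from the patent).
vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "b"), ("c", "d"), ("e", "c")]
labels = classify_by_degree(vertices, edges)
# Share of boundary points (input or output), analogous to the percentages quoted above.
boundary_ratio = sum(l != "intermediate" for l in labels.values()) / len(vertices)
```

On this toy graph three of the five vertices are boundary points, so `boundary_ratio` is 0.6.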
The technical solution provided by the present invention is as follows:
A pruning-based parallel graph traversal computation method classifies the vertices of a graph, prunes the boundary points of the graph, and optimizes the computation by removing the boundary points. It comprises a preprocessing stage, a pruned computation stage and a finishing supplementary computation stage, with the following steps:
1) In the preprocessing stage, identify the topological features of the graph and preprocess the graph, as follows:
11) Read in the graph data and search according to the topological features of the vertices to obtain the boundary points and non-boundary points. Boundary points are divided into class a, class b, class c and class d:
Class a: vertices whose in-degree (number of in-edges) is 0;
Class b: vertices whose in-degree and out-degree are both 1, and whose parent node u1 satisfies condition b1 or b2:
b1: u1 has in-degree 0 and out-degree 1;
b2: u1 is a class b vertex;
Class c: vertices whose out-degree (number of out-edges) is 0;
Class d: vertices whose in-degree and out-degree are both 1, and whose child node u2 satisfies condition d1 or d2:
d1: u2 has out-degree 0 and in-degree 1;
d2: u2 is a class d vertex.
12) Initialize the class a and class b boundary points, that is, compute them with the selected computation method to obtain the results of these boundary points;
13) For each class a or class b boundary point whose child node is a non-boundary point, pass the result of step 12) to that child node;
14) Delete from the graph structure the edges between class a and class b boundary points and their child nodes;
Steps 13) and 14) mean that the child nodes of class a and class b boundary points no longer need to access the corresponding parent nodes (that is, the class a and class b boundary points), which reduces edge-access overhead.
2) In the computation stage, perform pruned iterative computation on the preprocessed graph:
21) Visit the vertices of the graph in turn. If a vertex is a boundary point, skip it; otherwise recompute it with the computation method of step 12) and take the new result as the result of that vertex;
22) After all vertices have been visited, check the convergence condition defined by the computation method of step 12). When all vertices in the current graph satisfy the convergence condition, end the pruned computation; otherwise repeat steps 21) and 22);
3) In the finishing stage, perform the supplementary computation:
Compute the class c and class d boundary points with the computation method of step 12) and take the results as the results of those boundary points.
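The four boundary classes of step 11) can be found with two chain walks. The sketch below is one possible reading of the definitions, under stated assumptions: the name `classify_boundary` is hypothetical, a vertex that qualifies for more than one class keeps the first label assigned (a before c, b before d), and the edge list is the seven-vertex example reconstructed from the embodiment's description.

```python
from collections import defaultdict

def classify_boundary(vertices, edges):
    """Return {vertex: 'a' | 'b' | 'c' | 'd'}; non-boundary vertices are omitted."""
    parents, children = defaultdict(list), defaultdict(list)
    for u, v in edges:
        children[u].append(v)
        parents[v].append(u)

    def in_deg(x):
        return len(parents[x])

    def out_deg(x):
        return len(children[x])

    cls = {}
    for x in vertices:
        if in_deg(x) == 0:
            cls[x] = "a"
        elif out_deg(x) == 0:
            cls[x] = "c"
    # Class b: walk forward along chains that start at a class a vertex with out-degree 1.
    for x in vertices:
        if in_deg(x) == 0 and out_deg(x) == 1:
            y = children[x][0]
            while y not in cls and in_deg(y) == 1 and out_deg(y) == 1:
                cls[y] = "b"
                y = children[y][0]
    # Class d: walk backward along chains that end at a class c vertex with in-degree 1.
    for x in vertices:
        if out_deg(x) == 0 and in_deg(x) == 1:
            y = parents[x][0]
            while y not in cls and in_deg(y) == 1 and out_deg(y) == 1:
                cls[y] = "d"
                y = parents[y][0]
    return cls

# Edge list assumed from the seven-page example (v1..v7 written as 1..7).
edges = [(1, 2), (2, 3), (3, 4), (3, 5), (4, 5), (5, 6), (6, 7)]
classes = classify_boundary(list(range(1, 8)), edges)
```

Each walk stops at the first vertex that has more than one in-edge or out-edge, so vertices 3, 4 and 5 remain unclassified.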
In the above pruning-based parallel graph traversal computation method, further, the computation method of step 12) is the PageRank algorithm or a single-source shortest path algorithm. The single-source shortest path algorithm is Dijkstra's algorithm, the Bellman-Ford algorithm or the SPFA algorithm.
The above parallel computation method based on the pruned graph traversal model can be applied to PageRank (page ranking), register allocation based on the graph coloring problem, shortest-path computation and so on. PageRank expresses the rank or importance of web pages: the importance of each website is computed from the link relationships between websites. Register allocation based on the graph coloring problem takes a given undirected graph G and provides a coloring in which any two adjacent vertices have different colors; applied to register allocation, this can improve program execution speed. Shortest-path computation seeks the shortest path between two vertices of a graph and can be used to find shortest paths in practical problems such as pipe laying, line installation, plant layout and equipment replacement.
The present invention also provides a web page importance ranking method implemented with the above pruning-based parallel graph traversal computation method. It computes the importance ranking of web pages with the PageRank method, as follows:
61) Represent the web pages and the link relationships between them as a graph G, in which each vertex vi represents a distinct page and an arrow from vertex vi to vertex vj indicates that page i contains a hyperlink to page j;
62) Search according to the topological features of the vertices to obtain the four types of boundary points, class a to class d;
63) Initialize the class a and class b boundary points with the PageRank method to obtain their results;
64) For each class a or class b boundary point whose child node is a non-boundary point, pass the corresponding result to that child node;
65) Prune the edges between class a and class b boundary points and their child nodes;
66) Iterate the PageRank method over all other vertices of the graph that are not boundary points, ending the iteration when all vertices in the graph reach the configured convergence threshold;
67) Compute the class c and class d boundary points with the PageRank method to obtain their results;
68) Sort by the results of all vertices to obtain the importance ranking of the web pages.
In the above web page importance ranking method, further, the PageRank method is specifically:
Let the PageRank value of page x be P(x), the number of hyperlinks contained in page x be N(x), and the set of all pages that link to page x be B(x). The PageRank value of any page i is computed by Formula 1:
$$P(i) = C_1 \sum_{j \in B(i)} \frac{P(j)}{N(j)} + C_2 \qquad \text{(Formula 1)}$$
In Formula 1, i is any page; P(i) is the PageRank value of page i; C1 and C2 are constants, with C1 = 0.85 and C2 = 0.15 in this embodiment; B(i) is the set of all pages that link to page i; j is any page in B(i); P(j) is the PageRank value of page j; and N(j) is the number of hyperlinks contained in page j.
In the above web page importance ranking method, further, the iteration convergence condition of the PageRank method is that the PageRank values of all pages in the graph no longer change, expressed as Formula 2:
$$\sum_{i \in V(G)} \left| P_{new}(i) - P_{old}(i) \right| < \varepsilon \qquad \text{(Formula 2)}$$
In Formula 2, V(G) is the set of all pages in graph G, i is any page in V(G), P_new(i) is the PageRank value of page i after the current round of iteration, P_old(i) is the PageRank value of page i after the previous round, and ε is a very small positive number. In the embodiment of the invention, ε is 0.0000001.
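Formula 1 and Formula 2 translate directly into one synchronous update round and a convergence test. The sketch below is illustrative only: the function names, the dictionary-based graph layout and the two-page example are assumptions, and B(i) is taken to be the set of pages that link to page i, which is how the worked example in the embodiment applies the formula.

```python
def pagerank_round(parents, out_deg, rank, c1=0.85, c2=0.15):
    """One synchronous round of Formula 1: every page reads the previous round's values."""
    return {i: c1 * sum(rank[j] / out_deg[j] for j in parents[i]) + c2 for i in rank}

def converged(new, old, eps=1e-7):
    """Formula 2: total absolute change across all pages is below eps."""
    return sum(abs(new[i] - old[i]) for i in new) < eps

# Hypothetical two-page graph in which x and y link to each other.
parents = {"x": ["y"], "y": ["x"]}   # B(i): pages linking to i
out_deg = {"x": 1, "y": 1}           # N(j): hyperlinks contained in j
old = {"x": 1.0, "y": 1.0}
new = pagerank_round(parents, out_deg, old)
done = converged(new, old)
```

With the initial value 1 on both pages, one round leaves the values unchanged up to floating-point error, so the convergence test succeeds immediately.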
Compared with the prior art, the beneficial effects of the present invention are:
The present invention provides a parallel computation method for shared-memory systems. With the technical solution provided by the invention, pruning reduces the number of vertices and edges to be processed during computation, which lowers computational overhead and improves performance. Because the boundary points are pruned, vertex updates propagate faster, the algorithm converges faster and the number of iteration rounds can be reduced. When tasks are partitioned for parallel computation, pruning the boundary points makes the partition more balanced, which further reduces computational overhead and improves performance.
The above parallel computation method based on the pruned graph traversal model can be applied to PageRank (page ranking), register allocation based on the graph coloring problem, shortest-path computation and so on. When the method is applied to web page importance ranking, where the importance ranking is computed with the PageRank method, both the number of iteration rounds and the number of vertices whose PageRank value must be computed in each round are greatly reduced compared with the unpruned computation, which greatly reduces computational overhead and improves efficiency.
Brief description of the drawings
Fig. 1 is a flow diagram of the graph traversal method provided by the present invention.
Fig. 2 is the example graph G for the PageRank algorithm in the embodiment of the invention;
where circles represent vertices and G contains seven vertices v1 to v7; arrows between vertices represent edges, and an arrow from vi to vj represents the edge (vi, vj).
Fig. 3 is the graph G' obtained by preprocessing the example graph G in the embodiment of the invention;
where dashed circles and dashed lines indicate the pruned vertices v1, v2, v6, v7 and the pruned edges (v1, v2), (v2, v3), (v5, v6), (v6, v7).
Specific embodiments
The present invention is further described below through embodiments with reference to the accompanying drawings, without limiting the scope of the invention in any way.
The present invention provides a pruning-based parallel graph traversal computation method that classifies the vertices of a graph, prunes the boundary points of the graph and optimizes the computation by removing the boundary points. It includes the following steps:
1) Preprocessing stage: identify the topological features of the graph, including classifying the boundary points and running the initialization computation, and preprocess the graph, as follows:
11) Read in the graph data and search according to the topological features of the vertices to obtain the boundary points, which are divided into four classes:
Class a: vertices whose in-degree (number of in-edges) is 0;
Class b: vertices whose in-degree and out-degree are both 1, and for which the start vertex u1 of the in-edge (the parent node) satisfies condition b1 or b2:
b1: u1 has in-degree 0 and out-degree 1;
b2: u1 is a class b vertex;
Class c: vertices whose out-degree (number of out-edges) is 0;
Class d: vertices whose in-degree and out-degree are both 1, and for which the end vertex u2 of the out-edge (the child node) satisfies condition d1 or d2:
d1: u2 has out-degree 0 and in-degree 1;
d2: u2 is a class d vertex.
12) Mark the class a, b, c and d vertices as boundary points. Initialize the class a and class b boundary points, that is, compute these points with the selected computation method (for example the PageRank algorithm or a single-source shortest path algorithm) to obtain their initialization results;
Classic algorithms for the single-source shortest path problem include Dijkstra's algorithm, the Bellman-Ford algorithm and the SPFA algorithm.
13) For each class a or class b boundary point whose child node is not marked as a boundary point, pass the result of step 12) to that child node for use in the computation stage; a data structure therefore has to be created for such child nodes to store the results of their parent nodes;
When the child node is itself marked as a boundary point, it is not among the vertices computed in step 2), so no value needs to be passed;
14) Delete from the graph structure the edges between class a and class b boundary points and their child nodes;
Steps 13) and 14) mean that the child nodes of class a and class b boundary points no longer need to access the corresponding parent nodes (that is, the class a and class b boundary points), which reduces edge-access overhead.
2) Computation stage: perform the pruned computation on the preprocessed graph:
21) Visit the vertices of the graph in turn. If a vertex is marked as a boundary point, skip it; otherwise recompute it and replace its previous result with the new one;
22) After all vertices have been visited, use the convergence condition defined by the selected computation method to judge whether all vertices in the current graph have converged. If so, the computation stage ends; otherwise repeat steps 21) and 22);
3) Finishing stage: perform the supplementary computation:
In the finishing stage after the computation stage, compute the class c and class d vertices with the algorithm actually used; the results are the results of the respective vertices.
This yields the final state of all vertices.
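Steps 13) and 14) above amount to moving a parent's contribution into the child and then dropping the edge. The following sketch shows one way this could look for PageRank, where the stored contribution is P(u)/N(u). The function name `prune_inputs`, the dictionaries `carried` and `value`, and the sample inputs are all assumptions for illustration; only the class a/b side of the pruning is covered.

```python
def prune_inputs(edges, cls, value):
    """Sketch of steps 13) and 14) for PageRank.

    cls maps boundary vertices to 'a'/'b'/'c'/'d'; value holds the results
    already computed for class a/b vertices in step 12).
    """
    out_deg = {}
    for u, _ in edges:
        out_deg[u] = out_deg.get(u, 0) + 1
    carried, kept = {}, []
    for u, v in edges:
        if cls.get(u) in ("a", "b"):
            if v not in cls:
                # Step 13): the child is not a boundary point, so store P(u)/N(u) in it.
                carried[v] = carried.get(v, 0.0) + value[u] / out_deg[u]
            # Step 14): the edge is dropped whether or not a value was passed.
        else:
            kept.append((u, v))
    return carried, kept

# Inputs assumed from the seven-page example (v1..v7 written as 1..7).
edges = [(1, 2), (2, 3), (3, 4), (3, 5), (4, 5), (5, 6), (6, 7)]
cls = {1: "a", 2: "b", 6: "d", 7: "c"}
carried, kept = prune_inputs(edges, cls, {1: 0.15, 2: 0.2775})
```

After this call vertex 3 carries the contribution of its removed parent, and the remaining edge list no longer mentions vertices 1 and 2 as sources.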
The present invention is further illustrated by the following embodiment.
The embodiment ranks web pages by importance with the PageRank algorithm. There are seven web pages with link relationships between them; the seven pages and their links can be represented as the graph G in Fig. 2, where v1 to v7 represent the seven pages and an arrow from vi to vj indicates that page i contains a hyperlink to page j. Using the graph traversal method based on the pruned graph model provided by the invention, the PageRank algorithm is executed on the graph G of Fig. 2 to rank the pages by importance and obtain the ranking result.
The PageRank algorithm is computed as follows:
Let the PageRank value of page x be P(x), the number of hyperlinks contained in page x (the out-degree of vertex x) be N(x), and the set of all pages that link to page x (the parent nodes of vertex x) be B(x). Then for any page i:
$$P(i) = C_1 \sum_{j \in B(i)} \frac{P(j)}{N(j)} + C_2 \qquad \text{(Formula 1)}$$
In Formula 1, i is any page; P(i) is the PageRank value of page i; C1 and C2 are constants, with C1 = 0.85 and C2 = 0.15 in this embodiment; B(i) is the set of all pages that link to page i; j is any page in B(i); P(j) is the PageRank value of page j; and N(j) is the number of hyperlinks contained in page j;
Initially the PageRank value of every page is set to 1, and iteration then proceeds according to Formula 1. The convergence condition of the algorithm is that the PageRank values of all pages in the graph no longer change, expressed as Formula 2:
$$\sum_{i \in V(G)} \left| P_{new}(i) - P_{old}(i) \right| < \varepsilon \qquad \text{(Formula 2)}$$
In Formula 2, V(G) is the set of all pages in graph G, i is any page in V(G), P_new(i) is the PageRank value of page i after the current round of iteration, P_old(i) is the PageRank value of page i after the previous round, and ε is a very small positive number; in this embodiment ε = 0.0000001.
Using the graph traversal method based on the pruned graph model provided by the invention, the PageRank algorithm is executed on the graph G of Fig. 2 to rank the web pages by importance. The ranking result is obtained through the following computation:
1) Preprocessing stage: identify the topological features of the graph:
11) Read in the data of graph G, which represents the seven pages and the links between them, then search for and mark the boundary points according to their topological features. The boundary points in Fig. 2 are the pages (represented as vertices) v1, v2, v6 and v7, where v1 is a class a boundary point, v2 a class b boundary point, v6 a class d boundary point and v7 a class c boundary point;
12) Mark the boundary points v1, v2, v6 and v7. Initialize the class a and class b boundary points v1 and v2 with Formula 1, which gives:
P(v1) = 0.15;
P(v2) = 0.85 * P(v1) / 1 + 0.15 = 0.28;
13) The child node v3 of v2 is not a boundary point, so the result of v2 has to be passed to v3.
As Formula 1 shows, computing the PageRank value of v3 first requires summing P(j)/N(j) over all parent nodes j of v3. Since v2 is about to be removed, the value P(v2)/N(v2) has to be stored in the data structure of v3 so that it is available when v3 is computed; the stored value is P(v2)/N(v2) = 0.28/1 = 0.28.
14) Remove the edge (v2, v3) from the graph structure, so that vertex v2 is no longer visited when the parent nodes of v3 are accessed.
2) Formal computation stage: perform the pruned computation on the graph:
21) Visit the vertices v1 to v7 in turn. v1, v2, v6 and v7 are marked and are skipped. v3, v4 and v5 are each computed with Formula 1 (results kept to two decimal places):
As shown in Fig. 3, v3 has no parent node in the preprocessed graph G', and the value 0.28 computed for v2 is stored in v3. In the first round of iteration, therefore, P(v3) = 0.85 * P(v2) / 1 + 0.15 = 0.39; afterwards no new value reaches v3 from a parent node, so P(v3) remains unchanged;
The parent node of v4 is v3; in each round v4 is computed by Formula 1 from the previous round's P(v3). The parent nodes of v5 are v3 and v4; in each round P(v5) is computed by Formula 1 from the previous round's P(v3) and P(v4). The results are listed in Table 1:
Table 1: PageRank values of the non-boundary points during the pruned iterative computation
Round | P(v3) | P(v4) | P(v5) |
Initial value | 1.00 | 1.00 | 1.00 |
Round 1 | 0.39 | 0.58 | 1.42 |
Round 2 | 0.39 | 0.31 | 0.56 |
Round 3 | 0.39 | 0.31 | 0.58 |
Round 4 | 0.39 | 0.31 | 0.58 |
Convergence is judged by Formula 2, with ε = 0.0000001 in this embodiment. After the 4th round of iteration, |P_new(v3) - P_old(v3)| + |P_new(v4) - P_old(v4)| + |P_new(v5) - P_old(v5)| = 0 < ε, so the convergence condition is met and the algorithm terminates.
3) Supplementary computation in the finishing stage:
Update v6 and v7 with Formula 1:
P(v6) = 0.85 * P(v5) / 1 + 0.15 = 0.64
P(v7) = 0.85 * P(v6) / 1 + 0.15 = 0.69
The final result is therefore:
Table 2: final PageRank values obtained by the pruned computation
P(v1) | P(v2) | P(v3) | P(v4) | P(v5) | P(v6) | P(v7) |
0.15 | 0.28 | 0.39 | 0.31 | 0.58 | 0.64 | 0.69 |
Thus, using the graph traversal method based on the pruned graph model provided by the invention, the PageRank algorithm is executed on the graph G of Fig. 2 to rank the web pages by importance, and the resulting PageRank ranking is: v7 > v6 > v5 > v3 > v4 > v2 > v1.
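The three stages of the worked example can be put together in a short script. This is a sketch under stated assumptions, not the patented implementation: the function name `pruned_pagerank`, its parameters, and the caller-supplied processing orders are hypothetical, the edge list is reconstructed from the description of Fig. 2 and Fig. 3, and the code runs sequentially rather than in parallel. Because the text rounds intermediate values to two decimals, unrounded results can differ from the tables in the last digit.

```python
from collections import defaultdict

C1, C2, EPS = 0.85, 0.15, 1e-7

def pruned_pagerank(vertices, edges, cls, head_order, tail_order):
    """cls maps boundary vertices to 'a'/'b'/'c'/'d'.

    head_order lists the class a/b vertices and tail_order the class d/c
    vertices, each with parents before children (supplied by the caller here).
    """
    out_deg, parents = defaultdict(int), defaultdict(list)
    for u, v in edges:
        out_deg[u] += 1
        parents[v].append(u)

    def formula1(i, rank, extra=0.0):
        return C1 * (extra + sum(rank[j] / out_deg[j] for j in parents[i])) + C2

    rank = {v: 1.0 for v in vertices}

    # Stage 1: initialize class a/b points, hand their contribution to
    # non-boundary children, and drop those edges from the working graph.
    for x in head_order:
        rank[x] = formula1(x, rank)
    interior = [v for v in vertices if v not in cls]
    carried = {}
    for v in interior:
        pruned = [u for u in parents[v] if cls.get(u) in ("a", "b")]
        carried[v] = sum(rank[u] / out_deg[u] for u in pruned)
        parents[v] = [u for u in parents[v] if u not in pruned]

    # Stage 2: synchronous rounds over non-boundary vertices only.
    rounds = 0
    while True:
        rounds += 1
        new = {v: formula1(v, rank, carried[v]) for v in interior}
        delta = sum(abs(new[v] - rank[v]) for v in interior)
        rank.update(new)
        if delta < EPS:
            break

    # Stage 3: compute class d/c points once from the converged values.
    for x in tail_order:
        rank[x] = formula1(x, rank)
    return rank, rounds

edges = [(1, 2), (2, 3), (3, 4), (3, 5), (4, 5), (5, 6), (6, 7)]
cls = {1: "a", 2: "b", 6: "d", 7: "c"}
rank, rounds = pruned_pagerank(range(1, 8), edges, cls, [1, 2], [6, 7])
order = sorted(rank, key=rank.get, reverse=True)
```

In this sketch `rounds` counts the stage 2 iterations and `order` lists the pages from most to least important, which can be compared with the ranking stated in the text.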
To illustrate that the method of the invention reduces computational overhead, the iterative process of computing PageRank directly on Fig. 2 with the existing unpruned method is listed below as a control, as shown in Table 3:
Table 3: PageRank values during the iterative computation with the existing unpruned method
Round | P(v1) | P(v2) | P(v3) | P(v4) | P(v5) | P(v6) | P(v7) |
Initial value | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Round 1 | 0.15 | 1.00 | 1.00 | 0.58 | 1.43 | 1.00 | 1.00 |
Round 2 | 0.15 | 0.28 | 1.00 | 0.58 | 1.06 | 1.36 | 0.58 |
Round 3 | 0.15 | 0.28 | 0.39 | 0.58 | 1.06 | 1.05 | 1.31 |
Round 4 | 0.15 | 0.28 | 0.39 | 0.31 | 0.81 | 0.58 | 1.05 |
Round 5 | 0.15 | 0.28 | 0.39 | 0.31 | 0.58 | 0.84 | 1.05 |
Round 6 | 0.15 | 0.28 | 0.39 | 0.31 | 0.58 | 0.64 | 0.86 |
Round 7 | 0.15 | 0.28 | 0.39 | 0.31 | 0.58 | 0.64 | 0.69 |
Round 8 | 0.15 | 0.28 | 0.39 | 0.31 | 0.58 | 0.64 | 0.69 |
As can be seen, the final results are consistent. The existing unpruned method needs 8 rounds of iteration, whereas the pruned method of the invention needs only 4 rounds and only has to compute the PageRank values of three vertices in each round, which greatly reduces computational overhead.
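The round count of the unpruned control can be checked with a plain synchronous PageRank loop. This is a minimal sketch, assuming the same reconstructed edge list for Fig. 2, initial values of 1 and the convergence test of Formula 2; the function name `pagerank_rounds` is hypothetical.

```python
def pagerank_rounds(vertices, edges, c1=0.85, c2=0.15, eps=1e-7):
    """Unpruned synchronous PageRank; returns (rank, number of rounds)."""
    out_deg = {v: 0 for v in vertices}
    parents = {v: [] for v in vertices}
    for u, v in edges:
        out_deg[u] += 1
        parents[v].append(u)
    rank = {v: 1.0 for v in vertices}
    rounds = 0
    while True:
        rounds += 1
        # Every vertex reads only the previous round's values.
        new = {v: c1 * sum(rank[u] / out_deg[u] for u in parents[v]) + c2
               for v in vertices}
        delta = sum(abs(new[v] - rank[v]) for v in vertices)
        rank = new
        if delta < eps:
            return rank, rounds

# Edge list assumed from the seven-page example (v1..v7 written as 1..7).
edges = [(1, 2), (2, 3), (3, 4), (3, 5), (4, 5), (5, 6), (6, 7)]
rank, rounds = pagerank_rounds(range(1, 8), edges)
```

Each round touches all seven vertices here, whereas the pruned computation iterates over three, so the saving comes from both fewer rounds and fewer vertex updates per round.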
It should be noted that the embodiments are published to help further understanding of the invention, but those skilled in the art will understand that various substitutions and modifications are possible without departing from the spirit and scope of the invention and the appended claims. The invention should therefore not be limited to what the embodiments disclose; the scope of protection claimed is defined by the claims.
Claims (6)
1. a kind of Web page importance sort method of the figure roaming parallel computation based on beta pruning, by the link between webpage and webpage
Relationship is expressed as figure G, respectively represents the different pages with vertex v i in figure, and being directed toward in the arrow representation page i of vj with vertex v i has
It is directed toward the hyperlink of page j;Classify to vertex in figure, cut operator is carried out for the boundary point in figure, by wiping out side
Boundary's point optimizes calculating process, including pretreatment stage, beta pruning calculation stages, ending supplement calculation stage;It specifically includes as follows
Step:
1) it in pretreatment stage, identifies the topological characteristic of figure, pretreatment operation is carried out to figure;Specific step is as follows:
11) diagram data is read in, is scanned for according to the topological characteristic on vertex in figure, obtains boundary point and non-boundary point;Boundary point
It is divided into a class, b class, c class and d class;Specifically:
A class: enter the vertex that number of edges amount is 0;
B class: enter side, go out the vertex that number of edges amount is 1, and the father node u1 eligible b1 or b2 on the vertex:
The number of edges amount that enters of b1:u1 is 0, and number of edges amount is 1 out;
B2:u1 is b class vertex;
C class: go out the vertex that number of edges amount is 0;
D class: enter side, go out the vertex that number of edges amount is 1, and the child node u2 eligible d1 or d2 on the vertex:
The number of edges amount that goes out of d1:u2 is 0, and entering number of edges amount is 1;
D2:u2 is d class vertex;
12) Initialize the class a and class b boundary points, that is, compute them with the selected calculation method to obtain the calculated result of each such boundary point;
13) Where a child node of a class a or class b boundary point is a non-boundary point, pass the calculated result of step 12) from that boundary point to the child node;
14) Delete from the graph structure the edges between the class a and class b boundary points and their child nodes;
2) In the computation stage, perform pruned iterative computation on the preprocessed graph:
21) Visit the vertices of the graph in turn; if a vertex is a boundary point, skip it; otherwise recompute the vertex with the calculation method of step 12) and take the new result as the calculated result of that vertex;
22) After all vertices have been visited, test the convergence condition defined by the calculation method of step 12): when all vertices in the current graph satisfy the convergence condition, end the pruned computation; otherwise, repeat steps 21) to 22) to continue the pruned iterative computation;
3) In the final stage, perform the supplementary computation;
Compute the class c and class d boundary points with the calculation method of step 12), and take the results as the results of the corresponding boundary points;
4) Sort the vertices by their calculated results to obtain the importance ranking of the web pages.
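For illustration only, the classification in step 11) can be sketched in Python as follows. This is a minimal sketch, not the claimed implementation: the function name `classify_boundary_vertices`, the edge-list input format, and the tie-breaking rule (a vertex that qualifies on both the a/b side and the c/d side keeps its a/b label) are assumptions that do not appear in the claims. The graph is assumed to have no duplicate edges.

```python
from collections import defaultdict


def classify_boundary_vertices(edges):
    """Return {vertex: 'a' | 'b' | 'c' | 'd'}; unlisted vertices are non-boundary."""
    parents = defaultdict(list)   # vertex -> vertices with an edge into it
    children = defaultdict(list)  # vertex -> vertices it has an edge to
    nodes = set()
    for u, v in edges:
        children[u].append(v)
        parents[v].append(u)
        nodes.update((u, v))

    def is_link(x):
        # In-degree and out-degree are both 1 (shared condition of classes b and d).
        return len(parents[x]) == 1 and len(children[x]) == 1

    label = {}
    for x in nodes:
        if len(parents[x]) == 0:
            label[x] = "a"        # class a: no incoming edges
    for x in nodes:
        if len(children[x]) == 0 and x not in label:
            label[x] = "c"        # class c: no outgoing edges

    # Class b: walk forward along the chain that starts at a class a vertex
    # with out-degree 1 (condition b1), then from b to b (condition b2).
    for src in [x for x in nodes if label.get(x) == "a" and len(children[x]) == 1]:
        v = children[src][0]
        while v not in label and is_link(v):
            label[v] = "b"
            v = children[v][0]

    # Class d: walk backward along the chain that ends at a class c vertex
    # with in-degree 1 (condition d1), then from d to d (condition d2).
    for sink in [x for x in nodes if label.get(x) == "c" and len(parents[x]) == 1]:
        v = parents[sink][0]
        while v not in label and is_link(v):
            label[v] = "d"
            v = parents[v][0]

    return label
```

For the edge list s→p, p→h, h→k, k→h, h→m, m→t, this sketch labels s as class a, p as class b, m as class d and t as class c, and leaves h and k as non-boundary points.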
2. The web page importance ranking method using pruning-based parallel graph traversal computation according to claim 1, characterized in that the calculation method of step 12) is the PageRank algorithm or a single-source shortest path algorithm.
3. The web page importance ranking method using pruning-based parallel graph traversal computation according to claim 2, characterized in that the single-source shortest path algorithm is Dijkstra's algorithm, the Bellman-Ford algorithm, or the SPFA algorithm.
4. The web page importance ranking method using pruning-based parallel graph traversal computation according to claim 2, characterized in that the PageRank method is specifically:
Let the PageRank value of page x be P(x), the number of hyperlinks contained in page x be N(x), and the set of all pages that point to page x be B(x); the PageRank value of any page i is calculated by Formula 1:
$P(i) = C_1 \sum_{j \in B(i)} \frac{P(j)}{N(j)} + C_2$ (Formula 1)
In Formula 1, i is any page; P(i) is the PageRank value of page i; C1 and C2 are constants; B(i) is the set of all pages that point to page i; j is any page in the set B(i); P(j) is the PageRank value of page j; and N(j) is the number of hyperlinks contained in page j.
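As a hedged illustration, one sweep of Formula 1 over all pages might look like the following Python sketch. The names `pagerank_step`, `in_links` and `out_count` are hypothetical, and the defaults C1 = 0.85 and C2 = 0.15 are assumed values chosen only for the example; the claim leaves both constants open. The sketch reads B(i) as the pages that link to page i.

```python
def pagerank_step(in_links, out_count, rank, c1=0.85, c2=0.15):
    """Apply Formula 1 once to every page, reading only the previous values.

    in_links[i]  -- pages j that link to page i, i.e. B(i) (assumed layout)
    out_count[j] -- number of hyperlinks on page j, i.e. N(j)
    rank[j]      -- current value P(j)
    """
    return {
        i: c1 * sum(rank[j] / out_count[j] for j in in_links.get(i, ())) + c2
        for i in rank
    }


# Tiny example: x and y link to each other, and z links only to x.
in_links = {"x": ["y", "z"], "y": ["x"]}
out_count = {"x": 1, "y": 1, "z": 1}
rank = pagerank_step(in_links, out_count, {"x": 1.0, "y": 1.0, "z": 1.0})
```

Page z has no incoming links, so its sum is empty and it receives exactly C2.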
5. The web page importance ranking method using pruning-based parallel graph traversal computation according to claim 2, characterized in that the iterative convergence condition of the PageRank method is that the PageRank values of all pages in the graph no longer change, expressed as Formula 2:
$\sum_{i \in V(G)} \left| P_{new}(i) - P_{old}(i) \right| < \varepsilon$ (Formula 2)
In Formula 2, V(G) is the set of all pages in graph G; i is any page in V(G); P_new(i) is the PageRank value of page i after the current iteration; P_old(i) is the PageRank value of page i after the previous iteration; and ε is an infinitesimally small quantity in the mathematical sense.
6. The web page importance ranking method using pruning-based parallel graph traversal computation according to claim 5, characterized in that ε takes the value 0.0000001.
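Taken together, claims 1, 4, 5 and 6 suggest a computation along the following lines. The Python sketch below is an illustration under stated assumptions rather than the patented implementation: `pruned_pagerank`, its `labels` argument (a mapping from each boundary vertex to 'a', 'b', 'c' or 'd', however obtained), the helper `settle`, the synchronous update order, the iteration cap `max_iter`, and the constants C1 = 0.85 and C2 = 0.15 are all hypothetical choices; only ε = 0.0000001 comes from claim 6. Instead of physically deleting edges as in step 14), the sketch folds the fixed class a/b contributions into a per-vertex constant.

```python
from collections import defaultdict


def pruned_pagerank(edges, labels, c1=0.85, c2=0.15, eps=1e-7, max_iter=1000):
    """Three-stage sketch: fix a/b once, iterate non-boundary, finish c/d."""
    parents, outdeg, nodes = defaultdict(list), defaultdict(int), set()
    for u, v in edges:
        parents[v].append(u)
        outdeg[u] += 1
        nodes.update((u, v))

    def settle(group, rank):
        # Apply Formula 1 to each vertex as soon as all its parents have a value.
        pending = list(group)
        while pending:
            ready = [v for v in pending if all(p in rank for p in parents[v])]
            if not ready:
                raise ValueError("labels are inconsistent with the graph")
            for v in ready:
                rank[v] = c1 * sum(rank[p] / outdeg[p] for p in parents[v]) + c2
            pending = [v for v in pending if v not in rank]

    rank = {}
    # Stage 1 (steps 12-14): class a/b values never change, so compute them once
    # and record their contribution to each non-boundary child.
    settle([v for v in nodes if labels.get(v) in ("a", "b")], rank)
    inner = [v for v in nodes if v not in labels]
    fixed_in = {v: sum(rank[p] / outdeg[p] for p in parents[v] if p in rank)
                for v in inner}
    live = {v: [p for p in parents[v] if p not in rank] for v in inner}

    # Stage 2 (steps 21-22): iterate over non-boundary vertices only,
    # stopping when Formula 2 holds. The starting value c2 is an assumption.
    cur = {v: c2 for v in inner}
    for _ in range(max_iter):
        new = {v: c1 * (fixed_in[v] + sum(cur[p] / outdeg[p] for p in live[v])) + c2
               for v in inner}
        delta = sum(abs(new[v] - cur[v]) for v in inner)
        cur = new
        if delta < eps:
            break
    rank.update(cur)

    # Stage 3 (step 3): class c/d vertices influence no one else, so one pass
    # over them after convergence is enough.
    settle([v for v in nodes if labels.get(v) in ("c", "d")], rank)
    return rank
```

The design choice worth noting is that boundary vertices are touched a constant number of times regardless of how many iterations stage 2 needs, which is where the saving described in claim 1 would come from.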
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610192758.9A CN105808779B (en) | 2016-03-30 | 2016-03-30 | Figure roaming parallel calculating method and application based on beta pruning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105808779A (en) | 2016-07-27 |
CN105808779B (en) | 2019-06-07 |
Family
ID=56459392
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610192758.9A Expired - Fee Related CN105808779B (en) | 2016-03-30 | 2016-03-30 | Figure roaming parallel calculating method and application based on beta pruning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105808779B (en) |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20190607 |