CN105069290B

CN105069290B - A parallelized key-node discovery method for consignment data

Info

Publication number: CN105069290B
Application number: CN201510469302.8A
Authority: CN
Inventors: 马云龙; 刘敏; 桂峰; 章锋; 袁菡; 孙源
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2015-08-03
Filing date: 2015-08-03
Publication date: 2017-12-26
Anticipated expiration: 2035-08-03
Also published as: CN105069290A

Abstract

The present invention relates to a parallelized key-node discovery method for consignment data, comprising: Step S1: obtaining the liveness of each node from the total number of times it sends and receives consignments within a set time window in the consignment data, and using the node liveness as the weight of the node itself; Step S2: obtaining the weight of the edge of each node pair from the interaction frequency and the shared-neighbour index of the node pair within the set time window, and defining the network formed by the consignment data as a directed doubly weighted network graph; Step S3: adding the node weights and the edge weights to the PageRank algorithm and mining the key nodes of the directed doubly weighted network graph in parallel. Compared with the prior art, the invention makes full use of the information in a logistics consignment network, reduces the loss of useful information and improves the accuracy of key-node discovery; because it runs in parallel, it also greatly improves the efficiency and stability of key-node mining.

Description

A parallelized key-node discovery method for consignment data

Technical field

The present invention relates to the technical field of social network analysis, and in particular to a parallelized key-node discovery method for consignment data.

Background

Since a British scholar put forward the concept of the "social network" in 1920, research on social networks has never stopped. With the rapid development of biological information technology, network technology, communication technology and social platforms, the individuals in a social network now form a huge and complex network. Complex networks are closely bound up with everyday life; familiar examples include the Internet and the World Wide Web in computing, communication networks, e-mail networks, microblog networks, logistics consignment networks, and protein-protein interaction networks in biomedicine. Key nodes are an important class of node found throughout social network structures, and research on key nodes in social networks has been a focus in recent years. Finding key nodes in social and physical networks and assessing their importance has great practical significance, for example finding the most active users of a group in a social network, locating key nodes in network attack and defence, or identifying key persons in a logistics network. Discovering the key nodes of a social network structure helps mine the information in the network more deeply, and has profound theoretical and practical significance for understanding the structure and function of social networks.

First, most existing research on key nodes in complex networks uses Google's PageRank algorithm or improves upon it. However, most key-node discovery algorithms consider only edge weights; few take the weight of the node itself into account, so much useful information is ignored when mining key persons in a network, which reduces the accuracy of key-node discovery. Second, besides defining node liveness as the weight of the node itself, we compute edge weights from two factors: the number of neighbours shared by the two nodes an edge connects, and the interaction frequency between the node pair, which makes full use of the information in the network. Finally, with the rapid development of computer and Internet technology, the ability to acquire data keeps growing, and the networks studied have grown from tens or hundreds of nodes to millions. Since the MapReduce programming framework is suited to processing large-scale data, the present invention implements parallelized key-node discovery for large-scale consignment data on the MapReduce framework.

Summary of the invention

The object of the present invention is to overcome the above drawbacks of the prior art by providing a parallelized key-node discovery method for consignment data. Based on a real logistics network, it incorporates node liveness, node interaction frequency and the number of neighbours shared by a node pair into the weight calculation, making full use of the information in the logistics consignment network, reducing the loss of useful information and improving the accuracy of key-node discovery. It also improves on Google's mature PageRank algorithm and parallelizes it on the MapReduce programming framework, greatly improving the efficiency and stability of key-node mining.

The object of the present invention can be achieved through the following technical solution:

A parallelized key-node discovery method for consignment data, comprising:

Step S1: obtain the liveness of each node from its total number of sends and receives within a set time window in the consignment data, and use the node liveness as the weight of the node itself;

Step S2: obtain the weight of the edge of each node pair from the interaction frequency and the shared-neighbour index of the node pair within the set time window, and define the network formed by the consignment data as a directed doubly weighted network graph;

Step S3: add the node weights and the edge weights to the PageRank algorithm, and mine the key nodes of the directed doubly weighted network graph in parallel.

The node liveness satisfies the following equation:

$$a_i = \frac{M_i}{\mathrm{Max\_num}} \tag{1}$$

where $a_i$ is the liveness of node i, $M_i$ is the total number of sends and receives of node i within the set time window, and Max_num is the maximum of all $M_i$.
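As a minimal sketch of Eq. (1), assuming the consignment records of the time window are available as (sender, receiver) pairs (an assumed input format, not one fixed by the method), the liveness of every node can be computed as below. A shipment counts once for the sender and once for the addressee, so M_i is the node's total number of sends and receives.

```python
from collections import Counter


def node_liveness(consignments):
    """Eq. (1): a_i = M_i / Max_num for every node seen in the window.

    `consignments` is an iterable of (sender, receiver) pairs already
    restricted to the set time window (assumed input format).
    """
    totals = Counter()
    for sender, receiver in consignments:
        totals[sender] += 1    # one send counted for the sender
        totals[receiver] += 1  # one receive counted for the addressee
    max_num = max(totals.values())  # Max_num; assumes at least one record
    return {node: m / max_num for node, m in totals.items()}


records = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A")]
print(node_liveness(records))
```

Here node A takes part in three consignments, the most of any node, so its liveness is 1.0 and every other node is scaled relative to it.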

The edge weight satisfies the following equation:

$$w_{ji} = a \times \mathit{freq}_{ij} + (1 - a)\, \mathrm{Neighbor}(i, j) \tag{2}$$

where $w_{ji}$ is the weight of the edge between node i and node j, $\mathit{freq}_{ij}$ is the interaction frequency between node i and node j, $\mathrm{Neighbor}(i, j)$ is the shared-neighbour index of node i and node j, and $a$ is an adjustment factor.
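A small sketch of Eq. (2), assuming freq_ij and Neighbor(i, j) have already been normalised to [0, 1] by the equations that follow. The function name `edge_weight` and the range check on the adjustment factor a are illustrative additions, not part of the method.

```python
def edge_weight(freq_ij, neighbor_ij, a):
    """Eq. (2): w_ji = a * freq_ij + (1 - a) * Neighbor(i, j)."""
    # Illustrative guard: a is treated as a mixing weight between 0 and 1.
    if not 0.0 <= a <= 1.0:
        raise ValueError("adjustment factor a must lie in [0, 1]")
    return a * freq_ij + (1.0 - a) * neighbor_ij


# Same pair, different emphasis: a = 0.7 favours interaction frequency,
# a = 0.3 favours shared neighbours.
print(edge_weight(0.5, 1.0, 0.7))
print(edge_weight(0.5, 1.0, 0.3))
```

With a close to 1 the edge weight is driven mainly by how often the two nodes interact; with a close to 0 it is driven mainly by how many neighbours they share.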

The interaction frequency satisfies the following equation:

$$\mathit{freq}_{ij} = \frac{n_{ij}}{\mathrm{Max\_num}} \tag{3}$$

where $\mathit{freq}_{ij}$ is the interaction frequency between node i and node j, $n_{ij}$ is the number of occurrences of the edge formed by node i and node j, and Max_num is the maximum of all $n_{ij}$.
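A minimal sketch of Eq. (3). It assumes, as in the embodiment, that n_ij counts consignments between i and j in either direction, so that freq_ij equals freq_ji; records are again assumed to be (sender, receiver) pairs.

```python
from collections import Counter


def interaction_frequency(consignments):
    """Eq. (3): freq_ij = n_ij / Max_num, keyed by ordered pair (i, j)."""
    counts = Counter()
    for sender, receiver in consignments:
        # Count the edge in both orientations, as the embodiment's mapper does.
        counts[(sender, receiver)] += 1
        counts[(receiver, sender)] += 1
    max_num = max(counts.values())  # assumes at least one record
    return {pair: n / max_num for pair, n in counts.items()}


print(interaction_frequency([("A", "B"), ("B", "A"), ("A", "C")]))
```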

The shared-neighbour index satisfies the following equation:

$$\mathrm{Neighbor}(i, j) = \frac{\mathrm{Neighbor\_shared\_num}(i, j)}{\mathrm{Max\_SharedNum}} \tag{4}$$

where Neighbor(i, j) is the shared-neighbour index of node i and node j, Neighbor_shared_num(i, j) is the number of neighbours shared by node i and node j, and Max_SharedNum is the maximum of all Neighbor_shared_num(i, j).
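The sketch below implements Eq. (4) under one reading of the Fig. 3 definition (shared sending neighbours plus shared receiving neighbours). It interprets those as the receivers both nodes shipped to plus the senders that shipped to both; this interpretation is an assumption. Only node pairs joined by at least one consignment are scored, and Max_SharedNum is taken over those pairs.

```python
from collections import defaultdict


def shared_neighbor_index(consignments):
    """Eq. (4): Neighbor(i, j) = Neighbor_shared_num(i, j) / Max_SharedNum."""
    out_nbrs = defaultdict(set)  # receivers each node shipped to
    in_nbrs = defaultdict(set)   # senders that shipped to each node
    pairs = set()
    for s, r in consignments:
        out_nbrs[s].add(r)
        in_nbrs[r].add(s)
        pairs.add((s, r))
        pairs.add((r, s))
    # Assumed reading of Fig. 3: common receivers + common senders.
    shared = {
        (i, j): len(out_nbrs[i] & out_nbrs[j]) + len(in_nbrs[i] & in_nbrs[j])
        for i, j in pairs
    }
    max_shared = max(shared.values())
    if max_shared == 0:  # no pair shares any neighbour
        return {pair: 0.0 for pair in shared}
    return {pair: n / max_shared for pair, n in shared.items()}


data = [("A", "C"), ("B", "C"), ("A", "B"), ("D", "A"), ("D", "B")]
print(shared_neighbor_index(data)[("A", "B")])
```

In this toy data A and B both ship to C and both receive from D, so their shared-neighbour count (2) is the largest and their index is 1.0.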

Step S3 specifically comprises:

301: obtain the PageRank value of each node according to the following equation:

$$PR(p_i) = \frac{a_i}{N} + (1 - a_i) \times \sum_{p_j \in M(p_i)} \frac{PR(p_j)\, w_{ji}}{L(p_j)} \tag{5}$$

where $PR(p_i)$ is the PageRank value of node i; $p_j \in M(p_i)$, where $M(p_i)$ is the set of nodes pointing to node i; $L(p_j)$ is the out-degree of node $p_j$, which points to node i; N is the total number of nodes in the consignment data; $a_i$ is the liveness of node i; and $w_{ji}$ is the weight of the edge between node i and node j;

302: for each node, compare the PageRank values obtained in the previous and the current round; if the absolute difference for any node exceeds a given threshold ε, return to step 301 to compute the next round of PageRank values; otherwise, proceed to step 303;

303: sort the nodes by the PageRank values finally obtained in step 302; the top k nodes are the mined key nodes, where k is the number of key nodes.
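The following serial sketch runs steps 301 to 303 on a single machine, which may be useful as a reference when checking a parallel run on a small graph. Assumptions not fixed by the method: the edge weights w_ji are given as a dict keyed by directed edge (j, i), L(p_j) is the number of distinct successors of j, every node starts at 1/N, and a `max_iter` cap guards against non-convergence.

```python
from collections import defaultdict


def weighted_pagerank(edges, liveness, eps=1e-8, max_iter=200):
    """Steps 301-302: iterate formula (5) until every node changes by <= eps.

    edges:    {(j, i): w_ji} for each directed edge j -> i
    liveness: {node: a_i} for every node, from Eq. (1)
    """
    nodes = list(liveness)
    n = len(nodes)
    out_degree = defaultdict(int)  # L(p_j): distinct successors of j
    in_edges = defaultdict(list)   # M(p_i) with weights: [(j, w_ji), ...]
    for (j, i), w in edges.items():
        out_degree[j] += 1
        in_edges[i].append((j, w))
    pr = {v: 1.0 / n for v in nodes}  # uniform start (an assumption)
    for _ in range(max_iter):
        new = {}
        for i in nodes:
            inflow = sum(pr[j] * w / out_degree[j] for j, w in in_edges[i])
            new[i] = liveness[i] / n + (1.0 - liveness[i]) * inflow
        converged = all(abs(new[v] - pr[v]) <= eps for v in nodes)
        pr = new
        if converged:  # step 302: no node moved by more than eps
            break
    return pr


def top_k(pr, k):
    """Step 303: the k nodes with the largest PageRank values."""
    return sorted(pr, key=pr.get, reverse=True)[:k]


edges = {("A", "B"): 0.5, ("A", "C"): 0.2, ("B", "C"): 1.0, ("C", "A"): 0.8}
liveness = {"A": 0.5, "B": 0.3, "C": 0.2}
print(top_k(weighted_pagerank(edges, liveness), 2))
```

Note that under formula (5) a node with liveness 1 always receives exactly 1/N, because its in-links are then multiplied by zero.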

The data of every step of the parallelized key-node discovery method are processed in parallel on the MapReduce programming framework.

Compared with the prior art, the present invention has the following advantages:

1) Among existing key-node discovery algorithms for complex networks, few consider both the weight of the node itself and edge weights influenced by the interaction frequency and the number of shared neighbours between nodes. The method of the invention also takes into account the liveness of the node itself, using node liveness as the node's own weight; when weighting edges, it introduces two factors that determine the edge weight, namely the interaction frequency between the nodes and the number of neighbours they share. It thus makes full use of the information in the network, improves the accuracy of the algorithm, and is suited to key-node discovery in large-scale social networks.

2) A consignment network is built from consignment data, and PageRank is applied to mine key nodes in the network formed by logistics consignment data, which suits accurate and fast mining of key nodes from massive consignment data.

3) The improved PageRank is parallelized on the MapReduce programming framework, greatly improving the scalability, mining efficiency and stability of the algorithm.

Brief description of the drawings

Fig. 1 is the overall flow chart of the parallelization scheme of the present invention;

Fig. 2 illustrates how MapReduce processes data;

Fig. 3 is a schematic diagram of the definition of the shared-neighbour count.

Embodiment

The present invention is described in detail below with reference to the drawings and a specific embodiment. The embodiment is implemented on the premise of the technical solution of the invention and gives a detailed implementation and a specific operating process, but the scope of protection of the invention is not limited to the following embodiment.

As shown in Fig. 2, MapReduce divides a massive data set into groups whose processing is distributed to the worker nodes under the master node; the results computed by the workers are finally merged into the final result. MapReduce abstracts the whole data-processing procedure into two functions, map and reduce: map decomposes the task into many subtasks, and reduce collects the results of those subtasks. Under the MapReduce framework a data set can be split into many small data sets that are processed in parallel.

As shown in Fig. 1, the parallelized key-node discovery method for consignment data based on the MapReduce framework and the PageRank algorithm comprises:

Step S1: obtain the liveness of each node from its total number of sends and receives within a set time window in the consignment data, and use the node liveness as the weight of the node itself. Specifically:

Under the MapReduce framework, the raw data set to be mined is randomly divided into multiple data blocks (splits), and the computer nodes of the MapReduce cluster start multiple Mappers, each of which processes its own data block: it reads the node records and, through the map() function, converts them into <key, value> output, where key is the current node and value is a node that interacts with it. For example, for a consignment A → B, A is the sender and B the addressee. Although the input is directed, for A → B the map emits both A-B and B-A, so the output is undirected. Finally, the output of every map function is passed to the reduce() function of the Reducer stage, which aggregates the results, counts the total number of parcels each node has sent and received (its total send/receive count), and writes them to a file in node:count format.
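A minimal sketch of this first job's mapper and reducer, with the shuffle simulated by grouping in memory. It assumes each raw record is a text line "sender,receiver" (a hypothetical format) and writes results in the node:count format described above; the function names are illustrative.

```python
from collections import defaultdict


def map_send_receive(record):
    """Mapper: 'A,B' (A shipped to B) -> <A, B> and <B, A>; output is undirected."""
    sender, receiver = record.split(",")
    yield sender, receiver
    yield receiver, sender


def reduce_send_receive(node, partners):
    """Reducer: one partner per send or receive, so len(partners) is M_i."""
    yield f"{node}:{len(partners)}"


def run_send_receive_job(records):
    groups = defaultdict(list)  # stands in for the shuffle between the stages
    for record in records:
        for key, value in map_send_receive(record):
            groups[key].append(value)
    return [line for node in sorted(groups)
            for line in reduce_send_receive(node, groups[node])]


print(run_send_receive_job(["A,B", "A,C", "D,A"]))
# ['A:3', 'B:1', 'C:1', 'D:1']
```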

The node liveness is computed from the total send/receive count according to:

$$a_i = \frac{M_i}{\mathrm{Max\_num}} \tag{1}$$

Step S2: obtain the weight of the edge of each node pair from the interaction frequency and the shared-neighbour index of the node pair within the set time window, and define the network formed by the consignment data as a directed doubly weighted network graph. Specifically:

201: compute the interaction frequency of each node pair within the set time window:

Preprocessing is the same as above: the raw data set is randomly divided into several blocks, the computer nodes of the MapReduce cluster start multiple Mappers, and each Mapper processes its own block, reads the node and edge records, and converts them into <key, value> output, where key is a node pair and value is 1. For example, for one consignment A → B the map outputs <A-B, 1> and <B-A, 1>. The map outputs are then sent to the Reducers for aggregation, which count how many times the edge of each node pair occurs; for each consignment relation, records of the forms <node1-node2:count> and <node2-node1:count> are finally written to a file.
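The node-pair counting job can be sketched the same way. The "sender,receiver" input line format and the function names are again assumptions, and node names are assumed not to contain "-" or ":", since those characters delimit the output records.

```python
from collections import defaultdict


def map_pairs(record):
    """Mapper: 'A,B' -> <A-B, 1> and <B-A, 1>."""
    sender, receiver = record.split(",")
    yield f"{sender}-{receiver}", 1
    yield f"{receiver}-{sender}", 1


def reduce_pairs(pair, ones):
    """Reducer: n_ij, the number of times the pair occurred."""
    yield f"{pair}:{sum(ones)}"


def run_pair_count_job(records):
    groups = defaultdict(list)  # simulated shuffle
    for record in records:
        for key, value in map_pairs(record):
            groups[key].append(value)
    return [line for pair in sorted(groups)
            for line in reduce_pairs(pair, groups[pair])]


print(run_pair_count_job(["A,B", "B,A", "A,C"]))
# ['A-B:2', 'A-C:1', 'B-A:2', 'C-A:1']
```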

The interaction frequency then satisfies:

$$\mathit{freq}_{ij} = \frac{n_{ij}}{\mathrm{Max\_num}} \tag{2}$$

202: compute the shared-neighbour index of each node pair within the set time window:

Preprocessing is again the same. The raw data set is processed by the Mappers into <key, value> output, where key is a node pair and value is a neighbour shared by the two nodes of the pair. The results are then aggregated at the Reducers, which count the shared neighbours of each node pair, i.e. of each edge. Two values are kept for every edge; for A → B, for example, the final output is <A-B:count> and <B-A:count>.

As shown in Fig. 3, the shared-neighbour count of two interacting nodes A and B equals the number of shared sending neighbours plus the number of shared receiving neighbours. The more neighbours two nodes share, the more likely they belong to the same circle and the closer their relationship. The shared-neighbour index satisfies:

$$\mathrm{Neighbor}(i, j) = \frac{\mathrm{Neighbor\_shared\_num}(i, j)}{\mathrm{Max\_SharedNum}} \tag{3}$$

where Neighbor(i, j) is the shared-neighbour index of node i and node j, Neighbor_shared_num(i, j) is the number of neighbours shared by node i and node j, and Max_SharedNum is the maximum of all Neighbor_shared_num(i, j).

203: compute the weight of the edge of each node pair; the weight of the edge between two nodes is:

$$w_{ji} = a \times \mathit{freq}_{ij} + (1 - a)\, \mathrm{Neighbor}(i, j) \tag{4}$$

Step S3: add the node weights and the edge weights to the PageRank algorithm, and mine the key nodes of the directed doubly weighted network graph in parallel. Specifically:

301: from the node liveness and edge weights computed above, obtain the PageRank value of each node with an improved version of Google's web-page ranking algorithm, PageRank. The PageRank formula is:

$$PR(p_i) = \frac{a_i}{N} + (1 - a_i) \times \sum_{p_j \in M(p_i)} \frac{PR(p_j)\, w_{ji}}{L(p_j)} \tag{5}$$

302: after the PageRank values of all nodes have been computed, compare the values from the previous round with the current ones. If the absolute difference for any node exceeds the given threshold ε, repeat step 301 to compute the next round of PageRank values; if the absolute differences between the two rounds are all below ε, proceed to step 303;

303: sort the nodes by the PageRank values finally obtained in step 302; the top k nodes are the k most important key nodes mined, where k is the number of key nodes.

The implementation under the MapReduce framework is illustrated below with an actual program:

1) The consignment data to be mined are divided into multiple data blocks that are processed separately. One MapReduce job outputs <key, value> records, where key is a node i (a person) in the network and value is the number of consignments node i takes part in, as either sender or addressee. Specifically:

11) The consignment data to be mined are divided into data blocks, and each block is handed to a Mapper.

12) Each computing node in the cluster processes its own data block, executing one MapReduce job.

Mapper stage:

Input: the raw consignment data to be analysed;

Output: <node_i, node_j>, where node_i and node_j are the sender and addressee taking part in one consignment. Since either node_i or node_j may be the addressee or the sender, for each such node pair the Mapper emits both <node_i, node_j> and <node_j, node_i>.

Reducer stage:

Input: <node_i, node_j>;

Output: <node_i, count>, where key is node node_i and value count is the number of send/receive relations of node_i; the result is written to a file A1 on HDFS (Hadoop Distributed File System).

2) The consignment data to be mined are divided into multiple data blocks that are processed separately. One MapReduce job outputs <key, value> records, where key is a node pair between which a consignment occurs in the logistics network and value is an integer giving the number of times the pair occurs. Specifically:

21) The consignment data to be mined are divided into data blocks, and each block is handed to a Mapper.

22) Each computing node in the cluster processes its own data block, executing one MapReduce job.

Mapper stage:

Input: the raw consignment data to be analysed;

Output: <(node_i, node_j), 1>, where node_i and node_j are the sender and addressee taking part in one consignment; the record means that node node_i shipped one consignment to node node_j.

Reducer stage:

Input: <(node_i, node_j), 1>;

Output: <(node_i, node_j), count>, where key is the node pair (node_i, node_j) and value is the number of times the pair occurs; the result is written to a file A2 on HDFS.

3) As in the previous step, the input is the raw consignment data set, and this step computes the shared-neighbour count of each node pair. The key is a node pair, representing the sender and addressee of a consignment, and the shared-neighbour count of the pair is computed according to the definition of the shared neighbours of a node pair. Specifically:

31) The consignment data to be mined are divided into data blocks, and each block is handed to a Mapper.

32) Each computing node in the cluster processes its own data block, executing two MapReduce jobs.

Mapper1 stage:

Input: the raw consignment data to be analysed;

Output: <node_i, node_j>, where node_i and node_j are the sender and addressee taking part in one consignment; the record means that node node_i shipped one consignment to node node_j.

Reducer1 stage:

Input: <node_i, node_j>;

Output: <(node_i, node_j), adjacent nodes of node_i>, where key is the node pair (node_i, node_j) and value is the set of nodes adjacent to this node; the result is the adjacency list of the graph.

Mapper2 stage:

Input: <(node_i, node_j), adjacent nodes of node_i>; that is, the input of Mapper2 is the output of Reducer1;

Output: <(node_a, node_b), node_i>, where the key (node_a, node_b) is a pair formed by any two of node_i's neighbours, ordered so that the first node of the pair has the smaller sequence number. For example, for the inputs <A, (B, C, D)> and <B, (C, D)>, the output of Mapper2 is <(B, C), A>, <(B, D), A>, <(C, D), A> and <(C, D), B>.

Reducer2 stage:

Input: <(node_a, node_b), node_i>;

Output: <(node_a, node_b), common adjacent nodes of node_a, node_b>. With the inputs assumed in the previous step, the output of the Reducer2 stage is <(B, C), A>, <(B, D), A> and <(C, D), A, B>. The result is written to file A3.
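The two chained jobs can be sketched in memory as follows. `build_adjacency` plays the role of Mapper1/Reducer1 and builds an undirected neighbour list for every node; `common_neighbors` plays Mapper2/Reducer2. The function names and the "sender,receiver" input format are hypothetical. Applied to the example inputs <A, (B, C, D)> and <B, (C, D)>, the second stage reproduces the Reducer2 output listed above.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(records):
    """Mapper1/Reducer1: undirected neighbour list of every node."""
    neighbours = defaultdict(set)
    for record in records:
        sender, receiver = record.split(",")
        neighbours[sender].add(receiver)  # Mapper1 emits <sender, receiver> ...
        neighbours[receiver].add(sender)  # ... and <receiver, sender>
    return {node: sorted(nbrs) for node, nbrs in neighbours.items()}


def common_neighbors(adjacency):
    """Mapper2/Reducer2: for each pair, the nodes adjacent to both."""
    groups = defaultdict(list)
    for node, nbrs in adjacency.items():
        # Mapper2: every pair of the node's neighbours, smaller name first.
        for a, b in combinations(sorted(nbrs), 2):
            groups[(a, b)].append(node)
    # Reducer2: the grouped values are the pair's common neighbours.
    return {pair: sorted(nodes) for pair, nodes in groups.items()}


example = common_neighbors({"A": ["B", "C", "D"], "B": ["C", "D"]})
print(example)
# {('B', 'C'): ['A'], ('B', 'D'): ['A'], ('C', 'D'): ['A', 'B']}
print({pair: len(nodes) for pair, nodes in example.items()})  # shared counts
```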

4) Compute the PageRank value of each node. Before the Mapper and Reducer stages run, a separate stand-alone routine reads the data obtained above; in this embodiment the data in files A1, A2 and A3 are read in the setup function. The liveness of each node is then obtained from the definition of node liveness and formula (1), and stored in a HashMap named hashmap1, with the node as a string key and its liveness as the value. Next, the edge weights are computed from the definition of the interaction frequency between nodes and formula (2), the definition of the shared-neighbour index and formula (3), and the edge-weight formula (4). Here a is the adjustment factor balancing the two factors that influence the edge weight, and controls how much each of them contributes. With the edge as a string key and its weight w_ji as the value, the weights are stored in a HashMap named hashmap2. The PageRank value of each node is then computed from the definition of the improved doubly weighted key-node discovery algorithm based on PageRank and formula (5). Each MapReduce job computes one round of PageRank values, and the result of each round is the input of the next job; iteration continues until the absolute difference between the PageRank values of every node in two consecutive jobs is below the given threshold ε, at which point it stops and yields the result. Specifically:

41) Compute the node liveness and the edge weights in the setup function

The data of files A1, A2 and A3 are read in the setup function, the liveness of each node is computed from the definition and formula of node liveness, and the results are stored in hashmap1. The weight of the edge between each pair of nodes is then computed from the definitions and formulas of the interaction frequency and of the shared-neighbour count between nodes, and the results are stored in hashmap2.

42) Each computing node in the cluster processes its own data block, executing one MapReduce job. This job mainly preprocesses the data, as follows:

Mapper stage:

Input: consignment data to be mined in <node_i, node_j> form; each original record means that node node_i shipped one parcel to node node_j, a directed relation node_i → node_j. The job finally produces an adjacency list.

Output: <node_j, node_i>; the direction of each original record is reversed, so that the record now reads node_j → node_i.

Reducer stage:

Input: <node_j, list of nodes pointing to node_j>. After the Mapper output is processed by the shuffle and combine functions, the key of the Reduce input is the addressee node_j and the value is the set of senders that shipped to node_j.

Output: <node_j, list of nodes pointing to node_j>. Reduce outputs the result directly; for example, the result <A, (B, C, D)> means that nodes B, C and D have each shipped a parcel to node A, where A is the addressee and B, C and D form the set of senders.
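A minimal in-memory sketch of this preprocessing job. The "sender,receiver" input format and the function names are hypothetical, and unlike the direct output described above, this reducer removes repeated senders, on the assumption that repeated consignments are already reflected in the edge weight rather than in the link list.

```python
from collections import defaultdict


def map_invert(record):
    """Mapper: 'i,j' (i shipped to j) -> <j, i>, reversing the direction."""
    sender, receiver = record.split(",")
    yield receiver, sender


def reduce_in_links(receiver, senders):
    """Reducer: the addressee with the (deduplicated) list of its senders."""
    yield receiver, sorted(set(senders))


def build_in_links(records):
    groups = defaultdict(list)  # simulated shuffle
    for record in records:
        for key, value in map_invert(record):
            groups[key].append(value)
    return {out_key: out_value
            for key, values in groups.items()
            for out_key, out_value in reduce_in_links(key, values)}


print(build_in_links(["B,A", "C,A", "D,A", "A,B", "B,A"]))
# {'A': ['B', 'C', 'D'], 'B': ['A']}
```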

43) At this point the liveness of each node, the weight of each edge and the adjacency-list form of the preprocessed raw data are all known. Each computing node in the cluster processes its own data block and executes one more MapReduce job, computing the PageRank value of each node with the improved PageRank-based key-node formula (5). The computation of the node PageRank values is given below in pseudocode.

Algorithm 1: Map(key, value)

Input:

Logistics network nodes

PR(p_i): the PageRank value of node p_i

w_ij: the weight of edge (i, j)

Links[p_1, p_2, p_3, ..., p_m]: all nodes p_j linked from node p_i

Output:

List of <key:value>

1. Emit(p_i, Links[p_1, p_2, p_3, ..., p_m])    // pass the link list on to the next round
2. For each p_j in Links[p_1, p_2, p_3, ..., p_m]
3.     partial(j) = PR(p_i) × w_ij / L(p_i)    // L(p_i) = m, the out-degree of p_i, as in formula (5)
4.     Emit(p_j, partial(j))
5. End For

Algorithm 2: Reduce(key, value)

Input:

Logistics network node p_j and its list of <p_j, partial(j)> values

Output:

PR(p_j): the new PageRank value of node p_j

1. // Initialize the sum of contributions to node p_j
2. sum = 0
3. For each partial(j) in the list    // the Links value is not summed
4.     sum += partial(j)
5. End For
6. PR(p_j) = a_j / N + (1 - a_j) × sum    // formula (5); N is the total number of nodes
7. Emit(p_j, <PR(p_j), Links of p_j>)    // keep the link list for the next iteration
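Finally, here is a single-process sketch of the iterative job in the MapReduce shape of Algorithms 1 and 2, followed by the convergence loop of step 44). Following formula (5), each contribution is divided by the out-degree of the sending node, and the reducer applies the receiving node's own liveness. Assumptions: the graph is a dict mapping every node (including nodes with no out-links) to its list of successors, weights are keyed by (source, target), and each node starts at 1/N.

```python
from collections import defaultdict


def pr_map(node, pr, out_links, weight):
    """Algorithm 1: re-emit the link list, then send each successor its share."""
    yield node, ("links", out_links)
    for succ in out_links:
        # Divide by the out-degree of the sending node, as in formula (5).
        yield succ, ("partial", pr * weight[(node, succ)] / len(out_links))


def pr_reduce(node, values, liveness, n):
    """Algorithm 2: sum the partials and apply formula (5) with the node's a_j."""
    links, total = [], 0.0
    for kind, payload in values:
        if kind == "links":
            links = payload
        else:
            total += payload
    return links, liveness[node] / n + (1.0 - liveness[node]) * total


def iterate_pagerank(graph, liveness, weight, eps=1e-8, max_rounds=200):
    """Step 44): chain rounds until no node's value moves by more than eps."""
    n = len(graph)
    state = {v: (links, 1.0 / n) for v, links in graph.items()}
    for _ in range(max_rounds):
        groups = defaultdict(list)  # stands in for the shuffle
        for v, (links, pr) in state.items():
            for key, value in pr_map(v, pr, links, weight):
                groups[key].append(value)
        new_state = {v: pr_reduce(v, groups[v], liveness, n) for v in state}
        done = all(abs(new_state[v][1] - state[v][1]) <= eps for v in state)
        state = new_state
        if done:
            break
    return {v: pr for v, (_, pr) in state.items()}


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
weight = {("A", "B"): 0.5, ("A", "C"): 0.2, ("B", "C"): 1.0, ("C", "A"): 0.8}
liveness = {"A": 0.5, "B": 0.3, "C": 0.2}
ranks = iterate_pagerank(graph, liveness, weight)
print(sorted(ranks, key=ranks.get, reverse=True)[:2])  # top k = 2
```

On a real cluster each round would be one MapReduce job reading the previous round's output, as step 44) describes; the in-memory `groups` dict only stands in for the shuffle.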

44) Once the PageRank value of each node has been computed for the first time, it is used as the node's initial PageRank value for the second MapReduce job, and a second iteration computes the values for the next round. In the same way, the result of each iteration serves as the initial PageRank values of the next, and the computation iterates until, for every node, the PageRank values from two consecutive iterations differ by no more than the given threshold ε; the values obtained at that point are the final PageRank values. The nodes are then sorted by PageRank value, and the top k nodes are the k most important key nodes.

Few existing studies of key nodes address key-node discovery in logistics consignment networks. Based on a real logistics network, the present invention incorporates node liveness, node interaction frequency and the number of neighbours shared by a node pair into the weight calculation, making full use of the information in the logistics consignment network, reducing the loss of useful information and improving the accuracy of key-node discovery. It also improves on Google's mature PageRank algorithm and parallelizes it on the MapReduce programming framework, greatly improving the efficiency and stability of key-node mining.

For finding key nodes in small networks formed from small data sets, traditional single-machine algorithms meet the requirements well and with adequate efficiency. For large networks formed from massive data, however, single-machine algorithms struggle, and there the advantage of the proposed method becomes apparent.

Claims

1. A parallelized key-node discovery method for consignment data, characterized in that it comprises:

Step S1: obtaining the liveness of each node from its total number of sends and receives within a set time window in the consignment data, and using the node liveness as the weight of the node itself;

Step S2: obtaining the weight of the edge of each node pair from the interaction frequency and the shared-neighbour index of the node pair within the set time window, and defining the network formed by the consignment data as a directed doubly weighted network graph;

wherein the edge weight satisfies the following equation:

$$w_{ji} = a \times \mathit{freq}_{ij} + (1 - a)\, \mathrm{Neighbor}(i, j) \tag{2}$$

where $w_{ji}$ is the weight of the edge between node i and node j, $\mathit{freq}_{ij}$ is the interaction frequency between node i and node j, $\mathrm{Neighbor}(i, j)$ is the shared-neighbour index of node i and node j, and $a$ is an adjustment factor;

Step S3: adding the node weights and the edge weights to the PageRank algorithm, and mining the key nodes of the directed doubly weighted network graph in parallel.

2. The parallelized key-node discovery method for consignment data according to claim 1, characterized in that the node liveness satisfies the following equation:

$$a_i = \frac{M_i}{\mathrm{Max\_num}} \tag{1}$$

where $a_i$ is the liveness of node i, $M_i$ is the total number of sends and receives of node i within the set time window, and Max_num is the maximum of all $M_i$.

3. The parallelized key-node discovery method for consignment data according to claim 1, characterized in that the interaction frequency satisfies the following equation:

$$\mathit{freq}_{ij} = \frac{n_{ij}}{\mathrm{Max\_num}} \tag{3}$$

where $\mathit{freq}_{ij}$ is the interaction frequency between node i and node j, $n_{ij}$ is the number of occurrences of the edge formed by node i and node j, and Max_num is the maximum of all $n_{ij}$.

4. The parallelized key-node discovery method for consignment data according to claim 1, characterized in that the shared-neighbour index satisfies the following equation:

$$\mathrm{Neighbor}(i, j) = \frac{\mathrm{Neighbor\_shared\_num}(i, j)}{\mathrm{Max\_SharedNum}} \tag{4}$$

5. The parallelized key-node discovery method for consignment data according to claim 1, characterized in that step S3 specifically comprises:

301: obtaining the PageRank value of each node according to the following equation:

$$PR(p_i) = \frac{a_i}{N} + (1 - a_i) \times \sum_{p_j \in M(p_i)} \frac{PR(p_j)\, w_{ji}}{L(p_j)} \tag{5}$$

where $PR(p_i)$ is the PageRank value of node i; $p_j \in M(p_i)$, where $M(p_i)$ is the set of nodes pointing to node i; $L(p_j)$ is the out-degree of node $p_j$, which points to node i; N is the total number of nodes in the consignment data; $a_i$ is the liveness of node i; and $w_{ji}$ is the weight of the edge between node i and node j;

302: for each node, comparing the PageRank values obtained in the previous and the current round; if the absolute difference for any node exceeds a given threshold ε, returning to step 301 to compute the next round of PageRank values; otherwise, proceeding to step 303;

303: sorting the nodes by the PageRank values finally obtained in step 302, the top k nodes being the mined key nodes, where k is the number of key nodes.

6. The parallelized key-node discovery method for consignment data according to claim 1, characterized in that the data of every step of the method are processed in parallel on the MapReduce programming framework.