CN105069290A - Parallelization critical node discovery method for postal delivery data - Google Patents

Parallelization critical node discovery method for postal delivery data

Info

Publication number
CN105069290A
CN105069290A (application CN201510469302.8A)
Authority
CN
China
Prior art keywords
node
key
parallelization
represent
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510469302.8A
Other languages
Chinese (zh)
Other versions
CN105069290B (en)
Inventor
马云龙
刘敏
桂峰
章锋
袁菡
孙源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN201510469302.8A priority Critical patent/CN105069290B/en
Publication of CN105069290A publication Critical patent/CN105069290A/en
Application granted granted Critical
Publication of CN105069290B publication Critical patent/CN105069290B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Telephonic Communication Services (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention relates to a parallelization critical node discovery method for postal delivery data. The method comprises the following steps: step S1: acquiring node activity according to the total number of sending and receiving times of each node within a set time in the postal delivery data, and taking the node activity as the node's own weight value; step S2: acquiring the weight values of the edges of each node pair according to the interaction frequency and shared neighbor number metric of each node pair within the set time in the postal delivery data, and defining the network formed by the postal delivery data as a directed double-weighted network graph; and step S3: adding the nodes' own weight values and the weight values of the edges of the node pairs on the basis of the PageRank algorithm, and mining the critical nodes in the directed double-weighted network graph in parallel. In contrast to the prior art, the parallelization critical node discovery method fully utilizes the information in the logistics postal delivery network, reduces the loss of useful information, improves the accuracy of critical node discovery in the network, and implements parallel operation at the same time, thereby greatly improving the efficiency and stability of critical node mining.

Description

A parallelized key node discovery method for postal delivery data
Technical field
The present invention relates to the technical field of social network analysis, and in particular to a parallelized key node discovery method for postal delivery data.
Background technology
Since the concept of the "social network" was put forward by a British scholar in the 1920s, research on social networks has never stopped. With today's rapid development of bioinformatics, network technology, communication technology and social platforms, the individuals in a social network form a huge and complex network. Complex networks are closely bound up with our daily lives; the complex networks we commonly encounter include the Internet, the World Wide Web, communication networks, mail networks and microblog networks in the computing field, logistics delivery relationship networks in the logistics field, and protein-protein interaction networks in biomedicine. Key nodes are an important kind of node that is ubiquitous in social network structures, and research on key nodes in social networks has been a hot topic in recent years. In social and physical networks, discovering key nodes and assessing their importance has great practical significance, for example finding the most active users in a public organization, identifying key nodes in network attack and defence, or determining key persons in a logistics network. Discovering the key nodes in a social network structure helps to mine the information in the social network at a deeper level; finding the key nodes in the community structure has profound theoretical and practical significance for understanding the structure and function of social networks.
First, most existing research on key nodes in complex networks uses the PageRank algorithm of Google or improves upon it. However, most key node discovery algorithms only consider the weights of edges, and few take the weights of the nodes themselves into account, so much useful information is ignored when mining key persons in a network, which affects the accuracy of key node discovery. Second, in this invention node activity is defined as the node's own weight, and the weight of an edge is computed from two factors: the number of shared neighbors of the two nodes joined by the edge, and the interaction frequency between the nodes, so that the information in the network is fully utilized. Finally, with the rapid development of computer and Internet technology, people's ability to obtain data keeps growing, and the scale of the networks studied has risen from the original tens to hundreds of nodes to millions of nodes. Considering that the MapReduce programming framework is suitable for processing large-scale data, the present invention is based on the MapReduce programming framework and realizes parallelized key node discovery for large-scale postal delivery data.
Summary of the invention
The object of the present invention is to overcome the defects of the above prior art and to provide a parallelized key node discovery method for postal delivery data. Based on a real logistics network, node activity, node interaction frequency and the number of shared neighbors of each node pair are all taken into account in the weight computation, which makes full use of the information in the logistics delivery network, reduces the loss of useful information and improves the accuracy of key node discovery in the network. Moreover, the relatively mature PageRank algorithm of Google is improved on the basis of the MapReduce programming framework, realizing a parallelized algorithm and greatly improving the efficiency and stability of key node mining.
The object of the present invention can be achieved through the following technical solutions:
A parallelized key node discovery method for postal delivery data, comprising:
Step S1: obtain the node activity of each node according to its total number of send and receive events within a set time window in the postal delivery data, and take the node activity as the node's own weight;
Step S2: obtain the weight of the edge of each node pair according to the interaction frequency and the shared-neighbor metric of each node pair within the set time window in the postal delivery data, and define the network formed by the postal delivery data as a directed double-weighted network graph;
Step S3: add the nodes' own weights and the weights of the edges of the node pairs on the basis of the PageRank algorithm, and mine the key nodes in the directed double-weighted network graph in parallel.
The node activity satisfies the following formula:
a_i = M_i / Max_num   (1)
where a_i denotes the node activity of node i, M_i denotes the total number of send and receive events of node i within the set time window, and Max_num denotes the maximum of all M_i.
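For illustration only (this sketch is not part of the patent's wording), formula (1) can be computed in a few lines of Python, assuming the delivery records are available in memory as (sender, receiver) pairs; all names below are illustrative:
from collections import Counter

def node_activity(records):
    # records: iterable of (sender, receiver) pairs observed within the set time window
    totals = Counter()
    for sender, receiver in records:
        totals[sender] += 1    # a send event counts toward M_i
        totals[receiver] += 1  # a receive event counts toward M_i as well
    max_num = max(totals.values())
    # formula (1): a_i = M_i / Max_num
    return {node: m / max_num for node, m in totals.items()}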
The weight of an edge satisfies the following formula:
w_ji = a × freq_ij + (1 - a) × Neighbor(i, j)   (2)
where w_ji denotes the weight of the edge between node i and node j, freq_ij denotes the interaction frequency between node i and node j, Neighbor(i, j) denotes the shared-neighbor metric between node i and node j, and a denotes an adjustment factor.
The interaction frequency satisfies the following formula:
freq_ij = n_ij / Max_num   (3)
where freq_ij denotes the interaction frequency between node i and node j, n_ij denotes the number of occurrences of the edge formed by node i and node j, and Max_num denotes the maximum of all n_ij.
The shared-neighbor metric satisfies the following formula:
Neighbor(i, j) = Neighbor_shared_num(i, j) / Max_SharedNum   (4)
where Neighbor(i, j) denotes the shared-neighbor metric between node i and node j, Neighbor_shared_num(i, j) denotes the number of shared neighbors between node i and node j, and Max_SharedNum denotes the maximum of all Neighbor_shared_num(i, j).
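A similar illustrative sketch of formulas (2)-(4), assuming the per-pair occurrence counts and shared-neighbor counts have already been computed; the default value of the adjustment factor a and all names are assumptions of this sketch:
def edge_weights(pair_counts, shared_counts, a=0.5):
    # pair_counts: {(i, j): n_ij}, occurrences of the edge between i and j in the time window
    # shared_counts: {(i, j): Neighbor_shared_num(i, j)}
    # a: adjustment factor balancing interaction frequency against shared neighbors
    max_num = max(pair_counts.values())
    max_shared = max(shared_counts.values(), default=0) or 1
    weights = {}
    for pair, n_ij in pair_counts.items():
        freq_ij = n_ij / max_num                                # formula (3)
        neighbor_ij = shared_counts.get(pair, 0) / max_shared   # formula (4)
        weights[pair] = a * freq_ij + (1 - a) * neighbor_ij     # formula (2)
    return weights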
Step S3 specifically comprises:
301: obtain the PageRank value of each node, which satisfies the following formula:
PR(p_i) = a_i / N + (1 - a_i) × Σ PR(p_j) × w_ji / L(p_j)   (5)
where PR(p_i) denotes the PageRank value of node i, p_j ∈ M(p_i), M(p_i) denotes the set of nodes pointing to node i, L(p_j) denotes the out-degree of the node p_j pointing to node i, N denotes the total number of nodes in the postal delivery data, a_i denotes the node activity of node i, and w_ji denotes the weight of the edge between node i and node j;
302: for each node, compare the PageRank values obtained in the previous and current rounds, and judge whether the absolute value of their difference is greater than a given threshold ε; if so, jump to step 301 and continue to obtain the PageRank value of each node in the next round; if not, perform step 303;
303: sort the PageRank values of the nodes finally obtained in step 302, and take the top-k nodes as the mined key nodes, where k is the number of key nodes.
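Purely as an illustration of steps 301-303, a compact sequential (non-parallel) sketch of the iteration; the in-link sets, node activities, edge weights, out-degrees, threshold eps and k are assumed inputs of the sketch:
def weighted_pagerank(in_links, activity, weight, out_degree, eps=1e-6, k=10):
    # in_links: {i: set of nodes j pointing to i}, i.e. M(p_i) in formula (5)
    # activity: {i: a_i}; weight: {(j, i): w_ji}; out_degree: {j: L(p_j)}
    nodes = list(in_links)
    n = len(nodes)
    pr = {i: 1.0 / n for i in nodes}                 # initial PageRank values (assumption)
    while True:
        new_pr = {}
        for i in nodes:
            s = sum(pr[j] * weight[(j, i)] / out_degree[j] for j in in_links[i])
            new_pr[i] = activity[i] / n + (1 - activity[i]) * s   # formula (5)
        converged = all(abs(new_pr[i] - pr[i]) <= eps for i in nodes)   # step 302
        pr = new_pr
        if converged:
            break
    return sorted(pr, key=pr.get, reverse=True)[:k]  # step 303: top-k key nodes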
In this parallelized key node discovery method, the data of every step are processed in parallel based on the MapReduce programming framework.
Compared with the prior art, the present invention has the following advantages:
1) In the existing key node discovery algorithms for complex networks, few researchers consider at the same time the node's own weight and edge weights that are influenced by the interaction frequency between nodes and the number of shared neighbors. The method of the invention takes the activity of the node itself into account, using node activity as the node's own weight, and when considering the weight of an edge it introduces the two factors that determine the edge weight, namely the interaction frequency between nodes and the number of shared neighbors of the node pair. This makes full use of the information in the network, improves the accuracy of the algorithm, and is suitable for key node discovery in large-scale social networks.
2) A delivery network is built from the postal delivery data and the PageRank algorithm is applied to the network formed by the logistics delivery data to mine key nodes, which is suitable for accurate and fast mining of key nodes in massive postal delivery data.
3) Parallelized computation of the improved PageRank is realized based on the MapReduce programming framework, which greatly improves the scalability, mining efficiency and stability of the algorithm.
Brief description of the drawings
Fig. 1 is the overall flow chart of the parallelization scheme of the present invention;
Fig. 2 is a diagram of the MapReduce data processing procedure;
Fig. 3 is a schematic diagram of the definition of the number of shared neighbors.
Embodiment
The present invention is described in detail below in conjunction with the drawings and a specific embodiment. The embodiment is implemented on the premise of the technical solution of the present invention; a detailed implementation and a concrete operating process are given, but the protection scope of the present invention is not limited to the following embodiment.
As shown in Figure 2, MapReduce processes massive data by dividing and grouping them into parts that are completed jointly by the worker nodes distributed under a master node, and finally integrates the computation results of the worker nodes to obtain the final result. MapReduce abstracts the whole data handling process into two parts, expressed as two functions, map and reduce. The work of map is to decompose a task into multiple sub-tasks, while reduce is responsible for gathering the results of the multiple sub-tasks. Under the MapReduce framework a data set can be decomposed into multiple small data sets that can be processed in parallel.
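The split/map/shuffle/reduce flow that Figure 2 describes can be imitated in a single process for didactic purposes; this toy Python skeleton is only an illustration and stands in for the Hadoop MapReduce runtime actually used:
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    # map phase: each input record yields zero or more (key, value) pairs
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            grouped[key].append(value)      # shuffle: group all values by key
    # reduce phase: merge each key's values into one result
    return {key: reduce_fn(key, values) for key, values in grouped.items()}
The step-specific jobs sketched further below plug illustrative mapper and reducer functions into this skeleton.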
As shown in Figure 1, a parallelized key node discovery method for postal delivery data based on the MapReduce framework and the PageRank algorithm comprises:
Step S1: obtain the node activity of each node according to its total number of send and receive events within the set time window in the postal delivery data, and take the node activity as the node's own weight. The details are as follows:
Based on the MapReduce framework, the raw data set to be mined is randomly split into multiple data blocks; the computer nodes in the MapReduce cluster start multiple Mappers, and each Mapper stage processes its corresponding data block: the map() handler reads the relevant node information and converts it into <key, value> output, where the key is the current node and the value is an adjacent node that has an interaction relation with the current node. For example, a delivery behavior A → B, where A is the sender and B is the receiver, is directed; although the input A → B is directed, the map stage outputs both A-B and B-A, which is undirected. Finally, the output of each map function is transferred to the reduce() handler of the Reducer stage, where the results are gathered and the total number of parcels sent and received by each node, i.e. its total send-and-receive count, is counted and saved to a file in the node:count data format.
The node activity is computed from the total send-and-receive count and satisfies the following formula:
a_i = M_i / Max_num   (1)
where a_i denotes the node activity of node i, M_i denotes the total number of send and receive events of node i within the set time window, and Max_num denotes the maximum of all M_i.
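Using the toy skeleton above, the step S1 job might look as follows (illustrative only; in the invention the job runs on a Hadoop cluster and writes node:count records to a file):
def s1_map(record):
    # one delivery behavior A -> B is emitted once for each end point
    sender, receiver = record
    yield sender, receiver
    yield receiver, sender

def s1_reduce(node, partners):
    # every emitted value corresponds to one send or receive event, so the
    # length of the grouped list is the total send-and-receive count M_i
    return len(partners)

# totals = run_mapreduce(records, s1_map, s1_reduce)
# activity = {node: m / max(totals.values()) for node, m in totals.items()}   # formula (1)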
Step S2: obtain the weight of the edge of each node pair according to the interaction frequency and the shared-neighbor metric of each node pair within the set time window in the postal delivery data, and define the network formed by the postal delivery data as a directed double-weighted network graph. The details are as follows:
201: compute the interaction frequency of each node pair within the set time window:
The data preprocessing is the same as above: the raw data set is randomly divided into several blocks, the computer nodes of the MapReduce cluster start multiple Mappers, and each Mapper stage processes its corresponding data block, reads the relevant information of nodes and edges, and converts it into <key, value> output, where the key is a node pair and the value is 1; for example, a single delivery behavior A → B is output by map as <A-B, 1> and <B-A, 1>. Then the output of each map is sent to the Reducer end and gathered, the total number of occurrences of the edge formed by each node pair is counted, and finally for each delivery behavior both the <node1-node2:count> and <node2-node1:count> forms are written to a file and saved.
The interaction frequency then satisfies the following formula:
freq_ij = n_ij / Max_num   (2)
where freq_ij denotes the interaction frequency between node i and node j, n_ij denotes the number of occurrences of the edge formed by node i and node j, and Max_num denotes the maximum of all n_ij.
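An analogous sketch of the step 201 job and formula (2); the mapper emits the node pair in both orders with value 1 and the reducer sums the occurrences (names are illustrative):
def pair_map(record):
    sender, receiver = record
    yield (sender, receiver), 1
    yield (receiver, sender), 1

def pair_reduce(pair, ones):
    return sum(ones)    # n_ij: number of occurrences of the edge formed by i and j

# pair_counts = run_mapreduce(records, pair_map, pair_reduce)
# freq = {p: n / max(pair_counts.values()) for p, n in pair_counts.items()}   # formula (2)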
202: compute the shared-neighbor metric of each node pair within the set time window:
The data preprocessing is the same as above. The raw data set is processed: through the Mapper stage, output in <key, value> form is obtained, where the key is a node pair and the value is one of the common adjacent nodes of the two nodes of the pair. Finally, the results are gathered at the Reducer end and the number of shared neighbors of each node pair, i.e. of each edge, is counted; two values are saved for each edge, for example for A → B the final output is <A-B:count> and <B-A:count>.
As shown in Figure 3, the number of shared neighbors of two interacting nodes A and B = the number of shared sending neighbors + the number of shared receiving neighbors. The more shared neighbor nodes two nodes have, the more likely they are to be associated within the same scope and the tighter their relationship; the shared-neighbor metric then satisfies the following formula:
Neighbor(i, j) = Neighbor_shared_num(i, j) / Max_SharedNum   (3)
where Neighbor(i, j) denotes the shared-neighbor metric between node i and node j, Neighbor_shared_num(i, j) denotes the number of shared neighbors between node i and node j, and Max_SharedNum denotes the maximum of all Neighbor_shared_num(i, j).
203: compute the weight of the edge of each node pair; the formula for the weight of the edge between two nodes is as follows:
w_ji = a × freq_ij + (1 - a) × Neighbor(i, j)   (4)
where w_ji denotes the weight of the edge between node i and node j, freq_ij denotes the interaction frequency between node i and node j, Neighbor(i, j) denotes the shared-neighbor metric between node i and node j, and a denotes an adjustment factor.
Step S3: add the nodes' own weights and the weights of the edges of the node pairs on the basis of the PageRank algorithm, and mine the key nodes in the directed double-weighted network graph in parallel. Specifically:
301: according to the node activity and the weight of every edge computed above, obtain the PageRank value of each node by the improved Google page ranking algorithm (PageRank); the PageRank formula is as follows:
PR(p_i) = a_i / N + (1 - a_i) × Σ PR(p_j) × w_ji / L(p_j)   (5)
where PR(p_i) denotes the PageRank value of node i, p_j ∈ M(p_i), M(p_i) denotes the set of nodes pointing to node i, L(p_j) denotes the out-degree of the node p_j pointing to node i, N denotes the total number of nodes in the postal delivery data, a_i denotes the node activity of node i, and w_ji denotes the weight of the edge between node i and node j;
302: after the PageRank values of all nodes are computed, compare the PageRank value obtained in the previous computation with the current PageRank value. If the absolute value of the difference between a node's current PageRank value and its previous one is greater than the given threshold ε, repeat step 301 to compute the PageRank value of each node in the next round; if the absolute values of the differences between the two successive PageRank values are all less than the given threshold ε, perform step 303;
303: sort the PageRank values of the nodes finally obtained in step 302; the top-k nodes are the k most important key nodes mined, where k is the number of key nodes.
The actual procedure in the MapReduce framework is described below:
1) The postal delivery data to be mined are divided into multiple data blocks and processed separately. Through one MapReduce job, output in <key, value> form is produced, where the key is a person node i in the network and the value is the number of nodes that have a delivery behavior with node i, including both senders and receivers. Specifically this comprises the following steps:
11) Split the postal delivery data to be mined into data blocks and hand them to the Mappers for processing in units of data blocks.
12) Each computing node in the cluster processes its corresponding data block and performs one MapReduce job.
The Mapper stage:
Input: the original postal delivery data to be mined and analyzed;
Output: <node_i, node_j>, where node_i and node_j represent the receiver and the sender participating in one delivery behavior; node_i and node_j can each be either receiver or sender, so in the Mapper stage, for such a node pair <node_i, node_j>, both <node_i, node_j> and <node_j, node_i> are output.
The Reducer stage:
Input: <node_i, node_j>;
Output: <node_i, count>, where the key is node node_i and the value is the number count of nodes that have a send/receive relation with node_i; the result is written to a file A1 on HDFS (Hadoop Distributed File System).
2) The postal delivery data to be mined are divided into multiple data blocks and processed separately. Through one MapReduce job, output in <key, value> form is produced, where the key is a node pair between which a delivery behavior occurs in the logistics network, and the value is an integer representing the number of times that node pair occurs. Specifically this comprises the following steps:
21) Split the postal delivery data to be mined into data blocks and hand them to the Mappers for processing in units of data blocks.
22) Each computing node in the cluster processes its corresponding data block and performs one MapReduce job.
The Mapper stage:
Input: the original postal delivery data to be mined and analyzed;
Output: <(node_i, node_j), 1>, where node_i and node_j represent the receiver and the sender participating in one delivery behavior; this output form describes one delivery behavior from node node_i to node node_j.
The Reducer stage:
Input: <(node_i, node_j), 1>;
Output: <(node_i, node_j), count>, where the key is the node pair (node_i, node_j) and the value is the number of times this node pair occurs; the result is written to a file A2 on HDFS.
3) Similar to the previous step, the input is still the original postal delivery data set, and the number of shared neighbors of each node pair is computed. The key obtained is a node pair, representing one delivery behavior between a sender and a receiver, and the number of shared neighbors of the node pair is computed according to the definition of the shared neighbors of a node pair. Specifically this comprises the following steps:
31) Split the postal delivery data to be mined into data blocks and hand them to the Mappers for processing in units of data blocks.
32) Each computing node in the cluster processes its corresponding data block and performs two MapReduce jobs.
The Mapper1 stage:
Input: the original postal delivery data to be mined and analyzed;
Output: <node_i, node_j>, where node_i and node_j represent the receiver and the sender participating in one delivery behavior; this output form describes one delivery behavior from node node_i to node node_j.
The Reducer1 stage:
Input: <node_i, node_j>;
Output: <node_i, adjacent nodes of node_i>, where the key is node node_i and the value is the set of nodes connected with it; the result obtained is exactly the adjacency-list form of the graph.
The Mapper2 stage:
Input: <node_i, adjacent nodes of node_i>, that is, the input of Mapper2 is exactly the reduce output of the first job;
Output: <(node_a, node_b), node_i>, where the key (node_a, node_b) represents a node pair formed by any two of the neighbor nodes of node node_i, and within a node pair the node with the smaller sequence number comes first. For example, for the inputs <A, (B, C, D)> and <B, (C, D)>, the output of Mapper2 is <(B, C), A>, <(B, D), A>, <(C, D), A>, <(C, D), B>.
The Reducer2 stage:
Input: <(node_a, node_b), node_i>;
Output: <(node_a, node_b), common adjacent nodes of node_a and node_b>; for example, under the assumption of the previous step, the output of the Reducer2 stage is <(B, C), A>, <(B, D), A>, <(C, D), A, B>. The result is written to a file A3.
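The two chained jobs of step 3) can be sketched with the same toy skeleton; this simplified version builds adjacency lists from the sending direction only and then counts, for every pair of neighbors, how many nodes they have in common, reproducing the <(C, D), A, B> example above (illustrative only):
from itertools import combinations

def adjacency_map(record):
    sender, receiver = record
    yield sender, receiver                 # Mapper1: one delivery behavior node_i -> node_j

def adjacency_reduce(node, neighbors):
    return sorted(set(neighbors))          # Reducer1: adjacency list of node_i

def shared_map(item):
    node, neighbors = item
    for a, b in combinations(sorted(neighbors), 2):   # smaller sequence number first
        yield (a, b), node                 # Mapper2: node is a common neighbor of (a, b)

def shared_reduce(pair, nodes):
    return len(set(nodes))                 # Reducer2: Neighbor_shared_num(a, b)

# adjacency = run_mapreduce(records, adjacency_map, adjacency_reduce)
# shared_counts = run_mapreduce(adjacency.items(), shared_map, shared_reduce)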
4) Compute the PageRank value of each node. Before the Mapper and Reducer stages, a separate standalone program is written to read the data obtained above: in its setup function the data in file A1, file A2 and file A3 of this embodiment are read, the activity of each node is obtained according to the definition of node activity and formula (1), and, with the node string as the key and the computed node activity as the value, the result is stored in a defined HashMap collection hashmap1. Then the weights of the edges are computed according to the definition and formula (2) of the interaction frequency between nodes, the definition and formula (3) of the shared-neighbor metric between nodes, and formula (4) for the edge weights, where a is the adjustment factor of the two factors that affect the edge weight and controls how much each of these two factors influences the edge weight. Then, with the string form of an edge as the key, the weight w_ji of the edge is stored as the value in a defined HashMap collection hashmap2. The PageRank value of each node is then computed according to the definition and formula (5) of the weighted key node discovery algorithm based on the improved PageRank algorithm: one MapReduce job is performed to compute the PageRank value of each node, the result of the previous run is used as the input of the next MapReduce job, and the iteration continues until the absolute value of the difference between the PageRank values of each node in two consecutive runs is less than the given threshold ε, at which point the iterative process stops and the result is obtained. Specifically this comprises the following steps:
41) Compute the node activity and the edge weights in the setup function.
The data of files A1, A2 and A3 are read in the setup function, then the activity of each node is computed according to the definition and formula of node activity and the result is stored in the hashmap1 collection. Then the weight of the edge between every pair of nodes is computed according to the definitions and formulas of the node interaction frequency and of the number of shared neighbors between nodes, and the result is stored in the hashmap2 collection.
42) Each computing node in the cluster processes its corresponding data block and performs one MapReduce job; this job mainly carries out data preprocessing, and the main steps are as follows:
The Mapper stage:
Input: postal delivery data to be mined in <node_i, node_j> form; this original data style represents node node_i sending a parcel to node node_j, which is directed, node_i → node_j; the adjacency-list form is finally obtained.
Output: <node_j, node_i>, i.e. the direction of the raw data record is reversed and the form becomes node_j → node_i.
The Reducer stage:
Input: <node_j, list of nodes pointing to node_j>. After the Mapper outputs its results and they pass through the shuffle and combine functions, the input key of Reduce is the receiver node_j, and the value is the set of senders that have sent a parcel to node_j.
Output: <node_j, list of nodes pointing to node_j>; Reduce outputs the result directly. For example, the result <A, (B, C, D)> means that nodes B, C and D have each sent a parcel to node A; A is the receiver and B, C, D form the sender set.
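The preprocessing job of step 42) simply reverses each edge and groups senders per receiver; a minimal sketch with the same skeleton (names illustrative):
def reverse_map(record):
    sender, receiver = record
    yield receiver, sender                 # turn node_i -> node_j into node_j <- node_i

def reverse_reduce(receiver, senders):
    return sorted(set(senders))            # e.g. <A, (B, C, D)>: all nodes that sent to A

# in_link_lists = run_mapreduce(records, reverse_map, reverse_reduce)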
43) By this step, the activity of each node, the weight of each edge and the preprocessed adjacency-list form of the raw data are all known. Each computing node in the cluster processes its corresponding data block and performs one more MapReduce job, computing the PageRank value of each node according to the improved key node formula (5) based on PageRank. The computation details of the node PageRank values are given below in the form of pseudocode.
Algorithm 1: Map(key, value)
Input:
Logistics network nodes
PR(p_i): the PageRank value of node p_i
w_ij: the value (weight) of the edge (i, j)
links[p_1, p_2, p_3, ..., p_m]: all the nodes p_j linked by node p_i
Output:
List of <key: value>
1. Emit(p_i, links[p_1, p_2, p_3, ..., p_m])
2. For each p_j in links[p_1, p_2, p_3, ..., p_m]
3.     partial(j) = PR(p_i) × w_ij / L(p_i)
4.     Emit(p_j, partial(j))
5. End For
Algorithm 2: Reduce(key, value)
Input:
Logistics network node p_j, list of <p_j, partial(j)>
Output:
PR(p_j): the PageRank value of node p_j
1. // Initialize the new PageRank value of node p_j
2. PR(p_j) = 0
3. For each partial(j) in the list
4.     PR(p_j) += partial(j)
5. End For
6. PR(p_j) = (1 - a_j) × PR(p_j) + a_j / N    // N is the total number of nodes, a_j is the node activity of node p_j
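The pseudocode above might be rendered in Python roughly as follows for a single iteration; pr, weight and activity are dictionaries prepared in the setup phase, and, consistent with formula (5), the sketch divides each contribution by the out-degree of the emitting node (an assumption of this sketch, not a quotation of the actual Hadoop job):
def pagerank_map(item, pr, weight):
    node, links = item                          # Algorithm 1: node and its out-link list
    yield node, ('links', links)                # pass the graph structure through unchanged
    for j in links:
        contrib = pr[node] * weight[(node, j)] / len(links)   # PR(p_i) * w_ij / L(p_i)
        yield j, ('partial', contrib)

def pagerank_reduce(node, values, activity, n_nodes):
    total = sum(v for tag, v in values if tag == 'partial')          # Algorithm 2, lines 2-5
    return activity[node] / n_nodes + (1 - activity[node]) * total   # line 6, formula (5)

# from functools import partial
# new_pr = run_mapreduce(out_links.items(),
#                        partial(pagerank_map, pr=pr, weight=weight),
#                        partial(pagerank_reduce, activity=activity, n_nodes=len(activity)))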
44) Once the PageRank value of each node from the first computation is obtained, these values are used as the initial PageRank values of the nodes in the second MapReduce job, and a second iteration is carried out to compute the PageRank values of the next iteration. In this way, the result of the previous iteration is used as the initial PageRank value for the next computation of each node, and the iterative computation continues until the PageRank value of each node computed in one run differs from the PageRank value computed in the next run by no more than the given threshold ε, at which point the iteration stops and the final PageRank value of each node is obtained. The nodes are then sorted by their PageRank values, and the top-k nodes are the k most important key nodes.
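Step 44) amounts to a driver loop around the iteration sketched above; run_pagerank_iteration is a hypothetical helper assumed to wrap one such MapReduce job:
def find_key_nodes(pr, run_pagerank_iteration, eps, k):
    # run_pagerank_iteration: hypothetical wrapper around one MapReduce PageRank job
    while True:
        new_pr = run_pagerank_iteration(pr)      # previous values seed the next job
        converged = max(abs(new_pr[n] - pr[n]) for n in pr) <= eps
        pr = new_pr
        if converged:
            break
    # sort by the final PageRank values; the top-k nodes are the key nodes
    return sorted(pr, key=pr.get, reverse=True)[:k]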
In existing key node research few people pay attention to the discovery of key nodes in logistics delivery networks. The present invention is based on a real logistics network: node activity, node interaction frequency and the number of shared neighbors of each node pair are all taken into account in the weight computation, which makes full use of the information in the logistics delivery network, reduces the loss of useful information and improves the accuracy of key node discovery in the network; moreover, the relatively mature PageRank algorithm of Google is improved on the basis of the MapReduce programming framework, the algorithm is parallelized, and the efficiency and stability of key node mining are greatly improved.
For discovering key nodes in the small-scale networks formed from small data sets, a traditional single-machine algorithm meets the demand well and its efficiency is adequate. For the large-scale networks formed from massive data, however, the traditional single-machine algorithm falls short, and the superiority of the method proposed by the present invention becomes quite obvious.

Claims (7)

1. A parallelized key node discovery method for postal delivery data, characterized in that it comprises:
Step S1: obtaining the node activity of each node according to its total number of send and receive events within a set time window in the postal delivery data, and taking the node activity as the node's own weight;
Step S2: obtaining the weight of the edge of each node pair according to the interaction frequency and the shared-neighbor metric of each node pair within the set time window in the postal delivery data, and defining the network formed by the postal delivery data as a directed double-weighted network graph;
Step S3: adding the nodes' own weights and the weights of the edges of the node pairs on the basis of the PageRank algorithm, and mining the key nodes in the directed double-weighted network graph in parallel.
2. The parallelized key node discovery method for postal delivery data according to claim 1, characterized in that the node activity satisfies the following formula:
a_i = M_i / Max_num   (1)
where a_i denotes the node activity of node i, M_i denotes the total number of send and receive events of node i within the set time window, and Max_num denotes the maximum of all M_i.
3. The parallelized key node discovery method for postal delivery data according to claim 1, characterized in that the weight of an edge satisfies the following formula:
w_ji = a × freq_ij + (1 - a) × Neighbor(i, j)   (2)
where w_ji denotes the weight of the edge between node i and node j, freq_ij denotes the interaction frequency between node i and node j, Neighbor(i, j) denotes the shared-neighbor metric between node i and node j, and a denotes an adjustment factor.
4. The parallelized key node discovery method for postal delivery data according to claim 1, characterized in that the interaction frequency satisfies the following formula:
freq_ij = n_ij / Max_num   (3)
where freq_ij denotes the interaction frequency between node i and node j, n_ij denotes the number of occurrences of the edge formed by node i and node j, and Max_num denotes the maximum of all n_ij.
5. The parallelized key node discovery method for postal delivery data according to claim 1, characterized in that the shared-neighbor metric satisfies the following formula:
Neighbor(i, j) = Neighbor_shared_num(i, j) / Max_SharedNum   (4)
where Neighbor(i, j) denotes the shared-neighbor metric between node i and node j, Neighbor_shared_num(i, j) denotes the number of shared neighbors between node i and node j, and Max_SharedNum denotes the maximum of all Neighbor_shared_num(i, j).
6. The parallelized key node discovery method for postal delivery data according to claim 1, characterized in that step S3 specifically comprises:
301: obtaining the PageRank value of each node, which satisfies the following formula:
PR(p_i) = a_i / N + (1 - a_i) × Σ PR(p_j) × w_ji / L(p_j)   (5)
where PR(p_i) denotes the PageRank value of node i, p_j ∈ M(p_i), M(p_i) denotes the set of nodes pointing to node i, L(p_j) denotes the out-degree of the node p_j pointing to node i, N denotes the total number of nodes in the postal delivery data, a_i denotes the node activity of node i, and w_ji denotes the weight of the edge between node i and node j;
302: for each node, comparing the PageRank values obtained in the previous and current rounds, and judging whether the absolute value of their difference is greater than a given threshold ε; if so, jumping to step 301 and continuing to obtain the PageRank value of each node in the next round; if not, performing step 303;
303: sorting the PageRank values of the nodes finally obtained in step 302 and taking the top-k nodes as the mined key nodes, where k is the number of key nodes.
7. The parallelized key node discovery method for postal delivery data according to claim 1, characterized in that in this parallelized key node discovery method the data of every step are processed in parallel based on the MapReduce programming framework.
CN201510469302.8A 2015-08-03 2015-08-03 Parallelization critical node discovery method for postal delivery data Expired - Fee Related CN105069290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510469302.8A CN105069290B (en) 2015-08-03 2015-08-03 Parallelization critical node discovery method for postal delivery data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510469302.8A CN105069290B (en) 2015-08-03 2015-08-03 Parallelization critical node discovery method for postal delivery data

Publications (2)

Publication Number Publication Date
CN105069290A true CN105069290A (en) 2015-11-18
CN105069290B CN105069290B (en) 2017-12-26

Family

ID=54498655

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510469302.8A Expired - Fee Related CN105069290B (en) 2015-08-03 2015-08-03 Parallelization critical node discovery method for postal delivery data

Country Status (1)

Country Link
CN (1) CN105069290B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106506192A (en) * 2016-10-09 2017-03-15 中国电子科技集团公司第三十六研究所 A kind of method and apparatus of identification network key node
CN106685690A (en) * 2016-10-27 2017-05-17 中南大学 Computer network key node discovery method based on simulated building process
CN107729478A (en) * 2017-10-16 2018-02-23 天津微迪加科技有限公司 A kind of data analysing method and device
CN109379220A (en) * 2018-10-10 2019-02-22 太原理工大学 The method that complex network key node cluster based on Combinatorial Optimization excavates
CN112507996A (en) * 2021-02-05 2021-03-16 成都东方天呈智能科技有限公司 Face detection method of main sample attention mechanism
CN112990633A (en) * 2019-12-18 2021-06-18 菜鸟智能物流控股有限公司 Index data generation method, logistics cost simulation method, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130148558A1 (en) * 2011-12-12 2013-06-13 Qualcomm Incorporated Low power node dormant state
CN103259263A (en) * 2013-05-31 2013-08-21 重庆大学 Electrical power system key node identification method based on active power load flow betweenness
CN103906271A (en) * 2014-04-21 2014-07-02 西安电子科技大学 Method for measuring key nodes in Ad Hoc network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130148558A1 (en) * 2011-12-12 2013-06-13 Qualcomm Incorporated Low power node dormant state
CN103259263A (en) * 2013-05-31 2013-08-21 重庆大学 Electrical power system key node identification method based on active power load flow betweenness
CN103906271A (en) * 2014-04-21 2014-07-02 西安电子科技大学 Method for measuring key nodes in Ad Hoc network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
韩忠明: "Algorithm for discovering important nodes in weighted social networks", Journal of Computer Applications *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106506192A (en) * 2016-10-09 2017-03-15 中国电子科技集团公司第三十六研究所 A kind of method and apparatus of identification network key node
CN106685690A (en) * 2016-10-27 2017-05-17 中南大学 Computer network key node discovery method based on simulated building process
CN106685690B (en) * 2016-10-27 2019-07-09 中南大学 Computer network key node based on simulation building process finds method
CN107729478A (en) * 2017-10-16 2018-02-23 天津微迪加科技有限公司 A kind of data analysing method and device
CN109379220A (en) * 2018-10-10 2019-02-22 太原理工大学 The method that complex network key node cluster based on Combinatorial Optimization excavates
CN109379220B (en) * 2018-10-10 2021-06-15 太原理工大学 Complex network key node cluster mining method based on combination optimization
CN112990633A (en) * 2019-12-18 2021-06-18 菜鸟智能物流控股有限公司 Index data generation method, logistics cost simulation method, equipment and storage medium
CN112990633B (en) * 2019-12-18 2024-04-05 菜鸟智能物流控股有限公司 Index data generation method, logistics cost simulation method, equipment and storage medium
CN112507996A (en) * 2021-02-05 2021-03-16 成都东方天呈智能科技有限公司 Face detection method of main sample attention mechanism

Also Published As

Publication number Publication date
CN105069290B (en) 2017-12-26

Similar Documents

Publication Publication Date Title
CN105069290A (en) Parallelization critical node discovery method for postal delivery data
CN103379158B (en) The method and system of commending friends information in a kind of social networks
CN103678671A (en) Dynamic community detection method in social network
CN103020267B (en) Based on the complex network community structure method for digging of triangular cluster multi-label
CN103914528A (en) Parallelizing method of association analytical algorithm
Jain et al. An adaptive parallel algorithm for computing connected components
CN110719106B (en) Social network graph compression method and system based on node classification and sorting
CN105913235A (en) Client account transfer relation analysis method and system
CN104731925A (en) MapReduce-based FP-Growth load balance parallel computing method
CN105138650A (en) Hadoop data cleaning method and system based on outlier mining
CN112182306A (en) Uncertain graph-based community discovery method
CN102298618B (en) Method for obtaining matching degree to execute corresponding operations and device and equipment
CN104700311B (en) A kind of neighborhood in community network follows community discovery method
CN111861771A (en) Multi-objective optimization community discovery system and method based on dynamic social network attributes
Park et al. On the power of gradual network alignment using dual-perception similarities
Kusumakumari et al. Frequent pattern mining on stream data using Hadoop CanTree-GTree
CN103761298A (en) Distributed-architecture-based entity matching method
CN107590225A (en) A kind of Visualized management system based on distributed data digging algorithm
Yang et al. An efficient accelerator for point-based and voxel-based point cloud neural networks
Demetrescu et al. Adapting parallel algorithms to the W-Stream model, with applications to graph problems
CN111107493B (en) Method and system for predicting position of mobile user
CN108509531B (en) Spark platform-based uncertain data set frequent item mining method
CN112036510B (en) Model generation method, device, electronic equipment and storage medium
Wu et al. A new approach to mine frequent patterns using item-transformation methods
CN116128701A (en) Device and method for executing graph calculation task

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171226

Termination date: 20200803