CN108491449A

CN108491449A - A community discovery method based on neighbor-feature label propagation

Info

Publication number: CN108491449A
Application number: CN201810158349.6A
Authority: CN
Inventors: 张霄宏; 姜玉林; 唐朝生; 张冬生; 钱凯; 史爱静
Original assignee: Henan University of Technology
Current assignee: Henan University of Technology
Priority date: 2018-02-25
Filing date: 2018-02-25
Publication date: 2018-09-04

Abstract

The invention proposes a community discovery method that propagates labels based on neighbor features. The method introduces a label pre-allocation mechanism that locally optimizes the randomly initialized labels. During propagation, the label of each node is updated in order of influence, abandoning the random update order of the original method, so the method shows better stability and higher accuracy. The pre-allocation mechanism optimizes the initialization result according to three factors: the number of times each label occurs among the neighboring nodes, the influence of the neighboring nodes, and the tightness between nodes. Labels are then propagated on the basis of the optimized result, which reduces or eliminates the effect of random initialization on the community division result. In addition, the method computes and updates the label of each node in descending order of influence during propagation, which reduces repeated updates of each node's label.

Description

A community discovery method based on neighbor-feature label propagation

Technical field

The invention belongs to the field of networks and, in particular, relates to a community discovery method based on neighbor-feature label propagation.

Background technology

Most complex systems in the real world can be abstracted and modeled as complex networks. Studies have shown that, besides the small-world effect and the scale-free property, community structure is also prevalent in complex networks. There is currently no unified definition of community structure; it is generally understood to mean that nodes within a community are closely connected, while connections between nodes of different communities are sparse. Such structure is common in real life, for example: circles of friends formed around different interests in social networks; industry circles made up of different industries in economic and trade networks; and distinct units formed by interactions between proteins of different structure in protein networks. Discovering the community structure in a complex system not only has high research value, but is also of practical significance for understanding the composition of the system and for clarifying the function of each component and the relationships among them.

Since Girvan and Newman proposed the GN algorithm in 2002, community discovery has attracted wide attention from researchers. Existing methods improve the performance of the label propagation algorithm to some extent, but they either eliminate randomness completely, and so no longer reflect the defining feature of label propagation, namely that communities are discovered from the network topology, or they have high time complexity and cannot be applied to community discovery in large-scale networks.

Description of the drawings

Fig. 1 is a diagram of a 5-node network.

Fig. 2 is a diagram of an 8-node network.

Invention content

The invention proposes a community discovery method that propagates labels based on neighbor features. The method introduces a label pre-allocation mechanism that locally optimizes the randomly initialized labels. During propagation, the label of each node is updated in order of influence, abandoning the random update order of the original method, so the method shows better stability and higher accuracy.

A community discovery method based on neighbor-feature label propagation, characterized in that the method comprises the following steps: obtain randomly initialized labels; locally optimize these labels, so that after optimization each node has a locally optimal label; finally, start label propagation from the locally optimal label of each node. Specifically:

Step 1: initialize the labels of all nodes in the network, assigning each node a globally unique label;

Step 2: label pre-propagation;

Step 3: let t denote the iteration count, with initial value 1;

Step 4: arrange all nodes in the network into an ordered set, in descending order of each node's neighboring-node influence;

Step 5: for each node in the set, take the label that occurs most often among all of its neighbor nodes as the label the node obtains in the current iteration;

Step 6: if every node in the set obtains the same label in the current iteration as in the previous iteration, label propagation ends; otherwise, set t = t + 1 and return to Step 4.
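Steps 3 to 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: `neighbors` is taken to be a plain adjacency mapping, `influence` and `labels` are assumed to have been produced already (by the influence iteration and by pre-propagation), labels are updated in place within an iteration, a tie between equally frequent labels keeps the current label when possible, and `max_iter` is a safety cap that the text does not mention. The function name `propagate_labels` is hypothetical.

```python
from collections import Counter


def propagate_labels(neighbors, influence, labels, max_iter=100):
    """Update labels in descending order of influence until no label changes."""
    labels = dict(labels)  # work on a copy so the caller's labels are untouched
    for _ in range(max_iter):  # max_iter is an assumed safety cap
        # Step 4: order nodes by influence, largest first
        # (equal influence is ordered by node id, an assumption).
        order = sorted(neighbors, key=lambda v: (-influence[v], v))
        changed = False
        for v in order:
            if not neighbors[v]:
                continue  # an isolated node keeps its label
            # Step 5: count how often each label occurs among the neighbors.
            counts = Counter(labels[u] for u in neighbors[v])
            best = max(counts.values())
            top = sorted(lab for lab, c in counts.items() if c == best)
            # Tie handling is an assumption: keep the current label if it is
            # among the most frequent ones, else take the smallest label.
            new_label = labels[v] if labels[v] in top else top[0]
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        # Step 6: stop once an iteration leaves every label unchanged.
        if not changed:
            break
    return labels
```

Updating in place means that a high-influence node's new label is already visible to the lower-influence nodes processed later in the same iteration, which is one way to read the emphasis on update order.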

In particular, label pre-propagation comprises the following steps:

Step 1: if [formula not shown], then v_i and v_j are in a neighbor relationship, denoted [formula not shown], where [formula not shown] denotes the neighbor relationship between nodes;

Step 2: if [formula not shown], then v_i and v_j are neighboring nodes of each other;

Step 3: the neighbor set of v_i is [formula not shown];

Step 4: compute the node tightness [formula not shown], which indicates how tightly any two nodes v_i and v_j are connected;

Step 5: the neighboring-node influence of node v_i, denoted [formula not shown], is obtained by iterative computation. Let f_inf(v_i, k) denote the influence of v_i at the k-th iteration. If the difference between the sum of f_inf over all nodes at the k-th iteration and the corresponding sum at the (k-1)-th iteration is less than a given value ε, the computation has converged, and the influence each node obtains at the k-th iteration is taken as its final influence, that is, [formula not shown];

Step 6: suppose node v_j is a neighboring node of v_i; the degree to which the label of v_j is accepted by v_i is called the decision value of v_j's label for the label assignment of v_i, expressed as [formula not shown];

Step 7: the set of labels of the neighboring nodes whose influence is greater than that of the current node is denoted L; the candidate label set of v_i is [formula not shown], where [formula not shown] denotes the label of node v_j;

Step 8: if the candidate label set of v_i is empty, v_i keeps its existing label;

Step 9: if the candidate label set of v_i contains exactly one label, that label is assigned to v_i;

Step 10: if the candidate label set of v_i contains more than one label, the label with the largest decision value is assigned to v_i.

The method introduces a label pre-allocation mechanism that optimizes the initialization result according to three factors: the number of times each label occurs among the neighboring nodes, the influence of the neighboring nodes, and the tightness between nodes. Labels are then propagated on the basis of the optimized result, which reduces or eliminates the effect of random initialization on the community division result. In addition, the method computes and updates the label of each node in descending order of influence during propagation, which reduces repeated updates of each node's label.

Specific implementation mode

The approach taken by the invention is as follows:

First obtain randomly initialized labels and locally optimize them, so that after optimization each node has a locally optimal label; then start label propagation from the locally optimal label of each node.

Step 2: label pre-propagation;

Step 3: let t denote the iteration count, with initial value 1;

Step 6: if every node in the set obtains the same label in the current iteration as in the previous iteration, label propagation ends; otherwise, set t = t + 1 and return to Step 4.

The pre-propagation of Step 2 specifically comprises the following steps:

Step 1: describe the neighbor relationship between nodes;

Step 2: describe the nodes that satisfy the neighbor relationship, i.e. the neighboring nodes;

Step 3: form the neighboring node set from the neighboring nodes;

Step 4: compute how tightly nodes are connected from the properties of their neighboring node sets;

Step 5: compute the influence a node has under the effect of its neighboring nodes;

Step 6: describe the degree to which the label of one node is accepted by another node as that node's locally optimal label;

Step 7: the candidate label set is a set of particular labels, each of which may be assigned to a node as its locally optimal label. It is built as follows: examine all nodes in the node's neighboring node set and collect the labels of all nodes whose influence is greater than that of this node; the resulting set is the node's candidate label set;

Step 8: if the candidate label set of a node is empty, the node's existing label is the locally optimal label it obtains in pre-propagation;

Step 9: if the candidate label set of a node contains only one label, that label is the locally optimal label the node obtains in pre-propagation;

Step 10: if the candidate label set of a node contains multiple labels, the label with the largest label decision value is the locally optimal label the node obtains in pre-propagation.

The network is represented by a graph G = (V, E), where V = {v_i | i = 1, 2, ..., n} is the set of all nodes in the network, E = {(v_i, v_j) | v_i, v_j ∈ V} is the set of all edges in the network, n is the total number of nodes, and m is the total number of edges. The neighbor relationship and the properties of neighboring nodes are defined as follows:

Definition 1 (neighbor relationship). If [formula not shown], then v_i and v_j are in a neighbor relationship, denoted [formula not shown], where [formula not shown] denotes the neighbor relationship between nodes.

Definition 2 (neighboring nodes). If [formula not shown], then v_i and v_j are neighboring nodes of each other.

Definition 3 (neighboring node set). The neighbor set of v_i is [formula not shown].

Definition 4 (node tightness). Describes how tightly any two nodes v_i and v_j are connected, denoted [formula not shown]. It is computed according to formula (1):

[formula (1) not shown]

The larger the value, the more tightly v_i and v_j are connected.
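The graph notation G = (V, E) used by these definitions can be held in memory as an adjacency mapping. The sketch below is illustrative only: it assumes undirected edges, the helper name `build_adjacency` is hypothetical, and the 4-node graph is a made-up example rather than one of the figures. Whether the neighbor relationship of Definition 1 coincides with plain adjacency cannot be determined from the text, so the mapping stores edges only.

```python
def build_adjacency(nodes, edges):
    """Return {node: set of adjacent nodes} for an undirected graph G = (V, E)."""
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


# Hypothetical 4-node path graph, not taken from the patent's figures.
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 4)]
adjacency = build_adjacency(V, E)

n = len(adjacency)                                      # total number of nodes
m = sum(len(nbrs) for nbrs in adjacency.values()) // 2  # total number of edges
```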

In a complex network, each node plays a different role and has a different degree of importance in the network. Node influence is one way of quantifying this importance, and the invention introduces the influence of neighboring nodes to describe how important a neighboring node is. The influence of neighboring nodes is defined as follows:

Definition 5 (neighboring-node influence). Describes the influence a node acquires under the joint effect of all of its neighboring nodes, denoted Inf. The neighboring-node influence of node v_i is [formula not shown] and is obtained by iterative computation. Let f_inf(v_i, k) denote the influence of v_i at the k-th iteration. If the difference between the sum of f_inf over all nodes at the k-th iteration and the corresponding sum at the (k-1)-th iteration is less than a given value ε, the computation has converged, and the influence each node obtains at the k-th iteration is taken as its final influence. Taking v_i as an example, f_inf(v_i, k) is computed by formulas (2) and (3):

[formulas (2) and (3) not shown]

Take the 5-node network shown in Fig. 1 as an example. The influence of every node in the network is initialized to 1, and the influence of each node is then computed in turn, in ascending order of node number. Because v₂ is the only neighboring node of v₁, the influence of v₁ is determined entirely by v₂, i.e. [formula not shown]. The influence of v₂ is determined by its neighboring nodes v₁ and v₃, i.e. [formula not shown]. Similarly, the f_inf values of the remaining nodes at iteration 2 are 5/3, 19/18 and 39/36, respectively. The computation is iterated in this way until the convergence condition is met.
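The convergence test described above can be sketched as a short loop. Formulas (2) and (3) are not available in this text, so the per-node rule is passed in as `update`, a hypothetical caller-supplied stand-in. Updating nodes in place in ascending order is one reading of the walk-through, not a confirmed detail, and `max_iter` is an assumed safety cap.

```python
def iterate_influence(nodes, update, eps=1e-6, max_iter=1000):
    """Iterate until the total influence changes by less than eps."""
    influence = {v: 1.0 for v in nodes}        # every node starts with influence 1
    previous_total = sum(influence.values())
    for _ in range(max_iter):                  # max_iter is an assumed safety cap
        for v in sorted(nodes):                # ascending node number
            # `update` is a hypothetical stand-in for formulas (2) and (3).
            influence[v] = update(v, influence)
        total = sum(influence.values())
        # Convergence test: difference of the sums of f_inf over all nodes
        # between two consecutive iterations is below eps.
        if abs(total - previous_total) < eps:
            break
        previous_total = total
    return influence
```

Any rule with the signature `update(node, influence) -> float` can be plugged in; the loop itself only encodes the initialization, the visiting order, and the stopping condition.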

When node labels are initialized at random, the community division result fluctuates considerably, which makes the result highly unstable. To overcome this problem, the invention introduces a label pre-allocation step before label propagation. By locally adjusting the labels of the network nodes, it reduces the effect of random label initialization on the final result and thereby improves the stability and accuracy of the community discovery result.

We consider that each final community in a social network is formed by gradual expansion from small local communities, and that a local community is made up of nodes that share certain features and are closely connected. Among these nodes, a node with larger influence has a larger effect on the label assignment of its neighboring nodes. However, if label assignment were decided by the influence of neighboring nodes alone, the labels of a few highly influential nodes would spread very widely and super communities would easily form. To avoid this, the invention introduces a label-assignment decision value and assigns labels according to this value.

Definition 6 (label-assignment decision value). Describes the degree to which a neighboring node influences the current node when labels are assigned. Suppose node v_j is a neighboring node of v_i; the label-assignment decision value of v_j for v_i is denoted [formula not shown] and is computed according to formula (4).

If a node has more than one neighboring node with larger influence (Inf), any of these nodes may affect the label assignment of the current node; that is, the label of one of these nodes may be pre-assigned to the current node. The invention introduces the candidate label set to describe the labels that may be pre-assigned to a node.

Definition 7 (candidate label set). The set formed by the labels of the neighboring nodes whose influence is greater than that of the current node, denoted L. Taking v_i as an example, the candidate label set of v_i, [formula not shown], is given by formula (5),

where [formula not shown] denotes the label of node v_j.

The strategies used in label pre-allocation are as follows:

Strategy 1: if [formula not shown];

Strategy 2: if [formula not shown] and [formula not shown];

Strategy 3: if [formula not shown] and [formula not shown].

According to Strategy 1, if the candidate label set of v_i is empty, the label of v_i remains unchanged in pre-propagation. According to Strategy 2, if the candidate label set of v_i contains only one label, the label of v_i is updated to that label in pre-propagation. According to Strategy 3, if the candidate label set contains multiple labels, the label-assignment decision value of the node corresponding to each candidate label is computed first, and the label corresponding to the largest value is then assigned to v_i in pre-propagation.
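The three strategies can be sketched as a single function. This is a sketch under stated assumptions rather than the patented procedure: `neighbors` is an assumed mapping from a node to its neighboring nodes, `influence` is assumed to be final, and `decision_value(u, v)` is a hypothetical caller-supplied stand-in for formula (4), which is not available in this text.

```python
def preallocate_label(v, neighbors, influence, labels, decision_value):
    """Return the label that node v obtains in pre-propagation."""
    # Candidate nodes: neighboring nodes whose influence exceeds that of v.
    candidates = [u for u in neighbors[v] if influence[u] > influence[v]]
    candidate_labels = {labels[u] for u in candidates}

    if not candidate_labels:
        # Strategy 1: empty candidate set, v keeps its label.
        return labels[v]
    if len(candidate_labels) == 1:
        # Strategy 2: exactly one candidate label, take it.
        return next(iter(candidate_labels))
    # Strategy 3: several candidate labels, take the label of the candidate
    # node with the largest decision value for v.
    best = max(candidates, key=lambda u: decision_value(u, v))
    return labels[best]
```

The function only decides the label of one node. Whether the nodes of a network are processed one after another with immediate updates, or all from the same snapshot of labels, is not specified here and is left to the caller.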

The label pre-propagation process is illustrated below using node v₅ in Fig. 2. The influence of v₅ and of each of its neighboring nodes is computed with the influence computation method of the invention, and the nodes are sorted in descending order of influence, giving the sequence v₄, v₈, v₆, v₅, v₇, v₂, v₃, v₁. The ranking shows that v₄, v₈ and v₆ have greater influence than v₅, so the candidate label set of v₅ consists of the labels of these three nodes. Since the candidate label set is not empty and contains several labels, a label is pre-allocated to v₅ according to Strategy 3. The computed label decision values of v₄, v₆ and v₈ are 0.27263, 0.34102 and 0.34104, respectively. Because the decision value of v₈ is the largest, the label of v₈ is taken as the label that v₅ obtains in pre-propagation.
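The selection for v₅ can be replayed with the ranking and decision values quoted above. The snippet only reproduces the final comparison; the decision values are copied from the text, not recomputed, and the variable names are illustrative.

```python
# Influence ranking for the Fig. 2 example, largest influence first.
ranking = ["v4", "v8", "v6", "v5", "v7", "v2", "v3", "v1"]

# Label decision values quoted in the text for the candidates of v5.
decision = {"v4": 0.27263, "v6": 0.34102, "v8": 0.34104}

# Candidate nodes are those ranked ahead of v5, i.e. with greater influence.
candidates = ranking[:ranking.index("v5")]

# Strategy 3: pick the candidate with the largest decision value.
chosen = max(candidates, key=decision.get)
print(candidates, chosen)  # prints ['v4', 'v8', 'v6'] v8
```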

Label pre-propagation continues according to Strategies 1 to 3 until the labels of all nodes in Fig. 2 have been updated. Pre-propagation locally optimizes, according to these strategies, the label each node obtained during initialization. The optimized result serves as the initial label of each node in formal propagation, which reduces the effect of randomly assigned labels on the final community division result.

Label propagation then takes the label each node obtained in pre-propagation as its basis, propagates labels iteratively according to the network topology, and uses the result of the iterative propagation as the basis for the final division into communities.

Propagating labels in a random node order can make the method converge slowly and produce unstable results. To avoid this, propagation must start from a suitable node and proceed in a suitable order. The method therefore changes the starting node of iterative label propagation: the node with the largest influence is chosen as the starting point, and the label of each node is updated in descending order of node influence until convergence. Communities are then divided according to the label of each node at convergence.
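The final division step, grouping nodes by the label they hold at convergence, can be sketched as follows. The function name `communities_from_labels` and the return type (a list of node sets) are illustrative choices, not taken from the text.

```python
from collections import defaultdict


def communities_from_labels(labels):
    """Group nodes that share a label into one community each."""
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return list(groups.values())
```

For example, the labels `{1: "a", 2: "a", 3: "b"}` yield two communities, one containing nodes 1 and 2 and one containing node 3.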

Claims

1. A community discovery method based on neighbor-feature label propagation, characterized in that the method comprises the following steps: obtain randomly initialized labels; locally optimize these labels, so that after optimization each node has a locally optimal label; finally, start label propagation from the locally optimal label of each node. Specifically:

Step 2: label pre-propagation;

Step 3: let t denote the iteration count, with initial value 1;

Step 4: arrange all nodes in the network into an ordered set, in descending order of neighboring-node influence;

Step 5: for each node in the set, take the label that occurs most often among all of its neighbor nodes as the label the node obtains in the current iteration;

Step 6: if every node in the set obtains the same label in the current iteration as in the previous iteration, label propagation ends; otherwise, set t = t + 1 and return to Step 4.

2. The community discovery method based on neighbor-feature label propagation according to claim 1, characterized in that, in label pre-propagation:

Step 1: if [formula not shown], then v_i and v_j are in a neighbor relationship, denoted [formula not shown], where [formula not shown] denotes the neighbor relationship between nodes;

Step 2: if [formula not shown], then v_i and v_j are neighboring nodes of each other;

Step 3: the neighbor set of v_i is [formula not shown];

Step 4: compute the node tightness [formula not shown], which indicates how tightly any two nodes v_i and v_j are connected;

Step 5: the neighboring-node influence of node v_i, denoted [formula not shown], is obtained by iterative computation. f_inf(v_i, k) denotes the influence of v_i at the k-th iteration; if the difference between the sum of f_inf over all nodes at the k-th iteration and the corresponding sum at the (k-1)-th iteration is less than a given value ε, the computation has converged, and the influence each node obtains at the k-th iteration is taken as its final influence, that is, [formula not shown];

Step 6: suppose node v_j is a neighboring node of v_i; the degree to which the label of v_j is accepted by v_i is called the decision value of v_j's label for the label assignment of v_i, expressed as [formula not shown];

Step 7: the set of labels of the neighboring nodes whose influence is greater than that of the current node is denoted L; the candidate label set of v_i is [formula not shown], where [formula not shown] denotes the label of node v_j;

Step 8: if the candidate label set of v_i is empty, v_i keeps its existing label;

Step 9: if the candidate label set of v_i contains exactly one label, that label is assigned to v_i;

Step 10: if the candidate label set of v_i contains more than one label, the label with the largest decision value is assigned to v_i.