CN105631751A - Directional local group discovery method - Google Patents

Info

Publication number
CN105631751A
Authority
CN
China
Prior art keywords
group
node
set
attribute
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510996221.3A
Other languages
Chinese (zh)
Inventor
潘理
吴鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201510996221.3A
Publication of CN105631751A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a directional local group discovery method comprising the following steps: a network adjacency matrix and an attribute matrix are established; a set of attribute importance weight vectors is inferred from a model node that reflects the directional target; the network edge weights are re-weighted based on a weight vector; edges with significantly large weights are extracted from the re-weighted network to form group seeds; directional local groups are extracted by locally expanding the group seeds to optimize weighted conductance; and unimportant and repeated groups are removed from the extracted directional local groups. Because social network group structures are diverse while the targets of group applications are specific, the method infers the directional target adaptively, re-weights the network and constructs seeds in the inferred target subspace, and extracts the directional local groups by local expansion, so that it suits specific social application targets.

Description

Directional local group discovery method
Technical field
The present invention relates to the technical field of social networks, and in particular to a directional local group discovery method for social networks, which can be used for social network function analysis, structure visualization, and as input to various social applications.
Background art
Group discovery in social networks plays an important role in understanding network function, visualizing network structure, and developing other social applications. Structurally, connections are dense within a group and sparse between groups; in terms of attributes, the members of a group are relatively homogeneous on a specific attribute subspace.
A search of the prior-art literature shows that most group discovery methods consider only network topology information and extract only one fixed group structure. In fact, because social networks are complex and large, they usually contain multiple group structures, and different social applications have different targets and need group structures with different orientations. It is therefore necessary to extract suitable directional local groups based on the specific application target and user interest.
However, it is difficult to obtain a directional group structure from network structure information alone. Attribute information is now available for many social networks, and attributes can reflect and describe application targets and user interests, which provides a way to guide directional local group discovery.
Group discovery methods that combine structure information and attribute information mainly comprise attribute full-space methods and attribute subspace methods. Full-space methods cluster the node set using all given attributes. Xu et al., in the article "A model-based approach to attributed graph clustering" published at the international conference SIGMOD in 2012, adopt a Bayesian model that handles structure and all attribute information simultaneously; the model assigns a probability to each possible group structure based on the connections and all attribute distributions, turns group discovery into a probabilistic inference problem, and solves it with a variational method. However, not all available attributes are relevant to a specific target, and full-space methods usually lack discriminative power, so the groups they find are poor. Attribute subspace methods, on the other hand, cluster the node set on some attribute subspace. Huang et al., in the article "Dense community detection in multi-valued attributed networks" published in the international journal Information Sciences in 2015, adopt a cell-based subspace clustering method that finds cells with dense connections in a subspace and requires groups to satisfy a subspace interest threshold, a coverage threshold, and a connectivity threshold. However, existing subspaces are usually selected by unsupervised feature selection and cannot be chosen for a specific target.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a directional local group discovery method comprising the following steps:
Step 1: establish the adjacency matrix A and the attribute matrix B of the network to be analyzed;
Step 2: the user provides a model node v_p that reflects the directional target, and the method infers from this node the set of attribute importance weight vectors in its structural neighborhood;
Step 3: judge whether the weight vector set is empty; if it is not empty, go to step 4; if it is empty, go to step 13;
Step 4: take one weight vector out of the weight vector set;
Step 5: re-weight the network based on this weight vector to obtain the combined weights of the network edges;
Step 6: extract the edges with significantly large weights from the re-weighted network, and build the group seed set from these edges;
Step 7: judge whether the group seed set is empty; if it is not empty, go to step 8; if it is empty, go to step 12;
Step 8: take one group seed out of the group seed set;
Step 9: judge whether the group seed belongs to the visited node set; if it does not, go to step 10; if it does, return to step 7;
Step 10: locally expand the group seed until the weighted conductance of the group represented by the seed is minimal, which yields one directional local group; the weighted conductance of a group is defined as the ratio of the group's weighted cut to its weighted volume;
Step 11: add the directional local group to the directional local group set under this weight vector, and update the visited node set;
Step 12: remove the unimportant and repeated groups from the directional local group set under this weight vector; an unimportant group is one whose ratio of internal edge weight sum to the weight sum of all network edges is less than a significance threshold, and a repeated group is one whose intersection with a group already in the set is larger than a repetition threshold;
Step 13: output all directional local group sets.
Preferably, in step 1 the adjacency matrix A encodes the network structure information: any element A_ij represents the topological weight of edge (v_i, v_j), and a value of 0 means that no edge exists between the corresponding nodes. The attribute matrix B encodes the network attribute information: any element B_ip represents the value of the p-th attribute of the i-th node.
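As a minimal sketch of step 1, the two matrices can be built from an edge list and a per-node attribute table. The helper name `build_matrices`, the 0-based node indices, and the assumption that the network is undirected are illustrative choices that the patent text does not state.

```python
def build_matrices(n, weighted_edges, attributes):
    """Return (A, B) as nested lists for a network with n nodes.

    weighted_edges: iterable of (i, j, weight) with 0-based node indices.
    attributes: one row of attribute values per node.
    """
    A = [[0.0] * n for _ in range(n)]
    for i, j, w in weighted_edges:
        A[i][j] = w
        A[j][i] = w  # assumption: undirected network, so A is symmetric
    B = [list(row) for row in attributes]
    return A, B


# Toy network (hypothetical data): a path 0 - 1 - 2 with two attributes per node.
edges = [(0, 1, 1.0), (1, 2, 2.0)]
attrs = [[0.1, 5.0], [0.2, 4.0], [0.9, 1.0]]
A, B = build_matrices(3, edges, attrs)
```

Here `A[0][2]` stays 0, which encodes the absence of an edge between nodes 0 and 2.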
Preferably, step 2 comprises: inferring the set of attribute importance weight vectors from a model node v_p that reflects the directional target.
Each weight vector in the set, indexed by k, holds one importance weight per attribute; q denotes the attribute index, t denotes the number of attributes, and SD_q denotes the similarity of the model node set on the q-th attribute;
Specifically, step 2 comprises:
Step 2.1: randomly sample |P_R| node pairs from the network to form the random node pair set, where |P_R| denotes the number of node pairs in that set;
Step 2.2: for the q-th attribute, compute the sum of squared attribute-value differences over all random node pairs, RSum_q, and compute the normalization factor RSum_q/|P_R|;
Step 2.3: extract the neighborhood network of the model node, i.e. the network formed by all nodes connected to the model node, and partition it into the neighborhood group set NCS(v_p);
Step 2.4: judge whether the neighborhood group set is empty; if it is not empty, go to step 2.5; otherwise end step 2;
Step 2.5: take one neighborhood group NC_k out of the neighborhood group set and judge whether its number of nodes is greater than CS_l; if so, go to step 2.6; otherwise return to step 2.4;
Step 2.6: randomly choose CS_l nodes in the neighborhood group to form the model node set; the node pairs formed by every two nodes of the model node set make up the similar node pair set;
Step 2.7: compute the sum of squared attribute-value differences over all similar node pairs, SSum_q;
Step 2.8: compute the similarity SD_q of the model node set on the q-th attribute;
Step 2.9: compute the importance weight of the q-th attribute.
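The sums used in steps 2.1, 2.2, 2.6 and 2.7 can be sketched as follows. The function names and the toy data are hypothetical, and the model node set is fixed by hand instead of being drawn from a neighborhood group. The sketch deliberately stops before steps 2.8 and 2.9, because the text does not reproduce the formulas for SD_q and for the importance weights.

```python
import random
from itertools import combinations


def sum_sq_diff(B, pairs, q):
    """Sum of squared differences of attribute q over the given node pairs."""
    return sum((B[i][q] - B[j][q]) ** 2 for i, j in pairs)


def sample_random_pairs(n, num_pairs, rng):
    """Step 2.1: draw num_pairs pairs of distinct nodes uniformly at random."""
    return [tuple(rng.sample(range(n), 2)) for _ in range(num_pairs)]


# Toy attribute matrix B: 5 nodes, 2 attributes (hypothetical data).
B = [[0.0, 1.0], [0.1, 9.0], [0.2, 5.0], [5.0, 2.0], [9.0, 7.0]]
rng = random.Random(0)
random_pairs = sample_random_pairs(len(B), 4, rng)

# Step 2.6: every pair of nodes in the model node set is a "similar" pair.
model_nodes = [0, 1, 2]
similar_pairs = list(combinations(model_nodes, 2))

r_sum, norm, s_sum = {}, {}, {}
for q in range(len(B[0])):
    r_sum[q] = sum_sq_diff(B, random_pairs, q)   # step 2.2: RSum_q
    norm[q] = r_sum[q] / len(random_pairs)       # step 2.2: RSum_q / |P_R|
    s_sum[q] = sum_sq_diff(B, similar_pairs, q)  # step 2.7: SSum_q
```

In this toy data the model nodes agree closely on attribute 0 and differ widely on attribute 1, which is the kind of contrast the weighting in steps 2.8 and 2.9 is meant to capture.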
Preferably, step 5 comprises:
Step 5.1: compute the attribute distance SED_ij of every edge in the network;
Here B_i denotes the attribute vector of the i-th node, B_j denotes the attribute vector of the j-th node, the weight vector forms the diagonal of a diagonal matrix, and (B_i - B_j)^T denotes the transpose of the difference between the attribute vectors of the i-th and j-th nodes;
Step 5.2: compute the weight of every edge, i.e. re-weight the network edge weights W = {W_ij} based on the weight vector:
$$W_{ij} = \frac{1}{\frac{SED_{ij}}{AvgD} + \frac{\gamma}{AvgA}} \times A_{ij}, \quad \forall (v_i, v_j) \in \mathcal{E},$$
where v_i denotes node i, v_j denotes node j, (v_i, v_j) denotes the edge between node i and node j, and the set after the membership sign is the network edge set; W_ij denotes the weight of edge (v_i, v_j); SED_ij denotes the attribute distance of edge (v_i, v_j); AvgD denotes the average attribute distance over all edges; A_ij denotes the structure weight of edge (v_i, v_j), a value of 0 meaning that the edge does not exist; AvgA denotes the average structure distance over all edges; and γ denotes the balance parameter that controls the relative importance of attributes and structure.
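A hedged sketch of step 5 follows. Two points are assumptions rather than statements of the patent: the attribute distance is taken to be the quadratic form (B_i - B_j)^T diag(w) (B_i - B_j), which is what the description of step 5.1 suggests, and the flattened edge-weight formula is read as W_ij = A_ij / (SED_ij/AvgD + γ/AvgA). The function names and the toy data are hypothetical.

```python
def attribute_distance(b_i, b_j, weights):
    """Assumed form of SED_ij: weighted squared difference of attribute vectors."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, b_i, b_j))


def reweight_edges(A, B, weights, gamma):
    """Return {(i, j): W_ij} for every edge i < j, under the assumed reading."""
    n = len(A)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if A[i][j] != 0]
    sed = {(i, j): attribute_distance(B[i], B[j], weights) for i, j in edges}
    avg_d = sum(sed.values()) / len(edges)  # AvgD; must be non-zero here
    avg_a = sum(A[i][j] for i, j in edges) / len(edges)  # AvgA
    return {
        (i, j): A[i][j] / (sed[(i, j)] / avg_d + gamma / avg_a)
        for i, j in edges
    }


# Toy unweighted path 0 - 1 - 2; only attribute 0 matters (weight vector [1, 0]).
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
B = [[0.0, 0.0], [1.0, 5.0], [3.0, 5.0]]
W = reweight_edges(A, B, weights=[1.0, 0.0], gamma=1.0)
```

Edge (0, 1) joins nodes that are closer on the weighted attribute than edge (1, 2) does, so it receives the larger weight. The sketch does not guard against AvgD being zero.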
Preferably, step 6 comprises: extracting the edges with significantly large weights from the re-weighted network to build the group seed set SeedSet_k;
Specifically, step 6 comprises:
Step 6.1: sort all edges by weight in descending order to form the sorted edge set;
Step 6.2: take the first Size_BS edges of the sorted edge set to form the bootstrap set BS;
Step 6.3: fit a normal distribution whose parameters are the mean and variance of the edge weights in the bootstrap set;
Step 6.4: take out the edge at the front of the sorted edge set;
Step 6.5: judge whether the weight of the edge taken out conforms to the normal distribution; if it does, add the edge to the bootstrap set, update the mean of the normal distribution with the mean edge weight of the new bootstrap set, and return to step 6.4; if it does not, go to step 6.6;
Step 6.6: induce a subgraph from all edges in the bootstrap set;
Step 6.7: each connected component of the subgraph is one group seed, and all connected components form the group seed set.
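Steps 6.1 to 6.7 can be sketched as below. The text does not say how an edge weight is tested against the normal distribution, so this sketch assumes a simple rule: an edge conforms if its weight lies within `k` standard deviations of the current mean. The parameter `k`, the function name, and the toy weights are hypothetical. Only the mean is updated as edges are added, as the steps describe; the standard deviation stays that of the initial bootstrap set.

```python
from statistics import mean, pstdev


def extract_seeds(edge_weights, size_bs, k=2.0):
    """Return the group seeds as a list of node sets."""
    # Steps 6.1 and 6.2: rank edges by weight and take the bootstrap set.
    ranked = sorted(edge_weights.items(), key=lambda item: item[1], reverse=True)
    bootstrap = ranked[:size_bs]
    # Step 6.3: parameters of the fitted normal distribution.
    mu = mean([w for _, w in bootstrap])
    sigma = pstdev([w for _, w in bootstrap])
    # Steps 6.4 and 6.5: admit further edges while they conform (assumed test).
    for edge, w in ranked[size_bs:]:
        if abs(w - mu) > k * sigma:
            break
        bootstrap.append((edge, w))
        mu = mean([x for _, x in bootstrap])
    # Steps 6.6 and 6.7: connected components of the induced subgraph,
    # found with a small union-find structure.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), _ in bootstrap:
        parent[find(u)] = find(v)
    components = {}
    for node in list(parent):
        components.setdefault(find(node), set()).add(node)
    return sorted(components.values(), key=min)


# Toy re-weighted edges (hypothetical): three heavy edges and two light ones.
weights = {(0, 1): 10.0, (1, 2): 9.0, (3, 4): 9.5, (2, 3): 1.0, (4, 5): 0.8}
seeds = extract_seeds(weights, size_bs=3)
```

With these weights the two light edges are rejected, so the heavy edges split into two seeds.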
Preferably, step 10 comprises:
Step 10.1: take the group seed as the current group and compute its weighted conductance ψ_curr, where weighted conductance is defined as:
$$\psi = \frac{WCut}{\min(WVol,\ TVol - WVol)};$$
where WCut denotes the weighted cut of the group, i.e. the sum of the weights of the edges between nodes inside the group and nodes outside it; WVol denotes the weighted volume of the group, i.e. the sum of the weights of the edges incident to all nodes in the group; and TVol denotes the total weighted volume of the network, i.e. the sum of the weights of the edges incident to all nodes in the network;
Step 10.2: assign the current conductance to the initial conductance, ψ_init = ψ_curr;
Step 10.3: locally expand the current group until further expansion cannot reduce the conductance, and compute the new current conductance ψ_curr;
Step 10.4: locally contract the current group until further contraction cannot reduce the conductance, and compute the new current conductance ψ_curr;
Step 10.5: judge whether the initial conductance equals the current conductance; if they are equal, end step 10; if they are not equal, return to step 10.2.
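The loop of step 10 can be sketched with a greedy strategy. The conductance function follows the definition given in step 10.1; the way nodes are chosen during expansion and contraction (one node at a time, always the move that lowers the conductance most) is an assumption, since the text does not specify it. All function names and the toy network are hypothetical.

```python
def weighted_conductance(W, group):
    """psi = WCut / min(WVol, TVol - WVol) for edge weights W = {(u, v): w}."""
    cut = vol = total = 0.0
    for (u, v), w in W.items():
        total += 2 * w                      # each edge is incident to two nodes
        inside = (u in group) + (v in group)
        vol += inside * w                   # internal edges count twice
        if inside == 1:
            cut += w
    denom = min(vol, total - vol)
    return cut / denom if denom > 0 else float("inf")


def greedy_pass(W, group, candidates_of, move):
    """Apply single-node moves while each one strictly lowers the conductance."""
    while True:
        current = weighted_conductance(W, group)
        options = [move(group, n) for n in candidates_of(group)]
        options = [g for g in options if g]  # never shrink to an empty group
        if not options:
            return group
        best = min(options, key=lambda g: weighted_conductance(W, g))
        if weighted_conductance(W, best) >= current:
            return group
        group = best


def extract_group(W, seed):
    neighbors = {}
    for u, v in W:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    def frontier(group):
        return set().union(*(neighbors.get(n, set()) for n in group)) - group

    group = frozenset(seed)
    while True:
        initial = weighted_conductance(W, group)                          # step 10.2
        group = greedy_pass(W, group, frontier, lambda g, n: g | {n})      # step 10.3
        group = greedy_pass(W, group, lambda g: g, lambda g, n: g - {n})   # step 10.4
        if weighted_conductance(W, group) == initial:                     # step 10.5
            return set(group)


# Toy network (hypothetical): two triangles joined by one light edge.
W = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
     (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0, (2, 3): 0.1}
group = extract_group(W, {0, 1})
```

Starting from the seed {0, 1}, the sketch absorbs node 2 and then stops, because crossing the light edge to node 3 would raise the conductance.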
Preferably, step 12 comprises:
Step 12.1: sort all groups in the directional local group set by weighted conductance in ascending order;
Step 12.2: judge whether the group set is empty; if it is not empty, go to step 12.3; if it is empty, go to step 12.6;
Step 12.3: take out the first group in the directional local group set;
Step 12.4: judge whether the significance of the group taken out is less than the significance threshold; if it is, add the group to the removal set; if it is not, judge whether the group belongs to the removal set: if it does not, add the group to the retained set and go to step 12.5; if it does, return to step 12.2;
Step 12.5: check each group ranked after the group taken out in the directional local group set, i.e. judge whether the overlap between the group taken out and each later group is greater than the overlap threshold; if it is, add the later group to the removal set; if it is less than or equal to the overlap threshold, the later group stays in the directional local group set;
Step 12.6: form the new directional local group set from all groups in the retained set.
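The pruning of step 12 can be sketched as follows. Significance is computed as described in step 12 of the summary (internal edge weight sum divided by the weight sum of all edges). Measuring overlap as the number of shared nodes is an assumption, as are the function name, the parameter names, and the toy inputs; the conductance values are supplied by hand here.

```python
def prune_groups(groups, conductance, W, min_significance, max_overlap):
    """Return the retained groups, ordered by ascending conductance."""
    total = sum(W.values())

    def significance(group):
        internal = sum(w for (u, v), w in W.items() if u in group and v in group)
        return internal / total

    # Step 12.1: process groups from lowest to highest conductance.
    order = sorted(range(len(groups)), key=lambda i: conductance[i])
    removed, kept = set(), []
    for pos, i in enumerate(order):
        # Step 12.4: drop insignificant groups and groups already marked.
        if significance(groups[i]) < min_significance:
            removed.add(i)
            continue
        if i in removed:
            continue
        kept.append(groups[i])
        # Step 12.5: mark later groups that overlap too much with this one.
        for j in order[pos + 1:]:
            if len(groups[i] & groups[j]) > max_overlap:
                removed.add(j)
    return kept


# Toy inputs (hypothetical): two triangles joined by one light edge.
W = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
     (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0, (2, 3): 0.1}
groups = [{0, 1, 2}, {0, 1, 2, 3}, {3, 4, 5}, {2, 3}]
conductance = [0.02, 0.5, 0.03, 0.9]
kept = prune_groups(groups, conductance, W, min_significance=0.1, max_overlap=2)
```

The group {0, 1, 2, 3} is dropped as a repeat of {0, 1, 2}, and {2, 3} is dropped because its single internal edge carries too little weight.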
Compared with the prior art, the present invention has the following beneficial effects:
1. With the directional local group discovery method provided by the invention, only one model node related to the target group needs to be provided; the target is then identified automatically by inferring the set of attribute importance weight vectors around this node.
2. With the directional local group discovery method provided by the invention, the network is re-weighted based on the inferred attribute importance weight vectors, which highlights the edge weights inside target groups; the edges inside target groups are extracted to form group seeds, which are locally expanded into target groups.
3. With the directional local group discovery method provided by the invention, the directional local group set is post-processed, which effectively removes the unimportant and repeated groups in it.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments given with reference to the following drawings:
Fig. 1 is the flow chart of the directional local group discovery method provided by the invention;
Fig. 2 compares the effectiveness of directional group discovery between the present invention and several existing methods, where Fig. 2(a) shows performance versus the mixing parameter, Fig. 2(b) shows performance versus the number of attributes, Fig. 2(c) shows performance versus the number of groups in a subspace, and Fig. 2(d) shows performance versus the number of network nodes;
Fig. 3 compares the algorithm running time between the present invention and several existing methods, where Fig. 3(a) shows running time versus the number of network edges and Fig. 3(b) shows running time versus the number of attributes.
Detailed description of the embodiments
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the invention, but do not limit it in any form. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the inventive concept, and these all fall within the protection scope of the invention.
To illustrate the technical solution of the invention more clearly, specific embodiments are described below:
The directional local group discovery method provided by the invention comprises the following steps:
Step S1, establish the adjacency matrix A and the attribute matrix B of the network to be analyzed: number all nodes of the network consecutively, starting from 1; element A_ij of the adjacency matrix A represents the structure weight of edge (v_i, v_j), and a value of 0 means that no edge exists between the corresponding nodes; build the attribute matrix B, in which element B_ip represents the value of the p-th attribute of the i-th node;
Step S2, infer the set of attribute importance weight vectors from a model node v_p that reflects the directional target,
where t is the number of attributes and SD_q is the similarity of the model node set on attribute f_q;
Step S2 is specifically:
Step S21, randomly sample |P_R| node pairs to form the random node pair set;
Step S22, for each attribute f_q, compute the sum of squared value differences over all random node pairs, RSum_q, and compute the normalization factor RSum_q/|P_R|;
Step S23, extract the neighborhood network of the model node and partition it into the neighborhood group set NCS(v_p);
Step S24, judge whether the neighborhood group set is empty; if not, go to step S25; otherwise end step S2;
Step S25, take one neighborhood group NC_k out of the set and judge whether its size is greater than CS_l; if so, go to step S26; otherwise return to step S24;
Step S26, choose CS_l nodes in the neighborhood group to form the model node set; the node pairs formed by every two nodes of the model node set make up the similar node pair set;
Step S27, for each attribute f_q, compute the sum of squared value differences over all similar node pairs, SSum_q;
Step S28, compute the similarity of the model node set on attribute f_q;
Step S29, compute the importance weight of each attribute f_q;
Step S3, judge whether the weight vector set is empty; if not, go to step S4; otherwise go to step S13;
Step S4, take one weight vector out of the weight vector set;
Step S5, re-weight the network edge weights W = {W_ij} based on this weight vector:
$$W_{ij} = \frac{1}{\frac{SED_{ij}}{AvgD} + \frac{\gamma}{AvgA}} \times A_{ij}, \quad \forall (v_i, v_j) \in \mathcal{E},$$
where W_ij denotes the weight of edge (v_i, v_j), SED_ij denotes the attribute distance of edge (v_i, v_j), AvgD denotes the average attribute distance over all edges, A_ij denotes the structure weight of edge (v_i, v_j), a value of 0 meaning that the edge does not exist, AvgA denotes the average structure distance over all edges, and γ denotes the balance parameter that controls the relative importance of attributes and structure;
Step S5 is specifically:
Step S51, compute the attribute distance of every edge;
Step S52, compute the combined weight of every edge: $$W_{ij} = \frac{1}{\frac{SED_{ij}}{AvgD} + \frac{\gamma}{AvgA}} \times A_{ij}, \quad \forall (v_i, v_j) \in \mathcal{E};$$
Step S6, extract the edges with significantly large weights from the re-weighted network to build the group seed set SeedSet_k;
Step S6 is specifically:
Step S61, sort all edges by weight in descending order to form the sorted edge set;
Step S62, take the first Size_BS edges of the sorted edge set to form the bootstrap set BS;
Step S63, fit a normal distribution whose parameters are the mean and variance of the edge weights in the bootstrap set;
Step S64, take out the edge at the front of the sorted edge set;
Step S65, test whether this edge conforms to the normal distribution; if it does, add it to the bootstrap set, update the mean of the normal distribution with the mean edge weight of the new bootstrap set, and return to step S64; if it does not, go to step S66;
Step S66, induce a subgraph from all edges in the bootstrap set;
Step S67, each connected component of the subgraph is one group seed, and all connected components form the group seed set;
Step S7, judge whether the seed set is empty; if not, go to step S8; otherwise go to step S12;
Step S8, take one group seed out of the seed set;
Step S9, judge whether this group seed belongs to the visited node set visitedNodes; if not, go to step S10; otherwise return to step S7;
Step S10, locally expand this group seed to optimize weighted conductance and extract one directional local group;
Step S10 is specifically:
Step S101, take the group seed as the current group and compute the current conductance ψ_curr;
Step S102, assign the current conductance to the initial conductance, ψ_init = ψ_curr;
Step S103, locally expand the current group until further expansion cannot reduce the conductance, and compute the new current conductance ψ_curr;
Step S104, locally contract the current group until further contraction cannot reduce the conductance, and compute the new current conductance ψ_curr;
Step S105, judge whether the initial conductance equals the current conductance; if so, end step S10; otherwise return to step S102;
Step S11, add the group to the directional local group set under the weight vector, and update the visited node set;
Step S12, remove the unimportant and repeated groups from the directional local group set under this weight vector;
Step S12 is specifically:
Step S121, sort all groups in the group set by conductance in ascending order;
Step S122, judge whether the group set is empty; if not, go to step S123; otherwise go to step S126;
Step S123, take out one group in this order;
Step S124, judge whether the significance of the group taken out is less than the significance threshold; if it is, add the group to the removal set; if it is not, judge whether the group belongs to the removal set: if it does not, add the group to the retained set and go to step S125; if it does, return to step S122;
Step S125, check each group ranked after the group taken out in the directional local group set, i.e. judge whether the overlap between the group taken out and each later group is greater than the overlap threshold; if it is, add the later group to the removal set; if it is less than or equal to the overlap threshold, the later group stays in the directional local group set;
Step S126, the groups in the retained set form the new group set;
Step S13, output all group sets.
To make the technical problem to be solved, the technical solution, and the advantages of the present embodiment clearer, the embodiment is described in detail below with reference to the drawings.
As shown in Fig. 1, the directional local group discovery method provided by the present embodiment comprises the following steps:
Step S1, establish the adjacency matrix A and the attribute matrix B of the network to be analyzed: number all nodes of the network consecutively, starting from 1; build the square adjacency matrix A, in which element A_ij represents the structure weight of edge (v_i, v_j) and a value of 0 means that no edge exists between the corresponding nodes; build the attribute matrix B, in which element B_ip represents the value of the p-th attribute of the i-th node.
Step S2, infer the set of attribute importance weight vectors from a model node v_p that reflects the directional target,
where t is the number of attributes and SD_q is the similarity of the model node set on attribute f_q.
Step S6, extract the edges with significantly large weights from the re-weighted network to build the group seed set SeedSet_k: sort all edges by weight in descending order to form the sorted edge set; take the first Size_BS edges of the sorted edge set to form the bootstrap set BS; fit a normal distribution whose parameters are the mean and variance of the edge weights in the bootstrap set; test the edges of the sorted edge set one by one against this normal distribution; if an edge conforms, add it to the bootstrap set and update the mean of the normal distribution with the mean edge weight of the new bootstrap set; if it does not conform, stop testing; induce a subgraph from all edges in the bootstrap set; each connected component of the subgraph is one group seed, and all connected components form the group seed set.
Step S7, judge whether the seed set is empty; if not, go to step S8; otherwise go to step S12.
Step S8, take one group seed out of the seed set.
Step S9, judge whether this group seed belongs to the visited node set visitedNodes; if not, go to step S10; otherwise return to step S7.
Step S10, locally expand this group seed to optimize weighted conductance and extract one directional local group: take the group seed as the current group and compute the current conductance ψ_curr; assign the current conductance to the initial conductance, ψ_init = ψ_curr; locally expand the current group until further expansion cannot reduce the conductance, and compute the new current conductance ψ_curr; locally contract the current group until further contraction cannot reduce the conductance, and compute the new current conductance ψ_curr; judge whether the initial conductance equals the current conductance; if so, end step S10; otherwise return and continue expanding the group.
Step S11, add the group to the directional local group set under the weight vector, and update the visited node set.
Step S12, remove the unimportant and repeated groups from the directional local group set under this weight vector: sort all groups in the group set by conductance in ascending order; judge whether the group set is empty; if it is not empty, take out the groups one at a time in ascending order of conductance and judge whether the significance of each is less than the significance threshold; if so, add the group to the removal set; otherwise judge whether it belongs to the removal set; if not, add the group to the retained set and judge whether its overlap with each group ranked after it is greater than the overlap threshold, adding each such later group to the removal set; iterate this process until every group has been taken out of the original group set; the groups in the retained set form the new group set.
Step S13, output all group sets.
The validity of the present embodiment can be illustrated further by emulation experiment below. It should be noted that, in experiment, the parameter of application does not affect the generality of the present invention.
1) simulated conditions:
CPUIntelI7-3770S3.10GHz, RAM16.00GB, operating system Windows10, software Matlab 2013.
2) content is emulated:
Choose synthetic attribute network to test, the LFR benchmark network using Lancichinetti and Fortunato to propose generates the data set with different mixing parameter �� and different scales n, the degree of mixing of hybrid parameter �� net control, it is worth more big, mixture of networks degree is more big, more difficult accurate discovery colony. On each LFR benchmark network, for the attached length of each node is AN attribute vector, generate numerical value attribute (num) network, two-value property (bin) network and absolute value attribute (cate) network respectively. Assuming that exists two attribute subspace, every sub spaces has 5 important attribute, has NCS directed colony.
The present embodiment represents with TLCD in emulation experiment.
Other colony's discover method of the present embodiment and 4 is carried out emulation contrast. These 4 methods are as follows, the Louvian method proposed in " Fastunfoldingofcommunitiesinlargenetworks " that the people such as Vincent delivered on " JournalofStatisticalMechanics " in 2008, the method only uses network topology information; The PICS method proposed in " the Pics:Parameter-freeidentificationofcohesivesubgroupsinla rgeattributedgraphs " that the people such as Akoglu delivered on " SDM " in 2012, the method is the overall space method simultaneously using structural information and attribute information; BAGC method that the people such as Xu published an article in international conference " SIGMOD " in 2012 in " Amodel-basedapproachtoattributedgraphclustering " and propose, the method is the overall space method simultaneously using topology information and attribute information; " Focusedclusteringandoutlierdetectioninlargeattributedgra phs " middle FocusCO method proposed that the people such as Perozzi delivered in 2014 at " SIGKDD ", the method is the attribute subspace method simultaneously using topology information and attribute information.
Emulation experiment validity results of property is as shown in Fig. 2 (a)��Fig. 2 (d), and the performance of PICS is worst in all cases, and when major part, the performance the 2nd of BAGC is poor, which show the shortcoming of total space method. Performance is all best in all cases for TLCD, and FocusCO is then relatively unstable and poor. Working time, result was as shown in Fig. 3 (a)��Fig. 3 (b), and PICS needs maximum working times in all cases. The working time of BAGC is shorter when network is less, but increases along with the increase of network. Only depending primarily on expansion colony's seed the working time of TLCD and FocusCO, their time curve is steadily approximate.
The directional local group discovery method provided by this embodiment can be applied to networks that carry node attribute information in order to discover the group structure around a specific target, and it is suitable for a variety of social applications. The embodiment infers attribute-importance weight vectors from an exemplar node that reflects the directional target; re-weights the network edges based on each weight vector; extracts the edges with significantly large weights from the re-weighted network to form group seeds; locally expands the group seeds to optimize the weighted conductance and extract directional local groups; and removes unimportant and duplicate groups from the extracted directional local groups.
Specific embodiments of the invention have been described above. It should be understood that the invention is not limited to the particular embodiments described; those skilled in the art may make various variations or modifications within the scope of the claims without affecting the substance of the invention.

Claims (7)

1. A directional local group discovery method, characterized in that it comprises the following steps:
Step 1: build the adjacency matrix A and the attribute matrix B of the network to be analyzed;
Step 2: the user provides an exemplar node v_p that reflects the directional target, and the set of attribute-importance weight vectors in the structural neighborhood of this node is inferred from it;
Step 3: determine whether the weight vector set is empty; if it is not empty, perform step 4; if it is empty, perform step 13;
Step 4: take one weight vector out of the weight vector set;
Step 5: re-weight the network based on said weight vector to obtain the combined weight of each network edge;
Step 6: extract the edges with significantly large weights from the re-weighted network, and build a group seed set from said edges;
Step 7: determine whether the group seed set is empty; if it is not empty, perform step 8; if it is empty, perform step 12;
Step 8: take one group seed out of the group seed set;
Step 9: determine whether said group seed belongs to the visited node set; if it does not, perform step 10; if it does, return to step 7;
Step 10: locally expand said group seed until the weighted conductance of the group represented by the seed is minimal, at which point a directional local group is obtained; the weighted conductance of a group is defined as the ratio of the group's weighted cut to its weighted volume;
Step 11: add the directional local group to the directional local group set under said weight vector, and update the visited node set;
Step 12: remove the unimportant and duplicate groups from the directional local group set under said weight vector; an unimportant group is one for which the ratio of the sum of its internal edge weights to the sum of all edge weights of the network is below a significance threshold, and a duplicate group is one whose intersection with a group already in the group set is larger than a duplication threshold;
Step 13: output all directional local group sets.
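The control flow of steps 3 to 13 can be sketched as a pair of nested loops. The following Python sketch is illustrative only: the function name `discover_groups` and the callables `reweight`, `extract_seeds`, `expand` and `prune` are hypothetical placeholders for steps 5, 6, 10 and 12. It also assumes that a seed "belongs to" the visited node set when all of its nodes have been visited, and that the visited node set is reset for each weight vector; the claim does not state either point.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]
Weights = Dict[Edge, float]


def discover_groups(
    weight_vectors: Iterable[List[float]],
    reweight: Callable[[List[float]], Weights],
    extract_seeds: Callable[[Weights], List[Set[int]]],
    expand: Callable[[Set[int], Weights], Set[int]],
    prune: Callable[[List[Set[int]], Weights], List[Set[int]]],
) -> List[List[Set[int]]]:
    """Return one pruned group set per weight vector (steps 3 to 13)."""
    all_group_sets = []
    for beta in weight_vectors:                  # steps 3-4: next weight vector
        W = reweight(beta)                       # step 5: re-weighted edges
        visited: Set[int] = set()                # assumption: reset per vector
        groups: List[Set[int]] = []
        for seed in extract_seeds(W):            # steps 6-8: iterate over seeds
            if seed <= visited:                  # step 9 (assumed subset test)
                continue
            group = expand(seed, W)              # step 10: local expansion
            groups.append(group)                 # step 11: store the group
            visited |= group                     #          and mark its nodes
        all_group_sets.append(prune(groups, W))  # step 12: drop weak/duplicate
    return all_group_sets                        # step 13
```

Passing the individual steps in as callables keeps the loop structure visible without committing to any particular implementation of them.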
2. The directional local group discovery method according to claim 1, characterized in that the adjacency matrix A in step 1 encodes the network structure information: any element A_ij of the matrix represents the topological weight of edge (v_i, v_j), and a value of 0 indicates that no edge exists between the corresponding nodes; the attribute matrix B encodes the network attribute information: any element B_ip of the attribute matrix B represents the value of the p-th attribute of the i-th node.
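As a small illustration of the two matrices, the sketch below builds A and B for a toy network. The node count, the edge list and the attribute values are made up for the example, and the network is assumed to be undirected, which the claim does not state.

```python
# Hypothetical toy network: 4 nodes, each with 2 attributes.
n = 4
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 2.0)]  # (i, j, topological weight)

# Adjacency matrix A: A[i][j] is the topological weight of edge (v_i, v_j);
# a value of 0 means that no edge exists between the two nodes.
A = [[0.0] * n for _ in range(n)]
for i, j, w in edges:
    A[i][j] = w
    A[j][i] = w  # assumption: undirected network, so A is symmetric

# Attribute matrix B: B[i][p] is the value of the p-th attribute of node i.
B = [
    [0.1, 5.0],
    [0.2, 4.0],
    [0.9, 4.5],
    [1.0, 0.5],
]


def has_edge(A, i, j):
    """True when an edge exists between node i and node j."""
    return A[i][j] != 0
```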
3. The directional local group discovery method according to claim 1, characterized in that step 2 comprises: inferring the set of attribute-importance weight vectors from an exemplar node v_p that reflects the directional target, wherein:
In the formula, the symbols denote the k-th weight vector and the weight of the q-th attribute in a given weight vector; q is the attribute index, t is the number of attributes, and SD_q is the similarity of the exemplar node set on the q-th attribute;
Specifically, step 2 comprises:
Step 2.1: randomly sample |P_R| node pairs from the network to form the random node pair set, where |P_R| is the number of node pairs in the random node pair set;
Step 2.2: compute RSum_q, the sum of squared differences in the value of attribute q over all random node pairs, and compute the normalization factor of attribute q as RSum_q/|P_R|;
Step 2.3: extract the neighborhood network of the exemplar node, i.e. the network formed by all nodes connected to the exemplar node, and partition the neighborhood network into the neighborhood group set NCS(v_p);
Step 2.4: determine whether the neighborhood group set is empty; if it is not empty, perform step 2.5; otherwise end step 2;
Step 2.5: take one neighborhood group NC_k out of the neighborhood group set and determine whether its number of nodes is greater than CS_l; if so, perform step 2.6; otherwise return to step 2.4;
Step 2.6: randomly choose CS_l nodes from said neighborhood group to form the exemplar node set; the pairs formed by every two nodes of the exemplar node set make up the similar node pair set;
Step 2.7: compute SSum_q, the sum of squared differences in the value of attribute q over all similar node pairs;
Step 2.8: compute the similarity of the exemplar node set on the q-th attribute;
Step 2.9: compute the importance weight of the q-th attribute.
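The pair-based statistics of steps 2.1, 2.2, 2.6 and 2.7 can be sketched as follows. The function names are hypothetical. The sketch deliberately stops before steps 2.8 and 2.9, because the formulas for the similarity SD_q and the importance weight are not reproduced in this text.

```python
import random
from itertools import combinations


def sum_sq_diff(B, pairs, q):
    """Sum of squared differences of attribute q over the given node pairs."""
    return sum((B[i][q] - B[j][q]) ** 2 for i, j in pairs)


def sample_random_pairs(n, num_pairs, rng):
    """Step 2.1: draw num_pairs random pairs of distinct nodes out of n nodes."""
    return [tuple(rng.sample(range(n), 2)) for _ in range(num_pairs)]


def normalization_factor(B, random_pairs, q):
    """Step 2.2: RSum_q divided by the number of random pairs |P_R|."""
    return sum_sq_diff(B, random_pairs, q) / len(random_pairs)


def sample_exemplar_set(neighborhood_group, cs_l, rng):
    """Step 2.6, first half: choose CS_l nodes at random from the group."""
    return rng.sample(sorted(neighborhood_group), cs_l)


def similar_pairs(exemplar_nodes):
    """Step 2.6, second half: every unordered pair of exemplar nodes."""
    return list(combinations(exemplar_nodes, 2))


# Usage on a made-up attribute matrix with 3 nodes and 2 attributes.
B_demo = [[0.0, 1.0], [1.0, 1.0], [3.0, 1.0]]
rng = random.Random(0)
pairs_r = sample_random_pairs(len(B_demo), 4, rng)
factor_q0 = normalization_factor(B_demo, pairs_r, q=0)
ssum_q0 = sum_sq_diff(B_demo, similar_pairs([0, 1, 2]), q=0)  # step 2.7
```

Comparing SSum_q against the normalization factor is what lets an attribute on which the exemplar nodes agree unusually well stand out from the network-wide baseline.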
4. The directional local group discovery method according to claim 1, characterized in that step 5 comprises:
Step 5.1: compute the attribute distance of every edge in the network;
In the formula: B_i is the attribute vector of the i-th node, B_j is the attribute vector of the j-th node, the diagonal matrix has the weight vector on its diagonal, and (B_i - B_j)^T is the transpose of the difference between the attribute vectors of the i-th node and the j-th node;
Step 5.2: compute the weight of every edge, i.e. re-weight the network edge weights W = {W_ij} based on said weight vector:
$$W_{ij} = \frac{1}{SED_{ij}/AvgD} + \frac{\gamma}{AvgA} \times A_{ij}, \quad \forall (v_i, v_j) \in \mathcal{E},$$
In the formula: v_i is node i, v_j is node j, (v_i, v_j) is the edge between node i and node j, \mathcal{E} is the edge set of the network, W_ij is the weight of edge (v_i, v_j), SED_ij is the attribute distance of edge (v_i, v_j), and AvgD is the average attribute distance over all edges; A_ij is the structural weight of edge (v_i, v_j), where a value of 0 means that the edge does not exist; AvgA is the average structural distance over all edges, and γ is the balance parameter that controls the relative importance of attributes and structure.
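A minimal sketch of the re-weighting in steps 5.1 and 5.2 is given below. Several points are assumptions, not statements of the claim: the attribute distance is taken as the weighted quadratic form (B_i - B_j)^T diag(beta) (B_i - B_j) without a square root, because the formula itself is not reproduced in this text; AvgA is taken as the mean of A_ij over the edges; the network is treated as undirected; and a small `eps` guards against a zero distance. The names `attribute_distance`, `reweight_edges`, `beta` and `eps` are hypothetical.

```python
def attribute_distance(b_i, b_j, beta):
    """Weighted quadratic form (B_i - B_j)^T diag(beta) (B_i - B_j).

    Assumption: no square root is applied.
    """
    return sum(w * (x - y) ** 2 for w, x, y in zip(beta, b_i, b_j))


def reweight_edges(A, B, beta, gamma, eps=1e-12):
    """Return {(i, j): W_ij} for every edge with i < j (undirected assumption)."""
    n = len(A)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if A[i][j] != 0]
    if not edges:
        return {}
    # Step 5.1: attribute distance of every edge, and its average AvgD.
    sed = {(i, j): attribute_distance(B[i], B[j], beta) for i, j in edges}
    avg_d = sum(sed.values()) / len(edges)
    # Assumption: AvgA is the mean structural weight over all edges.
    avg_a = sum(A[i][j] for i, j in edges) / len(edges)
    # Step 5.2: W_ij = 1 / (SED_ij / AvgD) + (gamma / AvgA) * A_ij.
    W = {}
    for i, j in edges:
        attr_term = avg_d / max(sed[(i, j)], eps)  # eps avoids division by zero
        struct_term = gamma / avg_a * A[i][j]
        W[(i, j)] = attr_term + struct_term
    return W
```

Under these assumptions, an edge whose endpoints are close in the weighted attribute subspace receives a large first term, while γ scales how much the original topological weight contributes.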
5. The directional local group discovery method according to claim 1, characterized in that step 6 comprises: extracting the edges with significantly large weights from the re-weighted network and building the group seed set SeedSet_k;
Specifically, step 6 comprises:
Step 6.1: sort all edges by weight in descending order to form the sorted edge set;
Step 6.2: take the first Size_BS edges of the sorted edge set to form the bootstrap set BS;
Step 6.3: fit a normal distribution, using the mean and variance of the edge weights in the bootstrap set as its parameters;
Step 6.4: take out the edge at the front of the sorted edge set;
Step 6.5: determine whether the weight of said edge conforms to said normal distribution; if it does, add the edge to the bootstrap set, update the mean of the normal distribution with the mean edge weight of the new bootstrap set, and return to step 6.4; if it does not, perform step 6.6;
Step 6.6: induce a subgraph from all edges in the bootstrap set;
Step 6.7: each connected component of the subgraph is one group seed, and all connected components together form the group seed set.
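Steps 6.1 to 6.7 can be sketched as follows. The claim does not say how an edge weight is judged to conform to the normal distribution; this sketch assumes the weight conforms when it lies no more than `k` standard deviations below the current mean. The function name `extract_seeds` and the parameter `k` are hypothetical.

```python
import statistics


def extract_seeds(W, size_bs, k=2.0):
    """W is {(i, j): weight}; returns the group seeds as a list of node sets."""
    # Step 6.1: sort edges by weight, largest first.
    ordered = sorted(W.items(), key=lambda item: item[1], reverse=True)
    # Step 6.2: the first size_bs edges form the bootstrap set.
    bootstrap = ordered[:size_bs]
    weights = [w for _, w in bootstrap]
    # Step 6.3: mean and standard deviation of the bootstrap weights.
    mean = statistics.fmean(weights)
    std = statistics.pstdev(weights)
    # Steps 6.4-6.5: keep absorbing edges while their weight still conforms.
    for edge, w in ordered[size_bs:]:
        if w < mean - k * std:            # assumed conformity test
            break
        bootstrap.append((edge, w))
        weights.append(w)
        mean = statistics.fmean(weights)  # only the mean is updated (step 6.5)
    # Step 6.6: subgraph induced by the bootstrap edges.
    adj = {}
    for (i, j), _ in bootstrap:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    # Step 6.7: each connected component is one group seed.
    seeds, seen = [], set()
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in component:
                component.add(v)
                stack.extend(adj[v] - component)
        seen |= component
        seeds.append(component)
    return seeds
```

A larger `k` admits more edges into the bootstrap set and therefore tends to produce fewer, larger seeds.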
6. The directional local group discovery method according to claim 1, characterized in that step 10 comprises:
Step 10.1: take the group seed as the current group and compute the weighted conductance ψ_curr of the current group, where the weighted conductance is defined as:
$$\psi = \frac{WCut}{\min(WVol,\ TVol - WVol)};$$
In the formula: WCut is the weighted cut of the group, i.e. the sum of the weights of the edges between nodes inside the group and nodes outside it; WVol is the weighted volume of the group, i.e. the sum of the weights of the edges incident to all nodes in the group; TVol is the total weighted volume of the network, i.e. the sum of the weights of the edges incident to all nodes in the network;
Step 10.2: assign the current conductance value to the initial conductance, ψ_init = ψ_curr;
Step 10.3: locally expand the current group until further expansion cannot reduce the conductance, and compute the new current conductance ψ_curr;
Step 10.4: locally contract the current group until further contraction cannot reduce the conductance, and compute the new current conductance ψ_curr;
Step 10.5: determine whether the initial conductance equals the current conductance; if they are equal, end step 10; if they are not equal, return to step 10.2.
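The alternating expansion and contraction of steps 10.1 to 10.5 can be sketched with a simple greedy strategy. The claim does not specify how a node is chosen at each move; this sketch assumes that the single node giving the lowest conductance is added or removed at each move. The names `build_adjacency`, `weighted_conductance` and `refine_group` are hypothetical, and the network is treated as undirected.

```python
def build_adjacency(W):
    """Turn {(i, j): weight} into {node: {neighbor: weight}} (undirected)."""
    adj = {}
    for (i, j), w in W.items():
        adj.setdefault(i, {})[j] = w
        adj.setdefault(j, {})[i] = w
    return adj


def weighted_conductance(group, adj, total_vol):
    """psi = WCut / min(WVol, TVol - WVol)."""
    w_cut = sum(w for u in group for v, w in adj[u].items() if v not in group)
    w_vol = sum(w for u in group for w in adj[u].values())
    denom = min(w_vol, total_vol - w_vol)
    return w_cut / denom if denom > 0 else float("inf")


def refine_group(seed, adj):
    """Expand and contract a seed until its conductance stops improving."""
    total_vol = sum(w for nbrs in adj.values() for w in nbrs.values())
    group = set(seed)
    psi_curr = weighted_conductance(group, adj, total_vol)    # step 10.1
    while True:
        psi_init = psi_curr                                   # step 10.2
        while True:                                           # step 10.3
            frontier = {v for u in group for v in adj[u]} - group
            scored = [(weighted_conductance(group | {v}, adj, total_vol), v)
                      for v in frontier]
            if not scored or min(scored)[0] >= psi_curr:
                break
            psi_curr, best = min(scored)
            group.add(best)
        while len(group) > 1:                                 # step 10.4
            scored = [(weighted_conductance(group - {v}, adj, total_vol), v)
                      for v in group]
            if min(scored)[0] >= psi_curr:
                break
            psi_curr, best = min(scored)
            group.remove(best)
        if psi_curr == psi_init:                              # step 10.5
            return group, psi_curr
```

Because every accepted move strictly lowers the conductance and the number of node subsets is finite, the loop terminates.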
7. The directional local group discovery method according to claim 1, characterized in that step 12 comprises:
Step 12.1: sort all groups in the directional local group set by weighted conductance in ascending order;
Step 12.2: determine whether the group set is empty; if it is not empty, perform step 12.3; if it is empty, perform step 12.6;
Step 12.3: take out the group at the front of the directional local group set;
Step 12.4: determine whether the importance of said group is below the significance threshold; if it is, add the group to the removal set; if it is not, determine whether the group belongs to the removal set: if it does not, add the group to the retained set and perform step 12.5; if it does, return to step 12.2;
Step 12.5: examine every group that comes after said group in the directional local group set, i.e. determine whether the overlap between said group and each later group is greater than the overlap threshold; if it is, add the later group to the removal set; if it is less than or equal to the overlap threshold, the later group remains in the directional local group set;
Step 12.6: form the new directional local group set from all groups in the retained set.
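The pruning of steps 12.1 to 12.6 can be sketched as follows. The importance of a group follows step 12 of claim 1 (internal edge weight sum divided by the sum of all edge weights). The overlap measure is not defined in the claims; this sketch assumes it is the intersection size divided by the size of the smaller group. The names `internal_weight` and `prune_groups` are hypothetical.

```python
def internal_weight(group, W):
    """Sum of the weights of edges whose two endpoints both lie in the group."""
    return sum(w for (i, j), w in W.items() if i in group and j in group)


def prune_groups(groups, psi, W, sig_threshold, overlap_threshold):
    """groups: list of node sets; psi: conductance of each group, same order."""
    total = sum(W.values())
    # Step 12.1: process groups from lowest to highest weighted conductance.
    order = sorted(range(len(groups)), key=lambda idx: psi[idx])
    removed, retained = set(), []
    for pos, idx in enumerate(order):                    # steps 12.2-12.3
        g = groups[idx]
        # Step 12.4: unimportant groups go to the removal set.
        if internal_weight(g, W) / total < sig_threshold:
            removed.add(idx)
            continue
        if idx in removed:                               # already a duplicate
            continue
        retained.append(g)
        # Step 12.5: mark later groups that overlap too much with this one.
        for later in order[pos + 1:]:
            h = groups[later]
            overlap = len(g & h) / min(len(g), len(h))   # assumed measure
            if overlap > overlap_threshold:
                removed.add(later)
    return retained                                      # step 12.6
```

Processing groups in ascending order of conductance means that, of two heavily overlapping groups, the one with the lower conductance is the one kept.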
CN201510996221.3A 2015-12-25 2015-12-25 Directional local group discovery method Pending CN105631751A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510996221.3A CN105631751A (en) 2015-12-25 2015-12-25 Directional local group discovery method

Publications (1)

Publication Number Publication Date
CN105631751A true CN105631751A (en) 2016-06-01

Family

ID=56046642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510996221.3A Pending CN105631751A (en) 2015-12-25 2015-12-25 Directional local group discovery method

Country Status (1)

Country Link
CN (1) CN105631751A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108257036A (en) * 2018-01-12 2018-07-06 西安电子科技大学 Discovery method, the Web Community's system of overlapping community are extended based on seed node
CN109523012A (en) * 2018-10-11 2019-03-26 上海交通大学 Based on Variational Solution Used coupled modes to the expression learning method of symbol directed networks
CN109523012B (en) * 2018-10-11 2021-06-04 上海交通大学 Expression learning method for symbol directed network based on variational decoupling mode
CN111325350A (en) * 2020-02-19 2020-06-23 第四范式(北京)技术有限公司 Suspicious tissue discovery system and method
CN111325350B (en) * 2020-02-19 2023-09-29 第四范式(北京)技术有限公司 Suspicious tissue discovery system and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160601