CN109063173A - A semi-supervised overlapping community discovery method based on partial label information - Google Patents


Info

Publication number
CN109063173A
Authority
CN
China
Prior art keywords
node
label
community
network
prior information
Legal status (an assumption by Google, not a legal conclusion)
Pending
Application number
CN201810953974.XA
Other languages
Chinese (zh)
Inventor
李建平
顾小丰
胡健
林思哲
杨久东
蔡京京
张建国
赖志龙
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
2018-08-21
Filing date
2018-08-21
Publication date
2018-12-21
Application filed by University of Electronic Science and Technology of China
Priority to CN201810953974.XA
Publication of CN109063173A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a semi-supervised overlapping community discovery method based on partial label information. The invention makes full use of the positive and negative label information in a network and adopts an asynchronous propagation mechanism. It exploits the prior information available in the network: part of the Must-link prior information is converted into Must-in information and the Cannot-link prior information is converted into Cannot-in information, and this information is used to guide the community discovery process. The time complexity of the invention is $O(mn\log m)$, so the invention improves the accuracy of community discovery without increasing the time complexity.

Description

A semi-supervised overlapping community discovery method based on partial label information
Technical field
The present invention relates to the technical field of community discovery methods, and in particular to a semi-supervised overlapping community discovery method based on partial label information.
Background art
In the real world, many complex systems can be described in the form of complex networks. Community structure, one of the key properties of complex networks, plays an important role in people's lives. Discovering the community structure hidden in a network in a timely and accurate way, and then analyzing the internal characteristics of the complex system, can not only guide people's production activities but also greatly help in understanding and controlling complex systems.
Traditional community discovery algorithms have not been widely adopted because of their high time complexity, the low accuracy of their partition results, and the need to specify the community scale in advance.
A traditional overlapping community discovery method is the COPRA algorithm. In the traditional COPRA algorithm, label propagation depends only on the neighbor nodes, and every neighbor propagates its labels to a node with the same probability. In reality, however, the more similar a neighbor is to the node, the greater the probability that its labels propagate to that node. Moreover, in the field of overlapping community discovery, the traditional COPRA algorithm considers only the topological structure of the network. Studies have shown, however, that real networks often contain a small amount of prior information, and ignoring it lowers the accuracy of community discovery.
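The contrast drawn above, between uniform and similarity-weighted propagation, can be illustrated with a minimal Python sketch of a single COPRA-style update. Standard COPRA averages the neighbors' belonging coefficients uniformly; the weighted variant scales each neighbor by a similarity score. The use of Jaccard similarity over closed neighborhoods is an assumption made for illustration (the text does not name a similarity measure), the function names are hypothetical, and COPRA's threshold-and-renormalize step is omitted.

```python
from collections import defaultdict


def jaccard(adj, x, y):
    """Jaccard similarity of the closed neighbourhoods of x and y (illustrative choice)."""
    a = adj[x] | {x}
    b = adj[y] | {y}
    return len(a & b) / len(a | b)


def propagate_once(adj, memberships, node, weighted):
    """Return node's new belonging coefficients after one COPRA-style update.

    adj: dict node -> set of neighbour nodes.
    memberships: dict node -> dict community -> belonging coefficient.
    weighted=False averages neighbours uniformly, as in standard COPRA;
    weighted=True scales each neighbour by its Jaccard similarity to node.
    """
    neighbours = sorted(adj[node])
    if weighted:
        raw = {y: jaccard(adj, node, y) for y in neighbours}
    else:
        raw = {y: 1.0 for y in neighbours}
    total = sum(raw.values())
    new = defaultdict(float)
    for y in neighbours:
        w = raw[y] / total  # normalised weight, so the weights sum to 1
        for c, b in memberships[y].items():
            new[c] += w * b
    return dict(new)


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
mem = {0: {"A": 0.5, "B": 0.5}, 1: {"A": 1.0}, 2: {"B": 1.0}, 3: {"B": 1.0}}
print(propagate_once(adj, mem, 0, weighted=False))  # equal split between A and B
print(propagate_once(adj, mem, 0, weighted=True))   # A favoured: node 1 is more similar
```

In the example, node 1 shares its whole closed neighborhood with node 0 while node 2 does not, so the weighted update leans toward node 1's label A.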
Summary of the invention
In view of the above deficiencies in the prior art, the semi-supervised overlapping community discovery method based on partial label information provided by the present invention solves the problem that the traditional COPRA algorithm has low accuracy in community discovery.
To achieve the above purpose, the technical solution adopted by the present invention is a semi-supervised overlapping community discovery method based on partial label information, comprising the following steps:
S1. Initialize the node membership matrix (of size n × C) according to the Must-in and Must-link positive-label prior information, where n is the number of nodes in the network and C is the number of communities in the network;
S2. Convert the Must-in and Must-link positive-label prior information and the Cannot-in and Cannot-link negative-label prior information in the network into the membership-degree matrix $F_{n\times C}$ and the label existence matrix $S_{n\times C}$ of the network;
If node $v_i$ in the network has a Must-in relationship with community $c$, then $f_{ic}=1$; otherwise $f_{ic}=0$. The entries of the label existence matrix $S_{n\times C}$ are: if there is Must-in or Cannot-in information between node $v_i$ and community $c$, then $s_{ic}=a_l$; otherwise $s_{ic}=a_u$, where $0\le a_l, a_u\le 1$, $a_l\approx 1$ and $a_u\approx 0$; here $f_{ic}\in F_{n\times C}$, $s_{ic}\in S_{n\times C}$, $i=1,2,\dots,n$ and $c=1,2,\dots,C$;
S3. Iteratively execute the label propagation process according to the network membership-degree matrix. When the label membership of node $v_i$ to community $c$ has been determined, i.e. $s_{ic}=a_l$, the label membership of $v_i$ is no longer computed; otherwise, it is obtained from the label sets of the neighbor nodes;
S4. Count the number of community labels in the network from the label memberships of the nodes $v_i$; if the number changes, go to step S5, otherwise go to step S6;
S5. Assign the minimum size of each community label to $m_t$ and return to step S2;
S6. Assign the minimum size over all community labels to $m_t$;
S7. If $m_t=m_{t-1}$, terminate the algorithm; otherwise return to step S2.
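Steps S1 and S2 reduce the node-level priors to two $n\times C$ matrices. The following is a minimal Python sketch, assuming the pairwise Must-link and Cannot-link information has already been turned into Must-in and Cannot-in (node, community) pairs and using 0-based indices; the function name and the default values of a_l and a_u are hypothetical.

```python
def build_prior_matrices(n, C, must_in, cannot_in, a_l=0.99, a_u=0.01):
    """Build the membership-degree matrix F and label existence matrix S.

    must_in / cannot_in: iterables of (node, community) pairs, 0-based.
    a_l and a_u are illustrative defaults: the method only requires
    0 <= a_l, a_u <= 1 with a_l close to 1 and a_u close to 0.
    """
    F = [[0.0] * C for _ in range(n)]  # f_ic = 0 unless Must-in
    S = [[a_u] * C for _ in range(n)]  # s_ic = a_u unless a prior fixes it
    for i, c in must_in:
        F[i][c] = 1.0  # Must-in: v_i belongs to community c
        S[i][c] = a_l  # the (v_i, c) entry is now determined
    for i, c in cannot_in:
        S[i][c] = a_l  # Cannot-in: determined too, while f_ic stays 0
    return F, S


F, S = build_prior_matrices(3, 2, must_in=[(0, 1)], cannot_in=[(2, 0)])
print(F)  # [[0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
print(S)  # [[0.01, 0.99], [0.01, 0.01], [0.99, 0.01]]
```

Note that a Cannot-in entry and a Must-in entry look identical in S (both equal a_l); F is what tells them apart.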
Further, the iteration formula used in step S3 is as follows:
In the above formula, the terms are: the label membership of node $v_i$ to community $c$ after iteration $t+1$; the label membership of node $v_i$ to community $c$ after iteration $t$; $w_{ij}$, the label transition probability between nodes $v_i$ and $v_j$; the label membership between node $v_i$ and node $v_j$ after iteration $t+1$; $w_{ik}$, the label transition probability between node $v_i$ and the node set $v_k$; the label membership between node $v_i$ and the node set $v_k$ after iteration $t$; and $N(i)$, the set of neighbor nodes of $v_i$.
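The formula itself is not reproduced here. Its definitions (values after iteration $t+1$ for some neighbors and after iteration $t$ for others) are consistent with an asynchronous, Gauss-Seidel-style update. The Python sketch below assumes the new membership is simply the $w$-weighted sum of the neighbors' memberships, taking whichever value each neighbor currently holds, and that entries with $s_{ic}=a_l$ are left untouched as in step S3. This is an assumed form, not the patent's exact formula; the function name is hypothetical and normalization is omitted.

```python
def async_sweep(B, S, W, a_l):
    """One asynchronous (in-place) label-membership sweep over all nodes.

    B: n x C list of memberships b_ic, updated in place.
    S: n x C label existence matrix; entries equal to a_l are fixed.
    W: dict i -> dict j -> w_ij over the neighbours N(i); rows sum to 1.
    """
    n, C = len(B), len(B[0])
    for i in range(n):
        for c in range(C):
            if S[i][c] == a_l:
                continue  # membership already determined by prior information
            # Neighbours visited earlier in this sweep already hold their
            # iteration t+1 values; the others still hold iteration t values.
            B[i][c] = sum(w * B[j][c] for j, w in W.get(i, {}).items())
    return B


# Node 1 listens to node 0, node 2 listens to node 1; node 0 is fixed in community 0.
B = [[1.0], [0.0], [0.0]]
S = [[0.99], [0.01], [0.01]]
W = {1: {0: 1.0}, 2: {1: 1.0}}
print(async_sweep(B, S, W, a_l=0.99))  # [[1.0], [1.0], [1.0]]
```

Because B is updated in place, node 2 already sees node 1's refreshed value within the same sweep; a synchronous update would have left node 2 at 0.0 after one pass.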
Further, the label propagation process in step S3 is as follows: for an element whose relationship between node $v_i$ and community $c$ is already determined, label propagation is no longer performed; for node pairs that have Must-link prior information but in which neither node has Must-in prior information, an open label propagation strategy is executed; for node pairs that have Cannot-link prior information but in which neither node has Cannot-in prior information, a strict label propagation strategy is executed; and for nodes without any known prior information, the probability of belonging to community $c$ is computed from their neighbor nodes.
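The case analysis above can be written as a small dispatch function. This Python sketch only decides which pairwise rule applies to a node pair: the text does not define what the open and strict strategies do, so their behavior is not modeled, and entries already fixed by $S_{n\times C}$ are assumed to have been skipped beforehand. The function name and argument layout are hypothetical.

```python
def pair_strategy(i, j, must_link, cannot_link, must_in, cannot_in):
    """Say which pairwise propagation rule applies to nodes v_i and v_j.

    must_link / cannot_link: sets of frozenset({i, j}) node pairs.
    must_in / cannot_in: dict node -> set of communities.
    Returns "open", "strict", or None (no pairwise rule; the node falls
    back to computing its memberships from its neighbours).
    """
    pair = frozenset((i, j))
    if pair in must_link and not must_in.get(i) and not must_in.get(j):
        return "open"    # Must-link, but neither node has Must-in
    if pair in cannot_link and not cannot_in.get(i) and not cannot_in.get(j):
        return "strict"  # Cannot-link, but neither node has Cannot-in
    return None


ml = {frozenset((0, 1))}
cl = {frozenset((1, 2))}
print(pair_strategy(0, 1, ml, cl, {}, {}))          # open
print(pair_strategy(2, 1, ml, cl, {}, {}))          # strict
print(pair_strategy(0, 1, ml, cl, {0: {"A"}}, {}))  # None
```

Storing pairs as frozensets makes the check independent of argument order, which matches the symmetric wording of Must-link and Cannot-link.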
Further, the number of community labels in step S4 is computed as follows: if nodes $v_i$ and $v_j$ have Must-link prior information but no Must-in prior information, a one-way Must-link prior-information channel is established between $v_i$ and $v_j$; if there is Cannot-link prior information between $v_i$ and $v_j$, then when the label of $v_i$ is being determined, $v_j$ is removed from its candidate label set.
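A minimal Python sketch of these two rules, assuming labels for $v_i$ are gathered from a candidate set of source nodes (its neighbors, plus any Must-link channel, minus Cannot-link partners). The text does not say which endpoint of a one-way Must-link channel is the source, so the sketch arbitrarily orients each channel from the lower-indexed node to the higher-indexed one; that orientation and the function name are hypothetical.

```python
def candidate_sources(i, adj, must_link, cannot_link, must_in):
    """Nodes whose labels v_i may adopt under the two rules above.

    adj, must_link, cannot_link: dict node -> set of nodes.
    must_in: dict node -> set of communities.
    """
    sources = set(adj.get(i, ()))
    for j in must_link.get(i, ()):
        # Must-link without Must-in on either node: open a one-way channel.
        # Orienting it from the lower to the higher index is an assumption.
        if j < i and not must_in.get(i) and not must_in.get(j):
            sources.add(j)
    # Cannot-link partners are deleted from v_i's candidate set.
    sources -= set(cannot_link.get(i, ()))
    return sources


adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
ml = {0: {3}, 3: {0}}
cl = {1: {2}, 2: {1}}
print(candidate_sources(3, adj, ml, cl, {}))  # {0}: reached through the channel
print(candidate_sources(1, adj, ml, cl, {}))  # {0}: node 2 removed by Cannot-link
```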
The beneficial effects of the present invention are as follows: the invention makes full use of the positive and negative label information in the network and adopts an asynchronous propagation mechanism. It exploits the prior information in the network by converting part of the Must-link prior information into Must-in information and the Cannot-link prior information into Cannot-in information, and uses this information to guide the community discovery process. The time complexity of the invention is $O(mn\log m)$; thus the invention improves the accuracy of community discovery without increasing the time complexity.
Description of the drawings
Fig. 1 is a flow chart of the present invention.
Specific embodiment
Specific embodiments of the present invention are described below to help those skilled in the art understand the invention. It should be clear, however, that the invention is not limited to the scope of these embodiments. For those of ordinary skill in the art, various changes that fall within the spirit and scope of the invention as defined by the appended claims are obvious, and all inventions and creations that make use of the inventive concept are within the scope of protection.
As shown in Fig. 1, a semi-supervised overlapping community discovery method based on partial label information comprises the following steps:
S1. Initialize the node membership matrix (of size n × C) according to the Must-in (must be included) and Must-link (must be related) positive-label prior information, where n is the number of nodes in the network and C is the number of communities in the network.
First, part of the Cannot-link and Cannot-in information in the network is preprocessed:
If $(v_i,v_j)\in D_T$ and $(v_j,v_k)\in D_T$, then $(v_i,v_k)\in D_T$;
If $(v_i,v_j)\in D_T$ and $(v_j,v_k)\in D_N$, then $(v_i,v_k)\in D_N$;
If $(v_i,v_j)\in D_T$ and $v_i\in N_T$, then $v_j\in N_T$, and nodes $v_i$ and $v_j$ carry the same label;
If $(v_i,v_j)\in D_T$ and $v_i\in N_N$, then $v_j\in N_N$, and the negative labels on $v_i$ and $v_j$ are the same;
If $(v_i,v_j)\in D_N$ and $v_i\in N_T$, then $v_j\in N_N$, and the positive label on $v_i$ is the same as the negative label on $v_j$.
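The five rules amount to closing the constraints under Must-link transitivity. The following Python sketch uses union-find and assumes (the text does not define these symbols) that $D_T$ and $D_N$ are the sets of Must-link and Cannot-link node pairs and that $N_T$ and $N_N$ record each node's Must-in and Cannot-in communities. The function name is hypothetical, and a Cannot-link inside a single Must-link group (a contradiction) is simply skipped here.

```python
def close_constraints(nodes, must_link, cannot_link, must_in, cannot_in):
    """Apply the five preprocessing rules; return closed (D_T, D_N, N_T, N_N).

    must_link / cannot_link: iterables of node pairs.
    must_in / cannot_in: dict node -> set of communities.
    """
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in must_link:  # rule 1: Must-link is transitive
        parent[find(a)] = find(b)

    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)

    pos = {r: set() for r in groups}  # rule 3: Must-in shared within a group
    neg = {r: set() for r in groups}  # rule 4: Cannot-in shared within a group
    for v in nodes:
        pos[find(v)] |= must_in.get(v, set())
        neg[find(v)] |= cannot_in.get(v, set())

    cl_out = set()
    for a, b in cannot_link:  # rule 2: Cannot-link spreads over Must-link groups
        r1, r2 = find(a), find(b)
        if r1 == r2:
            continue  # contradictory constraints; left unresolved here
        cl_out |= {frozenset((x, y)) for x in groups[r1] for y in groups[r2]}
        neg[r2] |= pos[r1]  # rule 5, applied in both directions
        neg[r1] |= pos[r2]

    ml_out = {frozenset((x, y)) for g in groups.values() for x in g for y in g if x != y}
    mi_out = {v: set(pos[find(v)]) for v in nodes if pos[find(v)]}
    ci_out = {v: set(neg[find(v)]) for v in nodes if neg[find(v)]}
    return ml_out, cl_out, mi_out, ci_out


ml, cl, mi, ci = close_constraints(
    [0, 1, 2, 3, 4], must_link=[(0, 1), (1, 2)], cannot_link=[(2, 3)],
    must_in={0: {"A"}}, cannot_in={},
)
print(sorted(mi))  # [0, 1, 2]: the Must-in label spread through the group
print(ci)          # {3: {'A'}}: the Cannot-link partner may not join A
```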
S2. Convert the Must-in and Must-link positive-label prior information and the Cannot-in (cannot be included) and Cannot-link (cannot be related) negative-label prior information in the network into the membership-degree matrix $F_{n\times C}$ and the label existence matrix $S_{n\times C}$ of the network.
If node $v_i$ in the network has a Must-in relationship with community $c$, then $f_{ic}=1$; otherwise $f_{ic}=0$. The entries of the label existence matrix $S_{n\times C}$ are: if there is Must-in or Cannot-in information between node $v_i$ and community $c$, then $s_{ic}=a_l$; otherwise $s_{ic}=a_u$, where $0\le a_l, a_u\le 1$, $a_l\approx 1$ and $a_u\approx 0$; here $f_{ic}\in F_{n\times C}$, $s_{ic}\in S_{n\times C}$, $i=1,2,\dots,n$ and $c=1,2,\dots,C$.
S3. Iteratively execute the label propagation process according to the network membership-degree matrix. When the label membership of node $v_i$ to community $c$ has been determined, i.e. $s_{ic}=a_l$, the label membership of $v_i$ is no longer computed; otherwise, it is obtained from the label sets of the neighbor nodes. The iteration formula is as follows:
In the above formula, the terms are: the label membership of node $v_i$ to community $c$ after iteration $t+1$; the label membership of node $v_i$ to community $c$ after iteration $t$; $w_{ij}$, the label transition probability between nodes $v_i$ and $v_j$; the label membership between node $v_i$ and node $v_j$ after iteration $t+1$; $w_{ik}$, the label transition probability between node $v_i$ and the node set $v_k$; the label membership between node $v_i$ and the node set $v_k$ after iteration $t$; and $N(i)$, the set of neighbor nodes of $v_i$.
The label propagation process is as follows: for an element whose relationship between node $v_i$ and community $c$ is already determined, label propagation is no longer performed; for node pairs that have Must-link prior information but in which neither node has Must-in prior information, an open label propagation strategy is executed; for node pairs that have Cannot-link prior information but in which neither node has Cannot-in prior information, a strict label propagation strategy is executed; and for nodes without any known prior information, the probability of belonging to community $c$ is computed from their neighbor nodes.
S4. Count the number of community labels in the network from the label memberships of the nodes $v_i$; if the number changes, go to step S5, otherwise go to step S6. The number of community labels is computed as follows: if nodes $v_i$ and $v_j$ have Must-link prior information but no Must-in prior information, a one-way Must-link prior-information channel is established between $v_i$ and $v_j$; if there is Cannot-link prior information between $v_i$ and $v_j$, then when the label of $v_i$ is being determined, $v_j$ is removed from its candidate label set.
S5. Assign the minimum size of each community label to $m_t$ and return to step S2.
S6. Assign the minimum size over all community labels to $m_t$.
S7. If $m_t=m_{t-1}$, terminate the algorithm; otherwise return to step S2.
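Steps S4 to S7 resemble the stopping rule of the original COPRA algorithm, which tracks the smallest size reached by each label. The Python sketch below follows that reading under stated assumptions: it compares the set of labels (COPRA's convention, a slightly stronger check than comparing only their number), restarts the minima when the labels change, and otherwise keeps a per-label running minimum. The function name is hypothetical.

```python
from collections import Counter


def update_minima(prev, memberships):
    """Steps S4-S7 as a COPRA-style stopping rule; returns (m_t, stop).

    prev: m_{t-1}, a dict label -> smallest size seen so far (None at start).
    memberships: dict node -> set of community labels after this iteration.
    """
    sizes = Counter(c for labels in memberships.values() for c in labels)
    if prev is None or set(sizes) != set(prev):
        # S4 -> S5: the labels changed, so restart from the current sizes
        return dict(sizes), False
    # S6: keep, for every label, the smallest size observed so far
    m_t = {c: min(prev[c], sizes[c]) for c in sizes}
    # S7: stop once m_t equals m_{t-1}
    return m_t, m_t == prev


m, stop = update_minima(None, {0: {"A"}, 1: {"A"}, 2: {"B"}})  # {'A': 2, 'B': 1}
m, stop = update_minima(m, {0: {"A"}, 1: {"B"}, 2: {"B"}})     # {'A': 1, 'B': 1}
m, stop = update_minima(m, {0: {"A"}, 1: {"B"}, 2: {"B"}})     # unchanged, stop
print(m, stop)  # {'A': 1, 'B': 1} True
```

Using a running minimum rather than the raw sizes keeps the loop from stopping on a transient state in which labels are still oscillating between nodes.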
In the embodiments of the present invention, positive-label prior knowledge that exists in reality plays an important role in guiding community discovery; in addition, some negative-label prior information in reality plays an equally important role. For example, Cannot-link and Cannot-in information is also ubiquitous in networks and is indispensable for overlapping community discovery.

Claims (4)

1. A semi-supervised overlapping community discovery method based on partial label information, characterized by comprising the following steps:
S1. Initialize the node membership matrix (of size n × C) according to the Must-in and Must-link positive-label prior information, where n is the number of nodes in the network and C is the number of communities in the network;
S2. Convert the Must-in and Must-link positive-label prior information and the Cannot-in and Cannot-link negative-label prior information in the network into the membership-degree matrix $F_{n\times C}$ and the label existence matrix $S_{n\times C}$ of the network;
If node $v_i$ in the network has a Must-in relationship with community $c$, then $f_{ic}=1$; otherwise $f_{ic}=0$. The entries of the label existence matrix $S_{n\times C}$ are: if there is Must-in or Cannot-in information between node $v_i$ and community $c$, then $s_{ic}=a_l$; otherwise $s_{ic}=a_u$, where $0\le a_l, a_u\le 1$, $a_l\approx 1$ and $a_u\approx 0$; here $f_{ic}\in F_{n\times C}$, $s_{ic}\in S_{n\times C}$, $i=1,2,\dots,n$ and $c=1,2,\dots,C$;
S3. Iteratively execute the label propagation process according to the network membership-degree matrix. When the label membership of node $v_i$ to community $c$ has been determined, i.e. $s_{ic}=a_l$, the label membership of $v_i$ is no longer computed; otherwise, it is obtained from the label sets of the neighbor nodes;
S4. Count the number of community labels in the network from the label memberships of the nodes $v_i$; if the number changes, go to step S5, otherwise go to step S6;
S5. Assign the minimum size of each community label to $m_t$ and return to step S2;
S6. Assign the minimum size over all community labels to $m_t$;
S7. If $m_t=m_{t-1}$, terminate the algorithm; otherwise return to step S2.
2. The semi-supervised overlapping community discovery method based on partial label information according to claim 1, characterized in that the iteration formula in step S3 is as follows:
In the above formula, the terms are: the label membership of node $v_i$ to community $c$ after iteration $t+1$; the label membership of node $v_i$ to community $c$ after iteration $t$; $w_{ij}$, the attraction between nodes $v_i$ and $v_j$; the label membership between node $v_i$ and node $v_j$ after iteration $t+1$; $w_{ik}$, the attraction between node $v_i$ and the node set $v_k$; the label membership between node $v_i$ and the node set $v_k$ after iteration $t$; and $N(i)$, the set of neighbor nodes of $v_i$.
3. The semi-supervised overlapping community discovery method based on partial label information according to claim 1, characterized in that the label propagation process in step S3 is as follows: for an element whose relationship between node $v_i$ and community $c$ is already determined, label propagation is no longer performed; for node pairs that have Must-link prior information but in which neither node has Must-in prior information, an open label propagation strategy is executed; for node pairs that have Cannot-link prior information but in which neither node has Cannot-in prior information, a strict label propagation strategy is executed; and for nodes without any known prior information, the probability of belonging to community $c$ is computed from their neighbor nodes.
4. The semi-supervised overlapping community discovery method based on partial label information according to claim 1, characterized in that the number of community labels in step S4 is computed as follows: if nodes $v_i$ and $v_j$ have Must-link prior information but no Must-in prior information, a one-way Must-link prior-information channel is established between $v_i$ and $v_j$; if there is Cannot-link prior information between $v_i$ and $v_j$, then when the label of $v_i$ is being determined, $v_j$ is removed from its candidate label set.
CN201810953974.XA 2018-08-21 2018-08-21 A semi-supervised overlapping community discovery method based on partial label information Pending CN109063173A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810953974.XA CN109063173A (en) 2018-08-21 2018-08-21 A semi-supervised overlapping community discovery method based on partial label information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810953974.XA CN109063173A (en) 2018-08-21 2018-08-21 A semi-supervised overlapping community discovery method based on partial label information

Publications (1)

Publication Number Publication Date
CN109063173A (en) 2018-12-21

Family

ID=64687569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810953974.XA Pending CN109063173A (en) 2018-08-21 2018-08-21 A semi-supervised overlapping community discovery method based on partial label information

Country Status (1)

Country Link
CN (1) CN109063173A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298774A (en) * 2021-05-20 2021-08-24 Fudan University Image segmentation method and device based on dual condition compatible neural network
CN113434815A (en) * 2021-07-02 2021-09-24 China Jiliang University Community detection method based on similar and dissimilar constraint semi-supervised nonnegative matrix factorization

Citations (3)

Publication number Priority date Publication date Assignee Title
US9418142B2 (en) * 2013-05-24 2016-08-16 Google Inc. Overlapping community detection in weighted graphs
CN103729475B (en) * 2014-01-24 2016-10-26 Fuzhou University A multi-label propagation overlapping community discovery method in social networks
CN108073944A (en) * 2017-10-18 2018-05-25 Nanjing University of Posts and Telecommunications A label propagation community discovery method based on local influence


Non-Patent Citations (2)

Title
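睢世凯: "Research on semi-supervised community discovery algorithms based on local label information" (基于局部标签信息的半监督社区发现算法研究), China Master's Theses Full-text Database, Information Science and Technology *
陈俊宇: "A semi-supervised local-expansion overlapping community discovery method" (一种半监督的局部扩展式重叠社区发现方法), Journal of Computer Research and Development *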


Similar Documents

Publication Publication Date Title
Shu et al. Expansion-squeeze-excitation fusion network for elderly activity recognition
Thakoor et al. Bootstrapped representation learning on graphs
CN103678431B (en) A kind of recommendation method to be scored based on standard label and project
Shen et al. Shape clustering: Common structure discovery
CN112199462A (en) Cross-modal data processing method and device, storage medium and electronic device
EP3136293A1 (en) Method and device for processing an image of pixels, corresponding computer program product and computer readable medium
Xu et al. Weakly supervised deep semantic segmentation using CNN and ELM with semantic candidate regions
Li et al. Patch alignment manifold matting
Xu et al. Instance-level coupled subspace learning for fine-grained sketch-based image retrieval
CN104966052A (en) Attributive characteristic representation-based group behavior identification method
CN109063173A (en) A semi-supervised overlapping community discovery method based on partial label information
Shang et al. Cattle behavior recognition based on feature fusion under a dual attention mechanism
Wang et al. Salient object detection by robust foreground and background seed selection
CN112906517B (en) Self-supervision power law distribution crowd counting method and device and electronic equipment
Mund et al. Active online confidence boosting for efficient object classification
Zhang et al. A fast object tracker based on integrated multiple features and dynamic learning rate
CN117315090A (en) Cross-modal style learning-based image generation method and device
CN103942779A (en) Image segmentation method based on combination of graph theory and semi-supervised learning
Wibowo et al. Convolutional shallow features for performance improvement of histogram of oriented gradients in visual object tracking
Bi et al. C²Net: a complementary co-saliency detection network
Kawewong et al. A speeded-up online incremental vision-based loop-closure detection for long-term SLAM
Zhang et al. Search-based depth estimation via coupled dictionary learning with large-margin structure inference
Lv et al. A challenge of deep‐learning‐based object detection for hair follicle dataset
CN115168609A (en) Text matching method and device, computer equipment and storage medium
Yao et al. Multi‐scale feature learning and temporal probing strategy for one‐stage temporal action localization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181221
