CN107169871B - Multi-relationship community discovery method based on relationship combination optimization and seed expansion - Google Patents


Info

Publication number
CN107169871B
Authority
CN
China
Prior art keywords
community
communities
nodes
relationship
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710260510.6A
Other languages
Chinese (zh)
Other versions
CN107169871A (en)
Inventor
杨清海
尹霄冲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Xian Cetc Xidian University Radar Technology Collaborative Innovation Research Institute Co Ltd
Original Assignee
Xidian University
Xian Cetc Xidian University Radar Technology Collaborative Innovation Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University and Xian Cetc Xidian University Radar Technology Collaborative Innovation Research Institute Co Ltd
Priority to CN201710260510.6A
Publication of CN107169871A
Application granted
Publication of CN107169871B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Abstract

The invention belongs to the technical field of social networks and computer application, and discloses a multi-relationship community discovery method based on relationship combination optimization and seed expansion. Experiments show that compared with the traditional method, the method has the advantages of high result accuracy and strong anti-noise capability.

Description

Multi-relationship community discovery method based on relationship combination optimization and seed expansion
Technical Field
The invention belongs to the technical field of social networks and computer application, and particularly relates to a multi-relationship community discovery method based on relationship combination optimization and seed expansion.
Background
A social network is a network structure formed by nodes and the edges connecting them. Nodes can correspond to people and social organizations in real life, and edges can correspond to everyday relationships between them, such as social ties among friends, family members, and classmates, or contact over WeChat, telephone, and email. Community discovery is an important research direction for complex networks, including social networks, and has great application value in fields such as electronic commerce, public safety, and biology. A community is a cluster of nodes that share common characteristics and are connected to one another. Topologically, connections among nodes in the same community are dense, while connections between nodes in different communities are sparse. Community discovery is a basic task of social network analysis: it mines the community structures present in a network, and studying those communities is crucial to understanding the structure and function of the whole network. Most research on social networks analyzes a single relationship in the network, yet in real life many types of relationship exist among individuals, for example friends, classmates, and family members. Besides traditional telephone calls and short messages, Internet-based communication channels such as WeChat, microblogs, Renren, Facebook, Twitter, and YouTube are also common in social networks, and even within a single tool there are several ways to communicate: on a microblog, for example, people can interact through comments, reposts, and likes.
Multi-relationship social networks are ubiquitous in real life. Current community discovery methods for social networks are mostly based on a single relationship. Because the way individuals communicate is somewhat subjective, a single relationship cannot fully capture the communication between them, and the incomplete information may yield a community partition that does not match reality. In addition, some communication channels make it easier to contact strangers or mere acquaintances and therefore generate more social noise, which can mask the real community structure. Some existing multi-relationship community discovery methods treat the networks formed by the different relationships identically, but in an actual multi-relationship social network the social noise differs from one relationship to another because the communication channels differ. Such methods do let the relationships supplement one another's community information, but they also introduce the noise of every relationship; when some relationships are very noisy, community discovery over multiple relationships can perform worse than community discovery over a single relationship.
In summary, the problem with the prior art is as follows: existing multi-relationship community discovery methods ignore the differences between relationships and treat the information from each relationship equally. Integrating the community information of the relationship networks therefore also introduces the social noise each one carries, so accuracy is limited, and when some relationships are very noisy the result obtained from multiple relationships can even be worse than the result obtained from a single relationship.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a multi-relationship community discovery method based on relationship combination optimization and seed expansion.
The invention is realized as follows. The multi-relationship community discovery method based on relationship combination optimization and seed expansion optimizes the weight ratio of each relationship in the network by multi-objective optimization and fuses the multi-relationship network into a single-relationship network that effectively integrates the community information of every relationship while keeping noise low. It then combines the community partition of each relationship to find the groups of people who belong to the same community in every relationship, takes the small communities formed by these groups as seed communities, and mines the communities of the multi-relationship social network with a seed expansion strategy to obtain a more accurate community partition;
the first weight-ratio optimization sub-objective uses modularity as the index of the structural strength and reliability of the resulting communities; the modularity Q_a of the partition result C_a is calculated as follows:
$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{i,j} - \frac{k_i k_j}{2m}\right]\delta(C_i, C_j)$$
where A_{i,j} is the adjacency matrix of the entire network, m is the number of edges in the network, k_i and k_j are the degrees of node i and node j respectively, and the value of δ(C_i, C_j) depends on whether i and j are in the same community: it is 1 if they are and 0 otherwise;
the second weight-ratio optimization sub-objective uses NMI as the measure of community structure similarity, calculated as follows:
$$\mathrm{NMI}(A,B) = \frac{-2\sum_{i=1}^{C_A}\sum_{j=1}^{C_B} H_{ij}\log\left(\frac{H_{ij}N}{H_{i.}H_{.j}}\right)}{\sum_{i=1}^{C_A} H_{i.}\log\left(\frac{H_{i.}}{N}\right)+\sum_{j=1}^{C_B} H_{.j}\log\left(\frac{H_{.j}}{N}\right)}$$
where the rows of the confusion matrix H represent the real community partition and the columns represent the communities found by the partition algorithm. C_A and C_B are the numbers of communities in result A and result B respectively, H_{ij} is the number of nodes that should be in community i but appear in community j, H_{i.} and H_{.j} are the sums of row i and column j of H respectively, and N is the number of nodes in the entire network; NMI = 1 means the community structures of results A and B are identical, and NMI = 0 means they are completely different;
the seed expansion strategy selects the community S_i with the largest number of nodes as the seed community, and calculates the similarity between S_i and the remaining candidate seed communities and remaining nodes as follows:
[Equation: Figure BDA0001274573640000033]
where u and v are nodes of communities S_i and S_j respectively, n_i and n_j are the numbers of nodes in the two communities, and [Figure BDA0001274573640000034] and [Figure BDA0001274573640000035] are sets of node indices.
Further, the multi-relationship community discovery method based on relationship combination optimization and seed expansion comprises the following steps:
1) assign weights directly to all relationships of the multi-relationship network in equal proportion, and combine the multi-relationship network into a weighted single-relationship network A;
2) partition network A into communities with a single-relationship community partition method and record the resulting partition as C_a; then evaluate C_a with an index that measures the reliability and structural strength of a community discovery result, and record the resulting value as Q_a;
3) taking the weight ratio of each relationship as the decision variables, optimize the following two sub-objectives simultaneously with a multi-objective optimization method to obtain a set of optimal decision variables;
optimization sub-objective one: the difference between the structural measurement index of the community partition of the single-relationship network B, combined with the new weight ratio, and that of network A is as large as possible;
optimization sub-objective two: the similarity of the community structure information shown by the community partitions of the single-relationship network B, combined with the new weight ratio, and of network A is as large as possible;
4) from the set of optimal decision variables obtained in step 3), select the weight ratio that maximizes the sub-objectives, fuse the multi-relationship network into a single-relationship network M, perform community discovery on M with a single-relationship community discovery method, and record the number of communities obtained as K;
5) partition the network formed by each relationship into communities with a single-relationship community discovery method;
6) set the iteration counter L = 0, tally the community discovery results of each relationship from step 5), and put the groups of nodes that are in the same community in all relationships into the seed community set C_L as seed communities;
7) calculate the similarity between the nodes of the network on the single-relationship network M with a similarity calculation method;
8) if L = 0, set L = L + 1 and C_L = C_{L-1}, and proceed to the next step; if L > 0, check whether the sets C_L and C_{L-1} are identical: if so, go to step 11); otherwise set L = L + 1 and C_L = C_{L-1}, and proceed to the next step;
9) sort the communities in the candidate seed community set C_L that have not yet participated in merging by number of nodes in descending order, and select the top-ranked community S_i as the seed community;
10) using the similarities calculated in step 7), calculate the similarity between the seed community S_i and the remaining candidate seed communities and remaining nodes, recorded as Sim;
11) place the communities and nodes whose similarity to S_i exceeds the set threshold β into the candidate area; with a local fitness calculation method, calculate the current fitness F of S_i and the current fitness F_h of each community in the candidate area; then calculate, for each community and node in the candidate area, the fitness after merging with S_i, recorded as F_new; then calculate the growth rate V_s of F_new relative to the original fitness F of S_i and the growth rate V_h of F_new relative to the candidate community's fitness F_h;
12) if V_s and V_h both exceed the growth-rate threshold, select the candidate community S_j with the largest V_s + V_h, merge it with S_i into a new community, update S_i and the seed community set C_L, and return to step 10); otherwise, check whether every community in C_L has already participated in merging: if not, return to step 9); if so, clear the merge records of all communities in C_L and return to step 8);
13) check the number of communities in the network. If it is less than or equal to K, end the program and output the set C_L. If it is greater than K, execute the following elimination strategy: taking the K communities with the most nodes as targets, calculate in turn the similarity between each of the remaining communities and nodes and these K communities, merge each remaining community or node into the target community most similar to it, update the set C_L, and finally output the community set C_L as the final partition result.
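The elimination strategy of step 13) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `eliminate` is hypothetical, and the community-to-community similarity is assumed to be the average pairwise node similarity, because the patent's own similarity formula survives only as an image placeholder.

```python
def eliminate(communities, sim, k):
    """Merge every community beyond the k largest into its most similar target.

    communities -- list of node-index lists (single leftover nodes are
                   passed as one-element lists)
    sim         -- node-by-node similarity matrix (list of lists)
    k           -- target number of communities (K in the text)
    """
    ordered = sorted(communities, key=len, reverse=True)
    targets = [list(c) for c in ordered[:k]]
    for extra in ordered[k:]:
        def avg_sim(target):
            # ASSUMPTION: average pairwise similarity between the two groups.
            total = sum(sim[u][v] for u in extra for v in target)
            return total / (len(extra) * len(target))
        best = max(range(len(targets)), key=lambda t: avg_sim(targets[t]))
        targets[best].extend(extra)  # later merges see the updated target
    return [sorted(c) for c in targets]
```

Because remaining communities are processed in sequence, each merge changes the target it joins, which matches the "in turn" wording of the step.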
Further, the networks formed by the four relationships are each partitioned with the single-relationship community discovery algorithm BGLL, and the partition results are tallied according to the formula
$$I_{ij} = \sum_{m=1}^{N} C_m(i,j)$$
to obtain an affinity matrix I, where I_{ij} is the entry for nodes i and j, N is the total number of dimensions (here 4), m indexes the dimension, and C_m(i, j) indicates whether the two nodes are assigned to the same community in dimension m: it is 1 if they are and 0 otherwise. The matrix I is traversed and every entry less than 4 is set to zero to obtain a region matrix F. A depth-first traversal of F yields all of its connected sub-regions, and every sub-region with more than one node is stored in the candidate seed community set C_L, with L = 0.
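The seed-community construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `seed_communities` is hypothetical, each per-relationship partition is assumed to be given as a list of community labels, and the affinity entry is taken to be the count of dimensions in which two nodes share a community.

```python
def seed_communities(partitions):
    """Find groups of nodes that share a community in every relationship.

    partitions -- one label list per relationship; partitions[m][i] is the
                  community of node i in dimension m
    """
    n_dims = len(partitions)
    n = len(partitions[0])
    # Affinity matrix I: number of dimensions in which i and j co-occur.
    affinity = [[sum(1 for p in partitions if p[i] == p[j]) for j in range(n)]
                for i in range(n)]
    # Region matrix F: keep only pairs that agree in all dimensions.
    region = [[1 if i != j and affinity[i][j] == n_dims else 0
               for j in range(n)] for i in range(n)]
    seen = [False] * n
    seeds = []
    for start in range(n):
        if seen[start]:
            continue
        stack, component = [start], []
        seen[start] = True
        while stack:  # iterative depth-first traversal of F
            u = stack.pop()
            component.append(u)
            for v in range(n):
                if region[u][v] and not seen[v]:
                    seen[v] = True
                    stack.append(v)
        if len(component) > 1:  # single nodes are not seed communities
            seeds.append(sorted(component))
    return seeds
```

With four relationships, `n_dims` is 4, which corresponds to zeroing every affinity entry below 4.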
Further, the similarity Sim between every pair of nodes in the network is calculated on the single-relationship network M:
[Equation: Figure BDA0001274573640000052]
where K is the number of dimensions in which i and j are assigned to the same community, N is 4, the gain factor Δ is 1, W(i, j) is the weight of the edge between node i and node j, com(i, j) is the intersection of the two nodes' neighbor sets, and N(i, j) is the union of the two nodes' neighbor sets.
Further, the communities and nodes whose similarity to S_i exceeds the set threshold γ = 1 are placed in a candidate area, and the current fitness F of S_i and the current fitness F_h of each community in the candidate area are calculated by the formula:
[Equation: Figure BDA0001274573640000053]
where
[Figure BDA0001274573640000054]
is twice the sum of the weights of the edges inside the community,
[Figure BDA0001274573640000055]
and α is a parameter that regulates the scale of the community, set here to 1;
then, for each community and node in the candidate area, the fitness after merging with S_i is calculated and recorded as F_new, and the growth rate V_s of F_new relative to the original fitness F of S_i and the growth rate V_h of F_new relative to the candidate community's fitness F_h are calculated;
if V_s and V_h both exceed the growth-rate threshold of 0.1, the community with the largest V_s + V_h is merged with S_i into a new community, and S_i and the seed community set C_L are updated; otherwise, it is checked whether all communities in C_L have already been merged, and if so, the merge records of all communities in C_L are cleared;
the number of communities is then checked. If it is less than or equal to K, the program ends and outputs the set C_L. If it is greater than K, the K communities with the most nodes are taken as targets, the similarity between each of the remaining communities and nodes and these K communities is calculated in turn, each remaining community or node is merged into the target community most similar to it, the set C_L is updated, and the community set C_L is finally output as the final partition result.
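The fitness-driven merge decision can be sketched as follows. The fitness formula itself is not recoverable from the text, so this sketch assumes the common local fitness F = k_in / (k_in + k_out)^α, where k_in is twice the internal edge weight (as stated above) and k_out, an assumed term, is the weight of edges leaving the community. The growth rate is assumed to be the relative increase (new − old) / old. All function names are hypothetical.

```python
def fitness(adj, members, alpha=1.0):
    """ASSUMED local fitness k_in / (k_in + k_out) ** alpha of a node group."""
    inside = set(members)
    k_in = 0.0   # internal weight; each edge is seen from both endpoints
    k_out = 0.0  # ASSUMED term: weight of edges crossing the boundary
    for i in inside:
        for j, w in enumerate(adj[i]):
            if j in inside:
                k_in += w
            else:
                k_out += w
    total = k_in + k_out
    return k_in / total ** alpha if total > 0 else 0.0


def growth_rate(new, old):
    """ASSUMED definition: relative increase of `new` over `old`."""
    if old > 0:
        return (new - old) / old
    return float("inf") if new > 0 else 0.0


def pick_merge(adj, seed, candidates, threshold=0.1, alpha=1.0):
    """Return the candidate with the largest V_s + V_h, or None if none passes."""
    f_seed = fitness(adj, seed, alpha)
    best, best_score = None, float("-inf")
    for cand in candidates:
        f_new = fitness(adj, list(seed) + list(cand), alpha)
        v_s = growth_rate(f_new, f_seed)                     # versus seed's F
        v_h = growth_rate(f_new, fitness(adj, cand, alpha))  # versus F_h
        if v_s > threshold and v_h > threshold and v_s + v_h > best_score:
            best, best_score = cand, v_s + v_h
    return best
```

A single node has zero fitness under this assumed formula, so its growth rate is treated as unbounded whenever the merged fitness is positive; the source does not say how this case is handled.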
Another object of the present invention is to provide a social network applying the multi-relationship community discovery method based on relationship combination optimization and seed expansion.
Another object of the present invention is to provide a computer for applying the multi-relationship community discovery method based on relationship combination optimization and seed expansion.
The invention has the advantages and positive effects that:
1. The invention jointly optimizes the relationships of a multi-relationship network through weight allocation. While integrating the useful information of every relationship network, it better suppresses the social noise each one carries, and it still produces a comparatively accurate community partition when the relationship networks are noisy. By combining the community partitions of all relationships, the invention finds the groups of people who are in the same community in every relationship. Such stable social circles generally sit at the core of a community, so expanding from them as seed communities makes it easier to uncover the real communities hidden in the multi-relationship network. Experiments on artificial networks show that the accuracy of the invention's community discovery results is on average 9% higher than that of the traditional method, as shown in figure 4. Under the severe condition that the noise carried by the relationships differs by as much as 50%, the accuracy of the traditional multi-relationship method drops to 71%, below the 82% obtained by using only the least noisy relationship, whereas the accuracy of the invention under the same condition is 86.3%, as shown in figure 5.
Drawings
Fig. 1 is a flowchart of a multi-relationship community discovery method based on relationship combination optimization and seed expansion according to an embodiment of the present invention.
Fig. 2 is a flowchart of implementing multi-relationship community discovery based on relationship combination optimization and seed expansion according to an embodiment of the present invention.
FIG. 3 is a graph illustrating the comparison of the results provided by embodiments of the present invention with the results of community discovery using only one relationship.
Fig. 4 is a schematic diagram comparing the results of the embodiment of the present invention with those of the conventional multi-relationship community discovery method PMM under different noise levels.
Fig. 5 is a schematic diagram illustrating specific effects of the present invention according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The following detailed description of the principles of the invention is provided in connection with the accompanying drawings.
The multi-relationship community discovery method based on relationship combination optimization and seed expansion provided by the embodiment of the invention optimizes the weight ratio of each relationship in the network and fuses the multi-relationship network into a single-relationship network that effectively integrates the community information of every relationship while keeping noise low. It then combines the community partition of each relationship to find the groups of people who are in the same community in every relationship, takes the small communities formed by these groups as seed communities, and mines the communities of the multi-relationship social network with a seed expansion strategy, thereby obtaining a more accurate community partition.
As shown in fig. 1, the method for discovering a multi-relationship community based on relationship combination optimization and seed expansion provided by the embodiment of the present invention includes the following steps:
S101: directly assign weights to the relationships of the multi-relationship network in a 1:1 ratio, and combine the multi-relationship network into a weighted single-relationship network;
S102: partition this network into communities with a single-relationship community partition algorithm, and calculate the community structure strength of the partition with a chosen measurement index;
S103: optimize the weight ratio of the relationship dimensions with a multi-objective evolutionary algorithm, where optimization objective one is to make the difference between the community structure strength index of the partition of the network combined with the new weights and that of the partition of the initial network combined in a 1:1 ratio as large as possible, and optimization objective two is to make the similarity between the two partitions, measured with a chosen index, as high as possible;
S104: convert the multi-relationship network into a single-relationship network using the optimized weight ratio;
S105: partition the network formed by each relationship into communities with a single-relationship community discovery algorithm, tally the community discovery results of the relationships, and take the nodes that are in the same community in all relationships as candidate seed communities;
S106: merge the seed communities and the remaining nodes using the selected community similarity function and local fitness function;
S107: after merging is complete, check the number of communities to decide whether to apply the elimination strategy, and output the final community partition.
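The fusion used in S101 and S104 can be sketched as a weighted sum of the per-relationship adjacency matrices. This is a minimal illustration: the function name `fuse_relations` is hypothetical, and the weighted sum without normalization is an assumption, since the text only states that the relationships are combined according to a weight ratio.

```python
def fuse_relations(adjacencies, weights):
    """Combine per-relationship adjacency matrices into one weighted matrix.

    adjacencies -- list of equally sized square matrices (lists of lists),
                   one per relationship
    weights     -- one weight per relationship, e.g. [1, 1, 1, 1] for 1:1
    """
    n = len(adjacencies[0])
    # ASSUMPTION: the fused edge weight is the plain weighted sum.
    return [[sum(w * a[i][j] for w, a in zip(weights, adjacencies))
             for j in range(n)] for i in range(n)]
```

Calling it with equal weights gives the initial network of S101, and calling it with the optimized weight ratio gives the network of S104.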
The multi-relationship community discovery method based on relationship combination optimization and seed expansion provided by the embodiment of the invention specifically comprises the following steps:
1) Assign weights directly to the relationships of the multi-relationship network in equal proportion, and combine the multi-relationship network into a weighted single-relationship network A.
2) Partition network A into communities with a single-relationship community partition method and record the resulting partition as C_a; then evaluate C_a with an index that measures the reliability and structural strength of a community discovery result, and record the resulting value as Q_a.
3) Taking the weight ratio of each relationship as the decision variables, optimize the following two sub-objectives simultaneously with a multi-objective optimization method to obtain a set of optimal decision variables;
optimization sub-objective one: the difference between the structural measurement index of the community partition of the single-relationship network B, combined with the new weight ratio, and that of network A is as large as possible;
optimization sub-objective two: the similarity of the community structure information shown by the community partitions of the single-relationship network B, combined with the new weight ratio, and of network A is as large as possible; this similarity can be obtained with a common index such as normalized mutual information (NMI).
4) From the set of optimal decision variables obtained in step 3), select the weight ratio that maximizes the sub-objectives, fuse the multi-relationship network into a single-relationship network M, perform community discovery on M with a single-relationship community discovery method, and record the number of communities obtained as K.
5) Partition the network formed by each relationship into communities with a single-relationship community discovery method.
6) Set the iteration counter L = 0, tally the community discovery results of each relationship from step 5), and put the groups of nodes that are in the same community in all relationships into the seed community set C_L as seed communities.
7) Calculate the similarity between the nodes of the network on the single-relationship network M with a similarity calculation method.
8) If L = 0, set L = L + 1 and C_L = C_{L-1}, and proceed to the next step; if L > 0, check whether the sets C_L and C_{L-1} are identical: if so, go to step 11); otherwise set L = L + 1 and C_L = C_{L-1}, and proceed to the next step.
9) Sort the communities in the candidate seed community set C_L that have not yet participated in merging by number of nodes in descending order, and select the top-ranked community S_i as the seed community.
10) Using the similarities calculated in step 7), calculate the similarity between the seed community S_i and the remaining candidate seed communities and remaining nodes, recorded as Sim.
11) Place the communities and nodes whose similarity to S_i exceeds the set threshold β into the candidate area; with a local fitness calculation method, calculate the current fitness F of S_i and the current fitness F_h of each community in the candidate area; then calculate, for each community and node in the candidate area, the fitness after merging with S_i, recorded as F_new; then calculate the growth rate V_s of F_new relative to the original fitness F of S_i and the growth rate V_h of F_new relative to the candidate community's fitness F_h.
12) If V_s and V_h both exceed the growth-rate threshold, select the candidate community S_j with the largest V_s + V_h, merge it with S_i into a new community, update S_i and the seed community set C_L, and return to step 10); otherwise, check whether every community in C_L has already participated in merging: if not, return to step 9); if so, clear the merge records of all communities in C_L and return to step 8).
13) Check the number of communities in the network. If it is less than or equal to K, end the program and output the set C_L. If it is greater than K, execute the following elimination strategy:
taking the K communities with the most nodes as targets, calculate in turn the similarity between each of the remaining communities and nodes and these K communities, merge each remaining community or node into the target community most similar to it, update the set C_L, and finally output the community set C_L as the final partition result.
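Step 3) above yields a set of optimal decision variables rather than a single answer, because the two sub-objectives are optimized simultaneously. One common way to express such a set is the Pareto (non-dominated) front. The sketch below is an illustration only: the function name `pareto_front` is hypothetical, and it assumes each candidate weight ratio has already been scored as a tuple of objective values that are all to be maximized.

```python
def pareto_front(points):
    """Return the indices of non-dominated points (all objectives maximized).

    points -- list of objective tuples, e.g. (structure_difference, nmi)
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(q[k] >= p[k] for k in range(len(p)))
            and any(q[k] > p[k] for k in range(len(p)))
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front
```

Step 4) then selects one weight ratio from this set.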
The application of the principles of the present invention will now be described in further detail with reference to the accompanying drawings.
The multi-relationship community discovery method based on relationship combination optimization and seed expansion provided by the embodiment of the invention comprises the following steps:
Step one: input the multi-relationship network, convert each relationship network into an adjacency matrix, and input the similarity threshold σ = 1 and the growth-rate threshold β = 0.1;
the multi-relationship network used is an artificially generated network of 350 nodes in three communities of 50, 100, and 200 nodes. The nodes are related in four dimensions, that is, four kinds of connection exist among them. Nodes in the same community are connected with probability u, and the connection probability between nodes in different communities varies with the dimension, that is, the connection probability v between nodes of different communities differs from one relationship to another. Noise is also added to the network by randomly connecting all nodes in the network with probability r.
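A generator for such an artificial network can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `make_multirelational_network` is hypothetical, the values of u, v, and r are not given in the text and must be supplied by the caller, and noise is interpreted as an extra chance r of connecting any node pair.

```python
import random


def make_multirelational_network(sizes, u, v_per_dim, r, seed=0):
    """Generate one 0/1 adjacency matrix per relationship plus true labels.

    sizes     -- community sizes, e.g. (50, 100, 200)
    u         -- connection probability inside a community
    v_per_dim -- one between-community probability v per relationship
    r         -- noise probability (ASSUMED: applied to every node pair)
    """
    rng = random.Random(seed)
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    layers = []
    for v in v_per_dim:
        adj = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                p = u if labels[i] == labels[j] else v
                # An edge appears from community structure or from noise.
                if rng.random() < p or rng.random() < r:
                    adj[i][j] = adj[j][i] = 1
        layers.append(adj)
    return layers, labels
```

For the network described above, `sizes` would be `(50, 100, 200)` and `v_per_dim` would hold four values, one per relationship.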
Step two: weight the four relationships of the multi-relationship network equally (ratio 1:1:1:1) and combine them into a weighted single-relationship network A. The single-relationship community division algorithm BGLL is used to divide network A into communities, and the resulting division is denoted C_a. In this embodiment, modularity is used as the index of the structural strength and reliability of the resulting communities, and the modularity Q_a of the result C_a is calculated as follows:

$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{i,j} - \frac{k_i k_j}{2m}\right]\delta(C_i, C_j)$$

where A_{i,j} is the adjacency matrix of the entire network, m is the number of edges in the network, k_i and k_j are the degrees of node i and node j respectively, and δ(C_i, C_j) depends on whether i and j are in the same community: it takes the value 1 if they are and 0 otherwise.
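The combination and modularity calculation of step two can be sketched as follows. This assumes the modularity expression is the standard Newman form implied by the symbol definitions above, and that for a weighted network the degree is the row sum of weights and 2m is the total of all entries; the function names are hypothetical.

```python
def combine_relations(relations, weights):
    """Weighted sum of per-relationship adjacency matrices (network A or B)."""
    n = len(relations[0])
    return [[sum(w * rel[i][j] for w, rel in zip(weights, relations))
             for j in range(n)] for i in range(n)]

def modularity(adj, labels):
    """Q = (1/2m) * sum_ij (A_ij - k_i*k_j/(2m)) * delta(C_i, C_j)."""
    n = len(adj)
    degree = [sum(row) for row in adj]  # k_i (weighted degree for weighted adj)
    two_m = sum(degree)                 # 2m
    if two_m == 0:
        return 0.0                      # no edges: define Q as 0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:  # delta(C_i, C_j) = 1
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m
```

With equal weights, `combine_relations(relations, [1, 1, 1, 1])` gives network A, and `modularity(A, C_a)` gives Q_a for a label list `C_a` produced by any single-relationship algorithm.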
Step three: taking the weight ratio of each relationship as the decision variable, optimize the following two sub-goals simultaneously with a multi-objective optimization method to obtain a set of optimal decision variables;
Optimization sub-goal one: the difference in modularity between the community division result of the single-relationship network B, combined under the new weight ratio, and that of network A is as large as possible. Optimization sub-goal two: the community structure information shown by the community division results of network B and network A is as similar as possible. This embodiment measures that similarity with the normalized mutual information (NMI), calculated as follows:
$$\mathrm{NMI}(A,B) = \frac{-2\sum_{i=1}^{C_A}\sum_{j=1}^{C_B} H_{ij}\log\left(\frac{H_{ij}\,N}{H_{i\cdot}H_{\cdot j}}\right)}{\sum_{i=1}^{C_A} H_{i\cdot}\log\left(\frac{H_{i\cdot}}{N}\right) + \sum_{j=1}^{C_B} H_{\cdot j}\log\left(\frac{H_{\cdot j}}{N}\right)}$$

where the rows of the confusion matrix H correspond to the real community division and the columns to the communities produced by the division algorithm. C_A and C_B are the numbers of communities in result A and result B, H_ij is the number of nodes that should be in community i but appear in community j, H_i. and H_.j are the sums of row i and column j of H, and N is the number of nodes in the entire network. NMI = 1 means the community structures of results A and B are identical; NMI = 0 means they are completely different. The multi-objective genetic algorithm uses an initial population size of 50, a crossover probability of 0.8, a mutation probability of 0.2, 300 iterations, and real-number encoding.
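A short sketch of the NMI computation, assuming the formula is the usual confusion-matrix form that the definitions of H_ij, H_i., H_.j and N point to. The function name is hypothetical, and returning 1.0 when both divisions consist of a single community is a convention chosen for this sketch, not something the text specifies.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information between two community divisions."""
    n = len(labels_a)
    h = Counter(zip(labels_a, labels_b))  # H_ij: nodes in community i of A and j of B
    row = Counter(labels_a)               # H_i. : row sums
    col = Counter(labels_b)               # H_.j : column sums
    numerator = -2.0 * sum(h_ij * math.log(h_ij * n / (row[i] * col[j]))
                           for (i, j), h_ij in h.items())
    denominator = (sum(c * math.log(c / n) for c in row.values())
                   + sum(c * math.log(c / n) for c in col.values()))
    if denominator == 0:
        return 1.0  # both divisions are one single community (sketch convention)
    return numerator / denominator
```

Only nonzero entries of H are visited, so the logarithm is never taken of zero.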
Step four: from the optimal solution set obtained in step three, select the solution that maximizes the modularity value as the final solution, combine the multi-relationship network into a single-relationship network M according to the weight ratio given by that solution, perform community discovery on M with a single-relationship community discovery algorithm, and record the resulting number of communities as K; here K = 3.
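The selection in steps three and four can be illustrated with a plain non-dominated filter followed by a pick of the best modularity. This is only a sketch: the genetic algorithm itself is not shown, the function names are hypothetical, and it assumes sub-goal one is stored as the modularity difference Q_B - Q_a, so that the largest first objective also has the largest modularity.

```python
def pareto_front(candidates):
    """candidates: list of (weights, objective_one, objective_two), both maximized.
    Returns the candidates that no other candidate dominates."""
    front = []
    for idx, (_, f1, f2) in enumerate(candidates):
        dominated = any(
            g1 >= f1 and g2 >= f2 and (g1 > f1 or g2 > f2)
            for k, (_, g1, g2) in enumerate(candidates) if k != idx
        )
        if not dominated:
            front.append(candidates[idx])
    return front

def select_final(front):
    """Pick the solution with the largest modularity gain (objective one)."""
    return max(front, key=lambda c: c[1])
```

The weights of the selected tuple would then be used to build network M.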
Step five: use the single-relationship community discovery algorithm BGLL to divide the network formed by each of the four relationships into communities, and tally the division results according to the formula

$$I_{ij} = \sum_{m=1}^{N} C_m(i,j)$$

to obtain an affinity matrix I, where I_ij is the entry of the matrix for nodes i and j, N is the total number of dimensions (here 4), m indexes the dimension, and C_m(i, j) indicates whether the two nodes are placed in the same community in dimension m: C_m(i, j) = 1 if they are, and 0 otherwise. Traverse the matrix I and set every entry smaller than 4 to zero to obtain the region matrix F. Perform a depth-first traversal of F to obtain all of its connected sub-regions, and store every sub-region containing more than one node in the candidate seed community set C_L, with L = 0.
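Step five can be sketched as below, assuming I_ij simply counts the relationships in which nodes i and j share a community, which is what the thresholding at 4 suggests. The function name is hypothetical, and the per-relationship label lists are taken as given (for example from BGLL).

```python
def seed_communities(partitions):
    """partitions: one community-label list per relationship.
    Returns candidate seed communities as sorted lists of node indices."""
    n_rel = len(partitions)
    n = len(partitions[0])
    # Affinity I_ij: number of relationships in which i and j share a community
    affinity = [[sum(1 for p in partitions if p[i] == p[j]) for j in range(n)]
                for i in range(n)]
    # Region matrix F: zero every entry below the number of relationships
    region = [[affinity[i][j] if affinity[i][j] >= n_rel else 0 for j in range(n)]
              for i in range(n)]
    seen = [False] * n
    seeds = []
    for start in range(n):
        if seen[start]:
            continue
        # Iterative depth-first traversal over the nonzero entries of F
        stack, component = [start], []
        seen[start] = True
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in range(n):
                if nxt != node and region[node][nxt] and not seen[nxt]:
                    seen[nxt] = True
                    stack.append(nxt)
        if len(component) > 1:  # keep only sub-regions with more than one node
            seeds.append(sorted(component))
    return seeds
```

Nodes that disagree with every other node in at least one relationship end up in no seed community and are handled later as single nodes.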
Step six: calculate the similarity Sim between each pair of nodes in the network from the single-relationship network M:
Figure BDA0001274573640000121
where K is the number of dimensions in which i and j are placed in the same community, N = 4, the gain factor Δ is 1, W(i, j) is the weight of the edge connecting node i and node j, com(i, j) is the intersection of the neighbor sets of the two nodes, and N(i, j) is the union of their neighbor sets.
Step seven: if L = 0, let L = L + 1, set C_L = C_{L-1}, and proceed to the next step; otherwise, check whether the sets C_L and C_{L-1} are identical: if so, go to step eleven; if not, let L = L + 1, set C_L = C_{L-1}, and proceed to the next step.
Step eight: sort the communities in the candidate seed community set C_L that have not yet participated in merging by the number of nodes they contain, and select the community S_i with the most nodes as the seed community.
Step nine: calculate the similarity between the seed community S_i and the remaining candidate seed communities and remaining nodes, as follows:
Figure BDA0001274573640000122
where u and v are nodes of communities S_i and S_j respectively, n_i and n_j are the numbers of nodes in the two communities, and
Figure BDA0001274573640000124
and
Figure BDA0001274573640000125
are the sets of node numbers.
Step ten: select the communities and nodes whose similarity to S_i exceeds the set threshold γ = 1 and place them in a candidate area, then calculate the current fitness F of S_i and the current fitness F_h of each community in the candidate area, using the following formula:
Figure BDA0001274573640000123
where
Figure BDA0001274573640000127
is twice the sum of the weights of the edges inside the community,
Figure BDA0001274573640000126
is the sum of the weights of the external edges of the community, and α is a parameter that regulates the size of the community, here set to 1;
then calculate the fitness of each community and node in the candidate area after merging with S_i, recorded as F_new, and calculate the growth rate V_s of F_new relative to the original fitness F of S_i and its growth rate V_h relative to the candidate community fitness F_h.
Step eleven: among the candidates for which V_s and V_h both exceed the growth-rate threshold of 0.1, select the community with the largest V_s + V_h, merge it with S_i into a new community, update S_i and the seed community set C_L, and return to step ten. If no candidate qualifies, check whether all existing communities in the set C_L have been merged: if not, return to step nine; if so, reset the merge records of all communities in C_L and return to step eight.
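The selection rule of step eleven can be sketched as follows. The growth rates V_s and V_h are taken as already computed inputs, since the fitness formula is given only as a figure; the function name and the dictionary layout are hypothetical.

```python
def pick_merge_candidate(growth_rates, beta=0.1):
    """growth_rates: dict mapping a candidate id to its (V_s, V_h) pair.
    Returns the id with the largest V_s + V_h among candidates whose two
    growth rates both exceed beta, or None if no candidate qualifies."""
    eligible = {cand: v_s + v_h
                for cand, (v_s, v_h) in growth_rates.items()
                if v_s > beta and v_h > beta}
    if not eligible:
        return None  # caller falls through to the "all merged?" check
    return max(eligible, key=eligible.get)
```

A `None` result corresponds to the branch in which the method checks whether every community in C_L has already taken part in merging.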
Step twelve: check the number of communities. If it is less than or equal to K, output the set C_L and end the program. If it is greater than K, take the K communities with the most nodes as targets, calculate in turn the similarity between each remaining community or node and those targets, merge each remaining community or node into the target most similar to it, update the set C_L, and finally output the community set C_L as the final division result.
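The final reduction to K communities might look like the sketch below. The similarity measure is passed in as a hypothetical callable, single leftover nodes are represented as one-element communities, and the targets are scored in their original state so that processing order does not affect the outcome; that last point is an assumption of this sketch rather than something the text states.

```python
def eliminate_to_k(communities, k, similarity):
    """Merge every community outside the K largest into its most similar target.
    communities: list of node lists; similarity(a, b) returns a number."""
    if len(communities) <= k:
        return [list(c) for c in communities]
    ordered = sorted(communities, key=len, reverse=True)
    targets = [list(c) for c in ordered[:k]]
    # Score against the unmodified targets (assumption of this sketch)
    frozen = [tuple(t) for t in targets]
    for rest in ordered[k:]:
        best = max(range(k), key=lambda t: similarity(rest, frozen[t]))
        targets[best].extend(rest)
    return targets
```

For example, `similarity` could count or sum the edge weights of network M between the two node groups.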
The application effect of the present invention will be described in detail with reference to the simulation.
Simulation one: with the settings of this embodiment, u = 0.5 and noise r = 0.1, the invention is compared with community discovery by the BGLL algorithm on a single relationship only, and the NMI value is used to measure the similarity between each experimental result and the real community structure, as shown in fig. 3. Community discovery using multiple relationships gives better results than using only one relationship.
Simulation two compares the detection results of the invention with those of the conventional PMM method in this embodiment as the noise is increased from 0.1 to 0.4 in steps of 0.05, as shown in fig. 4. The method of the invention shows strong robustness to noise.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (5)

1. A multi-relationship community discovery method based on relationship combination optimization and seed expansion, characterized in that the weight proportion of each relationship in the network is optimized by multi-objective optimization so that the multi-relationship network is fused into a single-relationship network that effectively integrates the community information of all relationships and has low noise; the community division information of each relationship in the multi-relationship network is then combined to find the people who are in the same community in every relationship, the small communities formed by these people are used as seed communities, and a seed expansion strategy is adopted to mine communities in the multi-relationship social network, obtaining a community structure division of higher accuracy;
sub-goal one of the weight proportion optimization adopts modularity as the index of the structural strength and reliability of the resulting communities, and the modularity Q_a of the result C_a is calculated as follows:

$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{i,j} - \frac{k_i k_j}{2m}\right]\delta(C_i, C_j)$$

where A_{i,j} is the adjacency matrix of the entire network, m is the number of edges in the network, k_i and k_j are the degrees of node i and node j respectively, and δ(C_i, C_j) depends on whether i and j are in the same community: it takes the value 1 if they are and 0 otherwise;
sub-goal two of the weight proportion optimization adopts NMI as the measure of community structure information similarity, calculated as follows:

$$\mathrm{NMI}(A,B) = \frac{-2\sum_{i=1}^{C_A}\sum_{j=1}^{C_B} H_{ij}\log\left(\frac{H_{ij}\,N}{H_{i\cdot}H_{\cdot j}}\right)}{\sum_{i=1}^{C_A} H_{i\cdot}\log\left(\frac{H_{i\cdot}}{N}\right) + \sum_{j=1}^{C_B} H_{\cdot j}\log\left(\frac{H_{\cdot j}}{N}\right)}$$

where the rows of the confusion matrix H correspond to the real community division and the columns to the communities produced by the division algorithm; C_A and C_B are the numbers of communities in result A and result B, H_ij is the number of nodes that should be in community i but appear in community j, H_i. and H_.j are the sums of row i and column j of H, and N is the number of nodes in the entire network; NMI = 1 means the community structures of results A and B are identical, and NMI = 0 means they are completely different;
the seed expansion strategy selects the community s_i with the largest number of nodes as the seed community, and calculates the similarity between the seed community s_i and the remaining candidate seed communities and remaining nodes as follows:
Figure FDA0002592799040000021
where u and v are nodes of communities s_i and s_j respectively, n_i and n_j are the numbers of nodes in the two communities, and
Figure FDA0002592799040000022
and
Figure FDA0002592799040000023
are the sets of node numbers;
the multi-relationship community discovery method based on relationship combination optimization and seed expansion comprises the following steps:
1) directly configuring weights for all relations of the multiple relation networks according to the same proportion, and combining the multiple relation networks into a single relation network A with the weights;
2) dividing network A into communities with a single-relationship community division method, recording the resulting community division as C_a, then measuring C_a with an index of the reliability and structural strength of the community discovery result, and recording the measured value as Q_a;
3) Taking the weight ratio of each relation as a decision variable, and simultaneously optimizing the following two sub-targets by adopting a multi-objective optimization method to obtain a group of optimal decision variable sets;
optimization sub-goal one: the difference between the structural measurement indexes of the community division results of the single-relationship network B, combined under the new weight proportion, and of network A is as large as possible;
optimization sub-goal two: the community structure information shown by the community division results of the single-relationship network B, combined under the new weight proportion, and of network A is as similar as possible;
4) selecting, from the set of optimal decision variables obtained in step 3), the weight ratio that maximizes the sub-goals, fusing the multi-relationship network into a single-relationship network M, performing community discovery on M with a single-relationship community discovery method, and recording the resulting number of communities as K;
5) respectively carrying out community division on the network formed by each relationship by adopting a single-relationship community discovery method;
6) setting the iteration mark L to 0, tallying the community discovery results of each relationship from step 5), and placing the nodes that are in one community in all relationships, as seed communities, into the seed community set C_L;
7) calculating the similarity between each pair of nodes in the network from the single-relationship network M by a similarity calculation method;
8) if L = 0, letting L = L + 1, setting C_L = C_{L-1}, and proceeding to the next step; if L > 0, checking whether the sets C_L and C_{L-1} are identical: if so, going to step 11); otherwise letting L = L + 1, setting C_L = C_{L-1}, and proceeding to the next step;
9) sorting the communities in the candidate seed community set C_L that have not participated in merging from largest to smallest by the number of nodes they contain, and selecting the top-ranked community s_i as the seed community;
10) calculating, from the similarity obtained in step 7), the similarity between the seed community s_i and the remaining candidate seed communities and remaining nodes, recorded as Sim;
11) selecting the communities and nodes whose similarity to s_i exceeds the set threshold β and placing them in a candidate area, calculating by a local fitness calculation method the current fitness F of s_i and the current fitness F_h of each community in the candidate area, then calculating the fitness of each community and node in the candidate area after merging with s_i, recorded as F_new, and then calculating the growth rate V_s of F_new relative to the original fitness F of s_i and its growth rate V_h relative to the candidate community fitness F_h;
12) if V_s and V_h are both greater than the growth-rate threshold, selecting the candidate community s_j with the largest V_s + V_h, merging it with s_i into a new community, updating s_i and the seed community set C_L, and returning to step 10); otherwise, checking whether all existing communities in the set C_L have been merged: if not, returning to step 9); if so, clearing the merge records of all communities in C_L and returning to step 8);
13) checking the number of communities in the network: if it is less than or equal to K, outputting the set C_L and ending the program; if it is greater than K, executing the following elimination strategy: taking the K communities with the most nodes as targets, calculating in turn the similarity between each remaining community or node and those targets, merging each remaining community or node into the target most similar to it, updating the set C_L, and finally outputting the community set C_L as the final division result.
2. The multi-relationship community discovery method based on relationship combination optimization and seed expansion as claimed in claim 1, wherein the single-relationship community discovery algorithm BGLL is adopted to divide the network formed by each of the four relationships into communities, and the division results are tallied according to the formula

$$I_{ij} = \sum_{m=1}^{N} C_m(i,j)$$

to obtain an affinity matrix I, where I_ij is the entry of the matrix for nodes i and j, N is the total number of dimensions (here 4), m indexes the dimension, and C_m(i, j) indicates whether the two nodes are placed in the same community in dimension m: C_m(i, j) = 1 if they are, and 0 otherwise; the matrix I is traversed and every entry smaller than 4 is set to zero to obtain the region matrix F; a depth-first traversal of F is performed to obtain all of its connected sub-regions, and every sub-region containing more than one node is stored in the candidate seed community set C_L, with L = 0.
3. The multi-relationship community discovery method based on relationship combination optimization and seed expansion as claimed in claim 1, wherein the similarity Sim between each pair of nodes in the network is calculated from the single-relationship network M:
Figure FDA0002592799040000041
where K is the number of dimensions in which i and j are placed in the same community, N = 4, the gain factor Δ is 1, W(i, j) is the weight of the edge connecting node i and node j, com(i, j) is the intersection of the neighbor sets of the two nodes, and N(i, j) is the union of their neighbor sets.
4. The multi-relationship community discovery method based on relationship combination optimization and seed expansion according to claim 1, wherein the communities and nodes whose similarity to s_i exceeds the set threshold γ = 1 are selected and placed in a candidate area, and the current fitness F of s_i and the current fitness F_h of each community in the candidate area are calculated with the following formula:
Figure FDA0002592799040000042
where
Figure FDA0002592799040000043
is twice the sum of the weights of the edges inside the community,
Figure FDA0002592799040000044
is the sum of the weights of the external edges of the community, and α is a parameter that regulates the size of the community, set to 1;
then the fitness of each community and node in the candidate area after merging with s_i is calculated and recorded as F_new, and the growth rate V_s of F_new relative to the original fitness F of s_i and its growth rate V_h relative to the candidate community fitness F_h are calculated;
among the candidates for which V_s and V_h both exceed the growth-rate threshold of 0.1, the community with the largest V_s + V_h is selected and merged with s_i into a new community, and s_i and the seed community set C_L are updated; if no candidate qualifies, whether all existing communities in the set C_L have been merged is checked; if so, the merge records of all communities in C_L are cleared;
the number of communities is checked: if it is less than or equal to K, the set C_L is output and the program ends; if it is greater than K, the K communities with the most nodes are taken as targets, the similarity between each remaining community or node and those targets is calculated in turn, each remaining community or node is merged into the target most similar to it, the set C_L is updated, and finally the community set C_L is output as the final division result.
5. A computer applying the multi-relationship community discovery method based on the relationship combination optimization and the seed expansion as claimed in any one of claims 1 to 4.
CN201710260510.6A 2017-04-20 2017-04-20 Multi-relationship community discovery method based on relationship combination optimization and seed expansion Active CN107169871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710260510.6A CN107169871B (en) 2017-04-20 2017-04-20 Multi-relationship community discovery method based on relationship combination optimization and seed expansion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710260510.6A CN107169871B (en) 2017-04-20 2017-04-20 Multi-relationship community discovery method based on relationship combination optimization and seed expansion

Publications (2)

Publication Number Publication Date
CN107169871A CN107169871A (en) 2017-09-15
CN107169871B true CN107169871B (en) 2020-08-28

Family

ID=59813344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710260510.6A Active CN107169871B (en) 2017-04-20 2017-04-20 Multi-relationship community discovery method based on relationship combination optimization and seed expansion

Country Status (1)

Country Link
CN (1) CN107169871B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663499A (en) * 2012-03-11 2012-09-12 西安电子科技大学 Network community division method based on simulated annealing genetic algorithm
CN102768735A (en) * 2012-07-04 2012-11-07 西安电子科技大学 Network community partitioning method based on immune clone multi-objective optimization
CN104200272A (en) * 2014-08-28 2014-12-10 北京工业大学 Complex network community mining method based on improved genetic algorithm
CN104268629A (en) * 2014-09-15 2015-01-07 西安电子科技大学 Complex network community detecting method based on prior information and network inherent information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NCSS: A fast and effective community division algorithm for complex networks; Han Zhongming et al.; Scientia Sinica (《中国科学》); 2016-04-13; pp. 431-444 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant