CN107169871B

CN107169871B - Multi-relationship community discovery method based on relationship combination optimization and seed expansion

Info

Publication number: CN107169871B
Application number: CN201710260510.6A
Authority: CN
Inventors: 杨清海; 尹霄冲
Original assignee: Xidian University; Xian Cetc Xidian University Radar Technology Collaborative Innovation Research Institute Co Ltd
Current assignee: Xidian University; Xian Cetc Xidian University Radar Technology Collaborative Innovation Research Institute Co Ltd
Priority date: 2017-04-20
Filing date: 2017-04-20
Publication date: 2020-08-28
Anticipated expiration: 2037-04-20
Also published as: CN107169871A

Abstract

The invention belongs to the technical field of social networks and computer application, and discloses a multi-relationship community discovery method based on relationship combination optimization and seed expansion. Experiments show that compared with the traditional method, the method has the advantages of high result accuracy and strong anti-noise capability.

Description

Multi-relationship community discovery method based on relationship combination optimization and seed expansion

Technical Field

The invention belongs to the technical field of social networks and computer application, and particularly relates to a multi-relationship community discovery method based on relationship combination optimization and seed expansion.

Background

A social network is a network structure formed by nodes and connecting edges. The nodes can correspond to people and social organizations in real life, and the edges can correspond to the relationships through which they interact, such as social relationships among friends, family members, and classmates, or contact relationships via WeChat, telephone, and mail. Community discovery is an important research direction for complex networks, including social networks, and has great application value in fields such as e-commerce, public safety, and biology. A community is a cluster of nodes that share common characteristics and are connected with one another. In terms of network topology, connections among nodes in the same community are dense, while connections among nodes in different communities are sparse. Community discovery is a basic task of social network analysis: it mines the community structures present in a network, and studying those communities is crucial to understanding the structure and function of the whole network. Most research on social networks analyzes a single relationship, yet many types of relationships exist among individuals in real social networks; for example, people may be related as friends, classmates, or family members. Besides traditional telephone calls and short messages, Internet-based communication channels such as WeChat, microblogs, Renren, Facebook, Twitter, and YouTube are also common in social networks. Even within a single communication tool there are several ways to interact; on a microblog, for example, people can communicate through comments, forwards, and likes.
Multi-relationship social networks are ubiquitous in real life. At present, community discovery methods for social networks are mostly based on a single relationship. Because the way individuals choose to communicate is somewhat subjective, a single relationship cannot fully capture the communication among individuals, and the incomplete information may lead to community divisions that do not match reality. Meanwhile, some communication channels make it easier to contact strangers or casual acquaintances and therefore generate more social noise, which can mask the real community structure. Some existing multi-relationship community discovery methods treat the networks formed by the various relationships identically, but in a real multi-relationship social network the social noise differs from relationship to relationship because the communication channels differ. Although such methods let the relationships supplement one another's community information, they also introduce the noise of every relationship, so when some relationships are very noisy, community discovery that considers multiple relationships can actually perform worse than community discovery on a single relationship.

In summary, the problems of the prior art are as follows: existing community discovery methods for social networks ignore the differences among relationships and treat the information from every relationship identically. As a result, the social noise carried by each relationship network is introduced when the community information of the relationship networks is integrated, the accuracy of community discovery is not high enough, and when some relationships are very noisy, the result obtained by considering multiple relationships can even be worse than the result obtained from a single relationship.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a multi-relationship community discovery method based on relationship combination optimization and seed expansion.

The invention is realized as a multi-relationship community discovery method based on relationship combination optimization and seed expansion. The method optimizes the weight ratio of each relationship in the network through multi-objective optimization and thereby fuses the multi-relationship network into a single-relationship network that effectively integrates the community information of each relationship while keeping noise low; it then combines the community division information of each relationship in the multi-relationship network to find the groups of people who are in the same community in every relationship, takes the small communities formed by these groups as seed communities, and mines communities in the multi-relationship social network with a seed expansion strategy to obtain a more accurate community structure division;

the first weight-ratio optimization sub-target uses modularity as the index for measuring the structural strength and reliability of the resulting communities; the modularity Q_a of result C_a is calculated as follows:

$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{i,j} - \frac{k_i k_j}{2m}\right]\delta(C_i, C_j)$$

wherein A_{i,j} is the adjacency matrix of the entire network, m is the number of edges contained in the network, k_i and k_j are the degrees of node i and node j respectively, and the value of δ(C_i, C_j) depends on whether i and j are in the same community: it is 1 if they are, and 0 otherwise;

the second weight-ratio optimization sub-target uses normalized mutual information (NMI) as the measure of similarity between community structures, calculated as follows:

$$NMI(A, B) = \frac{-2\sum_{i=1}^{C_A}\sum_{j=1}^{C_B} H_{ij}\log\left(\frac{H_{ij}\,N}{H_{i.}\,H_{.j}}\right)}{\sum_{i=1}^{C_A} H_{i.}\log\left(\frac{H_{i.}}{N}\right) + \sum_{j=1}^{C_B} H_{.j}\log\left(\frac{H_{.j}}{N}\right)}$$

wherein the rows of the confusion matrix H represent the real community division result and the columns represent the community result produced by the division algorithm; C_A and C_B are the numbers of communities contained in result A and result B respectively; H_ij is the number of nodes that should be in community i but appear in community j; H_i. and H_.j are the sums of the node counts in the i-th row and the j-th column respectively; and N is the number of nodes contained in the whole network; when NMI equals 1, the community structures of results A and B are identical, and when NMI equals 0, they are completely different;

the seed expansion strategy selects the community S_i with the largest number of nodes as the seed community, and calculates the similarity between seed community S_i and the remaining candidate seed communities and remaining nodes as follows:

wherein u and v are nodes of communities S_i and S_j respectively, n_i and n_j are the numbers of nodes in the two communities, and the two remaining symbols of the formula denote sets of node numbers.

Further, the multi-relationship community discovery method based on relationship combination optimization and seed expansion comprises the following steps:

1) assigning equal weights to all relationships of the multi-relationship network and combining them into a weighted single-relationship network A;

2) dividing network A into communities with a single-relationship community division method, recording the resulting division as C_a, then measuring C_a with an index of the reliability and structural strength of community discovery results, and recording the measured value as Q_a;

3) taking the weight ratio of each relationship as a decision variable, and simultaneously optimizing the following two sub-targets with a multi-objective optimization method to obtain a set of optimal decision variables;

optimization sub-target one: the difference between the structural measurement indexes of the community division results of single-relationship network B, formed with the new weight ratio, and of network A is as large as possible;

optimization sub-target two: the similarity of the community structure information shown by the community division results of single-relationship network B, formed with the new weight ratio, and of network A is as large as possible;

4) selecting, from the set of optimal decision variables obtained in step 3), the weight ratio that maximizes sub-target one, fusing the multi-relationship network into a single-relationship network M with that weight ratio, performing community discovery on M with a single-relationship community discovery method, and recording the number of communities obtained as K;

5) dividing the network formed by each relationship into communities separately with a single-relationship community discovery method;

6) setting the iteration flag L = 0, collecting the community discovery results of each relationship from step 5), and placing the groups of nodes that are in the same community in all relationships into the seed community set C_L as seed communities;

7) calculating the similarity between the nodes in the network on the single-relationship network M with a similarity calculation method;

8) if L = 0, letting L = L + 1, setting C_L = C_{L-1}, and proceeding to the next step; if L > 0, checking whether sets C_L and C_{L-1} are the same; if so, going to step 11); otherwise letting L = L + 1, setting C_L = C_{L-1}, and proceeding to the next step;

9) sorting the communities in the candidate seed community set C_L that have not yet participated in merging by the number of nodes they contain, from largest to smallest, and selecting the top-ranked community S_i as the seed community;

10) using the similarities calculated in step 7), calculating the similarity between seed community S_i and the remaining candidate seed communities and remaining nodes, recorded as Sim;

11) placing the communities and nodes whose similarity to S_i exceeds the set threshold β into a candidate area; calculating, with a local fitness calculation method, the current fitness F of S_i and the current fitness F_h of each community in the candidate area; then calculating the fitness after merging each community or node of the candidate area with S_i, recorded as F_new; and then calculating the growth rate V_s of F_new relative to the original fitness F of S_i and the growth rate V_h of F_new relative to the candidate community's fitness F_h;

12) if V_s and V_h are both greater than the growth rate threshold, selecting the candidate community S_j with the largest V_s + V_h, merging it with S_i into a new community, updating S_i and the seed community set C_L, and returning to step 10); otherwise checking whether all communities in set C_L have been merged; if not, returning to step 9); if so, clearing the merge records of all communities in set C_L and returning to step 8);

13) checking the number of communities in the network; if it is less than or equal to K, outputting set C_L as the result and ending the program; if it is greater than K, executing the following elimination strategy: taking the K communities with the most nodes as targets, sequentially calculating the similarity between each remaining community or node and those K communities, merging each remaining community or node into the one of the K communities most similar to it, updating set C_L, and finally outputting community set C_L as the final division result.

Further, the networks formed by the four relationships are each divided into communities with the single-relationship community discovery algorithm BGLL, and the division results are tallied according to the formula

$$I_{ij} = \sum_{m=1}^{N} C_m(i, j)$$

to obtain an affinity matrix I, wherein I_ij is the entry of the matrix for nodes i and j, N is the total number of dimensions (here 4), m indexes the dimensions, and C_m(i, j) indicates whether the two nodes are placed in the same community in dimension m: C_m(i, j) is 1 if they are, and 0 otherwise; matrix I is traversed and every entry smaller than 4 is set to zero to obtain a region matrix F; a depth-first traversal of matrix F yields all of its connected sub-regions, and the sub-regions containing more than one node are stored in the candidate seed community set C_L, L = 0.
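The seed-extraction procedure just described (count co-memberships, keep only pairs that agree in every dimension, then collect connected sub-regions by depth-first traversal) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `seed_communities` and the input layout (one label list per relationship) are assumptions, not part of the patent text.

```python
def seed_communities(partitions):
    """partitions[m][i] is the community label of node i in relationship m."""
    n_rel = len(partitions)
    n = len(partitions[0])
    # Affinity matrix I: number of relationships in which i and j share a community.
    affinity = [[sum(1 for p in partitions if p[i] == p[j]) for j in range(n)]
                for i in range(n)]
    # Region matrix F: entries below the number of relationships are set to zero.
    region = [[a if a == n_rel else 0 for a in row] for row in affinity]
    seen = set()
    seeds = []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first traversal of one connected sub-region of F.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in range(n):
                if v != u and region[u][v] and v not in seen:
                    seen.add(v)
                    stack.append(v)
        # Only sub-regions with more than one node become candidate seeds.
        if len(component) > 1:
            seeds.append(sorted(component))
    return seeds
```

With four label lists as input, nodes that any single relationship separates never end up in the same seed, which matches the "value less than 4 is set to zero" rule above.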

Further, the similarity Sim between the nodes in the network is calculated on the single-relationship network M:

wherein K represents the number of dimensions in which i and j are placed in the same community, N is 4, the gain factor Δ is 1, W(i, j) is the weight of the edge connecting node i and node j, com(i, j) is the intersection of the two nodes' neighbor sets, and N(i, j) is the union of the two nodes' neighbor sets.

Further, the communities and nodes whose similarity to S_i exceeds the set threshold γ = 1 are placed into a candidate area, and the current fitness F of S_i and the current fitness F_h of each community in the candidate area are calculated with the following formula:

$$F_S = \frac{k_{in}^{S}}{\left(k_{in}^{S} + k_{out}^{S}\right)^{\alpha}}$$

wherein k_in^S is twice the sum of the weights of the edges inside the community, k_out^S is the sum of the weights of the external edges of the community, and α is a parameter regulating the community scale, set to 1;

then the fitness after merging each community or node of the candidate area with S_i is calculated and recorded as F_new, and the growth rate V_s of F_new relative to the original fitness F of S_i and the growth rate V_h of F_new relative to the candidate community's fitness F_h are calculated;

provided that V_s and V_h are both greater than the growth rate threshold of 0.1, the community with the largest V_s + V_h is merged with S_i into a new community, and S_i and the seed community set C_L are updated; otherwise, it is checked whether all communities in set C_L have been merged; if so, the merge records of all communities in set C_L are cleared;

the number of communities is checked; if it is less than or equal to K, set C_L is output as the result and the program ends; if it is greater than K, the K communities with the most nodes are taken as targets, the similarity between each remaining community or node and those K communities is calculated in turn, each remaining community or node is merged into the one of the K communities most similar to it, set C_L is updated, and finally community set C_L is output as the final division result.

Another object of the present invention is to provide a social network applying the multi-relationship community discovery method based on relationship combination optimization and seed expansion.

Another object of the present invention is to provide a computer for applying the multi-relationship community discovery method based on relationship combination optimization and seed expansion.

The invention has the advantages and positive effects that:

1. The invention jointly optimizes the relationships of the multi-relationship network through weight allocation, so that it integrates the useful information of each relationship network while suppressing the social noise that the different relationship networks bring, and can still obtain accurate community divisions when the relationship networks are noisy. By combining the community division information of every relationship in the multi-relationship network, the invention finds the groups of people who are in the same community in every relationship. Such highly stable social circles generally sit at the core of a community, so using them as seed communities for expansion makes it easier to uncover the real communities hidden in the multi-relationship network. Experiments on artificial networks show that the accuracy of the invention's community discovery results is on average 9% higher than that of the traditional method, as shown in figure 4. Under the severe condition where the noise proportions carried by the relationships differ by as much as 50%, the accuracy of the traditional multi-relationship method drops to 71% because of the noise, which is lower than the 82% obtained by using only the least noisy relationship on its own, whereas the accuracy of the present method under the same condition is 86.3%, as shown in figure 5.

Drawings

Fig. 1 is a flowchart of a multi-relationship community discovery method based on relationship combination optimization and seed expansion according to an embodiment of the present invention.

Fig. 2 is a flowchart of implementing multi-relationship community discovery based on relationship combination optimization and seed expansion according to an embodiment of the present invention.

FIG. 3 is a graph illustrating the comparison of the results provided by embodiments of the present invention with the results of community discovery using only one relationship.

Fig. 4 is a schematic diagram comparing the results of the embodiment of the present invention with those of the conventional multi-relationship community discovery method PMM under different noise levels.

Fig. 5 is a schematic diagram illustrating specific effects of the present invention according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The following detailed description of the principles of the invention is provided in connection with the accompanying drawings.

The multi-relationship community discovery method based on relationship combination optimization and seed expansion provided by the embodiment of the invention optimizes the weight ratio of each relationship in the network and thereby fuses the multi-relationship network into a single-relationship network that effectively integrates the community information of each relationship while keeping noise low. It then combines the community division information of each relationship in the multi-relationship network to find the groups of people who are in the same community in every relationship, takes the small communities formed by these groups as seed communities, and mines communities in the multi-relationship social network with a seed expansion strategy, thereby obtaining a more accurate community structure division.

As shown in fig. 1, the method for discovering a multi-relationship community based on relationship combination optimization and seed expansion provided by the embodiment of the present invention includes the following steps:

S101: weights are assigned to the relationships of the multi-relationship network in a 1:1 ratio, and the multi-relationship network is combined into a weighted single-relationship network;

S102: the network is divided into communities with a single-relationship community division algorithm, and the community structure strength of the division result is calculated with a chosen measurement index;

S103: the weight ratio of the relationship dimensions is optimized with a multi-objective evolutionary algorithm, wherein optimization target one is that the difference between the community structure strength index of the division result of the network formed with the new weights and that of the division result of the network initially formed in a 1:1 ratio is as large as possible, and optimization target two uses an index to measure the similarity of the two community division results and makes that similarity as high as possible;

S104: the multi-relationship network is converted into a single-relationship network using the optimized weight ratio;

S105: the network formed by each relationship is divided into communities separately with a single-relationship community discovery algorithm, the community discovery results of the relationships are tallied, and the groups of nodes that are in the same community in all relationships are taken as candidate seed communities;

S106: the seed communities and the remaining nodes are merged using the selected community similarity function and local fitness function;

S107: after merging is finished, the number of communities is checked to decide whether to apply the elimination strategy, and the final community division result is output.
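Steps S101 and S104 both fuse the per-relationship networks into one weighted network. A minimal Python sketch of that fusion follows, assuming each relationship is given as a dense adjacency matrix (a list of lists) and that only the ratio of the weights matters, so the weights are normalised to sum to 1. The function name `fuse_relations` is hypothetical and the normalisation is an assumption of this sketch, not something the patent specifies.

```python
def fuse_relations(adjacency_list, weights):
    """Combine per-relationship adjacency matrices into one weighted matrix."""
    total = float(sum(weights))
    n = len(adjacency_list[0])
    fused = [[0.0] * n for _ in range(n)]
    for adj, w in zip(adjacency_list, weights):
        share = w / total  # only the ratio between weights is used (assumption)
        for i in range(n):
            for j in range(n):
                fused[i][j] += share * adj[i][j]
    return fused
```

Calling it with weights `[1, 1, 1, 1]` gives the initial 1:1 network, and calling it again with the optimized weight ratio gives the fused network used for seed expansion.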

The multi-relationship community discovery method based on relationship combination optimization and seed expansion provided by the embodiment of the invention specifically comprises the following steps:

1) Equal weights are assigned to all relationships of the multi-relationship network, and the multi-relationship network is combined into a weighted single-relationship network A.

2) Network A is divided into communities with a single-relationship community division method, and the resulting division is recorded as C_a; C_a is then measured with an index of the reliability and structural strength of community discovery results, and the measured value is recorded as Q_a.

3) The weight ratio of each relationship is taken as a decision variable, and the following two sub-targets are optimized simultaneously with a multi-objective optimization method to obtain a set of optimal decision variables.

Optimization sub-target one: the difference between the structural measurement indexes of the community division results of single-relationship network B, formed with the new weight ratio, and of network A is as large as possible.

Optimization sub-target two: the similarity of the community structure information shown by the community division results of single-relationship network B, formed with the new weight ratio, and of network A is as large as possible. This similarity can be obtained with a common index such as normalized mutual information (NMI).

4) From the set of optimal decision variables obtained in step 3), the weight ratio that maximizes sub-target one is selected, the multi-relationship network is fused into a single-relationship network M with that weight ratio, community discovery is performed on M with a single-relationship community discovery method, and the number of communities obtained is recorded as K.

5) The network formed by each relationship is divided into communities separately with a single-relationship community discovery method.

6) The iteration flag is set to L = 0, the community discovery results of each relationship from step 5) are collected, and the groups of nodes that are in the same community in all relationships are placed into the seed community set C_L as seed communities.

7) The similarity between the nodes in the network is calculated on the single-relationship network M with a similarity calculation method.

8) If L = 0, let L = L + 1, set C_L = C_{L-1}, and proceed to the next step; if L > 0, check whether sets C_L and C_{L-1} are the same; if so, go to step 11); otherwise let L = L + 1, set C_L = C_{L-1}, and proceed to the next step.

9) The communities in the candidate seed community set C_L that have not yet participated in merging are sorted by the number of nodes they contain, from largest to smallest, and the top-ranked community S_i is selected as the seed community.

10) Using the similarities calculated in step 7), the similarity between seed community S_i and the remaining candidate seed communities and remaining nodes is calculated and recorded as Sim.

11) The communities and nodes whose similarity to S_i exceeds the set threshold β are placed into a candidate area, and the current fitness F of S_i and the current fitness F_h of each community in the candidate area are calculated with a local fitness calculation method. The fitness after merging each community or node of the candidate area with S_i is then calculated and recorded as F_new, followed by the growth rate V_s of F_new relative to the original fitness F of S_i and the growth rate V_h of F_new relative to the candidate community's fitness F_h.

12) If V_s and V_h are both greater than the growth rate threshold, the candidate community S_j with the largest V_s + V_h is merged with S_i into a new community, S_i and the seed community set C_L are updated, and the method returns to step 10). Otherwise, it is checked whether all communities in set C_L have been merged; if not, the method returns to step 9); if so, the merge records of all communities in set C_L are cleared and the method returns to step 8).

13) The number of communities in the network is checked. If it is less than or equal to K, set C_L is output as the result and the program ends; if it is greater than K, the following elimination strategy is executed:

Taking the K communities with the most nodes as targets, the similarity between each remaining community or node and those K communities is calculated in turn, each remaining community or node is merged into the one of the K communities most similar to it, set C_L is updated, and finally community set C_L is output as the final division result.
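The elimination strategy of step 13) can be sketched as follows. The sketch assumes a precomputed node-to-node similarity matrix `sim` (from step 7) and, because the community-level similarity formula is not reproduced in this text, it uses the mean pairwise node similarity between two communities as a stand-in. Both function names are hypothetical, and leftover single nodes are passed in as one-element communities.

```python
def community_similarity(sim, a, b):
    """Mean pairwise node similarity between communities a and b (assumed form)."""
    return sum(sim[u][v] for u in a for v in b) / (len(a) * len(b))


def eliminate(communities, sim, k):
    """Merge every community beyond the k largest into its most similar target."""
    ranked = sorted(communities, key=len, reverse=True)
    targets = [list(c) for c in ranked[:k]]
    for rest in ranked[k:]:
        # Pick the target community with the highest similarity to this leftover.
        best = max(range(len(targets)),
                   key=lambda t: community_similarity(sim, rest, targets[t]))
        targets[best].extend(rest)
    return [sorted(c) for c in targets]
```

If the number of communities is already at most K, the function returns them unchanged, which corresponds to the "output the result" branch of step 13).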

The application of the principles of the present invention will now be described in further detail with reference to the accompanying drawings.

The multi-relationship community discovery method based on relationship combination optimization and seed expansion provided by the embodiment of the invention comprises the following steps:

Step one: a multi-relationship network is input, the network of each relationship is converted into an adjacency matrix, and a similarity threshold σ = 1 and a growth rate threshold β = 0.1 are input;

The multi-relationship network used is an artificially generated network of 350 nodes comprising three communities of 50, 100, and 200 nodes respectively. The nodes have relationships in four dimensions, that is, there are four kinds of connections among the nodes. Nodes in the same community are connected with probability u, while the connection probability between nodes in different communities varies with the dimension, that is, the inter-community connection probability v differs from relationship to relationship. Noise is also added to the network by randomly connecting all nodes in the network with probability r.
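A synthetic benchmark of this kind can be generated with a short script. The following Python sketch is one possible reading of the description: intra-community edges appear with probability u, inter-community edges with a per-relationship probability v, and noise adds an edge between any node pair with probability r. The function name, the argument layout, and the way noise is combined with the community model are assumptions; the patent does not give the values of u, v, or r.

```python
import random


def make_multirelation_network(sizes, u, v_list, r, seed=0):
    """Return (labels, layers): ground-truth labels and one adjacency matrix per relationship."""
    rng = random.Random(seed)
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    layers = []
    for v in v_list:  # one inter-community probability per relationship
        adj = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                p = u if labels[i] == labels[j] else v
                # An edge exists if the community model or the noise process creates it.
                if rng.random() < p or rng.random() < r:
                    adj[i][j] = adj[j][i] = 1
        layers.append(adj)
    return labels, layers
```

For the configuration described above one would call it with `sizes=[50, 100, 200]` and four entries in `v_list`; the probabilities in such a call are placeholders chosen by the user.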

Step two: weights are assigned to the four relationships of the multi-relationship network in a 1:1 ratio, and the multi-relationship network is combined into a weighted single-relationship network A. The single-relationship community division algorithm BGLL is used to divide network A into communities, and the resulting division is recorded as C_a. In the embodiment of the invention, modularity is used as the index for measuring the structural strength and reliability of the resulting communities, and the modularity Q_a of result C_a is calculated as follows:

$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{i,j} - \frac{k_i k_j}{2m}\right]\delta(C_i, C_j)$$

wherein A_{i,j} is the adjacency matrix of the entire network, m is the number of edges contained in the network, k_i and k_j are the degrees of node i and node j respectively, and the value of δ(C_i, C_j) depends on whether i and j are in the same community: it is 1 if they are, and 0 otherwise.
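As an illustration of the modularity index used in step two, here is a minimal Python sketch that evaluates the standard modularity of a partition from a symmetric adjacency matrix. The function name `modularity` and the list-of-lists input are assumptions for illustration; for a weighted network the degree of a node is taken as the sum of its edge weights.

```python
def modularity(adj, labels):
    """Standard modularity Q of the partition `labels` on the symmetric matrix `adj`."""
    n = len(adj)
    degree = [sum(row) for row in adj]   # k_i (weighted degree)
    two_m = sum(degree)                  # 2m: every edge is counted from both ends
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:   # delta(C_i, C_j) = 1
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m
```

For two disconnected triangles labelled as two communities this yields Q = 0.5, while putting all six nodes in one community yields Q = 0.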

Step three: the weight ratio of each relationship is taken as a decision variable, and the following two sub-targets are optimized simultaneously with a multi-objective optimization method to obtain a set of optimal decision variables;

Optimization sub-target one: the difference between the modularity of the community division results of single-relationship network B, formed with the new weight ratio, and of network A is as large as possible. Optimization sub-target two: the similarity of the community structure information shown by the community division results of single-relationship network B, formed with the new weight ratio, and of network A is as large as possible; this embodiment uses normalized mutual information (NMI), calculated as follows:

$$NMI(A, B) = \frac{-2\sum_{i=1}^{C_A}\sum_{j=1}^{C_B} H_{ij}\log\left(\frac{H_{ij}\,N}{H_{i.}\,H_{.j}}\right)}{\sum_{i=1}^{C_A} H_{i.}\log\left(\frac{H_{i.}}{N}\right) + \sum_{j=1}^{C_B} H_{.j}\log\left(\frac{H_{.j}}{N}\right)}$$

wherein the rows of the confusion matrix H represent the real community division result and the columns represent the community result produced by the division algorithm. C_A and C_B are the numbers of communities contained in result A and result B respectively, H_ij is the number of nodes that should be in community i but appear in community j, H_i. and H_.j are the sums of the node counts in the i-th row and the j-th column respectively, and N is the number of nodes contained in the entire network. When NMI is 1, the community structures of results A and B are identical, and when NMI is 0, they are completely different. The multi-objective genetic algorithm uses an initial population size of 50, a crossover probability of 0.8, a mutation probability of 0.2, 300 iterations, and real-number encoding.
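The NMI measure described above can be computed directly from two label vectors by building the confusion counts on the fly. The Python sketch below follows the confusion-matrix form of NMI with row sums, column sums, and N; the function name `nmi` and the choice to return 1.0 when both partitions consist of a single community are assumptions of this sketch.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions of the same node set."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # H_ij
    row = Counter(labels_a)                   # H_i.
    col = Counter(labels_b)                   # H_.j
    numerator = -2.0 * sum(
        h * math.log(h * n / (row[a] * col[b])) for (a, b), h in joint.items()
    )
    denominator = (sum(c * math.log(c / n) for c in row.values())
                   + sum(c * math.log(c / n) for c in col.values()))
    if denominator == 0:
        return 1.0  # both partitions are a single community (assumed convention)
    return numerator / denominator
```

Relabelling the communities does not change the value, so `nmi([0, 0, 1, 1], [1, 1, 0, 0])` is 1, while two statistically independent partitions give 0.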

Step four: from the optimal solution set obtained in step three, the solution that maximizes the modularity is selected as the final solution; the multi-relationship network is combined into a single-relationship network M according to the weight ratio given by the final solution; community discovery is performed on M with a single-relationship community discovery algorithm; and the number of communities obtained is recorded as K, here K = 3.
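The selection in steps three and four can be illustrated with a small non-dominated filter. The sketch below is not the multi-objective genetic algorithm itself; it only shows how an optimal (Pareto) set is extracted from already evaluated candidates and how a final solution is picked. It assumes each candidate is a tuple `(weights, f1, f2)` where `f1` is the modularity gain over network A and `f2` is the NMI with network A, both to be maximised; all names are hypothetical.

```python
def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates on (f1, f2)."""
    front = []
    for i, (_, a1, a2) in enumerate(candidates):
        dominated = any(
            b1 >= a1 and b2 >= a2 and (b1 > a1 or b2 > a2)
            for j, (_, b1, b2) in enumerate(candidates) if j != i
        )
        if not dominated:
            front.append(candidates[i])
    return front


def pick_final(front):
    """Choose the front member with the largest f1, i.e. the largest modularity."""
    return max(front, key=lambda c: c[1])
```

Because `f1` is measured against the fixed value Q_a, the candidate with the largest `f1` is also the one with the largest modularity, which is the criterion named in step four.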

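Step five: the networks formed by the four relationships are each divided into communities with the single-relationship community discovery algorithm BGLL, and the division results are tallied according to the formula

$$I_{ij} = \sum_{m=1}^{N} C_m(i, j)$$

to obtain an affinity matrix I, wherein N is the number of dimensions (here 4) and C_m(i, j) is 1 if nodes i and j are placed in the same community in dimension m and 0 otherwise.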

Step six: calculate the similarity Sim between every pair of nodes in the network using the single-relationship network M:

Step seven: if L = 0, let L = L + 1, set C_L = C_{L-1}, and proceed to the next step; otherwise, check whether the sets C_L and C_{L-1} are identical: if yes, go to step eleven; otherwise, let L = L + 1, set C_L = C_{L-1}, and proceed to the next step.

Step eight: sort the communities in the seed community candidate set C_L that have not yet taken part in merging by the number of nodes they contain, and select the community S_i with the most nodes as the seed community.
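The seed selection in step eight amounts to picking the largest community that has not yet been merged. A minimal sketch, assuming communities are stored as sets of node indices and that a parallel list of flags records which ones have already taken part in merging (both representations are illustrative assumptions):

```python
def pick_seed(communities, merged_flags):
    """Return the index of the largest not-yet-merged community, or None if none remain."""
    candidates = [
        idx for idx, merged in enumerate(merged_flags) if not merged
    ]
    if not candidates:
        return None
    # max() keeps the first maximum, so ties go to the lowest index.
    return max(candidates, key=lambda idx: len(communities[idx]))


if __name__ == "__main__":
    C_L = [{0, 1, 2, 3}, {4, 5}, {6, 7, 8}]
    merged = [True, False, False]
    print(pick_seed(C_L, merged))  # index of {6, 7, 8}
```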

Step nine: calculate the similarity between the seed community S_i and the remaining candidate seed communities and remaining nodes, as follows:

and

is a set of node numbers.

Step ten: place the communities and nodes whose similarity to S_i exceeds the set threshold gamma-1 into a candidate area, and calculate the current fitness F of S_i and the current fitness F_h of each community in the candidate area, using the following formula:

where one term of the formula is twice the sum of the weights of the edges inside the community, the other term is the sum of the weights of the community's external edges, and α is a parameter that regulates the community size, set to 1;

then, for each community and node in the candidate area, calculate the fitness after merging with S_i, recorded as F_new, and then calculate the growth rate V_s of F_new relative to the original fitness F of S_i and the growth rate V_h of F_new relative to the candidate community's fitness F_h.
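The fitness and growth-rate computation of step ten might look like the sketch below. Two assumptions are made loudly here: the fitness is taken to be the common local-fitness form F = k_in / (k_in + k_out)^α, where k_in is twice the internal edge weight and k_out is the external edge weight, and the growth rate is taken to be the relative increase (F_new - F) / F. Neither formula is quoted from the patent, and the names `fitness` and `growth_rate` are hypothetical.

```python
def fitness(edge_weights, community, alpha=1.0):
    """Assumed fitness F = k_in / (k_in + k_out) ** alpha.

    edge_weights maps an undirected edge (u, v), listed once, to its weight.
    """
    members = set(community)
    k_in = 0.0   # twice the sum of internal edge weights
    k_out = 0.0  # sum of weights of edges leaving the community
    for (u, v), w in edge_weights.items():
        u_in, v_in = u in members, v in members
        if u_in and v_in:
            k_in += 2.0 * w
        elif u_in or v_in:
            k_out += w
    total = k_in + k_out
    return k_in / total ** alpha if total > 0 else 0.0


def growth_rate(f_new, f_old):
    """Assumed relative growth of fitness; infinite when growing from zero."""
    if f_old == 0:
        return float("inf") if f_new > 0 else 0.0
    return (f_new - f_old) / f_old


if __name__ == "__main__":
    w = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0, (2, 3): 1.0}
    seed, candidate = {0, 1, 2}, {3}
    F = fitness(w, seed)
    F_h = fitness(w, candidate)
    F_new = fitness(w, seed | candidate)
    print(growth_rate(F_new, F), growth_rate(F_new, F_h))
```

With α = 1, as in the embodiment, the assumed fitness is simply the share of a community's total edge weight that stays inside the community.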

Step eleven: at V_sAnd V_hSelecting V on the premise that the V values are all larger than the threshold value of the growth rate to be 0.1_s+V_hThe largest one of the communities and S_iMerging into a new community, updating S_iAnd set of seed communities C_LReturning to the step ten; if not, the set C is checked_LIf the existing communities are merged, returning to the step nine if the existing communities are not merged, and if the existing communities are merged, returning to the step nineSet C_LAnd resetting the merged records of all communities in the database, and returning to the step eight.

Step twelve: check the number of communities. If it is less than or equal to K, end the procedure and output the set C_L. If it is greater than K, take the K communities with the most nodes as targets, calculate in turn the similarity between each remaining community or node and these targets, merge each remaining community or node into the target community most similar to it, update the set C_L, and finally output the community set C_L as the final division result.
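The final reduction to K communities in step twelve can be sketched as below. The similarity function is passed in as a parameter because the patent's similarity formula is not reproduced here; `enforce_k` and `sim` are hypothetical names, and leftover single nodes are represented as one-element sets.

```python
def enforce_k(communities, k, sim):
    """Merge surplus communities into the K largest ones by highest similarity.

    communities is a list of node sets; sim(a, b) returns a number for two node sets.
    """
    if len(communities) <= k:
        return [set(c) for c in communities]
    ordered = sorted(communities, key=len, reverse=True)
    targets = [set(c) for c in ordered[:k]]
    for rest in ordered[k:]:
        # Similarity is evaluated against the targets as they currently stand.
        best = max(targets, key=lambda t: sim(rest, t))
        best |= rest
    return targets


if __name__ == "__main__":
    edges = {(0, 1), (1, 2), (3, 4), (4, 5), (2, 6), (5, 7)}

    def edge_count(a, b):
        # Toy similarity: number of edges running between the two node sets.
        return sum(
            1 for u, v in edges
            if (u in a and v in b) or (u in b and v in a)
        )

    print(enforce_k([{0, 1, 2}, {3, 4, 5}, {6}, {7}], 2, edge_count))
```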

The application effect of the present invention is described in detail below with reference to simulations.

Simulation one: in the embodiment of the present invention, with u = 0.5 and noise r = 0.1, the method of the invention is compared with community discovery that applies the BGLL algorithm to only one relationship, and the NMI value is used to measure the similarity between each experimental result and the real community structure, as shown in fig. 3. The results of community discovery using multiple relationships are better than the results using only one relationship.

Simulation two: in the embodiment of the present invention, the detection results of the invention are compared with those of the conventional PMM method as the noise is increased from 0.1 to 0.4 in steps of 0.05, as shown in fig. 4. The method of the invention shows strong resistance to noise.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A multi-relationship community discovery method based on relationship combination optimization and seed expansion, characterized in that the weight proportions of the relationships in the network are determined by multi-objective optimization, and the multi-relationship network is fused into a single-relationship network that effectively integrates the community information of all relationships and has low noise; the community division information of all relationships in the multi-relationship network is then integrated to find the people who are in the same community in every relationship, the small communities formed by these people are used as seed communities, and a seed expansion strategy is adopted to perform community mining on the multi-relationship social network, obtaining a community structure division with higher accuracy;

wherein A_i,j is the adjacency matrix of the entire network, m is the number of edges contained in the network, k_i and k_j are the degrees of node i and node j respectively, and the value of δ(C_i, C_j) depends on whether i and j are in the same community: it is 1 if they are, and 0 otherwise;

the rows of the confusion matrix H represent the real community division results, and the columns represent the community results produced by the division algorithm; C_A and C_B respectively represent the numbers of communities contained in result A and result B, M_ij represents the number of nodes that should be in community i but appear in community j, M_i. and M_.j respectively represent the sums of the node counts in the ith row and the jth column, and N is the number of nodes contained in the whole network; when NMI is 1, the community structures of the two results A and B are completely the same, and when NMI is 0, they are completely different;

the seed expansion strategy selects the community s_i with the largest number of nodes as the seed community, and calculates the similarity between the seed community s_i and the remaining candidate seed communities and remaining nodes, as follows:

u and v are nodes of communities s_i and s_j respectively, n_i and n_j are respectively the numbers of nodes in the two communities,

and

is a set of node numbers;

the multi-relationship community discovery method based on relationship combination optimization and seed expansion comprises the following steps:

1) directly assigning weights in equal proportion to all relationships of the multi-relationship network, and combining the multi-relationship network into a weighted single-relationship network A;

optimization sub-goal two: the similarity between the community structure information shown by the community division results of the single-relationship network B, combined with the new weight proportion, and of network A is as large as possible;

4) selecting, from the set of optimal decision variables obtained in step 3), the weight ratio that gives the largest sub-goal value, fusing the multi-relationship network into a single-relationship network M, performing community discovery on M with a single-relationship community discovery method, and recording the obtained number of communities as K;

8) if L = 0, let L = L + 1, set C_L = C_{L-1}, and proceed to the next step; if L > 0, check whether the sets C_L and C_{L-1} are identical: if yes, go to step 11); otherwise, let L = L + 1, set C_L = C_{L-1}, and proceed to the next step;

9) sorting the communities in the seed community candidate set C_L that have not taken part in merging from largest to smallest by the number of nodes they contain, and selecting the top-ranked community s_i as the seed community;

10) calculating, from the similarity obtained in step 7), the similarity between the seed community s_i and the candidate seed communities and remaining nodes, recorded as Sim;

12) if both V_s and V_h are greater than the growth-rate threshold, selecting the candidate community s_j with the largest V_s + V_h, merging it with s_i into a new community, updating s_i and the seed community set C_L, and returning to step 10); if not, checking whether all existing communities in the set C_L have been merged: if not, returning to step 9); if yes, clearing the merge records of all communities in the set C_L and returning to step 8);

13) checking the number of communities in the network; if the number is less than or equal to K, ending the procedure and outputting the set C_L; if the number is greater than K, executing the following elimination strategy: taking the K communities with the most nodes as targets, calculating in turn the similarity between each remaining community or node and these targets, merging each remaining community or node into the target community most similar to it, updating the set C_L, and finally outputting the community set C_L as the final division result.

2. The multi-relationship community discovery method based on relationship combination optimization and seed expansion as claimed in claim 1, wherein the single-relationship community discovery algorithm BGLL is adopted to perform community division on each of the networks formed by the four relationships, and the division results are compiled according to the formula

3. The multi-relationship community discovery method based on relationship combination optimization and seed expansion as claimed in claim 1, wherein the similarity Sim between each node in the network is calculated by a single-relationship network M:

wherein K represents the number of dimensions in which i and j are divided into the same community, N is 4, the gain factor Δ is 1, W(i, j) is the weight of the edge connecting node i and node j, com(i, j) is the intersection of the neighbor nodes of the two nodes, and N(i, j) is the union of the neighbor nodes of the two nodes.

4. The multi-relationship community discovery method based on relationship combination optimization and seed expansion according to claim 1, wherein the communities and nodes whose similarity to s_i exceeds the set threshold gamma-1 are placed into a candidate area, and the current fitness F of s_i and the current fitness F_h of each community in the candidate area are calculated using the following formula:

where one term of the formula is twice the sum of the weights of the edges inside the community, and α is a parameter that regulates the community size, set to 1;

then, for each community and node in the candidate area, the fitness after merging with s_i is calculated and recorded as F_new, and the growth rate V_s of F_new relative to the original fitness F of s_i and the growth rate V_h of F_new relative to the candidate community's fitness F_h are calculated;

provided that both V_s and V_h are greater than the growth-rate threshold of 0.1, the community with the largest V_s + V_h is selected and merged with s_i into a new community, and s_i and the seed community set C_L are updated; if not, it is checked whether all existing communities in the set C_L have been merged; if yes, the merge records of all communities in the set C_L are cleared;

5. A computer applying the multi-relationship community discovery method based on the relationship combination optimization and the seed expansion as claimed in any one of claims 1 to 4.