CN106022936B - Community structure-based influence maximization algorithm applicable to thesis cooperative network - Google Patents

Info

Publication number
CN106022936B
Authority
CN
China
Prior art keywords
community
influence
node
edges
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610353585.4A
Other languages
Chinese (zh)
Other versions
CN106022936A (en)
Inventor
吴骏
陈厚兵
张梓雄
王晓彤
吴和生
王崇骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201610353585.4A
Publication of CN106022936A
Application granted
Publication of CN106022936B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a community structure-based influence maximization algorithm (the COMAX algorithm) for paper co-authorship networks, comprising the following steps: 1) a community discovery phase: a. construct the paper co-authorship network graph; b. merge local communities; c. construct a new network graph; d. end; 2) a seed node selection stage: a. calculate the influence of each community; b. select the corresponding node in the community with the largest influence; c. end. The algorithm offers a new solution to the influence maximization problem on paper co-authorship networks. Results show that, under the independent cascade model (ICM), the influence coverage of the COMAX algorithm is close to that of the greedy algorithm, with much better time efficiency.

Description

Community structure-based influence maximization algorithm applicable to thesis cooperative network
Technical Field
The invention relates to a method for solving the influence maximization problem on paper co-authorship networks, and in particular to a community structure-based method for solving that problem.
Background
In recent years, online social networks have developed rapidly, and more and more social networking sites have appeared. Information dissemination in these networks has surpassed that of real life in both scale and efficiency. The influence maximization problem asks how to select a fixed number of seed nodes so as to maximize the coverage of information propagation. When a subject or field needs to be surveyed or understood in depth, it is impossible to read all the literature in the field; instead, a portion of the works of highly influential authors is selected. Finding those highly influential authors is exactly the process of selecting seed nodes.
In 2003, Kempe, Kleinberg, and Tardos [Maximizing the Spread of Influence through a Social Network] formally defined the influence maximization problem, cast it as a discrete optimization problem, and proved that it is NP-hard. For the linear threshold model and the independent cascade model they gave a greedy algorithm and proved that it achieves an approximation ratio of (1 - 1/e) relative to the optimal solution. However, the greedy algorithm has very high time complexity: it considers neither the degree distribution nor the community structure of the network, and the influence of every candidate node must be recalculated each time a seed node is selected, so its time efficiency is low.
In 2007, to address the high time complexity of the greedy algorithm, Leskovec et al. [Cost-effective Outbreak Detection in Networks] exploited the submodularity of the influence function and proposed the "Lazy Forward" optimization strategy and the CELF algorithm.
In 2009, again motivated by the high time complexity of the greedy algorithm, Chen Wei et al. [Efficient Influence Maximization in Social Networks] proposed the NewGreedy and MixGreedy algorithms. NewGreedy preprocesses the original network graph by deleting the edges that play no part in the propagation process, which reduces the problem to finding the set of nodes reachable from the seed node set in the new network graph. MixGreedy combines NewGreedy with CELF: NewGreedy is used to select the first node and to compute the initial influence of every node, and CELF is used to select the remaining seed nodes. The results show that the coverage of NewGreedy and MixGreedy is close to that of the greedy algorithm and that both are faster than it. However, they still require many Monte Carlo simulations, so their overall efficiency is low and they are not suitable for large-scale social networks.
Many influence maximization algorithms ignore the community structure of the network. However, nodes inside a community are more closely connected to each other than to nodes outside it, so during information dissemination a node is more likely to activate nodes in its own community than nodes outside it. The invention therefore proposes a community structure-based influence maximization algorithm: the whole network is divided into relatively independent communities, node influence is calculated within each community, and the largest node influence in a community is taken as the influence of that community. After a seed node is selected, only the influence values of one community need to be recalculated rather than those of the whole network, which greatly improves the efficiency of seed node selection.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a seed node selection method suited to the influence maximization problem on paper co-authorship networks.
The technical scheme is as follows: to solve the above problem, the community structure-based influence maximization algorithm for paper co-authorship networks of the present invention comprises the following steps:
1) community discovery phase:
a. construct the initial paper co-authorship network graph;
b. merge local communities;
c. construct a new network graph;
d. end;
2) seed node selection stage:
a. calculate community influence;
b. select a seed node;
c. end.
In the invention, in the network graph constructed in step 1)-a, nodes represent authors, an edge indicates that two authors have a cooperative relationship, that is, they have published papers together, and the weight of an edge is the number of papers the two authors have published together.
In the invention, merging the local communities in step 1)-b means that each node initially forms its own local community, and each node is then merged into whichever of the communities connected to it yields the largest increase in modularity. The modularity is given by the following formula:
[Formula image BDA0000999154480000021 is not reproduced in this text.]
where nc denotes the number of communities, in_c denotes the number of edges inside community c, tot_c denotes the number of all edges connected to the nodes in community c, and m denotes the number of all edges in the network.
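For orientation, the modularity commonly used with this kind of greedy merging can be written with the symbols defined here. The expression below is the standard textbook form, offered as an assumption rather than a transcription of the formula image. It takes in_c to count each internal edge once, tot_c to be the sum of the degrees of the nodes in community c, and m to be the total number of edges in the network, so the normalization in the original figure may differ.

```latex
Q = \sum_{c=1}^{nc} \left[ \frac{in_c}{m} - \left( \frac{tot_c}{2m} \right)^{2} \right]
```

Under this form, a community contributes positively when it holds a larger share of the edges than its total degree alone would predict.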
The modularity increment after node i is merged into community c is:
[Formula image BDA0000999154480000022 is not reproduced in this text.]
where the symbol shown in image BDA0000999154480000031 (not reproduced) denotes the number of edges connecting node i with community c, which become internal edges of the new community after merging, and k_i denotes the degree of node i.
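As a worked sketch, the increment can be derived from the standard modularity Q = Σ_c [in_c/m - (tot_c/(2m))^2]. That form is an assumption here, since the formula images are not reproduced. The notation k_{i,c} for the number of edges between node i and community c is hypothetical, m is the total number of edges in the network, in_c counts each internal edge once, and tot_c is the sum of the degrees of the nodes in c. Treating node i as a singleton community before the merge:

```latex
\begin{aligned}
\Delta Q
&= \left[ \frac{in_c + k_{i,c}}{m} - \left( \frac{tot_c + k_i}{2m} \right)^{2} \right]
 - \left[ \frac{in_c}{m} - \left( \frac{tot_c}{2m} \right)^{2} - \left( \frac{k_i}{2m} \right)^{2} \right] \\
&= \frac{k_{i,c}}{m} - \frac{tot_c \, k_i}{2m^{2}}
\end{aligned}
```

Under these assumptions the merge increases modularity exactly when node i has more edges into community c than the degrees of i and c alone would predict.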
In the invention, constructing the new network graph in step 1)-c means that all nodes of each community obtained by the merging in step 1)-b are represented by a single node of the new network graph, and the edges between the original communities become the edges between the corresponding nodes of the new network graph.
In the invention, calculating the community influence in step 2)-a means taking the influence value of the most influential node in a community as the influence of that community and recording the corresponding node.
In the invention, selecting the seed node in step 2)-b means selecting the recorded node of the community with the largest influence, after which the influence of that community must be recalculated.
The beneficial effects of the invention are as follows: the community structure-based influence maximization algorithm for paper co-authorship networks provides a new heuristic solution to the influence maximization problem; the influence spread of the selected seed nodes is close to that of the greedy algorithm, the time efficiency is high, and the algorithm is suitable for influence maximization on large-scale social networks.
Drawings
Fig. 1 is a flowchart of the community structure-based influence maximization method for paper co-authorship networks according to an embodiment of the present invention.
Fig. 2 is a flowchart of the community discovery phase in Fig. 1.
Fig. 3 is a flowchart of the seed node selection stage in Fig. 1.
Fig. 4 compares, on the Hep data set, the influence coverage of the seed nodes selected by the proposed algorithm (COMAX) with that of the seed nodes selected by other methods.
Detailed Description
In order to better understand the technical content of the present invention, specific embodiments are described below with reference to the accompanying drawings.
As shown in FIG. 1, the method has two stages, namely a community discovery stage and a seed node selection stage.
The community structure-based influence maximization algorithm for paper co-authorship networks comprises the following steps:
1) community discovery phase:
a. construct the initial paper co-authorship network graph;
b. merge local communities;
c. construct a new network graph;
d. end;
2) seed node selection stage:
a. calculate community influence;
b. select a seed node;
c. end.
Fig. 2 is a flowchart of the community discovery phase, which consists of three main parts: constructing the original network graph, merging local communities, and constructing a new network graph. After the local communities are merged, all nodes in the same local community are abstracted into a single node, a new network is built, and merging is performed again. A merge is performed only when the modularity increment is positive.
The concrete steps of the community discovery phase are as follows:
Step 1-0: the method starts.
Step 1-1: traverse the paper collection. This is the first step in building the network; the author information of all relevant papers is recorded.
Step 1-2: extract the cooperative relationships. Step 1-1 yields the nodes of the network, but the edges between nodes and their weights are not yet known. An edge is created between every two authors who have co-authored a paper, and the final weight of the edge is the total number of papers the two authors have co-authored.
Step 1-3: construct the cooperative network graph. An undirected weighted graph G(V, E, W) is built from the nodes obtained in step 1-1 and the edges obtained in step 1-2, where V denotes the authors, E the cooperative relationships between authors, and W the number of papers co-authored by each pair of authors.
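Steps 1-1 to 1-3 can be sketched in a few lines of Python. This is a minimal illustration, assuming each paper is available as a plain list of author names; the function name `build_coauthor_graph` and the sample data are hypothetical and not part of the patent.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthor_graph(papers):
    """Build the undirected weighted graph G(V, E, W) from author lists.

    papers: iterable of author-name lists, one list per paper.
    Returns (nodes, weights), where weights maps a sorted author pair
    to the number of papers the two authors wrote together.
    """
    nodes = set()
    weights = defaultdict(int)
    for authors in papers:
        unique = sorted(set(authors))          # ignore repeated names in one paper
        nodes.update(unique)                   # step 1-1: record every author
        for u, v in combinations(unique, 2):
            weights[(u, v)] += 1               # step 1-2: one more joint paper
    return nodes, dict(weights)                # step 1-3: V and weighted E


# Hypothetical sample corpus with three papers.
papers = [["A", "B", "C"], ["A", "B"], ["C", "D"]]
nodes, weights = build_coauthor_graph(papers)
```

Storing each pair in sorted order keeps the graph undirected without holding both directions of an edge.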
Step 1-4: calculate the modularity increment obtained by merging each node with each community connected to it. The modularity increment after node i is merged into community c is:
[Formula image BDA0000999154480000041 is not reproduced in this text.]
where the symbol shown in image BDA0000999154480000042 (not reproduced) denotes the number of edges connecting node i with community c, which become internal edges of the new community after merging, and k_i denotes the degree of node i. In this step, for each node, the modularity increment of merging it with every connected community is calculated, and the largest increment and the corresponding community are recorded.
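The bookkeeping of step 1-4 can be sketched as follows. Because the formula image is not reproduced, the gain used here is an assumption: the standard expression k_i_c / m - tot_c * k_i / (2 * m^2) for moving a singleton node into community c, with tot_c taken as the total degree of c and m as the total number of edges. All function and variable names are hypothetical.

```python
def modularity_gain(k_i_c, k_i, tot_c, m):
    """Assumed modularity gain for moving singleton node i into community c."""
    return k_i_c / m - (tot_c * k_i) / (2.0 * m * m)


def best_community(node, adj, community_of, tot, m):
    """Return (largest gain, community) over the communities adjacent to node."""
    k_i = sum(adj[node].values())
    links = {}  # community id -> edge weight between node and that community
    for nbr, w in adj[node].items():
        c = community_of[nbr]
        if c != community_of[node]:
            links[c] = links.get(c, 0) + w
    best_gain, best_c = float("-inf"), None
    for c, k_i_c in links.items():
        gain = modularity_gain(k_i_c, k_i, tot[c], m)
        if gain > best_gain:
            best_gain, best_c = gain, c
    return best_gain, best_c


# Toy graph: triangle A-B-C plus the path C-D-E, every edge with weight 1.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, {})[v] = 1
    adj.setdefault(v, {})[u] = 1
m = len(edges)
community_of = {v: v for v in adj}             # every node starts alone
tot = {v: sum(adj[v].values()) for v in adj}   # total degree per community
gain, target = best_community("A", adj, community_of, tot, m)
```

In this toy graph node A prefers B over C because C already has a higher total degree, which raises the penalty term of the assumed gain.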
Step 1-5: check whether, over all nodes, the largest modularity increment obtained by merging a node with a connected community is greater than 0. If it is not, jump to step 1-8 and end the community discovery phase.
Step 1-6: merge. Each node whose largest modularity increment is greater than 0 is merged into the community that gives that increment.
Step 1-7: construct a new network graph. All nodes that belong to the same community after the merging in step 1-6 are abstracted into a single node, and the edges between the original communities become the edges between the nodes of the new graph. The number of nodes in the new network graph therefore equals the number of communities after step 1-6, and each node represents one previous community. Then jump to step 1-4.
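The graph contraction of step 1-7 can be sketched as below. The sketch follows the text literally and keeps only the edges between different communities, summing their weights. Dropping internal edges is an assumption drawn from that wording; Louvain-style implementations usually keep them as self-loops so that degrees are preserved. The names are hypothetical.

```python
def aggregate(adj, community_of):
    """Collapse every community into one node of a new weighted graph.

    adj: dict node -> dict neighbor -> weight (both directions stored).
    community_of: dict node -> community id.
    Returns a dict in the same format whose nodes are community ids.
    """
    new_adj = {}
    for u, nbrs in adj.items():
        cu = community_of[u]
        new_adj.setdefault(cu, {})             # keep isolated communities too
        for v, w in nbrs.items():
            cv = community_of[v]
            if cu == cv:
                continue                       # internal edge, not kept here
            new_adj[cu][cv] = new_adj[cu].get(cv, 0) + w
    return new_adj


# Toy graph: triangle A-B-C plus the path C-D-E, every edge with weight 1.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, {})[v] = 1
    adj.setdefault(v, {})[u] = 1
community_of = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1}
new_adj = aggregate(adj, community_of)
```

The result has one node per community, so the next pass of step 1-4 runs on a much smaller graph.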
Step 1-8: return the community structure of the network. The community discovery phase is now complete.
Fig. 3 is a flowchart of the seed node selection stage, which consists of two main parts: calculating community influence and selecting a seed node. The influence of all communities is calculated first, and then the node recorded for the community with the largest influence is selected. Afterwards only the influence of the selected community needs to be recalculated; the other communities are left unchanged.
The seed node selection stage comprises the following specific steps:
Step 2-0: the method starts.
Step 2-1: calculate the influence of the nodes in each community. The information propagation model used is the independent cascade model; for the weighted network graph, the expected influence of node v is:
[Formula image BDA0000999154480000051 is not reproduced in this text.]
where in_v is the sum of the weights of the edges between node v and its neighbors inside the community, t_v is the sum of the weights of the edges between node v and those of its neighbors that have already become seed nodes, and p is the probability that a single edge is successfully activated. If the weight of the edge between nodes u and v is t and u is active, the probability that u activates v is 1 - (1 - p)^t.
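The activation probability stated in step 2-1 treats an edge of weight t as t independent activation attempts, each succeeding with probability p. A minimal sketch of that expression follows; the function name is hypothetical.

```python
def activation_probability(p, t):
    """Probability that an active node activates a neighbor over an edge of weight t.

    Each of the t co-authored papers is treated as one independent attempt
    that succeeds with probability p, so all t attempts fail with (1 - p)**t.
    """
    return 1.0 - (1.0 - p) ** t


# More joint papers give a higher chance of activation for the same p.
single = activation_probability(0.1, 1)
triple = activation_probability(0.1, 3)
```

With t = 1 the expression reduces to p, which matches the unweighted independent cascade model.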
Step 2-2: calculate the community influence. The influence of a community is the largest influence value among all its nodes; the node that attains this value is recorded.
Step 2-3: select a seed node. First locate the community with the largest influence, then take the node recorded for that community and add it to the seed node set.
Step 2-4: check whether seed node selection is finished. If the number of selected seed nodes has reached K, jump to step 2-6 and end the algorithm.
Step 2-5: recalculate the influence values of all nodes in the community chosen in step 2-3, then recalculate the influence of that community, and jump to step 2-3.
Step 2-6: return the selected seed node set; seed selection is complete.
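The loop of steps 2-2 to 2-6 can be sketched as follows. The node scoring function is passed in as a parameter because the expected-influence formula is not reproduced in this text; `weighted_degree_influence` below is a hypothetical stand-in (edge weight to non-seed neighbors), not the patent's formula. The point of the sketch is the control flow: only the community that supplied the last seed is recomputed.

```python
def select_seeds(communities, influence, k):
    """Pick up to k seeds; communities maps community id -> list of nodes."""
    seeds, seed_set, best = [], set(), {}

    def recompute(c):
        # Steps 2-1 and 2-2 for one community: best remaining node and its score.
        candidates = [v for v in communities[c] if v not in seed_set]
        if candidates:
            node = max(candidates, key=lambda v: influence(v, seed_set))
            best[c] = (influence(node, seed_set), node)
        else:
            best[c] = (float("-inf"), None)

    for c in communities:
        recompute(c)
    while len(seeds) < k:                            # step 2-4
        c = max(best, key=lambda cid: best[cid][0])  # step 2-3
        node = best[c][1]
        if node is None:
            break                                    # no candidates left anywhere
        seeds.append(node)
        seed_set.add(node)
        recompute(c)                                 # step 2-5: this community only
    return seeds                                     # step 2-6


# Hypothetical weighted graph and community assignment.
adj = {
    "A": {"B": 2, "C": 1},
    "B": {"A": 2, "C": 1},
    "C": {"A": 1, "B": 1, "D": 1},
    "D": {"C": 1, "E": 3},
    "E": {"D": 3},
}
communities = {0: ["A", "B", "C"], 1: ["D", "E"]}


def weighted_degree_influence(v, seed_set):
    """Stand-in score: total edge weight from v to neighbors that are not seeds."""
    return sum(w for u, w in adj[v].items() if u not in seed_set)


seeds = select_seeds(communities, weighted_degree_influence, 2)
```

Because the scores of the other communities are cached, each selection costs one community's worth of recomputation instead of a pass over the whole network.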
The Hep data set used in Fig. 4 is frequently used for the influence maximization problem; it is a co-authorship network from the field of high-energy physics. The figure shows that the influence coverage of the seed node set grows as the number of seed nodes increases, and that the coverage of the seed set selected by the COMAX algorithm is very close to that of CELF, the accelerated greedy algorithm, while its time efficiency is higher than that of CELF by several orders of magnitude.
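Coverage figures of this kind are usually estimated by simulating the independent cascade model many times and averaging the number of activated nodes. The sketch below is one way to do that, assuming the per-edge activation probability 1 - (1 - p)^t from step 2-1; the function names, the toy graph, and the run count are illustrative and do not come from the patent.

```python
import random


def simulate_icm(adj, seeds, p, rng):
    """Run one independent cascade and return the number of active nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for u in frontier:
            for v, t in adj[u].items():
                # Each newly active node gets one chance per inactive neighbor.
                if v not in active and rng.random() < 1.0 - (1.0 - p) ** t:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def estimate_spread(adj, seeds, p, runs=1000, seed=0):
    """Average cascade size over several Monte Carlo runs."""
    rng = random.Random(seed)
    return sum(simulate_icm(adj, seeds, p, rng) for _ in range(runs)) / runs


# Hypothetical connected toy graph with edge weights (numbers of joint papers).
adj = {
    "A": {"B": 2, "C": 1},
    "B": {"A": 2, "C": 1},
    "C": {"A": 1, "B": 1, "D": 1},
    "D": {"C": 1, "E": 3},
    "E": {"D": 3},
}
spread = estimate_spread(adj, ["D"], p=0.1, runs=200)
```

The two extreme values of p give a quick sanity check: with p = 1 every reachable node is activated, and with p = 0 only the seeds are active.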
In summary, the community structure-based influence maximization algorithm provides a new method for discovering highly influential nodes in paper co-authorship networks. The method first divides the network into relatively independent communities, then calculates the influence of each community, adds the recorded node of the most influential community to the seed node set, and recalculates the influence of that community. This cycle is repeated until K seed nodes have been found.
Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention should be determined by the appended claims.

Claims (1)

1. A community structure-based influence maximization method for paper co-authorship networks, characterized by comprising the following steps:
1) community discovery phase:
a. construct the initial paper co-authorship network graph;
b. merge local communities;
c. construct a new network graph;
d. end;
2) seed node selection stage:
a. calculate community influence;
b. select a seed node;
c. end;
constructing the cooperative network graph in step 1)-a means that, in the constructed network graph, nodes represent authors, an edge indicates that two authors have a cooperative relationship and have published papers together, the weight of an edge is the number of papers published together, and the constructed network graph is an undirected graph;
merging the local communities in step 1)-b means that each node initially forms its own local community, and each node is merged into whichever of the communities connected to it yields the largest modularity increment after merging, wherein the modularity is given by the following formula:
[Formula image FDA0002344206710000011 is not reproduced in this text.]
where nc denotes the number of communities, in_c denotes the number of edges inside community c, tot_c denotes the number of all edges connected to the nodes in community c, and m denotes the number of all edges in the network;
the modularity increment after node i is merged into community c is:
[Formula image FDA0002344206710000012 is not reproduced in this text.]
where the symbol shown in image FDA0002344206710000013 (not reproduced) denotes the number of edges connecting node i with community c, which become internal edges of the new community after merging, and k_i denotes the degree of node i;
constructing the new network graph in step 1)-c means that all nodes of each community obtained by the merging in step 1)-b are represented by a single node of the new network graph, and the edges between the original communities become the edges between the corresponding nodes of the new network graph;
calculating the community influence in step 2)-a means taking the influence value of the most influential node in a community as the influence of that community and recording the corresponding node;
selecting the seed node in step 2)-b means selecting the recorded node of the community with the largest influence and then recalculating the influence of that community.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610353585.4A CN106022936B (en) 2016-05-25 2016-05-25 Community structure-based influence maximization algorithm applicable to thesis cooperative network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610353585.4A CN106022936B (en) 2016-05-25 2016-05-25 Community structure-based influence maximization algorithm applicable to thesis cooperative network

Publications (2)

Publication Number    Publication Date
CN106022936A          2016-10-12
CN106022936B          2020-03-20

Family

ID=57093642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610353585.4A Active CN106022936B (en) 2016-05-25 2016-05-25 Community structure-based influence maximization algorithm applicable to thesis cooperative network

Country Status (1)

Country Link
CN (1) CN106022936B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107103551A (en) * 2017-03-20 2017-08-29 重庆邮电大学 A kind of coauthorship network community division method of selected seed node
CN109118381B (en) * 2018-08-29 2019-10-22 中国人民解放军陆军工程大学 A kind of symbolic network Combo discovering method based on game theory
CN109617871B (en) * 2018-12-06 2020-04-14 西安电子科技大学 Network node immunization method based on community structure information and threshold
CN114707040B (en) * 2022-04-08 2023-08-18 中国电信股份有限公司 Enterprise cooperation group data classification method and device, storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101383748A (en) * 2008-10-24 2009-03-11 北京航空航天大学 Community division method in complex network
CN102202012A (en) * 2011-05-30 2011-09-28 中国人民解放军总参谋部第五十四研究所 Group dividing method and system of communication network
CN103020302A (en) * 2012-12-31 2013-04-03 中国科学院自动化研究所 Academic core author excavation and related information extraction method and system based on complex network
CN104820945A (en) * 2015-04-17 2015-08-05 南京大学 Online social network information transmision maximization method based on community structure mining algorithm
CN105184075A (en) * 2015-09-01 2015-12-23 南京大学 Multi-triangular group similarity cohesion based overlapping community discovery method applicable to TCMF (Traditional Chinese Medicine Formula) network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
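Determining the Top-k Nodes in Social Networks using the Shapley Value (Short Paper); N. Rama Suri et al.; Proc. of 7th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2008); 2008-05-16; pp. 1509-1512 *
一种新型的社会网络影响最大化算法 [A new influence maximization algorithm for social networks]; Tian Jiatang (田家堂) et al.; Chinese Journal of Computers (计算机学报); 2011-10-31; Vol. 34, No. 10; pp. 1956-1965 *
微博网络上的重叠社群发现与全局表示 [Overlapping community discovery and global representation on microblog networks]; Hu Yun (胡云) et al.; Journal of Software (软件学报); 2014-12-31; Vol. 25, No. 12; pp. 2824-2836 *
用于社团发现的Girvan-Newman 改进算法 [An improved Girvan-Newman algorithm for community detection]; Zhu Xiaohu (朱小虎) et al.; Journal of Frontiers of Computer Science and Technology (计算机科学与探索); 2010-12-31; Vol. 4, No. 12; pp. 1101-1108 *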

Also Published As

Publication number Publication date
CN106022936A (en) 2016-10-12

Similar Documents

Publication Publication Date Title
CN106022936B (en) Community structure-based influence maximization algorithm applicable to thesis cooperative network
Fletcher et al. Unstructured peer-to-peer networks: Topological properties and search performance
CN109218304B (en) Network risk blocking method based on attack graph and co-evolution
CN110166344B (en) Identity identification method, device and related equipment
CN112907369B (en) Block chain-based data consensus method and device, electronic equipment and storage medium
US10606867B2 (en) Data mining method and apparatus
CN109376544B (en) Method for preventing community structure in complex network from being deeply excavated
CN110705045B (en) Link prediction method for constructing weighted network by utilizing network topology characteristics
CN113422695B (en) Optimization method for improving robustness of topological structure of Internet of things
CN110838072A (en) Social network influence maximization method and system based on community discovery
CN110659284A (en) Block sequencing method and system based on tree graph structure and data processing terminal
CN112446634B (en) Method and system for detecting influence maximization node in social network
CN105468681A (en) Network negative information impact minimization method based on topic model
CN110809066A (en) IPv6 address generation model creation method, device and address generation method
CN105162654A (en) Link prediction method based on local community information
CN110995597A (en) Method and system for selecting safe link of power communication network
Du et al. A genetic simulated annealing algorithm to optimize the small-world network generating process
CN112235254B (en) Rapid identification method for Tor network bridge in high-speed backbone network
CN112600905A (en) Transaction broadcasting and block generating method, apparatus and storage medium
CN115878729B (en) Node block storage allocation optimization method and system based on alliance chain
Farahani et al. A hybrid meta-heuristic optimization algorithm based on SFLA
CN109450684B (en) Method and device for expanding physical node capacity of network slicing system
CN107222334A (en) Suitable for the local Combo discovering method based on core triangle of social networks
KR101907551B1 (en) Effective graph clustering apparatus and method for probabilistic graph
CN109492677A (en) Time-varying network link prediction method based on bayesian theory

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant