CN111784529A - Social network high-quality user identification method based on overlapping community detection - Google Patents

Publication number
CN111784529A
CN111784529A CN202010596006.5A CN202010596006A CN111784529A CN 111784529 A CN111784529 A CN 111784529A CN 202010596006 A CN202010596006 A CN 202010596006A CN 111784529 A CN111784529 A CN 111784529A
Authority
CN
China
Prior art keywords
node
individual
population
cost
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010596006.5A
Other languages
Chinese (zh)
Inventor
张磊
孙凤姣
刘玉童
吴鑫鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University
Priority to CN202010596006.5A
Publication of CN111784529A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01: Social networking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/12: Computing arrangements based on biological models using genetic models
    • G06N3/126: Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • Artificial Intelligence (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • General Business, Economics & Management (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a social network high-quality user identification method based on overlapping community detection. By designing the initialization strategy and the local search strategy, the invention can effectively find the high-influence user combination under different cost budgets in the network, and provides a plurality of choices for decision makers with different cost budget requirements.

Description

Social network high-quality user identification method based on overlapping community detection
Technical Field
The invention relates to social networks, and in particular to a social network high-quality user identification method based on overlapping community detection.
Background
With the development of internet technology, people share their opinions through social platforms such as Twitter and Weibo, so that information dissemination breaks through the limitations of time and space. Currently, most research on social influence maximization focuses only on the influence of nodes and neglects their cost. In actual marketing, a merchant often adopts marketing strategies such as celebrity endorsements, free trial products, and discounts to promote products, and each of these requires a certain investment. In addition, users differ in influence, and a user with greater influence spreads information more widely, but the higher the influence, the higher the cost. It is therefore important to identify high-quality users in the network, that is, seed users who maximize influence while minimizing cost. To this end, a social network high-quality user identification method based on overlapping community detection is provided.
Currently, influence-cost optimization methods in social networks are mainly classified into the following two categories:
The first type requires a fixed cost budget. A cost budget is set in advance for the social network, and the commonly used methods apply a greedy strategy to select seed nodes until the budget is exceeded. However, the greedy strategy is time-consuming and finds only one seed node combination.
The second type requires no fixed cost budget. The cost of the seed nodes is instead treated as an optimization objective: from the decision maker's perspective, the cost should be as small as possible and the resulting influence as large as possible. The commonly used algorithms solve this with multi-objective optimization, but they do not use the overlapping community structure of the network to find high-quality users. Yet the overlapping nodes of overlapping communities let information propagate between different communities, acting as a "bridge", and at the same cost level an overlapping node spreads influence significantly better than a non-overlapping node. In addition, without strategies tailored to the problem, these algorithms do not perform well on it.
Disclosure of Invention
To address the defects of the prior art, the invention provides a social network high-quality user identification method based on overlapping community detection. The method uses overlapping community structure information to mine users with high influence and low cost in the network. Through an effective initialization strategy and a local search strategy, it offers decision makers a plurality of seed node combinations at different cost budgets, thereby meeting the needs of practical problems.
In order to solve the technical problems, the invention adopts the following technical scheme:
A social network high-quality user identification method based on overlapping community detection, characterized by comprising the following steps:
The network is defined as G = (V, E), where V = {v_1, v_2, …, v_i, …, v_n} denotes the users in the social network, v_i denotes the i-th user, and n is the total number of nodes; E = {e_ij | i = 1, 2, …, n; j = 1, 2, …, n} denotes the connections between pairs of nodes, where e_ij indicates whether an edge exists between the i-th node v_i and the j-th node v_j; if e_ij = 1, there is an edge between v_i and v_j, and the two nodes are neighbors of each other; if e_ij = 0, there is no edge between v_i and v_j;
Step 1, detection of overlapping communities
Preprocessing the network with an overlapping community detection algorithm to obtain the overlapping community structure of the network and the community labels of the communities to which each node belongs;
Step 2, individual coding
For the social network, a fixed number of seed nodes is selected to form an individual, and the individual is integer-coded, giving an individual X = {x_1, x_2, …, x_k} that represents a seed node combination, where x_i is a node number in the network and k is the number of seed nodes selected in the social network;
Step 3, initialization
Step 3.1, defining the population size as pop, the maximum number of iterations as maxgen, the initial iteration number gen = 0, and the parameter controlling how often local search is performed as m;
Step 3.2, calculating the cost performance index of each node:
first calculating the structure degree structDegree(v) and the cost C(v) of node v, and then calculating the cost performance index of the node;
Step 3.2.1, calculating the node structure degree
[Formula (1); image BDA0002557277410000021 in the original publication]
In formula (1), L_v denotes the community labels of node v, and N_v denotes the set of neighbor nodes of node v; [image BDA0002557277410000031] indicates the overlap importance of node v itself, and [image BDA0002557277410000032] represents the overlap importance of the neighbors of node v;
Step 3.2.2, calculating the node cost
[Formula (2); image BDA0002557277410000033 in the original publication]
In formula (2), C_i denotes the cost of node i and d_i denotes the degree of node i; r, m and t are fixed constants, where r is used to construct different cost levels, and m and t measure the cost differences between nodes of different levels and of the same level, respectively; the larger the degree of a node, the higher its cost;
Step 3.2.3, obtaining the cost performance index of the node
[Formula (3); image BDA0002557277410000034 in the original publication]
In formula (3), structDegree(v) and C(v) are given by formula (1) and formula (2), respectively;
Step 3.3, setting pop individuals in the population {X_1, X_2, …, X_i, …, X_pop}, where X_i denotes the i-th individual;
Step 3.4, selecting the top-k nodes in descending order of degree to form an individual, denoted X_0;
Step 3.5, generating an individual from the individual X_0 obtained in step 3.4:
Step 3.5.1, traversing every gene position of individual X_0 and generating a random number r in [0, 1) for each gene position;
Step 3.5.2, if the random number r is greater than 0.5, randomly selecting two nodes v and u from the network;
Step 3.5.3, comparing the cost performance indexes cp(v) and cp(u): if the node with the larger cost performance index is not already in the individual, it replaces the node at the current gene position; if the node with the larger cost performance index is already in the individual, or the two cost performance indexes are equal, returning to step 3.5.2; continuing until all gene positions have been traversed, obtaining the t-th individual X_t = {x_1, x_2, …, x_k};
Step 3.6, repeating step 3.5 until pop individuals {X_1, X_2, …, X_i, …, X_pop} are obtained, which form the initial population P_1 = {X_1, X_2, …, X_i, …, X_pop};
Step 3.7, using formula (4) to calculate the two-dimensional objective function value of the t-th individual X_t in the initial population, comprising: the approximately evaluated influence of the seed node set S, where larger is better, and the cost of the seed node set S, where smaller is better;
[Formula (4); image BDA0002557277410000041 in the original publication]
where N_s denotes the nodes covered by the 1-hop range of node s, that is, the neighbor nodes of s, and p is the propagation probability; [image BDA0002557277410000042] represents the influence of the 1-hop range of node s, and C_i is the cost of the i-th node in the seed node set S;
Step 3.8, sorting the initialized population by non-dominated sorting to obtain a sorted population with a plurality of fronts; calculating the crowding distance of the sorted population with a plurality of fronts according to the Euclidean distance;
Step 4, population evolution
Step 4.1, applying a tournament selection strategy to the sorted population with a plurality of fronts to obtain a mating pool;
Step 4.2, performing crossover and mutation on the individuals in the mating pool to generate a new population of size pop, called the offspring population;
Step 4.3, calculating the influence and cost of each individual in the offspring population according to step 3.7;
Step 4.4, merging the parent population and the offspring population, performing non-dominated sorting, calculating the crowding distance of the sorted population according to the Euclidean distance, and denoting the current population as P_temp;
Step 4.5, judging by formula (5) whether local search is to be performed on the current population: if formula (5) does not hold, local search is skipped and step 4.6 is executed; if formula (5) holds, step 4.7 is executed to perform local search:
gen mod m = 0 (5)
where gen is the current iteration number, m is the set parameter controlling how often local search is performed, and "mod" denotes the modulo operation;
Step 4.6, selecting pop individuals from population P_temp as the population of the (gen+1)-th iteration;
Step 4.7, local search
Step 4.7.1, selecting all individuals of the first front from population P_temp and sorting them in descending order of influence; the first half of the sorted individuals, denoted P_inf, undergoes local search on the influence objective, and the second half, denoted P_cost, undergoes local search on the cost objective;
Step 4.7.2, for each individual in P_inf, sorting its nodes in ascending order of structure degree and randomly selecting a search length l; starting from the first gene position, randomly replacing the node at the current gene position with one of its neighbor nodes that is not already in the individual; if the influence of the individual after replacement is greater than before, the replacement is kept; otherwise, the remaining neighbor nodes are traversed; repeating these steps until the traversal length exceeds l, finally obtaining the set of searched individuals, denoted P'_inf;
Step 4.7.3, for each individual in P_cost, sorting its nodes in descending order of node cost and randomly selecting a search length l; starting from the first gene position, randomly replacing the node at the current gene position with one of its neighbor nodes that is not already in the individual; if the cost of the individual after replacement is less than before, the replacement is kept; otherwise, the remaining neighbor nodes are traversed; repeating these steps until the traversal length exceeds l, finally obtaining the set of searched individuals, denoted P'_cost;
Step 4.7.4, merging the individuals in P_temp, P'_inf and P'_cost, performing non-dominated sorting, calculating the crowding distance of the sorted population according to the Euclidean distance, and selecting pop individuals from it as the population of the (gen+1)-th iteration;
Step 4.8, assigning gen + 1 to gen, and repeating step 4 until the maximum number of iterations is reached, thereby obtaining the final population, denoted Lastpop;
Step 4.9, selecting all individuals of the first front from the population Lastpop; the seed node combinations in this front provide a variety of solutions for decision makers with different cost budget requirements.
Compared with the prior art, the invention has the beneficial effects that:
1. Compared with single-objective methods that fix the cost budget and use a greedy strategy to optimize the seed node combination step by step, the algorithm simultaneously obtains a set of seed node combinations at different cost budgets, and its running time is much shorter than that of the greedy strategy.
2. Compared with multi-objective methods without a fixed cost budget, the algorithm makes effective use of overlapping community structure information and provides an initialization strategy and a local search strategy that effectively improve its performance.
Drawings
FIG. 1 is a flow chart of the algorithm of the present invention;
FIG. 2 illustrates the detection of overlapping communities according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
As shown in fig. 1 and 2, a social network high-quality user identification method based on overlapping community detection is performed according to the following steps:
The network is defined as G = (V, E), where V = {v_1, v_2, …, v_i, …, v_n} denotes the users in the social network, v_i denotes the i-th user, and n is the total number of nodes. E = {e_ij | i = 1, 2, …, n; j = 1, 2, …, n} denotes the connections between pairs of nodes, where e_ij indicates whether an edge exists between the i-th node v_i and the j-th node v_j. If e_ij = 1, there is an edge between v_i and v_j, and the two nodes are neighbors of each other; if e_ij = 0, there is no edge between v_i and v_j.
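As an illustration of the network definition, the following minimal sketch stores the edges e_ij = 1 as neighbor sets. It assumes an undirected network given as an edge list; the function name build_neighbors and the sample edges are hypothetical and not part of the original disclosure.

```python
from collections import defaultdict

def build_neighbors(edges):
    """Return {node: set of neighbor nodes} for an undirected edge list."""
    neighbors = defaultdict(set)
    for i, j in edges:
        if i == j:
            continue  # self-loops are skipped; the text does not discuss them
        neighbors[i].add(j)  # e_ij = 1
        neighbors[j].add(i)  # assumed symmetric: e_ji = 1
    return dict(neighbors)

# Hypothetical 4-user network.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
neighbors = build_neighbors(edges)
```

The degree d_i used later for the node cost is then len(neighbors[i]).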
Step 1, detection of overlapping communities
Preprocessing the network with an overlapping community detection algorithm to obtain the overlapping community structure of the network and the community labels of the communities to which each node belongs.
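The text does not name a specific overlapping community detection algorithm, so the sketch below starts from its output. It assumes the detected structure is available as a list of node sets (a cover) and derives the community labels of each node; a node with more than one label is an overlapping node. The names community_labels and overlapping_nodes are hypothetical.

```python
def community_labels(communities):
    """Map each node to the set of indices of the communities containing it."""
    labels = {}
    for cid, members in enumerate(communities):
        for v in members:
            labels.setdefault(v, set()).add(cid)
    return labels

def overlapping_nodes(labels):
    """Nodes that carry more than one community label."""
    return {v for v, ls in labels.items() if len(ls) > 1}

# Hypothetical cover: node 3 belongs to both communities.
cover = [{1, 2, 3}, {3, 4, 5}]
labels = community_labels(cover)
bridges = overlapping_nodes(labels)
```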
Step 2, individual coding
For the social network, a fixed number of seed nodes is selected to form an individual, and the individual is integer-coded, giving an individual X = {x_1, x_2, …, x_k} that represents a seed node combination, where x_i is a node number in the network and k is the number of seed nodes selected in the social network.
Step 3, initialization
Step 3.1, defining the population size as pop, the maximum number of iterations as maxgen, the initial iteration number gen = 0, and the parameter controlling how often local search is performed as m.
Step 3.2, calculating the cost performance index of each node.
First, the structure degree structDegree(v) and the cost C(v) of node v are calculated, and then the cost performance index of the node is calculated.
Step 3.2.1, calculating the node structure degree
[Formula (1); image BDA0002557277410000061 in the original publication]
In formula (1), L_v denotes the community labels of node v, and N_v denotes the set of neighbor nodes of node v; [image BDA0002557277410000062] indicates the overlap importance of node v itself, and [image BDA0002557277410000063] represents the overlap importance of the neighbors of node v.
Step 3.2.2, calculating the node cost
[Formula (2); image BDA0002557277410000071 in the original publication]
In formula (2), C_i denotes the cost of node i and d_i denotes the degree of node i; r, m and t are fixed constants, where r is used to construct different cost levels, and m and t measure the cost differences between nodes of different levels and of the same level, respectively. The larger the degree of a node, the higher its cost.
Step 3.2.3, obtaining the cost performance index of the node
[Formula (3); image BDA0002557277410000072 in the original publication]
In formula (3), structDegree(v) and C(v) are given by formula (1) and formula (2), respectively.
Step 3.3, setting pop individuals in the population {X_1, X_2, …, X_i, …, X_pop}, where X_i denotes the i-th individual.
Step 3.4, selecting the top-k nodes in descending order of degree to form an individual, denoted X_0.
Step 3.5, generating an individual from the individual X_0 obtained in step 3.4:
Step 3.5.1, traversing every gene position of individual X_0 and generating a random number r in [0, 1) for each gene position.
Step 3.5.2, if the random number r is greater than 0.5, randomly selecting two nodes v and u from the network.
Step 3.5.3, comparing the cost performance indexes cp(v) and cp(u): if the node with the larger cost performance index is not already in the individual, it replaces the node at the current gene position; if the node with the larger cost performance index is already in the individual, or the two cost performance indexes are equal, returning to step 3.5.2. This continues until all gene positions have been traversed, obtaining the t-th individual X_t = {x_1, x_2, …, x_k}.
Step 3.6, repeating step 3.5 until pop individuals {X_1, X_2, …, X_i, …, X_pop} are obtained, which form the initial population P_1 = {X_1, X_2, …, X_i, …, X_pop}.
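Steps 3.4 to 3.6 can be sketched as follows. This is a minimal illustration, not the definitive implementation: the cost performance index cp is taken as a precomputed dictionary because formulas (1) to (3) are not reproduced here, and max_tries is a hypothetical safeguard that bounds the "return to step 3.5.2" loop, which the text leaves unbounded. All function names and sample values are assumptions.

```python
import random

def top_k_by_degree(neighbors, k):
    """Step 3.4: the k nodes of highest degree form X_0."""
    return sorted(neighbors, key=lambda v: len(neighbors[v]), reverse=True)[:k]

def make_individual(x0, nodes, cp, rng, max_tries=50):
    """Step 3.5: perturb a copy of X_0 gene by gene."""
    individual = list(x0)
    for pos in range(len(individual)):
        if rng.random() <= 0.5:          # step 3.5.2 applies only when r > 0.5
            continue
        for _ in range(max_tries):       # hypothetical bound on the retry loop
            v, u = rng.sample(nodes, 2)
            if cp[v] == cp[u]:
                continue                 # equal indexes: draw again
            winner = v if cp[v] > cp[u] else u
            if winner not in individual:
                individual[pos] = winner
                break                    # gene replaced; next gene position
    return individual

def init_population(neighbors, cp, k, pop, seed=0):
    """Step 3.6: repeat step 3.5 until pop individuals exist."""
    rng = random.Random(seed)
    nodes = sorted(neighbors)
    x0 = top_k_by_degree(neighbors, k)
    return [make_individual(x0, nodes, cp, rng) for _ in range(pop)]

# Hypothetical network and cost performance values.
neighbors = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
cp = {1: 0.4, 2: 0.9, 3: 0.2, 4: 0.7, 5: 0.5}
population = init_population(neighbors, cp, k=2, pop=4)
```

Because a node only enters an individual when it is not already present, every individual keeps k distinct seed nodes.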
Step 3.7, using formula (4) to calculate the two-dimensional objective function value of the t-th individual X_t in the initial population, comprising: the approximately evaluated influence of the seed node set S, where larger is better, and the cost of the seed node set S, where smaller is better.
[Formula (4); image BDA0002557277410000073 in the original publication]
Here N_s denotes the nodes covered by the 1-hop range of node s, that is, the neighbor nodes of s, and p is the propagation probability; [image BDA0002557277410000074] represents the influence of the 1-hop range of node s, and C_i is the cost of the i-th node in the seed node set S.
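Formula (4) is available only as an image, so the sketch below does not reproduce it. As a stand-in for the 1-hop influence it uses a commonly used expected-spread estimate: each seed counts as 1, and each non-seed 1-hop neighbor u counts as 1 - (1 - p)^h, where h is the number of seeds adjacent to u. The cost objective is assumed to be the sum of the node costs C_i over the seed set. Both choices, and all names, are assumptions made for illustration.

```python
def one_hop_influence(seeds, neighbors, p):
    """Assumed 1-hop estimate: seeds plus expected activated 1-hop neighbors."""
    seeds = set(seeds)
    frontier = set().union(*(neighbors[s] for s in seeds)) - seeds
    spread = float(len(seeds))
    for u in frontier:
        hits = len(neighbors[u] & seeds)   # seeds adjacent to u
        spread += 1.0 - (1.0 - p) ** hits
    return spread

def total_cost(seeds, cost):
    """Assumed cost objective: sum of C_i over the seed set."""
    return sum(cost[s] for s in set(seeds))

def evaluate(individual, neighbors, cost, p=0.1):
    """Two objectives: (influence to maximize, cost to minimize)."""
    return one_hop_influence(individual, neighbors, p), total_cost(individual, cost)

# Hypothetical network and node costs.
neighbors = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
cost = {1: 2.0, 2: 2.0, 3: 3.0, 4: 2.0, 5: 1.0}
influence, spent = evaluate([3, 4], neighbors, cost, p=0.5)
```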
Step 3.8, sorting the initialized population by non-dominated sorting to obtain a sorted population with a plurality of fronts, and calculating the crowding distance of the sorted population with a plurality of fronts according to the Euclidean distance.
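The sorting in step 3.8 can be sketched as below for the two objectives (influence maximized, cost minimized). The text says only that the crowding distance is computed according to the Euclidean distance and gives no formula; this sketch substitutes the standard NSGA-II crowding distance, which sums normalized per-objective gaps between neighbors in a front. That substitution and all names are assumptions.

```python
def dominates(a, b):
    """a = (influence, cost) dominates b: no worse in both, better in one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def non_dominated_sort(objs):
    """Return fronts as lists of indices; the first front is fronts[0]."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts

def crowding_distance(front, objs):
    """Standard NSGA-II crowding distance (assumed stand-in)."""
    dist = {i: 0.0 for i in front}
    for m in range(2):
        ordered = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[ordered[0]][m], objs[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")  # keep extremes
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][m] - objs[prev][m]) / (hi - lo)
    return dist

# Hypothetical (influence, cost) values for five individuals.
objs = [(5.0, 3.0), (4.0, 2.0), (3.0, 3.0), (6.0, 5.0), (2.0, 1.0)]
fronts = non_dominated_sort(objs)
dist = crowding_distance(fronts[0], objs)
```

In this example individual 2 is dominated by individuals 0 and 1, so it falls into the second front.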
Step 4, population evolution
Step 4.1, applying a tournament selection strategy to the sorted population with a plurality of fronts to obtain a mating pool.
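The text specifies a tournament selection strategy without further detail. A minimal sketch, assuming a binary tournament with the usual crowded comparison (lower front rank wins; on equal rank, larger crowding distance wins), is given below. The tournament size, tie-breaking rule, and names are assumptions.

```python
import random

def tournament_select(rank, crowd, pool_size, rng):
    """Fill a mating pool with winners of random pairwise tournaments.

    rank:  {individual index: front number, 0 is best}
    crowd: {individual index: crowding distance}
    """
    ids = list(rank)
    pool = []
    for _ in range(pool_size):
        a, b = rng.sample(ids, 2)
        if rank[a] != rank[b]:
            pool.append(a if rank[a] < rank[b] else b)
        elif crowd[a] != crowd[b]:
            pool.append(a if crowd[a] > crowd[b] else b)
        else:
            pool.append(rng.choice([a, b]))  # full tie: pick either
    return pool

# Hypothetical ranks and crowding distances for four individuals.
rank = {0: 0, 1: 0, 2: 1, 3: 2}
crowd = {0: float("inf"), 1: 0.5, 2: float("inf"), 3: float("inf")}
pool = tournament_select(rank, crowd, pool_size=4, rng=random.Random(1))
```

Individual 3 sits alone in the worst front, so it loses every tournament it enters and never reaches the pool.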
Step 4.2, performing crossover and mutation on the individuals in the mating pool to generate a new population of size pop, called the offspring population.
Step 4.3, calculating the influence and cost of each individual in the offspring population according to step 3.7.
Step 4.4, merging the parent population and the offspring population, performing non-dominated sorting, calculating the crowding distance of the sorted population according to the Euclidean distance, and denoting the current population as P_temp.
Step 4.5, judging by formula (5) whether local search is to be performed on the current population: if formula (5) does not hold, local search is skipped and step 4.6 is executed; if formula (5) holds, step 4.7 is executed to perform local search:
gen mod m = 0 (5)
where gen is the current iteration number, m is the set parameter controlling how often local search is performed, and "mod" denotes the modulo operation.
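The test in formula (5) means that local search runs once every m generations. The short sketch below lists the generations that trigger it, assuming gen runs from 0 to maxgen - 1; the values of maxgen and m are arbitrary examples.

```python
def needs_local_search(gen, m):
    """Formula (5): local search runs when gen modulo m equals 0."""
    return gen % m == 0

maxgen, m = 10, 3
schedule = [gen for gen in range(maxgen) if needs_local_search(gen, m)]
print(schedule)  # [0, 3, 6, 9]
```

A larger m therefore means fewer local searches and a shorter running time, at the price of less refinement of the first front.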
Step 4.6, selecting pop individuals from population P_temp as the population of the (gen+1)-th iteration.
Step 4.7, local search
Step 4.7.1, selecting all individuals of the first front from population P_temp and sorting them in descending order of influence. The first half of the sorted individuals, denoted P_inf, undergoes local search on the influence objective, and the second half, denoted P_cost, undergoes local search on the cost objective.
Step 4.7.2, for each individual in P_inf, sorting its nodes in ascending order of structure degree and randomly selecting a search length l. Starting from the first gene position, the node at the current gene position is randomly replaced with one of its neighbor nodes that is not already in the individual; if the influence of the individual after replacement is greater than before, the replacement is kept; otherwise, the remaining neighbor nodes are traversed. These steps are repeated until the traversal length exceeds l, finally obtaining the set of searched individuals, denoted P'_inf.
Step 4.7.3, for each individual in P_cost, sorting its nodes in descending order of node cost and randomly selecting a search length l. Starting from the first gene position, the node at the current gene position is randomly replaced with one of its neighbor nodes that is not already in the individual; if the cost of the individual after replacement is less than before, the replacement is kept; otherwise, the remaining neighbor nodes are traversed. These steps are repeated until the traversal length exceeds l, finally obtaining the set of searched individuals, denoted P'_cost.
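Steps 4.7.2 and 4.7.3 share one pattern and differ only in the sort key and the objective, so a single hypothetical helper can sketch both. The assumptions are stated loudly: the search length l is drawn uniformly from 1 to k, node degree stands in for the structure degree of formula (1), and a simple coverage count stands in for the influence of formula (4), since neither formula is reproduced here.

```python
import random

def local_search(individual, neighbors, key, score, better, rng):
    """Neighbor-replacement local search on one individual.

    key:    sort key that orders the gene positions
    score:  maps an individual to the objective value being searched
    better: better(new, old) is True when new improves on old
    """
    ind = sorted(individual, key=key)
    length = rng.randint(1, len(ind))          # search length l (assumed range)
    for pos in range(length):
        candidates = sorted(u for u in neighbors[ind[pos]] if u not in ind)
        rng.shuffle(candidates)                # random order of neighbors
        current = score(ind)
        for u in candidates:
            trial = ind[:pos] + [u] + ind[pos + 1:]
            if better(score(trial), current):
                ind = trial                    # keep the improving replacement
                break
    return ind

# Hypothetical network, costs, and stand-in influence (covered node count).
neighbors = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
cost = {1: 2.0, 2: 2.0, 3: 3.0, 4: 2.0, 5: 1.0}

def coverage(ind):
    return len(set(ind).union(*(neighbors[v] for v in ind)))

def spend(ind):
    return sum(cost[v] for v in ind)

# Step 4.7.2 pattern: ascending key, keep replacements that raise influence.
improved = local_search([1, 2], neighbors, key=lambda v: len(neighbors[v]),
                        score=coverage, better=lambda new, old: new > old,
                        rng=random.Random(0))
# Step 4.7.3 pattern: descending cost, keep replacements that lower cost.
cheaper = local_search([3, 4], neighbors, key=lambda v: -cost[v],
                       score=spend, better=lambda new, old: new < old,
                       rng=random.Random(0))
```

Because a replacement is kept only when it improves the searched objective, the returned individual is never worse than the input on that objective.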
Step 4.7.4, merging the individuals in P_temp, P'_inf and P'_cost, performing non-dominated sorting, calculating the crowding distance of the sorted population according to the Euclidean distance, and selecting pop individuals from it as the population of the (gen+1)-th iteration.
Step 4.8, assigning gen + 1 to gen, and repeating step 4 until the maximum number of iterations is reached, thereby obtaining the final population, denoted Lastpop.
Step 4.9, selecting all individuals of the first front from the population Lastpop; the seed node combinations in this front provide a variety of solutions for decision makers with different cost budget requirements.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims (2)

1. A social network high-quality user identification method based on overlapping community detection, characterized by comprising the following steps:
the network is defined as G = (V, E), where V = {v_1, v_2, …, v_i, …, v_n} denotes the users in the social network, v_i denotes the i-th user, and n is the total number of nodes; E = {e_ij | i = 1, 2, …, n; j = 1, 2, …, n} denotes the connections between pairs of nodes, where e_ij indicates whether an edge exists between the i-th node v_i and the j-th node v_j; if e_ij = 1, there is an edge between v_i and v_j, and the two nodes are neighbors of each other; if e_ij = 0, there is no edge between v_i and v_j;
Step 1, detection of overlapping communities
Preprocessing the network with an overlapping community detection algorithm to obtain the overlapping community structure of the network and the community labels of the communities to which each node belongs;
Step 2, individual coding
For the social network, a fixed number of seed nodes is selected to form an individual, and the individual is integer-coded, giving an individual X = {x_1, x_2, …, x_k} that represents a seed node combination, where x_i is a node number in the network and k is the number of seed nodes selected in the social network;
Step 3, initialization
Step 3.1, defining the population size as pop, the maximum number of iterations as maxgen, the initial iteration number gen = 0, and the parameter controlling how often local search is performed as m;
Step 3.2, calculating the cost performance index of each node:
first calculating the structure degree structDegree(v) and the cost C(v) of node v, and then calculating the cost performance index of the node;
Step 3.2.1, calculating the node structure degree
[Formula (1); image FDA0002557277400000011 in the original publication]
In formula (1), L_v denotes the community labels of node v, and N_v denotes the set of neighbor nodes of node v; [image FDA0002557277400000012] indicates the overlap importance of node v itself, and [image FDA0002557277400000013] represents the overlap importance of the neighbors of node v;
Step 3.2.2, calculating the node cost
[Formula (2); image FDA0002557277400000014 in the original publication]
In formula (2), C_i denotes the cost of node i and d_i denotes the degree of node i; r, m and t are fixed constants, where r is used to construct different cost levels, and m and t measure the cost differences between nodes of different levels and of the same level, respectively; the larger the degree of a node, the higher its cost;
Step 3.2.3, obtaining the cost performance index of the node
[Formula (3); image FDA0002557277400000021 in the original publication]
In formula (3), structDegree(v) and C(v) are given by formula (1) and formula (2), respectively;
Step 3.3, setting pop individuals in the population {X_1, X_2, …, X_i, …, X_pop}, where X_i denotes the i-th individual;
Step 3.4, selecting the top-k nodes in descending order of degree to form an individual, denoted X_0;
Step 3.5, generating an individual from the individual X_0 obtained in step 3.4:
Step 3.5.1, traversing every gene position of individual X_0 and generating a random number r in [0, 1) for each gene position;
Step 3.5.2, if the random number r is greater than 0.5, randomly selecting two nodes v and u from the network;
Step 3.5.3, comparing the cost performance indexes cp(v) and cp(u): if the node with the larger cost performance index is not already in the individual, it replaces the node at the current gene position; if the node with the larger cost performance index is already in the individual, or the two cost performance indexes are equal, returning to step 3.5.2; continuing until all gene positions have been traversed, obtaining the t-th individual X_t = {x_1, x_2, …, x_k};
Step 3.6, repeating step 3.5 until pop individuals {X_1, X_2, …, X_i, …, X_pop} are obtained, which form the initial population P_1 = {X_1, X_2, …, X_i, …, X_pop};
Step 3.7, using formula (4) to calculate the two-dimensional objective function value of the t-th individual X_t in the initial population, comprising: the approximately evaluated influence of the seed node set S, where larger is better, and the cost of the seed node set S, where smaller is better;
[Formula (4); image FDA0002557277400000022 in the original publication]
where N_s denotes the nodes covered by the 1-hop range of node s, that is, the neighbor nodes of s, and p is the propagation probability; [image FDA0002557277400000023] represents the influence of the 1-hop range of node s, and C_i is the cost of the i-th node in the seed node set S;
Step 3.8, sorting the initialized population by non-dominated sorting to obtain a sorted population with a plurality of fronts; calculating the crowding distance of the sorted population with a plurality of fronts according to the Euclidean distance;
Step 4, population evolution:
Step 4.1, apply a tournament selection strategy to the sorted population with multiple fronts to obtain a mating pool;
Step 4.2, perform crossover and mutation on the individuals in the mating pool to generate a new population of size pop, called the offspring population;
Step 4.3, calculate the influence and cost of the individuals in the offspring population as in step 3.7;
Step 4.4, merge the parent population and the offspring population, perform non-dominated sorting, calculate the crowding distance of the sorted population according to the Euclidean distance, and denote the resulting population P_temp;
Step 4.5, determine according to formula (5) whether the current population requires local search; if formula (5) does not hold, local search is skipped and step 4.6 is executed; if formula (5) holds, go to step 4.7 to perform local search:
gen|m=0 (5)
wherein gen is the current iteration number, m is a preset parameter controlling how often local search is performed, and '|' denotes the modulo operation;
Step 4.6, select pop individuals from the population P_temp as the population of iteration gen+1;
Step 4.7, perform local search;
Step 4.8, assign gen+1 to gen and repeat step 4 until the maximum number of iterations is reached, obtaining the final population, denoted Lastpop;
Step 4.9, select all individuals in the first front of the population Lastpop; the seed node combinations in this front provide a range of solutions for decision makers with different cost budgets.
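The control flow of steps 4.1 to 4.8 can be sketched in Python as follows. This is an illustrative skeleton only. The callables `rank_and_crowd`, `make_offspring`, `select` and `local_search` are hypothetical placeholders for the sorting, crossover and mutation, environmental selection and local search steps, and starting the iteration counter at 1 is an assumption. The binary tournament shown is one common form of tournament selection, not necessarily the one intended by the claim.

```python
import random

def tournament_select(population, rank, crowd, rng):
    """Step 4.1: binary tournament; the lower front rank wins,
    and ties go to the larger crowding distance."""
    pool = []
    for _ in range(len(population)):
        a, b = rng.sample(range(len(population)), 2)
        if (rank[a], -crowd[a]) <= (rank[b], -crowd[b]):
            pool.append(population[a])
        else:
            pool.append(population[b])
    return pool

def evolve(population, max_gen, m, rank_and_crowd, make_offspring,
           select, local_search, rng):
    """Control flow of steps 4.1-4.8; all callables are hypothetical.

    rank_and_crowd(pop) -> (front rank per individual, crowding distance per individual)
    make_offspring(pool) -> offspring population (crossover and mutation)
    select(pop, size) -> `size` individuals chosen from pop (step 4.6)
    local_search(pop, size) -> `size` individuals after local search (step 4.7)
    """
    size = len(population)
    for gen in range(1, max_gen + 1):  # assumption: gen starts at 1
        rank, crowd = rank_and_crowd(population)
        pool = tournament_select(population, rank, crowd, rng)  # step 4.1
        offspring = make_offspring(pool)                         # steps 4.2-4.3
        merged = population + offspring                          # step 4.4, P_temp
        if gen % m == 0:                                         # formula (5)
            population = local_search(merged, size)              # step 4.7
        else:
            population = select(merged, size)                    # step 4.6
    return population  # Lastpop
```

With m = 3, for example, local search runs on iterations 3, 6, 9 and so on, while the other iterations use the plain selection of step 4.6. Step 4.9 then amounts to extracting the first front of the returned population.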
2. The method for identifying high-quality social network users based on overlapping community detection as claimed in claim 1, wherein the local search specifically comprises the following steps:
Step 4.7.1, select all individuals of the first front from the population P_temp and sort them in descending order of individual influence; the first half of the sorted individuals is denoted P_inf and undergoes local search on the influence objective, and the second half is denoted P_cost and undergoes local search on the cost objective;
Step 4.7.2, for each individual in P_inf, sort its nodes in ascending order of structural degree and randomly select a search length l; starting from the first gene position, randomly replace the node at the current gene position with one of its neighbor nodes that is not already in the current individual; if the influence of the individual after replacement is greater than before, keep the replacement; otherwise, continue traversing the neighbor nodes; repeat until the traversal length exceeds l, finally obtaining the set of searched individuals, denoted P'_inf;
Step 4.7.3, for each individual in P_cost, sort its nodes in descending order of node cost and randomly select a search length l; starting from the first gene position, randomly replace the node at the current gene position with one of its neighbor nodes that is not already in the current individual; if the cost of the individual after replacement is less than before, keep the replacement; otherwise, continue traversing the neighbor nodes; repeat until the traversal length exceeds l, finally obtaining the set of searched individuals, denoted P'_cost;
Step 4.7.4, merge the individuals in P_temp, P'_inf and P'_cost, perform non-dominated sorting, calculate the crowding distance of the sorted population according to the Euclidean distance, and select pop individuals from it as the population of iteration gen+1.
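The influence-side local search of step 4.7.2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name and its parameters are hypothetical, `influence` is any callable standing in for the influence objective of formula (4), and the search length l is assumed to be drawn uniformly from 1 to the number of genes.

```python
import random

def local_search_influence(individual, neighbors, degree, influence, rng):
    """Sketch of step 4.7.2 for one individual of P_inf.

    individual: list of distinct node ids.
    neighbors: dict mapping node -> list of adjacent nodes.
    degree: dict mapping node -> degree used for the ascending sort.
    influence: callable scoring a list of nodes (stand-in for formula (4)).
    """
    current = sorted(individual, key=lambda n: degree[n])
    length = rng.randint(1, len(current))  # search length l (assumed range)
    best = influence(current)
    for pos in range(length):
        # Neighbors of the node at this gene that are not yet in the individual.
        candidates = [n for n in neighbors[current[pos]] if n not in current]
        rng.shuffle(candidates)
        for cand in candidates:
            trial = current[:pos] + [cand] + current[pos + 1:]
            score = influence(trial)
            if score > best:
                current, best = trial, score
                break  # keep the improving replacement, go to the next gene
    return current
```

The cost-side search of step 4.7.3 follows the same pattern, sorting by node cost in descending order and accepting a replacement only when the total cost decreases.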
CN202010596006.5A 2020-06-28 2020-06-28 Social network high-quality user identification method based on overlapping community detection Pending CN111784529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010596006.5A CN111784529A (en) 2020-06-28 2020-06-28 Social network high-quality user identification method based on overlapping community detection

Publications (1)

Publication Number Publication Date
CN111784529A true CN111784529A (en) 2020-10-16

Family

ID=72760069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010596006.5A Pending CN111784529A (en) 2020-06-28 2020-06-28 Social network high-quality user identification method based on overlapping community detection

Country Status (1)

Country Link
CN (1) CN111784529A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801692A (en) * 2021-01-14 2021-05-14 安徽大学 Advertisement marketing effective user identification method based on influence indexes
CN112801692B (en) * 2021-01-14 2022-08-30 安徽大学 Advertisement marketing effective user identification method based on influence indexes

Similar Documents

Publication Publication Date Title
Naruchitparames et al. Friend recommendations in social networks using genetic algorithms and network topology
CN112115377B (en) Graph neural network link prediction recommendation method based on social relationship
Kurdi A memetic algorithm with novel semi-constructive evolution operators for permutation flowshop scheduling problem
Burbrink et al. Resolving spatial complexities of hybridization in the context of the gray zone of speciation in North American ratsnakes (Pantherophis obsoletus complex)
CN107203590B (en) Personalized movie recommendation method based on improved NSGA-II
CN102413029A (en) Method for partitioning communities in complex dynamic network by virtue of multi-objective local search based on decomposition
CN107092812B (en) Method for identifying key protein based on genetic algorithm in PPI network
CN111191343A (en) Multi-mode multi-target differential evolution algorithm based on random sequencing learning
Lin et al. Steering information diffusion dynamically against user attention limitation
CN111784529A (en) Social network high-quality user identification method based on overlapping community detection
Wickman et al. A Generic Graph Sparsification Framework using Deep Reinforcement Learning
CN117271912A (en) Activity registration recommendation method of capability verification plan
CN108681570A (en) A kind of individualized webpage recommending method based on multi-objective Evolutionary Algorithm
CN116720001A (en) Opinion maximization method based on fairness constraint
CN111008334A (en) Top-K recommendation method and system based on local pairwise ordering and global decision fusion
CN115186189A (en) Mixed recommendation algorithm based on weighted bipartite graph
CN110297977B (en) Personalized recommendation single-target evolution method for crowd funding platform
CN112347369B (en) Integrated learning dynamic social network link prediction method based on network characterization
CN111291904B (en) Preference prediction method and device and computer equipment
CN115062236A (en) Hybrid rearrangement travel recommendation method and system based on multi-objective optimization
CN116245610B (en) Book fine-arranging method based on Monte Carlo method and lightweight graph neural network
CN109727150B (en) Community identification method for multi-user online learning platform
Bütün et al. A multi-objective genetic algorithm for community discovery
CN114741579A (en) Large-scale community detection method combining attribute information and structural information
CN112801692B (en) Advertisement marketing effective user identification method based on influence indexes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination