CN113723504B - Method for identifying influential propagators in complex network - Google Patents


Publication number
CN113723504B
Authority
CN
China
Prior art keywords
node
nodes
shell
index
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110999228.6A
Other languages
Chinese (zh)
Other versions
CN113723504A (en
Inventor
刘小洋
叶舒
张梦瑶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Technology
Original Assignee
Chongqing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Technology filed Critical Chongqing University of Technology
Priority to CN202110999228.6A priority Critical patent/CN113723504B/en
Publication of CN113723504A publication Critical patent/CN113723504A/en
Application granted granted Critical
Publication of CN113723504B publication Critical patent/CN113723504B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/11: Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Optimization (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Operations Research (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method for identifying influential propagators in a complex network, comprising the following steps: S1, arranging the first judgment scores in descending order; S2, arranging the second judgment scores in descending order; S3, selecting the node with the highest first judgment score and the highest second judgment score, and covering the node and its neighbor nodes; if several nodes share the highest first judgment score and the same second judgment score, selecting one of them at random; S4, judging whether the number of selected nodes has reached the set value; if so, executing the next step, and if not, returning to step S3; S5, after the selection is finished, obtaining the selected node set; and S6, presenting the performance indicators. The invention can identify a group of most influential nodes in a complex network and present their performance indicators.

Description

Method for identifying influential propagators in complex network
Technical Field
The invention relates to the technical field of network information mining, in particular to a method for identifying influential propagators in a complex network.
Background
Complex networks are ubiquitous in real life, for example social networks, biomolecular networks, citation networks and transportation networks. Often, however, only a small fraction of the nodes has a large impact on such a network, and removing those nodes can cause the network to break down. In recent years, identifying a set of influential nodes has attracted considerable attention from the complex network science community, because it matters for real applications such as information dissemination, epidemic control, rumor control, viral marketing, arresting key suspects in criminal networks, and preventing catastrophic power grid outages.
A number of classical centrality methods based on network topology are used to identify key nodes in a network, such as degree centrality, closeness centrality, betweenness centrality and eigenvector centrality. Degree centrality counts the direct neighbors of a node and regards a node with more neighbors as more important, but it uses only local information. Closeness centrality is based on the reciprocal of the shortest distances from a node to all other nodes; because it uses global information, its computational complexity is high. Betweenness centrality counts the shortest paths passing through a node and is also computationally expensive. Eigenvector centrality holds that a node is more important when its neighbors are more important. Kitsak et al. consider nodes located at the core of the network to be more influential and propose the k-shell decomposition method, which effectively identifies the single most influential node. However, k-shell decomposition assigns the same k-shell index to nodes with different spreading capabilities, and it considers only the residual degree while ignoring links to removed nodes, so it is a coarse-grained centrality. Many centrality algorithms have therefore been proposed to improve on k-shell decomposition. Extending the H-index, Lü et al. propose H-index centrality, where the H-index of a node is the largest h such that the node has at least h neighbors, each with degree not less than h. ClusterRank combines degree and the clustering coefficient to determine the influence of nodes. In recent years, information entropy has also been proposed for evaluating the importance of nodes in a network. Nie et al. propose Mapping Entropy (ME) to identify key nodes, taking into account the correlation between all neighbors of a node. Combining degree, coreness and entropy, Sheikhahmadi et al. propose a hybrid method (Mixed Core, Degree and Entropy, MCDE) to rank nodes. Ji et al. propose using percolation to identify dispersed spreader nodes in a network.
Kempe et al. show that the influence maximization problem is NP-hard and propose a greedy hill-climbing algorithm, but its high complexity makes it unsuitable for large-scale networks. Many researchers have therefore focused on heuristic algorithms in recent years. Chen et al. argue that when a neighbor of a node has been selected as a seed, the degree of that node should be discounted, and propose the degree discount heuristic. Many centrality algorithms select high-centrality nodes, most of which cluster together, so their ranges of influence may overlap. Based on the idea of selecting dispersed nodes, Cao et al. propose the Core Cover Algorithm (CCA), which in each round selects the node with the highest shell and the highest degree and covers its neighbor nodes. Because CCA covers all neighbor nodes, Yang et al. propose the Neighborhood Coreness Cover and Discount Heuristic (NCCDH): each round selects the node with the largest neighborhood coreness, covers only the neighbors in the same shell, and discounts the k-shell values of the remaining neighbors. Based on a voting strategy, Zhang et al. propose the VoteRank algorithm, which gives each node a voting score and a voting ability; however, the algorithm considers only local information and gives every node the same voting ability, so the contributions of different neighbors cannot be distinguished.
Disclosure of Invention
The invention aims to solve at least the technical problems existing in the prior art, and in particular provides a method for identifying influential propagators in a complex network.
To achieve the above object, the invention provides a method for identifying influential propagators in a complex network, comprising:
S1, arranging the first judgment scores in descending order;
S2, arranging the second judgment scores in descending order;
S3, selecting the node with the highest first judgment score and the highest second judgment score, and covering the node and its neighbor nodes; if several nodes share the highest first judgment score and the same second judgment score, selecting one of them at random;
S4, judging whether the number of selected nodes has reached the set value; if so, executing the next step, and if not, returning to step S3;
S5, after the selection is finished, obtaining the selected node set;
S6, presenting the performance indicators.
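The selection loop of steps S1 to S5 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_seeds`, the dictionary-based graph representation and the early exit when every node is already covered are assumptions made for the example.

```python
import random

def select_seeds(adj, first, second, k, rng=None):
    """Select up to k seeds by (first score, second score) with neighbor covering.

    adj    : dict mapping node -> set of neighbor nodes (undirected graph)
    first  : dict mapping node -> first judgment score (primary criterion)
    second : dict mapping node -> second judgment score (secondary criterion)
    """
    rng = rng or random.Random(0)   # fixed seed only to make the sketch repeatable
    covered = set()
    seeds = []
    while len(seeds) < k:
        candidates = [v for v in adj if v not in covered]
        if not candidates:
            break  # assumption: stop early when no uncovered node is left
        best = max((first[v], second[v]) for v in candidates)
        tied = [v for v in candidates if (first[v], second[v]) == best]
        seed = rng.choice(tied)     # random tie-break, as in step S3
        seeds.append(seed)
        covered.add(seed)
        covered.update(adj[seed])   # cover the seed's direct neighbors
    return seeds

# Hypothetical 5-node graph: a triangle 1-2-3 with a tail 3-4-5.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
first = {1: 2, 2: 2, 3: 2, 4: 1, 5: 1}    # e.g. k-shell values
second = {1: 2, 2: 2, 3: 3, 4: 2, 5: 1}   # e.g. a local score such as degree
print(select_seeds(adj, first, second, k=2))
```

Passing the two scores as plain dictionaries keeps the loop independent of how the scores are computed, so the same sketch covers the k-shell/VoteRank, k-shell/H-index and H-index/VoteRank combinations.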
In a preferred embodiment of the present invention, when the first judgment score is the k-shell value and the second judgment score is the VoteRank score, the method for identifying influential propagators comprises the following steps:
S1, calculating the k-shell value of each node by the k-shell decomposition method, and arranging the k-shell values in descending order;
S2, calculating the voting score of each node by the VoteRank algorithm, and arranging the voting scores in descending order;
S3, selecting the node with the maximum k-shell value and the highest VoteRank score, and covering the node and its neighbor nodes; if several nodes with the maximum k-shell value share the same VoteRank score, selecting one of them at random;
S4, judging whether the number of selected nodes equals the set number; if so, executing the next step; if not, returning to S3;
S5, after the selection is finished, obtaining the selected node set.
And/or, when the first judgment score is the k-shell value and the second judgment score is the H-index, the method for identifying influential propagators comprises the following steps:
S1, calculating the k-shell value of each node by the k-shell decomposition method, and arranging the k-shell values in descending order;
S2, calculating the H-index of each node, and arranging the H-index values in descending order;
S3, selecting the node with the maximum k-shell value and the highest H-index, and covering the node and its neighbor nodes; if several nodes with the maximum k-shell value share the same H-index, selecting one of them at random;
S4, judging whether the number of selected nodes equals the set number; if so, executing the next step; if not, returning to S3;
S5, after the selection is finished, obtaining the selected node set.
And/or, when the first judgment score is the k-shell value and the second judgment score is the neighborhood H-index (NH-index), the method for identifying influential propagators comprises the following steps:
S1, calculating the k-shell value of each node by the k-shell decomposition method, and arranging the k-shell values in descending order;
S2, calculating the neighborhood H-index of each node, and arranging the neighborhood H-index values in descending order;
S3, selecting the node with the maximum k-shell value and the highest neighborhood H-index, and covering the node and its neighbor nodes; if several nodes with the maximum k-shell value share the same neighborhood H-index, selecting one of them at random;
S4, judging whether the number of selected nodes equals the set number; if so, executing the next step; if not, returning to S3;
S5, after the selection is finished, obtaining the selected node set.
And/or, when the first judgment score is the H-index and the second judgment score is the VoteRank score, the method for identifying influential propagators comprises the following steps:
S1, calculating the H-index of each node, and arranging the H-index values in descending order;
S2, calculating the voting score of each node by the VoteRank algorithm, and arranging the voting scores in descending order;
S3, selecting the node with the maximum H-index and the highest VoteRank score, and covering the node and its neighbor nodes; if several nodes with the maximum H-index share the same VoteRank score, selecting one of them at random;
S4, judging whether the number of selected nodes equals the set number; if so, executing the next step; if not, returning to S3;
S5, after the selection is finished, obtaining the selected node set.
In a preferred embodiment of the present invention, the formula for distinguishing nodes with the same k-shell value is:
$$\theta(i \mid k_s) = \left(k_s^{\max} - k_s + 1\right) \sum_{j \in J} d_{ij}, \quad i \in S_{k_s}$$
where $k_s$ denotes the k-shell value of the node, $k_s^{\max}$ denotes the maximum k-shell value in the network, $d_{ij}$ denotes the shortest distance from node i to node j, J denotes the set of network core nodes, i.e. the nodes with the highest k-shell value, and $S_{k_s}$ denotes the set of nodes with the same k-shell value $k_s$.
In a preferred embodiment of the present invention, the VoteRank comprises:
each node can obtain votes from its neighbors; each node $v \in V$ is assigned a tuple $(S_v, Va_v)$, where $S_v$ denotes the voting score that node v obtains from its neighbor nodes and $Va_v$ denotes the voting ability of node v toward its neighbor nodes. $S_v$ can be expressed as
$$S_v = \sum_{i \in N(v)} Va_i$$
where N(v) denotes the set of direct neighbors of node v and $Va_i$ denotes the voting ability of node i toward its neighbor nodes.
In a preferred embodiment of the present invention, the H-index comprises:
the H-index of node i is defined as:
$$h_i = H\left(k_{j_1}, k_{j_2}, \ldots, k_{j_{k_i}}\right)$$
where H(·) is the function that returns the node H-index, and the degrees of the neighbor nodes of node i are $k_{j_1}, k_{j_2}, \ldots, k_{j_{k_i}}$.
In a preferred embodiment of the present invention, the neighborhood H-index comprises:
the neighborhood H-index of node i is defined as:
$$NH_i = \sum_{j \in N(i)} h_j$$
where N(i) denotes the set of direct neighbors of node i, i.e. all of its neighbor nodes, and $h_j$ denotes the H-index of node j.
In a preferred embodiment of the invention, the performance indicators comprise the SIR model and/or the average shortest path length $L_s$. The SIR model comprises:
first, the initially selected seed nodes are set to the infected state and all other nodes in the network are in the susceptible state; at each time step, each infected node infects each susceptible node in its direct neighborhood with probability β;
meanwhile, each infected node enters the recovered state with probability γ, and a recovered node cannot be infected again; the differential equations are
$$\frac{dS}{dt} = -\beta S I, \qquad \frac{dI}{dt} = \beta S I - \gamma I, \qquad \frac{dR}{dt} = \gamma I$$
The infection probability β should be neither too small nor too large. If β is too small, the infection cannot spread through the whole network or may not spread at all; if β is too large, the infection reaches almost the entire network and the influence of different nodes cannot be distinguished, which is not meaningful. β is therefore chosen above the propagation threshold $\beta_{\min}$. The infection rate is defined as
$$\lambda = \frac{\beta}{\gamma}$$
Here S denotes the susceptible state, I the infected state and R the recovered state.
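The discrete spreading process described above (infection with probability β per infected-susceptible contact, recovery with probability γ per time step) can be sketched as a small stochastic simulation. The function name `simulate_sir`, the synchronous update order and the `max_steps` cap are assumptions of this sketch rather than details given in the text.

```python
import random

def simulate_sir(adj, seeds, beta, gamma, rng=None, max_steps=10_000):
    """Run one SIR realization; return a list of (n_S, n_I, n_R) per time step.

    adj   : dict mapping node -> set of neighbor nodes
    seeds : iterable of initially infected nodes
    beta  : probability that an infected node infects a susceptible neighbor
    gamma : probability that an infected node recovers in a time step
    """
    rng = rng or random.Random(0)
    n = len(adj)
    infected = set(seeds)
    recovered = set()
    history = [(n - len(infected), len(infected), 0)]
    for _ in range(max_steps):
        if not infected:
            break  # steady state: nothing left to spread
        newly_infected = set()
        for u in infected:
            for w in adj[u]:
                susceptible = w not in infected and w not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(w)
        # Recovery is drawn for the nodes that were infected at the start of the step.
        newly_recovered = {u for u in infected if rng.random() < gamma}
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
        history.append((n - len(infected) - len(recovered),
                        len(infected), len(recovered)))
    return history

# Hypothetical path graph 0-1-2 with node 0 as the only seed.
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(simulate_sir(path, [0], beta=1.0, gamma=1.0))
```

Because a single run is random, an evaluation would normally average the resulting counts over many independent runs for each seed set.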
In a preferred embodiment of the invention, the SIR model further comprises:
the infection scale F(t):
$$F(t) = \frac{n_{I(t)} + n_{R(t)}}{n}$$
the final infection scale $F(t_c)$:
$$F(t_c) = \frac{n_{R(t_c)}}{n}$$
where $n_{I(t)}$ and $n_{R(t)}$ denote the numbers of infected nodes and recovered nodes at time t, respectively, $t_c$ denotes the time at which the spreading reaches its steady state, and n denotes the total number of nodes in the network.
A larger F(t) means that more nodes have been infected by time t, i.e. the seed nodes have larger influence and the algorithm performs better; a shorter t means faster spreading.
During the infection process, the number of nodes that have moved from the infected state to the recovered state grows at each time step and finally reaches its peak, i.e. the steady state.
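As a small worked example, the two scale measures can be computed from per-step counts, assuming the definitions F(t) = (n_I(t) + n_R(t)) / n and F(t_c) = n_R(t_c) / n. The input format, a list of (n_S, n_I, n_R) tuples with one entry per time step, and the function names are assumptions of this sketch.

```python
def infection_scale(history):
    """Return F(t) for every time step of a list of (n_S, n_I, n_R) tuples."""
    n = sum(history[0])  # total number of nodes
    return [(n_i + n_r) / n for (_, n_i, n_r) in history]

def final_infection_scale(history):
    """Return F(t_c), taking the last recorded step as the steady state."""
    n = sum(history[0])
    return history[-1][2] / n

# Hypothetical counts for a 3-node network over four time steps.
counts = [(2, 1, 0), (1, 1, 1), (0, 1, 2), (0, 0, 3)]
print(infection_scale(counts))
print(final_infection_scale(counts))
```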
In a preferred embodiment of the present invention, the average shortest path length $L_s$ comprises:
the average shortest path length between the nodes of the selected node set S is defined as:
$$L_s = \frac{1}{|S|\,(|S|-1)} \sum_{u, v \in S,\ u \neq v} l_{u,v}$$
where |S| denotes the cardinality of the finite set S, i.e. the number of selected nodes, and $l_{u,v}$ denotes the shortest path length from node u to node v. A larger $L_s$ indicates that the selected seed nodes are more dispersed, which helps maximize the propagation influence.
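The average shortest path length between seeds can be sketched with breadth-first search on an unweighted graph, assuming L_s is the mean of l(u, v) over all ordered pairs of distinct seeds. The function names are hypothetical, and the sketch assumes all seeds lie in one connected component.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def average_seed_distance(adj, seeds):
    """Mean shortest path length over ordered pairs of distinct seeds."""
    seeds = list(seeds)
    if len(seeds) < 2:
        return 0.0
    total = 0
    for u in seeds:
        dist = bfs_distances(adj, u)  # raises KeyError below if a seed is unreachable
        total += sum(dist[v] for v in seeds if v != u)
    return total / (len(seeds) * (len(seeds) - 1))

# Hypothetical path graph 0-1-2-3-4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(average_seed_distance(path, [0, 2, 4]))
```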
In summary, due to the adoption of the technical scheme, the beneficial effects of the invention are as follows:
1) Combining the global position information from k-shell decomposition with the local voting information of the VoteRank method, and considering the most popular nodes at the core of the network to be the most important, a new KVoteRank method is provided for identifying a group of most influential nodes in a complex network.
2) To address the problems that the H-index considers only local network information and that both the H-index and k-shell are too coarse-grained, and inspired by the CCA method, k-shell is used as the first judgment score and the H-index as the second judgment score, giving the KHIndex method, which selects the node with the maximum H-index and covers its neighborhood. Using the second-order neighborhood H-index, an extended method, KNHIndex, is further provided for selecting a group of most influential propagators in a complex network; KNHIndex takes further local network information into account.
3) Exploiting the fact that the H-index is an intermediate state between degree and coreness, the H-index is used as an approximate replacement for k-shell: the H-index serves as the first judgment score and VoteRank as the second judgment score, giving a new method, HVoteRank, for identifying key nodes in a complex network.
4) Based on tests of the hybrid neighborhood coverage methods KVoteRank, KHIndex, KNHIndex and HVoteRank, which combine the k-shell, H-index and VoteRank methods, an improved k-shell neighborhood coverage method, KNC, is provided to identify a group of most influential propagators in a complex network. It takes both local and global network information into account, and resolves the problems that k-shell decomposition cannot distinguish nodes in the same shell and that the H-index and VoteRank consider only local neighbor information.
5) The proposed KNC method is evaluated comprehensively with the SIR model and the average shortest path length. Simulation experiments on 8 real networks, such as Jazz, show that the initial node set selected by the KNC method outperforms existing baseline methods in infection scale under different infection rates and different initial node ratios, and that the selected initial nodes are more dispersed.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic diagram of an embodiment of the KVoteRank algorithm of the present invention.
FIG. 2 is a schematic diagram of the SIR propagation process of the present invention.
FIG. 3 is a schematic diagram of the infection scale for different initial node ratios according to the present invention.
FIG. 4 is a comparative schematic diagram of the infection scale according to the present invention.
FIG. 5 is a schematic diagram of the infection scale for different infection rates according to the present invention.
FIG. 6 is a schematic diagram of the shortest path length of the initial node set according to the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
1. Related work
Given a network G(V, E), V and E denote the node set and the edge set, respectively; n = |V| is the number of nodes and m = |E| is the number of edges, where |·| denotes the cardinality of a set. $A = \{a_{ij}\}$ denotes the adjacency matrix of graph G: $a_{ij} = 1$ if there is an edge between node $v_i$ and node $v_j$, and $a_{ij} = 0$ otherwise. N(v) denotes the set of direct neighbors of node v.
1.1 k-shell centrality
The k-shell method holds that the position of a node matters more than its number of neighbors: a node located at the core of the network has high spreading influence even if its degree is small. The algorithm first removes all nodes of degree 1; removal lowers the degrees of the remaining nodes, so nodes whose degree has dropped to 1 are removed as well, until no node of degree 1 remains. All nodes removed in this pass are assigned to the 1-shell. Next, nodes with residual degree less than or equal to 2 are removed iteratively until the residual degree of every remaining node is greater than 2, and the removed nodes are assigned to the 2-shell. The process repeats until every node in the network has been assigned to a shell. However, the k-shell method gives the same k-shell value to a large number of nodes that have different spreading capabilities, so it is coarse-grained; moreover, as the shell index increases the nodes become more tightly clustered, so their ranges of influence may overlap.
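The peeling procedure described above can be sketched in plain Python. The function name `k_shell` and the dictionary-of-sets graph representation are assumptions of this sketch; isolated nodes, which the text does not discuss, are placed in shell 0 here.

```python
def k_shell(adj):
    """Return a dict node -> k-shell value for an undirected graph.

    adj: dict mapping node -> set of neighbor nodes.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}  # residual degrees
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1          # nothing left to remove at this level
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue    # already removed via a duplicate entry
            remaining.discard(v)
            shell[v] = k
            for w in adj[v]:
                if w in remaining:
                    degree[w] -= 1          # removal lowers residual degree
                    if degree[w] <= k:
                        peel.append(w)      # w now belongs to the same shell
    return shell

# Hypothetical graph: triangle 1-2-3 with a tail 3-4-5.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(k_shell(adj))
```

In this example the tail nodes 4 and 5 fall into the 1-shell and the triangle forms the 2-shell, which also illustrates the coarse granularity: nodes 1, 2 and 3 share a shell although node 3 has a higher degree.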
1.2 Mixed degree decomposition (MDD)
k-shell decomposition considers only the links to the remaining nodes and completely ignores the links to removed nodes. Zeng et al. propose the Mixed Degree Decomposition (MDD) method, which considers both the residual degree and the exhausted degree of a node. The mixed degree of each node is expressed as
$$k_m(v) = k_r + \lambda k_e \qquad (1)$$
where $k_r$ denotes the residual degree of the node, i.e. the number of links to remaining nodes, $k_e$ denotes the exhausted degree of the node, i.e. the number of links to removed nodes, and λ is an adjustable parameter between 0 and 1. When λ = 1 the exhausted degree is fully counted and the MDD method is equivalent to degree centrality; when λ = 0 only the residual degree is counted and the MDD method is equivalent to the k-shell decomposition method.
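A short numeric illustration of equation (1): the helper below evaluates the mixed degree for given residual and exhausted degrees. The function name `mixed_degree` is hypothetical, and the sketch covers only the formula, not the full MDD peeling procedure.

```python
def mixed_degree(k_r, k_e, lam):
    """Mixed degree k_m = k_r + lam * k_e, with lam restricted to [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie between 0 and 1")
    return k_r + lam * k_e

# A node with 3 links to remaining nodes and 2 links to removed nodes.
print(mixed_degree(3, 2, 1.0))  # exhausted degree fully counted: equals the degree
print(mixed_degree(3, 2, 0.0))  # only the residual degree, as in k-shell
print(mixed_degree(3, 2, 0.5))  # intermediate setting
```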
1.3 Neighborhood coreness centrality
Neighborhood coreness (NC) was proposed by Bae et al., who hold that a propagator has greater influence when more of its neighbors lie at the network core. The method reduces the degeneracy of the k-shell method and balances degree against node position. The neighborhood coreness of node v is defined as
$$C_{nc}(v) = \sum_{w \in N(v)} ks(w)$$
where N(v) denotes the set of direct neighbors of node v and ks(w) denotes the k-shell index of neighbor node w. Bae et al. also propose the extended neighborhood coreness (NC+), which takes the second-order neighbors of a node into account. The extended neighborhood coreness of node v is defined as
$$C_{nc+}(v) = \sum_{w \in N(v)} C_{nc}(w)$$
where $C_{nc}(w)$ denotes the neighborhood coreness of neighbor node w.
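Both quantities are plain sums over direct neighbors, which the following sketch makes explicit. It assumes the definitions C_nc(v) = Σ ks(w) and C_nc+(v) = Σ C_nc(w) over w in N(v), and it takes the k-shell values as a precomputed dictionary; the function names are hypothetical.

```python
def neighborhood_coreness(adj, ks):
    """C_nc(v): sum of the k-shell values of the direct neighbors of v."""
    return {v: sum(ks[w] for w in adj[v]) for v in adj}

def extended_neighborhood_coreness(adj, ks):
    """C_nc+(v): sum of C_nc over the direct neighbors of v."""
    nc = neighborhood_coreness(adj, ks)
    return {v: sum(nc[w] for w in adj[v]) for v in adj}

# Hypothetical graph (triangle 1-2-3 with tail 3-4-5) and its k-shell values.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
ks = {1: 2, 2: 2, 3: 2, 4: 1, 5: 1}
print(neighborhood_coreness(adj, ks))
print(extended_neighborhood_coreness(adj, ks))
```

Nodes 1, 2 and 3 share the same k-shell value, yet node 3 obtains a larger neighborhood coreness, which is the reduction in degeneracy that the method aims for.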
1.4 Improved k-shell index
Considering the distance between the target node and the network core, Liu et al. propose an improved k-shell method for distinguishing nodes in the same shell. The method holds that, among nodes with the same k-shell value, those closer to the core of the network have greater spreading influence. The formula for distinguishing nodes with the same k-shell value is
$$\theta(i \mid k_s) = \left(k_s^{\max} - k_s + 1\right) \sum_{j \in J} d_{ij}, \quad i \in S_{k_s}$$
where $k_s$ denotes the k-shell value of the node, $k_s^{\max}$ denotes the maximum k-shell value in the network, $d_{ij}$ denotes the shortest distance from node i to node j, J denotes the set of network core nodes, i.e. the nodes with the highest k-shell value, and $S_{k_s}$ denotes the set of nodes with the same k-shell value $k_s$.
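A minimal sketch of this distance-based refinement follows, assuming the form θ(i | k_s) = (k_s^max − k_s + 1) · Σ_{j∈J} d_ij with hop distances obtained by breadth-first search. The function names are hypothetical, the k-shell values are supplied as a dictionary, and the graph is assumed to be connected.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def improved_k_shell(adj, ks):
    """Return theta for every node; a smaller value means closer to the core."""
    ks_max = max(ks.values())
    core = [v for v in adj if ks[v] == ks_max]   # the core node set J
    dist_sum = {v: 0 for v in adj}
    for j in core:
        dist = bfs_distances(adj, j)
        for v in adj:
            dist_sum[v] += dist[v]               # assumes a connected graph
    return {v: (ks_max - ks[v] + 1) * dist_sum[v] for v in adj}

# Hypothetical graph (triangle 1-2-3 with tail 3-4-5) and its k-shell values.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
ks = {1: 2, 2: 2, 3: 2, 4: 1, 5: 1}
print(improved_k_shell(adj, ks))
```

Nodes 4 and 5 both lie in the 1-shell, but node 4 is nearer to the core and therefore receives the smaller θ.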
1.5 H-index
The iterative k-shell decomposition requires global topology information, which limits its use in large-scale networks, whereas the H-index is a local metric that uses only partial network information (the degrees of neighbors). It was first used to evaluate the quantity and quality of a researcher's academic output. A person has an H-index of h if, among that person's N published papers, h papers have each been cited at least h times and the remaining N − h papers have each been cited no more than h times. In network science, the H-index of node $v_i$ is defined as the largest h such that $v_i$ has at least h neighbor nodes, each with degree not less than h.
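The definition translates directly into code: sort the neighbor degrees in descending order and find the largest rank h whose value is still at least h. The function names below are hypothetical.

```python
def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    h = 0
    for rank, x in enumerate(sorted(values, reverse=True), start=1):
        if x >= rank:
            h = rank
        else:
            break
    return h

def node_h_index(adj):
    """H-index of every node, computed from the degrees of its neighbors."""
    return {v: h_index(len(adj[w]) for w in adj[v]) for v in adj}

# Citation-style example, then a hypothetical graph (triangle 1-2-3, tail 3-4-5).
print(h_index([10, 8, 5, 4, 3]))
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(node_h_index(adj))
```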
1.6 VoteRank
Zhang et al. introduced the VoteRank method to identify a set of dispersed propagators in a network. In real life, if A already supports B, the support A can give to other people is reduced. VoteRank selects nodes one by one according to the voting scores received from neighbor nodes; once a node has been selected as the most influential node, the voting ability of its neighbor nodes is reduced, and the selected node does not take part in the next round of voting. The VoteRank method assigns each node a tuple $(s_v, va_v)$, where $s_v$ denotes the voting score that node v obtains from its neighbor nodes and $va_v$ denotes the voting ability of node v. VoteRank consists of four steps:
Step 1: Initialization. All tuples are initialized to (0, 1), i.e. the voting score of each node is 0 and the voting ability of each node is 1.
Step 2: Voting. Each node obtains a voting score equal to the sum of the voting abilities of its neighbor nodes. The node with the highest voting score that has not yet been selected is chosen as the most influential node. To prevent the node from being selected again, its voting score is set to 0. To ensure that the node does not take part in the next vote, its voting ability is also set to 0.
Step 3: Update. To obtain a more dispersed set of seed nodes, after seed node v has been selected in step 2, the voting ability of the neighbor nodes of v is discounted for the next round: if u is a neighbor node of v, then $va_u = va_u - f$ when the result is greater than 0, and $va_u = 0$ otherwise, where
$$f = \frac{1}{\langle k \rangle}$$
⟨k⟩ is the average degree of the nodes in the network, and $va_u$ denotes the voting ability of node u.
Step 4: Repeat steps 2 and 3 until the number of selected nodes reaches the set number.
VoteRank centrality is more accurate than degree centrality, the H-index, k-shell and similar methods, but it does not consider where nodes are positioned in the network; in practice, a node at the center of the network can have large influence even if its voting score is low.
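The four steps can be sketched as follows, assuming the discount f = 1/⟨k⟩. The function name `vote_rank`, the tie-breaking by dictionary order and the requirement that the graph has at least one edge (so that ⟨k⟩ > 0) are assumptions of this sketch.

```python
def vote_rank(adj, k):
    """Select up to k nodes by iterative voting with ability discounting.

    adj: dict mapping node -> set of neighbor nodes (at least one edge overall).
    """
    avg_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
    f = 1.0 / avg_degree                    # discount applied to neighbors
    ability = {v: 1.0 for v in adj}         # Step 1: every voting ability is 1
    selected = []
    while len(selected) < k:
        chosen = set(selected)
        # Step 2: score = sum of the neighbors' current voting abilities.
        scores = {v: sum(ability[w] for w in adj[v])
                  for v in adj if v not in chosen}
        if not scores:
            break                           # every node has been selected
        best = max(scores, key=scores.get)  # ties resolved by dictionary order
        selected.append(best)
        ability[best] = 0.0                 # a selected node no longer votes
        # Step 3: discount the voting ability of the neighbors, floored at 0.
        for u in adj[best]:
            ability[u] = max(ability[u] - f, 0.0)
    return selected

# Hypothetical graph: triangle 1-2-3 with a tail 3-4-5 (average degree 2).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(vote_rank(adj, 2))
```

In the first round every voting ability is 1, so the scores equal the node degrees; differences from degree ranking appear only after the first discount.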
2. Proposed method
VoteRank centrality selects a group of influential propagators through a voting mechanism. Each node can obtain votes from its neighbors; each node $v \in V$ is assigned a tuple $(S_v, Va_v)$, where $S_v$ denotes the voting score that node v obtains from its neighbor nodes and $Va_v$ denotes the voting ability of node v toward its neighbor nodes. $S_v$ can be expressed as
$$S_v = \sum_{i \in N(v)} Va_i$$
where N(v) denotes the set of direct neighbors of node v and $Va_i$ denotes the voting ability of node i toward its neighbor nodes.
If the H-index of a node is h, then the node has at least h neighbors, each with degree not less than h. For the network G(V, E), let the degree of node i be $k_i$ and let the degrees of its neighbor nodes be $k_{j_1}, k_{j_2}, \ldots, k_{j_{k_i}}$, where $k_{j_1}$ denotes the degree of the first neighbor node $j_1$ of node i, $k_{j_2}$ denotes the degree of the second neighbor node $j_2$ of node i, and $k_{j_{k_i}}$ denotes the degree of the $k_i$-th neighbor node of node i.
The H-index of node i is defined as
$$h_i = H\left(k_{j_1}, k_{j_2}, \ldots, k_{j_{k_i}}\right)$$
where H(·) is the function that returns the node H-index.
Inspired by neighborhood coreness centrality (NC), an extended neighborhood H-index is given here that takes the second-order neighborhood into account. The neighborhood H-index of node i is defined as
$$NH_i = \sum_{j \in N(i)} h_j$$
where N(i) denotes the set of direct neighbors of node i, i.e. all of its neighbor nodes, and $h_j$ denotes the H-index of node j.
The KNC method is based on two judgment scores, wherein the first judgment score is a main judgment condition, the second judgment score is a secondary judgment condition, each time, an uncovered node which meets the highest judgment score and the highest judgment score is selected as a seed node, and after each round of node selection, the direct neighbor node is covered to dispersedly select the nodes, so that the propagation influence is maximized. The invention mainly describes a KVOTERank method which is a sub-method of the KNC method.
VoteRank centrality does not consider local information of nodes, differences of neighbor nodes, and position information of the nodes. Inspired by the core coverage algorithm (Core Cover Algorithm, CCA), the proposed kvaterank algorithm considers the most popular (i.e. the most volterank-valued) nodes located at the network core location to have a greater propagation impact. Nodes located in the core of the network have a greater impact even though the VoteRank value is smaller. The kvaterank algorithm procedure is presented herein. The KVOTERank algorithm execution process is divided into 4 stages:
Stage 1: Calculate the k-shell value of each node according to the k-shell decomposition method, and arrange the nodes in descending order of k-shell value.
Stage 2: For nodes with the same k-shell value, calculate the voting score of each node according to the VoteRank algorithm, and arrange the nodes in descending order of voting score.
Stage 3: Select the node with the highest VoteRank value, and cover the node and its neighbor nodes. If several nodes in the same shell have the same VoteRank value, select one of them at random.
Stage 4: Repeat stage 3 until the number of selected nodes reaches the set number.
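Stage 1 relies on k-shell decomposition. The following is a minimal sketch of the usual iterative pruning procedure, assuming an undirected network stored as a dictionary of neighbor sets; the function name `k_shell` and the toy graph are hypothetical.

```python
def k_shell(adj):
    """Return {node: k-shell value} by repeatedly peeling low-degree nodes."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # The next shell index is the smallest degree left in the pruned graph.
        k = max(k, min(degree[v] for v in remaining))
        stack = [v for v in remaining if degree[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already peeled through a duplicate stack entry
            remaining.discard(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        stack.append(u)
    return shell


# Hypothetical toy graph: a triangle 1-2-3 plus a pendant node 4 attached to node 1.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
shells = k_shell(adj)
print(sorted(shells.items()))  # [(1, 2), (2, 2), (3, 2), (4, 1)]
```

The pendant node is peeled in the 1-shell, after which the triangle remains with all degrees equal to 2 and forms the 2-shell.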
The algorithm solves the problem that the k-shell decomposition method cannot distinguish the influence of nodes in the same shell: nodes of the same shell are distinguished by their voting scores. To avoid overlapping influence ranges, the seed nodes are selected in a dispersed manner in stage 3 of the proposed algorithm, i.e., after each seed node is selected, the node and its direct neighbors are covered. The KVoteRank algorithm is described in Table 1.
Table 1 KVoteRank algorithm
Figure BDA0003235075090000131
Lines 2-4 of the KVoteRank algorithm correspond to stage 1 and calculate the k-shell value of each node; lines 5-8 correspond to stage 2 and calculate the VoteRank score of each node; lines 9-17 correspond to stages 3 and 4 and select dispersed seed nodes with the highest shell and the highest voting score. That is, when a seed node is selected, the k-shell value is the primary judgment condition and the voting score is the secondary judgment condition, and the neighbor nodes of the selected node are covered so that the most influential nodes are selected in a dispersed manner and their propagation ranges do not overlap.
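The four stages can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of Table 1 itself: the k-shell values are supplied as input, the voting score of a node is taken as the sum of the voting abilities of its neighbors, and after each selection the voting abilities are updated as in the original VoteRank formulation (the seed's ability is set to 0 and each neighbor's ability is reduced by the inverse of the average degree). The function name, the update rule and the toy graph are assumptions.

```python
import random


def kvoterank(adj, shell, num_seeds, rng=None):
    """Select seeds by (k-shell, voting score), covering neighbors each round."""
    rng = rng or random.Random(0)
    avg_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
    ability = {v: 1.0 for v in adj}  # voting ability of every node
    covered, seeds = set(), []
    while len(seeds) < num_seeds:
        candidates = [v for v in adj if v not in covered]
        if not candidates:
            break  # every node is already selected or covered
        # Voting score: total voting ability received from the neighbors.
        score = {v: sum(ability[u] for u in adj[v]) for v in candidates}
        best = max((shell[v], score[v]) for v in candidates)
        ties = sorted(v for v in candidates if (shell[v], score[v]) == best)
        seed = rng.choice(ties)  # random choice among equally ranked nodes
        seeds.append(seed)
        covered.add(seed)
        covered.update(adj[seed])  # cover the direct neighbors
        ability[seed] = 0.0  # ASSUMPTION: VoteRank-style weakening
        for u in adj[seed]:
            ability[u] = max(0.0, ability[u] - 1.0 / avg_degree)
    return seeds


# Hypothetical graph: clique {1,2,3,4}, triangle {5,6,7}, tree part {8,9,10}.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (1, 5),
         (5, 6), (5, 7), (6, 7), (7, 8), (8, 9), (8, 10)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
shell = {1: 3, 2: 3, 3: 3, 4: 3, 5: 2, 6: 2, 7: 2, 8: 1, 9: 1, 10: 1}
seeds = kvoterank(adj, shell, 3)
print(seeds)  # the first two seeds are 1 and 7; the third is 9 or 10 (a tie)
```

In this example node 1 is chosen first (3-shell, highest score) and covers nodes 2, 3, 4 and 5; node 7 is then the best uncovered node of the 2-shell; the last seed comes from the 1-shell, so the three seeds lie in different parts of the graph.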
The H-index, like VoteRank, considers the information of neighbor nodes and is a centrality based on local information, and a larger H-index indicates a larger social influence. Combining the H-index with the k-shell method, the node that is located at the core of the network and has the highest H-index is considered the most influential, and an H-index method based on k-shell decomposition (KHIndex) is proposed, in which the first judgment score is k-shell and the second judgment score is H-index. To further consider the local information of a node, the neighborhood H-index is used to represent semi-local information, which is equivalent to considering the 4-order neighborhood of the node, and an extended neighborhood H-index algorithm based on k-shell decomposition (KNHIndex) is proposed, in which the first judgment score is k-shell and the second judgment score is the neighborhood H-index. For the KHIndex (and KNHIndex) algorithm, stage 2 calculates the H-index (and neighborhood H-index) of each node, and stage 3 selects in each round the node with the largest k-shell value and the highest H-index. Lu et al. described the relationship among degree, H-index, and core number: degree is the initial state, H-index is the intermediate state, and core number is the steady state. The KNC method therefore uses the H-index as an approximate replacement for k-shell, i.e., the H-index is the first judgment score and the VoteRank score is the second judgment score, so that nodes with high influence can be selected; this H-index-based VoteRank method is named HVoteRank.
Because the H-index is also a coarse-grained method, the H-index is used as the first judgment score and VoteRank as the second judgment score. The node with the largest H-index and the highest voting score is selected each time, and after each round of node selection its direct neighbor nodes are covered so that the nodes are selected in a dispersed manner; the algorithm process is similar to that of the KVoteRank algorithm. Table 2 gives the judgment scores of the four sub-methods of the KNC method: KVoteRank, KHIndex, KNHIndex, and HVoteRank.
Table 2 KNC method judgment score
Judgement condition KVoteRank KHIndex KNHIndex HVoteRank
Main score k-shell k-shell k-shell H-index
Second score VoteRank H-index NH-index VoteRank
To explain the proposed algorithm intuitively, a process diagram of the KVoteRank algorithm (see FIG. 1) is given, which visualizes how the proposed KVoteRank algorithm selects the 3 most influential seed nodes. KHIndex, KNHIndex, and HVoteRank are similar to KVoteRank and are not shown here.
According to the KVoteRank algorithm process diagram of FIG. 1, the k-shell value of each node in the network is calculated first, and different colors indicate that nodes are in different shells. Node set {1,2,3,4} is in the 3-shell, {5,6,7,8} is in the 2-shell, and the remaining nodes are in the 1-shell. Next, the voting score of each node is calculated. The k-shell value and the VoteRank score of each node are marked in the diagram, and the values for all nodes are given in Table 3.
TABLE 3 VoteRank score and H-index for nodes in a network
Figure BDA0003235075090000151
As shown in FIG. 1, according to the KVoteRank algorithm, node 1, which is in the 3-shell and has the highest voting score (8), is selected first, and its direct neighbor node set {2,3,4,5,6,7} is covered, which ends the first round of selection. Then node 8, which is in the 2-shell and has a voting score of 3.16, is selected and its uncovered direct neighbor nodes 25 and 26 are covered, which ends the second round. Then node 23, which is in the 1-shell and has a voting score of 7.58, is selected and its neighbor nodes 16, 17, 18, 19, 20, 21, and 22 are covered. This completes the selection of 3 seed nodes (the seed nodes are the selected most influential propagators), and the selected seed node set is {1,8,23}. For the KHIndex algorithm, according to Table 3, node 4 is selected first and its neighbor nodes 1, 2, 3, 5, and 8 are covered; then one of nodes 6 and 7 is selected at random, and assuming node 6 is selected, its neighbor node 7 is covered; finally, one node is selected from {15,17,23,25}, and assuming node 15 is selected, its neighbor nodes {13,14,17} are covered. This completes the selection of 3 seed nodes, and the 3 most influential nodes selected by the KHIndex algorithm are {4,6,15}. For the KNHIndex and HVoteRank algorithms, the process is similar to that of the KVoteRank algorithm and differs only in the first and second judgment scores.
Here, a simple test of the propagation process is performed with 15 nodes selected by k-shell and by KVoteRank in the CEnew network (453 nodes). The result is shown in the SIR propagation process diagram of FIG. 2, where blue nodes represent susceptible nodes, red nodes represent infected nodes, and green nodes represent immune nodes.
In the SIR propagation process diagram of FIG. 2, the k-shell decomposition method and the proposed KVoteRank algorithm are compared under the SIR propagation model. The initial nodes selected by the k-shell method are somewhat clustered together, whereas the initial nodes selected by the KVoteRank algorithm are scattered; at time step 10, the KVoteRank algorithm has infected 11 more nodes than the k-shell method.
3. Data set
To demonstrate the performance of the proposed KNC method, the experiments use 8 real network datasets of different types and sizes. Most of the datasets come from the social graph database SNAP, compiled by Stanford University faculty and students. The datasets are: 1) Jazz: records collaborations among jazz bands that performed between 1912 and 1940; 2) CEnew: an edge list of the metabolic network of Caenorhabditis elegans; 3) Crimes: a crime network in which a node represents a person and an edge represents that two criminals took part in a crime together; 4) Email: records the email exchanges among users at Rovira i Virgili University; 5) Hamster: contains friendship and family ties among users of the website "www.hamsterster.com"; 6) Ca-GrQc: a collaboration network from the e-print arXiv covering scientific collaborations between authors of papers submitted to the General Relativity and Quantum Cosmology category; 7) Condmat: a collaboration network based on the condensed matter section of the e-print arXiv archived from 1995 to 1999; 8) Enron: the email interactions within the Enron community, including information about over one million emails. The topology of the 8 real networks is described in Table 4.
Table 4 network topology
Network n m <k> k_max <d> <c> β_min
Jazz 198 2742 27.697 100 2.235 0.6175 0.0266
CEnew 453 2025 8.94 237 2.664 0.646 0.0256
Crimes 829 1473 3.554 25 5.04 0.008 0.1960
Email 1133 5451 9.622 71 3.606 0.2540 0.0565
Hamster 2426 16631 13.711 273 3.67 0.538 0.0241
Ca-GrQc 4158 13422 6.456 81 6.049 0.665 0.0589
Condmat 23133 93497 8.083 281 5.352 0.633 0.0475
Enron 33696 180811 10.732 1383 4.025 0.708 0.0071
In Table 4, n represents the total number of nodes in the network and m represents the number of edges in the network; <k> represents the average degree of the nodes in the network, $k_{max}$ represents the maximum degree of the nodes in the network, and <d> represents the average shortest path length of the network. The average clustering coefficient of the network is
$$\langle c \rangle = \frac{1}{n} \sum_{i=1}^{n} \frac{2 I_i}{|N(i)|\,(|N(i)|-1)}$$
where $I_i$ represents the number of edges between the direct neighbors of node i. $\beta_{min}$ is the propagation threshold, which can be calculated by
$$\beta_{min} = \frac{\langle k \rangle}{\langle k^2 \rangle - \langle k \rangle}$$
where N(i) represents the direct neighbor set of node i, |N(i)| represents the cardinality of the set N(i), k represents the degree of a node in the network, and <·> is the averaging operation.
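The quantities of Table 4 can be computed from an adjacency structure as sketched below. The sketch assumes the commonly used threshold $\beta_{min} = \langle k \rangle / (\langle k^2 \rangle - \langle k \rangle)$ and the standard local clustering coefficient; both formulas, the function name `topology_summary` and the toy graph are assumptions rather than text confirmed by the source. The average degree values in Table 4 are consistent with <k> = 2m/n, for example 2 × 2742 / 198 ≈ 27.697 for Jazz.

```python
def topology_summary(adj):
    """Return n, m, average degree, maximum degree, clustering and threshold."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    m = sum(degrees) // 2
    k_avg = sum(degrees) / n
    k2_avg = sum(d * d for d in degrees) / n

    def local_clustering(i):
        nbrs = adj[i]
        k = len(nbrs)
        if k < 2:
            return 0.0
        # Each edge between two neighbors of i is counted twice, hence // 2.
        links = sum(1 for u in nbrs for v in adj[u] if v in nbrs) // 2
        return 2 * links / (k * (k - 1))

    c_avg = sum(local_clustering(i) for i in adj) / n
    # ASSUMPTION: threshold formula <k> / (<k^2> - <k>); undefined if equal.
    beta_min = k_avg / (k2_avg - k_avg) if k2_avg > k_avg else float("inf")
    return {"n": n, "m": m, "k_avg": k_avg, "k_max": max(degrees),
            "c_avg": c_avg, "beta_min": beta_min}


# Hypothetical toy graph: a triangle 1-2-3 plus a pendant node 4 attached to node 1.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
summary = topology_summary(adj)
print(summary["k_avg"], summary["beta_min"])  # 2.0 0.8
```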
4. Performance index
4.1 SIR epidemic model
The susceptible-infected-recovered (SIR) epidemic model is used to evaluate the performance of the proposed method. In the SIR model, a node is in one of three states: susceptible (S), infected (I), or recovered (R). The susceptible state indicates that the node can be infected by a disease or reached by information, the infected state indicates that the node has been infected by the disease or activated by the information, and the recovered state indicates that the node has recovered and no longer transmits the information or disease. First, the initially selected seed nodes are set to the infected state and all other nodes in the network are set to the susceptible state. In each time step, every infected node infects each susceptible node in its direct neighborhood with probability β. At the same time, every infected node changes to the recovered state with probability γ (γ represents the recovery probability), and a node in the recovered state cannot be infected again. The differential equation is
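$$\frac{ds(t)}{dt} = -\beta\, s(t)\, i(t), \qquad \frac{di(t)}{dt} = \beta\, s(t)\, i(t) - \gamma\, i(t), \qquad \frac{dr(t)}{dt} = \gamma\, i(t)$$
where s(t), i(t), and r(t) denote the fractions of susceptible, infected, and recovered nodes at time t.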
The infection probability β can be neither too small nor too large. If β is too small, the disease cannot infect the whole network or may not propagate at all; if β is too large, the disease infects almost the entire network, and the influence of different nodes cannot be distinguished. β is therefore chosen to be higher than the propagation threshold $\beta_{min}$; the propagation threshold of each network is given in column 8 of Table 4. In this experiment, the infection rate is defined as
$$\lambda = \frac{\beta}{\gamma}$$
Because of the randomness of the model, the experimental results are averaged over multiple simulations.
The infection probability β is the probability that an infected node infects a susceptible neighbor in one time step, and λ is the infection rate, i.e., the ratio of the infection probability to the recovery probability, which can be understood as the infection capacity of the SIR model.
The performance of an algorithm can be measured by the propagation capacity of the selected nodes, which is represented by the infection scale F(t) at time t and the final infection scale F(t_c). The infection scale represents the influence of the selected nodes at time t and is defined as
$$F(t) = \frac{n_{I(t)} + n_{R(t)}}{n}$$
where $n_{I(t)}$ and $n_{R(t)}$ represent the numbers of infected nodes and recovered nodes at time t, and n represents the total number of nodes in the network. A larger F(t) means that more nodes have been infected by time t, i.e., greater influence and better algorithm performance, and a shorter t means faster propagation.
During the infection process, the number of nodes that have changed from the infected state to the recovered state increases at each time step and finally reaches a peak, i.e., the steady state. The final infection scale F(t_c), i.e., the proportion of recovered nodes among all nodes, indicates the final influence of the initially selected seed nodes and is defined as
$$F(t_c) = \frac{n_{R(t_c)}}{n}$$
Thus, F(t) evaluates the propagation influence of the nodes at time t, and F(t_c) evaluates the propagation influence of the nodes when the SIR propagation process reaches the steady state.
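The SIR process and the two infection scales can be simulated as sketched below. This is a minimal discrete-time sketch under the description above (each infected node infects each susceptible neighbor with probability β and then recovers with probability γ in the same step); the function names, the `max_steps` guard and the toy path graph are assumptions introduced for illustration.

```python
import random


def simulate_sir(adj, seeds, beta, gamma, rng, max_steps=10_000):
    """Return [F(0), F(1), ...], where F(t) = (infected + recovered) / n."""
    n = len(adj)
    infected, recovered = set(seeds), set()
    history = [len(infected) / n]
    for _ in range(max_steps):
        if not infected:
            break  # steady state: the last entry is F(t_c)
        newly_infected = set()
        for i in infected:
            for j in adj[i]:
                susceptible = j not in infected and j not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(j)
        newly_recovered = {i for i in infected if rng.random() < gamma}
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
        history.append((len(infected) + len(recovered)) / n)
    return history


def final_infection_scale(adj, seeds, beta, gamma, runs=300, seed=0):
    """Average F(t_c) over several independent simulations."""
    rng = random.Random(seed)
    total = sum(simulate_sir(adj, seeds, beta, gamma, rng)[-1] for _ in range(runs))
    return total / runs


# Hypothetical path graph 1-2-3-4 with node 1 as the only seed.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(simulate_sir(path, [1], 1.0, 1.0, random.Random(0)))
# With beta = gamma = 1 the infection advances one node per step:
# [0.25, 0.5, 0.75, 1.0, 1.0]
```

When the loop ends because no infected node remains, every node that was ever infected is in the recovered state, so the last value of the returned list equals the proportion of recovered nodes, i.e., F(t_c).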
4.2 Average shortest path L_s
If the selected seed nodes are clustered together, as with degree centrality or k-shell centrality, their propagation influence ranges overlap; selecting dispersed seed nodes therefore makes it easier to extend the propagation influence range. The average shortest path $L_s$ between the selected seed nodes is used to measure how dispersed the selected nodes are and to compare the performance of different algorithms. The average shortest path length of the selected seed node set S is defined as
$$L_s = \frac{1}{|S|\,(|S|-1)} \sum_{u, v \in S,\; u \neq v} l_{u,v}$$
where |S| represents the cardinality of the finite set S, i.e., the number of seed nodes, and $l_{u,v}$ represents the shortest path length from node u to node v. A larger $L_s$ means that the selected seed nodes are more dispersed, which helps to maximize the propagation influence.
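The dispersion measure can be computed with breadth-first search, as in the following sketch. It assumes an unweighted, connected network (an unreachable pair of seeds would raise a `KeyError` here) and averages over ordered pairs of distinct seeds; the function names and the toy path graph are hypothetical.

```python
from collections import deque
from itertools import permutations


def bfs_distances(adj, source):
    """Shortest path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def average_seed_distance(adj, seeds):
    """Average shortest path length over all ordered pairs of distinct seeds."""
    seeds = list(seeds)
    if len(seeds) < 2:
        return 0.0
    dist = {s: bfs_distances(adj, s) for s in seeds}
    total = sum(dist[u][v] for u, v in permutations(seeds, 2))
    return total / (len(seeds) * (len(seeds) - 1))


# Hypothetical path graph 1-2-3-4-5.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
spread = average_seed_distance(path, [1, 3, 5])     # pairwise distances 2, 4, 2
clustered = average_seed_distance(path, [1, 2, 3])  # pairwise distances 1, 2, 1
print(spread > clustered)  # True
```

The dispersed seed set {1, 3, 5} yields 8/3, whereas the clustered set {1, 2, 3} yields 4/3, which matches the intended reading that a larger value indicates more scattered seeds.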
5. Experimental results and analysis
To test the effectiveness of the four sub-algorithms KVoteRank, KHIndex, KNHIndex, and HVoteRank of the proposed KNC method, simulation analysis is performed on 8 real networks such as Jazz with the SIR model and the average path length, according to the performance indices given in section 4. KVoteRank, KHIndex, KNHIndex, and HVoteRank are compared with the 7 baseline centrality algorithms Degree, k-shell, NC, NC+, PageRank, H-index, and VoteRank, and the performance of the 4 proposed algorithms is also compared with each other.
5.1 SIR model simulation analysis
The performance of the different algorithms is judged by the final infection scale under different initial seed node ratios. Because the networks differ in size, different ratios are used when selecting the initial seed nodes, with a larger initial ratio for smaller networks. The maximum initial seed node ratio is set to 0.03 for the networks Jazz, CEnew, Crimes, Email, Hamster, and Ca-GrQc, and to 0.003 for the larger networks Condmat and Enron. The infection probability β is set to 1.5β_min. The final infection scale F(t_c) for different initial node ratios is shown in FIG. 3. The x-axis represents the ratio p of initial seed nodes and the y-axis represents the final infection scale at each ratio; the inset in FIG. 3(a) shows separately the infection scale of the four proposed methods KVoteRank, KHIndex, KNHIndex, and HVoteRank. The experimental results are averages over 300 experiments.
As can be seen from FIG. 3, the 4 proposed methods KVoteRank, KHIndex, KNHIndex, and HVoteRank all achieve satisfactory results on the 8 networks. At smaller initial ratios p, the 4 proposed KNC methods are comparable to the baseline methods, but as p increases, their final infection scale becomes progressively better than that of the baseline algorithms. The 4 proposed methods perform best on all 8 networks, especially on the networks CEnew and Hamster and the large-scale network Enron, although KNHIndex performs slightly worse on the small-scale network Jazz and HVoteRank performs slightly worse on the CEnew network. For example, in the CEnew network, the KHIndex method infects more than 12% of the nodes at an initial node ratio of 0.02, whereas the baseline methods do not reach this scale even at an initial node ratio of 0.03. In the Hamster network, the proposed KVoteRank method infects about 3.5% more nodes than the baseline methods. The performance of the proposed KVoteRank, KHIndex, KNHIndex, and HVoteRank algorithms varies from network to network, but in all networks it is better than that of the 7 baseline centralities, which demonstrates the effectiveness of the proposed methods.
To verify the propagation scale and propagation speed of the seed nodes selected by the different algorithms, a time-step experiment is used. To keep the experiments consistent, the initial seed node ratio is fixed: 0.03 for the smaller networks Jazz, CEnew, Crimes, Email, Hamster, and Ca-GrQc, and 0.003 for the larger networks Condmat and Enron. The experimental results are averages over 1000 experiments. The change of F(t) over time is shown in the infection scale comparison graph of FIG. 4, where the x-axis represents the time step and the y-axis represents the infection scale at that time.
As can be seen from FIG. 4, compared with the seven baseline algorithms such as Degree and k-shell, the 4 proposed algorithms KVoteRank, KHIndex, KNHIndex, and HVoteRank always reach the highest peak, i.e., the largest infection scale, and always reach the steady state first, i.e., they spread fastest. In the Jazz, CEnew, Hamster, Ca-GrQc, Condmat, and Enron networks, HVoteRank performs best, especially in the CEnew, Hamster, and Enron networks. In the CEnew network, HVoteRank is about 3.5% higher than the worst method, k-shell, and 2.5% higher than the best baseline, H-index; in the Hamster network, it is about 3.8% higher than the best baseline, VoteRank; and in the large-scale network Enron, it is about 0.625% higher than the baseline methods. KHIndex and KNHIndex perform comparably to HVoteRank, and KVoteRank performs slightly worse. In the Crimes network, KVoteRank performs best and HVoteRank performs slightly worse than the other proposed methods, but clearly better than the baseline methods. In the Email network, the 4 proposed KNC methods perform about equally well, about 2.2% higher than the baseline methods. The k-shell method and NC+ perform worst in all networks, which can be explained by the fact that the seed nodes selected by the k-shell method are clustered together, so that their propagation influence ranges overlap. The 4 KNC methods always reach the steady state in the least time; they differ only slightly from each other in performance but are clearly superior to the 7 baseline methods.
In addition to the ratio of initial seed nodes, the infection rate also affects the propagation process and represents different propagation capacities. The infection rate is varied from 1.0 to 2.0 and the propagation influence of the nodes is observed. The infection scale of the different algorithms at different infection rates is shown in FIG. 5, and the experimental results are averages over 300 experiments. The x-axis represents the infection rate and the y-axis represents the infection scale at that infection rate; the inset in FIG. 5(a) shows separately the infection scale of the proposed KVoteRank, KHIndex, KNHIndex, and HVoteRank methods at the same infection rates in the Jazz network.
As can be seen from FIG. 5, the proposed KVoteRank, KHIndex, KNHIndex, and HVoteRank methods perform well on the 8 networks and are superior to the 7 baseline methods; on the CEnew and Hamster networks in particular, their performance is clearly better than that of the other methods. At an infection rate of 2.0, the proposed HVoteRank algorithm infects about 2.9% more nodes than the baseline methods on the CEnew network, and on the Hamster network the KVoteRank algorithm is about 3.5% higher than the baseline methods. All the proposed methods perform slightly worse in the networks Jazz and Condmat. In the networks Jazz and Email, the proposed methods are comparable to the other methods at smaller infection rates, with the infection scale increasing as the rate grows, because information may not propagate successfully when the infection rate is small. The experiments show that the proposed KNC sub-methods KVoteRank, KHIndex, KNHIndex, and HVoteRank generalize better than the baseline methods.
5.2 average Path Length analysis
The k-shell method is suited to selecting a single most influential node, but when a set of the most influential nodes is to be selected it does not perform well, because the nodes in its high shells are clustered together, so that their propagation influence ranges overlap and the influence is not maximized. In general, the more dispersed the selected initial seed nodes are, the more the propagation influence can be maximized. The average path length of the node set is used to measure the distance between the initially infected seed nodes. FIG. 6 shows the shortest path lengths between the seed nodes selected by the different algorithms in the networks Jazz and Crimes; the values for the remaining networks are given in Table 5.
As can be seen from FIG. 6, in the network Jazz the average shortest path of the nodes selected by the proposed KNC methods is larger, but smaller than that of the CCA method; in the Crimes network, the average shortest paths of KVoteRank and KNHIndex are the largest, which means that the selected seed nodes are more dispersed and the propagation influence is more likely to be maximized. Table 5 gives the average shortest path length between the initial nodes selected by the different algorithms in the different networks. To keep the experiments consistent, the initial node set ratio is set to 0.3 for the small-scale network Jazz, to 0.03 for the smaller networks CEnew, Crimes, Email, Hamster, and Ca-GrQc, and to 0.003 for the larger networks Condmat and Enron. Columns 7-10 of Table 5 give the average shortest path lengths of the initial node sets selected by the proposed methods KHIndex, KNHIndex, HVoteRank, and KVoteRank, respectively.
Table 5 average path length of initial set of nodes
Figure BDA0003235075090000221
Figure BDA0003235075090000231
As can be seen from Table 5, the average shortest path lengths of the proposed KNC sub-methods KVoteRank, KHIndex, KNHIndex, and HVoteRank are significantly larger than those of the baseline methods except in the network Ca-GrQc, which also shows that the proposed methods do not simply select dispersed nodes but select dispersed nodes according to their propagation capability. Generally, the more dispersed the selected initial seed nodes are, the more easily information can be spread to the whole network, so the average shortest path length is commonly used as an evaluation index, although it cannot fully characterize the performance of an algorithm. To compare execution efficiency, the running times of the different algorithms were measured; the running times of the different methods on the 8 networks are given in Table 6.
TABLE 6 runtime(s)
Methods Jazz CEnew Email Crimes Hamster Ca-GrQc Condmat Enron
k-shell 0.0020 0.0040 0.0080 0.0050 0.0230 0.0240 0.2414 0.5446
Voterank 0.0020 0.0030 0.0090 0.0050 0.0538 0.0668 0.2683 0.8327
NC+ 0.0030 0.0040 0.0090 0.0040 0.0269 0.0249 0.2683 0.6323
H-index 0.0090 0.0070 0.0180 0.0060 0.0568 0.0499 0.3561 0.6462
NCCDH 0.0807 0.1436 0.2692 0.1237 0.8097 0.7071 1.9051 6.1238
CCA 0.1157 0.2653 1.6977 1.0874 9.8017 71.5783 231.2250 357.7111
Pagerank 0.1277 0.0798 0.2134 0.1117 0.6114 0.5335 2.5161 6.4447
KVR 0.0808 0.1307 0.4016 0.1506 1.2138 1.7222 28.5937 66.4826
HVR 0.1067 0.1630 0.4179 0.1824 1.2643 1.8282 29.6787 67.0074
KHI 0.0817 0.1077 0.3182 0.1466 0.8219 0.8166 1.3677 4.3059
KNHI 0.0778 0.1137 0.3321 0.1176 0.8134 0.8152 1.3933 4.7010
Combined with the preceding comparison results, the proposed methods achieve the best results within a reasonable time. Among the four methods, KHIndex and KNHIndex require less running time than KVoteRank and HVoteRank. The running time of the proposed KNC methods does not exceed 70 seconds on any network, and the running times of KHIndex and KNHIndex do not exceed 5 seconds.
6. Conclusion
The invention provides a hybrid k-shell-based neighborhood coverage (KNC) method for identifying a group of the most influential nodes in a complex network, which considers both the global information and the local information of the network. The KNC method is based on two judgment scores: in each round, the uncovered node with the highest first judgment score and the highest second judgment score is selected as a seed node, where the first judgment score is the primary score and the second judgment score is the secondary score, and after each round the first-order neighbors of the selected node are covered to avoid overlapping propagation influence ranges. First, the k-shell decomposition method assigns the same propagation capability to all nodes of the same shell and is a coarse-grained ranking method, while the VoteRank method assigns the same voting ability to every node and is a centrality that considers local information. To address these problems, the KVoteRank algorithm based on k-shell decomposition is proposed; KVoteRank considers that the most popular node at the core of the network has the greatest propagation influence, and that a node located at the center of the network is important even if its VoteRank value is smaller. Second, considering that the H-index is also a coarse-grained local-information method and that a larger H-index indicates a larger social influence, the H-index and k-shell are combined into the KHIndex method, and the second-order neighborhood H-index of a node is further considered to give the extended KNHIndex method. Third, because the H-index is the intermediate state between degree and core number, it is used as an approximate replacement for k-shell, and the H-index and VoteRank are combined to select a group of the most influential propagators, giving the HVoteRank method.
The 4 proposed KNC sub-methods, KVoteRank and the others, are all based on the idea of coverage to maximize the propagation influence. The SIR model and the average shortest path length are used to evaluate the proposed KVoteRank, KHIndex, KNHIndex, and HVoteRank methods against existing baseline methods such as Degree and k-shell. Test results on 8 real network datasets such as Jazz show that the infection scale under different initial node ratios, and the infection scale and propagation speed under different infection rates, of the proposed KVoteRank, KHIndex, KNHIndex, and HVoteRank are clearly superior to those of the existing baseline methods, and that the average shortest path length of the initial node set selected by the KNC method is larger, i.e., the selected seed nodes are more dispersed and the propagation influence is greater. It can be seen that the proposed KNC method is reasonable and effective.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims (11)

1. A method for implementing influential propagator identification in a complex network, comprising:
S1, arranging the first judgment scores in descending order;
S2, arranging the second judgment scores in descending order;
S3, selecting the node with the highest first judgment score and the highest second judgment score, and covering the node and its neighbor nodes; if a plurality of nodes have the same first judgment score and the same second judgment score, randomly selecting one of them;
S4, judging whether the number of selected nodes meets the set value; if so, executing the next step, and if not, executing step S3;
S5, after the selection is finished, obtaining the selected node set;
S6, displaying the performance index;
wherein, when the first judgment score is k-shell and the second judgment score is VoteRank, the method for identifying influential propagators comprises the following steps:
S1, calculating the k-shell value of each node according to the k-shell decomposition method, and arranging the k-shell values in descending order;
S2, calculating the voting score of each node according to the VoteRank algorithm, and arranging the voting scores in descending order;
S3, selecting the node with the maximum k-shell value and the highest VoteRank value, and covering the node and its neighbor nodes; if a plurality of nodes with the same VoteRank value exist within the maximum k-shell value, randomly selecting one of them;
S4, judging whether the number of selected nodes is equal to the set number; if so, executing the next step; if not, returning to step S3;
S5, after the selection is finished, obtaining the selected node set.
2. The method for identifying influential propagators in a complex network according to claim 1, wherein when the first judgment score is k-shell and the second judgment score is H-index, the method comprises the following steps:
s1, calculating k-shell values of each node according to a k-shell decomposition method, and arranging the k-shell values in descending order;
s2, calculating the H-Index value of each node, and arranging the H-Index values in descending order;
s3, selecting a node with the maximum k-shell value and the highest H-index value, and covering the node and neighbor nodes thereof; if a plurality of nodes with the same H-index value exist in the largest and same k-shell value, randomly selecting one node;
S4, judging whether the number of the selected nodes is equal to the set number, if so, executing the next step; if not, jumping to execute S3;
and S5, after the selection is finished, obtaining a selected node set.
3. The method for identifying influential propagators in a complex network according to claim 1, wherein when the first judgment score is k-shell and the second judgment score is NH-index, the method comprises the following steps:
s1, calculating k-shell values of each node according to a k-shell decomposition method, and arranging the k-shell values in descending order;
S2, calculating the neighborhood H-index of each node, and arranging the nodes in descending order of neighborhood H-index value;
s3, selecting a node with the maximum k-shell value and the highest neighborhood H-index, and covering the node and the neighbor nodes thereof; if a plurality of nodes with the same neighborhood H-index value exist in the largest and same k-shell value, randomly selecting one node;
s4, judging whether the number of the selected nodes is equal to the set number, if so, executing the next step; if not, jumping to execute S3;
and S5, after the selection is finished, obtaining a selected node set.
4. The method for identifying influential propagators in a complex network according to claim 1, wherein when the first judgment score is H-index and the second judgment score is VoteRank, the method comprises the following steps:
S1, calculating the H-index value of each node, and arranging the H-index values in descending order;
S2, calculating the voting score of each node according to a VoteRank algorithm, and arranging the voting scores in descending order;
S3, selecting the node with the maximum H-index value and the highest VoteRank value, and covering the node and its neighbor nodes; if several nodes with the same maximum H-index value share the same VoteRank value, randomly selecting one of them;
S4, judging whether the number of selected nodes is equal to the set number; if so, executing the next step; if not, jumping back to S3;
S5, after the selection is finished, obtaining the selected node set.
5. The method for identifying influential propagators in a complex network according to any one of claims 1 to 3, wherein the node differentiating formula of the k-shell value comprises:
Figure FDA0004145500740000031
wherein k_s represents a k-shell value of the network,
Figure FDA0004145500740000033
represents the maximum k-shell value of the network, d_ij represents the shortest distance from node i to node j, J represents the network core node set, namely the nodes with the highest k-shell value, and
Figure FDA0004145500740000034
represents the set of nodes with the same k-shell value.
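The formula itself is only available as a figure, so the following sketch computes just the ingredients the claim names: the k-shell value of each node, the core set J (nodes with the maximum k-shell value), and the sum of shortest distances d_ij from a node i to the nodes of J. How the patent combines them is not reproduced here. The function names and the example graph are hypothetical, and the graph is assumed to be undirected and connected.

```python
from collections import deque

def k_shell(adj):
    """Return {node: k-shell value} by repeatedly pruning low-degree nodes."""
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # The shell index never decreases as pruning proceeds.
        k = max(k, min(degree[v] for v in remaining))
        queue = deque(v for v in remaining if degree[v] <= k)
        while queue:
            v = queue.popleft()
            if v not in remaining:
                continue  # already pruned via an earlier queue entry
            remaining.discard(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        queue.append(u)
    return shell

def distance_sum_to_core(adj, shell):
    """Return {node: sum of BFS distances to the nodes with the maximum k-shell value}."""
    k_max = max(shell.values())
    core = [v for v in adj if shell[v] == k_max]
    total = {v: 0 for v in adj}
    for j in core:
        dist = {j: 0}
        queue = deque([j])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        for v, d in dist.items():
            total[v] += d
    return total

# Hypothetical example: a triangle (0, 1, 2) with a tail 2-3-4.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
shell = k_shell(adj)
dist_sum = distance_sum_to_core(adj, shell)
```

In this example nodes 3 and 4 share the same k-shell value but differ in their distance sum to the core, which is the kind of tie the claim's formula is meant to break.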
6. The method for identifying influential propagators in a complex network according to claim 1 or claim 4, wherein the VoteRank comprises:
each node can obtain votes from its neighbors, and each node v ∈ V is assigned a tuple (S_v, Va_v), where S_v represents the voting score obtained by node v from its neighboring nodes and Va_v represents the voting capability of node v toward its neighboring nodes; S_v can be expressed as
Figure FDA0004145500740000032
where N(v) represents the direct neighbor set of node v, and Va_i represents the voting capability of node i toward its neighboring nodes.
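The voting score can be sketched as below, assuming the figure matches the usual VoteRank formulation in which S_v is the sum of the voting capabilities Va_i of the direct neighbors i in N(v). The function name `vote_scores`, the initial capability of 1.0, and the example graph are illustrative assumptions.

```python
def vote_scores(adj, ability):
    """Return {v: S_v}, where S_v sums the voting ability Va_i over neighbors i of v."""
    return {v: sum(ability[i] for i in adj[v]) for v in adj}

# Hypothetical example: a star with center "a".
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
ability = {v: 1.0 for v in adj}   # assumed initial voting capability
scores = vote_scores(adj, ability)

# If an elected node loses its voting capability, its neighbors' scores drop.
ability["a"] = 0.0
scores_after = vote_scores(adj, ability)
```

With every capability equal to 1, the score reduces to the node degree; the second call shows how lowering one node's capability changes only the scores of its neighbors.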
7. The method for identifying influential propagators in a complex network according to claim 2 or claim 4, wherein the H-index comprises:
the H-index of node i is defined as:
Figure FDA0004145500740000041
wherein H(·) is the function that computes the H-index of a node, and the degrees of the neighbor nodes of node i are
Figure FDA0004145500740000045
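A minimal sketch of the node H-index follows, assuming the commonly used definition: the H-index of node i is the largest h such that at least h neighbors of i have degree no smaller than h. The helper names and the example graph are hypothetical.

```python
def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    ranked = sorted(values, reverse=True)
    h = 0
    for rank, value in enumerate(ranked, start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h

def node_h_index(adj):
    """Return {node: H-index computed from the degrees of its neighbors}."""
    return {v: h_index(len(adj[u]) for u in adj[v]) for v in adj}

# Hypothetical example: a triangle (0, 1, 2) with a tail 2-3-4.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
h = node_h_index(adj)
```

Node 2 has three neighbors, each of degree 2, so its H-index is 2 rather than 3: the operator rewards nodes whose neighbors are themselves well connected.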
8. The method for identifying influential propagators in a complex network according to claim 7, wherein the neighborhood H-index comprises:
the neighborhood H-index of node i is defined as:
Figure FDA0004145500740000042
where N(i) represents the direct neighbor set of node i, i.e., all of its neighbor nodes, and h_j represents the H-index of node j.
9. The method for identifying influential propagators in a complex network according to claim 1, wherein the performance metrics comprise an epidemic model, the epidemic model comprising:
firstly, setting the initially selected seed nodes to the infected state, with all other nodes in the network in the susceptible state; at each time step, each infected node infects each susceptible node in its direct neighborhood with probability β;
meanwhile, each infected node becomes recovered with probability γ, and a node that has recovered can no longer be infected; the differential equations are
Figure FDA0004145500740000043
the infection rate is defined as
Figure FDA0004145500740000044
wherein S represents the susceptible state, I represents the infected state, and R represents the recovered state.
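The discrete-time process described in this claim can be sketched as a simple simulation. The function name `simulate_sir`, the synchronous update order, and the example graph are assumptions for illustration; the differential equations and the infection-rate definition in the figures are not reproduced.

```python
import random

def simulate_sir(adj, seeds, beta, gamma, steps, rng=None):
    """Run a synchronous SIR process and return the state dict after each step."""
    rng = rng or random.Random(0)
    state = {v: "S" for v in adj}
    for s in seeds:
        state[s] = "I"
    history = [dict(state)]
    for _ in range(steps):
        infected = [v for v in adj if state[v] == "I"]
        newly_infected = set()
        # Each infected node tries to infect each susceptible direct neighbor.
        for v in infected:
            for u in adj[v]:
                if state[u] == "S" and rng.random() < beta:
                    newly_infected.add(u)
        # In the same time step, each infected node may recover.
        for v in infected:
            if rng.random() < gamma:
                state[v] = "R"
        for u in newly_infected:
            state[u] = "I"
        history.append(dict(state))
    return history

# Hypothetical example: a path 0-1-2-3 seeded at node 0. With beta = gamma = 1
# the run is deterministic: the infection advances one hop per step.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
history = simulate_sir(adj, seeds=[0], beta=1.0, gamma=1.0, steps=2)
```

Recovered nodes are never returned to the susceptible state, matching the statement that a recovered node can no longer be infected.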
10. The method of claim 9, wherein the epidemic model further comprises:
the infection scale F(t):
Figure FDA0004145500740000051
the final infection scale F(t_c):
Figure FDA0004145500740000052
wherein n_I(t) and n_R(t) respectively represent the number of infected nodes and the number of recovered nodes at time t, and n represents the total number of nodes in the network.
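As a small worked example, the infection scale can be computed from a snapshot of node states. This sketch assumes F(t) is the fraction (n_I(t) + n_R(t)) / n, which is consistent with the symbols defined in the claim but is not confirmed by the figure; the function name and the state encoding ("S", "I", "R") are hypothetical.

```python
def infection_scale(states):
    """Fraction of nodes that are infected or recovered in a state snapshot."""
    n = len(states)
    n_infected = sum(1 for s in states.values() if s == "I")
    n_recovered = sum(1 for s in states.values() if s == "R")
    return (n_infected + n_recovered) / n

# Hypothetical snapshot of a four-node network at some time t.
snapshot = {0: "R", 1: "R", 2: "I", 3: "S"}
scale = infection_scale(snapshot)
```

Once no infected nodes remain, the same quantity depends only on the recovered count, which is how a final infection scale at the end time t_c would be read off under this assumption.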
11. The method for identifying influential propagators in a complex network according to claim 1, wherein the performance metrics further comprise an average shortest path L_s:
Figure FDA0004145500740000053
where |S| represents the cardinality of the finite set S, i.e., the number of nodes in S, and l_u,v represents the shortest path length from node u to node v.
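The metric can be sketched as the mean shortest path length over all pairs of selected nodes. The normalization used here (average over the |S|(|S|-1)/2 unordered pairs, which equals the average over ordered pairs in an undirected graph) is an assumption, since the figure is not reproduced. The function names and the example graph are hypothetical, and the graph is assumed to be connected.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Return {node: hop distance from source} for an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def average_shortest_path(adj, selected):
    """Mean shortest path length l_u,v over all pairs of selected nodes."""
    selected = list(selected)
    if len(selected) < 2:
        return 0.0  # no pairs to average over
    dist = {u: bfs_distances(adj, u) for u in selected}
    pairs = list(combinations(selected, 2))
    return sum(dist[u][v] for u, v in pairs) / len(pairs)

# Hypothetical example: a path 0-1-2-3-4 with selected set S = {0, 2, 4}.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
l_s = average_shortest_path(adj, [0, 2, 4])
```

A larger value indicates that the selected propagators are spread farther apart in the network.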
CN202110999228.6A 2021-08-28 2021-08-28 Method for identifying influential propagators in complex network Active CN113723504B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110999228.6A CN113723504B (en) 2021-08-28 2021-08-28 Method for identifying influential propagators in complex network

Publications (2)

Publication Number Publication Date
CN113723504A CN113723504A (en) 2021-11-30
CN113723504B true CN113723504B (en) 2023-05-16

Family

ID=78678674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110999228.6A Active CN113723504B (en) 2021-08-28 2021-08-28 Method for identifying influential propagators in complex network

Country Status (1)

Country Link
CN (1) CN113723504B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114666229B (en) * 2022-03-21 2023-10-03 天津商业大学 Complex network node influence measuring method and system based on limited propagation domain

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034562A (en) * 2018-07-09 2018-12-18 中国矿业大学 A kind of social networks node importance appraisal procedure and system
CN110135092A (en) * 2019-05-21 2019-08-16 江苏开放大学(江苏城市职业学院) Complicated weighting network of communication lines key node recognition methods based on half local center
CN110909173A (en) * 2019-11-13 2020-03-24 河海大学 Non-overlapping community discovery method based on label propagation
CN111428323A (en) * 2020-04-16 2020-07-17 太原理工大学 Method for identifying group of key nodes by using generalized discount degree and k-shell in complex network
CN112148991A (en) * 2020-10-16 2020-12-29 重庆理工大学 Social network node influence recommendation method for fusion degree discount and local node

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100679250B1 (en) * 2005-07-22 2007-02-05 한국전자통신연구원 Method for automatically selecting a cluster header in a wireless sensor network and for dynamically configuring a secure wireless sensor network

Also Published As

Publication number Publication date
CN113723504A (en) 2021-11-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant