CN113723504A

CN113723504A - Method for identifying influential propagators in complex network

Info

Publication number: CN113723504A
Application number: CN202110999228.6A
Authority: CN
Inventors: Liu Xiaoyang (刘小洋); Ye Shu (叶舒); Zhang Mengyao (张梦瑶)
Original assignee: Chongqing University of Technology
Current assignee: Chongqing University of Technology
Priority date: 2021-08-28
Filing date: 2021-08-28
Publication date: 2021-11-30
Anticipated expiration: 2041-08-28
Also published as: CN113723504B

Abstract

The invention provides a method for identifying influential propagators in a complex network, comprising the following steps: S1, sorting the nodes in descending order of a first judgment score; S2, sorting the nodes in descending order of a second judgment score; S3, among the uncovered nodes, selecting the node with the largest first judgment score and, among those, the largest second judgment score, and covering that node and its neighbor nodes; if several nodes share the maximum first judgment score and also the same second judgment score, one of them is selected at random; S4, judging whether the number of selected nodes has reached the set value: if so, executing the next step, otherwise returning to step S3; S5, obtaining the selected node set once selection is finished; and S6, displaying the performance indices. The invention can identify a group of the most influential nodes in a complex network and display their performance indices.

Description

Method for identifying influential propagators in complex network

Technical Field

The invention relates to the technical field of network information mining, in particular to a method for identifying influential propagators in a complex network.

Background

Complex networks are ubiquitous in real life, for example social networks, biomolecular networks, citation networks and transportation networks. However, only a subset of nodes has a large influence on a network, and removing these nodes can cause the network to break down. In recent years, identifying a group of influential nodes has attracted extensive attention in the complex-network research community. It is also of great significance for real-world applications such as information dissemination, epidemic control, rumor control, viral marketing, arresting key suspects in criminal networks and preventing catastrophic outages of power grids.

A number of classical centrality methods based on network topology are used to identify key nodes. Examples include degree centrality, closeness centrality, betweenness centrality and eigenvector centrality.

Degree centrality counts the direct neighbors of a node: the more neighbors a node has, the more important it is. However, it considers only local information. Closeness centrality is computed from the reciprocal of the sum of shortest distances from a node to all other nodes; because it uses global information, its computational complexity is high. Betweenness centrality counts the shortest paths passing through a node and also has high computational complexity. Eigenvector centrality regards a node as more important when its neighbors are more important.

Kitsak et al. regard nodes located in the core of the network as more influential. They proposed the k-shell decomposition method, which effectively identifies the most influential individual spreaders. However, k-shell decomposition assigns the same k-shell index to nodes with different spreading capabilities. It also considers only the residual degree, ignoring links to already-removed nodes, so it is a coarse-grained centrality. Many centrality algorithms have therefore been proposed to improve on k-shell decomposition. Extending the H-index, Lü et al. proposed H-index centrality: the H-index of a node is the largest value h such that the node has at least h neighbors, each with degree no less than h. ClusterRank combines degree centrality and the clustering coefficient to judge node influence. In recent years, information entropy has also been used to evaluate the importance of nodes in a network.
Nie et al. proposed mapping entropy (ME) to identify key nodes in a network, taking into account the correlation among all of a node's neighbors. Combining core number, degree and entropy, Sheikhahmadi et al. proposed the mixed core, degree and entropy (MCDE) method to rank nodes. Ji et al. proposed using percolation to identify dispersed spreaders in a network.

Kempe et al. proved that the influence maximization problem is NP-hard and proposed a greedy hill-climbing algorithm. Its complexity is high, however, so it is unsuitable for large-scale networks. Many researchers have therefore focused on heuristic algorithms in recent years.

Chen et al. proposed the degree discount heuristic, in which a node's degree is discounted when its neighbors have been selected as seed nodes. Nodes ranked highly by centrality algorithms tend to be clustered together, so their spreading ranges may overlap. Following the idea of selecting nodes in a dispersed manner, Cao et al. proposed the Core Coverage Algorithm (CCA). In each round, CCA selects the node with the highest shell and the highest degree and covers its neighbor nodes. Because CCA covers all neighbor nodes, Yang et al. proposed a neighborhood coreness covering and discounting heuristic (NCCDH). In each round, NCCDH selects the node with the largest neighborhood coreness, covers only the neighbors in the same shell, and discounts the k-shell values of the other neighbors.

Based on a voting strategy, Zhang et al. proposed the VoteRank algorithm, which assigns each node a voting score and a voting ability. However, the algorithm considers only local information and gives every node the same voting ability, so the contributions of different neighbors cannot be distinguished.

Disclosure of Invention

The invention aims to solve at least the technical problems in the prior art and, in particular, provides a method for identifying influential propagators in a complex network.

In order to achieve the above object of the present invention, the present invention provides a method for identifying influential propagators in a complex network, comprising:

S1, computing the first judgment score of each node and sorting the nodes in descending order of that score;

S2, computing the second judgment score of each node and sorting the nodes in descending order of that score;

S3, among the uncovered nodes, selecting the node with the largest first judgment score and, among those, the largest second judgment score, and covering the selected node and its neighbor nodes; if several nodes share the maximum first judgment score and also the same second judgment score, one of them is selected at random;

S4, judging whether the number of selected nodes has reached the set value; if so, executing the next step, otherwise returning to step S3;

S5, obtaining the selected node set once selection is finished;

S6, displaying the performance indices.
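Steps S1 to S5 amount to a greedy loop. The first judgment score is the primary sort key, the second judgment score breaks ties, and every pick covers its direct neighbors. The following Python sketch illustrates that loop. Several of its details are assumptions that do not appear in the text:

- the graph is an undirected adjacency dict of sets;
- both scores are supplied as precomputed dicts;
- the function name `select_spreaders` is hypothetical;
- the loop stops early if every node is already covered before the set value is reached.

```python
import random


def select_spreaders(adj, first, second, k, seed=None):
    """Greedy two-score selection with neighborhood covering (steps S1-S5).

    adj    : undirected graph as {node: set(neighbors)}
    first  : {node: first judgment score} (main criterion)
    second : {node: second judgment score} (secondary criterion)
    k      : number of seed nodes to select (the "set value")
    """
    rng = random.Random(seed)
    covered = set()
    selected = []
    while len(selected) < k:
        candidates = [v for v in adj if v not in covered]
        if not candidates:
            break  # assumption: stop when every node is covered
        # Compare (first, second) pairs: first score dominates, second breaks ties.
        best = max((first[v], second[v]) for v in candidates)
        ties = [v for v in candidates if (first[v], second[v]) == best]
        chosen = rng.choice(ties)  # random pick among exact ties (step S3)
        selected.append(chosen)
        covered.add(chosen)
        covered |= adj[chosen]  # cover the direct neighbors of the seed
    return selected


# Hypothetical usage: a constant first score, with degree as the second score.
adj = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1, 5}, 5: {4, 6}, 6: {5}}
first = {v: 1 for v in adj}
second = {v: len(adj[v]) for v in adj}
print(select_spreaders(adj, first, second, 2))  # [1, 5]
```

Passing k-shell values as `first` and VoteRank scores as `second` corresponds to the KVoteRank embodiment. The other embodiments only change which score dicts are supplied.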

In a preferred embodiment of the present invention, the first judgment score is the k-shell value and the second judgment score is the VoteRank score. In this case, the method for identifying influential propagators comprises the following steps:

S1, calculating the k-shell value of each node by k-shell decomposition and sorting the nodes in descending order of k-shell value;

S2, calculating the voting score of each node with the VoteRank algorithm and sorting the nodes in descending order of voting score;

S3, selecting the node with the largest k-shell value and the highest VoteRank score, and covering the node and its neighbor nodes; if several nodes with the same maximum k-shell value also have the same VoteRank score, one of them is selected at random;

S4, judging whether the number of selected nodes equals the set number; if so, executing the next step, otherwise returning to step S3;

S5, obtaining the selected node set once selection is finished.

Alternatively or additionally, when the first judgment score is the k-shell value and the second judgment score is the H-index, the method for identifying influential propagators comprises the following steps:

S2, calculating the H-index of each node and sorting the nodes in descending order of H-index;

S3, selecting the node with the largest k-shell value and the highest H-index, and covering the node and its neighbor nodes; if several nodes with the same maximum k-shell value also have the same H-index, one of them is selected at random;

S5, obtaining the selected node set once selection is finished.

Alternatively or additionally, when the first judgment score is the k-shell value and the second judgment score is the neighborhood H-index (NH-index), the method for identifying influential propagators comprises the following steps:

S2, calculating the neighborhood H-index of each node and sorting the nodes in descending order of neighborhood H-index;

S3, selecting the node with the largest k-shell value and the highest neighborhood H-index, and covering the node and its neighbor nodes; if several nodes with the same maximum k-shell value also have the same neighborhood H-index, one of them is selected at random;

S5, obtaining the selected node set once selection is finished.

Alternatively or additionally, when the first judgment score is the H-index and the second judgment score is the VoteRank score, the method for identifying influential propagators comprises the following steps:

S1, calculating the H-index of each node and sorting the nodes in descending order of H-index;

S3, selecting the node with the largest H-index and the highest VoteRank score, and covering the node and its neighbor nodes; if several nodes with the same maximum H-index also have the same VoteRank score, one of them is selected at random;

S5, obtaining the selected node set once selection is finished.

In a preferred embodiment of the present invention, nodes with the same k-shell value are distinguished by the following formula:

$$\theta(i \mid k_s) = \left(k_s^{\max} - k_s + 1\right) \sum_{j \in J} d_{ij}, \qquad i \in S_{k_s}$$

The symbols are defined as follows:

- k_s is a k-shell value of the network;
- k_s^max is the maximum k-shell value of the network;
- d_ij is the shortest distance from node i to node j;
- J is the set of network core nodes, i.e., the nodes with the highest k-shell value;
- S_{k_s} is the set of nodes having the k-shell value k_s.

In a preferred embodiment of the present invention, VoteRank comprises the following:

each node can receive votes from its neighbors, and each node v ∈ V is assigned a tuple (S_v, Va_v). Here S_v is the voting score that node v obtains from its neighbor nodes, and Va_v is the voting ability of node v towards its neighbor nodes. S_v can be expressed as

$$S_v = \sum_{i \in N(v)} Va_i$$

where N(v) is the set of direct neighbors of node v and Va_i is the voting ability of node i towards its neighbor nodes.

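In a preferred embodiment of the present invention, the H-index comprises the following:

the H-index of node i is defined as

$$h_i = H\left(k_{j_1}, k_{j_2}, \ldots, k_{j_{k_i}}\right)$$

where k_{j_1}, k_{j_2}, …, k_{j_{k_i}} are the degrees of the k_i neighbor nodes of node i. H(·) is the H-index operator, which returns the largest integer h such that at least h of its arguments are no less than h.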

In a preferred embodiment of the present invention, the neighborhood H-index comprises the following:

the neighborhood H-index of node i is defined as

$$nh_i = \sum_{j \in N(i)} h_j$$

where N(i) is the set of direct neighbors of node i and h_j is the H-index of node j.

In a preferred embodiment of the invention, the performance indices comprise the SIR model and/or the average shortest path length L_s, and the SIR model comprises the following:

first, the initially selected seed nodes are set to the infected state and all other nodes in the network are set to the susceptible state; at each time step, each infected node infects each susceptible node in its direct neighborhood with probability β;

meanwhile, each infected node recovers with probability γ, and a node that has recovered cannot be infected again. The corresponding differential equations are

$$\frac{dS}{dt} = -\beta S I, \qquad \frac{dI}{dt} = \beta S I - \gamma I, \qquad \frac{dR}{dt} = \gamma I$$

where S, I and R denote the susceptible, infected and recovered states, respectively.

The infection probability β should be neither too small nor too large. If β is too small, the epidemic cannot spread through the whole network and may not spread at all. If β is too large, the epidemic infects almost the entire network, so the influence of different nodes cannot be distinguished and the comparison becomes meaningless. β is therefore chosen above the propagation threshold β_min.
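To show how the mean-field SIR equations dS/dt = -βSI, dI/dt = βSI - γI and dR/dt = γI evolve over time, the sketch below integrates them with a forward-Euler step. It treats S, I and R as population fractions. It is an illustration only: the evaluation described in the text simulates the process on the network itself. The function name, step size and example parameters are assumptions.

```python
def sir_mean_field(beta, gamma, i0, t_max, dt=0.01):
    """Forward-Euler integration of the mean-field SIR equations.

    S, I, R are population fractions (S + I + R = 1 at every step).
    Returns a list of (t, S, I, R) samples, one per Euler step.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    t = 0.0
    trajectory = [(t, s, i, r)]
    for _ in range(int(round(t_max / dt))):
        new_infections = beta * s * i * dt  # beta*S*I*dt moves from S to I
        new_recoveries = gamma * i * dt     # gamma*I*dt moves from I to R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        t += dt
        trajectory.append((t, s, i, r))
    return trajectory


# Hypothetical parameters, chosen only for illustration.
traj = sir_mean_field(beta=0.5, gamma=0.2, i0=0.01, t_max=50)
t_end, s_end, i_end, r_end = traj[-1]
print(f"t={t_end:.1f} S={s_end:.3f} I={i_end:.3f} R={r_end:.3f}")
```

A small dt keeps the explicit Euler scheme stable and preserves S + I + R = 1 up to rounding.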

In a preferred embodiment of the present invention, the SIR model further comprises the following:

the infection scale F(t):

$$F(t) = \frac{n_I(t) + n_R(t)}{n}$$

the final infection scale F(t_c):

$$F(t_c) = \frac{n_R(t_c)}{n}$$

The symbols are defined as follows:

- n_I(t) and n_R(t) are the numbers of infected and recovered nodes at time t;
- n is the total number of nodes in the network;
- t_c is the time at which the process reaches its steady state.

A larger F(t) means that more nodes have been infected by time t. This indicates a greater influence of the seed set and better algorithm performance. Reaching a given infection scale at a smaller t indicates faster propagation.

During the infection process, the number of nodes that have moved from the infected state to the recovered state increases at each time step until it reaches its peak, i.e., the steady state.
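The infection scale F(t) can be estimated by Monte Carlo simulation on the network. The sketch below is one possible discrete-time implementation. At each step, every infected node infects each susceptible neighbor with probability β and then recovers with probability γ. The following details are assumptions rather than details given in the text:

- the synchronous update order;
- the default γ = 1;
- the number of runs;
- the function name `sir_infection_scale`.

Once no infected nodes remain, the returned value equals the final scale F(t_c) = n_R(t_c)/n.

```python
import random


def sir_infection_scale(adj, seeds, beta, gamma=1.0, t_max=50, runs=100, seed=None):
    """Average infection scale F(t) = (n_I(t) + n_R(t)) / n over several SIR runs.

    adj   : undirected graph as {node: set(neighbors)}
    seeds : initially infected nodes; all other nodes start susceptible
    Returns the list [F(0), F(1), ..., F(t_max)].
    """
    rng = random.Random(seed)
    n = len(adj)
    totals = [0.0] * (t_max + 1)
    for _ in range(runs):
        state = {v: "S" for v in adj}
        for v in seeds:
            state[v] = "I"
        for t in range(t_max + 1):
            # F(t): fraction of nodes that are infected or already recovered.
            totals[t] += sum(1 for v in adj if state[v] != "S") / n
            infected = [v for v in adj if state[v] == "I"]
            # Infection: each infected node infects each susceptible neighbor w.p. beta.
            newly_infected = set()
            for v in infected:
                for u in adj[v]:
                    if state[u] == "S" and rng.random() < beta:
                        newly_infected.add(u)
            # Recovery: nodes infected at the start of this step recover w.p. gamma.
            for v in infected:
                if rng.random() < gamma:
                    state[v] = "R"
            for u in newly_infected:
                state[u] = "I"
    return [total / runs for total in totals]


# Hypothetical usage on a 4-node path, seeded at one end.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(sir_infection_scale(path, [0], beta=0.3, t_max=10, runs=200, seed=7))
```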

In a preferred embodiment of the invention, the average shortest path length L_s comprises the following:

the average shortest path length within the selected node set S is defined as

$$L_s = \frac{1}{|S|\,(|S|-1)} \sum_{\substack{u, v \in S \\ u \neq v}} l_{u,v}$$

where |S| is the cardinality of the finite set S, i.e., the number of selected nodes, and l_{u,v} is the shortest path length from node u to node v. A larger L_s means that the selected seed nodes are more dispersed, which helps to maximize the spreading influence.
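L_s can be computed with one breadth-first search per seed node. The sketch below makes several assumptions:

- it works on an unweighted, undirected adjacency dict;
- it raises an error if two seeds lie in different components, a case the text does not cover;
- it returns 0 for fewer than two seeds;
- the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def average_seed_distance(adj, seeds):
    """L_s = 1 / (|S| (|S| - 1)) * sum of l_{u,v} over ordered pairs u != v in S."""
    seeds = list(seeds)
    size = len(seeds)
    if size < 2:
        return 0.0  # assumption: L_s taken as 0 for fewer than two seeds
    total = 0
    for idx, u in enumerate(seeds):
        dist = bfs_distances(adj, u)
        for v in seeds[idx + 1:]:
            if v not in dist:
                raise ValueError("seed nodes lie in different components")
            total += 2 * dist[v]  # l_{u,v} = l_{v,u} in an undirected graph
    return total / (size * (size - 1))


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(average_seed_distance(path, [0, 1, 3]))  # 2.0
```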

In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:

1) The global position information of k-shell decomposition is combined with the local voting information of the VoteRank method. The most popular nodes located in the core of the network are regarded as the most important, and a new KVoteRank method is provided for identifying a group of the most influential nodes in a complex network.

2) The H-index considers only local network information, and both the H-index and the k-shell are too coarse-grained. To address this, and inspired by the CCA method, the KHIndex method is provided. It takes the k-shell as the first judgment score and the H-index as the second judgment score, and repeatedly selects the node with the largest H-index while covering its neighborhood. Using the second-order neighborhood H-index, an extended method, KNHIndex, is further provided to select a group of the most influential propagators in a complex network. KNHIndex takes more local network information into account.

3) The H-index is an intermediate state between degree and core number, so it is used as an approximate substitute for the k-shell. With the H-index as the first judgment score and VoteRank as the second judgment score, a new method for identifying key nodes in complex networks, HVoteRank, is provided.

4) KVoteRank, KHIndex, KNHIndex and HVoteRank are hybrid neighborhood-covering methods that combine the k-shell, H-index and VoteRank methods. Based on tests of these methods, the improved k-shell neighborhood covering method KNC is provided for identifying a group of the most influential propagators in a complex network. KNC considers both local and global network information. It overcomes two problems: k-shell decomposition cannot distinguish nodes in the same shell, and the H-index and VoteRank consider only local neighbor information.

5) The proposed KNC method is comprehensively evaluated with the SIR model and the average shortest path length. Simulation experiments were carried out on 8 real networks, including Jazz, under different infection rates and different initial node proportions. In these experiments, the initial node set selected by KNC achieves a larger infection scale than existing baseline methods and is more dispersed.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

Fig. 1 is a schematic diagram of a specific embodiment of the KVoteRank algorithm of the present invention.

Fig. 2 is a schematic diagram of the SIR propagation process of the present invention.

Fig. 3 is a schematic illustration of the infection scale for different initial node proportions according to the present invention.

Fig. 4 is a comparative illustration of the infection scale according to the present invention.

Fig. 5 is a schematic illustration of the infection scale under different infection rates according to the present invention.

Fig. 6 is a schematic diagram of the shortest path length of the initial node set according to the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.

1. Related work

Given a network G(V, E), V and E denote the node set and the edge set, respectively. n = |V| is the number of nodes and m = |E| is the number of edges, where |·| denotes the cardinality of a set. A = {a_ij} is the adjacency matrix of G: a_ij = 1 if there is an edge between node v_i and node v_j, and a_ij = 0 otherwise. N(v) denotes the set of direct neighbors of node v.

1.1 k-Shell centrality

The k-shell method holds that the position of a node matters more than its number of neighbors: a node located in the core of the network has a high spreading influence even if its degree is small. The procedure is as follows:

1. The algorithm first removes all nodes with degree 1. Removing them lowers the degree of other nodes, so nodes whose residual degree drops to 1 or below are removed in turn, until no node with degree 1 remains. All removed nodes are assigned to the 1-shell.
2. Next, nodes with residual degree 2 are removed, iteratively removing every node whose residual degree is less than or equal to 2 until all remaining nodes have residual degree greater than 2. The removed nodes are assigned to the 2-shell.
3. The process repeats until every node in the network has been assigned a shell.

However, the k-shell method assigns the same k-shell value to a large number of nodes whose spreading capabilities differ, so it is a coarse-grained method. Moreover, nodes in higher shells are more tightly clustered, so their spreading ranges may overlap.
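The peeling procedure described above can be sketched in Python as follows. This is an illustrative implementation, not the inventors' code. The adjacency-dict representation and the name `k_shell` are assumptions. Isolated nodes (degree 0) are placed in a 0-shell, a common convention that the text does not address.

```python
def k_shell(adj):
    """Return {node: k-shell index} by iteratively peeling low-degree nodes.

    adj: undirected graph as {node: set(neighbors)}.
    """
    residual = {v: len(adj[v]) for v in adj}  # residual degree of each node
    remaining = set(adj)
    shell = {}
    k = 0  # isolated nodes end up in the 0-shell
    while remaining:
        frontier = [v for v in remaining if residual[v] <= k]
        if not frontier:
            k += 1  # nothing left to peel at this level
            continue
        while frontier:
            v = frontier.pop()
            if v not in remaining:
                continue  # already peeled via another neighbor
            remaining.discard(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    residual[u] -= 1  # removing v lowers u's residual degree
                    if residual[u] <= k:
                        frontier.append(u)  # u now also falls into the k-shell
    return shell


# A triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(k_shell(adj) == {0: 2, 1: 2, 2: 2, 3: 1})  # True
```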

1.2 Mixed Degree Decomposition (MDD)

k-shell decomposition considers only the links among the remaining nodes at each step and completely ignores the links to deleted nodes. Zeng et al. therefore proposed the mixed degree decomposition (MDD) method, which takes into account both the residual degree and the exhausted degree of a node. The mixed degree of node v is expressed as

$$k^m(v) = k^r(v) + \lambda\, k^e(v) \qquad (1)$$

The symbols are defined as follows:

- k^r(v) is the residual degree of the node, i.e., the number of links to remaining nodes;
- k^e(v) is the exhausted degree, i.e., the number of links to deleted nodes;
- λ is a tunable parameter between 0 and 1.

When λ = 1, the exhausted degree is fully counted and MDD is equivalent to degree centrality. When λ = 0, only the residual degree is considered and MDD is equivalent to k-shell decomposition.
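Formula (1) can be evaluated for a single node once the set of already-deleted nodes is known. The sketch below computes only that quantity; it does not reproduce the full MDD peeling loop. The function name `mixed_degree` and the adjacency-dict input are assumptions. The example checks the two limiting cases stated above: λ = 1 gives the ordinary degree and λ = 0 gives the residual degree.

```python
def mixed_degree(adj, v, removed, lam):
    """k^m(v) = k^r(v) + lam * k^e(v), given the set of already-removed nodes."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    k_e = sum(1 for u in adj[v] if u in removed)  # exhausted degree: links to removed nodes
    k_r = len(adj[v]) - k_e                       # residual degree: links to remaining nodes
    return k_r + lam * k_e


# Star with center 0 and leaves 1, 2, 3; leaves 1 and 2 have already been removed.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(mixed_degree(star, 0, {1, 2}, 0.0))  # 1.0, residual degree only
print(mixed_degree(star, 0, {1, 2}, 1.0))  # 3.0, equal to the degree of node 0
```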

1.3 Neighborhood coreness centrality

Neighborhood coreness (NC) was proposed by Bae et al. They argue that a spreader with more neighbors in the core of the network is more influential. The method reduces the degeneracy of the k-shell method and balances a node's degree against its position. The neighborhood coreness of node v is defined as

$$C_{nc}(v) = \sum_{w \in N(v)} ks(w)$$

where N(v) is the set of direct neighbors of node v and ks(w) is the k-shell value of neighbor w. Bae et al. also proposed the extended neighborhood coreness (NC+), which considers the second-order neighborhood of a node. The extended neighborhood coreness of node v is defined as

$$C_{nc+}(v) = \sum_{w \in N(v)} C_{nc}(w)$$

where C_nc(w) is the neighborhood coreness of neighbor node w.
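Both NC quantities, C_nc(v) = Σ_{w∈N(v)} ks(w) and C_nc+(v) = Σ_{w∈N(v)} C_nc(w), reduce to sums over neighbor lists. The sketch below takes the k-shell values as a precomputed dict. The function names and the example graph are hypothetical.

```python
def neighborhood_coreness(adj, ks):
    """NC: C_nc(v) = sum of the k-shell values ks(w) over the neighbors w of v."""
    return {v: sum(ks[w] for w in adj[v]) for v in adj}


def extended_neighborhood_coreness(adj, ks):
    """NC+: C_nc+(v) = sum of C_nc(w) over the neighbors w of v."""
    nc = neighborhood_coreness(adj, ks)
    return {v: sum(nc[w] for w in adj[v]) for v in adj}


# Triangle 0-1-2 (2-shell) with pendant node 3 (1-shell) attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
ks = {0: 2, 1: 2, 2: 2, 3: 1}
print(neighborhood_coreness(adj, ks))           # {0: 5, 1: 4, 2: 4, 3: 2}
print(extended_neighborhood_coreness(adj, ks))  # {0: 10, 1: 9, 2: 9, 3: 5}
```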

1.4 Improved k-shell index

Liu et al. proposed an improved k-shell method that distinguishes nodes in the same shell by their distance to the core of the network. The method holds that, among nodes with the same k-shell value, the closer a node is to the network core, the greater its spreading influence. Nodes with the same k-shell value are distinguished by

$$\theta(i \mid k_s) = \left(k_s^{\max} - k_s + 1\right) \sum_{j \in J} d_{ij}, \qquad i \in S_{k_s}$$

The symbols are defined as follows:

- k_s is a k-shell value of the network;
- k_s^max is the maximum k-shell value of the network;
- d_ij is the shortest distance from node i to node j;
- J is the set of network core nodes, i.e., the nodes with the highest k-shell value;
- S_{k_s} is the set of nodes having the k-shell value k_s.

A smaller θ indicates that the node is closer to the core.
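θ(i | k_s) can be computed with one breadth-first search per core node, accumulating d_ij for every node i. The sketch below assumes an unweighted, connected, undirected graph; the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def improved_kshell_theta(adj, ks):
    """theta(i | k_s) = (k_s_max - k_s + 1) * sum_{j in J} d_ij, J = {j : ks[j] == k_s_max}.

    A smaller theta means node i is closer to the network core.
    """
    ks_max = max(ks.values())
    core = [j for j in adj if ks[j] == ks_max]
    dist_sum = {v: 0 for v in adj}
    for j in core:  # one BFS per core node instead of one per (i, j) pair
        dist = bfs_distances(adj, j)
        if len(dist) != len(adj):
            raise ValueError("graph must be connected")
        for v, d in dist.items():
            dist_sum[v] += d
    return {v: (ks_max - ks[v] + 1) * dist_sum[v] for v in adj}


adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
ks = {0: 2, 1: 2, 2: 2, 3: 1}
print(improved_kshell_theta(adj, ks))  # {0: 2, 1: 2, 2: 2, 3: 10}
```

Within a shell, nodes would then be ranked in ascending order of θ.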

1.5 H-index

The iterative k-shell decomposition requires global topology information, which limits its application in large-scale networks. The H-index, by contrast, is a local metric that uses only local network information, namely the degrees of a node's neighbors. It was originally used to evaluate the quantity and level of a researcher's academic output. A researcher's H-index is h if, among the N papers the researcher has published, h papers have each been cited at least h times and the remaining N - h papers have each been cited no more than h times. In network science, the H-index of node v_i is defined as the largest h such that v_i has at least h neighbor nodes whose degrees are each no less than h.
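The definition above translates directly into code. The sketch below has two parts: an H operator applied to a list of values (citation counts or neighbor degrees), and a node-level H-index that applies it to neighbor degrees. The function names and the adjacency-dict representation are assumptions.

```python
def h_operator(values):
    """Largest h such that at least h of the values are >= h."""
    h = 0
    for rank, x in enumerate(sorted(values, reverse=True), start=1):
        if x < rank:
            break
        h = rank
    return h


def h_index(adj):
    """Node H-index: the H operator applied to the degrees of each node's neighbors."""
    degree = {v: len(adj[v]) for v in adj}
    return {v: h_operator([degree[u] for u in adj[v]]) for v in adj}


# Citation-style check: 4 papers have at least 4 citations each, so h = 4.
print(h_operator([10, 8, 5, 4, 3]))  # 4

# Triangle 0-1-2 with pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(h_index(adj))  # {0: 2, 1: 2, 2: 2, 3: 1}
```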

1.6 VoteRank

Zhang et al. introduced the VoteRank method to identify a set of dispersed spreaders in a network. The idea comes from real life: if A has already supported B, A's support for others is weakened. VoteRank selects nodes one by one according to the votes they receive from their neighbors. Once a node is selected as the most influential node, the voting ability of its neighbors is weakened, and the selected node no longer takes part in subsequent rounds of voting. VoteRank assigns each node v a tuple (s_v, va_v), where s_v is the voting score that node v obtains from its neighbors and va_v is the voting ability of node v. VoteRank consists of four steps:

Step 1: Initialization. All tuples are initialized to (0, 1), i.e., the voting score of each node is 0 and its voting ability is 1.

Step 2: Voting. Each node obtains a voting score equal to the sum of the voting abilities of its neighbors. The unselected node with the highest voting score is selected as the most influential node. To avoid selecting it again, its voting score is set to 0. To ensure that it does not take part in the next vote, its voting ability is also set to 0.

Step 3: Updating. To obtain a more dispersed seed set, after seed node v is selected in step 2, the voting abilities of v's neighbors are discounted in the next round. For each neighbor u of v, va_u = va_u - f if va_u - f > 0, and va_u = 0 otherwise, where

$$f = \frac{1}{\langle k \rangle}$$

⟨k⟩ is the average degree of the nodes in the network, and va_u is the voting ability of node u.

Step 4: Repeating. Steps 2 and 3 are repeated until the number of selected nodes meets the required number.
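The four VoteRank steps can be sketched in Python as follows. The adjacency-dict input and the function name `voterank` are assumptions. Two behaviors are also assumptions:

- exact ties are broken by dict iteration order;
- the loop stops early once every remaining score is 0, a case the text does not address.

```python
def voterank(adj, k):
    """Select up to k spreaders using the four VoteRank steps.

    adj: undirected graph as {node: set(neighbors)}.
    """
    total_degree = sum(len(nbrs) for nbrs in adj.values())
    if total_degree == 0:
        return []  # no edges, so no node can receive votes
    f = len(adj) / total_degree      # f = 1 / <k>, where <k> = total_degree / n
    ability = {v: 1.0 for v in adj}  # step 1: every voting ability starts at 1
    selected, chosen = [], set()
    while len(selected) < k:
        # Step 2: score = sum of the neighbors' current voting abilities.
        scores = {v: sum(ability[u] for u in adj[v]) for v in adj if v not in chosen}
        if not scores:
            break
        best = max(scores, key=scores.get)
        if scores[best] <= 0:
            break  # assumption: stop once no votes remain
        selected.append(best)
        chosen.add(best)
        ability[best] = 0.0  # the selected node no longer votes
        # Step 3: discount the neighbors' voting ability, flooring at 0.
        for u in adj[best]:
            ability[u] = max(ability[u] - f, 0.0)
    return selected  # step 4 is the while loop itself


adj = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1, 5}, 5: {4, 6}, 6: {5}}
print(voterank(adj, 2))  # [1, 5]
```

In the example, selecting node 1 discounts node 4's ability. As a result, node 5 (score 1.4) wins the second round over node 4 (score 1.0), which spreads the seeds apart.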

VoteRank is more accurate than degree centrality, the H-index, the k-shell and similar measures. However, it does not consider where nodes are located in the network. In practice, a node at the center of the network can have a large influence even if its voting score is low.

2. Proposed method

VoteRank centrality selects a set of influential propagators through a voting mechanism: each node can receive votes from its neighbors, and each node v ∈ V is assigned a tuple (S_v, Va_v). Here S_v is the voting score that node v obtains from its neighbor nodes, and Va_v is the voting ability of node v towards its neighbor nodes. S_v can be expressed as

$$S_v = \sum_{i \in N(v)} Va_i$$

where N(v) is the set of direct neighbors of node v.

The H-index of a node is the largest h such that the node has at least h neighbors, each with degree no less than h. For a network G(V, E), let the degree of node i be k_i. The degrees of its neighbor nodes are

$$k_{j_1}, k_{j_2}, \ldots, k_{j_{k_i}}$$

where k_{j_1} is the degree of the 1st neighbor j_1 of node i, k_{j_2} is the degree of the 2nd neighbor j_2 of node i, and k_{j_{k_i}} is the degree of the k_i-th neighbor of node i.

The H-index of node i is defined as

$$h_i = H\left(k_{j_1}, k_{j_2}, \ldots, k_{j_{k_i}}\right)$$

where H(·) is the H-index operator, which returns the largest integer h such that at least h of its arguments are no less than h.

Inspired by the neighborhood coreness centrality NC, an extended neighborhood H-index is given that takes the second-order neighborhood into account. The neighborhood H-index of node i is defined as

$$nh_i = \sum_{j \in N(i)} h_j$$

where N(i) is the set of direct neighbors of node i and h_j is the H-index of node j.
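The neighborhood H-index nh_i = Σ_{j∈N(i)} h_j adds one more aggregation step on top of the node H-index. The self-contained sketch below recomputes the H-index internally. The summation form follows the variable definitions given for nh_i. The function names and the example graph are hypothetical.

```python
def h_operator(values):
    """Largest h such that at least h of the values are >= h."""
    h = 0
    for rank, x in enumerate(sorted(values, reverse=True), start=1):
        if x < rank:
            break
        h = rank
    return h


def neighborhood_h_index(adj):
    """nh_i = sum of the H-indices h_j over the direct neighbors j of node i."""
    degree = {v: len(adj[v]) for v in adj}
    h = {v: h_operator([degree[u] for u in adj[v]]) for v in adj}
    return {v: sum(h[j] for j in adj[v]) for v in adj}


# Triangle 0-1-2 with pendant node 3; node H-indices are {0: 2, 1: 2, 2: 2, 3: 1}.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(neighborhood_h_index(adj))  # {0: 5, 1: 4, 2: 4, 3: 2}
```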

The KNC method is based on two judgment scores: the first judgment score is the main criterion and the second is the secondary criterion. In each round, the uncovered node with the highest first judgment score and, among those, the highest second judgment score is selected as a seed node. After each selection, the direct neighbors of the selected node are covered, so that seeds are chosen in a dispersed manner and their spreading influence is maximized. The invention mainly describes the KVoteRank method, which is a sub-method of KNC.

VoteRank centrality considers only local information of the nodes, does not distinguish among neighbor nodes and ignores node position. Inspired by the Core Coverage Algorithm (CCA), the proposed KVoteRank algorithm regards nodes that are located in the network core and are the most popular (i.e., have the highest VoteRank score) as having the greatest spreading influence. Nodes in the network core can have a large influence even when their VoteRank score is small. The KVoteRank algorithm is executed in 4 stages:

Stage 1: Calculate the k-shell value of each node by k-shell decomposition and sort the nodes in descending order of k-shell value.

Stage 2: Within each k-shell value, calculate the voting score of each node with the VoteRank algorithm and sort the nodes in descending order of voting score.

Stage 3: Among the uncovered nodes in the highest k-shell, select the node with the highest VoteRank score, and cover the node and its neighbor nodes. If several nodes in the same shell have the same VoteRank score, select one of them at random.

Stage 4: Repeat stage 3 until the number of selected nodes meets the required number.

The algorithm addresses the inability of k-shell decomposition to distinguish the influence of nodes in the same shell: nodes in the same shell are ranked by their voting scores. To avoid overlapping influence ranges, seed nodes are selected in a dispersed manner in stage 3, i.e., after each seed node is selected, the node and its direct neighbors are covered. The KVoteRank algorithm is described in Algorithm 1.

TABLE 1 KVoteRank Algorithm

The lines of the KVoteRank algorithm map to the stages as follows:

- lines 2-4 correspond to stage 1 and compute the k-shell value of each node;
- lines 5-8 correspond to stage 2 and compute the VoteRank score of each node;
- lines 9-17 correspond to stages 3 and 4 and select dispersed seed nodes with the highest shell and the highest voting score.

When a seed node is selected, its k-shell value is the main criterion and its voting score is the secondary criterion. The neighbors of each selected node are covered, so that the most influential nodes are chosen in a dispersed manner and their spreading ranges do not overlap.

Like VoteRank, the H-index uses information from neighbor nodes, i.e., it is a local-information centrality, and a larger H-index indicates greater social influence. Combining the H-index with the k-shell method, nodes that are located in the network core and have the highest H-index are regarded as the most influential. On this basis, an H-index method based on k-shell decomposition (KHIndex) is provided, whose first judgment score is the k-shell value and whose second judgment score is the H-index.

To take further local information into account, the semi-local information of a node is expressed by the neighborhood H-index, which is equivalent to considering the 4th-order neighborhood of the node. An extended algorithm based on k-shell decomposition, KNHIndex, is provided, whose first judgment score is the k-shell value and whose second judgment score is the neighborhood H-index. For the KHIndex (KNHIndex) algorithm, stage 2 calculates the H-index (neighborhood H-index) of each node. Stage 3 then selects, in each round, the node with the largest k-shell value and the highest H-index (neighborhood H-index).

Lü et al. described the relationship among degree, H-index and core number: degree is the initial state, the H-index is an intermediate state and the core number is the steady state. The KNC method therefore uses the H-index as an approximate substitute for the k-shell. That is, it uses the H-index as the first judgment score and the VoteRank score as the second judgment score to select highly influential nodes. This H-index-based VoteRank method is named HVoteRank.
Since the H-index is also a coarse-grained measure, HVoteRank uses VoteRank as the second judgment score to distinguish nodes with the same H-index. In each round it selects the node with the largest H-index and the highest voting score. After each selection, it covers the direct neighbors so that nodes are chosen in a dispersed manner; the procedure is otherwise the same as KVoteRank. Table 2 lists the judgment scores of the four sub-methods of KNC: KVoteRank, KHIndex, KNHIndex and HVoteRank.

TABLE 2 KNC method judgment scores

Judgment score	KVoteRank	KHIndex	KNHIndex	HVoteRank
First (main) score	k-shell	k-shell	k-shell	H-index
Second score	VoteRank	H-index	NH-index	VoteRank

To illustrate the proposed algorithm, Fig. 1 shows how KVoteRank selects the 3 most influential seed nodes in a small example network. KHIndex, KNHIndex and HVoteRank proceed in the same way and are not shown.

As shown in Fig. 1, the k-shell value of each node in the network is calculated first; different colors indicate different shells. The node set {1,2,3,4} is in the 3-shell, {5,6,7,8} is in the 2-shell, and the remaining nodes are in the 1-shell. The voting score of each node is then calculated, and the k-shell value and VoteRank score of each node are marked in the figure. The k-shell value, VoteRank score, H-index and neighborhood H-index of all nodes are listed in Table 3.
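For background, the standard VoteRank procedure, which supplies the voting scores used by KVoteRank and HVoteRank, can be sketched as follows: every node starts with voting ability 1, a candidate's score is the sum of its neighbors' voting abilities, the top-scoring node is selected and stops voting, and each of its neighbors loses 1/<k> of voting ability. This section does not state which VoteRank variant or score normalization the KNC method uses, so the scores produced here need not match Table 3; the function name and the smallest-id tie-break are hypothetical.

```python
def voterank(adj, k):
    """Standard VoteRank on an undirected adjacency dict; returns up to k nodes."""
    avg_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
    weaken = 1.0 / avg_degree            # f = 1/<k>, assumes the graph has edges
    ability = {v: 1.0 for v in adj}      # initial voting ability of every node
    selected = []
    for _ in range(k):
        # Score of a candidate = total voting ability of its neighbors.
        scores = {v: sum(ability[u] for u in adj[v])
                  for v in sorted(adj) if v not in selected}
        if not scores:
            break
        # max() keeps the first maximum in sorted order, i.e. the smallest id on ties.
        best = max(scores, key=scores.get)
        if scores[best] <= 0:
            break                        # no voting ability left anywhere
        selected.append(best)
        ability[best] = 0.0              # a selected node no longer votes
        for u in adj[best]:
            ability[u] = max(0.0, ability[u] - weaken)
    return selected


adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3, 5}, 5: {4}}
print(voterank(adj, 3))  # [0, 4]
```

In the example, selecting node 0 weakens nodes 1 to 3, so node 4 wins the second round; after that, no remaining node receives any votes and the loop stops early.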

TABLE 3 VoteRank scores and H-index for nodes in the network

As shown in Fig. 1, KVoteRank first selects node 1, which lies in the 3-shell and has the highest voting score (8), and covers its direct neighbors {2,3,4,5,6,7}; this ends the first round. It then selects node 8, which lies in the 2-shell with a voting score of 3.16, and covers its uncovered direct neighbors {25,26}; this ends the second round. Finally, it selects node 23, which lies in the 1-shell with a voting score of 7.58, and covers its neighbors {16,17,18,19,20,21,22}. This completes the selection of the 3 seed nodes (the seed nodes are the selected most influential propagators), and the selected seed set is {1,8,23}. For KHIndex, according to the scores in Table 3, node 4 is selected first and its neighbors {1,2,3,5,8} are covered. Next, nodes 6 and 7 tie, so one of them is chosen at random; assuming node 6 is chosen, its neighbor 7 is covered. Finally, nodes {15,17,23,25} tie; assuming node 15 is chosen, its neighbors {13,14,17} are covered. KHIndex therefore selects {4,6,15} as the 3 most influential nodes. KNHIndex and HVoteRank follow the same process as KVoteRank and differ only in the first and second judgment scores.
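The selection loop walked through above (take the uncovered node with the largest first score, compare second scores only among first-score ties, pick at random among exact ties, then cover the chosen node and its direct neighbors) can be sketched as below. Scores are passed in as dicts, so KHIndex corresponds to first = k-shell and second = H-index, and HVoteRank to first = H-index and second = VoteRank score. The function name, the fixed random seed and the early stop once every node is covered are assumptions for illustration, not details stated in the text.

```python
import random


def knc_select(adj, first, second, k, rng=None):
    """Pick up to k seeds by the (first, second) score pair with neighbor covering."""
    rng = rng if rng is not None else random.Random(0)  # fixed seed: assumption
    covered = set()
    seeds = []
    while len(seeds) < k:
        candidates = [v for v in adj if v not in covered]
        if not candidates:
            break  # assumption: stop early once every node is covered
        # Lexicographic comparison: the second score only breaks first-score ties.
        best = max((first[v], second[v]) for v in candidates)
        ties = sorted(v for v in candidates if (first[v], second[v]) == best)
        chosen = rng.choice(ties)          # random choice among exact ties
        seeds.append(chosen)
        covered.add(chosen)
        covered.update(adj[chosen])        # cover first-order neighbors
    return seeds


# Hypothetical example: all nodes share one shell, degree stands in for the second score.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3, 5}, 5: {4}}
first = {v: 1 for v in adj}
second = {v: len(adj[v]) for v in adj}
print(knc_select(adj, first, second, 2))  # [0, 4]
```

Covering node 0 removes nodes 1 to 3 from consideration, so the second seed comes from the far end of the graph. This reflects how covering disperses the seeds in the Fig. 1 walkthrough.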

Here, a simple propagation test is run with the 15 nodes selected by k-shell and by KVoteRank in the CEnew network (453 nodes). The result is shown in Fig. 2, where blue nodes are susceptible, red nodes are infected and green nodes are immune (recovered).

Comparing the k-shell decomposition method with the proposed KVoteRank algorithm under the SIR propagation model in Fig. 2, the initial nodes selected by k-shell are somewhat clustered together, whereas those selected by KVoteRank are spread out; at time step 10, KVoteRank has infected 11 more nodes than the k-shell method.

3. Data set

To demonstrate the performance of the proposed KNC method, the experiments use 8 real network datasets of different types and sizes. Most of them come from SNAP (Stanford Network Analysis Project), a network dataset collection maintained at Stanford University. 1) Jazz: a collaboration network of jazz bands that performed between 1912 and 1940; 2) CEnew: an edge list of the metabolic network of Caenorhabditis elegans; 3) Crimes: a crime network in which nodes are people and an edge joins two criminals who took part in the same crime; 4) Email: the email exchange network among users of Rovira i Virgili University; 5) Hamster: friendship and family links between users of the website www.hamsterster.com; 6) Ca-GrQc: a collaboration network from the arXiv e-print service, covering scientific collaborations between authors of papers submitted to the General Relativity and Quantum Cosmology category; 7) Condmat: a collaboration network based on the condensed matter section of arXiv, covering papers submitted from 1995 to 1999; 8) Enron: an email communication network containing information on more than one million emails. The detailed topology of the 8 networks is given in Table 4.

TABLE 4 network topology

Network	n	m	<k>	k_max	<d>	<c>	β_min
Jazz	198	2742	27.697	100	2.235	0.6175	0.0266
CEnew	453	2025	8.94	237	2.664	0.646	0.0256
Crimes	829	1473	3.554	25	5.04	0.008	0.1960
Email	1133	5451	9.622	71	3.606	0.2540	0.0565
Hamster	2426	16631	13.711	273	3.67	0.538	0.0241
Ca-GrQc	4158	13422	6.456	81	6.049	0.665	0.0589
Condmat	23133	93497	8.083	281	5.352	0.633	0.0475
Enron	33696	180811	10.732	1383	4.025	0.708	0.0071

In Table 4, n is the total number of nodes and m the number of edges in the network; <k> is the average degree and k_max the maximum degree of the nodes; <d> is the average shortest path length of the network; <c> is the average clustering coefficient,

$$\langle c\rangle = \frac{1}{n}\sum_{i} \frac{2 I_i}{|N(i)|\,\big(|N(i)|-1\big)},$$

where I_i is the number of edges between the direct neighbors of node i; and β_min is the propagation threshold, calculated as

$$\beta_{\min} = \frac{\langle k\rangle}{\langle k^2\rangle - \langle k\rangle}.$$

Here N(i) is the set of direct neighbors of node i, |N(i)| is the cardinality of N(i), k is the degree of a node in the network, and <·> denotes averaging.
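The closed-form columns of Table 4 can be computed as sketched below: the average degree <k> (equal to 2m/n), the average clustering coefficient <c> built from I_i, and the propagation threshold β_min = <k>/(<k²> - <k>). The adjacency-dict input, the convention c_i = 0 for nodes with fewer than two neighbors, and the function name are assumptions.

```python
def network_stats(adj):
    """n, m, <k>, <c> and beta_min for an undirected adjacency dict."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    k1 = sum(degrees) / n                      # <k>
    k2 = sum(d * d for d in degrees) / n       # <k^2>
    total_c = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue                           # convention (assumption): c_i = 0
        # I_i: edges among the direct neighbors of v, each pair counted once.
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
        total_c += 2 * links / (d * (d - 1))
    return {
        "n": n,
        "m": sum(degrees) // 2,
        "avg_degree": k1,
        "avg_clustering": total_c / n,
        "beta_min": k1 / (k2 - k1),            # assumes <k^2> > <k>
    }


# Triangle 0-1-2 plus a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(network_stats(adj))
```

As a sanity check on the table itself, 2 × 2742 / 198 ≈ 27.697 reproduces the <k> reported for Jazz.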

4. Performance index

4.1 SIR epidemic model

The Susceptible-Infected-Recovered (SIR) epidemic model is used to evaluate the performance of the proposed method. In the SIR model, each node is in one of three states: susceptible (S), infected (I) or recovered (R). A susceptible node can catch the disease or information, an infected node has been infected by the disease or activated by the information, and a recovered node has recovered and no longer transmits information or disease. Initially, the selected seed nodes are set to the infected state and all other nodes to the susceptible state. At each time step, every infected node infects each susceptible node in its direct neighborhood with probability β; meanwhile, each infected node recovers with probability γ (the recovery probability), and a recovered node cannot be infected again. The corresponding differential equations are

$$\frac{dS}{dt} = -\beta S I,\qquad \frac{dI}{dt} = \beta S I - \gamma I,\qquad \frac{dR}{dt} = \gamma I,$$

where S, I and R are the fractions of susceptible, infected and recovered nodes.

The infection probability β must be neither too small nor too large. If β is too small, the epidemic cannot spread through the whole network and may not spread at all; if β is too large, it infects almost the entire network, the influence of different nodes cannot be distinguished, and the comparison becomes meaningless. β is therefore chosen slightly above the propagation threshold β_min, which is given for each network in column 8 of Table 4. In this experiment, the infection rate is defined as

$$\lambda = \frac{\beta}{\gamma}.$$

Because the model is stochastic, the experimental results are averaged over multiple simulations.

Here the infection probability β is the probability that an infected node infects a susceptible neighbor in one time step, and the infection rate λ is the ratio of the infection probability to the recovery probability. It can be read as the spreading ability of the SIR process: how readily infection spreads relative to how quickly infected nodes recover.

The performance of an algorithm can be measured by the spreading ability of the nodes it selects, expressed by the infection scale F(t) at time t and the final infection scale F(t_c). The infection scale reflects the influence of the selected nodes at time t and is defined as

$$F(t) = \frac{n_I(t) + n_R(t)}{n},$$

where n_I(t) and n_R(t) are the numbers of infected and recovered nodes at time t and n is the total number of nodes in the network. A larger F(t) means that more nodes have been infected by time t, i.e., greater influence and better algorithm performance, and reaching a given scale at a smaller t means faster propagation.

During the spreading process, the number of nodes that have moved from the infected to the recovered state grows at each time step until it stops changing, i.e., the process reaches a steady state at time t_c. The final infection scale F(t_c), the proportion of recovered nodes, reflects the ultimate influence of the initially selected seed nodes and is defined as

$$F(t_c) = \frac{n_R(t_c)}{n}.$$

Thus F(t) evaluates the spreading influence of the seed nodes at time t, and F(t_c) evaluates their spreading influence once the SIR process has reached the steady state.
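A minimal discrete-time SIR simulation matching the description above can be sketched as follows. Each infected node infects each susceptible neighbor with probability β and recovers with probability γ, all nodes are updated synchronously, and the function returns the curve F(t) together with the final scale F(t_c); a second helper averages F(t_c) over repeated runs. The function names, the step cap and the synchronous update order are assumptions, not details taken from the patent.

```python
import random


def sir_run(adj, seeds, beta, gamma, rng, max_steps=10_000):
    """One synchronous SIR run; returns (curve, final) with curve[t] = F(t)."""
    n = len(adj)
    susceptible = set(adj) - set(seeds)
    infected = set(seeds)
    n_recovered = 0
    curve = [(len(infected) + n_recovered) / n]          # F(0)
    for _ in range(max_steps):
        if not infected:
            break                                        # steady state reached
        newly_infected = set()
        for v in infected:
            for u in adj[v]:
                if u in susceptible and rng.random() < beta:
                    newly_infected.add(u)
        # Only nodes infected before this step may recover in this step.
        newly_recovered = {v for v in infected if rng.random() < gamma}
        susceptible -= newly_infected
        infected = (infected - newly_recovered) | newly_infected
        n_recovered += len(newly_recovered)
        curve.append((len(infected) + n_recovered) / n)  # F(t) = (n_I + n_R) / n
    # F(t_c) = n_R / n; underestimates if max_steps is hit before steady state.
    return curve, n_recovered / n


def average_final_scale(adj, seeds, beta, gamma, runs=300, seed=0):
    """Average F(t_c) over independent runs to smooth out randomness."""
    rng = random.Random(seed)
    return sum(sir_run(adj, seeds, beta, gamma, rng)[1] for _ in range(runs)) / runs


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
curve, final = sir_run(path, [0], beta=1.0, gamma=1.0, rng=random.Random(1))
print(curve, final)  # [0.25, 0.5, 0.75, 1.0, 1.0] 1.0
```

With β = γ = 1 the run is deterministic: the infection advances one hop per step along the path while the previous node recovers, so every node is eventually recovered.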

4.2 Average shortest path length L_s

If the selected seed nodes are clustered together, as tends to happen with k-shell and similar centrality rankings, their propagation ranges overlap. It is therefore preferable to select dispersed seed nodes, which enlarges the overall propagation range. The dispersion of the selected nodes is measured by the average shortest path length L_s between them, which is used to compare the algorithms. For a selected seed set S, L_s is defined as

$$L_s = \frac{1}{|S|\,\big(|S|-1\big)}\sum_{u,v\in S,\; u\neq v} l_{uv},$$

where l_uv is the shortest path length between nodes u and v.
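L_s can be computed with one breadth-first search per seed, summing the distances over ordered pairs and dividing by |S|(|S| - 1). The sketch below assumes an unweighted, undirected adjacency dict and a seed set lying in one connected component; the function names and the choice to return 0 for fewer than two seeds are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def average_seed_path_length(adj, seeds):
    """L_s: sum of l_uv over ordered pairs u != v in S, divided by |S|(|S|-1)."""
    seeds = list(seeds)
    s = len(seeds)
    if s < 2:
        return 0.0  # assumption: L_s is undefined here, so 0 is returned
    total = 0
    for u in seeds:
        dist = bfs_distances(adj, u)
        for v in seeds:
            if v != u:
                if v not in dist:
                    raise ValueError("seed nodes lie in different components")
                total += dist[v]
    return total / (s * (s - 1))


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(average_seed_path_length(path, [0, 3]))     # 3.0
print(average_seed_path_length(path, [0, 3, 1]))  # 2.0
```

Adding node 1 next to seed 0 pulls L_s down from 3.0 to 2.0. This is the clustering effect that the metric is meant to penalize.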

5. Results and analysis of the experiments

To test the effectiveness of the four KNC sub-algorithms KVoteRank, KHIndex, KNHIndex and HVoteRank, simulations are run on the 8 real networks (Jazz and others) using the SIR model and the average path length, following the performance indices given in Section 4. The four algorithms are compared with 7 baseline centrality algorithms, including Degree, k-shell, NC+, PageRank, H-index and VoteRank, and with each other.

5.1 SIR model simulation analysis

The algorithms are compared by their final infection scale under different initial seed-node proportions. Because the networks differ in size, different proportions are used, with larger initial proportions for smaller networks: for Jazz, CEnew, Crimes, Email, Hamster and Ca-GrQc the initial seed proportion goes up to 0.03, and for the larger networks Condmat and Enron it goes up to 0.003. The infection probability is set to β = 1.5β_min. The final infection scale F(t_c) of each algorithm at different initial seed proportions is shown in Fig. 3: the x-axis is the initial seed proportion p and the y-axis the final infection scale at that proportion, and the inset of Fig. 3(a) separately shows the infection scales of the four proposed methods KVoteRank, KHIndex, KNHIndex and HVoteRank. The results are averaged over 300 experiments.

Fig. 3 shows that the four proposed methods KVoteRank, KHIndex, KNHIndex and HVoteRank all achieve satisfactory results on the 8 networks, demonstrating the superiority of the proposed method. When p is small, the four KNC methods perform on par with the baseline methods, but as p increases they gradually overtake the baselines in final infection scale. Overall they perform best on all 8 networks, especially on CEnew, Hamster and the large Enron network, although KNHIndex is slightly weaker on the small Jazz network and HVoteRank is slightly weaker on CEnew. For example, on CEnew the KHIndex method infects more than 12% of the nodes with an initial proportion of only 0.02, whereas the baseline methods do not reach this scale even at 0.03. On Hamster, KVoteRank infects about 3.5% more nodes than the baseline methods. Across all networks, the four proposed algorithms perform comparably with one another and better than the 7 baseline centralities, which demonstrates the effectiveness of the proposed method.

To compare the propagation scale and speed of the seed nodes selected by different algorithms, a time-step experiment is carried out with a fixed initial seed proportion to keep the experiments consistent: 0.03 for the smaller networks Jazz, CEnew, Crimes, Email, Hamster and Ca-GrQc, and 0.003 for the larger networks Condmat and Enron. The results are averaged over 1000 experiments. The evolution of F(t) over time is shown in Fig. 4, where the x-axis is the time step and the y-axis the infection scale.

Fig. 4 shows that, compared with the seven baseline algorithms (Degree, k-shell and others), the four proposed algorithms KVoteRank, KHIndex, KNHIndex and HVoteRank always reach the highest peak, i.e., the largest infection scale, and reach the steady state earliest, i.e., spread fastest. HVoteRank performs best on Jazz, CEnew, Hamster, Ca-GrQc, Condmat and Enron, especially on CEnew, Hamster and Enron. On CEnew, HVoteRank is about 3.5% higher than the worst method (k-shell) and 2.5% higher than the best baseline (H-index); on Hamster it is about 3.8% higher than the best baseline (VoteRank); and on the large networks it is about 0.625% higher than the baselines, with KHIndex and KNHIndex performing on par with HVoteRank and KVoteRank slightly worse. On Crimes, KVoteRank performs best and HVoteRank is the weakest of the four proposed methods, but it is still clearly better than the baselines. On Email, the four KNC methods perform almost identically, about 2.2% higher than the baselines. On all networks, k-shell and NC+ perform worst; this can be explained by overlapping propagation ranges, since the seed nodes selected by k-shell are clustered together. The four KNC methods always reach the steady state in the shortest time; they differ slightly from one another but are clearly superior to the 7 baseline methods.
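As a worked example of these settings, the maximum number of seeds and the infection probability β = 1.5β_min follow directly from n and β_min in Table 4. The text does not say how p·n is rounded to a whole number of seeds, so rounding to the nearest integer is an assumption here, as are the names used.

```python
# (n, beta_min) for each network, copied from Table 4.
TABLE4 = {
    "Jazz": (198, 0.0266), "CEnew": (453, 0.0256), "Crimes": (829, 0.1960),
    "Email": (1133, 0.0565), "Hamster": (2426, 0.0241), "Ca-GrQc": (4158, 0.0589),
    "Condmat": (23133, 0.0475), "Enron": (33696, 0.0071),
}
LARGE = {"Condmat", "Enron"}  # networks with the smaller maximum proportion


def experiment_settings(beta_factor=1.5):
    """Map each network to (maximum seed count, infection probability beta)."""
    settings = {}
    for name, (n, beta_min) in TABLE4.items():
        p_max = 0.003 if name in LARGE else 0.03
        seeds = round(p_max * n)  # rounding rule is an assumption
        settings[name] = (seeds, beta_factor * beta_min)
    return settings


for name, (seeds, beta) in experiment_settings().items():
    print(f"{name}: up to {seeds} seeds, beta = {beta:.5f}")
```

The calculation shows why the two network groups use different proportions: 0.003 still gives Enron about 101 seeds, comparable to the 125 seeds that 0.03 gives Ca-GrQc.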

Besides the initial seed proportion, the infection rate also affects the spreading process, because it represents different spreading abilities. The infection probability is varied from 1.0 to 2.0 to observe the spreading influence of the nodes; the infection scales of the different algorithms under different infection probabilities are shown in Fig. 5, averaged over 300 experiments. The x-axis is the infection rate and the y-axis the infection scale at that rate, and the inset of Fig. 5(a) separately shows the infection scales of the four proposed methods KVoteRank, KHIndex, KNHIndex and HVoteRank on the Jazz network at the same infection probabilities.

Fig. 5 shows that KVoteRank, KHIndex, KNHIndex and HVoteRank perform well on all 8 networks and outperform the other 7 baseline methods, most clearly on CEnew and Hamster: at an infection rate of 2.0, HVoteRank infects about 2.9% more nodes than the baselines on CEnew, and KVoteRank is about 3.5% higher than the baselines on Hamster. All proposed methods perform slightly less well on Jazz and Condmat. On Jazz and Email, the proposed methods are comparable to the other methods when the infection rate is small, and the infection scale grows as the infection rate increases, because at a small infection rate the information may fail to spread. These experiments show that the KNC sub-methods KVoteRank, KHIndex, KNHIndex and HVoteRank generalize better than the baseline methods.

5.2 Average path length analysis

The k-shell method is good at identifying a single most influential node, but it performs poorly when a group of the most influential nodes is needed: its high-shell nodes are clustered together, so their propagation ranges overlap and the total influence is not maximized. In general, the more dispersed the selected initial seed set, the larger the propagation influence. The average path length of a node set is therefore used to measure the distance between the initially infected seed nodes. Fig. 6 shows the average shortest path lengths between the seed sets selected by different algorithms on Jazz and Crimes; the results for the other networks are listed in Table 5.

Fig. 6 shows that on Jazz the seed nodes selected by the proposed KNC methods have a large average shortest path length, exceeded only by that of CCA, and on Crimes KVoteRank and KNHIndex give the largest average shortest path lengths. This indicates that the more dispersed the selected seed nodes are, the more likely the propagation influence is to be maximized. Table 5 gives the average shortest path length of the initial node sets selected by the different algorithms on each network. To keep the experiments consistent, the initial node proportion is set to 0.3 for the small network Jazz, 0.03 for the smaller networks CEnew, Crimes, Email, Hamster and Ca-GrQc, and 0.003 for the larger networks Condmat and Enron; the average shortest path lengths of the sets selected by KHIndex, KNHIndex, HVoteRank and KVoteRank are shown in columns 7-10 of Table 5.

TABLE 5 average Path Length for initial node set

Table 5 shows that, except on Ca-GrQc, the average shortest path lengths of the seed sets selected by the KNC sub-methods KVoteRank, KHIndex, KNHIndex and HVoteRank are significantly larger than those of the baseline methods; the results also indicate that the proposed methods do not simply pick scattered nodes but pick scattered nodes according to their spreading ability. Generally, the more dispersed the initial seed nodes, the more easily information spreads to the whole network, so the average shortest path length is commonly used as an evaluation index; it is not, however, an absolute measure of algorithm quality. To compare execution efficiency, the run times of the different algorithms were measured on the 8 networks, as given in Table 6.

TABLE 6 Run time (s) (KVR = KVoteRank, HVR = HVoteRank, KHI = KHIndex, KNHI = KNHIndex)

Methods	Jazz	CEnew	Email	Crimes	Hamster	Ca-GrQc	Condmat	Enron
k-shell	0.0020	0.0040	0.0080	0.0050	0.0230	0.0240	0.2414	0.5446
VoteRank	0.0020	0.0030	0.0090	0.0050	0.0538	0.0668	0.2683	0.8327
NC+	0.0030	0.0040	0.0090	0.0040	0.0269	0.0249	0.2683	0.6323
H-index	0.0090	0.0070	0.0180	0.0060	0.0568	0.0499	0.3561	0.6462
NCCDH	0.0807	0.1436	0.2692	0.1237	0.8097	0.7071	1.9051	6.1238
CCA	0.1157	0.2653	1.6977	1.0874	9.8017	71.5783	231.2250	357.7111
PageRank	0.1277	0.0798	0.2134	0.1117	0.6114	0.5335	2.5161	6.4447
KVR	0.0808	0.1307	0.4016	0.1506	1.2138	1.7222	28.5937	66.4826
HVR	0.1067	0.1630	0.4179	0.1824	1.2643	1.8282	29.6787	67.0074
KHI	0.0817	0.1077	0.3182	0.1466	0.8219	0.8166	1.3677	4.3059
KNHI	0.0778	0.1137	0.3321	0.1176	0.8134	0.8152	1.3933	4.7010

Together with the previous comparisons, this shows that the proposed methods achieve the best results in reasonable time, and that among the four methods KHIndex and KNHIndex run faster than KVoteRank and HVoteRank. The KNC methods run in under 70 seconds on all networks, and KHIndex and KNHIndex take under 5 seconds.
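The two run-time bounds stated above can be checked directly against Table 6. The snippet below copies the four KNC rows and computes their maxima; the variable names are illustrative.

```python
# Run times (s) of the four KNC sub-methods, copied from Table 6
# (columns: Jazz, CEnew, Email, Crimes, Hamster, Ca-GrQc, Condmat, Enron).
RUNTIMES = {
    "KVR":  [0.0808, 0.1307, 0.4016, 0.1506, 1.2138, 1.7222, 28.5937, 66.4826],
    "HVR":  [0.1067, 0.1630, 0.4179, 0.1824, 1.2643, 1.8282, 29.6787, 67.0074],
    "KHI":  [0.0817, 0.1077, 0.3182, 0.1466, 0.8219, 0.8166, 1.3677, 4.3059],
    "KNHI": [0.0778, 0.1137, 0.3321, 0.1176, 0.8134, 0.8152, 1.3933, 4.7010],
}

slowest_knc = max(max(times) for times in RUNTIMES.values())
slowest_khi_knhi = max(max(RUNTIMES["KHI"]), max(RUNTIMES["KNHI"]))
print(slowest_knc, slowest_khi_knhi)  # 67.0074 4.701
```

The slowest case is HVoteRank on Enron (about 67 s); KNHIndex on Enron (about 4.7 s) bounds the two H-index-based variants.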

6. Conclusion

The invention provides KNC, a k-shell-based hybrid neighborhood-coverage method that identifies a group of the most influential nodes in a complex network while considering both global and local network information. KNC relies on two judgment scores: in each round, the uncovered node with the highest first judgment score (the main score) and, among those, the highest second judgment score (the secondary score) is selected as a seed node, and after each round the first-order neighbors of the selected node are covered to avoid overlapping propagation ranges. First, k-shell decomposition assigns the same spreading ability to all nodes in a shell and is therefore a coarse-grained ranking, while VoteRank gives every node the same voting ability and is a centrality based on local information. To address these issues, the k-shell-based KVoteRank algorithm is proposed; it regards the most popular node at the core of the network as the most influential, so a node located at the center of the network remains important even if its VoteRank value is small. Second, since the H-index is also a coarse-grained local-information measure and a larger H-index means greater social influence, the H-index is combined with k-shell to give the KHIndex method, and considering the second-order neighborhood H-index of a node gives the extended KNHIndex method. Third, because the H-index is an intermediate state between degree and coreness, the H-index approximately replaces the k-shell value and is combined with VoteRank to select a group of the most influential propagators, giving the HVoteRank method.
All four KNC sub-methods are based on the idea of coverage to maximize the propagation influence. The SIR model and the average shortest path length are used to evaluate KVoteRank, KHIndex, KNHIndex and HVoteRank against existing standard methods such as Degree and k-shell. Results on 8 real network datasets, including Jazz, show that the infection scale and propagation speed of the proposed methods under different initial node proportions and infection rates are clearly better than those of the existing baseline methods, and that the initial node sets selected by KNC have a larger average shortest path length, meaning the selected seed nodes are more dispersed and produce a larger propagation influence. The proposed KNC method is thus reasonable and efficient.

While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims

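1. A method for identifying influential propagators in a complex network, comprising:

S1, arranging the first judgment scores in descending order;

S2, arranging the second judgment scores in descending order;

S3, selecting the node with the largest first judgment score and the largest second judgment score, and covering the node and its neighbor nodes; if several nodes share the same largest first judgment score and the same second judgment score, one of them is selected at random;

S4, judging whether the number of selected nodes reaches the set value; if so, executing the next step, otherwise returning to step S3;

S5, obtaining the selected node set after the selection is finished;

and S6, displaying the performance indices.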

2. The method for identifying influential propagators in a complex network according to claim 1, wherein, when the first judgment score is k-shell and the second judgment score is VoteRank, the method comprises the following steps:

S5, obtaining the selected node set after the selection is finished;

and/or, when the first judgment score is H-index and the second judgment score is VoteRank, the method comprises the following steps:

and S5, obtaining the selected node set after the selection is finished.

3. The method of claim 2, wherein the formula for distinguishing nodes by their k-shell value comprises:

wherein k_s represents a k-shell value of the network,

representing a set of nodes having the same k-shell value.

4. The method of claim 2, wherein the VoteRank comprises:

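5. The method of claim 2, wherein the H-index comprises:

the H-index of node i is defined as

$$h(i) = \mathcal{H}\big(k_{j_1}, k_{j_2}, \ldots, k_{j_{k_i}}\big),$$

where j_1, ..., j_{k_i} are the neighbors of node i, k_j is the degree of node j, and the operator H returns the largest integer h such that at least h of its arguments are not less than h.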

6. The method of claim 5, wherein the neighborhood H-index comprises:

the neighborhood H-index of node i is defined as:

7. The method for identifying influential propagators in a complex network according to claim 1, wherein the performance indices comprise SIR and/or the average shortest path length L_s, and the SIR comprises:

the infection rate, defined as

$$\lambda = \frac{\beta}{\gamma}.$$

8. The method of claim 7, wherein the SIR further comprises:

the infection scale F(t):

$$F(t) = \frac{n_I(t) + n_R(t)}{n};$$

and the final infection scale F(t_c):

$$F(t_c) = \frac{n_R(t_c)}{n}.$$