CN113723504B

CN113723504B - Method for identifying influential propagators in complex network

Info

Publication number: CN113723504B
Application number: CN202110999228.6A
Authority: CN
Inventors: 刘小洋; 叶舒; 张梦瑶
Original assignee: Chongqing University of Technology
Current assignee: Chongqing University of Technology
Priority date: 2021-08-28
Filing date: 2021-08-28
Publication date: 2023-05-16
Anticipated expiration: 2041-08-28
Also published as: CN113723504A

Abstract

The invention provides a method for identifying influential propagators in a complex network, comprising the following steps: S1, arranging the first judgment scores in descending order; S2, arranging the second judgment scores in descending order; S3, selecting the node with the highest first judgment score and the highest second judgment score, and covering that node and its neighbor nodes, where, if several nodes share the same highest first and second judgment scores, one of them is selected at random; S4, judging whether the number of selected nodes has reached the set value, executing the next step if so and returning to step S3 if not; S5, obtaining the selected node set once selection is finished; and S6, displaying the performance indices. The invention can identify a group of the most influential nodes in a complex network and display their performance indices.

Description

Method for identifying influential propagators in complex network

Technical Field

The invention relates to the technical field of network information mining, in particular to a method for identifying influential propagators in a complex network.

Background

Complex networks are ubiquitous in real life, for example social networks, biomolecular networks, citation networks and transportation networks. Often, however, only a small portion of the nodes has a large impact on such a network, and removing those nodes could cause the network to break down. In recent years, identifying a set of influential nodes has attracted considerable attention in the complex network science community, and it is of great significance for real network applications such as information dissemination, epidemic control, rumor control, viral marketing, arresting key suspects in criminal networks, and preventing catastrophic outages of the power grid.

A number of classical centrality methods based on network topology are used to identify key nodes, such as degree centrality, closeness centrality, betweenness centrality and eigenvector centrality. Degree centrality counts the direct neighbors of a node: the more neighbors a node has, the more important it is considered, but only the node's local information is used. Closeness centrality is the reciprocal of the sum of the shortest distances from a node to all other nodes; because it uses global information, its computational complexity is high. Betweenness centrality counts the shortest paths passing through a node, which is also computationally expensive. Eigenvector centrality holds that a node is more important when its neighbors are more important. Kitsak et al. consider nodes located at the core of the network to be more influential and proposed the k-shell decomposition method, which effectively identifies the single most influential node. However, k-shell decomposition assigns the same k-shell index to nodes with different propagation capacities: it considers only the residual degree and ignores connections to already deleted nodes, so it is a coarse-grained centrality method. Many centrality algorithms have since been proposed to improve on it. Extending the H-index, Lü et al. proposed H-index centrality, where the H-index of a node is the largest value h such that the node has at least h neighbors, each with degree not less than h. ClusterRank combines degree and clustering coefficients to determine node influence. In recent years, information entropy has also been used to evaluate the importance of nodes in a network. Nie et al. proposed mapping entropy (ME) to identify key nodes, taking into account the correlations among all of a node's neighbors. Sheikhahmadi et al. proposed MCDE, a hybrid method that ranks nodes by mixing core number, degree and entropy. Ji et al. proposed using percolation to identify dispersed spreaders in a network.

Kempe et al. proved that influence maximization is NP-hard and proposed a greedy hill-climbing algorithm, but its high complexity makes it unsuitable for large-scale networks. Consequently, many researchers have focused on heuristic algorithms in recent years. Chen et al. argued that when a neighbor of a node has been selected as a seed, the node's degree should be discounted, and proposed the degree discount heuristic. Many centrality algorithms select high-centrality nodes, most of which are clustered together, so their propagation ranges can overlap. Based on the idea of selecting nodes in a dispersed manner, Cao et al. proposed the Core Cover Algorithm (CCA), which in each round selects the node with the highest shell and highest degree and covers its neighbor nodes. Because CCA covers all neighbor nodes, Yang et al. proposed the Neighborhood Coreness Cover and Discount Heuristic Algorithm (NCCDH), which in each round selects the node with the largest neighborhood coreness, covers only the neighbors in the same shell, and discounts the k-shell values of the remaining neighbors accordingly. Based on a voting strategy, Zhang et al. proposed the VoteRank algorithm, which gives each node a voting score and a voting capability; however, it considers only local information and gives every node the same initial voting capability, so the contributions of different neighbors cannot be distinguished.

Disclosure of Invention

The invention aims to solve at least the technical problems in the prior art, and in particular provides a method for identifying influential propagators in a complex network.

In order to achieve the above object of the present invention, the present invention provides a method for identifying influential propagators in a complex network, including:

S1, arranging the first judgment scores in descending order;

S2, arranging the second judgment scores in descending order;

S3, selecting the uncovered node with the highest first judgment score and the highest second judgment score, and covering that node and its neighbor nodes; if several nodes share the same highest first judgment score and the same highest second judgment score, one of them is selected at random;

S4, judging whether the number of selected nodes has reached the set value; if so, executing the next step, and if not, returning to step S3;

S5, obtaining the selected node set after selection is finished;

S6, displaying the performance indices.
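Steps S1 to S5 can be sketched as a single selection loop. This is a minimal illustration, not the patented implementation: it assumes the graph is a dict mapping each node to a set of neighbors, that both judgment scores are already computed as dicts, and that selection stops early if every node is covered before the set number is reached (the text does not say what happens in that case). The name `knc_select` is hypothetical.

```python
import random


def knc_select(adj, first, second, k, seed=None):
    """Select up to k seeds by (first, second) judgment score with neighbor coverage.

    adj:    dict mapping node -> set of neighbor nodes
    first:  dict mapping node -> first (primary) judgment score
    second: dict mapping node -> second (secondary) judgment score
    """
    rng = random.Random(seed)
    covered = set()
    seeds = []
    while len(seeds) < k:  # S4: stop once the set number is reached
        candidates = [v for v in adj if v not in covered]
        if not candidates:
            break  # assumption: stop early when every node is covered
        # S1-S3: compare scores lexicographically, primary score first
        best = max((first[v], second[v]) for v in candidates)
        ties = [v for v in candidates if (first[v], second[v]) == best]
        chosen = rng.choice(ties)  # random tie-break, as in S3
        seeds.append(chosen)
        covered.add(chosen)
        covered.update(adj[chosen])  # cover the seed and its neighbors
    return seeds  # S5: the selected node set


adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
first = {1: 2, 2: 2, 3: 2, 4: 1, 5: 1}
second = {1: 1.0, 2: 1.0, 3: 3.0, 4: 2.0, 5: 1.0}
print(knc_select(adj, first, second, 2))  # [3, 5]
```

Passing k-shell/VoteRank, k-shell/H-index, k-shell/NH-index or H-index/VoteRank dicts as `first`/`second` would give the four score pairings used by the sub-methods.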

In a preferred embodiment of the present invention, when the first judgment score is k-shell and the second judgment score is VoteRank, the method for identifying the influential propagator comprises the following steps:

S1, calculating the k-shell value of each node by k-shell decomposition, and arranging the k-shell values in descending order;

S2, calculating the voting score of each node with the VoteRank algorithm, and arranging the voting scores in descending order;

S3, selecting the uncovered node with the maximum k-shell value and the highest VoteRank value, and covering that node and its neighbor nodes; if several nodes in the same maximum k-shell share the same highest VoteRank value, one of them is selected at random;

S4, judging whether the number of selected nodes equals the set number; if so, executing the next step, and if not, returning to S3;

S5, obtaining the selected node set after selection is finished.

Alternatively or additionally, when the first judgment score is k-shell and the second judgment score is H-index, the method for identifying influential propagators comprises the following steps (S1 and S4 being as above):

S2, calculating the H-index value of each node, and arranging the H-index values in descending order;

S3, selecting the uncovered node with the maximum k-shell value and the highest H-index value, and covering that node and its neighbor nodes; if several nodes in the same maximum k-shell share the same highest H-index value, one of them is selected at random;

S5, obtaining the selected node set after selection is finished.

Alternatively or additionally, when the first judgment score is k-shell and the second judgment score is NH-index, the method for identifying influential propagators comprises the following steps (S1 and S4 being as above):

S2, calculating the neighborhood H-index of each node, and arranging the neighborhood H-index values in descending order;

S3, selecting the uncovered node with the maximum k-shell value and the highest neighborhood H-index, and covering that node and its neighbor nodes; if several nodes in the same maximum k-shell share the same highest neighborhood H-index, one of them is selected at random;

S5, obtaining the selected node set after selection is finished.

Alternatively or additionally, when the first judgment score is H-index and the second judgment score is VoteRank, the method for identifying influential propagators comprises the following steps (S2 and S4 being as above):

S1, calculating the H-index value of each node, and arranging the H-index values in descending order;

S3, selecting the uncovered node with the maximum H-index value and the maximum VoteRank value, and covering that node and its neighbor nodes; if several nodes with the same maximum H-index share the same highest VoteRank value, one of them is selected at random;

S5, obtaining the selected node set after selection is finished.

In a preferred embodiment of the present invention, the node discrimination formula for the k-shell value involves the following quantities: $k_s$ is a given k-shell value of the network, $k_s^{max}$ is the maximum k-shell value of the network, $d_{ij}$ is the shortest distance from node i to node j, J is the set of network core nodes (the nodes with the highest k-shell value), and $S_{k_s}$ is the set of nodes with the same k-shell value $k_s$.

In a preferred embodiment of the present invention, VoteRank comprises:

each node obtains votes from its neighbors, and each node $v \in V$ is assigned a tuple $(S_v, Va_v)$, where $S_v$ is the voting score node v obtains from its neighbors and $Va_v$ is the voting capability of node v toward its neighbors. $S_v$ can be expressed as

$$S_v = \sum_{i \in N(v)} Va_i$$

where N(v) is the set of direct neighbors of node v and $Va_i$ is the voting capability of node i toward its neighbors.

In a preferred embodiment of the present invention, the H-index comprises:

the H-index of node i is defined as

$$h_i = H(k_{j_1}, k_{j_2}, \ldots, k_{j_{k_i}})$$

where H(·) is the function returning the H-index of its arguments, and $k_{j_1}, k_{j_2}, \ldots, k_{j_{k_i}}$ are the degrees of the $k_i$ neighbor nodes of node i.

In a preferred embodiment of the present invention, the neighborhood H-index comprises:

the neighborhood H-index of node i is defined as

$$nh_i = \sum_{j \in N(i)} h_j$$

where N(i) is the set of direct neighbors of node i and $h_j$ is the H-index of node j.

In a preferred embodiment of the invention, the performance indices comprise the SIR model and/or the average shortest path length $L_s$. The SIR model includes:

first, the initially selected seed nodes are set to the infected state and all other nodes in the network to the susceptible state; at each time step, each infected node infects each susceptible node in its direct neighborhood with probability β;

meanwhile, each infected node recovers with probability γ, and a recovered node cannot be infected again. The corresponding differential equations are

$$\frac{dS}{dt} = -\beta S I, \qquad \frac{dI}{dt} = \beta S I - \gamma I, \qquad \frac{dR}{dt} = \gamma I$$

The infection probability β should be neither too small nor too large. If β is too small, the disease cannot spread through the whole network, or may not spread at all; if β is too large, the disease infects almost the entire network and the influence of different nodes cannot be distinguished, so the comparison becomes meaningless. β is therefore chosen to be above the propagation threshold $\beta_{min}$. The infection rate is defined as

where S denotes the susceptible state, I the infected state, and R the recovered state.
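The threshold $\beta_{min}$ is not given explicitly here. A widely used choice in spreader-identification studies is the heterogeneous mean-field estimate $\beta_{th} = \langle k \rangle / (\langle k^2 \rangle - \langle k \rangle)$; the sketch below computes it, on the assumption (not stated in the source) that this is the threshold intended. The function name `sir_threshold` is hypothetical and the graph is a dict of neighbor sets.

```python
def sir_threshold(adj):
    """Mean-field SIR threshold <k> / (<k^2> - <k>) for an undirected graph.

    adj: dict mapping node -> set of neighbor nodes.
    """
    degrees = [len(neighbors) for neighbors in adj.values()]
    n = len(degrees)
    mean_k = sum(degrees) / n              # <k>
    mean_k2 = sum(d * d for d in degrees) / n  # <k^2>
    if mean_k2 <= mean_k:
        raise ValueError("threshold undefined when <k^2> <= <k>")
    return mean_k / (mean_k2 - mean_k)


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(round(sir_threshold(star), 4))  # 0.6667
```

Under this assumption, the paragraph above asks for a β chosen somewhat larger than the returned value.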

In a preferred embodiment of the invention, the SIR model further comprises:

infection scale F(t):

$$F(t) = \frac{n_{I(t)} + n_{R(t)}}{n}$$

infection scale at the steady state, F(t_c):

$$F(t_c) = \frac{n_{R(t_c)}}{n}$$

where $n_{I(t)}$ and $n_{R(t)}$ are the numbers of infected and recovered nodes at time t, n is the total number of nodes in the network, and $t_c$ is the time at which propagation reaches the steady state.

A larger F(t) means that more nodes have been infected by time t, i.e., a larger influence and better algorithm performance; reaching a given infection scale at a smaller t indicates faster propagation.

During infection, the cumulative number of nodes that have moved from the infected state to the recovered state increases at each time step until it levels off, i.e., the steady state is reached.
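The discrete-time SIR process and the two infection-scale measures can be sketched as one Monte Carlo run. This is a minimal illustration under stated assumptions: synchronous updates (nodes infected during a step start spreading in the next step), a recovery test for every node that was infected at the start of the step, and a dict-of-sets graph. The name `sir_spread` is hypothetical; a single run is random, so averaging over many runs is the usual practice.

```python
import random


def sir_spread(adj, seeds, beta, gamma, seed=None, max_steps=10_000):
    """Run one discrete-time SIR simulation.

    Returns (f_curve, f_tc): f_curve[t-1] is F(t) = (n_I(t) + n_R(t)) / n after
    step t, and f_tc = n_R(t_c) / n once no infected nodes remain
    (or when max_steps is exhausted).
    """
    rng = random.Random(seed)
    state = {v: "S" for v in adj}
    for s in seeds:
        state[s] = "I"  # seed nodes start infected
    n = len(adj)
    f_curve = []
    for _ in range(max_steps):
        infected = [v for v in adj if state[v] == "I"]
        if not infected:
            break  # steady state reached
        newly_infected = set()
        for v in infected:
            for u in adj[v]:
                if state[u] == "S" and rng.random() < beta:
                    newly_infected.add(u)  # infection with probability beta
        for v in infected:
            if rng.random() < gamma:
                state[v] = "R"  # recovery with probability gamma
        for u in newly_infected:
            state[u] = "I"
        n_i = sum(1 for v in adj if state[v] == "I")
        n_r = sum(1 for v in adj if state[v] == "R")
        f_curve.append((n_i + n_r) / n)
    f_tc = sum(1 for v in adj if state[v] == "R") / n
    return f_curve, f_tc


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(sir_spread(path, [0], beta=1.0, gamma=1.0))  # ([0.5, 0.75, 1.0, 1.0], 1.0)
```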

In a preferred embodiment of the present invention, the average shortest path length $L_s$ comprises:

the average shortest path length within the selected node set S is defined as:

$$L_s = \frac{1}{|S|(|S|-1)} \sum_{u,v \in S,\, u \neq v} l_{u,v}$$

where |S| is the cardinality of the finite set S (the number of selected nodes) and $l_{u,v}$ is the shortest path length from node u to node v. A larger $L_s$ means the selected seed nodes are more dispersed, which helps maximize the propagation influence.
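Because the network is unweighted, $L_s$ can be computed with one breadth-first search per seed. A sketch assuming a connected, unweighted, dict-of-sets graph; it raises an error rather than guessing a value when two seeds are disconnected, a case the definition does not cover. Function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def average_seed_distance(adj, seeds):
    """L_s: sum of l_{u,v} over ordered pairs u != v in S, divided by |S|(|S|-1)."""
    seeds = list(seeds)
    if len(seeds) < 2:
        raise ValueError("L_s needs at least two seed nodes")
    total = 0
    for u in seeds:
        dist = bfs_distances(adj, u)
        for v in seeds:
            if v == u:
                continue
            if v not in dist:
                raise ValueError(f"node {v} is unreachable from node {u}")
            total += dist[v]
    return total / (len(seeds) * (len(seeds) - 1))


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(average_seed_distance(path, [0, 1, 3]))  # 2.0
```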

In summary, due to the adoption of the technical scheme, the beneficial effects of the invention are as follows:

1) By combining the global position information from k-shell decomposition with the local voting information of the VoteRank method, and regarding the most popular nodes at the network core as the most important, a new KVoteRank method is provided for identifying a group of the most influential nodes in a complex network.

2) To address the problems that H-index considers only local network information and that both H-index and k-shell are too coarse-grained, and inspired by the CCA method, a KHIndex method is provided that uses k-shell as the first judgment score and H-index as the second judgment score and covers the neighborhood of the selected node with the maximum H-index. Using the second-order neighborhood H-index, an extended method, KNHIndex, is further provided for selecting a group of the most influential propagators in a complex network; KNHIndex takes more of the network's local information into account.

3) Since H-index is an intermediate state between degree and coreness, H-index is used as an approximate substitute for k-shell: with H-index as the first judgment score and VoteRank as the second, a new method for identifying key nodes in complex networks, HVoteRank, is provided.

4) Based on tests of the hybrid neighborhood coverage methods KVoteRank, KHIndex, KNHIndex and HVoteRank, which combine the k-shell, H-index and VoteRank methods, an improved k-shell neighborhood coverage method, KNC, is proposed to identify a set of the most influential propagators in a complex network. It considers both local and global network information, addressing the facts that k-shell decomposition cannot distinguish nodes in the same shell and that H-index and VoteRank consider only local neighbor information.

5) The proposed KNC method is comprehensively evaluated with the SIR model and the average shortest path length. Simulation experiments on 8 real networks, including Jazz, show that the initial node sets selected by KNC achieve a larger infection scale than existing baseline methods under different infection rates and different initial node ratios, and that the selected initial node sets are more dispersed.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:

FIG. 1 is a schematic diagram of an embodiment of the KVoteRank algorithm of the present invention.

Figure 2 is a schematic diagram of the SIR propagation process of the present invention.

FIG. 3 is a schematic representation of the infection scale for different initial node ratios of the present invention.

FIG. 4 is a comparative schematic of the infection scale of the present invention.

FIG. 5 is a graphical representation of the infection scale for different infection rates according to the present invention.

Fig. 6 is a schematic diagram of the shortest path length of the initial node set according to the present invention.

Detailed Description

Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.

1. Related work

Given a network G(V, E), V and E denote the node set and the edge set, n = |V| is the number of nodes, and m = |E| is the number of edges, where |·| denotes the cardinality of a set. $A = \{a_{ij}\}$ denotes the adjacency matrix of G: $a_{ij} = 1$ means there is an edge between nodes $v_i$ and $v_j$, and $a_{ij} = 0$ means there is none. N(v) denotes the set of direct neighbors of node v.

1.1 k-shell centrality

The k-shell method holds that a node's position matters more than its number of neighbors: a node located at the core of the network has a high propagation influence even if its degree is small. The algorithm first removes nodes with degree 1; since removal lowers the degrees of the remaining nodes, nodes whose residual degree drops to 1 are removed iteratively until no node of degree 1 remains, and all removed nodes are assigned to the 1-shell. Next, nodes with residual degree 2 are removed, again iteratively removing nodes whose residual degree is less than or equal to 2 until every remaining node has residual degree greater than 2; all nodes removed in this round are assigned to the 2-shell. The process repeats until every node in the network is assigned a shell. However, k-shell assigns the same value to many nodes that have different propagation capacities, so it is a coarse-grained method; moreover, as the shell index increases, nodes become more tightly clustered and their propagation ranges may overlap.
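The peeling procedure above can be sketched as follows; isolated nodes (degree 0), which the text does not mention, are assigned shell 0. A minimal illustration assuming an undirected simple graph stored as a dict of neighbor sets; the function name is hypothetical. It rescans the remaining nodes at every level, trading efficiency for readability.

```python
def k_shell(adj):
    """Return a dict node -> k-shell index by iterative peeling.

    adj: dict mapping node -> set of neighbor nodes (undirected, no self-loops).
    """
    degree = {v: len(adj[v]) for v in adj}  # residual degrees
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        peel = [v for v in remaining if degree[v] <= k]
        while peel:  # keep peeling until no residual degree is <= k
            for v in peel:
                remaining.discard(v)
                shell[v] = k
                for u in adj[v]:
                    if u in remaining:
                        degree[u] -= 1  # removal lowers neighbors' residual degree
            peel = [v for v in remaining if degree[v] <= k]
        k += 1
    return shell


tri = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(k_shell(tri).items()))  # [(1, 2), (2, 2), (3, 2), (4, 1)]
```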

1.2 Mixed degree decomposition (MDD)

k-shell decomposition considers only the connections among the remaining nodes at each step and entirely ignores edges to nodes already deleted. Zeng et al. therefore proposed the Mixed Degree Decomposition (MDD) method, which considers both the residual degree and the exhausted degree of a node. The mixed degree of each node is

$$k^m(v) = k^r(v) + \lambda \, k^e(v) \qquad (1)$$

where $k^r$ is the residual degree of the node (edges to remaining nodes), $k^e$ is its exhausted degree (edges to deleted nodes), and λ is a tunable parameter between 0 and 1. When λ = 1, the exhausted degree is fully counted and MDD is equivalent to degree centrality; when λ = 0, only the residual degree is counted and MDD is equivalent to k-shell decomposition.
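Equation (1) can be plugged into the same peeling scheme as k-shell. The sketch below is one reading of MDD, assuming that at each level the nodes whose current mixed degree does not exceed the minimum mixed degree M are removed repeatedly and assigned the value M; it is an illustration, not Zeng et al.'s reference code, and the function name is hypothetical. With λ = 0 it reproduces k-shell values and with λ = 1 it returns plain degrees, matching the two limiting cases stated above.

```python
def mixed_degree_decomposition(adj, lam):
    """Assign each node an MDD value using k_m = k_r + lam * k_e (0 <= lam <= 1)."""
    remaining = set(adj)
    value = {}

    def mixed_degree(v):
        k_r = sum(1 for u in adj[v] if u in remaining)  # residual degree
        k_e = len(adj[v]) - k_r  # exhausted degree (edges to removed nodes)
        return k_r + lam * k_e

    while remaining:
        m = min(mixed_degree(v) for v in remaining)
        peel = [v for v in remaining if mixed_degree(v) <= m]
        while peel:  # removals can push further nodes down to <= m
            for v in peel:
                remaining.discard(v)
                value[v] = m
            peel = [v for v in remaining if mixed_degree(v) <= m]
    return value


tri = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(mixed_degree_decomposition(tri, 0).items()))  # [(1, 2), (2, 2), (3, 2), (4, 1)]
print(sorted(mixed_degree_decomposition(tri, 1).items()))  # [(1, 2), (2, 2), (3, 3), (4, 1)]
```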

1.3 Neighborhood coreness centrality

The neighborhood coreness (NC) was proposed by Bae et al., who hold that a spreader has greater influence when more of its neighbors are located at the network core. The method alleviates the degeneracy of the k-shell method and balances degree against node position. The neighborhood coreness of node v is defined as

$$C_{nc}(v) = \sum_{w \in N(v)} ks(w)$$

where N(v) is the set of direct neighbors of node v and ks(w) is the k-shell index of neighbor w. Bae et al. also proposed the extended neighborhood coreness (NC+), which takes second-order neighbors into account; the extended neighborhood coreness of node v is defined as

$$C_{nc+}(v) = \sum_{w \in N(v)} C_{nc}(w)$$

where $C_{nc}(w)$ is the neighborhood coreness of neighbor w.
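Both coreness measures reduce to neighbor sums once the k-shell indices are known: $C_{nc}(v) = \sum_{w \in N(v)} ks(w)$ and $C_{nc+}(v) = \sum_{w \in N(v)} C_{nc}(w)$. A sketch assuming the k-shell indices are already available as a dict and the graph is a dict of neighbor sets; function names are hypothetical.

```python
def neighborhood_coreness(adj, ks):
    """C_nc(v): sum of the k-shell indices of v's direct neighbors."""
    return {v: sum(ks[w] for w in adj[v]) for v in adj}


def extended_neighborhood_coreness(adj, ks):
    """C_nc+(v): sum of the neighborhood coreness of v's direct neighbors."""
    nc = neighborhood_coreness(adj, ks)
    return {v: sum(nc[w] for w in adj[v]) for v in adj}


tri = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
ks = {1: 2, 2: 2, 3: 2, 4: 1}  # k-shell indices of this toy graph
print(neighborhood_coreness(tri, ks))           # {1: 4, 2: 4, 3: 5, 4: 2}
print(extended_neighborhood_coreness(tri, ks))  # {1: 9, 2: 9, 3: 10, 4: 5}
```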

1.4 Improved k-shell index

Taking into account the distance between a target node and the network core, Liu et al. proposed an improved k-shell method for distinguishing nodes in the same shell: among nodes with the same k-shell value, the closer a node is to the network core, the larger its propagation influence. The node discrimination formula for the same k-shell value uses the following quantities: $k_s$ is a given k-shell value of the network, $k_s^{max}$ is the maximum k-shell value of the network, $d_{ij}$ is the shortest distance from node i to node j, J is the set of network core nodes (the nodes with the highest k-shell value), and $S_{k_s}$ is the set of nodes with k-shell value $k_s$.
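The quantities above are listed, but the formula itself is not reproduced in this text. A commonly cited form of Liu et al.'s measure is $\theta(i \mid k_s) = (k_s^{max} - k_s + 1) \sum_{j \in J} d_{ij}$ for $i \in S_{k_s}$, where a smaller θ indicates a node closer to the core. The sketch below assumes that form, plus a connected, unweighted, dict-of-sets graph; treat it as illustrative, and note the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def improved_kshell_theta(adj, ks):
    """Assumed form: theta(i | k_s) = (k_s_max - k_s + 1) * sum of d_ij over core nodes j."""
    ks_max = max(ks.values())
    core = [j for j in adj if ks[j] == ks_max]  # J: nodes in the highest shell
    theta = {}
    for i in adj:
        dist = bfs_distances(adj, i)  # assumes every core node is reachable
        theta[i] = (ks_max - ks[i] + 1) * sum(dist[j] for j in core)
    return theta


tri = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
ks = {1: 2, 2: 2, 3: 2, 4: 1}
print(improved_kshell_theta(tri, ks))  # {1: 2, 2: 2, 3: 2, 4: 10}
```

Under this assumed form, nodes within one shell would be ranked by ascending θ.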

1.5 H-index

Iterative k-shell decomposition requires global topological information about the network, which limits its use in large-scale networks. H-index, by contrast, is a local metric that uses only partial network information (the degrees of a node's neighbors); it was first used to evaluate both the number and the impact of a researcher's publications. A researcher's H-index is h if h of their N published papers have each been cited at least h times and each of the remaining N − h papers has been cited no more than h times. In network science, the H-index of node $v_i$ is the largest value h such that $v_i$ has at least h neighbors, each with degree not less than h.
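The network H-index is the bibliometric H-index applied to the list of neighbor degrees. A minimal sketch over a dict-of-sets graph; the function names are hypothetical.

```python
def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    h = 0
    for rank, value in enumerate(sorted(values, reverse=True), start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h


def node_h_index(adj):
    """H-index of each node, computed from the degrees of its neighbors."""
    return {v: h_index(len(adj[u]) for u in adj[v]) for v in adj}


tri = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(h_index([5, 4, 3, 2, 1]))  # 3
print(node_h_index(tri))         # {1: 2, 2: 2, 3: 2, 4: 1}
```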

1.6 VoteRank

Zhang et al. introduced the VoteRank method to identify a set of dispersed spreaders in a network. The idea mirrors real life: if A already supports B, A's support for other people is reduced. VoteRank selects nodes one by one according to the voting scores they receive from their neighbors; once a node has been selected as the most influential node, the voting capability of its neighbors is reduced, and the selected node does not take part in subsequent rounds of voting. VoteRank assigns each node a tuple $(s_v, va_v)$, where $s_v$ is the voting score node v obtains from its neighbors and $va_v$ is the voting capability of node v. VoteRank consists of four steps:

Step 1: Initialization. All tuples are initialized to (0, 1), i.e., the voting score of each node is 0 and the voting capability of each node is 1.

Step 2: Voting. Each node's voting score is the sum of the voting capabilities of its neighbors. The unselected node with the highest voting score is selected as the most influential node. To prevent it from being selected again, its voting score is set to 0; to ensure it does not vote in subsequent rounds, its voting capability is also set to 0.

Step 3: Updating. To obtain a more dispersed seed set, after a seed node v is selected in step 2, the voting capability of each neighbor u of v is discounted for the next round: $va_u \leftarrow \max(va_u - f, 0)$, i.e., $va_u = va_u - f$ while the result is positive and $va_u = 0$ otherwise, where

$$f = \frac{1}{\langle k \rangle}$$

⟨k⟩ is the average degree of the nodes in the network and $va_u$ is the voting capability of node u.

Step 4: Repetition. Steps 2 and 3 are repeated until the number of selected nodes reaches the required number.
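Steps 1 to 4 can be sketched as follows, using the discount $f = 1/\langle k \rangle$. This is a minimal illustration assuming a dict-of-sets graph, recomputing every score from scratch each round, breaking ties by iteration order, and stopping early if no unselected node has a positive score; the last two choices are assumptions, not part of the description. The name `vote_rank` is hypothetical.

```python
def vote_rank(adj, k):
    """Select up to k spreaders with VoteRank on a dict-of-sets graph."""
    n = len(adj)
    avg_degree = sum(len(nbrs) for nbrs in adj.values()) / n
    if avg_degree == 0:
        return []  # no edges, so nobody can receive votes
    f = 1.0 / avg_degree  # discount factor f = 1 / <k>
    ability = {v: 1.0 for v in adj}  # Step 1: voting capability starts at 1
    selected = []
    chosen = set()
    while len(selected) < k:
        # Step 2: each unselected node's score is the sum of its neighbors' abilities
        scores = {v: sum(ability[u] for u in adj[v]) for v in adj if v not in chosen}
        if not scores:
            break
        best = max(scores, key=scores.get)  # first maximum in iteration order
        if scores[best] <= 0:
            break  # assumption: stop when nobody receives votes
        selected.append(best)
        chosen.add(best)  # excluded from later rounds (score effectively 0)
        ability[best] = 0.0  # the selected node no longer votes
        # Step 3: discount the voting capability of the seed's neighbors
        for u in adj[best]:
            ability[u] = max(ability[u] - f, 0.0)
    return selected  # Step 4 is the loop itself


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(vote_rank(path, 2))  # [1, 3]
```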

VoteRank is more accurate than degree centrality, H-index, k-shell and similar measures, but it does not consider where nodes are located in the network; in reality, a node at the network core can exert a large influence even if its voting score is low.

2. Proposed method (Proposed method)

VoteRank centrality selects a set of influential propagators through a voting mechanism: each node obtains votes from its neighbors, and each node $v \in V$ is assigned a tuple $(S_v, Va_v)$, where $S_v$ is the voting score node v obtains from its neighbors and $Va_v$ is the voting capability of node v toward its neighbors. $S_v$ can be expressed as

$$S_v = \sum_{i \in N(v)} Va_i$$

where N(v) is the set of direct neighbors of node v.

If the H-index of a node is h, the node has at least h neighbors, each with degree not less than h. For a network G(V, E), let the degree of node i be $k_i$ and let the degrees of its neighbors be $k_{j_1}, k_{j_2}, \ldots, k_{j_{k_i}}$, where $k_{j_1}$ is the degree of the 1st neighbor $j_1$ of node i, $k_{j_2}$ is the degree of the 2nd neighbor $j_2$, and $k_{j_{k_i}}$ is the degree of the $k_i$-th neighbor.

The H-index of node i is defined as

$$h_i = H(k_{j_1}, k_{j_2}, \ldots, k_{j_{k_i}})$$

where H(·) is the function returning the H-index of its arguments.

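Inspired by the neighborhood coreness centrality NC, an extended neighborhood H-index is introduced here that considers the second-order neighborhood. The neighborhood H-index of node i is defined as

$$nh_i = \sum_{j \in N(i)} h_j$$

where N(i) is the set of direct neighbors of node i and $h_j$ is the H-index of node j.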
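Assuming the definition $nh_i = \sum_{j \in N(i)} h_j$, the neighborhood H-index is a single neighbor sum, structured like the neighborhood coreness $C_{nc}$ but with H-index values in place of k-shell indices. A sketch assuming the per-node H-index values are already available as a dict (computed from neighbor degrees as described above); the function name is hypothetical.

```python
def neighborhood_h_index(adj, h):
    """nh_i: sum of the H-index values h_j over the direct neighbors j of node i."""
    return {i: sum(h[j] for j in adj[i]) for i in adj}


tri = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
h = {1: 2, 2: 2, 3: 2, 4: 1}  # H-index of each node in this toy graph
print(neighborhood_h_index(tri, h))  # {1: 4, 2: 4, 3: 5, 4: 2}
```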

The KNC method is based on two judgment scores: the first is the primary judgment condition and the second is the secondary one. In each round, the uncovered node with the highest first judgment score and, among those, the highest second judgment score is selected as a seed node; after each round, the seed's direct neighbors are covered so that nodes are selected in a dispersed manner, maximizing propagation influence. This description focuses on the KVoteRank method, a sub-method of KNC.

VoteRank centrality considers only local information, does not distinguish the contributions of different neighbors, and ignores node position. Inspired by the Core Cover Algorithm (CCA), the proposed KVoteRank algorithm considers the most popular nodes (those with the highest VoteRank value) located at the network core to have greater propagation influence; a node at the network core can have a large influence even if its VoteRank value is small. The KVoteRank algorithm is executed in 4 stages:

Stage 1: Calculate the k-shell value of each node by k-shell decomposition, and arrange the k-shell values in descending order.

Stage 2: Calculate the voting score of each node with the VoteRank algorithm, and arrange the voting scores of nodes with the same k-shell value in descending order.

Stage 3: Among the uncovered nodes in the highest shell, select the node with the highest VoteRank value, and cover it and its neighbor nodes. If several nodes in the same shell share the same VoteRank value, one of them is selected at random.

Stage 4: Repeat stage 3 until the number of selected nodes reaches the required number.

The algorithm resolves the inability of k-shell decomposition to distinguish the influence of nodes in the same shell by ranking such nodes according to their voting scores. To avoid overlapping influence ranges, stage 3 selects seed nodes in a dispersed manner: after each seed is selected, the node and its immediate neighbors are covered. The KVoteRank algorithm is described in Algorithm 1.

Table 1 KVoteRank algorithm

Lines 2-4 of the KVoteRank algorithm correspond to stage 1, computing the k-shell value of each node; lines 5-8 correspond to stage 2, computing the VoteRank score of each node; and lines 9-17 correspond to stages 3 and 4, selecting dispersed seed nodes with the highest shell and the highest voting score. That is, when selecting a seed node, the k-shell value is the primary judgment condition and the voting score is the secondary one, and the neighbors of each selected node are covered so that the most influential nodes are selected in a dispersed manner and their propagation ranges do not overlap.

Like VoteRank, H-index is a centrality based on local information about neighbor nodes, and a larger H-index indicates greater social influence. Combining H-index with k-shell, a node located at the network core and having the highest H-index is considered the most influential, which gives the k-shell-based H-index method (KHIndex), whose first judgment score is k-shell and second judgment score is H-index. To take more of a node's local information into account, the neighborhood H-index is used to represent semi-local information, which is equivalent to considering the node's 4th-order neighborhood; this gives the extended algorithm KNHIndex, whose first judgment score is k-shell and second judgment score is the neighborhood H-index. For KHIndex (and KNHIndex), stage 2 computes the H-index (and neighborhood H-index) of each node, and stage 3 selects, in each round, the node with the largest k-shell value and the highest H-index (and neighborhood H-index). Lü et al. described the relationship between H-index, degree and coreness: degree is the initial state, H-index an intermediate state, and coreness the steady state. The KNC method therefore uses H-index as an approximate substitute for k-shell: with H-index as the first judgment score and the VoteRank score as the second, highly influential nodes can still be selected; this H-index-based VoteRank method is named HVoteRank.
Because H-index is also a coarse-grained method, HVoteRank uses H-index as the first judgment score and VoteRank as the second, selects in each round the node with the largest H-index and the highest voting score, and then covers its direct neighbors so that nodes are selected in a dispersed manner; the procedure is similar to KVoteRank. Table 2 gives the judgment scores of the four sub-methods of KNC: KVoteRank, KHIndex, KNHIndex and HVoteRank.

Table 2 KNC method judgment score

Judgment condition	KVoteRank	KHIndex	KNHIndex	HVoteRank
Main score	k-shell	k-shell	k-shell	H-index
Second score	VoteRank	H-index	NH-index	VoteRank

To illustrate the proposed algorithm intuitively, a KVoteRank process diagram is given (see Fig. 1), which visualizes how KVoteRank selects the 3 most influential seed nodes; KHIndex, KNHIndex and HVoteRank work similarly to KVoteRank and are not shown here.

As shown in the KVoteRank process diagram of Fig. 1, the k-shell value of every node in the network is calculated first; different colors indicate different shells. Nodes {1,2,3,4} are in the 3-shell, nodes {5,6,7,8} are in the 2-shell, and the remaining nodes are in the 1-shell. Next, the voting score of every node is calculated. The k-shell value of each node is marked in the figure, and the VoteRank scores (together with the H-index values) of all nodes are given in Table 3.
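The first stage, k-shell decomposition, can be sketched as iterative pruning: repeatedly remove every node whose residual degree is at most k, assign it to the k-shell, and raise k when no such node remains. The function below is an illustrative, standard-library-only sketch (`k_shell` is a hypothetical name; graphs are assumed to be dicts of adjacency sets); isolated nodes end up in shell 0.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def k_shell(graph: Graph) -> Dict[Hashable, int]:
    """k-shell index of every node by iterative pruning."""
    degree = {v: len(nbrs) for v, nbrs in graph.items()}  # residual degrees
    remaining = set(graph)
    shell: Dict[Hashable, int] = {}
    k = 0
    while remaining:
        # jump to the smallest residual degree; shells never decrease
        k = max(k, min(degree[v] for v in remaining))
        stack = [v for v in remaining if degree[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already pruned via an earlier push
            remaining.remove(v)
            shell[v] = k
            for u in graph[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        stack.append(u)  # u now also falls into shell k
    return shell
```

For a 4-clique with a two-node tail attached, the tail nodes land in the 1-shell and the clique nodes in the 3-shell, which is the kind of layering shown by the colors in Fig. 1.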

Table 3 VoteRank scores and H-index values of the nodes in the network

As shown in the KVoteRank process of Fig. 1, KVoteRank first selects node 1, which is in the 3-shell and has the highest voting score (8), and covers its direct neighbors {2,3,4,5,6,7}; this ends the first round. It then selects node 8, which is in the 2-shell with a voting score of 3.16, and covers its uncovered direct neighbors {25,26}; this ends the second round. Finally, it selects node 23, which is in the 1-shell with a voting score of 7.58, and covers its neighbors {16,17,18,19,20,21,22}. Node 8 is chosen before node 23 despite its lower voting score because the k-shell value is the primary criterion. This completes the selection of the 3 seed nodes (the seed nodes are the selected most influential propagators), and the selected seed set is {1,8,23}. For the KHIndex algorithm, according to Table 3, node 4 is selected first and its neighbors {1,2,3,5,8} are covered. Next, one of the tied nodes {6,7} is selected at random; assuming node 6 is chosen, its neighbor 7 is covered. Finally, one node is chosen at random from the tied nodes {15,17,23,25}; assuming node 15 is chosen, its neighbors {13,14,17} are covered. The selection of the 3 seed nodes is then complete, and the 3 most influential nodes selected by KHIndex are {4,6,15}. KNHIndex and HVoteRank proceed in the same way as KVoteRank, differing only in the first and second judgment scores.

Here, a simple test of the propagation process is performed with the 15 seed nodes selected by k-shell and by KVoteRank in the CEnew network (453 nodes). The result is shown in the SIR propagation process diagram of Fig. 2, where blue nodes are susceptible, red nodes are infected and green nodes are recovered (immune).

In Fig. 2, the k-shell decomposition method and the proposed KVoteRank algorithm are compared under the SIR propagation model. The initial nodes selected by k-shell are partly clustered together, whereas those selected by KVoteRank are dispersed; at time step 10, KVoteRank has infected 11 more nodes than the k-shell method.

3. Data sets

To demonstrate the performance of the proposed KNC method, the experiments use 8 real network datasets of different types and sizes. Most of them come from SNAP (Stanford Network Analysis Project), a network dataset collection compiled by faculty and students of Stanford University. 1) Jazz: records jazz bands that performed between 1912 and 1940; 2) CEnew: an edge list of the metabolic network of Caenorhabditis elegans; 3) Crimes: a crime network in which nodes are persons and an edge means that two persons took part in a crime together; 4) Email: records the email exchanges between users of the University Rovira i Virgili; 5) Hamster: friendship and family links between users of the website "www.hamsterster.com"; 6) Ca-GrQc: a collaboration network from the arXiv e-print archive covering scientific collaborations between authors of papers submitted to the General Relativity and Quantum Cosmology category; 7) Condmat: a collaboration network based on the Condensed Matter section of the arXiv e-print archive from 1995 to 1999; 8) Enron: an email communication network of the Enron company, containing information on more than one million emails. The topology of the 8 real networks is described in Table 4.

Table 4 Network topology

Network	n	m	<k>	k_max	<d>	<c>	β_min
Jazz	198	2742	27.697	100	2.235	0.6175	0.0266
CEnew	453	2025	8.94	237	2.664	0.646	0.0256
Crimes	829	1473	3.554	25	5.04	0.008	0.1960
Email	1133	5451	9.622	71	3.606	0.2540	0.0565
Hamster	2426	16631	13.711	273	3.67	0.538	0.0241
Ca-GrQc	4158	13422	6.456	81	6.049	0.665	0.0589
Condmat	23133	93497	8.083	281	5.352	0.633	0.0475
Enron	33696	180811	10.732	1383	4.025	0.708	0.0071

In Table 4, n is the total number of nodes in the network and m the number of edges; <k> is the average degree of the nodes; k_max is the maximum degree; <d> is the average shortest path length of the network; <c> is the average clustering coefficient of the network,

$$\langle c\rangle = \frac{1}{n}\sum_{i=1}^{n}\frac{2I_i}{|N(i)|\,(|N(i)|-1)},$$

where I_i is the number of edges between the direct neighbors of node i; and β_min is the propagation threshold, calculated as

$$\beta_{\min} = \frac{\langle k\rangle}{\langle k^2\rangle - \langle k\rangle}.$$

Here N(i) is the set of direct neighbors of node i, |N(i)| is the cardinality of N(i), k is the degree of a node in the network, and <·> denotes averaging.
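The two less obvious columns of Table 4 can be computed directly from an adjacency list. This sketch assumes the clustering coefficient 2I_i/(k_i(k_i-1)) averaged over all nodes (nodes with fewer than two neighbors contribute 0) and the threshold form β_min = <k>/(<k²>-<k>), a common heterogeneous mean-field estimate; both forms are assumptions about the exact formulas used, and the function names are hypothetical.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def average_clustering(graph: Graph) -> float:
    """<c>: mean of 2*I_i / (k_i*(k_i-1)); nodes with fewer than 2 neighbours add 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        # I_i: edges among the neighbours of i (each edge is seen from both ends)
        links = sum(1 for u in nbrs for w in graph[u] if w in nbrs) / 2
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


def propagation_threshold(graph: Graph) -> float:
    """ASSUMED form: beta_min = <k> / (<k^2> - <k>)."""
    degrees = [len(nbrs) for nbrs in graph.values()]
    k1 = sum(degrees) / len(degrees)
    k2 = sum(d * d for d in degrees) / len(degrees)
    if k2 <= k1:
        raise ValueError("threshold undefined when <k^2> <= <k>")
    return k1 / (k2 - k1)
```

For a triangle with one pendant node, <k> = 2 and <k²> = 4.5, so the sketch returns β_min = 2/2.5 = 0.8, and <c> = (1/3 + 1 + 1 + 0)/4 = 7/12.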

4. Performance index

4.1 SIR epidemic model

The susceptible-infected-recovered (SIR) epidemic model is used to evaluate the performance of the proposed method. In the SIR model, each node is in one of three states: susceptible (S), infected (I) or recovered (R). A susceptible node can be infected by the disease or information; an infected node has been infected by the disease or activated by the information; a recovered node has recovered and no longer transmits the information or disease. Initially, the selected seed nodes are set to the infected state and all other nodes to the susceptible state. At each time step, every infected node infects each susceptible node in its direct neighborhood with probability β; at the same time, every infected node recovers with probability γ (the recovery probability), and a recovered node cannot be infected again. The differential equations are

$$\frac{dS(t)}{dt} = -\beta S(t) I(t),\qquad \frac{dI(t)}{dt} = \beta S(t) I(t) - \gamma I(t),\qquad \frac{dR(t)}{dt} = \gamma I(t),$$

where S(t), I(t) and R(t) are the fractions of susceptible, infected and recovered nodes at time t.
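A discrete-time version of the SIR process described above can be simulated as follows. It is a sketch under stated assumptions, not the experimental code: updates are synchronous, each infected node tries each susceptible neighbor independently with probability β, and it then recovers with probability γ. The function `sir_run` and its parameters are hypothetical; it returns the infection-scale curve F(t) = (n_I(t) + n_R(t))/n.

```python
import random
from typing import Dict, Hashable, Iterable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def sir_run(graph: Graph, seeds: Iterable[Hashable], beta: float, gamma: float,
            rng: random.Random, max_steps: int = 10_000) -> List[float]:
    """One synchronous discrete-time SIR run.

    Returns F(t) = (n_I(t) + n_R(t)) / n for t = 0, 1, ... until no node is infected.
    """
    n = len(graph)
    infected: Set[Hashable] = set(seeds)  # seeds start in state I, the rest in S
    recovered: Set[Hashable] = set()
    history = [len(infected) / n]
    for _ in range(max_steps):
        if not infected:
            break
        # each infected node tries every susceptible neighbour with probability beta
        newly_infected = {u for v in infected for u in graph[v]
                          if u not in infected and u not in recovered
                          and rng.random() < beta}
        # each node infected before this step recovers with probability gamma
        newly_recovered = {v for v in infected if rng.random() < gamma}
        infected = (infected - newly_recovered) | newly_infected
        recovered |= newly_recovered
        history.append((len(infected) + len(recovered)) / n)
    return history
```

With β = γ = 1 on the path 0-1-2-3 seeded at node 0, the infection front advances one hop per step and the curve is [0.25, 0.5, 0.75, 1.0, 1.0]. Averaging many runs with different `random.Random` seeds gives the Monte Carlo estimates the experiments rely on.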

The infection probability β must be neither too small nor too large. If β is too small, the epidemic cannot spread through the whole network and may not spread at all; if β is too large, the epidemic infects almost the entire network, and the influence of different nodes can no longer be distinguished. β is therefore chosen above the propagation threshold β_min; the threshold of each network is given in column 8 of Table 4. In this experiment, the infection rate is defined as

$$\lambda = \frac{\beta}{\gamma}.$$

Due to the randomness in the model, the experimental results should be averaged over multiple simulations.

The infection probability β is the probability that an infected node infects a susceptible neighbor in one time step, and λ is the infection rate, i.e., the ratio of the infection probability to the recovery probability. λ can be understood as the effective infection capacity of the SIR model: the infection probability relative to the recovery probability.

Algorithm performance is measured by the spreading capability of the selected nodes, expressed by the infection scale F(t) at time t and the final infection scale F(t_c). The infection scale reflects the influence of the selected nodes at time t and is defined as

$$F(t) = \frac{n_{I(t)} + n_{R(t)}}{n},$$

where n_{I(t)} and n_{R(t)} are the numbers of infected and recovered nodes at time t, and n is the total number of nodes in the network. A larger F(t) means more nodes infected by time t, i.e., a larger influence and better algorithm performance, and reaching a given F(t) at a smaller t means faster propagation.

During the infection process, the number of nodes that have moved from the infected to the recovered state increases at each time step until it reaches its maximum, i.e., the steady state. The final infection scale F(t_c), the proportion of recovered nodes at the steady state, reflects the final influence of the initially selected seed nodes and is defined as

$$F(t_c) = \frac{n_{R(t_c)}}{n}.$$

Thus F(t) evaluates the propagation influence of the seed nodes at time t, and F(t_c) evaluates it when the SIR process reaches the steady state.
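Because every simulation run ends at a different time, the F(t) curves have to be aligned before they are averaged over the 300 or 1000 repetitions used in section 5. A minimal sketch, assuming node states are recorded as the strings "S", "I" and "R", and that a finished run keeps its last value (F(t) cannot change once no node is infected); all function names are hypothetical.

```python
from typing import Dict, Hashable, List


def infection_scale(states: Dict[Hashable, str]) -> float:
    """F(t) = (n_I(t) + n_R(t)) / n for one snapshot of node states."""
    return sum(1 for s in states.values() if s in ("I", "R")) / len(states)


def final_infection_scale(states: Dict[Hashable, str]) -> float:
    """F(t_c) = n_R(t_c) / n, valid only once no node is left in state 'I'."""
    if any(s == "I" for s in states.values()):
        raise ValueError("the epidemic has not reached the steady state yet")
    return sum(1 for s in states.values() if s == "R") / len(states)


def average_curves(curves: List[List[float]]) -> List[float]:
    """Pointwise mean of F(t) curves from independent runs of different lengths.

    A finished run is padded with its last value, since F(t) is constant after t_c.
    """
    horizon = max(len(c) for c in curves)
    padded = [c + [c[-1]] * (horizon - len(c)) for c in curves]
    return [sum(column) / len(curves) for column in zip(*padded)]
```

Padding rather than truncating keeps early-terminating runs (for example, seeds that die out quickly) in the average, which is what F(t_c) over many runs requires.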

4.2 Average shortest path length L_s

If the selected seed nodes are clustered together, as with degree centrality or k-shell centrality, their propagation influence ranges overlap; selecting dispersed seed nodes therefore makes it easier to enlarge the overall influence range. The dispersion of the selected nodes is measured by the average shortest path length L_s between them, which is used to compare the different algorithms. For a selected seed set S it is defined as

$$L_s = \frac{1}{|S|\,(|S|-1)}\sum_{u,v\in S,\ u\neq v} l_{u,v},$$

where |S| is the number of nodes in S and l_{u,v} is the shortest path length from node u to node v.
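Because the networks are unweighted, L_s needs only one breadth-first search per seed node. The sketch below (hypothetical names, graph as a dict of adjacency sets) averages l_{u,v} over ordered pairs of distinct seeds, which equals the average over unordered pairs in an undirected graph; it assumes all seeds lie in one connected component and raises an error otherwise.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected, unweighted graph as adjacency sets


def bfs_distances(graph: Graph, source: Hashable) -> Dict[Hashable, int]:
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in graph[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def average_seed_distance(graph: Graph, seeds: Iterable[Hashable]) -> float:
    """L_s: mean shortest-path length l_{u,v} over ordered pairs of distinct seeds."""
    seed_list = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    if len(seed_list) < 2:
        raise ValueError("L_s needs at least two distinct seed nodes")
    total = 0
    for u in seed_list:
        dist = bfs_distances(graph, u)
        for v in seed_list:
            if v == u:
                continue
            if v not in dist:
                raise ValueError("seed nodes lie in different components")
            total += dist[v]
    return total / (len(seed_list) * (len(seed_list) - 1))
```

On the path 0-1-2-3-4, the seed set {0, 4} gives L_s = 4, while {0, 2, 4} gives (2 + 4 + 2)/3 = 8/3, so adding a central seed lowers the average spacing.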

5. Experimental results and analysis

To test the effectiveness of the four sub-algorithms of the proposed KNC method (KVoteRank, KHIndex, KNHIndex and HVoteRank), simulations were performed on the 8 real networks using the SIR model and the average shortest path length, according to the performance indices of section 4. KVoteRank, KHIndex, KNHIndex and HVoteRank are compared with 7 baseline centrality algorithms (Degree, k-shell, NC, NC+, PageRank, H-index and VoteRank), and the four proposed algorithms are also compared with each other.

5.1 SIR model simulation analysis

The algorithms are compared by the final infection scale under different initial seed proportions. Because the networks differ in size, different proportions are used when selecting the initial seed nodes, with larger proportions for smaller networks: the maximum initial seed proportion is 0.03 for Jazz, CEnew, Crimes, Email, Hamster and Ca-GrQc, and 0.003 for the larger networks Condmat and Enron. The infection probability is set to β = 1.5β_min. The final infection scale F(t_c) under different initial node proportions is shown in Fig. 3: the x-axis is the initial seed proportion p and the y-axis is the final infection scale at that proportion; the sub-plot in Fig. 3(a) shows the infection scale of the four proposed methods KVoteRank, KHIndex, KNHIndex and HVoteRank separately. The results are averaged over 300 runs.

As Fig. 3 shows, the four proposed methods KVoteRank, KHIndex, KNHIndex and HVoteRank achieve satisfactory results on all 8 networks. At small initial proportions p they are comparable to the baseline methods, but as p increases their final infection scale becomes progressively larger than that of the baselines. The four proposed methods perform best on all 8 networks, especially on CEnew, Hamster and the large-scale network Enron, although KNHIndex performs slightly worse on the small network Jazz and HVoteRank slightly worse on CEnew. For example, on CEnew the KHIndex method infects more than 12% of the nodes with an initial node proportion of 0.02, a scale that the baseline methods do not reach even at 0.03. On Hamster, the proposed KVoteRank method infects about 3.5% more nodes than the baseline methods. The relative performance of KVoteRank, KHIndex, KNHIndex and HVoteRank varies from network to network, but in every network they outperform the 7 baseline centralities, which demonstrates the effectiveness of the proposed methods.

To compare the propagation scale and speed of the seed nodes selected by the different algorithms, a time-step experiment is performed with a fixed initial seed proportion for consistency: 0.03 for the smaller networks Jazz, CEnew, Crimes, Email, Hamster and Ca-GrQc, and 0.003 for the larger networks Condmat and Enron. The results are averaged over 1000 runs. The evolution of F(t) is shown in the infection scale comparison of Fig. 4, where the x-axis is the time step and the y-axis is the infection scale.

As Fig. 4 shows, compared with the seven baseline algorithms (Degree, k-shell, etc.), the four proposed algorithms KVoteRank, KHIndex, KNHIndex and HVoteRank always reach the highest plateau, i.e., the largest infection scale, and reach the steady state soonest, i.e., spread fastest. On Jazz, CEnew, Hamster, Ca-GrQc, Condmat and Enron, HVoteRank performs best, especially on CEnew, Hamster and Enron: on CEnew it is about 3.5% higher than the worst method, k-shell, and 2.5% higher than the best baseline, H-index; on Hamster it is about 3.8% higher than the best baseline, VoteRank; and on the large network Enron it is about 0.625% higher than the baselines. KHIndex and KNHIndex perform comparably to HVoteRank, and KVoteRank slightly worse. On Crimes, KVoteRank performs best and HVoteRank is the weakest of the four proposed methods, but it is still clearly better than the baselines. On Email, the four KNC methods perform almost equally well, about 2.2% higher than the baselines. The k-shell method and NC+ perform worst on all networks; for k-shell this is explained by its selected seed nodes clustering together, so that their propagation influence ranges overlap. The four KNC methods always reach the steady state in the least time; their performance differs only slightly among themselves, but they clearly outperform the 7 baseline methods.

Besides the proportion of initial seed nodes, the infection rate also affects the spreading process, since it represents different transmission capacities. The infection rate is varied from 1.0 to 2.0 to observe the propagation influence of the nodes; the infection scale of the different algorithms at different infection rates is shown in Fig. 5, averaged over 300 runs. The x-axis is the infection rate and the y-axis is the infection scale at that rate; the sub-plot in Fig. 5(a) shows the infection scale of the proposed KVoteRank, KHIndex, KNHIndex and HVoteRank methods separately at the different infection rates on the Jazz network.

As Fig. 5 shows, the proposed KVoteRank, KHIndex, KNHIndex and HVoteRank methods perform well on all 8 networks and outperform the other 7 baseline methods, most clearly on CEnew and Hamster: at an infection rate of 2.0, HVoteRank infects about 2.9% more nodes than the baselines on CEnew, and KVoteRank about 3.5% more on Hamster. All the proposed methods perform slightly less well on Jazz and Condmat. On Jazz and Email, the proposed methods are comparable to the other methods at small infection rates, because information may fail to spread when the infection rate is small; the infection scale grows as the infection rate increases. These experiments show that the KNC sub-methods KVoteRank, KHIndex, KNHIndex and HVoteRank generalize better than the baseline methods.

5.2 Average path length analysis

The k-shell approach is suited to identifying a single most influential node, but it performs poorly when a set of most influential nodes is required: the nodes in the high shells are clustered together, so their propagation influence ranges overlap and influence is not maximized. In general, the more dispersed the initial seed set, the larger the propagation influence that can be achieved. The average shortest path length of a node set is therefore used to measure the distance between the initial infection seed nodes. Fig. 6 shows the average shortest path lengths between the seed nodes selected by the different algorithms on Jazz and Crimes; the values for the other networks are given in Table 5.

Fig. 6 shows that on Jazz the seed nodes selected by the proposed KNC methods have a large average shortest path length, though smaller than that of the CCA method; on Crimes, KVoteRank and KNHIndex have the largest average shortest path lengths. The more dispersed the selected seed nodes, the more likely the propagation influence is maximized. Table 5 gives the average shortest path length between the initial seed nodes selected by the different algorithms on each network. For consistency, the initial seed proportion is set to 0.3 for the small network Jazz, 0.03 for the smaller networks CEnew, Crimes, Email, Hamster and Ca-GrQc, and 0.003 for the larger networks Condmat and Enron. Columns 7-10 of Table 5 give the average shortest path lengths of the seed sets selected by the proposed methods KHIndex, KNHIndex, HVoteRank and KVoteRank, respectively.

Table 5 Average shortest path length of the initial node sets

Table 5 shows that, except on Ca-GrQc, the average shortest path lengths of the KNC sub-methods KVoteRank, KHIndex, KNHIndex and HVoteRank are clearly larger than those of the baseline methods. This also indicates that the proposed methods do not simply select dispersed nodes, but dispersed nodes chosen according to their spreading capability. In general, the more dispersed the initial seed nodes, the more easily information spreads through the whole network, so the average shortest path length is commonly used as an evaluation index; however, it cannot by itself fully characterize algorithm performance. To compare execution efficiency, the running times of the different algorithms were measured; the running times on the 8 networks are given in Table 6.

Table 6 Running time (s) (KVR: KVoteRank; HVR: HVoteRank; KHI: KHIndex; KNHI: KNHIndex)

Method	Jazz	CEnew	Email	Crimes	Hamster	Ca-GrQc	Condmat	Enron
k-shell	0.0020	0.0040	0.0080	0.0050	0.0230	0.0240	0.2414	0.5446
VoteRank	0.0020	0.0030	0.0090	0.0050	0.0538	0.0668	0.2683	0.8327
NC+	0.0030	0.0040	0.0090	0.0040	0.0269	0.0249	0.2683	0.6323
H-index	0.0090	0.0070	0.0180	0.0060	0.0568	0.0499	0.3561	0.6462
NCCDH	0.0807	0.1436	0.2692	0.1237	0.8097	0.7071	1.9051	6.1238
CCA	0.1157	0.2653	1.6977	1.0874	9.8017	71.5783	231.2250	357.7111
PageRank	0.1277	0.0798	0.2134	0.1117	0.6114	0.5335	2.5161	6.4447
KVR	0.0808	0.1307	0.4016	0.1506	1.2138	1.7222	28.5937	66.4826
HVR	0.1067	0.1630	0.4179	0.1824	1.2643	1.8282	29.6787	67.0074
KHI	0.0817	0.1077	0.3182	0.1466	0.8219	0.8166	1.3677	4.3059
KNHI	0.0778	0.1137	0.3321	0.1176	0.8134	0.8152	1.3933	4.7010

Taken together with the previous comparisons, the proposed methods achieve the best results in a reasonable time. Among the four methods, KHIndex and KNHIndex run faster than KVoteRank and HVoteRank. The running time of every KNC sub-method is below 70 seconds on all networks, and that of KHIndex and KNHIndex is below 5 seconds.

6. Conclusions

The invention provides a hybrid k-shell-based neighborhood coverage (KNC) method for identifying a group of the most influential nodes in a complex network, taking both global and local network information into account. The KNC method is based on two judgment scores: in each round, the uncovered node with the highest first judgment score (the main score) and, among those, the highest second judgment score (the secondary score) is selected as a seed node, and after each selection its first-order neighbors are covered so that propagation influence ranges do not overlap. First, k-shell decomposition assigns the same propagation capability to all nodes in the same shell, making it a coarse-grained ranking method, while VoteRank gives every node the same initial voting ability and is a local-information centrality. To address these problems, the KVoteRank algorithm based on k-shell decomposition is proposed. KVoteRank regards the most popular node at the network core as having the greatest propagation influence; a node located at the network center is important even if its VoteRank value is smaller. Second, since the H-index is also a coarse-grained local-information method and a larger H-index means a larger social influence, the H-index and k-shell are combined into the KHIndex method; further considering the second-order neighborhood H-index of a node yields the extended KNHIndex method. Third, since the H-index is an intermediate state between the degree and the coreness, it is used as an approximate substitute for the k-shell value and combined with VoteRank to select a group of the most influential propagators, giving the HVoteRank method.
All four KNC sub-methods (KVoteRank, KHIndex, KNHIndex and HVoteRank) are based on the idea of coverage to maximize propagation influence. The SIR model and the average shortest path length are used to evaluate them against existing baseline methods such as Degree and k-shell. Tests on the 8 real network datasets show that, under different initial node proportions and different infection rates, the infection scale and propagation speed of KVoteRank, KHIndex, KNHIndex and HVoteRank are clearly better than those of the existing baseline methods, and the initial node sets selected by the KNC method have larger average shortest path lengths, i.e., the selected seed nodes are more dispersed, which yields a larger propagation influence. The proposed KNC method is therefore reasonable and effective.

While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims

1. A method for identifying influential propagators in a complex network, comprising:

S1, arranging the first judgment scores in descending order;

S2, arranging the second judgment scores in descending order;

S5, after the selection is finished, obtaining the selected node set;

S6, displaying the performance index;

when the first judgment score is k-shell and the second judgment score is VoteRank, the method for identifying influential propagators comprises the following steps:

S5, after the selection is finished, obtaining the selected node set.

2. The method for identifying influential propagators in a complex network according to claim 1, wherein when the first judgment score is k-shell and the second judgment score is H-index, the method comprises the following steps:

S5, after the selection is finished, obtaining the selected node set.

3. The method for identifying influential propagators in a complex network according to claim 1, wherein when the first judgment score is k-shell and the second judgment score is NH-index, the method comprises the following steps:

S5, after the selection is finished, obtaining the selected node set.

4. The method for identifying influential propagators in a complex network according to claim 1, wherein when the first judgment score is H-index and the second judgment score is VoteRank, the method comprises the following steps:

S5, after the selection is finished, obtaining the selected node set.

5. The method for identifying influential propagators in a complex network according to any one of claims 1 to 3, wherein the node-distinguishing formula of the k-shell value comprises:

wherein k_s represents a k-shell value of the network,

represents the set of nodes with the same k-shell value.

6. The method for identifying influential propagators in a complex network according to claim 1 or claim 4, wherein the VoteRank comprises:

each node can obtain votes from its neighbors; each node v ∈ V is assigned a tuple (S_v, Va_v), where S_v is the voting score that node v obtains from its neighbor nodes and Va_v is the voting ability of node v towards its neighbor nodes; S_v can be expressed as

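7. The method for identifying influential propagators in a complex network according to claim 2 or claim 4, wherein the H-index comprises:

the H-index of node i is defined as:

$$h(i) = \mathcal{H}\left(k_{j_1}, k_{j_2}, \ldots, k_{j_{k_i}}\right),$$

where k_{j_1}, ..., k_{j_{k_i}} are the degrees of the k_i direct neighbors of node i, and \mathcal{H} returns the largest integer h such that at least h of its arguments are not less than h.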

8. The method for identifying influential propagators in a complex network according to claim 7, wherein the neighborhood H-index comprises:

the neighborhood H-index of node i is defined as:

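9. The method for identifying influential propagators in a complex network according to claim 1, wherein the performance indices comprise an epidemic model comprising:

the infection rate is defined as

$$\lambda = \frac{\beta}{\gamma},$$

where β is the infection probability and γ is the recovery probability.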

10. The method of claim 9, wherein the epidemic model further comprises:

infection scale F(t):

$$F(t) = \frac{n_{I(t)} + n_{R(t)}}{n};$$

final infection scale F(t_c):

$$F(t_c) = \frac{n_{R(t_c)}}{n}.$$

11. The method for identifying influential propagators in a complex network according to claim 1, wherein the performance indices further comprise the average shortest path length L_s:

$$L_s = \frac{1}{|S|\,(|S|-1)}\sum_{u,v\in S,\ u\neq v} l_{u,v},$$

where |S| is the cardinality of the finite set S, i.e., the number of seed nodes in S, and l_{u,v} is the shortest path length from node u to node v.