CN114978983A - Influence node identification method and system based on second-order H index and voting mechanism

Publication number
CN114978983A
Authority
CN
China
Legal status
Granted
Application number
CN202210684358.5A
Other languages
Chinese (zh)
Other versions
CN114978983B (en)
Inventor
马志新
王赟栋
徐玉生
刘莉
Current Assignee
Gansu Daily Newspaper Group Co ltd
Lanzhou University
Original Assignee
Gansu Daily Newspaper Group Co ltd
Lanzhou University
Application filed by Gansu Daily Newspaper Group Co ltd, Lanzhou University
Priority to CN202210684358.5A
Publication of CN114978983A
Application granted
Publication of CN114978983B
Current legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/12 Shortest path evaluation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Mathematical Optimization (AREA)
  • Signal Processing (AREA)
  • Pure & Applied Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses an influence node identification method and system based on a second-order H index and a voting mechanism. Data to be tested are acquired and preprocessed to obtain a preprocessing result; initialization is performed based on the preprocessing result, and a voting result is obtained through voting; screening is performed based on the voting result to obtain a screening result; and updating is performed based on the screening result until a specified number of seed nodes has been selected, which completes node identification. The initial voting capacity of every node is set to the node's SHIKS value, second-order neighbors are taken into account in the voting stage, and in the updating stage the voting capacity of every node in the second-order neighborhood is updated according to the shortest path length between that node and the seed node. As a result, the selected seed nodes lie far from one another, the propagation effect is better, and accuracy is higher than that of traditional identification methods.

Description

Influence node identification method and system based on second-order H index and voting mechanism
Technical Field
The application belongs to the field of influence node identification algorithms based on a second-order H index, and particularly relates to an influence node identification method and system based on a second-order H index and a voting mechanism.
Background
The H index (H-index) is a hybrid quantitative index originally proposed by Jorge Hirsch to measure both the number of an author's papers and their impact. It is defined as follows: an author has an H index of H if H of the author's papers have each been cited at least H times. The index therefore reflects, to some extent, the author's academic ability.
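The definition above can be illustrated with a short Python sketch. The function name `h_index` and the citation counts are hypothetical and chosen only for illustration.

```python
def h_index(citations):
    """Return the largest h such that at least h values are >= h."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count < rank:
            break
        h = rank
    return h


# Hypothetical citation counts for five papers of one author.
print(h_index([10, 8, 5, 4, 3]))  # 4: four papers have at least 4 citations each
```

Sorting in descending order means the loop can stop at the first paper whose citation count falls below its rank.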
Hirsch et al. then introduced the H index as a centrality indicator into the influence maximization problem in order to identify influential nodes.
The H index measures the influence of a node through the neighbor nodes directly connected to it. Used alone, however, it cannot mine a group of influential nodes accurately, because many nodes may share the same H index and the differences in their influence cannot be quantified further. For this reason, a second-order H index is introduced, the degree of the node itself is taken into account, and both are combined with the node information entropy to measure the influence of a node.
Although the traditional VoteRank algorithm performs markedly better than some classical algorithms when mining influential nodes, it has certain limitations. First, the algorithm assumes that every node has the same initial voting capacity, set to 1, whereas in many real-life voting scenarios the participants differ in importance, so their initial voting capacities should differ as well. Second, in the voting stage the voting score of each node comes only from its first-order neighbor nodes, although second-order neighbors and more distant nodes also play an important role in mining seed nodes. Third, VoteRank reduces voting capacity in the update stage by a fixed amount, whereas the reduction should depend on the distance between the node and the seed node: the farther a node is from the seed node, the less its voting capacity should be weakened, and the closer it is, the more. Based on this analysis, the application combines the SHIKS algorithm with the VoteRank algorithm and provides a new influence node identification algorithm, SHIKS-VoteRank.
SHIKS-VoteRank is an influence node identification algorithm that combines SHIKS and VoteRank and offers better accuracy and execution efficiency. In the initialization stage, the algorithm calculates the influence of each node with the SHIKS algorithm and uses this value as the node's initial voting capacity in the voting mechanism, namely $va_v = \text{H-index}_2(v)$. In VoteRank, moreover, the voting score of a node comes only from the nodes directly connected to it; the algorithm ignores the influence of neighbor nodes two or three hops away. In SHIKS-VoteRank, the voting score is therefore calculated over both the first-order and the second-order neighbor nodes. In the updating stage, the nodes that took part in the voting are weakened by a variable amount that depends on their distance from the seed node: the closer a node is to the seed node, the more it is weakened, and vice versa. To this end, the shortest path length between nodes is introduced into the original attenuation factor to define a new attenuation factor. Because the voting stage takes second-order neighbor nodes into account, the voting capacity of second-order neighbor nodes must also be updated in the updating stage.
Disclosure of Invention
The application provides an influence node identification method and system based on a second-order H index and a voting mechanism.
In order to achieve the above purpose, the present application provides the following solutions:
The influence node identification method based on the second-order H index and the voting mechanism comprises the following steps:
acquiring data to be tested;
preprocessing the data to be tested to obtain a preprocessing result that comprises the importance and the SHIKS value of each node, and performing initialization;
voting based on the initialized preprocessing result to obtain a voting result;
screening based on the voting result to obtain a screening result;
and updating based on the screening result, selecting a specified number of seed nodes, and completing node identification.
Preferably, the data preprocessing method comprises:
calculating the second-order H index of each node in the data to be tested;
and calculating the importance and the SHIKS value of each node based on its second-order H index.
Preferably, the initialization method comprises:
initializing the voting score and the voting capacity of each node, wherein the voting score is 0 and the voting capacity is the SHIKS value of the node.
Preferably, the voting method comprises:
the first-order and second-order neighbors both participate in voting, i.e. the voting score is
$$S_u = \sum_{v \in \Gamma_u} va_v$$
where $\Gamma_u$ is the set of first-order and second-order neighbor nodes of node u, the subscript v denotes a node, va denotes voting capacity, and $va_v$ denotes the voting capacity of node v.
Preferably, the screening method comprises: after the voting is finished, adding the node with the highest voting score to the seed node set and setting both its voting capacity and its voting score to 0, so that the node no longer participates in subsequent voting and cannot be elected a second time.
Preferably, the updating method comprises: attenuating the voting capacity of the first-order and second-order neighbor nodes of the seed node, wherein the attenuation is determined by the length of the shortest path between the node and the seed node, namely by the attenuation factor
Figure RE-GDA0003765279490000042
where $\langle k \rangle$ denotes the average degree of the network and d denotes the shortest path length between the node and the seed node.
To better implement the above technical content, the application also provides an influence node identification system based on a second-order H index and a voting mechanism,
comprising: a data acquisition module, a data preprocessing module, a voting mechanism module, a voting screening module and a node identification module;
the data acquisition module is used for acquiring data to be tested;
the data preprocessing module is used for preprocessing the data to be tested to obtain a preprocessing result that comprises the importance and the SHIKS value of each node, and for performing initialization;
the voting mechanism module is used for voting based on the initialized preprocessing result to obtain a voting result;
the voting screening module is used for screening based on the voting result to obtain a screening result;
and the node identification module is used for updating based on the screening result, selecting a specified number of seed nodes, and completing node identification.
Preferably, the initialization method in the data initialization module comprises:
initializing the voting score and the voting capacity of each node, wherein the voting score is 0 and the voting capacity is the SHIKS value of the node.
Preferably, the screening method in the voting screening module comprises: after the voting is finished, adding the node with the highest voting score to the seed node set and setting its voting capacity to 0, so that it no longer participates in subsequent voting.
Preferably, the updating method comprises: attenuating the voting capacity of the first-order and second-order neighbor nodes of the seed node, wherein the attenuation is determined by the length of the shortest path between the node and the seed node, namely by the attenuation factor
Figure RE-GDA0003765279490000051
where $\langle k \rangle$ denotes the average degree of the network and d denotes the shortest path length between the node and the seed node.
The beneficial effects of the application are as follows: the application discloses an influence node identification method and system based on a second-order H index and a voting mechanism. The initial voting capacity of every node is set to the node's SHIKS value, second-order neighbors are taken into account in the voting stage, and in the updating stage the voting capacity of every node in the second-order neighborhood is updated according to the shortest path length between that node and the seed node. The selected seed nodes therefore lie far from one another, the propagation effect is better, and the method and system are more accurate than traditional identification methods.
Drawings
To illustrate the technical solution of the present application more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present application;
FIG. 2 is a diagram illustrating a specific example of a VoteRank algorithm flow according to an embodiment of the present application;
FIG. 3 is a graph of the infection amount F(t) as a function of time t in an embodiment of the present application;
FIG. 4 is a line graph of the final infection scale $F(t_c)$ as a function of the seed node proportion ρ in an embodiment of the present application;
FIG. 5 is a line graph of the average shortest distance $L_s$ between seed nodes as a function of the seed node proportion ρ in an embodiment of the present application;
FIG. 6 is a schematic flow chart of the VoteRank algorithm according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a system according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
As shown in FIG. 1, the influence node identification method based on the second-order H index and the voting mechanism comprises the following steps:
acquiring data to be tested;
preprocessing the data to be tested to obtain a preprocessing result that comprises the importance and the SHIKS value of each node, and performing initialization;
voting based on the initialized result;
screening based on the voting result to obtain a screening result;
and updating based on the screening result, selecting a specified number of seed nodes, and completing node identification.
The data preprocessing method comprises:
calculating the second-order H index of each node in the data to be tested;
and calculating the importance and the SHIKS value of each node based on its second-order H index.
The initialization method comprises:
initializing the voting score and the voting capacity of each node, wherein the voting score is 0 and the voting capacity is the SHIKS value of the node.
The voting method comprises:
the first-order and second-order neighbors both participate in voting, i.e. the voting score is
$$S_u = \sum_{v \in \Gamma_u} va_v$$
where $\Gamma_u$ is the set of first-order and second-order neighbor nodes of node u, the subscript v denotes a node, va denotes voting capacity, and $va_v$ denotes the voting capacity of node v.
The screening method comprises: after the voting is finished, adding the node with the highest voting score to the seed node set and setting both its voting capacity and its voting score to 0, so that the node no longer participates in subsequent voting and cannot be elected a second time.
The updating method comprises: attenuating the voting capacity of the first-order and second-order neighbor nodes of the seed node, wherein the attenuation is determined by the length of the shortest path between the node and the seed node, namely by the attenuation factor
Figure RE-GDA0003765279490000081
where $\langle k \rangle$ denotes the average degree of the network and d denotes the shortest path length between the node and the seed node; through updating, the specified number of seed nodes is selected and node identification is completed.
In this embodiment, the specific operation steps are as follows:
the method comprises the following steps: the degree of a first-order neighbor node and a second-order neighbor node of each node is calculated, and then a second-order H index of each node is obtained according to the concept of the second-order H index.
The moderate concept is as follows:
degree (Degree) is the simplest and most direct index for describing the attribute of a node in a network, and the Degree of a node is the Degree of how many first-order neighbor nodes a node has. Degree of node i degree i Is defined as:
Figure RE-GDA0003765279490000082
The second-order H index is defined as follows:
if the second-order H index of a node is k, then at least k of its first-order and second-order neighbor nodes have a degree of at least k.
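As a minimal sketch of this definition, the Python code below computes the second-order H index on a graph stored as a dictionary of neighbor sets. It assumes the "degree at least k" reading and takes the largest such k. The helper names and the five-node graph are hypothetical; the graph is not the network of FIG. 2.

```python
def two_hop_neighbors(adj, u):
    """Return the first-order and second-order neighbors of u, excluding u."""
    first = set(adj[u])
    second = set()
    for v in first:
        second.update(adj[v])
    return (first | second) - {u}


def second_order_h_index(adj, u):
    """Largest k such that at least k two-hop neighbors of u have degree >= k."""
    degrees = sorted((len(adj[v]) for v in two_hop_neighbors(adj, u)), reverse=True)
    h = 0
    for rank, deg in enumerate(degrees, start=1):
        if deg < rank:
            break
        h = rank
    return h


# Hypothetical undirected graph with edges A-B, A-C, B-C, B-D, C-D, D-E.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
print(second_order_h_index(adj, "A"))  # 3: B, C and D each have degree 3
print(second_order_h_index(adj, "D"))  # 2
```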
Step two: the importance of each node is calculated according to equation 1 and then the SHIKS value of each node is calculated according to equation 2.
The importance calculation method of the node v is shown as formula 2:
Figure RE-GDA0003765279490000091
where $\text{H-index}_2(v)$ denotes the second-order H index of node v, $degree_v$ denotes the degree of node v, $\frac{degree_v}{\sum_{i=1}^{N} degree_i}$ is the ratio of the degree of node v to the sum of the degrees of all nodes in the network, and N denotes the total number of nodes in the network.
The node information entropy of node v is calculated as shown in formula 3:
Figure RE-GDA0003765279490000093
where
Figure RE-GDA0003765279490000094
denotes the set of k-th order neighbor nodes of node v, $Im_p$ denotes the importance of node p, and $Im_q$ denotes the importance of node q.
Step three: an initialization stage: initializing the voting score and the voting capacity of each node, wherein the voting score is 0, and the voting capacity is the SHIKS value of the node, namely (S) u ,va u )= (0,SHIKS(u))。
Step four: a voting stage: the first-order neighbors and the second-order neighbors (if any) of each node participate in the vote, i.e., the vote score is
Figure RE-GDA0003765279490000095
Wherein gamma is u Is a set of first and second order neighbor nodes for node u. Notably, if a node has been selected as a seed node in a previous round, the voting score for that node is set to 0, avoiding being selected twice.
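The voting stage can be sketched in Python as follows, assuming the voting score is the plain sum of the voting capacities over the first-order and second-order neighbors, as the surrounding text describes. The function names, the four-node path graph and the capacity values are hypothetical stand-ins for real SHIKS values.

```python
def two_hop_neighbors(adj, u):
    """Return the first-order and second-order neighbors of u, excluding u."""
    first = set(adj[u])
    second = set()
    for v in first:
        second.update(adj[v])
    return (first | second) - {u}


def voting_scores(adj, ability, seeds=frozenset()):
    """Return S_u for every node; nodes already chosen as seeds score 0."""
    scores = {}
    for u in adj:
        if u in seeds:
            scores[u] = 0.0
        else:
            scores[u] = sum(ability[v] for v in two_hop_neighbors(adj, u))
    return scores


# Hypothetical path graph a-b-c-d with made-up voting capacities.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
ability = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
print(voting_scores(path, ability))
```

In this sketch node b receives the highest score (8.0), because a, c and d all lie within two hops of it.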
Step five: a screening stage: and after each round of voting is finished, counting the voting scores of all the nodes, and selecting the node with the highest voting score to add into the seed node set. Meanwhile, the voting capacity of the node is set to 0, and the node does not participate in subsequent voting.
Step six: and (3) an updating stage: the voting capacity of the first-order and second-order neighbor nodes of the seed node needs to be attenuated, the attenuation is determined by the length of the shortest path between the node and the seed node, namely an attenuation factor
Figure RE-GDA0003765279490000101
where $\langle k \rangle$ denotes the average degree of the network and d denotes the shortest path length between the node and the seed node.
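The updating stage can be sketched as below. Because the attenuation factor is given only as a figure, the sketch takes it as a caller-supplied function `decay(d)`; the values in `example_decay` are hypothetical placeholders that merely respect the stated rule that closer nodes are weakened more. Clamping the capacity at 0 is also an assumption.

```python
def attenuate_neighbors(adj, ability, seed, decay):
    """Weaken the voting capacity around a newly selected seed, in place."""
    first = set(adj[seed])
    second = set()
    for v in first:
        second.update(adj[v])
    second -= first | {seed}  # keep only nodes at distance exactly 2
    ability[seed] = 0.0  # the seed no longer votes
    for d, group in ((1, first), (2, second)):
        for v in group:
            # Clamping at 0 is an assumption; the text does not state it.
            ability[v] = max(0.0, ability[v] - decay(d))
    return ability


def example_decay(d):
    """Hypothetical attenuation values, NOT the formula of the application."""
    return {1: 0.5, 2: 0.25}[d]


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
ability = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
print(attenuate_neighbors(path, ability, "b", example_decay))
```

Passing the attenuation as a function keeps the sketch usable once the actual factor, which depends on the average degree and on d, is known.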
Step seven: repeat steps four to six until the specified number of seed nodes has been selected.
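Steps three to seven can be combined into one loop, sketched below under stated assumptions: the initial voting capacities are passed in (made-up numbers stand in for SHIKS values), the attenuation is a caller-supplied function `decay(d)` with hypothetical values, capacities are clamped at 0, and ties are broken by dictionary order. All function names are illustrative and none come from the application.

```python
def neighbors_by_distance(adj, u):
    """Return (first-order neighbors, nodes at distance exactly 2) of u."""
    first = set(adj[u])
    second = set()
    for v in first:
        second.update(adj[v])
    second -= first | {u}
    return first, second


def shiks_voterank(adj, initial_ability, num_seeds, decay):
    """Select seed nodes by repeated voting, screening and updating."""
    ability = dict(initial_ability)  # step three: va_u starts at the given value
    seeds = []
    while len(seeds) < min(num_seeds, len(adj)):
        # Step four: every non-seed node collects votes from within two hops.
        scores = {}
        for u in adj:
            if u in seeds:
                continue
            first, second = neighbors_by_distance(adj, u)
            scores[u] = sum(ability[v] for v in first | second)
        # Step five: the top-scoring node becomes a seed and stops voting.
        best = max(scores, key=scores.get)
        seeds.append(best)
        ability[best] = 0.0
        # Step six: attenuate nearby nodes according to their distance d.
        first, second = neighbors_by_distance(adj, best)
        for d, group in ((1, first), (2, second)):
            for v in group:
                ability[v] = max(0.0, ability[v] - decay(d))
    return seeds


# Hypothetical path graph a-b-c-d, made-up capacities and attenuation values.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
ability = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
print(shiks_voterank(path, ability, 2, lambda d: {1: 0.5, 2: 0.25}[d]))
```

With these inputs the first round selects b (score 8.0) and the second round selects c (score 4.25 after attenuation).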
To illustrate the idea of the SHIKS-VoteRank algorithm more intuitively, the network shown in FIG. 2 is used to walk through the selection of a group of influential nodes, taking the selection of two nodes as an example.
FIG. 2 shows an example network with 7 nodes, denoted G = (V, E), where V is the set of all nodes in the network and E is the set of edges between nodes. $\text{H-index}_2(v)$ denotes the second-order H index of node v, SHIKS(v) denotes the SHIKS value of node v, $(S_v, va_v)$ denotes the voting score and the voting capacity of node v, f denotes the attenuation factor, $\langle k \rangle$ denotes the average degree of the network, and d denotes the length of the shortest path between node v and the seed node.
The initial voting capacity of each node, i.e. its SHIKS value, is calculated first. Take node A as an example. By the definition of the second-order H index (if the second-order H index of a node is k, then at least k of its first-order and second-order neighbor nodes have a degree of at least k), the second-order H index of node A is 3, i.e. $\text{H-index}_2(A) = 3$. The SHIKS value of node A is then obtained from formula 2 and formula 3.
Figure RE-GDA0003765279490000102
Figure RE-GDA0003765279490000111
Likewise, the SHIKS values of all other nodes can be calculated with the above formulas. Table 1 shows the SHIKS value of each node.
TABLE 1
Figure RE-GDA0003765279490000112
Each node is then initialized: its initial voting capacity is set to its SHIKS value and its voting score to 0, as shown in FIG. 2. After initialization, the voting stage begins. The voting score of each node equals the sum of the voting capacities of its first-order and second-order neighbors.
The voting score of node A can be calculated according to equation 4, where $\Gamma_A$ is the set of first-order and second-order neighbors of node A.
$$S_u = \sum_{v \in \Gamma_u} va_v \qquad (4)$$
where $\Gamma_u$ denotes the set of first-order and second-order neighbor nodes of node u.
Figure RE-GDA0003765279490000114
Likewise, the voting scores of the other nodes can be calculated with this formula, as shown in FIG. 2. The screening stage follows. The voting score of node B is 240.8921, the highest among all nodes, so node B is selected as a seed node. After node B is selected, both its voting capacity and its voting score are set to 0: the voting capacity so that the node does not participate in subsequent voting, and the voting score so that the node cannot be elected again. The updating stage then begins. Since nodes A, C, D, E, F and G voted for node B, their voting capacity must be attenuated, i.e. $va_v = va_v - f$, where the attenuation factor is
Figure RE-GDA0003765279490000121
Taking node A as an example, the new voting capacity of node A is:
Figure RE-GDA0003765279490000122
The voting capacity of each node after the update is shown in FIG. 2. The second round of voting then begins, and a new voting score is obtained for each node according to equation 4, as shown in FIG. 2. The voting score of node C is 385.3528, the highest among all nodes, so node C is selected as a seed node and its voting score and voting capacity are set to 0. The number of seed nodes has now reached two, and the algorithm stops.
Step eight: results and analysis of the experiments
S8.1 Experimental setup
S8.1.1 data set
To evaluate the performance and accuracy of the SHIKS-VoteRank algorithm more realistically, the experiment uses 12 real benchmark network data sets of different scales and characteristics: Jazz, USAir97, Email, Celegansneural, Hamster, Polblogs, Power, Router, Yeast, Facebook, CEnew and USAir2010.
S8.1.2 comparison algorithm
Seven influence node identification algorithms are compared in the experiment: DC, CC, BC, MCDE, H-index, VoteRank and SHIKS.
S8.1.3 evaluation index
In the experiment of this embodiment, three indexes are used to evaluate the accuracy and effectiveness of the algorithm: the infection amount F(t) at each time step, the final infection scale $F(t_c)$, and the average shortest path length $L_s$ between seed nodes.
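The third index can be sketched in Python with breadth-first search. The sketch assumes that $L_s$ is the mean shortest path length over all unordered pairs of seed nodes and that unreachable pairs are skipped; neither detail is stated in the text. The function names and the example graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_seed_distance(adj, seeds):
    """Mean shortest path length over all unordered pairs of seeds."""
    seeds = list(seeds)
    lengths = []
    for i, u in enumerate(seeds):
        dist = bfs_distances(adj, u)
        for v in seeds[i + 1:]:
            if v in dist:  # assumption: unreachable pairs are skipped
                lengths.append(dist[v])
    return sum(lengths) / len(lengths) if lengths else 0.0


# Hypothetical path graph a-b-c-d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(average_seed_distance(path, ["a", "c", "d"]))  # (2 + 3 + 1) / 3 = 2.0
```

A larger value means the seeds are more spread out across the network.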
S8.1.4 Experimental Environment
In this embodiment, all experiments are performed on a 64-bit Windows 11 operating system with an 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz CPU and 16 GB of memory. The algorithms involved are implemented in Python 3.8, and the experimental results are plotted with the Python plotting library Matplotlib.
TABLE 2
Figure RE-GDA0003765279490000131
S8.1.5 parameter setting
To evaluate the propagation capability of the seed nodes mined by the proposed algorithm, all propagation scale comparison experiments in this embodiment use the SIR model. In the SIR model, each node is in one of three states: susceptible, infected or recovered. An infected node infects each susceptible neighbor with probability β and recovers with probability γ. In the experiments of this embodiment, the recovery rate γ is set to 0.01. The choice of infection rate matters: if it is too low, the spread may be weak or may not take place at all; if it is too high, the infection may break out across the whole network, making the influence of individual nodes hard to distinguish. In this embodiment, the infection rate β is therefore set slightly larger than the threshold transmission rate $\lambda_c$,
Figure RE-GDA0003765279490000141
where $\langle k \rangle$ denotes the average degree of the network and $\langle k^2 \rangle$ denotes the average of the squared node degrees in the network. The infection rate β and the threshold transmission rate $\lambda_c$ used for the 12 real networks are shown in Table 2. Because the result of each simulation is subject to some error, the number of simulations in this embodiment is set to 2000, and the average over the 2000 simulations is taken as the final result.
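A discrete-time SIR simulation of the kind described above might look like the following sketch. It assumes synchronous updates, a fixed number of steps instead of running until no infected node remains, and F(t) measured as the share of infected plus recovered nodes. The function names, the step count and the example graph are hypothetical, and the run count is far smaller than the 2000 used in the experiments.

```python
import random


def sir_curve(adj, seeds, beta, gamma, steps, rng):
    """Return F(t) for t = 1..steps: share of infected plus recovered nodes."""
    infected, recovered = set(seeds), set()
    curve = []
    for _ in range(steps):
        # Each infected node tries to infect each susceptible neighbor.
        newly_infected = {
            v
            for u in infected
            for v in adj[u]
            if v not in infected and v not in recovered and rng.random() < beta
        }
        # Nodes infected at the start of the step recover with probability gamma.
        newly_recovered = {u for u in infected if rng.random() < gamma}
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
        curve.append((len(infected) + len(recovered)) / len(adj))
    return curve


def mean_final_scale(adj, seeds, beta, gamma, steps, runs, seed=0):
    """Average the last value of F(t) over several independent runs."""
    rng = random.Random(seed)
    finals = [sir_curve(adj, seeds, beta, gamma, steps, rng)[-1] for _ in range(runs)]
    return sum(finals) / runs


# Hypothetical path graph a-b-c-d with a single seed node.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(mean_final_scale(path, ["a"], beta=0.3, gamma=0.01, steps=50, runs=100))
```

Averaging over many runs smooths out the randomness of individual simulations, which is the reason the text gives for repeating the experiment.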
S8.2 analysis of Experimental results
FIG. 3 shows the infection amount (the proportion of infected and recovered nodes among all nodes in the network) at each time step; the X-axis is the infection time t and the Y-axis is the infection amount F(t). The number of initial nodes in this experiment is set to 20% of the total number of nodes. As the figure shows, in the Celegansneural, CEnew, USAir2010, Facebook, Polblogs and Router networks, the number of infected nodes at each time step under SHIKS-VoteRank is clearly greater than under the comparison algorithms such as SHIKS, VoteRank and H-index; that is, its seed nodes infect more strongly and over a wider range. The SHIKS-VoteRank curve is also steeper than the curves of the other algorithms, which shows that the seed nodes it mines spread faster. In the USAir97 and Power networks, the performance of SHIKS-VoteRank is only slightly lower than that of VoteRank when t < 200 and is essentially equal to that of VoteRank and SHIKS when t > 200. Overall, the SHIKS-VoteRank algorithm performs well in most of the real networks and the seed nodes it mines have better propagation capability, which verifies the accuracy and effectiveness of the algorithm.
FIG. 4 shows the final infection scale $F(t_c)$ as a function of the initial seed node proportion ρ. The X-axis is the seed node proportion, with values 0.04, 0.08, 0.12, 0.16 and 0.20; the Y-axis is the final infection scale. In the Email, Celegansneural, USAir2010, USAir, Facebook, Hamster, Polblogs and Router networks, the SHIKS-VoteRank algorithm outperforms the other algorithms at every seed node proportion. In the Jazz network, the SHIKS-VoteRank algorithm is not stable: when ρ = 0.04 its performance is only slightly higher than that of VoteRank; its performance improves markedly as the seed node proportion increases; when ρ = 0.12 it overtakes H-index; after that, SHIKS-VoteRank and SHIKS alternate in the lead. In the USAir97 and Power networks, when ρ > 0.1 the number of nodes infected under SHIKS-VoteRank exceeds that under SHIKS and VoteRank and remains in the lead thereafter. In the Yeast network, when ρ > 0.14 the performance of SHIKS-VoteRank is higher than that of SHIKS.
Overall, regardless of the seed node proportion, the SHIKS-VoteRank algorithm is superior to the comparison algorithms in final infection scale in most networks.
FIG. 5 is a line graph of the average shortest distance between seed nodes as a function of the seed node proportion. The X-axis is the seed node proportion ρ, with values 0.005, 0.010, 0.015, 0.020, 0.025 and 0.030, and the Y-axis is the average shortest distance $L_s$ between seed nodes. As the figure shows, except in the Jazz and Email networks, the seeds selected by the SHIKS-VoteRank algorithm are spread across the whole network more than those of the other algorithms, i.e. the selected seed nodes are more dispersed, especially in the Celegansneural, USAir97, USAir2010, Power and Router networks. In the CEnew and USAir97 networks, when ρ > 0.08 the performance of the SHIKS-VoteRank algorithm improves markedly and exceeds that of the SHIKS algorithm. In the Facebook, Hamster and Yeast networks, the effect is poor when ρ < 0.12 but improves markedly when ρ > 0.12. Overall, the effect is ordinary when the seed node proportion is low, but the advantages of the algorithm emerge as the proportion increases.
This embodiment first describes the idea and specific steps of the traditional VoteRank algorithm in detail through an example network, analyzes the limitations of VoteRank, and then, to address those limitations, proposes SHIKS-VoteRank, an influence node identification algorithm with higher accuracy. Second, the idea and specific steps of the SHIKS-VoteRank algorithm are described in detail: the algorithm sets the initial voting capacity of every node to the SHIKS value of that node and takes second-order neighbors into account in the voting stage. In addition, in the updating stage, the voting capacity of all nodes in the second-order neighborhood is updated according to the shortest path length between each node and the seed node. Finally, the algorithm is applied to 12 real networks and compared experimentally with SHIKS and other classical influence node identification algorithms. The results show that the seed nodes selected by SHIKS-VoteRank lie far apart from one another, so the algorithm achieves a better propagation effect, which verifies its accuracy and effectiveness.
Example two
As shown in FIG. 7, the present application further provides an influence node identification system based on a second-order H-index and voting mechanism,
the system comprises: a data acquisition module, a data preprocessing module, a voting mechanism module, a voting screening module and a node identification module;
the data acquisition module is used for acquiring data to be tested;
the data preprocessing module is used for preprocessing data based on the data to be tested and acquiring a data preprocessing result, and the result comprises: the importance of the node and the SHIKS value, and initialization is carried out;
the voting mechanism module is used for voting based on the initialized result;
the voting screening module is used for screening based on the voting result to obtain a screening result;
and the node identification module is used for updating based on the screening result, selecting a designated number of seed nodes and finishing node identification.
Specifically, the initialization method in the data initialization module includes:
initializing the voting score and the voting capacity, wherein the voting score is set to 0 and the voting capacity is set to the SHIKS value of the node.
The voting mechanism module carries out voting based on the initialization result to acquire a voting result;
the voting screening module is used for screening based on the voting result to obtain a screening result;
the screening method in the voting screening module comprises the following steps: and after the voting is finished, adding the node with the highest voting score into the seed node set, wherein the voting capacity and the voting score of the node are set to be 0, so that the node does not participate in the subsequent voting any more and can not be elected for the second time.
And the node identification module is used for updating based on the screening result, acquiring the updating result, selecting a specified number of seed nodes and finishing node identification.
The updating method in the node identification module comprises the following steps: the voting capacity of the first-order and second-order neighbor nodes of the seed node needs to be attenuated, the attenuation is determined by the length of the shortest path between the node and the seed node, namely an attenuation factor
Figure RE-GDA0003765279490000171
where <k> denotes the average degree of the nodes and d denotes the shortest path length.
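The module steps above (initialization, voting over first- and second-order neighbors, screening, and distance-dependent updating) can be sketched as follows. This is only an illustration under stated assumptions: the SHIKS values are supplied by the caller, since their computation is not described in this passage; the attenuation factor appears only as a figure in the source, so it is passed in as a caller-supplied function `decay(d, avg_degree)`; and subtracting the factor from the voting capacity, as VoteRank does, is an assumption. All function and parameter names are hypothetical.

```python
def two_hop_neighbourhood(adj, u):
    """Map each first- and second-order neighbour of u to its hop distance."""
    dist = {v: 1 for v in adj[u]}
    for v in adj[u]:
        for w in adj[v]:
            if w != u and w not in dist:
                dist[w] = 2
    return dist


def shiks_voterank_sketch(adj, shiks, num_seeds, decay):
    """Select seeds by voting over first- and second-order neighbours.

    shiks: initial voting capacity per node (assumed to be precomputed).
    decay: hypothetical attenuation function decay(d, avg_degree).
    """
    avg_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
    hood = {u: two_hop_neighbourhood(adj, u) for u in adj}
    capacity = dict(shiks)          # initialization: capacity = SHIKS value
    seeds = []
    while len(seeds) < num_seeds:
        # Voting: each candidate collects the capacity of its 2-hop neighbourhood.
        score = {
            u: sum(capacity[v] for v in hood[u])
            for u in adj if u not in seeds
        }
        if not score:
            break
        # Screening: the top-scoring node becomes a seed and stops voting.
        best = max(score, key=score.get)
        seeds.append(best)
        capacity[best] = 0.0
        # Updating: attenuate neighbours according to their distance d.
        for v, d in hood[best].items():
            capacity[v] = max(0.0, capacity[v] - decay(d, avg_degree))
    return seeds


# Hypothetical toy network (path 0-1-2-3-4) with placeholder SHIKS values.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
placeholder_shiks = {u: 1.0 for u in path}
# Placeholder attenuation, NOT the factor from the figure in the source.
seeds = shiks_voterank_sketch(
    path, placeholder_shiks, 2, decay=lambda d, k: 1.0 / (k * d)
)
```

Passing the attenuation in as a function keeps the sketch independent of the exact factor, so the formula from the figure can be substituted without changing the rest of the loop.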
VoteRank is an algorithm proposed by Zhang et al. that uses a voting mechanism to identify a group of influential nodes in a complex network. VoteRank assigns each node u a tuple (S_u, va_u) that records the voting score and the voting ability of the node after each round of voting, where S_u denotes the voting score of the node and va_u denotes the voting ability of the node. Initially, the voting score and the voting ability of each node are set to 0 and 1, respectively, and the algorithm then enters the voting stage. In the voting stage, the voting score of each node is obtained by summing the voting abilities of its neighbor nodes, and the calculation formula is as follows:
Figure RE-GDA0003765279490000181
where Γ_u denotes the set of neighbor nodes of node u. In the screening stage, the node with the highest voting score is added to the seed node set, and the voting score and the voting ability of that node are set to 0, so that the node is not selected again in subsequent rounds and no longer participates in subsequent voting. Finally, in the updating stage, the voting ability of the neighbor nodes of the seed node is weakened, in each round, by
Figure RE-GDA0003765279490000182
until the voting ability reaches 0, where <k> is the average degree of the network. The process of voting, election, and updating is then repeated until a specified number of seed nodes are elected. To illustrate the idea of the VoteRank algorithm more intuitively, the steps of VoteRank are described in detail using the network shown in FIG. 7.
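The VoteRank procedure described above can be sketched in a few lines. This is a minimal illustration, assuming an undirected network stored as an adjacency dictionary, a decrement of 1/<k> applied to the direct neighbors of each newly elected seed, and ties broken by iteration order; the function name and the toy network are hypothetical.

```python
def voterank_sketch(adj, num_seeds):
    """Elect seeds by repeated voting, election, and updating."""
    avg_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
    ability = {u: 1.0 for u in adj}   # initial voting ability is 1
    elected = set()
    seeds = []
    while len(seeds) < num_seeds:
        # Voting: a node's score is the summed ability of its neighbours.
        score = {
            u: sum(ability[v] for v in adj[u])
            for u in adj if u not in elected
        }
        if not score:
            break
        # Election: the node with the highest score becomes a seed.
        best = max(score, key=score.get)
        seeds.append(best)
        elected.add(best)
        ability[best] = 0.0           # the seed no longer votes
        # Updating: weaken each neighbour by 1/<k>, never below 0.
        for v in adj[best]:
            ability[v] = max(0.0, ability[v] - 1.0 / avg_degree)
    return seeds


# Hypothetical toy network: hub "a" with four leaves, hub "b" with three
# leaves, and an edge between the two hubs.
toy = {
    "a": ["a1", "a2", "a3", "a4", "b"],
    "b": ["b1", "b2", "b3", "a"],
    "a1": ["a"], "a2": ["a"], "a3": ["a"], "a4": ["a"],
    "b1": ["b"], "b2": ["b"], "b3": ["b"],
}
toy_seeds = voterank_sketch(toy, 2)
```

On this toy network, hub "a" is elected first with a score of 5, and hub "b" is elected second because its three leaves still vote with full ability while the leaves of "a" now receive no votes from the elected hub.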
Fig. 6 shows the voting scores and voting ability of the nodes after the first round of voting. Taking node C as an example, the voting score is:
Figure RE-GDA0003765279490000183
The voting scores of the remaining nodes are obtained in the same way. As can be seen in the figure, node C has the highest voting score, and therefore node C is added to the set of seed nodes. Since node C has been elected, it no longer participates in subsequent voting, and its voting score and voting ability are therefore set to 0. Moreover, the voting ability of each of node C's neighbor nodes A, B, D, E, F and G is reduced by
Figure RE-GDA0003765279490000191
The voting ability of each node after the update is shown in FIG. 7. The second round of voting then begins. Taking node A as an example, since the voting ability of node C has been set to 0, the voting score of node A comes only from node B; in the previous round, the voting ability of node B was weakened to 0.5714, so the voting score of node A is 0.5714. The second-round voting scores of the remaining nodes are obtained in the same way. As can be seen in the figure, node H has the highest voting score, 4.1428, so node H is added to the set of seed nodes in the second round. The voting ability of the neighbor nodes of node H is then weakened in turn. The above process is repeated until a sufficient number of seed nodes are selected.
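The quoted value 0.5714 can be checked with a short calculation. The sketch below assumes that node B starts with a voting ability of 1 and is weakened once by 1/<k>, and that node A's only voters are nodes B and C. The average degree <k> = 7/3 is not stated in the text; it is inferred here because it is the value for which 1 - 1/<k> rounds to 0.5714.

```python
from fractions import Fraction

# Assumption: <k> = 7/3 is inferred from the quoted 0.5714, not stated in the text.
avg_degree = Fraction(7, 3)
decrement = 1 / avg_degree               # 3/7, the per-round weakening

ability_b = Fraction(1) - decrement      # node B after the first update: 4/7
ability_c = Fraction(0)                  # node C was elected, so it no longer votes

# Assumption: node A's voters are nodes B and C only.
score_a = ability_b + ability_c

print(round(float(ability_b), 4))        # 0.5714
print(round(float(score_a), 4))          # 0.5714
```

Using exact fractions avoids rounding drift, so the comparison with the four-decimal value in the text depends only on the final rounding step.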
The above-described embodiments are merely illustrative of the preferred embodiments of the present application, and do not limit the scope of the present application, and various modifications and improvements made to the technical solutions of the present application by those skilled in the art without departing from the spirit of the present application should fall within the protection scope defined by the claims of the present application.

Claims (10)

1. The influence node identification method based on the second-order H index and the voting mechanism is characterized in that,
acquiring data to be tested;
based on the data to be tested, data preprocessing is carried out to obtain a data preprocessing result, and the preprocessing result comprises: the importance of the node and the SHIKS value, and initialization is carried out;
voting based on the initialized preprocessing result to obtain a voting result;
screening based on the voting result to obtain a screening result;
and updating based on the screening result, selecting a specified number of seed nodes, and finishing node identification.
2. The method for identifying an influence node based on a second-order H-index and voting mechanism according to claim 1,
the data preprocessing method comprises the following steps:
calculating a second-order H index of each node of the data to be tested based on the obtained data to be tested, and obtaining the second-order H index of each node of the data to be tested;
and calculating the importance and the SHIKS value of each node based on the second-order H index of each node of the data to be tested.
3. The method of claim 2, wherein the influence node identification method based on the second-order H index and voting mechanism,
the initialization method comprises the following steps:
and initializing the voting score and the voting capacity, wherein the voting score is 0, and the voting capacity is the SHIKS value of the node.
4. The method for identifying an influence node based on a second-order H index and voting mechanism according to claim 1,
the voting method comprises the following steps:
the first-order neighbor and the second-order neighbor participate in the voting, i.e. the voting score is
Figure FDA0003699508000000022
wherein Γ_u is the set of first-order and second-order neighbor nodes of node u, the subscript v denotes a node, va denotes voting capability, and va_v denotes the voting capability of node v.
5. The method of claim 4, wherein the influence node identification method based on the second-order H index and voting mechanism,
the screening method comprises the following steps: and after the voting is finished, adding the node with the highest voting score into the seed node set, wherein the voting capacity and the voting score of the node are set to be 0, so that the node does not participate in the subsequent voting any more and can not be elected for the second time.
6. The method of claim 5, wherein the influence node identification method based on the second-order H index and voting mechanism,
the updating method comprises the following steps: the voting capacity of the first-order and second-order neighbor nodes of the seed node needs to be attenuated, the attenuation is determined by the length of the shortest path between the node and the seed node, namely an attenuation factor
Figure FDA0003699508000000021
where <k> denotes the average degree of the nodes and d denotes the shortest path length.
7. The influence node identification system based on the second-order H index and the voting mechanism is characterized in that,
the system comprises: a data acquisition module, a data preprocessing module, a voting mechanism module, a voting screening module and a node identification module;
the data acquisition module is used for acquiring data to be tested;
the data preprocessing module is used for preprocessing data based on the data to be tested and obtaining a data preprocessing result, wherein the preprocessing result comprises: the importance of the node and the SHIKS value, and initialization is carried out;
the voting mechanism module is used for voting based on the initialized preprocessing result to acquire a voting result;
the voting screening module is used for screening based on the voting result to obtain a screening result;
and the node identification module is used for updating based on the screening result, selecting a designated number of seed nodes and finishing node identification.
8. The second order H-exponent and voting mechanism-based influence node identification system of claim 7,
the initialization method in the data initialization module comprises the following steps:
and initializing the voting score and the voting capacity, wherein the voting score is 0, and the voting capacity is the SHIKS value of the node.
9. The second order H-exponent and voting mechanism-based influence node identification system of claim 7,
the screening method in the voting screening module comprises the following steps: and after the voting is finished, adding the node with the highest voting score into the seed node set, wherein the voting capacity and the voting score of the node are set to be 0, so that the node does not participate in the subsequent voting any more and can not be elected for the second time.
10. The second order H-exponent and voting mechanism-based influence node identification system of claim 9,
the updating method comprises the following steps: the voting capacity of the first-order and second-order neighbor nodes of the seed node needs to be attenuated, the attenuation is determined by the length of the shortest path between the node and the seed node, namely an attenuation factor
Figure FDA0003699508000000041
where <k> denotes the average degree of the nodes and d denotes the shortest path length.
CN202210684358.5A 2022-06-17 2022-06-17 Impact node identification method and system based on second-order H index and voting mechanism Active CN114978983B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210684358.5A CN114978983B (en) 2022-06-17 2022-06-17 Impact node identification method and system based on second-order H index and voting mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210684358.5A CN114978983B (en) 2022-06-17 2022-06-17 Impact node identification method and system based on second-order H index and voting mechanism

Publications (2)

Publication Number Publication Date
CN114978983A true CN114978983A (en) 2022-08-30
CN114978983B CN114978983B (en) 2023-10-20

Family

ID=82963280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210684358.5A Active CN114978983B (en) 2022-06-17 2022-06-17 Impact node identification method and system based on second-order H index and voting mechanism

Country Status (1)

Country Link
CN (1) CN114978983B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140249659A1 (en) * 2013-03-04 2014-09-04 Patrick Kehyari Method for Finding Influential Nodes in a Social Network
CN106372743A (en) * 2016-08-23 2017-02-01 浙江工业大学 Second-order local community and common neighbor proportion information-based method for predicting unknown connected edges of network
CN110417591A (en) * 2019-07-23 2019-11-05 中南民族大学 Ballot node configuration method and system
CN113723503A (en) * 2021-08-28 2021-11-30 重庆理工大学 Method for identifying influential propagators in complex network

Also Published As

Publication number Publication date
CN114978983B (en) 2023-10-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant