CN114242261A

CN114242261A - Virus propagation control method based on bounded seepage-greedy algorithm

Info

Publication number: CN114242261A
Application number: CN202111518210.6A
Authority: CN
Inventors: 刘洋; 陈晓祺; 王震; 王茜; 李学龙
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2021-12-10
Filing date: 2021-12-10
Publication date: 2022-03-25

Abstract

The invention provides a virus propagation control method based on a bounded seepage-greedy algorithm. First, information about the target network is extracted to determine its nodes and edge attributes. Then, based on seepage (percolation) theory, specific nodes of the target network are occupied one after another, which realizes the reverse of the process of removing key nodes. At each step the node that minimizes the objective function is occupied, which limits the size of the largest connected component and the degrees of the remaining network nodes and thereby controls virus propagation. A key index threshold is set, and when the key indexes of all candidate nodes exceed it, the threshold is updated so that more candidate nodes can be occupied. Finally, the effect of controlling virus propagation is expressed by the order parameter and the network toughness. The invention decomposes large-scale networks quickly and efficiently, so that virus propagation can be controlled in time and the losses it causes are reduced.

Description

Virus propagation control method based on bounded seepage-greedy algorithm

Technical Field

The invention belongs to the technical field of network information analysis, and particularly relates to a virus propagation control method based on a bounded seepage-greedy algorithm.

Background

Networks can model the interactions inside a complex system: nodes represent the individuals in the system, and edges represent the interactions between them. Networks help in studying the global properties of a system, allow the effects of interventions on the real system to be tested, and can provide optimal solutions for controlling, predicting, optimizing and reconstructing it.

The network decomposition problem is to identify, for a given network, a set of key nodes whose removal decomposes the network as much as possible. Network decomposition reflects real situations and helps analyze them; for example, compared with a diffusion system under the mean-field assumption, a networked diffusion system better represents how a virus propagates. Identifying key nodes and decomposing a virus propagation network helps, to some extent, to answer questions such as: 1) which groups of people should be isolated first when controlling virus transmission; 2) when resources are limited and vaccines are scarce, which groups should be vaccinated first; and 3) which places should receive the most attention. In addition, key nodes dominate the dynamics of the virus propagation system, so identifying them helps locate the sources of virus diffusion and saves computing resources.

The network decomposition problem has been proven to be NP-hard (non-deterministic polynomial-time hard). Existing network decomposition methods fall into the following four categories:
1) Methods based on local information. These methods do not require knowledge of the network topology. Randomly removing nodes to decompose the network is usually not efficient enough. The derived acquaintance algorithm randomly selects a group of nodes and removes one neighbor of each; its efficiency is usually lower than that of centrality-based methods.
2) Methods based on node centrality. Node importance is calculated with indices such as degree centrality, eigenvector centrality, PageRank, betweenness centrality and Katz centrality, and the most important nodes are selected as key nodes. The degree centrality method considers nodes with higher degree to be more important. The eigenvector centrality method considers nodes connected to important nodes to be important as well, so a node's centrality is obtained by summing the centralities of its neighbors.
3) Heuristic methods. After the most important node is removed, the degrees of its former neighbors decrease. Building on centrality-based methods, heuristic methods therefore recalculate node importance in the remaining network after each removal and again remove the most important node.
4) Indirect methods, which can decompose the network more efficiently. Methods based on cycle removal include belief-propagation-guided decimation, min-sum and reverse greedy methods; they achieve network decomposition by solving the feedback vertex set problem that underlies cycle removal. Methods based on graph partitioning perform spectral bisection with an approximation strategy and derive a vertex separator from the minimum vertex cover of the bisection, but because they consider the whole network directly, they are not efficient enough. The FINDER method solves the network decomposition problem with a graph neural network and reinforcement learning; because it is built on a graph neural network, it is in principle a method that considers local network structure.

When the above methods are used for network decomposition, a relatively large number of nodes has to be removed. In practice, and especially in resource-poor areas, the resources for controlling virus propagation are limited: it is not possible to vaccinate, isolate or protect very many people, or to close very many places, so virus propagation becomes difficult to control.

The above methods are also not very efficient, especially on large networks. On large-scale network data, some of them take too long to compute or exceed the memory limit, so the network decomposition problem becomes hard to solve; the graph-partitioning-based GND method is an example. In practice, a large-scale virus outbreak requires taking real-time information from all parties into account and responding quickly with prevention and control measures; the above methods are difficult to apply to large-scale networks and therefore struggle to provide a quick response to virus propagation.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention provides a virus propagation control method based on a bounded seepage-greedy algorithm. First, information about the target network is extracted to determine its nodes and edge attributes. Then, based on seepage (percolation) theory, specific nodes of the target network are occupied one after another, which realizes the reverse of the process of removing key nodes. At each step the node that minimizes the objective function is occupied, which limits the size of the largest connected component and the degrees of the remaining network nodes and thereby controls virus propagation. A key index threshold is set, and when the key indexes of all candidate nodes exceed it, the threshold is updated so that more candidate nodes can be occupied. Finally, the effect of controlling virus propagation is expressed by the order parameter and the network toughness. The invention decomposes large-scale networks quickly and efficiently, so that virus propagation can be controlled in time and the losses it causes are reduced.

A virus propagation control method based on a bounded seepage-greedy algorithm is characterized by comprising the following steps:

Step 1: input data on the population involved in virus propagation, including individual information, the number of individuals, the connections between individuals and the probability of virus transmission between them. Taking the individuals as nodes, the connections between individuals as edges and the transmission probabilities as edge weights, construct the virus propagation network G(N, M) corresponding to the population data, where N is the node set, M is the edge set, and β_vw is the weight of the edge between nodes v and w;
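Step 1 can be sketched in Python as follows, assuming the population data are already available as (v, w, β_vw) contact records. The function name build_network and the dict-of-dicts adjacency representation are illustrative choices, not part of the disclosure.

```python
def build_network(contacts, individuals=()):
    """Build the weighted virus propagation network G(N, M) of step 1.

    contacts: iterable of (v, w, beta_vw) records, one per connection,
              where beta_vw is the transmission probability between v and w.
    individuals: optional extra individuals that have no recorded contacts.
    Returns adj, mapping each node to {neighbour: beta_vw}; the node set N is
    adj.keys() and each undirected edge of M is stored in both directions.
    """
    adj = {v: {} for v in individuals}
    for v, w, beta in contacts:
        if v == w:
            continue  # assumption: self-contacts carry no transmission
        if not 0.0 <= beta <= 1.0:
            raise ValueError(f"beta for ({v}, {w}) is not a probability: {beta}")
        adj.setdefault(v, {})[w] = beta
        adj.setdefault(w, {})[v] = beta
    return adj


adj = build_network([("a", "b", 0.3), ("b", "c", 0.5)], individuals=["d"])
print(sorted(adj))    # → ['a', 'b', 'c', 'd']
print(len(adj["b"]))  # → 2, the degree of node b
```

Storing β_vw on both directions keeps neighbour lookups O(1), which the later steps rely on when scanning the neighbours of an occupied node.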

Step 2: initialize all nodes of the virus propagation network to the unoccupied state, forming the unoccupied node set N_r(t); construct the candidate node set N_c(t), initially an arbitrary subset of the unoccupied node set N_r(t) whose number of nodes y satisfies y ≤ n, where n is the number of nodes in the node set N of the initial network; construct the occupied node set N_o(t), initially empty; initialize all edges to the unoccupied state, forming the unoccupied edge set M_r(t); construct the occupied edge set M_o(t), initially empty. t denotes the time step after virus propagation control starts; initially t = 0 and the occupied node sequence S^r(t) is empty. Set a key index threshold, initially 1, and a temporarily stored threshold value, initially 1;
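The initial state of step 2 can be held in a small record, sketched below under two assumptions: the names PercolationState, init_state and check_partition are hypothetical, and the "arbitrary subset" N_c(0) is drawn at random, which the text permits but does not prescribe. check_partition expresses the invariant that N_r(t) and N_o(t) partition N, and M_r(t) and M_o(t) partition M.

```python
import random
from dataclasses import dataclass, field


@dataclass
class PercolationState:
    """Bookkeeping for steps 2 to 5 (field names are illustrative)."""
    unoccupied_nodes: set                               # N_r(t)
    unoccupied_edges: set                               # M_r(t)
    occupied_nodes: set = field(default_factory=set)    # N_o(t), initially empty
    occupied_edges: set = field(default_factory=set)    # M_o(t), initially empty
    candidates: set = field(default_factory=set)        # N_c(t)
    sequence: list = field(default_factory=list)        # S^r(t), initially empty
    t: int = 0
    threshold: float = 1.0         # key index threshold, initially 1
    stored_threshold: float = 1.0  # temporarily stored threshold, initially 1


def init_state(adj, y, rng=None):
    """Step 2: everything unoccupied; N_c(0) is a random subset of y nodes."""
    n = len(adj)
    if not 0 <= y <= n:
        raise ValueError("y must satisfy 0 <= y <= n")
    rng = rng or random.Random()
    edges = {frozenset((v, w)) for v in adj for w in adj[v]}
    state = PercolationState(unoccupied_nodes=set(adj), unoccupied_edges=edges)
    state.candidates = set(rng.sample(list(adj), y))
    return state


def check_partition(state, adj):
    """True if N_r/N_o partition N and M_r/M_o partition M."""
    all_edges = {frozenset((v, w)) for v in adj for w in adj[v]}
    return (state.unoccupied_nodes.isdisjoint(state.occupied_nodes)
            and state.unoccupied_nodes | state.occupied_nodes == set(adj)
            and state.unoccupied_edges.isdisjoint(state.occupied_edges)
            and state.unoccupied_edges | state.occupied_edges == all_edges)
```

Edges are stored as frozensets so that (v, w) and (w, v) denote the same undirected edge.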

Step 3: at time t, select the node in the candidate node set N_c(t) that minimizes the objective function ψ(u) and convert it to the occupied state; if several nodes minimize ψ(u) simultaneously, one of them is selected at random. The node is then deleted from the unoccupied node set N_r(t) and the candidate node set N_c(t), added to the occupied node set N_o(t), and appended to the end of the occupied node sequence S^r(t). If two adjacent nodes are both in the occupied state, the edge between them is converted to the occupied state: it is deleted from the unoccupied edge set M_r(t) and added to the occupied edge set M_o(t);
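One occupation step can be sketched as below. Because the formula of the objective function ψ(u) is not reproduced in this text, psi is a hypothetical caller-supplied callable; the function name occupy_best and the attribute names of state are likewise illustrative assumptions.

```python
import random
from types import SimpleNamespace


def occupy_best(adj, state, psi, rng=random):
    """Step 3: occupy the candidate node minimising psi(u).

    state needs the attributes candidates, unoccupied_nodes, occupied_nodes,
    unoccupied_edges, occupied_edges (sets) and sequence (a list).
    Raises ValueError if the candidate set is empty.
    """
    scores = {u: psi(u) for u in state.candidates}
    best = min(scores.values())
    ties = [u for u, s in scores.items() if s == best]
    u = rng.choice(ties)                 # random choice among equal minima
    state.unoccupied_nodes.discard(u)    # delete from N_r(t)
    state.candidates.discard(u)          # delete from N_c(t)
    state.occupied_nodes.add(u)          # add to N_o(t)
    state.sequence.append(u)             # append to the end of S^r(t)
    for w in adj[u]:
        if w in state.occupied_nodes:    # both ends occupied -> edge occupied
            e = frozenset((u, w))
            state.unoccupied_edges.discard(e)   # delete from M_r(t)
            state.occupied_edges.add(e)         # add to M_o(t)
    return u


adj = {1: {2: 0.5}, 2: {1: 0.5, 3: 0.2}, 3: {2: 0.2}}
state = SimpleNamespace(candidates={1, 2, 3}, unoccupied_nodes={1, 2, 3},
                        occupied_nodes=set(), occupied_edges=set(),
                        unoccupied_edges={frozenset((1, 2)), frozenset((2, 3))},
                        sequence=[])
occupy_best(adj, state, psi={1: 0.4, 2: 0.9, 3: 0.7}.get)  # occupies node 1
```

Only the edges incident to the newly occupied node can change state, so the edge update scans the neighbours of u rather than the whole edge set.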

Step 4: at time t, if the key index I of every node in the candidate node set N_c(t) exceeds the key index threshold, go to step 5; otherwise set t = t + 1 and return to step 3;

Step 5: if, since the key index threshold was last updated and up to the current time t, at least one node has been selected and converted to the occupied state, the key index threshold is updated to α × min I, the minimum being taken over the key index values of the candidate nodes, and the temporarily stored threshold value is updated to the new threshold; otherwise, the key index threshold is updated to …, and the temporarily stored threshold value is again updated to the new threshold. After the update, check whether t is greater than the number of nodes n: if so, take the occupied node sequence S^r(t) obtained at this time as the final occupied node sequence and go to step 6; otherwise set t = t + 1 and return to step 3. Here α is an update parameter with α > 1;
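The threshold logic of steps 4 and 5 can be sketched as below. The first branch follows the text (α × min I over the candidates); the formula of the "otherwise" branch is not reproduced in this text, so it is passed in as a hypothetical callable otherwise_rule. All function names are illustrative.

```python
def all_candidates_exceed(candidate_indices, threshold):
    """Step 4: True when I(u) exceeds the threshold for every candidate u."""
    return len(candidate_indices) > 0 and all(i > threshold for i in candidate_indices)


def update_threshold(threshold, stored, occupied_since_update,
                     candidate_indices, alpha, otherwise_rule):
    """Step 5: return (new_threshold, new_stored_threshold).

    occupied_since_update: True if at least one node was occupied between the
        previous threshold update and the current time t.
    otherwise_rule: hypothetical callable (threshold, stored) -> new threshold
        standing in for the second branch.
    """
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    if occupied_since_update:
        new_threshold = alpha * min(candidate_indices)
    else:
        new_threshold = otherwise_rule(threshold, stored)
    return new_threshold, new_threshold  # stored value follows the new threshold


def finished(t, n):
    """Stop once t exceeds the number of nodes n; S^r(t) is then final."""
    return t > n
```

Because α > 1, the new threshold α × min I lies above the smallest candidate index, so at least that candidate falls back under the threshold and occupation can continue.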

Step 6: reset all nodes to the unoccupied state and convert them to the occupied state one by one, following the internal order of the occupied node sequence S^r(t); after each state conversion, calculate the order parameter G_a(q). When the order parameter first increases from 0 to a non-zero constant, record the proportion q of unoccupied nodes at that moment as the unoccupied node proportion threshold q_c. q_c is the minimum proportion of nodes that must be removed to control virus propagation: the smaller it is, the smaller the proportion of nodes that needs to be removed. After all nodes in S^r(t) have been converted to the occupied state, calculate the network toughness F, which represents the effect of virus propagation control; the smaller F is, the better the control effect.
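The replay in step 6 can be done incrementally with a union-find structure, as sketched below. The sketch assumes G_a(q) = |c''_max| / n (largest occupied component divided by the number of nodes); the function name order_parameter_curve is hypothetical.

```python
def order_parameter_curve(adj, sequence):
    """Replay the occupied node sequence S^r from an all-unoccupied network.

    Returns [(q, G_a(q)), ...] starting at q = 1, one entry per occupation,
    assuming G_a(q) = |c''_max| / n. Components are merged with union-find
    as each occupied edge appears.
    """
    n = len(adj)
    parent, size = {}, {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    largest = 0
    curve = [(1.0, 0.0)]  # nothing occupied yet: q = 1, G_a = 0
    for k, u in enumerate(sequence, start=1):
        parent[u], size[u] = u, 1          # occupy u
        for w in adj[u]:
            if w in parent:                # w already occupied -> edge (u, w) occupied
                ru, rw = find(u), find(w)
                if ru != rw:               # union by size
                    if size[ru] < size[rw]:
                        ru, rw = rw, ru
                    parent[rw] = ru
                    size[ru] += size[rw]
        largest = max(largest, size[find(u)])
        curve.append(((n - k) / n, largest / n))
    return curve


path = {1: {2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0, 4: 1.0}, 4: {3: 1.0}}
print(order_parameter_curve(path, [1, 3, 2, 4]))
# → [(1.0, 0.0), (0.75, 0.25), (0.5, 0.25), (0.25, 0.75), (0.0, 1.0)]
```

Since occupied components only ever merge, the largest component can be tracked with a running maximum, and the whole replay costs close to O(n + |M|).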

Specifically, the unoccupied node set N_r(t) and the occupied node set N_o(t) described in step 2 are disjoint at every moment, and their union is the node set N; likewise, the unoccupied edge set M_r(t) and the occupied edge set M_o(t) are disjoint at every moment, and their union is the edge set M.

Specifically, the objective function ψ(u) described in step 3 is defined by a formula in which, for a node u in the candidate node set N_c(t), ψ(u) is the objective function value and I(u) is the key index value of node u. The formula also contains a term that is a function of node u; for every node u that satisfies the condition given in the formula, this term is set to the same finite number.

Specifically, the key index I in step 4 is set as the external degree of the node, calculated as:

$$I(u) = \sum_{v \in c(u)} \left( k_v - k'_v \right)$$

where I(u) is the external degree of node u; c(u) is the connected component containing node u once u is converted to the occupied state, and v is a node in c(u); k_v is the degree of node v in the initial network G(N, M), and k'_v is the degree of node v in the occupied network G(N_o(t), M_o(t)), i.e., the network formed at time t by the nodes of the occupied node set N_o(t) and the edges of the occupied edge set M_o(t), connected according to the structure of the initial network G(N, M). Each term k_v − k'_v therefore counts the edges from v to unoccupied nodes, so I(u) is the number of edges leaving c(u). A connected component is a sub-network of the virus propagation network, and the degree of a node is the number of edges attached to it.
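A minimal sketch of the external degree, assuming the formula I(u) = Σ_{v∈c(u)} (k_v − k'_v) and adjacency dictionaries of the form {node: {neighbour: β_vw}}. The component c(u) is collected by a depth-first search restricted to occupied nodes; the function names are hypothetical, and the example graph is made up for illustration (it is not the graph of FIG. 2).

```python
def occupied_component(adj, occupied, u):
    """c(u): nodes reachable from u through occupied nodes once u is occupied."""
    occ = set(occupied) | {u}
    comp, stack = {u}, [u]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in occ and w not in comp:
                comp.add(w)
                stack.append(w)
    return comp, occ


def external_degree(adj, occupied, u):
    """I(u) = sum over v in c(u) of (k_v - k'_v)."""
    comp, occ = occupied_component(adj, occupied, u)
    total = 0
    for v in comp:
        k_v = len(adj[v])                             # degree in G(N, M)
        k_v_occ = sum(1 for w in adj[v] if w in occ)  # degree in occupied network
        total += k_v - k_v_occ
    return total


# Hypothetical graph with edges u-v, v-a, u-b, b-i; v and i are occupied
adj = {"u": {"v": 1.0, "b": 1.0}, "v": {"u": 1.0, "a": 1.0}, "a": {"v": 1.0},
       "b": {"u": 1.0, "i": 1.0}, "i": {"b": 1.0}}
occupied = {"v", "i"}
print(external_degree(adj, occupied, "u"))  # → 2: c(u) = {u, v}, open edges u-b, v-a
print(external_degree(adj, occupied, "b"))  # → 1: c(b) = {b, i}, open edge b-u
```

Node u here would merge into a component with two outgoing edges, so a greedy rule favouring small I(u) would prefer occupying b.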

Specifically, the key index I in step 4 is set as the external propagation probability of the node, calculated as:

$$I(u) = \sum_{v \in c(u)} \sum_{w \in \Gamma(v)} \beta_{vw}$$

where I(u) is the external propagation probability of node u; c(u) is the connected component containing node u once u is converted to the occupied state, and v is a node in c(u); Γ(v) is the set of unoccupied neighbors of node v, and w is a node in Γ(v); β_vw is the weight of the edge between nodes v and w.
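The weighted variant can be sketched the same way, assuming the formula I(u) = Σ_{v∈c(u)} Σ_{w∈Γ(v)} β_vw and the same {node: {neighbour: β_vw}} adjacency; the helper is repeated so the block runs on its own, and the names and example weights are hypothetical.

```python
def occupied_component(adj, occupied, u):
    """c(u): nodes reachable from u through occupied nodes once u is occupied."""
    occ = set(occupied) | {u}
    comp, stack = {u}, [u]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in occ and w not in comp:
                comp.add(w)
                stack.append(w)
    return comp, occ


def external_propagation_probability(adj, occupied, u):
    """I(u): sum of beta_vw over v in c(u) and unoccupied neighbours w of v."""
    comp, occ = occupied_component(adj, occupied, u)
    return sum(beta for v in comp for w, beta in adj[v].items() if w not in occ)


adj = {"u": {"v": 0.5, "b": 0.25}, "v": {"u": 0.5, "a": 0.5}, "a": {"v": 0.5},
       "b": {"u": 0.25, "i": 0.4}, "i": {"b": 0.4}}
print(external_propagation_probability(adj, {"v", "i"}, "u"))  # → 0.75 (u-b plus v-a)
```

With every β_vw equal to 1 this reduces to the external degree, so the two key indexes differ only in how strongly each outgoing contact is weighted.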

Specifically, the internal order of the occupied node sequence S^r(t) obtained in step 6 is the optimal order in which to occupy nodes for the transition from controlled to uncontrolled virus propagation.

Specifically, the order parameter G_a(q) described in step 6 is calculated as:

$$G_a(q) = \frac{|c''_{\max}|}{n}$$

where q is the proportion of unoccupied nodes in the network, n is the number of nodes in the network, c''_max is the largest connected component, i.e., the sub-network with the most nodes when the proportion of unoccupied nodes is q, and |c''_max| is the number of nodes it contains.

Specifically, the calculation formula of the network toughness F in step 6 is as follows:

The beneficial effects of the invention are as follows. By using a percolation process that occupies nodes one by one together with a specifically designed objective function, the method decomposes large-scale networks quickly and with high efficiency, so that virus propagation can be controlled effectively and in time and the resulting losses are reduced. The invention decomposes the network by removing fewer nodes and leaves a smaller scale of virus transmission, so fewer resources are consumed to control transmission; it can therefore control virus transmission when resources are scarce, in principle protects the network better, and saves protection costs. The method has low time and space complexity and high computational efficiency, so it can respond quickly to sudden virus propagation events. It performs well on large-scale network data and is suited to the network decomposition problem in large-scale networks.

Drawings

FIG. 1 is a flow chart of the virus propagation control method based on the bounded seepage-greedy algorithm;

FIG. 2 is a schematic of the external degree of a node of the present invention;

FIG. 3 shows how the order parameter varies with the proportion of unoccupied nodes for different methods on four different networks: (a) the Power network, (b) the loc-Gowalla network, (c) the twitter-L network, and (d) the as-Skitter network;

FIG. 4 is a diagram of the computation times of different methods on different networks.

Detailed Description

The invention is further described below with reference to the drawings and embodiments; the embodiments include, but are not limited to, the following examples.

As shown in FIG. 1, the invention provides a virus propagation control method based on a bounded seepage-greedy algorithm, implemented as follows:

Step 1: input data on the population involved in virus propagation, including individual information, the number of individuals, the connections between individuals and the probability of virus transmission between them. Taking the individuals as nodes, the connections between individuals as edges and the transmission probabilities as edge weights, construct the virus propagation network G(N, M) corresponding to the population data, where N is the node set, M is the edge set, and β_vw is the weight of the edge between nodes v and w.

Step 2: initialize all nodes of the virus propagation network to the unoccupied state, forming the unoccupied node set N_r(t); construct the candidate node set N_c(t), initially an arbitrary subset of the unoccupied node set N_r(t) whose number of nodes y satisfies y ≤ n, where n is the number of nodes in the node set N of the initial network; construct the occupied node set N_o(t), initially empty; initialize all edges to the unoccupied state, forming the unoccupied edge set M_r(t); construct the occupied edge set M_o(t), initially empty. At every moment, the unoccupied node set N_r(t) and the occupied node set N_o(t) are disjoint and their union is the node set N; likewise, the unoccupied edge set M_r(t) and the occupied edge set M_o(t) are disjoint and their union is the edge set M.

t denotes the time step after virus propagation control starts; initially t = 0 and the occupied node sequence S^r(t) is empty. Set a key index threshold, initially 1, and a temporarily stored threshold value, initially 1.

Step 3: at time t, select the node in the candidate node set N_c(t) that minimizes the objective function ψ(u) and convert it to the occupied state; if several nodes minimize ψ(u) simultaneously, one of them is selected at random. The node is then deleted from the unoccupied node set N_r(t) and the candidate node set N_c(t), added to the occupied node set N_o(t), and appended to the end of the occupied node sequence S^r(t). If two adjacent nodes are both in the occupied state, the edge between them is converted to the occupied state: it is deleted from the unoccupied edge set M_r(t) and added to the occupied edge set M_o(t).

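The objective function ψ(u) is defined by a formula in which, for a node u in the candidate node set N_c(t), ψ(u) is the objective function value and I(u) is the key index value of node u. The formula also contains a term that is a function of node u; for every node u that satisfies the condition given in the formula, this term is set to the same finite number.

Step 4: at time t, if the key index I of every node in the candidate node set N_c(t) exceeds the key index threshold, go to step 5; otherwise set t = t + 1 and return to step 3.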

The key index I can be set in either of two ways:

(1) The key index I is set as the external degree of the node, calculated as:

$$I(u) = \sum_{v \in c(u)} \left( k_v - k'_v \right)$$

where I(u) is the external degree of node u; c(u) is the connected component containing node u once u is converted to the occupied state, and v is a node in c(u); k_v is the degree of node v in the initial network G(N, M), and k'_v is the degree of node v in the occupied network G(N_o(t), M_o(t)), i.e., the network formed at time t by the nodes of the occupied node set N_o(t) and the edges of the occupied edge set M_o(t), connected according to the structure of the initial network G(N, M). A connected component is a sub-network of the virus propagation network, and the degree of a node is the number of edges attached to it. As shown in FIG. 2, black solid nodes are occupied nodes, gray solid nodes and hollow nodes are unoccupied nodes, solid lines are occupied edges, dotted lines are unoccupied edges, and the light gray shaded regions are connected components. For example, nodes u, j and w are unoccupied, and nodes v and i are occupied. If the unoccupied node u is converted to the occupied state, its connected component c(u) consists of nodes u and v, and the external degree of node u is calculated with the formula above.

Similarly, if the unoccupied node j is converted to the occupied state, its connected component is c(j), and the external degree of node j is calculated in the same way.

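(2) The key index I is set as the external propagation probability of the node, calculated as:

$$I(u) = \sum_{v \in c(u)} \sum_{w \in \Gamma(v)} \beta_{vw}$$

where I(u) is the external propagation probability of node u; c(u) is the connected component containing node u once u is converted to the occupied state, and v is a node in c(u); Γ(v) is the set of unoccupied neighbors of node v, and w is a node in Γ(v); β_vw is the weight of the edge between nodes v and w.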

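Step 5: if, since the key index threshold was last updated and up to the current time t, at least one node has been selected and converted to the occupied state, the key index threshold is updated to α × min I, the minimum being taken over the key index values of the candidate nodes, and the temporarily stored threshold value is updated to the new threshold; otherwise, the key index threshold is updated to …, and the temporarily stored threshold value is again updated to the new threshold. After the update, check whether t is greater than the number of nodes n: if so, take the occupied node sequence S^r(t) obtained at this time as the final occupied node sequence and go to step 6; otherwise set t = t + 1 and return to step 3. Here α is an update parameter with α > 1.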

Step 6: reset all nodes to the unoccupied state and convert them to the occupied state one by one, following the internal order of the occupied node sequence S^r(t); this internal order is the optimal order in which to occupy nodes for the transition from controlled to uncontrolled virus propagation.

After each node state conversion, calculate the order parameter G_a(q), which represents how far the current network has moved from controlled toward uncontrolled virus propagation: the smaller the order parameter, the greater the extent to which virus propagation is controlled in the current network. The order parameter G_a(q) is calculated as:

$$G_a(q) = \frac{|c''_{\max}|}{n}$$

where q is the proportion of unoccupied nodes in the network, n is the number of nodes in the network, c''_max is the largest connected component, i.e., the sub-network with the most nodes when the proportion of unoccupied nodes is q, and |c''_max| is the number of nodes it contains; the largest connected component represents the size of the infected population spreading the virus.

The process of converting nodes to the occupied state one by one can be regarded as the reverse of the percolation process in which nodes are removed one by one from the complete network: the node selection orders of the two processes are reversed, and so are their effects, with the unoccupied state of a node or edge being the same as the removed state. Nodes come into contact and spread the virus through edges; if some nodes and edges are removed, the current network breaks into several blocks, contact between occupied nodes in different blocks is blocked, and the wide spread of the virus is limited. The proportion q of unoccupied nodes, i.e., the proportion of removed nodes, therefore represents the size of the immunized or isolated population and reflects the strength of the preventive measures.
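The equivalence between occupation and removal can be illustrated with a short sketch: reversing an occupation sequence gives the removal order, and removing a fraction q of the nodes in that order leaves exactly the first (1 − q)·n nodes of the occupation sequence. The function names are hypothetical, and the sequence is assumed to list every node once.

```python
def removal_order(occupation_sequence):
    """The removal order equivalent to an occupation sequence is its reverse."""
    return list(reversed(occupation_sequence))


def present_after_removal(occupation_sequence, q):
    """Nodes still present after removing a fraction q of all nodes.

    Nodes are removed in removal order, i.e. the last-occupied node goes first.
    """
    n = len(occupation_sequence)
    k = round(q * n)  # number of removed (unoccupied) nodes
    return set(occupation_sequence) - set(removal_order(occupation_sequence)[:k])


seq = [3, 1, 4, 2]
print(removal_order(seq))               # → [2, 4, 1, 3]
print(present_after_removal(seq, 0.5))  # → {1, 3}, the first half of seq
```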

When the node state conversion starts, the proportion of unoccupied nodes is q = 1 and the order parameter is 0. As the number of occupied nodes grows, q gradually decreases and the order parameter gradually increases. When the order parameter first increases from 0 to a non-zero constant, a giant connected component emerges and the virus begins to spread widely; the proportion q at that moment is recorded as the unoccupied node proportion threshold q_c. q_c is the minimum proportion of nodes that must be removed to control virus propagation: the smaller q_c, the fewer nodes need to be removed, and the smaller the population that needs to be immunized or isolated.
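In a finite network the order parameter is never exactly zero once a node is occupied, so detecting q_c needs some operational notion of "non-zero". The sketch below assumes a hypothetical tolerance eps and returns the first q, in order of occupation, at which G_a(q) exceeds it; the function name and the choice of tolerance are assumptions, not taken from the text.

```python
def unoccupied_threshold(curve, eps=0.01):
    """Return q_c, the first q (in order of decreasing q) with G_a(q) > eps.

    curve: [(q, G_a(q)), ...] ordered as nodes are occupied (q decreasing).
    eps: hypothetical tolerance standing in for "non-zero" in a finite network.
    Returns None if the order parameter never exceeds eps.
    """
    for q, g in curve:
        if g > eps:
            return q
    return None


curve = [(1.0, 0.0), (0.75, 0.25), (0.5, 0.25), (0.25, 0.75), (0.0, 1.0)]
print(unoccupied_threshold(curve, eps=0.5))  # → 0.25
```

For large networks eps is typically chosen well above 1/n, so that a single occupied node does not count as a spreading cluster.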

After all nodes in the occupied node sequence S^r(t) have been converted to the occupied state, calculate the network toughness F, which represents the effect of virus propagation control; the smaller F is, the better the control effect.
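The formula for F is not reproduced in this text. As a hedged sketch only, the code below assumes F is the mean of the order parameter over the removal fractions q = 1/n, 2/n, ..., 1, which is the form of a widely used robustness measure and is consistent with "smaller F is better"; the function name network_toughness is hypothetical.

```python
def network_toughness(curve):
    """Assumed form of F: mean of G_a(q) over q = 1/n, 2/n, ..., 1.

    curve: [(q, G_a(q)), ...] with n + 1 entries from q = 1 down to q = 0,
    as produced by replaying the full occupied node sequence.
    The q = 0 entry (every node occupied) is excluded.
    """
    n = len(curve) - 1
    if n <= 0:
        raise ValueError("curve must contain at least two points")
    return sum(g for q, g in curve if q > 0) / n


curve = [(1.0, 0.0), (0.75, 0.25), (0.5, 0.25), (0.25, 0.75), (0.0, 1.0)]
print(network_toughness(curve))  # → 0.3125
```

Under this assumption F is the area under the G_a(q) curve, so an occupation order that keeps the largest component small for as long as possible yields a smaller F.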

The calculation formula of the network toughness F is as follows:

To verify the effectiveness of the method of the invention, experiments were carried out on virus propagation networks whose parameters are listed in Table 1.

TABLE 1

Data set          Number of nodes   Number of edges
Yeast             2375              11693
Power             4941              6594
p2p-Gnutella08    6301              20777
CA-AstroPh        18771             198050
Email-Enron       36692             183831
loc-Gowalla       196591            950327
twitter-L         532325            694606
web-Google        875713            4322051
PAroad            1088092           1541898
Flickr            1624991           15473043
as-Skitter        1696415           11095298
LiveJournal       3997962           34681189

Five common network decomposition methods, HD (High Degree), AHD (Adaptive High Degree), AMSRGS (Min-Sum and Reverse-Greedy Strategy), GND (Generalized Network Dismantling) and FINDER (FInding key players in Networks through DEep Reinforcement learning), are compared with the two variants of the invention: BPG-I (Bounded-Percolation Greedy-I, the bounded seepage-greedy method whose key index is the external degree) and BPG-II (Bounded-Percolation Greedy-II, the bounded seepage-greedy method whose key index is the external propagation probability). Table 2 gives the network toughness F obtained by the different methods on the different networks, and Table 3 gives the unoccupied node proportion threshold q_c obtained by the different methods on the different networks.

TABLE 2

TABLE 3

As Table 2 shows, the network toughness F of the BPG-I method is at least 30% lower than that of the HD, AHD and FINDER methods and more than 20% lower than that of the AMSRGS and GND methods, while it is about 5% higher than that of the BPG-II method. Both BPG-I and BPG-II therefore control virus propagation better than the other methods, and BPG-II is slightly better than BPG-I.

As Table 3 shows, the unoccupied node proportion threshold q_c of the BPG-I method is more than 40% lower than that of the HD, AHD, GND and FINDER methods. Compared with the AMSRGS and BPG-II methods, q_c of BPG-I is similar overall, and on the PAroad and as-Skitter networks it is more than 10% lower than that of AMSRGS. Compared with the other methods, the BPG-I and BPG-II methods of the invention need to remove a smaller proportion of nodes to control virus transmission, i.e., a smaller population needs to be immunized or isolated;

In Tables 2 and 3, "-" indicates that the computation time was too long or the memory limit was exceeded. On large networks such as Flickr and LiveJournal, the GND method cannot complete network decomposition because of its computation time and memory requirements, and the AMSRGS and FINDER methods have the same problem on the LiveJournal network. The BPG-I and BPG-II methods can decompose large networks such as Flickr and LiveJournal; compared with the HD and AHD methods, they reduce the network toughness F by more than 10% and the unoccupied node proportion threshold q_c by more than 20%. The invention therefore performs well on large-scale networks.

FIG. 3 shows how the order parameter obtained by the different methods varies with the proportion of removed nodes on the Power, loc-Gowalla, twitter-L and as-Skitter networks. The abscissa q is the proportion of unoccupied nodes in the network and the ordinate G_a(q) is the order parameter. HD is the degree centrality method, AHD the adaptive degree centrality method, FINDER the deep reinforcement learning method for finding key nodes, BPG-I the method using the external degree of a node as the key index, and BPG-II the method using the external propagation probability as the key index.

For a given proportion q of removed nodes, the order parameter G_a(q) obtained by the invention tends to be smaller, i.e., virus propagation is controlled to a greater extent. For example, on the Power network with a removed node proportion of q = 0.03, the order parameter G_a(q) is 0.0820 for the BPG-I method and 0.7758 for the FINDER method. In real terms, when the same proportion of people is immunized or isolated, the invention leaves a smaller infected population and controls virus transmission to a greater degree;

conversely, for a given order parameter G_a(q), the invention requires a smaller proportion q of removed nodes. For example, on the loc-Gowalla network at G_a(q) = 0.01, the removed node proportion is q = 0.1354 for the BPG-I method and q = 0.1919 for the FINDER method. In real terms, the invention achieves the same degree of control over virus transmission while immunizing or isolating a smaller proportion of people.
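As a back-of-the-envelope illustration, the fractions quoted in the two examples above can be converted into node counts using the network sizes in Table 1. This assumes G_a(q) is the size of the largest connected component divided by the number of nodes; the helper name to_count is hypothetical.

```python
def to_count(fraction, n):
    """Convert a node fraction into an approximate node count."""
    return round(fraction * n)


N_POWER, N_GOWALLA = 4941, 196591  # node counts from Table 1

# Power network, q = 0.03 removed (about 148 nodes)
largest_bpg1 = to_count(0.0820, N_POWER)      # about 405 nodes in c''_max
largest_finder = to_count(0.7758, N_POWER)    # about 3833 nodes in c''_max

# loc-Gowalla network, G_a(q) = 0.01
removed_bpg1 = to_count(0.1354, N_GOWALLA)    # about 26618 removed nodes
removed_finder = to_count(0.1919, N_GOWALLA)  # about 37726 removed nodes
saving = 1 - 0.1354 / 0.1919                  # about 0.294, i.e. ~29% fewer removals
print(largest_bpg1, largest_finder, removed_bpg1, removed_finder, round(saving, 3))
```

On these figures, at equal removal effort the largest infected cluster on Power is roughly nine times smaller with BPG-I, and on loc-Gowalla about eleven thousand fewer individuals need to be immunized or isolated for the same G_a.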

FIG. 4 compares the computation time of the different methods on the different networks. The abscissa lists the networks: Yeast, Power, p2p (p2p-Gnutella08), CA (CA-AstroPh), Email (Email-Enron), loc (loc-Gowalla), twitter (twitter-L), web (web-Google), PAroad, Flickr, as (as-Skitter) and live (LiveJournal); the ordinate is the computation time. In the figure, AMSRGS is the min-sum and reverse-greedy method;

Compared with the AMSRGS and FINDER methods, the BPG-I and BPG-II methods of the invention need markedly less computation time. For example, on the as-Skitter network, BPG-I is more than 1500 times faster than AMSRGS and more than 70 times faster than FINDER. This efficiency allows a rapid response to sudden virus propagation events.

In conclusion, the method decomposes the network while removing fewer nodes and leaving a smaller virus propagation scale, so when resources are scarce, using it to control virus propagation can in theory protect the network better. The method has low time and space complexity and high computational efficiency, performs well on large-scale network data, and is suited to the network decomposition problem in large-scale networks.

Claims

1. A virus propagation control method based on a bounded seepage-greedy algorithm is characterized by comprising the following steps:

step 1: input the data on the population involved in virus transmission, including individual information, the number of individuals, the connections between individuals, and the probability of virus transmission between them. Taking the individuals as nodes, the connections between individuals as edges, and the transmission probability between individuals as edge weights, construct the virus spreading network G(N, M) corresponding to the data, where N is the node set, M is the edge set, and β_vw is the weight of the edge between nodes v and w;
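A minimal Python sketch of the network construction in step 1, assuming the population data arrives as (v, w, β_vw) contact records. The function name `build_spread_network` and the record layout are illustrative assumptions, not part of the patent. The network is treated as undirected, and each weight is stored under both orientations so that β_vw and β_wv agree.

```python
def build_spread_network(contacts, individuals=()):
    """Build the virus spreading network G(N, M) of step 1 (illustrative sketch).

    contacts: iterable of (v, w, beta_vw) tuples, one per contact, where
    beta_vw is the probability that the virus passes between v and w.
    individuals: optional extra individuals, so isolated people stay in N.
    """
    nodes = set(individuals)  # node set N
    adj = {}                  # adjacency lists of the undirected network
    beta = {}                 # edge weights beta_vw, stored in both orientations
    for v, w, p in contacts:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"transmission probability out of range: {p}")
        nodes.update((v, w))
        adj.setdefault(v, set()).add(w)
        adj.setdefault(w, set()).add(v)
        beta[(v, w)] = beta[(w, v)] = p
    for u in nodes:
        adj.setdefault(u, set())  # isolated individuals get an empty neighbour set
    edges = {frozenset(pair) for pair in beta}  # edge set M, one entry per contact
    return nodes, edges, adj, beta
```

For instance, two contacts a-b (0.3) and b-c (0.5) plus an isolated individual d give four nodes, two edges, and `beta[("c", "b")] == 0.5`.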

step 2: … the critical index threshold is initially 1, and its temporarily stored value is initially 1;

step 3: at time t, select from the candidate node set N_c(t) the node that minimizes the objective function ψ(u) and convert it to the occupied state; if several nodes minimize ψ(u) simultaneously, randomly select one of them. The node is then deleted from the unoccupied node set N_r(t) and the candidate node set N_c(t), added to the occupied node set N_o(t), and appended to the end of the occupied node sequence S^r(t). If two adjacent nodes are both in the occupied state, the edge between them is converted to the occupied state: it is deleted from the unoccupied edge set M_r(t) and added to the occupied edge set M_o(t);
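The occupation in step 3 can be sketched as follows. Because the formula of ψ(u) does not survive in this text, the sketch takes ψ as a caller-supplied function. The name `occupy_one`, the set-based state, and the use of the original node degree as a stand-in ψ in the usage note are hypothetical choices for illustration.

```python
import random

def occupy_one(adj, N_r, N_c, N_o, S_r, M_r, M_o, psi, rng=random):
    """Step 3 sketch: occupy the candidate minimising psi(u), updating sets in place.

    psi is supplied by the caller (assumption: the patent's psi is not reproduced).
    Edges are frozensets of their two endpoints.
    """
    scores = {u: psi(u) for u in N_c}
    best = min(scores.values())
    ties = sorted(u for u, s in scores.items() if s == best)  # sorted for reproducibility
    u = rng.choice(ties)          # random tie-break among minimisers
    N_r.discard(u)
    N_c.discard(u)
    N_o.add(u)
    S_r.append(u)                 # append to the end of the sequence S^r(t)
    for w in adj[u]:
        if w in N_o:              # both endpoints occupied -> the edge becomes occupied
            edge = frozenset((u, w))
            M_r.discard(edge)
            M_o.add(edge)
    return u
```

For example, on the path a-b-c with all three nodes as candidates and `psi = lambda u: len(adj[u])`, the first call occupies a or c at random and occupies no edge yet; an edge is occupied only once both of its endpoints are.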

step 5: if … since the critical index threshold was last updated, the threshold is updated to α × min I, and the temporarily stored value is then updated to the new value; otherwise, the critical index threshold is updated to …, and the temporarily stored value is again updated to the new value;
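Step 5 is only partly legible here. One possible reading, consistent with the abstract (raise the threshold when the critical indexes of all candidates exceed it) and with the visible update α × min I, is sketched below. The candidate rule I(u) ≤ threshold, the name `select_candidates`, and the requirement α ≥ 1 are assumptions; the "otherwise" branch and the temporarily stored value are not modeled.

```python
def select_candidates(N_r, I, threshold, alpha):
    """Hypothetical bounded candidate selection around step 5.

    Assumes a node is a candidate when its critical index I(u) does not exceed
    the threshold, that indices are non-negative, and that alpha >= 1.
    Returns (candidate set, possibly raised threshold).
    """
    if not N_r:
        return set(), threshold
    scores = {u: I(u) for u in N_r}
    candidates = {u for u, s in scores.items() if s <= threshold}
    if not candidates:
        # Every index exceeds the threshold: raise it to alpha * min I.
        # With alpha >= 1 and non-negative indices this admits at least the
        # node(s) with the smallest index.
        threshold = alpha * min(scores.values())
        candidates = {u for u, s in scores.items() if s <= threshold}
    return candidates, threshold
```

With indices a = 3, b = 5, c = 4, a threshold of 1 and α = 1.5, no node qualifies at first, so the threshold rises to 4.5 and a and c become candidates.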

step 6: return all nodes to the unoccupied state, then convert them to the occupied state one by one in the order of the occupied node sequence S^r(t), computing the order parameter G_a(q) after each state conversion. When the order parameter first rises from 0 to a non-zero constant, the current unoccupied node proportion q is recorded as the unoccupied node proportion threshold q_c. The threshold q_c is the minimum proportion of nodes that must be removed to control virus propagation; the smaller it is, the fewer nodes need to be removed. When all nodes in the occupied node sequence S^r(t) have been converted to the occupied state, compute the network toughness F, which expresses the effect of controlling virus propagation; the smaller F is, the better the control.
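Step 6 can be sketched with a union-find structure that replays S^r(t) and records (q, G_a(q)) after each occupation. The sketch assumes G_a(q) = |c_max| / |N|, the fraction of all nodes in the largest connected component. In a finite network G_a(q) is never exactly 0 once a node is occupied, so the sketch reports q_c at the largest single-step jump of G_a, a common finite-size proxy that is an assumption rather than the patent's wording. The network toughness F is not computed, and `replay_sequence` is a hypothetical name.

```python
def replay_sequence(adj, S_r):
    """Step 6 sketch: re-occupy nodes in the order S_r and trace G_a(q).

    Assumes G_a(q) = |c_max| / |N|; uses union-find over occupied nodes.
    Returns the list of (q, G_a(q)) points and a finite-size estimate of q_c.
    """
    n = len(adj)
    parent, size = {}, {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    largest, curve = 0, []
    for k, u in enumerate(S_r, start=1):
        parent[u], size[u] = u, 1
        for w in adj[u]:
            if w in parent:  # w already occupied, so edge (u, w) becomes occupied
                ru, rw = find(u), find(w)
                if ru != rw:
                    if size[ru] < size[rw]:
                        ru, rw = rw, ru
                    parent[rw] = ru          # union by size
                    size[ru] += size[rw]
        largest = max(largest, size[find(u)])      # only u's component can have grown
        curve.append(((n - k) / n, largest / n))   # q = proportion still unoccupied
    if not curve:
        return curve, None
    # Finite-size proxy for q_c: q at the largest single-step jump of G_a.
    prev = [0.0] + [g for _, g in curve[:-1]]
    jumps = [g - p for (_, g), p in zip(curve, prev)]
    i_c = max(range(len(jumps)), key=jumps.__getitem__)
    return curve, curve[i_c][0]
```

On the path a-b-c-d replayed in the order a, c, b, d, the curve is (0.75, 0.25), (0.5, 0.25), (0.25, 0.75), (0.0, 1.0), and the largest jump places q_c at 0.25.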

2. The virus propagation control method based on the bounded seepage-greedy algorithm as claimed in claim 1, wherein: the unoccupied node set N_r(t) and the occupied node set N_o(t) described in step 2 have, at any moment, an empty intersection, and their union is the node set N; likewise, the unoccupied edge set M_r(t) and the occupied edge set M_o(t) have, at any moment, an empty intersection, and their union is the edge set M.
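The invariant of claim 2 can be checked directly with Python sets. A small sketch, in which `check_partition` is a hypothetical helper and edges are represented as frozensets of their two endpoints:

```python
def check_partition(N, M, N_r, N_o, M_r, M_o):
    """Claim 2 invariant: the occupied and unoccupied sets partition N and M."""
    nodes_ok = not (N_r & N_o) and (N_r | N_o) == N   # disjoint, covering N
    edges_ok = not (M_r & M_o) and (M_r | M_o) == M   # disjoint, covering M
    return nodes_ok and edges_ok
```

Calling it after every occupation step is a cheap way to catch a node or edge that was added to an occupied set without being removed from the matching unoccupied set.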

3. The virus propagation control method based on the bounded seepage-greedy algorithm as claimed in claim 1, wherein: the objective function ψ(u) described in step 3 is set to: …, where … is a function of node u, and for any node u satisfying …, … is set to an equal finite number.

4. The virus propagation control method based on the bounded seepage-greedy algorithm as claimed in claim 1, wherein: the critical index I in step 4 is set to the external degree of the node, calculated as:

I(u) = Σ_{v∈c(u)} (k_v − k'_v)

where I(u) is the external degree of node u; c(u) is the connected component containing node u after u is converted to the occupied state, and v denotes a node in c(u); k_v is the degree of node v in the initial network G(N, M), and k'_v is the degree of node v in the occupied network G(N_o(t), M_o(t)). The occupied network G(N_o(t), M_o(t)) is the network formed at time t by the nodes in the occupied node set N_o(t) and the edges in the occupied edge set M_o(t), connected according to the structure of the initial network G(N, M). A connected component is a sub-network of the virus spreading network, and the degree of a node is the number of edges connected to it.
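Reading the symbols of claim 4 as I(u) = Σ_{v∈c(u)} (k_v − k'_v) (an assumption, since the formula itself is an image missing from this text), the external degree counts the edges that leave the component c(u) toward unoccupied nodes. A minimal breadth-first sketch, evaluated as if u were already occupied; `external_degree` is a hypothetical name.

```python
from collections import deque

def external_degree(adj, occupied, u):
    """Sketch of I(u) = sum over v in c(u) of (k_v - k'_v), with u treated as occupied."""
    occ = set(occupied) | {u}
    seen = {u}
    queue = deque([u])
    total = 0
    while queue:  # breadth-first search over the occupied network from u
        v = queue.popleft()
        k_v = len(adj[v])                           # degree in the initial network G(N, M)
        k_occ = sum(1 for w in adj[v] if w in occ)  # degree k'_v in the occupied network
        total += k_v - k_occ                        # edges from v to unoccupied nodes
        for w in adj[v]:
            if w in occ and w not in seen:
                seen.add(w)
                queue.append(w)
    return total
```

On the path a-b-c-d-e with a and b occupied, occupying c would give I(c) = 1 (only the edge c-d leaves the component {a, b, c}), while I(d) = 2.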

5. The virus propagation control method based on the bounded seepage-greedy algorithm as claimed in claim 1, wherein: the critical index I in step 4 is set to the external propagation probability of the node, calculated as follows:


6. The virus propagation control method based on the bounded seepage-greedy algorithm as claimed in claim 1, wherein: the internal order of the occupied node sequence S^r(t) used in step 6 is the optimal order in which nodes are occupied as virus propagation passes from controlled to uncontrolled.

7. The virus propagation control method based on the bounded seepage-greedy algorithm as claimed in claim 1, wherein: the order parameter G_a(q) described in step 6 is calculated as:

G_a(q) = |c_max| / |N|

where q is the proportion of unoccupied nodes in the network; c_max is the maximum connected component, i.e. the sub-network with the largest number of nodes when the proportion of unoccupied nodes in the network is q; |c_max| is the number of nodes contained in the maximum connected component; and |N| is the total number of nodes in the network.
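Assuming the order parameter is the fraction of all nodes in the largest connected component, G_a(q) = |c_max| / |N|, it can be computed for a given set of occupied nodes with a breadth-first search restricted to the occupied network. A minimal sketch; `order_parameter` is a hypothetical helper.

```python
from collections import deque

def order_parameter(adj, occupied):
    """Return (q, G_a(q)) for the network induced by the occupied nodes."""
    occupied = set(occupied)
    n = len(adj)
    seen, largest = set(), 0
    for s in occupied:
        if s in seen:
            continue
        seen.add(s)
        queue, comp = deque([s]), 0
        while queue:  # breadth-first search over occupied nodes only
            v = queue.popleft()
            comp += 1
            for w in adj[v]:
                if w in occupied and w not in seen:
                    seen.add(w)
                    queue.append(w)
        largest = max(largest, comp)  # size of the largest component so far
    return (n - len(occupied)) / n, largest / n
```

For the path a-b-c-d with a, b and d occupied, the components are {a, b} and {d}, so it returns (0.25, 0.5).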

8. The virus propagation control method based on the bounded seepage-greedy algorithm as claimed in claim 1, wherein: the calculation formula of the network toughness F in the step 6 is as follows:
