CN116318831A

CN116318831A - Botnet propagation prediction method based on network security knowledge graph

Info

Publication number: CN116318831A
Application number: CN202310054002.8A
Authority: CN
Inventors: 张静; 张海霞; 彭媛媛; 连一峰; �田润
Original assignee: Institute of Software of CAS
Current assignee: Institute of Software of CAS
Priority date: 2023-02-03
Filing date: 2023-02-03
Publication date: 2023-06-23

Abstract

The invention discloses a botnet propagation prediction method based on a network security knowledge graph, comprising the following steps: 1) constructing a network security knowledge graph based on network security data, and marking on it the nodes related to each botnet event in historical network security data; 2) calculating the state of each node in the network security knowledge graph, and generating a botnet propagation topological graph of the corresponding botnet according to the state of each node in each botnet event; 3) layering the nodes in the botnet propagation topological graph of the target botnet, and setting a corresponding influence value Ks for the nodes in each layer; 4) constructing a propagation model of the target botnet according to the states of all nodes in the target botnet and the corresponding influence values Ks; 5) generating a dynamic topological graph of the target botnet's propagation process according to the propagation model, and predicting the propagation of the target botnet at the next moment.

Description

Botnet propagation prediction method based on network security knowledge graph

Technical Field

The invention relates to the field of network security, in particular to a botnet propagation prediction method based on a network security knowledge graph.

Background

Botnets are one of the major security threats to the internet. An attacking organization or individual can spread bot programs in various ways to infect large numbers of hosts on the internet, and can issue operation instructions to the infected hosts through a single control channel, directing them to continue spreading the bot program or to wait as compromised hosts ("zombies") that take part in launching other types of attacks. This forms a one-to-one, one-to-many, or many-to-one control network between the controller and the infected hosts. Under the attacker's control, most botnets can keep spreading through bot programs, with controlled ends connecting to one another and more hosts being infected, so that the botnet grows continuously. Once an attacking organization or individual, acting as the botnet controller, owns a botnet of a certain scale, the resources it controls can be used to mount destructive attacks on the network, or can be rented out or offered as a service for economic gain, endangering the security of cyberspace.

A large body of literature studies botnet propagation mechanisms. Most of this work divides all hosts into two classes, infected hosts and susceptible hosts, and studies the propagation mechanism on that basis. In an actual bot program propagation process, however, a susceptible host that becomes infected does not always remain in the infected state. Moreover, existing botnet propagation models do not relate a single botnet attack event to earlier network security events, and do not consider the various network resource elements associated with a host when calculating the propagation capability of a bot host.

Disclosure of Invention

In view of these problems, the invention aims to provide a botnet propagation prediction method based on a network security knowledge graph. Based on the network security knowledge graph, the method layers bot host nodes with different propagation capabilities and calculates the infection capability of each bot host node from its associated cyberspace entity nodes, thereby forming a dynamic topological graph that describes the botnet propagation process. To observe the dynamic propagation of botnet behavior accurately, the method first constructs a knowledge graph from multi-source heterogeneous network security data (known historical network security event data, including botnet event data that has already been identified), marks the entity nodes related to known botnet events on the knowledge graph, calculates the infection capability of each node in the graph using the K-shell algorithm and the SIR epidemic model, and constructs a propagation model of the botnet, from which a dynamic topological graph describing the botnet propagation process is formed. The method mainly comprises the following steps:

Step one: construct a network security knowledge graph based on network security data.

To fully mine the connections between network security entities and perform association analysis on them, the network security ontology and the relationships between ontology classes are defined from the perspective of network security protection based on network security data, and the network security knowledge graph is then constructed.

Step two: identify the states of the host IPs involved in the botnet based on the SIR model.

The botnet is placed among global network security events for association analysis. To obtain the propagation model of the botnet, the state of each host IP in the botnet (susceptible, infected, or repaired after infection) is identified in the knowledge graph, and the topological graph of the botnet is identified among massive network security events.

Step three: layer the IP nodes using the K-shell algorithm.

The traditional SIR model only calculates the propagation capability of nodes in the network and does not judge a node's influence from the perspective of the global topology, which is a limitation. To judge the importance of the IP nodes in the knowledge graph network, the K-shell method is used to perform a coarse-grained decomposition of the global topology of the botnet, and an influence value Ks is assigned to the nodes of each layer.

Step four: calculate the infection capability of each node in the graph.

After the bot host IP nodes are layered, the infection capability is calculated for host nodes at the different layers. Because a network security knowledge graph has been constructed, a large amount of information associated with the botnet is available. The infection capability of each bot host IP may be related to the hidden dangers and vulnerabilities of its corresponding system, and is also related to the infection capability of its neighbor nodes. In this step, the infection capability of each bot host IP is calculated from this information, forming a dynamic topological graph of the botnet propagation process as it changes with time t.
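As a point of reference for the SIR model named in steps two and four, the following minimal Python sketch integrates the classical aggregate SIR dynamics with a forward-Euler step. It is illustrative only: the function names, the use of population fractions, the step size and the parameter values are assumptions and are not taken from the invention.

```python
def sir_step(s, i, r, beta, gamma, dt=1.0):
    """Advance the aggregate SIR state by one forward-Euler step.

    s, i, r are population fractions (assumed to sum to 1).
    beta is the infection rate and gamma the repair rate.
    """
    new_infections = beta * s * i * dt
    new_repairs = gamma * i * dt
    return s - new_infections, i + new_infections - new_repairs, r + new_repairs


def simulate(s0, i0, r0, beta, gamma, steps, dt=1.0):
    """Return the list of (S, I, R) states from t = 0 to t = steps * dt."""
    trajectory = [(s0, i0, r0)]
    for _ in range(steps):
        trajectory.append(sir_step(*trajectory[-1], beta, gamma, dt))
    return trajectory


# Hypothetical parameter values chosen only to show the call pattern.
trajectory = simulate(0.99, 0.01, 0.0, beta=0.5, gamma=0.1, steps=50)
print(trajectory[-1])
```

The aggregate model treats every host identically, which is the limitation that the layering in step three and the per-node calculation in step four are meant to address.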

In summary, the technical scheme of the invention is as follows:

A botnet propagation prediction method based on a network security knowledge graph, comprising the following steps:

1) Constructing a network security knowledge graph based on network security data, and marking nodes related to each botnet event in historical network security data on the network security knowledge graph;

2) Calculating the state of each node in the network security knowledge graph, and generating a botnet propagation topological graph of the corresponding botnet according to the state of each node in each botnet event;

3) Layering nodes in the botnet propagation topological graph of the target botnet, and setting a corresponding influence value Ks for the nodes in each layer;

4) Constructing a propagation model of the target botnet according to the states of all nodes in the target botnet and the corresponding influence value Ks;

5) Generating a dynamic topological graph of the target botnet's propagation process according to the propagation model, and predicting the propagation of the target botnet at the next moment.

Further, the state of each node in the network security knowledge graph is calculated based on the SIR model; the states are: susceptible, infected, and repaired after infection.

Further, the nodes in the botnet propagation topological graph are layered using the K-shell algorithm: first, nodes with degree 1 in the botnet propagation topological graph and their connecting edges are deleted, all deleted nodes are classified into the 1-shell layer, and they are assigned a Ks value of 1; next, nodes with degree 2 and their connecting edges are deleted, the deleted nodes are classified into the 2-shell layer, and they are assigned a Ks value of 2; the node degree is increased in turn, and nodes of the corresponding degree are deleted and assigned the corresponding Ks value, until all nodes in the botnet propagation topological graph have been layered and assigned Ks values. The layer containing the nodes with the largest Ks value is the core layer of the network, and nodes in the core layer have the greatest influence.

Further, the propagation model is

[formula not reproduced in the source text]

where the terms of the formula denote, respectively, the infection capability of node i itself at time t and the infection capability of a neighbor node j of node i at time t; L_ij is the relation level between node i and node j, and if the level difference between node i and node j is m, then L_ij = m; K_i is the Ks value of node i, and K_j is the Ks value of node j; β_i is the capability of node i to infect other nodes.

Further, [condition formula not reproduced in the source text], where M is a set maximum value; if the value of L_ij is greater than M, that L_ij is discarded.

Further, the network security data includes, but is not limited to: botnet data, phishing data, website Trojan network data.

A server comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the above method.

A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor realizes the steps of the above method.

Compared with the prior art, the invention has the following positive effects:

(1) Bot host nodes with different propagation capabilities are ranked by importance, so that the influential nodes in the network topology can be identified;

(2) The infection capability of a bot host is tied to the cyberspace entity nodes associated with it, such as host-related vulnerabilities and the propagation capability of neighbor nodes. Calculating the infection capability of each bot host node on this basis better matches the propagation capability of botnet hosts in actual events.

Drawings

Fig. 1 is a flow chart of the botnet propagation model construction method based on a network security knowledge graph.

Fig. 2 is a schema diagram of the network security knowledge graph ontology.

Fig. 3 is a layering diagram of bot host IP nodes based on the K-shell algorithm.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in further detail below.

Step one: construct a network security knowledge graph based on known network security event data.

To fully mine the connections between network security events and to enable association analysis of the entities involved in various network security events, a network security knowledge graph is constructed based on network security data. First, network security knowledge is analyzed and the ontology of the network security knowledge graph is built, as shown in Fig. 2. The entity-relationship triples comprise:

("System", "belonging unit", "Unit")

("System", "hidden danger")

("System", "vulnerability exists", "vulnerability")

("System Domain name", "associated URL", "System URL")

("IP", "belonging unit", "Unit")

("System URL", "IP Address", "IP")

("IP", "connection", "IP")

("System", "System Domain name")

("System", "IP Address", "IP")

("System Domain name", "IP Address", "IP")

Here, a "connection" relation between "IP" entities indicates that there is a wired or wireless connection between the two IPs.

Based on the ontology structure diagram, triples can be extracted from multi-source heterogeneous network security event data to construct the instance-layer network security knowledge graph.
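The instance layer can be pictured as a set of (head, relation, tail) triples indexed by head entity. The sketch below is a minimal Python illustration under stated assumptions: the sample entities (`sys-A`, `vuln-1`, `unit-B`, the IP addresses) and the helper names `build_graph` and `tails` are hypothetical, and only the relation names come from the ontology listed above.

```python
from collections import defaultdict

# Hypothetical instance-layer triples; entity values are made up for illustration.
TRIPLES = [
    ("sys-A", "IP Address", "10.0.0.1"),
    ("sys-A", "vulnerability exists", "vuln-1"),
    ("10.0.0.1", "connection", "10.0.0.2"),
    ("10.0.0.2", "belonging unit", "unit-B"),
]


def build_graph(triples):
    """Index triples by head entity: head -> list of (relation, tail)."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def tails(graph, entity, relation):
    """Return every tail entity linked to `entity` by `relation`."""
    return [tail for rel, tail in graph.get(entity, []) if rel == relation]


graph = build_graph(TRIPLES)
print(tails(graph, "10.0.0.1", "connection"))
```

A production system would more likely use a graph database; a dictionary index is used here only to keep the example self-contained.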

Step two: identify the states of the host IPs involved in the botnet based on the SIR model.

For a botnet event that has already occurred, only some of the botnet's IP nodes may currently be known. To obtain the propagation process of the botnet, the IP nodes related to the botnet are marked in the knowledge graph. Specifically, the states of IP nodes are divided into the following three categories:

1) S state: a susceptible host IP.

2) I state: a host IP that has been infected.

3) R state: a host IP that was infected and has since been restored to a normal host by some measure.

Marking the IP addresses related to the botnet with these three states yields the propagation topological graph of the current botnet.
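One way to read this marking step is as a filter over the "connection" edges of the knowledge graph: only edges whose two endpoints both carry a botnet state are kept. The Python sketch below illustrates that reading; the `State` enum, the function name and the sample addresses are assumptions made for the example.

```python
from enum import Enum


class State(Enum):
    S = "susceptible"
    I = "infected"
    R = "repaired"


def propagation_topology(connections, states):
    """Keep only the connection edges whose two endpoints are both marked."""
    marked = set(states)
    return [(a, b) for a, b in connections if a in marked and b in marked]


# Hypothetical "connection" edges and state labels.
connections = [
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.2", "10.0.0.3"),
    ("10.0.0.3", "10.0.0.9"),  # 10.0.0.9 is not marked as part of the botnet
]
states = {"10.0.0.1": State.I, "10.0.0.2": State.S, "10.0.0.3": State.R}

topology = propagation_topology(connections, states)
print(topology)
```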

Step three: layer the IP nodes using the K-shell algorithm.

To judge the importance of the IP nodes in the knowledge graph network, the K-shell method is used to perform a coarse-grained decomposition of the botnet propagation topology, and Ks values are assigned to the nodes of the different layers. First, nodes with degree 1 in the botnet and their connecting edges are deleted; all deleted nodes are classified into the 1-shell layer and assigned a Ks value of 1. Next, nodes with degree 2 and their connecting edges are deleted; the deleted nodes are classified into the 2-shell layer and assigned a Ks value of 2. This process is repeated until all nodes in the botnet have been layered and assigned Ks values. The layer containing the nodes with the largest Ks value is the core layer of the network, and nodes in the core layer have the greatest influence.

Specifically, the K-shell decomposition of the nodes in a botnet is shown in Fig. 3. The controlled host IP nodes involved in the botnet are divided into different K layers; controlled host IP nodes that are connected to more other nodes have higher Ks values and, correspondingly, higher importance.
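The layering can be sketched in Python as follows. This is the usual formulation of K-shell decomposition, in which pruning at level k is repeated until no node of remaining degree k or lower is left before moving to k + 1; that re-checking, the function name `k_shell`, the placement of isolated nodes in the 1-shell, and the sample graph are assumptions of this sketch rather than details stated in the text.

```python
def k_shell(adjacency):
    """Return {node: Ks} for an undirected graph given as {node: set(neighbors)}.

    Assumption: isolated nodes (degree 0) are grouped into the 1-shell,
    because the description starts the layering at degree 1.
    """
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    ks = {}
    k = 1
    while remaining:
        # Nodes whose remaining degree has dropped to k or below are peeled.
        peel = [node for node in remaining if degree[node] <= k]
        if not peel:
            k += 1  # nothing left at this level, move to the next shell
            continue
        for node in peel:
            ks[node] = k
            remaining.discard(node)
            for nbr in adjacency[node]:
                if nbr in remaining:
                    degree[nbr] -= 1
    return ks


# Hypothetical topology: a triangle a-b-c with a tail a-d-e.
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
print(k_shell(adjacency))
```

In this example node `d` starts with degree 2 but falls into the 1-shell once `e` is removed, which is why the pruning at each level is repeated before k is increased.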

Step four: calculate the infection capability of each node in the graph.

In the SIR model, the propagation path is S→I→R: a susceptible host IP is infected by an infected host IP with infection probability β, and a host that has been repaired is regarded as not being infected again. Let S(t) denote the number of susceptible hosts among all IP nodes in the knowledge graph at time t, I(t) the number of hosts infected by the botnet at time t, and R(t) the number of host IPs that were previously infected by the botnet and have been repaired by time t. The propagation dynamics can then be expressed by the differential equations:

$$\frac{dS(t)}{dt} = -\beta S(t) I(t), \qquad \frac{dI(t)}{dt} = \beta S(t) I(t) - \gamma I(t), \qquad \frac{dR(t)}{dt} = \gamma I(t)$$

where β is the probability that a susceptible host IP is infected and γ is the probability that an infected bot host IP is restored to a normal host IP by some measure. The first equation shows that S(t) is monotonically decreasing, with βS(t)I(t) the number of newly infected hosts at time t; the second equation shows that the change in the number of infected hosts at time t is the number of newly infected hosts minus the number of newly repaired hosts; the third equation gives the number of newly repaired hosts at time t. To calculate more accurately the capability of each bot host in the botnet to infect other hosts, the K-shell layer, the infection capability of the bot host IP itself, and the infection capability of its neighbor nodes must be considered together, giving a more accurate propagation model. Specifically:

1) Host IP nodes in the same K layer have the same Ks value, meaning that the nodes of that layer have the same importance in the botnet;

2) The infection capability of a bot host's neighbor nodes is calculated in the same way as that of the bot host itself, but when the neighbors' contribution is folded into the bot host IP's infection capability, neighbor nodes are given different Ks values according to their K layers;

3) The infection capability β of the bot host IP itself takes into account the vulnerabilities and hidden dangers of the system associated with the IP in the knowledge graph, the time at which the bot host IP was infected (a host that was infected earlier and has not been repaired is considered to have stronger infection capability), and similar factors. The degree to which different associated elements affect the bot host IP's infection capability is determined by the node relation level. The specific definition is as follows:

[formula not reproduced in the source text]

where the terms of the formula denote, respectively, the infection capability of bot host IP node i at time t and the infection capability of neighbor node j of bot host IP node i at time t. L_ij is the relation level between node i and node j: the smaller the level, the stronger the relation and the larger the weight; if the level difference between node i and node j is m, then L_ij = m. The maximum relation level is 5; beyond this threshold, the influence of the neighbor node is not calculated. β is the capability of each bot host in the botnet to infect other hosts, n is the total number of neighbor nodes of node i, and K_i is the Ks value of node i.
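The cutoff on the relation level can be illustrated with a bounded breadth-first search. The sketch below assumes that the relation level L_ij can be read as the hop count between node i and node j in the graph; that interpretation, the function name and the sample chain are assumptions, while the maximum level of 5 comes from the text. The sketch does not implement the propagation formula itself.

```python
from collections import deque

MAX_LEVEL = 5  # maximum relation level stated in the description


def relation_levels(adjacency, source, max_level=MAX_LEVEL):
    """Return {node: L} for nodes within max_level hops of source.

    Nodes farther away than max_level are omitted, mirroring the rule
    that their influence is not calculated.
    """
    levels = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if levels[node] == max_level:
            continue  # do not expand past the cutoff
        for nbr in adjacency.get(node, ()):
            if nbr not in levels:
                levels[nbr] = levels[node] + 1
                queue.append(nbr)
    del levels[source]  # the source is not its own neighbor
    return levels


# Hypothetical chain n0 - n1 - ... - n7.
chain = {
    f"n{i}": [f"n{j}" for j in (i - 1, i + 1) if 0 <= j <= 7]
    for i in range(8)
}
print(relation_levels(chain, "n0"))
```

Nodes `n6` and `n7` lie more than five hops from `n0` and are therefore left out of the result.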

In summary, the infection capability of each botnet host IP at time t can be calculated dynamically, the probability that a susceptible host IP is infected at time t+1 is obtained, and a dynamic topological graph of the botnet propagation process as it changes over time is formed.
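To make the prediction for time t+1 concrete, the Python sketch below computes the probability that a susceptible node is infected at the next step from the per-node infection capabilities of its infected neighbors. It assumes that each infected neighbor acts independently, so the probability is one minus the product of the neighbors' failure probabilities. This independence rule is a common simplification and is not the propagation formula of the invention; the function name and sample values are likewise hypothetical.

```python
def infection_probability(node, adjacency, states, beta):
    """Probability that `node` becomes infected at the next time step.

    states maps node -> "S", "I" or "R"; beta maps node -> infection
    capability in [0, 1]. Only susceptible nodes can be newly infected.
    """
    if states[node] != "S":
        return 0.0
    escape = 1.0  # probability that no infected neighbor succeeds
    for nbr in adjacency[node]:
        if states.get(nbr) == "I":
            escape *= 1.0 - beta[nbr]
    return 1.0 - escape


# Hypothetical three-node topology with two infected neighbors of "a".
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
states = {"a": "S", "b": "I", "c": "I"}
beta = {"b": 0.5, "c": 0.2}

p = infection_probability("a", adjacency, states, beta)
print(round(p, 3))
```

Evaluating this for every susceptible node and updating the states at each step gives one snapshot of the dynamic topological graph per time step.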

The above description is provided so that those skilled in the art can understand and implement the invention; it is not intended to limit the scope of the invention. All equivalent changes or modifications made in accordance with the spirit of the invention fall within its scope.

Claims

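1. A botnet propagation prediction method based on a network security knowledge graph, comprising the following steps: 1) constructing a network security knowledge graph based on network security data, and marking on it the nodes related to each botnet event in historical network security data; 2) calculating the state of each node in the network security knowledge graph, and generating a botnet propagation topological graph of the corresponding botnet according to the state of each node in each botnet event; 3) layering the nodes in the botnet propagation topological graph of the target botnet, and setting a corresponding influence value Ks for the nodes in each layer; 4) constructing a propagation model of the target botnet according to the states of all nodes in the target botnet and the corresponding influence values Ks; 5) generating a dynamic topological graph of the target botnet's propagation process according to the propagation model, and predicting the propagation of the target botnet at the next moment.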

2. The method of claim 1, wherein the state of each node in the network security knowledge graph is calculated based on an SIR model; the states include susceptible nodes, nodes that have been infected, and nodes that have been repaired after infection.

3. The method of claim 2, wherein the nodes in the botnet propagation topological graph are layered using a K-shell algorithm: first, nodes with degree 1 in the botnet propagation topological graph and their connecting edges are deleted, all deleted nodes are classified into the 1-shell layer, and they are assigned a K_s value of 1; next, nodes with degree 2 in the botnet and their connecting edges are deleted, the deleted nodes are classified into the 2-shell layer, and they are assigned a K_s value of 2; the node degree is increased in turn, and nodes of the corresponding degree are deleted and assigned the corresponding K_s value, until all nodes in the botnet propagation topological graph have been layered and assigned K_s values; wherein the layer containing the nodes with the largest K_s value is the core layer of the network, and nodes in the core layer have the greatest influence.

4. The method according to claim 1, 2 or 3, wherein the propagation model is

[formula not reproduced in the source text]

where the terms of the formula denote, respectively, the infection capability of node i itself at time t and the infection capability of a neighbor node j of node i at time t; L_ij is the relation level between node i and node j, and if the level difference between node i and node j is m, then L_ij = m; K_i is the K_s value of node i, and K_j is the K_s value of node j; β_i is the capability of node i to infect other nodes.

5. The method of claim 4, wherein [condition formula not reproduced in the source text], M being a set maximum value; if the value of L_ij is greater than M, that L_ij is discarded.

6. A method according to claim 1 or 2 or 3, wherein the network security data includes, but is not limited to: botnet data, phishing data, website Trojan network data.

7. A server comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the method of any of claims 1 to 6.

8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.