CN113094707A

CN113094707A - Lateral movement attack detection method and system based on heterogeneous graph network

Info

Publication number: CN113094707A
Application number: CN202110347685.7A
Authority: CN
Inventors: 卢志刚; 王天; 姜波; 刘俊荣; 刘松; 董璞
Original assignee: Institute of Information Engineering of CAS
Current assignee: Institute of Information Engineering of CAS
Priority date: 2021-03-31
Filing date: 2021-03-31
Publication date: 2021-07-09
Anticipated expiration: 2041-03-31
Also published as: CN113094707B

Abstract

The invention relates to a method and system for detecting lateral movement attacks based on a heterogeneous graph network. Starting from the authentication log of an intranet, the method structures the login behavior between users and hosts as graphs, constructing a user login graph and a source host path graph, and then performs two-stage anomaly detection on these graphs. The first stage is based on the user login graph: a graph neural network algorithm that maximizes mutual information learns the behavior patterns of hosts, and a local outlier factor algorithm then yields a small set of abnormal samples. The second stage is based on the source host path graph and the labeled samples obtained in the first stage: a heterogeneous graph attention network algorithm performs semi-supervised learning and detects lateral movement attack behavior. The method detects lateral movement attacks simply and effectively without sample labels, outperforms most supervised learning methods, and achieves a high recall rate and a low false alarm rate.

Description

Lateral movement attack detection method and system based on heterogeneous graph network

Technical Field

The invention relates to the field of computer network security and is used to defend against the lateral movement attacks carried out in advanced persistent threats; in particular, it relates to a lateral movement attack detection method and system based on a heterogeneous graph network.

Background

In recent years, with the rapid development of the internet, the network environment has become increasingly complex and network attacks increasingly frequent. Among them, Advanced Persistent Threat (APT) attacks have grown more frequent as attack techniques advance and attack organizations mature. Compared with other attacks, APT attacks have a longer latency period and greater destructive power, for example interference in the US presidential election and damage to power grids. Their attack methods are also more comprehensive: through long-term observation of the target, attackers can develop customized attack tools, so the threat is severe. The detection of and protection against APT attacks has therefore become an urgent problem in network security.

Lateral movement is a critical link in an APT attack and is the main process by which an attacker carries out the attack after entering an intranet. According to the ATT&CK framework, lateral movement consists of the techniques attackers use to enter and control remote systems on a network. After successfully intruding into the network and establishing a foothold, an attacker usually moves laterally within the network to prepare the next step of the attack and to collect information about the target network, eventually gaining control of the whole network in order to destroy the target network or infrastructure or to steal confidential data or core intellectual property, which causes great harm.

At present, lateral movement attack detection is still at a relatively preliminary stage. Existing research mainly converts the problem into detecting abnormal users or hosts in an intranet: the behavior of users or hosts is modeled, and behavior that exceeds a threshold is flagged as abnormal. By detection target, the methods fall into a moving-target type and a moving-path type. Moving-target methods mainly detect the users or hosts compromised by an attacker during a lateral movement attack; moving-path methods take the movement path that occurs during the lateral movement attack as the detection target. Most existing research focuses on moving-target detection, and the movement path of lateral movement attacks has received less study.

In summary, a lateral movement attack usually steals user credentials and masquerades as a normal user, so it is highly concealed and difficult to detect. Existing research generally converts lateral movement attack detection into the detection of abnormal users or hosts in an intranet, but it still has the following shortcomings. First, because of the massive volume of multi-source security logs, the false alarm rate of existing methods is generally high. Second, in a real network environment, the small number of abnormal users or hosts that can be observed are not fully utilized. Third, an intranet is essentially an association graph formed by users and hosts, and lateral movement attack detection on this graph has yet to be studied.

Disclosure of Invention

In order to solve the above problems, the invention proposes HGLM (Lateral Movement detection using Heterogeneous graph), a two-stage lateral movement attack detection method based on a heterogeneous graph network.

The principle of the invention is as follows: starting from the authentication log of the intranet, the login behavior between users and hosts is structured as graphs, a user login graph and a source host path graph are constructed, and two-stage anomaly detection is then performed on the graphs. The first stage is based on the user login graph: a graph neural network algorithm that maximizes mutual information learns the behavior patterns of hosts, and a local outlier factor algorithm then yields a small set of abnormal samples. The second stage is based on the source host path graph and the labeled samples obtained in the first stage: a heterogeneous graph attention network algorithm performs semi-supervised learning and detects lateral movement attack behavior.

To achieve this purpose, the invention adopts the following technical solution:

A method for detecting lateral movement attacks based on a heterogeneous graph network, comprising the following steps:

1) Data set extraction. Since a lateral movement attack involves login authentication behavior between users and hosts, data set extraction collects the authentication logs generated by intranet devices and constructs a data set.

2) Security log graph structuring. A user login graph and a source host path graph are constructed from the extracted data set.

3) Abnormal login behavior detection based on unsupervised learning. Abnormal login behavior is detected by unsupervised learning on the user login graph. This is the first stage of HGLM two-stage anomaly detection.

4) Lateral movement attack detection based on semi-supervised learning. Lateral movement attacks are detected by semi-supervised learning on the source host path graph, using the small number of labeled samples from the first stage. This is the second stage of HGLM two-stage anomaly detection.

Further, the security log graph structuring mainly comprises three parts, namely data preprocessing, building of a user login graph and building of a source host path graph.

a) The first step of log graph structuring is to preprocess the authentication log of the intranet. The authentication log typically contains attributes such as authentication time, source user, target user, source host, target host, and authentication status. The raw log information is redundant and cluttered, so it needs to be processed into a format that fits the lateral movement attack scenario. First, since an attacker typically uses a compromised user to move laterally from one host to another, only authentication events whose source user and target user are the same need to be considered. Second, a lateral movement attack involves at least two hosts, so authentication events whose source host and target host are the same are filtered out. In summary, the preprocessing is as follows: given an authentication log data set D, traverse each authentication event and keep the events whose source user is the same as the target user and whose source host differs from the target host, obtaining the processed data set D₁.
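The filtering rule above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the `AuthEvent` record and its field names are assumptions chosen to mirror the attributes listed in the text.

```python
from typing import NamedTuple


class AuthEvent(NamedTuple):
    # Field names are illustrative; real authentication logs may differ.
    time: int
    src_user: str
    dst_user: str
    src_host: str
    dst_host: str
    success: bool


def preprocess(events):
    """Map D to D1: keep events whose source user equals the target user
    and whose source host differs from the target host."""
    return [
        e for e in events
        if e.src_user == e.dst_user and e.src_host != e.dst_host
    ]


D = [
    AuthEvent(1, "u1", "u1", "h1", "h2", True),   # kept
    AuthEvent(2, "u1", "u2", "h1", "h2", True),   # dropped: users differ
    AuthEvent(3, "u1", "u1", "h1", "h1", True),   # dropped: same host
    AuthEvent(4, "u2", "u2", "h3", "h1", False),  # kept
]
D1 = preprocess(D)
```

Failed authentications are deliberately kept here, since the statistical features described later count both successes and failures.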

b) The User Authentication Graph (UAG), i.e. the user login graph, is an undirected homogeneous graph that represents the login behavior pattern between hosts over a period of time. Define the graph G_u = (V, E, F), in which a node in V represents a host and an edge in E represents a login connection made by the user between hosts. The number of logins of the user on each host under the sliding window is assigned to the nodes as the feature F; edges carry no features and only express connectivity. This yields a user login graph network with features. Specifically, given the data set D₁, a user u, and a sliding window length L, first screen out the authentication events in D₁ that belong to user u to obtain the data set D_u. Second, divide the data into several time windows according to the sliding window length L in order to calculate the login count feature F of the user on each host under the different windows. Finally, traverse each authentication event in D_u: add the source host and the target host to the node set V, add the edge connecting the source host and the target host to the edge set E (nodes and edges that already exist are ignored), and add one to the login counts of the source host and the target host in F for the corresponding window. After the traversal is finished, the user login graph G_u = (V, E, F) with features for user u is obtained.
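The construction of G_u can be sketched as follows. The sketch assumes non-overlapping windows of length L over a known time range and a hypothetical `(time, user, src_host, dst_host)` tuple layout for D₁; neither detail is fixed by the text.

```python
from collections import defaultdict


def build_user_login_graph(d1, user, window_len, t_start, t_end):
    """Return (V, E, F) for one user.

    d1: iterable of (time, user, src_host, dst_host) tuples (assumed layout),
        with every time inside [t_start, t_end].
    F maps each host to a list of login counts, one entry per time window.
    """
    n_windows = (t_end - t_start) // window_len + 1
    V, E = set(), set()
    F = defaultdict(lambda: [0] * n_windows)
    for time, u, src, dst in d1:
        if u != user:                      # D_u: events of this user only
            continue
        V.update((src, dst))
        E.add(frozenset((src, dst)))       # undirected; repeats are ignored
        w = (time - t_start) // window_len
        F[src][w] += 1
        F[dst][w] += 1
    return V, E, dict(F)


d1 = [
    (0, "u1", "h1", "h2"),
    (5, "u1", "h1", "h2"),
    (12, "u1", "h2", "h3"),
    (3, "u2", "h1", "h4"),
]
V, E, F = build_user_login_graph(d1, "u1", window_len=10, t_start=0, t_end=19)
```

Storing each edge as a `frozenset` makes the graph undirected and lets the set silently drop repeated additions, matching the "ignored if repeated" rule.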

c) The source Host Path Graph (HPG) is a directed heterogeneous graph that represents the association between a user's login path to a target host and the source host. Define the graph G_p = (V, E, F). The graph has two types of nodes: one type, V_src, represents a source host, and the other type, V_path, represents a login path from a user to a target host. There are also two types of edges. One is the sending edge E_send, which points from a source host node to a user-to-target-host login path node and indicates that the user logs in from the source host to the target host. The other is the depending edge E_on, which points from a user-to-target-host login path node to a source host node and indicates that the login path from the user to the target host occurs on the source host. The two types of edges are symmetric. The occurrence counts of the login path under the sliding window on the source host, together with the statistical features F_statistic, are assigned to the nodes; edges only express connectivity. This yields the source host path graph network. Specifically, given the data set D₁, a sliding window length L, and the statistical features F_statistic, traverse each event in D₁: add the source host to V_src, concatenate the user and the target host into a login path and add it as a node to V_path, add the edge pointing from the source host to the login path to E_send, symmetrically add the edge pointing from the login path to the source host to E_on, and calculate the occurrence count feature under the sliding window in the same way as for the user login graph. Finally, traverse the login path nodes V_path in the graph and append the statistical features F_statistic to each node, and assign one-hot encoded features to the source host nodes V_src, obtaining the source host path graph G_p = (V, E, F) with features.
The statistical features used comprise the numbers of successful and failed authentications from the user to the target host, the ratio of the number of authentications from the user to the target host to the total number of authentications of the user, and the minimum, maximum, and average of the time intervals between authentication events from the user to the target host.
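A compact sketch of the G_p construction, including the six statistical features, might look like this. The tuple layout, the non-overlapping windows, and the choice to report interval statistics of 0 for a path seen only once are all assumptions made for illustration.

```python
from collections import defaultdict


def build_source_host_path_graph(d1, window_len, t_start, t_end):
    """d1: iterable of (time, user, src_host, dst_host, success) tuples
    (assumed layout). Returns node sets, edge sets and node features."""
    n_windows = (t_end - t_start) // window_len + 1
    v_src, v_path, e_send, e_on = set(), set(), set(), set()
    freq = defaultdict(lambda: [0] * n_windows)
    events_by_path = defaultdict(list)
    user_total = defaultdict(int)
    for time, user, src, dst, ok in d1:
        path = (user, dst)                 # login path: user -> target host
        v_src.add(src)
        v_path.add(path)
        e_send.add((src, path))            # source host -> login path
        e_on.add((path, src))              # login path -> source host
        freq[path][(time - t_start) // window_len] += 1
        events_by_path[path].append((time, ok))
        user_total[user] += 1
    features = {}
    for path, evs in events_by_path.items():
        times = sorted(t for t, _ in evs)
        # Assumption: a path with a single event gets interval stats of 0.
        gaps = [b - a for a, b in zip(times, times[1:])] or [0]
        n_ok = sum(1 for _, ok in evs if ok)
        stats = [
            n_ok, len(evs) - n_ok,             # successes, failures
            len(evs) / user_total[path[0]],    # share of the user's events
            min(gaps), max(gaps), sum(gaps) / len(gaps),
        ]
        features[path] = freq[path] + stats
    hosts = sorted(v_src)
    one_hot = {h: [int(i == j) for j in range(len(hosts))]
               for i, h in enumerate(hosts)}
    return v_src, v_path, e_send, e_on, features, one_hot


d1 = [
    (0, "u1", "h1", "h2", True),
    (4, "u1", "h3", "h2", False),
    (10, "u1", "h1", "h4", True),
]
v_src, v_path, e_send, e_on, features, one_hot = build_source_host_path_graph(
    d1, window_len=10, t_start=0, t_end=19)
```

Each login path node ends up with a feature vector made of its per-window occurrence counts followed by the six statistics, while source host nodes only carry an identity (one-hot) vector.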

Further, the unsupervised learning-based abnormal login behavior detection comprises the following. Based on the user login graph, a graph neural network algorithm that maximizes mutual information (Deep Graph Infomax, DGI) is first used to learn the behavior pattern of each host; that is, the hidden layer feature representation of a sample is obtained by training to maximize the mutual information between the local feature h and the global feature s of the sample. Specifically, in the graph, the feature vector of each node serves as the local feature h of the node and is trained through a graph convolution encoder, and the global feature s is obtained through an average readout function. Negative samples are then obtained by randomly shuffling the nodes, a discriminator scores the sample pairs consisting of h and s, and the hidden layer representation of the nodes is finally obtained. Next, based on the sample feature representation learned by DGI, detection is performed with the Local Outlier Factor algorithm (LOF); a small number of labeled host samples are obtained by setting a threshold, and the labeled host samples are combined with the corresponding user group to synthesize user-to-target-host login paths for the second-stage semi-supervised learning.
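For reference, the objective behind "maximizing the mutual information of h and s" is commonly written as below. This follows the standard DGI formulation and is not spelled out in this document; N (number of nodes), σ (the logistic sigmoid), the learnable matrix W, and the notation h'_i for the hidden feature of the i-th shuffled (negative) node are assumptions introduced here.

```latex
s = R(H) = \sigma\!\Big(\frac{1}{N}\sum_{i=1}^{N} h_i\Big),
\qquad
D(h, s) = \sigma\big(h^{\top} W s\big)

\mathcal{L}_{\mathrm{DGI}}
  = -\frac{1}{2N}\sum_{i=1}^{N}
    \Big[\log D(h_i, s) + \log\big(1 - D(h'_i, s)\big)\Big]
```

Minimizing this binary cross-entropy pushes the discriminator to score true (h, s) pairs high and shuffled pairs low, which is how the encoder is driven toward representations that capture host behavior patterns.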

Further, the semi-supervised learning based lateral movement attack detection comprises the following. Based on the source host path graph and the small number of labeled samples from the first stage, a Heterogeneous graph Attention Network algorithm (HAN) performs semi-supervised learning on the graph and, by learning the associations between login path nodes, detects more lateral movement attack behavior. HAN introduces attention mechanisms into heterogeneous graphs, including node-level attention and semantic-level attention. With meta-paths defined on the graph, node-level attention learns the weights of a node's neighbors along a meta-path, while semantic-level attention learns the weights of the different meta-paths. The final node representation is then obtained through the corresponding aggregation operations. Specifically, two meta-paths are defined in the graph: the meta-path p₁(v_path, e_on, v_src) from a path node to a source host node, and the meta-path p₂(v_path, e_on, v_src, e_send, v_path) from a path node to a source host node to a path node. Based on these two meta-paths, node-level attention and semantic-level attention features are calculated; semi-supervised learning is then carried out with the labeled samples from the first stage, using a cross-entropy loss function as the objective, to detect lateral movement attack behavior.
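The two attention levels and the training objective can be written compactly as follows. These equations follow the standard HAN formulation and are not reproduced in this document; the symbols a_p, q, W, b (learnable parameters), N_i^p (the neighbors of node i along meta-path p), σ (a generic nonlinearity), C (the classifier parameters), and Y_L (the set of labeled nodes) are assumptions introduced for illustration.

```latex
% Node-level attention along meta-path p
\alpha_{ij}^{p} =
  \frac{\exp\big(\sigma(a_{p}^{\top}[h_i \,\|\, h_j])\big)}
       {\sum_{k \in N_i^{p}} \exp\big(\sigma(a_{p}^{\top}[h_i \,\|\, h_k])\big)},
\qquad
z_i^{p} = \sigma\Big(\sum_{j \in N_i^{p}} \alpha_{ij}^{p}\, h_j\Big)

% Semantic-level attention over meta-paths
w_{p} = \frac{1}{|V|}\sum_{i \in V} q^{\top}\tanh\big(W z_i^{p} + b\big),
\qquad
\beta_{p} = \frac{\exp(w_{p})}{\sum_{p'} \exp(w_{p'})},
\qquad
Z = \sum_{p} \beta_{p}\, Z_{p}

% Semi-supervised cross-entropy over labeled nodes only
\mathcal{L} = -\sum_{l \in Y_L} Y^{l} \ln\big(C \cdot Z^{l}\big)
```

Here Z_p stacks the node-level outputs z_i^p for one meta-path, and Z is the fused representation on which the labeled first-stage samples supervise the classifier.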

Based on the same inventive concept, the invention also provides a system for detecting lateral movement attacks based on a heterogeneous graph network, comprising:

a data acquisition module, used for collecting authentication logs generated by intranet devices and constructing a data set;

a security log graph structuring module, used for constructing a user login graph and a source host path graph from the data set;

an abnormal login behavior detection module based on unsupervised learning, used for detecting abnormal login behavior by unsupervised learning on the user login graph;

and a lateral movement attack detection module based on semi-supervised learning, used for detecting lateral movement attacks by semi-supervised learning on the source host path graph and the labeled samples obtained by the unsupervised abnormal login behavior detection.

Compared with the prior art, the invention has the beneficial effects that:

The method can simply and effectively detect lateral movement attack behavior without sample labels. On the related public data set CMCS Events, the AUC exceeds 95%, and for some users the TPR reaches 100% with an FPR of 0. The effect exceeds that of most supervised learning methods, with a high recall rate and a low false alarm rate.

Drawings

Fig. 1 is an overall flow chart of lateral movement attack detection based on a heterogeneous graph network according to the present invention, in which X represents the initial features of the positive samples, X' represents the initial features of the perturbed negative samples, H represents the hidden layer features of the positive samples after graph convolution, H' represents the hidden layer features of the negative samples after graph convolution, D represents a classifier, R represents the average readout function, S represents the global feature calculated by the average readout function, Z₁~Z_p represent the hidden layer features obtained through node-level attention, and Z represents the hidden layer features obtained through semantic-level attention.

FIG. 2 is a flow chart of the construction of a user login graph in the present invention.

FIG. 3 is a flow chart of the construction of a source host path graph in the present invention.

Fig. 4 is a flow chart of abnormal login behavior detection based on unsupervised learning in the present invention.

Fig. 5 is a flow chart of the lateral movement attack detection based on semi-supervised learning in the present invention.

Fig. 6 is a graph of the detection performance results of the HGLM in the method of the present invention for different users on the public data set CMCS Events.

Detailed Description

In order to make the technical solutions in the embodiments of the present invention better understood and make the objects, features, and advantages of the present invention more comprehensible, the technical core of the present invention is described in further detail below with reference to the accompanying drawings and examples.

The invention discloses a heterogeneous graph network-based method for detecting lateral movement attacks. It mainly comprises four parts: data acquisition, security log graph structuring, unsupervised learning-based abnormal login behavior detection, and semi-supervised learning-based lateral movement attack detection. The main steps are as follows:

Step 100 is data set extraction: the authentication logs generated by intranet devices over a period of time are collected to form a data set.

Step 200 is security log graph structuring, which mainly comprises three parts: data preprocessing, construction of the user login graph, and construction of the source host path graph.

The construction of the user login diagram is shown in fig. 2.

Step 210, for a given data set D₁, first define the user u and the sliding window length L, which are used to screen out the authentication events in D₁ belonging to user u and to calculate the login count feature F of the user on each host under different windows.

Step 220, traverse the data set D₁.

Step 230, screen out the authentication events in D₁ that belong to user u.

Step 240, add the source host and the target host of each authentication event to the node set V of the graph, and add the edge connecting the source host and the target host to the edge set E of the graph (nodes and edges that already exist are ignored).

Step 250, calculate the login count feature F of the user on each host under different windows, adding one to the login counts of the source host and the target host under the corresponding window.

After the traversal is finished, the user login graph G_u = (V, E, F) with features for user u is obtained.

The construction of the source host path graph is shown in fig. 3.

Step 260, given the data set D₁, first define the sliding window length L and the statistical features F_statistic to be extracted.

Step 270, traverse each event in D₁: add the source host to V_src, concatenate the user and the target host into a login path and add it as a node to V_path, add the edge pointing from the source host to the login path to E_send, and symmetrically add the edge pointing from the login path to the source host to E_on.

Step 280, calculate the occurrence count feature F of the login path under different windows, adding one to the occurrence count of the login path under the corresponding window.

Step 290, append the corresponding statistical features F_statistic to the login path node.

After the traversal is finished, assign one-hot encoded features to the source host nodes V_src in the graph, obtaining the source host path graph G_p = (V, E, F) with features.

Step 300 is two-stage anomaly detection: the first stage is abnormal login behavior detection based on unsupervised learning, and the second stage is lateral movement attack detection based on semi-supervised learning.

Abnormal login behavior detection based on unsupervised learning is shown in fig. 4.

Step 310, based on the user login graph, DGI is first used to learn the behavior pattern of the host, and the nodes are perturbed by random shuffling to obtain negative samples.

Step 320, the feature vector of each node in the graph serves as the local feature h of the node and is trained through a graph convolution encoder, and the global feature s is obtained through an average readout function. With the goal of maximizing the mutual information between the local features and the global feature, a discriminator scores the positive and negative sample pairs consisting of h and s to obtain the hidden layer representation of the nodes.
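The data flow of steps 310 and 320 (shuffle, encode, read out, discriminate) can be sketched without any deep learning library. This is only a forward pass under simplifying assumptions: a single mean-aggregation graph convolution stands in for the encoder, the bilinear discriminator follows the usual DGI design, and the weights `W` and `B` are fixed toy values rather than trained parameters.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gcn_layer(adj, X, W):
    """One simplified graph convolution: ReLU(mean over self+neighbors of X, times W)."""
    n, d_in, d_out = len(X), len(W), len(W[0])
    H = []
    for i in range(n):
        neigh = [i] + [j for j in range(n) if j != i and adj[i][j]]
        agg = [sum(X[j][k] for j in neigh) / len(neigh) for k in range(d_in)]
        H.append([max(0.0, sum(agg[k] * W[k][d] for k in range(d_in)))
                  for d in range(d_out)])
    return H


def readout(H):
    """Average readout followed by a sigmoid: the global feature s."""
    return [sigmoid(sum(h[d] for h in H) / len(H)) for d in range(len(H[0]))]


def discriminate(h, s, B):
    """Bilinear score sigmoid(h^T B s) for one (local, global) pair."""
    return sigmoid(sum(h[a] * B[a][b] * s[b]
                       for a in range(len(h)) for b in range(len(s))))


def dgi_loss(adj, X, W, B, rng):
    X_neg = list(X)
    rng.shuffle(X_neg)                     # corruption: shuffle node features
    H, H_neg = gcn_layer(adj, X, W), gcn_layer(adj, X_neg, W)
    s = readout(H)
    pos = [math.log(discriminate(h, s, B)) for h in H]
    neg = [math.log(1.0 - discriminate(h, s, B)) for h in H_neg]
    return -(sum(pos) + sum(neg)) / (len(pos) + len(neg))


adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]      # toy host graph
X = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # per-window login counts
W = [[0.5, -0.2], [0.1, 0.3]]                # toy encoder weights
B = [[1.0, 0.0], [0.0, 1.0]]                 # toy discriminator weights
loss = dgi_loss(adj, X, W, B, random.Random(0))
```

In practice `W` and `B` would be updated by gradient descent on this loss, and the rows of `H` after training are the hidden layer representations handed to the outlier detector.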

Step 330, based on the sample feature representation learned by DGI, detection is performed with the local outlier factor algorithm, and a small number of labeled host samples are obtained by setting a threshold.
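To make the thresholding step concrete, here is a small pure-Python version of the local outlier factor computation. It is a simplified sketch: ties at the k-th distance are broken by index rather than all being included, points are assumed distinct, and the values of `k` and `threshold` are illustrative rather than taken from this document.

```python
import math


def lof_scores(points, k):
    """Local outlier factor of each point, using its k nearest neighbors."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse mean reachability distance.
        reach = [max(k_dist[j], dist[i][j]) for j in knn[i]]
        return len(reach) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    return [sum(lrds[j] for j in knn[i]) / (k * lrds[i]) for i in range(n)]


# Toy host embeddings: four hosts in a tight cluster and one far away.
embeddings = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(embeddings, k=2)
threshold = 1.5                       # illustrative cut-off
labeled = [i for i, s in enumerate(scores) if s > threshold]
```

Scores close to 1 mean a host is about as dense as its neighbors; scores well above 1 mark hosts whose embeddings sit in sparse regions, and those are the ones that become labeled samples.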

Finally, the labeled host samples and their corresponding user group are synthesized into user-to-target-host login path training samples for the second-stage semi-supervised learning.

Lateral movement attack detection based on semi-supervised learning, as shown in fig. 5.

Step 340, semi-supervised learning on the graph is performed with HAN, based on the source host path graph and the small number of labeled samples from the first stage. First, two meta-paths are defined: the meta-path p₁(v_path, e_on, v_src) from a path node to a source host node, and the meta-path p₂(v_path, e_on, v_src, e_send, v_path) from a path node to a source host node to a path node.
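The neighbor sets induced by the two meta-paths can be derived directly from the two edge sets. The sketch below assumes `e_on` holds `(path, source_host)` pairs and `e_send` holds `(source_host, path)` pairs, with a path written as a `(user, target_host)` tuple; these layouts are illustrative.

```python
from collections import defaultdict


def metapath_neighbors(e_on, e_send):
    """Return (p1, p2): for every login path node, its neighbors along
    p1 = path -> source host and p2 = path -> source host -> path."""
    on = defaultdict(set)      # path node -> source hosts it occurs on
    send = defaultdict(set)    # source host -> path nodes it sends
    for path, src in e_on:
        on[path].add(src)
    for src, path in e_send:
        send[src].add(path)
    p1 = {path: set(srcs) for path, srcs in on.items()}
    p2 = {path: {q for src in srcs for q in send[src]}
          for path, srcs in on.items()}
    return p1, p2


e_on = {
    (("u1", "h2"), "h1"),
    (("u1", "h2"), "h3"),
    (("u2", "h4"), "h1"),
    (("u3", "h5"), "h6"),
}
e_send = {(src, path) for path, src in e_on}   # the symmetric edge set
p1, p2 = metapath_neighbors(e_on, e_send)
```

Under p₂, two login paths become neighbors when they were launched from the same source host, and each path is also its own neighbor; this shared-source association is what lets labels from a few paths inform the others.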

Step 350, node-level attention and semantic-level attention features are calculated based on the two meta-paths; semi-supervised learning is then carried out with the labeled samples from the first stage, using a cross-entropy loss function as the objective, to detect lateral movement attack behavior.
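As a rough illustration of step 350, the sketch below fuses per-meta-path embeddings with semantic-level attention and evaluates a cross-entropy loss on labeled nodes only. It follows the usual HAN design rather than anything specified here: the parameters `W`, `b`, `q`, the toy embeddings, and the toy logits are all assumptions, and the node-level attention that would produce the per-meta-path embeddings is taken as given.

```python
import math


def semantic_fusion(Z_by_metapath, W, b, q):
    """Weight each meta-path by softmax of mean q^T tanh(W z + b), then fuse."""
    weights = []
    for Z in Z_by_metapath:
        scores = []
        for z in Z:
            hidden = [math.tanh(sum(W[r][c] * z[c] for c in range(len(z))) + b[r])
                      for r in range(len(W))]
            scores.append(sum(qr * hr for qr, hr in zip(q, hidden)))
        weights.append(sum(scores) / len(scores))
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    beta = [e / sum(exps) for e in exps]
    n, d = len(Z_by_metapath[0]), len(Z_by_metapath[0][0])
    Z = [[sum(beta[p] * Z_by_metapath[p][i][k] for p in range(len(beta)))
          for k in range(d)] for i in range(n)]
    return beta, Z


def masked_cross_entropy(logits, labels):
    """Mean cross-entropy over labeled nodes; labels maps node index -> class."""
    total = 0.0
    for i, y in labels.items():
        m = max(logits[i])
        log_norm = m + math.log(sum(math.exp(v - m) for v in logits[i]))
        total += log_norm - logits[i][y]
    return total / len(labels)


Z_p1 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # toy embeddings from p1
Z_p2 = [[0.2, 0.8], [0.9, 0.1], [0.4, 0.6]]   # toy embeddings from p2
W, b, q = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, 1.0]
beta, Z = semantic_fusion([Z_p1, Z_p2], W, b, q)
# Only nodes 0 and 2 carry first-stage labels; node 1 is unlabeled.
loss = masked_cross_entropy([[0.0, 0.0], [3.0, -3.0], [1.0, 2.0]], {0: 1, 2: 1})
```

Because the loss only sums over labeled indices, unlabeled login paths still shape the embeddings through attention but never contribute a supervised error term, which is what makes the training semi-supervised.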

Finally, the abnormal samples detected in the first stage and the second stage are combined to form the lateral movement attack behavior result detected by the HGLM model.
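Combining the two stages amounts to a set union, after which the recall rate (true positive rate) and false alarm rate (false positive rate) can be computed against ground truth. The sketch below uses made-up toy identifiers purely to show the arithmetic; it does not reproduce any experimental data.

```python
def merge_and_score(stage1, stage2, truth, all_samples):
    """Union the detections of both stages and compute TPR and FPR."""
    predicted = set(stage1) | set(stage2)
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = len(all_samples) - tp - fp - fn
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return predicted, tpr, fpr


all_samples = {"a", "b", "c", "d", "e"}   # toy login path identifiers
truth = {"a", "b"}                        # toy ground-truth attacks
predicted, tpr, fpr = merge_and_score({"a"}, {"b", "c"}, truth, all_samples)
```

In this toy case both true attacks are found (TPR of 1.0) at the cost of one false alarm among three benign samples.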

Experiments on the CMCS Events public data set show that the AUC of HGLM exceeds 95%, and that for some users the TPR reaches 100% with an FPR of 0; the effect exceeds that of most supervised learning methods, with a high recall rate and a low false alarm rate. The experimental results are shown in Table 1. Compared with existing methods, HGLM is simple and effective, needs no sample labels, and outperforms most supervised detection methods. In addition, the detection performance of the model for different users is shown in Fig. 6: for most users, the recall rate of the model exceeds 95% and the false alarm rate is below 5%.

TABLE 1 Performance comparison of lateral movement attack detection models

Based on the same inventive concept, another embodiment of the present invention provides a system for detecting a lateral mobile attack based on a heterogeneous graph network, which includes:

Based on the same inventive concept, another embodiment of the present invention provides an electronic device (computer, server, smartphone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the inventive method.

Based on the same inventive concept, another embodiment of the present invention provides a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program, which when executed by a computer, performs the steps of the inventive method.

Portions of the invention not described in detail (e.g., the local outlier factor algorithm) are well known to those skilled in the art.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail by using examples, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered in the claims of the present invention.

Claims

1. A lateral movement attack detection method based on a heterogeneous graph network, characterized by comprising the following steps:

collecting authentication logs generated by intranet devices and constructing a data set;

constructing a user login graph and a source host path graph from the data set;

performing abnormal login behavior detection based on unsupervised learning on the user login graph;

and performing lateral movement attack detection based on semi-supervised learning on the source host path graph and the labeled samples obtained by the unsupervised abnormal login behavior detection.

2. The method of claim 1, wherein data preprocessing is performed before constructing the user login graph and the source host path graph; the data preprocessing comprises: given an authentication log data set D, traversing each authentication event and keeping the events whose source user is the same as the target user and whose source host differs from the target host, obtaining a processed data set D₁.

3. The method of claim 2, wherein the user login graph is a homogeneous graph representing login behavior patterns between hosts over a period of time; the construction process of the user login graph comprises: given the data set D₁, a user u, and a sliding window length L, screening out the authentication events in D₁ belonging to user u to obtain a data set D_u; dividing the data into several time windows according to the sliding window length L in order to calculate the login count feature F of the user on each host under different windows; traversing each authentication event in D_u, adding the source host and the target host to the node set V of the graph, adding the edge connecting the source host and the target host to the edge set E of the graph, and adding one to the login counts of the source host and the target host in F under the corresponding window; and, after the traversal is finished, obtaining the user login graph G_u = (V, E, F) with features for user u.

4. The method of claim 3, wherein the source host path graph is a directed heterogeneous graph representing the association between a user's login path to a target host and the source host; two types of nodes are defined in the source host path graph, one type, V_src, representing a source host and the other type, V_path, representing a login path from a user to a target host; there are also two types of edges: one is the sending edge E_send, which points from a source host node to a user-to-target-host login path node and indicates that the user logs in from the source host to the target host; the other is the depending edge E_on, which points from a user-to-target-host login path node to a source host node and indicates that the login path from the user to the target host occurs on the source host; the two types of edges are symmetric; the occurrence counts of the login path under the sliding window on the source host and the statistical features F_statistic are assigned to the nodes, and the edges only express connectivity, thereby obtaining the source host path graph network.

5. The method of claim 4, wherein the statistical features F_statistic comprise: the numbers of successful and failed authentications from the user to the target host, the ratio of the number of authentications from the user to the target host to the total number of authentications of the user, and the minimum, maximum, and average of the time intervals between authentication events from the user to the target host.

6. The method of claim 1, wherein the unsupervised learning-based abnormal login behavior detection comprises:

based on the user login graph, learning the behavior pattern of the host by using a graph neural network algorithm that maximizes mutual information, that is, obtaining the hidden layer feature representation of a sample by training to maximize the mutual information between the local feature h and the global feature s of the sample;

obtaining negative samples by randomly shuffling the nodes, and scoring the sample pairs consisting of h and s with a discriminator to obtain the hidden layer representation of the nodes;

and, based on the sample feature representation learned by the graph neural network algorithm, performing detection with a local outlier factor algorithm, obtaining a small number of labeled host samples by setting a threshold, and synthesizing user-to-target-host login paths from the labeled host samples and the corresponding user group for the second-stage semi-supervised learning.

7. The method of claim 1, wherein the semi-supervised learning based lateral movement attack detection comprises:

two meta paths are defined: a meta path from the path node to the source host node and a meta path from the path node to the source host node to the path node;

and calculating node-level attention and semantic-level attention features based on the two meta paths, performing semi-supervised learning by using the labeled sample obtained by detecting the abnormal login behavior based on unsupervised learning and taking a cross entropy loss function as a target, and detecting the lateral movement attack behavior.

8. A heterogeneous graph network based lateral mobile attack detection system using the method of any one of claims 1 to 7, comprising:

9. An electronic apparatus, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a computer, implements the method of any one of claims 1 to 7.