CN116846645A

CN116846645A - Network intrusion detection method based on self-supervision cooperative contrast learning and application thereof

Info

Publication number: CN116846645A
Application number: CN202310831500.9A
Authority: CN
Inventors: 陈兵; 谢磊
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2023-07-07
Filing date: 2023-07-07
Publication date: 2023-10-03

Abstract

The invention discloses a network intrusion detection method based on self-supervised collaborative contrast learning and an application thereof. The method comprises the following steps: establishing a host interaction graph based on network traffic data collected periodically in a target network, wherein the host interaction graph comprises nodes and edges; extracting local structural features and higher-order structural features of the host interaction graph based on a network structure view and a meta-path view, and obtaining edge feature embeddings of the network structure view and the meta-path view; performing self-supervised collaborative contrast learning on the edge feature embeddings of the network structure view and the meta-path view, and training a feature embedding model of the network structure view; and extracting network traffic features in the host interaction graph based on the feature embedding model, and generating edge feature embedding vectors to identify malicious network traffic. The method can effectively improve the generalization and accuracy of network intrusion detection.

Description

Network intrusion detection method based on self-supervision cooperative contrast learning and application thereof

Technical Field

The invention relates to the field of flow monitoring, in particular to a network intrusion detection method based on self-supervision cooperative contrast learning and application thereof.

Background

Today, network systems face increasingly serious security challenges. Advanced Persistent Threat (APT) attacks, which are organized, pursue specific goals, and persist over long periods, pose a high risk worldwide. An APT attack is a targeted attack carried out for commercial or political purposes: through a series of targeted intrusions it can obtain important information of an organization or even a country, and it is aimed in particular at nationally important infrastructure and institutions. Efficient methods are therefore needed to detect complex network attacks such as APT attacks.

In the prior art, attack detection methods fall mainly into two types: host-based detection and network-traffic-based detection. Host-based detection mainly analyzes the malicious behavior of an attack on a terminal host; network-traffic-based detection mainly collects and analyzes communication traffic in a network and extracts corresponding features to detect malicious traffic.

In practice, an enterprise typically uses a Network Intrusion Detection System (NIDS) to perform attack detection on network traffic and protect the enterprise's data security. Conventionally, NIDS can be divided into two broad categories: signature-based NIDS and behavior-based NIDS. A signature-based NIDS uses a set of predetermined rules, metrics, or calculations to inspect network traffic. In practice, signature-based NIDS often fail to cope with unknown attacks because vulnerability disclosure lags behind exploitation and attack tools are continually updated.

Behavior-based NIDS, on the other hand, rely on more complex operations and often employ machine learning to identify complex and evolving network attacks. Supervised learning is generally used for known behaviors; for unknown attacks such as zero-day attacks, however, network traffic can rarely be classified correctly.

In addition, complex network attacks such as APT attacks often involve multi-step behaviors such as lateral movement during an actual intrusion. Traditional NIDS methods usually ignore the topological patterns of network traffic and therefore have difficulty capturing the overall network graph pattern and the lateral movement paths of such attacks.

The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person of ordinary skill in the art.

Disclosure of Invention

The invention aims to provide a network intrusion detection method and application based on self-supervision collaborative contrast learning, which can improve generalization and accuracy of network intrusion detection.

In order to achieve the above purpose, the embodiment of the invention provides a network intrusion detection method based on self-supervision cooperative contrast learning.

In one or more embodiments of the present invention, the detection method includes: establishing a host interaction graph based on network traffic data collected periodically in a target network, wherein the host interaction graph comprises nodes and edges, the nodes comprise terminal entities in the target network, and the edges comprise interaction relations among the terminal entities; extracting local structural features and higher-order structural features of the host interaction graph based on a network structure view and a meta-path view, and obtaining edge feature embeddings of the network structure view and the meta-path view; performing self-supervised collaborative contrast learning on the edge feature embeddings of the network structure view and the meta-path view, and training a feature embedding model of the network structure view; and extracting network traffic features in the host interaction graph based on the feature embedding model, and generating edge feature embedding vectors to identify malicious network traffic.

In one or more embodiments of the present invention, obtaining the edge feature embedding of the network structure view specifically includes: performing neighborhood information aggregation on each node of the host interaction graph to obtain node feature embeddings; and obtaining the edge feature embedding of the network structure view based on the node feature embeddings.

In one or more embodiments of the present invention, obtaining the edge feature embedding of the meta-path view specifically includes: for each meta-path in the meta-path view, aggregating the corresponding node features using GCN encoding to obtain node feature embeddings; obtaining the edge feature embedding of each meta-path based on the node feature embeddings; and fusing the different meta-paths based on semantic-level attention to obtain the edge feature embedding of the meta-path view.

In one or more embodiments of the invention, the method further comprises: performing self-supervised collaborative contrast learning based on the edge feature embeddings of the network structure view and the meta-path view, and calculating the contrast loss of a positive sample set and a negative sample set under the network structure view, wherein the positive sample set and the negative sample set come from the meta-path view or the network structure view; and training the feature embedding models of the network structure view and the meta-path view based on the contrast loss.

In one or more embodiments of the invention, the method further comprises: extracting features of the network traffic data in the host interaction graph based on the feature embedding model of the network structure view, and generating the edge feature embedding vector; and/or detecting the edge feature embedding vector based on an unsupervised anomaly detection algorithm to identify malicious network traffic.

In one or more embodiments of the invention, the unsupervised anomaly detection algorithm includes at least one of PCA, IF, HBOS.

In one or more embodiments of the present invention, the terminal entity includes at least one of a host and a DNS server, and the interaction includes at least one of an authentication grant, a network request, and a network response.

In another aspect of the invention, a network intrusion detection device based on self-supervision collaborative contrast learning is provided, which comprises a drawing module, an extraction module, a training module and a detection module.

The drawing module is used for establishing a host interaction graph based on network traffic data collected periodically in a target network, wherein the host interaction graph comprises nodes and edges, the nodes comprise terminal entities in the target network, and the edges comprise interaction relations among the terminal entities.

The extraction module is used for extracting the local structural features and the higher-order structural features of the host interaction graph based on the network structure view and the meta-path view, and obtaining the edge feature embeddings of the network structure view and the meta-path view.

The training module is used for performing self-supervised collaborative contrast learning on the edge feature embeddings of the network structure view and the meta-path view, and training a feature embedding model of the network structure view.

The detection module is used for extracting network traffic features in the host interaction graph based on the feature embedding model and generating edge feature embedding vectors to identify malicious network traffic.

In one or more embodiments of the invention, the extraction module is further configured to: perform neighborhood information aggregation on each node of the host interaction graph to obtain node feature embeddings; and obtain the edge feature embedding of the network structure view based on the node feature embeddings.

In one or more embodiments of the invention, the extraction module is further configured to: for each meta-path in the meta-path view, aggregate the corresponding node features using GCN encoding to obtain node feature embeddings; obtain the edge feature embedding of each meta-path based on the node feature embeddings; and fuse the different meta-paths based on semantic-level attention to obtain the edge feature embedding of the meta-path view.

In one or more embodiments of the invention, the training module is further configured to: perform self-supervised collaborative contrast learning based on the edge feature embeddings of the network structure view and the meta-path view, and calculate the contrast loss of a positive sample set and a negative sample set under the network structure view, wherein the positive sample set and the negative sample set come from the meta-path view or the network structure view; and train the feature embedding models of the network structure view and the meta-path view based on the contrast loss.

In one or more embodiments of the invention, the detection module is further configured to: extract features of the network traffic data in the host interaction graph based on the feature embedding model of the network structure view, and generate the edge feature embedding vector; and/or detect the edge feature embedding vector based on an unsupervised anomaly detection algorithm to identify malicious network traffic.

In another aspect of the invention, there is provided a computing device comprising: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform a network intrusion detection method based on self-supervised collaborative contrast learning as described above.

In another aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a network intrusion detection method based on self-supervised collaborative contrast learning as described above.

Compared with the prior art, the network intrusion detection method based on self-supervised collaborative contrast learning and the application thereof according to the embodiments of the invention can, by constructing the host interaction graph, make full use of the topological characteristics of network traffic and expand the scope of network intrusion detection. The local structural features of the host interaction graph are extracted through the network structure view, so that malicious and benign network traffic can be distinguished by their local topological features. The higher-order structural features of the host interaction graph are extracted through the meta-path view, so that meta-paths of higher-order interaction relations can be used to capture path-level features such as an attacker's lateral movement or illegal access between hosts. Cross-view self-supervised learning through the collaborative contrast learning mechanism addresses the difficulty of obtaining labeled data and makes full use of the feature information of the training data to mine latent labels, thereby improving the generalization and accuracy of network intrusion detection.

In addition, the invention can classify the edge feature embedding vectors extracted by the feature embedding model using at least three anomaly detection algorithms to obtain valid detection results for the network traffic data, which demonstrates the portability and universality of the network intrusion detection method and improves the performance of unsupervised network intrusion detection algorithms.

Drawings

FIG. 1 is a flow chart of a method of network intrusion detection based on self-supervised collaborative contrast learning according to an embodiment of the present invention;

FIG. 2 is a block diagram of a network intrusion detection method based on self-supervised collaborative contrast learning according to an embodiment of the present invention;

FIG. 3 is a specific flowchart of a network intrusion detection method based on self-supervised collaborative contrast learning according to an embodiment of the present invention;

FIG. 4 is a block diagram of a network intrusion detection device based on self-supervised collaborative contrast learning according to an embodiment of the present invention;

FIG. 5 is a hardware architecture diagram of a network intrusion detection computing device based on self-supervised collaborative contrast learning according to an embodiment of the invention.

Detailed Description

The following detailed description of embodiments of the invention is to be read in conjunction with the accompanying drawings, and it is to be understood that the scope of the invention is not limited to the specific embodiments.

Throughout the specification and claims, unless explicitly stated otherwise, the term "comprise" or variations thereof such as "comprises" or "comprising", etc. will be understood to include the stated element or component without excluding other elements or components.

The following describes in detail the technical solutions provided by the embodiments of the present invention with reference to the accompanying drawings.

Example 1

As shown in fig. 1 to 3, a network intrusion detection method based on self-supervised collaborative contrast learning according to an embodiment of the present invention is described, and the detection method includes the following steps.

In step S101, a host interaction graph is established based on network traffic data periodically collected in the target network.

In this embodiment, network traffic data is collected periodically within the target computer network and stored in a database, wherein the network traffic data includes multiple types of network traffic data, such as network request data, domain name resolution data, network response data, and the like.

Specifically, the collected network traffic data is converted into a graph structure: a host interaction graph is created from the network traffic data. The host interaction graph includes a set of nodes and a set of edges, denoted G = (V, E), where V is the set of terminal entities and E is the set of interaction relationships between the terminal entities in V, that is, the set of interaction events from a source terminal to a destination terminal. The graph may be a heterogeneous graph.

Further, the nodes V in the host interaction graph G include, but are not limited to, terminal entities such as hosts and DNS servers, and the edges E include, but are not limited to, host interaction relationships such as authentication authorization, network requests, and network responses.
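The graph construction in step S101 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record layout `(src, dst, kind, features)`, the class names `Interaction` and `HostInteractionGraph`, the entity-type labels, and the sample addresses are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interaction:
    """One edge of the graph: an interaction event from src to dst."""
    src: str
    dst: str
    kind: str          # e.g. "auth", "request", "response" (assumed labels)
    features: tuple    # numeric flow features (hypothetical: bytes, packets)


@dataclass
class HostInteractionGraph:
    node_types: dict = field(default_factory=dict)   # node id -> entity type
    edges: list = field(default_factory=list)        # directed Interaction records
    neighbors: dict = field(default_factory=lambda: defaultdict(set))

    def add(self, event, types):
        # Register both endpoints; entities without a known type default to "host".
        for node in (event.src, event.dst):
            self.node_types[node] = types.get(node, "host")
        self.edges.append(event)
        # Neighborhoods are kept undirected for later aggregation.
        self.neighbors[event.src].add(event.dst)
        self.neighbors[event.dst].add(event.src)


def build_graph(records, types):
    """Convert periodically collected flow records into G = (V, E)."""
    graph = HostInteractionGraph()
    for src, dst, kind, feats in records:
        graph.add(Interaction(src, dst, kind, tuple(feats)), types)
    return graph


records = [
    ("10.0.0.1", "10.0.0.2", "auth", (120.0, 3.0)),
    ("10.0.0.1", "10.0.0.53", "request", (64.0, 1.0)),
    ("10.0.0.53", "10.0.0.1", "response", (256.0, 1.0)),
]
entity_types = {"10.0.0.53": "dns"}
graph = build_graph(records, entity_types)
print(len(graph.node_types), len(graph.edges))
```

Keeping the node type and the interaction kind on every record is what makes the graph heterogeneous, which the meta-path view relies on later.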

In step S102, based on the network structure view and the meta path view, the local structure features and the higher-order structure features of the host interaction graph are extracted, and the edge feature embedding of the network structure view and the meta path view is obtained.

In this embodiment, information in the host interaction graph is extracted through both views: the local structural features of the host interaction graph are extracted based on the network structure view, and the higher-order structural features are extracted based on the meta-path view. The network structure refers to the way the graph is constructed; it represents the direct connections among different types of nodes and can be used to describe the local structure of the graph. A meta-path refers to a chain of relations among nodes and can be used to describe the higher-order structure of the graph.

Specifically, the local structural features of the host interaction graph are extracted based on the network structure view, and the edge feature embedding of the network structure view is obtained.

The node feature of each node v in the host interaction graph G is set to x_v = {1, …, 1}; that is, every node feature is first initialized to all ones. The sampled neighboring edges and the layer-k neighbor information are then aggregated. The formula for the neighborhood information aggregation of node v is as follows:
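A form of this aggregation consistent with the description here, assuming the standard E-GraphSAGE update that this embodiment refers to later, can be written as follows. The symbol names (h for node representations, e for edge features, AGG_k for the aggregator, W^k for the layer weights, σ for the nonlinearity) are assumptions rather than notation confirmed by the original:

```latex
h_{N(v)}^{k} = \mathrm{AGG}_k\left( \left\{ e_{uv}^{k-1} \;:\; u \in N(v),\ uv \in E \right\} \right),
\qquad
h_v^{k} = \sigma\left( W^{k} \cdot \mathrm{CONCAT}\left( h_v^{k-1},\ h_{N(v)}^{k} \right) \right)
```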

Here, the aggregated quantity is the set of features of the edges uv in the sampled neighborhood N(v) of node v at layer k-1, where u denotes a neighboring node of v. The aggregated edge feature embedding is concatenated with the information of the current node v from the previous layer to obtain the embedding of node v, which is then used to propagate edge feature information.

k denotes the depth. The node representation at the final depth gives the final embedding of each node, and the final embedding of each edge uv is obtained from the final embeddings of its endpoint nodes. The formula for the embedding of edge uv is as follows:

Through the above calculation, neighborhood information is aggregated for each node of the host interaction graph to obtain the node feature embeddings, and the edge feature embedding of the network structure view is obtained from the node feature embeddings.
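Assuming, as in E-GraphSAGE, that an edge embedding is the concatenation of the final embeddings of its two endpoints, the edge embedding can be written as below. Here h_v^K is taken to be the representation of node v at the final depth K; the names z_v, z_uv and K are assumed, not confirmed by the original:

```latex
z_v = h_v^{K},
\qquad
z_{uv}^{K} = \mathrm{CONCAT}\left( z_u^{K},\ z_v^{K} \right), \quad uv \in E
```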
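The network structure view described above can be illustrated with a deliberately simplified sketch. It assumes a mean aggregator and omits the learnable weights and the nonlinearity, so it shows only the data flow (edge features into node embeddings, node embeddings into edge embeddings); the function names and toy values are hypothetical.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def aggregate_layer(node_emb, edge_feat):
    """One simplified layer: h_v <- CONCAT(h_v, mean of incident edge features)."""
    dim = len(next(iter(edge_feat.values())))
    incident = {v: [] for v in node_emb}
    for (u, v), feat in edge_feat.items():
        incident[u].append(feat)
        incident[v].append(feat)
    return {
        v: list(h) + (mean(incident[v]) if incident[v] else [0.0] * dim)
        for v, h in node_emb.items()
    }


def edge_embeddings(node_emb, edge_feat):
    """Edge embedding as the concatenation of its two endpoint embeddings."""
    return {(u, v): node_emb[u] + node_emb[v] for (u, v) in edge_feat}


# Toy graph a - b - c with two-dimensional edge features.
edge_feat = {("a", "b"): [2.0, 0.0], ("b", "c"): [4.0, 2.0]}
node_emb = {v: [1.0, 1.0] for v in "abc"}   # all-ones initial node features
node_emb = aggregate_layer(node_emb, edge_feat)
edge_emb = edge_embeddings(node_emb, edge_feat)
print(edge_emb[("a", "b")])
```

A trained model would multiply the concatenation by a weight matrix and apply a nonlinearity at every layer; the sketch leaves both out to stay short.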

Further, high-order structural features of the host interaction graph are extracted based on the meta-path view, and edge feature embedding of the meta-path view is obtained.

For a given node i associated with M meta-paths {P_1, P_2, …, P_M}, the meta-path-based neighbors of node i are defined for each meta-path. Each meta-path represents one kind of semantic similarity, and the node features under each meta-path are encoded using a meta-path-specific GCN. The formula for the node features is as follows:
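One form consistent with the degree and projected-feature symbols described next, assuming the meta-path-specific GCN encoder commonly used in co-contrastive heterogeneous graph learning, is shown below. The neighbor set N_i^{P_n} and the output h_i^{P_n} are assumed names:

```latex
h_i^{P_n} = \frac{1}{d_i + 1}\, h_i + \sum_{j \in N_i^{P_n}} \frac{1}{\sqrt{(d_i + 1)(d_j + 1)}}\, h_j
```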

where d_i and d_j are the degrees of nodes i and j, and h_i and h_j are their respective projected features.

Given the M meta-paths, M meta-path embeddings of node i are obtained. Similarly, the embedding of edge uv under a meta-path is computed by concatenating the embeddings of the pair of nodes u and v, which yields the edge feature embedding under that meta-path starting from node i. Semantic-level attention is then used to fuse the M embeddings into the final embedding under the meta-path view. The formula for the embedding of node i under the meta-path view is as follows:
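Assuming the fusion is a weighted sum of the per-meta-path embeddings, with h_i^{P_n} denoting the embedding under meta-path P_n and z_i^{mp} the fused embedding (both assumed names), the formula can be written as:

```latex
z_i^{mp} = \sum_{n=1}^{M} \beta_{P_n} \cdot h_i^{P_n}
```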

where β_{P_n} represents the importance of meta-path P_n to node i; β_{P_n} weights the contribution of meta-path P_n and is calculated as follows:
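A form consistent with the parameters named in the text, assuming the usual semantic-level attention with a tanh hidden layer followed by a softmax over meta-paths, is given below. The intermediate score w_{P_n} and the averaging over the set V are assumptions:

```latex
w_{P_n} = \frac{1}{|V|} \sum_{i \in V} a_{mp}^{\top} \cdot \tanh\left( W_{mp}\, h_i^{P_n} + b_{mp} \right),
\qquad
\beta_{P_n} = \frac{\exp\left( w_{P_n} \right)}{\sum_{m=1}^{M} \exp\left( w_{P_m} \right)}
```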

where W_mp ∈ R^(d×d) and b_mp ∈ R^(d×1) are learnable parameters, and a_mp is the semantic-level attention vector.

Through the above calculation, for each meta-path in the meta-path view, the corresponding node features are aggregated using GCN encoding; the edge feature embedding of each meta-path is obtained from the node feature embeddings; and the different meta-paths are fused based on semantic-level attention to obtain the edge feature embedding of the meta-path view.
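The semantic-level attention fusion can be sketched in plain Python as follows. This is an illustration under assumptions: the scoring form (tanh hidden layer, mean over items, softmax over meta-paths) follows the common co-contrastive formulation, and the function names, parameter values, and toy embeddings are hypothetical.

```python
import math


def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def semantic_attention(path_emb, W, b, a):
    """Fuse per-meta-path embeddings with semantic-level attention.

    path_emb[p][i] is the embedding of item i (a node or an edge) under
    meta-path p. Returns the weights beta and the fused embeddings.
    """
    scores = []
    for emb in path_emb:
        total = 0.0
        for h in emb:
            hidden = [math.tanh(z + bi) for z, bi in zip(matvec(W, h), b)]
            total += sum(ai * hi for ai, hi in zip(a, hidden))
        scores.append(total / len(emb))        # mean attention score per path
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    beta = [e / sum(exps) for e in exps]
    n_items, dim = len(path_emb[0]), len(path_emb[0][0])
    fused = [
        [sum(beta[p] * path_emb[p][i][k] for p in range(len(path_emb)))
         for k in range(dim)]
        for i in range(n_items)
    ]
    return beta, fused


W = [[1.0, 0.0], [0.0, 1.0]]     # assumed (untrained) parameters
b = [0.0, 0.0]
a = [1.0, 1.0]
path_emb = [
    [[1.0, 0.0], [0.0, 1.0]],    # embeddings under meta-path P_1
    [[0.0, 0.0], [0.0, 0.0]],    # embeddings under meta-path P_2
]
beta, fused = semantic_attention(path_emb, W, b, a)
print(beta, fused[0])
```

In training, W, b and a would be learned together with the encoders, so the weights reflect which meta-paths carry the most useful signal.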

In step S103, self-supervised collaborative contrast learning is performed in combination with edge feature embedding of the network structure view and the meta-path view, and a feature embedding model of the network structure view is trained.

In this embodiment, self-supervised collaborative contrast learning is performed based on the edge feature embeddings of the network structure view and the meta-path view, and the contrast loss of a positive sample set and a negative sample set under the network structure view is calculated, where the positive sample set and the negative sample set come from the meta-path view or the network structure view. The feature embedding models of the network structure view and the meta-path view are then trained based on the contrast loss.

The core idea of contrast learning is that similar samples are pulled close together and dissimilar samples are pushed far apart, so a similarity measure is required to measure the similarity of two representations.

Specifically, the edge feature embeddings of the network structure view and the meta-path view are obtained and fed into an MLP (Multi-Layer Perceptron) with one hidden layer, which maps them into the space in which the contrast loss is calculated. The formula for the projection of the two embeddings is as follows:
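With a single hidden layer and the shared parameters named in the text, the projection would take the form below. The names z_i^{sc} and z_i^{mp} for the embeddings of the two views and the hat notation for their projections are assumptions:

```latex
\hat{z}_i^{sc} = W^{(2)} \sigma\left( W^{(1)} z_i^{sc} + b^{(1)} \right) + b^{(2)},
\qquad
\hat{z}_i^{mp} = W^{(2)} \sigma\left( W^{(1)} z_i^{mp} + b^{(1)} \right) + b^{(2)}
```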

where σ is the ELU nonlinear function, and the parameters {W^(2), W^(1), b^(2), b^(1)} are shared by the edge feature embeddings of the network structure view and the meta-path view.

Given a node i under the network structure view, its embedding under the meta-path view can be defined as a positive sample; that is, the embedding of the target node i comes from the network structure view, while the embeddings of the positive and negative samples come from the meta-path view, which realizes cross-view self-supervised collaborative contrast learning. Specifically, E-GraphSAGE and the meta-path view output edge embeddings for both the input graph and a negative graph, which serve as the positive and negative samples of this embodiment.

E-GraphSAGE is a GNN-based NIDS approach that incorporates edge features and topological patterns into network intrusion detection for the Internet of Things.

For the positive sample set P_i and the negative sample set N_i, the contrast loss under the network structure view is calculated as follows:
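A loss of the following form matches the sets P_i and N_i and the similarity and temperature symbols described next. It assumes the usual cross-view contrastive formulation, with the hat symbols denoting projected embeddings (assumed notation):

```latex
\mathcal{L}_i^{sc} = -\log
\frac{\sum_{j \in P_i} \exp\left( \mathrm{sim}\left( \hat{z}_i^{sc}, \hat{z}_j^{mp} \right) / \tau \right)}
     {\sum_{k \in P_i \cup N_i} \exp\left( \mathrm{sim}\left( \hat{z}_i^{sc}, \hat{z}_k^{mp} \right) / \tau \right)}
```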

where sim (u, v) represents the cosine similarity of the two vectors u and v, and τ represents the temperature parameter.
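The cosine similarity and the temperature-scaled contrast loss can be sketched as below. This is a minimal illustration assuming the standard cross-view contrastive form; the function names, the default temperature, and the toy vectors are hypothetical, and `total_loss` assumes the λ-weighted combination of the two view losses described for the overall objective.

```python
import math


def cosine(u, v):
    """Cosine similarity sim(u, v); returns 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def contrast_loss(anchor, positives, negatives, tau=0.5):
    """Loss for one anchor from one view against samples from the other view."""
    pos = sum(math.exp(cosine(anchor, p) / tau) for p in positives)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


def total_loss(loss_sc, loss_mp, lam=0.5):
    """Balance the network structure view loss and the meta-path view loss."""
    return lam * loss_sc + (1.0 - lam) * loss_mp


anchor_sc = [1.0, 0.0]                     # anchor from the network structure view
aligned = contrast_loss(anchor_sc, [[0.9, 0.1]], [[-1.0, 0.2]])
misaligned = contrast_loss(anchor_sc, [[-1.0, 0.2]], [[0.9, 0.1]])
print(aligned < misaligned)
```

The loss is small when the anchor is close to its positives and far from its negatives, which is the behavior the training step pushes the encoders toward.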

The contrast loss under the meta-path view is defined similarly to the contrast loss under the network structure view; the difference is that the target embedding comes from the meta-path view, while the embeddings of the positive and negative samples come from the network structure view.

The calculation formula of the overall objective function is as follows:
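Assuming the two view losses are averaged over all targets and balanced by a single coefficient, the objective would read as follows, where the superscripts sc and mp (assumed notation) mark the network structure view and the meta-path view:

```latex
\mathcal{J} = \frac{1}{|V|} \sum_{i \in V} \left[ \lambda \cdot \mathcal{L}_i^{sc} + (1 - \lambda) \cdot \mathcal{L}_i^{mp} \right]
```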

where λ is a coefficient that balances the effects of the network structure view and the meta-path view.

In this embodiment, the self-supervision collaborative contrast learning mechanism optimizes the feature embedding model of the network structure view and the meta-path view through back propagation, and makes the feature embedding model in the network structure view and the meta-path view iterate continuously through training to learn advanced feature embedding of benign network traffic and malicious network traffic, so as to obtain the feature embedding model of the optimized network structure view.

In step S104, based on the feature embedding model, network traffic features in the host interaction graph are extracted, and edge feature embedding vectors are generated to identify malicious network traffic.

In this embodiment, based on the feature embedding model of the optimized network structure view, the network traffic features in the host interaction graph are extracted, and the edge feature embedding vector is generated as input information to identify malicious network traffic.

Specifically, the invention uses the edge feature embedding vector as the input of an unsupervised anomaly detection model to judge whether an edge represents malicious network traffic, thereby accomplishing intrusion detection and rapidly detecting complex network attacks in the target network.

Further, the unsupervised anomaly detection adopted in this embodiment includes anomaly detection algorithms such as PCA, IF, and HBOS. In a real environment, the training samples may contain malicious samples, so the unsupervised anomaly detection process may be trained on contaminated data.

HBOS (Histogram-based Outlier Score) is a histogram-based unsupervised anomaly detection algorithm. It divides the samples into a number of bins for each feature; samples that fall into bins containing few samples are more likely to be outliers. The main idea is to compute an anomaly score for each sample: the higher the score, the more likely the sample is an anomaly.
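The histogram scoring idea can be sketched as follows. This is a minimal version assuming equal-width bins and independent features; the function name, the bin count, and the sample data are hypothetical.

```python
import math


def hbos_scores(samples, n_bins=5):
    """Histogram-based outlier score: sum over features of log(1 / bin height)."""
    scores = [0.0] * len(samples)
    for column in zip(*samples):                  # one histogram per feature
        lo, hi = min(column), max(column)
        width = (hi - lo) / n_bins or 1.0         # guard against constant features
        bins = [min(int((x - lo) / width), n_bins - 1) for x in column]
        counts = [0] * n_bins
        for idx in bins:
            counts[idx] += 1
        peak = max(counts)                        # normalise so the tallest bin is 1
        for row, idx in enumerate(bins):
            scores[row] += math.log(peak / counts[idx])
    return scores


samples = [[0.0], [0.1], [0.2], [0.1], [10.0]]
scores = hbos_scores(samples)
print(scores.index(max(scores)))
```

Samples in the tallest bin contribute a score of zero for that feature, while samples in sparse bins accumulate a positive score.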

IF (Isolation Forest) is an ensemble-based anomaly detection method with linear time complexity and high accuracy. Its main idea is to find, within a large amount of data, the data points that do not conform to the patterns of the rest of the data.
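The isolation idea can be illustrated with a compact sketch: points that are separated from the rest after only a few random splits receive scores closer to 1. It follows the standard Isolation Forest scoring and assumes at least two input points; the function names, tree count, sample size, and toy data are hypothetical.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x lands, adjusted for the size of the final leaf."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if x[dim] < split else right, x, depth + 1)


def isolation_scores(points, n_trees=100, sample_size=64, seed=0):
    """Anomaly score in (0, 1); requires at least two points."""
    rng = random.Random(seed)
    size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(max(size, 2)))
    trees = [build_tree(rng.sample(points, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(size)
    return [2.0 ** (-sum(path_length(t, x) for t in trees) / (n_trees * norm))
            for x in points]


points = [[i * 0.01, i * 0.01] for i in range(20)] + [[10.0, 10.0]]
scores = isolation_scores(points)
print(scores.index(max(scores)))
```

In the detection step, the rows of `points` would be the edge feature embedding vectors rather than toy coordinates.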

PCA (Principal Component Analysis) is a dimensionality reduction algorithm. Its main idea is that the eigenvectors obtained from eigenvalue decomposition correspond to directions along which the original data varies to different degrees; each eigenvalue is the variance of the data along the corresponding direction, and the variance in different directions reflects the intrinsic characteristics of the data.
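One common way to turn PCA into an anomaly detector is to score each sample by its distance from the leading principal direction. The sketch below assumes that variant, which is not necessarily the one used in the embodiment, and finds the top eigenvector by power iteration; the function name and toy data are hypothetical.

```python
import math


def pca_residual_scores(X, iters=100):
    """Distance of each sample from the first principal axis."""
    n, d = len(X), len(X[0])
    mean = [sum(col) / n for col in zip(*X)]
    centered = [[x - m for x, m in zip(row, mean)] for row in X]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    scores = []
    for r in centered:
        proj = sum(a * b for a, b in zip(r, v))
        residual = [a - proj * b for a, b in zip(r, v)]
        scores.append(math.sqrt(sum(x * x for x in residual)))
    return scores


# Ten points on the line y = x plus one point far off that line.
X = [[float(i), float(i)] for i in range(10)] + [[4.5, -4.5]]
scores = pca_residual_scores(X)
print(scores.index(max(scores)))
```

Samples that follow the dominant direction of variance have small residuals, while samples that deviate from it stand out.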

As shown in fig. 2, the flow of the present embodiment mainly includes a preprocessing and graph generation unit, a collaborative contrast learning training unit, and an anomaly detection unit. The preprocessing and graph generation unit collects complex network traffic data in advance and converts it into a graph structure as source data; the collaborative contrast learning training unit performs cross-view self-supervised collaborative contrast learning using the network structure view and the meta-path view to obtain a feature embedding model for the network traffic data; the anomaly detection unit applies an unsupervised anomaly detection algorithm to the extracted edge feature embedding vectors to distinguish benign network traffic from malicious network traffic. With these three units, the embodiment addresses the difficulty traditional NIDS methods have in capturing the overall network graph pattern and the lateral movement paths of complex network attacks such as APT attacks, and is therefore of practical significance.

According to the network intrusion detection method based on self-supervised collaborative contrast learning and the application thereof disclosed in the embodiments of the invention, constructing the host interaction graph makes full use of the topological characteristics of network traffic and expands the scope of network intrusion detection. The local structural features of the host interaction graph are extracted through the network structure view, so that malicious and benign network traffic can be distinguished by their local topological features. The higher-order structural features of the host interaction graph are extracted through the meta-path view, so that meta-paths of higher-order interaction relations can be used to capture path-level features such as an attacker's lateral movement or illegal access between hosts. Cross-view self-supervised learning through the collaborative contrast learning mechanism addresses the difficulty of obtaining labeled data and makes full use of the feature information of the training data to mine latent labels, thereby improving the generalization and accuracy of network intrusion detection.

As shown in fig. 4, a network intrusion detection device based on self-supervised collaborative contrast learning according to an embodiment of the present invention is described.

In an embodiment of the present invention, a network intrusion detection device based on self-supervision cooperative contrast learning includes a drawing module 401, an extraction module 402, a training module 403, and a detection module 404.

The drawing module 401 is configured to establish a host interaction graph based on network traffic data periodically collected in a target network, where the host interaction graph includes nodes and edges, the nodes include terminal entities in the target network, and the edges include interaction relationships between the terminal entities.

And the extracting module 402 is configured to extract local structural features and higher-order structural features of the host interaction graph based on the network structural view and the meta-path view, and obtain edge feature embedding of the network structural view and the meta-path view.

And the training module 403 is used for carrying out self-supervision collaborative contrast learning in combination with the edge feature embedding of the network structure view and the element path view, and training a feature embedding model of the network structure view.

And the detection module 404 is configured to extract network traffic characteristics in the host interaction graph based on the characteristic embedding model, and generate an edge characteristic embedding vector to identify malicious network traffic.

The extraction module 402 is further configured to: carrying out neighborhood information aggregation on each node of the host interaction graph to obtain node characteristic embedding; and obtaining edge feature embedding of the network structure view based on the node feature embedding.

The extraction module 402 is further configured to: corresponding node characteristics are aggregated for each meta-path in the meta-path view by using GCN codes; acquiring edge feature embedding of each element path based on the node feature embedding; and based on semantic level attention fusion, different meta-paths are fused, and edge feature embedding of the meta-path view is obtained.

The training module 403 is further configured to: performing self-supervision collaborative contrast learning based on edge feature embedding of the network structure view and the meta-path view, and calculating contrast loss of a positive sample set and a negative sample set under the network structure view, wherein the positive sample set and the negative sample set are from the meta-path view or the network mode view; and training a characteristic embedding model of the network structure view and the element path view based on the contrast loss.

The detection module 404 is further configured to: extract features of the network traffic data in the host interaction graph based on the feature embedding model of the network structure view, and generate the edge feature embedding vectors; and/or detect the edge feature embedding vectors based on an unsupervised anomaly detection algorithm to identify malicious network traffic.
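By way of non-limiting illustration, unsupervised anomaly detection on the edge feature embedding vectors may be sketched with a PCA reconstruction-error score. The sketch assumes the principal subspace is fitted on an unlabeled reference window of embeddings and that new embeddings are scored by their distance from that subspace; the data are synthetic and the function names `fit_pca` and `pca_scores` are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit a principal subspace on unlabeled reference embeddings."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_scores(X, mean, basis):
    """Anomaly score: norm of the part of each vector outside the fitted subspace."""
    Xc = X - mean
    return np.linalg.norm(Xc - Xc @ basis.T @ basis, axis=1)

rng = np.random.default_rng(0)
# Synthetic reference embeddings lying close to a 2-D subspace of an 8-D space.
reference = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))
reference = reference + 0.01 * rng.normal(size=(200, 8))
# One synthetic embedding that does not follow the reference structure.
suspicious = 5.0 * rng.normal(size=(1, 8))

mean, basis = fit_pca(reference, n_components=2)
reference_scores = pca_scores(reference, mean, basis)
suspicious_score = pca_scores(suspicious, mean, basis)[0]
```

An edge whose score exceeds a chosen threshold, for example a high quantile of `reference_scores`, would be reported as potentially malicious traffic; the threshold is a deployment choice not fixed by this description.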

Fig. 5 illustrates a hardware architecture diagram of a computing device 50 for network intrusion detection based on self-supervised collaborative contrast learning, according to an embodiment of the present description. As shown in Fig. 5, the computing device 50 may include at least one processor 501, a storage 502 (e.g., non-volatile memory), a memory 503, and a communication interface 504, which are connected together via a bus 505. The at least one processor 501 executes at least one computer-readable instruction stored or encoded in the storage 502.

It should be appreciated that the computer-executable instructions stored in the storage 502, when executed, cause the at least one processor 501 to perform the various operations and functions described above in connection with Figs. 1-5 in various embodiments of the present description.

In embodiments of the present description, the computing device 50 may include, but is not limited to: personal computers, server computers, workstations, desktop computers, laptop computers, notebook computers, mobile computing devices, smart phones, tablet computers, cellular phones, personal digital assistants (PDAs), handsets, messaging devices, wearable computing devices, consumer electronic devices, and the like.

According to one embodiment, a program product, such as a machine-readable medium, is provided. The machine-readable medium may have instructions (i.e., the elements described above implemented in software) that, when executed by a machine, cause the machine to perform the various operations and functions described above in connection with Figs. 1-5 in various embodiments of the specification. In particular, a system or apparatus may be provided with a readable storage medium on which software program code implementing the functions of any of the above embodiments is stored, and a computer or processor of the system or apparatus may be caused to read out and execute the instructions stored in the readable storage medium.

According to the network intrusion detection method based on self-supervised collaborative contrast learning and the application thereof disclosed in the embodiments of the invention, constructing the host interaction graph makes full use of the topological features of network traffic and widens the scope of network intrusion detection. Extracting the local structural features of the host interaction graph through the network structure view allows malicious and benign network traffic to be discovered from local topological features. Extracting the higher-order structural features of the host interaction graph through the meta-path view makes full use of meta-paths that encode higher-order interaction relations, capturing path-level features such as an attacker's lateral movement or illegal access between hosts. Cross-view self-supervised learning through the collaborative contrast learning mechanism addresses the difficulty of obtaining labeled data: the feature information of the available training data is fully used to mine latent labels, which improves the generalization and accuracy of network intrusion detection. In addition, the edge feature embedding vectors extracted by the feature embedding model can be classified and identified by at least three anomaly detection algorithms to obtain valid detection results for the network traffic data, which demonstrates the portability and generality of the method and improves the performance of unsupervised network intrusion detection algorithms.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The foregoing descriptions of specific exemplary embodiments of the present invention are presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain the specific principles of the invention and its practical application to thereby enable one skilled in the art to make and utilize the invention in various exemplary embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims

1. A network intrusion detection method based on self-supervised collaborative contrast learning, characterized by comprising the following steps:

establishing a host interaction graph based on network traffic data collected periodically in a target network, wherein the host interaction graph comprises nodes and edges, the nodes comprise terminal entities in the target network, and the edges comprise interaction relations among the terminal entities;

based on a network structure view and a meta-path view, extracting local structure features and higher-order structure features of the host interaction graph, and obtaining edge feature embedding of the network structure view and the meta-path view;

performing self-supervised collaborative contrast learning by combining the edge feature embeddings of the network structure view and the meta-path view, and training a feature embedding model of the network structure view;

and extracting network traffic features from the host interaction graph based on the feature embedding model, and generating edge feature embedding vectors to identify malicious network traffic.

2. The network intrusion detection method based on self-supervised collaborative contrast learning according to claim 1, wherein the obtaining the edge feature embedding of the network structure view specifically comprises:

performing neighborhood information aggregation on each node of the host interaction graph to obtain node feature embeddings;

and obtaining edge feature embedding of the network structure view based on the node feature embedding.

3. The network intrusion detection method based on self-supervised collaborative contrast learning according to claim 1, wherein the obtaining the edge feature embedding of the meta-path view specifically comprises:

for each meta-path in the meta-path view, aggregating the corresponding node features using GCN encoding;

obtaining the edge feature embedding of each meta-path based on the node feature embeddings;

and fusing the different meta-paths based on semantic-level attention to obtain the edge feature embedding of the meta-path view.

4. The method for network intrusion detection based on self-supervised collaborative contrast learning of claim 1, further comprising:

performing self-supervised collaborative contrast learning based on the edge feature embeddings of the network structure view and the meta-path view, and calculating the contrast loss over a positive sample set and a negative sample set under the network structure view, wherein the positive sample set and the negative sample set are drawn from the meta-path view or the network structure view;

and training the feature embedding models of the network structure view and the meta-path view based on the contrast loss.

5. The method for network intrusion detection based on self-supervised collaborative contrast learning of claim 1, further comprising:

extracting features of the network traffic data in the host interaction graph based on the feature embedding model of the network structure view, and generating the edge feature embedding vectors; and/or

and detecting the edge feature embedded vector based on an unsupervised anomaly detection algorithm, and identifying malicious network traffic.

6. The method for network intrusion detection based on self-supervised collaborative contrast learning of claim 5, wherein the unsupervised anomaly detection algorithm includes at least one of principal component analysis (PCA), isolation forest (IF), and histogram-based outlier score (HBOS).

7. The method for network intrusion detection based on self-supervised collaborative contrast learning of claim 1, wherein the terminal entity comprises at least one of a host and a DNS server, and the interaction relationship comprises at least one of an authentication grant, a network request, and a network response.

8. A network intrusion detection device based on self-supervised collaborative contrast learning, the detection device comprising:

a graph-building module, configured to establish a host interaction graph based on network traffic data collected periodically in a target network, wherein the host interaction graph comprises nodes and edges, the nodes comprise terminal entities in the target network, and the edges comprise interaction relations among the terminal entities;

an extraction module, configured to extract local structural features and higher-order structural features of the host interaction graph based on the network structure view and the meta-path view, and to obtain the edge feature embeddings of the network structure view and the meta-path view;

a training module, configured to perform self-supervised collaborative contrast learning by combining the edge feature embeddings of the network structure view and the meta-path view, and to train a feature embedding model of the network structure view;

9. An electronic device, comprising:

at least one processor; and

a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the self-supervised collaborative contrast learning-based network intrusion detection method of any one of claims 1 to 7.

10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the self-supervised collaborative contrast learning based network intrusion detection method according to any one of claims 1 to 7.