WO2022011977A1

WO2022011977A1 - Network anomaly detection method and system, terminal and storage medium

Info

Publication number: WO2022011977A1
Application number: PCT/CN2020/138820
Authority: WO
Inventors: 叶可江; 林鹏; 须成忠
Original assignee: 中国科学院深圳先进技术研究院
Priority date: 2020-07-15
Filing date: 2020-12-24
Publication date: 2022-01-20
Also published as: CN111885035B; CN111885035A

Abstract

The present application relates to a network anomaly detection method and system, a terminal and a storage medium. The method comprises: performing vector conversion on network traffic by using an n-gram model, to obtain a vector matrix of the network traffic; performing spatial-temporal feature extraction on the vector matrix of the network traffic by using a long short-term memory network and a bidirectional gated recurrent unit, to obtain a hidden state of the network traffic; extracting an artificial feature of the network traffic by using an artificial feature extractor, and performing spatial-temporal feature extraction on the artificial feature, to obtain a hidden state of the artificial feature; and splicing the hidden state of the network traffic and the hidden state of the artificial feature, then inputting the result to a deep neural network for classification and prediction of the network traffic, and determining, according to the prediction result, whether the network traffic is anomalous. By modeling with fused features, the present application represents network traffic better, raises the upper limit of the model's prediction performance, and achieves a better classification result.

Description

A network abnormality detection method, system, terminal and storage medium

Technical field

The present application belongs to the technical field of network security, and in particular, relates to a network abnormality detection method, system, terminal and storage medium.

Background art

According to the 45th Statistical Report on Internet Development in China by the China Internet Network Information Center (CNNIC), as of March 2020 the number of Internet users in China exceeded 900 million, and the Internet penetration rate reached 64.5%. With the vigorous development of network technology, however, network security incidents keep emerging. According to a report by Sangfor Technology, malware was very active in 2019, and malicious behaviors such as virus infection, ransomware and network attacks occurred one after another. Current network security threats are very serious. If abnormal network traffic can be found and intercepted in the early stage of a network intrusion, the occurrence of network intrusion events can be effectively reduced and the stability of information systems increased. A network anomaly detection system addresses this problem: its purpose is to identify, within the network traffic, the traffic that does not conform to normal behavior patterns.

Currently, network anomaly detection techniques can be divided into two categories:

1. Signature-based detection method: Its principle is to analyze known abnormal traffic data, extract specific string patterns from it, and build an abnormal traffic fingerprint database from these patterns. When new network traffic arrives, the traffic is compared with the fingerprints in the database one by one. Once the traffic is found to contain a malicious fingerprint, it can be determined to be abnormal. Signature-based detection is a relatively mature method with high accuracy, but it requires experienced experts to extract fingerprints and requires long-term maintenance of the fingerprint database. As more and more abnormal traffic appears, the fingerprint database becomes increasingly bloated, which inevitably slows down network anomaly detection; moreover, this method can only identify known malicious attacks and cannot deal with unknown new attacks, such as attacks that exploit 0-day vulnerabilities.
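As an illustration only, the one-by-one fingerprint comparison described above can be sketched as a substring search. The fingerprint names and byte patterns below are hypothetical examples and are not taken from the present application.

```python
# Hypothetical fingerprint database: byte patterns extracted from known-bad traffic.
FINGERPRINTS = {
    "sql-injection": b"' OR 1=1",
    "path-traversal": b"../../",
}

def match_signatures(payload: bytes, fingerprints=FINGERPRINTS):
    """Return the names of all fingerprints whose pattern occurs in the payload."""
    return [name for name, pattern in fingerprints.items() if pattern in payload]

def is_abnormal(payload: bytes) -> bool:
    """Traffic is flagged as abnormal as soon as any fingerprint matches."""
    return bool(match_signatures(payload))
```

The loop over every fingerprint also shows why a growing database slows detection, and why traffic without a stored pattern passes unnoticed.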

2. Anomaly-based detection method: Anomaly-based detection is the current mainstream research direction for anomaly detection systems (ADS). The core idea is to establish a credible activity model for legitimate user behavior, and then use the model to calculate the probability that a new behavior is legitimate. The lower this score, the more likely the behavior is abnormal. The models are often built with techniques from mathematical statistics, data mining and machine learning. This method can detect unknown abnormal traffic, but building an effective model with both a low false alarm rate and a low false negative rate has always been a challenge.

There is also a large body of research on anomaly-based detection methods. To train a classifier with methods such as machine learning, however, the network traffic must first be converted into a set of vector representations, and this step is currently often performed manually. Most research is based on manually designed traffic feature datasets, so the quality of the feature design determines the upper limit of the classifier. Some works try to model the raw data directly, but most of the traffic embedding methods they use are byte-level one-hot encodings, which have certain defects and cannot reflect the implicit internal relationships of the data well.

Summary of the invention

The present application provides a network anomaly detection method, system, terminal and storage medium, aiming to solve one of the above-mentioned technical problems in the prior art at least to a certain extent.

In order to solve the above problems, the application provides the following technical solutions:

A network anomaly detection method, comprising the following steps:

Using an n-gram model to perform vector conversion on network traffic to obtain a vector matrix of the network traffic;

Using a long short-term memory network and a bidirectional gated recurrent unit to perform spatiotemporal feature extraction on the vector matrix of the network traffic to obtain the hidden state of the network traffic;

Extracting the artificial features of the network traffic through an artificial feature extractor, and performing spatiotemporal feature extraction on the artificial features to obtain the hidden state of the artificial features;

After splicing the hidden state of the network traffic with the hidden state of the artificial features, inputting the result into a deep neural network to perform classification prediction of the network traffic, and determining whether the network traffic is abnormal according to the prediction result.

The technical solutions adopted in the embodiments of the present application further include: before the vector conversion of the network traffic by using the n-gram model, the method further includes:

Convert the network traffic into network traffic data packets in the form of standard input; specifically:

The network traffic is divided into m groups according to the five-tuple <source IP, destination IP, source port, destination port, transmission protocol>, and each group represents a bidirectional communication flow;

Take the first p packets in each group to get m*p packets;

Take the first q bytes of each data packet to get m*p*q bytes;

Concatenate the first q bytes of the first p packets in the m groups to form an m*p*q tensor.
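As an illustration only, grouping by five-tuple so that each group holds both directions of a conversation can be sketched as follows. Sorting the two endpoints to obtain a direction-independent key is an assumption of this sketch; the present application only states that each group represents a bidirectional communication flow. The names `flow_key` and `group_flows` are hypothetical.

```python
from collections import defaultdict

def flow_key(src_ip, dst_ip, src_port, dst_port, proto):
    """Direction-independent key: A->B and B->A map to the same group.
    Assumption: the two endpoints are ordered by sorting."""
    first, second = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (first, second, proto)

def group_flows(packets):
    """packets: iterable of (src_ip, dst_ip, src_port, dst_port, proto, payload).
    Returns a dict mapping each flow key to its payloads in arrival order."""
    flows = defaultdict(list)
    for src_ip, dst_ip, src_port, dst_port, proto, payload in packets:
        flows[flow_key(src_ip, dst_ip, src_port, dst_port, proto)].append(payload)
    return dict(flows)
```

With this key, a request and its reply fall into the same group, so the number of groups corresponds to m in the text.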

The technical solutions adopted in the embodiments of the present application further include: the vector transformation of network traffic by using the n-gram model includes:

Set the length of the 1-gram byte hash table to 256, and set the length of the 2-gram byte hash table to l₁ and the length of the 3-gram byte hash table to l₂;

Map each 2-gram and 3-gram byte combination to the 2-gram byte hash table and the 3-gram byte hash table, respectively, where combinations mapped to the same position are represented by a shared embedding;

Assign a corresponding d-dimensional vector to each element in the 1-gram, 2-gram and 3-gram byte hash tables;

Convert the q bytes in the m*p*q tensor into vectors through the 1-gram, 2-gram and 3-gram byte hash tables, respectively, to obtain v₁, v₂ and v₃, and splice v₁, v₂ and v₃ to obtain a tensor with an output dimension of m*p*n*3d, where n=p+p/2+p/3.

The technical solutions adopted in the embodiments of the present application further include: performing spatiotemporal feature extraction on the vector matrix of the network traffic by using a long short-term memory network and a bidirectional gated recurrent unit to obtain the hidden state of the network traffic includes:

Performing a one-dimensional convolution operation on the vector matrix of each of the m*p network traffic data packets to obtain the first hidden state h₁ of each network traffic data packet;

Performing spatiotemporal feature extraction on the first hidden states h₁ of the m*p network traffic data packets, respectively, to obtain the second hidden state h₂ of each network traffic data packet.

The technical solutions adopted in the embodiments of the present application further include: performing a one-dimensional convolution operation on the vector matrices of the m*p network traffic data packets respectively includes:

Set convolution kernels with sizes of 3*3d, 4*3d and 5*3d respectively, the number of each convolution kernel used is r, and the total number of the convolution kernels is 3r;

Perform a one-dimensional convolution operation in the row direction on the i-th (0<i≤p) network traffic data packet to obtain 3r feature maps;

Perform a maximum pooling operation on each of the 3r feature maps to obtain 3r values, and splice the 3r values to obtain the first hidden state hᵢ₁ of the i-th network traffic data packet.

The technical solution adopted in the embodiment of the present application further includes: performing spatiotemporal feature extraction on the first hidden states h₁ of the m*p network traffic data packets respectively includes:

Send the first hidden state hᵢ₁ of the i-th network traffic data packet into the bidirectional long short-term memory network to learn the first hidden state sᵢ₁ of each time step;

Send all the first hidden states sᵢ₁ to the bidirectional gated recurrent unit to learn the second hidden state hᵢ₂ of each time step;

Calculate the attention weight of each time step according to the second hidden state hᵢ₂: eᵢ=tanh(Whᵢ₂+b), from which the attention weight αᵢ of each time step is obtained;

Perform weighted summation on all the second hidden states hᵢ₂ to obtain the second hidden state of the i-th network traffic data packet: h₂=∑ᵢ αᵢ*hᵢ₂.

The technical solutions adopted in the embodiments of the present application further include: extracting the artificial features of the network traffic by using an artificial feature extractor further includes:

Extracting 80 hand-designed network traffic features from the network traffic using a traffic feature extraction tool;

Each network flow in the network traffic is represented as a traffic vector of size 1*80, in which each column represents a feature value.

The technical solution adopted in the embodiment of the present application further includes: performing spatiotemporal feature extraction on the artificial feature, and obtaining the hidden state of the artificial feature includes:

Converting the artificial features of the network traffic into artificial feature data packets in the standard input form;

Performing spatiotemporal feature extraction on the artificial feature data packets in the standard input form to obtain the hidden state h′₂ of each traffic vector.

Another technical solution adopted by the embodiments of the present application is: a network anomaly detection system, comprising:

Vector conversion module: used to perform vector conversion on network traffic by using the n-gram model to obtain a vector matrix of the network traffic;

The first spatiotemporal feature extraction module: used for extracting spatiotemporal features from the vector matrix of the network traffic by using a long short-term memory network and a bidirectional gated recurrent unit to obtain the hidden state of the network traffic;

Artificial feature extraction module: for extracting the artificial features of the network traffic through an artificial feature extractor;

The second spatiotemporal feature extraction module: used for performing spatiotemporal feature extraction on the artificial feature to obtain the hidden state of the artificial feature;

Network traffic prediction module: used for splicing the hidden state of the network traffic with the hidden state of the artificial feature, inputting the result into the deep neural network to classify and predict the network traffic, and determining whether the network traffic is abnormal according to the prediction result.

Another technical solution adopted by the embodiments of the present application is: a terminal, the terminal includes a processor and a memory coupled to the processor, wherein,

The memory stores program instructions for implementing the network anomaly detection method;

The processor is configured to execute the program instructions stored in the memory to control network anomaly detection.

Another technical solution adopted by the embodiments of the present application is: a storage medium storing program instructions executable by a processor, where the program instructions are used to execute the network abnormality detection method.

Compared with the prior art, the beneficial effects of the embodiments of the present application are as follows: the network anomaly detection method, system, terminal and storage medium of the embodiments of the present application establish a combination table of network traffic by using the n-gram model, learn a vector representation in a low-dimensional space for each combination, and use fused features for modeling, that is, a deep neural network learns the intrinsic feature representation of network traffic in addition to the artificially designed features, which represents network traffic better and raises the upper bound of the model's prediction performance. At the same time, the embodiments of the present application use one-dimensional convolution, bidirectional LSTM, bidirectional GRU and an attention mechanism, which better reflect the implicit internal relationships of the data and thus learn a better feature representation of network traffic, achieving a better classification result.

Description of drawings

FIG. 1 is a flowchart of a network abnormality detection method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of an original network traffic conversion method according to an embodiment of the present application;

FIG. 3 is a flowchart of performing vector conversion on network traffic data packets in the standard input form by using the n-gram model according to an embodiment of the present application;

FIG. 4 is a flowchart of performing a one-dimensional convolution operation on a vector matrix of each network traffic data packet according to an embodiment of the present application;

FIG. 5 is a flowchart of spatiotemporal feature extraction according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a network anomaly detection system according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a terminal according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a storage medium according to an embodiment of the present application.

Detailed description

In order to make the purpose, technical solutions and advantages of the present application more clearly understood, the present application will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application.

To overcome the deficiencies of the prior art, the embodiment of the present application uses the n-gram model to establish a combination table of network traffic and learns a vector representation in a low-dimensional space for each combination. After each network data packet is split and converted into vectors by the n-gram model, it is sent to a deep neural network, which learns the vector space representation of the network traffic and extracts spatiotemporal features. At the same time, to supplement features that the neural network may fail to learn, the embodiment of the present application adds artificially designed feature representations to further improve the detection performance of the model.

Specifically, please refer to FIG. 1 , which is a flowchart of a network abnormality detection method according to an embodiment of the present application. The network anomaly detection method according to the embodiment of the present application includes the following steps:

S1: Collect raw network traffic, and execute S2 and S6 at the same time;

In the embodiment of the present application, the network traffic is collected as follows: a network traffic capture tool such as Wireshark or tcpdump is used to capture network traffic data packets, and the captured network traffic data packets are saved as a pcap file.
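As an illustration only, the raw packets in a saved capture can be read back with the standard library alone. The sketch below assumes the classic pcap layout (a 24-byte global header followed by a 16-byte record header before each packet); it does not handle the pcapng format that Wireshark may use by default, and the function name `read_pcap` is hypothetical.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap magic number (microsecond timestamps)

def read_pcap(data: bytes):
    """Yield the raw bytes of each packet from a classic pcap capture in memory."""
    if len(data) < 24:
        raise ValueError("truncated pcap global header")
    # The magic number reveals the byte order used by the writer.
    if struct.unpack("<I", data[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", data[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_usec, captured length, original length.
        _sec, _usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len
```

In practice an established capture-parsing library would normally be used; the sketch only shows what "packets saved as a pcap file" provides to the later steps.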

S2: Convert the original network traffic into network traffic packets in the form of standard input;

In this step, please refer to Figure 2, which is a schematic diagram of the original network traffic conversion method, which specifically includes:

S21: Divide the original network traffic into m groups according to the five-tuple <source IP, destination IP, source port, destination port, transmission protocol>, where each group represents a bidirectional communication flow; the value of m can be set according to the actual application;

S22: Take the first p data packets in each group to obtain m*p data packets; if a group has fewer than p data packets, pad the group so that it reaches p data packets; the value of p can be set according to the actual application;

S23: Take the first q bytes of each data packet to obtain m*p*q bytes; if a data packet has fewer than q bytes, pad the data packet with 0x00 bytes so that it reaches q bytes; the value of q can be set according to the actual application;

S24: Concatenate the first q bytes of the first p data packets in the m groups to form an m*p*q tensor.
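As an illustration only, steps S22 to S24 can be sketched as a truncate-and-pad routine over flows that have already been grouped. Padding short packets with 0x00 follows the text; filling a short group with all-zero packets is an assumption of this sketch, since the text does not state what a group is padded with. The function name `to_tensor` is hypothetical.

```python
def to_tensor(flows, p, q):
    """flows: list of m flows, each a list of packet byte strings.
    Returns nested lists of shape m x p x q holding byte values 0..255."""
    tensor = []
    for packets in flows:
        rows = []
        for pkt in packets[:p]:                      # keep the first p packets
            head = pkt[:q]                           # keep the first q bytes
            rows.append(list(head) + [0x00] * (q - len(head)))  # pad with 0x00
        # Assumption: a group with fewer than p packets gets all-zero packets.
        rows += [[0x00] * q for _ in range(p - len(rows))]
        tensor.append(rows)
    return tensor
```

For example, `to_tensor([[b"\x01\x02\x03\x04", b"\x05"], [b"\x09"]], p=2, q=3)` produces a 2*2*3 structure in which the four-byte packet is truncated and the short packets and the short group are padded.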

S3: Use the n-gram model to perform vector transformation on the network traffic data packets in the form of standard input, and obtain the vector matrix of each network traffic data packet;

In this step, please refer to FIG. 3 together, which is a flow chart of vector conversion of network traffic data packets in the form of standard input using the n-gram model, which specifically includes:

S31: Set the length of the 1-gram byte hash table to 256, and set the length of the 2-gram byte hash table to l₁ and the length of the 3-gram byte hash table to l₂;

S32: Map each 2-gram and 3-gram byte combination to the 2-gram byte hash table and the 3-gram byte hash table, respectively, where combinations mapped to the same position are represented by a shared embedding;

S33: Assign a corresponding d-dimensional vector to each element in the 1-gram, 2-gram and 3-gram byte hash tables, where the values of the d-dimensional vectors are initially randomized;

S34: Convert the q bytes in the m*p*q tensor into vectors through the 1-gram, 2-gram and 3-gram byte hash tables, respectively, to obtain v₁, v₂ and v₃, and splice v₁, v₂ and v₃ to obtain a tensor with an output dimension of m*p*n*3d, where n=p+p/2+p/3.
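As an illustration only, steps S31 to S33 and the lookup part of S34 can be sketched as follows. The hash function (big-endian integer value modulo the table length), the stride-1 sliding window and the initialization range are assumptions of this sketch; the text does not specify them. The final splicing of v₁, v₂ and v₃ is left out, and all function names are hypothetical.

```python
import random

def ngram_indices(data: bytes, n: int, table_len: int):
    """Map every n-byte combination to a slot of a table with table_len entries.
    Assumptions: stride-1 sliding window, modulo hash."""
    return [int.from_bytes(data[i:i + n], "big") % table_len
            for i in range(len(data) - n + 1)]

def make_table(table_len: int, d: int, rng: random.Random):
    """One randomly initialized d-dimensional vector per table entry (S33)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(table_len)]

def embed(data: bytes, tables, lengths):
    """Return (v1, v2, v3): the embedded 1-gram, 2-gram and 3-gram sequences."""
    return tuple(
        [tables[n - 1][idx] for idx in ngram_indices(data, n, lengths[n - 1])]
        for n in (1, 2, 3)
    )
```

Because l₁ and l₂ can be far smaller than the 65,536 possible 2-grams and 16,777,216 possible 3-grams, several byte combinations land on the same slot and therefore share one embedding, as S32 describes. In a trained model the table entries would be learned parameters rather than fixed random values.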

S4: Perform a one-dimensional convolution operation on the vector matrix of each of the m*p network traffic data packets to obtain the first hidden state h₁ of each network traffic data packet;

In this step, one-dimensional convolution is used to scan the vector matrix of each network traffic data packet vertically, and max pooling is used to compress the data. Specifically, FIG. 4 shows a flowchart of performing a one-dimensional convolution operation on the vector matrix of each network traffic data packet, which includes:

S41: Set three convolution kernels with sizes of 3*3d, 4*3d and 5*3d respectively, and the number of each convolution kernel used is r, that is, the total number of convolution kernels is 3r;

S42: Perform a row-direction one-dimensional convolution operation on the i-th (0<i≤p) network traffic data packet, and each convolution kernel can obtain a feature map, then a total of 3r feature maps are obtained;

S43: Perform a maximum pooling operation on each of the 3r feature maps to obtain 3r values, and splice the 3r values to obtain the first hidden state hᵢ₁ of the i-th network traffic data packet.
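As an illustration only, steps S41 to S43 can be sketched in plain Python. Each kernel covers k rows (k = 3, 4 or 5) and the full row width (3d in the text) and slides down the rows of one packet's vector matrix. Omitting a bias term and an activation function is an assumption of this sketch, and the function name `conv1d_maxpool` is hypothetical.

```python
def conv1d_maxpool(matrix, kernels):
    """matrix: list of rows, each a list of `width` numbers (width = 3d in the text).
    kernels: list of kernels, each a list of k rows of `width` weights.
    Returns one max-pooled value per kernel, i.e. the first hidden state.
    Assumes the matrix has at least as many rows as the largest kernel."""
    hidden = []
    for kernel in kernels:
        k = len(kernel)
        feature_map = []
        for start in range(len(matrix) - k + 1):     # slide in the row direction
            window = matrix[start:start + k]
            feature_map.append(sum(w * x
                                   for k_row, m_row in zip(kernel, window)
                                   for w, x in zip(k_row, m_row)))
        hidden.append(max(feature_map))              # max pooling over the feature map
    return hidden
```

With r kernels of each of the three sizes, the returned list has 3r entries, matching the length of hᵢ₁ in S43.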

S5: Perform spatiotemporal feature extraction on the first hidden states h₁ of the m*p network traffic data packets, respectively, to obtain the second hidden state h₂ of each network traffic data packet, and execute S9;

In this step, the bidirectional LSTM structure, the bidirectional GRU structure and the attention mechanism are used to extract the spatiotemporal features of the network traffic data packets. FIG. 5 shows a flowchart of the spatiotemporal feature extraction, which specifically includes:

S51: Send the first hidden state hᵢ₁ of the i-th network traffic data packet into a bidirectional long short-term memory network (Bi-LSTM) to learn the first hidden state sᵢ₁ of each time step;

S52: Send all the first hidden states sᵢ₁ to the bidirectional gated recurrent unit (Bi-GRU) to learn the second hidden state hᵢ₂ of each time step;

S53: Calculate the attention weight of each time step according to the second hidden state hᵢ₂: eᵢ=tanh(Whᵢ₂+b), from which the attention weight αᵢ of each time step is obtained;

S54: Perform weighted summation on all the second hidden states hᵢ₂ to obtain the second hidden state of the i-th network traffic data packet: h₂=∑ᵢ αᵢ*hᵢ₂.
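As an illustration only, steps S53 and S54 can be sketched as attention pooling over the per-time-step states. Normalizing the scores eᵢ with a softmax to obtain αᵢ is an assumption of this sketch (the formula for αᵢ is not reproduced in the text), as are the shapes of W (a weight vector) and b (a scalar). The function name `attention_pool` is hypothetical.

```python
import math

def attention_pool(states, W, b):
    """states: list of T hidden vectors h_i2, each of length k.
    W: weight vector of length k, b: scalar bias (assumed shapes).
    Returns (alphas, h2) where h2 = sum_i alpha_i * h_i2."""
    # S53: score of each time step, e_i = tanh(W h_i2 + b).
    scores = [math.tanh(sum(w * x for w, x in zip(W, h)) + b) for h in states]
    # Assumption: the attention weights are the softmax of the scores.
    exps = [math.exp(e) for e in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # S54: weighted sum of the hidden states.
    h2 = [sum(a * h[j] for a, h in zip(alphas, states))
          for j in range(len(states[0]))]
    return alphas, h2
```

When all scores are equal, the weights are uniform and h₂ reduces to the mean of the hidden states; trained values of W and b let the model emphasize the more informative time steps.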

S6: perform artificial feature extraction on the original network traffic through an artificial feature extractor to obtain an artificial feature representation of the original network traffic;

In this step, the artificial feature extraction method for the original network traffic specifically includes: using the traffic feature extraction tool CICFlowMeter to extract 80 manually designed network traffic features from the pcap file; each network flow in the original network traffic is represented as a traffic vector of size 1*80, in which each column represents a feature value.

S7: Convert the artificial feature representation of the original network traffic into an artificial feature data packet in the form of standard input;

In this step, the conversion method of artificial feature representation is specifically:

First, normalize each feature of each network flow in the original network traffic so that the attribute values are mapped to between 0 and 1.

Then, the window length w is set, and each network flow vector is combined with its previous w-1 network flow vectors to obtain a flow vector representation with a size of w*80.
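As an illustration only, the two conversion steps of S7 can be sketched as follows. Using min-max scaling for the normalization and left-padding the earliest flows with all-zero vectors are assumptions of this sketch; the text states only that values are mapped to between 0 and 1 and that each vector is combined with its previous w-1 vectors. The function names are hypothetical.

```python
def min_max_normalize(vectors):
    """Scale every column of a list of equal-length feature vectors to [0, 1].
    Assumption: min-max scaling; a constant column is mapped to 0."""
    cols = list(zip(*vectors))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(x - lo) / span if span else 0.0
             for x, lo, span in zip(vec, lows, spans)] for vec in vectors]

def windows(vectors, w):
    """Stack each vector with its previous w-1 vectors into a block of w rows.
    Assumption: flows with fewer than w-1 predecessors are left-padded with zeros."""
    width = len(vectors[0])
    padded = [[0.0] * width for _ in range(w - 1)] + vectors
    return [padded[i:i + w] for i in range(len(vectors))]
```

With 80 features per flow, each block returned by `windows` has size w*80, which is the standard input form passed on to S8.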

S8: Perform spatiotemporal feature extraction on the artificial feature data packets in the standard input form to obtain the hidden state h′₂ of each traffic vector;

In this step, the bidirectional LSTM structure, the bidirectional GRU structure and the attention mechanism are used to extract spatiotemporal features from the artificial feature data packets. The extraction process for the hidden state h′₂ of the traffic vector is the same as that for the second hidden state h₂ of the network traffic data packet and is not repeated here.

S9: Splice the second hidden state h₂ of the network traffic data packet and the hidden state h′₂ of the traffic vector to obtain the final third hidden state h₃, input the third hidden state h₃ into the deep neural network to perform classification prediction of the network traffic, and determine whether the network traffic data is abnormal according to the prediction result;

In this step, after h₃ is sent into the deep neural network, the output predicted values of the different categories of network traffic are first calculated: u=Wh₃+b; Softmax classification is then performed on the predicted value u to obtain the predicted label of the network traffic.
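As an illustration only, the splicing and classification of this step can be sketched with a single linear layer followed by Softmax. Using one layer in place of the deep neural network and taking the most probable category as the predicted label are assumptions of this sketch; the function name `classify` is hypothetical.

```python
import math

def classify(h2, h2_prime, W, b):
    """Splice the two hidden states, compute u = W h3 + b, then apply Softmax.
    W: one weight row per category, b: one bias per category.
    Returns (probabilities, index of the predicted category)."""
    h3 = h2 + h2_prime                               # splicing = list concatenation
    u = [sum(w * x for w, x in zip(row, h3)) + bias
         for row, bias in zip(W, b)]
    peak = max(u)                                    # subtract the max for stability
    exps = [math.exp(v - peak) for v in u]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs, probs.index(max(probs))
```

For a two-category setup (normal and abnormal), the traffic would be reported as abnormal when the predicted index is the one assigned to the abnormal category.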

Based on the above, the network anomaly detection method of the embodiment of the present application uses fused features for modeling, that is, a deep neural network learns the intrinsic feature representation of network traffic in addition to the artificially designed features, so that network traffic is better represented and the upper bound of the model's prediction performance is raised. At the same time, the embodiment of the present application proposes a new byte combination embedding method, which learns vector representations of 1-grams, 2-grams and 3-grams of network traffic and splices them horizontally to better represent network traffic. In addition, the embodiment of the present application uses one-dimensional convolution, bidirectional LSTM, bidirectional GRU and an attention mechanism, which better reflect the implicit internal relationships of the data and thus learn a better feature representation of network traffic, achieving a better classification result.

Please refer to FIG. 6 , which is a schematic structural diagram of a network anomaly detection system according to an embodiment of the present application. The network anomaly detection system of the embodiment of the present application includes:

Traffic collection module: used to collect original network traffic; in the embodiment of the present application, the network traffic is collected as follows: a network traffic capture tool such as Wireshark or tcpdump is used to capture network traffic data packets, and the captured network traffic data packets are saved as a pcap file.

Traffic conversion module: used to convert the original network traffic into network traffic data packets in the form of standard input; the traffic conversion method specifically includes:

The original network traffic is divided into m groups according to the five-tuple <source IP, destination IP, source port, destination port, transmission protocol>, and each group represents a bidirectional communication flow; the value of m can be set according to the actual application;

Take the first p data packets in each group to get m*p data packets; if a group has fewer than p data packets, pad the group so that it reaches p data packets; the value of p can be set according to the actual application;

Take the first q bytes of each data packet to get m*p*q bytes; if a data packet has fewer than q bytes, pad the data packet with 0x00 bytes so that it reaches q bytes; the value of q can be set according to the actual application;

Vector conversion module: It is used to perform vector conversion on network traffic data packets in the form of standard input by using the n-gram model to obtain a vector matrix of each network traffic data packet; wherein, the vector conversion methods specifically include:

Assign a corresponding d-dimensional vector to each element in the 1-gram, 2-gram and 3-gram byte hash tables, where the values of the d-dimensional vectors are initially randomized;

Convert the q bytes in the m*p*q tensor into vectors through the 1-gram, 2-gram and 3-gram byte hash tables, respectively, to obtain v₁, v₂ and v₃, and splice v₁, v₂ and v₃ to obtain a tensor with an output dimension of m*p*n*3d, where n=p+p/2+p/3.

Convolution calculation module: used to perform a one-dimensional convolution operation on the vector matrix of each of the m*p network traffic data packets to obtain the first hidden state h₁ of each network traffic data packet; wherein the embodiment of the present application uses one-dimensional convolution to scan the vector matrix of each network traffic data packet vertically, and uses max pooling to compress the data. This specifically includes:

Three convolution kernels with sizes of 3*3d, 4*3d and 5*3d are set respectively, and the number of each convolution kernel used is r, that is, the total number of convolution kernels is 3r;

Perform a one-dimensional convolution operation in the row direction on the i-th (0<i≤p) network traffic data packet, each convolution kernel can obtain a feature map, and a total of 3r feature maps are obtained;

Perform a maximum pooling operation on each of the 3r feature maps to obtain 3r values, and splice the 3r values to obtain the first hidden state hᵢ₁ of the i-th network traffic data packet.

First spatiotemporal feature extraction module: used to perform spatiotemporal feature extraction on the first hidden states h₁ of the m*p network traffic data packets, respectively, to obtain the second hidden state h₂ of each network traffic data packet; wherein the embodiment of the present application uses the bidirectional LSTM structure, the bidirectional GRU structure and the attention mechanism to extract the spatiotemporal features of the network traffic data packets, including:

Send the first hidden state hᵢ₁ of the i-th network traffic data packet into a bidirectional long short-term memory network (Bi-LSTM) to learn the first hidden state sᵢ₁ of each time step;

Send all the first hidden states sᵢ₁ into the bidirectional gated recurrent unit (Bi-GRU) to learn the second hidden state hᵢ₂ of each time step;

Artificial feature extraction module: used to perform artificial feature extraction on the original network traffic through an artificial feature extractor to obtain the artificial feature representation of the original network traffic; wherein the artificial feature extraction method specifically includes: using the traffic feature extraction tool CICFlowMeter to extract 80 manually designed network traffic features from the pcap file; each network flow in the original network traffic is represented as a traffic vector of size 1*80, in which each column represents a feature value.

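Artificial feature conversion module: used to convert the artificial feature representation of the original network traffic into artificial feature data packets in the standard input form; wherein the artificial feature representation is converted by first normalizing each feature of each network flow so that the attribute values are mapped to between 0 and 1, and then, for a set window length w, combining each network flow vector with its previous w-1 network flow vectors to obtain a traffic vector representation of size w*80.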

Second spatiotemporal feature extraction module: used to perform spatiotemporal feature extraction on the artificial feature data packets in the standard input form to obtain the hidden state h′₂ of each traffic vector; wherein the spatiotemporal feature extraction process for the hidden state h′₂ of the traffic vector is the same as that for the second hidden state h₂ of the network traffic data packet and is not repeated here.

Network traffic prediction module: used to splice the second hidden state h₂ of the network traffic data packet with the hidden state h′₂ of the traffic vector to obtain the final third hidden state h₃, input the third hidden state h₃ into the deep neural network to perform classification prediction of the network traffic, and determine whether the network traffic is abnormal according to the prediction result; wherein, after h₃ is sent into the deep neural network, the output predicted values of the different categories of network traffic are first calculated: u=Wh₃+b; Softmax classification is then performed on the predicted value u to obtain the predicted label of the network traffic.

Please refer to FIG. 7 , which is a schematic structural diagram of a terminal according to an embodiment of the present application. The terminal 50 includes a processor 51 and a memory 52 coupled to the processor 51 .

The memory 52 stores program instructions for implementing the above-mentioned network abnormality detection method.

The processor 51 is configured to execute program instructions stored in the memory 52 to control network anomaly detection.

The processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip with signal processing capability. The processor 51 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

Please refer to FIG. 8, which is a schematic structural diagram of a storage medium according to an embodiment of the present application. The storage medium of this embodiment stores a program file 61 capable of implementing all the above methods. The program file 61 may be stored in the storage medium in the form of a software product and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the various embodiments of the present application. The aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, as well as terminal devices such as computers, servers, mobile phones, and tablets.

The above description of the disclosed embodiments enables any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined in this application may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

A network anomaly detection method, comprising the following steps:

Performing vector conversion on network traffic using an n-gram model to obtain a vector matrix of the network traffic;

Performing spatiotemporal feature extraction on the vector matrix of the network traffic using a long short-term memory network and a bidirectional gated recurrent unit to obtain a hidden state of the network traffic;

Extracting artificial features of the network traffic through an artificial feature extractor, and performing spatiotemporal feature extraction on the artificial features to obtain a hidden state of the artificial features;

Splicing the hidden state of the network traffic with the hidden state of the artificial features, inputting the spliced result into a deep neural network to perform classification prediction of the network traffic, and determining whether the network traffic is abnormal according to the prediction result.
The network anomaly detection method according to claim 1, wherein before the vector transformation is performed on the network traffic data by using the n-gram model, the method further comprises:

Convert the network traffic into network traffic data packets in the form of standard input; specifically:

The network traffic is divided into m groups according to the five-tuple <source IP, destination IP, source port, destination port, transmission protocol>, and each group represents a bidirectional communication flow;

Take the first p packets in each group to get m*p packets;

Take the first q bytes of each data packet to get m*p*q bytes;

Concatenate the first q bytes of the first p packets in the m groups to form an m*p*q tensor.
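
The grouping and truncation steps of this claim can be sketched in Python as follows. The packet representation (a tuple ending in the raw bytes), the helper names, and the zero-padding of short flows and short packets are assumptions made for illustration; the claim itself does not state how short inputs are handled.

```python
def flow_key(src_ip, dst_ip, src_port, dst_port, proto):
    """Direction-independent five-tuple key, so both directions fall into one group."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (min(a, b), max(a, b), proto)

def build_tensor(packets, p, q):
    """Return an m*p*q nested list of byte values (0-255)."""
    groups = {}  # insertion-ordered: one entry per bidirectional flow
    for src_ip, dst_ip, src_port, dst_port, proto, raw in packets:
        key = flow_key(src_ip, dst_ip, src_port, dst_port, proto)
        groups.setdefault(key, []).append(raw)
    tensor = []
    for raws in groups.values():
        # First p packets, first q bytes each; zero-padding is an assumption.
        rows = [list(raw[:q]) + [0] * (q - len(raw[:q])) for raw in raws[:p]]
        rows += [[0] * q for _ in range(p - len(rows))]
        tensor.append(rows)
    return tensor

# Hypothetical packets: (src IP, dst IP, src port, dst port, protocol, raw bytes).
packets = [
    ("10.0.0.1", "10.0.0.2", 1234, 80, "TCP", b"\x01\x02\x03\x04\x05"),
    ("10.0.0.2", "10.0.0.1", 80, 1234, "TCP", b"\x06\x07"),
    ("10.0.0.3", "10.0.0.2", 5555, 53, "UDP", b"\x08"),
]
tensor = build_tensor(packets, p=2, q=3)
```

Here the first two packets belong to the same bidirectional flow, so m = 2 and the result has shape 2*2*3.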
The network anomaly detection method according to claim 2, wherein the vector transformation of the network traffic using the n-gram model comprises:

Set a 1-gram hash byte table of length 256, a 2-gram hash byte table of length l₁, and a 3-gram hash byte table of length l₂;

Map each 2-gram and 3-gram byte combination to the 2-gram hash byte table and the 3-gram hash byte table, respectively, where combinations mapped to the same position are represented by a shared embedding;

Assign a corresponding d-dimensional vector to each element in the 1-gram, 2-gram, and 3-gram hash byte tables;

The q bytes in the m*p*q tensor are converted into vectors through the 1-gram, 2-gram, and 3-gram hash byte tables, respectively, to obtain v₁, v₂, v₃, and v₁, v₂, v₃ are spliced to obtain a tensor with an output dimension of m*p*n*3d, where n=p+p/2+p/3.
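
A minimal Python sketch of the hash-table lookup for one packet is given below. Several details are assumptions, since the claim does not fix them: the hash function (the big-endian integer value of the n-gram modulo the table length), the use of consecutive non-overlapping n-grams, the random initialization of the embeddings, and the concrete values of d, l1 and l2. The sketch returns v1, v2 and v3 separately and does not attempt to reproduce the exact m*p*n*3d output shape.

```python
import random

def make_table(size, d, seed):
    """Embedding table: one d-dimensional vector per hash slot (randomly initialized)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(size)]

def ngram_indices(data, n, table_size):
    """Table indices of consecutive, non-overlapping n-grams (assumed), hashed by modulo."""
    return [int.from_bytes(data[i:i + n], "big") % table_size
            for i in range(0, len(data) - n + 1, n)]

def embed_packet(data, tables):
    """tables maps n -> embedding table; returns (v1, v2, v3) as lists of vectors."""
    return tuple(
        [tables[n][idx] for idx in ngram_indices(data, n, len(tables[n]))]
        for n in (1, 2, 3)
    )

d, l1, l2 = 4, 1024, 2048  # illustrative sizes only
tables = {1: make_table(256, d, 0), 2: make_table(l1, d, 1), 3: make_table(l2, d, 2)}
v1, v2, v3 = embed_packet(bytes([0, 1, 2, 3, 4, 5]), tables)
```

Because the 2-gram and 3-gram tables are shorter than the number of possible byte combinations, different combinations can land in the same slot and then share one embedding, which matches the shared-embedding wording of the claim.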
The network anomaly detection method according to claim 3, wherein using a long short-term memory network and a bidirectional gated recurrent unit to perform spatiotemporal feature extraction on the vector matrix of the network traffic to obtain a hidden state of the network traffic comprises:

Performing a one-dimensional convolution operation on the vector matrix of each of the m*p network traffic data packets to obtain the first hidden state h₁ of each network traffic data packet;

Performing spatiotemporal feature extraction on the first hidden states h₁ of the m*p network traffic data packets, respectively, to obtain the second hidden state h₂ of each network traffic data packet.
The network anomaly detection method according to claim 4, wherein the performing a one-dimensional convolution operation on the vector matrix of the m*p network traffic data packets respectively comprises:

Set convolution kernels with sizes of 3*3d, 4*3d and 5*3d respectively, the number of each convolution kernel used is r, and the total number of the convolution kernels is 3r;

Perform a one-dimensional convolution operation in the row direction on the i-th (0<i≤p) network traffic data packet to obtain 3r feature maps;

Perform a maximum pooling operation on the 3r feature maps respectively to obtain 3r values, and splice the 3r values to obtain the first hidden state hᵢ₁ of the i-th network traffic data packet.
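
The convolution and pooling of this claim can be sketched in plain Python as follows. The sketch assumes no bias term and no activation function, and uses randomly initialized kernels; the helper names and toy sizes are illustrative, not taken from the application.

```python
import random

def conv1d_maxpool(matrix, kernel):
    """Slide a k*width kernel down the rows of matrix and keep the largest response."""
    k = len(kernel)
    responses = []
    for start in range(len(matrix) - k + 1):
        window = matrix[start:start + k]
        responses.append(sum(w * x
                             for krow, xrow in zip(kernel, window)
                             for w, x in zip(krow, xrow)))
    return max(responses)  # maximum pooling over the whole feature map

def first_hidden_state(matrix, kernels):
    """Splice one pooled value per kernel into the first hidden state."""
    return [conv1d_maxpool(matrix, kernel) for kernel in kernels]

def random_kernels(width, r, seed=0, heights=(3, 4, 5)):
    """r kernels for each height in heights, each spanning the full row width."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(width)] for _ in range(k)]
            for k in heights for _ in range(r)]

# Deterministic toy case: width 1, an all-ones kernel of height 3.
toy = conv1d_maxpool([[1.0], [2.0], [3.0], [4.0], [5.0]], [[1.0], [1.0], [1.0]])

# Random packet matrix with 8 rows and width 3d (d = 2), r = 2 kernels per size.
rng = random.Random(1)
packet_matrix = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(8)]
h1 = first_hidden_state(packet_matrix, random_kernels(width=6, r=2))
```

With r = 2 the first hidden state has 3r = 6 entries, one per convolution kernel.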
The network anomaly detection method according to claim 5, wherein the step of extracting spatiotemporal features for the first hidden states h 1 of the m*p network traffic data packets respectively comprises:

Send the first hidden state hᵢ₁ of the i-th network traffic data packet into the bidirectional long short-term memory network to learn the first hidden state sᵢ₁ of each time step;

Send all the first hidden states sᵢ₁ to the bidirectional gated recurrent unit to learn the second hidden state hᵢ₂ of each time step;

Calculate the attention weight of each time step according to the second hidden state hᵢ₂: eᵢ = tanh(Whᵢ₂ + b), with the resulting attention weight denoted αᵢ;

Perform weighted summation on all the second hidden states hᵢ₂ to obtain the second hidden state of the i-th network traffic data packet: h₂ = ∑ᵢ αᵢ * hᵢ₂.
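
The attention-weighted summation can be sketched in Python as below. Two points are assumptions, because the claim as reproduced here does not give them: W is taken to be a vector so that each eᵢ is a scalar, and the weights αᵢ are obtained by Softmax normalization of the scores eᵢ. The bidirectional LSTM and GRU that produce hᵢ₂ are not shown.

```python
import math

def attention_pool(states, w, b):
    """Score each time step with tanh(w . h + b), normalize, and sum the states."""
    scores = [math.tanh(sum(wj * hj for wj, hj in zip(w, h)) + b) for h in states]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # Softmax normalization (assumed)
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(states[0])
    pooled = [sum(a * h[j] for a, h in zip(alphas, states)) for j in range(dim)]
    return alphas, pooled

# Toy example: zero parameters give equal scores, hence uniform weights.
alphas, h2 = attention_pool([[1.0, 0.0], [0.0, 1.0]], w=[0.0, 0.0], b=0.0)
```

With uniform weights the pooled state is simply the mean of the time-step states; learned values of w and b would shift weight toward the more informative time steps.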
The network anomaly detection method according to claim 1, wherein the extracting the artificial features of the network traffic by an artificial feature extractor further comprises:

Extracting 80 hand-designed network traffic features from the network traffic using a traffic feature extraction tool;

Each network flow in the network traffic is represented as a traffic vector of size 1*80, in which each column represents one feature value.
The network anomaly detection method according to claim 7, wherein the extraction of spatiotemporal features from the artificial features to obtain a hidden state of the artificial features comprises:

Converting the artificial features of the network traffic into artificial feature data packets in standard input form;

Performing spatiotemporal feature extraction on the artificial feature data packets in standard input form to obtain the hidden state h′₂ of each traffic vector.
A network anomaly detection system, characterized in that it includes:

Vector conversion module: used to perform vector conversion on network traffic by using the n-gram model to obtain a vector matrix of the network traffic;

The first spatiotemporal feature extraction module: used to perform spatiotemporal feature extraction on the vector matrix of the network traffic by using a long short-term memory network and a bidirectional gated recurrent unit to obtain the hidden state of the network traffic;

Artificial feature extraction module: used to extract the artificial features of the network traffic through an artificial feature extractor;

The second spatiotemporal feature extraction module: used for performing spatiotemporal feature extraction on the artificial feature to obtain the hidden state of the artificial feature;

Network traffic prediction module: used to splice the hidden state of the network traffic with the hidden state of the artificial feature, input the spliced result into a deep neural network to perform classification prediction of the network traffic, and determine whether the network traffic is abnormal according to the prediction result.
A terminal, characterized in that the terminal includes a processor and a memory coupled to the processor, wherein,

The memory stores program instructions for implementing the network abnormality detection method according to any one of claims 1-8;

The processor is configured to execute the program instructions stored in the memory to control network anomaly detection.
A storage medium, characterized in that it stores program instructions executable by a processor, and the program instructions are used to execute the network abnormality detection method according to any one of claims 1 to 8.