CN112839034B

CN112839034B - Network intrusion detection method based on CNN-GRU hierarchical neural network

Info

Publication number: CN112839034B
Application number: CN202011590155.7A
Authority: CN
Inventors: 王梓天; 朱国胜; 邹洁; 王泽松; 刘旭
Original assignee: Hubei University; CERNET Corp
Current assignee: Hubei University; CERNET Corp
Priority date: 2020-12-29
Filing date: 2020-12-29
Publication date: 2022-08-05
Anticipated expiration: 2040-12-29
Also published as: CN112839034A

Abstract

The invention relates to a network intrusion detection method based on a CNN-GRU hierarchical neural network, comprising the following steps: capturing network traffic data packets, namely the data packets to be classified, with Wireshark; labeling, preprocessing and cleaning the data packets to be classified, parsing them into decimal data, and converting the decimal data into 40 × 40 single-channel grayscale images to obtain the complete sample set; dividing the complete sample set into a training set and a test set, taking the single-channel grayscale image matrix as the input vector, and building a CNN-GRU hierarchical neural network classification model from the training set; and, after model training is finished, feeding the test set data into the model, which predicts on the input data using the trained parameters and classifies unknown network traffic to determine whether it is attack traffic. Experimental results show that the method classifies normal traffic and attack traffic with an accuracy of 99.92%.

Description

Network intrusion detection method based on CNN-GRU hierarchical neural network

Technical Field

The invention relates to the technical field of network security, in particular to a network intrusion detection method based on a CNN-GRU hierarchical neural network.

Background

With the rapid development of the Internet, a large number of devices and people have joined the Internet environment. At the same time, network traffic security problems have increased. Network attackers often exploit vulnerabilities on the Internet to compromise networks, causing immeasurable losses to users. In the past, such attacks mainly caused economic losses to enterprises; now they also include theft of personal privacy, which does tremendous harm to the interests of most network users.

To avoid such problems, attack behavior needs to be detected by analyzing the traffic data generated by network users. A key challenge is how to efficiently identify traffic data that exhibits attack behavior. Traditional methods that crack and decrypt network traffic require additional equipment, so their cost and deployment difficulty are high. Traditional payload-based methods cannot cope with the growing volume of encrypted traffic, and machine-learning-based network intrusion detection usually relies on traditional machine learning models. The common problems are that it is difficult to find a suitable function to serve as a reference standard for the network, and that such models usually need many quantifiable features as training input and are poorly suited to classification training with ambiguous features. As a result, classification with traditional machine learning methods reaches an accuracy bottleneck that is difficult to overcome.

With the development of chip technology, the computing power of computers has grown greatly in recent years. Meanwhile, the development of the Internet has produced large amounts of data. Under these conditions, deep learning networks have come into wide use, including in network intrusion detection. Compared with traditional machine learning methods, deep learning methods can automatically discover correlations among different pieces of traffic information and assign different weights to features through training on massive data. Compared with manually defined features, this approach is more widely applicable and better suited to implementing a network intrusion detection system.

Disclosure of Invention

The purpose of the invention is to provide a network intrusion detection method based on a CNN-GRU (convolutional neural network and gated recurrent unit) hierarchical neural network, which analyzes the characteristic attributes of captured network traffic with the CNN-GRU hierarchical neural network. Addressing the practical problem of network intrusion detection, the method collects the available raw data, builds the complete CNN-GRU hierarchical neural network sample set using the determined label data, and preprocesses the raw data through feature engineering to remove some invalid content from the data packets. After the complete sample set is divided into a training set and a test set in a suitable proportion, the model is trained and its effectiveness is verified on the test set, yielding a CNN-GRU hierarchical neural network classification model that accurately monitors network intrusion behavior.

In order to solve the above problems, the technical solution adopted by the invention is as follows:

A network intrusion detection method based on a CNN-GRU hierarchical neural network, characterized by comprising the following steps:

(1) capturing network traffic with Wireshark to obtain network traffic data packets, namely the data packets to be classified;

(2) labeling the data packets to be classified and preprocessing them through feature engineering to remove some invalid content from the data packets; cleaning the data packets in all flows, and cleaning each data flow; parsing the data packets into decimal data and converting the decimal data into 40 × 40 single-channel grayscale images; obtaining all image samples required for model training, and thereby the complete CNN-GRU hierarchical neural network sample set;

(3) dividing the complete sample set into a training set and a test set in a suitable proportion and, based on the CNN-GRU hierarchical neural network algorithm, taking the single-channel grayscale image matrix as the input vector and building a CNN-GRU hierarchical neural network classification model from the training set, so that the model learns how to classify samples;

(4) after model training is finished, feeding the test set data into the model, which predicts on the input data using the trained parameters and classifies unknown network traffic to determine whether it is attack traffic or which type of attack traffic it is.

Further, the content stored in the network traffic data packets captured in step (1) is binary data.

Further, the specific process of step (2) includes:

(2.1) labeling the data packets to be classified, marking normal traffic and attack traffic as required; if attack traffic needs to be classified, labeling the different types of attack traffic separately, wherein the traffic type labels are stored as numbers starting from 0;

(2.2) preprocessing the data packets to be classified through feature engineering, splitting the captured network traffic into flows according to source IP address, source port and destination IP address, the splitting being performed with SplitCap;

(2.3) cleaning the data packets in all flows, removing the source MAC address, the destination MAC address and the network protocol type information from each data packet, extracting the first 160 bytes of each data packet, and zero-padding data packets shorter than 160 bytes;

(2.4) cleaning each data flow, extracting the first 10 data packets of each flow, and, where a flow contains fewer than 10 data packets, padding it with 160-byte all-zero data packets until it contains 10;

(2.5) at this point each flow holds 160 × 10 = 1600 bytes; converting each byte to decimal gives a value in the range 0-255, and the 1600-dimensional decimal data is reshaped into a 40 × 40 matrix;

(2.6) converting the values in the 40 × 40 matrix into gray levels to obtain, for each matrix, a single-channel grayscale image of size 40 × 40, thereby obtaining all image samples required for model training.
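By way of non-limiting illustration, steps (2.3) to (2.5) can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that the MAC addresses and protocol type occupy a 14-byte Ethernet II header at the start of each captured frame; the method itself does not fix these details.

```python
# Illustrative sketch of steps (2.3)-(2.5); not the patented implementation.
ETH_HEADER_LEN = 14     # assumption: 6 B dst MAC + 6 B src MAC + 2 B EtherType
PACKET_LEN = 160        # bytes kept per packet, step (2.3)
PACKETS_PER_FLOW = 10   # packets kept per flow, step (2.4)
SIDE = 40               # 160 * 10 = 1600 = 40 * 40, step (2.5)


def clean_packet(frame: bytes) -> bytes:
    """Drop the link-layer header, then truncate or zero-pad to 160 bytes."""
    payload = frame[ETH_HEADER_LEN:]
    return payload[:PACKET_LEN].ljust(PACKET_LEN, b"\x00")


def flow_to_matrix(frames):
    """Turn the first 10 frames of one flow into a 40 x 40 matrix of 0-255 values."""
    packets = [clean_packet(f) for f in frames[:PACKETS_PER_FLOW]]
    while len(packets) < PACKETS_PER_FLOW:
        packets.append(bytes(PACKET_LEN))   # all-zero filler packet
    flat = b"".join(packets)                # exactly 1600 bytes
    # Iterating over a bytes object already yields decimal integers 0-255.
    return [list(flat[r * SIDE:(r + 1) * SIDE]) for r in range(SIDE)]
```

Because each cleaned packet is 160 bytes, every packet occupies four consecutive rows of the resulting matrix.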

Further, the specific process of step (3) includes:

(3.1) first, the data enters an improved LeNet-5 network, which extracts the spatial features of the raw network traffic data using two convolutional layers and two max-pooling layers; the first convolutional layer uses 32 convolution kernels of size 5 × 5 and is followed by a max-pooling operation, and the second uses 64 convolution kernels of size 3 × 3 and is followed by a max-pooling operation; after each convolution, the CNN hidden layer is first transformed with the ReLU activation function and then processed by max pooling, so that the original single-channel 40 × 40 image becomes an 8 × 8 image with 64 channels; flattening this gives a 4096-dimensional vector, which is passed to the output layer of the CNN; the output layer is a fully connected layer with 1600 neurons, so that the transformation keeps the same dimensionality as the original data; after the features of the raw data are extracted, some neurons of the fully connected layer are randomly deactivated (dropout) to avoid overfitting;

(3.2) then, a GRU network automatically extracts the temporal features of the raw flow data, using two GRU layers; each layer of the GRU network contains 256 GRU units, and the activation function of each layer is a sigmoid function, which performs the nonlinear operation; the last layer of the GRU network is a fully connected layer whose number of neurons equals the number of traffic classes;

(3.3) training with the training set to obtain the network intrusion detection model.

Further, the convolution operation in step (3.1) slides a convolution kernel $\omega$ of size $f \times f$ over an image of size $n \times n$, and each sliding convolution generates a new feature. Let $X$ be the input to the convolution, $b$ the bias term, $c_i$ the new feature produced by the convolution at layer $i$, and $\sigma_r$ the ReLU activation function. The new feature obtained by the convolution operation is then $c_i = \sigma_r(\omega * X_i + b_i)$. After convolution, the $n \times n$ feature map yields a feature map of size $c = (n-f+1) \times (n-f+1)$, the size being determined by the sliding window of the $f \times f$ convolution kernel. After convolution, max pooling is applied to the feature map $c$, and the maximum value within the selected window is taken as the final feature. The final feature size is $[(n-f+1) \times (n-f+1)]/2$.
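The convolution, ReLU and max-pooling operations described above can be illustrated with a small pure-Python sketch. It assumes stride 1, no padding and a non-overlapping 2 × 2 pooling window, none of which is stated explicitly in the text; as in most deep learning frameworks, the kernel is applied without flipping. The function names are hypothetical.

```python
# Illustrative sketch: one convolution + ReLU + 2x2 max pooling on a single channel.
def conv_relu(x, w, b=0.0):
    """Slide an f x f kernel over an n x n input; the output is (n-f+1) x (n-f+1)."""
    n, f = len(x), len(w)
    out = n - f + 1
    return [[max(0.0, b + sum(w[i][j] * x[r + i][c + j]
                              for i in range(f) for j in range(f)))  # ReLU
             for c in range(out)]
            for r in range(out)]


def max_pool_2x2(c):
    """Keep the maximum of each non-overlapping 2 x 2 window (assumed window size)."""
    m = len(c) // 2
    return [[max(c[2 * r][2 * k], c[2 * r][2 * k + 1],
                 c[2 * r + 1][2 * k], c[2 * r + 1][2 * k + 1])
             for k in range(m)]
            for r in range(m)]


# Example: a 5 x 5 input and a 2 x 2 kernel give a 4 x 4 map, pooled to 2 x 2.
x = [[5 * r + c for c in range(5)] for r in range(5)]
features = conv_relu(x, [[1, 1], [1, 1]])
pooled = max_pool_2x2(features)
```

Under these assumptions each side of the feature map shrinks from n to n-f+1 after convolution and is then halved by pooling.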

Further, the GRU network in step (3.2) uses the state $h^{t-1}$ passed down from the previous node and the input $x^t$ of the current node to obtain two gating states, where $r$ is the reset gate and $z$ is the update gate. After the gating signals are obtained, the reset gate is first used to obtain the reset data $h^{t-1'} = h^{t-1} \odot r$; $h^{t-1'}$ is then concatenated with the input $x^t$, and a tanh activation function scales the data to the range -1 to 1, giving $h'$. Finally, the two steps of forgetting and memorizing are performed simultaneously using the previously obtained update gate $z$, giving the final expression $h^t = (1-z) \odot h^{t-1} + z \odot h'$.
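A single GRU time step following the expressions above might be sketched as follows. The weight matrices `W_r`, `W_z` and `W_h` are hypothetical parameters introduced for illustration, bias terms are omitted for brevity, and the gates are assumed to be computed with a sigmoid over the concatenated input and previous state, as in a standard GRU.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def gru_step(x_t, h_prev, W_r, W_z, W_h):
    """One GRU step: h_t = (1 - z) * h_prev + z * h_cand, element-wise."""
    concat = x_t + h_prev                                   # [x^t, h^{t-1}]
    r = [sigmoid(a) for a in matvec(W_r, concat)]           # reset gate
    z = [sigmoid(a) for a in matvec(W_z, concat)]           # update gate
    h_reset = [h * ri for h, ri in zip(h_prev, r)]          # h^{t-1'} = h^{t-1} * r
    h_cand = [math.tanh(a) for a in matvec(W_h, x_t + h_reset)]  # h', in (-1, 1)
    return [(1 - zi) * hp + zi * hc
            for zi, hp, hc in zip(z, h_prev, h_cand)]
```

With all weights set to zero, both gates equal 0.5 and the candidate state is 0, so the step simply halves the previous state; this gives a quick sanity check of the update rule.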

The output of the fully connected layer is fed into a softmax regression layer, and the softmax classifier outputs the classification probability for each flow; the label with the highest probability is the hierarchical network's classification result for that flow. The loss function used in the model is the mean-square loss function, and training uses the Adam optimizer, which performs gradient descent using adaptive moment estimation.
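For reference, one update of the Adam optimizer (adaptive moment estimation) on a single scalar parameter can be sketched as below. The hyperparameter defaults are the commonly used ones and are an assumption here, since the text does not state the learning rate or decay rates; the function name is hypothetical.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Toy usage: minimize (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2 * (theta - 3), m, v, t, lr=0.1)
```

In the model itself the same update would be applied to every trainable weight, with the gradient taken from the mean-square loss.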

Further, step (4) also comprises: comparing the model's predictions with the actual labels of the test set and evaluating the prediction results on specific metrics, namely accuracy, precision, recall, F1-Measure and convergence rate.

The technical solution provided by the invention has at least the following beneficial effects. Compared with the deep-learning-based network intrusion detection methods in wide use at present, the method has the following advantages:

1. the traffic data is taken directly from the transmitted data packets, so data acquisition costs very little and the range of usable data sources is significantly widened;

2. the one-dimensional data packet data is converted into a two-dimensional image, which allows different features within the data packets to be combined and yields feature combinations better suited to describing the data packet type;

3. the GRU network is used to describe the temporal relationship among the data packets; since only some of the first 10 data packets captured from a flow contain attack information, the GRU network's modeling of dependencies between earlier and later packets captures this situation at the level of the underlying design and is closer to how data packets are actually transmitted;

4. with the two networks combined hierarchically, the method is markedly more accurate than traditional machine learning methods; in the experimental results, the accuracy of classifying normal traffic and attack traffic reaches 99.92% and 99.77%, a prediction accuracy that traditional methods cannot achieve;

5. compared with traditional machine learning methods or single-network deep learning methods, the model overcomes the drawback of long convergence time.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a block diagram of a network intrusion detection method based on a CNN-GRU hierarchical neural network according to an embodiment of the present invention;

FIG. 2 is a flow chart of converting network traffic data into images.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

As shown in FIG. 1, a specific implementation of the network intrusion detection method based on a CNN-GRU hierarchical neural network is as follows:

Network traffic is captured with Wireshark to obtain network traffic data packets, namely the data packets to be classified; at this point the content stored in the data packets is binary data.

The data packets to be classified are labeled, marking normal traffic and attack traffic as required. If attack traffic needs to be classified, the different types of attack traffic are labeled separately. The traffic type labels are stored as numbers starting from 0.

The data packets to be classified are preprocessed through feature engineering: the captured network traffic is split into flows according to source IP address, source port and destination IP address, using SplitCap.
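The grouping rule used for flow splitting can be illustrated with a short Python sketch. It only mirrors the three-field key described above and does not reproduce the behavior of the splitting tool itself; the packet representation (a dict with the hypothetical keys `src_ip`, `src_port` and `dst_ip`) is an assumption made for the example.

```python
from collections import defaultdict


def split_into_flows(packets):
    """Group packets, in capture order, by (source IP, source port, destination IP)."""
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["src_port"], pkt["dst_ip"])
        flows[key].append(pkt)
    return dict(flows)


# Example with three packets, two of which share the same three-field key.
capture = [
    {"src_ip": "10.0.0.1", "src_port": 5000, "dst_ip": "10.0.0.9", "data": b"a"},
    {"src_ip": "10.0.0.2", "src_port": 5000, "dst_ip": "10.0.0.9", "data": b"b"},
    {"src_ip": "10.0.0.1", "src_port": 5000, "dst_ip": "10.0.0.9", "data": b"c"},
]
flows = split_into_flows(capture)
```

Packets within each flow keep their capture order, which matters later when the first 10 packets of a flow are selected.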

The data packets in all flows are cleaned: the source MAC address, the destination MAC address and the network protocol type information are removed from each data packet, the first 160 bytes of each data packet are extracted, and data packets shorter than 160 bytes are zero-padded.

The choice of 160 bytes balances two effects. A shorter length may capture too few features to represent the data packet well. A longer length significantly increases the training time of the model and may also introduce too much zero padding, which obscures the features of the data packet.

Each data flow is cleaned: the first 10 data packets of each flow are extracted, and a flow with fewer than 10 data packets is padded with 160-byte all-zero data packets until it contains 10.

Ten data packets are chosen because most network attack behaviors generate little traffic. Selecting more data packets would not describe the process better and could instead mix different network attacks together, or mix attack traffic with normal traffic.

At this point each flow holds 160 × 10 = 1600 bytes. Converting each byte to decimal gives a value in the range 0-255, and the 1600-dimensional decimal data is reshaped into a 40 × 40 matrix.

The values in the 40 × 40 matrix are converted into gray levels to obtain, for each matrix, a single-channel grayscale image of size 40 × 40, thereby obtaining all image samples required for model training.
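One simple way to materialize such a matrix as a single-channel grayscale image is the binary PGM format, in which each pixel is one byte with 0 as black and 255 as white. The choice of PGM and the function name below are assumptions for illustration; the text does not specify an image file format.

```python
def matrix_to_pgm(matrix):
    """Encode a matrix of 0-255 integers as a binary PGM (P5) grayscale image."""
    height = len(matrix)
    width = len(matrix[0])
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    pixels = bytes(value for row in matrix for value in row)
    return header + pixels


# Example: a 40 x 40 gradient image; writing the bytes to a .pgm file
# produces an image that common viewers can open.
example = [[(r * 40 + c) % 256 for c in range(40)] for r in range(40)]
pgm_bytes = matrix_to_pgm(example)
```

Since each byte value maps directly to a gray level, no scaling step is needed between the 40 × 40 matrix and the image.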

20% of all processed samples are taken as test samples and the remainder as training samples; the training samples are fed into the model for training so that the model learns how to classify the samples.
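The 80/20 division can be sketched as follows. Shuffling the samples with a fixed seed before splitting is an assumption made for the example; the text specifies only the 20% test proportion. The function name is hypothetical.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices reproducibly and hold out a fraction for testing."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(samples) * test_fraction)
    test = [samples[i] for i in indices[:n_test]]
    train = [samples[i] for i in indices[n_test:]]
    return train, test


# Example: 100 samples yield 80 training and 20 test samples.
train, test = train_test_split(list(range(100)))
```

When the classes are imbalanced, a split performed separately within each class would keep the class proportions equal in both sets; that refinement is likewise not described in the text.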

The training samples enter the model. First, an improved LeNet-5 network extracts the spatial features of the raw network traffic data using two convolutional layers and two max-pooling layers. These features describe the content of each data packet; converting the data packets into images brings together packet features that were originally far apart, so that feature combinations useful for classification are learned more easily.

The network has two convolutional layers in total: the first uses 32 convolution kernels of size 5 × 5 followed by a max-pooling operation, and the second uses 64 convolution kernels of size 3 × 3 followed by a max-pooling operation. After each convolution, the CNN hidden layer is first transformed with the ReLU activation function and then processed by max pooling, so that the original single-channel 40 × 40 image becomes an 8 × 8 image with 64 channels. Flattening this gives a 4096-dimensional vector, which is passed to the output layer of the CNN. The output layer is a fully connected layer with 1600 neurons, so the transformation preserves the dimensionality of the original data. After the features of the raw data are extracted, some neurons of the fully connected layer are randomly deactivated (dropout) to avoid overfitting.
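The stated dimensions can be checked with simple arithmetic. The sketch below assumes stride-1 convolutions without padding and 2 × 2 max pooling with stride 2; these settings are not stated in the text, but they are the ones that reproduce the 8 × 8 × 64 output and the 4096-dimensional flattened vector. The function names are hypothetical.

```python
def conv_out(n, f):
    """Side length after an f x f convolution with stride 1 and no padding."""
    return n - f + 1


def pool_out(n, window=2):
    """Side length after non-overlapping max pooling."""
    return n // window


def cnn_feature_shape(side=40):
    side = pool_out(conv_out(side, 5))   # 40 -> 36 -> 18, 32 channels
    side = pool_out(conv_out(side, 3))   # 18 -> 16 -> 8, 64 channels
    channels = 64
    return side, channels, side * side * channels


shape = cnn_feature_shape()   # (8, 64, 4096)
```

The subsequent 1600-neuron fully connected layer then maps the 4096 values back to the 1600 dimensions of the original 40 × 40 input.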

The convolution operation slides a convolution kernel $\omega$ of size $f \times f$ over an image of size $n \times n$, and each sliding convolution produces a new feature. Let $X$ be the input of the convolution, $b$ the bias term, $c_i$ the new feature produced by the convolution at layer $i$, and $\sigma_r$ the ReLU activation function. The new feature obtained by the convolution operation is then $c_i = \sigma_r(\omega * X_i + b_i)$. After the convolution operation, the $n \times n$ feature map yields a feature map of size $c = (n-f+1) \times (n-f+1)$, the size being determined by the sliding window of the $f \times f$ convolution kernel. After convolution, max pooling is applied to the feature map $c$, and the maximum value within the selected window is taken as the final feature. The final feature size is $[(n-f+1) \times (n-f+1)]/2$.

The second level of the model is a two-layer GRU network that extracts the temporal features of the raw network traffic data. These features describe the timestamp-ordered relationship among the data packets in the same flow, which matches how data packets are actually transmitted and describes the network flow more comprehensively for distinguishing data packet types.

Each layer of the network contains 256 GRU units, and the activation function of each layer is a sigmoid function, which performs the nonlinear operation. The last layer of the GRU network is a fully connected layer whose number of neurons equals the number of traffic classes.

The GRU network uses the state $h^{t-1}$ passed down from the previous node and the input $x^t$ of the current node to obtain two gating states, where $r$ is the reset gate and $z$ is the update gate. After the gating signals are obtained, the reset gate is first used to obtain the reset data $h^{t-1'} = h^{t-1} \odot r$; $h^{t-1'}$ is then concatenated with the input $x^t$, and a tanh activation function scales the data to the range -1 to 1, giving $h'$. Finally, the two steps of forgetting and memorizing are performed simultaneously using the previously obtained update gate $z$, giving the final expression $h^t = (1-z) \odot h^{t-1} + z \odot h'$.

The output of the fully connected layer is fed into the softmax regression layer, and the softmax classifier outputs the classification probability for each flow. The label with the highest probability is the hierarchical network's classification result for that flow. The loss function used in the model is the mean-square loss function, and training uses the Adam optimizer, which performs gradient descent using adaptive moment estimation.
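The softmax output, the choice of the most probable label and the mean-square loss can be sketched in a few lines of Python. Comparing the softmax probabilities against a one-hot encoding of the true label is an assumption about how the mean-square loss is applied; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the index of the class with the highest probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def mse_loss(probs, label):
    """Mean-square error between the probabilities and a one-hot target."""
    target = [1.0 if i == label else 0.0 for i in range(len(probs))]
    return sum((p - t) ** 2 for p, t in zip(probs, target)) / len(probs)
```

For example, `predict([1.0, 3.0, 2.0])` selects class 1, the class with the largest score.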

After model training is finished, the test set data is fed into the model, which predicts on the input data using the trained parameters and classifies unknown network traffic to determine whether it is attack traffic or which type of traffic it is.

The model's predictions are compared with the actual labels of the test set, and the prediction results are evaluated on specific metrics: accuracy, precision, recall, F1-Measure and convergence speed.
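For the two-class case, the first four metrics can be computed from the confusion counts as sketched below. Treating label 1 as attack traffic (the positive class) and label 0 as normal traffic is an assumption for the example; the text states only that labels start from 0. The function name is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Example: 3 attack flows and 5 normal flows, with one miss and one false alarm.
acc, prec, rec, f1 = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                                    [1, 1, 0, 1, 0, 0, 0, 0])
```

For multi-class attack classification, the same counts would be computed per class and then averaged; the averaging scheme is not specified in the text.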

The specific embodiment is as follows:

The model is tested on the CICIDS2017 data set, which has the advantages of richer traffic types and a relatively recent release date, making it more consistent with current network conditions. The data set comes from attack scenarios designed by researchers. All data collected on the first day is normal network traffic. Over the following four days the network is attacked and the traffic is recorded. The final result is stored in PCAP files, which contain all traffic, labeled as normal network traffic or as one of various network attacks. For the reliability of the training results, the first ten types of attack traffic and normal traffic are selected as our training and test sets, with each type containing at least two thousand traffic records. Because the labels provided with the data set do not meet our needs, we relabel the traffic data to meet the training requirements. After this processing, the number and proportion of network flows are shown in the following table:

Table 1: Number and proportion of network flows

To make the test of the model more complete, we test the model both on classifying only normal versus attack traffic and on classifying each type of attack traffic, with twenty thousand iterations:

Table 2: Results of model classification

As the table shows, the prediction accuracy of the model exceeds 99.5% under either classification mode, and the model achieves very high training precision.

This example shows that the method can accurately classify network traffic for intrusion detection and thereby realize network intrusion detection.

It should be understood that the specific order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not intended to be limited to the specific order or hierarchy presented.

In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the invention.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. Of course, the processor and the storage medium may reside as discrete components in a user terminal.

For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in memory units and executed by processors. The memory unit may be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.

What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification of the claims is intended to mean a "non-exclusive or".

Claims

1. A network intrusion detection method based on a CNN-GRU hierarchical neural network is characterized by comprising the following steps:

(4) after the model training is finished, the data of the test set is transmitted into the model, the model predicts the input data according to the parameters obtained by training, and classifies unknown network traffic to judge whether the unknown network traffic is attack traffic or the type of the unknown network traffic;

the specific process of the step (2) comprises the following steps:

(2.3) cleaning the data packets in all the flows, removing the MAC source address, the MAC destination address and the network protocol type information used by the data packets in the data packets, extracting the data of the first 160 bytes from each data packet, and filling 0 in the part of the data packets with less than 160 bytes;

(2.6) converting numerical values in the 40-by-40 matrix data into gray levels, obtaining a single-channel gray level diagram with the size of 40-by-40 corresponding to each matrix, and obtaining all picture samples required by model training;

the specific process of the step (3) comprises the following steps:

(3.3) training by using a training set to obtain a network intrusion detection model;

in the convolution operation in step (3.1), a convolution kernel $\omega$ of size $f \times f$ performs sliding convolution on an image of size $n \times n$, each sliding convolution generating a new feature; let $X$ be the input of the convolution, $b$ the bias term, $c_i$ the new feature at layer $i$ generated by convolution, and $\sigma_r$ the ReLU activation function; the new feature obtained by the convolution operation is then $c_i = \sigma_r(\omega * X_i + b_i)$; after the convolution operation, the $n \times n$ feature map generates a feature map of size $c = (n-f+1) \times (n-f+1)$, the size being determined by sliding a window with a convolution kernel of size $f \times f$; after convolution, max pooling is performed on the feature map $c$, and the maximum value within the selected window is taken as the final feature; the final feature size is $[(n-f+1) \times (n-f+1)]/2$.

2. The CNN-GRU hierarchical neural network-based network intrusion detection method according to claim 1, wherein the network traffic data packet captured in step (1) has binary data stored therein.

3. The network intrusion detection method based on a CNN-GRU hierarchical neural network according to claim 1, wherein the GRU network in step (3.2) uses the state $h^{t-1}$ passed down from the previous node and the input $x^t$ of the current node to obtain two gating states, where $r$ is the reset gate and $z$ is the update gate; after the gating signals are obtained, the reset gate is first used to obtain the reset data $h^{t-1'} = h^{t-1} \odot r$; $h^{t-1'}$ is then concatenated with the input $x^t$, and a tanh activation function scales the data to the range -1 to 1, giving $h'$; finally, the two steps of forgetting and memorizing are performed simultaneously using the previously obtained update gate $z$, giving the final expression $h^t = (1-z) \odot h^{t-1} + z \odot h'$.

4. The CNN-GRU hierarchical neural network-based network intrusion detection method according to claim 1, wherein step (4) further comprises: comparing the model's predictions with the actual labels of the test set and evaluating the prediction results on specific metrics, namely accuracy, precision, recall, F1-Measure and convergence rate.