CN115695002A - Traffic intrusion detection method, apparatus, device, storage medium, and program product - Google Patents


Info

Publication number
CN115695002A
CN115695002A
Authority
CN
China
Prior art keywords
convolution
traffic
initial
data packets
feature vectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211352783.0A
Other languages
Chinese (zh)
Inventor
张晗
李嘉睿
王继龙
尹霞
施新刚
安常青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202211352783.0A priority Critical patent/CN115695002A/en
Publication of CN115695002A publication Critical patent/CN115695002A/en
Pending legal-status Critical Current

Abstract

The present disclosure provides a traffic intrusion detection method applicable to the fields of network security and computer network management. The traffic intrusion detection method comprises the following steps: acquiring a plurality of traffic data packets; extracting a plurality of feature vectors from the plurality of traffic data packets, respectively, based on a neural network; extracting frequency-domain features of the data stream composed of the plurality of feature vectors by performing a Fourier transform on the data stream; and detecting, by a classifier, the type of the data stream according to the frequency-domain features. The present disclosure also provides a traffic intrusion detection apparatus, device, storage medium, and program product.

Description

Traffic intrusion detection method, apparatus, device, storage medium, and program product
Technical Field
The present disclosure relates to the field of network security and computer network management, and in particular, to a method, an apparatus, a device, a storage medium, and a program product for detecting traffic intrusion based on a neural network.
Background
With the development of Internet technology, network traffic intrusion detection has become an important means of guaranteeing network security. Conventional intrusion detection methods mainly include signature-based, statistics-based, and information-entropy-based methods. However, these approaches tend to have high false alarm rates, and their hand-defined detection rules easily become outdated as time passes and attacks evolve.
Currently, some efforts attempt to apply deep learning to traffic intrusion detection. However, deep learning models are usually complex and occupy a large amount of computing and memory resources, which complicates practical deployment. For example, without powerful computing hardware, a model with large computation and parameter counts cannot guarantee either the computing resources required for deployment or the real-time performance of detection.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a traffic intrusion detection method, apparatus, device, storage medium, and program product based on a lightweight neural network architecture that improves computational efficiency.
According to a first aspect of the present disclosure, there is provided a traffic intrusion detection method, including: acquiring a plurality of traffic data packets; extracting a plurality of feature vectors from the plurality of traffic data packets, respectively, based on a neural network; extracting frequency-domain features of a data stream composed of the plurality of feature vectors by performing a Fourier transform on the data stream; and detecting, by a classifier, a type of the data stream according to the frequency-domain features.
According to an embodiment of the present disclosure, extracting a plurality of feature vectors from a plurality of traffic data packets, respectively, based on a neural network includes: extracting a plurality of initial feature vectors from the plurality of traffic data packets through a plurality of groups of input channels of a convolutional layer of the neural network; and, for each of the plurality of traffic data packets, compressing its initial feature vector through a fully connected layer of the neural network to obtain the feature vector, wherein the dimensionality of the feature vector is smaller than that of the initial feature vector.
According to an embodiment of the present disclosure, the plurality of traffic data packets includes M data packets, the convolutional layer of the neural network includes M input channels and N output channels, and M and N are positive integers. Extracting a plurality of initial feature vectors from the plurality of traffic data packets through the plurality of groups of input channels of the convolutional layer includes: dividing the M input channels into M groups of channels, each group comprising one input channel provided with a convolution kernel; performing a depthwise convolution on the M data packets through the M convolution kernels of the M groups of channels to obtain M first feature vectors; and performing a pointwise convolution on the M first feature vectors to obtain N initial feature vectors through the N output channels.
According to an embodiment of the present disclosure, performing the depthwise convolution on the M data packets through the M convolution kernels of the M groups of channels to obtain the M first feature vectors includes computing, according to

$$\hat{Q}_{l,m}=\sum_{d} K_{d,m}\,Q_{l+d-1,m},$$

the depthwise convolution on the M data packets, where $K_{d,m}$ is the convolution kernel of the m-th of the M input channels, $Q_{l+d-1,m}$ is the value at position $l+d-1$ of the one-dimensional convolution feature map in the m-th channel, l indexes the output position, and d is the pixel of the one-dimensional convolution feature map traversed by the convolution kernel.
According to an embodiment of the present disclosure, performing the pointwise convolution on the M first feature vectors and obtaining the N initial feature vectors through the N output channels includes computing, according to

$$P_{l,n}=\sum_{m=1}^{M} W_{d,m,n}\,Q_{l+d,m},$$

the pointwise convolution on the M first feature vectors, where $W_{d,m,n}$ is the convolution kernel connecting the m-th of the M input channels to the n-th of the N output channels, $Q_{l+d,m}$ is the value at position $l+d$ of the one-dimensional convolution feature map in the m-th input channel, and d is the pixel of the one-dimensional convolution feature map traversed by the convolution kernel.
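The two operations above can be sketched in NumPy (a hypothetical illustration, not the patented implementation; the function names, shapes, and random data are assumptions): a depthwise convolution applies one kernel per input channel without mixing channels, and a pointwise convolution then mixes the M channels into N output channels.

```python
import numpy as np

def depthwise_conv1d(Q, K):
    """Depthwise convolution: one kernel per input channel, no channel mixing.
    Q: (M, L) one-dimensional feature maps, one row per channel; K: (M, D) kernels."""
    M, L = Q.shape
    _, D = K.shape
    out = np.empty((M, L - D + 1))
    for m in range(M):                      # each channel is convolved independently
        for l in range(L - D + 1):
            out[m, l] = np.sum(K[m] * Q[m, l:l + D])
    return out

def pointwise_conv1d(Q, W):
    """Pointwise (1x1) convolution: mixes channels at every position.
    Q: (M, L); W: (M, N) maps M input channels to N output channels."""
    return W.T @ Q                          # result shape (N, L)

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 16))            # 4 channels, length-16 feature maps
K = rng.standard_normal((4, 3))             # one length-3 kernel per channel
W = rng.standard_normal((4, 8))             # 4 input -> 8 output channels

first = depthwise_conv1d(Q, K)              # M first feature vectors, shape (4, 14)
initial = pointwise_conv1d(first, W)        # N initial feature vectors, shape (8, 14)
```

The depthwise stage leaves the channel count unchanged, while the pointwise stage is the only place channels are fused, which is what makes the factorization cheaper than a full convolution.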
According to an embodiment of the present disclosure, performing the pointwise convolution on the M first feature vectors and obtaining the N initial feature vectors through the N output channels further includes: dividing the M input channels and the N output channels into G groups to obtain G grouped convolution channel matrices, where G is a positive integer; splitting each of the G grouped convolution channel matrices into a first sub-convolution channel matrix and a second sub-convolution channel matrix, where the first sub-convolution channel matrix comprises C output channels, the second sub-convolution channel matrix comprises C input channels, and C is a positive integer; performing a pointwise convolution on the M first feature vectors through the G first sub-convolution channel matrices, respectively, to obtain G × C second feature vectors; performing a channel shuffle on the G × C second feature vectors to obtain G × C third feature vectors; and performing a pointwise convolution on the G × C third feature vectors through the G second sub-convolution channel matrices, respectively, to obtain the N initial feature vectors.
According to an embodiment of the present disclosure, splitting a grouped convolution channel matrix into the first sub-convolution channel matrix and the second sub-convolution channel matrix includes splitting, according to

$$\overline{W}_{L}^{\,e\times f}=\overline{W}_{L-1}^{\,e\times C}\,\overline{W}_{L}^{\,C\times f},$$

the grouped convolution channel matrix into two sub-convolutions, where $\overline{W}_{L}^{\,e\times f}$ is the grouped convolution channel matrix of the L-th convolutional layer, with e input channels and f output channels, $\overline{W}_{L-1}^{\,e\times C}$ is the first sub-convolution channel matrix, $\overline{W}_{L}^{\,C\times f}$ is the second sub-convolution channel matrix, and layer L−1 is an intermediate layer with C output channels, where M = G × e, N = G × f, and e and f are positive integers.
According to an embodiment of the present disclosure, the initial feature vector comprises an initial packet header feature vector and an initial payload feature vector. Extracting a plurality of initial feature vectors from the plurality of traffic data packets through the plurality of groups of input channels of the convolutional layer includes: for each of the plurality of traffic data packets, obtaining its packet header data and payload data; constructing a first convolution model for the packet header data and a second convolution model for the payload data, where the numbers of input and output channels of the first convolution model are smaller than those of the second convolution model; and extracting the initial packet header feature vector and the initial payload feature vector from the header data and the payload data through the first and second convolution models, respectively.
According to an embodiment of the present disclosure, the plurality of initial feature vectors includes N initial feature vectors, and compressing the initial feature vectors through the fully connected layer of the neural network to obtain the feature vectors includes: dividing the N initial feature vectors into a plurality of segments, each segment comprising N feature values; and performing in-group compression on the N feature values of each segment through the fully connected layer to obtain the plurality of feature vectors.
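The in-group compression can be sketched as follows (an assumed NumPy illustration; the segment length, output width, shared weight matrix, and function name are all hypothetical, not taken from the patent):

```python
import numpy as np

def segment_compress(features, seg_len, out_dim, W=None, rng=None):
    """Split a flat feature vector into segments of seg_len values and
    compress each segment to out_dim values with a shared FC weight matrix.
    Sharing W across segments is an illustrative choice."""
    if rng is None:
        rng = np.random.default_rng(0)
    segments = features.reshape(-1, seg_len)        # (num_segments, seg_len)
    if W is None:
        W = rng.standard_normal((seg_len, out_dim)) # one weight matrix per segment slot
    return (segments @ W).reshape(-1)               # compressed feature vector

flat = np.arange(32, dtype=float)                   # e.g. 8 initial features of 4 values each
compressed = segment_compress(flat, seg_len=4, out_dim=2)
# dimensionality drops from 32 to 16
```

Because each segment connects only to its own small weight matrix rather than to every output node, the connectivity (and hence the computation) of the compression layer stays low.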
According to an embodiment of the present disclosure, extracting the frequency-domain features of the data stream by performing a Fourier transform on the data stream composed of the plurality of feature vectors includes: performing a Fourier transform on the data stream to obtain frequency-domain variables; and computing the amplitudes of the frequency-domain variables to obtain the frequency-domain features.
According to an embodiment of the present disclosure, performing the Fourier transform on the data stream to obtain the frequency-domain variables includes calculating, according to

$$y_m = F_{seq}(F_h(p_m)),$$

the frequency-domain variable $y_m$, where $F_h$ is an extraction function that captures the interrelations between the internal features of the traffic data packets, $F_{seq}$ is an extraction function that captures the sequence relation of the traffic data packets in the data stream, and $p_m$ is the feature vector of the m-th of the traffic data packets.
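A minimal sketch of this step in NumPy, assuming the discrete Fourier transform is taken along the packet (time) axis of the stream and the magnitudes serve as the frequency-domain features (function name and shapes are illustrative):

```python
import numpy as np

def frequency_features(packet_vectors):
    """Stack per-packet feature vectors into a data stream, apply a DFT
    along the packet axis, and keep the magnitudes as the frequency-domain
    features handed to the classifier."""
    stream = np.stack(packet_vectors)          # (num_packets, feature_dim)
    spectrum = np.fft.fft(stream, axis=0)      # complex frequency-domain variables
    return np.abs(spectrum)                    # amplitudes = frequency-domain features

rng = np.random.default_rng(0)
packets = [rng.standard_normal(8) for _ in range(16)]   # 16 packet feature vectors
feats = frequency_features(packets)                     # real-valued, shape (16, 8)
```

Note that, unlike a recurrent network or self-attention, this transform has no learned parameters, which matches the lightweight design goal stated in the disclosure.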
A second aspect of the present disclosure provides a traffic intrusion detection apparatus, including: an acquisition module for acquiring a plurality of traffic data packets; a first extraction module for extracting a plurality of feature vectors from the plurality of traffic data packets, respectively, based on a neural network; a second extraction module for extracting frequency-domain features of the data stream composed of the plurality of feature vectors by performing a Fourier transform on the data stream; and a detection module for detecting, by a classifier, the type of the data stream according to the frequency-domain features.
A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above traffic intrusion detection method.
A fourth aspect of the present disclosure also provides a computer-readable storage medium having executable instructions stored thereon, which when executed by a processor, cause the processor to perform the above-mentioned traffic intrusion detection method.
A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described traffic intrusion detection method.
According to embodiments of the present disclosure, for each traffic data packet, an efficient lightweight convolutional neural network extracts features from the raw bytes, and a fully connected layer compresses and abstracts the features extracted by the convolutional layer, so that computation is saved by reducing node connectivity and computing resources are reduced through sparsification. Across all traffic data packets, the interrelations among the packets are captured, and the frequency-domain features of the data stream composed of the feature vectors of the multiple packets are extracted using a Fourier transform. The characterization patterns of normal traffic and attack traffic are clearly distinguishable in the frequency domain.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, taken in conjunction with the accompanying drawings of which:
FIG. 1 schematically illustrates a flow diagram of a traffic intrusion detection method according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a schematic diagram of a traffic intrusion detection method according to an embodiment of the present disclosure;
FIG. 3 schematically shows a flow chart for extracting feature vectors according to an embodiment of the present disclosure;
FIG. 4A schematically shows a schematic diagram of extracting initial feature vectors according to an embodiment of the disclosure;
FIG. 4B schematically shows a schematic diagram of compressing initial feature vectors according to an embodiment of the disclosure;
FIG. 5 schematically shows a flow diagram for extracting frequency domain features according to an embodiment of the disclosure;
FIG. 6 schematically illustrates data stream processing rates of different devices according to an embodiment of the disclosure;
FIG. 7 is a block diagram schematically illustrating an architecture of a traffic intrusion detection device according to an embodiment of the present disclosure; and
fig. 8 schematically illustrates a block diagram of an electronic device suitable for implementing a traffic intrusion detection method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
In those instances where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the personal information of the users involved all comply with relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated. In the technical solution of the present disclosure, the user's authorization or consent is obtained before the user's personal information is obtained or collected.
Fig. 1 schematically shows a flow chart of a traffic intrusion detection method according to an embodiment of the present disclosure.
As shown in fig. 1, the traffic intrusion detection method of this embodiment includes operations S110 to S140.
In operation S110, a plurality of traffic packets are acquired.
In the embodiment of the present disclosure, the traffic data packet is an object of traffic intrusion detection. The traffic data packet may be obtained from a network firewall or an intrusion detection system deployed in the edge device.
In operation S120, a plurality of feature vectors are respectively extracted from a plurality of traffic data packets based on a neural network.
In an embodiment of the present disclosure, for each of the plurality of traffic data packets, intra-packet modeling is performed within a group of input channels of the neural network to process that packet. The intra-packet modeling converts the raw bytes of the traffic data packet into representative features so as to extract the packet's feature vector. The groups of output channels do not communicate with each other, which forms the sparsification.
The neural network may be a convolutional neural network employing local connections. A raw traffic packet carries distinguishing features in localized short byte fields, for example "HTTP/1.1" or "id=xxx&submit=submit". Convolutional neural networks are very good at perceiving such local details: when processing localized attack features, a convolution kernel with a fixed receptive field slides over local regions to extract features and captures local details well. Compared with a dense fully connected neural network, a locally connected convolutional neural network greatly reduces the parameter and computation counts and runs more efficiently.
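To illustrate why a fixed-receptive-field kernel suits such localized byte fields, the following hypothetical NumPy sketch slides a hand-set matched kernel over raw packet bytes; in the actual model the kernels would be learned, so the kernel values and names here are pure assumptions:

```python
import numpy as np

def byte_conv(payload: bytes, kernel: np.ndarray) -> np.ndarray:
    """Slide a fixed-receptive-field 1-D kernel over raw packet bytes.
    High responses mark positions where the local byte pattern matches."""
    x = np.frombuffer(payload, dtype=np.uint8).astype(float) / 255.0
    D = len(kernel)
    return np.array([np.dot(kernel, x[i:i + D]) for i in range(len(x) - D + 1)])

# Hypothetical kernel tuned to the bytes of "HTTP" (learned in practice, hand-set here)
kernel = np.frombuffer(b"HTTP", dtype=np.uint8).astype(float) / 255.0
resp = byte_conv(b"GET / HTTP/1.1", kernel)
peak = int(np.argmax(resp))        # strongest response where the "HTTP" field begins
```

The kernel only ever sees D consecutive bytes at a time, which is exactly the locality property the text attributes to convolutional feature extraction.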
In operation S130, the data stream composed of the plurality of feature vectors is Fourier-transformed, thereby extracting the frequency-domain features of the data stream.
In an embodiment of the present disclosure, after the feature vector of each individual traffic data packet is obtained, inter-packet modeling may be performed on the traffic data packets. Inter-packet modeling captures the interrelations between the traffic data packets: the plurality of feature vectors are combined into a data stream forming a data packet sequence, and the inter-packet modeling is performed using a discrete Fourier transform.
The data packet sequence is a time series, and Fourier-transform-based inter-packet modeling and analysis can fully extract the correlations among the traffic data packets, thereby capturing the temporal features of the sequence. The sequence is converted from the time domain to the frequency domain, where the frequency-domain features characterize both the relations between packets and the sequence features, converting the data stream into a stream characterization vector. The stream characterization vector may include the frequency-domain features of the data stream.
The patterns of normal traffic and attack traffic are clearly distinguishable in the frequency domain. For example, the frequencies of normal traffic tend to be widely and irregularly distributed, whereas the frequency distribution of fixed types of attack traffic, such as botnet (Bot) traffic or SQL injection, tends to be narrow and concentrated.
In embodiments of the present disclosure, a machine learning model such as a recurrent neural network or a self-attention mechanism may also be used to extract the frequency-domain features of the data stream. The Fourier transform is preferred, however, because it requires no additional learned parameters and has lower computational complexity.
In operation S140, a type of the data stream is detected according to the frequency domain features through a classifier.
In an embodiment of the present disclosure, the classifier classifies the stream characterization vector obtained by the inter-packet modeling and determines whether the corresponding traffic data packets are normal traffic or one of several types of attack traffic.
According to embodiments of the present disclosure, intra-packet modeling extracts a feature vector for each traffic data packet, and a Fourier transform then extracts the frequency-domain features of the data stream composed of the plurality of feature vectors, capturing the correlations among the traffic data packets. Extracting features at both granularities, individual data packets and whole data streams, improves the accuracy of traffic intrusion detection.
Fig. 2 schematically shows a schematic diagram of a traffic intrusion detection method according to an embodiment of the disclosure.
As shown in fig. 2, intra-packet modeling analysis 220 is applied to the plurality of traffic data packets 210 to extract a feature vector for each of them. The intra-packet modeling analysis 220 may include a feature extraction stage 221 and a feature compression stage 222. Because the dimensionality of the feature vectors produced in the feature extraction stage 221 may be too large, the feature compression stage 222 compresses them into low-dimensional feature vectors 230 to further reduce the amount of computation and the consumption of computing resources.
The plurality of low-dimensional feature vectors 230 are assembled into a data stream for inter-packet modeling analysis 240, in which the frequency-domain features are extracted by Fourier transform. The type of the traffic data packets is then detected by the classifier 250 based on the frequency-domain features.
Through embodiments of the present disclosure, the input channels of the convolutional neural network are grouped, intra-packet modeling is performed within each group of input channels, and each traffic data packet is processed separately, which reduces the connectivity of the nodes in the neural network and saves computation. Dividing the input channels into groups and processing the traffic data packets separately can be understood as decomposing the overall problem into a sum of several sub-problems. Each group performs model operations only within the group, and connectivity is established only inside each group, realizing sparsification and reducing computing resources. Replacing dense full connection with local intra-group connection reduces the model complexity from O(n) to O(n/G), where G is the number of groups of input channels.
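The claimed O(n) to O(n/G) reduction can be checked with a small parameter count; the layer sizes below are toy values chosen for illustration, not taken from the patent:

```python
# Back-of-the-envelope check of the connectivity reduction from grouping.
# A dense layer connecting M inputs to N outputs has M*N weights; splitting
# the channels into G groups establishes connections only within each group.
M, N, G = 64, 64, 8
dense_params = M * N                         # full connectivity: 4096 weights
grouped_params = G * (M // G) * (N // G)     # per-group connectivity, summed: 512
assert grouped_params == dense_params // G   # cost drops by exactly a factor of G
```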
Fig. 3 schematically shows a flow chart of extracting feature vectors according to an embodiment of the present disclosure.
As shown in fig. 3, the step of extracting a plurality of feature vectors from a plurality of traffic packets based on a neural network in operation S120 includes operations S310 to S340.
In operation S310, a plurality of initial feature vectors are extracted from a plurality of traffic data packets, respectively, through a plurality of sets of input channels of convolutional layers of a neural network.
In operation S320, for each of the plurality of traffic data packets, the initial feature vector is compressed through the fully connected layer of the neural network to obtain the feature vector, where the dimensionality of the feature vector is smaller than that of the initial feature vector.
In an embodiment of the present disclosure, the plurality of traffic packets may include M packets, the convolutional layer of the neural network may include M input channels and N output channels, and M and N are positive integers.
Extracting the plurality of initial feature vectors from the plurality of traffic data packets through the plurality of groups of input channels of the convolutional layer in operation S310 includes: dividing the M input channels into M groups of channels, each group comprising one input channel provided with a convolution kernel; performing a depthwise convolution on the M data packets through the M convolution kernels of the M groups of channels to obtain M first feature vectors; and performing a pointwise convolution on the M first feature vectors to obtain N initial feature vectors through the N output channels.
Exemplarily, the depthwise convolution that produces the M first feature vectors from the M data packets through the M convolution kernels of the M groups of channels may be calculated according to

$$\hat{Q}_{l,m}=\sum_{d} K_{d,m}\,Q_{l+d-1,m},$$

where $K_{d,m}$ is the convolution kernel of the m-th of the M input channels, $Q_{l+d-1,m}$ is the value at position $l+d-1$ of the one-dimensional convolution feature map in the m-th channel, and d is the pixel of the one-dimensional convolution feature map traversed by the convolution kernel. The one-dimensional convolution feature maps characterize the traffic data packets.
A separate convolution kernel is applied to each input channel to perform the depthwise convolution, so the number of output channels of the depthwise convolution can equal its number of input channels; the depthwise convolution does not expand the channel dimension of the feature maps. The pointwise convolution then effectively fuses the feature information of the different channels, ensuring information flow between channels.
For example, the pointwise convolution that produces the N initial feature vectors from the M first feature vectors through the N output channels may be calculated according to

$$P_{l,n}=\sum_{m=1}^{M} W_{d,m,n}\,Q_{l+d,m},$$

where $W_{d,m,n}$ is the convolution kernel connecting the m-th of the M input channels to the n-th of the N output channels, $Q_{l+d,m}$ is the value at position $l+d$ of the one-dimensional convolution feature map in the m-th input channel, and d is the pixel of the one-dimensional convolution feature map traversed by the convolution kernel.
The number of input channels of the first pointwise convolutional layer equals the number of output channels of the depthwise convolutional layer: the depthwise convolutional layer has M input channels and M output channels, so the first pointwise convolutional layer also has M input channels, while the pointwise convolutional layer may have N output channels.
In embodiments of the present disclosure, the pointwise convolution may be performed as a grouped convolution, which reduces node connectivity, reduces computation, and improves model efficiency. For example, the M channels input to the pointwise convolutional layer are divided into G groups, and a pointwise convolution is performed within each group. To further increase the exchange of information between channels, the input and output channels of each grouped convolution may be decomposed into two sub-convolutions: the input and output channels of each grouped convolution are regarded as forming a grouped convolution channel matrix, and each such matrix is split into two sub-convolution channel matrices. Channel shuffling between the two sub-convolutions increases the information interaction between channels.
The step of performing the pointwise convolution on the M first feature vectors and obtaining the N initial feature vectors through the N output channels includes: dividing the M input channels and the N output channels into G groups to obtain G grouped convolution channel matrices, where G is a positive integer; splitting each of the G grouped convolution channel matrices into a first sub-convolution channel matrix and a second sub-convolution channel matrix, where the first sub-convolution channel matrix comprises C output channels, the second sub-convolution channel matrix comprises C input channels, and C is a positive integer; performing a pointwise convolution on the M first feature vectors through the G first sub-convolution channel matrices, respectively, to obtain G × C second feature vectors; performing a channel shuffle on the G × C second feature vectors to obtain G × C third feature vectors; and performing a pointwise convolution on the G × C third feature vectors through the G second sub-convolution channel matrices, respectively, to obtain the N initial feature vectors.
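These five steps can be sketched end to end in NumPy (the toy sizes M = N = 8, G = 2, C = 2 are assumptions, as are the einsum-based sub-convolutions and the random shuffle; this is an illustration, not the patented implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: M=8 input channels, N=8 output channels, G=2 groups,
# C=2 intermediate channels per group, feature maps of length L=10.
M, N, G, C, L = 8, 8, 2, 2, 10
x = rng.standard_normal((M, L))                 # M first feature vectors

e, f = M // G, N // G                           # per-group channels: e in, f out
W1 = rng.standard_normal((G, e, C))             # first sub-convolution, e -> C
W2 = rng.standard_normal((G, C, f))             # second sub-convolution, C -> f

groups = x.reshape(G, e, L)                     # split channels into G groups
mid = np.einsum('gec,gel->gcl', W1, groups)     # G x C second feature vectors
flat = mid.reshape(G * C, L)

perm = rng.permutation(G * C)                   # channel shuffle across groups
shuffled = flat[perm].reshape(G, C, L)          # G x C third feature vectors

out = np.einsum('gcf,gcl->gfl', W2, shuffled)   # pointwise via second sub-conv
initial = out.reshape(N, L)                     # N initial feature vectors
```

Only the shuffle step lets information cross group boundaries; without it the two groups would remain fully disjoint sub-networks.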
Illustratively, the grouped convolution channel matrix may be split into the first sub-convolution channel matrix and the second sub-convolution channel matrix according to

$$\overline{W}_{L}^{\,e\times f}=\overline{W}_{L-1}^{\,e\times C}\,\overline{W}_{L}^{\,C\times f},$$

where $\overline{W}_{L}^{\,e\times f}$ is the grouped convolution channel matrix of the L-th convolutional layer, with e input channels and f output channels, $\overline{W}_{L-1}^{\,e\times C}$ is the first sub-convolution channel matrix, $\overline{W}_{L}^{\,C\times f}$ is the second sub-convolution channel matrix, and layer L−1 is an intermediate layer with C output channels, where M = G × e, N = G × f, and e and f are positive integers.
For example, the point-by-point convolution includes 8 input channels and 8 output channels. The 8 input channels and 8 output channels are divided into 2 groups, each group comprising 4 input channels and 4 output channels. Each group of 4 input channels and 4 output channels may form a grouped convolution channel matrix $W_{4 \times 4}$. The grouped convolution channel matrix $W_{4 \times 4}$ can be split into a first sub-convolution channel matrix $W_{4 \times 2}$ and a second sub-convolution channel matrix $W_{2 \times 4}$, where the first sub-convolution channel matrix $W_{4 \times 2}$ includes 4 input channels and 2 output channels, and the second sub-convolution channel matrix $W_{2 \times 4}$ includes 2 input channels and 4 output channels.

Between each group's first sub-convolution channel matrix $W_{4 \times 2}$ and second sub-convolution channel matrix $W_{2 \times 4}$, the two groups of data output by the first sub-convolution channel matrices are shuffled: the channels of the two groups are randomly mixed and redistributed, thereby creating a flow of information between the groups.
The two split sub-convolution channel matrices are used to perform the point-by-point convolution. The grouped convolution establishes sparse channel connections, greatly reducing the amount of computation of the point-by-point convolution, while the inter-group channel shuffling maintains connectivity between the groups, further enhancing the representation capability of the output features.
Fig. 4A schematically shows a schematic diagram of extracting an initial feature vector according to an embodiment of the present disclosure.
As shown in fig. 4A, 4 first feature vectors are output by the deep convolution 410. In the point-by-point convolution 420, the 4 input channels and 4 output channels corresponding to the 4 first feature vectors are split into two groups, yielding two 2 × 2 grouped convolution channel matrices. Each 2 × 2 grouped convolution channel matrix includes 2 input channels and 2 output channels, and is split into two 1 × 1 sub-convolution channel matrices. After the first 1 × 1 sub-convolution channel matrix, the two groups of second feature vectors are channel-shuffled to obtain two groups of third feature vectors. The initial feature vectors are obtained after the second 1 × 1 sub-convolution channel matrix.
In an embodiment of the present disclosure, the step of extracting a plurality of initial feature vectors from a plurality of traffic data packets through a plurality of input channels of a convolutional layer of a neural network in operation S310 includes: for each flow data packet of a plurality of flow data packets, acquiring packet header data and load data of each flow data packet; constructing a first convolution model for packet header data and a second convolution model for load data, wherein the number of input channels and the number of output channels of the first convolution model are smaller than the number of input channels and the number of output channels of the second convolution model; and extracting an initial packet header characteristic vector and an initial load characteristic vector from the packet header data and the load data respectively through a first convolution model and a second convolution model.
Both the header data and the payload data of the traffic data packet may be used as detection input data. The header data and the load data of the traffic data packet can be processed separately, and different numbers of channels are applied respectively. The header data carries relatively less semantic information, while the payload data carries more semantic information and diversified features. The load data can contain important information related to attack content, and application layer attacks located in the payload, such as SQL injection, cross-site scripting attack, remote malicious code utilization and the like, can be detected through traffic intrusion detection. A higher dimensional feature space can be established for the load data for characterization.
For example, for the header data, only the network layer header data and the transport layer header data of a traffic packet may be detected; the data link layer header does not contain valid information with attack characteristics. In addition, the source port, checksum, identification, sequence number, and acknowledgement number may be excluded from the detection input, and the IP address may be anonymized in order to mitigate overfitting during detection training. For example, the header data length of a single traffic data packet is fixed to 30 bytes, while the payload data length may be 80 bytes.
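The fixed-length preprocessing just described (header truncated or padded to 30 bytes, payload to 80 bytes) can be sketched as below. The byte-to-[0, 1] scaling and the function names are illustrative assumptions, not specified by the patent.

```python
def to_fixed_length(data: bytes, length: int) -> list:
    """Truncate or zero-pad a byte string to a fixed length, then scale
    each byte to [0, 1] (the scaling is an illustrative choice)."""
    padded = data[:length] + b"\x00" * max(0, length - len(data))
    return [b / 255.0 for b in padded]

# Per the description: header fixed to 30 bytes, payload to 80 bytes.
HEADER_LEN, PAYLOAD_LEN = 30, 80

def preprocess_packet(header: bytes, payload: bytes):
    """Build the two fixed-length detection inputs for one traffic packet."""
    return (to_fixed_length(header, HEADER_LEN),
            to_fixed_length(payload, PAYLOAD_LEN))
```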
In the first convolution model for the header data and the second convolution model for the payload data, the starting layer of each model may be a standard convolution for initially extracting shallow features from the original traffic data packet. The first convolution model and the second convolution model may then apply a max pooling operation and successively stack deep convolution (Depthwise) units and point-by-point convolution (Pointwise) units to expand the receptive field. Each point-by-point convolution unit has two sub-convolutions. The first convolution model and the second convolution model may also improve training efficiency by batch normalization (BN), and apply an activation function (ReLU) to each layer's output.
The contents of the first convolution model of the header data and the second convolution model of the payload data may be as shown in the following table.
In the embodiment of the present disclosure, where the plurality of initial feature vectors includes N initial feature vectors, compressing the initial feature vectors through the full connection layer of the neural network in operation S320 to obtain the feature vectors includes: dividing the N initial feature vectors into a plurality of segments, each segment comprising N feature values; and performing intra-group compression on the N feature values of each of the plurality of segments through the full connection layer to obtain a plurality of feature vectors.
After the convolution operation is completed, the dimensionality of the initial feature vector output by the convolutional layer may be too large, e.g., greater than 1000. If the initial feature vector were directly used as the feature vector of the traffic data packet, a large computation cost would be incurred in the inter-packet modeling analysis stage. A full connection layer is therefore added after the convolutional layer for feature compression, producing a compact data packet feature vector. Feature compression can also implicitly filter out less effective redundant features through end-to-end supervised training.
In the feature compression process, the initial header feature vector and the initial load feature vector are each divided into a plurality of short byte segments (seg1, seg2, ...), and the operation is performed within each segment. The full connection layer is divided into g sparse connections, where g is the number of groups. Grouped feature compression can divide the traffic data packet into different byte segments according to the network layer and the transport layer, and evenly divide the initial header feature vector and the initial load feature vector.
Since the compression of the initial header feature vector and that of the initial load feature vector are similar, the present disclosure illustrates only the compression process for the load feature vector by way of example.
For example, the initial load feature vector output by the convolutional layer may be represented by the following equation (notation reconstructed from the surrounding description):

$B = [b_1, b_2, \ldots, b_i, \ldots], \qquad b_i = (b_{i1}, \ldots, b_{in})$

where B is the initial load feature vector output by the convolutional layer, $b_i$ is the feature of the i-th byte, and n is the number of output channels of the convolutional layer, so that each byte $b_i$ has the feature values $b_{i1} \ldots b_{in}$ over the n channels. A segmentation operation is performed on the initial load feature vector B to divide it into a plurality of short byte segments $seg_i$, thereby implementing the grouping.
The initial load feature vector B may be divided according to:

$seg_i = [\, b_{(i-1)R} : b_{iR} \,], \qquad 1 \le i \le g$

where $seg_i$ is the i-th byte segment with n-channel features, R is the number of elements in each group, and g is the number of groups. After grouping, the features of each byte segment $seg_i$ are flattened into $[\, b_{1,(i-1)R}, \ldots, b_{1,iR}, \ldots, b_{n,(i-1)R}, \ldots, b_{n,iR} \,]$, and intra-group compression is applied to obtain the reduced features of each group. The fusion layer fuses the reduced features output from the different groups and outputs a low-dimensional feature vector, for example, a feature vector with fewer than 20 dimensions.
Fig. 4B schematically illustrates a schematic diagram of compressing initial feature vectors according to an embodiment of the disclosure. As shown in fig. 4B, feature extraction is performed on the traffic data packet 430 to obtain an initial header feature vector 440 and an initial load feature vector 450, which are then partitioned. The initial header feature vector 440 may be divided into 3 short byte segments $seg_1$, $seg_2$ and $seg_3$, and the initial load feature vector 450 may be divided into 5 short byte segments $seg_1$, $seg_2$, $seg_3$, $seg_4$ and $seg_5$. For example, if the length of the initial load feature vector is 50 bytes, it is divided into 5 segments: [0:10], [10:20], [20:30], [30:40], [40:50]. Intra-group feature compression is performed on the features of each short byte segment to obtain the reduced features of each group. The reduced features output from the different groups are fused by the fusion layer 460, and a low-dimensional feature vector 470 is output.
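The segment-wise feature compression described above (split byte positions into g segments, flatten each segment across all channels, compress each segment with its own fully connected weights, then fuse) can be sketched as follows. This is a toy pure-Python sketch; the function name, the concatenation used as the "fusion", and the weight shapes are assumptions for illustration.

```python
def segment_compress(B, g, seg_weights):
    """B: per-channel features, shape n x L (n channels, L byte positions).
    Split the L positions into g segments of R = L // g, flatten each
    segment across all n channels, and apply that segment's fully connected
    weights (one row per reduced feature dimension). The reduced features
    of all segments are fused here by simple concatenation."""
    n, L = len(B), len(B[0])
    R = L // g
    reduced = []
    for i in range(g):
        # flatten segment i: [b_{1,(i-1)R}..b_{1,iR}, ..., b_{n,(i-1)R}..b_{n,iR}]
        flat = [B[ch][j] for ch in range(n) for j in range(i * R, (i + 1) * R)]
        for row in seg_weights[i]:
            reduced.append(sum(wk * xk for wk, xk in zip(row, flat)))
    return reduced
```

With n = 2 channels, L = 4 positions, g = 2 segments, and all-ones weights, each segment collapses to the sum of its flattened values.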
Fig. 5 schematically shows a flowchart of extracting frequency domain features according to an embodiment of the disclosure.
As shown in fig. 5, performing the Fourier transform on the data stream to obtain the frequency domain variables in operation S130 includes operations S510 to S520.
In operation S510, a fourier transform is performed on the data stream to obtain a frequency domain variable.
In operation S520, the amplitude of the frequency domain variable is calculated, resulting in a frequency domain characteristic.
In the disclosed embodiment, let S be a bi-directional flow containing M feature vectors:

$S = [p_1, p_2, \ldots, p_m, \ldots, p_M]$

where $p_m$ is the feature vector of the m-th data packet in the data stream.
The data stream S is converted into a data packet sequence, and a discrete Fourier transform is used to characterize the interrelations and sequence features between data packets with frequency domain features, thereby converting the data stream S into a flow characterization vector $y_m$.
Illustratively, a Discrete Fourier Transform (DFT) may decompose an input sequence into a combination of a set of harmonics in the frequency domain:

$X_k = \sum_{m=0}^{M-1} x_m \, e^{-j \frac{2\pi}{M} k m}, \qquad 0 \le k \le M-1$

where $x_m$ is the m-th element of an input sequence of length M, with $m \in [0, M-1]$.
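The DFT definition above can be computed directly from the sum; a minimal sketch (naive O(M²) evaluation, purely to illustrate the formula, not the implementation used by the method):

```python
import cmath

def dft(x):
    """Naive DFT: X_k = sum_{m=0}^{M-1} x_m * exp(-j * 2*pi * k * m / M)."""
    M = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / M) for m in range(M))
            for k in range(M)]
```

For a constant sequence all the energy lands at frequency 0, which is a quick sanity check of the sign and normalization conventions.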
Applying the discrete Fourier transform to the step of performing the Fourier transform on the data stream to obtain the frequency domain variables in operation S510, a two-dimensional discrete Fourier transform of the data packet sequence may be calculated according to the following formula:

$y_m = F_{seq}(F_h(p_m)), \qquad 0 \le m \le M-1$

where the plurality of $y_m$ are the frequency domain variables, $F_h$ is an extraction function for extracting the interrelations between the internal features of the plurality of traffic data packets, $F_{seq}$ is an extraction function for extracting the sequence relations of the plurality of traffic data packets in the data stream, and $p_m$ is the feature vector of the m-th traffic data packet among the M traffic data packets.
The output of the two-dimensional discrete Fourier transform can be represented as follows, with the real and imaginary parts reconstructed from the standard two-dimensional DFT (the feature-frequency index k is left implicit, matching the single-subscript notation of the source):

$y_m = a_m + b_m j$

$a_m = \sum_{u=0}^{U-1} \sum_{v=0}^{M-1} x_{uv} \cos\left( 2\pi \left( \frac{ku}{U} + \frac{mv}{M} \right) \right)$

$b_m = -\sum_{u=0}^{U-1} \sum_{v=0}^{M-1} x_{uv} \sin\left( 2\pi \left( \frac{ku}{U} + \frac{mv}{M} \right) \right)$

where U is the feature dimension of a single data packet, M is the dimension of the data packet sequence, and $x_{UM}$ is the data packet sequence.
Illustratively, the amplitude of the frequency domain variable may be calculated according to the following equation:

$Y_m = \sqrt{a_m^2 + b_m^2}$

Due to the symmetry of the frequency domain, the variable $Y_m$ is used as the frequency domain feature. The final output of the discrete Fourier transform is the flow characterization vector $Y_m$. The classifier classifies $Y_m$ to determine whether the traffic data packet is normal traffic or one of different types of attack traffic.
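The whole pipeline of this subsection — an intra-packet transform, a transform along the packet sequence, and the amplitude as the frequency domain feature — can be sketched as below. The composition order, the naive DFT, and the halving of the spectrum (one reading of the "symmetry of the frequency domain" remark, valid for real-valued input) are illustrative assumptions.

```python
import cmath

def dft(x):
    """Naive DFT over one axis."""
    M = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / M) for m in range(M))
            for k in range(M)]

def flow_frequency_features(packets):
    """packets: M feature vectors of length U. Transform each packet's
    features (F_h), then each feature across the packet sequence (F_seq),
    and return amplitudes sqrt(a^2 + b^2) = |y|. By conjugate symmetry of
    the DFT of real input, only the first half of the sequence axis is kept."""
    rows = [dft(p) for p in packets]                       # F_h per packet
    cols = [dft([rows[m][u] for m in range(len(rows))])    # F_seq per feature
            for u in range(len(rows[0]))]
    M = len(packets)
    return [[abs(cols[u][m]) for u in range(len(cols))]
            for m in range(M // 2 + 1)]
```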
The present disclosure also provides a verification of the traffic intrusion detection method of the present disclosure, verifying its detection performance. The verification process includes testing and evaluating both the detection effect and the detection efficiency. Two public data sets were used for the experiments: CIC-IDS2017 and CIC-IDS2018.
The disclosed traffic intrusion detection method is compared with a variety of algorithms, such as AE + MLP, LCB + MLP, LCB + LSTM, 1D-CNN, and MobileNet. AE + MLP performs detection using manually designed features: an autoencoder compresses the input features in a semi-supervised manner, and a classifier classifies the specific attack types. LCB (Light-weighted Convolution Block) is a lightweight convolutional neural network module. LCB + MLP uses LCB for intra-packet modeling and outputs traffic packet vectors directly to the classifier without inter-packet modeling. LCB + LSTM uses LCB for intra-packet modeling and a Long Short-Term Memory network (LSTM) for inter-packet modeling. 1D-CNN is a standard convolution operation with two convolutional layers of 8 and 16 channels, respectively. MobileNet is an efficient CNN architecture that builds a lightweight convolutional neural network model using depthwise separable convolutions.
In the comparative experiments, this section sets $n_p = 8$ and $n_b = 80$, where $n_p$ is the number of leading data packets used in a data stream and $n_b$ is the number of bytes used in the payload. The traffic intrusion detection method of the present disclosure is named LiteIDS. The following table shows the multi-classification results of the various detection methods.
LiteIDS achieves higher precision, recall, and F1 values than AE + MLP, indicating that automatically extracting representation features from the original packets is more effective than using manually designed features. LiteIDS also has a better detection effect than LCB + MLP, demonstrating the effectiveness of inter-packet modeling through the Fourier transform: LiteIDS converts the original packet sequence from the time domain to the frequency domain, capturing richer frequency domain features while extracting the relations between data packets. Compared with the inter-packet modeling based on a long short-term memory network (LCB + LSTM), LiteIDS achieves a similar detection effect with less computation and higher detection performance. In addition, compared with the standard convolution 1D-CNN and MobileNet, which introduce no simplification operations, the effectiveness of LiteIDS is not significantly reduced, demonstrating the validity and feasibility of the lightweight operations of the method.
In addition, compared with other deep-learning-based anomaly detection methods, LiteIDS has low computation cost and a small model scale. The detection efficiency of the traffic intrusion detection method is evaluated with the following indices: MACC, parameter count, and model size. MACC (Multiply-Accumulate operations) is the total number of multiplications and additions performed during model operation and directly reflects the model's computation amount. The parameter count and the model size reflect the number of variables in the model that need to be trained.
The model efficiency is evaluated by computing the average value of the evaluation indices over a single traffic data packet. In the detection of a single traffic data packet, the payload detection module does not run when the packet has no payload, so the computation complexity of the payload branch is multiplied by a probability factor λ, where λ is the probability that a single traffic data packet carries a payload. For example, if the proportion of packets no longer than 64 bytes is about 60% of the total number of traffic packets, the probability factor may be set to λ = 0.4. In the inter-packet modeling, the computation amount of one data flow can be distributed evenly over its traffic packets to obtain the average computation cost of a single traffic packet.
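The per-packet cost accounting described above reduces to simple expected-value arithmetic; a minimal sketch (the function and its arguments are illustrative, not the patent's exact accounting):

```python
def per_packet_cost(header_macc, payload_macc, flow_macc, packets_per_flow,
                    payload_prob=0.4):
    """Expected per-packet MACC: the payload branch runs only with
    probability lambda (payload_prob), and the inter-packet (flow-level)
    cost is spread evenly over the packets of a flow."""
    return (header_macc
            + payload_prob * payload_macc
            + flow_macc / packets_per_flow)
```

For example, with a header branch of 1,000 MACC, a payload branch of 2,000 MACC, a flow-level cost of 800 MACC, 8 packets per flow, and λ = 0.4, the expected cost is 1,000 + 800 + 100 = 1,900 MACC per packet.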
The table below shows the average MACC, parameter count, model size, and F1 value per traffic packet for the different detection algorithms. When evaluating the complexity of the Kitsune model, the calculation is performed under the conditions n = 198, k = 10 and k = 20, where n is the number of features per traffic packet and k is the number of autoencoders. The compression ratio of the autoencoder's hidden layer is 3/4.
Algorithm        MACC        Parameters   Model size   F1 value
LiteIDS          4,263       1,107        4.3k         99.27%
LCB+LSTM         6,192       3,260        13.1k        99.41%
1D-CNN           73,920      16,866       67.5k        99.69%
MobileNet        3,805,000   268,321      1.03M        99.92%
Kitsune(k=20)    3,800       3,800        15.2k        —
Kitsune(k=10)    6,160       6,160        25.2k        —
The MACC value of LiteIDS is only 5.76% of that of 1D-CNN and 0.12% of that of MobileNet. It follows that LiteIDS can reduce the computation of standard one-dimensional convolution by up to 94.24% without a significant drop in detection accuracy. Furthermore, the MACC value of LiteIDS is only 68.84% of that of LCB + LSTM, indicating that using Fourier transforms for inter-packet modeling and sequence feature extraction is less computationally complex than using LSTM. LiteIDS also has MACC values comparable to or lower than Kitsune's, indicating model efficiency comparable to or higher than Kitsune's. Meanwhile, LiteIDS has the smallest parameter count and average model size, its parameter count being 6.56% of 1D-CNN's, 0.41% of MobileNet's, and 33.95% of LCB + LSTM's, indicating that the method has the most compact parameters and that the model occupies less space during deployment.
In addition, the present disclosure evaluates the performance of running the traffic intrusion detection method of the present disclosure. The operating environment is a desktop computer and a software router, wherein the software router also performs routing and forwarding tasks simultaneously. Fig. 6 schematically illustrates data stream processing rates of different devices according to an embodiment of the disclosure.
As shown in FIG. 6, as the number of packets $n_p$ used from the data stream increases, the stream processing rate continuously decreases in both environments. When $n_p = 8$, the flow processing rates are 3010 flow/s and 2342 flow/s. According to the experimental results of the hyperparameter sensitivity section, the model still has a good F1 value even when $n_p = 4$, so the flow processing rate of the system on the two devices can reach up to 3293 flow/s and 2564 flow/s, respectively. The experimental results show that, at the current model complexity, the traffic intrusion detection method can run on a software router.
Based on the traffic intrusion detection method, the disclosure also provides a traffic intrusion detection device. The apparatus will be described in detail below with reference to fig. 7.
Fig. 7 schematically shows a block diagram of a traffic intrusion detection apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the traffic intrusion detection apparatus 700 of this embodiment includes an obtaining module 710, a first extracting module 720, a second extracting module 730, and a detecting module 740.
The obtaining module 710 is configured to obtain a plurality of traffic data packets. In an embodiment, the obtaining module 710 may be configured to perform the operation S110 described above, which is not described herein again.
The first extraction module 720 is configured to extract a plurality of feature vectors from a plurality of traffic data packets based on a neural network, respectively. In an embodiment, the first extracting module 720 may be configured to perform the operation S120 described above, which is not described herein again.
The second extraction module 730 is configured to extract frequency domain features of the data stream by performing fourier transform on the data stream composed of the plurality of feature vectors. In an embodiment, the second extracting module 730 may be configured to perform the operation S130 described above, which is not described herein again.
The detection module 740 is configured to detect the type of the data stream according to the frequency domain features through a classifier. In an embodiment, the detecting module 740 may be configured to perform the operation S140 described above, which is not described herein again.
According to the embodiment of the present disclosure, any plurality of the obtaining module 710, the first extracting module 720, the second extracting module 730, and the detecting module 740 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the obtaining module 710, the first extracting module 720, the second extracting module 730, and the detecting module 740 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented in any one of three implementations of software, hardware, and firmware, or in a suitable combination of any of them. Alternatively, at least one of the obtaining module 710, the first extracting module 720, the second extracting module 730 and the detecting module 740 may be at least partially implemented as a computer program module, which when executed may perform a corresponding function.
Fig. 8 schematically shows a block diagram of an electronic device adapted to implement the traffic intrusion detection method according to an embodiment of the present disclosure.
As shown in fig. 8, an electronic device 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., application Specific Integrated Circuit (ASIC)), among others. The processor 801 may also include onboard memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing different actions of the method flows according to embodiments of the present disclosure.
In the RAM 803, various programs and data necessary for the operation of the electronic apparatus 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or RAM 803. Note that the programs may also be stored in one or more memories other than the ROM 802 and RAM 803. The processor 801 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
Electronic device 800 may also include an input/output (I/O) interface 805, the input/output (I/O) interface 805 also being connected to the bus 804, according to an embodiment of the present disclosure. The electronic device 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that the computer program read out therefrom is installed into the storage section 808 as necessary.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement a method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include one or more memories other than the ROM 802 and/or RAM 803 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the traffic intrusion detection method provided by the embodiment of the disclosure.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 801. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, and the like. In another embodiment, the computer program may also be transmitted in the form of a signal, distributed over a network medium, downloaded and installed via communications portion 809, and/or installed from removable media 811. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, program code for carrying out the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In situations involving a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be appreciated by those skilled in the art that various combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made, even if such combinations are not explicitly recited in the present disclosure. In particular, various combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit and teachings of the present disclosure. All such combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (15)

1. A traffic intrusion detection method, comprising:
acquiring a plurality of traffic data packets;
extracting a plurality of feature vectors from the plurality of traffic data packets, respectively, based on a neural network;
extracting frequency domain features of a data stream formed by the plurality of feature vectors by performing a Fourier transform on the data stream; and
detecting, by a classifier, a type of the data stream according to the frequency domain features.
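The four steps of claim 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation: the function names are invented, and `classify` stands in for any trained classifier operating on the magnitude spectrum.

```python
import numpy as np

def detect_stream_type(packet_features, classify):
    """Illustrative sketch of the claimed pipeline: per-packet feature
    vectors are stacked into a data stream, Fourier-transformed along
    the packet axis, reduced to magnitude (frequency domain) features,
    and handed to a classifier."""
    stream = np.stack(packet_features)        # (num_packets, feature_dim)
    spectrum = np.fft.fft(stream, axis=0)     # Fourier transform over the stream
    freq_features = np.abs(spectrum)          # frequency domain features
    return classify(freq_features)

# Toy usage: a stand-in "classifier" that thresholds total spectral energy.
feats = [np.ones(4), np.zeros(4), np.ones(4), np.zeros(4)]
label = detect_stream_type(feats, lambda f: "attack" if f.sum() > 10 else "benign")
```

The threshold and labels are arbitrary; the point is only the feature-vectors → FFT → magnitude → classifier data flow.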
2. The traffic intrusion detection method according to claim 1, wherein the extracting a plurality of feature vectors from a plurality of traffic data packets, respectively, based on a neural network comprises:
extracting a plurality of initial feature vectors from the plurality of traffic data packets through a plurality of groups of input channels of a convolutional layer of the neural network, respectively; and
for the initial feature vector of each traffic data packet in the plurality of traffic data packets, compressing the initial feature vector through a fully connected layer of the neural network to obtain the feature vector, wherein the dimensionality of the feature vector is smaller than that of the initial feature vector.
3. The traffic intrusion detection method according to claim 2, wherein the plurality of traffic data packets includes M data packets, the convolutional layers of the neural network include M input channels and N output channels, M and N are positive integers; the extracting a plurality of initial feature vectors from a plurality of traffic data packets through a plurality of groups of input channels of the convolutional layer of the neural network respectively comprises:
dividing the M input channels into M groups of channels, wherein each group of channels comprises one input channel, and each input channel is provided with a convolution kernel;
performing a depthwise convolution operation on the M data packets through the M convolution kernels of the M groups of channels, respectively, to obtain M first feature vectors; and
performing a point-by-point convolution operation on the M first feature vectors to obtain N initial feature vectors through the N output channels.
4. The traffic intrusion detection method according to claim 3, wherein the performing, by the M convolution kernels of the M groups of channels, depthwise convolution operations on the M data packets, respectively, to obtain M first feature vectors comprises:
performing the depthwise convolution operation on the M data packets according to

$$\hat{Q}_{l,m} = \sum_{d} K_{d,m} \cdot Q_{l+d-1,m}$$

wherein K_{d,m} is the convolution kernel of the m-th channel among the M input channels, Q_{l+d-1,m} is the one-dimensional convolution feature map of length l in the m-th channel, and d is the pixel point of the one-dimensional convolution feature map traversed by the convolution kernel.
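The depthwise step can be sketched in NumPy. The loop follows the per-channel formula above (which is itself a reconstruction of the patent's formula image), with 0-based indexing in place of the claim's 1-based l and d:

```python
import numpy as np

def depthwise_conv1d(Q, K):
    """Per-channel ("depthwise") 1D convolution: each of the M input
    channels is convolved with its own kernel, with no mixing across
    channels. Q: (L, M) feature maps, K: (D, M) kernels -> (L-D+1, M)."""
    L, M = Q.shape
    D = K.shape[0]
    out = np.zeros((L - D + 1, M))
    for l in range(L - D + 1):
        for d in range(D):
            out[l] += K[d] * Q[l + d]   # element-wise over the M channels
    return out

# Toy usage: kernels of ones compute sliding sums independently per channel.
Q = np.arange(8.0).reshape(4, 2)   # 4 positions, M=2 channels
K = np.ones((2, 2))                # kernel length D=2 for each channel
first_features = depthwise_conv1d(Q, K)
```

Only "valid" output positions are kept here; padding behavior is not specified by the claim.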
5. The traffic intrusion detection method according to claim 3, wherein the performing a point-by-point convolution operation on the M first feature vectors to obtain N initial feature vectors through the N output channels comprises:
performing the point-by-point convolution operation on the M first feature vectors according to

$$Q'_{l+d,n} = \sum_{m=1}^{M} W_{d,m,n} \cdot Q_{l+d,m}$$

wherein W_{d,m,n} is the convolution kernel for the m-th channel among the M input channels and the n-th channel among the N output channels, Q_{l+d,m} is the one-dimensional convolution feature map in the m-th channel, and d is the pixel point of the one-dimensional convolution feature map traversed by the convolution kernel.
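A point-by-point (1×1) convolution mixes channels at each position independently, which reduces to one matrix product. A minimal sketch (names illustrative):

```python
import numpy as np

def pointwise_conv(Q, W):
    """1x1 ("point-by-point") convolution: at every position l, the M
    input channels are linearly mixed into N output channels.
    Q: (L, M) feature maps, W: (M, N) kernel -> (L, N)."""
    return Q @ W

# Toy usage: an all-ones kernel sums the input channels at each position.
Q = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # L=3 positions, M=2 channels
W = np.ones((2, 4))                                  # M=2 -> N=4 output channels
initial_features = pointwise_conv(Q, W)
```

Chained after the depthwise step, this pair forms the standard depthwise separable convolution.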
6. The traffic intrusion detection method according to claim 4, wherein the performing a point-by-point convolution operation on the M first feature vectors to obtain N initial feature vectors through the N output channels further comprises:
dividing the M input channels and the N output channels into G groups to obtain G grouped convolution channel matrices, wherein G is a positive integer;
for each of the G grouped convolution channel matrices, splitting the grouped convolution channel matrix into a first sub-convolution channel matrix and a second sub-convolution channel matrix, the first sub-convolution channel matrix comprising C output channels, the second sub-convolution channel matrix comprising C input channels, and C being a positive integer;
performing a point-by-point convolution on the M first feature vectors through the G first sub-convolution channel matrices, respectively, to obtain G × C second feature vectors;
performing channel shuffling on the G × C second feature vectors to obtain G × C third feature vectors; and
performing a point-by-point convolution on the G × C third feature vectors through the G second sub-convolution channel matrices, respectively, to obtain the N initial feature vectors.
7. The traffic intrusion detection method according to claim 6, wherein the splitting the grouped convolution channel matrix into a first sub-convolution channel matrix and a second sub-convolution channel matrix comprises:
splitting the grouped convolution channel matrix into the first sub-convolution channel matrix and the second sub-convolution channel matrix according to

$$W^{(L)}_{e \times f} = W^{(L-1)}_{e \times C} \cdot W^{(L)}_{C \times f}$$

wherein W^{(L)}_{e×f} is the grouped convolution channel matrix of the L-th convolutional layer, having e input channels and f output channels, W^{(L-1)}_{e×C} is the first sub-convolution channel matrix, W^{(L)}_{C×f} is the second sub-convolution channel matrix, the (L-1)-th layer is an intermediate layer with C output channels, M = G × e, N = G × f, and e and f are positive integers.
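Under this reading, factoring one e×f channel matrix into an e×C and a C×f matrix keeps the composed mapping a valid e→f pointwise convolution while shrinking the parameter count from e·f to e·C + C·f when C is small. A sketch under these assumptions (sizes are hypothetical; how the factors would be trained is not specified by the claim):

```python
import numpy as np

# Hypothetical per-group sizes: e inputs, f outputs, C intermediate channels.
e, f, C, L = 8, 8, 2, 5
rng = np.random.default_rng(0)
W1 = rng.normal(size=(e, C))   # first sub-convolution channel matrix (e -> C)
W2 = rng.normal(size=(C, f))   # second sub-convolution channel matrix (C -> f)
Q = rng.normal(size=(L, e))    # feature maps entering the grouped convolution

# Chaining the two pointwise convolutions equals a single convolution
# with the product matrix W1 @ W2, but stores far fewer parameters.
two_step = (Q @ W1) @ W2
one_step = Q @ (W1 @ W2)
params_saved = e * f - (e * C + C * f)   # 64 - 32
```

The equality holds exactly because pointwise convolution is linear in the channel dimension.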
8. The traffic intrusion detection method according to claim 2, wherein the initial feature vectors comprise an initial packet header feature vector and an initial payload feature vector, and the extracting a plurality of initial feature vectors from the plurality of traffic data packets through a plurality of groups of input channels of the convolutional layer of the neural network respectively comprises:
for each traffic data packet of the plurality of traffic data packets, acquiring packet header data and payload data of the traffic data packet;
constructing a first convolution model for the packet header data and a second convolution model for the payload data, wherein the numbers of input channels and output channels of the first convolution model are smaller than those of the second convolution model; and
extracting the initial packet header feature vector and the initial payload feature vector from the packet header data and the payload data through the first convolution model and the second convolution model, respectively.
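One way to read claim 8: the short, structured packet header gets a narrow convolution model while the longer payload gets a wider one, and the two resulting feature vectors are concatenated. A sketch with illustrative channel counts (`small`, `large` and the pointwise-style weight matrices are assumptions, not from the patent):

```python
import numpy as np

def build_models(header_len, payload_len, small=4, large=16):
    """Two convolution models: a light one for the packet header and a
    wider one for the payload (fewer in/out channels for the header)."""
    rng = np.random.default_rng(1)
    header_model = rng.normal(size=(header_len, small))
    payload_model = rng.normal(size=(payload_len, large))
    return header_model, payload_model

def extract_packet_features(header, payload, header_model, payload_model):
    # Extract features from each part, then concatenate into one vector.
    return np.concatenate([header @ header_model, payload @ payload_model])

# Toy usage: a 20-byte header and a 100-byte payload.
hm, pm = build_models(20, 100)
features = extract_packet_features(np.ones(20), np.ones(100), hm, pm)
```

Sizing the models asymmetrically spends most of the capacity on the payload, which carries most of the bytes.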
9. The traffic intrusion detection method according to claim 2, wherein the plurality of initial feature vectors comprises N initial feature vectors, and the compressing the initial feature vector through the fully connected layer of the neural network to obtain the feature vector comprises:
dividing the N initial feature vectors into a plurality of byte segments, each byte segment comprising N feature values; and
performing intra-group compression on the N feature values of each of the plurality of byte segments through the fully connected layer to obtain the plurality of feature vectors.
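The intra-group compression of claim 9 can be sketched as reshaping the initial features into fixed-length segments and applying one shared fully connected weight per segment. The segment length and weight values below are illustrative only:

```python
import numpy as np

def segment_compress(init_features, seg_len, W):
    """Split a flat initial feature vector into segments of `seg_len`
    values and compress each segment with a shared fully connected
    weight W of shape (seg_len, out_dim), out_dim < seg_len."""
    segments = init_features.reshape(-1, seg_len)  # (num_segments, seg_len)
    return (segments @ W).reshape(-1)              # compressed and flattened

# Toy usage: average each pair of values (a 2:1 compression).
v = np.arange(8.0)                 # 8 values -> 4 segments of length 2
W = np.array([[0.5], [0.5]])       # 2 -> 1 per segment
compressed = segment_compress(v, 2, W)
```

Because out_dim is smaller than seg_len, the resulting feature vector has lower dimensionality than the initial one, as claim 2 requires.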
10. The traffic intrusion detection method according to claim 2, wherein the extracting frequency domain features of the data stream by performing fourier transform on the data stream composed of the plurality of feature vectors comprises:
performing a Fourier transform on the data stream to obtain a frequency domain variable; and
calculating the amplitude of the frequency domain variable to obtain the frequency domain features.
11. The traffic intrusion detection method according to claim 10, wherein the performing a Fourier transform on the data stream to obtain a frequency domain variable comprises:
computing the frequency domain variable y_m according to

y_m = F_seq(F_h(p_m))

wherein F_h is an extraction function for extracting the interrelations among the internal features of the plurality of traffic data packets, F_seq is an extraction function for extracting the sequence relation of the plurality of traffic data packets in the data stream, and p_m is the feature vector of the m-th traffic data packet in the plurality of traffic data packets.
12. A traffic intrusion detection apparatus, comprising:
an acquisition module configured to acquire a plurality of traffic data packets;
a first extraction module configured to extract a plurality of feature vectors from the plurality of traffic data packets, respectively, based on a neural network;
a second extraction module configured to extract frequency domain features of a data stream formed by the plurality of feature vectors by performing a Fourier transform on the data stream; and
a detection module configured to detect, by a classifier, a type of the data stream according to the frequency domain features.
13. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-11.
14. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 11.
15. A computer program product comprising a computer program which, when executed by a processor, carries out the method according to any one of claims 1 to 11.
CN202211352783.0A 2022-10-31 2022-10-31 Traffic intrusion detection method, apparatus, device, storage medium, and program product Pending CN115695002A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211352783.0A CN115695002A (en) 2022-10-31 2022-10-31 Traffic intrusion detection method, apparatus, device, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211352783.0A CN115695002A (en) 2022-10-31 2022-10-31 Traffic intrusion detection method, apparatus, device, storage medium, and program product

Publications (1)

Publication Number Publication Date
CN115695002A true CN115695002A (en) 2023-02-03

Family

ID=85048215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211352783.0A Pending CN115695002A (en) 2022-10-31 2022-10-31 Traffic intrusion detection method, apparatus, device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN115695002A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117811850A (en) * 2024-03-01 2024-04-02 南京信息工程大学 Network intrusion detection method and system based on STBformer model

Similar Documents

Publication Publication Date Title
US10848508B2 (en) Method and system for generating synthetic feature vectors from real, labelled feature vectors in artificial intelligence training of a big data machine to defend
He et al. Attacking and protecting data privacy in edge–cloud collaborative inference systems
Aminanto et al. Deep learning in intrusion detection system: An overview
US20170169360A1 (en) Method and system for training a big data machine to defend
CN111614599B (en) Webshell detection method and device based on artificial intelligence
Rafique et al. Evolutionary algorithms for classification of malware families through different network behaviors
KR20220029532A (en) A deep embedded self-taught learning system and method for detecting suspicious network behaviours
Sachdeva et al. Machine learning with digital forensics for attack classification in cloud network environment
CN112235314A (en) Network flow detection method, device and equipment
CN115695002A (en) Traffic intrusion detection method, apparatus, device, storage medium, and program product
CN114745168A (en) Cloud platform inlet real-time flow copying method and system and electronic equipment
Yang et al. A new methodology for anomaly detection of attacks in IEC 61850-based substation system
Hou et al. Hybrid intrusion detection model based on a designed autoencoder
Liu et al. Spatial-temporal feature with dual-attention mechanism for encrypted malicious traffic detection
Natarajan et al. Multilevel analysis to detect covert social botnet in multimedia social networks
CN116962047A (en) Interpretable threat information generation method, system and device
CN115865486B (en) Network intrusion detection method and system based on multi-layer perception convolutional neural network
CN115758337A (en) Back door real-time monitoring method based on timing diagram convolutional network, electronic equipment and medium
CN114362988B (en) Network traffic identification method and device
Branitskiy et al. Network anomaly detection based on an ensemble of adaptive binary classifiers
din et al. Detection of botnet in IoT network through machine learning based optimized feature importance via ensemble models
CN112950222A (en) Resource processing abnormity detection method and device, electronic equipment and storage medium
Samsu Aliar et al. An Automated Detection of DDoS Attack in Cloud Using Optimized Weighted Fused Features and Hybrid DBN-GRU Architecture
CN115150165B (en) Flow identification method and device
Santoso et al. Malware Detection using Hybrid Autoencoder Approach for Better Security in Educational Institutions

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination