WO2022222020A1 - Neural network architecture automatic search method and device for traffic classification - Google Patents

Neural network architecture automatic search method and device for traffic classification

Info

Publication number
WO2022222020A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
network architecture
traffic
search
searching
Prior art date
Application number
PCT/CN2021/088293
Other languages
French (fr)
Chinese (zh)
Inventor
林鹏
叶可江
须成忠
Original Assignee
中国科学院深圳先进技术研究院
Priority date
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Priority to PCT/CN2021/088293
Publication of WO2022222020A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A neural network architecture automatic search method and device for traffic classification. The method and device comprise: obtaining network traffic data samples, preprocessing the data samples, and converting them into a traffic data set in the form of two-dimensional matrices (S101); dividing the traffic data set into a training set, a validation set, and a test set (S102); after dividing the traffic data set, constructing the candidate operations of a search space of a neural network architecture (S103); and searching for an optimal neural network architecture on the basis of the candidate operations of the search space, comprising searching for an optimal Cell structure, repeatedly stacking identical Cell structures, and forming the searched Cell structures into the entire neural network architecture (S104). The method can automatically search for an optimal neural network structure by means of the given candidate operations, without manual participation.

Description

Neural network architecture automatic search method and device for traffic classification

TECHNICAL FIELD

The present application relates to the field of information technology, and in particular to a neural network architecture automatic search method and device for traffic classification.

BACKGROUND

Network traffic classification is an important research problem in the networking field; it plays a key role in intrusion detection, network anomaly detection, and QoS assurance. Deep neural networks, a very promising technology of recent years, have achieved good results in traffic classification. However, current deep neural networks all require manual design and are difficult to adapt to different traffic classification tasks: different tasks may have different focuses, and traffic patterns also differ from enterprise to enterprise, so designing a suitable neural network architecture for each task is very expensive.
Traditional traffic classification techniques are based on port numbers and deep packet inspection; they match the traffic to be classified against preset rules and assign it to known traffic patterns. Machine-learning-based methods let traffic classification identify the underlying patterns more intelligently: the network traffic is generally first characterized as a set of traffic feature vectors and then classified with common machine learning algorithms such as SVM, random forest, or KNN. The emergence of deep neural networks makes end-to-end traffic classification possible; such methods do not require the traffic to be characterized in advance, and the raw network data can be simply encoded and fed into the neural network. The neural network first maps these encodings to a set of high-dimensional vectors and then applies convolution, pooling, recurrent computation, and similar operations to extract hidden features. In general, the number of layers is increased to strengthen the learned feature representation, which is the origin of the term "deep neural network". After multiple layers of feature extraction, a linear layer and a Softmax layer are usually combined at the end for classification.
Traditional traffic classification techniques based on port numbers and deep packet inspection are easily bypassed by port spoofing, traffic encryption, and similar techniques. Machine-learning-based methods require the network traffic to be characterized first, which demands a large amount of expert prior knowledge, and hand-crafted features are not guaranteed to be representative or comprehensive. Although deep learning lowers the barrier of expert knowledge, designing an effective neural network architecture is itself challenging. Moreover, finding which neural network architecture best fits the task at hand requires extensive experimentation and redesign, at a huge cost in human and material resources.
SUMMARY

The present application provides a neural network architecture automatic search method and device for traffic classification, with the aim of realizing automatic neural network architecture search.
To solve the above problems, the present application provides the following technical solutions.
In one aspect, a neural network architecture automatic search method for traffic classification is provided, comprising the following steps:
obtaining network traffic data samples, preprocessing the data samples, and converting them into a traffic data set in the form of two-dimensional matrices;
dividing the traffic data set into a training set, a validation set, and a test set;
after the traffic data set is divided, constructing the candidate operations of the search space of the neural network architecture;
searching for the optimal neural network architecture based on the candidate operations of the search space, which includes searching for an optimal Cell structure, repeatedly stacking the same Cell structure, and forming the searched Cell structures into the entire neural network architecture.
The technical solution adopted in the embodiments of the present application further includes: the method further comprises:
adding spatial attention and channel attention at the output position of each Cell;
saving the searched Cell architecture parameters, and retraining the Cell architecture using a new training set formed by merging the training set and the validation set.
The technical solution adopted in the embodiments of the present application further includes: obtaining network traffic data samples, preprocessing the data samples, and converting them into a traffic data set in the form of two-dimensional matrices comprises:
Step 1: extracting data at network-flow granularity;
Step 2: extracting data at packet granularity;
Step 3: concatenating the data extracted at network-flow granularity and the data extracted at packet granularity along the channel dimension to form a tensor;
Step 4: performing Steps 1 to 3 above on all network flows to construct a formatted traffic data set.
The technical solution adopted in the embodiments of the present application further includes: dividing the traffic data set into a training set, a validation set, and a test set comprises:
the training set, the validation set, and the test set contain each traffic category in the same proportion, wherein
the training set is used to update the network weight parameters of the structure;
the validation set is used to search for the best network architecture parameters;
the test set is used, after the search is completed, to evaluate the final result of the network retrained on the training set plus the validation set.
The technical solution adopted in the embodiments of the present application further includes: constructing the candidate operations of the search space includes limiting the scope of the neural network architecture search space.
The technical solution adopted in the embodiments of the present application further includes: the candidate operations of the neural network architecture search space are set as: 3x3 depthwise separable convolution, 5x5 depthwise separable convolution, 3x3 dilated (atrous) convolution, 5x5 dilated convolution, 3x3 max pooling, and 3x3 average pooling.
The technical solution adopted in the embodiments of the present application further includes: searching for the optimal neural network architecture based on the candidate operations of the search space comprises:
Step 1: building a supernet, in which every edge is expanded into eight parallel candidate operations, each with its own weight;
Step 2: updating the neural network architecture parameters by gradient descent on the validation-set loss;
Step 3: updating the operation weight parameters by gradient descent on the training-set loss;
Step 4: repeating Step 2 and Step 3 until training of the neural network architecture is completed;
Step 5: after training is completed, keeping, between each pair of nodes, the edge with the largest weight.
The technical solution adopted in the embodiments of the present application further includes: adding spatial attention and channel attention at the output position of each Cell comprises:
Step 1: applying channel attention to the output of the searched Cell structure to obtain the channel attention weight;
Step 2: applying spatial attention to the output of the searched Cell structure to obtain the spatial attention weight;
Step 3: multiplying the output by the obtained channel attention weight and spatial attention weight to obtain a new, attention-weighted output.
In another aspect, a neural network architecture automatic search device for traffic classification is provided, the device comprising:
a data conversion module, configured to obtain network traffic data samples, preprocess the data samples, and convert them into a traffic data set in the form of two-dimensional matrices;
a data division module, configured to divide the traffic data set into a training set, a validation set, and a test set;
a candidate operation construction module, configured to construct the candidate operations of the search space of the neural network architecture after the traffic data set is divided;
an optimal network architecture search module, configured to search for the optimal neural network architecture based on the candidate operations of the search space, including searching for an optimal Cell structure, repeatedly stacking the same Cell structure, and forming the searched Cell structures into the entire neural network architecture.
The technical solution adopted in the embodiments of the present application further includes: the device further comprises:
an attention adding module, configured to add spatial attention and channel attention at the output position of each Cell;
a retraining module, configured to save the searched Cell architecture parameters and retrain the Cell architecture using a new training set formed by merging the training set and the validation set.
Compared with the prior art, the beneficial effect of the embodiments of the present application is as follows. The neural network architecture automatic search method and device for traffic classification obtain network traffic data samples, preprocess the data samples, and convert them into a traffic data set in the form of two-dimensional matrices; divide the traffic data set into a training set, a validation set, and a test set; construct the candidate operations of the search space of the neural network architecture after the data set is divided; and search for the optimal neural network architecture based on the candidate operations of the search space, including searching for an optimal Cell structure, repeatedly stacking the same Cell structure, and forming the searched Cell structures into the entire neural network architecture. The present application can automatically search for an optimal neural network structure from the given candidate operations, without manual participation.
BRIEF DESCRIPTION OF THE DRAWINGS

The drawings described here are provided for a further understanding of the present application and constitute a part of the present application; the illustrative embodiments of the present application and their description are used to explain the present application and do not constitute an improper limitation of it. In the drawings:
FIG. 1 is a flowchart of the neural network architecture automatic search method for traffic classification according to an embodiment of the present application;
FIG. 2 is a preferred flowchart of the neural network architecture automatic search method for traffic classification according to an embodiment of the present application;
FIG. 3 is a block diagram of the neural network architecture automatic search device for traffic classification according to an embodiment of the present application;
FIG. 4 is a preferred block diagram of the neural network architecture automatic search device for traffic classification according to an embodiment of the present application;
FIG. 5 is another preferred block diagram of the neural network architecture automatic search device for traffic classification according to an embodiment of the present application;
FIG. 6 is a schematic diagram of adding attention after the searched Cell structure according to an embodiment of the present application;
FIG. 7 illustrates the computation process of channel attention according to an embodiment of the present application;
FIG. 8 illustrates the computation process of spatial attention according to an embodiment of the present application.
DETAILED DESCRIPTION

To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application, not to limit it.
It should be noted that the terms "first", "second", and the like in the description, claims, and drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the present application described here can be practiced in sequences other than those illustrated or described here. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Embodiment 1
Referring to FIG. 1 to FIG. 8, according to an embodiment of the present application, a neural network architecture automatic search method for traffic classification is provided, including the following steps:
S101: obtaining network traffic data samples, preprocessing the data samples, and converting them into a traffic data set in the form of two-dimensional matrices;
S102: dividing the traffic data set into a training set, a validation set, and a test set;
S103: after the traffic data set is divided, constructing the candidate operations of the search space of the neural network architecture;
S104: searching for the optimal neural network architecture based on the candidate operations of the search space, which includes searching for an optimal Cell structure, repeatedly stacking the same Cell structure, and forming the searched Cell structures into the entire neural network architecture.
The neural network architecture automatic search method and device for traffic classification in the embodiment of the present application obtain network traffic data samples, preprocess the data samples, and convert them into a traffic data set in the form of two-dimensional matrices; divide the traffic data set into a training set, a validation set, and a test set; construct the candidate operations of the search space of the neural network architecture after the data set is divided; and search for the optimal neural network architecture based on the candidate operations of the search space, including searching for an optimal Cell structure, repeatedly stacking the same Cell structure, and forming the searched Cell structures into the entire neural network architecture. The present application can automatically search for an optimal neural network structure from the given candidate operations, without manual participation.
In an embodiment, the method further includes:
S105: adding spatial attention and channel attention at the output position of each Cell;
S106: saving the searched Cell architecture parameters, and retraining the Cell architecture using a new training set formed by merging the training set and the validation set.
In the technology of the present application, the spatial attention and channel attention mechanisms are combined into each searched Cell structure, which enhances the feature extraction capability of the model.
In the technology of the present application, spatial attention and channel attention are added at the output position of each searched Cell structure to enhance the feature extraction capability of the model; after the Cell architecture parameters obtained from training with spatial attention and channel attention are saved, retraining continues until training is complete.
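By way of illustration only, the sketch below saves the searched architecture description and retrains it on the union of the training and validation sets, as described in S106. The genotype object, the file name, the builder function, and the training hyperparameters are placeholders, since the application only states that the searched Cell parameters are saved and the Cell is retrained on the merged set.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader

def retrain(build_model, genotype, train_set, val_set, epochs=50, lr=0.025, device="cpu"):
    """S106: save the searched Cell architecture, then retrain it on train + validation data."""
    torch.save(genotype, "searched_cell.pt")             # keep the searched architecture parameters
    model = build_model(genotype).to(device)             # rebuild the network from the saved Cell
    loader = DataLoader(ConcatDataset([train_set, val_set]), batch_size=64, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=3e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```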
In the embodiment, obtaining network traffic data samples, preprocessing the data samples, and converting them into a traffic data set in the form of two-dimensional matrices includes:
Step 1: extracting data at network-flow granularity;
Step 2: extracting data at packet granularity;
Step 3: concatenating the data extracted at network-flow granularity and the data extracted at packet granularity along the channel dimension to form a tensor;
Step 4: performing Steps 1 to 3 above on all network flows to construct a formatted traffic data set.
The preprocessing of the data samples into a traffic data set in the form of two-dimensional matrices is described in detail below with a specific embodiment:
Step 1: Extract data at network-flow granularity. The leading bytes of a network flow are extracted to form a two-dimensional matrix m1 (its dimensions are given by the equation image Figure PCTCN2021088293-appb-000001).
Step 2: Extract data at packet granularity. Bytes are extracted from the first η packets of the flow, taking the first α/η bytes of each packet, for a total of α bytes, to form a two-dimensional matrix m2 (its dimensions are given by the equation image Figure PCTCN2021088293-appb-000002).
Step 3: Concatenate m1 and m2 along the channel dimension to form a 32*32*3 tensor.
Step 4: Perform Steps 1 to 3 above on all network flows to construct the formatted traffic data set.
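A minimal sketch of this preprocessing in Python/NumPy is shown below. The byte budget ALPHA = 1024 (mapping each view to a 32x32 matrix), the packet count ETA = 16, the zero-padding policy, and the two-channel stacking are illustrative assumptions; the application itself specifies a 32*32*3 tensor whose exact byte counts and channel arrangement are defined by the equation images referenced above.

```python
import numpy as np

ALPHA = 1024                 # assumed byte budget per view: 1024 bytes -> one 32x32 matrix
ETA = 16                     # assumed number of leading packets for the packet-level view
SIDE = int(np.sqrt(ALPHA))   # 32

def to_square_matrix(data: bytes, n_bytes: int = ALPHA) -> np.ndarray:
    """Truncate or zero-pad a byte string to n_bytes and reshape it into a square matrix."""
    buf = np.zeros(n_bytes, dtype=np.uint8)
    chunk = np.frombuffer(data[:n_bytes], dtype=np.uint8)
    buf[:len(chunk)] = chunk
    return buf.reshape(SIDE, SIDE)

def flow_to_tensor(flow_bytes, packets):
    """Build the two-granularity representation of one network flow.

    m1 (flow-level view): the leading bytes of the whole flow.
    m2 (packet-level view): the first ALPHA // ETA bytes of each of the first ETA packets.
    The matrices are stacked along the channel dimension; how the application forms the
    third channel of its 32*32*3 tensor is not recoverable from the extracted text, so
    this sketch keeps two channels.
    """
    m1 = to_square_matrix(flow_bytes)
    per_packet = ALPHA // ETA
    m2 = to_square_matrix(b"".join(p[:per_packet] for p in packets[:ETA]))
    return np.stack([m1, m2], axis=-1).astype(np.float32) / 255.0   # shape (32, 32, channels)

# dataset = np.stack([flow_to_tensor(f, pkts) for f, pkts in flows])  # `flows` is hypothetical
```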
In the embodiment, dividing the traffic data set into a training set, a validation set, and a test set includes:
the training set, the validation set, and the test set contain each traffic category in the same proportion, wherein
the training set is used to update the network weight parameters of the structure;
the validation set is used to search for the best network architecture parameters;
the test set is used, after the search is completed, to evaluate the final result of the network retrained on the training set plus the validation set.
The traffic data set constructed above is divided into three parts: one part is used for training, one part for validation, and the remaining part for testing. Each split contains each traffic category in the same proportion. The training set is used to update the network weight parameters in the structure, and the validation set is used to search for the best network architecture parameters; after the search is completed, the training set combined with the validation set is used for retraining, and the final result is evaluated on the test set.
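A minimal sketch of such a stratified split is given below, assuming the formatted tensors X and labels y are already in memory; the 60/20/20 ratio and the use of scikit-learn's train_test_split with its stratify argument are illustrative choices rather than details taken from the application.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def stratified_split(X: np.ndarray, y: np.ndarray, val_frac=0.2, test_frac=0.2, seed=0):
    """Split the formatted traffic data set into train/validation/test while
    keeping the per-class proportions identical in every subset."""
    X_trainval, X_test, y_trainval, y_test = train_test_split(
        X, y, test_size=test_frac, stratify=y, random_state=seed)
    rel_val = val_frac / (1.0 - test_frac)      # rescale the validation share to the remainder
    X_train, X_val, y_train, y_val = train_test_split(
        X_trainval, y_trainval, test_size=rel_val, stratify=y_trainval, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```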
In the embodiment, constructing the candidate operations of the search space includes limiting the scope of the neural network architecture search space. The candidate operations of automatic neural architecture search are unlimited in principle, so the scope of the search space must be restricted to reduce complexity.
In the embodiment, the candidate operations of the neural network architecture search space are set as: 3x3 depthwise separable convolution, 5x5 depthwise separable convolution, 3x3 dilated (atrous) convolution, 5x5 dilated convolution, 3x3 max pooling, and 3x3 average pooling. In total, the candidate operations of the search space in this application comprise eight operations: 3x3 depthwise separable convolution, 5x5 depthwise separable convolution, 3x3 dilated convolution, 5x5 dilated convolution, 3x3 max pooling, 3x3 average pooling, identity, and direct connection.
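The sketch below expresses one plausible reading of this candidate set as PyTorch modules. The exact structure of each operation (for example, whether the separable and dilated convolutions include BatchNorm/ReLU, and how "identity" differs from "direct connection") is not specified in the text, so those details are assumptions in the spirit of DARTS-style search spaces.

```python
import torch.nn as nn

def sep_conv(c, k, stride=1):
    """Depthwise separable convolution: depthwise k x k conv followed by a 1x1 pointwise conv."""
    return nn.Sequential(
        nn.Conv2d(c, c, k, stride=stride, padding=k // 2, groups=c, bias=False),
        nn.Conv2d(c, c, 1, bias=False),
        nn.BatchNorm2d(c), nn.ReLU(inplace=True))

def dil_conv(c, k, stride=1, dilation=2):
    """Dilated (atrous) depthwise convolution followed by a 1x1 pointwise conv."""
    return nn.Sequential(
        nn.Conv2d(c, c, k, stride=stride, padding=dilation * (k // 2),
                  dilation=dilation, groups=c, bias=False),
        nn.Conv2d(c, c, 1, bias=False),
        nn.BatchNorm2d(c), nn.ReLU(inplace=True))

# Candidate operations of the search space (stride-1 variants; names are illustrative).
CANDIDATE_OPS = {
    "sep_conv_3x3": lambda c: sep_conv(c, 3),
    "sep_conv_5x5": lambda c: sep_conv(c, 5),
    "dil_conv_3x3": lambda c: dil_conv(c, 3),
    "dil_conv_5x5": lambda c: dil_conv(c, 5),
    "max_pool_3x3": lambda c: nn.MaxPool2d(3, stride=1, padding=1),
    "avg_pool_3x3": lambda c: nn.AvgPool2d(3, stride=1, padding=1),
    "identity": lambda c: nn.Identity(),
    # The application lists "identity" and "direct connection" as two separate operations;
    # their distinction is not explained in the extracted text, so both are modeled as identity here.
    "direct_connect": lambda c: nn.Identity(),
}
```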
In the embodiment, searching for the optimal neural network architecture based on the candidate operations of the search space includes:
Step 1: building a supernet, in which every edge is expanded into eight parallel candidate operations, each with its own weight;
Step 2: updating the neural network architecture parameters by gradient descent on the validation-set loss;
Step 3: updating the operation weight parameters by gradient descent on the training-set loss;
Step 4: repeating Step 2 and Step 3 until training of the neural network architecture is completed;
Step 5: after training is completed, keeping, between each pair of nodes, the edge with the largest weight.
This search method is a micro (cell-level) search: an optimal Cell structure is searched for, and the same Cell is then repeatedly stacked to form the entire network. Each Cell contains four nodes, and each node represents a feature map at a certain stage. The edges between nodes represent candidate operations, of which there is a fixed number, eight in this example. The final result of the search is a directed acyclic graph, i.e., the structure of one Cell, as shown in FIG. 5.
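To make the cell-level search concrete, the following sketch shows a discretized Cell in which each of the four intermediate nodes sums the outputs of two retained operations applied to earlier states, and a network that stacks several copies of the same Cell before a linear classifier. The genotype format, the channel handling, and the hypothetical GENOTYPE example are assumptions modeled on common cell-based NAS implementations, not details stated in the application; make_op can be built from the CANDIDATE_OPS sketch above.

```python
import torch
import torch.nn as nn

# Hypothetical discretized architecture: one entry per intermediate node, each entry
# holding two (operation name, input state index) pairs retained by the search.
GENOTYPE = [
    (("sep_conv_3x3", 0), ("max_pool_3x3", 0)),
    (("sep_conv_5x5", 0), ("identity", 1)),
    (("dil_conv_3x3", 1), ("avg_pool_3x3", 2)),
    (("sep_conv_3x3", 2), ("identity", 3)),
]

class Cell(nn.Module):
    """One searched Cell: a DAG whose four intermediate nodes are feature maps."""
    def __init__(self, genotype, channels, make_op):
        super().__init__()
        self.genotype = genotype
        self.ops = nn.ModuleList()
        for (op1, _), (op2, _) in genotype:
            self.ops.extend([make_op(op1, channels), make_op(op2, channels)])

    def forward(self, x):
        states = [x]                                 # state 0 is the cell input
        for i, ((_, j1), (_, j2)) in enumerate(self.genotype):
            states.append(self.ops[2 * i](states[j1]) + self.ops[2 * i + 1](states[j2]))
        return torch.cat(states[1:], dim=1)          # concatenate the four node outputs

class SearchedNet(nn.Module):
    """Stack copies of the same searched Cell, then classify with a linear layer
    (the Softmax is applied by the cross-entropy loss or at inference time)."""
    def __init__(self, genotype, make_op, in_ch=3, channels=16, n_cells=3, n_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, channels, 3, padding=1, bias=False)
        self.cells = nn.ModuleList()
        c = channels
        for _ in range(n_cells):
            self.cells.append(Cell(genotype, c, make_op))
            c *= 4                                   # each cell concatenates its four node outputs
        self.classifier = nn.Linear(c, n_classes)

    def forward(self, x):
        x = self.stem(x)
        for cell in self.cells:
            x = cell(x)
        return self.classifier(x.mean(dim=(2, 3)))   # global average pooling + linear head

# e.g. net = SearchedNet(GENOTYPE, make_op=lambda name, c: CANDIDATE_OPS[name](c))
# where CANDIDATE_OPS is the mapping from the sketch above.
```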
The process by which the present application searches for the optimal neural network architecture is described in detail below with a specific embodiment:
Step 1: Build the supernet using the supernet construction formula. Each edge of the supernet is expanded into the eight candidate operations, and each operation on an edge has its own weight. The supernet construction formula is as follows:
[Equation image: Figure PCTCN2021088293-appb-000003]
Here o(x) is one of the operations in the candidate operation set, i.e., one of the eight parallel edges; α denotes the weight of that operation on the edge, and (i, j) indicates that the edge lies between nodes i and j. The weighted combination of the eight operations between nodes i and j yields a feature map, and the formula applies the Softmax function to the edge weights.
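The supernet construction formula itself is only available here as an equation image. Based on the surrounding description (a Softmax over the per-edge operation weights and a weighted sum of the eight candidates on each edge), it is presumably the DARTS-style continuous relaxation below; this LaTeX reconstruction is an assumption rather than a verbatim copy of the application's equation.

```latex
\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}}
  \frac{\exp\bigl(\alpha_o^{(i,j)}\bigr)}{\sum_{o' \in \mathcal{O}} \exp\bigl(\alpha_{o'}^{(i,j)}\bigr)} \, o(x)
```

Here \mathcal{O} is the set of eight candidate operations, \alpha_o^{(i,j)} is the weight of operation o on the edge from node i to node j, and \bar{o}^{(i,j)}(x) is the mixed operation whose output is the feature map passed to node j.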
Step 2: Update the neural network architecture parameters α by gradient descent on the validation-set loss; the formula is as follows:
[Equation image: Figure PCTCN2021088293-appb-000007]
Step 3: Update the operation weight parameters w by gradient descent on the training-set loss; the formula is as follows:
[Equation image: Figure PCTCN2021088293-appb-000008]
Step 4: Repeat Step 2 and Step 3 until training of the neural network architecture is completed.
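A minimal sketch of the alternating updates of Steps 2 to 4 is given below. It assumes a supernet object that exposes its architecture parameters α and its operation weights w as separate parameter groups (the attribute names are hypothetical), and it uses the simple first-order scheme, one gradient step on the validation loss for α followed by one on the training loss for w, since the extracted text does not state whether a second-order approximation is used.

```python
import torch
import torch.nn.functional as F

def search_step(supernet, arch_opt, w_opt, train_batch, val_batch):
    """One round of the alternating optimization (first-order variant).

    Step 2: architecture parameters alpha are updated on the validation-set loss.
    Step 3: operation weight parameters w are updated on the training-set loss.
    """
    x_val, y_val = val_batch
    arch_opt.zero_grad()
    F.cross_entropy(supernet(x_val), y_val).backward()
    arch_opt.step()                     # alpha <- alpha - lr_alpha * dL_val/dalpha

    x_train, y_train = train_batch
    w_opt.zero_grad()
    F.cross_entropy(supernet(x_train), y_train).backward()
    w_opt.step()                        # w <- w - lr_w * dL_train/dw

# Hypothetical setup; the attribute names below are placeholders for however the supernet
# exposes its two parameter groups:
#   arch_opt = torch.optim.Adam(supernet.arch_parameters(), lr=3e-4)
#   w_opt    = torch.optim.SGD(supernet.weight_parameters(), lr=0.025, momentum=0.9)
# Step 4: call search_step(...) over mini-batches and epochs until the search converges.
```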
Step 5: After training is completed, for each pair of nodes (i, j), the edge with the largest weight is retained according to the node retention formula:
[Equation image: Figure PCTCN2021088293-appb-000010]
where each node retains only the connections to the two preceding nodes whose edges have the largest weights.
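The node retention formula is likewise only available as an image. Given the description (keep, for each pair of nodes, the operation with the largest weight, and keep only the two strongest incoming edges per node), it is presumably the standard argmax discretization; the LaTeX below is a reconstruction under that assumption.

```latex
o^{(i,j)} = \operatorname*{arg\,max}_{o \in \mathcal{O}} \; \alpha_o^{(i,j)}
```

After this step, each node j retains only the two incoming edges (i, j) whose retained weights are largest.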
In the embodiment, adding spatial attention and channel attention at the output position of each Cell includes:
Step 1: applying channel attention to the output of the searched Cell structure to obtain the channel attention weight;
Step 2: applying spatial attention to the output of the searched Cell structure to obtain the spatial attention weight;
Step 3: multiplying the output by the obtained channel attention weight and spatial attention weight to obtain a new, attention-weighted output.
The process of adding spatial attention and channel attention at the output position of each Cell is described in detail below with a specific embodiment, as shown in FIG. 6 to FIG. 8:
Step 1: Channel attention is added to the output of the searched Cell structure, as shown in FIG. 7. For an output x of dimension h*w*C, global average pooling and global max pooling are applied to obtain two 1*1*C tensors; a shared MLP layer then extracts further features from the two tensors, their outputs are added, and the Sigmoid function yields the channel attention weight M_c.
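A sketch of this channel attention in PyTorch is shown below; it follows the CBAM-style computation described above (global average pooling and global max pooling to two 1*1*C descriptors, a shared MLP, addition, Sigmoid). The reduction ratio of the shared MLP is an assumed hyperparameter not given in the text.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention weight M_c for a feature map x of shape (B, C, H, W)."""
    def __init__(self, channels, reduction=16):          # reduction ratio is an assumed hyperparameter
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)          # global average pooling -> (B, C, 1, 1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)          # global max pooling     -> (B, C, 1, 1)
        self.mlp = nn.Sequential(                        # shared MLP applied to both descriptors
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x)))
```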
Step 2: Spatial attention is added to the output of the searched Cell structure, as shown in FIG. 8. For x of dimension h*w*C, global average pooling and global max pooling are applied along the channel dimension to obtain two h*w*1 tensors; a convolution kernel then converts the two tensors into a single h*w*1 feature map, which is passed through the Sigmoid function to obtain the spatial attention weight M_s.
Step 3: The obtained channel attention weight M_c and spatial attention weight M_s are multiplied with the original output x according to the following formula, yielding the new, attention-weighted output x':
x' = M_s × (M_c × x)
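A matching sketch of the spatial attention and of the weighting x' = M_s × (M_c × x) is shown below. The 7x7 convolution kernel size is an assumption, and the multiplication is implemented as element-wise multiplication with broadcasting over the feature map; channel_att can be an instance of the ChannelAttention sketch above.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention weight M_s for a feature map x of shape (B, C, H, W)."""
    def __init__(self, kernel_size=7):                  # kernel size is an assumed hyperparameter
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_map = x.mean(dim=1, keepdim=True)            # (B, 1, H, W): average over channels
        max_map = x.max(dim=1, keepdim=True).values      # (B, 1, H, W): max over channels
        return self.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))

def attention_weight(x, channel_att, spatial_att):
    """x' = M_s * (M_c * x): both weights are computed from the Cell output x and
    applied by broadcast element-wise multiplication."""
    m_c = channel_att(x)        # (B, C, 1, 1), e.g. from the ChannelAttention sketch above
    m_s = spatial_att(x)        # (B, 1, H, W)
    return m_s * (m_c * x)
```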
实施例2Example 2
根据本申请的另一实施例,提供了一种用于流量分类的神经网络架构自动搜索装置,参见图3至图8,包括:According to another embodiment of the present application, a neural network architecture automatic search device for traffic classification is provided, referring to FIG. 3 to FIG. 8 , including:
Data conversion module: configured to acquire network traffic data samples, preprocess the data samples, and convert them into a traffic data set in the form of a two-dimensional matrix;
Data division module: configured to divide the traffic data set into a training set, a validation set, and a test set;
Optional operation construction module: configured to construct the optional operations of the search space of the neural network architecture after the traffic data set is divided;
Optimal network architecture search module: configured to search for the optimal neural network architecture based on the optional operations of the search space, which includes searching for the optimal Cell structure, repeating the search and stacking the same Cell structure, and forming the searched Cell structure into the entire neural network architecture.
In the neural network architecture automatic search method and device for traffic classification of the embodiments of the present application, the data conversion module acquires network traffic data samples, preprocesses the data samples, and converts them into a traffic data set in the form of a two-dimensional matrix; the data division module divides the traffic data set into a training set, a validation set, and a test set; the optional operation construction module constructs the optional operations of the search space of the neural network architecture after the traffic data set is divided; and the optimal network architecture search module searches for the optimal neural network architecture based on the optional operations of the search space, which includes searching for the optimal Cell structure, repeating the search and stacking the same Cell structure, and forming the searched Cell structure into the entire neural network architecture. The present application can automatically search for the optimal neural network structure from the given candidate operations without manual participation.
In an embodiment, the device further includes:
Attention adding module: configured to add spatial attention and channel attention at the output position of each Cell;
Retraining module: configured to save the searched Cell architecture parameters and retrain the Cell architecture using a new training set formed by merging the training set and the validation set.
In the technology of the present application, the spatial attention and channel attention mechanisms are combined into each searched Cell structure, which enhances the feature extraction capability of the model.
In the technology of the present application, spatial attention and channel attention are added at the output position of each searched Cell structure to enhance the feature extraction capability of the model; after the Cell architecture parameters obtained by training with spatial attention and channel attention are saved, retraining continues until training is complete.
The beneficial effects of the present application are as follows:
1. The neural network architecture automatic search method and device for traffic classification of the embodiments of the present application acquire network traffic data samples, preprocess the data samples, and convert them into a traffic data set in the form of a two-dimensional matrix; divide the traffic data set into a training set, a validation set, and a test set; construct the optional operations of the search space of the neural network architecture after the traffic data set is divided; and search for the optimal neural network architecture based on the optional operations of the search space, which includes searching for the optimal Cell structure, repeating the search and stacking the same Cell structure, and forming the searched Cell structure into the entire neural network architecture. The present application can automatically search for the optimal neural network structure from the given candidate operations without manual participation.
2. The present application extracts two-dimensional data matrices at two granularities, thereby forming a dual-channel input.
3. The present application adds spatial attention and channel attention to each Cell structure to enhance the feature extraction capability.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A neural network architecture automatic search method for traffic classification, characterized in that the method comprises the following steps:
    acquiring network traffic data samples, preprocessing the data samples, and converting them into a traffic data set in the form of a two-dimensional matrix;
    dividing the traffic data set into a training set, a validation set, and a test set;
    after the traffic data set is divided, constructing the optional operations of the search space of the neural network architecture;
    based on the optional operations of the search space, searching for the optimal neural network architecture, which includes searching for the optimal Cell structure, repeating the search and stacking the same Cell structure, and forming the searched Cell structure into the entire neural network architecture.
2. The neural network architecture automatic search method for traffic classification according to claim 1, characterized in that the method further comprises:
    adding spatial attention and channel attention at the output position of each Cell;
    saving the searched Cell architecture parameters, and retraining the Cell architecture using a new training set formed by merging the training set and the validation set.
3. The neural network architecture automatic search method for traffic classification according to claim 2, characterized in that acquiring network traffic data samples, preprocessing the data samples, and converting them into a traffic data set in the form of a two-dimensional matrix comprises:
    Step 1: extracting data at network flow granularity;
    Step 2: extracting data at data packet granularity;
    Step 3: concatenating the data extracted at network flow granularity and the data extracted at data packet granularity along the channel dimension to form a tensor;
    Step 4: performing the above Step 1 to Step 3 on all network flows to construct a formatted traffic data set.
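For illustration only, the following is a minimal sketch of the two-granularity matrix construction described in claim 3; the 28x28 matrix size, byte truncation, zero padding, and normalization are assumptions, since the claim does not fix these details.

```python
import numpy as np

MATRIX_SIDE = 28          # illustrative size; the claim does not fix the matrix shape
N_BYTES = MATRIX_SIDE * MATRIX_SIDE

def bytes_to_matrix(raw: bytes) -> np.ndarray:
    """Truncate/zero-pad a byte string and reshape it into a 2-D matrix in [0, 1]."""
    buf = np.frombuffer(raw[:N_BYTES], dtype=np.uint8)
    buf = np.pad(buf, (0, N_BYTES - len(buf)))
    return buf.reshape(MATRIX_SIDE, MATRIX_SIDE).astype(np.float32) / 255.0

def flow_to_tensor(flow_bytes: bytes, packet_bytes: bytes) -> np.ndarray:
    """Steps 1-3: one channel from flow-granularity bytes, one from
    packet-granularity bytes, concatenated along the channel dimension."""
    flow_ch = bytes_to_matrix(flow_bytes)      # network-flow granularity
    pkt_ch = bytes_to_matrix(packet_bytes)     # data-packet granularity
    return np.stack([flow_ch, pkt_ch], axis=0)  # dual-channel (2, h, w) tensor
```

Step 4 then amounts to applying `flow_to_tensor` to every network flow in the capture to build the formatted data set.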
4. The neural network architecture automatic search method for traffic classification according to claim 2, characterized in that dividing the traffic data set into a training set, a validation set, and a test set comprises:
    the training set, the validation set, and the test set contain the same proportion of traffic of each category; wherein,
    the training set is used to update the network weight parameters in the structure;
    the validation set is used to search for the optimal network structure parameters;
    the test set is used, after the search is completed, for retraining with the training set plus the validation set and for evaluating the final result on the test set.
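A minimal sketch of such a class-stratified split is given below, assuming scikit-learn is available; the 8:1:1 ratio and the random seed are illustrative assumptions not specified in the claim.

```python
from sklearn.model_selection import train_test_split

def split_dataset(X, y, val_ratio=0.1, test_ratio=0.1, seed=0):
    """Stratified split so each subset keeps the same class proportions."""
    X_trainval, X_test, y_trainval, y_test = train_test_split(
        X, y, test_size=test_ratio, stratify=y, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_trainval, y_trainval,
        test_size=val_ratio / (1.0 - test_ratio),
        stratify=y_trainval, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

# After the architecture search, the model would be retrained on train + validation
# and evaluated once on the held-out test set.
```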
5. The neural network architecture automatic search method for traffic classification according to claim 2, characterized in that constructing the optional operations of the search space comprises: limiting the scope of the neural network architecture search space.
6. The neural network architecture automatic search method for traffic classification according to claim 5, characterized in that the optional operations of the search space of the neural network architecture are set as: 3x3 depthwise separable convolution, 5x5 depthwise separable convolution, 3x3 dilated convolution, 5x5 dilated convolution, 3x3 max pooling, and 3x3 average pooling.
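As an illustration of the six candidate operations listed in claim 6, a minimal PyTorch sketch is given below; stride 1, "same" padding, and the BatchNorm/ReLU wrappers are assumptions, and any further operations beyond the six listed are omitted.

```python
import torch.nn as nn

def sep_conv(C, k):
    """Depthwise separable convolution: depthwise k*k conv + pointwise 1*1 conv."""
    return nn.Sequential(
        nn.Conv2d(C, C, k, padding=k // 2, groups=C, bias=False),
        nn.Conv2d(C, C, 1, bias=False),
        nn.BatchNorm2d(C), nn.ReLU(inplace=True))

def dil_conv(C, k, dilation=2):
    """Dilated (atrous) k*k convolution with 'same' padding."""
    return nn.Sequential(
        nn.Conv2d(C, C, k, padding=dilation * (k // 2), dilation=dilation, bias=False),
        nn.BatchNorm2d(C), nn.ReLU(inplace=True))

# the six candidate operations of the search space (channel count C assumed fixed)
CANDIDATE_OPS = {
    'sep_conv_3x3': lambda C: sep_conv(C, 3),
    'sep_conv_5x5': lambda C: sep_conv(C, 5),
    'dil_conv_3x3': lambda C: dil_conv(C, 3),
    'dil_conv_5x5': lambda C: dil_conv(C, 5),
    'max_pool_3x3': lambda C: nn.MaxPool2d(3, stride=1, padding=1),
    'avg_pool_3x3': lambda C: nn.AvgPool2d(3, stride=1, padding=1),
}
```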
7. The neural network architecture automatic search method for traffic classification according to claim 2, characterized in that searching for the optimal neural network architecture based on the optional operations of the search space comprises:
    Step 1: constructing a supernet, wherein the operations are expanded into eight edges in the supernet, and each edge has a different weight;
    Step 2: updating the neural network architecture parameters using gradient descent on the validation set loss;
    Step 3: updating the operation weight parameters using gradient descent on the training set loss;
    Step 4: repeating Step 2 and Step 3 until the training of the neural network architecture is completed;
    Step 5: after the training is completed, retaining the edge with the largest weight for each node.
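The alternating updates of Steps 2 to 4 can be sketched as follows, assuming a PyTorch supernet whose architecture parameters and operation weights are held by two separate optimizers over disjoint parameter groups; this uses a first-order approximation, and all names are illustrative rather than taken from the claim.

```python
import torch

def search_epoch(supernet, w_optimizer, alpha_optimizer,
                 train_loader, valid_loader, criterion, device):
    """One epoch of architecture search: architecture parameters are updated on
    the validation loss (Step 2), operation weights on the training loss (Step 3)."""
    supernet.train()
    for (x_tr, y_tr), (x_va, y_va) in zip(train_loader, valid_loader):
        x_tr, y_tr = x_tr.to(device), y_tr.to(device)
        x_va, y_va = x_va.to(device), y_va.to(device)

        # Step 2: gradient descent on the validation loss w.r.t. architecture parameters
        alpha_optimizer.zero_grad()
        criterion(supernet(x_va), y_va).backward()
        alpha_optimizer.step()

        # Step 3: gradient descent on the training loss w.r.t. operation weight parameters
        w_optimizer.zero_grad()
        criterion(supernet(x_tr), y_tr).backward()
        w_optimizer.step()

# Step 4 repeats search_epoch until training completes; Step 5 then retains, for each
# node, the edge whose architecture weight is the largest.
```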
8. The neural network architecture automatic search method for traffic classification according to claim 6, characterized in that adding spatial attention and channel attention at the output position of each Cell comprises:
    Step 1: appending channel attention to the output of the searched Cell structure and processing it to obtain the channel attention weight;
    Step 2: appending spatial attention to the output of the searched Cell structure and processing it to obtain the spatial attention weight;
    Step 3: performing matrix multiplication on the obtained channel attention weight and spatial attention weight to obtain the new attention-weighted output.
9. A neural network architecture automatic search device for traffic classification, characterized in that the device comprises:
    a data conversion module, configured to acquire network traffic data samples, preprocess the data samples, and convert them into a traffic data set in the form of a two-dimensional matrix;
    a data division module, configured to divide the traffic data set into a training set, a validation set, and a test set;
    an optional operation construction module, configured to construct the optional operations of the search space of the neural network architecture after the traffic data set is divided;
    an optimal network architecture search module, configured to search for the optimal neural network architecture based on the optional operations of the search space, which includes searching for the optimal Cell structure, repeating the search and stacking the same Cell structure, and forming the searched Cell structure into the entire neural network architecture.
10. The neural network architecture automatic search device for traffic classification according to claim 9, characterized in that the device further comprises:
    an attention adding module, configured to add spatial attention and channel attention at the output position of each Cell;
    a retraining module, configured to save the searched Cell architecture parameters and retrain the Cell architecture using a new training set formed by merging the training set and the validation set.
PCT/CN2021/088293 2021-04-20 2021-04-20 Neural network architecture automatic search method and device for traffic classification WO2022222020A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/088293 WO2022222020A1 (en) 2021-04-20 2021-04-20 Neural network architecture automatic search method and device for traffic classification

Publications (1)

Publication Number Publication Date
WO2022222020A1 true WO2022222020A1 (en) 2022-10-27

Family

ID=83723631

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/088293 WO2022222020A1 (en) 2021-04-20 2021-04-20 Neural network architecture automatic search method and device for traffic classification

Country Status (1)

Country Link
WO (1) WO2022222020A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200401899A1 (en) * 2019-06-20 2020-12-24 Google Llc Computationally efficient neural network architecture search
CN111818052A (en) * 2020-07-09 2020-10-23 国网山西省电力公司信息通信分公司 CNN-LSTM-based industrial control protocol homologous attack detection method
CN112053351A (en) * 2020-09-08 2020-12-08 哈尔滨工业大学(威海) Method for judging benign and malignant pulmonary nodules based on neural network architecture search and attention mechanism
CN112215269A (en) * 2020-09-27 2021-01-12 苏州浪潮智能科技有限公司 Model construction method and device for target detection and neural network architecture
CN112651406A (en) * 2020-12-18 2021-04-13 浙江大学 Depth perception and multi-mode automatic fusion RGB-D significance target detection method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115760777A (en) * 2022-11-21 2023-03-07 脉得智能科技(无锡)有限公司 Hashimoto's thyroiditis diagnostic system based on neural network structure search
CN115760777B (en) * 2022-11-21 2024-04-30 脉得智能科技(无锡)有限公司 Hashimoto thyroiditis diagnosis system based on neural network structure search

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21937271; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21937271; Country of ref document: EP; Kind code of ref document: A1)