WO2020119481A1

WO2020119481A1 - Network traffic classification method and system based on deep learning, and electronic device

Info

Publication number: WO2020119481A1
Application number: PCT/CN2019/122001
Authority: WO
Inventors: 赵世林; 叶可江; 须成忠
Original assignee: 深圳先进技术研究院
Priority date: 2018-12-11
Filing date: 2019-11-29
Publication date: 2020-06-18
Also published as: CN109639481B; CN109639481A

Abstract

The present application relates to a network traffic classification method and system based on deep learning, and an electronic device. The method comprises: step a: capturing network traffic sample data; step b: extracting a global feature data set of the network traffic sample data by means of a deep learning classification algorithm; and step c: constructing a random forest classification model according to the global feature data set, and outputting a network traffic classification result by means of the random forest classification model. In the present application, the random forest classification model is trained by utilizing extracted global features, the result shows stable classification performance, ultra-high-dimension traffic data can be processed, and feature selection is not necessary. Compared with the prior art, the present application can effectively guarantee high precision and high performance of network traffic classification; in addition, the classification efficiency can be improved, the training time can be shortened, and the computation overhead can be reduced.

Description

Network traffic classification method, system and electronic equipment based on deep learning

Technical field

This application belongs to the technical field of network traffic classification, and in particular relates to a network traffic classification method, system, and electronic device based on deep learning.

Background technique

With the rapid development of Internet technology, a large number of new applications continue to appear on the network, and each application carries a variety of services and functions, making the network environment extremely large, complex and changeable. For the normal operation of the network and the real-time allocation of services and resources, an effective method of monitoring network activity has become indispensable. Network traffic classification plays an important role in network management, resource allocation, on-demand services, and security systems. For example, by finely classifying and identifying network traffic, enterprise managers can manage network resources accurately, which contributes to the effective reuse of resources and the provision of personalized services and also helps enterprises avoid unnecessary network expenses. Therefore, how to accurately classify network traffic and improve network resource reuse and personalized services is a major challenge.

In the prior art, commonly used network traffic classification methods include the following:

1. Network traffic classification based on representation learning: the obtained network traffic data is preprocessed, a representation learning algorithm extracts features from the preprocessed data to generate a network flow vector, and the network traffic data is classified according to that network flow vector, which enables efficient classification of network traffic.

2. Network traffic classification method based on two-stage sequence feature learning: long short-term memory (LSTM) neural networks learn the sequence characteristics of network traffic in two stages, at the level of the data packet and at the level of the network flow. In the first stage, a sequence of packet vectors is generated from the flow byte sequence. In the second stage, a network flow vector is generated from the sequence of packet vectors. Finally, a classifier performs traffic classification on the network flow vector. This method fully considers the internal structure and organization of network traffic, makes effective use of the time-series feature learning ability of LSTM networks, and classifies only after obtaining more comprehensive traffic characteristics, which can achieve a more accurate network traffic classification.

3. A network traffic classification method based on hierarchical spatio-temporal feature learning: the spatial characteristics of the network traffic data are obtained through a first neural network; the temporal characteristics of the network traffic data are obtained through a second neural network; and the network traffic is classified according to the spatial and temporal characteristics. This method obtains more comprehensive and accurate traffic feature information, which can effectively improve network traffic classification; using a better traffic feature set can also effectively reduce the false alarm rate.

In summary, the existing network traffic classification methods are based on traditional machine learning technology. Their classification performance depends heavily on the design of traffic features, and constructing a feature set that accurately describes the traffic requires a lot of manual design, which remains a difficulty in network traffic classification. At the same time, most current methods propose optimizations and improvements for the classification algorithm module in the training phase, while the local characteristics contained in the raw network traffic data are rarely studied and mined, so classification performance is unstable.

Summary of the invention

The present application provides a method, system and electronic device for network traffic classification based on deep learning, aiming to solve at least to a certain extent one of the above technical problems in the prior art.

In order to solve the above problems, this application provides the following technical solutions:

A network traffic classification method based on deep learning, including the following steps:

Step a: Capture network traffic sample data;

Step b: Extract the global feature data set of the network traffic sample data through a deep learning classification algorithm;

Step c: Construct a random forest classification model according to the global feature data set, and output the network traffic classification result through the random forest classification model.

The technical solution adopted in the embodiment of the present application further includes: in step a, capturing network traffic sample data specifically includes: selecting a network data center and collecting all network data packets; and, at the same time, acquiring the system network logs generated by communication between network traffic within the time period corresponding to the network data packets.

The technical solution adopted in the embodiment of the present application further includes: step a further includes: detecting the network traffic sample data, preprocessing the network traffic sample data, filtering out incomplete network data packets in the network traffic sample data, and deleting retransmitted network data packets.

The technical solution adopted in the embodiment of the present application further includes: step a further includes: performing sample labeling on the preprocessed network traffic sample data to obtain a network flow data set. The sample labeling is specifically: analyzing the network traffic sample data to find the natural attributes of each application and the IP addresses and transmission protocols it uses when communicating with other applications; extracting from the system network log the IP endpoints and the number of transmitted packets associated with each application to determine the category of the network traffic sample data, and associating and merging this with the IP address and transmission protocol of each application to complete the marking of the network traffic sample data; and finally, using deep packet inspection technology to perform feature fingerprint matching on unknown traffic data to complete the marking of the unknown traffic data.

The technical solution adopted in the embodiment of the present application further includes: in step b, extracting the global feature data set of the network traffic sample data through the deep learning classification algorithm specifically includes:

Step b1: Input the network flow data set;

Step b2: Using the degree of correlation between the traffic data contained in the four layers of the TCP/IP protocol, extract in sequence the traffic data of the application layer, transport layer, network layer, and data link layer of each network packet;

Step b3: According to the importance of the data contained in the four layers of the TCP/IP protocol, sequentially divide and extract traffic data of different sizes for each layer in proportion;

Step b4: Combine the extracted traffic data into a one-dimensional sequence of M bytes, and convert the M bytes into N pixels;

Step b5: Convert the N pixels into a grayscale image of standard size to form a new grayscale image data set;

Step b6: Send the grayscale image data set to the input layer of the convolutional neural network model. After the size and number of the convolutional and pooling layers have been adaptively adjusted, perform the convolution operation in a loop to obtain a high-dimensional global feature data set.

Another technical solution adopted by the embodiment of the present application is: a network traffic classification system based on deep learning, including:

Data acquisition module: used to capture network traffic sample data;

Feature extraction module: used to extract the global feature data set of the network traffic sample data through a deep learning classification algorithm;

Classification model building module: used to build a random forest classification model according to the global feature data set;

Result output module: used to output network traffic classification results.

The technical solution adopted in the embodiments of the present application further includes: the data acquisition module capturing network traffic sample data specifically includes: selecting a network data center and collecting all network data packets; and, at the same time, acquiring the system network logs generated by communication between network traffic within the time period corresponding to the network data packets.

The technical solution adopted in the embodiment of the present application further includes a data preprocessing module, which is used to detect the network traffic sample data, preprocess the network traffic sample data, filter out incomplete network data packets in the network traffic sample data, and delete retransmitted network data packets.

The technical solution adopted in the embodiment of the present application further includes a data labeling module, which is used to perform sample labeling on the preprocessed network traffic sample data to obtain a network flow data set. The sample labeling is specifically: analyzing the network traffic sample data to find the natural attributes of each application and the IP addresses and transmission protocols it uses when communicating with other applications; extracting from the system network log the IP endpoints and the number of transmitted packets associated with each application to determine the category of the network traffic sample data, and associating and merging this with the IP address and transmission protocol of each application to complete the marking of the network traffic sample data; and finally, using deep packet inspection technology to perform feature fingerprint matching on unknown traffic data to complete the marking of the unknown traffic data.

The technical solution adopted in the embodiments of the present application further includes: the feature extraction module extracting the global feature data set of the network traffic sample data through a deep learning classification algorithm is specifically: inputting the network flow data set; using the degree of correlation between the traffic data contained in the four layers of the TCP/IP protocol, extracting in sequence the traffic data of the application layer, transport layer, network layer, and data link layer of each network packet; according to the importance of the data contained in the four layers of the TCP/IP protocol, sequentially dividing and extracting traffic data of different sizes for each layer in proportion; combining the extracted traffic data into a one-dimensional sequence of M bytes and converting the M bytes into N pixels; converting the N pixels into grayscale images of standard size to form a new grayscale image data set; and sending the grayscale image data set to the input layer of the convolutional neural network model, where, after the size and number of the convolutional and pooling layers have been adaptively adjusted, the convolution operation is performed in a loop to obtain a high-dimensional global feature data set.

Another technical solution adopted by the embodiment of the present application is: an electronic device, including:

At least one processor; and

A memory communicatively connected to the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the following operations of the deep learning-based network traffic classification method described above:

Step a: Capture network traffic sample data;

Compared with the prior art, the beneficial effects produced by the embodiments of the present application are as follows: the deep learning-based network traffic classification method, system and electronic device of the embodiments of the present application use the latent characteristics of the traffic data of each layer in the TCP/IP protocol for classification, which improves classification accuracy; at the same time, the data contained in each layer is mined in depth in proportion to its importance, which ensures high cohesion of the features of each layer. A random forest classification model is trained with the extracted global features; the results show stable classification performance, high-dimensional traffic data can be handled, and feature selection is not needed. Compared with the prior art, the present application can effectively guarantee high accuracy and high performance of network traffic classification, and at the same time it can improve classification efficiency, shorten training time, and reduce computation overhead.

Brief description of the drawings

FIG. 1 is a flowchart of a network traffic classification method based on deep learning according to an embodiment of the present application;

FIG. 2 is a flowchart of feature extraction by a deep learning classification algorithm according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a network traffic classification system based on deep learning according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a hardware device of a network traffic classification method based on deep learning provided by an embodiment of the present application.

Detailed description

In order to make the purpose, technical solutions and advantages of the present application more clear, the present application will be described in further detail in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, and are not used to limit the present application.

In view of the technical problems of the existing network traffic classification methods, the deep learning-based network traffic classification method of the embodiment of the present application uses deep learning hidden-feature extraction to accurately mine the large number of hidden traffic features in network traffic, so that the traffic feature set is used fully and efficiently during classification and the network traffic is classified and identified accurately.

Specifically, please refer to FIG. 1, which is a flowchart of a deep learning-based network traffic classification method according to an embodiment of the present application. The network traffic classification method based on deep learning in the embodiment of the present application includes the following steps:

Step 100: Capture sample data of network traffic;

In step 100, capturing network traffic sample data specifically includes: selecting a large network data center and using Wireshark software to collect all network data packets; and, at the same time, for the purpose of labeling the data, setting up high-performance network monitoring software for continuous capture, so as to obtain the system network log generated by communication between network traffic within the time period corresponding to the network data packets.

Step 200: Detect network traffic sample data, and preprocess the network traffic sample data;

In step 200, preprocessing the network traffic sample data specifically includes: first, incomplete network data packets, produced when a transmission is disconnected because of an unstable TCP (Transmission Control Protocol) three-way handshake, need to be filtered out. Second, network data packets that were retransmitted because acknowledgement packets were lost during the TCP connection need to be deleted.
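The two filtering rules of step 200 can be illustrated with a minimal sketch. The `Packet` record, its simplified flag strings, and the rule "a flow is complete only if SYN, SYN-ACK and ACK were all observed" are assumptions made for illustration; the application does not specify how incompleteness or retransmission is detected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical, simplified packet record (not from the application).
    flow_id: str    # e.g. the connection 5-tuple rendered as a string
    seq: int        # TCP sequence number
    flags: str      # simplified flags: "S", "SA", "A", "PA"
    payload: bytes


def preprocess(packets):
    """Drop flows without a complete handshake, then drop retransmissions."""
    # Collect the flag combinations seen in each flow.
    seen_flags = {}
    for p in packets:
        seen_flags.setdefault(p.flow_id, set()).add(p.flags)

    # Assumption: a flow is complete only if SYN, SYN-ACK and ACK all appear.
    complete = {
        flow for flow, flags in seen_flags.items()
        if {"S", "SA", "A"} <= flags
    }

    kept, seen_segments = [], set()
    for p in packets:
        if p.flow_id not in complete:
            continue  # incomplete flow: filter out
        key = (p.flow_id, p.seq, p.flags, len(p.payload))
        if key in seen_segments:
            continue  # same segment seen before: treat as a retransmission
        seen_segments.add(key)
        kept.append(p)
    return kept


packets = [
    Packet("f1", 0, "S", b""),
    Packet("f1", 0, "SA", b""),
    Packet("f1", 1, "A", b""),
    Packet("f1", 1, "PA", b"GET /"),
    Packet("f1", 1, "PA", b"GET /"),  # retransmitted segment
    Packet("f2", 0, "S", b""),        # handshake never completes
]
clean = preprocess(packets)
```

In this toy input, the duplicate segment of flow `f1` and the whole of flow `f2` are removed, leaving four packets.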

Step 300: Perform sample labeling processing on the preprocessed network traffic sample data to obtain a network flow data set;

In step 300, the sample labeling specifically includes: first, analyzing the network traffic sample data to find the natural attributes of each application and the key information it exchanges with other applications, including the IP address, transmission protocol, and so on; second, extracting from the system network log the IP endpoints and the number of transmitted packets associated with each application to determine the category of the network traffic sample data, and associating and merging this with the IP address and transmission protocol of each application to complete the marking of the network traffic sample data. Finally, DPI (Deep Packet Inspection) technology is used to perform feature fingerprint matching on unknown traffic data to complete the marking of the unknown traffic data.
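The labeling order described in step 300 (log-based association first, DPI fingerprint matching for whatever remains unknown) might be sketched as follows. The record fields, the `log_index` lookup table, and the byte signatures are all hypothetical; a real DPI engine uses far richer rules than a substring match.

```python
def label_flow(flow, log_index, fingerprints):
    """Return an application label for one flow record."""
    # 1) Try the system network log, keyed by (endpoint IP, protocol).
    key = (flow["dst_ip"], flow["protocol"])
    if key in log_index:
        return log_index[key]
    # 2) Fall back to DPI-style fingerprint matching on the payload.
    for app, signature in fingerprints.items():
        if signature in flow["payload"]:
            return app
    return "unknown"


# Hypothetical lookup tables, invented for this example.
log_index = {("10.0.0.5", "TCP"): "web"}
fingerprints = {"bittorrent": b"BitTorrent protocol"}

flows = [
    {"dst_ip": "10.0.0.5", "protocol": "TCP", "payload": b"GET / HTTP/1.1"},
    {"dst_ip": "10.0.0.9", "protocol": "TCP", "payload": b"\x13BitTorrent protocol"},
    {"dst_ip": "10.0.0.7", "protocol": "UDP", "payload": b"\x00\x01"},
]
labels = [label_flow(f, log_index, fingerprints) for f in flows]
```

The first flow is labeled from the log, the second by its fingerprint, and the third stays unknown.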

Step 400: Extract the global feature data set of the network flow data set through a deep learning classification algorithm;

In step 400, the embodiment of the present application uses the degree of association of each layer of protocol data in the traffic packets in the network traffic to re-extract and distribute the data set. Specifically, please refer to FIG. 2 together, which is a flowchart of extracting global feature data of the deep learning classification algorithm according to an embodiment of the present application, which specifically includes the following steps:

Step 401: Input the network flow data set;

Step 402: Using the degree of correlation between the traffic data contained in the four layers of the TCP/IP protocol, extract in sequence the traffic data of the application layer, transport layer, network layer, and data link layer of each network packet;

Step 403: According to the importance of the data contained in the four layers of the TCP/IP protocol, sequentially divide and extract traffic data of different sizes for each layer according to a certain ratio;

In step 403, the present application mines the data of each layer in depth, in proportion to the importance of the data that layer contains, which ensures high cohesion of the features of each layer.
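One way to read steps 402 and 403 is as a fixed byte budget split across the four layers by weight. The sketch below makes that reading concrete; the 5:2:2:1 ratios and the 784-byte budget are assumptions chosen for illustration, since the application only says "a certain ratio".

```python
LAYERS = ("application", "transport", "network", "link")


def allocate_bytes(total, ratios):
    """Split a byte budget across layers in proportion to `ratios`."""
    weight_sum = sum(ratios[layer] for layer in LAYERS)
    budget = {layer: total * ratios[layer] // weight_sum for layer in LAYERS}
    # Give any rounding remainder to the first layer in the list.
    budget[LAYERS[0]] += total - sum(budget.values())
    return budget


def extract(packet_layers, budget):
    """Take the first budget[layer] bytes of each layer, zero-padded, in order."""
    out = bytearray()
    for layer in LAYERS:
        chunk = packet_layers.get(layer, b"")[:budget[layer]]
        out += chunk.ljust(budget[layer], b"\x00")
    return bytes(out)


# Assumed importance weights; not taken from the application.
ratios = {"application": 5, "transport": 2, "network": 2, "link": 1}
budget = allocate_bytes(784, ratios)
vector = extract(
    {"application": b"GET / HTTP/1.1", "transport": b"\x00\x50"},
    budget,
)
```

Every packet then yields a vector of the same length, which is what the later conversion to a fixed-size image requires.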

Step 404: Combine the extracted traffic data into one-dimensional M bytes, and convert the M bytes into N pixels;

Step 405: Convert the N pixels into a gray image of standard size (X, X, 1) to form a new gray image data set;

Step 406: Send the grayscale image data set to the input layer of the convolutional neural network model. After the size and number of the convolutional and pooling layers have been adaptively adjusted, perform the convolution operation in a loop to obtain a high-dimensional global feature data set;

In step 406, the convolution operation of the convolutional neural network model is specifically as follows: first, a small number of convolution kernels is set in the convolutional layer close to the input layer, and as the number of training cycles increases, the number of convolution kernels set in the convolutional layers increases. The size Y*Y, number C, and sliding stride W of the convolution kernels can be trained automatically. To keep the original image size unchanged after the convolution operation, the embodiment of the present application selects a 3*3 convolution kernel with zero padding (padding value 0), and the size of the feature map is Feature_map = (width + 2*padding_size - filter_size)/stride + 1; the specific size can be set according to the actual application.
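The feature-map formula above can be evaluated directly. The small helper below is only a worked instance of that formula; the input width of 28 is an assumed example value. For a 3*3 kernel with stride 1, it shows that the output keeps the input width when one ring of zeros is added around the image (padding_size = 1), while padding_size = 0 shrinks the width by 2.

```python
def feature_map_size(width, filter_size, padding_size, stride):
    """Output width of a convolution, per the formula in the text."""
    return (width + 2 * padding_size - filter_size) // stride + 1


# Assumed example input width of 28 pixels.
same = feature_map_size(28, filter_size=3, padding_size=1, stride=1)
shrunk = feature_map_size(28, filter_size=3, padding_size=0, stride=1)
```

Here `same` is 28 and `shrunk` is 26.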

Step 407: Use downsampling to compress the images in the global feature data set, reducing the number of parameters without affecting image quality;

In step 407, the downsampling method is specifically: the pooling layer uses MaxPooling (maximum pooling) with a window size of 2*2 and a stride of 1, and each window is replaced by its maximum value, so the image size changes from Feature_map to (Feature_map - 2) + 1.
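A minimal pure-Python sketch of the max pooling described in step 407, using a 2*2 window and a stride of 1. The 4*4 input values are invented for illustration; a real model would apply this to every channel of the convolution output.

```python
def max_pool(image, size=2, stride=1):
    """Slide a size x size window over a 2-D list and keep each window's maximum."""
    rows = (len(image) - size) // stride + 1
    cols = (len(image[0]) - size) // stride + 1
    return [
        [
            max(
                image[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 5, 1],
    [7, 2, 9, 3],
    [0, 8, 4, 2],
]
pooled = max_pool(feature_map)
```

The 4*4 input becomes a 3*3 output, matching (Feature_map - 2) + 1 = 3 for Feature_map = 4.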

Step 408: Repeat steps 406 and 407 until a large number of local features have been extracted, and terminate the convolution operation once the set learning rate is satisfied;

Step 409: The local feature extraction result is input to the Flatten layer, and the Flatten layer outputs a one-dimensional global feature data set.
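The byte-to-image conversion of steps 404 and 405 and the flattening of step 409 can be sketched as below. The sketch assumes one byte per pixel (so M = N) and truncation or zero-padding to a square of side X; the application does not state how M bytes map to N pixels. For brevity, `flatten` is applied to the image itself rather than to the output of a trained convolutional network.

```python
def bytes_to_gray_image(data, side):
    """Map bytes to a side x side x 1 grayscale image, one byte per pixel."""
    n = side * side
    # Truncate to n bytes, or pad with zeros when the input is shorter.
    pixels = list(data[:n]) + [0] * max(0, n - len(data))
    return [[[pixels[r * side + c]] for c in range(side)] for r in range(side)]


def flatten(nested):
    """Flatten nested lists of any depth into one flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


image = bytes_to_gray_image(bytes(range(16)), side=4)  # shape (4, 4, 1)
vector = flatten(image)                                # one-dimensional, 16 values
```

Each byte value (0 to 255) serves directly as a gray level, so no rescaling is needed before the image is fed to the input layer.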

Step 500: Perform classification training on the extracted global feature data set, construct a random forest classification model, and output a network traffic classification result through the random forest classification model.

In step 500, the present application first uses a convolutional neural network to extract a global feature data set, and then uses the extracted global feature data set to train a random forest classification model. During training, the model can detect the interactions between features, which effectively guarantees high accuracy and high performance of network traffic classification.

In this application, a supervised random forest algorithm is used for modeling. From the results given by each decision tree in the forest, not only can the category of known traffic be judged, but the classification of unknown traffic can also be determined by voting. The test results show that the random forest classification model of the embodiment of the present application has very high classification accuracy, and at the same time it can improve classification efficiency, shorten training time, and reduce computation overhead.
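The voting step mentioned above can be shown in isolation. This sketch covers only majority voting over per-tree predictions; it does not train trees or perform the bootstrap sampling of a full random forest. The three "trees" are hypothetical stand-in functions with made-up thresholds.

```python
from collections import Counter


def forest_predict(trees, features):
    """Return the majority label and the share of trees that voted for it."""
    votes = Counter(tree(features) for tree in trees)
    label, count = votes.most_common(1)[0]
    return label, count / len(trees)


# Hypothetical stand-ins for trained decision trees: each maps a feature
# vector to a class label.
trees = [
    lambda x: "video" if x[0] > 0.5 else "web",
    lambda x: "video" if x[1] > 0.2 else "web",
    lambda x: "web",
]
label, share = forest_predict(trees, [0.9, 0.4])
```

Two of the three trees vote "video", so that label wins with a vote share of two thirds.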

Please refer to FIG. 3, which is a schematic structural diagram of a network traffic classification system based on deep learning according to an embodiment of the present application. The network traffic classification system based on deep learning in the embodiment of the present application includes a data acquisition module, a data preprocessing module, a data labeling module, a feature extraction module, a classification model construction module, and a result output module.

Data acquisition module: used to capture network traffic sample data. Capturing network traffic sample data specifically includes: selecting a large network data center and using Wireshark software to collect all network data packets; and, at the same time, for the purpose of labeling the data, setting up high-performance network monitoring software for continuous capture, so as to obtain the system network log generated by communication between network traffic within the time period corresponding to the network data packets.

Data pre-processing module: used to detect the network traffic sample data and pre-process it. The pre-processing specifically includes: first, incomplete network data packets, produced when a transmission is disconnected because of an unstable TCP (Transmission Control Protocol) three-way handshake, need to be filtered out. Second, network data packets that were retransmitted because acknowledgement packets were lost during the TCP connection need to be deleted.

Data labeling module: used to perform sample labeling on the pre-processed network traffic sample data to obtain a network flow data set. The sample labeling specifically includes: first, analyzing the network traffic sample data to find the natural attributes of each application and the key information it exchanges with other applications, including IP addresses, transmission protocols, and so on; second, extracting from the system network log the IP endpoints and the number of transmitted packets associated with each application to determine the category of the network traffic sample data, and associating and merging this with the IP address and transmission protocol of each application to complete the marking of the network traffic sample data; finally, using DPI (Deep Packet Inspection) technology to perform feature fingerprint matching on unknown traffic data to complete the marking of the unknown traffic data.

Feature extraction module: used to extract the global feature data set of the network flow data set through a deep learning classification algorithm; the embodiments of the present application use the degree of association of each layer of protocol data in the traffic packets in the network traffic to re-extract and distribute the data set. Specifically, the global feature data set extraction method includes:

1. Input the network flow data set;

2. Using the degree of correlation between the traffic data contained in the four layers of the TCP/IP protocol, extract in sequence the traffic data of the application layer, transport layer, network layer, and data link layer of each network packet;

3. According to the importance of the data contained in the four layers of the TCP/IP protocol, the traffic data of different sizes of each layer is sequentially divided and extracted according to a certain ratio;

4. Combine the extracted traffic data into a one-dimensional sequence of M bytes, and convert the M bytes into N pixels;

5. Convert N pixels to grayscale images of standard size (X, X, 1) to form a new grayscale image data set;

6. Send the grayscale image data set to the input layer of the convolutional neural network model. After the size and number of the convolutional and pooling layers have been adaptively adjusted, perform the convolution operation in a loop to obtain a high-dimensional global feature data set. Specifically: first, a small number of convolution kernels is set in the convolutional layer close to the input layer, and as the number of training cycles increases, the number of convolution kernels set in the convolutional layers increases. The size Y*Y, number C, and sliding stride W of the convolution kernels can be trained automatically. To keep the original image size unchanged after the convolution operation, the embodiment of the present application selects a 3*3 convolution kernel with zero padding (padding value 0), and the size of the feature map is Feature_map = (width + 2*padding_size - filter_size)/stride + 1; the specific size can be set according to the actual application.

7. Use downsampling to compress the images in the global feature data set, reducing the number of parameters without affecting image quality. The downsampling method is specifically: the pooling layer uses MaxPooling (maximum pooling) with a window size of 2*2 and a stride of 1, and each window is replaced by its maximum value, so the image size changes from Feature_map to (Feature_map - 2) + 1.

8. Repeat the convolution operation and the downsampling operation until a large number of local features are extracted and the convolution operation is terminated after the set learning rate is satisfied;

9. The local feature extraction result is input to the Flatten (flattening) layer, and the Flatten layer outputs a one-dimensional global feature dataset.

Classification model building module: used to perform classification training on the extracted global feature data set to build a random forest classification model. The present application first uses a convolutional neural network to extract the global feature data set, and then uses the extracted global feature data set to train the random forest classification model. During training, the model can detect the interactions between features, which effectively guarantees high accuracy and high performance of network traffic classification.

Result output module: used to output network traffic classification results.
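The module structure above amounts to a chain in which each module consumes the output of the previous one. A minimal sketch of that wiring follows; the stage bodies are trivial placeholders invented for illustration and do not implement the actual modules.

```python
def run_pipeline(stages, data=None):
    """Run the named stages in order, feeding each stage's output to the next."""
    trace = []
    for name, stage in stages:
        data = stage(data)
        trace.append(name)
    return data, trace


# Placeholder stage bodies (hypothetical); only the ordering mirrors the text.
stages = [
    ("data acquisition", lambda _: [b"\x01\x02", b"", b"\x03"]),
    ("data preprocessing", lambda pkts: [p for p in pkts if p]),
    ("data labeling", lambda pkts: [(p, "web") for p in pkts]),
    ("feature extraction", lambda rows: [(list(p), y) for p, y in rows]),
    ("classification model", lambda rows: [y for _, y in rows]),
]
result, trace = run_pipeline(stages)
```

Keeping each module behind a single call of this shape lets any one of them be replaced without touching the others.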

FIG. 4 is a schematic structural diagram of a hardware device of a network traffic classification method based on deep learning provided by an embodiment of the present application. As shown in FIG. 4, the device includes one or more processors and a memory. Taking one processor as an example, the device may further include an input system and an output system.

The processor, memory, input system, and output system may be connected through a bus or in other ways. In FIG. 4, connection through a bus is used as an example.

As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs, non-transitory computer executable programs, and modules. The processor runs non-transitory software programs, instructions, and modules stored in the memory to execute various functional applications and data processing of the electronic device, that is, to implement the processing methods of the foregoing method embodiments.

The memory may include a storage program area and a storage data area, where the storage program area may store an operating system and application programs required by at least one function; the storage data area may store data, and the like. In addition, the memory may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices. In some embodiments, the memory optionally includes memories remotely located with respect to the processor, and these remote memories may be connected to the processing system via a network. Examples of the above network include but are not limited to the Internet, intranet, local area network, mobile communication network, and combinations thereof.

The input system can receive input digital or character information, and generate signal input. The output system may include display devices such as display screens.

The one or more modules are stored in the memory, and when executed by the one or more processors, perform the following operations of any of the foregoing method embodiments:

Step a: Capture network traffic sample data;

The above-mentioned products can execute the method provided in the embodiments of the present application, and have function modules and beneficial effects corresponding to the execution method. For technical details that are not described in detail in this embodiment, refer to the method provided in the embodiments of the present application.

An embodiment of the present application provides a non-transitory (non-volatile) computer storage medium that stores computer-executable instructions, and the computer-executable instructions can perform the following operations:

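Step a: Capture network traffic sample data;

Step b: Extract the global feature data set of the network traffic sample data through a deep learning classification algorithm;

Step c: Construct a random forest classification model according to the global feature data set, and output the network traffic classification result through the random forest classification model.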

An embodiment of the present application provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions that, when executed by a computer, cause the computer to perform the following operations:

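Step a: Capture network traffic sample data;

Step b: Extract the global feature data set of the network traffic sample data through a deep learning classification algorithm;

Step c: Construct a random forest classification model according to the global feature data set, and output the network traffic classification result through the random forest classification model.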

The deep learning-based network traffic classification method, system, and electronic device of the embodiments of the present application classify traffic using the latent features of the traffic data in each layer of the TCP/IP protocol, which improves classification accuracy. At the same time, the data of each layer is mined in depth in proportion to its importance, which guarantees high cohesion of the features of each layer. The random forest classification model is trained on the extracted global features; the results show stable classification performance, the model can handle high-dimensional traffic data, and no feature selection is needed. Compared with the prior art, the present application can effectively guarantee high accuracy and high performance of network traffic classification, and at the same time can improve classification efficiency, shorten training time, and reduce computation overhead.
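
The proportional, per-layer extraction described above can be illustrated with a minimal Python sketch. The weights in `LAYER_WEIGHTS`, the function names, and the zero-padding of short layers are assumptions made for illustration only; the application does not state concrete ratios or a padding rule.

```python
from typing import Dict

# Hypothetical importance weights per TCP/IP layer (not given in the source).
# Order follows the extraction order: application, transport, network, data link.
LAYER_WEIGHTS = {"application": 4, "transport": 2, "network": 1, "data_link": 1}


def layer_budgets(total_bytes: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split a byte budget across layers in proportion to their weights."""
    weight_sum = sum(weights.values())
    budgets = {name: total_bytes * w // weight_sum for name, w in weights.items()}
    # Bytes lost to integer rounding go to the first listed layer.
    first = next(iter(weights))
    budgets[first] += total_bytes - sum(budgets.values())
    return budgets


def extract_bytes(layers: Dict[str, bytes], budgets: Dict[str, int]) -> bytes:
    """Take each layer's budgeted prefix; short or missing layers are zero-padded."""
    out = bytearray()
    for name, size in budgets.items():
        chunk = layers.get(name, b"")[:size]
        out += chunk.ljust(size, b"\x00")
    return bytes(out)
```

With a 784-byte budget, for example, these assumed weights give the application layer half of the bytes and each of the two lowest layers one eighth.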

The above is only the preferred embodiment of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

A network traffic classification method based on deep learning, which is characterized by the following steps:

Step a: Capture network traffic sample data;

Step b: Extract the global feature data set of the network traffic sample data through a deep learning classification algorithm;

Step c: Construct a random forest classification model according to the global feature data set, and output the network traffic classification result through the random forest classification model.
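
Step c can be illustrated with a deliberately simplified random forest in plain Python. This is a sketch under stated assumptions, not the classifier of the application: each tree is a depth-1 decision stump rather than a full decision tree, the feature vectors are small hand-made lists standing in for the global feature data set, and all names (`fit_stump`, `fit_forest`, `predict`) are hypothetical. It keeps the two ingredients that define a random forest: bootstrap sampling of the training set and a random feature subset per tree, combined by majority vote.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# (feature index, threshold, label if value <= threshold, label otherwise)
Stump = Tuple[int, float, str, str]


def majority(labels: Sequence[str]) -> str:
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature: int) -> Tuple[int, Stump]:
    """Best single-threshold split on one feature; returns (errors, stump)."""
    default = majority(y)
    best = (sum(lab != default for lab in y), (feature, float("inf"), default, default))
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [lab for row, lab in zip(X, y) if row[feature] <= thr]
        right = [lab for row, lab in zip(X, y) if row[feature] > thr]
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
        if errors < best[0]:
            best = (errors, (feature, thr, l_lab, r_lab))
    return best


def fit_forest(X, y, n_trees: int = 15, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max(1, int(d ** 0.5))  # size of the random feature subset per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        feats = rng.sample(range(d), k)
        candidates = [fit_stump(Xb, yb, f) for f in feats]
        forest.append(min(candidates, key=lambda c: c[0])[1])
    return forest


def predict(forest: List[Stump], row: Sequence[float]) -> str:
    """Majority vote over all stumps."""
    return majority([l if row[f] <= thr else r for f, thr, l, r in forest])
```

A practical system would more likely rely on a full-depth implementation from an established library; the stumps here only keep the example short.
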
The network traffic classification method based on deep learning according to claim 1, wherein in step a, capturing the network traffic sample data specifically includes: selecting a network data center and collecting all network data packets; and, at the same time, obtaining the system network logs generated by communication between network traffic within the time period corresponding to the network data packets.
The network traffic classification method based on deep learning according to claim 2, wherein step a further comprises: inspecting the network traffic sample data and preprocessing it, filtering out incomplete network data packets in the network traffic sample data, and deleting retransmitted network data packets.
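
The preprocessing in this claim can be sketched in Python as follows. The `Packet` record and both tests are assumptions for illustration: a packet is treated as incomplete when fewer bytes were captured than were on the wire, and as a retransmission when a data-bearing segment repeats the same flow, sequence number, and payload length. The application does not specify how either condition is detected.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Packet:
    # Hypothetical, simplified packet record.
    src: str
    dst: str
    sport: int
    dport: int
    seq: int           # TCP sequence number
    payload_len: int   # bytes of payload carried by the segment
    captured_len: int  # bytes actually captured
    wire_len: int      # bytes the packet had on the wire


def preprocess(packets: Iterable[Packet]) -> List[Packet]:
    """Drop truncated packets and repeated data segments, keeping order."""
    seen = set()
    kept = []
    for p in packets:
        if p.captured_len < p.wire_len:
            continue  # incomplete capture
        key = (p.src, p.dst, p.sport, p.dport, p.seq, p.payload_len)
        # Pure ACKs legitimately repeat a sequence number, so only segments
        # that carry payload are considered retransmission candidates.
        if p.payload_len > 0 and key in seen:
            continue
        seen.add(key)
        kept.append(p)
    return kept
```
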
The network traffic classification method based on deep learning according to claim 3, characterized in that step a further comprises: performing sample labeling on the preprocessed network traffic sample data to obtain a network flow data set; the sample labeling specifically includes: analyzing the network traffic sample data to find the natural attributes of each application and the IP addresses and transport protocols it uses when communicating with other applications; extracting, from the system network log, the number of IP endpoints and transmitted packets associated with each application, determining the category of the network traffic sample data, and associating and merging the two using the IP address and transport protocol of each application to complete the labeling of the network traffic sample data; and finally, using deep packet inspection technology to perform fingerprint matching on unknown traffic data to complete the labeling of the unknown traffic data.
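
A minimal Python sketch of this two-stage labeling is given below. The table `LOG_LABELS`, the list `FINGERPRINTS`, and the function `label_sample` are hypothetical: the first stands in for the (IP address, transport protocol) associations derived from the system network log, and the second for a deep packet inspection signature set, reduced here to payload prefixes. Neither is specified in the application.

```python
from typing import Dict, List, Optional, Tuple

Endpoint = Tuple[str, str]  # (IP address, transport protocol)

# Hypothetical labels obtained by merging log entries with packet analysis.
LOG_LABELS: Dict[Endpoint, str] = {
    ("10.0.0.5", "TCP"): "video",
    ("10.0.0.9", "UDP"): "dns",
}

# Hypothetical payload-prefix fingerprints used as a DPI fallback.
FINGERPRINTS: List[Tuple[bytes, str]] = [
    (b"\x16\x03", "tls"),  # TLS handshake record header
    (b"GET ", "http"),     # HTTP request line
    (b"SSH-", "ssh"),      # SSH identification string
]


def label_sample(ip: str, proto: str, payload: bytes) -> Optional[str]:
    """Label by log-derived endpoint first, then by payload fingerprint."""
    label = LOG_LABELS.get((ip, proto))
    if label is not None:
        return label
    for prefix, name in FINGERPRINTS:
        if payload.startswith(prefix):
            return name
    return None  # still unknown; left unlabeled in this sketch
```
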
The network traffic classification method based on deep learning according to claim 4, characterized in that, in step b, extracting the global feature data set of the network traffic sample data through the deep learning classification algorithm specifically includes:

Step b1: Input the network flow data set;

Step b2: Using the correlation between the traffic data contained in the four layers of the TCP/IP protocol, extract in sequence the traffic data of the application layer, transport layer, network layer, and data link layer of each network data packet;

Step b3: According to the importance of the data contained in the four layers of the TCP/IP protocol, divide and extract traffic data of a different size from each layer in turn, in proportion to that importance;

Step b4: Assemble the extracted traffic data into a one-dimensional sequence of M bytes, and convert the M bytes into N pixels;

Step b5: Convert the N pixels into a grayscale image of standard size to form a new grayscale image data set;

Step b6: Send the grayscale image data set to the input layer of the convolutional neural network model and, while continuously and adaptively adjusting the size and number of the convolutional layers and pooling layers, perform the convolution operation repeatedly to obtain a high-dimensional global feature data set.
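
Steps b4 to b6 can be illustrated with a short Python sketch. Several details are assumptions, since the claim does not fix them: one byte is mapped to one pixel (so N equals M after padding or truncation), the standard image size is taken to be 28 x 28, and a single convolution followed by max pooling stands in for the full convolutional neural network. The function names are hypothetical.

```python
from typing import List, Sequence

Image = List[List[int]]


def bytes_to_gray_image(data: bytes, side: int = 28) -> Image:
    """Map bytes to 0-255 gray pixels; truncate or zero-pad to side*side."""
    n = side * side
    pixels = list(data[:n]) + [0] * max(0, n - len(data))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


def conv2d_valid(img: Sequence[Sequence[float]],
                 kernel: Sequence[Sequence[float]]) -> List[List[float]]:
    """'Valid' 2-D convolution as used in CNNs (cross-correlation, no flip)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def max_pool(fmap: Sequence[Sequence[float]], size: int = 2) -> List[List[float]]:
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]
```

In a trained network the kernel values are learned and many such layers are stacked; here the kernel is simply supplied by the caller.
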
A network traffic classification system based on deep learning, which is characterized by:

Data acquisition module: used to capture network traffic sample data;

Feature extraction module: used to extract the global feature data set of the network traffic sample data through a deep learning classification algorithm;

Classification model building module: used to build a random forest classification model according to the global feature data set;

Result output module: used to output network traffic classification results.
The network traffic classification system based on deep learning according to claim 6, wherein the capturing of network traffic sample data by the data acquisition module specifically includes: selecting a network data center and collecting all network data packets; and, at the same time, acquiring the system network logs generated by communication between network traffic within the time period corresponding to the network data packets.
The network traffic classification system based on deep learning according to claim 7, further comprising a data preprocessing module, wherein the data preprocessing module is configured to inspect the network traffic sample data and preprocess it, filtering out incomplete network data packets in the network traffic sample data and deleting retransmitted network data packets.
The network traffic classification system based on deep learning according to claim 8, further comprising a data labeling module, wherein the data labeling module is used to perform sample labeling on the preprocessed network traffic sample data to obtain a network flow data set; the sample labeling specifically includes: analyzing the network traffic sample data to find the natural attributes of each application and the IP addresses and transport protocols it uses when communicating with other applications; extracting, from the system network log, the number of IP endpoints and transmitted packets associated with each application, determining the category of the network traffic sample data, and associating and merging the two using the IP address and transport protocol of each application to complete the labeling of the network traffic sample data; and finally, using deep packet inspection technology to perform fingerprint matching on unknown traffic data to complete the labeling of the unknown traffic data.
The network traffic classification system based on deep learning according to claim 9, wherein the extraction of the global feature data set of the network traffic sample data by the feature extraction module through the deep learning classification algorithm specifically includes: inputting the network flow data set; using the correlation between the traffic data contained in the four layers of the TCP/IP protocol, extracting in sequence the traffic data of the application layer, transport layer, network layer, and data link layer of each network data packet; according to the importance of the data contained in the four layers of the TCP/IP protocol, dividing and extracting traffic data of a different size from each layer in turn, in proportion to that importance; assembling the extracted traffic data into a one-dimensional sequence of M bytes, and converting the M bytes into N pixels; converting the N pixels into a grayscale image of standard size to form a new grayscale image data set; and sending the grayscale image data set to the input layer of the convolutional neural network model and, while continuously and adaptively adjusting the size and number of the convolutional layers and pooling layers, performing the convolution operation repeatedly to obtain a high-dimensional global feature data set.
An electronic device, including:

At least one processor; and

A memory communicatively connected to the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can perform the following operations of the deep learning-based network traffic classification method:

Step a: Capture network traffic sample data;

Step b: Extract the global feature data set of the network traffic sample data through a deep learning classification algorithm;

Step c: Construct a random forest classification model according to the global feature data set, and output the network traffic classification result through the random forest classification model.