CN116319583A - Encryption network traffic classification method based on GCNN and MoE - Google Patents
- Publication number
- CN116319583A (application number CN202310207576.4A)
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention belongs to the field of computer artificial intelligence, and in particular relates to an encrypted network traffic classification method based on GCNN and MoE, comprising the following steps: dividing the traffic data of a mobile application over a period of time into a plurality of traffic blocks of the same length; converting the traffic blocks into graph datasets with node features and edge weights; constructing an encrypted network traffic classification model for mobile applications based on a graph convolutional neural network (GCNN) and a mixture-of-experts (MoE) system, and training the model; and inputting the graph dataset of the data to be tested into the encrypted network traffic classification model to obtain a classification result. The invention achieves higher classification performance and solves the problems of low classification accuracy and poor performance in traditional machine learning methods and conventional neural network models such as CNNs and RNNs.
Description
Technical Field
The invention belongs to the field of computer artificial intelligence, and particularly relates to an encryption network traffic classification method based on GCNN and MoE.
Background
With the continuing development of Internet communication technology in recent years, the popularity of communication technologies including 5G has driven remarkable growth in intelligent and mobile devices. It is widely predicted that by 2023 the number of Internet of Things (IoT) devices, including smartphones, will reach tens of billions, and networks have become part of people's work and lives. In today's network management systems, network traffic classification is a critical task whose main objective is to predict the protocol and application type of network data flows.
In recent years, with rapidly growing requirements for protecting the privacy and security of transmitted data and users, more and more application protocols have begun to transmit data using encryption; the proportion of encrypted traffic in the network has risen sharply, and encryption technology has become increasingly complex. The classification of encrypted traffic has been one of the most important directions in network security since the advent of the Internet, but owing to the popularity of encryption and the high-speed growth of network throughput, fast and accurate classification of encrypted traffic is becoming increasingly difficult. On the other hand, encryption also increases the possibility of various kinds of malicious and abnormal network traffic, and hacking attacks use encryption to carry out a great deal of malicious activity. Therefore, when a large amount of encrypted traffic appears in the network, quickly classifying it and then performing refined traffic analysis is very important.
Most existing mobile application classification works attempt to overcome the challenges of encrypted traffic. For example, the AppScanner approach uses a flow-based detection method that extracts side-channel features from packet headers and computes statistical features to train a machine learning model for mobile application classification. Likewise, the FlowPrint method constructs a fingerprint of an application by considering the communication graph between a mobile device and other destinations (e.g., CDNs and third-party services) and related attributes (e.g., destination IP, destination port, and TLS certificate). In the inference phase, fingerprints collected in the past are compared with new fingerprints to determine the application. However, because building a communication graph of all possible behaviors of an application is challenging, only short communication windows are considered; thus, if the user changes usage behavior or uses a different function of the application, the method may not work properly.
Surveying the current state of deep-learning-based network traffic classification research, several challenges remain when classifying network traffic with deep learning methods:
1. Over 80% of mobile traffic is encrypted or uses Transport Layer Security (TLS), so traffic cannot be classified with payload-based methods that analyze particular fields of the application-layer protocol;
2. Port-based classification methods cannot classify mobile traffic, because applications mainly use HTTPS to transfer data and send data back and forth in text formats such as XML or JSON; some information needed for web-page-style classification (such as the number or size of files) is unavailable;
3. User behavior varies dynamically over time depending on the functions used; traffic captured within a short time (e.g., 5 minutes) of a mobile application's use may not represent its complete traffic behavior.
Disclosure of Invention
In order to solve the problems, the invention provides an encryption network traffic classification method based on GCNN and MoE, which specifically comprises the following steps:
S1, dividing the traffic data of a mobile application over a period of time into a plurality of traffic blocks of the same length;
S2, converting the traffic blocks into a graph dataset with node features and edge weights;
S3, constructing an encrypted network traffic classification model for mobile applications based on the graph convolutional neural network GCNN and a mixture-of-experts system, and training the model;
S4, inputting the graph dataset of the data to be tested into the encrypted network traffic classification model to obtain the classification result.
Further, when dividing the traffic data of a mobile application over a period of time into traffic blocks of the same length, a duration and an overlap time are set, and the traffic blocks are divided by the duration and the overlap time. Specifically: the length of each traffic block is set to the duration, and except for the first and last traffic blocks, each traffic block overlaps its previous traffic block by the overlap time and also overlaps its next traffic block by the overlap time.
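The duration-and-overlap division described above can be sketched as a sliding time window over captured packets; the helper below is a minimal illustration (the packet tuple format and the parameter names `t_duration`/`t_overlap` are assumptions, not the patent's implementation):

```python
def split_traffic_blocks(packets, t_duration, t_overlap):
    """Split a list of (timestamp, payload) packets into equal-length,
    overlapping traffic blocks. The window step is t_duration - t_overlap."""
    if not packets:
        return []
    step = t_duration - t_overlap
    start = min(ts for ts, _ in packets)
    end = max(ts for ts, _ in packets)
    blocks = []
    while start <= end:
        block = [p for p in packets if start <= p[0] < start + t_duration]
        if block:
            blocks.append(block)
        start += step
    return blocks

# toy trace: one packet per second for 10 seconds
trace = [(t, b"pkt") for t in range(10)]
blocks = split_traffic_blocks(trace, t_duration=4, t_overlap=2)
```

With a 4-second window and 2-second overlap over this 10-second toy trace, consecutive blocks share exactly two seconds of packets.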
Further, the process of converting traffic blocks into graph datasets with node features and edge weights comprises the following steps:
removing the DNS protocol traffic from the traffic block;
obtaining the IP addresses and port numbers in the mobile application and merging each IP address with its port number;
constructing the graph data of the mobile application: obtaining the maximum number of nodes N required for one MApp graph, and generating all graph data of each MApp according to the weights between node pairs;
storing all the graph data of each MApp in two CSV files: the node features in a features.csv file and the inter-node weights in a weights.csv file.
Further, the encrypted network traffic classification model for mobile applications based on the graph convolutional neural network GCNN and the mixture-of-experts system comprises several cascaded GCN layers, a SortPooling layer, an Expert network, and a softmax layer. The graph latent representations output by the four cascaded GCN layers are input to the SortPooling layer, which selects the largest K values of the graph latent representation; the Expert network comprises several Expert units, and the selected graph latent representation is input to each of the Expert units; the products of each Expert unit's output and its corresponding weight are accumulated and then input to the softmax layer, which produces the classification result.
Further, if there are L GCN layers, the output of the (l+1)-th GCN layer is expressed as:

Z^{l+1} = \sigma(\tilde{D}^{-1} \tilde{A} Z^{l} W^{l})

where Z^{l+1} \in R^{n \times c_{l+1}} is the output of the (l+1)-th graph convolutional layer; c_l is the number of features extracted per graph node at the l-th layer; n is the number of nodes; l = 0, ..., L-1; Z^{0} = X, with X \in R^{n \times c} the node feature matrix and c the number of node features; \tilde{D} is the diagonal degree matrix of the graph; \tilde{A} is the adjacency matrix with added self-loops; W^{l} \in R^{c_l \times c_{l+1}} is the trainable parameter of the l-th layer; and \sigma is a nonlinear activation function.
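A single propagation step of this form can be sketched in NumPy (tanh is used as the activation, matching the detailed description; the toy graph and feature values are illustrative):

```python
import numpy as np

def gcn_layer(A, Z, W):
    """One graph-convolution step: Z_next = tanh(D~^-1 A~ Z W),
    where A~ = A + I adds self-loops and D~ is the diagonal degree matrix."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)
    D_tilde_inv = np.diag(1.0 / A_tilde.sum(axis=1))
    return np.tanh(D_tilde_inv @ A_tilde @ Z @ W)

# toy graph: 4 nodes, c = 3 input features, c_{l+1} = 2 output features
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))   # node feature matrix
W = rng.normal(size=(3, 2))   # trainable layer parameters
Z1 = gcn_layer(A, X, W)       # latent representation, shape (4, 2)
```

Each output row mixes a node's own features with those of its neighbors, which is the propagation behavior the equation above describes.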
Further, a graph is denoted by G = (V, E), where V is the set of nodes in the graph data and E is the set of edges. If A is the adjacency matrix of the graph data, the adjacency matrix with added self-loops is expressed as:

\tilde{A} = A + I

where I is the identity matrix.
Further, the edge relationship between nodes is established through the cross-correlation between nodes: if the cross-correlation between two nodes is non-zero, an edge exists between them, and the edge weight is their cross-correlation. The calculation of the cross-correlation between two nodes comprises the following steps:

generating graph nodes from the traffic captured in a given time window, and dividing the given time window into T slices of equal duration;

for each slice, recording whether the mobile application sends at least one traffic packet to, or receives one from, the service deployed on each node's destination IP address and port, and accumulating these joint activities as the cross-correlation between two nodes, expressed as:

C_{i,j} = \sum_{t=1}^{T} r_i(t) r_j(t)

where C_{i,j} is the cross-correlation between node i and node j, and r_i(t) is a binary variable indicating whether node i is active in time slice t: r_i(t) = 1 when node i is active, otherwise r_i(t) = 0.
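With the activity indicators arranged as a binary node-by-slice matrix R (where R[i, t] = r_i(t)), the entire cross-correlation matrix is simply R Rᵀ — a small sketch with made-up activity data:

```python
import numpy as np

# r_i(t): rows are nodes, columns are T = 4 time slices (toy values)
R = np.array([[1, 0, 1, 1],   # node 0 active in slices 0, 2, 3
              [1, 1, 0, 1],   # node 1 active in slices 0, 1, 3
              [0, 1, 0, 0]])  # node 2 active in slice 1 only

# C[i, j] = sum over t of r_i(t) * r_j(t)
C = R @ R.T
```

Nodes 0 and 1 are jointly active in two slices, so C[0, 1] = 2 and an edge of weight 2 would be created between them; nodes 0 and 2 are never jointly active, so no edge is created.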
Further, the Adam algorithm is used to optimize the encrypted network traffic classification model for mobile applications based on the graph convolutional neural network GCNN and the mixture-of-experts system. The optimization process comprises the following steps:
obtaining historical data as a training dataset and slicing it, i.e., dividing the training dataset into several mutually independent and orthogonal sub-datasets, each sub-dataset being one slice;
during training, inputting each slice into the classification model to obtain a prediction result, and training according to the prediction result and the label corresponding to the training data;
optimization is complete when the loss converges or the maximum number of training iterations is reached.
Further, in the process of optimizing with training data, a logistic loss function is used for backpropagation to optimize the model, expressed as:

\ell_n(\Theta, W) = \frac{1}{n} \sum_{i=1}^{n} \psi(y_i \cdot F(x_i; \Theta, W))

where \ell_n(\Theta, W) is the logistic loss of the network; \Theta = (\theta_1, ..., \theta_M) \in R^{d \times M} parameterizes the gating network, \theta_m being the gating parameter vector of the m-th Expert unit, d the input dimension of an Expert network, and M the number of Expert units in the Expert network; n is the number of training samples; y_i is the label of the i-th training sample; F(x_i; \Theta, W) is the output of the Expert network, with x_i the i-th training sample and W the set of Expert weights; and \psi is the logistic loss term, expressed as \psi(z) = \log(1 + \exp(-z)).
Further, the output F (x i The method comprises the steps of carrying out a first treatment on the surface of the Θ, W) is expressed as:
wherein ,is the set of selected indices, pi m (x; Θ) is the gating value of the mth Expert network, h m (x; Θ) is the output of the mth Expert network, [ P ]]For a sliced dataset, [ M ]]Gating a set of networks for the expert; f (f) m (x; W) is the output of the mth Expert network, [ J ]]Representing the set of filters, σ (·) is the activation function, P represents the number of training data in a sliced dataset, w m,j Weight vector, x representing the jth filter in the mth Expert network (p) Representing the input data as the p-th data in the sliced data set; x represents input data.
The invention takes into account the limitations of a single graph convolutional neural network GCNN model for encrypted traffic classification. To better identify and classify encrypted traffic, a MoE expert network is added to the graph convolutional network GCN structure, splitting the single GCNN model into multiple Expert networks that train and predict simultaneously and then judge jointly. Through the combined classification judgment of GCNN and MoE, the accuracy of classifying and identifying encrypted network traffic is improved, higher classification performance is achieved, and the problems of low classification accuracy and poor performance of traditional machine learning methods and conventional neural network models such as CNNs and RNNs are solved.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a schematic diagram of the traffic block splitting of the present invention;
FIG. 3 is a model diagram of the encrypted traffic classification method based on DGCNN and MoE according to the present invention;
Fig. 4 is a structural diagram of the Expert network of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides an encryption network traffic classification method based on GCNN and MoE, which specifically comprises the following steps:
s1, dividing flow data of a mobile application program (Mobile Application, MApp) within a period of time into flow blocks with the same length;
s2, converting the flow block into a graph dataset with node characteristics and edge weights;
s3, constructing an encrypted network traffic classification model of a mobile application program based on a graph rolling neural network GCNN and a hybrid expert system, and training the model;
s4, inputting the graph dataset of the data to be tested into the encrypted network flow classification model to obtain a classification result.
In this embodiment, for the traffic data of a given MApp, one long-duration traffic block of the MApp is split into several shorter-duration traffic blocks of the same MApp using the duration T_duration and the overlap time T_overlap. As shown in FIG. 2, the split MApp traffic blocks have the same length T_duration, and each traffic block overlaps its previous block by T_overlap and its next block by T_overlap. If T_duration is short enough, no T_overlap needs to be set; whether T_duration is short enough is judged empirically by those skilled in the art. Dividing the traffic blocks specifically comprises the following steps:
step 1.1, acquiring original flow data by using a flow acquisition tool Wireshark and the like, wherein a sample data set is encrypted flow data of MApp in an original form, and processing the data;
step 1.2, firstly splitting the original MApp flow data block according to the duration T duration And overlap time T overlap Splitting a flow block of MApp with long duration into a plurality of small blocks with the same length and shorter duration;
step 1.3, the MApp flow block data set is stored as a csv format file after being split.
In this embodiment, each (IP:Port) pair obtained by combining an IP address with a port number is defined as one node in the graph data; that is, a node is a combination of an IP address and a port number. The optimal number of nodes N per graph is 20. If N is set higher, all graphs with fewer nodes must zero-pad their feature vectors (in the case of MLP) or latent representation vectors (in the case of the present invention), and zero-valued features may mislead the model's learning; if fewer nodes are used, useful information may be lost from the discarded nodes, hurting model performance. Most graphs have about 10 nodes; 90% of graphs have fewer than 35 nodes and 86% fewer than 30; as the number of nodes used grows, performance increases up to the optimum N and then decreases again. It is worth noting that the optimal N differs between the MLP and the present invention: the invention requires more information about the graph topology to distinguish mobile applications, and shows better performance than the MLP in various experimental scenarios.
Generating graph data used as a deep neural network model from the acquired flow data blocks of each MApp, specifically comprising the following steps:
Step 2.1, remove the DNS protocol traffic from the data block;
Step 2.2, obtain the IP addresses and port numbers in the MApp and merge each IP address with its port number (the same tuple (IP, port number) corresponds to the same network destination);
Step 2.3, using a traffic block of a given MApp as a data frame, generate all graph data of each MApp by constructing the maximum number of nodes N required for one MApp graph and the weights between node pairs; all graph data of each MApp are stored in two CSV files: a features.csv file containing the node features and a weights.csv file containing the inter-node weights.
Step 2.4, edges between nodes are established by cross-correlation: given the traffic captured in a time window that yields the graph nodes described above, the window is further divided into slices of a predefined duration t_slice. Let T be the number of slices. During each slice, a node (a pair of destination IP address and port number) is considered active if the MApp sends or receives at least one traffic packet to the service deployed on that destination IP address and port number. Let r_i(t) be a binary variable indicating whether node i is active in time slice t: r_i(t) = 1 when node i is active, otherwise r_i(t) = 0. Over the T slices, the cross-correlation of two nodes i and j is defined as:

C_{i,j} = \sum_{t=1}^{T} r_i(t) r_j(t)
by adopting cross correlation, establishing the relationship of edges between nodes, and correspondingly setting the weight of the edges, specifically: if C i,j Not equal to 0, an edge is established between two nodes i and j, and the weight is C i,j 。
To avoid feature bias when the graph data are fed into the neural network classification model for training and prediction, min-max scaling is used to normalize C_{i,j} into the range [0, 1]. Min-max normalization is mathematically defined as:

x' = \frac{x - \min}{\max - \min}

where x' is the normalized value of a single datum and x is its value before normalization; min is the minimum of the column in which the datum lies, and max is the maximum of that column.
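Applied column-wise to the edge weights, the normalization can be sketched as follows (the guard against a constant column is an added assumption to avoid division by zero):

```python
import numpy as np

def min_max_normalize(col):
    """Scale a 1-D array into [0, 1] via (x - min) / (max - min)."""
    lo, hi = col.min(), col.max()
    if hi == lo:                 # constant column: avoid division by zero
        return np.zeros_like(col, dtype=float)
    return (col - lo) / (hi - lo)

weights = np.array([2.0, 5.0, 8.0])   # toy column of edge weights
normed = min_max_normalize(weights)
```

The smallest weight maps to 0, the largest to 1, and intermediate values scale linearly.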
Each node of the graph data needs a feature vector, i.e., node features. Since mobile applications connect to various services, each represented by a node of the graph as a tuple of IP address and port number, the traffic behavior from the mobile device to each service's server may differ in various traffic characteristics, such as packet size, packet count, and flow duration. To cover both encrypted and unencrypted traffic, information is extracted only from packet headers, without analyzing packet payloads. In addition to packet features, flow features such as the number of flows, the average number of packets per flow, and the average flow size in bytes are extracted; this embodiment considers only TCP flows and UDP flows, relying on the Wireshark tool to collect and analyze flow features. The feature vectors of all nodes in one graph form the node feature matrix X \in R^{n \times c}, where n is the number of nodes and c is the number of features per node.
An encrypted traffic classification model based on the graph convolutional neural network GCNN and the mixture-of-experts MoE is constructed: statistical and derived features of the data are extracted by the GCNN model, and the data processed by the SortPooling layer are fed by the MoE model into different MoE sub-networks for training and testing. This specifically comprises the following steps:
Step 3.1, construct the GCNN-and-MoE encrypted traffic classification framework; a cross-entropy loss function is adopted to judge how close the actual output (a probability) is to the expected output (a probability), and the Adam optimizer is adopted. Cross-entropy is mathematically computed as:

H(y, \hat{y}) = -\sum_{i} y_i \log(\hat{y}_i)

where y_i is the expected (label) probability of class i and \hat{y}_i is the predicted probability of class i.
The learning rate during training is adjusted at fixed epoch intervals: after every step_size epochs, the learning rate is adjusted to lr = lr × decay, with step_size = 10, initial lr = 0.0001, and decay = 0.9;
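This schedule is an ordinary step decay — the same behavior as PyTorch's `StepLR(step_size=10, gamma=0.9)`; a dependency-free sketch:

```python
def stepped_lr(epoch, initial_lr=0.0001, step_size=10, decay=0.9):
    """Learning rate after `epoch` epochs under step decay:
    lr is multiplied by `decay` once every `step_size` epochs."""
    return initial_lr * decay ** (epoch // step_size)

# lr at epochs 0, 9, 10, and 25
lrs = [stepped_lr(e) for e in (0, 9, 10, 25)]
```

The rate stays at 0.0001 for the first ten epochs, drops to 0.00009 at epoch 10, and to 0.000081 after the second step.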
Step 3.2, referring to FIG. 3, the GCNN network model part comprises a GCN structure formed by stacking four graph convolutional layers; the first three graph convolutional layers have size 1024, the last graph convolutional layer has size 512, and the activation function is tanh;
in the laminate part of the drawing, a drawing is givenLet a be the adjacency matrix of G, so that a is a symmetrical binary matrix, and let the graph have no self-loops. Node feature matrix is defined as-> wherein />The computing node is potentially denoted +.> wherein />Is an adjacency matrix added with self-loops, +.>Is a diagonal matrix of the graph, such that +.>Is a trainable graph convolution parameter matrix shared among nodes, sigma is a nonlinear activation function, ++>Is the output activation matrix.
Intuitively, before being fed to the GCN layers shown in FIG. 3, the graph data has nodes defined by their features and edges connecting the nodes, so a node's latent representation is affected by its neighbors. The graph convolutional layer, through its node features (XW), allows information to propagate between neighboring nodes through the product of the node features and the adjacency matrix (\tilde{A}XW). The latent representation at each layer is defined as

Z^{l+1} = \sigma(\tilde{D}^{-1} \tilde{A} Z^{l} W^{l})

where l = 0, ..., L-1; Z^{0} := X; Z^{l+1} is the output of the (l+1)-th graph convolutional layer; c_l is the number of output channels of the l-th layer (i.e., the number of features extracted per graph node at layer l); and W^{l} \in R^{c_l \times c_{l+1}} is the trainable parameter of the l-th layer. After the convolutions of all the layers described above, latent representations of the nodes of the whole graph are obtained.
Step 3.3, after the graph convolutional layers have processed the data, the resulting output is fed to a SortPooling layer, and the nodes are sorted by the sum of their node features at the L-th layer (the last layer of the graph convolution process);
if two nodes have the same value at the L-th layer, the sum of the node features at layer L-1 is used; i.e., if the feature sums of two nodes at the current layer are equal, the nodes are ordered by the feature sums of their previous layer, until the tie is broken. Since the number of nodes varies across graphs, the pooling layer also truncates or extends the graph latent representation to a predefined size: given a predefined size k of the graph latent representation, truncation is performed if the graph latent representation vector contains more than k values; otherwise, zero padding is performed. The value of k is defined heuristically from the input data; for example, k is chosen such that 90% of the graph nodes are used to construct the graph latent representation vector, to avoid losing node features in the final graph latent representation.
Referring to FIG. 3, after the MoE network model is constructed and the data have passed through the SortPooling layer, the Expert sub-network training and testing is performed, specifically comprising the following steps:
step 4.1, moE layer, consisting of a group of M "experert networks" f 1 ,...,f M And gating networks, which are typically set to be linear. Definition f m (x; W) is the output of the mth Expert network, the output of the MoE layer may beThe definition is as follows:
wherein Is the set of selected indices, pi m (x; Θ) is the gating value of the mth Expert network, its value being determined by the following definition:
for the mth expert network, this embodiment considers it as a convolutional neural network CNN structure, defined as follows:
wherein ,is the weight vector of the jth filter (i.e. neuron) in the mth Expert, j is the number of filters (i.e. neurons), d is the dimension of an Expert network; x is x (p) Representing the input data as the p-th data in the sliced data set; x represents input data; [ P ]]For a sliced dataset, [ M ]]Gating a set of networks for the expert; [ J]Representing the set of filters, σ (·) is the activation function, and P represents the number of training data in one sliced dataset.
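A gated combination of such experts can be sketched in NumPy; the top-K selection with softmax over the selected experts is one common gating choice and is stated here as an assumption, as is the use of tanh for σ:

```python
import numpy as np

def moe_forward(x, theta, W, k=2):
    """F(x) = sum over the top-K experts of pi_m(x) * f_m(x), where each
    expert is f_m(x) = (1/J) * sum_j tanh(<w_{m,j}, x>)."""
    scores = theta.T @ x                      # one gating score per expert
    top = np.argsort(-scores)[:k]             # indices of the K selected experts
    gates = np.exp(scores[top] - scores[top].max())
    gates /= gates.sum()                      # softmax over the selected experts
    out = 0.0
    for g, m in zip(gates, top):
        f_m = np.tanh(W[m] @ x).mean()        # expert output, averaged over J filters
        out += g * f_m
    return out

rng = np.random.default_rng(1)
d, M, J = 6, 4, 3
theta = rng.normal(size=(d, M))   # gating parameters, one column per expert
W = rng.normal(size=(M, J, d))    # J filters of dimension d per expert
y = moe_forward(rng.normal(size=d), theta, W)
```

Because the gates form a convex combination and each tanh-based expert outputs a value in (-1, 1), the MoE output stays in that range as well.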
In this embodiment, a Router sits at the SortPooling layer output and assigns the expert weights; it adds noise as a perturbation and uses the gradient of the perturbed empirical loss to update the weights. Denoting the random noise terms by δ and δ′ and the initialization of the weight matrix by W^(0), the perturbed empirical loss at step t is written L̃^(t), and the Router's weight update rule is defined as:

W_m^(t+1) = W_m^(t) − η · ∇_{W_m} L̃^(t) / ‖∇_{W_m} L̃^(t)‖_F

where η > 0 is the expert learning rate, ‖·‖_F denotes the Frobenius norm, and ∇_{W_m} denotes the weight update gradient at the m-th expert network.
Letting W_m denote the weight matrix of the m-th expert, we further write W = {W_m}_{m∈[M]} for the set of expert weight matrices.
Step 4.2: train the constructed MoE network model by minimizing the empirical logistic loss of the network:

L(Θ, W) = (1/n) Σ_{i∈[n]} ℓ(y_i · F(x_i; Θ, W))

where ℓ(z) is the logistic loss function, defined as ℓ(z) = log(1 + exp(−z)), and Θ^(0) is initialized to 0.
Step 4.3: the constructed multi-expert network, combining the graph convolutional neural network GCNN with the MoE structure, is jointly trained and then performs inference on the data processed by the SortPooling layer, and the encrypted network traffic data is classified according to the output probabilities of the Softmax function.
In this embodiment, a schematic diagram of an Expert unit is also provided, as shown in fig. 4: the Expert unit consists of a cascade of a Conv1D layer, a MaxPool1D layer, a second Conv1D layer, a Dense layer, a Dropout layer, and a final Dense layer.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. The encryption network traffic classification method based on GCNN and MoE is characterized by comprising the following steps:
s1, dividing flow data of a mobile application program in a period of time into a plurality of flow blocks with the same length;
s2, converting the flow block into a graph dataset with node characteristics and edge weights;
s3, constructing an encrypted network traffic classification model of a mobile application program based on a graph rolling neural network GCNN and a hybrid expert system, and training the model;
s4, inputting the graph dataset of the data to be tested into the encrypted network flow classification model to obtain a classification result.
2. The method for classifying traffic of an encrypted network based on GCNN and MoE according to claim 1, wherein, when the traffic data of a mobile application program over a period of time is divided into traffic blocks of the same length, a duration and an overlap time are set and the traffic blocks are divided accordingly, specifically: the length of each flow block is set to the duration, and each flow block overlaps its previous flow block by the overlap time and overlaps its next flow block by the overlap time, except that the first flow block has no previous block and the last flow block has no next block.
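For illustration only — the 4-second duration and 2-second overlap below are made-up values — the sliding-window division of claim 2 could be sketched as:

```python
def split_blocks(packets, duration, overlap):
    """Split (timestamp, payload) pairs into fixed-length blocks where each
    block overlaps the previous one by `overlap` seconds (overlap < duration)."""
    if not packets:
        return []
    step = duration - overlap          # window advance between block starts
    t0 = packets[0][0]
    t_end = packets[-1][0]
    blocks = []
    start = t0
    while start <= t_end:
        block = [p for p in packets if start <= p[0] < start + duration]
        blocks.append(block)
        start += step
    return blocks

# Timestamps 0..9 s, 4 s blocks with 2 s overlap -> block starts at 0, 2, 4, ...
pkts = [(t, b"payload") for t in range(10)]
blocks = split_blocks(pkts, duration=4, overlap=2)
```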
3. The method of classifying traffic in an encrypted network based on GCNN and MoE as recited in claim 1, wherein the process of converting traffic blocks into graph datasets having node features and edge weights includes the steps of:
removing DNS-protocol traffic from the flow block;
acquiring the IP addresses and port numbers used by the mobile application program and concatenating each IP address with its port number;
in constructing the graph data of the mobile application program,
obtaining the maximum number of nodes N required by one MApp graph, and generating all graph data of each MApp according to the weights between pairs of nodes;
all the graph data of each MApp are stored in 2 CSV-format files, with the node features stored in the features file.
4. The encryption network traffic classification method based on GCNN and MoE according to claim 1, wherein the encrypted network traffic classification model of a mobile application program based on the graph convolutional neural network GCNN and the mixture-of-experts system comprises a plurality of cascaded GCN layers, a SortPooling layer, an Expert network and a softmax layer; the graph latent representations output by the four cascaded GCN layers are input to the SortPooling layer, which selects the largest K values of the graph latent representation; the Expert network comprises a plurality of Expert units, and the selected graph latent representation is input to each of the Expert units respectively; the products of each Expert unit's output and the unit's corresponding weight are accumulated and then input to the softmax layer, which produces the classification result.
5. The method of classifying traffic in an encrypted network based on GCNN and MoE as recited in claim 4, wherein, if there are L GCN layers, the output of the (l+1)-th GCN layer is expressed as:

Z^(l+1) = σ( D̃^(−1) Ã Z^(l) W^(l) )

where Z^(l+1) ∈ R^(n×c_{l+1}) is the output of the (l+1)-th graph convolution layer; c_l is the number of features of each graph node extracted at the l-th layer; n is the number of nodes; l = 0, ..., L−1; Z^(0) = X, where X ∈ R^(n×c) is the node feature matrix and c is the number of node features in the node feature matrix; D̃ is the diagonal degree matrix of the graph; Ã is the adjacency matrix with self-loops added; and W^(l) ∈ R^(c_l×c_{l+1}) is the trainable parameter matrix of the l-th layer.
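A minimal numerical sketch of this propagation rule (pure Python, with a hypothetical 3-node path graph, an identity weight matrix, and ReLU assumed as the nonlinearity):

```python
def matmul(A, B):
    # Plain list-of-lists matrix product.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def gcn_layer(A, X, W):
    """One GCN layer: Z = relu(D~^-1 A~ X W), where A~ = A + I and
    D~ is the diagonal degree matrix of A~."""
    n = len(A)
    A_tilde = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_tilde]
    norm = [[A_tilde[i][j] / deg[i] for j in range(n)] for i in range(n)]  # D~^-1 A~
    Z = matmul(matmul(norm, X), W)
    return [[max(0.0, z) for z in row] for row in Z]  # ReLU activation

# 3-node path graph, 1-dimensional node features, identity weight matrix.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0], [2.0], [3.0]]
W = [[1.0]]
Z1 = gcn_layer(A, X, W)  # each node takes the mean over itself and its neighbours
```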
6. The method of classifying traffic in an encrypted network based on GCNN and MoE as recited in claim 5, wherein G denotes graph data, expressed as G = (V, ε), where V is the node set in the graph data and ε is the edge set in the graph data; if A is the adjacency matrix of the graph data, the adjacency matrix with self-loops added is expressed as:

Ã = A + I

where I is the identity matrix.
7. A method for classifying traffic in an encrypted network based on GCNN and MoE according to claim 3, wherein the edge relationship between nodes is established through the cross-correlation between them: if the cross-correlation between two nodes is not 0, an edge exists between the two nodes, and the edge weight is that cross-correlation. The calculation of the cross-correlation between two nodes comprises the following steps:
generating graph nodes according to the traffic captured in a given time window, dividing the given time window into T slices with different durations;
counting, for each time slice, whether the mobile application sends a packet of traffic to or receives one from the service deployed on the destination IP address and port, and taking the count over all slices as the cross-correlation between the two nodes, expressed as:

C_{i,j} = Σ_{t∈[T]} r_i(t) · r_j(t)

where C_{i,j} is the cross-correlation between node i and node j, and r_i(t) is a binary variable indicating whether node i is active in time slice t: r_i(t) = 1 when node i is active, otherwise r_i(t) = 0.
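The slice-wise cross-correlation C_{i,j} = Σ_{t∈[T]} r_i(t)·r_j(t) can be checked with a small sketch (the activity patterns below are invented):

```python
def cross_correlation(r_i, r_j):
    """C_ij = number of time slices in which both nodes are active.

    r_i, r_j: binary activity sequences over the same T time slices."""
    return sum(a * b for a, b in zip(r_i, r_j))

# Two hypothetical nodes observed over T = 6 time slices.
r1 = [1, 0, 1, 1, 0, 1]
r2 = [1, 1, 0, 1, 0, 1]
w = cross_correlation(r1, r2)  # edge weight; an edge exists since w != 0
```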
8. The method for classifying encrypted network traffic based on GCNN and MoE according to claim 1, wherein the process of optimizing the classification model comprises the following steps:
acquiring historical data as a training data set and slicing it, namely dividing the training data set into a plurality of mutually independent and orthogonal sub-data sets, each sub-data set being one slice;
during network training, each slice is separately input into the classification model to obtain a prediction result, and training is performed according to the prediction result and the label corresponding to the training data;
and the optimization is complete when the loss converges or the maximum number of training iterations is reached.
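A minimal sketch of the shard split described in claim 8 (the round-robin assignment and the shard count are illustrative assumptions, not the claimed procedure):

```python
def make_shards(dataset, num_shards):
    """Partition a dataset into disjoint ('mutually independent') sub-data sets."""
    shards = [[] for _ in range(num_shards)]
    for idx, sample in enumerate(dataset):
        shards[idx % num_shards].append(sample)  # round-robin assignment
    return shards

data = list(range(10))
shards = make_shards(data, 3)  # 3 disjoint shards covering every sample once
```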
9. The method for classifying traffic in an encrypted network based on GCNN and MoE according to claim 8, wherein, in the process of optimizing with the training data, the model's logistic loss function is used for back-propagation to optimize the model, the logistic loss function being expressed as:

L(Θ, W) = (1/n) Σ_{i∈[n]} ℓ(y_i · F(x_i; Θ, W))

where L(Θ, W) is the logistic loss function of the network; Θ is the parameterization of the gating over the Expert units, denoted Θ = (θ_1, ..., θ_M) ∈ R^(d×M), where θ_m ∈ R^d corresponds to the m-th Expert unit, d is the dimension of an Expert network, and M is the number of Expert units in the Expert network; n is the number of training data; y_i is the label of the i-th training data; F(x_i; Θ, W) is the output of the Expert network for the i-th training datum x_i; W is the set of expert weight matrices; and ℓ(z) is the sigmoid-based logistic loss, expressed as ℓ(z) = log(1 + exp(−z)).
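The logistic loss ℓ(z) = log(1 + exp(−z)) and the averaged model loss of claim 9 can be evaluated numerically (the model outputs and labels below are invented):

```python
import math

def logistic_loss(z):
    # l(z) = log(1 + exp(-z)); small for confident correct predictions (large z).
    return math.log(1.0 + math.exp(-z))

def model_loss(outputs, labels):
    """(1/n) * sum_i l(y_i * F(x_i)), with labels y_i in {-1, +1}."""
    n = len(outputs)
    return sum(logistic_loss(y * f) for f, y in zip(outputs, labels)) / n

# Three invented model outputs F(x_i) with their labels.
loss = model_loss([2.0, -1.0, 0.0], [1, -1, 1])
```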
10. The method for classifying traffic in an encrypted network based on GCNN and MoE as recited in claim 8, wherein the output F (x i The method comprises the steps of carrying out a first treatment on the surface of the Θ, W) is expressed as:
wherein ,is the set of selected indices, pi m (x; Θ) is the gating value of the mth Expert network, h m (x; Θ) is the output of the mth Expert network, [ P ]]For a sliced dataset, [ M ]]Gating a set of networks for the expert; f (f) m (x; W) is the output of the mth Expert network, [ J ]]Representing the set of filters, σ (·) is the activation function, P represents the number of training data in a sliced dataset, w m,j Weight vector, x representing the jth filter in the mth Expert network (p) Representing the input data as the p-th data in the sliced data set; x represents input data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310207576.4A CN116319583A (en) | 2023-03-06 | 2023-03-06 | Encryption network traffic classification method based on GCNN and MoE |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116319583A true CN116319583A (en) | 2023-06-23 |
Family
ID=86812581
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117240615A * | 2023-11-13 | 2023-12-15 | 四川大学 | Migration learning network traffic correlation method based on time interval diagram watermark |
CN117240615B * | 2023-11-13 | 2024-01-30 | 四川大学 | Migration learning network traffic correlation method based on time interval diagram watermark |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||