CN116451138A - Encryption traffic classification method, device and storage medium based on multi-modal learning

Info

Publication number: CN116451138A
Application number: CN202310475221.3A
Authority: CN
Inventors: 金彦亮; 陈彦韬; 高塬
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2023-04-27
Filing date: 2023-04-27
Publication date: 2023-07-18

Abstract

The invention relates to an encrypted traffic classification method, device and storage medium based on multi-modal learning, in the technical field of computer network management. The method comprises the following steps: collecting target encrypted traffic data and obtaining effective sessions through traffic division and data cleaning; extracting heterogeneous information from each effective session and constructing a multi-modal data set consisting of a content matrix and a time sequence matrix; extracting content modal features from the data packet level to the session level in turn using a hierarchical attention network; extracting time sequence modal features of different granularities using a time sequence circulation network; fusing the content modal features and the time sequence modal features with a multi-modal fusion network and extracting high-level multi-modal features with a high-speed (highway) network; and, based on the high-level multi-modal features, outputting traffic classification probabilities through an output layer, thereby classifying the encrypted traffic. Compared with the prior art, the invention fully considers the hierarchical structure and time sequence correlation of encrypted traffic and improves the classification accuracy of encrypted traffic data.

Description

Encryption traffic classification method, device and storage medium based on multi-modal learning

Technical Field

The present invention relates to the field of computer network management technologies, and in particular, to an encrypted traffic classification method, device and storage medium based on multi-modal learning.

Background

In recent years, with the spread and rapid development of the Internet, people generate large amounts of network traffic in their daily lives. Although the Internet has greatly improved every aspect of daily life, from clothing and food to housing and transportation, the accompanying problem of privacy leakage has drawn increasing attention. For this reason, encryption technology has gradually been applied to Internet communication, which solves network security problems to some extent. However, it also inevitably causes the volume of encrypted traffic on the Internet to grow year by year. Classifying encrypted traffic has therefore become a challenge to be solved in both academia and industry. On the one hand, accurate classification of encrypted traffic helps network operators allocate bandwidth reasonably according to traffic type and effectively guarantee quality of service; on the other hand, it helps detect malicious traffic disguised by encryption and strengthens network security defenses.

In the early days of the Internet, the mainstream traffic classification methods were port number matching and deep packet inspection. With the popularity of dynamic ports and encryption, these two approaches have become increasingly inapplicable. As machine learning rose, more and more researchers adopted statistics-based machine learning algorithms to classify encrypted traffic. Although statistical properties are not masked by encryption and such methods have succeeded in some studies, they rely heavily on expert knowledge to design effective features, which makes them time-consuming, laborious and complex.

In view of this, deep learning has become popular among researchers in encrypted traffic classification in recent years because it is end-to-end and learns features automatically. Many studies now apply deep learning to encrypted traffic classification, but existing methods still leave room for improvement in two respects: (1) Encrypted traffic information is used incompletely: because payload information is obfuscated by encryption, its distribution characteristics are masked to some extent; most existing methods only stack deep networks to extract payload features and do not make full use of the plaintext statistical information, so their ability to characterize encrypted traffic is weak. (2) The heterogeneity of encrypted traffic is ignored: encrypted traffic has a hierarchical structure in its content and temporal correlation in its time sequence, and conventional methods do not exploit these characteristics to design suitable feature extraction networks, so high-precision classification is difficult to achieve.

Disclosure of Invention

The invention aims to provide an encrypted traffic classification method, device and storage medium based on multi-modal learning, which make full use of the heterogeneity of traffic by constructing a payload modality and a statistical information modality, so as to achieve high-precision performance on encrypted traffic classification tasks of different granularities.

The aim of the invention can be achieved by the following technical scheme:

An encrypted traffic classification method based on multi-modal learning comprises the following steps:

S1, collecting target encrypted traffic data, and obtaining effective sessions through traffic division and data cleaning;

S2, extracting heterogeneous information from each effective session, and constructing a multi-modal data set consisting of a content matrix and a time sequence matrix;

S3, extracting content modal features from the data packet level to the session level in turn by using a hierarchical attention network based on the multi-modal data set;

S4, extracting time sequence modal features of different granularities by using a time sequence circulation network based on the multi-modal data set;

S5, fusing the content modal features and the time sequence modal features based on a multi-modal fusion network, and extracting high-level multi-modal features by using a high-speed (highway) network;

S6, based on the high-level multi-modal features, outputting the traffic classification probabilities through the output layer, thereby classifying the encrypted traffic.

Further, the step S1 includes the steps of:

S11, capturing the target encrypted traffic on a designated network interface with Wireshark to obtain raw traffic files in PCAP format;

S12, dividing each PCAP file into bidirectional sessions according to the five-tuple, using a hash data structure;

S13, considering that a certain number of plaintext domain name resolution sessions exist in network communication, cleaning the data and filtering out these sessions to avoid biasing the classification results;

S14, deleting the ACK packets and retransmitted packets in TCP sessions, and removing the data link layer protocol header from all packets to obtain clean effective sessions.

Further, the five-tuple structure is expressed as < source IP address, destination IP address, source port, destination port, transport layer protocol >, wherein the source and destination directions are interchangeable.
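
As a minimal sketch of the bidirectional session split in S12, the Python below groups packets under a direction-independent five-tuple key. The packet dictionaries, their field names and the helpers `session_key` and `split_sessions` are illustrative assumptions rather than part of the disclosed method; sorting the two endpoints is one simple way to make the source and destination interchangeable.

```python
from collections import defaultdict


def session_key(src_ip, dst_ip, src_port, dst_port, proto):
    """Return a five-tuple key that is the same for both directions."""
    # Sorting the two (ip, port) endpoints maps A->B and B->A to one key.
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (low, high, proto)


def split_sessions(packets):
    """Group packet records into bidirectional sessions using a dict (hash table)."""
    sessions = defaultdict(list)
    for pkt in packets:
        key = session_key(pkt["src_ip"], pkt["dst_ip"],
                          pkt["src_port"], pkt["dst_port"], pkt["proto"])
        sessions[key].append(pkt)
    return dict(sessions)


# Hypothetical packet records; a real pipeline would parse them from a PCAP file.
packets = [
    {"src_ip": "10.0.0.2", "dst_ip": "93.184.216.34",
     "src_port": 51000, "dst_port": 443, "proto": "TCP"},
    {"src_ip": "93.184.216.34", "dst_ip": "10.0.0.2",
     "src_port": 443, "dst_port": 51000, "proto": "TCP"},
    {"src_ip": "10.0.0.2", "dst_ip": "8.8.8.8",
     "src_port": 53000, "dst_port": 53, "proto": "UDP"},
]
sessions = split_sessions(packets)
```

Here the first two packets fall into the same session because they are the two directions of one TCP connection, while the UDP packet forms a session of its own.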

Further, the step S2 includes the steps of:

S21, for each effective session, selecting the first N data packets in order to represent the whole session;

S22, parsing the byte content of each data packet, extracting the transport layer payload byte sequence, and processing it to a fixed length M, the size of the Ethernet MTU (maximum transmission unit), so as to obtain a content matrix of N rows and M columns for each effective session;

S23, parsing the protocol headers of each data packet and extracting four statistical information sequences: arrival time interval, packet direction, packet length and TCP window size; the four sequences are concatenated along the feature dimension to obtain a time sequence matrix of N rows and 4 columns for each effective session, wherein the arrival time interval is obtained by subtracting the arrival times of adjacent packets in order, the packet direction is determined by analyzing IP addresses to infer which endpoint is the server and which is the client, and the packet length is the length of the transport layer payload sequence;

S24, normalizing each element in the content matrix and the time sequence matrix, and dividing the matrices into a training set and a test set according to a preconfigured ratio, the two sets being used to train and test the hierarchical attention network, the time sequence circulation network and the multi-modal fusion network.

Further, the hierarchical attention network comprises a distributed one-dimensional convolution module, a distributed attention mechanism module and a bidirectional GRU module oriented to the data packet sequence, which are connected in sequence, and the content modal feature extraction based on the hierarchical attention network specifically comprises the following steps:

S31, iterating over each row of the content matrix in turn with the distributed one-dimensional convolution module, and extracting the content features of each packet payload at the data packet level;

S32, assigning attention weights to the content feature sequences output by S31 with the distributed attention mechanism, computing a weighted sum over the sequence points of each content feature sequence, and thereby converting each sequence into a condensed content feature vector;

S33, stacking the content feature vectors obtained in S32 into a session-level content feature sequence, modeling the session-level content features with the bidirectional GRU module, and outputting the content modal features.

Further, the time sequence circulation network comprises a temporal convolutional network (TCN) module and a bidirectional GRU module connected in sequence, and the time sequence modal feature extraction based on the time sequence circulation network specifically comprises the following steps:

S41, inputting the time sequence matrix into the TCN module, and extracting short-term features through multi-layer dilated (expansion) causal convolution while constructing identity mappings;

S42, capturing long-term features in both the forward and backward directions with the bidirectional GRU module, and outputting the time sequence modal features.

Further, in the multi-modal fusion network in S5, since the content modal features and the time sequence modal features are both sequences, they are concatenated along the feature dimension at each sequence point in a point-to-point fusion manner, and a high-speed (highway) network adaptively extracts the multi-modal representation to obtain the high-level multi-modal features.

Further, the step S6 specifically includes: flattening the high-level multi-modal features with a Flatten operation, inputting them into a fully connected output layer, and mapping them to classification probabilities with a softmax function, thereby obtaining the classification label of the encrypted traffic and classifying the encrypted traffic.

An encrypted traffic classification device based on multi-modal learning comprises a memory, a processor and a program stored in the memory, wherein the processor realizes the method when executing the program.

A storage medium having stored thereon a program which when executed performs a method as described above.

Compared with the prior art, the invention has the following beneficial effects:

1. The invention provides a new heterogeneous representation of encrypted traffic, which uses the payload information and the statistical information of a session to construct a content matrix and a time sequence matrix, thereby enriching the input characterization of the encrypted traffic.

2. The invention constructs a hierarchical attention network based on the hierarchical structure of traffic, and fully mines the content differences of encrypted traffic by extracting content features at both the data packet and session granularities.

3. The invention constructs a time sequence circulation network based on the time sequence correlation of traffic, and introduces models with two memory lengths to optimize time sequence feature extraction, thereby effectively modeling the time sequence features of encrypted traffic.

4. The invention provides a multi-modal model built by parallel integration, and improves the classification performance on encrypted traffic by combining feature extraction from the content modality and the time sequence modality.

Drawings

FIG. 1 is a flow diagram of an encryption traffic classification method based on multi-modal learning;

FIG. 2 is a flow chart of efficient session acquisition and heterogeneous information extraction for encrypted traffic;

FIG. 3 is a diagram of the encrypted traffic classification model architecture of the present invention.

Detailed Description

The invention will now be described in detail with reference to the drawings and specific examples. The present embodiment is implemented on the premise of the technical scheme of the present invention, and a detailed implementation manner and a specific operation process are given, but the protection scope of the present invention is not limited to the following examples.

As shown in FIG. 1, this embodiment discloses an encrypted traffic classification method based on multi-modal learning comprising steps S1 to S6.

The heterogeneous information representation provided by the invention enriches the input characterization of the encrypted traffic and supplies the information needed for reliable classification decisions. Further, a multi-modal classification model is formed by constructing a payload modality and a statistical information modality, so that the effective features of the encrypted traffic are fully mined and the classification performance of the model is improved. The steps of this embodiment are described in detail below:

In S1, the raw target encrypted traffic data is collected and converted into effective sessions, as shown in FIG. 2.

S11, after identifying the communication interface carrying the target encrypted traffic, capturing the traffic on that network interface with Wireshark to obtain raw traffic files in PCAP format;

S12, dividing the collected PCAP files by source IP address, destination IP address, source port, destination port and transport layer protocol, using a dictionary (a hash data structure), so that the traffic is split down to the session level;

S13, considering that a certain number of plaintext domain name resolution sessions exist in network communication, cleaning the data: the dpkt library is used to parse the protocol layers of each session and identify the highest-layer protocol of its packets, and plaintext domain name resolution sessions are filtered out so that plaintext information does not bias the classification results;

S14, deleting the ACK packets and retransmitted packets in TCP sessions and removing the data link layer protocol header from all packets, which eliminates noise from information irrelevant to classification and yields clean effective sessions.
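
The cleaning in S14 could be sketched as follows. The text does not specify how ACK packets and retransmissions are recognized, so this example assumes that an "ACK packet" is a segment whose only flag is ACK and whose payload is empty, and that a retransmission is a segment repeating a (sender, sequence number, payload length) triple already seen. The packet dictionaries and the helper `clean_tcp_session` are hypothetical; link-layer header removal is not shown.

```python
def clean_tcp_session(packets):
    """Drop pure ACKs and retransmitted segments from one TCP session."""
    seen = set()   # (sender, seq, payload length) of segments already kept
    kept = []
    for pkt in packets:
        # Assumption: a pure ACK carries only the ACK flag and no payload.
        if pkt["flags"] == {"ACK"} and len(pkt["payload"]) == 0:
            continue
        # Assumption: a repeated (sender, seq, length) triple is a retransmission.
        ident = (pkt["src"], pkt["seq"], len(pkt["payload"]))
        if ident in seen:
            continue
        seen.add(ident)
        kept.append(pkt)
    return kept


# Hypothetical session: handshake, one data segment, its retransmission, and ACKs.
session = [
    {"src": "client", "seq": 0, "flags": {"SYN"}, "payload": b""},
    {"src": "server", "seq": 0, "flags": {"SYN", "ACK"}, "payload": b""},
    {"src": "client", "seq": 1, "flags": {"ACK"}, "payload": b""},
    {"src": "client", "seq": 1, "flags": {"PSH", "ACK"}, "payload": b"hello"},
    {"src": "client", "seq": 1, "flags": {"PSH", "ACK"}, "payload": b"hello"},
    {"src": "server", "seq": 1, "flags": {"ACK"}, "payload": b""},
]
cleaned = clean_tcp_session(session)
```

Under these assumptions the two pure ACKs and the duplicated data segment are removed, leaving the SYN, the SYN-ACK and one copy of the data segment.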

In S2, the heterogeneous information representation of each effective session is extracted, and a training set and a test set are formed.

S21, truncating each effective session in order and selecting its first 20 data packets to represent the whole session;

S22, performing layer-by-layer protocol parsing with the dpkt library and extracting the transport layer payload byte sequence; payloads shorter than the Ethernet MTU of 1500 bytes are padded and payloads longer than 1500 bytes are truncated, giving a content matrix of size 20 × 1500 for each effective session;

S23, parsing the network layer and transport layer protocol headers with the dpkt library and extracting the statistical information sequences of each data packet. In this embodiment, the statistical information sequences are the arrival time interval, packet direction, packet length and TCP window size, where the arrival time interval is obtained by subtracting the arrival times of adjacent packets in order, the packet direction is determined by analyzing the IP addresses to infer which endpoint is the server and which is the client, and the packet length is the length of the transport layer payload sequence. The four statistical information sequences are concatenated along the feature dimension by a concat operation to obtain a time sequence matrix of size 20 × 4 for each effective session.

S24, normalizing each element in the content matrix and the time sequence matrix with the z-score, and dividing the matrices into a training set and a test set at a ratio of 8:2 using a train-test-split function, the two sets being used to train and test the hierarchical attention network, the time sequence circulation network and the multi-modal fusion network.

The z-score normalization formula is:

$$x' = \frac{x - \mu}{\sigma}$$

where x is an element before normalization, x' is the normalized value, and μ and σ are the mean and standard deviation, respectively.
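
Steps S22 to S24 can be illustrated with a short, standard-library-only sketch. Several details here are assumptions because the text leaves them open: padding uses zero bytes, sessions with fewer than 20 packets are filled with all-zero rows, the sender of the first packet is treated as the client, direction is encoded as 1.0 (client to server) or 0.0, and the z-score is taken per column of a single matrix purely for illustration. The record layout and the names `content_matrix`, `timing_matrix` and `zscore_columns` are hypothetical.

```python
from statistics import fmean, pstdev

N_PACKETS = 20   # first N packets per session (value used in this embodiment)
MTU = 1500       # Ethernet MTU in bytes


def content_matrix(payloads, n=N_PACKETS, width=MTU):
    """Truncate or zero-pad each payload to `width` bytes; return n rows."""
    rows = []
    for payload in payloads[:n]:
        row = list(payload[:width])          # truncate long payloads
        row += [0] * (width - len(row))      # zero-pad short payloads
        rows.append(row)
    # Assumption: short sessions are completed with all-zero rows.
    rows += [[0] * width for _ in range(n - len(rows))]
    return rows


def timing_matrix(packets, n=N_PACKETS):
    """Return n rows of [inter-arrival time, direction, length, TCP window]."""
    packets = packets[:n]
    # Assumption: the sender of the first packet is the client.
    client = packets[0]["src_ip"] if packets else None
    rows, prev_ts = [], None
    for pkt in packets:
        iat = 0.0 if prev_ts is None else pkt["ts"] - prev_ts
        prev_ts = pkt["ts"]
        direction = 1.0 if pkt["src_ip"] == client else 0.0
        rows.append([iat, direction, float(len(pkt["payload"])), float(pkt["window"])])
    rows += [[0.0] * 4 for _ in range(n - len(rows))]
    return rows


def zscore_columns(matrix):
    """Standardize each column with x' = (x - mu) / sigma."""
    stats = [(fmean(col), pstdev(col)) for col in zip(*matrix)]
    return [[(x - mu) / sigma if sigma else 0.0 for x, (mu, sigma) in zip(row, stats)]
            for row in matrix]


# Hypothetical three-packet session.
packets = [
    {"ts": 0.0, "src_ip": "10.0.0.2", "payload": b"\x16\x03\x01" + b"\x00" * 10, "window": 64240},
    {"ts": 0.5, "src_ip": "93.184.216.34", "payload": bytes(2000), "window": 65535},
    {"ts": 0.75, "src_ip": "10.0.0.2", "payload": b"\x17\x03\x03", "window": 64240},
]
content = content_matrix([p["payload"] for p in packets])   # 20 x 1500
timing = timing_matrix(packets)                             # 20 x 4
normalized = zscore_columns(timing)
```

In practice the mean and standard deviation would more likely be estimated on the training set and reused for the test set; the per-matrix version above only keeps the example self-contained.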

S3 to S6 describe how the traffic data preprocessed in S1 and S2 is classified by the encrypted traffic classification model, whose overall structure is shown in FIG. 3.

In S3, a hierarchical attention network is constructed to extract content modal features from the packet level to the session level in turn; its architecture is shown as the hierarchical attention network module in FIG. 3.

S31, iterating over each row of the content matrix in turn with the distributed one-dimensional convolution module to preliminarily extract the content features of each packet payload, wherein the distributed one-dimensional convolution module consists of two convolution layers and a max pooling layer, and its output shape for each packet is 64 × 500;

S32, assigning attention weights to the content feature sequences output by S31 with the distributed attention mechanism, computing a weighted sum over the sequence points of each content feature sequence, and thereby converting each sequence into a condensed content feature vector with an output shape of 1 × 64. The attention mechanism reduces a content feature sequence of length 500 to a vector of length 1, which effectively reduces the number of parameters and the amount of computation. The attention weight is calculated as:

$$\alpha_i = \frac{\exp(u_i^{\top} u_p)}{\sum_{j} \exp(u_j^{\top} u_p)}$$

where α_i is the attention weight of the i-th sequence point, u_p is a trainable parameter, u_i is the linear projection of the content feature sequence at the i-th sequence point, and exp is the exponential function with base e;

S33, stacking the content feature vectors obtained in S32 into a session-level content feature sequence and modeling it with the bidirectional GRU module to obtain the output content modal features, whose shape is 20 × 100; the two dimensions correspond to the length of the input sequence and the number of neurons of the bidirectional GRU, respectively.
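
The attention pooling of S32 can be sketched in plain Python. This is only an illustration under stated assumptions: the projection is taken to be a bias-free linear map u_i = W h_i, the weights are a softmax over the dot products of u_i with the trainable vector u_p, and the function name `attention_pool` and the toy inputs are hypothetical. In the embodiment each packet would contribute a sequence of 500 points with 64 features; the toy example uses 3 points with 2 features.

```python
import math


def attention_pool(features, w, u_p):
    """Collapse a sequence of feature vectors into one weighted-sum vector."""
    # u_i = W h_i: linear projection of each sequence point (assumed bias-free).
    u = [[sum(wv * hv for wv, hv in zip(row, h)) for row in w] for h in features]
    scores = [sum(a * b for a, b in zip(u_i, u_p)) for u_i in u]
    top = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]       # softmax attention weights
    dim = len(features[0])
    pooled = [sum(a * h[d] for a, h in zip(alpha, features)) for d in range(dim)]
    return alpha, pooled


# Toy sequence: 3 sequence points, 2 features each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
alpha, pooled = attention_pool(features, identity, u_p=[1.0, 0.0])
```

With this u_p, the first and third points receive equal weights that exceed the weight of the second point, and the pooled vector has the same dimensionality as a single sequence point.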

In S4, a time sequence circulation network is constructed to extract time sequence modal features. Its structure is shown as the time sequence circulation network module in FIG. 3 and comprises a temporal convolutional network (TCN) module and a plurality of bidirectional GRU modules connected in sequence.

S41, first inputting the time sequence matrix into the TCN module and extracting short-term features through multi-layer dilated causal convolution while constructing identity mappings. Three layers are used, with dilation (expansion) coefficients increasing as powers of 2 and convolution kernel sizes of 2, 3 and 5 in turn; combining causality with dilation ensures a short-range receptive field.

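The dilated causal convolution is defined as:

$$F(s) = (l *_{d} f)(s) = \sum_{i=0}^{k-1} f(i) \cdot l_{s - d \cdot i}$$

where l is the input time sequence matrix, s is the position in the sequence, d is the dilation (expansion) coefficient, and f is the convolution kernel of size k.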

S42, capturing long-term features of the time sequence matrix in both the forward and backward directions with the bidirectional GRU module to obtain the output time sequence modal features, whose shape is 20 × 64.
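
To make the dilated causal convolution of S41 concrete, the following sketch applies it to a single one-dimensional feature sequence. It assumes the usual convention that positions before the start of the sequence contribute zero (left zero-padding), which the text does not state; the function name `dilated_causal_conv1d` is hypothetical, and a real TCN layer would add multiple channels, nonlinearities and the residual (identity) connection.

```python
def dilated_causal_conv1d(x, kernel, dilation):
    """Output at position s depends only on x[s], x[s-d], x[s-2d], ..."""
    out = []
    for s in range(len(x)):
        acc = 0.0
        for i, weight in enumerate(kernel):
            idx = s - dilation * i
            if idx >= 0:                 # positions before the start count as zero
                acc += weight * x[idx]
        out.append(acc)
    return out


sequence = [1.0, 2.0, 3.0, 4.0, 5.0]
adjacent = dilated_causal_conv1d(sequence, kernel=[1.0, 1.0], dilation=1)
skipping = dilated_causal_conv1d(sequence, kernel=[1.0, 1.0], dilation=2)
```

With dilation 1 each output adds the current and the previous value; with dilation 2 it adds the current value and the value two steps back, so the same kernel covers a wider span without looking at future packets.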

In S5, a multi-modal fusion network fuses the content modal features and the time sequence modal features: the two equal-length modal feature sequences are concatenated along the feature dimension at each sequence point, completing a finer-grained point-to-point modal fusion, and the fused shape is 20 × 164. A high-speed (highway) network then adaptively extracts features from each modality, and its trained gating mechanism further screens and filters the important features in each modality to obtain the high-level multi-modal features.
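
The fusion in S5 can be sketched as a per-point concatenation followed by a gated layer. The gate below follows the commonly used highway form y = T(x) · H(x) + (1 − T(x)) · x with a sigmoid gate T; the choice of tanh for H, the zero weights and all function names are assumptions made for illustration, not details taken from the text.

```python
import math


def fuse_pointwise(content_seq, timing_seq):
    """Concatenate the two feature vectors at every sequence point."""
    if len(content_seq) != len(timing_seq):
        raise ValueError("both modal sequences must have the same length")
    return [c + t for c, t in zip(content_seq, timing_seq)]


def linear(x, weights, biases):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def highway(x, w_h, b_h, w_t, b_t):
    """y = T(x) * H(x) + (1 - T(x)) * x, where T is a sigmoid gate."""
    h = [math.tanh(z) for z in linear(x, w_h, b_h)]   # candidate transform (assumed tanh)
    t = [sigmoid(z) for z in linear(x, w_t, b_t)]     # transform gate in (0, 1)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


# Shapes from this embodiment: 20 x 100 content features, 20 x 64 timing features.
content_seq = [[0.1] * 100 for _ in range(20)]
timing_seq = [[0.2] * 64 for _ in range(20)]
fused = fuse_pointwise(content_seq, timing_seq)       # 20 x 164

# With a strongly negative gate bias the layer passes its input through unchanged.
dim = len(fused[0])
zeros = [[0.0] * dim for _ in range(dim)]
carried = highway(fused[0], zeros, [0.0] * dim, zeros, [-50.0] * dim)
```

The last lines show the gating behaviour at one extreme: when the gate is nearly closed, the layer carries its input forward, which is what lets training decide per feature how much transformation to apply.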

In S6, the high-level multi-modal features obtained in S5 are flattened by a Flatten operation, input into the fully connected output layer, and mapped to classification probabilities by the softmax function, yielding the classification label of the encrypted traffic.
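
A toy version of the output stage in S6 might look like this. The weights, the tiny 2 × 3 feature map and the helper names are hypothetical; in the embodiment the flattened input would have 20 × 164 = 3280 values and the weights would be learned.

```python
import math


def flatten(feature_seq):
    """Turn a 2-D feature map into a single vector, row by row."""
    return [value for row in feature_seq for value in row]


def softmax(logits):
    top = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature_seq, weights, biases):
    """Flatten, apply a fully connected layer, and return (probabilities, label)."""
    x = flatten(feature_seq)
    logits = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return probs, label


# Toy input: 2 sequence points x 3 features, 3 traffic classes.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # class 0 logit: 1
    [1.0, 0.0, 0.0, 0.0, 1.0, 0.0],   # class 1 logit: 2
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # class 2 logit: 0
]
probs, label = classify(features, weights, biases=[0.0, 0.0, 0.0])
```

The predicted label is the index of the largest probability, which here is class 1 because it has the largest logit.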

The encrypted traffic classification method based on multi-modal learning collects raw encrypted traffic data and generates a rich heterogeneous information representation for each effective session; by constructing a payload modality and a statistical information modality, it extracts important features of the encrypted traffic from multiple aspects and effectively improves the classification performance on encrypted traffic.

To verify the effectiveness of the encrypted traffic classification method based on multi-modal learning, this embodiment uses the public encrypted traffic data set ISCXVPN2016 for verification and selects other prior-art methods for comparison. Based on the samples in the data set, three groups of experimental scenarios of different granularities are set: the first group distinguishes the encryption type of the traffic; the second group distinguishes the application that generated the encrypted traffic; the third group distinguishes the functional type of the traffic. All three groups use ten-fold cross-validation and two main evaluation metrics, accuracy and F1 score; the experimental results are shown in Table 1. The classification accuracy of the method exceeds 99% in all three groups of scenarios, and for the two-class encryption type task both accuracy and F1 score reach 99.86%. Compared with the other methods in Table 1, the method of the invention achieves the highest classification accuracy and F1 score.

Table 1: Classification results of the method of the invention on the public encrypted traffic data set

If implemented in the form of software functional units and sold or used as a stand-alone product, the above functions may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

The foregoing describes in detail preferred embodiments of the present invention. It should be understood that numerous modifications and variations can be made in accordance with the concepts of the invention by one of ordinary skill in the art without undue burden. Therefore, all technical solutions which can be obtained by logic analysis, reasoning or limited experiments based on the prior art by a person skilled in the art according to the inventive concept shall be within the scope of protection defined by the claims.

Claims

1. The encrypted traffic classification method based on multi-modal learning is characterized by comprising the following steps:

2. The encrypted traffic classification method based on multi-modal learning according to claim 1, wherein S1 comprises the steps of:

S13, cleaning the traffic data and filtering out plaintext domain name resolution sessions;

3. The encrypted traffic classification method according to claim 2, wherein the five-tuple structure is expressed as < source IP address, destination IP address, source port, destination port, transport layer protocol >, wherein the source and destination directions are interchangeable.

4. The encrypted traffic classification method based on multi-modal learning according to claim 1, wherein the step S2 comprises the steps of:

5. The encrypted traffic classification method based on multi-modal learning according to claim 1, wherein the hierarchical attention network comprises a distributed one-dimensional convolution module, a distributed attention mechanism module and a bidirectional GRU module facing a data packet sequence, which are sequentially connected, and the content modal feature extraction based on the hierarchical attention network specifically comprises the following steps:

6. The encrypted traffic classification method based on multi-modal learning according to claim 1, wherein the time-series circulation network comprises a time-series convolution network module and a bidirectional GRU module which are sequentially connected, and extracting time-series modal characteristics based on the time-series circulation network specifically comprises the following steps:

7. The encrypted traffic classification method based on multi-modal learning according to claim 1, wherein, in the multi-modal fusion network in S5, since the content modal features and the time sequence modal features are both sequences, they are concatenated along the feature dimension at each sequence point in a point-to-point fusion manner, and a high-speed (highway) network adaptively extracts the multi-modal representation to obtain the high-level multi-modal features.

8. The encrypted traffic classification method based on multi-modal learning according to claim 1, wherein S6 is specifically: the high-level multi-modal features are flattened by a Flatten operation, input into a fully connected output layer, and mapped to classification probabilities by a softmax function, so that the classification label of the encrypted traffic is obtained and the encrypted traffic is classified.

9. An encrypted traffic classification device based on multimodal learning, comprising a memory, a processor, and a program stored in the memory, wherein the processor implements the method of any of claims 1-8 when executing the program.

10. A storage medium having a program stored thereon, wherein the program, when executed, implements the method of any of claims 1-8.