CN115883263B - Encryption application protocol type identification method based on multi-scale load semantic mining - Google Patents

Encryption application protocol type identification method based on multi-scale load semantic mining

Info

Publication number
CN115883263B
CN115883263B
Authority
CN
China
Prior art keywords
features
sequence
load
scale
application protocol
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310189712.1A
Other languages
Chinese (zh)
Other versions
CN115883263A (en)
Inventor
吉庆兵
谈程
罗杰
潘炜
康璐
倪绿林
尹浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 30 Research Institute
Original Assignee
CETC 30 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 30 Research Institute filed Critical CETC 30 Research Institute
Priority to CN202310189712.1A priority Critical patent/CN115883263B/en
Publication of CN115883263A publication Critical patent/CN115883263A/en
Application granted granted Critical
Publication of CN115883263B publication Critical patent/CN115883263B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention provides an encryption application protocol type identification method based on multi-scale load semantic mining, which comprises the following steps: step 1, extracting payload features from the original traffic and converting them into a decimal byte sequence; step 2, constructing a pyramid neural network based on the load semantic mining block and processing the decimal byte sequence to obtain an input feature sequence; step 3, the load semantic mining block constructs a sliding window on the input feature sequence, the sliding window moves successively to the end of the sequence, and the features extracted in the windows are spliced to obtain the features of the input sequence; step 4, performing dimensionality reduction on the features of the input sequence to serve as a new input sequence, repeating steps 3-4, and splicing the features obtained each time to obtain multi-scale features; and step 5, completing the classification of the encrypted network application protocol types according to the multi-scale features. The method can extract multi-scale features from encrypted network application protocol messages in complex scenarios and improves both the speed and the accuracy of encrypted traffic identification.

Description

Encryption application protocol type identification method based on multi-scale load semantic mining
Technical Field
The invention relates to the field of flow analysis, in particular to an encryption application protocol type identification method based on multi-scale load semantic mining.
Background
Traffic classification is the basis of network security and network management and has found very wide application, from QoS services at network service providers to the detection of security applications in firewalls and intrusion detection systems. At present, traffic classification mainly relies on methods based on port numbers, deep packet inspection, machine learning and the like, but each has certain disadvantages:
(1) Traditional port-number-based approaches have failed, since newer applications either use well-known port numbers to mask their traffic or do not use standard registered port numbers.
(2) Deep packet inspection relies on finding keywords in the packets, which fails in the face of encrypted traffic.
(3) Machine-learning-based encrypted network traffic identification methods rely heavily on manually engineered features, which limits their wider adoption.
With the popularity of deep learning methods, researchers have studied their effect on traffic classification tasks and demonstrated high accuracy on early mobile application traffic datasets. With the continuous upgrading of encryption protocols, the explosive growth in the number of mobile applications and the changes in their development patterns, shallow deep learning models can no longer meet the practical requirements of mobile application traffic identification in current complex scenarios. Although Transformer-based encrypted traffic identification methods perform well at feature learning, they focus mainly on global features during feature extraction and ignore the detail features hidden in high-resolution payload data, and in many cases these local features are the key to accurate classification.
Disclosure of Invention
In order to solve the problems that shallow neural networks cannot learn the deep features in encrypted traffic under current complex scenarios and that existing deep neural networks lose detail features by focusing excessively on global features, the invention provides a novel encrypted network application protocol type identification method.
The technical scheme adopted by the invention is as follows: the encryption application protocol type identification method based on multi-scale load semantic mining comprises the following steps:
step 1, preprocessing the original traffic of a mobile application encryption network, extracting the load characteristics of a transmission layer load, and converting the load characteristics into a decimal byte sequence;
step 2, constructing a pyramid neural network based on a load semantic mining block, and acquiring word embedding features and position coding features of a decimal byte sequence, wherein the word embedding features and the position coding features are added to obtain an input feature sequence;
step 3, a load semantic mining block constructs a sliding window on the input feature sequence, the sliding window sequentially moves until the tail end of the input sequence, the features in the sliding window are extracted when each movement is performed, and the features extracted in all the sliding windows are sequentially spliced to obtain the features of the input sequence;
step 4, performing feature compression and dimensionality reduction on the features of the input sequence to serve as a new input sequence, repeating steps 3-4 k times, and splicing the features of the input sequence obtained each time to obtain the multi-scale features of the input sequence;
and step 5, completing the classification of the encrypted network application protocol types according to the multi-scale features.
Further, the preprocessing process in the step 1 is as follows:
step 1.1, dividing a data packet into session flows according to five-tuple;
step 1.2, cleaning the session stream, and removing data packets retransmitted over time, address resolution protocol and dynamic host configuration protocol;
step 1.3, extracting load characteristics of a transmission layer load in a data packet, and splicing the extracted load characteristics according to the arrival sequence of the data packet until the length of bytes after splicing reaches the set load characteristic length;
and step 1.4, converting the extracted spliced load characteristics into a decimal byte sequence.
Further, in the step 1.3, if the byte length after splicing the payload features of all the data packets in the session stream is still smaller than the set payload feature length, padding is performed with 0x00.
Further, in the step 2, the byte features of the decimal byte sequence are mapped into a d-dimensional vector space to obtain the word embedding features F1, $F1 \in \mathbb{R}^{N \times d}$, where $\mathbb{R}$ denotes the real numbers and N is the set payload feature length.
Further, in the step 2, the method for calculating the position coding feature is as follows:
$PE_{(pos,2i)} = \sin\left(pos / 10000^{2i/d}\right)$ (1)

$PE_{(pos,2i+1)} = \cos\left(pos / 10000^{2i/d}\right)$ (2)

$F2 = \left[PE(1); PE(2); \ldots; PE(N)\right]$ (3)

where pos denotes the position at which a byte appears in the byte sequence; the left-hand side of formula (1), $PE_{(pos,2i)}$, is the position encoding of the bytes at even positions and the left-hand side of formula (2), $PE_{(pos,2i+1)}$, is the position encoding of the bytes at odd positions; i indexes the dimensions of the position encoding, the dimension subscript modulo 2 determining whether formula (1) (even, using the sine function) or formula (2) (odd, using the cosine function) applies; d is the dimension of the position encoding; F2 is the position-coding feature, and PE in formula (3) denotes the position encoding of each byte in the byte sequence.
Further, the substep of the step 3 includes:
step 3.1, constructing a sliding window with the length of L bytes on an input sequence;
step 3.2, extracting features of the data in the sliding window by adopting a multi-head attention mechanism to obtain features F4;
step 3.3, carrying out residual connection and layer normalization processing on the input sequence F3 and the characteristic F4 to obtain a characteristic F5;
step 3.4, performing two-layer full-connection layer operation on the feature F5 to obtain a feature F6;
step 3.5, carrying out residual connection and layer normalization processing on the characteristic F5 and the characteristic F6 to obtain a characteristic F7;
step 3.6, the sliding window moves backwards by L bytes, and the steps 3.2-3.6 are repeated until the sliding window moves to the tail end of the input sequence;
and 3.7, splicing the features F7 in all the sliding windows to obtain features F8 serving as features of the input sequence.
Further, the substeps of the step 3.2 are as follows:
step 3.2.1, performing multi-head self-attention calculation on the data in the sliding window, and extracting the association relation of byte sequences in the window;
and 3.2.2, repeating the step 3.2.1 M times according to the set number of attention heads M, and splicing the M extracted results and applying a linear transformation to obtain the features F4 of the data in the sliding window.
Further, in the step 4, the feature compression and dimension reduction are completed by adopting a one-dimensional maximum pooling layer, and each pooling operation halves the dimension of the first dimension of the feature.
Further, the substep of step 5 includes:
step 5.1, inputting the extracted multi-scale features into a full-connection layer and an activation function, wherein the output dimension is consistent with the number of flow categories;
and 5.2, calculating the category of the encrypted network application protocol type according to the output.
Further, in the step 5.2, the specific calculation of the category is:

$\mathrm{class} = \arg\max(Z)$

where Z represents the output obtained by feeding the multi-scale features through the fully connected layer and the activation function.
Compared with the prior art, the beneficial effects of adopting the technical scheme are as follows:
1. The pyramid network constructed from load semantic mining blocks can extract multi-scale features from encrypted network application protocol messages in current complex scenarios, fully extracting both global features and multi-scale local features, thereby improving the accuracy of encrypted traffic identification.
2. When local features are extracted, a sliding window is adopted and each self-attention calculation is performed within the coverage of a window, which prevents noise from being introduced during local feature extraction, greatly reduces the number of model parameters, and improves the calculation speed of the model.
3. The method learns and classifies based on the transport layer payload data in the network traffic and does not depend on the IP address and port number information in the network traffic packet header, so the classification model generalizes well; strong identifiers such as the IP address and port number in the packet header are not universal and may strongly interfere with the final identification result.
Drawings
Fig. 1 is a flowchart of an encryption application protocol type identification method based on multi-scale load semantic mining.
Fig. 2 is a schematic diagram of a pyramid network model according to an embodiment of the invention.
FIG. 3 is a flow chart of a sliding window implementation in an embodiment of the invention.
FIG. 4 is a schematic diagram of multi-scale feature extraction in accordance with one embodiment of the invention.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar modules or modules having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application. On the contrary, the embodiments of the present application include all alternatives, modifications, and equivalents as may be included within the spirit and scope of the appended claims.
Aiming at the problems that a shallow neural network cannot learn deep features in encrypted traffic under the current complex scene and detail features are lost due to the fact that the existing deep neural network pays attention to global features excessively, the embodiment provides an encryption application protocol type identification method for extracting multi-scale features based on load semantic mining deep neural network.
As shown in fig. 1, the encryption application protocol type identification method based on multi-scale load semantic mining includes:
step 1, preprocessing the original traffic of a mobile application encryption network, extracting the load characteristics of a transmission layer load, and converting the load characteristics into a decimal byte sequence;
step 2, constructing a pyramid neural network based on the load semantic mining block; acquiring word embedding characteristics and position coding characteristics of a decimal byte sequence, and adding the word embedding characteristics and the position coding characteristics to obtain an input characteristic sequence;
step 3, a load semantic mining block constructs a sliding window on the input feature sequence, the sliding window sequentially moves until the tail end of the input sequence, the features in the sliding window are extracted when each movement is performed, and the features extracted in all the sliding windows are sequentially spliced to obtain the features of the input sequence;
step 4, performing feature compression and dimensionality reduction on the features of the input sequence to serve as a new input sequence, repeating steps 3-4 k times, and splicing the features of the input sequence obtained each time to obtain the multi-scale features of the input sequence;
and step 5, completing the classification of the encrypted network application protocol types according to the multi-scale features.
Since strong identifiers such as the IP address and port number information in the network traffic packet header are not universal and may strongly interfere with the identification result, in this embodiment learning and classification are performed based on the payload data at the transport layer of the network traffic, without relying on the IP address, port number, or other header information of the network traffic packets.
Before parsing, the original flow needs to be preprocessed, specifically:
step 1.1, dividing the received data packet into session flows according to five-tuple (source IP, destination IP, source port, destination port, transport layer protocol), and identifying the flows by taking the session flows as units.
In step 1.2, the received packets include packets that are unrelated to the transmitted content, so the session stream needs to be cleaned: packets retransmitted after a timeout, Address Resolution Protocol (ARP) packets, and Dynamic Host Configuration Protocol (DHCP) packets are removed. In this embodiment, the cleaning is accomplished using the Wireshark Tshark tool.
And step 1.3, after the irrelevant data packets are removed, the payload features of the transport layer payload of the remaining data packets are extracted and spliced according to the arrival order of the data packets, until the length of the extracted bytes reaches the set payload feature length N. It should be noted that, in this embodiment, if the byte length after splicing the payload features of all the packets in the session stream is smaller than N, 0x00 is used for padding.
Preferably, the load characteristics of the transport layer load are extracted by using the rdpcap method of the Scapy tool in this embodiment.
And 1.4, converting the extracted and spliced binary load characteristics into a decimal byte sequence, namely converting each byte into a corresponding decimal number (0-255).
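For illustration, the preprocessing of steps 1.1-1.4 can be sketched in Python as follows. This is a minimal sketch rather than the patented tool chain: the flow_key helper, the payload-feature length N = 784, and the lack of bidirectional flow merging are illustrative assumptions, and Tshark-based removal of ARP/DHCP and retransmitted packets is assumed to have been done beforehand.

```python
# Minimal preprocessing sketch (assumed helper names, not the patented tool chain):
# read a pcap with Scapy's rdpcap, group packets into session flows by five-tuple,
# splice transport-layer payloads in arrival order, truncate/pad to N bytes with 0x00,
# and return each flow as a decimal byte sequence (values 0-255).
from collections import defaultdict
from scapy.all import rdpcap
from scapy.layers.inet import IP, TCP, UDP

N = 784  # assumed payload-feature length; the patent only requires a fixed N

def flow_key(pkt):
    """Five-tuple (src IP, dst IP, src port, dst port, transport protocol)."""
    layer4 = TCP if pkt.haslayer(TCP) else UDP
    return (pkt[IP].src, pkt[IP].dst, pkt[layer4].sport, pkt[layer4].dport, layer4.__name__)

def extract_flows(pcap_path):
    flows = defaultdict(bytearray)
    for pkt in rdpcap(pcap_path):
        if not pkt.haslayer(IP) or not (pkt.haslayer(TCP) or pkt.haslayer(UDP)):
            continue  # irrelevant packets are dropped here or by Tshark cleaning beforehand
        payload = bytes(pkt[TCP].payload) if pkt.haslayer(TCP) else bytes(pkt[UDP].payload)
        if payload:
            flows[flow_key(pkt)].extend(payload)  # splice payloads in arrival order
    # truncate to N bytes, pad short flows with 0x00, convert to decimal byte sequences
    return {k: list(v[:N]) + [0x00] * max(0, N - len(v)) for k, v in flows.items()}
```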
After obtaining the decimal byte sequence representing the transmission layer characteristics, the analysis of the traffic class can be started, and in this embodiment, features of different scales in the load (decimal byte sequence) are extracted by using the constructed Pyramid-shaped neural network (Pyramid-Transformer).
Current Transformer-based (the Transformer is a deep learning architecture) encrypted traffic recognition models use the self-attention mechanism and pay more attention to extracting global features while neglecting the extraction of local features, even though local features may be the key to fine-grained classification; at the same time, the local features may have inconsistent scales, which can introduce interference during extraction.
As shown in fig. 2 and fig. 4, in step 2 of this embodiment a pyramid-shaped neural network (Pyramid-Transformer) is constructed from a plurality of load semantic mining blocks (Pyramid Transformer blocks), with a one-dimensional max pooling layer arranged between the blocks to realize compression and dimensionality reduction during feature extraction. Each load semantic mining block has the same composition and consists of six sequentially connected parts: multi-head attention calculation, residual connection, layer normalization, two fully connected layers with an activation function, residual connection, and layer normalization. Deep multi-scale features are extracted by stacking several load semantic mining blocks: after each block extracts its features, the feature dimension is compressed to 1/2 and the compressed features are fed into the next block while the window size stays unchanged, so that features of larger scale are extracted; the feature dimensions extracted by successive blocks decrease, forming a pyramid shape, and the features are spliced to obtain the final features.
The feature extraction process of the pyramid neural network is explained in detail below:
Feature extraction is mainly completed by the load semantic mining blocks in the pyramid neural network, and the input of a load semantic mining block is the combination of the word embedding features and the position-coding features of the byte sequence, so the decimal byte sequence needs to be processed first.
A word embedding operation is performed on the byte sequence (denoted B1, B2, ..., BN-1, BN in figures 2 and 4), mapping the byte features into a d-dimensional vector space to obtain the word embedding features F1 as the subsequent input, $F1 \in \mathbb{R}^{N \times d}$, where $\mathbb{R}$ denotes the real numbers.
The position-coding features F2 of the byte sequence are computed, $F2 \in \mathbb{R}^{N \times d}$, where $\mathbb{R}$ denotes the real numbers:

$PE_{(pos,2i)} = \sin\left(pos / 10000^{2i/d}\right)$ (1)

$PE_{(pos,2i+1)} = \cos\left(pos / 10000^{2i/d}\right)$ (2)

$F2 = \left[PE(1); PE(2); \ldots; PE(N)\right]$ (3)

where pos denotes the position at which a byte appears in the byte sequence; the left-hand side of formula (1), $PE_{(pos,2i)}$, is the position encoding of the bytes at even positions and the left-hand side of formula (2), $PE_{(pos,2i+1)}$, is the position encoding of the bytes at odd positions; i indexes the dimensions of the position encoding, the dimension subscript modulo 2 determining whether formula (1) (even, using the sine function) or formula (2) (odd, using the cosine function) applies; d is the dimension of the position encoding; PE in formula (3) denotes the position encoding of each byte in the byte sequence. Because the Transformer uses global information, it cannot by itself exploit byte-order information, which is important for feature learning, so this embodiment acquires position-coding features.
The word embedding features and the position-coding features are combined according to formula (4) to obtain the input features F3 of the load semantic mining block, $F3 \in \mathbb{R}^{N \times d}$, where $\mathbb{R}$ denotes the real numbers:

$F3 = F1 + F2$ (4)
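A minimal PyTorch sketch of this input construction is given below. It assumes the standard Transformer sinusoidal encoding with base 10000 for formulas (1)-(3), an even embedding dimension d = 128, and a sequence length N = 784; these values are illustrative choices rather than values fixed by the patent.

```python
# Build the input features F3 = F1 + F2 from a decimal byte sequence:
# F1 is a learned d-dimensional embedding of each byte value (0-255),
# F2 is the sinusoidal position encoding (sin on even dims, cos on odd dims).
import torch
import torch.nn as nn

def positional_encoding(n: int, d: int) -> torch.Tensor:
    """F2: sine on even dimensions, cosine on odd dimensions, shape (n, d)."""
    pos = torch.arange(n, dtype=torch.float32).unsqueeze(1)      # (n, 1)
    i = torch.arange(0, d, 2, dtype=torch.float32)               # even dimension indices
    angle = pos / torch.pow(torch.tensor(10000.0), i / d)        # (n, d/2)
    pe = torch.zeros(n, d)
    pe[:, 0::2] = torch.sin(angle)
    pe[:, 1::2] = torch.cos(angle)
    return pe

class ByteInputEncoder(nn.Module):
    def __init__(self, d: int = 128, n: int = 784):
        super().__init__()
        self.embed = nn.Embedding(256, d)                        # word embedding F1
        self.register_buffer("pe", positional_encoding(n, d))    # position encoding F2

    def forward(self, byte_seq: torch.Tensor) -> torch.Tensor:
        # byte_seq: (batch, n) integers in 0..255  ->  F3: (batch, n, d)
        return self.embed(byte_seq) + self.pe

# usage: F3 = ByteInputEncoder()(torch.randint(0, 256, (2, 784)))
```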
After the input of the load semantic mining block is determined, the feature extraction can be performed through the load semantic mining block, and the method specifically comprises the following steps:
In step 3.1, because some detail features exist only over a small number of adjacent bytes, extracting features directly over the whole input sequence may interfere with the local detail features, so a sliding window is used to ensure that the high-resolution local detail features are not destroyed. A sliding window of size L is therefore constructed on the input features F3 and, as shown in fig. 3, feature extraction is performed on the data inside the window.
Step 3.2, acquiring the data in the sliding window as
Figure SMS_31
,/>
Figure SMS_32
Adopts a multi-head attention mechanism pair +.>
Figure SMS_33
Extracting features to obtain features->
Figure SMS_34
,/>
Figure SMS_35
,/>
Figure SMS_36
The global dependency of the bytes within the window is contained, whereas the view of the entire byte sequence is obtained here as a local feature within the window. />
The specific process comprises the following steps:
Step 3.2.1, multi-head self-attention is computed on X, extracting the association relations of the byte sequence within the window, as follows:

Using the weight matrices $W^Q$, $W^K$ and $W^V$, the query, key and value features Q, K and V of X are computed as shown in formulas (5), (6) and (7):

$Q = X W^Q$ (5)

$K = X W^K$ (6)

$V = X W^V$ (7)

Matrix operations on Q, K and V implement the self-attention mechanism (Attention), giving the output Z, $Z \in \mathbb{R}^{L \times d}$:

$Z = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$ (8)
where $d_k$ is the number of columns of the Q and K matrices, i.e. the vector dimension, the same as d, and $K^{T}$ is the transpose of the matrix K. The inner product of each row vector of Q with each row vector of K is computed and divided by $\sqrt{d_k}$. Multiplying Q by the transpose of K yields a matrix with L rows and L columns, where L is the window size; this matrix expresses the association strength between the bytes. Once $Q K^{T} / \sqrt{d_k}$ has been obtained, the softmax function (normalized exponential function) computes the self-attention coefficient of each byte with respect to the other bytes; softmax normalizes each row of the matrix so that the entries of each row sum to 1.
Step 3.2.2, setting the number M of attention heads, repeating the step 3.2.1M times to obtain M output Z, and splicing and linearly transforming the M Z to obtain the characteristic
Figure SMS_60
,/>
Figure SMS_61
Figure SMS_62
Wherein,
Figure SMS_63
output representing the first calculation, +.>
Figure SMS_64
Indicate->
Figure SMS_65
Output of the secondary calculation, +.>
Figure SMS_66
Weight matrix representing a linear transformation, +.>
Figure SMS_67
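The windowed multi-head self-attention of formulas (5)-(8) and step 3.2.2 can be sketched as follows. The sketch assumes the usual convention of a per-head dimension $d_k = d/M$ (so the concatenated heads already have width d before $W^O$), which may differ from the exact per-head shapes intended in the patent.

```python
# Multi-head self-attention over the L bytes of one sliding window:
# each head computes softmax(QK^T / sqrt(d_k)) V, the M head outputs are
# concatenated and linearly transformed by W^O into F4 of shape (L, d).
import math
import torch
import torch.nn as nn

class WindowMultiHeadAttention(nn.Module):
    def __init__(self, d: int = 128, num_heads: int = 4):
        super().__init__()
        assert d % num_heads == 0
        self.h, self.d_k = num_heads, d // num_heads
        self.w_q = nn.Linear(d, d, bias=False)   # stacked per-head W^Q
        self.w_k = nn.Linear(d, d, bias=False)   # stacked per-head W^K
        self.w_v = nn.Linear(d, d, bias=False)   # stacked per-head W^V
        self.w_o = nn.Linear(d, d, bias=False)   # W^O applied to the concatenated heads

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, L, d) bytes of one sliding window -> F4: (batch, L, d)
        b, L, d = x.shape
        def split(t):  # (batch, L, d) -> (batch, heads, L, d_k)
            return t.view(b, L, self.h, self.d_k).transpose(1, 2)
        q, k, v = split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)   # (batch, heads, L, L) association strengths
        z = torch.softmax(scores, dim=-1) @ v                    # per-head self-attention output Z
        z = z.transpose(1, 2).reshape(b, L, d)                   # concatenate the M heads
        return self.w_o(z)                                       # linear transformation -> F4

# usage: F4 = WindowMultiHeadAttention()(torch.randn(2, 16, 128))
```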
Step 3.3, pair
Figure SMS_68
And->
Figure SMS_69
Performing residual connection and layer normalization operation to obtain characteristic ∈>
Figure SMS_70
Figure SMS_71
(9)
Wherein LayerNorm represents the layer normalization operation.
Step 3.4, pair
Figure SMS_72
Performing a Forward propagation (Feed Forward) operation to obtain the characteristic +.>
Figure SMS_73
,/>
Figure SMS_74
Figure SMS_75
(10)
Wherein, linear represents performing a full-connection layer operation; feed Forward consists of two fully connected layers, the first layer using an activation function RELU and the second layer not using an activation function.
Step 3.5, pair
Figure SMS_76
And->
Figure SMS_77
Performing residual connection and layer normalization operation to obtain characteristic ∈>
Figure SMS_78
,/>
Figure SMS_79
Figure SMS_80
(11)
Step 3.6, moving the sliding window backwards by L bytes, and re-executing the steps 3.2-3.5 in the new window until the sliding window moves to the input feature
Figure SMS_81
Ending;
Step 3.7, the features F7 obtained in each sliding window are spliced to obtain F8, $F8 \in \mathbb{R}^{N \times d}$:

$F8 = \mathrm{Concat}(F7^{(1)}, F7^{(2)}, \ldots, F7^{(n)})$ (12)

where $F7^{(1)}$ denotes the features obtained in the first window, $F7^{(n)}$ denotes the features obtained in the last window, and n is the number of windows.
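A compact sketch of one load semantic mining block (steps 3.1-3.7) is shown below. It assumes non-overlapping windows of L bytes, a sequence length divisible by L, and uses torch.nn.MultiheadAttention as a stand-in for the windowed multi-head attention of step 3.2; the actual block may differ in these details.

```python
# One load semantic mining block: for each L-byte window apply multi-head
# attention (step 3.2), Add&Norm (eq. 9), the two-layer feed-forward (eq. 10),
# Add&Norm (eq. 11), then concatenate the per-window features F7 into F8 (eq. 12).
import torch
import torch.nn as nn

class LoadSemanticMiningBlock(nn.Module):
    def __init__(self, d: int = 128, num_heads: int = 4, window: int = 16, d_ff: int = 512):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(d, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d)
        self.ffn = nn.Sequential(nn.Linear(d, d_ff), nn.ReLU(), nn.Linear(d_ff, d))
        self.norm2 = nn.LayerNorm(d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, d) input features; n is assumed to be a multiple of the window size
        outputs = []
        for start in range(0, x.size(1), self.window):        # slide the window by L bytes
            w = x[:, start:start + self.window, :]
            f4, _ = self.attn(w, w, w)                        # step 3.2: multi-head attention
            f5 = self.norm1(w + f4)                           # step 3.3: residual + LayerNorm, eq. (9)
            f6 = self.ffn(f5)                                 # step 3.4: two fully connected layers, eq. (10)
            f7 = self.norm2(f5 + f6)                          # step 3.5: residual + LayerNorm, eq. (11)
            outputs.append(f7)
        return torch.cat(outputs, dim=1)                      # step 3.7: splice window features -> F8

# usage: F8 = LoadSemanticMiningBlock()(torch.randn(2, 784, 128))
```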
In order to extract the multi-scale features of the byte sequence, in step 4 of this embodiment a one-dimensional max pooling layer is first applied to the features F8 for feature compression and dimensionality reduction, obtaining the features F9, $F9 \in \mathbb{R}^{(N/2) \times d}$:

$F9 = \mathrm{MaxPool1d}(F8)$ (13)

where MaxPool1d denotes the one-dimensional max pooling operation; each pooling operation halves the first dimension of the features, while the new features carry richer semantic information.
The number of repetitions k is set as required, and steps 3-4 are repeated k times; apart from the first execution of step 3, which takes the input features F3, each subsequent execution of step 3 takes the features F9 obtained in the preceding step 4 as its input.
The features F8 obtained in each repeated execution are spliced to obtain the multi-scale features $F_{ms}$:

$F_{ms} = \mathrm{Concat}(F8_1, F8_2, \ldots, F8_k)$ (14)
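The pyramid stacking of step 4 and formulas (13)-(14) can be sketched as follows. The block factory and k = 3 are illustrative, and nn.Identity is used only as a placeholder for the load semantic mining block sketched above.

```python
# Pyramid stacking: k rounds, each applying a mining block and then MaxPool1d to
# halve the sequence length, with the per-round features spliced along the
# sequence dimension into the multi-scale features of eq. (14).
import torch
import torch.nn as nn

class PyramidFeatureExtractor(nn.Module):
    def __init__(self, make_block=lambda: nn.Identity(), k: int = 3):
        super().__init__()
        self.blocks = nn.ModuleList([make_block() for _ in range(k)])
        self.pool = nn.MaxPool1d(kernel_size=2)   # halves the first (sequence) dimension

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, d) input features F3 -> multi-scale features (batch, N + N/2 + ..., d)
        scales = []
        for block in self.blocks:
            f8 = block(x)                                        # features at the current scale
            scales.append(f8)
            x = self.pool(f8.transpose(1, 2)).transpose(1, 2)    # eq. (13): compress, becomes the new input
        return torch.cat(scales, dim=1)                          # eq. (14): splice the multi-scale features

# usage: F_ms = PyramidFeatureExtractor(k=3)(torch.randn(2, 784, 128))
```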
As shown in fig. 4, the repeated operation corresponds to stacking the load semantic mining blocks of the pyramid network model multiple times, progressively extracting deeper features with richer semantics layer by layer. In fig. 4 the feature dimensions are denoted by N and d, where N is the same as the length of the input byte sequence and d is the same as the dimension to which each byte is expanded after the word embedding operation. $F8_1 \in \mathbb{R}^{N \times d}$ is the feature of the first repetition and, since the first dimension is halved before each further repetition, $F8_k$ is the feature of the k-th operation.
The features $F_{ms}$ obtained at this point are exactly the desired multi-scale features of the payload. Once the multi-scale features have been obtained, traffic classification can be performed:
in this embodiment, the classification process specifically includes:
Step 5.1, the extracted multi-scale features $F_{ms}$ are fed into a fully connected layer and an activation function, and the output dimension is consistent with the number of traffic categories C:

$Z = f\left(F_{ms} W\right)$ (15)

where W denotes the weight matrix of the fully connected layer, f denotes the activation function, and Z is the resulting output, whose dimension equals the number of traffic categories C.
Step 5.2, calculating and outputting the category of the encrypted network application protocol type:
Figure SMS_111
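A minimal sketch of the classification head of step 5 follows. It assumes the multi-scale features are flattened before the fully connected layer and that the activation function is a softmax over the C traffic categories, with the predicted class taken as the argmax of Z; the feature length 1372 and the 12 categories in the usage line are illustrative only.

```python
# Classification head: fully connected layer whose output dimension equals the
# number of traffic categories, softmax activation, and argmax for the category.
import torch
import torch.nn as nn

class ProtocolClassifier(nn.Module):
    def __init__(self, feat_len: int, d: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(feat_len * d, num_classes)   # output dimension = number of traffic categories

    def forward(self, f_ms: torch.Tensor) -> torch.Tensor:
        # f_ms: (batch, feat_len, d) multi-scale features -> Z: (batch, C) class probabilities
        return torch.softmax(self.fc(f_ms.flatten(1)), dim=-1)

# predicted category: the class index with the largest probability
# pred = ProtocolClassifier(1372, 128, 12)(F_ms).argmax(dim=-1)
```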
the embodiment constructs a deep neural network, namely a pyramid neural network, and stacks the load semantic mining blocks, so that deep features in the type of the encryption protocol message in the current complex scene can be extracted, and the accuracy of flow identification is improved.
It should be noted that, in the description of the embodiments of the present invention, unless explicitly specified and limited otherwise, the terms "disposed," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; may be directly connected or indirectly connected through an intermediate medium. The specific meaning of the above terms in the present invention will be understood in detail by those skilled in the art; the accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (9)

1. The encryption application protocol type identification method based on multi-scale load semantic mining is characterized by comprising the following steps:
step 1, preprocessing the original traffic of a mobile application encryption network, extracting the load characteristics of a transmission layer load, and converting the load characteristics into a decimal byte sequence;
step 2, constructing a pyramid neural network based on a load semantic mining block, and acquiring word embedding features and position coding features of a decimal byte sequence, wherein the word embedding features and the position coding features are added to obtain an input feature sequence;
step 3, a load semantic mining block constructs a sliding window on the input feature sequence, the sliding window sequentially moves until the tail end of the input sequence, the features in the sliding window are extracted when each movement is performed, and the features extracted in all the sliding windows are sequentially spliced to obtain the features of the input sequence;
step 4, performing feature compression and dimensionality reduction on the features of the input sequence to serve as a new input sequence, repeating steps 3-4 k times, and splicing the features of the input sequence obtained in each repeated step 3 to obtain the multi-scale features of the input sequence;
step 5, completing classification of the encrypted network application protocol types according to the multi-scale characteristics;
the substep of the step 3 comprises the following steps:
step 3.1, constructing a sliding window with a length of L bytes on an input characteristic sequence;
step 3.2, extracting features of the data in the sliding window by adopting a multi-head attention mechanism to obtain features F4;
step 3.3, carrying out residual connection and layer normalization processing on the input sequence F3 and the characteristic F4 to obtain a characteristic F5;
step 3.4, performing two-layer full-connection layer operation on the feature F5 to obtain a feature F6;
step 3.5, carrying out residual connection and layer normalization processing on the characteristic F5 and the characteristic F6 to obtain a characteristic F7;
step 3.6, the sliding window moves backwards by L bytes, and the steps 3.2-3.6 are repeated until the sliding window moves to the tail end of the input sequence;
and 3.7, splicing the features F7 in all the sliding windows to obtain features F8 serving as features of the input sequence.
2. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 1, wherein the preprocessing process in step 1 is as follows:
step 1.1, dividing a data packet into session flows according to five-tuple;
step 1.2, cleaning the session stream, and removing data packets retransmitted over time, address resolution protocol and dynamic host configuration protocol;
step 1.3, extracting load characteristics of a transmission layer load in a data packet, and splicing the extracted load characteristics according to the arrival sequence of the data packet until the length of bytes after splicing reaches the set load characteristic length;
and step 1.4, converting the extracted spliced load characteristics into a decimal byte sequence.
3. The encryption application protocol type recognition method based on multi-scale payload semantic mining according to claim 2, wherein in the step 1.3, if the byte length after splicing the payload features of all the data packets in the session stream is still smaller than the set payload feature length, padding is performed with 0x00.
4. The encryption application protocol type recognition method based on multi-scale payload semantic mining according to claim 1 or 2, wherein in the step 2, the byte features of the decimal byte sequence are mapped into a d-dimensional vector space to obtain the word embedding features F1, $F1 \in \mathbb{R}^{N \times d}$, where $\mathbb{R}$ denotes the real numbers.
5. The encryption application protocol type recognition method based on multi-scale load semantic mining according to claim 4, wherein in the step 2, the position coding feature calculation method is as follows:
$PE_{(pos,2i)} = \sin\left(pos / 10000^{2i/d}\right)$ (1)

$PE_{(pos,2i+1)} = \cos\left(pos / 10000^{2i/d}\right)$ (2)

$F2 = \left[PE(1); PE(2); \ldots; PE(N)\right]$ (3)

where pos denotes the position at which a byte appears in the byte sequence; the left-hand side of formula (1), $PE_{(pos,2i)}$, is the position encoding of the bytes at even positions and the left-hand side of formula (2), $PE_{(pos,2i+1)}$, is the position encoding of the bytes at odd positions; i indexes the dimensions of the position encoding, the dimension subscript modulo 2 determining whether formula (1) (even, using the sine function) or formula (2) (odd, using the cosine function) applies; d is the dimension of the position encoding; F2 is the position-coding feature, and PE in formula (3) denotes the position encoding of each byte in the byte sequence.
6. The encryption application protocol type identification method based on multi-scale payload semantic mining according to claim 1, wherein the substeps of step 3.2 are:
step 3.2.1, performing multi-head self-attention calculation on the data in the sliding window, and extracting the association relation of byte sequences in the window;
and 3.2.2, repeating the step 3.2.1 M times according to the set number of attention heads M, and splicing the M extracted results and applying a linear transformation to obtain the features F4 of the data in the sliding window.
7. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 1, wherein in the step 4, feature compression and dimension reduction are completed by adopting a one-dimensional maximum pooling layer, and each pooling operation halves the dimension of a first dimension of a feature.
8. The encryption application protocol type recognition method based on multi-scale payload semantic mining according to claim 1, wherein the substep of step 5 includes:
step 5.1, inputting the extracted multi-scale features into a full-connection layer and an activation function, wherein the output dimension is consistent with the number of flow categories;
and 5.2, calculating the category of the encrypted network application protocol type according to the output.
9. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 8, wherein in the step 5.2, the specific calculation method of the category is:
$\mathrm{class} = \arg\max(Z)$

where class represents the category and Z represents the output obtained by feeding the multi-scale features through the fully connected layer and the activation function.
CN202310189712.1A 2023-03-02 2023-03-02 Encryption application protocol type identification method based on multi-scale load semantic mining Active CN115883263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310189712.1A CN115883263B (en) 2023-03-02 2023-03-02 Encryption application protocol type identification method based on multi-scale load semantic mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310189712.1A CN115883263B (en) 2023-03-02 2023-03-02 Encryption application protocol type identification method based on multi-scale load semantic mining

Publications (2)

Publication Number Publication Date
CN115883263A CN115883263A (en) 2023-03-31
CN115883263B true CN115883263B (en) 2023-05-09

Family

ID=85761794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310189712.1A Active CN115883263B (en) 2023-03-02 2023-03-02 Encryption application protocol type identification method based on multi-scale load semantic mining

Country Status (1)

Country Link
CN (1) CN115883263B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104052749A (en) * 2014-06-23 2014-09-17 中国科学技术大学 Method for identifying link-layer protocol data types
CN104506484A (en) * 2014-11-11 2015-04-08 中国电子科技集团公司第三十研究所 Proprietary protocol analysis and identification method
CN105430021A (en) * 2015-12-31 2016-03-23 中国人民解放军国防科学技术大学 Encrypted traffic identification method based on load adjacent probability model
EP3111612A1 (en) * 2014-02-28 2017-01-04 British Telecommunications Public Limited Company Profiling for malicious encrypted network traffic identification
CN110532564A (en) * 2019-08-30 2019-12-03 中国人民解放军陆军工程大学 A kind of application layer protocol online recognition method based on CNN and LSTM mixed model
CN111211948A (en) * 2020-01-15 2020-05-29 太原理工大学 Shodan flow identification method based on load characteristics and statistical characteristics
CN112163594A (en) * 2020-08-28 2021-01-01 南京邮电大学 Network encryption traffic identification method and device
CN112511555A (en) * 2020-12-15 2021-03-16 中国电子科技集团公司第三十研究所 Private encryption protocol message classification method based on sparse representation and convolutional neural network
WO2022094926A1 (en) * 2020-11-06 2022-05-12 中国科学院深圳先进技术研究院 Encrypted traffic identification method, and system, terminal and storage medium
CN115277888A (en) * 2022-09-26 2022-11-01 中国电子科技集团公司第三十研究所 Method and system for analyzing message type of mobile application encryption protocol
CN115348215A (en) * 2022-07-25 2022-11-15 南京信息工程大学 Encrypted network flow classification method based on space-time attention mechanism
CN115348198A (en) * 2022-10-19 2022-11-15 中国电子科技集团公司第三十研究所 Unknown encryption protocol identification and classification method, device and medium based on feature retrieval

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107637041B (en) * 2015-03-17 2020-09-29 英国电讯有限公司 Method and system for identifying malicious encrypted network traffic and computer program element
CN113949653B (en) * 2021-10-18 2023-07-07 中铁二院工程集团有限责任公司 Encryption protocol identification method and system based on deep learning
CN114358118A (en) * 2021-11-29 2022-04-15 南京邮电大学 Multi-task encrypted network traffic classification method based on cross-modal feature fusion

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3111612A1 (en) * 2014-02-28 2017-01-04 British Telecommunications Public Limited Company Profiling for malicious encrypted network traffic identification
CN104052749A (en) * 2014-06-23 2014-09-17 中国科学技术大学 Method for identifying link-layer protocol data types
CN104506484A (en) * 2014-11-11 2015-04-08 中国电子科技集团公司第三十研究所 Proprietary protocol analysis and identification method
CN105430021A (en) * 2015-12-31 2016-03-23 中国人民解放军国防科学技术大学 Encrypted traffic identification method based on load adjacent probability model
CN110532564A (en) * 2019-08-30 2019-12-03 中国人民解放军陆军工程大学 A kind of application layer protocol online recognition method based on CNN and LSTM mixed model
CN111211948A (en) * 2020-01-15 2020-05-29 太原理工大学 Shodan flow identification method based on load characteristics and statistical characteristics
CN112163594A (en) * 2020-08-28 2021-01-01 南京邮电大学 Network encryption traffic identification method and device
WO2022041394A1 (en) * 2020-08-28 2022-03-03 南京邮电大学 Method and apparatus for identifying network encrypted traffic
WO2022094926A1 (en) * 2020-11-06 2022-05-12 中国科学院深圳先进技术研究院 Encrypted traffic identification method, and system, terminal and storage medium
CN112511555A (en) * 2020-12-15 2021-03-16 中国电子科技集团公司第三十研究所 Private encryption protocol message classification method based on sparse representation and convolutional neural network
CN115348215A (en) * 2022-07-25 2022-11-15 南京信息工程大学 Encrypted network flow classification method based on space-time attention mechanism
CN115277888A (en) * 2022-09-26 2022-11-01 中国电子科技集团公司第三十研究所 Method and system for analyzing message type of mobile application encryption protocol
CN115348198A (en) * 2022-10-19 2022-11-15 中国电子科技集团公司第三十研究所 Unknown encryption protocol identification and classification method, device and medium based on feature retrieval

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jinhai Zhang. Research on Key Technology of VPN Protocol Recognition. 2018 IEEE International Conference of Safety Produce Informatization (IICSPI), 2019, pp. 161-164. *
刘帅. Research and Implementation of Encrypted Traffic Identification Based on Machine Learning (in Chinese). China Master's Theses Full-text Database, Information Science and Technology, 2021, I139-28. *

Also Published As

Publication number Publication date
CN115883263A (en) 2023-03-31

Similar Documents

Publication Publication Date Title
CN112163594B (en) Network encryption traffic identification method and device
JP4456554B2 (en) Data compression method and compressed data transmission method
CN104036012B (en) Dictionary learning, vision bag of words feature extracting method and searching system
CN112511555A (en) Private encryption protocol message classification method based on sparse representation and convolutional neural network
CN109818930B (en) Communication text data transmission method based on TCP protocol
CN113179223A (en) Network application identification method and system based on deep learning and serialization features
CN108462707B (en) Mobile application identification method based on deep learning sequence analysis
CN113313156A (en) Internet of things equipment identification method and system based on time sequence load flow fingerprints
EP3716547A1 (en) Data stream recognition method and apparatus
CN113780447A (en) Sensitive data discovery and identification method and system based on flow analysis
CN116192523A (en) Industrial control abnormal flow monitoring method and system based on neural network
CN112887291A (en) I2P traffic identification method and system based on deep learning
CN104463922B (en) A kind of characteristics of image coding and recognition methods based on integrated study
CN115883263B (en) Encryption application protocol type identification method based on multi-scale load semantic mining
CN112383488B (en) Content identification method suitable for encrypted and non-encrypted data streams
CN115248924A (en) Two-dimensional code processing method and device, electronic equipment and storage medium
CN108563795B (en) Pairs method for accelerating matching of regular expressions of compressed flow
CN104767998B (en) A kind of visual signature coding method and device towards video
CN108573069B (en) Twins method for accelerating matching of regular expressions of compressed flow
CN114553790A (en) Multi-mode feature-based small sample learning Internet of things traffic classification method and system
CN113852605A (en) Protocol format automatic inference method and system based on relational reasoning
CN115473850A (en) Real-time data filtering method and system based on AI and storage medium
US20070050489A1 (en) Method to Exchange Objects Between Object-Oriented and Non-Object-Oriented Environments
CN114048799A (en) Zero-day traffic classification method based on statistical information and payload coding
JP4456574B2 (en) Compressed data transmission method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant