CN115277888B

CN115277888B - Method and system for analyzing message type of mobile application encryption protocol

Info

Publication number: CN115277888B
Application number: CN202211171000.9A
Authority: CN
Inventors: 吉庆兵; 罗杰; 潘炜; 倪绿林; 谈程; 康璐
Original assignee: CETC 30 Research Institute
Current assignee: CETC 30 Research Institute
Priority date: 2022-09-26
Filing date: 2022-09-26
Publication date: 2023-01-31
Anticipated expiration: 2042-09-26
Also published as: CN115277888A

Abstract

The invention relates to the technical field of message analysis, and discloses a method and a system for analyzing the type of a mobile application encryption protocol message. The invention solves the problems of high resource consumption, poor universality, low accuracy, poor generalization capability and the like in the prior art.

Description

Method and system for analyzing message type of mobile application encryption protocol

Technical Field

The invention relates to the technical field of message analysis, in particular to a method and a system for analyzing message types of a mobile application encryption protocol.

Background

Network traffic is moving toward full encryption. Encryption protects data transmitted over the network, but malicious behavior such as malware, illegal speech and network attacks also hides in the encrypted traffic of mobile applications and poses a serious threat to Internet users. Analyzing the message types of mobile application encryption protocols is an important precondition for information monitoring, security detection and electronic evidence collection, and is of great significance for maintaining a healthy network environment, national security and social stability.

Traditional port matching and deep packet inspection must first parse the message content and then identify the message type by regular-expression matching, so they fail on encrypted protocol messages. Machine learning methods require hand-crafted features for the messages to be identified, which costs considerable time and effort. Given the many applications and the differences among their encryption protocols, it is difficult to design a feature set that reflects traffic characteristics in general. This limits the universality of machine learning methods, which therefore struggle to achieve good results when analyzing and identifying encrypted network traffic.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a method and a system for analyzing the message type of a mobile application encryption protocol, which solve the problems of high resource consumption, poor universality, low accuracy, poor generalization capability and the like in the prior art.

The technical scheme adopted by the invention for solving the problems is as follows:

A method for analyzing the message type of a mobile application encryption protocol extracts and learns features of different modalities from the messages of the mobile application encryption protocol, and analyzes the type of the encryption protocol message by fusing these features.

As a preferable technical scheme, the method comprises the following steps:

S1, preprocessing message data: preprocessing the acquired raw mobile application network traffic data, and extracting the structural feature data, time sequence feature data and interaction feature data of the message loads in the raw data;

S2, feature learning, which specifically comprises the following steps:

S2A, learning message structure features: using the structural feature data, building a mobile application private encryption protocol message structure feature learning model based on a dynamic pooling convolutional neural network, and learning a message load structure feature vector;

S2B, learning message time sequence features: using the time sequence feature data, building a mobile application private encryption protocol message time sequence feature learning model based on a long short-term memory (LSTM) network, and learning a message load time sequence feature vector;

S2C, learning message interaction features: using the interaction feature data, building a mobile application private encryption protocol message interaction feature learning model based on a graph convolutional neural network, and learning a message session interaction feature vector;

S3, message type analysis: fusing and splicing the message load structure feature vector, the time sequence feature vector and the interaction feature vector, and outputting the analysis result of the mobile application private encryption protocol message type using a maximum entropy classifier.

As a preferred technical solution, the step S1 includes the following steps:

S11, setting the length to which each original network data packet is truncated during preprocessing, segmenting the continuous network traffic into session flows, and separating the network message load data above the transport layer of each data packet in the session flow;

S12, distinguishing the uplink and downlink directions of the message load data: the uplink and downlink directions of the message load data in a session are defined according to the data flow direction; message load data having the same source address, destination address and port number as the first data packet is taken as uplink message load data, and the rest as downlink message load data;

S13, calculating the sizes of the load data in the uplink and downlink directions respectively, and constructing a load data sequence in hexadecimal form;

S14, splicing the uplink and downlink message load data, with uplink data placed before downlink data, to obtain the message load structure feature data;

S15, arranging the data in data packet time order to obtain the message load time sequence feature data;

S16, constructing a sequence-to-graph feature expression model and converting the data packet sequence in the session flow into an undirected graph; for each data packet in the session flow, extracting the packet direction, the standard information entropy of the load data and the load length, and embedding them as graph node features to obtain the message load interaction feature data.

As a preferred technical solution, in step S16, the calculation formula of the standard information entropy is:

$$H(X) = -\sum_{i=1}^{n} p(x_i)\,\log p(x_i)$$

where $H(X)$ denotes the standard information entropy, $X$ denotes a discrete random variable of arbitrary distribution, $n$ denotes the number of discrete values contained in $X$, $i$ denotes the index of a byte in the data packet, $x_i$ denotes that byte, and $p(x_i)$ denotes the probability of occurrence of byte $x_i$ in $X$.
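The entropy feature of a single packet load can be sketched as follows. This is a minimal illustration, assuming the standard information entropy is the Shannon entropy of the byte-value distribution of the load and that the logarithm is base 2; the function name `byte_entropy` is hypothetical and does not come from the patent.

```python
import math
from collections import Counter


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy, in bits, of the byte-value distribution of a load.

    Assumption: base-2 logarithm; the source text does not state the base.
    """
    if not payload:
        return 0.0
    counts = Counter(payload)  # occurrences of each byte value
    total = len(payload)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Well-encrypted loads tend toward the maximum of 8 bits per byte, while plaintext headers or padding score lower, which is presumably why the value is useful as a node feature.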

As a preferred technical solution, the step S2A includes the following steps:

S2A1, inputting the message load structure feature data into an autoencoder with a sparsity constraint and a noise robustness constraint for anti-noise and dimension-reduction processing, and generating the feature vector after dimension-reduction and anti-noise processing;

S2A2, constructing a mobile application private encryption protocol message structure feature learning model based on a dynamic pooling convolutional neural network; inputting the feature vector subjected to the dimension reduction and noise resistance processing into a constructed message structure feature learning model for learning to obtain a feature sequence subjected to convolution kernel operation;

the message structure feature learning model is constructed as follows:

a mobile application private encryption protocol message structure feature learning model is constructed based on a dynamic pooling convolutional neural network, the model being formed by stacking three one-dimensional convolution layers; the padding mode is "same", and each convolution layer is followed by batch normalization; for each convolution layer, the hidden layer output after the one-dimensional convolution is:

[formula not reproduced in the source text]

where the symbols of the formula denote, in order: the row index of the weight matrix of the one-dimensional convolution kernel; the column index of that weight matrix; the weight at the given row and column of the kernel's weight matrix; the shape of the convolution kernel; the value of the input data at the given row and column; the total number of rows of the input data; the total number of columns of the input; the shape of the input; and the value at each position of the output;

after the convolution kernel operation, several feature sequences are obtained for each input, and the feature vector output by the last convolution layer is set as:

$$F = (f_1, f_2, \ldots, f_m)$$

where $f_i$ ($i = 1, \ldots, m$) denotes each element of the feature vector $F$;

S2A3, for the feature vector output by the last convolution layer, taking k-max pooling as the nonlinear down-sampling function, and extracting features with the dynamic pooling operation to obtain the message load structure feature vector;

the dynamic pooling operation is as follows:

[formula not reproduced in the source text]

where the symbols of the formula denote, in order: the message structure features; the total number of convolution layers; the index of the current convolution layer; the length of the input sequence; and the fixed pooling-layer parameter.
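Dynamic k-max pooling can be sketched as follows. Because the pooling formula itself does not survive in the text, the rule used here for k, namely max(k_top, ceil((L - l) / L * s)), is an assumption taken from the usual formulation of dynamic k-max pooling; it uses exactly the four quantities listed above (total layers L, current layer l, input length s, fixed parameter k_top). The function names are hypothetical.

```python
def dynamic_k(layer: int, total_layers: int, seq_len: int, k_top: int) -> int:
    """Number of values kept at a given layer (assumed rule, see lead-in)."""
    numerator = (total_layers - layer) * seq_len
    # Integer ceiling division avoids floating-point rounding surprises.
    scaled = (numerator + total_layers - 1) // total_layers
    return max(k_top, scaled)


def k_max_pooling(values, k):
    """Keep the k largest values while preserving their original order."""
    k = min(k, len(values))
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top)]
```

Unlike ordinary max pooling, the output keeps the relative order of the selected activations, and its length shrinks toward k_top as the layer index approaches the last layer.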

As a preferred technical solution, in step S2A1, the cost function of the sparsity constraint is:

[formula not reproduced in the source text]

where the symbols of the formula denote, in order: the cost function of the sparsity constraint; the input of the autoencoder; the sparsity constraint; the weight of the sparsity constraint; the expectation of the total noise; the number of hidden layers in the autoencoder; Gaussian noise with a mean of 0 and a variance of 1; the input of a given layer of the neural network; the index of a hidden layer unit; the number of hidden layer neurons; and the hidden layer response;

the cost function of the noise robustness constraint is:

[formula not reproduced in the source text]

where the symbols of the formula denote, in order: the cost function of the noise robustness constraint; the target output; the output of the autoencoder learning network; the activation factor; the indices of two input data items; and the connection weight from one of these input data items to the other.

As a preferred technical solution, the step S2B includes the following steps:

S2B1, constructing a mobile application private encryption protocol message load time sequence feature learning model based on a long short-term memory (LSTM) network, wherein the model comprises JI memory units, JI being an integer with 32 ≤ JI ≤ 256, and learning the message load time sequence feature data with the constructed model, wherein the learning formula is:

$$g_t = \sigma\left(W_g \cdot [h_{t-1}, x_t] + b_g\right), \quad g \in \{f, i, o\}$$

where $g_t$ denotes the gate unit function; $f$, $i$ and $o$ denote the forget gate, the input gate and the output gate respectively; $\sigma$ denotes the activation function; $W_g$ denotes the parameters of the corresponding gate; $x_t$ denotes the input at time $t$; $h_{t-1}$ denotes the output at time $t-1$; and $b_g$ denotes the bias of the corresponding gate;

S2B2, obtaining the output as the message load time sequence feature vector, wherein the output formula is:

$$h_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \odot \tanh(c_t)$$

where $h_t$ denotes the message load time sequence feature vector, $\sigma$ denotes the activation function, $c_t$ denotes the cell state vector, $\tanh$ denotes the tanh activation function, $W_o$ denotes the parameters of the output gate, and $b_o$ denotes the bias.
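One time step of the gated learning in steps S2B1 and S2B2 can be sketched in plain Python as follows. This is a minimal illustration of a standard LSTM cell, assuming sigmoid gates over the concatenation of the previous output and the current input; the candidate cell update (key `"c"`) is the usual LSTM form and is not spelled out in the text. All names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps 'f', 'i', 'o', 'c' to (W, b) pairs."""
    z = list(h_prev) + list(x_t)  # concatenation [h_{t-1}, x_t]
    gate = {}
    for name in ("f", "i", "o"):  # forget, input and output gates
        W, b = params[name]
        gate[name] = [sigmoid(a) for a in affine(W, z, b)]
    W_c, b_c = params["c"]
    candidate = [math.tanh(a) for a in affine(W_c, z, b_c)]
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g
           for f, c, i, g in zip(gate["f"], c_prev, gate["i"], candidate)]
    # Output: the output gate applied to tanh of the cell state.
    h_t = [o * math.tanh(c) for o, c in zip(gate["o"], c_t)]
    return h_t, c_t
```

Running `lstm_step` over the packets of a session in time order and keeping the final `h_t` gives a fixed-length time sequence feature vector whose length equals the number of memory units.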

As a preferred technical solution, the step S2C includes the steps of:

S2C1, constructing a mobile application private encryption protocol message session interaction feature learning model based on a graph convolutional neural network, wherein the model comprises two graph convolution layers connected in sequence; the number of channels of each of the two graph convolutions is set for the graph convolution operation, and the ReLU function is selected as the activation function;

inputting the message load interaction feature data into the graph convolutional neural network model, the graph obtained by the sequence-to-graph method being denoted $G$; the number of network data packets (nodes) in the graph is $N$, the number of features of each node is $F$, the feature matrix is $X$, and the adjacency matrix is $A$;

S2C2, performing the graph convolution operation with the learning model constructed in step S2C1, wherein for each layer the graph convolution operation is:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} + b^{(l)}\right), \quad \tilde{A} = A + I$$

where $I$ denotes the identity matrix; $\tilde{D}$ denotes the degree matrix of $\tilde{A}$; $l$ denotes the layer index of the network; $W^{(l)}$ denotes the weight of layer $l$, whose dimension is $F_l \times F_{l+1}$; $F_{l+1}$ denotes the dimension of the graph node data after the convolution of layer $l$; $b^{(l)}$ denotes the bias of layer $l$; $H^{(l)}$ denotes the input of layer $l$, the input of the first layer being $X$; and $\sigma$ denotes the nonlinear ReLU activation function;

S2C3, obtaining an $N \times F_2$ matrix after the two graph convolution layers, and using the Flatten operation to stretch the matrix into a one-dimensional feature vector $V$, obtaining:

$$V = (v_1, v_2, \ldots, v_{N F_2})$$

where $V$ denotes the message session interaction feature vector, whose dimension is $N F_2$, and $v_i$ denotes each element of the message session interaction feature vector;

S2C4, compressing $V$ with one fully connected layer to reduce its dimensionality, and learning the message load session feature vector:

$$S = \sigma(W V + b)$$

where $S$ denotes the message load session feature vector, $W$ denotes the weight matrix of the fully connected layer, $b$ denotes the bias, and $\sigma$ denotes the activation function; the ReLU function is used at the fully connected layer.
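The two graph convolution layers and the Flatten operation of steps S2C2 and S2C3 can be sketched in plain Python as follows. This is a minimal illustration, assuming the widely used symmetric normalization of the adjacency matrix with added self-loops; the toy graph, the feature values, the weights and all function names are hypothetical.

```python
import math


def matmul(A, B):
    """Plain list-of-rows matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def normalize_adjacency(A):
    """Return D^-1/2 (A + I) D^-1/2, with D the degree matrix of A + I."""
    n = len(A)
    A_tilde = [[A[r][c] + (1.0 if r == c else 0.0) for c in range(n)]
               for r in range(n)]
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in A_tilde]
    return [[d_inv_sqrt[r] * A_tilde[r][c] * d_inv_sqrt[c] for c in range(n)]
            for r in range(n)]


def gcn_layer(A_hat, H, W, b):
    """ReLU(A_hat @ H @ W + b), with the bias shared by all nodes."""
    Z = matmul(matmul(A_hat, H), W)
    return [[max(0.0, z + bj) for z, bj in zip(row, b)] for row in Z]


def flatten(M):
    """Stretch a matrix into a one-dimensional vector, row by row."""
    return [v for row in M for v in row]


# Toy session: 3 packets (nodes) chained in time order, 3 features per node.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[0.0, 0.9, 0.5], [1.0, 0.7, 0.2], [0.0, 0.8, 0.4]]
W1 = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]  # 3 features -> 2 channels
W2 = [[0.7, -0.1], [0.2, 0.5]]               # 2 channels -> 2 channels
A_hat = normalize_adjacency(A)
H1 = gcn_layer(A_hat, X, W1, [0.0, 0.0])
H2 = gcn_layer(A_hat, H1, W2, [0.0, 0.0])
V = flatten(H2)  # one-dimensional vector of length 3 * 2
```

The flattened vector `V` would then pass through one fully connected layer with ReLU to give the compressed session feature vector of step S2C4.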

As a preferred technical solution, the step S3 includes the following steps:

S31, performing ensemble learning and joint training on the message structure feature learning model, the message time sequence feature learning model and the message interaction feature learning model, and setting the hyper-parameters for joint training; fusing and splicing the obtained message load structure feature vector, message load time sequence feature vector and message session interaction feature vector, and concatenating them to obtain:

$$Z = [F_{struct};\ F_{time};\ F_{inter}]$$

where $Z$ denotes the message session multimodal fusion feature vector, and $F_{struct}$, $F_{time}$ and $F_{inter}$ denote the message load structure feature vector, the message load time sequence feature vector and the message session interaction feature vector respectively;

S32, calculating through a second fully connected layer and its softmax activation function:

$$y = \mathrm{softmax}(W_2 Z + b_2)$$

where $W_2$ denotes the weight matrix of the second fully connected layer, $b_2$ denotes the bias, and $y$ is a one-dimensional vector whose length $C$ is the number of categories to be classified;

S33, finally calculating and outputting the analysis result $\hat{c}$ of the mobile application private encryption protocol message type:

$$\hat{c} = \arg\max_{k} y_k$$

where $k$ denotes the index of the corresponding category.
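The fusion and classification of steps S31 to S33 can be sketched as follows. This is a minimal illustration, assuming plain concatenation of the three feature vectors followed by a single fully connected layer with softmax (softmax regression is the multinomial form of a maximum entropy classifier); the weights and function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(struct_vec, time_vec, inter_vec, W, b):
    """Concatenate the three feature vectors and return (class index, probabilities)."""
    fused = list(struct_vec) + list(time_vec) + list(inter_vec)
    logits = [sum(w * x for w, x in zip(row, fused)) + bias
              for row, bias in zip(W, b)]
    probs = softmax(logits)
    return probs.index(max(probs)), probs
```

Each row of `W` scores one message type, so the number of rows equals the number of categories and the returned index is the predicted type.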

A mobile application encryption protocol message type analysis system is based on the mobile application encryption protocol message type analysis method and comprises the following modules:

a message data preprocessing module: configured to preprocess the acquired raw mobile application network traffic data and to extract the structural feature data, time sequence feature data and interaction feature data of the message loads in the raw data;

a message structure feature learning module: configured to construct, using the structural feature data, a mobile application private encryption protocol message structure feature learning model based on a dynamic pooling convolutional neural network, and to learn the message load structure feature vector;

a message time sequence feature learning module: configured to construct, using the time sequence feature data, a mobile application private encryption protocol message time sequence feature learning model based on a long short-term memory (LSTM) network, and to learn the message load time sequence feature vector;

a message interaction feature learning module: configured to construct, using the interaction feature data, a mobile application private encryption protocol message interaction feature learning model based on a graph convolutional neural network, and to learn the message session interaction feature vector;

a message type analysis module: configured to fuse and splice the message load structure feature vector, the time sequence feature vector and the interaction feature vector, and to output the analysis result of the mobile application private encryption protocol message type using a maximum entropy classifier;

the input ends of the message structure feature learning module, the message time sequence feature learning module and the message interaction feature learning module are each electrically connected to the output end of the message data preprocessing module, and the output ends of these three learning modules are each electrically connected to the input end of the message type analysis module.

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention can accurately identify the message types of various network mobile application private encryption protocols, thereby improving the supervision efficiency and the supervision strength of network space safety;

(2) The invention learns and classifies on the load data above the transport layer of the network traffic and does not depend on the IP address and port number information in the packet header, so the classification model generalizes well;

(3) The invention carries out data set sampling test in a complex network environment, and the detection result is more in line with the requirement under a real network environment.

Drawings

FIG. 1 is a schematic diagram of the steps of the method for analyzing the message type of a mobile application encryption protocol according to the present invention;

FIG. 2 is a schematic structural diagram of the mobile application encryption protocol message type analysis system according to the present invention;

FIG. 3 is a schematic diagram of the mobile application encryption protocol message type analysis framework based on multi-mode feature fusion learning according to the present invention;

FIG. 4 is a diagram of the process of converting a data packet sequence into a graph for the session features of mobile application private encryption protocol messages;

FIG. 5 is one of exemplary graphs of a mobile application session data sequence to graph conversion result;

FIG. 6 is a second exemplary graph of a mobile application session data sequence to graph conversion result;

FIG. 7 is a third exemplary graph of a mobile application session data sequence to graph conversion result;

FIG. 8 is a fourth exemplary graph of a mobile application session data sequence to graph conversion result;

FIG. 9 is a fifth exemplary diagram of a mobile application session data sequence to graph conversion result;

FIG. 10 is a sixth exemplary graph of a mobile application session data sequence to graph conversion result;

FIG. 11 is a seventh exemplary graph of a mobile application session data sequence to graph conversion result;

FIG. 12 is an eighth exemplary graph of a mobile application session data sequence to graph conversion result;

FIG. 13 is a schematic diagram comparing the accuracy of other classification algorithms and the present invention in analyzing 17 mobile application encryption protocol message types;

FIG. 14 is a schematic diagram comparing the precision of other classification algorithms and the present invention in analyzing 17 mobile application encryption protocol message types;

FIG. 15 is a schematic diagram comparing the recall of other classification algorithms and the present invention in analyzing 17 mobile application encryption protocol message types;

FIG. 16 is a schematic diagram comparing the F1 values of other classification algorithms and the present invention in analyzing 17 mobile application encryption protocol message types.

Detailed Description

The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited to these examples.

Example 1

As shown in fig. 1 to fig. 16, the present invention provides a method for analyzing a message type of a mobile application encryption protocol based on multi-mode feature fusion learning, that is, a method for analyzing a message type of a mobile application encryption protocol, including the following steps:

(1) Preprocessing the acquired mobile application network flow original data, and extracting load structure characteristic data, load time sequence characteristic data and session interaction characteristic data of the mobile application encryption protocol message.

(2) Constructing a mobile application private encryption protocol message structure characteristic learning model based on an autoencoder and a dynamic pooling convolutional neural network, and learning to obtain a message load structure characteristic vector;

(3) Constructing a mobile application private encryption protocol message time sequence characteristic learning model based on a long-time and short-time memory network, and learning to obtain a message load time sequence characteristic vector;

(4) And constructing a mobile application private encryption protocol message interaction feature learning model based on a graph convolution neural network, and learning to obtain a message session interaction feature vector.

(5) And fusing and splicing the load structure characteristic vector, the load time sequence characteristic vector and the session interaction characteristic vector of the mobile application private encryption protocol message, and outputting an analysis result of the type of the mobile application private encryption protocol message by using a maximum entropy classifier.

More specific description of the invention follows:

further, the step (1) specifically comprises the following substeps:

(1.1) preprocessing message load data of an original network data packet, setting the length of the data packet intercepted by preprocessing, segmenting continuous network flow by a session flow, and separating the network message load data above a transmission layer of each data packet in the session flow;

(1.2) distinguishing the uplink and downlink directions of the message load data: the data packets are divided by direction, the direction of the first data packet in the session being defined as the uplink direction; message load data having the same source address, destination address and port number as the first data packet is taken as uplink message load data, and the rest as downlink message load data.

(1.3) calculating the sizes of the load data in the uplink and downlink directions respectively, and constructing the load data sequence in hexadecimal form. The format is as follows:

the uplink message load data is represented as: 00 + hex(uplink load data size);

the downlink message load data is represented as: FF + hex(downlink load data size).

(1.4) splicing the uplink and downlink message load data, with uplink data placed before downlink data, to obtain the message load structure feature data.

(1.5) arranging the data in data packet time order to obtain the message load time sequence feature data.

(1.6) constructing a sequence-to-graph feature expression model and converting the data packet sequence in the session flow into an undirected graph; extracting the packet direction, the information entropy of the load data and the load length of each data packet in the session flow, and embedding them as graph node features to obtain the message load interaction feature data.
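Sub-steps (1.2) to (1.5) can be sketched as follows. This is a minimal illustration under several assumptions that the text does not settle: a packet is modelled by the hypothetical `Packet` tuple, direction is decided by comparing both addresses and both ports with the first packet, each entry consists only of the direction marker followed by the load size, and the size is written as 4 uppercase hexadecimal digits.

```python
from typing import List, NamedTuple, Tuple


class Packet(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int
    payload: bytes  # load above the transport layer


def is_uplink(pkt: Packet, first: Packet) -> bool:
    """A packet is uplink if it travels the same way as the first packet."""
    return (pkt.src, pkt.dst, pkt.sport, pkt.dport) == (
        first.src, first.dst, first.sport, first.dport)


def encode_size(pkt: Packet, uplink: bool, width: int = 4) -> str:
    """'00' (uplink) or 'FF' (downlink) followed by the hex load size."""
    marker = "00" if uplink else "FF"
    return marker + format(len(pkt.payload), "0{}X".format(width))


def build_sequences(session: List[Packet]) -> Tuple[List[str], List[str]]:
    """Return (structure sequence, time sequence) for one session."""
    if not session:
        return [], []
    first = session[0]
    tagged = [(is_uplink(p, first), p) for p in session]
    time_seq = [encode_size(p, up) for up, p in tagged]       # packet time order
    up_seq = [encode_size(p, True) for up, p in tagged if up]
    down_seq = [encode_size(p, False) for up, p in tagged if not up]
    return up_seq + down_seq, time_seq                        # uplink before downlink
```

For a session with loads of 5, 300 and 16 bytes sent client, server, client, the time sequence keeps the arrival order while the structure sequence groups the two uplink entries ahead of the downlink one.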

The sequence-to-graph feature expression model converts the data packet sequence of a session into a graph structure and uses a graph neural network to express the features of the converted data. The conversion process is shown in FIG. 4. First, the transmission direction of each data packet must be distinguished. To this end, the party that sends the first data packet in a session is defined as C and the other party as S; a packet sent from C to S is in the positive direction, represented by 0, and a packet sent from S to C is in the negative direction, represented by 1. The packet transmission process between the two parties can thus be represented by an array A whose elements are 0 or 1, the order of the elements being the order of the data packets in the session. The one-dimensional array A representing the packet directions is then converted into the adjacency matrix M of an undirected graph: the packets are connected in time order to form sequences, and the sequences are then connected end to end to form the graph structure.

With the graph data structure, the one-dimensional sequence of the packet transmission process can be represented in a two-dimensional mesh form. The graph structures of the encryption protocol message session interaction features of several mobile applications are shown in FIGS. 5 to 12.
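The conversion from the direction array A to the adjacency matrix M can be sketched as follows. The text does not fully specify the edge rule, so this sketch adopts one plausible reading, which is an assumption: consecutive packets with the same direction form a sequence (a burst) and are chained in time order, and adjacent sequences are joined head to head and tail to tail. FIG. 4 of the patent may define the edges differently; the function names are hypothetical.

```python
def direction_runs(directions):
    """Split packet indices into runs of consecutive equal directions."""
    runs = []
    for idx, d in enumerate(directions):
        if runs and directions[runs[-1][-1]] == d:
            runs[-1].append(idx)
        else:
            runs.append([idx])
    return runs


def sequence_to_graph(directions):
    """Return the symmetric 0/1 adjacency matrix M of the undirected graph."""
    n = len(directions)
    M = [[0] * n for _ in range(n)]

    def connect(u, v):
        if u != v:  # no self-loops
            M[u][v] = M[v][u] = 1

    runs = direction_runs(directions)
    for run in runs:  # chain the packets inside each run in time order
        for u, v in zip(run, run[1:]):
            connect(u, v)
    for prev, nxt in zip(runs, runs[1:]):  # join adjacent runs at both ends
        connect(prev[0], nxt[0])
        connect(prev[-1], nxt[-1])
    return M
```

For the direction array `[0, 0, 1, 1, 1, 0]` this yields three runs and seven undirected edges, giving the mesh-like shape described above rather than a single chain.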

The features extracted from each data packet are embedded in the graph nodes to express the characteristics of the encrypted network traffic. The transport layer load length and the standard information entropy are calculated, the standard information entropy being:

$$H(X) = -\sum_{i=1}^{n} p(x_i)\,\log p(x_i)$$

where $p(x_i)$ is the probability of occurrence of byte $x_i$ in the load $X$ and $n$ is the number of discrete values contained in $X$.

The transport layer load length and the standard information entropy are then embedded as graph node features. The three values of packet direction, load length and standard information entropy are combined into an array, so the sequence-to-graph feature representation of each session yields a 3 × n matrix and a label.
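Assembling the 3 × n node feature matrix can be sketched as follows. This is a minimal illustration, assuming the row order direction, load length, entropy and a base-2 Shannon entropy over byte values; neither the row order nor the logarithm base is stated in the text, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(payload: bytes) -> float:
    """Shannon entropy (bits) of the byte values; base 2 is an assumption."""
    if not payload:
        return 0.0
    total = len(payload)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(payload).values())


def node_features(directions, payloads):
    """Return a 3 x n matrix: rows are direction, load length and entropy."""
    if len(directions) != len(payloads):
        raise ValueError("one direction flag is required per packet")
    return [
        [float(d) for d in directions],
        [float(len(p)) for p in payloads],
        [entropy(p) for p in payloads],
    ]
```

Column j of the matrix holds the three features of the j-th packet, that is, of the j-th graph node; in practice the load length would likely be scaled before training, which the text does not discuss.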

Further, the step (2) specifically comprises the following sub-steps:

(2.1) inputting the message load structure feature data into an autoencoder with a sparsity constraint and a noise robustness constraint for anti-noise and dimension-reduction processing, so as to improve the anti-interference capability of mobile application encryption protocol message type analysis in a network environment with background traffic. This step not only reduces the per-round training time of the subsequent dynamic pooling convolutional neural network but also extracts features more accurately, and ultimately increases the accuracy of mobile application encryption protocol message type analysis.

A sparsity constraint is set in the hidden layer of the autoencoder, and the noise of background traffic is taken into account at the input. The sparsity constraint cost function of the autoencoder is defined in terms of the autoencoder input, the expectation of the input noise, the sparsity constraint and its weight, the number of hidden layers in the autoencoder, the hidden layer unit index, the number of hidden layer neurons and the hidden layer response:

[formula not reproduced in the source text]

A noise robustness constraint is set in the autoencoder to constrain the connection weight matrix, so as to strengthen the larger weights and weaken the disturbance of the small weights that represent network background traffic noise. The cost function of the noise robustness constraint of the autoencoder is:

[formula not reproduced in the source text]

The message load structure feature data is input into the autoencoder with the sparsity constraint and the noise robustness constraint for unsupervised learning, generating the feature vector after dimension-reduction and anti-noise processing.

(2.2) inputting the feature vector after dimension-reduction and anti-noise processing into the constructed dynamic pooling convolutional neural network for learning. A mobile application private encryption protocol message structure feature learning model is constructed based on a dynamic pooling convolutional neural network; the model is formed by stacking three one-dimensional convolution layers. The padding mode is "same", and each convolution layer is followed by batch normalization.

For each convolution layer, the number of channels c of the convolution operation is set, and the hidden layer output after the one-dimensional convolution is:

[formula not reproduced in the source text]

After the convolution kernel operation, several feature sequences are obtained for each input, and the features output by the last convolution layer are set as:

[formula not reproduced in the source text]

Dropout is added after the convolution operation to prevent overfitting.

(2.3) for the feature vector output by the last layer of convolution, taking k-max _ posing as a nonlinear down-sampling function, and extracting the features by utilizing a nonlinear function dynamic pooling operation, wherein the dynamic pooling operation is as follows:

；

After the pooling operation, the message load structure feature vector is obtained.
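The dynamic pooling of step (2.3) can be sketched as follows. The pooling formula itself is not reproduced in this text, so the sketch assumes the rule commonly used for dynamic k-max pooling, in which the pooling size at a layer is the larger of a fixed parameter and a share of the input length that shrinks with depth. That rule uses the same quantities the claims name (total number of convolution layers, current layer, input sequence length, fixed pooling parameter). The function names and sample values are hypothetical.

```python
def dynamic_k(k_top, total_layers, layer, seq_len):
    """Assumed rule: k = max(k_top, ceil((L - l) / L * s)), computed with integer arithmetic."""
    scaled = ((total_layers - layer) * seq_len + total_layers - 1) // total_layers
    return max(k_top, scaled)


def k_max_pooling(sequence, k):
    """Keep the k largest values of `sequence` in their original order."""
    k = min(k, len(sequence))
    top = sorted(range(len(sequence)), key=lambda i: sequence[i], reverse=True)[:k]
    return [sequence[i] for i in sorted(top)]


# Layer 1 of 3 on a length-12 sequence keeps 8 values; the last layer keeps k_top.
k_first = dynamic_k(k_top=3, total_layers=3, layer=1, seq_len=12)
k_last = dynamic_k(k_top=3, total_layers=3, layer=3, seq_len=12)
pooled = k_max_pooling([1.0, 5.0, 2.0, 8.0, 3.0], 3)
```

Unlike ordinary max pooling, k-max pooling keeps the relative order of the surviving values, which preserves positional information in the load.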

Further, the step (3) specifically includes the following sub-steps:

(3.1) A mobile application private encryption protocol message load time sequence feature learning model is constructed based on a long short-term memory (LSTM) network. The model comprises 64 memory units and learns the message load time sequence feature data.

The mobile application private encryption protocol message load time sequence feature learning model learns through a gating mechanism:

；

The activation function compresses the gating values into the interval [0, 1].

Dropout is added to the learning model to prevent overfitting, with a rate of 0.5.

(3.2) model outputs are:

；

The cell state vector is passed through an activation function and then combined with the output gate to give the output message load time sequence feature vector.
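A minimal sketch of the gating and output steps described in (3.1) and (3.2), for a single memory unit. It assumes the standard LSTM gate form, a sigmoid applied to a weighted sum of the current input and the previous output plus a bias, and the standard output, the output gate multiplied by the tanh of the cell state. The weights and names are hypothetical, and the full cell-state update is omitted.

```python
import math


def sigmoid(z):
    """Squash any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gate(weights_x, weights_h, bias, x_t, h_prev):
    """Generic gate (forget, input or output): sigmoid(W.x_t + U.h_prev + b)."""
    z = sum(w * x for w, x in zip(weights_x, x_t))
    z += sum(u * h for u, h in zip(weights_h, h_prev))
    return sigmoid(z + bias)


def lstm_output(output_gate, cell_state):
    """Output of the unit: output gate times tanh of the cell state."""
    return output_gate * math.tanh(cell_state)


# Hypothetical values for one time step.
o_t = gate([0.4, -0.2], [0.1], 0.0, x_t=[1.0, 2.0], h_prev=[0.5])
h_t = lstm_output(o_t, cell_state=1.0)
```

Because the gate value always lies between 0 and 1, it acts as a soft switch on how much of the cell state reaches the output.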

Further, the step (4) specifically includes the following sub-steps:

(4.1) A mobile application private encryption protocol message session interaction feature learning model is constructed based on a graph convolutional neural network. The model structure comprises two graph convolution operations; the number of channels of each graph convolution is set and the activation function is selected.

The graph is described by the number of network data packets it contains, the number of packet features held by each node, the feature matrix, and the adjacency matrix;

(4.2) Graph convolution is performed with the constructed learning model. In the model, the graph convolution operation of each layer is:

；
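The layer-wise formula is not reproduced in this text. The sketch below assumes the widely used graph convolution form that matches the quantities listed in the claims: self-loops are added to the adjacency matrix, the result is symmetrically normalized by the degree matrix, and each layer applies a weight, a bias and ReLU. All names and the toy graph are illustrative.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj):
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in with_loops]
    return [
        [with_loops[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]


def gcn_layer(adj, features, weight, bias):
    """One graph convolution: propagate, project, add bias, apply ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    projected = matmul(propagated, weight)
    return [[max(0.0, v + b) for v, b in zip(row, bias)] for row in projected]


# Toy graph: two packets (nodes) joined by one edge, one feature per node.
adjacency = [[0.0, 1.0], [1.0, 0.0]]
node_features = [[1.0], [3.0]]
hidden = gcn_layer(adjacency, node_features, [[1.0, -1.0]], [0.0, 0.0])
```

Calling `gcn_layer` twice, feeding the first output in as the second input, corresponds to the two graph convolution operations of the model.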

(4.3) After the two-layer graph convolution operation, a matrix is obtained; a Flatten operation stretches this matrix into a one-dimensional feature vector, obtaining:

；

(4.4) One fully connected layer is used to compress the flattened feature vector and reduce its dimensionality, and the message load session feature vector is obtained by learning:

；

Further, the step (5) specifically comprises the following sub-steps:

(5.1) Ensemble learning and joint training are performed on the message structure feature learning model, the message time sequence feature learning model and the message interaction feature learning model, and the hyper-parameters for joint training are set. The obtained message load structure feature vector, message load time sequence feature vector and message session interaction feature vector are then fused by concatenation, giving:

；

(5.2) The fused vector is passed through the second fully connected layer and its softmax activation function:

；

(5.3) Finally, the message type analysis result of the mobile application private encryption protocol is calculated and output:

；
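Steps (5.1) to (5.3) can be sketched end to end: concatenate the three feature vectors, apply a fully connected layer with softmax, and take the index of the largest probability as the predicted message type. The weights, vector sizes and function names below are hypothetical, and joint training is outside the scope of the sketch.

```python
import math


def fuse(structure_vec, timing_vec, session_vec):
    """Feature fusion by simple concatenation of the three vectors."""
    return list(structure_vec) + list(timing_vec) + list(session_vec)


def dense(vec, weights, bias):
    """Fully connected layer: one weight row and one bias per output class."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(structure_vec, timing_vec, session_vec, weights, bias):
    """Return (index of the most probable class, class probabilities)."""
    probs = softmax(dense(fuse(structure_vec, timing_vec, session_vec), weights, bias))
    return max(range(len(probs)), key=probs.__getitem__), probs


# Two hypothetical classes over a 4-dimensional fused vector.
label, probs = predict([1.0], [2.0, 3.0], [4.0],
                       weights=[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
                       bias=[0.0, 0.0])
```

The softmax layer is what the text calls a maximum entropy classifier: its probabilities sum to one, and the argmax picks the category sequence number.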

The method provided by the invention extracts and learns features of different modalities of mobile application encryption protocol messages from multiple dimensions. It jointly learns the load structure features, load time sequence features and session interaction features of mobile application private encryption protocol messages to construct a mobile application encryption protocol message type analysis model that has strong generalization capability and achieves a good classification effect on encrypted network traffic data sets from different environments.

Example 2

As shown in fig. 1 to fig. 16, as a further optimization of embodiment 1, on the basis of embodiment 1, the present embodiment further includes the following technical features:

In this embodiment, the model framework is shown in fig. 3. First, the acquired raw mobile application network traffic data are preprocessed, and the structure feature data, time sequence feature data and interaction feature data of the message load are extracted. Then a mobile application private encryption protocol message structure feature learning model is constructed based on a dynamic pooling convolutional neural network and learns the message load structure feature vector; a mobile application private encryption protocol message time sequence feature learning model is constructed based on a long short-term memory (LSTM) network and learns the message load time sequence feature vector; and a mobile application private encryption protocol message interaction feature learning model is constructed based on a graph convolutional neural network and learns the message session interaction feature vector. Finally, the message load structure feature vector, the time sequence feature vector and the session interaction feature vector are fused by concatenation, and a maximum entropy classifier outputs the analysis result of the mobile application private encryption protocol message type.

Specifically, the method for analyzing the message type of the mobile application encryption protocol based on the multimode feature fusion learning further includes the following technical features:

(1) Preprocessing the acquired mobile application network flow original data, and extracting the structural feature data, the time sequence feature data and the interactive feature data of the message load.

(1.1) In designing the mobile application encryption protocol message type analysis model and classifier, the classifier must be given effective input in order to improve the efficiency of classification and identification. Whether the traffic comes from a public network traffic data set or is network service traffic collected by researchers, the raw traffic is in pcap format, which cannot be used directly as input to the mobile application encryption protocol message type analysis model, so the data must be preprocessed.
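One part of this preprocessing, as step S11 of the claims states, is to truncate packets to a set length, segment the continuous traffic into session flows, and keep the load above the transport layer. The sketch below illustrates only that grouping step. It assumes the pcap file has already been parsed into dictionaries, that a session is identified by its protocol and its two endpoints regardless of direction, and that the field names and truncation length are free choices; none of these details come from the original.

```python
from collections import defaultdict


def session_key(packet):
    """Direction-independent key: protocol plus the two sorted endpoints."""
    a = (packet["src_ip"], packet["src_port"])
    b = (packet["dst_ip"], packet["dst_port"])
    return (packet["proto"],) + tuple(sorted([a, b]))


def split_sessions(packets, max_len):
    """Group payloads by session, truncating each payload to max_len bytes."""
    sessions = defaultdict(list)
    for packet in packets:
        sessions[session_key(packet)].append(packet["payload"][:max_len])
    return dict(sessions)


# Hypothetical parsed packets: a request, its reply, and an unrelated UDP packet.
packets = [
    {"proto": "TCP", "src_ip": "10.0.0.1", "src_port": 5000,
     "dst_ip": "1.1.1.1", "dst_port": 443, "payload": b"abcdef"},
    {"proto": "TCP", "src_ip": "1.1.1.1", "src_port": 443,
     "dst_ip": "10.0.0.1", "dst_port": 5000, "payload": b"xy"},
    {"proto": "UDP", "src_ip": "10.0.0.1", "src_port": 6000,
     "dst_ip": "8.8.8.8", "dst_port": 53, "payload": b"q"},
]
sessions = split_sessions(packets, max_len=4)
```

The addresses and ports are used here only to group packets into sessions; the features learned later come from the load itself.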

Network mobile applications from five categories with different purposes (audio-visual entertainment, news and information, lifestyle shopping, instant messaging, and tools) are selected, 17 different mobile applications in total. The private encryption protocol message types used by these mobile applications serve as label data, and the applications are run in a public network environment and a campus network environment to collect the corresponding network traffic data. The resulting data set is shown in Table 1.

Table 1 collected mobile application network traffic data set

The features extracted from each data packet are embedded in a graph node to represent the encrypted network traffic features. The transport-layer load length and the standard information entropy are calculated; the standard information entropy is computed as:

；

Here the random variable is, in general, a bit string or character string of a particular length.

The transport-layer load length and the standard information entropy are then embedded as graph node features and associated with each other. The three values of packet direction, load length and standard information entropy are combined into an array. Applying the sequence-to-graph feature representation to each session yields a 3 × n matrix and a label.
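The per-packet features can be sketched as below. Since the entropy formula is not reproduced in this text, the sketch assumes Shannon entropy over byte values, normalized by 8 bits so the result lies in [0, 1]. The encoding of packet direction as +1 or -1 and the function names are also assumptions.

```python
import math
from collections import Counter


def normalized_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, divided by 8 bits (assumed normalization)."""
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)
    entropy = -sum(c / total * math.log2(c / total) for c in counts.values())
    return entropy / 8.0


def session_features(packets):
    """Build the 3 x n matrix: rows are direction, load length and entropy."""
    directions = [direction for direction, _ in packets]
    lengths = [len(payload) for _, payload in packets]
    entropies = [normalized_entropy(payload) for _, payload in packets]
    return [directions, lengths, entropies]


# Hypothetical session: one outgoing packet (+1) and one incoming packet (-1).
matrix = session_features([(1, b"ab"), (-1, b"aaaa")])
```

Well-encrypted loads tend towards an entropy near 1 under this normalization, while repetitive or plaintext loads score lower, which is why entropy is informative even when content is unreadable.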

(2) Constructing a mobile application private encryption protocol message structure characteristic learning model based on a dynamic pooling convolutional neural network, and learning to obtain a message load structure characteristic vector

The specific process of the step is as follows:

(2.1) The message load structure feature data are input into an autoencoder with sparsity and noise robustness constraints for noise-resistance and dimension reduction processing, which improves the interference resistance of mobile application encryption protocol message type analysis in network environments with background traffic. This step not only reduces the per-round training time of the subsequent dynamic pooling convolutional neural network but also extracts features more accurately, and ultimately increases the accuracy of mobile application encryption protocol message type analysis.

Sparsity constraints are set in the hidden layer of the autoencoder, and a noise robustness constraint is set in the autoencoder to constrain the connection weight matrix, so that larger weights are strengthened and the disturbance of small weights, which represent network background traffic noise, is weakened.

(2.2) The feature vector produced by the dimension reduction and noise-resistance processing is input into the constructed dynamic pooling convolutional neural network for learning. The mobile application private encryption protocol message structure feature learning model is built on a dynamic pooling convolutional neural network and consists of three stacked one-dimensional convolution layers. Padding uses the "same" mode, and batch normalization follows each convolution layer. The unit structure of the message load structure feature learning model is listed in Table 2.

Table 2 list of unit structures of message payload structure feature learning model

For each layer of convolution operation, the hidden layer output after one-dimensional convolution is:

；

；

Dropout is added after the convolution operation to prevent overfitting, with a rate of 0.2.

(2.3) For the feature vector output by the last convolution layer, k-max pooling is used as the nonlinear down-sampling function, and features are extracted by a dynamic pooling operation. The dynamic pooling operation is:

；

(3) A mobile application private encryption protocol message time sequence feature learning model is constructed based on a long short-term memory (LSTM) network, and the message load time sequence feature vector is obtained by learning.

The specific process of the step is as follows:

(3.1) A mobile application private encryption protocol message load time sequence feature learning model is constructed based on a long short-term memory (LSTM) network. The model comprises 64 memory units and learns the input traffic features. The unit structure of the message load time sequence feature learning model is listed in Table 3.

Table 3 list of unit structures of message load timing characteristic learning model

；

(3.2) model outputs are:

；

(4) A mobile application private encryption protocol message interaction feature learning model is constructed based on a graph convolutional neural network, and the message session interaction feature vector is obtained by learning.

The specific process of the step is as follows:

(4.1) A mobile application private encryption protocol message session interaction feature learning model is constructed based on a graph convolutional neural network; the unit structure of the model is set as shown in Table 4.

Table 4 list of unit structures of interactive feature learning model for message sessions

The message load interaction feature data are input into the graph convolutional neural network model and converted into a graph by the sequence-to-graph method. The graph is described by the number of network data packets it contains, the number of packet features held by each node, the feature matrix, and the adjacency matrix.

(4.2) Graph convolution is performed with the constructed learning model. In the model, the graph convolution operation of each layer is:

；

(4.3) After the two-layer graph convolution operation, a matrix is obtained; a Flatten operation stretches this matrix into a one-dimensional feature vector, obtaining:

；

(4.4) One fully connected layer is used to compress the flattened feature vector and reduce its dimensionality, and the message load session feature vector is obtained by learning:

；
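Steps (4.3) and (4.4) amount to flattening the node-feature matrix row by row and passing it through one fully connected layer with ReLU, the activation the claims specify for this layer. The sketch below uses hypothetical weights and sizes.

```python
def flatten(matrix):
    """Stretch a matrix (list of rows) into a one-dimensional list, row by row."""
    return [value for row in matrix for value in row]


def dense_relu(vec, weights, bias):
    """Fully connected layer followed by ReLU; one weight row per output unit."""
    return [
        max(0.0, sum(w * v for w, v in zip(row, vec)) + b)
        for row, b in zip(weights, bias)
    ]


# A 2 x 2 graph convolution output compressed to a 2-dimensional session vector.
flat = flatten([[1.0, 2.0], [3.0, 4.0]])
session_vec = dense_relu(flat,
                         weights=[[1.0, 0.0, 0.0, 0.0], [0.0, -1.0, 0.0, 0.0]],
                         bias=[0.5, 0.0])
```

In practice the flattened vector grows with the number of packets in the session, so the fully connected layer is what brings it down to a fixed, compact size for fusion.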

(5) The load structure feature vector, the time sequence feature vector and the session interaction feature vector of the mobile application private encryption protocol message are fused by concatenation, and a maximum entropy classifier outputs the analysis result of the mobile application private encryption protocol message type.

The specific process of the step is as follows:

(5.1) Ensemble learning and joint training are performed on the three models; the hyper-parameter settings for joint training are shown in Table 5.

Table 5 hyper-parameter settings for joint training of the three models

The obtained feature vectors are fused by concatenation; the unit structure of the feature fusion is listed in Table 6.

Table 6 list of unit structures for feature fusion splicing

Concatenation gives:

；

；

(5.3) Finally, the message type analysis result of the mobile application private encryption protocol is calculated and output, namely the sequence number of the category to which the data belongs:

；

wherein the result denotes the sequence number of the category to which the data belongs.

The experiment of this embodiment is performed on the collected data set of 17 network mobile applications. The experimental results are shown in Table 7, which gives the analysis results of the method of this embodiment for the encryption protocol message types of each application's traffic. The data in the table show that: for precision, four applications exceed 99%, namely JD.com, Meituan, iQIYI and Pinduoduo; for recall, four applications exceed 98%, namely Microsoft-Launcher, Sogou Input Method, WeChat and Meituan; for the F1 value, five applications exceed 98%, namely Sogou Input Method, Microsoft-Launcher, JD.com, Meituan and WeChat. The weighted averages of precision, recall and F1 value are 97.29%, 97.26% and 97.27% respectively, and the overall accuracy of the model on the data set reaches 97.26%.

Table 7 message type analysis results of the method of the invention on the data set of 17 network mobile applications
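For readers who want to see how weighted-average metrics of this kind are computed, the sketch below derives per-class precision, recall and F1 from true and predicted labels and weights each class by its share of the samples. The labels are toy values, not data from the experiment, and the function name is hypothetical.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall and F1, plus overall accuracy."""
    total = len(y_true)
    support = Counter(y_true)
    predicted = Counter(y_pred)
    scores = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label, count in support.items():
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        precision = hits / predicted[label] if predicted[label] else 0.0
        recall = hits / count
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = count / total
        scores["precision"] += weight * precision
        scores["recall"] += weight * recall
        scores["f1"] += weight * f1
    scores["accuracy"] = sum(1 for t, p in zip(y_true, y_pred) if t == p) / total
    return scores


# Toy labels only; these are not the application classes of Table 7.
scores = weighted_scores(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

With support weighting, the weighted recall always equals the overall accuracy, which is consistent with both being reported as 97.26% above.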

In this embodiment, comparison experiments against the 2D-CNN, LSTM, GCN and CNN + LSTM models are carried out to verify the effectiveness of the mobile application encryption protocol message type analysis method based on multi-mode feature fusion learning. The overall results of the comparison experiments are shown in figs. 13 to 16.

It should be noted that, for the sake of simplicity, the present embodiment is described as a series of acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, because some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.

The invention can accurately identify the types of private encryption protocol messages of various network mobile applications, improving the efficiency and strength of cyberspace security supervision;

the invention learns and classifies on the basis of the load data above the transport layer in network traffic data and does not depend on the IP address and port number information in packet headers, so the classification model has strong generalization capability;

the invention is tested on data sets sampled in a complex network environment, so the detection results better match the requirements of a real network environment.

It should be noted that, in the present invention, the steps "S2A, message structure feature learning", "S2B, message time sequence feature learning" and "S2C, message interaction feature learning" may be executed in any order, or even simultaneously, so the order in which the embodiments describe these steps should not be taken as limiting the execution order of the three.

As described above, the present invention can be preferably implemented.

All features disclosed in all embodiments in this specification, or all methods or process steps implicitly disclosed, may be combined and/or expanded, or substituted, in any way, except for mutually exclusive features and/or steps.

The foregoing is only a preferred embodiment of the present invention, and the present invention is not limited thereto in any way, and any simple modification, equivalent replacement and improvement made to the above embodiment within the spirit and principle of the present invention still fall within the protection scope of the present invention.

Claims

1. A method for analyzing the message type of a mobile application encryption protocol, characterized in that features of different modalities of mobile application encryption protocol messages are extracted and learned, and the features of the different modalities are fused with one another to analyze the encryption protocol message type;

the method comprises the following steps:

S1, message data preprocessing: preprocessing the acquired raw mobile application network traffic data, and extracting structure feature data, time sequence feature data and interaction feature data of the message loads in the raw data;

S2, feature learning, which specifically comprises the following steps:

S2B, message time sequence feature learning: constructing a mobile application private encryption protocol message time sequence feature learning model based on a long short-term memory (LSTM) network by using the time sequence feature data, and learning to obtain a message load time sequence feature vector;

S3, message type analysis: fusing and concatenating the message load structure feature vector, the time sequence feature vector and the interaction feature vector, and outputting the analysis result of the mobile application private encryption protocol message type by using a maximum entropy classifier;

step S2A includes the steps of:

S2A1, inputting the message load structure feature data into an autoencoder with sparsity constraint conditions and noise robustness constraint conditions for noise-resistance and dimension reduction processing, and generating a feature vector that has undergone dimension reduction and noise-resistance processing;

S2A2, constructing a mobile application private encryption protocol message structure characteristic learning model based on a dynamic pooling convolutional neural network; inputting the feature vector subjected to the dimension reduction and noise resistance processing into a constructed message structure feature learning model for learning to obtain a feature sequence subjected to convolution kernel operation;

the message structure feature learning model is constructed as follows:

；

wherein the terms of the formula denote, respectively: the column index of the weight matrix of the one-dimensional convolution kernel; the weight value in a given row and column of the weight matrix of the one-dimensional convolution kernel; the shape of the convolution kernel; the value of the input data in a given row and column; the total number of rows of the data; the total number of columns of the input; the shape of the input; and the value at a given position of the output;

after the convolution kernel operation, a plurality of characteristic sequences can be obtained for each input data, and the characteristic vector output by the last layer of convolution is set as follows:

；

wherein the terms denote the individual elements of the feature vector;

the dynamic pooling operation is as follows:

；

wherein the terms denote, respectively: the message structure features; the total number of convolution layers; the index of the current convolution layer; the length of the input sequence; and the fixed pooling layer parameter.

2. The method for parsing message type of mobile application encryption protocol according to claim 1, wherein in step S2A1, the cost function of sparsity constraint condition is:

；

wherein the terms denote, respectively: the cost function of the sparsity constraint; the input of the autoencoder; the sparsity constraint; the weight of the sparsity constraint; the expectation of the total noise; the number of hidden layers in the autoencoder; Gaussian noise with a mean of 0 and a variance of 1; the input of a given layer of the neural network; the index of a hidden-layer unit; the number of hidden-layer neurons; and the hidden-layer response;

the cost function of the noise robustness constraint is:

；

wherein the terms denote, respectively: the cost function of the noise robustness constraint; the target output; the output of the autoencoder learning network; the activation factor; the indices of two input data items; and the connection weight from one of these input data items to the other.

3. The method according to claim 2, wherein the step S2B comprises the steps of:

；

wherein the terms denote, respectively: the gate unit function; the forget gate, the input gate and the output gate; the activation function; the input at a given time step; the output at a given time step; and the bias value of the forget gate, the input gate or the output gate;

；

wherein the terms denote, respectively: the message load time sequence feature vector; the activation function; the cell state vector; the tanh activation function; the parameters of the output gate; and the bias.

4. The method according to claim 3, wherein the step S2C comprises the following steps:

wherein the graph is described by the number of network data packets it contains, the number of packet features held by each node, the feature matrix, and the adjacency matrix;

；

wherein the terms denote, respectively: the identity matrix; the corresponding degree matrix; the network layer index; the weight of a given layer, together with its dimensions; the dimension of the graph node data after the convolution of a given layer; the bias of a given layer; the input of a given layer, together with the input of the first layer; and the nonlinear activation function, which is the ReLU function;

S2C3, after the two-layer graph convolution operation, a matrix is obtained, giving:

；

wherein the terms denote, respectively: the message session interaction feature vector; its dimension; and each element of the message session interaction feature vector;

；

wherein the terms denote, respectively: the message load session feature vector; the weight matrix of the fully connected layer; the bias; and the activation function, the ReLU function being used at the fully connected layer.

5. The method for parsing message type of mobile application encryption protocol according to claim 4, wherein the step S3 comprises the steps of:

S31, performing ensemble learning and joint training on the message structure feature learning model, the message time sequence feature learning model and the message interaction feature learning model, and setting the hyper-parameters for joint training; and fusing the obtained message load structure feature vector, message load time sequence feature vector and message session interaction feature vector by concatenation, obtaining:

；

wherein the result denotes the message session multi-mode fusion feature vector;

；

wherein the terms denote, respectively: the weight matrix of the second fully connected layer; the bias; and the output, which is a one-dimensional vector whose length is the number of categories to be classified;

S33, finally calculating and outputting the analysis result of the mobile application private encryption protocol message type:

；

wherein the result denotes the sequence number of the category to which the data belongs.

6. The method for parsing message type of mobile application encryption protocol according to any of claims 1 to 5, wherein the step S1 comprises the steps of:

S11, setting the length to which raw network data packets are truncated during preprocessing, segmenting the continuous network traffic into session flows, and separating the network message load data above the transport layer of each data packet in the session flow;

7. The method according to claim 6, wherein in step S16, the standard entropy is calculated as:

；

wherein the terms denote, respectively: the standard information entropy; a discrete random variable with an arbitrary distribution; the number of discrete values contained in that variable; the sequence number of a byte in the data packet; a byte in the data packet; and the probability with which that byte occurs.

8. A mobile application encryption protocol message type analysis system, based on the mobile application encryption protocol message type analysis method of any one of claims 1 to 7, comprising the following modules:

a message time sequence feature learning module: used for constructing a mobile application private encryption protocol message time sequence feature learning model based on a long short-term memory (LSTM) network by using the time sequence feature data, and learning to obtain a message load time sequence feature vector;

a message interaction feature learning module: used for constructing a mobile application private encryption protocol message interaction feature learning model based on a graph convolutional neural network by using the interaction feature data, and learning to obtain a message session interaction feature vector;

a message type analysis module: used for fusing and concatenating the message load structure feature vector, the time sequence feature vector and the interaction feature vector, and outputting the analysis result of the mobile application private encryption protocol message type by using a maximum entropy classifier;

the input ends of the message structure feature learning module, the message time sequence feature learning module and the message interaction feature learning module are each electrically connected with the output end of the message data preprocessing module, and the output ends of the message structure feature learning module, the message time sequence feature learning module and the message interaction feature learning module are each electrically connected with the input end of the message type analysis module;

the message structure characteristic learning module executes the following steps when working:

the message structure characteristic learning model is constructed as follows:

；

wherein the terms of the formula denote, respectively: the column index of the weight matrix of the one-dimensional convolution kernel; the weight value in a given row and column of the weight matrix of the one-dimensional convolution kernel; the shape of the convolution kernel; the value of the input data in a given row and column; the total number of rows of the data; the total number of columns of the input; the shape of the input; and the value at a given position of the output;

；

wherein the terms denote the individual elements of the feature vector;

S2A3, for the feature vector output by the last convolution layer, using k-max pooling as the nonlinear down-sampling function, and extracting features by a dynamic pooling operation to obtain the message load structure feature vector;

the dynamic pooling operation is:

；

wherein the terms denote, respectively: the message structure features; the total number of convolution layers; the index of the current convolution layer; the length of the input sequence; and the fixed pooling layer parameter.