CN115277888A

CN115277888A - Method and system for analyzing message type of mobile application encryption protocol

Info

Publication number: CN115277888A
Application number: CN202211171000.9A
Authority: CN
Inventors: 吉庆兵; 罗杰; 潘炜; 倪绿林; 谈程; 康璐
Original assignee: CETC 30 Research Institute
Current assignee: CETC 30 Research Institute
Priority date: 2022-09-26
Filing date: 2022-09-26
Publication date: 2022-11-01
Anticipated expiration: 2042-09-26
Also published as: CN115277888B

Abstract

The invention relates to the technical field of message analysis and discloses a method and a system for analyzing the message type of a mobile application encryption protocol. It addresses the high resource consumption, poor universality, low accuracy and poor generalization capability of prior-art approaches.

Description

Method and system for analyzing message type of mobile application encryption protocol

Technical Field

The invention relates to the technical field of message analysis, in particular to a method and a system for analyzing message types of a mobile application encryption protocol.

Background

Network traffic is rapidly moving toward full encryption. Encryption secures data in transit, but malicious software, illegal content, network attacks and other malicious behavior can also hide inside the encrypted traffic of mobile applications, posing serious threats to Internet users. Identifying the message types of encrypted mobile application traffic is therefore an important precondition for information monitoring, security detection and electronic evidence collection, and is of great significance for maintaining a healthy network environment, national security and social stability.

Traditional port matching and deep packet inspection must first parse the message content and then identify the message type through regular-expression matching, so they fail on encrypted protocol messages. Machine learning methods require hand-crafted features for the messages to be identified, which takes considerable time and effort. Given the many applications and the differences between their encryption protocols, it is difficult to design a feature set that reflects traffic characteristics in general; this limits the universality of machine learning methods and makes good results hard to obtain when they are used to analyze and identify encrypted network traffic.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention provides a method and a system for analyzing the message type of a mobile application encryption protocol that address the high resource consumption, poor universality, low accuracy and poor generalization capability of the prior art.

The technical scheme adopted by the invention for solving the problems is as follows:

A method for analyzing the message type of a mobile application encryption protocol extracts and learns features of different modalities of the encryption protocol messages and parses the message type by fusing these multimodal features.

As a preferable technical scheme, the method comprises the following steps:

S1, message data preprocessing: preprocess the acquired raw network traffic of mobile applications and extract the structural feature data, time-sequence feature data and interaction feature data of the message payloads;

S2, feature learning, comprising the following steps:

S2A, message structure feature learning: using the structural feature data, construct a structure feature learning model for mobile application private encryption protocol messages based on a dynamic pooling convolutional neural network, and learn the message payload structure feature vector;

S2B, message time-sequence feature learning: using the time-sequence feature data, construct a time-sequence feature learning model for mobile application private encryption protocol messages based on a long short-term memory (LSTM) network, and learn the message payload time-sequence feature vector;

S2C, message interaction feature learning: using the interaction feature data, construct an interaction feature learning model for mobile application private encryption protocol messages based on a graph convolutional network, and learn the message session interaction feature vector;

S3, message type analysis: fuse and concatenate the message payload structure feature vector, time-sequence feature vector and interaction feature vector, and output the parsed message type of the mobile application private encryption protocol using a maximum entropy classifier.

As a preferred technical solution, the step S1 includes the following steps:

S11, set the truncation length of the raw network packets captured during preprocessing, segment the continuous network traffic into session flows, and separate the message payload data above the transport layer of each packet in the session flow;

S12, distinguish the uplink and downlink directions of the payload data: define the direction of the payload data in each packet of the session according to the data flow direction; payload data having the same source address, destination address and port numbers as the first packet is uplink payload data, and the rest is downlink payload data;

S13, calculate the sizes of the uplink and downlink payload data separately and construct a payload data sequence in hexadecimal form;

S14, concatenate the uplink and downlink payload data, uplink data first and downlink data after, to obtain the message payload structural feature data;

S15, arrange the data in packet time order to obtain the message payload time-sequence feature data;

S16, construct a sequence-to-graph feature representation model and convert the packet sequence in the session flow into an undirected graph; for each packet in the session flow, extract the packet direction, the standard information entropy of the payload and the payload length, and embed them as graph node features to obtain the message payload interaction feature data.
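Steps S11 and S12 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Packet` record and its field names, the `max_len` truncation default of 1500 bytes, and the use of the two (address, port) endpoints as a direction-independent session key (ignoring the transport protocol) are all assumptions. Each packet is labelled 0 (uplink) when its source address, destination address and ports match those of the session's first packet, and 1 (downlink) otherwise.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical captured-packet record (field names are assumptions)."""
    ts: float        # capture timestamp
    src: str         # source IP address
    dst: str         # destination IP address
    sport: int       # source port
    dport: int       # destination port
    payload: bytes   # data above the transport-layer header


def split_sessions(packets, max_len=1500):
    """Group packets into bidirectional sessions and label directions (S11, S12).

    Returns one list per session of (direction, payload) pairs in time order,
    where direction is 0 (uplink) or 1 (downlink) and each payload is
    truncated to max_len bytes.
    """
    sessions = {}  # dicts keep insertion order, so sessions stay in arrival order
    for pkt in sorted(packets, key=lambda p: p.ts):
        # Direction-independent key: both directions of a flow share it.
        key = frozenset([(pkt.src, pkt.sport), (pkt.dst, pkt.dport)])
        sessions.setdefault(key, []).append(pkt)
    result = []
    for pkts in sessions.values():
        first = pkts[0]
        ref = (first.src, first.dst, first.sport, first.dport)
        result.append([
            (0 if (p.src, p.dst, p.sport, p.dport) == ref else 1, p.payload[:max_len])
            for p in pkts
        ])
    return result
```

Keying sessions on the unordered endpoint pair is what lets the reply packets of a flow fall into the same session as the request packets.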

As a preferred technical solution, in step S16, the standard information entropy is calculated as:

$$H(X) = -\sum_{i=1}^{N} p(x_i)\log p(x_i)$$

where $H(X)$ denotes the standard information entropy, $X$ denotes a discrete random variable with an arbitrary distribution, $N$ denotes the number of discrete variables contained in $X$, $i$ denotes the sequence number of a byte in the packet, $x_i$ denotes a byte in the packet, and $p(x_i)$ denotes the probability of byte $x_i$ occurring in $X$.
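A minimal Python sketch of the entropy calculation for one packet payload, assuming $p(x_i)$ is estimated as the relative frequency of each byte value in the payload and that the logarithm is base 2 (the text does not fix the base). The function name `byte_entropy` is hypothetical.

```python
import math
from collections import Counter


def byte_entropy(payload: bytes) -> float:
    """Entropy of a payload in bits per byte: -sum p(x_i) * log2 p(x_i).

    p(x_i) is the relative frequency of byte value x_i in the payload;
    an empty payload is assigned entropy 0.
    """
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)  # iterating bytes yields int byte values
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

With base 2 the value lies between 0 (a single repeated byte) and 8 (all 256 byte values equally frequent), which is why well-encrypted payloads tend toward the upper end.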

As a preferred technical solution, the step S2A includes the following steps:

S2A1, input the message payload structural feature data into an autoencoder with a sparsity constraint and a noise robustness constraint for noise suppression and dimensionality reduction, generating a denoised, reduced-dimension feature vector;

S2A2, construct a structure feature learning model for mobile application private encryption protocol messages based on a dynamic pooling convolutional neural network, and input the denoised, reduced-dimension feature vector into the model to obtain the feature sequences produced by the convolution kernels;

the message structure characteristic learning model is constructed as follows:

the structure feature learning model, based on a dynamic pooling convolutional neural network, is formed by stacking three one-dimensional convolutional layers; 'same' padding is used, and each convolutional layer is followed by batch normalization. For each convolutional layer, the hidden-layer output after one-dimensional convolution is:

$$y_{p,q} = \sum_{i=1}^{m}\sum_{j=1}^{n} w_{i,j}\, x_{p+i-1,\,q+j-1}$$

where $i$ and $j$ denote the row and column numbers of the weight matrix of the one-dimensional convolution kernel, $w_{i,j}$ denotes the weight in row $i$, column $j$ of that matrix, $m \times n$ denotes the shape of the convolution kernel, $x_{i,j}$ denotes the value of the input data in row $i$, column $j$, $H$ denotes the total number of rows of the input, $W$ denotes the total number of columns of the input, $H \times W$ denotes the shape of the input, and $y_{p,q}$ denotes the output value at each position $(p, q)$;

after the convolution kernel operations, several feature sequences are obtained for each input, and the feature vector output by the last convolutional layer is denoted:

$$C = [c_1, c_2, \ldots, c_T]$$

where $c_j$ ($j = 1, \ldots, T$) denotes each element of the feature vector $C$;
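The per-layer operation can be sketched in plain Python as a one-dimensional convolution with 'same' zero padding. This is an illustrative sketch under stated assumptions, not the patented model: the input is taken as $H$ rows (sequence positions) by $W$ columns (channels), each kernel spans all $W$ columns as in common deep-learning `Conv1D` layers, and, as in those layers, the operation is a cross-correlation. Batch normalization and the activation are omitted, and `conv1d_same` is a hypothetical name.

```python
def conv1d_same(x, kernels, biases=None):
    """One-dimensional convolution with 'same' zero padding.

    x: input of H rows, each a list of W column values (channels).
    kernels: list of kernels, each an m x W weight matrix w[i][j].
    Returns an H x len(kernels) output where
    y[p][c] = b_c + sum_i sum_j w_c[i][j] * x[p + i - pad][j].
    """
    H, W = len(x), len(x[0])
    biases = biases or [0.0] * len(kernels)
    out = []
    for p in range(H):
        row = []
        for w, b in zip(kernels, biases):
            m = len(w)
            pad = (m - 1) // 2  # left padding; output length stays H
            acc = b
            for i in range(m):
                r = p + i - pad
                if 0 <= r < H:  # rows outside the input act as zero padding
                    acc += sum(w[i][j] * x[r][j] for j in range(W))
            row.append(acc)
        out.append(row)
    return out
```

Splitting the padding as `(m - 1) // 2` on the left and the remainder on the right keeps the output length equal to the input length, which is what 'same' padding means here.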

S2A3, for the feature vector output by the last convolutional layer, use k-max pooling as the nonlinear down-sampling function and extract features with a dynamic pooling operation to obtain the message payload structure feature vector;

the dynamic pooling operation is:

$$v_s = \operatorname{kmax}(C, k_l), \qquad k_l = \max\left(k_{top}, \left\lceil \frac{L - l}{L}\, s \right\rceil\right)$$

where $v_s$ denotes the message structure features, $\operatorname{kmax}(C, k)$ keeps the $k$ largest values of $C$ in their original order, $L$ denotes the total number of convolutional layers, $l$ denotes the index of the current convolutional layer, $s$ denotes the length of the input sequence, and $k_{top}$ denotes the fixed pooling parameter of the top layer.
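The dynamic k-max pooling step can be sketched as below: `dynamic_k` evaluates $k_l = \max(k_{top}, \lceil (L-l)s/L \rceil)$ with integer arithmetic, and `k_max_pooling` keeps the $k$ largest values of one feature sequence while preserving their order. Both names are hypothetical, and applying the pooling independently to each channel is an assumption.

```python
def dynamic_k(l, L, s, k_top):
    """k for layer l of L: max(k_top, ceil((L - l) / L * s)), computed exactly."""
    return max(k_top, ((L - l) * s + L - 1) // L)


def k_max_pooling(seq, k):
    """Keep the k largest values of seq in their original relative order."""
    if k >= len(seq):
        return list(seq)
    top = sorted(range(len(seq)), key=lambda i: seq[i], reverse=True)[:k]
    return [seq[i] for i in sorted(top)]
```

For example, with three convolutional layers, an input length of 10 and $k_{top} = 3$, the layers keep 7, 4 and 3 values respectively; the top layer always falls back to $k_{top}$.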

As a preferred technical solution, in step S2A1, the cost function of the sparsity constraint is defined in terms of the following quantities: the input of the autoencoder; the sparsity constraint; the weight of the sparsity constraint; the expectation of the total noise; the number of hidden layers in the autoencoder; Gaussian noise with mean 0 and variance 1; the input of each layer of the neural network; the index of a hidden-layer unit; the number of neurons in the hidden layer; and the hidden-layer response;

the cost function of the noise robustness constraint is defined in terms of: the target output; the output of the autoencoder learning network; the activation factor; the indices $i$ and $j$ of two input data; and the connection weight from input $i$ to input $j$.

As a preferred technical solution, the step S2B includes the following steps:

S2B1, construct a payload time-sequence feature learning model for mobile application private encryption protocol messages based on a long short-term memory network; the model contains JI memory units, where JI is an integer and 32 ≤ JI ≤ 256. The model learns the message payload time-sequence feature data as follows:

$$g_t = \sigma\left(W_g \cdot [h_{t-1}, x_t] + b_g\right), \qquad g \in \{f, i, o\}$$

where $g_t$ denotes the gate unit function, $f$, $i$ and $o$ denote the forget gate, input gate and output gate respectively, $\sigma$ denotes the activation function, $W_g$ denotes the parameters of the forget, input or output gate, $x_t$ denotes the input at time $t$, $h_{t-1}$ denotes the output at time $t-1$, and $b_g$ denotes the bias of the forget, input or output gate;

S2B2, obtain the output message payload time-sequence feature vector:

$$h_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \odot \tanh(C_t)$$

where $h_t$ denotes the message payload time-sequence feature vector, $\sigma$ denotes the activation function, $C_t$ denotes the cell state vector, $\tanh$ denotes the tanh activation function, $W_o$ denotes the parameters of the output gate, and $b_o$ denotes its bias.
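A plain-Python sketch of one LSTM step makes the gate and output equations concrete. It assumes the standard LSTM formulation: the candidate state $\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)$ and the update $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{c}_t$ are not written out in the text above. The `params` layout and the choice of the last hidden state as the sequence's feature vector are likewise assumptions, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W v + b for a weight matrix W (list of rows) and bias list b."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps 'f', 'i', 'o', 'c' to (W, b) pairs.

    Each W has shape hidden x (hidden + input) and acts on [h_{t-1}, x_t].
    """
    z = list(h_prev) + list(x_t)                          # [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(*params["f"], z)]     # forget gate
    i = [sigmoid(a) for a in affine(*params["i"], z)]     # input gate
    o = [sigmoid(a) for a in affine(*params["o"], z)]     # output gate
    c_hat = [math.tanh(a) for a in affine(*params["c"], z)]  # candidate state
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_hat)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]  # h_t = o_t * tanh(C_t)
    return h_t, c_t


def lstm_encode(sequence, params, hidden):
    """Run the cell over a sequence and return the last hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, params)
    return h
```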

As a preferred technical solution, the step S2C includes the steps of:

S2C1, construct a session interaction feature learning model for mobile application private encryption protocol messages based on a graph convolutional network; the model contains two sequentially connected graph convolutional layers, the number of channels of each graph convolution is set, and ReLU is selected as the activation function;

the message payload interaction feature data are input into the graph convolutional network model, and the graph produced by the sequence-to-graph method is denoted $G = (V, E)$, where the graph contains $N$ network packets (nodes), each node contains $F$ packet features, the feature matrix is $X \in \mathbb{R}^{N \times F}$, and the adjacency matrix is $A \in \mathbb{R}^{N \times N}$;

S2C2, perform the graph convolution operation with the model constructed in step S2C1; each graph convolutional layer computes:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} + b^{(l)}\right)$$

where $\tilde{A} = A + I_N$, $I_N$ denotes the identity matrix, $\tilde{D}$ denotes the degree matrix of $\tilde{A}$, $l$ denotes the network layer index, $W^{(l)}$ denotes the weight of layer $l$ with dimension $d_l \times d_{l+1}$, $d_l$ denotes the dimension of the graph node data after $l$ layers of convolution, $b^{(l)}$ denotes the bias of layer $l$, $H^{(l)}$ denotes the input of layer $l$ (the input of the first layer is $H^{(0)} = X$), and $\sigma$ denotes the nonlinear activation function ReLU;
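The layer equation can be checked with a small dense-matrix sketch in plain Python. It implements exactly the symmetric normalization $\tilde{D}^{-1/2}(A + I_N)\tilde{D}^{-1/2}$ followed by $\mathrm{ReLU}(\hat{A} H W + b)$; the function names are hypothetical, the adjacency matrix is assumed non-negative, and a real model would use sparse tensors and learned weights.

```python
import math


def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense, non-negative adjacency matrix A."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Self-loops guarantee every degree is at least 1, so the root is safe.
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in A_tilde]
    return [[d_inv_sqrt[i] * A_tilde[i][j] * d_inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def gcn_layer(A_hat, H, W, b):
    """H^{(l+1)} = ReLU(A_hat H W + b), with the bias b added to every node."""
    Z = matmul(matmul(A_hat, H), W)
    return [[max(0.0, z + bj) for z, bj in zip(row, b)] for row in Z]
```

Applying `gcn_layer` twice with the same `A_hat` corresponds to the two graph convolutional layers of step S2C1.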

S2C3, the two graph convolutional layers yield an $N \times d_2$ matrix $H^{(2)}$, which a Flatten operation stretches into a one-dimensional feature vector $z$:

$$z = \operatorname{Flatten}(H^{(2)}) = [z_1, z_2, \ldots, z_{N d_2}]$$

where $z$ denotes the message session interaction feature vector with dimension $N d_2$, and $z_k$ denotes each element of it;

S2C4, compress $z$ with one fully connected layer to reduce its dimensionality and learn the message payload session feature vector:

$$v_g = \sigma\left(W_{fc}\, z + b_{fc}\right)$$

where $v_g$ denotes the message payload session feature vector, $W_{fc}$ denotes the weight matrix of the fully connected layer, $b_{fc}$ denotes the bias, and $\sigma$ denotes the activation function; ReLU is used in the fully connected layer.
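Steps S2C3 and S2C4 amount to a row-major flatten followed by one dense ReLU layer. A minimal sketch, with hypothetical function names and caller-supplied weights:

```python
def flatten(H):
    """Row-major flatten of an N x d matrix into a vector of length N * d."""
    return [v for row in H for v in row]


def dense_relu(W, b, z):
    """v = ReLU(W z + b) for a weight matrix W (list of rows) and bias list b."""
    return [max(0.0, sum(w * x for w, x in zip(row, z)) + bi) for row, bi in zip(W, b)]
```

Because the flattened length is $N d_2$, this design assumes every session graph has the same number of nodes $N$ (for example by truncating or padding sessions), otherwise $W_{fc}$ would not have a fixed shape.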

As a preferred technical solution, the step S3 includes the following steps:

S31, perform ensemble learning and joint training of the message structure, time-sequence and interaction feature learning models, setting the hyperparameters for joint training; fuse the resulting message payload structure feature vector, payload time-sequence feature vector and session interaction feature vector by concatenation:

$$v = [v_s, h_t, v_g]$$

where $v$ denotes the multimodal fusion feature vector of the message session, and $v_s$, $h_t$ and $v_g$ denote the structure, time-sequence and session interaction feature vectors respectively;

S32, compute the output of a second fully connected layer with a softmax activation:

$$y = \operatorname{softmax}\left(W_2\, v + b_2\right)$$

where $W_2$ denotes the weight matrix of the second fully connected layer, $b_2$ denotes the bias, and $y$ is a one-dimensional vector whose length $K$ equals the number of classes to be distinguished;

S33, finally compute and output the message type parsing result $\hat{c}$ for the mobile application private encryption protocol:

$$\hat{c} = \arg\max_{k \in \{1, \ldots, K\}} y_k$$

where $y_k$ is the $k$-th element of $y$, $K$ is the number of classes, and $\hat{c}$ denotes the sequence number of the predicted category.
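The fusion and classification steps S31 to S33 can be sketched as concatenation, an affine layer, a numerically stable softmax and an argmax. The names `classify`, `v_s`, `h_t`, `v_g`, `W2` and `b2` are illustrative, the weights would come from joint training in practice, and the returned class index is 0-based rather than 1-based.

```python
import math


def classify(v_s, h_t, v_g, W2, b2):
    """Return (class probabilities y, predicted class index) for one session."""
    v = list(v_s) + list(h_t) + list(v_g)  # fusion by concatenation
    logits = [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(W2, b2)]
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    y = [e / total for e in exps]
    c_hat = max(range(len(y)), key=y.__getitem__)  # argmax over classes
    return y, c_hat
```

A softmax layer trained with cross-entropy is a multinomial logistic regression, which is why it can serve as the maximum entropy classifier named in step S3.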

A mobile application encryption protocol message type analysis system, based on the above message type analysis method, comprises the following modules:

a message data preprocessing module, used to preprocess the acquired raw network traffic of mobile applications and extract the structural feature data, time-sequence feature data and interaction feature data of the message payloads;

a message structure feature learning module, used to construct, from the structural feature data, a structure feature learning model for mobile application private encryption protocol messages based on a dynamic pooling convolutional neural network, and to learn the message payload structure feature vector;

a message time-sequence feature learning module, used to construct, from the time-sequence feature data, a time-sequence feature learning model for mobile application private encryption protocol messages based on a long short-term memory network, and to learn the message payload time-sequence feature vector;

a message interaction feature learning module, used to construct, from the interaction feature data, an interaction feature learning model for mobile application private encryption protocol messages based on a graph convolutional network, and to learn the message session interaction feature vector;

a message type analysis module, used to fuse and concatenate the message payload structure feature vector, time-sequence feature vector and interaction feature vector, and to output the parsed message type of the mobile application private encryption protocol using a maximum entropy classifier;

the input ends of the message structure feature learning module, the message time-sequence feature learning module and the message interaction feature learning module are each electrically connected to the output end of the message data preprocessing module, and the output ends of these three learning modules are each electrically connected to the input end of the message type analysis module.

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention can accurately identify the message types of various private encryption protocols used by network mobile applications, improving the efficiency and strength of cyberspace security supervision;

(2) The invention learns and classifies based on the payload data above the transport layer in network traffic and does not depend on the IP addresses or port numbers in packet headers, so the classification model generalizes well;

(3) The invention was tested on datasets sampled in a complex network environment, so the detection results better reflect requirements in a real network environment.

Drawings

FIG. 1 is a schematic diagram of the steps of the method for analyzing the message type of a mobile application encryption protocol according to the present invention;

FIG. 2 is a schematic structural diagram of the mobile application encryption protocol message type analysis system according to the present invention;

FIG. 3 is a schematic diagram of the mobile application encryption protocol message type analysis framework based on multimodal feature fusion learning according to the present invention;

FIG. 4 illustrates the packet-sequence-to-graph conversion process for the session features of mobile application private encryption protocol messages;

FIG. 5 is a first exemplary graph of a mobile application session data sequence to graph conversion result;

FIG. 6 is a second exemplary graph of a mobile application session data sequence to graph conversion result;

FIG. 7 is a third exemplary graph of a mobile application session data sequence to graph conversion result;

FIG. 8 is a fourth exemplary graph of a mobile application session data sequence to graph conversion result;

FIG. 9 is a fifth exemplary graph of a mobile application session data sequence to graph conversion result;

FIG. 10 is a sixth exemplary graph of a mobile application session data sequence to graph conversion result;

FIG. 11 is a seventh exemplary graph of a mobile application session data sequence to graph conversion result;

FIG. 12 is an eighth exemplary graph of a mobile application session data sequence to graph conversion result;

FIG. 13 is a schematic comparison of the accuracy of other classification algorithms and the present invention in parsing 17 types of mobile application encryption protocol messages;

FIG. 14 is a schematic comparison of the precision of other classification algorithms and the present invention in parsing 17 types of mobile application encryption protocol messages;

FIG. 15 is a schematic comparison of the recall of other classification algorithms and the present invention in parsing 17 types of mobile application encryption protocol messages;

FIG. 16 is a schematic comparison of the F1 score of other classification algorithms and the present invention in parsing 17 types of mobile application encryption protocol messages.

Detailed Description

The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited to these examples.

Example 1

As shown in FIGS. 1 to 16, the present invention provides a mobile application encryption protocol message type analysis method based on multimodal feature fusion learning, comprising the following steps:

(1) Preprocess the acquired raw network traffic of mobile applications and extract the payload structure feature data, payload time-sequence feature data and session interaction feature data of the mobile application encryption protocol messages.

(2) Construct a structure feature learning model for mobile application private encryption protocol messages based on an autoencoder and a dynamic pooling convolutional neural network, and learn the message payload structure feature vector.

(3) Construct a time-sequence feature learning model for mobile application private encryption protocol messages based on a long short-term memory network, and learn the message payload time-sequence feature vector.

(4) Construct an interaction feature learning model for mobile application private encryption protocol messages based on a graph convolutional network, and learn the message session interaction feature vector.

(5) Fuse and concatenate the payload structure feature vector, payload time-sequence feature vector and session interaction feature vector of the mobile application private encryption protocol messages, and output the parsed message type using a maximum entropy classifier.

More specific description of the invention follows:

Further, step (1) comprises the following sub-steps:

(1.1) Preprocess the message payload data of the raw network packets: set the truncation length of the captured packets, segment the continuous network traffic into session flows, and separate the message payload data above the transport layer of each packet in the session flow;

(1.2) Distinguish the uplink and downlink directions of the payload data: packets are separated by direction, the direction of the first packet in the session is defined as uplink, payload data with the same source address, destination address and port numbers as the first packet is uplink payload data, and the rest is downlink payload data.

(1.3) Calculate the sizes of the uplink and downlink payload data separately and construct payload data sequences in hexadecimal form, using the following format:

uplink message payload data: 00 + hex(uplink payload data size);

downlink message payload data: FF + hex(downlink payload data size).

(1.4) Concatenate the uplink and downlink message payload data, uplink data first and downlink data after, to obtain the message payload structural feature data.

(1.5) Arrange the data in packet time order to obtain the message payload time-sequence feature data.
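Sub-steps (1.3) to (1.5) can be illustrated with the hexadecimal size format above. This sketch rests on two assumptions the text does not settle: sizes are encoded per packet, and each size uses a fixed width of 4 hex digits (enough for payloads up to 65535 bytes). The input is a list of (direction, payload) pairs with 0 for uplink and 1 for downlink; the function names are hypothetical.

```python
def hex_size(direction, payload, width=4):
    """'00' + hex(size) for uplink (0), 'FF' + hex(size) for downlink (1)."""
    prefix = "00" if direction == 0 else "FF"
    return prefix + format(len(payload), f"0{width}X")


def encode_session(session, width=4):
    """Return (structural feature string, time-sequence feature string).

    Structural data: all uplink entries first, then all downlink entries (1.4).
    Time-sequence data: entries in packet time order (1.5).
    """
    up = [hex_size(d, p, width) for d, p in session if d == 0]
    down = [hex_size(d, p, width) for d, p in session if d == 1]
    structure = "".join(up + down)
    timing = "".join(hex_size(d, p, width) for d, p in session)
    return structure, timing
```

For a session of a 3-byte uplink packet, a 300-byte downlink packet and an empty uplink packet, the structural string is `000003000000FF012C` and the time-ordered string is `000003FF012C000000`.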

(1.6) Construct a sequence-to-graph feature representation model and convert the packet sequence in the session flow into an undirected graph. For each packet in the session flow, extract the packet direction, the standard information entropy of the payload data and the payload length, and embed them as graph node features to obtain the message payload interaction feature data.

The sequence-to-graph feature representation model converts the packet sequence of a session into a graph structure and uses a graph neural network to represent the converted data. The conversion process is shown in FIG. 4. First, the transmission direction of each packet must be distinguished. To this end, the party that sends the first packet in the session is denoted C and the other party S; a packet sent from C to S (positive direction) is represented by 0, and a packet sent from S to C (negative direction) by 1. The packet exchange between the two parties can thus be represented by an array A whose elements are 0 or 1, ordered as the packets occur in the session. This one-dimensional direction array A is converted into an adjacency matrix M of an undirected graph: the packets are connected in time order to form sequences, and the sequences are then connected end to end to form the graph structure.
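The text does not specify exactly which packets the edges join, so the sketch below is only one possible reading, stated as an assumption: maximal runs of consecutive same-direction packets in A (bursts) are chained in time order, and consecutive bursts are joined head to head and tail to tail, so that the direction pattern shapes the graph. The function names are hypothetical; the output is a symmetric 0/1 adjacency matrix M.

```python
def bursts(A):
    """Split direction array A into maximal runs of equal direction (lists of indices)."""
    runs = []
    for idx, d in enumerate(A):
        if runs and A[runs[-1][-1]] == d:
            runs[-1].append(idx)
        else:
            runs.append([idx])
    return runs


def sequence_to_graph(A):
    """Adjacency matrix M of an undirected graph built from direction array A."""
    n = len(A)
    M = [[0] * n for _ in range(n)]

    def link(u, v):
        if u != v:
            M[u][v] = M[v][u] = 1  # undirected edge

    runs = bursts(A)
    for run in runs:
        for u, v in zip(run, run[1:]):
            link(u, v)  # time-ordered chain inside one burst
    for prev, nxt in zip(runs, runs[1:]):
        link(prev[0], nxt[0])    # head of a burst to head of the next burst
        link(prev[-1], nxt[-1])  # tail of a burst to tail of the next burst
    return M
```

For A = [0, 0, 1, 0], the bursts are [0, 1], [2] and [3], and the resulting edges are 0-1, 0-2, 1-2 and 2-3.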

With the graph data structure, the one-dimensional sequence of a packet transmission process can be represented as a two-dimensional mesh. The graph structures of the encryption protocol message session interaction features of several mobile applications are shown in FIGS. 5 to 12.

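Features extracted from each packet are embedded in the graph nodes to express the characteristics of the encrypted network traffic. The transport-layer payload length and the standard information entropy are computed; the standard information entropy is calculated as:

$$H(X) = -\sum_{i=1}^{N} p(x_i)\log p(x_i)$$

where $p(x_i)$ is the probability of byte $x_i$ occurring in $X$ and $N$ is the number of discrete variables contained in $X$.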

The transport-layer payload length and the standard information entropy are then embedded as graph node features: the packet direction, payload length and standard information entropy are combined into an array. The sequence-to-graph feature representation of each session thus yields a 3 × N matrix and a label.
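The 3 × N node-feature matrix of a session can be sketched as below, with rows for packet direction, payload length and entropy in the order the text lists them. The entropy helper repeats an assumed base-2 Shannon entropy over byte frequencies; all names are hypothetical. A graph convolution that expects an N × F feature matrix X would take the transpose of this matrix.

```python
import math
from collections import Counter


def entropy(payload):
    """Assumed base-2 entropy of the byte frequencies in one payload."""
    if not payload:
        return 0.0
    n = len(payload)
    return -sum(c / n * math.log2(c / n) for c in Counter(payload).values())


def node_features(session):
    """session: list of (direction, payload) in time order -> 3 x N matrix."""
    directions = [float(d) for d, _ in session]
    lengths = [float(len(p)) for _, p in session]
    entropies = [entropy(p) for _, p in session]
    return [directions, lengths, entropies]
```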

Further, the step (2) specifically comprises the following sub-steps:

(2.1) Input the message payload structural feature data into an autoencoder with a sparsity constraint and a noise robustness constraint for noise suppression and dimensionality reduction, which improves the resistance of mobile application encryption protocol message type analysis to interference in network environments with background traffic. This step both reduces the per-epoch training time of the subsequent dynamic pooling convolutional neural network and allows features to be extracted more accurately, ultimately increasing the accuracy of mobile application encryption protocol message type analysis.

A sparsity constraint is set on the hidden layer of the autoencoder. Its cost function is defined over the autoencoder input (which takes background-traffic noise into account), the expectation of the input noise, the sparsity constraint and its weight, the number of hidden layers in the autoencoder, the number of hidden-layer units and neurons, and the hidden-layer response.

A noise robustness constraint is also set in the autoencoder to constrain the connection weight matrix, reinforcing larger weights and suppressing the disturbance of small weights that represent network background-traffic noise; its cost function is defined over the target output, the output of the autoencoder network and the connection weights.

The message payload structural feature data are input into the autoencoder with these sparsity and noise robustness constraints for unsupervised learning, generating a denoised, reduced-dimension feature vector.

(2.2) Input the denoised, reduced-dimension feature vector into the constructed dynamic pooling convolutional neural network for learning. The structure feature learning model for mobile application private encryption protocol messages is based on a dynamic pooling convolutional neural network and is formed by stacking three one-dimensional convolutional layers, using 'same' padding, with batch normalization after each convolutional layer.

For each convolutional layer, with the number of channels c set, the hidden-layer output after one-dimensional convolution is:

$$y_{p,q} = \sum_{i=1}^{m}\sum_{j=1}^{n} w_{i,j}\, x_{p+i-1,\,q+j-1}$$

with the symbols as defined in step S2A2. After the convolution kernel operations, several feature sequences are obtained for each input, and the output of the last convolutional layer is denoted $C = [c_1, c_2, \ldots, c_T]$.

Dropout is added after the convolution operations to prevent overfitting.

(2.3) For the feature vector output by the last convolutional layer, k-max pooling is used as the nonlinear down-sampling function, and features are extracted by the dynamic pooling operation:

$$v_s = \operatorname{kmax}(C, k_l), \qquad k_l = \max\left(k_{top}, \left\lceil \frac{L - l}{L}\, s \right\rceil\right)$$

After the pooling operation, the message payload structure feature vector is obtained.

Further, the step (3) specifically includes the following sub-steps:

and (3.1) constructing a mobile application private encryption protocol message load time sequence characteristic learning model based on a long-time and short-time memory network, wherein the model comprises 64 memory units and is used for learning message load time sequence characteristic data.

The model learns with a gating mechanism:

$$g_t = \sigma\left(W_g\,[h_{t-1}, x_t] + b_g\right),\qquad g \in \{f, i, o\};$$

where $f$, $i$ and $o$ denote the forget, input and output gates, $x_t$ is the input at time $t$, and $h_{t-1}$ is the output at time $t-1$. The activation function $\sigma$ compresses the gate values into the interval between 0 and 1.

Dropout with a rate of 0.5 is added to the model to prevent overfitting.

(3.2) The model output is:

$$h_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \odot \tanh(c_t);$$

That is, the cell state vector $c_t$ passes through the tanh activation and is multiplied element-wise by the output gate. The result is the message payload timing feature vector $h_t$.
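The gate and output equations of step (3) correspond to a standard LSTM cell. The following NumPy sketch runs one cell over a packet sequence and returns the last hidden state as the timing feature vector. Several details are assumptions for illustration: the weight layout, the candidate-state equation (which the text does not spell out), taking the final state as the output, and the omission of the 0.5 dropout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W[g]: (units, units + d) and b[g]: (units,) for g in 'fioc'."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])         # forget gate
    i = sigmoid(W["i"] @ z + b["i"])         # input gate
    o = sigmoid(W["o"] @ z + b["o"])         # output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    c_t = f * c_prev + i * c_tilde           # new cell state
    h_t = o * np.tanh(c_t)                   # output gate applied to tanh(c_t)
    return h_t, c_t

def lstm_encode(xs, W, b, units=64):
    """Run the cell over a (T, d) sequence and return the last hidden state."""
    h = np.zeros(units)
    c = np.zeros(units)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
    return h
```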

Further, the step (4) specifically includes the following sub-steps:

(4.1) Construct a message session interaction feature learning model for the mobile application private encryption protocol based on a graph convolutional network. The model performs two graph convolution operations; the number of channels of each is set, and the activation function is selected. The graph has $N$ network data packets (nodes), and each node contains $F$ packet features. The feature matrix is $X \in \mathbb{R}^{N \times F}$, and the adjacency matrix is $A \in \mathbb{R}^{N \times N}$.

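(4.2) Perform graph convolution with the constructed learning model. Each graph convolution layer ($l = 0, 1$) computes:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)} + b^{(l)}\right);$$

where $\tilde{A} = A + I_N$, $\tilde{D}$ is the degree matrix of $\tilde{A}$, $H^{(0)} = X$, and $\sigma$ is the ReLU function.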

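(4.3) After the two graph convolution layers, an $N \times F_2$ matrix $H^{(2)}$ is obtained. This matrix is flattened into the message session interaction feature vector:

$$\mathbf{v} = [v_1, v_2, \dots, v_{N F_2}];$$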

(4.4) A fully connected layer compresses this vector and reduces its dimensionality, learning the message payload session feature vector:

$$\mathbf{s} = \mathrm{ReLU}(W_s\mathbf{v} + b_s);$$
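Step (4) can be sketched in NumPy under the following assumptions:

- Each layer uses the symmetric-normalized propagation rule of Kipf and Welling's GCN.
- Activations are ReLU.
- The $N \times F_2$ node matrix is flattened.
- One fully connected compression layer follows.

How edges between packets are formed is not stated here. The chain graph in `chain_adjacency`, which links consecutive packets, is purely hypothetical, as are all sizes. Note that the node feature matrix is $N \times F$, which is the transpose of the $3 \times N$ session matrix from step (1) when $F = 3$.

```python
import numpy as np

def chain_adjacency(n):
    """Hypothetical graph: link each packet to the next one in the session."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return A

def normalized_adjacency(A):
    """Return D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    A_tilde = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def relu(z):
    return np.maximum(z, 0.0)

def session_vector(X, A, W0, b0, W1, b1, Ws, bs):
    """Two GCN layers, flatten, then one fully connected layer."""
    A_hat = normalized_adjacency(A)
    H1 = relu(A_hat @ X @ W0 + b0)       # (N, F1)
    H2 = relu(A_hat @ H1 @ W1 + b1)      # (N, F2)
    v = H2.reshape(-1)                   # flattened, length N * F2
    return relu(Ws @ v + bs)             # compressed session feature vector
```

Flattening ties the fully connected layer to a fixed node count N, so sessions would have to be padded or truncated to N packets, as in the $3 \times N$ representation.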

Further, step (5) specifically comprises the following sub-steps:

(5.1) Perform ensemble learning and joint training of the message structure feature learning model, the message timing feature learning model and the message interaction feature learning model, and set the hyperparameters for joint training. Fuse the resulting message payload structure feature vector $\mathbf{p}$, message payload timing feature vector $h_t$ and message session interaction feature vector $\mathbf{s}$ by concatenation:

$$\mathbf{z} = [\mathbf{p};\, h_t;\, \mathbf{s}];$$

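(5.2) Compute the class probabilities with the second fully connected layer and its softmax activation:

$$\mathbf{y} = \mathrm{softmax}(W_2\mathbf{z} + b_2);$$

where $W_2$ and $b_2$ are the weight matrix and bias of the second fully connected layer. The output $\mathbf{y}$ is a one-dimensional vector whose length equals the number of classes.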

(5.3) Finally, output the message type analysis result of the mobile application private encryption protocol, that is, the index of the predicted class:

$$\hat{y} = \arg\max_{k}\, y_k;$$
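Steps (5.1) to (5.3) amount to a concatenation, an affine layer with softmax, and an argmax. A maximum entropy classifier over several classes is equivalent to multinomial logistic regression, which is a softmax layer, and that is what this sketch uses. The function name and vector sizes are hypothetical. Joint training of the three models is not shown.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

def predict_message_type(p_struct, h_time, s_session, W2, b2):
    """Fuse the three modal vectors and return (class index, class probabilities)."""
    z = np.concatenate([p_struct, h_time, s_session])  # fused multimodal vector
    y = softmax(W2 @ z + b2)                           # second fully connected layer + softmax
    return int(np.argmax(y)), y
```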

The method extracts and learns features of mobile application encrypted protocol messages in different modalities and from multiple dimensions. It jointly learns the payload structure features, payload timing features and session interaction features of private encryption protocol messages to build a message type analysis model for mobile application encryption protocols. The model generalizes well and achieves good classification results on encrypted network traffic datasets collected in different environments.

Example 2

As shown in FIG. 1 to FIG. 16, this embodiment is a further optimization of Embodiment 1 and, on that basis, further includes the following technical features:

In this embodiment, the model framework is shown in FIG. 3. First, the acquired raw mobile application network traffic is preprocessed to extract the structure feature data, timing feature data and interaction feature data of the message payload. Next, three learning models are built for the mobile application private encryption protocol:

- a message structure feature learning model based on a dynamic pooling convolutional neural network, which learns the message payload structure feature vector;
- a message timing feature learning model based on a long short-term memory network, which learns the message payload timing feature vector;
- a message interaction feature learning model based on a graph convolutional network, which learns the message session interaction feature vector.

Finally, the structure, timing and session interaction feature vectors are fused by concatenation, and a maximum entropy classifier outputs the message type analysis result for the mobile application private encryption protocol.

Specifically, the method for analyzing the message type of the mobile application encryption protocol based on the multi-mode feature fusion learning of the embodiment further includes the following technical features:

(1) Preprocess the acquired raw mobile application network traffic data, and extract the structure feature data, timing feature data and interaction feature data of the message payload.

(1.1) When designing the message type analysis model and classifier for mobile application encryption protocols, the classifier input must be prepared effectively to improve classification efficiency. Whether a public network traffic dataset or network service traffic collected by researchers is used, the raw traffic is in pcap format. This format cannot be fed directly into the message type analysis model, so the data must be preprocessed.

Mobile applications in five categories with different purposes are selected: audio-visual entertainment, news, lifestyle shopping, instant messaging, and tools. Together these comprise 17 different mobile applications. The private encryption protocol message types used by these applications serve as labels. The applications are run in a public network environment and a campus network environment to collect the corresponding network traffic. The resulting dataset is shown in Table 1.

Table 1 collected mobile application network traffic data set

The features extracted from each data packet are embedded in a graph node to represent the encrypted network traffic. For each packet, the transport-layer payload length and the standard information entropy are computed. The standard information entropy is:

$$H(X) = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i);$$

where $x_i$ is a byte in the packet, $n$ is the number of discrete values contained in $X$, and $p(x_i)$ is the probability of $x_i$ occurring in $X$. In general, $X$ is a bit string or character string of a particular length.

The transport-layer payload length and standard information entropy are then embedded into the graph node features and associated with each other. The packet direction, payload length and standard information entropy of each packet are combined into a three-element array. This sequence-to-graph feature representation turns each session into a $3 \times N$ matrix plus a label.
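The per-packet features of step (1.1) can be sketched as a byte-level entropy function plus a builder for the $3 \times N$ session matrix. The sketch assumes the following:

- The entropy uses base-2 logarithms over the distribution of byte values. If a normalized entropy is intended, divide the result by log2 of the number of distinct values.
- Direction is encoded as +1 for uplink and -1 for downlink.
- Sessions are zero-padded or truncated to a fixed N.

The function names are hypothetical.

```python
import math
from collections import Counter

import numpy as np

def byte_entropy(payload: bytes) -> float:
    """Shannon entropy (bits) of the byte-value distribution of a payload."""
    if not payload:
        return 0.0
    n = len(payload)
    return -sum((c / n) * math.log2(c / n) for c in Counter(payload).values())

def session_matrix(packets, n_max):
    """packets: list of (direction, payload) pairs, direction +1 (uplink) or -1
    (downlink). Returns a 3 x n_max array whose columns are
    [direction, payload length, entropy]; short sessions are zero-padded."""
    m = np.zeros((3, n_max))
    for j, (direction, payload) in enumerate(packets[:n_max]):
        m[:, j] = (direction, len(payload), byte_entropy(payload))
    return m
```

Encrypted payloads tend to have byte entropy close to 8 bits, while plaintext or padding bytes score lower. This contrast is what makes entropy a useful per-packet feature alongside length and direction.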

(2) Constructing a mobile application private encryption protocol message structure characteristic learning model based on a dynamic pooling convolutional neural network, and learning to obtain a message load structure characteristic vector

The specific process of the step is as follows:

(2.1) Set a sparsity constraint on the hidden layer of an autoencoder;

set a noise-robustness constraint in the autoencoder that constrains the connection weight matrix. This strengthens large weights and weakens the disturbance from small weights, which represent network background traffic noise;

input the message payload structure feature data into the autoencoder with the sparsity and noise-robustness constraints for unsupervised learning, generating a dimension-reduced, noise-resistant feature vector.

(2.2) The dimension-reduced, noise-resistant feature vector is input into the constructed dynamic pooling convolutional neural network for learning. The message structure feature learning model for the mobile application private encryption protocol is built on this network and consists of three stacked one-dimensional convolution layers. Each layer uses "same" padding and is followed by batch normalization. The unit structure of the message payload structure feature learning model is listed in Table 2.

Table 2 list of unit structures of message payload structure feature learning model

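For each convolution layer, the hidden-layer output of the one-dimensional convolution at position $t$ is:

$$y_t = \sum_{i=1}^{k}\sum_{j=1}^{d} w_{i,j}\,x_{t+i-1,\,j};$$

The feature vector output by the last convolution layer is:

$$\mathbf{c} = [c_1, c_2, \dots, c_s];$$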

Dropout with a rate of 0.2 is added after the convolution operations to prevent overfitting.

(2.3) For the feature vector output by the last convolution layer, k-max pooling is used as the nonlinear down-sampling function, and features are extracted with a dynamic pooling operation:

$$k_l = \max\left(k_{top}, \left\lceil \frac{L-l}{L}\, s \right\rceil\right);$$

where $L$ is the number of convolution layers, $l$ is the index of the current layer, $s$ is the input sequence length, and $k_{top}$ is the fixed pooling parameter. The pooling operation yields the message payload structure feature vector $\mathbf{p}$.

(3) Construct a message timing feature learning model for the mobile application private encryption protocol based on a long short-term memory network, and learn the message payload timing feature vector.

The specific process of the step is as follows:

(3.1) Construct a message payload timing feature learning model for the mobile application private encryption protocol based on a long short-term memory network. The model has 64 memory units and learns the input traffic features. The unit structure of the message payload timing feature learning model is listed in Table 3.

Table 3 list of unit structures of message load timing characteristic learning model

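The model learns with a gating mechanism:

$$g_t = \sigma\left(W_g\,[h_{t-1}, x_t] + b_g\right),\qquad g \in \{f, i, o\};$$

where $f$, $i$ and $o$ denote the forget, input and output gates.

(3.2) The model output is:

$$h_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \odot \tanh(c_t);$$

where $c_t$ is the cell state vector.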

(4) Construct a message interaction feature learning model for the mobile application private encryption protocol based on a graph convolutional network, and learn the message session interaction feature vector.

The specific process of the step is as follows:

(4.1) Construct a message session interaction feature learning model for the mobile application private encryption protocol based on a graph convolutional network. Its unit structure is listed in Table 4.

Table 4 list of unit structures of interactive feature learning model for message sessions

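Here the graph has $N$ network data packets (nodes), and each node contains $F$ packet features. The feature matrix is $X \in \mathbb{R}^{N \times F}$, and the adjacency matrix is $A \in \mathbb{R}^{N \times N}$.

(4.2) Each graph convolution layer computes:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)} + b^{(l)}\right);$$

where $\tilde{A} = A + I_N$, $\tilde{D}$ is the degree matrix of $\tilde{A}$, $H^{(0)} = X$, and $\sigma$ is the ReLU function.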

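(4.3) After the two graph convolution layers, an $N \times F_2$ matrix $H^{(2)}$ is obtained. This matrix is flattened into the message session interaction feature vector:

$$\mathbf{v} = [v_1, v_2, \dots, v_{N F_2}];$$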

(4.4) A fully connected layer compresses this vector and reduces its dimensionality, learning the message payload session feature vector:

$$\mathbf{s} = \mathrm{ReLU}(W_s\mathbf{v} + b_s);$$

(5) Fuse and concatenate the payload structure feature vector, timing feature vector and session interaction feature vector of the mobile application private encryption protocol message, and output the message type analysis result using a maximum entropy classifier.

The specific process of the step is as follows:

(5.1) Perform ensemble learning and joint training of the three models. The hyperparameter settings used in joint training are listed in Table 5.

Table 5 Hyperparameter settings for joint training of the three models

The resulting feature vectors are fused by concatenation. The unit structure of the feature fusion layer is listed in Table 6.

Table 6 list of unit structures for feature fusion splicing

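Concatenation yields the fused feature vector:

$$\mathbf{z} = [\mathbf{p};\, h_t;\, \mathbf{s}];$$

where $\mathbf{p}$, $h_t$ and $\mathbf{s}$ are the message payload structure, timing and session interaction feature vectors.

(5.2) The second fully connected layer and its softmax activation then give:

$$\mathbf{y} = \mathrm{softmax}(W_2\mathbf{z} + b_2);$$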

(5.3) Finally, compute and output the analysis result of the mobile application private encryption protocol message type, namely the index of the class to which the data belong:

$$\hat{y} = \arg\max_{k}\, y_k;$$

where $\hat{y}$ indicates the index of the predicted class.

The experiments of this embodiment were performed on the collected dataset of 17 network mobile applications. Table 7 gives the analysis results of the method for the encrypted protocol message types of each application's traffic, and shows the following:

- Precision exceeds 99% for four applications: Jingdong, Meituan, iQIYI and Pinduoduo.
- Recall exceeds 98% for four applications: Microsoft-Launcher, Sogou Input Method, WeChat and Meituan.
- F1 score exceeds 98% for five applications: Sogou Input Method, Microsoft-Launcher, Jingdong, Meituan and WeChat.

The weighted averages of precision, recall and F1 score are 97.29%, 97.26% and 97.27%, respectively, and the overall accuracy of the model on this dataset reaches 97.26%.

Table 7 Message type analysis results of the method on the dataset of 17 network mobile applications
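For reference, weighted averages like those reported above can be computed as follows. This is a generic sketch with a hypothetical function name, not code from the patent. It weights each class by its number of true samples (its support), which is the usual meaning of a weighted average for these metrics.

```python
import numpy as np

def weighted_metrics(y_true, y_pred, n_classes):
    """Support-weighted precision, recall and F1, plus overall accuracy."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                          # rows: true class, columns: predicted class
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1).astype(float)     # true samples per class
    predicted = cm.sum(axis=0).astype(float)   # predicted samples per class
    zeros = np.zeros(n_classes)
    precision = np.divide(tp, predicted, out=zeros.copy(), where=predicted > 0)
    recall = np.divide(tp, support, out=zeros.copy(), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    weights = support / support.sum()
    accuracy = tp.sum() / cm.sum()
    return (float(weights @ precision), float(weights @ recall),
            float(weights @ f1), float(accuracy))
```

Support-weighted recall always equals overall accuracy, because the weights cancel the per-class denominators and leave the total number of correct predictions divided by the total number of samples. This is consistent with the identical 97.26% figures reported for weighted recall and accuracy.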

In the comparison experiments, the 2D-CNN, LSTM, GCN and CNN+LSTM models are used as baselines. The comparison verifies the effectiveness of the multimodal feature fusion learning method for mobile application encryption protocol message type analysis. The overall comparative results are shown in FIG. 13 to FIG. 16.

It should be noted that, for the sake of simplicity, the present embodiment is described as a series of acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, because some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art will recognize that the embodiments described in this specification are preferred embodiments and that acts or modules referred to are not necessarily required for this application.

The invention can accurately identify the private encryption protocol message types of various network mobile applications, improving the efficiency and strength of cyberspace security supervision;

the invention learns and classifies on payload data above the transport layer of network traffic and does not rely on the IP address or port number in packet headers, so the classification model generalizes well;

the invention samples and tests its dataset in complex network environments, so the detection results better match the requirements of real network environments.

It should be noted that in the present invention, "S2A, message structure feature learning", "S2B, message timing feature learning" and "S2C, message interaction feature learning" may be executed in various orders or even simultaneously. The order of these steps in the embodiments should therefore not be regarded as limiting their execution sequence.

As described above, the present invention can be preferably realized.

All features disclosed in all embodiments in this specification, or all methods or process steps implicitly disclosed, may be combined and/or expanded, or substituted, in any way, except for mutually exclusive features and/or steps.

While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications, equivalent arrangements, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method for analyzing the type of a mobile application encryption protocol message is characterized in that different modal characteristics of the mobile application encryption protocol message are extracted and learned, and the type of the encryption protocol message is analyzed by fusing the different modal characteristics.

2. The method for analyzing mobile application encryption protocol message types according to claim 1, comprising the following steps:

S2, feature learning, which specifically comprises the following steps:

S2A, learning message structure characteristics: building a mobile application private encryption protocol message structure feature learning model based on a dynamic pooling convolutional neural network by using the structure feature data, and learning to obtain a message load structure feature vector;

S2B, message timing feature learning: constructing, from the timing feature data, a message timing feature learning model for the mobile application private encryption protocol based on a long short-term memory network, and learning the message payload timing feature vector;

3. The method according to claim 2, wherein the step S1 comprises the following steps:

S13, calculating the sizes of the payload data in the uplink and downlink directions respectively, and constructing a payload data sequence in hexadecimal form;

4. The method according to claim 3, wherein in step S16 the standard information entropy is calculated as:

$$H(X) = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i);$$

wherein $H(X)$ represents the standard information entropy, $X = \{x_1, x_2, \dots, x_n\}$ represents an arbitrarily distributed discrete random variable, $n$ represents the number of discrete values contained in $X$, $i$ indicates the sequence number of a byte in the data packet, $x_i$ represents a byte in the data packet, and $p(x_i)$ represents the probability of byte $x_i$ occurring in $X$.

5. The method for analyzing mobile application encryption protocol message types according to any one of claims 2 to 4, wherein step S2A comprises:

the message structure feature learning model is constructed as follows:

$$y_t = \sum_{i=1}^{k}\sum_{j=1}^{d} w_{i,j}\,x_{t+i-1,\,j};$$

wherein:

- $i$ and $j$ represent the row and column numbers of the weight matrix of the one-dimensional convolution kernel;
- $w_{i,j}$ represents the weight in row $i$, column $j$ of that matrix;
- $k \times d$ represents the shape of the convolution kernel;
- $x_{t+i-1,\,j}$ represents the value of the input data in row $t+i-1$, column $j$;
- $s$ and $d$ represent the total numbers of rows and columns of the input, so $s \times d$ represents the shape of the input;
- $y_t$ represents the value of the output at each position $t$;

after the convolution kernels are applied, each input yields several feature sequences, and the feature vector output by the last convolution layer is:

$$\mathbf{c} = [c_1, c_2, \dots, c_s];$$

wherein $c_i$ represents each element of the feature vector $\mathbf{c}$;

the dynamic pooling operation is:

$$k_l = \max\left(k_{top}, \left\lceil \frac{L-l}{L}\, s \right\rceil\right),\qquad \mathbf{p} = \operatorname{kmax}(\mathbf{c}, k_l);$$

wherein:

- $\mathbf{p}$ represents the message structure feature, formed by the $k_l$ largest elements of $\mathbf{c}$ in their original order;
- $L$ represents the number of convolution layers;
- $l$ represents the index of the current convolution layer;
- $s$ represents the length of the input sequence;
- $k_{top}$ represents the fixed pooling parameter.

6. The method for analyzing mobile application encryption protocol message types according to claim 5, wherein in step S2A1 the cost function of the sparsity constraint is defined in terms of:

- the input of the autoencoder;
- the sparsity constraint and the weight of the sparsity constraint;
- the expectation over the total noise;
- the number of hidden layers in the autoencoder;
- Gaussian noise with mean 0 and variance 1;
- the input of layer $l$ of the neural network;
- the index of a hidden-layer unit and the number of hidden-layer neurons;
- the hidden-layer response;

the cost function of the noise-robustness constraint is defined in terms of:

- the target output;
- the output of the autoencoder learning network;
- an activation factor;
- the indices $i$ and $j$ of two input data;
- the connection weight from input datum $i$ to input datum $j$.

7. The method according to claim 6, wherein step S2B comprises:

$$g_t = \sigma\left(W_g\,[h_{t-1}, x_t] + b_g\right),\qquad g \in \{f, i, o\};$$

wherein:

- $g_t$ represents the gate unit function;
- $f$, $i$ and $o$ respectively represent the forget gate, input gate and output gate;
- $\sigma$ represents the activation function;
- $W_g$ represents the parameter of the corresponding gate;
- $x_t$ represents the input at time $t$;
- $h_{t-1}$ represents the output at time $t-1$;
- $b_g$ represents the bias of the corresponding gate;

$$h_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \odot \tanh(c_t);$$

wherein:

- $h_t$ represents the message payload timing feature vector;
- $\sigma$ represents the activation function;
- $c_t$ represents the cell state vector;
- $\tanh$ represents the tanh activation function;
- $W_o$ represents the parameter of the output gate;
- $b_o$ represents the bias.

8. The method according to claim 7, wherein step S2C comprises:

the graph has $N$ network data packets (nodes), each node contains $F$ packet features, the feature matrix is $X \in \mathbb{R}^{N \times F}$, and the adjacency matrix is $A \in \mathbb{R}^{N \times N}$;

each graph convolution layer computes:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)} + b^{(l)}\right);$$

wherein:

- $\tilde{A} = A + I_N$, where $I_N$ represents the identity matrix;
- $\tilde{D}$ represents the degree matrix corresponding to $\tilde{A}$;
- $l$ represents the network layer index;
- $W^{(l)}$ represents the weight of layer $l$, with dimension $F_l \times F_{l+1}$;
- $F_l$ represents the dimensionality of the graph node data after $l$ layers of convolution, with $F_0 = F$;
- $b^{(l)}$ represents the bias of layer $l$;
- $H^{(l)}$ represents the input of layer $l$, the input of the first layer being $H^{(0)} = X$;
- $\sigma$ represents the nonlinear activation function ReLU;

S2C3, after the two graph convolution layers, an $N \times F_2$ matrix $H^{(2)}$ is obtained and flattened to give:

$$\mathbf{v} = [v_1, v_2, \dots, v_{N F_2}];$$

wherein $\mathbf{v}$ represents the message session interaction feature vector, with dimension $N F_2$, and $v_i$ represents each element of that vector;

$$\mathbf{s} = \sigma(W_s\mathbf{v} + b_s);$$

wherein:

- $\mathbf{s}$ represents the message payload session feature vector;
- $W_s$ represents the weight matrix of the fully connected layer;
- $b_s$ represents the bias;
- $\sigma$ represents the activation function, which is the ReLU function in the fully connected layer.

9. The method according to claim 8, wherein step S3 comprises:

$$\mathbf{z} = [\mathbf{p};\, h_t;\, \mathbf{s}];$$

wherein $\mathbf{z}$ represents the message session multimodal fusion feature vector;

$$\mathbf{y} = \mathrm{softmax}(W_2\mathbf{z} + b_2);$$

wherein:

- $W_2$ represents the weight matrix of the second fully connected layer;
- $b_2$ represents the bias;
- $\mathbf{y}$ is a one-dimensional vector whose length is the number of classes to be classified;

S33, finally calculating and outputting the analysis result of the mobile application private encryption protocol message type:

$$\hat{y} = \arg\max_{k}\, y_k;$$

wherein $\hat{y}$ indicates the sequence number of the class to which the data belong.

10. A mobile application encryption protocol message type analysis system based on the method according to any one of claims 1 to 9, characterized by comprising the following modules:

a message interaction feature learning module, configured to construct, from the interaction feature data, a message interaction feature learning model for the mobile application private encryption protocol based on a graph convolutional network, and to learn the message session interaction feature vector;