CN115277888A - Method and system for analyzing message type of mobile application encryption protocol - Google Patents

Publication number
CN115277888A
Authority
CN
China
Prior art keywords
message
data
feature
mobile application
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211171000.9A
Other languages
Chinese (zh)
Other versions
CN115277888B (en)
Inventor
吉庆兵
罗杰
潘炜
倪绿林
谈程
康璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 30 Research Institute
Original Assignee
CETC 30 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 30 Research Institute
Priority to CN202211171000.9A
Publication of CN115277888A
Application granted
Publication of CN115277888B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to the technical field of message analysis, and discloses a method and a system for analyzing the type of a mobile application encryption protocol message. The invention solves the problems of high resource consumption, poor universality, low accuracy, poor generalization capability and the like in the prior art.

Description

Method and system for analyzing message type of mobile application encryption protocol
Technical Field
The invention relates to the technical field of message analysis, in particular to a method and a system for analyzing message types of a mobile application encryption protocol.
Background
Network traffic is moving steadily toward full encryption. Encryption protects data in transit, but malicious behavior such as malware, illegal speech and network attacks is also hidden inside the encrypted traffic of mobile applications and poses a serious threat to Internet users. Analyzing the message types of mobile application encryption protocols is therefore an important precondition for information monitoring, security detection and electronic evidence collection, and is of great significance for maintaining a healthy and green network environment, national security and social stability.
Traditional port matching and deep packet inspection must first parse the message content and then identify the message type by regular-expression matching, so they fail on encrypted protocol messages. Machine learning methods require hand-designed features for the messages to be identified, which costs considerable time and effort. Given the many applications and the differences among their encryption protocols, it is difficult to design a feature set that reflects traffic characteristics in general. This limits the universality of machine learning methods, which therefore struggle to achieve good results when analyzing and identifying encrypted network traffic.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a method and a system for analyzing the message type of a mobile application encryption protocol, which solve the problems of high resource consumption, poor universality, low accuracy, poor generalization capability and the like in the prior art.
The technical scheme adopted by the invention for solving the problems is as follows:
A method for analyzing the message type of a mobile application encryption protocol extracts and learns features of different modalities from mobile application encryption protocol messages, and analyzes the encryption protocol message type by fusing these features.
As a preferred technical solution, the method comprises the following steps:
S1, message data preprocessing: preprocessing the acquired raw mobile application network traffic data, and extracting structure feature data, time sequence feature data and interaction feature data of the message load from the raw data;
S2, feature learning, which specifically comprises the following steps:
S2A, message structure feature learning: using the structure feature data, constructing a mobile application private encryption protocol message structure feature learning model based on a dynamic pooling convolutional neural network, and learning a message load structure feature vector;
S2B, message time sequence feature learning: using the time sequence feature data, constructing a mobile application private encryption protocol message time sequence feature learning model based on a long short-term memory (LSTM) network, and learning a message load time sequence feature vector;
S2C, message interaction feature learning: using the interaction feature data, constructing a mobile application private encryption protocol message interaction feature learning model based on a graph convolutional neural network, and learning a message session interaction feature vector;
S3, message type analysis: fusing and concatenating the message load structure feature vector, the time sequence feature vector and the interaction feature vector, and outputting the analysis result of the mobile application private encryption protocol message type by using a maximum entropy classifier.
As a preferred technical solution, the step S1 includes the following steps:
S11, setting the length to which original network data packets are truncated during preprocessing, segmenting the continuous network traffic into session flows, and separating the network message load data above the transport layer of each data packet in a session flow;
S12, distinguishing the uplink and downlink directions of the message load data: defining the uplink and downlink directions of the message load data of the data packets in the session according to the data flow direction, taking message load data with the same source address, destination address and port numbers as the first data packet as uplink message load data, and taking the rest as downlink message load data;
S13, calculating the sizes of the load data in the uplink and downlink directions respectively, and constructing a load data sequence in hexadecimal form;
S14, splicing the uplink message load data and the downlink message load data, with the uplink data first and the downlink data after, to obtain message load structure feature data;
S15, arranging the data in data packet time order to obtain message load time sequence feature data;
S16, constructing a sequence-to-graph feature expression model, and converting the data packet sequence in the session flow into an undirected graph; for each data packet in the session flow, extracting the packet direction, the standard information entropy of the load data and the load length, and embedding them as graph node features to obtain message load interaction feature data.
As a preferred technical solution, in step S16, the calculation formula of the standard information entropy is:
$$H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i)$$
wherein $H(X)$ represents the standard information entropy, $X$ represents a discrete random variable with an arbitrary distribution $P$, $n$ represents the number of discrete values contained in $X$, $i$ indicates the sequence number of a byte in the data packet, $x_i$ represents a byte in the data packet, and $p(x_i)$ represents the probability of byte $x_i$ occurring in $X$.
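The entropy of a packet payload and the three node features named in step S16 (packet direction, payload entropy, payload length) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the base-2 logarithm, the function names `payload_entropy` and `node_features`, and the tuple layout are assumptions made here.

```python
import math
from collections import Counter


def payload_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution of a payload.

    Assumption: base-2 logarithm, so the result is in bits per byte (0..8).
    """
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)  # occurrences of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def node_features(direction: int, payload: bytes) -> tuple:
    """Hypothetical graph-node feature tuple: (direction, entropy, length)."""
    return (direction, payload_entropy(payload), len(payload))
```

A payload consisting of one repeated byte has entropy 0, while a payload in which all 256 byte values are equally frequent reaches the maximum of 8 bits per byte, which is the range encrypted payloads tend toward.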
As a preferred technical solution, the step S2A includes the following steps:
S2A1, inputting the message load structure feature data into an autoencoder with a sparsity constraint and a noise robustness constraint for denoising and dimension reduction, and generating a denoised, dimension-reduced feature vector;
S2A2, constructing a mobile application private encryption protocol message structure feature learning model based on a dynamic pooling convolutional neural network; inputting the denoised, dimension-reduced feature vector into the constructed message structure feature learning model to obtain the feature sequences produced by the convolution kernel operations;
the message structure feature learning model is constructed as follows:
the message structure feature learning model, based on a dynamic pooling convolutional neural network, is formed by stacking three one-dimensional convolution layers; padding uses the "same" mode, and each convolution layer is followed by batch normalization; for each convolution layer, the hidden-layer output after the one-dimensional convolution is:
Figure 370302DEST_PATH_IMAGE010
wherein,
Figure 866005DEST_PATH_IMAGE007
represents the row index of the weight matrix of the one-dimensional convolution kernel,
Figure 988682DEST_PATH_IMAGE011
column numbers of the weight matrix representing the one-dimensional convolution kernel,
Figure 761597DEST_PATH_IMAGE012
in the weight matrix representing the one-dimensional convolution kernel
Figure 678737DEST_PATH_IMAGE007
Go to the first
Figure 927316DEST_PATH_IMAGE011
The weight value of a column is determined,
Figure 916001DEST_PATH_IMAGE013
which represents the shape of the convolution kernel or kernels,
Figure 730373DEST_PATH_IMAGE014
representing input data
Figure 21677DEST_PATH_IMAGE007
Go to the first
Figure 554289DEST_PATH_IMAGE015
The value of the column is such that,
Figure 330353DEST_PATH_IMAGE016
the total number of rows of data is represented,
Figure 202494DEST_PATH_IMAGE017
which represents the total number of columns entered,
Figure 727017DEST_PATH_IMAGE018
the shape of the input is represented by,
Figure 809242DEST_PATH_IMAGE019
to represent the output
Figure 15095DEST_PATH_IMAGE020
A value of each position;
after the convolution kernel operations, several feature sequences are obtained for each input, and the feature vector output by the last convolution layer is denoted:
$$c = (c_1, c_2, \ldots, c_n)$$
wherein $c_i$ represents each element of the feature vector $c$;
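The stacking of one-dimensional convolutions with "same" padding described in step S2A2 can be illustrated with a small pure-Python sketch. It assumes a single input channel, stride 1, zero padding and a ReLU between layers; batch normalization is omitted, and the function names are hypothetical.

```python
def conv1d_same(x, kernel):
    """Single-channel 1-D cross-correlation with zero 'same' padding.

    The output has the same length as the input (stride 1).
    """
    k = len(kernel)
    left = (k - 1) // 2          # zeros added before the sequence
    right = k - 1 - left         # zeros added after the sequence
    padded = [0.0] * left + list(x) + [0.0] * right
    return [sum(kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(x))]


def relu(values):
    return [max(0.0, v) for v in values]


def stacked_conv(x, kernels):
    """Apply one convolution per kernel in turn (three kernels give three layers)."""
    out = list(x)
    for kernel in kernels:
        out = relu(conv1d_same(out, kernel))
    return out
```

Because every layer preserves the sequence length, the length reduction is left entirely to the pooling step that follows.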
S2A3, for the feature vector output by the last convolution layer, taking k-max pooling as the nonlinear down-sampling function, and applying the dynamic pooling operation to the feature vector to obtain the message load structure feature vector;
the dynamic pooling operation is as follows:
$$F_{\mathrm{struct}} = \operatorname{kmax}_{k}(c), \qquad k = \max\left(k_{top}, \left\lceil \frac{L - l}{L}\, s \right\rceil\right)$$
wherein $F_{\mathrm{struct}}$ represents the message structure feature, $c$ is the feature vector output by the last convolution layer, $L$ indicates the total number of convolution layers, $l$ indicates the index of the current convolution layer, $s$ indicates the length of the input sequence, and $k_{top}$ indicates the fixed pooling layer parameter.
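A minimal sketch of dynamic k-max pooling follows. It assumes the widely used rule in which k is the larger of a fixed pooling parameter and a fraction of the input length that shrinks with layer depth, which matches the quantities defined above (total layers, current layer, input length, fixed parameter); the exact formula of the patent is not reproduced here, and the function names are hypothetical.

```python
def dynamic_k(k_top, num_layers, layer, seq_len):
    """Assumed rule: k = max(k_top, ceil((L - l) / L * s)), in integer arithmetic."""
    scaled = -(-((num_layers - layer) * seq_len) // num_layers)  # ceiling division
    return max(k_top, scaled)


def k_max_pooling(values, k):
    """Keep the k largest values while preserving their original order."""
    k = min(k, len(values))
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top)]
```

For example, `k_max_pooling([1, 5, 2, 9, 3], 3)` returns `[5, 9, 3]`: the three largest activations survive, and their relative positions in the sequence are kept, which is what distinguishes k-max pooling from a plain sort.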
As a preferred technical solution, in step S2A1, the cost function of the sparsity constraint condition is:
Figure 465373DEST_PATH_IMAGE030
Figure 947170DEST_PATH_IMAGE031
Figure 157571DEST_PATH_IMAGE032
wherein,
Figure 775634DEST_PATH_IMAGE033
a cost function representing a sparsity constraint,
Figure 921445DEST_PATH_IMAGE034
representing the input from the encoder, and,
Figure 624959DEST_PATH_IMAGE035
the sparsity constraint is expressed in terms of,
Figure 124204DEST_PATH_IMAGE036
the weight representing the sparsity constraint is represented by,
Figure 65616DEST_PATH_IMAGE037
which represents the expectation of the total noise,
Figure 179065DEST_PATH_IMAGE038
representing the number of implicit layers in the self-encoder,
Figure 697771DEST_PATH_IMAGE039
,
Figure 125341DEST_PATH_IMAGE040
representing gaussian noise with a mean of 0 and a variance of 1,
Figure 717997DEST_PATH_IMAGE041
representing a neural network
Figure 579511DEST_PATH_IMAGE038
The layer is input into the device body,
Figure 257617DEST_PATH_IMAGE042
a number of an implicit layer element is indicated,
Figure 223299DEST_PATH_IMAGE043
the number of the neurons in the hidden layer is represented,
Figure 670461DEST_PATH_IMAGE044
representing a hidden layer response;
the cost function of the noise robustness constraint is:
Figure 453609DEST_PATH_IMAGE045
wherein,
Figure 822274DEST_PATH_IMAGE046
a cost function representing a noise robustness constraint,
Figure 653964DEST_PATH_IMAGE047
the target output is represented by a target output,
Figure 768681DEST_PATH_IMAGE048
representing the output from the encoder learning network,
Figure 863676DEST_PATH_IMAGE049
which is indicative of an activation factor,
Figure 516374DEST_PATH_IMAGE050
Figure 948493DEST_PATH_IMAGE051
a number representing two input data is shown,
Figure 370247DEST_PATH_IMAGE052
representing input data from
Figure 370564DEST_PATH_IMAGE053
To input data
Figure 244979DEST_PATH_IMAGE054
The connection weight of (c).
As a preferred technical solution, the step S2B includes the following steps:
S2B1, constructing a mobile application private encryption protocol message load time sequence feature learning model based on a long short-term memory (LSTM) network, wherein the model comprises JI memory units, JI being an integer with 32 ≤ JI ≤ 256, and learning the message load time sequence feature data with the constructed model, wherein the learning formula is as follows:
$$g_t = \sigma\left(W_g\,[h_{t-1}, x_t] + b_g\right), \qquad g \in \{f, i, o\}$$
wherein $g_t$ represents the function of a gate unit, $f$, $i$ and $o$ respectively denote the forget gate, the input gate and the output gate, $\sigma$ represents the activation function, $W_g$ represents the parameters of the corresponding forget gate, input gate or output gate, $x_t$ represents the input at time $t$, $h_{t-1}$ represents the output at time $t-1$, and $b_g$ represents the bias value of the forget gate, input gate or output gate;
S2B2, taking the output as the message load time sequence feature vector, wherein the output formula is as follows:
$$F_{\mathrm{time}} = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \odot \tanh(c_t)$$
wherein $F_{\mathrm{time}}$ represents the message load time sequence feature vector, $\sigma$ represents the activation function, $c_t$ represents the cell state vector, $\tanh$ represents the tanh activation function, $W_o$ represents the parameters of the output gate, and $b_o$ represents the bias.
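One time step of an LSTM memory unit, with the forget, input and output gates and the tanh-activated cell state described above, can be sketched in pure Python. This follows the textbook LSTM cell and is an assumption about the concrete form used here: the candidate-state parameters `W_c` and `b_c`, the parameter dictionary layout, and the function names are not taken from the source.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W v + b for a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p maps hypothetical names W_f, b_f, ... to parameters."""
    z = list(h_prev) + list(x_t)                               # [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]    # forget gate
    i = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]    # input gate
    o = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]    # output gate
    g = [math.tanh(a) for a in affine(p["W_c"], z, p["b_c"])]  # candidate state
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]       # step output
    return h_t, c_t
```

Feeding the payload sequence through `lstm_step` packet by packet and keeping the final `h_t` gives one plausible reading of the time sequence feature vector.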
As a preferred technical solution, the step S2C includes the steps of:
S2C1, constructing a mobile application private encryption protocol message session interaction feature learning model based on a graph convolutional neural network, wherein the session interaction feature learning model comprises two graph convolution layers connected in sequence; the number of channels of each of the two graph convolutions is set for the graph convolution operation, and the ReLU function is selected as the activation function;
inputting the message load interaction feature data into the graph convolutional neural network model, the data having been converted by the sequence-to-graph method into a graph $G$; wherein the number of network data packets (nodes) of the graph is $N$, the number of features contained in each node is $F$, the feature matrix is $X \in \mathbb{R}^{N \times F}$, and the adjacency matrix is $A \in \mathbb{R}^{N \times N}$;
S2C2, performing the graph convolution operation with the learning model constructed in step S2C1, wherein the graph convolution operation is as follows:
$$H^{(i)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(i-1)} W^{(i)} + b^{(i)}\right)$$
wherein $\tilde{A} = A + I$, $A$ is the adjacency matrix, $I$ represents the identity matrix, $\tilde{D}$ represents the degree matrix corresponding to $\tilde{A}$, $i$ indicates the network layer index, $W^{(i)}$ represents the weight of the $i$-th layer, whose dimension is $F_{i-1} \times F_i$, $F_i$ represents the dimensionality of the graph node data after the $i$-th convolution layer, $b^{(i)}$ represents the bias of the $i$-th layer, $H^{(i-1)}$ represents the input of the $i$-th layer, the input of the first layer being the feature matrix $H^{(0)} = X$, and $\sigma$ represents the nonlinear activation function, here the ReLU function;
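The graph convolution step can be sketched with plain Python lists, assuming the commonly used symmetric normalization with self-loops (adjacency plus identity, scaled by the inverse square root of the node degrees) followed by a ReLU. The helper names are hypothetical and the sketch handles small dense matrices only.

```python
import math


def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def normalized_adjacency(A):
    """Return D^(-1/2) (A + I) D^(-1/2), where D is the degree matrix of A + I."""
    n = len(A)
    A_tilde = [[A[r][c] + (1.0 if r == c else 0.0) for c in range(n)] for r in range(n)]
    deg = [sum(row) for row in A_tilde]  # at least 1 because of the self-loop
    return [[A_tilde[r][c] / math.sqrt(deg[r] * deg[c]) for c in range(n)]
            for r in range(n)]


def gcn_layer(A_hat, H, W, b):
    """One graph convolution layer: ReLU(A_hat H W + b)."""
    Z = matmul(matmul(A_hat, H), W)
    return [[max(0.0, z + bj) for z, bj in zip(row, b)] for row in Z]
```

Calling `gcn_layer` twice, with the output of the first call as the `H` of the second, mirrors the two sequentially connected graph convolution layers.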
S2C3, after the two graph convolution layers, an $N \times F_2$ matrix is obtained, and the Flatten operation is used to stretch the matrix into a one-dimensional feature vector $V$, obtaining:
$$V = (v_1, v_2, \ldots, v_{N F_2})$$
wherein $V$ represents the message session interaction feature vector, the dimension of $V$ is $N \cdot F_2$, and $v_k$ represents each element of the message session interaction feature vector;
S2C4, compressing the flattened one-dimensional feature vector $V$ with one fully connected layer to reduce its dimensionality, and learning the message load session feature vector:
$$F_{\mathrm{inter}} = \sigma\left(W_1 V + b_1\right)$$
wherein $F_{\mathrm{inter}}$ represents the message load session feature vector, $W_1$ represents the weight matrix of the fully connected layer, $b_1$ represents the bias, and $\sigma$ represents the activation function; the fully connected layer uses the ReLU function.
As a preferred technical solution, the step S3 includes the following steps:
S31, performing ensemble learning and joint training on the message structure feature learning model, the message time sequence feature learning model and the message interaction feature learning model, and setting the hyper-parameters for the joint training; performing feature fusion on the obtained message load structure feature vector $F_{\mathrm{struct}}$, message load time sequence feature vector $F_{\mathrm{time}}$ and message session interaction feature vector $F_{\mathrm{inter}}$, and concatenating them to obtain:
$$F = \left[F_{\mathrm{struct}},\ F_{\mathrm{time}},\ F_{\mathrm{inter}}\right]$$
wherein $F$ represents the message session multimodal fusion feature vector;
S32, calculating through a second fully connected layer and its softmax activation function:
$$y = \operatorname{softmax}\left(W_2 F + b_2\right)$$
wherein $F$ is the message session multimodal fusion feature vector, $W_2$ represents the weight matrix of the second fully connected layer, $b_2$ represents the bias, and $y$ is a one-dimensional vector whose length is the number of classes to be distinguished;
S33, finally calculating and outputting the message type analysis result $\hat{y}$ of the mobile application private encryption protocol:
$$\hat{y} = \arg\max_{k} y_k$$
wherein $y_k$ is the $k$-th element of the softmax output and $k$ indicates the sequence number of the corresponding category.
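The fusion and classification steps S31 to S33 can be sketched as concatenation, one fully connected layer, softmax, and an arg-max over the class scores. The function names and argument layout below are hypothetical, and the weights would in practice come from the joint training described in S31.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(f_struct, f_time, f_inter, W, b):
    """Concatenate the three feature vectors and return (label, probabilities)."""
    fused = list(f_struct) + list(f_time) + list(f_inter)  # feature splicing
    logits = [sum(w * x for w, x in zip(row, fused)) + bias
              for row, bias in zip(W, b)]                  # fully connected layer
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)  # index of the largest score
    return label, probs
```

A softmax output layer trained with cross-entropy is equivalent to a multinomial logistic regression, which is the usual sense in which such a layer is called a maximum entropy classifier.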
A mobile application encryption protocol message type analysis system is based on the mobile application encryption protocol message type analysis method and comprises the following modules:
a message data preprocessing module: the method comprises the steps of preprocessing acquired mobile application network flow original data, and extracting structural feature data, time sequence feature data and interactive feature data of message loads in the original data;
the message structural feature learning module: the method comprises the steps of constructing a mobile application private encryption protocol message structure feature learning model based on a dynamic pooling convolutional neural network by using structure feature data, and learning to obtain a message load structure feature vector;
a message time sequence characteristic learning module: the method comprises the steps of constructing a mobile application private encryption protocol message time sequence characteristic learning model based on a long-time and short-time memory network by using time sequence characteristic data, and learning to obtain a message load time sequence characteristic vector;
the message interaction feature learning module: the method comprises the steps of constructing a mobile application private encryption protocol message interactive feature learning model based on a graph convolution neural network by utilizing interactive feature data, and learning to obtain a message session interactive feature vector;
a message type analysis module: the message type analysis method is used for fusing and splicing the message load structure characteristic vector, the time sequence characteristic vector and the interaction characteristic vector, and outputting an analysis result of the mobile application private encryption protocol message type by using a maximum entropy classifier;
the input ends of the message structure feature learning module, the message time sequence feature learning module and the message interaction feature learning module are each electrically connected to the output end of the message data preprocessing module, and the output ends of the message structure feature learning module, the message time sequence feature learning module and the message interaction feature learning module are each electrically connected to the input end of the message type analysis module.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention can accurately identify the message types of various network mobile application private encryption protocols, thereby improving the supervision efficiency and the supervision strength of network space safety;
(2) The invention is based on the load data above the transmission layer in the network flow data to learn and classify, does not depend on the IP address and port number information of the head of the network flow data packet, and the generalization capability of a classification model is strong;
(3) The invention carries out data set sampling test in a complex network environment, and the detection result more accords with the requirement under a real network environment.
Drawings
FIG. 1 is a schematic diagram of the steps of the method for analyzing a mobile application encryption protocol message type according to the present invention;
FIG. 2 is a schematic structural diagram of the mobile application encryption protocol message type analysis system according to the present invention;
FIG. 3 is a schematic diagram of the mobile application encryption protocol message type analysis framework based on multimodal feature fusion learning according to the present invention;
FIG. 4 is a diagram of the process of converting a data packet sequence into a graph for the session features of mobile application private encryption protocol messages;
FIG. 5 is one of exemplary graphs of a mobile application session data sequence to graph conversion result;
FIG. 6 is a second exemplary graph of a mobile application session data sequence to graph conversion result;
FIG. 7 is a third exemplary graph of a mobile application session data sequence to graph conversion result;
FIG. 8 is a fourth exemplary graph of a mobile application session data sequence to graph conversion result;
FIG. 9 is a fifth exemplary graph of a mobile application session data sequence to graph conversion result;
FIG. 10 is a sixth exemplary graph of a mobile application session data sequence to graph conversion result;
FIG. 11 is a seventh exemplary graph of a mobile application session data sequence to graph conversion result;
FIG. 12 is an eighth exemplary graph of a mobile application session data sequence to graph conversion result;
FIG. 13 is a schematic diagram comparing the accuracy of other classification algorithms and the present invention in analyzing 17 types of mobile application encryption protocol messages;
FIG. 14 is a schematic diagram comparing the precision of other classification algorithms and the present invention in analyzing 17 types of mobile application encryption protocol messages;
FIG. 15 is a schematic diagram comparing the recall of other classification algorithms and the present invention in analyzing 17 types of mobile application encryption protocol messages;
FIG. 16 is a schematic diagram comparing the F1 values of other classification algorithms and the present invention in analyzing 17 types of mobile application encryption protocol messages.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited to these examples.
Example 1
As shown in FIG. 1 to FIG. 16, the present invention provides a method for analyzing the message type of a mobile application encryption protocol based on multimodal feature fusion learning, comprising the following steps:
(1) Preprocessing the acquired raw mobile application network traffic data, and extracting load structure feature data, load time sequence feature data and session interaction feature data of the mobile application encryption protocol messages;
(2) Constructing a mobile application private encryption protocol message structure feature learning model based on an autoencoder and a dynamic pooling convolutional neural network, and learning a message load structure feature vector;
(3) Constructing a mobile application private encryption protocol message time sequence feature learning model based on a long short-term memory (LSTM) network, and learning a message load time sequence feature vector;
(4) Constructing a mobile application private encryption protocol message interaction feature learning model based on a graph convolutional neural network, and learning a message session interaction feature vector;
(5) Fusing and concatenating the load structure feature vector, the load time sequence feature vector and the session interaction feature vector of the mobile application private encryption protocol message, and outputting the analysis result of the mobile application private encryption protocol message type by using a maximum entropy classifier.
The invention is described more specifically below.
Further, step (1) specifically comprises the following substeps:
(1.1) Preprocessing the message payload data of the original network data packets: setting the truncation length applied to each data packet during preprocessing, segmenting the continuous network traffic into session flows, and separating out the network message payload data above the transport layer of each data packet in a session flow.
(1.2) Distinguishing the uplink and downlink directions of the message payload data: the direction of the first data packet in the session is defined as the uplink direction; message payload data with the same source address, destination address and port numbers as the first data packet is taken as uplink message payload data, and the rest as downlink message payload data.
(1.3) Calculating the size of the payload data in the uplink and downlink directions respectively, and constructing a payload data sequence in hexadecimal form. The format is as follows:
the uplink message payload data is represented as: 00 + hex(uplink payload data size);
the downlink message payload data is represented as: FF + hex(downlink payload data size).
(1.4) Splicing the uplink and downlink message payload data, with the uplink data placed before the downlink data, to obtain the message payload structure feature data.
(1.5) Arranging the payload data in data packet time order to obtain the message payload timing feature data.
(1.6) Constructing a sequence-to-graph feature expression model and converting the data packet sequence in a session flow into an undirected graph. For each data packet in the session flow, the packet direction, the information entropy of the payload data and the payload length are extracted and embedded as graph node features to obtain the message payload interaction feature data.
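Substeps (1.2) to (1.4) can be sketched as follows. This is only an illustration under stated assumptions: the `Packet` record, the function names and the 4-hex-digit width of the size field are hypothetical and are not specified in the description.

```python
from typing import List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical per-packet record produced by substep (1.1)."""
    src: str
    sport: int
    dst: str
    dport: int
    payload: bytes


def split_directions(session: List[Packet]) -> Tuple[List[Packet], List[Packet]]:
    """Packets matching the first packet's addresses and ports are uplink."""
    first = session[0]
    key = (first.src, first.sport, first.dst, first.dport)
    up = [p for p in session if (p.src, p.sport, p.dst, p.dport) == key]
    down = [p for p in session if (p.src, p.sport, p.dst, p.dport) != key]
    return up, down


def size_token(prefix: str, payload: bytes) -> str:
    """Prefix ('00' uplink, 'FF' downlink) plus the payload size in hex.

    The fixed width of 4 hex digits is an assumption made for this sketch.
    """
    return prefix + format(len(payload), "04X")


def structure_sequence(session: List[Packet]) -> List[str]:
    """Uplink tokens first, then downlink tokens, as in substep (1.4)."""
    up, down = split_directions(session)
    return ([size_token("00", p.payload) for p in up]
            + [size_token("FF", p.payload) for p in down])
```

For a session of two uplink packets of 5 and 16 bytes and one downlink packet of 300 bytes, the sketch yields the tokens `000005`, `000010` and `FF012C`.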
The sequence-to-graph feature expression model is constructed by converting the data packet sequence of a session into a graph structure and then expressing the features of the converted data with a graph neural network. The conversion process is shown in FIG. 4. First, the transmission direction of each data packet must be distinguished. To this end, the party that sends the first packet in the session is defined as C and the other party as S; a packet sent from C to S is in the positive direction, denoted 0, and a packet sent from S to C is in the negative direction, denoted 1. The packet exchange between the two parties of the session can thus be represented by an array A whose elements are 0 or 1, with the order of the elements being the order of the data packets in the session. This one-dimensional array A representing the packet directions is then converted into the adjacency matrix M of an undirected graph. The packets are connected in time order to form sequences, and the sequences are then connected end to end to form the graph structure.
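A minimal sketch of the direction array A and an adjacency matrix M is given below. The exact connection rule of FIG. 4 is not reproduced in the text, so the sketch assumes the simplest reading, in which each packet is linked to the next packet in time order; the function names are hypothetical.

```python
from typing import List, Sequence


def direction_array(senders: Sequence[str]) -> List[int]:
    """0 for packets sent by the first sender (C), 1 for the other party (S)."""
    client = senders[0]
    return [0 if s == client else 1 for s in senders]


def chain_adjacency(n: int) -> List[List[int]]:
    """Symmetric adjacency matrix linking consecutive packets.

    Assumption: only time-adjacent packets are connected. The patent's own
    rule (FIG. 4) may add further edges between sequences.
    """
    m = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        m[i][i + 1] = 1
        m[i + 1][i] = 1
    return m
```

Because the graph is undirected, the matrix is symmetric, so `chain_adjacency(3)` gives `[[0, 1, 0], [1, 0, 1], [0, 1, 0]]`.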
With the graph data structure, the one-dimensional sequence of the data packet transmission process can be represented in a two-dimensional mesh form. The graph structures of the encryption protocol message session interaction features of several mobile applications are shown in FIGS. 5 to 12.
Features extracted from each data packet are embedded in the graph nodes to express the encrypted network traffic features. The transport layer payload length and the standard information entropy are calculated, where the standard information entropy is calculated as:
$$H(X) = -\sum_{i=1}^{n} p(x_i)\log p(x_i)$$
where $x_i$ is a byte in the data packet, $n$ is the number of discrete values contained in $X$, and $p(x_i)$ is the probability that byte $x_i$ occurs in $X$.
The transport layer payload length and the standard information entropy are then embedded and associated as graph node features. The three values of packet direction, payload length and standard information entropy are combined into an array. The sequence-to-graph feature representation of each session thus generates a 3 × N matrix and a label.
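The node features can be sketched as below. The base-2 logarithm (entropy in bits per byte) and the function names are assumptions; the description does not state the logarithm base.

```python
import math
from collections import Counter
from typing import List, Sequence


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution of one payload (base 2 assumed)."""
    if not payload:
        return 0.0
    n = len(payload)
    return -sum((c / n) * math.log2(c / n) for c in Counter(payload).values())


def node_features(directions: Sequence[int],
                  payloads: Sequence[bytes]) -> List[List[float]]:
    """3 x N matrix: row 0 direction, row 1 payload length, row 2 entropy."""
    return [
        [float(d) for d in directions],
        [float(len(p)) for p in payloads],
        [byte_entropy(p) for p in payloads],
    ]
```

A payload of identical bytes has entropy 0, while a payload containing each of the 256 byte values once reaches the maximum of 8 bits, which is the range in which encrypted payloads typically sit near the top.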
Further, the step (2) specifically comprises the following sub-steps:
(2.1) The message payload structure feature data is input into an autoencoder with a sparsity constraint and a noise robustness constraint for denoising and dimension reduction, so as to improve the interference resistance of mobile application encryption protocol message type analysis in network environments with background traffic. This step both shortens the per-epoch training time of the subsequent dynamic pooling convolutional neural network and allows features to be extracted more accurately, which ultimately increases the accuracy of the mobile application encryption protocol message type analysis.
A sparsity constraint is set in the hidden layer of the autoencoder. The cost function is defined in terms of the autoencoder input, the input noise that models background traffic and its expectation, the sparsity constraint and its weight, the number of hidden layers, the number of hidden-layer units (neurons), and the hidden-layer response. The sparsity-constraint cost function of the autoencoder is:
Figure 481235DEST_PATH_IMAGE113
Figure 714770DEST_PATH_IMAGE114
Figure 388066DEST_PATH_IMAGE115
A noise robustness constraint is set in the autoencoder to constrain the connection weight matrix, so as to strengthen the larger weights and weaken the disturbance from the small weights that represent network background traffic noise. The cost function of the noise robustness constraint of the autoencoder is:
Figure 568511DEST_PATH_IMAGE116
The message payload structure feature data is input into the autoencoder with the sparsity constraint and the noise robustness constraint for unsupervised learning, generating a feature vector that has undergone dimension reduction and denoising.
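The cost functions themselves are not reproduced in the text. Purely as an illustration of what a sparsity-constrained autoencoder cost can look like, the sketch below uses the widely used KL-divergence sparsity penalty as a stand-in; the target sparsity `rho`, the weight `beta` and all function names are assumptions, and the noise robustness term is omitted because its form is not recoverable here.

```python
import math
from typing import Sequence


def kl_sparsity(rho: float, mean_activations: Sequence[float]) -> float:
    """KL divergence between target sparsity rho and each unit's mean activation."""
    return sum(
        rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
        for r in mean_activations
    )


def reconstruction_error(inputs: Sequence[Sequence[float]],
                         outputs: Sequence[Sequence[float]]) -> float:
    """Mean of half the squared reconstruction error over all samples."""
    total = sum(
        sum((x - y) ** 2 for x, y in zip(a, b))
        for a, b in zip(inputs, outputs)
    )
    return total / (2 * len(inputs))


def sparse_cost(inputs, outputs, mean_activations,
                rho: float = 0.05, beta: float = 3.0) -> float:
    """Reconstruction error plus a weighted sparsity penalty (stand-in form)."""
    return (reconstruction_error(inputs, outputs)
            + beta * kl_sparsity(rho, mean_activations))
```

The penalty is zero when every hidden unit's mean activation equals the target and grows as units become more active, which is how the constraint pushes the hidden layer toward a compact representation.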
(2.2) The feature vector obtained after dimension reduction and denoising is input into the constructed dynamic pooling convolutional neural network for learning. The mobile application private encryption protocol message structure feature learning model is constructed on the basis of the dynamic pooling convolutional neural network and is formed by stacking three one-dimensional convolution layers. The padding mode is "same", and each convolution layer is followed by batch normalization.
For each convolution layer, the number of channels c of the convolution operation is set, and the hidden layer output after the one-dimensional convolution is:
Figure 964858DEST_PATH_IMAGE117
After the convolution kernel operation, several feature sequences are obtained for each input; the features output by the last convolution layer are denoted:
Figure 431611DEST_PATH_IMAGE118
DropOut is added after the convolution operation to prevent overfitting.
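To illustrate the "same" padding mode, the following single-channel sketch pads the input with zeros so that the output has the same length as the input. It computes the sliding dot product used by common deep learning frameworks; the function name is hypothetical, and batch normalization, multiple channels and dropout are left out.

```python
from typing import List, Sequence


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """One-dimensional convolution with zero padding that preserves length."""
    k = len(kernel)
    left = (k - 1) // 2          # zeros added before the signal
    right = k - 1 - left         # zeros added after the signal
    padded = [0.0] * left + list(signal) + [0.0] * right
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal))
    ]
```

Keeping the length unchanged is what allows three such layers to be stacked without the sequence shrinking before the pooling step.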
(2.3) For the feature vector output by the last convolution layer, k-max pooling is adopted as the nonlinear down-sampling function, and features are extracted by this dynamic pooling operation, which is:
Figure 15039DEST_PATH_IMAGE119
After the pooling operation, the message payload structure feature vector is obtained.
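A minimal sketch of k-max pooling follows. It assumes the usual definition, in which the k largest values of a feature sequence are kept in their original order; how k is chosen dynamically is given by the formula above, which is not reproduced, so k is simply passed in here.

```python
from typing import List, Sequence


def k_max_pooling(seq: Sequence[float], k: int) -> List[float]:
    """Keep the k largest values of seq, preserving their original order."""
    if k >= len(seq):
        return list(seq)
    # Indices of the k largest values, then restored to positional order.
    top = sorted(range(len(seq)), key=lambda i: seq[i], reverse=True)[:k]
    return [seq[i] for i in sorted(top)]
```

Unlike ordinary max pooling, this keeps several strong activations per feature sequence and retains their relative positions, and it produces a fixed-length output for inputs of varying length.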
Further, the step (3) specifically includes the following sub-steps:
(3.1) Constructing a mobile application private encryption protocol message payload timing feature learning model based on a long short-term memory (LSTM) network; the model comprises 64 memory units and is used to learn the message payload timing feature data.
The mobile application private encryption protocol message payload timing feature learning model learns through a gating mechanism:
Figure 733596DEST_PATH_IMAGE120
The activation function compresses the gate values into the interval [0, 1].
Dropout is added to the learning model to prevent overfitting, with a threshold (dropout rate) of 0.5.
(3.2) The model output is:
Figure 63078DEST_PATH_IMAGE121
After passing through the activation function, the cell state vector is combined with the output gate to obtain the output message payload timing feature vector.
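The gating mechanism can be illustrated with one time step of a standard LSTM cell. The sketch uses scalar states for readability and assumes the textbook LSTM equations; the parameter layout and names are hypothetical, and the patent's own equations are not reproduced in the text.

```python
import math
from typing import Dict, Tuple


def sigmoid(x: float) -> float:
    """Logistic function; squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float,
              params: Dict[str, Tuple[float, float, float]]):
    """One scalar LSTM step; params[gate] = (input weight, recurrent weight, bias)."""
    def pre(gate: str) -> float:
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))        # forget gate
    i = sigmoid(pre("i"))        # input gate
    o = sigmoid(pre("o"))        # output gate
    g = math.tanh(pre("g"))      # candidate cell state
    c = f * c_prev + i * g       # new cell state
    h = o * math.tanh(c)         # output: activated cell state times output gate
    return h, c, (f, i, o)
```

The last line of the cell mirrors the statement above: the cell state passes through the activation function and is then scaled by the output gate.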
Further, the step (4) specifically includes the following sub-steps:
(4.1) Constructing a mobile application private encryption protocol message session interaction feature learning model based on a graph convolutional neural network. The model comprises two graph convolution operations; the number of channels of each graph convolution is set, and an activation function is selected.
The message payload interaction feature data is input into the graph convolutional neural network model, each session having been converted by the sequence-to-graph method into a graph
Figure 638415DEST_PATH_IMAGE122
where the number of network data packets in the graph is
Figure 646823DEST_PATH_IMAGE123
the number of packet features contained in each node is
Figure 965809DEST_PATH_IMAGE124
the feature matrix is
Figure 133485DEST_PATH_IMAGE125
and the adjacency matrix is
Figure 879724DEST_PATH_IMAGE126
(4.2) Performing the graph convolution operations with the constructed learning model. In the model, the graph convolution operation of each layer is:
Figure 906586DEST_PATH_IMAGE127
(4.3) After the two graph convolution layers, a matrix
Figure 078197DEST_PATH_IMAGE093
is obtained. The Flatten operation is used to stretch this matrix into a one-dimensional feature vector
Figure 303642DEST_PATH_IMAGE094
giving:
Figure 158466DEST_PATH_IMAGE128
(4.4) Compressing the flattened vector with one fully connected layer to reduce its dimensionality, and learning to obtain the message payload session feature vector:
Figure 203782DEST_PATH_IMAGE129
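Substeps (4.2) and (4.3) can be sketched as follows. Since the layer formula is not reproduced in the text, the sketch assumes the commonly used graph convolution with self-loops and symmetric degree normalization followed by ReLU; nodes are stored as rows (an N × F matrix, the transpose of the 3 × N layout), and all names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)]
            for i in range(n)]


def gcn_layer(adj: Matrix, feats: Matrix, weights: Matrix) -> Matrix:
    """One graph convolution: normalized adjacency x features x weights, then ReLU."""
    z = matmul(matmul(normalized_adjacency(adj), feats), weights)
    return [[max(0.0, v) for v in row] for row in z]


def flatten(m: Matrix) -> List[float]:
    """Stretch a matrix into a one-dimensional vector, row by row."""
    return [v for row in m for v in row]
```

Each layer mixes a node's features with those of its neighbours, so two stacked layers let every packet's representation depend on packets up to two hops away in the session graph.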
further, the step (5) specifically comprises the following sub-steps:
(5.1) Performing ensemble learning and joint training on the message structure feature learning model, the message timing feature learning model and the message interaction feature learning model, with the hyper-parameters set for joint training; then fusing and splicing the obtained message payload structure feature vector, message payload timing feature vector and message session interaction feature vector, which are concatenated to obtain:
Figure 192467DEST_PATH_IMAGE130
(5.2) Calculating through the second fully connected layer and its softmax activation function:
Figure 210102DEST_PATH_IMAGE131
(5.3) Finally, calculating and outputting the message type analysis result of the mobile application private encryption protocol:
Figure 298143DEST_PATH_IMAGE132
Figure 643805DEST_PATH_IMAGE133
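Step (5) can be sketched as concatenation, one fully connected layer, softmax and selection of the most probable class. The weights, vector sizes and function names below are hypothetical placeholders; in the method they result from joint training.

```python
import math
from typing import List, Sequence, Tuple


def fuse(*vectors: Sequence[float]) -> List[float]:
    """Concatenate the feature vectors of the three models."""
    return [v for vec in vectors for v in vec]


def dense(x: Sequence[float], weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one weight row and one bias per output class."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (the maximum is subtracted first)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(structure_vec, timing_vec, interaction_vec,
             weights, bias) -> Tuple[int, List[float]]:
    """Return the index of the most probable message type and all probabilities."""
    probs = softmax(dense(fuse(structure_vec, timing_vec, interaction_vec),
                          weights, bias))
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs
```

A softmax layer trained with cross-entropy is the usual neural network form of a maximum entropy classifier, which is why the two terms appear together in this step.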
The method provided by the invention extracts and learns features of different modalities of mobile application encrypted protocol messages from multiple dimensions. It learns the payload structure features, payload timing features and session interaction features of mobile application private encryption protocol messages jointly and constructs a mobile application encryption protocol message type analysis model. The model has strong generalization capability and achieves good classification results on encrypted network traffic data sets collected in different environments.
Example 2
As shown in FIGS. 1 to 16, this embodiment is a further optimization of embodiment 1 and, on the basis of embodiment 1, further includes the following technical features:
In this embodiment, the model framework is shown in FIG. 3. First, the acquired raw mobile application network traffic data is preprocessed, and the structure feature data, timing feature data and interaction feature data of the message payload are extracted. Then, a mobile application private encryption protocol message structure feature learning model is constructed based on a dynamic pooling convolutional neural network to learn the message payload structure feature vector; a message timing feature learning model is constructed based on a long short-term memory (LSTM) network to learn the message payload timing feature vector; and a message interaction feature learning model is constructed based on a graph convolutional neural network to learn the message session interaction feature vector. Finally, the message payload structure feature vector, timing feature vector and session interaction feature vector are fused and spliced, and the analysis result of the mobile application private encryption protocol message type is output using a maximum entropy classifier.
Specifically, the mobile application encryption protocol message type analysis method based on multi-modal feature fusion learning of this embodiment further includes the following technical features:
(1) Preprocessing the acquired raw mobile application network traffic data, and extracting the structure feature data, timing feature data and interaction feature data of the message payload.
In substep (1.1) of this step: when designing the mobile application encryption protocol message type analysis model and its classifier, the question of what constitutes valid classifier input must be considered in order to improve the efficiency of classification and identification. Whether a public network traffic data set or network service traffic collected by researchers is used, the raw traffic is in pcap format, which cannot be fed directly into the message type analysis model; the data must first be preprocessed.
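As an illustration of why pcap data needs preprocessing, the sketch below reads the records of a classic pcap file (24-byte global header, 16-byte record headers, microsecond timestamps) and truncates each captured frame to a chosen length. The function name and the default truncation length are assumptions. The frames returned are still link-layer frames; extracting the payload above the transport layer would additionally require parsing the Ethernet, IP and TCP/UDP headers, which is omitted here.

```python
import struct
from typing import Iterator, Tuple


def iter_pcap_records(data: bytes, snap_len: int = 1500) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, truncated frame) for each record of a classic pcap file."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":      # 0xa1b2c3d4 stored little-endian
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":    # 0xa1b2c3d4 stored big-endian
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    offset = 24                            # skip the global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        offset += 16
        frame = data[offset:offset + incl_len]
        offset += incl_len
        # Truncate to the preprocessing length chosen in substep (1.1).
        yield ts_sec + ts_usec / 1e6, frame[:snap_len]
```

The timestamps give the packet time order needed for the timing feature data, and the truncation keeps every input to a bounded size.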
Network mobile applications in five categories with different purposes (audio-visual entertainment, news and information, lifestyle shopping, instant messaging, and tools) are selected, comprising 17 different mobile applications. The private encryption protocol message types used by these applications serve as the label data, and the applications are run in a public network environment and a campus network environment to collect the corresponding network traffic data. The resulting data set is shown in Table 1.
Table 1. Collected mobile application network traffic data set
Figure 108284DEST_PATH_IMAGE134
The features extracted from each data packet are embedded in the graph nodes to express the encrypted network traffic features. The transport layer payload length and the standard information entropy are calculated, where the standard information entropy is calculated as:
$$H(X) = -\sum_{i=1}^{n} p(x_i)\log p(x_i)$$
In general, $x_i$ is a bit string or character string of a particular length.
The transport layer payload length and the standard information entropy are then embedded and associated as graph node features. The three values of packet direction, payload length and standard information entropy are combined into an array. The sequence-to-graph feature representation of each session thus generates a 3 × N matrix and a label.
(2) Constructing a mobile application private encryption protocol message structure feature learning model based on a dynamic pooling convolutional neural network, and learning to obtain the message payload structure feature vector.
The specific process of this step is as follows:
(2.1) The message payload structure feature data is input into an autoencoder with a sparsity constraint and a noise robustness constraint for denoising and dimension reduction, so as to improve the interference resistance of mobile application encryption protocol message type analysis in network environments with background traffic. This step both shortens the per-epoch training time of the subsequent dynamic pooling convolutional neural network and allows features to be extracted more accurately, which ultimately increases the accuracy of the mobile application encryption protocol message type analysis.
A sparsity constraint is set in the hidden layer of the autoencoder.
A noise robustness constraint is set in the autoencoder to constrain the connection weight matrix, so as to strengthen the larger weights and weaken the disturbance from the small weights that represent network background traffic noise.
The message payload structure feature data is input into the autoencoder with the sparsity constraint and the noise robustness constraint for unsupervised learning, generating a feature vector that has undergone dimension reduction and denoising.
(2.2) The feature vector obtained after dimension reduction and denoising is input into the constructed dynamic pooling convolutional neural network for learning. The mobile application private encryption protocol message structure feature learning model is constructed on the basis of the dynamic pooling convolutional neural network and is formed by stacking three one-dimensional convolution layers. The padding mode is "same", and each convolution layer is followed by batch normalization. The unit structure of the message payload structure feature learning model is listed in Table 2.
Table 2. Unit structure of the message payload structure feature learning model
Figure 587173DEST_PATH_IMAGE136
For each layer of convolution operation, the hidden layer output after one-dimensional convolution is:
Figure 793027DEST_PATH_IMAGE137
After the convolution kernel operation, several feature sequences are obtained for each input; the features output by the last convolution layer are denoted:
Figure 581991DEST_PATH_IMAGE138
Dropout is added after the convolution operation to prevent overfitting, with a threshold (dropout rate) of 0.2.
(2.3) For the feature vector output by the last convolution layer, k-max pooling is adopted as the nonlinear down-sampling function, and features are extracted by this dynamic pooling operation, which is:
Figure 588999DEST_PATH_IMAGE139
After the pooling operation, the message payload structure feature vector is obtained.
(3) Constructing a mobile application private encryption protocol message timing feature learning model based on a long short-term memory (LSTM) network, and learning to obtain the message payload timing feature vector.
The specific process of this step is as follows:
(3.1) Constructing a mobile application private encryption protocol message payload timing feature learning model based on the LSTM network; the model comprises 64 memory units and is used to learn the input traffic features. The unit structure of the message payload timing feature learning model is listed in Table 3.
Table 3. Unit structure of the message payload timing feature learning model
Figure 33887DEST_PATH_IMAGE140
The mobile application private encryption protocol message payload timing feature learning model learns through a gating mechanism:
Figure 840169DEST_PATH_IMAGE141
The activation function compresses the gate values into the interval [0, 1].
Dropout is added to the learning model to prevent overfitting, with a threshold (dropout rate) of 0.5.
(3.2) The model output is:
Figure 545957DEST_PATH_IMAGE142
After passing through the activation function, the cell state vector is combined with the output gate to obtain the output message payload timing feature vector.
(4) Constructing a mobile application private encryption protocol message interaction feature learning model based on a graph convolutional neural network, and learning to obtain the message session interaction feature vector.
The specific process of this step is as follows:
(4.1) Constructing a mobile application private encryption protocol message session interaction feature learning model based on the graph convolutional neural network; the unit structure of the model is set as shown in Table 4.
Table 4. Unit structure of the message session interaction feature learning model
Figure 615544DEST_PATH_IMAGE143
The message payload interaction feature data is input into the graph convolutional neural network model, each session having been converted by the sequence-to-graph method into a graph
Figure 344466DEST_PATH_IMAGE144
where the number of network data packets in the graph is
Figure 767488DEST_PATH_IMAGE145
the number of packet features contained in each node is
Figure 265465DEST_PATH_IMAGE146
the feature matrix is
Figure 240374DEST_PATH_IMAGE147
and the adjacency matrix is
Figure 784488DEST_PATH_IMAGE148
(4.2) Performing the graph convolution operations with the constructed learning model. In the model, the graph convolution operation of each layer is:
Figure 932573DEST_PATH_IMAGE149
(4.3) After the two graph convolution layers, a matrix
Figure 488319DEST_PATH_IMAGE150
is obtained. The Flatten operation is used to stretch this matrix into a one-dimensional feature vector
Figure 696447DEST_PATH_IMAGE151
giving:
Figure 980054DEST_PATH_IMAGE152
(4.4) Compressing the flattened vector with one fully connected layer to reduce its dimensionality, and learning to obtain the message payload session feature vector:
Figure 869512DEST_PATH_IMAGE153
(5) Fusing and splicing the payload structure feature vector, the timing feature vector and the session interaction feature vector of the mobile application private encryption protocol message, and outputting the analysis result of the mobile application private encryption protocol message type using a maximum entropy classifier.
The specific process of this step is as follows:
(5.1) Performing ensemble learning and joint training on the three models; the hyper-parameter settings used during joint training are shown in Table 5.
Table 5. Parameter settings for joint training of the three models
Figure 342082DEST_PATH_IMAGE154
The obtained feature vectors are then fused and spliced; the unit structure of the feature fusion splicing is listed in Table 6.
Table 6. Unit structure of the feature fusion splicing
Figure 783428DEST_PATH_IMAGE155
The vectors are concatenated to obtain:
Figure 974238DEST_PATH_IMAGE156
(5.2) calculating through the second fully-connected layer and its softmax activation function:
Figure 667387DEST_PATH_IMAGE157
(5.3) Finally, calculating and outputting the analysis result of the mobile application private encryption protocol message type, namely
Figure 338671DEST_PATH_IMAGE158
the sequence number corresponding to the category to which the data belongs:
Figure 560705DEST_PATH_IMAGE159
where
Figure 566707DEST_PATH_IMAGE160
denotes the sequence number of the category to which the data belongs.
The experiment of this embodiment is performed on the collected data set of 17 network mobile applications. The experimental results are shown in Table 7, which gives the analysis results of the method of this embodiment for the encryption protocol message type of each application's traffic. The data in the table show that: for the precision index, four applications exceed 99%, namely Jingdong, Mei Tuo, iQIYI and Pinduoduo; for the recall index, four applications exceed 98%, namely Microsoft-Launcher, Sogou Input Method, WeChat and Mei Tuo; for the F1 index, five applications exceed 98%, namely Sogou Input Method, Microsoft-Launcher, Jingdong, Mei Tuo and WeChat. The weighted averages of precision, recall and F1 are 97.29%, 97.26% and 97.27% respectively, and the overall accuracy of the model on this data set reaches 97.26%.
Table 7. Type analysis results of the method of the invention on the data set of 17 network mobile applications
Figure 63547DEST_PATH_IMAGE161
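The weighted averages reported above can be computed from a confusion matrix as sketched below. The function name and the toy matrix in the usage note are illustrative only and are not data from the experiment; rows are assumed to be true classes and columns predicted classes.

```python
from typing import List, Tuple


def weighted_metrics(confusion: List[List[int]]) -> Tuple[float, float, float, float]:
    """Support-weighted precision, recall and F1, plus overall accuracy."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    w_precision = w_recall = w_f1 = 0.0
    for c in range(n):
        tp = confusion[c][c]
        support = sum(confusion[c])                       # true samples of class c
        predicted = sum(confusion[r][c] for r in range(n))  # samples predicted as c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total
        w_precision += weight * p
        w_recall += weight * r
        w_f1 += weight * f
    accuracy = sum(confusion[c][c] for c in range(n)) / total
    return w_precision, w_recall, w_f1, accuracy
```

For the toy matrix `[[8, 2], [1, 9]]` the weighted recall and the accuracy are both 0.85. The support-weighted recall always equals the overall accuracy, which is consistent with the identical 97.26% figures reported for weighted recall and overall accuracy.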
In the comparison experiment, the 2D-CNN, LSTM, GCN and CNN + LSTM models are selected for comparison, so as to verify the effectiveness of the mobile application encryption protocol message type analysis method based on multi-modal feature fusion learning. The overall results of the comparison experiment are shown in FIGS. 13 to 16.
It should be noted that, for the sake of simplicity, the present embodiment is described as a series of acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, because some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art will recognize that the embodiments described in this specification are preferred embodiments and that acts or modules referred to are not necessarily required for this application.
The invention can accurately identify the private encryption protocol message types of various network mobile applications, improving the efficiency and strength of cyberspace security supervision.
The invention learns and classifies on the basis of the payload data above the transport layer in the network traffic data, does not depend on the IP address and port number information in the packet headers, and yields a classification model with strong generalization capability.
The invention carries out data set sampling and testing in a complex network environment, so the detection results better match the requirements of a real network environment.
It should be noted that, in the present invention, steps "S2A, message structure feature learning", "S2B, message timing feature learning" and "S2C, message interaction feature learning" may be executed in any order, or even simultaneously; the order of the steps in the embodiments described herein should therefore not be regarded as limiting the execution order of the three.
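Because the three feature learning steps do not depend on one another, they can be dispatched concurrently. The sketch below shows one way to do this with a thread pool; the function name and the stand-in learner callables are hypothetical placeholders for the three models.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


def run_feature_learners(learners: Dict[str, Callable[[Any], Any]],
                         inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Run each learner on its own input in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=len(learners)) as pool:
        futures = {name: pool.submit(fn, inputs[name])
                   for name, fn in learners.items()}
        # result() blocks until the corresponding learner has finished.
        return {name: fut.result() for name, fut in futures.items()}
```

The returned dictionary is keyed by step name, so the fusion step receives the same three vectors regardless of which learner finishes first.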
As described above, the present invention can be preferably realized.
All features disclosed in all embodiments in this specification, or all methods or process steps implicitly disclosed, may be combined and/or expanded, or substituted, in any way, except for mutually exclusive features and/or steps.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications, equivalent arrangements, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for analyzing the type of a mobile application encryption protocol message is characterized in that different modal characteristics of the mobile application encryption protocol message are extracted and learned, and the type of the encryption protocol message is analyzed by fusing the different modal characteristics.
2. The method for analyzing the type of a mobile application encryption protocol message according to claim 1, comprising the following steps:
S1, message data preprocessing: preprocessing the acquired raw mobile application network traffic data, and extracting the structure feature data, timing feature data and interaction feature data of the message payload in the raw data;
S2, feature learning, which specifically comprises the following steps:
S2A, message structure feature learning: using the structure feature data, constructing a mobile application private encryption protocol message structure feature learning model based on a dynamic pooling convolutional neural network, and learning to obtain a message payload structure feature vector;
S2B, message timing feature learning: using the timing feature data, constructing a mobile application private encryption protocol message timing feature learning model based on a long short-term memory (LSTM) network, and learning to obtain a message payload timing feature vector;
S2C, message interaction feature learning: using the interaction feature data, constructing a mobile application private encryption protocol message interaction feature learning model based on a graph convolutional neural network, and learning to obtain a message session interaction feature vector;
S3, message type analysis: fusing and splicing the message payload structure feature vector, the timing feature vector and the interaction feature vector, and outputting the analysis result of the mobile application private encryption protocol message type using a maximum entropy classifier.
3. The method according to claim 2, wherein step S1 comprises the following steps:
S11, setting the truncation length applied to the original network data packets during preprocessing, segmenting the continuous network traffic into session flows, and separating out the network message payload data above the transport layer of each data packet in a session flow;
S12, distinguishing the uplink and downlink directions of the message payload data: defining the uplink and downlink directions of the message payload data of the data packets in a session according to the data flow direction, taking message payload data with the same source address, destination address and port numbers as the first data packet as uplink message payload data, and taking the rest as downlink message payload data;
S13, calculating the size of the payload data in the uplink and downlink directions respectively, and constructing a payload data sequence in hexadecimal form;
S14, splicing the uplink message payload data and the downlink message payload data, with the uplink data first and the downlink data after, to obtain the message payload structure feature data;
S15, arranging the payload data in data packet time order to obtain the message payload timing feature data;
S16, constructing a sequence-to-graph feature expression model and converting the data packet sequence in a session flow into an undirected graph; for each data packet in the session flow, extracting the packet direction, the standard information entropy of the payload data and the payload length, and embedding them as graph node features to obtain the message payload interaction feature data.
4. The method according to claim 3, wherein in step S16, the standard information entropy is calculated as:
$$H(X) = -\sum_{i=1}^{n} p(x_i)\,\log p(x_i)$$
wherein $H(X)$ represents the standard information entropy; $X$ represents a discrete random variable with an arbitrary distribution $P$, $X = \{x_1, x_2, \ldots, x_n\}$; $n$ represents the number of discrete variables contained in $X$; $i$ represents the sequence number of a byte in the data packet; $x_i$ represents a byte in the data packet; and $p(x_i)$ represents the probability of occurrence of byte $x_i$ in $X$.
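By way of a hedged example, the per-packet entropy and the node feature triple of step S16 might be computed as below. The sketch assumes the Shannon entropy of the empirical byte distribution with a base-2 logarithm; the logarithm base, any normalization, and the function names are assumptions not stated in the claim.

```python
import math
from collections import Counter

def byte_entropy(payload: bytes) -> float:
    """Shannon entropy, in bits per byte, of the empirical byte distribution."""
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)  # occurrences of each byte value
    # p * log2(1/p) is the same term as -p * log2(p), written to avoid -0.0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def node_features(direction: int, payload: bytes) -> tuple:
    """S16 node features: (packet direction, entropy of the load, load length)."""
    return (direction, byte_entropy(payload), len(payload))

print(byte_entropy(b"\x00\x00\x00\x00"))  # 0.0
print(byte_entropy(b"\x00\x01\x02\x03"))  # 2.0
```

Encrypted loads tend toward high entropy, so this value helps separate ciphertext-bearing packets from low-entropy control messages.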
5. The method for parsing message type of mobile application encryption protocol according to any one of claims 2 to 4, wherein the step S2A comprises the steps of:
S2A1, inputting the message load structure characteristic data into an autoencoder having a sparsity constraint condition and a noise robustness constraint condition for noise suppression and dimensionality reduction, and generating a denoised, dimension-reduced feature vector;
S2A2, constructing a mobile application private encryption protocol message structure characteristic learning model based on a dynamic pooling convolutional neural network; inputting the denoised, dimension-reduced feature vector into the constructed message structure characteristic learning model for learning, to obtain feature sequences after the convolution kernel operation;
the message structure characteristic learning model is constructed as follows:
the message structure characteristic learning model is formed by stacking three one-dimensional convolution layers; the padding mode is "same", and each convolution layer is accompanied by batch normalization; for each convolution layer, the hidden layer output after the one-dimensional convolution is:
[Formula image not reproduced in the source text]
wherein the symbols of the formula denote, respectively: the row index of the weight matrix of the one-dimensional convolution kernel; the column index of that weight matrix; the weight value in the indexed row and column of the weight matrix; the shape of the convolution kernel; the value of the input data in the indexed row and column; the total number of rows of the input data; the total number of columns of the input data; the shape of the input; and the output value at each position;
after the convolution kernel operation, a plurality of feature sequences are obtained for each input, and the feature vector output by the last convolution layer is denoted:
$$c = [c_1, c_2, \ldots]$$
wherein each $c_j$ represents an element of the feature vector $c$;
S2A3, for the feature vector output by the last convolution layer, taking k-max pooling as the nonlinear down-sampling function, and extracting features by the dynamic pooling operation to obtain the message load structure feature vector;
the dynamic pooling operation is as follows:
$$k_l = \max\left(k_{top},\ \left\lceil \frac{L - l}{L}\, s \right\rceil\right)$$
wherein $k_l$ represents the number of message structure feature values retained by the pooling at the current layer; $L$ represents the total number of convolution layers; $l$ represents the index of the current convolution layer; $s$ represents the length of the input sequence; and $k_{top}$ represents the fixed pooling layer parameter.
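A minimal sketch of dynamic k-max pooling is given below for illustration. It assumes the rule commonly used in dynamic convolutional networks, in which the pooling size is the larger of a fixed parameter and a fraction of the input length that shrinks with depth; the function and argument names are hypothetical.

```python
import math

def dynamic_k(layer: int, total_layers: int, seq_len: int, k_top: int) -> int:
    """Pooling size for the current layer: max(k_top, ceil((L - l) * s / L))."""
    return max(k_top, math.ceil((total_layers - layer) * seq_len / total_layers))

def k_max_pool(seq, k):
    """Keep the k largest values of seq while preserving their original order."""
    k = min(k, len(seq))
    top = sorted(range(len(seq)), key=lambda i: seq[i], reverse=True)[:k]
    return [seq[i] for i in sorted(top)]

features = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8]
k = dynamic_k(layer=2, total_layers=3, seq_len=len(features), k_top=3)
print(k)                        # 3
print(k_max_pool(features, k))  # [0.9, 0.7, 0.8]
```

Unlike ordinary max pooling, k-max pooling retains the relative order of the strongest activations, which keeps positional structure of the message fields available to later layers.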
6. The method for parsing message type of mobile application encryption protocol according to claim 5, wherein in step S2A1, the cost function of sparsity constraint condition is:
[Three formula images not reproduced in the source text]
wherein the symbols of the formulas denote, respectively: the cost function of the sparsity constraint; the input of the autoencoder; the sparsity constraint term; the weight of the sparsity constraint; the expectation over the total noise; the number of hidden layers in the autoencoder; Gaussian noise with a mean of 0 and a variance of 1; the input of the corresponding layer of the neural network; the index of a hidden layer unit; the number of neurons in the hidden layer; and the hidden layer response;
the cost function of the noise robustness constraint is:
[Formula image not reproduced in the source text]
wherein the symbols of the formula denote, respectively: the cost function of the noise robustness constraint; the target output; the output of the autoencoder learning network; the activation factor; the indices of two input data items; and the connection weight from the first indexed input data item to the second.
7. The method according to claim 6, wherein the step S2B comprises the following steps:
S2B1, constructing a mobile application private encryption protocol message load time sequence characteristic learning model based on a long short-term memory (LSTM) network, wherein the message load time sequence characteristic learning model comprises JI memory units, JI being an integer with 32 ≤ JI ≤ 256, and learning the message load time sequence characteristic data by using the constructed model, wherein the learning formula is as follows:
$$g_t = \sigma\left(W_g\,[h_{t-1}, x_t] + b_g\right),\quad g \in \{f, i, o\}$$
wherein $g_t$ represents the gate unit function; $f_t$, $i_t$ and $o_t$ respectively represent the forget gate, the input gate and the output gate; $\sigma$ represents the activation function; $W_g$ represents the parameter corresponding to the forget gate, the input gate or the output gate; $x_t$ represents the input at time $t$; $h_{t-1}$ represents the output at time $t-1$; and $b_g$ represents the bias value of the forget gate, the input gate or the output gate;
S2B2, obtaining the output as the message load time sequence feature vector, wherein the output formula is as follows:
$$h_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \odot \tanh(C_t)$$
wherein $h_t$ represents the message load time sequence feature vector; $\sigma$ represents the activation function; $C_t$ represents the cell state vector; $\tanh$ represents the tanh activation function; $W_o$ represents the parameter of the output gate; and $b_o$ represents the bias.
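For illustration only, one time step of an LSTM unit can be written in plain Python as below. The sketch follows the textbook LSTM equations, including a candidate cell state that the claim does not spell out; the parameter layout (`params` mapping gate names to weight and bias) and the tiny two-unit size are assumptions, whereas the claim calls for 32 to 256 memory units.

```python
import math

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def affine(W, b, v):
    """Compute W @ v + b, with W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(params, h_prev, c_prev, x):
    """One textbook LSTM step; params maps 'f', 'i', 'o', 'c' to (W, b)."""
    z = h_prev + x                                   # concatenation [h_{t-1}, x_t]
    f = sigmoid(affine(*params["f"], z))             # forget gate
    i = sigmoid(affine(*params["i"], z))             # input gate
    o = sigmoid(affine(*params["o"], z))             # output gate
    g = [math.tanh(a) for a in affine(*params["c"], z)]  # candidate cell state
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]     # output at time t
    return h, c

# Two memory units, one input feature, all-zero parameters (illustrative only).
W0, b0 = [[0.0] * 3 for _ in range(2)], [0.0, 0.0]
params = {name: (W0, b0) for name in "fioc"}
h, c = [0.0, 0.0], [1.0, -1.0]
for x_t in ([0.3], [0.7]):          # packets of the session in time order
    h, c = lstm_step(params, h, c, x_t)
print(c)  # [0.25, -0.25]
```

The final `h` after the last packet plays the role of the message load time sequence feature vector.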
8. The method according to claim 7, wherein the step S2C comprises the steps of:
S2C1, constructing a mobile application private encryption protocol message session interactive feature learning model based on a graph convolutional neural network, wherein the session interactive feature learning model comprises two sequentially connected graph convolution layers; setting the number of channels of the two graph convolutions for the graph convolution operation, and selecting the ReLU function as the activation function;
inputting the message load interaction characteristic data into the graph convolutional neural network model, the graph obtained by the sequence-to-graph method being $G$; wherein the number of network data packets in the graph is $N$, the number of features contained in each packet node is $F$, the feature matrix is $X \in \mathbb{R}^{N \times F}$, and the adjacency matrix is $A \in \mathbb{R}^{N \times N}$;
S2C2, performing the graph convolution operation by using the learning model constructed in step S2C1, the graph convolution operation being:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}\,H^{(l)}\,W^{(l)} + b^{(l)}\right)$$
wherein $\tilde{A} = A + I$; $I$ represents the identity matrix; $\tilde{D}$ represents the degree matrix corresponding to $\tilde{A}$; $l$ represents the network layer index; $W^{(l)}$ represents the weight of the $l$-th layer, the dimension of the weight being $F_l \times F_{l+1}$; $F_l$ represents the dimensionality of the graph node data after $l$ layers of convolution; $b^{(l)}$ represents the bias of the $l$-th layer; $H^{(l)}$ represents the input of the $l$-th layer, the input of the first layer being $H^{(0)} = X$; and $\sigma$ represents the nonlinear activation function, namely the ReLU function;
S2C3, obtaining an $N \times F_2$ matrix after the two-layer graph convolution operation, and using the Flatten operation to stretch the matrix into a one-dimensional feature vector $Z$, obtaining:
$$Z = [z_1, z_2, \ldots, z_{N F_2}]$$
wherein $Z$ represents the message session interactive feature vector; the dimension of $Z$ is $N \cdot F_2$; and $z_k$ represents each element of the message session interactive feature vector;
S2C4, compressing the message session interactive feature vector $Z$ with one fully connected layer to reduce its dimensionality, and learning to obtain the message load session feature vector:
$$Y = \sigma\left(W_1 Z + b_1\right)$$
wherein $Y$ represents the message load session feature vector; $W_1$ represents the weight matrix of the fully connected layer; $b_1$ represents the bias; and $\sigma$ represents the activation function, the ReLU function being used at the fully connected layer.
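The two graph convolution layers and the Flatten step could be sketched as follows, purely as an example. The sketch assumes the widely used symmetric normalization of the adjacency matrix with self-loops, and it assumes that the sequence-to-graph conversion links consecutive packets; the claims do not state the edge rule, and all names and weight values are hypothetical.

```python
import math

def chain_adjacency(n):
    """Assumed sequence-to-graph rule: undirected edges between consecutive packets."""
    return [[1.0 if abs(r - c) == 1 else 0.0 for c in range(n)] for r in range(n)]

def normalized_adjacency(A):
    """Return D~^(-1/2) (A + I) D~^(-1/2) for a square adjacency matrix A."""
    n = len(A)
    A_t = [[A[r][c] + (1.0 if r == c else 0.0) for c in range(n)] for r in range(n)]
    deg = [sum(row) for row in A_t]
    return [[A_t[r][c] / math.sqrt(deg[r] * deg[c]) for c in range(n)] for r in range(n)]

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def gcn_layer(A_hat, H, W, b):
    """ReLU(A_hat @ H @ W + b), with the bias b broadcast over all nodes."""
    Z = matmul(matmul(A_hat, H), W)
    return [[max(0.0, z + bj) for z, bj in zip(row, b)] for row in Z]

# Three packets, each with (direction, entropy, load length) as node features.
X = [[1.0, 0.5, 2.0], [-1.0, 0.9, 3.0], [1.0, 0.7, 1.0]]
A_hat = normalized_adjacency(chain_adjacency(len(X)))
W1, b1 = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [0.0, 0.0]
W2, b2 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
H2 = gcn_layer(A_hat, gcn_layer(A_hat, X, W1, b1), W2, b2)
Z = [v for row in H2 for v in row]  # Flatten: N x F_2 matrix to a vector
print(len(Z))  # 6
```

A fully connected layer applied to `Z` would then yield the lower-dimensional session feature vector of step S2C4.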
9. The method according to claim 8, wherein the step S3 comprises the following steps:
S31, performing ensemble learning and joint training on the message structure characteristic learning model, the message time sequence characteristic learning model and the message interaction characteristic learning model, and setting the hyper-parameters for the joint training; concatenating the obtained message load structure feature vector, message load time sequence feature vector and message session interaction feature vector for feature fusion, to obtain:
$$F = \left[F_{\mathrm{struct}},\ F_{\mathrm{time}},\ F_{\mathrm{inter}}\right]$$
wherein $F$ represents the message session multimodal fusion feature vector, and $F_{\mathrm{struct}}$, $F_{\mathrm{time}}$ and $F_{\mathrm{inter}}$ denote the three concatenated feature vectors in the order listed;
S32, calculating through a second fully connected layer and its softmax activation function:
$$y = \mathrm{softmax}\left(W_2 F + b_2\right)$$
wherein $W_2$ represents the weight matrix of the second fully connected layer; $b_2$ represents the bias; and $y$ is a one-dimensional vector whose length equals the number of classes to be classified;
S33, finally calculating and outputting the analysis result $\hat{c}$ of the mobile application private encryption protocol message type:
$$\hat{c} = \arg\max_{c}\ y_c$$
wherein $c$ indicates the sequence number of the corresponding category.
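Steps S31 to S33 amount to concatenation, a softmax layer and an arg-max, which might be sketched as below. The weights, vector sizes and the function name `classify` are illustrative assumptions; in practice the weights would come from the joint training described in step S31.

```python
import math

def softmax(v):
    m = max(v)                                  # subtract the max for stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def classify(struct_vec, timing_vec, inter_vec, W, b):
    fused = struct_vec + timing_vec + inter_vec            # S31: concatenation
    logits = [sum(w * a for w, a in zip(row, fused)) + bi  # S32: second FC layer
              for row, bi in zip(W, b)]
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)  # S33: arg-max
    return label, probs

W = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # two classes, three fused features
label, probs = classify([1.0], [0.0], [2.0], W, [0.0, 0.0])
print(label)  # 1
```

The returned `label` is the sequence number of the predicted message type, and `probs` has one entry per class.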
10. A mobile application encryption protocol message type parsing system, based on the mobile application encryption protocol message type parsing method of any one of claims 1 to 9, characterized by comprising the following modules:
a message data preprocessing module: configured to preprocess the acquired original mobile application network traffic data, and to extract the structural feature data, time sequence feature data and interactive feature data of the message loads in the original data;
a message structure feature learning module: configured to use the structural feature data to construct a mobile application private encryption protocol message structure feature learning model based on a dynamic pooling convolutional neural network, and to learn to obtain a message load structure feature vector;
a message time sequence feature learning module: configured to use the time sequence feature data to construct a mobile application private encryption protocol message time sequence feature learning model based on a long short-term memory network, and to learn to obtain a message load time sequence feature vector;
a message interaction feature learning module: configured to use the interactive feature data to construct a mobile application private encryption protocol message interactive feature learning model based on a graph convolutional neural network, and to learn to obtain a message session interactive feature vector;
a message type analysis module: configured to fuse and splice the message load structure feature vector, the time sequence feature vector and the interactive feature vector, and to output the analysis result of the mobile application private encryption protocol message type by using a maximum entropy classifier;
the input ends of the message structure feature learning module, the message time sequence feature learning module and the message interaction feature learning module are each electrically connected to the output end of the message data preprocessing module, and the output ends of the message structure feature learning module, the message time sequence feature learning module and the message interaction feature learning module are each electrically connected to the input end of the message type analysis module.
CN202211171000.9A 2022-09-26 2022-09-26 Method and system for analyzing message type of mobile application encryption protocol Active CN115277888B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211171000.9A CN115277888B (en) 2022-09-26 2022-09-26 Method and system for analyzing message type of mobile application encryption protocol

Publications (2)

Publication Number Publication Date
CN115277888A (en) 2022-11-01
CN115277888B (en) 2023-01-31

Family

ID=83757417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211171000.9A Active CN115277888B (en) 2022-09-26 2022-09-26 Method and system for analyzing message type of mobile application encryption protocol

Country Status (1)

Country Link
CN (1) CN115277888B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105430021A (en) * 2015-12-31 2016-03-23 中国人民解放军国防科学技术大学 Encrypted traffic identification method based on load adjacent probability model
US20190273509A1 (en) * 2018-03-01 2019-09-05 Crowdstrike, Inc. Classification of source data by neural network processing
WO2021103135A1 (en) * 2019-11-25 2021-06-03 中国科学院深圳先进技术研究院 Deep neural network-based traffic classification method and system, and electronic device
CN111147394A (en) * 2019-12-16 2020-05-12 南京理工大学 Multi-stage classification detection method for remote desktop protocol traffic behavior
CN112003870A (en) * 2020-08-28 2020-11-27 国家计算机网络与信息安全管理中心 Network encryption traffic identification method and device based on deep learning
WO2022041394A1 (en) * 2020-08-28 2022-03-03 南京邮电大学 Method and apparatus for identifying network encrypted traffic
WO2022094926A1 (en) * 2020-11-06 2022-05-12 中国科学院深圳先进技术研究院 Encrypted traffic identification method, and system, terminal and storage medium
CN112511555A (en) * 2020-12-15 2021-03-16 中国电子科技集团公司第三十研究所 Private encryption protocol message classification method based on sparse representation and convolutional neural network
CN113179223A (en) * 2021-04-23 2021-07-27 中山大学 Network application identification method and system based on deep learning and serialization features
CN114358177A (en) * 2021-12-31 2022-04-15 北京工业大学 Unknown network traffic classification method and system based on multidimensional feature compact decision boundary
CN114519390A (en) * 2022-02-17 2022-05-20 北京邮电大学 QUIC flow classification method based on multi-mode deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
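ZIYI ZHAO ET AL.: "CL-ETC: A Contrastive Learning Method for Encrypted Traffic Classification", 《2022 IFIP NETWORKING CONFERENCE (IFIP NETWORKING)》 *
程永新 et al.: "Design and Research of an Encrypted Traffic Behavior Analysis System", 《通信技术》 *
童博 et al.: "Research on Encrypted Traffic Identification Methods in Complex Network Environments", 《邮电设计技术》 *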

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115801897A (en) * 2022-12-20 2023-03-14 南京工程学院 Dynamic message processing method for edge proxy
CN115801897B (en) * 2022-12-20 2024-05-24 南京工程学院 Message dynamic processing method of edge proxy
CN115883263A (en) * 2023-03-02 2023-03-31 中国电子科技集团公司第三十研究所 Encryption application protocol type identification method based on multi-scale load semantic mining
CN115883263B (en) * 2023-03-02 2023-05-09 中国电子科技集团公司第三十研究所 Encryption application protocol type identification method based on multi-scale load semantic mining

Also Published As

Publication number Publication date
CN115277888B (en) 2023-01-31

Similar Documents

Publication Publication Date Title
CN110896381B (en) Deep neural network-based traffic classification method and system and electronic equipment
CN115277888B (en) Method and system for analyzing message type of mobile application encryption protocol
CN110287983B (en) Single-classifier anomaly detection method based on maximum correlation entropy deep neural network
WO2019144521A1 (en) Deep learning-based malicious attack detection method in traffic cyber physical system
CN112508085B (en) Social network link prediction method based on perceptual neural network
Wang et al. App-net: A hybrid neural network for encrypted mobile traffic classification
CN109698836A (en) A kind of method for wireless lan intrusion detection and system based on deep learning
Lai et al. Industrial anomaly detection and attack classification method based on convolutional neural network
CN109446804B (en) Intrusion detection method based on multi-scale feature connection convolutional neural network
CN111353153A (en) GEP-CNN-based power grid malicious data injection detection method
CN112087442B (en) Time sequence related network intrusion detection method based on attention mechanism
CN113177132A (en) Image retrieval method based on depth cross-modal hash of joint semantic matrix
Xue et al. Clustering-Induced Adaptive Structure Enhancing Network for Incomplete Multi-View Data.
CN114615093A (en) Anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning
CN111397902A (en) Rolling bearing fault diagnosis method based on feature alignment convolutional neural network
CN113541834B (en) Abnormal signal semi-supervised classification method and system and data processing terminal
CN115037805B (en) Unknown network protocol identification method, system and device based on deep clustering and storage medium
CN103177265A (en) High-definition image classification method based on kernel function and sparse coding
CN111641598A (en) Intrusion detection method based on width learning
CN115277258B (en) Network attack detection method and system based on temporal-spatial feature fusion
CN114064471A (en) Ethernet/IP protocol fuzz testing method based on generative adversarial network
CN111130942B (en) Application flow identification method based on message size analysis
CN114915575A (en) Network flow detection device based on artificial intelligence
CN106021170A (en) Graph building method employing semi-supervised low-rank representation model
CN117633627A (en) Deep learning unknown network traffic classification method and system based on evidence uncertainty evaluation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant