CN115883263A - Encryption application protocol type identification method based on multi-scale load semantic mining - Google Patents
Encryption application protocol type identification method based on multi-scale load semantic mining
- Publication number
- CN115883263A CN115883263A CN202310189712.1A CN202310189712A CN115883263A CN 115883263 A CN115883263 A CN 115883263A CN 202310189712 A CN202310189712 A CN 202310189712A CN 115883263 A CN115883263 A CN 115883263A
- Authority
- CN
- China
- Prior art keywords
- sequence
- features
- load
- characteristic
- application protocol
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention provides an encryption application protocol type identification method based on multi-scale load semantic mining, which comprises the following steps: step 1, extracting the load features of the original traffic and converting them into a decimal byte sequence; step 2, constructing a pyramid neural network based on load semantic mining blocks and processing the decimal byte sequence to obtain an input feature sequence; step 3, the load semantic mining block constructs a sliding window on the input feature sequence, the sliding window moves step by step to the end of the sequence, and the features extracted in the windows are spliced to obtain the features of the input sequence; step 4, reducing the dimension of the features of the input sequence to serve as a new input sequence, repeating steps 3 to 4, and splicing the features obtained each time to obtain multi-scale features; and step 5, completing the classification of the encryption network application protocol type according to the multi-scale features. The invention can extract multi-scale features from encryption network application protocol messages in complex scenarios, improving both the speed and the accuracy of encrypted traffic identification.
Description
Technical Field
The invention relates to the field of network traffic analysis, and in particular to an encryption application protocol type identification method based on multi-scale load semantic mining.
Background
Traffic classification is used in a very wide range of applications and is the basis of network security and network management; it is found everywhere from QoS provisioning at network service providers to security applications such as firewalls and intrusion detection systems. At present, traffic classification mainly relies on methods based on port numbers, deep packet inspection, machine learning and the like, each of which has certain shortcomings:
(1) Traditional port number-based approaches have long failed because newer applications either use well-known port numbers to mask their traffic or do not use standard registered port numbers.
(2) Deep packet inspection relies on finding keywords in the packets, and therefore fails in the face of encrypted traffic.
(3) Machine-learning-based methods for encrypted network traffic identification rely heavily on hand-engineered features, which limits their applicability.
With the popularization of deep learning methods, researchers have studied their effect on traffic classification tasks and demonstrated higher accuracy on early mobile application traffic data sets. However, with the continuous upgrading of encryption protocols, the explosive growth in the number of mobile applications and the changes in mobile application development patterns, shallow deep learning models can no longer meet the practical requirements of mobile application traffic identification in current complex scenarios. Although the recently proposed Transformer-based encrypted traffic identification methods perform well in feature learning, they pay more attention to global features during feature extraction and ignore the detail features hidden in the high-resolution load data, and in many cases these local features are the key to accurate classification.
Disclosure of Invention
In order to solve the problems that, in current complex scenarios, a shallow neural network cannot learn the deep-level features in encrypted traffic and that existing deep neural networks focus excessively on global features and therefore lose detail features, the invention provides a new encryption network application protocol type identification method. By extracting features of different scales, it makes full use of the global features and the local detail features of different scales in the packet load, thereby improving identification accuracy.
The technical scheme adopted by the invention is as follows: the encryption application protocol type identification method based on multi-scale load semantic mining comprises the following steps:
step 1, preprocessing the original traffic of the mobile application encryption network, extracting the load features of the transmission layer load, and converting the load features into a decimal byte sequence;
step 2, constructing a pyramid neural network based on a load semantic mining block, and obtaining the word embedding feature and the position coding feature of the decimal byte sequence, the input feature sequence being obtained by adding the word embedding feature and the position coding feature;
step 3, the load semantic mining block constructs a sliding window on the input feature sequence, the sliding window moves step by step to the end of the input sequence, the features inside the sliding window at each position are extracted, and the features extracted in all the sliding windows are spliced in order to obtain the features of the input sequence;
step 4, performing feature compression and dimension reduction on the features of the input sequence to serve as a new input sequence, repeating steps 3-4 k times, and splicing the features of the input sequence obtained each time to obtain the multi-scale features of the input sequence;
and step 5, completing the classification of the encryption network application protocol type according to the multi-scale features.
Further, the preprocessing process in step 1 is as follows:
step 1.1, dividing the data packet into session flows according to quintuple;
step 1.2, cleaning the session stream, and removing the data packet retransmitted overtime, the data packet of the address resolution protocol and the data packet of the dynamic host configuration protocol;
step 1.3, extracting load characteristics of a transmission layer load in a data packet, and splicing the extracted load characteristics according to the arrival sequence of the data packet until the byte length after splicing reaches the set load characteristic length;
and 1.4, converting the extracted spliced load characteristics into a decimal byte sequence.
Further, in step 1.3, if the byte length after splicing the load features of all the data packets in the session stream is still smaller than the set load feature length, the sequence is padded with 0x00.
Further, in step 2, the byte features of the decimal byte sequence are mapped to a d-dimensional vector space to obtain the word embedding feature F1, F1 ∈ R^(N×d), where R denotes the real numbers and N is the length of the byte sequence.
Further, in step 2, the position coding feature is calculated as:

PE(pos, 2i) = sin(pos / 10000^(2i/d))    (1)

PE(pos, 2i+1) = cos(pos / 10000^(2i/d))    (2)

F2 = [PE(1), PE(2), …, PE(N)]    (3)

where pos denotes the position at which a byte appears in the byte sequence; PE(pos, 2i) on the left of formula (1) denotes the position code of the even dimensions and PE(pos, 2i+1) on the left of formula (2) denotes the position code of the odd dimensions; i is the dimension index of the position code; d is the position coding dimension; F2 ∈ R^(N×d) is the position coding feature; and PE(pos) in formula (3) denotes the position code of each byte in the byte sequence.
Further, the substep of step 3 comprises:
step 3.1, constructing a sliding window with the size of L bytes on the input sequence;
step 3.2, performing feature extraction on the data in the sliding window by adopting a multi-head attention mechanism to obtain a feature F4;
step 3.3, carrying out residual error connection and layer normalization processing on the input sequence F3 and the characteristic F4 to obtain a characteristic F5;
step 3.4, performing two-layer full-connection layer operation on the characteristic F5 to obtain a characteristic F6;
step 3.5, carrying out residual error connection and layer normalization processing on the characteristic F5 and the characteristic F6 to obtain a characteristic F7;
step 3.6, moving the sliding window backwards by L bytes, and repeating the step 3.2 to the step 3.6 until the sliding window moves to the tail end of the input sequence;
and 3.7, splicing the features F7 in all the sliding windows to obtain a feature F8 which is used as the feature of the input sequence.
Further, the substeps of step 3.2 are:
step 3.2.1, performing multi-head self-attention calculation on the data in the sliding window, and extracting the incidence relation of byte sequences in the window;
and 3.2.2, repeating the step 3.2.1 for M times according to the set attention head number M, and splicing and linearly converting the extracted result every time to obtain the characteristic F4 of the data in the sliding window.
Further, in step 4, a one-dimensional maximum pooling layer is used to complete feature compression and dimension reduction, and each pooling operation halves the dimension of the first dimension of the feature.
Further, the substep of step 5 comprises:
step 5.1, inputting the extracted multi-scale features into a full connection layer and an activation function, wherein the output dimension is consistent with the quantity of flow categories;
and 5.2, calculating the type of the encrypted network application protocol according to the output.
Further, in step 5.2, the specific calculation method of the category is:

category = argmax(Softmax(Z))

where Z represents the output obtained by feeding the multi-scale feature into the fully connected layer and the activation function, Softmax normalizes Z into a probability distribution over the traffic categories, and argmax selects the category with the largest probability.
Compared with the prior art, the beneficial effects of adopting the technical scheme are as follows:
1. The pyramid network constructed from load semantic mining blocks can extract multi-scale features from encryption network application protocol messages in current complex scenarios, fully extracting both the global features and the multi-scale local features, thereby improving the accuracy of encrypted traffic identification.
2. When local features are extracted, a sliding-window approach is adopted and each self-attention calculation is restricted to the range covered by the window, which avoids introducing noise during local feature extraction, greatly reduces the number of model parameters and improves the calculation speed of the model.
3. Learning and classification are based on the load data at and above the transmission layer of the network traffic and do not depend on the IP address and port number information in the packet header, which gives strong generalization capability; strong identification information such as the IP address and port number of the packet header is not universal and may cause strong interference to the final identification result.
Drawings
Fig. 1 is a flowchart of an encryption application protocol type identification method based on multi-scale load semantic mining according to the present invention.
Fig. 2 is a schematic diagram of a pyramid network model structure according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating an implementation of a sliding window according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of multi-scale feature extraction according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar modules or modules having the same or similar functionality throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application. On the contrary, the embodiments of the application include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.
Aiming at the problems that, in current complex scenarios, a shallow neural network cannot learn the deep-level features in encrypted traffic and that existing deep neural networks focus excessively on global features and therefore lose detail features, this embodiment provides an encryption application protocol type identification method in which a deep neural network based on load semantic mining extracts multi-scale features. Features of different scales are extracted so that the global features and the local detail features of different scales in the packet load are fully utilized, which improves identification accuracy; at the same time, local features are extracted with a sliding window and the self-attention calculation is limited to the window range, which reduces the number of model parameters and improves the calculation speed of the model. The specific scheme is as follows:
As shown in fig. 1, the encryption application protocol type identification method based on multi-scale load semantic mining includes:
step 1, preprocessing the original traffic, extracting the load features of the transmission layer load, and converting them into a decimal byte sequence;
step 2, constructing a pyramid neural network based on load semantic mining blocks, and obtaining the word embedding feature and the position coding feature of the decimal byte sequence, the input feature sequence being obtained by adding the two;
step 3, the load semantic mining block constructs a sliding window on the input feature sequence, the sliding window moves step by step to the end of the input sequence, the features inside the sliding window at each position are extracted, and the features extracted in all the sliding windows are spliced in order to obtain the features of the input sequence;
step 4, performing feature compression and dimension reduction on the features of the input sequence to serve as a new input sequence, repeating steps 3-4 k times, and splicing the features of the input sequence obtained each time to obtain the multi-scale features of the input sequence;
and step 5, completing the classification of the encryption network application protocol type according to the multi-scale features.
Since strong identification information such as the IP address and port number in the network traffic packet header is not universal and may strongly interfere with the identification result, in this embodiment learning and classification are performed on the data at and above the transmission layer of the network traffic and do not depend on the IP address and port number of the packet header.
Before parsing, the original flow needs to be preprocessed, specifically:
step 1.1, dividing the received data packet into session flows according to a quintuple (source IP, destination IP, source port, destination port, transport layer protocol), and identifying the flow by taking the session flows as a unit.
Step 1.2, because the received data packets include packets that are irrelevant to the actual transmitted content, the session stream needs to be cleaned: packets retransmitted after a timeout and packets of the Address Resolution Protocol (ARP) and the Dynamic Host Configuration Protocol (DHCP) are removed. In this embodiment, the cleaning is accomplished with the TShark tool of Wireshark.
Step 1.3, after the irrelevant data packets are removed, the load features of the transmission layer loads of the remaining data packets are extracted and spliced in the order in which the packets arrive, until the extracted byte length reaches the set load feature length N. It should be noted that, in this embodiment, if the concatenated byte length of the load features of all packets in the session stream is smaller than N, the sequence is padded with 0x00.
Preferably, the present embodiment uses the rdpcap method of the Scapy tool to extract the load characteristics of the transport layer load.
Step 1.4, the extracted and spliced load features are converted into a decimal byte sequence, i.e. each byte is converted into the corresponding decimal number (0 to 255).
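As a concrete illustration of steps 1.1-1.4, the sketch below shows one possible preprocessing routine built on Scapy's rdpcap. It is a minimal sketch rather than the patented implementation: the directional five-tuple grouping, the ARP/DHCP filtering and the feature length N = 1024 are assumptions for illustration, and the removal of timeout retransmissions (done with TShark in this embodiment) is omitted.

```python
from collections import defaultdict
from scapy.all import rdpcap
from scapy.layers.inet import IP, TCP, UDP
from scapy.layers.l2 import ARP
from scapy.layers.dhcp import DHCP

N = 1024  # assumed load-feature length; the embodiment leaves N configurable


def flow_key(pkt):
    """Five-tuple (src IP, dst IP, src port, dst port, transport protocol)."""
    proto = TCP if pkt.haslayer(TCP) else UDP
    l4 = pkt[proto]
    return (pkt[IP].src, pkt[IP].dst, l4.sport, l4.dport, proto.__name__)


def preprocess(pcap_path):
    """Return one decimal byte sequence of length N per session flow."""
    flows = defaultdict(list)
    for pkt in rdpcap(pcap_path):
        # Step 1.2: drop ARP/DHCP packets (timeout retransmissions are assumed
        # to have been removed beforehand, e.g. with TShark).
        if pkt.haslayer(ARP) or pkt.haslayer(DHCP):
            continue
        if not pkt.haslayer(IP) or not (pkt.haslayer(TCP) or pkt.haslayer(UDP)):
            continue
        # Step 1.1: group packets into session flows by five-tuple.
        flows[flow_key(pkt)].append(pkt)

    sequences = {}
    for key, pkts in flows.items():
        # Step 1.3: splice transport-layer payload bytes in arrival order.
        payload = b"".join(
            bytes(p[TCP if p.haslayer(TCP) else UDP].payload) for p in pkts
        )
        payload = payload[:N].ljust(N, b"\x00")  # truncate or pad with 0x00
        # Step 1.4: convert each byte to its decimal value (0-255).
        sequences[key] = list(payload)
    return sequences
```

For example, preprocess("capture.pcap") returns a mapping from each five-tuple to a list of N integers in [0, 255], which is the decimal byte sequence fed to the network described next.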
After the decimal byte sequence representing the transmission layer characteristics is obtained, the analysis of the traffic type can be started, and in this embodiment, the features of different scales in the payload (decimal byte sequence) are extracted by using the constructed Pyramid-type neural network (Pyramid-Transformer).
Current encrypted traffic identification models based on the Transformer (a deep learning architecture) mostly use the self-attention mechanism to extract global features and neglect the extraction of local features, yet local features may be the key to fine-grained classification; moreover, local features appear at inconsistent scales, so interference may arise during their extraction.
As shown in fig. 2 and fig. 4, step 2 of this embodiment constructs a pyramid-type neural network (Pyramid-Transformer) from several load semantic mining blocks (Pyramid Transformer Blocks), with a one-dimensional max-pooling layer arranged between consecutive blocks to compress and reduce dimension during feature extraction. Every load semantic mining block has the same composition and consists of six sequentially connected parts: multi-head attention calculation, residual connection, layer normalization, two fully connected layers with an activation function, residual connection, and layer normalization. Deep multi-scale features are extracted by stacking several load semantic mining blocks: after each block extracts its features, the feature dimension is compressed to 1/2 and the compressed features are fed into the next block without changing the window size. In this way features of larger scale are extracted, the dimension of the features produced by successive blocks decreases gradually to form a pyramid shape, and the features are finally spliced to obtain the final feature.
The process of realizing feature extraction by the pyramid type neural network is specifically explained as follows:
in the pyramid type neural network, feature extraction is mainly completed through a load semantic mining block, and the input of the load semantic mining block is the combination of word embedding features and position coding features of a byte sequence, so that a decimal byte sequence needs to be processed firstly.
A word embedding operation is performed on the byte sequence (denoted B1, B2, …, B(N-1), B(N) in fig. 2 and fig. 4), mapping each byte to a d-dimensional vector space to obtain the word embedding feature F1 ∈ R^(N×d) as the subsequent input, where R denotes the real numbers.
The position coding feature F2 ∈ R^(N×d) of the byte sequence is then calculated:

PE(pos, 2i) = sin(pos / 10000^(2i/d))    (1)

PE(pos, 2i+1) = cos(pos / 10000^(2i/d))    (2)

F2 = [PE(1), PE(2), …, PE(N)]    (3)

where pos denotes the position at which a byte appears in the byte sequence; PE(pos, 2i) on the left of formula (1) denotes the position code of the even dimensions and PE(pos, 2i+1) on the left of formula (2) denotes the position code of the odd dimensions; i is the dimension index of the position code; d is the position coding dimension; and PE(pos) in formula (3) denotes the position code of each byte in the byte sequence. Since the Transformer uses global information and cannot by itself exploit the order of the bytes, which is very important for feature learning, this embodiment acquires the position coding feature.
The word embedding feature and the position coding feature are combined according to formula (4) to obtain the input feature F3 ∈ R^(N×d) of the load semantic mining block:

F3 = F1 + F2    (4)
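A minimal PyTorch sketch of formulas (1)-(4) is given below. The module name PayloadEmbedding and the default values d_model = 128 and max_len = 1024 are illustrative assumptions, not values fixed by the patent.

```python
import math
import torch
import torch.nn as nn


class PayloadEmbedding(nn.Module):
    """Word embedding F1 plus sinusoidal position coding F2, giving F3 = F1 + F2."""

    def __init__(self, d_model=128, max_len=1024):
        super().__init__()
        self.embed = nn.Embedding(256, d_model)          # one vector per byte value 0-255
        pe = torch.zeros(max_len, d_model)               # F2, shape (N, d)
        pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)               # formula (1): even dimensions
        pe[:, 1::2] = torch.cos(pos * div)               # formula (2): odd dimensions
        self.register_buffer("pe", pe)

    def forward(self, byte_seq):
        # byte_seq: LongTensor of shape (batch, N) holding decimal byte values
        f1 = self.embed(byte_seq)                        # word embedding feature F1
        f2 = self.pe[: byte_seq.size(1)]                 # position coding feature F2
        return f1 + f2                                   # input feature F3, formula (4)
```

With N = 1024 and d = 128, the output F3 has shape (batch, 1024, 128), matching F3 ∈ R^(N×d) per sample.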
After determining the input of the load semantic mining block, feature extraction can be performed through the load semantic mining block, which specifically includes:
step 3.1, because some detail features only exist on a small number of adjacent bytes, the direct feature extraction of the whole input sequence may cause interference to the local detail features, and the sliding window mode is used to ensure that the high-resolution local detail features are not damaged. Thus in the input featureAnd constructing a sliding window with the size of L, and performing feature extraction on data inside the window as shown in FIG. 3.
Step 3.2, the data inside the sliding window is taken as the window feature F3_w ∈ R^(L×d), i.e. the slice of F3 covered by the window. A multi-head attention mechanism is applied to F3_w for feature extraction, obtaining the feature F4 ∈ R^(L×d). F4 contains the global dependencies among the bytes inside the window; seen from the perspective of the whole byte sequence, it is a local feature within the window.
The specific process comprises the following steps:
Step 3.2.1, a multi-head self-attention calculation is performed on F3_w to extract the association relations among the bytes inside the window.

Using the weight matrices W_Q, W_K, W_V ∈ R^(d×d_k), the query, key and value features Q, K, V of F3_w are calculated as shown in formulas (5), (6) and (7):

Q = F3_w · W_Q    (5)

K = F3_w · W_K    (6)

V = F3_w · W_V    (7)

The self-attention mechanism (Attention) is implemented through matrix operations on Q, K and V, giving the output Z ∈ R^(L×d_k):

Z = Attention(Q, K, V) = Softmax(Q · K^T / sqrt(d_k)) · V

where d_k is the number of columns of the K matrix, i.e. the vector dimension, which is the same for Q, and K^T is the transpose of K. The calculation Q · K^T takes the inner product of every pair of row vectors of Q and K, and the result is divided by sqrt(d_k) to prevent the inner products from becoming too large. After Q is multiplied by the transpose of K, the resulting matrix has L rows and L columns, where L is the window size; this matrix represents the strength of association between bytes. The Softmax function (normalized exponential function) then computes the self-attention coefficient of each byte with respect to the other bytes, normalizing each row of the matrix so that it sums to 1.

Step 3.2.2, the number of attention heads M is set, and step 3.2.1 is repeated M times to obtain M outputs Z, which are spliced and linearly transformed to obtain the feature F4 ∈ R^(L×d):

F4 = Concat(Z_1, Z_2, …, Z_M) · W_O

where Z_1 represents the output of the first calculation, Z_M represents the output of the M-th calculation, and W_O ∈ R^((M·d_k)×d) represents the weight matrix of the linear transformation.
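The following PyTorch sketch illustrates formulas (5)-(7), the scaled dot-product attention and the multi-head splicing for a single window. The class name WindowSelfAttention and the defaults d_model = 128 and num_heads = 4 are assumptions for illustration.

```python
import math
import torch
import torch.nn as nn


class WindowSelfAttention(nn.Module):
    """Multi-head self-attention over the L bytes inside one sliding window."""

    def __init__(self, d_model=128, num_heads=4):
        super().__init__()
        assert d_model % num_heads == 0
        self.h, self.d_k = num_heads, d_model // num_heads
        self.w_q = nn.Linear(d_model, d_model, bias=False)   # W_Q, formula (5)
        self.w_k = nn.Linear(d_model, d_model, bias=False)   # W_K, formula (6)
        self.w_v = nn.Linear(d_model, d_model, bias=False)   # W_V, formula (7)
        self.w_o = nn.Linear(d_model, d_model, bias=False)   # W_O for the multi-head splice

    def forward(self, f3_w):
        # f3_w: (batch, L, d) -- the feature slice covered by one window
        b, L, d = f3_w.shape
        # Split d into M heads so each head runs the attention independently.
        q = self.w_q(f3_w).view(b, L, self.h, self.d_k).transpose(1, 2)
        k = self.w_k(f3_w).view(b, L, self.h, self.d_k).transpose(1, 2)
        v = self.w_v(f3_w).view(b, L, self.h, self.d_k).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)  # (b, M, L, L) byte-association strengths
        z = torch.softmax(scores, dim=-1) @ v                   # scaled dot-product attention
        z = z.transpose(1, 2).reshape(b, L, d)                  # splice the M heads
        return self.w_o(z)                                      # F4
```

Here d_k = d / M, so the spliced output of the M heads already has dimension d before W_O is applied.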
Step 3.3, residual connection and layer normalization are applied to F3_w and F4 to obtain the feature F5 ∈ R^(L×d):

F5 = LayerNorm(F3_w + F4)

where LayerNorm denotes the layer normalization operation.
Step 3.4, a feed-forward (Feed Forward) operation is performed on F5 to obtain the feature F6 ∈ R^(L×d):

F6 = FeedForward(F5) = Linear(RELU(Linear(F5)))

where Linear denotes one fully-connected-layer operation; Feed Forward consists of two fully connected layers, the first using the RELU activation function and the second using no activation function.
Step 3.5, residual connection and layer normalization are applied to F5 and F6 to obtain the feature F7 ∈ R^(L×d):

F7 = LayerNorm(F5 + F6)
Step 3.6, the sliding window is moved backwards by L bytes and steps 3.2 to 3.5 are re-executed in the new window, until the sliding window reaches the end of the input feature F3.

Step 3.7, the features F7 of all sliding windows are spliced to obtain the feature F8 ∈ R^(N×d) of the input sequence:

F8 = Concat(F7^(1), F7^(2), …, F7^(N/L))

where F7^(1) denotes the feature obtained from the first window and F7^(N/L) denotes the feature obtained from the last window.
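Steps 3.1-3.7 can be assembled into one module as sketched below. This sketch uses PyTorch's built-in nn.MultiheadAttention in place of the hand-written attention above; the class name PayloadSemanticMiningBlock, the defaults (window L = 16, feed-forward width 256) and the assumption that N is an exact multiple of L are illustrative choices, not details fixed by the patent.

```python
import torch
import torch.nn as nn


class PayloadSemanticMiningBlock(nn.Module):
    """One load semantic mining block: windowed attention, residual + LayerNorm,
    two-layer feed-forward, residual + LayerNorm, then window-wise splicing."""

    def __init__(self, d_model=128, num_heads=4, window=16, ffn_dim=256):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(                     # Linear -> RELU -> Linear
            nn.Linear(d_model, ffn_dim), nn.ReLU(), nn.Linear(ffn_dim, d_model)
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, f3):
        # f3: (batch, N, d); N is assumed to be a multiple of the window size L
        outputs = []
        for start in range(0, f3.size(1), self.window):      # step 3.6: slide by L bytes
            f3_w = f3[:, start:start + self.window]           # steps 3.1/3.2: data in the window
            f4, _ = self.attn(f3_w, f3_w, f3_w)                # step 3.2: multi-head attention
            f5 = self.norm1(f3_w + f4)                         # step 3.3: residual + LayerNorm
            f6 = self.ffn(f5)                                  # step 3.4: feed-forward
            f7 = self.norm2(f5 + f6)                           # step 3.5: residual + LayerNorm
            outputs.append(f7)
        return torch.cat(outputs, dim=1)                       # step 3.7: F8 = Concat of all windows
```

Restricting the attention to L-byte windows means every softmax is taken over an L×L matrix instead of an N×N one, which is the source of the computation savings described above.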
In order to extract the multi-scale features of the byte sequence, step 4 of this embodiment first applies a one-dimensional max-pooling layer to the feature F8 for feature compression and dimension reduction, obtaining the feature F9 ∈ R^((N/2)×d):

F9 = MaxPool1d(F8)

where MaxPool1d denotes the one-dimensional max-pooling operation; each pooling operation halves the first dimension of the feature, and the new feature carries richer semantic information.
The number of repetitions k is set as required, and steps 3-4 are repeated k times. Except for the first execution of step 3, whose input is the input feature F3, every subsequent execution of step 3 takes the feature F9 obtained in the preceding step 4 as its input.
As shown in fig. 4, the repeated operation corresponds to stacking the load semantic mining blocks of the pyramid network model several times, extracting deeper, higher-level semantic features layer by layer. In fig. 4 the feature dimensions are denoted by N and d, where N equals the length of the input byte sequence and d equals the dimension to which each byte is expanded by the word embedding operation. F8^(1) ∈ R^(N×d) denotes the feature obtained by the first repetition and F8^(k) ∈ R^((N/2^(k-1))×d) denotes the feature obtained by the k-th repetition.
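The pyramid stacking of k blocks with max pooling in between, and the gathering of the per-level features, might look like the sketch below. It builds on the PayloadSemanticMiningBlock sketched earlier, and flattening each level's feature before splicing them into one vector is an assumption about how the multi-scale feature is assembled for the classifier.

```python
import torch
import torch.nn as nn


class PyramidEncoder(nn.Module):
    """Stack k load semantic mining blocks; halve the sequence length after each block
    with 1-D max pooling and splice the flattened per-level features."""

    def __init__(self, d_model=128, num_heads=4, window=16, k=3):
        super().__init__()
        # PayloadSemanticMiningBlock is the block sketched in the previous listing.
        self.blocks = nn.ModuleList(
            [PayloadSemanticMiningBlock(d_model, num_heads, window) for _ in range(k)]
        )
        self.pool = nn.MaxPool1d(kernel_size=2)   # halves the first (sequence) dimension

    def forward(self, f3):
        # f3: (batch, N, d)
        levels = []
        x = f3
        for block in self.blocks:
            f8 = block(x)                                      # features at the current scale
            levels.append(f8.flatten(1))                       # keep this level's feature
            x = self.pool(f8.transpose(1, 2)).transpose(1, 2)  # 1-D max pooling: N -> N/2
        return torch.cat(levels, dim=1)                        # spliced multi-scale feature
```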
The splicing of the features obtained from each repetition constitutes the required multi-scale feature of the load. After the multi-scale feature is obtained, traffic classification can be carried out:
in this embodiment, the classification process specifically includes:
Step 5.1, the extracted multi-scale feature is fed into a fully connected layer and an activation function to obtain the output Z, whose dimension is equal to the number of traffic categories.

Step 5.2, the type of the encrypted network application protocol is calculated from the output:

category = argmax(Softmax(Z))

where Softmax normalizes Z into a probability distribution over the traffic categories and argmax selects the category with the largest probability as the identified encryption application protocol type.
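A sketch of this classification head follows; it assumes the multi-scale feature has already been flattened into a vector (as in the encoder sketch above), and the choice of ReLU as the activation after the fully connected layer is an assumption, since the embodiment does not name the activation function.

```python
import torch
import torch.nn as nn


class ProtocolClassifier(nn.Module):
    """Step 5: fully connected layer + activation producing Z, then Softmax/argmax."""

    def __init__(self, feature_dim, num_classes):
        super().__init__()
        self.fc = nn.Linear(feature_dim, num_classes)  # output dimension = number of traffic categories
        self.act = nn.ReLU()                           # assumed activation; the patent does not name it

    def forward(self, multi_scale_feature):
        z = self.act(self.fc(multi_scale_feature))     # step 5.1: output Z
        probs = torch.softmax(z, dim=-1)               # step 5.2: Softmax over the categories
        return probs.argmax(dim=-1)                    # predicted protocol type
```

During training one would keep Z (or its log-softmax) and use a cross-entropy loss; the argmax is only the inference-time decision.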
in the embodiment, a deep neural network, namely a pyramid neural network is constructed, and the network stacks load semantic mining blocks, so that deep features in an encryption protocol message type in a current complex scene can be extracted, and the accuracy of flow identification is improved.
It should be noted that, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "disposed" and "connected" should be interpreted broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; may be directly connected or may be indirectly connected through an intermediate. The specific meanings of the above terms in the present invention can be understood as specific cases to those of ordinary skill in the art; the drawings in the embodiments are used for clearly and completely describing the technical scheme in the embodiments of the invention, and obviously, the described embodiments are a part of the embodiments of the invention, but not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Claims (10)
1. The encryption application protocol type identification method based on multi-scale load semantic mining is characterized by comprising the following steps:
step 1, preprocessing original flow of a mobile application encryption network, extracting load characteristics of a transmission layer load, and converting the load characteristics into a decimal byte sequence;
step 2, constructing a pyramid neural network based on a load semantic mining block, and acquiring a word embedding characteristic and a position coding characteristic of a decimal byte sequence, wherein an input characteristic sequence is obtained by adding the word embedding characteristic and the position coding characteristic;
step 3, the load semantic mining block constructs a sliding window on the input feature sequence, the sliding window moves step by step to the end of the input sequence, the features inside the sliding window at each position are extracted, and the features extracted in all the sliding windows are spliced in order to obtain the features of the input sequence;
step 4, performing feature compression and dimension reduction on the features of the input sequence to serve as a new input sequence, repeating steps 3-4 k times, and splicing the features of the input sequence obtained from each repetition of step 3 to obtain the multi-scale features of the input sequence;
and step 5, completing the classification of the encryption network application protocol type according to the multi-scale features.
2. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 1, wherein the preprocessing process in the step 1 is as follows:
step 1.1, dividing the data packet into session flows according to quintuple;
step 1.2, cleaning the session stream, and removing the data packet retransmitted overtime, the data packet of the address resolution protocol and the data packet of the dynamic host configuration protocol;
step 1.3, extracting load characteristics of transmission layer loads in the data packets, and splicing the extracted load characteristics according to the arrival sequence of the data packets until the byte length after splicing reaches the set load characteristic length;
and step 1.4, converting the extracted spliced load characteristics into a decimal byte sequence.
3. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 2, wherein in step 1.3, if the byte length after splicing the load features of all the data packets in the session stream is still smaller than the set load feature length, the sequence is padded with 0x00.
4. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 1 or 2, wherein in step 2, the byte features of the decimal byte sequence are mapped to a d-dimensional vector space to obtain the word embedding feature F1, F1 ∈ R^(N×d), where R denotes the real numbers and N is the length of the byte sequence.
5. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 4, wherein in step 2, the position coding feature is calculated as:

PE(pos, 2i) = sin(pos / 10000^(2i/d))    (1)

PE(pos, 2i+1) = cos(pos / 10000^(2i/d))    (2)

F2 = [PE(1), PE(2), …, PE(N)]    (3)

where pos denotes the position at which a byte appears in the byte sequence; PE(pos, 2i) on the left of formula (1) denotes the position code of the even dimensions and PE(pos, 2i+1) on the left of formula (2) denotes the position code of the odd dimensions; i is the dimension index of the position code; d is the position coding dimension; F2 is the position coding feature; and PE(pos) in formula (3) denotes the position code of each byte in the byte sequence.
6. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 1, wherein the substep of step 3 comprises:
step 3.1, constructing a sliding window with the size of L bytes on the input characteristic sequence;
step 3.2, performing feature extraction on the data in the sliding window by adopting a multi-head attention mechanism to obtain a feature F4;
step 3.3, carrying out residual error connection and layer normalization processing on the input sequence F3 and the characteristic F4 to obtain a characteristic F5;
step 3.4, performing two-layer full-connection layer operation on the characteristic F5 to obtain a characteristic F6;
step 3.5, carrying out residual error connection and layer normalization processing on the characteristic F5 and the characteristic F6 to obtain a characteristic F7;
step 3.6, moving the sliding window backwards by L bytes, and repeating the step 3.2 to the step 3.6 until the sliding window moves to the tail end of the input sequence;
and 3.7, splicing the features F7 in all the sliding windows to obtain a feature F8 which is used as the feature of the input sequence.
7. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 6, wherein the substep of the step 3.2 is:
step 3.2.1, performing multi-head self-attention calculation on the data in the sliding window, and extracting the association relation of byte sequences in the window;
and 3.2.2, repeating the step 3.2.1 for M times according to the set attention head number M, and splicing and linearly converting the extracted result every time to obtain the characteristic F4 of the data in the sliding window.
8. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 1, characterized in that in the step 4, a one-dimensional maximum pooling layer is adopted to complete feature compression and dimension reduction, and the dimension of the first dimension of the feature is halved for each pooling operation.
9. The encryption application protocol type identification method based on multiscale load semantic mining according to claim 1, wherein the substep of the step 5 comprises:
step 5.1, inputting the extracted multi-scale features into a full connection layer and an activation function, wherein the output dimension is consistent with the quantity of flow categories;
and 5.2, calculating the type of the encrypted network application protocol according to the output.
10. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 9, wherein in step 5.2, the specific calculation method of the category is:

category = argmax(Softmax(Z))

where Z represents the output obtained by feeding the multi-scale feature into the fully connected layer and the activation function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310189712.1A CN115883263B (en) | 2023-03-02 | 2023-03-02 | Encryption application protocol type identification method based on multi-scale load semantic mining |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310189712.1A CN115883263B (en) | 2023-03-02 | 2023-03-02 | Encryption application protocol type identification method based on multi-scale load semantic mining |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115883263A true CN115883263A (en) | 2023-03-31 |
CN115883263B CN115883263B (en) | 2023-05-09 |
Family
ID=85761794
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310189712.1A Active CN115883263B (en) | 2023-03-02 | 2023-03-02 | Encryption application protocol type identification method based on multi-scale load semantic mining |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115883263B (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104052749A (en) * | 2014-06-23 | 2014-09-17 | 中国科学技术大学 | Method for identifying link-layer protocol data types |
CN104506484A (en) * | 2014-11-11 | 2015-04-08 | 中国电子科技集团公司第三十研究所 | Proprietary protocol analysis and identification method |
CN105430021A (en) * | 2015-12-31 | 2016-03-23 | 中国人民解放军国防科学技术大学 | Encrypted traffic identification method based on load adjacent probability model |
EP3111612A1 (en) * | 2014-02-28 | 2017-01-04 | British Telecommunications Public Limited Company | Profiling for malicious encrypted network traffic identification |
US20180115567A1 (en) * | 2015-03-17 | 2018-04-26 | British Telecommunications Public Limited Company | Learned profiles for malicious encrypted network traffic identification |
CN110532564A (en) * | 2019-08-30 | 2019-12-03 | 中国人民解放军陆军工程大学 | Application layer protocol online identification method based on CNN and LSTM mixed model |
CN111211948A (en) * | 2020-01-15 | 2020-05-29 | 太原理工大学 | Shodan flow identification method based on load characteristics and statistical characteristics |
CN112163594A (en) * | 2020-08-28 | 2021-01-01 | 南京邮电大学 | Network encryption traffic identification method and device |
CN112511555A (en) * | 2020-12-15 | 2021-03-16 | 中国电子科技集团公司第三十研究所 | Private encryption protocol message classification method based on sparse representation and convolutional neural network |
CN113949653A (en) * | 2021-10-18 | 2022-01-18 | 中铁二院工程集团有限责任公司 | Encryption protocol identification method and system based on deep learning |
CN114358118A (en) * | 2021-11-29 | 2022-04-15 | 南京邮电大学 | Multi-task encrypted network traffic classification method based on cross-modal feature fusion |
WO2022094926A1 (en) * | 2020-11-06 | 2022-05-12 | 中国科学院深圳先进技术研究院 | Encrypted traffic identification method, and system, terminal and storage medium |
CN115277888A (en) * | 2022-09-26 | 2022-11-01 | 中国电子科技集团公司第三十研究所 | Method and system for analyzing message type of mobile application encryption protocol |
CN115348198A (en) * | 2022-10-19 | 2022-11-15 | 中国电子科技集团公司第三十研究所 | Unknown encryption protocol identification and classification method, device and medium based on feature retrieval |
CN115348215A (en) * | 2022-07-25 | 2022-11-15 | 南京信息工程大学 | Encrypted network flow classification method based on space-time attention mechanism |
- 2023-03-02 CN CN202310189712.1A patent/CN115883263B/en active Active
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3111612A1 (en) * | 2014-02-28 | 2017-01-04 | British Telecommunications Public Limited Company | Profiling for malicious encrypted network traffic identification |
CN104052749A (en) * | 2014-06-23 | 2014-09-17 | 中国科学技术大学 | Method for identifying link-layer protocol data types |
CN104506484A (en) * | 2014-11-11 | 2015-04-08 | 中国电子科技集团公司第三十研究所 | Proprietary protocol analysis and identification method |
US20180115567A1 (en) * | 2015-03-17 | 2018-04-26 | British Telecommunications Public Limited Company | Learned profiles for malicious encrypted network traffic identification |
CN105430021A (en) * | 2015-12-31 | 2016-03-23 | 中国人民解放军国防科学技术大学 | Encrypted traffic identification method based on load adjacent probability model |
CN110532564A (en) * | 2019-08-30 | 2019-12-03 | 中国人民解放军陆军工程大学 | Application layer protocol online identification method based on CNN and LSTM mixed model |
CN111211948A (en) * | 2020-01-15 | 2020-05-29 | 太原理工大学 | Shodan flow identification method based on load characteristics and statistical characteristics |
WO2022041394A1 (en) * | 2020-08-28 | 2022-03-03 | 南京邮电大学 | Method and apparatus for identifying network encrypted traffic |
CN112163594A (en) * | 2020-08-28 | 2021-01-01 | 南京邮电大学 | Network encryption traffic identification method and device |
WO2022094926A1 (en) * | 2020-11-06 | 2022-05-12 | 中国科学院深圳先进技术研究院 | Encrypted traffic identification method, and system, terminal and storage medium |
CN112511555A (en) * | 2020-12-15 | 2021-03-16 | 中国电子科技集团公司第三十研究所 | Private encryption protocol message classification method based on sparse representation and convolutional neural network |
CN113949653A (en) * | 2021-10-18 | 2022-01-18 | 中铁二院工程集团有限责任公司 | Encryption protocol identification method and system based on deep learning |
CN114358118A (en) * | 2021-11-29 | 2022-04-15 | 南京邮电大学 | Multi-task encrypted network traffic classification method based on cross-modal feature fusion |
CN115348215A (en) * | 2022-07-25 | 2022-11-15 | 南京信息工程大学 | Encrypted network flow classification method based on space-time attention mechanism |
CN115277888A (en) * | 2022-09-26 | 2022-11-01 | 中国电子科技集团公司第三十研究所 | Method and system for analyzing message type of mobile application encryption protocol |
CN115348198A (en) * | 2022-10-19 | 2022-11-15 | 中国电子科技集团公司第三十研究所 | Unknown encryption protocol identification and classification method, device and medium based on feature retrieval |
Non-Patent Citations (2)
Title |
---|
JINHAI ZHANG: "Research on Key Technology of VPN Protocol Recognition" * |
刘帅: "基于机器学习的加密流量识别研究与实现" * |
Also Published As
Publication number | Publication date |
---|---|
CN115883263B (en) | 2023-05-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104918046B (en) | A kind of local description compression method and device | |
CN109818930B (en) | Communication text data transmission method based on TCP protocol | |
CN112702235B (en) | Method for automatically and reversely analyzing unknown protocol | |
CN112511555A (en) | Private encryption protocol message classification method based on sparse representation and convolutional neural network | |
CN112131395A (en) | Iterative knowledge graph entity alignment method based on dynamic threshold | |
CN103955539B (en) | Method and device for obtaining control field demarcation point in binary protocol data | |
CN113037646A (en) | Train communication network flow identification method based on deep learning | |
CN112887291A (en) | I2P traffic identification method and system based on deep learning | |
CN113778718B (en) | Dynamic routing-based micro-service resource management method and system and electronic equipment | |
CN108462707A (en) | A kind of mobile application recognition methods based on deep learning sequence analysis | |
CN115473850B (en) | AI-based real-time data filtering method, system and storage medium | |
CN113128626A (en) | Multimedia stream fine classification method based on one-dimensional convolutional neural network model | |
CN115277888B (en) | Method and system for analyzing message type of mobile application encryption protocol | |
CN111355671B (en) | Network traffic classification method, medium and terminal equipment based on self-attention mechanism | |
CN108563795B (en) | Pairs method for accelerating matching of regular expressions of compressed flow | |
CN110796182A (en) | Bill classification method and system for small amount of samples | |
CN112383488B (en) | Content identification method suitable for encrypted and non-encrypted data streams | |
CN115883263A (en) | Encryption application protocol type identification method based on multi-scale load semantic mining | |
CN114519390A (en) | QUIC flow classification method based on multi-mode deep learning | |
CN108573069B (en) | Twins method for accelerating matching of regular expressions of compressed flow | |
CN117640193A (en) | Industrial control threat detection method based on application layer effective load extraction | |
CN113852605B (en) | Protocol format automatic inference method and system based on relation reasoning | |
CN114553790A (en) | Multi-mode feature-based small sample learning Internet of things traffic classification method and system | |
CN101262493B (en) | Method for accelerating inter-network data transmission via stream buffer | |
CN111641624B (en) | Network protocol header compression method based on decision tree |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |