CN115883263B - Encryption application protocol type identification method based on multi-scale load semantic mining - Google Patents
- Publication number: CN115883263B
- Application number: CN202310189712.1A
- Authority
- CN
- China
- Prior art keywords
- features
- sequence
- load
- scale
- application protocol
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Abstract
The invention provides an encryption application protocol type identification method based on multi-scale load semantic mining, which comprises the following steps: step 1, extracting payload features from the original traffic and converting them into a decimal byte sequence; step 2, constructing a pyramid neural network based on load semantic mining blocks, and processing the decimal byte sequence to obtain an input feature sequence; step 3, a load semantic mining block constructs a sliding window on the input feature sequence, the window moves in turn to the end of the sequence, and the features extracted in all windows are spliced to obtain the features of the input sequence; step 4, performing dimension reduction on the features of the input sequence to serve as a new input sequence, repeating steps 3-4, and splicing the features obtained each time to obtain multi-scale features; and step 5, classifying the encrypted network application protocol type according to the multi-scale features. The method can extract multi-scale features from encrypted network application protocol messages in complex scenes, improving both the speed and the accuracy of encrypted traffic identification.
Description
Technical Field
The invention relates to the field of traffic analysis, in particular to an encryption application protocol type identification method based on multi-scale load semantic mining.
Background
Traffic classification is the basis of network security and network management and has found very wide application, from QoS services at network service providers to the detection of security applications in firewalls and intrusion detection systems. At present, methods based on port numbers, deep packet inspection, machine learning and the like are mainly adopted for traffic classification, but each has certain disadvantages:
(1) Traditional port-number-based approaches have failed, since newer applications either use well-known port numbers to mask their traffic or do not use standard registered port numbers.
(2) Deep packet inspection relies on finding keywords in the packets, which fails in the face of encrypted traffic.
(3) Encrypted network traffic identification methods based on machine learning rely heavily on manually engineered features, which limits their applicability.
With the popularity of deep learning methods, researchers have studied these methods on traffic classification tasks and demonstrated higher accuracy on early mobile application traffic datasets. With the continuous upgrading of encryption protocols, the explosive growth in the number of mobile applications and the changes in how mobile applications are developed, shallow deep learning models cannot meet the actual requirements of mobile application traffic identification in current complex scenes. Although Transformer-based encrypted traffic identification methods perform well at feature learning, they focus on global features during feature extraction and ignore the detail features hidden in high-resolution payload data, and these local features are in many cases the key to accurate classification.
Disclosure of Invention
In order to solve the problem that deep features in encrypted traffic cannot be learned by a shallow neural network in the current complex scene and detail features are lost due to the fact that global features are excessively focused by the existing deep neural network, the invention provides a novel encrypted network application protocol type identification method.
The technical scheme adopted by the invention is as follows: the encryption application protocol type identification method based on multi-scale load semantic mining comprises the following steps:
step 1, preprocessing the original traffic of the mobile application encryption network, extracting the payload features of the transport-layer payload, and converting them into a decimal byte sequence;
step 2, constructing a pyramid neural network based on load semantic mining blocks, and acquiring the word embedding features and position coding features of the decimal byte sequence, which are added to obtain the input feature sequence;
step 3, a load semantic mining block constructs a sliding window on the input feature sequence; the window moves in turn until the end of the input sequence, the features inside the window are extracted at each move, and the features extracted in all windows are spliced in order to obtain the features of the input sequence;
step 4, performing feature compression and dimension reduction on the features of the input sequence to serve as a new input sequence, repeating steps 3-4 k times, and splicing the features of the input sequence obtained each time to obtain the multi-scale features of the input sequence;
and 5, finishing classification of the encrypted network application protocol types according to the multi-scale characteristics.
Further, the preprocessing process in step 1 is as follows:
step 1.1, dividing a data packet into session flows according to five-tuple;
step 1.2, cleaning the session stream, and removing data packets retransmitted over time, address resolution protocol and dynamic host configuration protocol;
step 1.3, extracting load characteristics of a transmission layer load in a data packet, and splicing the extracted load characteristics according to the arrival sequence of the data packet until the length of bytes after splicing reaches the set load characteristic length;
and step 1.4, converting the extracted spliced load characteristics into a decimal byte sequence.
Further, in step 1.3, if the byte length after splicing the payload features of all the data packets in the session stream is still smaller than the set payload feature length, padding is performed with 0x00.
Further, in step 2, the byte features of the decimal byte sequence are mapped into a d-dimensional vector space to obtain the word embedding feature F1, F1 ∈ R^{N×d}, where R denotes the real numbers.
Further, in step 2, the position coding feature is calculated as follows:

PE(pos, 2i) = sin(pos / 10000^{2i/d})  (1)
PE(pos, 2i+1) = cos(pos / 10000^{2i/d})  (2)
F2 = [PE(B1); PE(B2); …; PE(BN)], F2 ∈ R^{N×d}  (3)

where pos denotes the position at which a byte appears in the byte sequence; the left side of formula (1), PE(pos, 2i), is the position coding at the even dimensions and the left side of formula (2), PE(pos, 2i+1), the position coding at the odd dimensions; i indexes the dimension pairs of the position code; d is the dimension of the position code; F2 is the position coding feature, and in formula (3) PE(Bj) denotes the position encoding of each byte in the byte sequence.
Further, the substep of the step 3 includes:
step 3.1, constructing a sliding window with the length of L bytes on an input sequence;
step 3.2, extracting features of the data in the sliding window by adopting a multi-head attention mechanism to obtain features F4;
step 3.3, carrying out residual connection and layer normalization processing on the input sequence F3 and the characteristic F4 to obtain a characteristic F5;
step 3.4, performing two-layer full-connection layer operation on the feature F5 to obtain a feature F6;
step 3.5, carrying out residual connection and layer normalization processing on the characteristic F5 and the characteristic F6 to obtain a characteristic F7;
step 3.6, the sliding window moves backwards by L bytes, and the steps 3.2-3.6 are repeated until the sliding window moves to the tail end of the input sequence;
and 3.7, splicing the features F7 in all the sliding windows to obtain features F8 serving as features of the input sequence.
Further, the substeps of the step 3.2 are as follows:
step 3.2.1, performing multi-head self-attention calculation on the data in the sliding window, and extracting the association relation of byte sequences in the window;
and 3.2.2, repeating the step 3.2.1 for M times according to the set attention head number M, and performing splicing and linear transformation on the extracted result each time to obtain the characteristic F4 of the data in the sliding window.
Further, in step 4, the feature compression and dimension reduction are completed by a one-dimensional max pooling layer, and each pooling operation halves the first dimension of the feature.
Further, the substep of step 5 includes:
step 5.1, inputting the extracted multi-scale features into a full-connection layer and an activation function, wherein the output dimension is consistent with the number of flow categories;
and 5.2, calculating the category of the encrypted network application protocol type according to the output.
Further, in step 5.2, the specific calculation method of the category is:

category = argmax(Z)

where Z denotes the output obtained by feeding the multi-scale features through the fully connected layer and the activation function.
Compared with the prior art, the beneficial effects of adopting the technical scheme are as follows:
1. the pyramid network constructed based on the load semantic mining block can extract multi-scale features in the message type of the encryption network application protocol under the current complex scene, and fully extract global features and multi-scale local features, so that the accuracy of encryption traffic identification is improved.
2. When the local features are extracted, a sliding window is adopted and each self-attention calculation is performed within the window coverage; this prevents noise from being introduced during local feature extraction, greatly reduces the model parameters, and improves the calculation speed of the model.
3. The method learns and classifies based on the transport-layer payload data of the network traffic and does not depend on the IP address and port number information in the network traffic packet header, so the generalization ability of the classification model is strong; strong identification information such as the IP address and port number of the packet header is not universal and may strongly interfere with the final identification result.
Drawings
Fig. 1 is a flowchart of an encryption application protocol type identification method based on multi-scale load semantic mining.
Fig. 2 is a schematic diagram of a pyramid network model according to an embodiment of the invention.
FIG. 3 is a flow chart of a sliding window implementation in an embodiment of the invention.
FIG. 4 is a schematic diagram of multi-scale feature extraction in accordance with one embodiment of the invention.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar modules or modules having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application. On the contrary, the embodiments of the present application include all alternatives, modifications, and equivalents as may be included within the spirit and scope of the appended claims.
Aiming at the problems that a shallow neural network cannot learn the deep features of encrypted traffic in current complex scenes and that existing deep neural networks lose detail features by focusing excessively on global features, this embodiment provides an encryption application protocol type identification method that extracts multi-scale features with a deep neural network based on load semantic mining.
As shown in fig. 1, the encryption application protocol type identification method based on multi-scale load semantic mining includes:
step 1, preprocessing the original traffic of the mobile application encryption network, extracting the payload features of the transport-layer payload, and converting them into a decimal byte sequence;
step 2, constructing a pyramid neural network based on load semantic mining blocks, and acquiring the word embedding features and position coding features of the decimal byte sequence, which are added to obtain the input feature sequence;
step 3, a load semantic mining block constructs a sliding window on the input feature sequence; the window moves in turn until the end of the input sequence, the features inside the window are extracted at each move, and the features extracted in all windows are spliced in order to obtain the features of the input sequence;
step 4, performing feature compression and dimension reduction on the features of the input sequence to serve as a new input sequence, repeating steps 3-4 k times, and splicing the features of the input sequence obtained each time to obtain the multi-scale features of the input sequence;
and 5, finishing classification of the encrypted network application protocol types according to the multi-scale characteristics.
Since strong identification information such as the IP address and port number of the network traffic packet header is not universal and may strongly interfere with the identification result, in this embodiment learning and classification are performed based on the payload data of the network traffic transport layer, without depending on the IP address, port number or other header information of the network traffic packet.
Before parsing, the original flow needs to be preprocessed, specifically:
step 1.1, dividing the received data packet into session flows according to five-tuple (source IP, destination IP, source port, destination port, transport layer protocol), and identifying the flows by taking the session flows as units.
In step 1.2, the received packets contain packets unrelated to the transmitted content, so the session stream needs to be cleaned by removing timeout-retransmitted packets as well as Address Resolution Protocol (ARP) and Dynamic Host Configuration Protocol (DHCP) packets. In this embodiment, the cleaning is accomplished with Wireshark's Tshark tool.
Step 1.3, after the irrelevant packets are removed, the payload features of the transport-layer payloads of the remaining packets are extracted and spliced in the arrival order of the packets until the length of the extracted bytes reaches the set payload feature length N. It should be noted that in this embodiment, if the byte length after splicing the payload features of all packets in the session stream is smaller than N, 0x00 is used for padding.
Preferably, the load characteristics of the transport layer load are extracted by using the rdpcap method of the Scapy tool in this embodiment.
Step 1.4, the extracted and spliced binary payload features are converted into a decimal byte sequence, i.e. each byte is converted into the corresponding decimal number (0-255).
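Steps 1.3-1.4 above can be sketched in Python; the helper name and sample payloads below are illustrative assumptions, not part of the patent, and in practice the per-packet payloads could be obtained e.g. with Scapy's rdpcap as the description suggests:

```python
# Hedged sketch of steps 1.3-1.4: concatenate transport-layer payloads
# in arrival order, truncate or pad with 0x00 to a fixed length N,
# and convert the result into a decimal byte sequence (values 0-255).

def payload_to_sequence(payloads, n):
    """Splice payloads, pad with 0x00 to length n, return decimal ints."""
    spliced = b"".join(payloads)[:n]      # arrival-order concatenation
    spliced = spliced.ljust(n, b"\x00")   # pad with 0x00 if still short
    return list(spliced)                  # each byte becomes 0..255

seq = payload_to_sequence([b"\x16\x03\x01", b"\xff"], 8)
print(seq)  # [22, 3, 1, 255, 0, 0, 0, 0]
```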
After obtaining the decimal byte sequence representing the transmission layer characteristics, the analysis of the traffic class can be started, and in this embodiment, features of different scales in the load (decimal byte sequence) are extracted by using the constructed Pyramid-shaped neural network (Pyramid-Transformer).
Current Transformer-based encrypted traffic recognition models (the Transformer is a deep learning architecture) use the self-attention mechanism, which pays more attention to the extraction of global features and neglects local features, although these may be the key to fine-grained classification; at the same time the local features may occur at inconsistent scales, so interference may arise during their extraction.
As shown in fig. 2 and 4, step 2 of this embodiment constructs a pyramid-shaped neural network (Pyramid-Transformer) from several load semantic mining blocks (Pyramid Transformer blocks), with a one-dimensional max pooling layer between consecutive blocks to compress and reduce the dimension during feature extraction. Every load semantic mining block has the same composition: multi-head attention calculation, residual connection, layer normalization, two fully connected layers with activation function, residual connection, and layer normalization, connected in sequence. Deep multi-scale features are extracted by stacking several load semantic mining blocks: after each block extracts its features, the feature dimension is compressed to 1/2 and the compressed features are fed into the next block while the window size stays unchanged, so that features of progressively larger scale are extracted; the feature dimension extracted by each block shrinks, forming a pyramid shape, and the features are spliced to obtain the final features.
Specific explanation is given to the process of realizing feature extraction of the pyramid neural network:
feature extraction is mainly completed through a load semantic mining block in the pyramid neural network, and the input of the load semantic mining block is the combination of word embedding features and position coding features of a byte sequence, so that a decimal byte sequence needs to be processed first.
A word embedding operation is performed on the byte sequence (denoted B1, B2, …, BN-1, BN in figures 2 and 4), mapping the byte features into a d-dimensional vector space to obtain the word embedding feature F1 as the subsequent input, F1 ∈ R^{N×d}, where R denotes the real numbers.
Calculate the position coding feature F2 of the byte sequence, F2 ∈ R^{N×d}, where R denotes the real numbers:

PE(pos, 2i) = sin(pos / 10000^{2i/d})  (1)
PE(pos, 2i+1) = cos(pos / 10000^{2i/d})  (2)
F2 = [PE(B1); PE(B2); …; PE(BN)]  (3)

where pos denotes the position at which a byte appears in the byte sequence; the left side of formula (1), PE(pos, 2i), is the position coding at the even dimensions and the left side of formula (2), PE(pos, 2i+1), at the odd dimensions; i indexes the dimension pairs of the position code; d is the dimension of the position code; in formula (3) PE(Bj) denotes the position encoding of each byte in the byte sequence. Because the Transformer uses global information, it cannot by itself exploit byte-order information, which is important for feature learning; for this reason the present embodiment acquires position coding features.
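The sinusoidal position coding can be sketched in NumPy as follows, assuming the standard Transformer convention (sin on even dimensions 2i, cos on odd dimensions 2i+1) and an even encoding dimension d:

```python
# Minimal NumPy sketch of formulas (1)-(2): sinusoidal position coding
# for a byte sequence of length n with encoding dimension d (d even).
import numpy as np

def position_encoding(n, d):
    pos = np.arange(n)[:, None]                  # byte positions 0..n-1
    i = np.arange(d // 2)[None, :]               # dimension-pair index
    angle = pos / np.power(10000.0, 2 * i / d)
    pe = np.zeros((n, d))
    pe[:, 0::2] = np.sin(angle)                  # even dimensions: sin
    pe[:, 1::2] = np.cos(angle)                  # odd dimensions: cos
    return pe

F2 = position_encoding(16, 8)   # N=16 bytes, d=8 dimensions
print(F2.shape)  # (16, 8)
```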
The word embedding feature and the position coding feature are combined according to formula (4) to obtain the input feature of the load semantic mining block:

F3 = F1 + F2, F3 ∈ R^{N×d}  (4)

where R denotes the real numbers.
After the input of the load semantic mining block is determined, the feature extraction can be performed through the load semantic mining block, and the method specifically comprises the following steps:
In step 3.1, because some detail features exist only over a small number of adjacent bytes, extracting features directly from the whole input sequence may interfere with local detail features, and a sliding window is used to ensure that the high-resolution local detail features are not destroyed. Therefore a sliding window of size L is constructed on the input feature F3, and, as shown in fig. 3, feature extraction is performed on the data inside the window.
Step 3.2, the data inside the sliding window is taken as X_w, X_w ∈ R^{L×d}, and a multi-head attention mechanism is applied to X_w to extract the feature F4, F4 ∈ R^{L×d}. F4 contains the global dependencies of the bytes within the window, which, viewed from the entire byte sequence, are local features of the window.
The specific process comprises the following steps:
Step 3.2.1, perform the multi-head self-attention calculation on the window data X_w and extract the association relation of the byte sequence inside the window, as follows:

Using the weight matrices W^Q, W^K and W^V, compute the features Q, K and V, as shown in formulas (5), (6) and (7):

Q = X_w W^Q  (5)
K = X_w W^K  (6)
V = X_w W^V  (7)

The self-attention mechanism (Attention) is implemented by the matrix operation of Q, K and V, giving the output Z, Z ∈ R^{L×d}:

Z = Attention(Q, K, V) = softmax(Q K^T / √d_k) V  (8)

where d_k is the number of columns (the vector dimension) of the Q and K matrices, and K^T is the transpose of K. The inner product of each pair of row vectors of Q and K is computed and divided by √d_k. After Q is multiplied by the transpose of K, the resulting matrix has L rows and L columns, where L is the window size; this matrix represents the association strength between bytes. Once Q K^T / √d_k is obtained, the softmax function (normalized exponential function) computes the self-attention coefficient of each byte with respect to the other bytes; softmax normalizes each row of the matrix, i.e. the sum of each row becomes 1.
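The single-head calculation of formulas (5)-(8) can be sketched in NumPy; the random weights are illustrative placeholders:

```python
# Hedged sketch of formulas (5)-(8): single-head self-attention over
# the bytes inside one window, with row-normalized softmax weights.
import numpy as np

def self_attention(x, wq, wk, wv):
    """x: (L, d) window data; wq/wk/wv: (d, d_k) weight matrices."""
    q, k, v = x @ wq, x @ wk, x @ wv
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                # (L, L) associations
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # each row sums to 1
    return attn @ v                                # weighted values

rng = np.random.default_rng(0)
L, d = 4, 8
x_w = rng.normal(size=(L, d))
z = self_attention(x_w, *(rng.normal(size=(d, d)) for _ in range(3)))
print(z.shape)  # (4, 8)
```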
Step 3.2.2, set the number of attention heads M and repeat step 3.2.1 M times to obtain M outputs Z_1, …, Z_M; splice them and apply a linear transformation to obtain the feature F4, F4 ∈ R^{L×d}:

F4 = Concat(Z_1, …, Z_M) W^O  (9)

where Z_1 denotes the output of the first calculation, Z_M the output of the M-th calculation, and W^O the weight matrix of the linear transformation.
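The splicing and linear transformation of step 3.2.2 can be sketched as follows; the per-head function is stubbed with a random projection, and all weights are illustrative assumptions:

```python
# Sketch of formula (9): run M attention heads, concatenate their
# (L, d/M) outputs along the feature axis, and apply the W^O projection.
import numpy as np

rng = np.random.default_rng(2)
L, d, M = 4, 8, 2            # window size, model dim, number of heads

def head(x):
    """Stub for one self-attention calculation (step 3.2.1)."""
    return x @ rng.normal(size=(d, d // M))   # (L, d/M) per-head output

x_w = rng.normal(size=(L, d))
heads = [head(x_w) for _ in range(M)]          # repeat step 3.2.1 M times
w_o = rng.normal(size=(d, d))
f4 = np.concatenate(heads, axis=-1) @ w_o      # splice + linear transform
print(f4.shape)  # (4, 8)
```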
Step 3.3, residual connection and layer normalization are performed on the window input X_w (the data inside the sliding window) and F4 to obtain the feature F5, F5 ∈ R^{L×d}:

F5 = LayerNorm(X_w + F4)  (10)

where LayerNorm denotes the layer normalization operation.
Step 3.4, a Feed Forward (forward propagation) operation is performed on F5 to obtain the feature F6, F6 ∈ R^{L×d}:

F6 = Linear(RELU(Linear(F5)))  (11)

where Linear denotes a fully connected layer operation; Feed Forward consists of two fully connected layers, the first using the RELU activation function and the second using no activation function.
Step 3.5, residual connection and layer normalization are performed on F5 and F6 to obtain the feature F7, F7 ∈ R^{L×d}:

F7 = LayerNorm(F5 + F6)  (12)
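Steps 3.3-3.5 (the tail of one load semantic mining block) can be sketched in NumPy as follows, with a plain layer normalization (no learned affine) and illustrative weight shapes:

```python
# Hedged sketch of formulas (10)-(12): residual + layer normalization,
# a two-layer feed-forward network (first layer RELU, second layer no
# activation), then a second residual + layer normalization.
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mining_block_tail(x_w, f4, w1, b1, w2, b2):
    f5 = layer_norm(x_w + f4)             # formula (10)
    hidden = np.maximum(0, f5 @ w1 + b1)  # first FC layer + RELU
    f6 = hidden @ w2 + b2                 # second FC layer, formula (11)
    f7 = layer_norm(f5 + f6)              # formula (12)
    return f7

rng = np.random.default_rng(3)
L, d, d_ff = 4, 8, 16
x_w = rng.normal(size=(L, d))
f4 = rng.normal(size=(L, d))
f7 = mining_block_tail(x_w, f4,
                       rng.normal(size=(d, d_ff)), np.zeros(d_ff),
                       rng.normal(size=(d_ff, d)), np.zeros(d))
print(f7.shape)  # (4, 8)
```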
Step 3.6, the sliding window is moved backwards by L bytes and steps 3.2-3.5 are re-executed in the new window until the sliding window reaches the end of the input feature F3.
Step 3.7, the features F7 obtained in all the windows are spliced to obtain the feature F8, F8 ∈ R^{N×d}, which serves as the feature of the input sequence:

F8 = Concat(F7^(1), …, F7^(N/L))  (13)

where F7^(1) denotes the feature obtained for the first window and F7^(N/L) the feature obtained for the last window.
In order to extract the multi-scale features of the byte sequence, step 4 of this embodiment first applies a one-dimensional max pooling layer to the feature F8 to obtain the feature F9, F9 ∈ R^{(N/2)×d}:

F9 = MaxPool1d(F8)  (14)

where MaxPool1d denotes the one-dimensional max pooling operation; each pooling operation halves the first dimension of the feature, and at the same time the new feature carries richer semantic information.
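A minimal NumPy stand-in for this pooling, assuming kernel size 2 and stride 2 (which halves the first dimension as described):

```python
# Sketch of formula (14): one-dimensional max pooling over the sequence
# dimension, taking the element-wise max of each pair of adjacent rows.
import numpy as np

def max_pool_1d(f8):
    n, d = f8.shape
    return f8[: n - n % 2].reshape(n // 2, 2, d).max(axis=1)

f8 = np.arange(12, dtype=float).reshape(6, 2)
f9 = max_pool_1d(f8)
print(f9.shape)  # (3, 2)
```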
Set the repetition count k as required and repeat steps 3-4 k times; except for the first execution of step 3, whose input is the feature F3, each subsequent execution of step 3 takes the feature F9 obtained in the preceding step 4 as its input.
As shown in fig. 4, the repetition corresponds to stacking the load semantic mining blocks of the pyramid network model several times, progressively extracting deeper, higher-semantic features layer by layer. In fig. 4 the feature dimensions are written N×d, where N equals the length of the input byte sequence and d the dimension to which each byte is expanded by the word embedding operation. The feature of the first repetition is F8^(1), F8^(1) ∈ R^{N×d}, and the feature of the k-th repetition is F8^(k), F8^(k) ∈ R^{(N/2^{k-1})×d}.
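The pyramid stacking of steps 3-4 can be sketched as follows; the per-level feature extractor is stubbed out (a real implementation would run the sliding-window mining block), so the sketch only illustrates how the scales shrink and are spliced:

```python
# Hypothetical sketch of the pyramid loop: at each level, extract
# features at the current scale, pool to half length for the next
# level, and splice the per-level features into the multi-scale result.
import numpy as np

def extract_level_features(x):
    return x  # stub for one load semantic mining block (steps 3.1-3.7)

def pyramid_features(f3, k):
    feats, x = [], f3
    for _ in range(k):
        f8 = extract_level_features(x)              # step 3
        feats.append(f8)
        n, d = f8.shape
        x = f8[: n - n % 2].reshape(n // 2, 2, d).max(axis=1)  # step 4
    return np.concatenate(feats, axis=0)            # splice all scales

f3 = np.zeros((16, 8))
ms = pyramid_features(f3, k=3)
print(ms.shape)  # (16 + 8 + 4, 8) = (28, 8)
```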
The features obtained in this way, spliced together, are the desired multi-scale features of the payload. After the multi-scale features are obtained, traffic classification can be performed:
in this embodiment, the classification process specifically includes:
step 5.1, extracting the multiscale featuresInput fully connected layer and activation function>Output ofDimension and number of traffic categories +.>And consistent.
Step 5.2, calculating and outputting the category of the encrypted network application protocol type:
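Step 5 can be sketched as follows, assuming a softmax activation (the description leaves the activation function unnamed) and illustrative random weights:

```python
# Hedged sketch of step 5: fully connected layer over the flattened
# multi-scale features, softmax over C categories, then argmax.
import numpy as np

def classify(ms_features, w, b):
    z = ms_features.reshape(-1) @ w + b   # fully connected layer
    z = np.exp(z - z.max())
    z /= z.sum()                          # softmax: probabilities sum to 1
    return int(np.argmax(z)), z           # category = argmax(Z)

rng = np.random.default_rng(1)
ms = rng.normal(size=(28, 8))             # spliced multi-scale features
C = 5                                     # assumed number of categories
w = rng.normal(size=(28 * 8, C))
b = np.zeros(C)
category, probs = classify(ms, w, b)
print(0 <= category < C, abs(probs.sum() - 1) < 1e-9)  # True True
```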
the embodiment constructs a deep neural network, namely a pyramid neural network, and stacks the load semantic mining blocks, so that deep features in the type of the encryption protocol message in the current complex scene can be extracted, and the accuracy of flow identification is improved.
It should be noted that, in the description of the embodiments of the present invention, unless explicitly specified and limited otherwise, the terms "disposed," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; may be directly connected or indirectly connected through an intermediate medium. The specific meaning of the above terms in the present invention will be understood in detail by those skilled in the art; the accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.
Claims (9)
1. The encryption application protocol type identification method based on multi-scale load semantic mining is characterized by comprising the following steps:
step 1, preprocessing the original traffic of a mobile application encryption network, extracting the load characteristics of a transmission layer load, and converting the load characteristics into a decimal byte sequence;
step 2, constructing a pyramid neural network based on a load semantic mining block, and acquiring word embedding features and position coding features of a decimal byte sequence, wherein the word embedding features and the position coding features are added to obtain an input feature sequence;
step 3, a load semantic mining block constructs a sliding window on the input feature sequence, the sliding window sequentially moves until the tail end of the input sequence, the features in the sliding window are extracted when each movement is performed, and the features extracted in all the sliding windows are sequentially spliced to obtain the features of the input sequence;
step 4, performing feature compression and dimension reduction on the features of the input sequence to serve as a new input sequence, repeating steps 3-4 k times, and splicing the features of the input sequence obtained in each repetition of step 3 to obtain the multi-scale features of the input sequence;
step 5, completing classification of the encrypted network application protocol types according to the multi-scale characteristics;
the substep of the step 3 comprises the following steps:
step 3.1, constructing a sliding window with a length of L bytes on an input characteristic sequence;
step 3.2, extracting features of the data in the sliding window by adopting a multi-head attention mechanism to obtain features F4;
step 3.3, carrying out residual connection and layer normalization processing on the input sequence F3 and the characteristic F4 to obtain a characteristic F5;
step 3.4, performing two-layer full-connection layer operation on the feature F5 to obtain a feature F6;
step 3.5, carrying out residual connection and layer normalization processing on the characteristic F5 and the characteristic F6 to obtain a characteristic F7;
step 3.6, the sliding window moves backwards by L bytes, and the steps 3.2-3.6 are repeated until the sliding window moves to the tail end of the input sequence;
and 3.7, splicing the features F7 in all the sliding windows to obtain features F8 serving as features of the input sequence.
2. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 1, wherein the preprocessing process in step 1 is as follows:
step 1.1, dividing a data packet into session flows according to five-tuple;
step 1.2, cleaning the session stream by removing timeout-retransmitted packets as well as address resolution protocol (ARP) and dynamic host configuration protocol (DHCP) packets;
step 1.3, extracting the payload features of the transport-layer payload in each data packet, and splicing the extracted payload features in packet-arrival order until the spliced byte length reaches the set payload feature length;
and step 1.4, converting the extracted spliced load characteristics into a decimal byte sequence.
3. The encryption application protocol type recognition method based on multi-scale payload semantic mining according to claim 2, wherein in step 1.3, if the byte length after splicing the payload features of all data packets in the session stream is still smaller than the set payload feature length, the sequence is padded with 0x00 bytes.
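Steps 1.3-1.4 together with the padding rule of claim 3 amount to the following sketch (session splitting by five-tuple and stream cleaning are assumed to have happened already; the byte values are illustrative, not real traffic):

```python
def build_payload_sequence(packets, feature_len=8):
    """Splice transport-layer payloads in packet-arrival order, truncate
    to the set payload feature length, pad with 0x00 when the session
    carries fewer bytes (claim 3), and return a decimal byte sequence."""
    spliced = b"".join(packets)[:feature_len]          # arrival-order splice + truncate
    spliced += b"\x00" * (feature_len - len(spliced))  # zero padding
    return list(spliced)                               # decimal bytes 0-255

# two packets of one session stream
seq = build_payload_sequence([b"\x16\x03\x01", b"\xab"], feature_len=8)
# -> [22, 3, 1, 171, 0, 0, 0, 0]
```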
4. The encryption application protocol type recognition method based on multi-scale payload semantic mining according to claim 1 or 2, wherein in step 2, the byte features of the decimal byte sequence are mapped into a d-dimensional vector space to obtain a word embedding feature F1, F1 ∈ R^(n×d), where n is the byte-sequence length and R denotes the set of real numbers, i.e. the entries of the matrix.
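The mapping of claim 4 is an embedding-table lookup: one d-dimensional row per possible byte value. In the method this table is learned during training; random values stand in here for illustration.

```python
import numpy as np

d = 16                                     # embedding dimension (illustrative)
E = np.random.default_rng(0).normal(size=(256, d))   # one row per byte value 0-255

byte_seq = [22, 3, 1, 171, 0, 0, 0, 0]     # decimal byte sequence from step 1
F1 = E[byte_seq]                           # word embedding feature, F1 in R^(n x d)
```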
5. The encryption application protocol type recognition method based on multi-scale load semantic mining according to claim 4, wherein in the step 2, the position coding feature calculation method is as follows:
where pos denotes the position at which a byte appears in the byte sequence; the left side PE(pos, 2i) of formula (1) is the position code of the even dimensions and the left side PE(pos, 2i+1) of formula (2) is the position code of the odd dimensions:

PE(pos, 2i) = sin(pos / 10000^(2i/d))   (1)
PE(pos, 2i+1) = cos(pos / 10000^(2i/d))   (2)
F2 = [PE(pos_1); PE(pos_2); …; PE(pos_n)]   (3)

i is the index of the dimension pair (even dimensions take subscript 2i, odd dimensions 2i+1), d is the dimension of the position code, F2 is the position-coding feature, and PE in formula (3) denotes the position code of each byte in the byte sequence.
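This is the sinusoidal position coding, computed for all positions at once; the implementation below is a standard numpy realization of the sine/cosine scheme described in claim 5 (dimensions are illustrative):

```python
import numpy as np

def positional_encoding(n, d):
    """Sinusoidal position codes: sine at even dimensions, cosine at odd
    dimensions, with wavelength 10000^(2i/d) for dimension pair i."""
    pos = np.arange(n)[:, None]                 # byte positions 0..n-1
    i = np.arange(d // 2)[None, :]              # dimension-pair indices
    angles = pos / np.power(10000.0, 2 * i / d)
    pe = np.zeros((n, d))
    pe[:, 0::2] = np.sin(angles)                # even dimensions
    pe[:, 1::2] = np.cos(angles)                # odd dimensions
    return pe

F2 = positional_encoding(8, 16)                 # one code per byte in the sequence
```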
6. The encryption application protocol type identification method based on multi-scale payload semantic mining according to claim 1, wherein the substeps of step 3.2 are:
step 3.2.1, performing multi-head self-attention calculation on the data in the sliding window to extract the association relations among the bytes within the window;
and step 3.2.2, repeating step 3.2.1 M times according to the set number of attention heads M, splicing the results extracted each time, and applying a linear transformation to obtain the feature F4 of the data in the sliding window.
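Steps 3.2.1-3.2.2 can be sketched as scaled dot-product self-attention run once per head, with the head outputs spliced and linearly transformed. The random weight matrices below are stand-ins for the trained projection parameters:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    """M heads of scaled dot-product self-attention over the window,
    spliced and linearly transformed into the window feature F4."""
    n, d = x.shape
    dk = d // num_heads
    heads = []
    for _ in range(num_heads):
        Wq, Wk, Wv = (rng.normal(size=(d, dk)) for _ in range(3))
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        attn = softmax(q @ k.T / np.sqrt(dk))   # byte-to-byte association weights
        heads.append(attn @ v)
    Wo = rng.normal(size=(d, d))
    return np.concatenate(heads, axis=1) @ Wo   # splice + linear transform

rng = np.random.default_rng(0)
F4 = multi_head_attention(rng.normal(size=(4, 16)), num_heads=4, rng=rng)
```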
7. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 1, wherein in step 4, feature compression and dimension reduction are performed by a one-dimensional maximum pooling layer, and each pooling operation halves the size of the first dimension of the feature.
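With kernel size and stride both 2, the pooling of claim 7 halves the sequence (first) dimension while keeping the feature dimension intact:

```python
import numpy as np

def max_pool_halve(f):
    """1-D max pooling (kernel 2, stride 2) along the first dimension."""
    n = (f.shape[0] // 2) * 2                         # drop a trailing odd element
    return f[:n].reshape(n // 2, 2, *f.shape[1:]).max(axis=1)

f = np.array([[1., 5.], [2., 0.], [7., 3.], [4., 6.]])
g = max_pool_halve(f)
# rows (1,5)/(2,0) -> (2,5); rows (7,3)/(4,6) -> (7,6)
```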
8. The encryption application protocol type recognition method based on multi-scale payload semantic mining according to claim 1, wherein the substep of step 5 includes:
step 5.1, inputting the extracted multi-scale features into a fully connected layer and an activation function, wherein the output dimension is consistent with the number of traffic categories;
and 5.2, calculating the category of the encrypted network application protocol type according to the output.
9. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 8, wherein in the step 5.2, the specific calculation method of the category is:
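The concrete formula of claim 9 does not survive this extraction. A common realization of steps 5.1-5.2, consistent with the claim's description but an assumption here, is a fully connected layer followed by softmax and an argmax over the class probabilities:

```python
import numpy as np

def classify(multi_scale_feat, W, b):
    """Step 5 (sketch): fully connected layer whose output width equals
    the number of traffic classes, softmax activation, then argmax.
    The softmax/argmax choice is assumed, not quoted from the claim."""
    logits = multi_scale_feat @ W + b
    e = np.exp(logits - logits.max())       # numerically stable softmax
    probs = e / e.sum()                     # per-class probabilities
    return int(np.argmax(probs)), probs

rng = np.random.default_rng(0)
feat = rng.normal(size=32)                          # extracted multi-scale feature
W, b = rng.normal(size=(32, 5)), np.zeros(5)        # 5 protocol classes assumed
label, probs = classify(feat, W, b)
```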
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310189712.1A CN115883263B (en) | 2023-03-02 | 2023-03-02 | Encryption application protocol type identification method based on multi-scale load semantic mining |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115883263A (en) | 2023-03-31 |
CN115883263B (en) | 2023-05-09 |
Family
ID=85761794
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310189712.1A Active CN115883263B (en) | 2023-03-02 | 2023-03-02 | Encryption application protocol type identification method based on multi-scale load semantic mining |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115883263B (en) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107637041B (en) * | 2015-03-17 | 2020-09-29 | 英国电讯有限公司 | Method and system for identifying malicious encrypted network traffic and computer program element |
CN113949653B (en) * | 2021-10-18 | 2023-07-07 | 中铁二院工程集团有限责任公司 | Encryption protocol identification method and system based on deep learning |
CN114358118A (en) * | 2021-11-29 | 2022-04-15 | 南京邮电大学 | Multi-task encrypted network traffic classification method based on cross-modal feature fusion |
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3111612A1 (en) * | 2014-02-28 | 2017-01-04 | British Telecommunications Public Limited Company | Profiling for malicious encrypted network traffic identification |
CN104052749A (en) * | 2014-06-23 | 2014-09-17 | 中国科学技术大学 | Method for identifying link-layer protocol data types |
CN104506484A (en) * | 2014-11-11 | 2015-04-08 | 中国电子科技集团公司第三十研究所 | Proprietary protocol analysis and identification method |
CN105430021A (en) * | 2015-12-31 | 2016-03-23 | 中国人民解放军国防科学技术大学 | Encrypted traffic identification method based on load adjacent probability model |
CN110532564A (en) * | 2019-08-30 | 2019-12-03 | 中国人民解放军陆军工程大学 | A kind of application layer protocol online recognition method based on CNN and LSTM mixed model |
CN111211948A (en) * | 2020-01-15 | 2020-05-29 | 太原理工大学 | Shodan flow identification method based on load characteristics and statistical characteristics |
CN112163594A (en) * | 2020-08-28 | 2021-01-01 | 南京邮电大学 | Network encryption traffic identification method and device |
WO2022041394A1 (en) * | 2020-08-28 | 2022-03-03 | 南京邮电大学 | Method and apparatus for identifying network encrypted traffic |
WO2022094926A1 (en) * | 2020-11-06 | 2022-05-12 | 中国科学院深圳先进技术研究院 | Encrypted traffic identification method, and system, terminal and storage medium |
CN112511555A (en) * | 2020-12-15 | 2021-03-16 | 中国电子科技集团公司第三十研究所 | Private encryption protocol message classification method based on sparse representation and convolutional neural network |
CN115348215A (en) * | 2022-07-25 | 2022-11-15 | 南京信息工程大学 | Encrypted network flow classification method based on space-time attention mechanism |
CN115277888A (en) * | 2022-09-26 | 2022-11-01 | 中国电子科技集团公司第三十研究所 | Method and system for analyzing message type of mobile application encryption protocol |
CN115348198A (en) * | 2022-10-19 | 2022-11-15 | 中国电子科技集团公司第三十研究所 | Unknown encryption protocol identification and classification method, device and medium based on feature retrieval |
Non-Patent Citations (2)
Title |
---|
Jinhai Zhang. Research on Key Technology of VPN Protocol Recognition. 2018 IEEE International Conference of Safety Produce Informatization (IICSPI), 2019, pp. 161-164. * |
Liu Shuai. Research and Implementation of Encrypted Traffic Identification Based on Machine Learning. China Master's Theses Full-text Database, Information Science and Technology, 2021, pp. I139-28. * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||