CN116668186B - Encryption agent protocol identification method based on multi-view features and ensemble learning - Google Patents
- Publication number
- CN116668186B (CN202310879928.0A)
- Authority
- CN
- China
- Prior art keywords
- flow
- protocol
- features
- characteristic
- encryption
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/08—Network architectures or network communication protocols for network security for authentication of entities
- H04L63/0884—Network architectures or network communication protocols for network security for authentication of entities by delegation of authentication, e.g. a proxy authenticates an entity to be authenticated on behalf of this entity vis-à-vis an authentication entity
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Abstract
The invention discloses a method for identifying encryption proxy protocols based on multi-view features and ensemble learning, and belongs to the technical field of encryption proxy protocol identification. It addresses the problem that prior-art identification methods cannot clearly characterize the complete network flow of an encryption proxy protocol. The method comprises the following steps: S1, constructing a multi-view feature extraction algorithm to extract spatiotemporal features, connection management features, traffic encapsulation features, authentication mode features, and traffic obfuscation features, and taking the set of 135-dimensional feature vectors extracted from each network flow as a data sample set; S2, interpolating the data sample set with the SMOTE oversampling algorithm to obtain a SMOTE-balanced data sample set; S3, constructing the ensemble learning classification model MvBoost from the SMOTE-balanced data sample set to obtain the encryption proxy protocol classification result. The invention can effectively identify encryption proxy protocols and prevents model training from degrading.
Description
Technical Field
The invention relates to a method for identifying an encryption proxy protocol, in particular to a method for identifying the encryption proxy protocol based on multi-view features and ensemble learning, belonging to the technical field of encryption proxy protocol identification.
Background
The encryption proxy protocol identification problem can be defined either as a binary classification problem that separates encryption proxy protocols from non-proxy protocols, or as a multi-class problem that distinguishes among several encryption proxy protocols. The encryption proxy protocols studied most at present include VPN protocols, the Shadowsocks protocol, the V2Ray protocol, and others. The network traffic of different encryption proxy protocols exhibits different data and behavioral characteristics, and current identification methods mainly use machine learning or deep learning to learn these traffic characteristics and thereby classify encryption proxy protocol traffic.
Machine-learning-based classification of encryption proxy protocol traffic first requires determining effective network flow features, such as time-related features, packet-related features, statistical features, and behavioral features. The extracted features are then learned by algorithms such as random forest, the C4.5 decision tree, GBDT, and SVM to identify the encryption proxy protocol; in most cases random forest outperforms the other algorithms. These methods have a corresponding drawback, however: the network flow features must be selected manually, and it is difficult to judge how to select them and which ones to select.
Deep-learning-based classification of encryption proxy protocol traffic can automatically learn relevant feature information from network flows, which simplifies the identification process. Such methods are mainly built on CNN models, whose effectiveness for encryption proxy protocol identification has been demonstrated in a large number of experiments.
Existing encryption proxy protocol identification methods rely mainly on spatiotemporal features and use traditional machine learning models for identification. They have the following defects in feature selection and in the algorithm model:
1. Spatiotemporal features alone cannot clearly characterize the complete network flow of an encryption proxy protocol. Encryption proxy protocols differ from ordinary encrypted traffic in several respects, including authentication mode, connection management, encapsulation protocol, and traffic obfuscation;
2. Proxy protocols differ in how they manage connections, and only a few of them proxy traffic through the single-tunnel model of VPN technology. In actual collection of encrypted proxy traffic, network flow samples from protocols with single-tunnel connection management are therefore scarce, the sample distribution across encryption proxy protocols is imbalanced, and classifier stability suffers;
3. Traditional machine learning algorithms do not account for the bias of different features during classifier training. Different feature sets influence the classification of a sample differently, so simply training on all feature sets together can degrade training.
There is therefore a need for an encryption proxy protocol identification method that adds multi-view features on top of spatiotemporal features to characterize encryption proxy protocols more comprehensively, and that also solves the imbalance of network flow samples.
Disclosure of Invention
The following presents a simplified summary of the invention in order to provide a basic understanding of some aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical elements of the invention or to delineate the scope of the invention. Its purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.
In view of this, to solve the problem that prior-art encryption proxy protocol identification methods cannot clearly characterize the complete network flow of an encryption proxy protocol, the invention provides an encryption proxy protocol identification method based on multi-view features and ensemble learning.
The technical solution is as follows: a method for identifying an encryption proxy protocol based on multi-view features and ensemble learning, comprising the following steps:
S1, constructing a multi-view feature extraction algorithm that extracts spatiotemporal features, connection management features, traffic encapsulation features, authentication mode features, and traffic obfuscation features from encryption proxy protocol network flows, and outputting the set of 135-dimensional feature vectors extracted from each network flow to a CSV file as a data sample set;
S2, interpolating the data sample set with the SMOTE oversampling algorithm to increase the number of minority-class samples, obtaining a SMOTE-balanced data sample set;
S3, constructing the multi-view ensemble learning classification model MvBoost from the SMOTE-balanced data sample set to obtain the final classification result for the encryption proxy protocol;
Specifically, the ensemble learning classification model MvBoost consists of five weak-learner base classifiers and one strong-learner classifier. The base classifiers are built from the extracted multi-view features over nested feature subspaces of each encryption proxy protocol network flow: the first uses the spatiotemporal features; the second uses the spatiotemporal and connection management features; the third uses the spatiotemporal, connection management, and traffic encapsulation features; the fourth uses the spatiotemporal, connection management, traffic encapsulation, and authentication mode features; and the fifth uses the spatiotemporal, connection management, traffic encapsulation, authentication mode, and traffic obfuscation features. The strong-learner classifier serves as the decision maker over the five base classifiers: their predicted classification results are concatenated, and the decision maker is trained on the concatenated predictions as input to produce the final classification result for the encryption proxy protocol.
Further, in step S1, the spatiotemporal features are extracted with the CICFlowMeter feature extraction tool, and the tool is extended to obtain the multi-view feature extraction algorithm;
The extended algorithm is as follows:
Input: all packet sequences in a PCAP file or captured from the network environment
Output: a CSV file containing the multi-view features of each network flow
initialize the network flow dictionary flows = {}
for each packet P in the packet sequence do
    compute flow_key = hash(P) from the five-tuple of packet P
    if flows[flow_key] = None then
        create a new network flow new_flow = Flow(five-tuple)
        flows[flow_key] = new_flow
    end
    add packet P to its network flow
    extract and update the spatiotemporal, authentication mode, connection management, encapsulation protocol, and obfuscated-traffic features
end
output the extracted multi-view network flow features to the CSV file
return the CSV file containing the multi-view features of the network flows
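The flow-grouping loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Packet` and `Flow` classes, the three toy features, and the direction-independent key are all hypothetical, and real packets would come from a PCAP parser rather than hand-built records.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Packet:
    # Hypothetical minimal packet record; a real extractor would parse PCAP data.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str    # "TCP" or "UDP"
    length: int   # packet size in bytes
    ts: float     # capture timestamp in seconds


@dataclass
class Flow:
    key: tuple
    packets: list = field(default_factory=list)

    def add(self, pkt):
        self.packets.append(pkt)

    def features(self):
        # Three illustrative features only; the patent describes 135 dimensions.
        sizes = [p.length for p in self.packets]
        times = [p.ts for p in self.packets]
        return {
            "packet_count": len(sizes),
            "total_bytes": sum(sizes),
            "duration": max(times) - min(times),
        }


def flow_key(pkt):
    # Assumption: both directions of a connection belong to one flow,
    # so the two endpoints are sorted before building the key.
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    return (min(a, b), max(a, b), pkt.proto)


def extract_flows(packets):
    flows = {}
    for pkt in packets:
        key = flow_key(pkt)
        if key not in flows:
            flows[key] = Flow(key)   # first packet of a new flow
        flows[key].add(pkt)
    return flows


def write_csv(flows, out):
    names = ["flow", "packet_count", "total_bytes", "duration"]
    writer = csv.DictWriter(out, fieldnames=names)
    writer.writeheader()
    for key, flow in flows.items():
        row = {"flow": repr(key)}
        row.update(flow.features())
        writer.writerow(row)


pkts = [
    Packet("10.0.0.1", "1.2.3.4", 5000, 443, "TCP", 100, 0.0),
    Packet("1.2.3.4", "10.0.0.1", 443, 5000, "TCP", 200, 0.5),
    Packet("10.0.0.1", "1.2.3.4", 5001, 443, "TCP", 50, 1.0),
]
flows = extract_flows(pkts)
buf = io.StringIO()
write_csv(flows, buf)
```

Here the first two packets are opposite directions of one connection and end up in the same flow, while the third uses a different client port and starts a second flow.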
Further, the traffic encapsulation features are selected from the transport layer protocol and from packets of TLS Application Data type. For TLS Application Data packets, each field of the header information is selected as a traffic encapsulation feature. The transport layer protocols comprise TCP and UDP. For TCP, statistics of each TCP flag bit are selected as traffic encapsulation features for distinguishing different proxy protocols, and the window size of the bidirectional TCP flow is selected as an additional feature. For UDP, the result of determining whether the encryption proxy protocol uses UDP at the transport layer is selected as a traffic encapsulation feature.
Further, among the authentication mode features, features of the TLS handshake packets reflect whether the encryption proxy protocol performs key negotiation and authentication through the TLS handshake. When the encryption proxy protocol authenticates using the authentication mode provided by the TLS handshake, features are designed from the detailed information in the TLS handshake packets, including the SID, cipher suite information, and certificate-related information, to obtain the key negotiation and authentication features of the network flow.
Further, among the connection management features, for multi-connection tunnel proxy protocols, the number of proxy tunnels is counted by observing changes in the four-tuple formed by the client IP, client port, proxy IP, and proxy port.
Further, among the traffic obfuscation features, three types of statistical features are designed for HTTP-obfuscated traffic: the number of HTTP-obfuscated packets, the payload size of HTTP-obfuscated traffic, and the request headers of HTTP-obfuscated traffic. For TLS-obfuscated traffic, statistical features are designed for the proportion of TLS-obfuscated packets and the header length of TLS-obfuscated traffic. The proportion of obfuscated packets among all packets is counted as an additional obfuscation feature.
Further, in S2, the SMOTE oversampling algorithm proceeds as follows: for each minority-class sample in the data sample set, K neighboring samples around it are selected at random, interpolation is performed based on the distances to the K neighboring samples, and a new synthetic sample is generated, stored in the data sample set, and incorporated into the SMOTE-balanced data sample set.
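The interpolation step can be illustrated with a short sketch of SMOTE as it is commonly described: pick a minority sample, pick one of its K nearest minority neighbors at random, and place a synthetic point somewhere on the segment between them. The function name, parameters, and fixed seed below are assumptions for illustration; the patent does not specify K or the sampling details.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority-class vectors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # K nearest minority neighbors of x, excluding x itself.
        others = [m for m in minority if m is not x]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_samples = smote(minority, n_new=4, k=2)
balanced_minority = minority + new_samples
```

Every synthetic point lies on a line segment between two existing minority samples, so the oversampled class stays within the region the original minority samples already occupy.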
The beneficial effects of the invention are as follows. By adding multi-view features, the invention clearly characterizes the complete network flow of an encryption proxy protocol. The SMOTE oversampling algorithm balances the distribution of the encryption proxy protocol network flow data sample set, which eases processing by the subsequent classifier. The ensemble learning classification model MvBoost uses a decision maker to adjudicate the differing classification results produced by different feature sets and obtain the final classification result for the encryption proxy protocol; because it is built on the SMOTE-balanced data sample set, it balances the bias of different features during classifier training and prevents model training from degrading. The invention can effectively identify encryption proxy protocols with an identification accuracy of up to 99 percent, and the proposed multi-view features contribute substantially to the model's predictions and play a key role in identifying encryption proxy protocols.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a flow chart of a method for encryption proxy protocol identification based on multi-view features and ensemble learning;
FIG. 2 is a flow chart of an embodiment of a method for encryption proxy protocol identification based on multi-view features and ensemble learning.
Detailed Description
In order to make the technical solutions and advantages of the embodiments of the present invention more apparent, the following detailed description of exemplary embodiments of the present invention is provided in conjunction with the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention and not exhaustive of all embodiments. It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
With reference to FIGS. 1-2, a method for identifying an encryption proxy protocol based on multi-view features and ensemble learning includes the following steps:
S1, inputting network traffic, dividing it into network flows, constructing a multi-view feature extraction algorithm that extracts spatiotemporal features, connection management features, traffic encapsulation features, authentication mode features, and traffic obfuscation features from encryption proxy protocol network flows, and outputting the set of 135-dimensional multi-view feature vectors extracted from each network flow to a CSV file as a data sample set;
S2, interpolating the data sample set with the SMOTE oversampling algorithm to increase the number of minority-class samples, obtaining a SMOTE-balanced data sample set;
S3, constructing the multi-view ensemble learning classification model MvBoost from the SMOTE-balanced data sample set to obtain the final classification result for the encryption proxy protocol;
Specifically, the ensemble learning classification model MvBoost consists of five weak-learner base classifiers and one strong-learner classifier. The base classifiers are built from the extracted multi-view features over nested feature subspaces of each encryption proxy protocol network flow: the first uses the spatiotemporal features; the second uses the spatiotemporal and connection management features; the third uses the spatiotemporal, connection management, and traffic encapsulation features; the fourth uses the spatiotemporal, connection management, traffic encapsulation, and authentication mode features; and the fifth uses the spatiotemporal, connection management, traffic encapsulation, authentication mode, and traffic obfuscation features. The strong-learner classifier serves as the decision maker over the five base classifiers: their predicted classification results are concatenated, and the decision maker is trained on the concatenated predictions as input to produce the final classification result for the encryption proxy protocol;
Specifically, in this embodiment, the base classifiers of the ensemble learning classification model MvBoost are constructed on the principle that each weak-learner base classifier is characterized by a mapped subset of the 135-dimensional multi-view features: each base classifier takes a P-dimensional subspace of the 135-dimensional multi-view feature vector as its input feature vector. When the classification results of the five base classifiers disagree, the decision maker of MvBoost performs further learning, that is, further model training, and adjudicates again to yield the final traffic classification result for the encryption proxy protocol.
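The nested-subspace stacking structure described here can be sketched as follows. The patent does not name the learners, so everything concrete below is a stand-in chosen for brevity: a nearest-centroid rule plays the weak base classifier, a lookup table over the concatenated base predictions plays the decision maker, and the one-column-per-view layout in `VIEWS` is hypothetical (the real split of the 135 dimensions is not given). A fuller stacking setup would also train the decision maker on out-of-fold predictions rather than on training-set predictions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical column layout: one toy column per view.
VIEWS = {
    "spatiotemporal": [0],
    "connection": [1],
    "encapsulation": [2],
    "authentication": [3],
    "obfuscation": [4],
}
ORDER = ["spatiotemporal", "connection", "encapsulation",
         "authentication", "obfuscation"]


def nested_subspaces():
    """Column lists for the five base classifiers, each adding one more view."""
    cols, out = [], []
    for name in ORDER:
        cols = cols + VIEWS[name]
        out.append(list(cols))
    return out


class CentroidClassifier:
    """Stand-in weak learner: assign the class with the nearest centroid."""

    def __init__(self, cols):
        self.cols = cols
        self.centroids = {}

    def _project(self, x):
        return [x[c] for c in self.cols]

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(self._project(x))
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, x):
        p = self._project(x)
        return min(self.centroids,
                   key=lambda label: math.dist(p, self.centroids[label]))


class MvBoostSketch:
    def fit(self, X, y):
        self.bases = [CentroidClassifier(c).fit(X, y) for c in nested_subspaces()]
        # Stand-in decision maker: most frequent true label for each
        # concatenated tuple of base predictions.
        votes = defaultdict(Counter)
        for x, label in zip(X, y):
            votes[self._stack(x)][label] += 1
        self.table = {k: c.most_common(1)[0][0] for k, c in votes.items()}
        return self

    def _stack(self, x):
        return tuple(b.predict_one(x) for b in self.bases)

    def predict_one(self, x):
        stacked = self._stack(x)
        if stacked in self.table:
            return self.table[stacked]
        # Unseen combination: fall back to a majority vote of the base outputs.
        return Counter(stacked).most_common(1)[0][0]


X = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [10, 10, 10, 10, 10], [9, 9, 9, 9, 9]]
y = ["vpn", "vpn", "shadowsocks", "shadowsocks"]
model = MvBoostSketch().fit(X, y)
```

The point of the sketch is the data flow: five classifiers see progressively wider feature subspaces, and a second-stage learner is trained on the tuple of their outputs rather than on the raw features.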
Further, in step S1, the spatiotemporal features are extracted with the CICFlowMeter feature extraction tool, and the tool is extended to obtain the multi-view feature extraction algorithm;
The extended algorithm is as follows:
Input: all packet sequences in a PCAP file or captured from the network environment
Output: a CSV file containing the multi-view features of each network flow
initialize the network flow dictionary flows = {}
for each packet P in the packet sequence do
    compute flow_key = hash(P) from the five-tuple of packet P
    if flows[flow_key] = None then
        create a new network flow new_flow = Flow(five-tuple)
        flows[flow_key] = new_flow
    end
    add packet P to its network flow
    extract and update the spatiotemporal, authentication mode, connection management, encapsulation protocol, and obfuscated-traffic features
end
output the extracted multi-view network flow features to the CSV file
return the CSV file containing the multi-view features of the network flows
Further, the traffic encapsulation features are selected from the transport layer protocol and from packets of TLS Application Data type. For TLS Application Data packets, each field of the header information is selected as a traffic encapsulation feature. The transport layer protocols comprise TCP and UDP;
For TCP, statistics of each TCP flag bit are selected as traffic encapsulation features for distinguishing different proxy protocols, and the window size of the bidirectional TCP flow is selected as an additional feature;
For UDP, the result of determining whether the encryption proxy protocol uses UDP at the transport layer is selected as a traffic encapsulation feature;
Specifically, a TCP-based proxy protocol generally sets the relevant flag bits at the proxy client so that the server can respond to proxy requests quickly, which improves the transmission efficiency and timeliness of the request data. The UDP header, by contrast, is too simple, so the individual fields of the UDP header cannot serve as traffic encapsulation features for distinguishing proxy protocols.
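As a rough illustration of these transport-layer features, the sketch below counts how often each TCP flag bit is set in a flow, averages the TCP window size, and records a 0/1 indicator for UDP use. The flag bit values are the standard TCP header flags; the packet dictionaries, the feature names, and the choice of a mean as the window statistic are assumptions, since the patent does not list the exact statistics.

```python
from collections import Counter

# Standard TCP header flag bits.
TCP_FLAGS = {
    "FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08,
    "ACK": 0x10, "URG": 0x20, "ECE": 0x40, "CWR": 0x80,
}


def encapsulation_features(packets):
    """packets: dicts with 'proto', plus 'flags' and 'window' for TCP packets."""
    counts = Counter()
    windows = []
    uses_udp = 0
    for pkt in packets:
        if pkt["proto"] == "UDP":
            uses_udp = 1          # only the fact that UDP is used is recorded
            continue
        for name, bit in TCP_FLAGS.items():
            if pkt["flags"] & bit:
                counts[name] += 1
        windows.append(pkt["window"])
    feats = {f"{name.lower()}_count": counts[name] for name in TCP_FLAGS}
    feats["mean_window"] = sum(windows) / len(windows) if windows else 0.0
    feats["uses_udp"] = uses_udp
    return feats


sample = [
    {"proto": "TCP", "flags": 0x02, "window": 64240},   # SYN
    {"proto": "TCP", "flags": 0x12, "window": 65535},   # SYN+ACK
    {"proto": "TCP", "flags": 0x18, "window": 500},     # PSH+ACK
    {"proto": "UDP"},
]
feats = encapsulation_features(sample)
```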
Further, among the authentication mode features, features of the TLS handshake packets reflect whether the encryption proxy protocol performs key negotiation and authentication through the TLS handshake. When the encryption proxy protocol authenticates using the authentication mode provided by the TLS handshake, features are designed from the detailed information in the TLS handshake packets, including the SID, cipher suite information, and certificate-related information, to obtain the key negotiation and authentication features of the network flow.
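To make the handshake-derived features concrete, the sketch below reads a single TLS record and, if it carries a ClientHello, pulls out the session ID length and the offered cipher suites. It assumes that "SID" refers to the TLS session ID and that the record is complete and unfragmented; the function name and returned keys are hypothetical, and certificate parsing is omitted.

```python
import struct

HANDSHAKE = 22      # TLS record content type for handshake messages
CLIENT_HELLO = 1    # handshake message type


def parse_client_hello(record: bytes):
    """Return session-ID and cipher-suite details, or None if not a ClientHello."""
    content_type, version, length = struct.unpack("!BHH", record[:5])
    if content_type != HANDSHAKE:
        return None
    body = record[5:5 + length]
    if not body or body[0] != CLIENT_HELLO:
        return None
    pos = 4             # skip handshake type (1 byte) and length (3 bytes)
    pos += 2 + 32       # skip client version and random
    sid_len = body[pos]
    pos += 1 + sid_len  # skip the session ID itself
    (suites_len,) = struct.unpack("!H", body[pos:pos + 2])
    pos += 2
    suites = [struct.unpack("!H", body[i:i + 2])[0]
              for i in range(pos, pos + suites_len, 2)]
    return {
        "record_version": version,
        "session_id_len": sid_len,
        "cipher_suites": suites,
    }


# Hand-built ClientHello with a 4-byte session ID and two cipher suites.
hello = (b"\x03\x03" + bytes(32) + b"\x04" + b"abcd"
         + b"\x00\x04" + b"\x13\x01\x13\x02" + b"\x01\x00")
handshake = b"\x01" + len(hello).to_bytes(3, "big") + hello
record = b"\x16\x03\x01" + len(handshake).to_bytes(2, "big") + handshake
info = parse_client_hello(record)
app_data = parse_client_hello(b"\x17\x03\x03\x00\x01\x00")
```

A flow whose first records never yield a ClientHello would suggest that the proxy protocol does not negotiate keys through a TLS handshake, which is itself one of the signals the text describes.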
Further, among the connection management features, for multi-connection tunnel proxy protocols, the number of proxy tunnels is counted by observing changes in the four-tuple formed by the client IP, client port, proxy IP, and proxy port;
Specifically, in multi-connection tunnel mode, the server IP address and port in the connections established between the proxy client and the proxy server remain essentially unchanged, so the number of proxy tunnels can be counted by observing changes in the four-tuple formed by the client IP, client port, proxy IP, and proxy port. If the client IP address, proxy IP address, and proxy port are the same but several client ports exist, a multi-connection tunnel proxy protocol is in use. If tunnels under a multi-connection proxy protocol need to be distinguished by counting the upstream and downstream traffic within them, the tunnels are separated into upstream and downstream.
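The tunnel-counting rule can be sketched in a few lines: group connections by client IP, proxy IP, and proxy port, and count the distinct client ports in each group. The tuple layout and function names are assumptions for illustration.

```python
from collections import defaultdict


def count_tunnels(connections):
    """connections: iterable of (client_ip, client_port, proxy_ip, proxy_port)."""
    ports = defaultdict(set)
    for client_ip, client_port, proxy_ip, proxy_port in connections:
        ports[(client_ip, proxy_ip, proxy_port)].add(client_port)
    # One tunnel per distinct client port towards the same proxy endpoint.
    return {endpoint: len(seen) for endpoint, seen in ports.items()}


def is_multi_connection(connections):
    return any(n > 1 for n in count_tunnels(connections).values())


conns = [
    ("10.0.0.1", 50001, "203.0.113.5", 8388),
    ("10.0.0.1", 50002, "203.0.113.5", 8388),
    ("10.0.0.1", 50003, "203.0.113.5", 8388),
    ("10.0.0.1", 50001, "203.0.113.5", 8388),  # repeat of an existing tunnel
]
tunnels = count_tunnels(conns)
```

In this toy input the client opens three distinct source ports towards one proxy endpoint, so the group reports three tunnels and the traffic is flagged as multi-connection.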
Further, among the traffic obfuscation features, three types of statistical features are designed for HTTP-obfuscated traffic: the number of HTTP-obfuscated packets, the payload size of HTTP-obfuscated traffic, and the request headers of HTTP-obfuscated traffic. For TLS-obfuscated traffic, statistical features are designed for the proportion of TLS-obfuscated packets and the header length of TLS-obfuscated traffic. The proportion of obfuscated packets among all packets is counted as an additional obfuscation feature.
Further, in S2, the SMOTE oversampling algorithm proceeds as follows: for each minority-class sample in the data sample set, K neighboring samples around it are selected at random, interpolation is performed based on the distances to the K neighboring samples, and a new synthetic sample is generated, stored in the data sample set, and incorporated into the SMOTE-balanced data sample set.
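A small sketch of the obfuscation (confusion) statistics follows. It assumes an earlier step has already tagged each packet as HTTP-obfuscated, TLS-obfuscated, or neither, because the patent does not say how obfuscated packets are recognized; the `obfs` tag, the `payload_len` field, and the feature names are all hypothetical, and the request-header and header-length features are left out.

```python
def obfuscation_features(packets):
    """packets: dicts with 'obfs' in {'http', 'tls', None} and 'payload_len'."""
    total = len(packets)
    http = [p for p in packets if p["obfs"] == "http"]
    tls = [p for p in packets if p["obfs"] == "tls"]

    def ratio(n):
        return n / total if total else 0.0

    return {
        "http_obfs_packets": len(http),
        "http_obfs_payload_bytes": sum(p["payload_len"] for p in http),
        "tls_obfs_ratio": ratio(len(tls)),
        # Share of all packets that carry any kind of obfuscation.
        "obfs_ratio": ratio(len(http) + len(tls)),
    }


flow_packets = [
    {"obfs": "http", "payload_len": 100},
    {"obfs": "tls", "payload_len": 50},
    {"obfs": None, "payload_len": 1400},
    {"obfs": None, "payload_len": 1400},
]
obfs = obfuscation_features(flow_packets)
```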
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of the above description, will appreciate that other embodiments are contemplated within the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is defined by the appended claims.
Claims (3)
1. A method for identifying an encryption proxy protocol based on multi-view features and ensemble learning, comprising the steps of:
s1, constructing a multi-view feature extraction algorithm, extracting space-time related features, connection management features, flow encapsulation features, authentication mode features and flow confusion features aiming at encryption agent protocol network flows, and outputting a set of 135-dimensional feature vectors extracted from each network flow as a data sample set to a CSV file for storage;
specific:
in the connection management feature, for a multi-connection tunnel proxy protocol, counting the number of proxy tunnels by observing the change condition of a quadruple formed by a client IP, a proxy port and a client port;
among the flow encapsulation features, a flow encapsulation feature is selected for the transmission layer protocol and TLS Application Data type data packet features, and for TLS Application Data type data packets, each field of header information of the data packets is selected as the flow encapsulation feature; the transport layer protocol comprises a TCP protocol and a UDP protocol; for the TCP protocol, selecting statistical information of each identification bit of the TCP protocol as flow encapsulation characteristics for distinguishing different proxy protocols, and selecting window size of a TCP protocol bidirectional flow as an additional characteristic; for the UDP protocol, selecting a judging result of whether the encryption agent protocol uses the UDP protocol in a transmission layer as a flow encapsulation characteristic;
in the authentication mode features, features of the TLS handshake packets reflect whether the encryption proxy protocol performs key negotiation and authentication through TLS handshake packets; when the encryption proxy protocol authenticates using the authentication mode provided by the TLS handshake, features are designed from detail information in the TLS handshake packets, including the SID, cipher suite information and certificate-related information, so as to capture the network data flow characteristics of key negotiation and authentication;
in the flow confusion features, for HTTP confusion flows, three statistical features are designed: the number of HTTP confusion flow data packets, the HTTP confusion flow payload size and the HTTP confusion flow request header; for TLS confusion flows, statistical features of the proportion of TLS confusion flow data packets and the TLS confusion flow header length are designed; and the proportion of confusion flow data packets among all data packets is counted as an additional confusion feature;
S2, interpolating the data sample set using the SMOTE oversampling algorithm to increase the number of minority-class samples in the data sample set, thereby obtaining a SMOTE-balanced data sample set;
S3, constructing, from the SMOTE-balanced data sample set, an ensemble learning classification model MvBoost based on the multi-view features to obtain the final classification and identification result of the encryption proxy protocol;
specifically: the ensemble learning classification model MvBoost consists of five weak-learner base classifiers and one strong-learner classifier; the base classifiers are built on the extracted multi-view features of each encryption proxy protocol network flow: the first base classifier is built on the subspace of the space-time related features; the second on the subspace of the space-time related features and the connection management features; the third on the subspace of the space-time related features, the connection management features and the flow encapsulation features; the fourth on the subspace of the space-time related features, the connection management features, the flow encapsulation features and the authentication mode features; and the fifth on the subspace of the space-time related features, the connection management features, the flow encapsulation features, the authentication mode features and the flow confusion features; the strong-learner classifier serves as the decision maker over the classification and identification results of the five base classifiers: the predicted classification and identification results of the five base classifiers are concatenated, and the decision maker is trained with the concatenated prediction data as input to obtain the final classification and identification result of the encryption proxy protocol.
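As an illustration of the structure recited in step S3, the following minimal Python sketch builds five base classifiers on nested feature subspaces and trains a decision maker on their concatenated outputs. The view widths in `VIEW_DIMS`, the `NearestCentroid` stand-in learner, the use of per-class scores as the base outputs, and the class name `StackedViews` are all assumptions made for illustration; the claim does not name the learning algorithms or state how the 135 dimensions divide among the views.

```python
from itertools import accumulate

# Hypothetical view widths; the source fixes only the 135-dimension total.
VIEW_DIMS = {
    "space_time": 2,
    "connection_management": 1,
    "flow_encapsulation": 1,
    "authentication_mode": 1,
    "flow_confusion": 1,
}


def nested_subspaces(view_dims):
    """Column slices for view 1, views 1-2, ..., views 1-5."""
    return [slice(0, end) for end in accumulate(view_dims.values())]


class NearestCentroid:
    """Stand-in learner (an assumption; the source does not name the learners)."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.centroids_ = {}
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.centroids_[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def scores(self, x):
        # Negative squared distance to each class centroid, in class order.
        return [-sum((a - b) ** 2 for a, b in zip(x, self.centroids_[c]))
                for c in self.classes_]

    def predict(self, X):
        out = []
        for x in X:
            s = self.scores(x)
            out.append(self.classes_[s.index(max(s))])
        return out


class StackedViews:
    """Five base learners on nested subspaces plus one decision maker."""

    def __init__(self, view_dims):
        self.slices = nested_subspaces(view_dims)
        self.bases = [NearestCentroid() for _ in self.slices]
        self.decider = NearestCentroid()

    def _meta(self, X):
        # Concatenate the outputs of all base learners for every sample.
        meta = []
        for x in X:
            row = []
            for s, base in zip(self.slices, self.bases):
                row.extend(base.scores(x[s]))
            meta.append(row)
        return meta

    def fit(self, X, y):
        for s, base in zip(self.slices, self.bases):
            base.fit([x[s] for x in X], y)
        self.decider.fit(self._meta(X), y)
        return self

    def predict(self, X):
        return self.decider.predict(self._meta(X))
```

In this sketch the decision maker is trained on the base learners' outputs for the same samples they were fitted on; a practical stacking setup would more likely use held-out or cross-validated predictions to limit overfitting.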
2. The method for identifying the encryption proxy protocol based on multi-view features and ensemble learning according to claim 1, wherein in step S1, the space-time related features are extracted using the CICFlowMeter feature extraction tool, and the CICFlowMeter feature extraction tool is extended to obtain the multi-view feature extraction algorithm;
the extension procedure is as follows:
Input: all packet sequences from a PCAP file or captured in the network environment
Output: a CSV file containing the multi-view features of each network flow
flows = {}    // initialize the network flow dictionary
for each packet P in the packet sequence do
    flow_key = hash(P)    // computed from the five-tuple of packet P
    if flows[flow_key] = None then
        new_flow = Flow(five-tuple)    // create a new network flow from the five-tuple
        flows[flow_key] = new_flow
    end
    add network packet P to the network flow flows[flow_key]
    extract and update the space-time, authentication mode, connection management, encapsulation protocol and confusion flow features
end
output the extracted multi-view network flow features to the CSV file
return the CSV file containing the multi-view features of the network flows.
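The flow-aggregation loop above can be sketched in Python as follows. This is a minimal illustration, not the CICFlowMeter extension itself: the `Packet` and `Flow` types, the direction-independent `flow_key`, and the three placeholder features (packet count, byte total, duration) are assumptions standing in for the 135 multi-view features, and packets are supplied as in-memory records rather than parsed from a PCAP file.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    """Hypothetical parsed packet record."""
    ts: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    size: int


@dataclass
class Flow:
    key: tuple
    packets: list = field(default_factory=list)

    def add(self, pkt):
        self.packets.append(pkt)

    def features(self):
        # Placeholder features; the real extractor emits the multi-view set.
        times = [p.ts for p in self.packets]
        return {
            "packet_count": len(self.packets),
            "total_bytes": sum(p.size for p in self.packets),
            "duration": max(times) - min(times),
        }


def flow_key(pkt):
    # Sort the two endpoints so both directions map to the same flow
    # (an assumption; the pseudocode only says the key comes from the five-tuple).
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    return (min(a, b), max(a, b), pkt.proto)


def aggregate(packets):
    flows = {}
    for pkt in packets:
        key = flow_key(pkt)
        if key not in flows:
            flows[key] = Flow(key)
        flows[key].add(pkt)
    return flows


def to_csv_text(flows):
    # Write one CSV row of features per flow and return the text.
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["packet_count", "total_bytes", "duration"])
    writer.writeheader()
    for flow in flows.values():
        writer.writerow(flow.features())
    return out.getvalue()
```

The pseudocode updates the features as each packet arrives, whereas this sketch computes them once per flow at output time, which is simpler to read but keeps every packet in memory.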
3. The method according to claim 2, wherein in step S2, in the SMOTE oversampling algorithm, for each minority-class sample of the data sample set, K neighboring samples around the minority-class sample are randomly selected, and interpolation is performed based on distances between the K neighboring samples so as to generate new synthetic samples, and the new synthetic samples are stored in the data sample set and merged into the SMOTE-balanced data sample set.
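For orientation, the interpolation step can be sketched in a few lines of Python. The sketch follows the commonly described SMOTE procedure (pick a minority sample, pick one of its K nearest minority neighbors at random, and place a new point on the segment between them), which is an assumption about how the claimed step is realized; the function names, the default `k`, and the `seed` parameter are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority-class vectors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # K nearest minority neighbors of x, excluding x itself.
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: sq_dist(x, m))[:k]
        nb = rng.choice(neighbours)
        # New point at a random position on the segment from x to nb.
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Because every synthetic point lies on a segment between two existing minority samples, the new samples stay inside the region already occupied by the minority class.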
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310879928.0A CN116668186B (en) | 2023-07-18 | 2023-07-18 | Encryption agent protocol identification method based on multi-view features and ensemble learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116668186A (en) | 2023-08-29 |
CN116668186B (en) | 2024-02-02 |
Family
ID=87722612
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310879928.0A Active CN116668186B (en) | 2023-07-18 | 2023-07-18 | Encryption agent protocol identification method based on multi-view features and ensemble learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116668186B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104270392A (en) * | 2014-10-24 | 2015-01-07 | Institute of Information Engineering, Chinese Academy of Sciences | Method and system for network protocol recognition based on tri-classifier cooperative training learning |
CN111626336A (en) * | 2020-04-29 | 2020-09-04 | Nanjing University of Science and Technology | Subway fault data classification method based on unbalanced data set |
CN111817982A (en) * | 2020-07-27 | 2020-10-23 | Nanjing University of Information Science and Technology | Encrypted flow identification method for category imbalance |
CN112671757A (en) * | 2020-12-22 | 2021-04-16 | Wuxi Jiangnan Institute of Computing Technology | Encrypted flow protocol identification method and device based on automatic machine learning |
CN116232642A (en) * | 2022-12-12 | 2023-06-06 | State Grid Corporation of China Customer Service Center | Automatic encryption flow analysis method crossing multiple protocols and protocol combinations |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8311956B2 (en) * | 2009-08-11 | 2012-11-13 | At&T Intellectual Property I, L.P. | Scalable traffic classifier and classifier training system |
US11455569B2 (en) * | 2019-01-09 | 2022-09-27 | International Business Machines Corporation | Device discovery and classification from encrypted network traffic |
Non-Patent Citations (2)
Title |
---|
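Detection and Analysis of Malicious Network Behavior Based on Imbalance Algorithms; Chen Gong; Information Technology and Informatization (No. 08); pp. 121-125 * |
Survey and Prospects of Machine-Learning-Based Traffic Identification Techniques; Zhao Shuang et al.; Computer Engineering & Science (No. 10); pp. 34-44 * |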
Also Published As
Publication number | Publication date |
---|---|
CN116668186A (en) | 2023-08-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||