CN116668186B - Encrypted proxy protocol identification method based on multi-view features and ensemble learning

Info

Publication number
CN116668186B
Authority
CN
China
Prior art keywords
flow
protocol
features
characteristic
encryption
Legal status
Active
Application number
CN202310879928.0A
Other languages
Chinese (zh)
Other versions
CN116668186A (en
Inventor
刘立坤
余翔湛
宋赟祖
史建焘
胡智超
葛蒙蒙
李竑杰
刘奉哲
孔德文
羿天阳
龚家兴
张森
程明明
高展鹏
王钲皓
郭一澄
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Application filed by Harbin Institute of Technology
Priority to CN202310879928.0A
Publication of CN116668186A
Application granted
Publication of CN116668186B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0884 Network architectures or network communication protocols for network security for authentication of entities by delegation of authentication, e.g. a proxy authenticates an entity to be authenticated on behalf of this entity vis-à-vis an authentication entity
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention discloses a method for identifying encrypted proxy protocols based on multi-view features and ensemble learning, and belongs to the technical field of encrypted proxy protocol identification. It addresses the problem that prior-art identification methods cannot fully characterize the network flows of encrypted proxy protocols. The method comprises the following steps: S1, constructing a multi-view feature extraction algorithm that extracts spatio-temporal features, connection management features, traffic encapsulation features, authentication mode features and traffic obfuscation features, and taking the set of 135-dimensional feature vectors extracted from the network flows as a data sample set; S2, interpolating the data sample set with the SMOTE oversampling algorithm to obtain a SMOTE-balanced data sample set; S3, constructing the ensemble learning classification model MvBoost from the SMOTE-balanced data sample set to obtain the encrypted proxy protocol classification result. The invention can effectively identify encrypted proxy protocols and avoids degraded model training.

Description

Encrypted proxy protocol identification method based on multi-view features and ensemble learning
Technical Field
The invention relates to a method for identifying encrypted proxy protocols, in particular to a method for identifying encrypted proxy protocols based on multi-view features and ensemble learning, and belongs to the technical field of encrypted proxy protocol identification.
Background
The encrypted proxy protocol identification problem can be defined either as a binary classification problem that separates encrypted proxy protocols from non-encrypted-proxy protocols, or as a multi-class classification problem that distinguishes among multiple encrypted proxy protocols. The encrypted proxy protocols mainly studied at present include VPN protocols, the Shadowsocks protocol, the V2Ray protocol and the like. The network traffic of different encrypted proxy protocols exhibits different data characteristics and behavioral characteristics, and current identification methods mainly use machine learning or deep learning to learn network traffic features and thereby classify encrypted proxy protocol traffic.
Machine-learning-based classification of encrypted proxy protocol traffic first requires determining effective network flow features, such as time-related features, packet-related features, statistical features and behavioral features. The extracted features are then learned by algorithms such as random forest, the C4.5 decision tree, GBDT and SVM to identify the encrypted proxy protocol; in most cases the random forest algorithm outperforms the others. However, machine-learning-based methods have a corresponding problem: the network flow features must be selected manually, and it is difficult to judge how to select them and which ones to select.
Deep-learning-based classification of encrypted proxy protocol traffic can automatically learn relevant feature information from network flows, which simplifies the identification process. These methods are mainly built around CNN models, whose effectiveness in encrypted proxy protocol identification tasks has been demonstrated by a large number of experiments.
Existing encrypted proxy protocol identification methods mainly rely on spatio-temporal features and use traditional machine learning models for identification. They have the following shortcomings in feature selection and algorithm model:
1. Spatio-temporal features alone cannot fully characterize the network flows of encrypted proxy protocols. Encrypted proxy protocols differ from ordinary encrypted traffic in several respects, including authentication mode, connection management, encapsulation protocol and traffic obfuscation;
2. In actual encrypted proxy traffic collection, proxy protocols differ in how they manage connections, and only a small number of them proxy traffic through the single-tunnel model of VPN technology. As a result, few network flow samples are available for encrypted proxy protocols that use single-tunnel connection management, the sample distribution across encrypted proxy protocols is imbalanced, and classifier stability suffers;
3. Traditional machine learning algorithms do not account for the bias of different features during classifier training; that is, different feature sets influence the classification result of a sample differently, and simply training on all feature sets together can degrade training.
Therefore, an encrypted proxy protocol identification method is needed that adds multi-view features on top of spatio-temporal features to characterize encrypted proxy protocols more comprehensively, and that addresses the imbalance of network flow samples.
Disclosure of Invention
The following presents a simplified summary of the invention in order to provide a basic understanding of some aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical elements of the invention or to delineate the scope of the invention. Its purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.
In view of this, to solve the problem that prior-art encrypted proxy protocol identification methods cannot fully characterize the network flows of encrypted proxy protocols, the invention provides an encrypted proxy protocol identification method based on multi-view features and ensemble learning.
The technical solution is as follows: a method for identifying encrypted proxy protocols based on multi-view features and ensemble learning, comprising the following steps:
S1, constructing a multi-view feature extraction algorithm, extracting spatio-temporal features, connection management features, traffic encapsulation features, authentication mode features and traffic obfuscation features from encrypted proxy protocol network flows, and outputting the set of 135-dimensional feature vectors extracted from the network flows to a CSV file as a data sample set;
S2, interpolating the data sample set with the SMOTE oversampling algorithm to increase the number of minority-class samples in the data sample set, obtaining a SMOTE-balanced data sample set;
S3, constructing the multi-view-feature-based ensemble learning classification model MvBoost from the SMOTE-balanced data sample set to obtain the final classification result of the encrypted proxy protocol;
Specifically: the ensemble learning classification model MvBoost consists of five weak base classifiers and one strong classifier. The weak base classifiers are built from the extracted multi-view features over cumulative feature subspaces of each encrypted proxy protocol network flow: the first uses the subspace of the spatio-temporal features; the second uses the subspace of the spatio-temporal features and the connection management features; the third uses the subspace of the spatio-temporal features, the connection management features and the traffic encapsulation features; the fourth uses the subspace of the spatio-temporal features, the connection management features, the traffic encapsulation features and the authentication mode features; and the fifth uses the subspace of the spatio-temporal features, the connection management features, the traffic encapsulation features, the authentication mode features and the traffic obfuscation features. The strong classifier serves as the decision maker over the classification results of the five base classifiers: the predicted classification results of the five base classifiers are concatenated, and the decision maker is trained with the concatenated prediction data as input to obtain the final classification result of the encrypted proxy protocol.
Further, in step S1, the spatio-temporal features are extracted with the CICFlowMeter feature extraction tool, and the multi-view feature extraction algorithm is obtained by extending CICFlowMeter;
the extended procedure is as follows:
Input: all packet sequences from a PCAP file or captured in the network environment
Output: a CSV file containing the multi-view features of each network flow
initialize an empty network flow dictionary flows
for each packet P in the packet sequence do
    compute flow_key = hash(P) from the five-tuple of packet P
    if flows[flow_key] = None then
        create a new network flow new_flow = Flow(five-tuple)
        flows[flow_key] = new_flow
    end
    add packet P to the network flow flows[flow_key]
    extract and update the spatio-temporal, authentication mode, connection management, encapsulation protocol and obfuscated traffic features
end
output the extracted multi-view network flow features to the CSV file
return the CSV file containing the multi-view features of the network flows.
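The flow-grouping loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the extended CICFlowMeter tool itself: packets are assumed to be pre-parsed dictionaries, the names `Flow`, `flow_key`, `extract_flows` and `flows_to_csv_text` are hypothetical, both directions of a connection are assumed to share one flow, and only three toy features stand in for the 135 multi-view features.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical per-flow accumulator holding a few toy features."""
    key: tuple
    packet_count: int = 0
    byte_count: int = 0
    first_ts: float = 0.0
    last_ts: float = 0.0

    def add_packet(self, pkt):
        # Update the running statistics each time a packet joins the flow.
        if self.packet_count == 0:
            self.first_ts = pkt["ts"]
        self.last_ts = pkt["ts"]
        self.packet_count += 1
        self.byte_count += pkt["length"]


def flow_key(pkt):
    """Five-tuple key; endpoints are sorted so both directions map to one flow (assumption)."""
    a = (pkt["src_ip"], pkt["src_port"])
    b = (pkt["dst_ip"], pkt["dst_port"])
    return (min(a, b), max(a, b), pkt["proto"])


def extract_flows(packets):
    flows = {}
    for pkt in packets:
        key = flow_key(pkt)
        if key not in flows:          # first packet of a new flow
            flows[key] = Flow(key)
        flows[key].add_packet(pkt)    # add the packet and update features
    return flows


def flows_to_csv_text(flows):
    """Serialize one CSV row per flow and return the CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["endpoint_a", "endpoint_b", "proto", "packets", "bytes", "duration"])
    for flow in flows.values():
        a, b, proto = flow.key
        writer.writerow([f"{a[0]}:{a[1]}", f"{b[0]}:{b[1]}", proto,
                         flow.packet_count, flow.byte_count,
                         flow.last_ts - flow.first_ts])
    return out.getvalue()
```

Using the tuple itself as the dictionary key replaces the explicit hash step, since Python dictionaries hash their keys internally.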
Further, the traffic encapsulation features are selected from the transport layer protocol and from packets of the TLS Application Data type. For TLS Application Data packets, each field of the header information is selected as a traffic encapsulation feature. The transport layer protocols include TCP and UDP. For TCP, statistics of each TCP flag bit are selected as traffic encapsulation features for distinguishing different proxy protocols, and the window size of the bidirectional TCP flow is selected as an additional feature. For UDP, the result of determining whether the encrypted proxy protocol uses UDP at the transport layer is selected as a traffic encapsulation feature.
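As an illustration of reading the header fields of TLS Application Data packets, the sketch below parses the 5-byte TLS record header (content type, protocol version, length) and keeps only records whose content type is 23, the value assigned to Application Data. The function names are hypothetical, and the sketch assumes each payload starts at a record boundary; which header fields the patent actually turns into features is not specified beyond "each field".

```python
import struct

TLS_APPLICATION_DATA = 23  # TLS record content type for Application Data


def parse_tls_record_header(payload):
    """Return (content_type, (major, minor), length), or None if the payload is too short."""
    if len(payload) < 5:
        return None
    content_type, major, minor, length = struct.unpack("!BBBH", payload[:5])
    return content_type, (major, minor), length


def application_data_header_fields(payloads):
    """Collect header fields of Application Data records only (hypothetical helper)."""
    fields = []
    for payload in payloads:
        header = parse_tls_record_header(payload)
        if header is not None and header[0] == TLS_APPLICATION_DATA:
            fields.append({"version": header[1], "length": header[2]})
    return fields
```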
Further, among the authentication mode features, features of the TLS handshake packets reflect whether the encrypted proxy protocol performs key negotiation and authentication through the TLS handshake. When the encrypted proxy protocol authenticates through the mechanism provided by the TLS handshake, features are designed from the detailed information in the TLS handshake packets, including the SID, cipher suite information and certificate-related information, to obtain the network flow features of key negotiation and authentication.
Further, among the connection management features, for multi-connection tunnel proxy protocols, the number of proxy tunnels is counted by observing changes in the four-tuple formed by the client IP, the proxy IP, the proxy port and the client port.
Further, among the traffic obfuscation features, for HTTP-obfuscated traffic, three types of statistical features are designed: the number of HTTP-obfuscated packets, the payload size of HTTP-obfuscated traffic, and the HTTP-obfuscated request headers. For TLS-obfuscated traffic, statistical features are designed for the proportion of TLS-obfuscated packets and the header length of TLS-obfuscated traffic. The proportion of obfuscated packets among all packets is computed as an additional obfuscation feature.
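A few of these obfuscation statistics can be sketched as follows. The sketch assumes each packet has already been tagged as HTTP-obfuscated, TLS-obfuscated or neither (the `obfs` field is hypothetical; how packets are tagged is outside this example), and it computes only the count, mean payload size and proportion features, not the request-header or header-length statistics.

```python
def obfuscation_features(packets):
    """Compute toy obfuscation statistics for one flow (hypothetical feature names)."""
    total = len(packets)
    http = [p for p in packets if p["obfs"] == "http"]
    tls = [p for p in packets if p["obfs"] == "tls"]

    def ratio(count):
        # Guard against empty flows to avoid division by zero.
        return count / total if total else 0.0

    return {
        "http_obfs_packets": len(http),
        "http_obfs_mean_payload": (sum(p["payload_len"] for p in http) / len(http)) if http else 0.0,
        "tls_obfs_ratio": ratio(len(tls)),
        "obfs_ratio": ratio(len(http) + len(tls)),
    }
```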
Further, in S2, for each minority-class sample of the data sample set, the SMOTE oversampling algorithm randomly selects neighbors from among the K nearest neighboring samples of that sample, interpolates based on the distance between the sample and the selected neighbors to generate new synthetic samples, and stores the new synthetic samples in the data sample set, yielding the SMOTE-balanced data sample set.
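The interpolation step can be illustrated with a small sketch of the standard SMOTE formulation, in which a synthetic sample is placed at a random position on the line segment between a minority sample and one of its K nearest minority neighbors. The function name `smote` and its parameters are assumptions for this example, and details such as how many synthetic samples to generate per class are not taken from the patent.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority-class feature vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # K nearest minority neighbors of x by Euclidean distance (excluding x itself).
        neighbours = sorted((s for s in minority if s is not x),
                            key=lambda s: math.dist(x, s))[:k]
        nn = rng.choice(neighbours)
        lam = rng.random()  # interpolation coefficient in [0, 1)
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

The sketch requires at least two minority samples; in practice a maintained implementation would normally be preferred over hand-written interpolation.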
The beneficial effects of the invention are as follows. By adding multi-view features, the invention fully characterizes the network flows of encrypted proxy protocols. The SMOTE oversampling algorithm balances the distribution of the data sample set of encrypted proxy protocol network flows, which facilitates processing by the subsequent classifier. In the ensemble learning classification model MvBoost, a decision maker adjudicates among the different classification results produced by different feature sets to obtain the final classification result of the encrypted proxy protocol; because MvBoost is built on the SMOTE-balanced data sample set, the bias of different features during classifier training is balanced and degraded model training is avoided. The invention can effectively identify encrypted proxy protocols with an identification accuracy of up to 99%, and the proposed multi-view features contribute substantially to the model's predictions and play a key role in the identification process.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a flow chart of the encrypted proxy protocol identification method based on multi-view features and ensemble learning;
FIG. 2 is a flow chart of an embodiment of the encrypted proxy protocol identification method based on multi-view features and ensemble learning.
Detailed Description
In order to make the technical solutions and advantages of the embodiments of the present invention more apparent, the following detailed description of exemplary embodiments of the present invention is provided in conjunction with the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention and not exhaustive of all embodiments. It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
With reference to FIGS. 1-2, a method for identifying encrypted proxy protocols based on multi-view features and ensemble learning includes the following steps:
S1, inputting network traffic, dividing it into network flows, constructing a multi-view feature extraction algorithm, extracting spatio-temporal features, connection management features, traffic encapsulation features, authentication mode features and traffic obfuscation features from the encrypted proxy protocol network flows, and outputting the set of 135-dimensional multi-view feature vectors extracted from the network flows to a CSV file as a data sample set;
S2, interpolating the data sample set with the SMOTE oversampling algorithm to increase the number of minority-class samples in the data sample set, obtaining a SMOTE-balanced data sample set;
S3, constructing the multi-view-feature-based ensemble learning classification model MvBoost from the SMOTE-balanced data sample set to obtain the final classification result of the encrypted proxy protocol;
Specifically: the ensemble learning classification model MvBoost consists of five weak base classifiers and one strong classifier. The weak base classifiers are built from the extracted multi-view features over cumulative feature subspaces of each encrypted proxy protocol network flow: the first uses the subspace of the spatio-temporal features; the second uses the subspace of the spatio-temporal features and the connection management features; the third uses the subspace of the spatio-temporal features, the connection management features and the traffic encapsulation features; the fourth uses the subspace of the spatio-temporal features, the connection management features, the traffic encapsulation features and the authentication mode features; and the fifth uses the subspace of the spatio-temporal features, the connection management features, the traffic encapsulation features, the authentication mode features and the traffic obfuscation features. The strong classifier serves as the decision maker over the classification results of the five base classifiers: the predicted classification results of the five base classifiers are concatenated, and the decision maker is trained with the concatenated prediction data as input to obtain the final classification result of the encrypted proxy protocol;
Specifically, in this embodiment, the base classifiers of the ensemble learning classification model MvBoost are constructed on the following principle: each weak base classifier is defined over a subset of the 135-dimensional multi-view features, that is, each base classifier takes a P-dimensional subspace of the 135-dimensional multi-view feature vector as its input feature vector. When the classification results of the five base classifiers are inconsistent, the decision maker of MvBoost performs further learning, namely further model training, and makes the determination again, finally yielding the traffic classification result for the encrypted proxy protocol.
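The structure described above (five base classifiers over growing feature subspaces, whose concatenated predictions feed a decision maker) can be sketched in plain Python. Everything concrete here is an assumption for illustration: the patent names neither the base learner nor the decision maker, so a nearest-centroid classifier and a vote table stand in for them, and the per-view column counts in `VIEW_SIZES` are a toy layout rather than the real split of the 135 dimensions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical columns per view: spatio-temporal, connection management,
# traffic encapsulation, authentication mode, traffic obfuscation.
VIEW_SIZES = [2, 1, 1, 1, 1]
# Base classifier i sees the first PREFIX_DIMS[i] columns (cumulative subspaces).
PREFIX_DIMS = [sum(VIEW_SIZES[:i + 1]) for i in range(len(VIEW_SIZES))]


class NearestCentroid:
    """Stand-in weak base classifier (not specified by the patent)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                          for label, rows in groups.items()}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda lab: math.dist(row, self.centroids[lab]))
                for row in X]


class VoteTableDecider:
    """Stand-in decision maker trained on concatenated base predictions."""

    def fit(self, Z, y):
        table = defaultdict(Counter)
        for z, label in zip(Z, y):
            table[tuple(z)][label] += 1
        self.table = {z: c.most_common(1)[0][0] for z, c in table.items()}
        return self

    def predict(self, Z):
        # Prediction patterns never seen in training fall back to a majority vote.
        return [self.table.get(tuple(z), Counter(z).most_common(1)[0][0]) for z in Z]


class MvBoostSketch:
    def fit(self, X, y):
        self.bases = [NearestCentroid().fit([r[:d] for r in X], y) for d in PREFIX_DIMS]
        self.decider = VoteTableDecider().fit(self._stack(X), y)
        return self

    def _stack(self, X):
        preds = [b.predict([r[:d] for r in X]) for b, d in zip(self.bases, PREFIX_DIMS)]
        return [list(col) for col in zip(*preds)]  # one row of five predictions per flow

    def predict(self, X):
        return self.decider.predict(self._stack(X))
```

For brevity the decision maker is trained on the base classifiers' predictions for the same training data; stacking implementations commonly use out-of-fold predictions instead to limit overfitting.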
Further, in step S1, the spatio-temporal features are extracted with the CICFlowMeter feature extraction tool, and the multi-view feature extraction algorithm is obtained by extending CICFlowMeter;
the extended procedure is as follows:
Input: all packet sequences from a PCAP file or captured in the network environment
Output: a CSV file containing the multi-view features of each network flow
initialize an empty network flow dictionary flows
for each packet P in the packet sequence do
    compute flow_key = hash(P) from the five-tuple of packet P
    if flows[flow_key] = None then
        create a new network flow new_flow = Flow(five-tuple)
        flows[flow_key] = new_flow
    end
    add packet P to the network flow flows[flow_key]
    extract and update the spatio-temporal, authentication mode, connection management, encapsulation protocol and obfuscated traffic features
end
output the extracted multi-view network flow features to the CSV file
return the CSV file containing the multi-view features of the network flows.
Further, the traffic encapsulation features are selected from the transport layer protocol and from packets of the TLS Application Data type. For TLS Application Data packets, each field of the header information is selected as a traffic encapsulation feature. The transport layer protocols include TCP and UDP;
for TCP, statistics of each TCP flag bit are selected as traffic encapsulation features for distinguishing different proxy protocols, and the window size of the bidirectional TCP flow is selected as an additional feature;
for UDP, the result of determining whether the encrypted proxy protocol uses UDP at the transport layer is selected as a traffic encapsulation feature;
specifically, with TCP the proxy client generally sets the relevant flag bits so that the server can respond to proxy requests quickly, improving the transmission efficiency and timeliness of the request data. The UDP header, by contrast, is too simple for its fields to serve as traffic encapsulation features for distinguishing proxy protocols.
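A minimal sketch of these transport-layer features is given below. It counts how often each of the eight standard TCP flag bits is set, averages the window size per direction, and records whether any UDP packet was seen. The packet dictionary layout and the feature names are hypothetical, and the patent does not state which statistics (counts, ratios, means) it actually uses.

```python
# Standard TCP header flag bit masks.
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08,
             "ACK": 0x10, "URG": 0x20, "ECE": 0x40, "CWR": 0x80}


def transport_encapsulation_features(packets):
    """Per-flow TCP flag counts, mean window sizes and a UDP usage indicator."""
    feats = {f"{name.lower()}_count": 0 for name in TCP_FLAGS}
    fwd_windows, bwd_windows = [], []
    uses_udp = False
    for pkt in packets:
        if pkt["proto"] == "UDP":
            uses_udp = True  # UDP contributes only the yes/no usage feature
            continue
        for name, bit in TCP_FLAGS.items():
            if pkt["flags"] & bit:
                feats[f"{name.lower()}_count"] += 1
        target = fwd_windows if pkt["direction"] == "fwd" else bwd_windows
        target.append(pkt["window"])
    feats["fwd_window_mean"] = sum(fwd_windows) / len(fwd_windows) if fwd_windows else 0.0
    feats["bwd_window_mean"] = sum(bwd_windows) / len(bwd_windows) if bwd_windows else 0.0
    feats["uses_udp"] = int(uses_udp)
    return feats
```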
Further, among the authentication mode features, features of the TLS handshake packets reflect whether the encrypted proxy protocol performs key negotiation and authentication through the TLS handshake. When the encrypted proxy protocol authenticates through the mechanism provided by the TLS handshake, features are designed from the detailed information in the TLS handshake packets, including the SID, cipher suite information and certificate-related information, to obtain the network flow features of key negotiation and authentication.
Further, among the connection management features, for multi-connection tunnel proxy protocols, the number of proxy tunnels is counted by observing changes in the four-tuple formed by the client IP, the proxy IP, the proxy port and the client port;
specifically, in multi-connection tunnel mode, the server IP address and port of the connections established between the proxy client and the proxy server remain essentially unchanged, so the number of proxy tunnels can be counted by observing changes in the four-tuple formed by the client IP, the proxy IP, the proxy server port and the client port. If the client IP address, the proxy IP address and the proxy port are the same but multiple client ports exist, a multi-connection tunnel proxy protocol is in use. If, under a multi-connection proxy protocol, the tunnels need to be distinguished by counting their uplink and downlink traffic, the tunnels are separated into uplink and downlink.
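The tunnel-counting rule can be sketched as follows, assuming connections are available as (client IP, proxy IP, proxy port, client port) tuples. The function names are hypothetical, and the sketch treats every distinct client port toward the same proxy endpoint as one tunnel.

```python
from collections import defaultdict


def count_proxy_tunnels(connections):
    """Map (client_ip, proxy_ip, proxy_port) to the number of distinct client ports seen."""
    client_ports = defaultdict(set)
    for client_ip, proxy_ip, proxy_port, client_port in connections:
        client_ports[(client_ip, proxy_ip, proxy_port)].add(client_port)
    return {endpoint: len(ports) for endpoint, ports in client_ports.items()}


def is_multi_connection(connections):
    """True if any client/proxy endpoint pair uses more than one client port."""
    return any(n > 1 for n in count_proxy_tunnels(connections).values())
```

For example, three connections from one client to the same proxy IP and port with client ports 50001, 50002 and 50003 would be counted as three tunnels.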
Further, among the traffic obfuscation features, for HTTP-obfuscated traffic, three types of statistical features are designed: the number of HTTP-obfuscated packets, the payload size of HTTP-obfuscated traffic, and the HTTP-obfuscated request headers. For TLS-obfuscated traffic, statistical features are designed for the proportion of TLS-obfuscated packets and the header length of TLS-obfuscated traffic. The proportion of obfuscated packets among all packets is computed as an additional obfuscation feature.
Further, in S2, for each minority-class sample of the data sample set, the SMOTE oversampling algorithm randomly selects neighbors from among the K nearest neighboring samples of that sample, interpolates based on the distance between the sample and the selected neighbors to generate new synthetic samples, and stores the new synthetic samples in the data sample set, yielding the SMOTE-balanced data sample set.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of the above description, will appreciate that other embodiments are contemplated within the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is defined by the appended claims.

Claims (3)

1. A method for identifying an encryption proxy protocol based on multi-view features and ensemble learning, comprising the steps of:
S1, constructing a multi-view feature extraction algorithm that extracts spatio-temporal features, connection management features, traffic encapsulation features, authentication mode features and traffic obfuscation features from encrypted proxy protocol network flows, and outputting the 135-dimensional feature vectors extracted from the network flows, one vector per flow, to a CSV file as the data sample set;
specifically:
for the connection management features, in the case of a multi-connection tunnel proxy protocol, counting the number of proxy tunnels by observing changes in the quadruple formed by the client IP, the proxy port and the client port;
for the traffic encapsulation features, selecting features from the transport layer protocol and from TLS Application Data packets; for TLS Application Data packets, selecting each field of the packet header as a traffic encapsulation feature; the transport layer protocol comprises TCP and UDP; for TCP, selecting statistics of each TCP flag bit as traffic encapsulation features for distinguishing different proxy protocols, and selecting the window size of the bidirectional TCP flow as an additional feature; for UDP, selecting the result of determining whether the encrypted proxy protocol uses UDP at the transport layer as a traffic encapsulation feature;
for the authentication mode features, using features of the TLS handshake packets to reflect whether the encrypted proxy protocol performs key negotiation and authentication through the TLS handshake; when the encrypted proxy protocol authenticates through the mechanism provided by the TLS handshake, designing features from the detailed information in the TLS handshake packets, including the SID, cipher suite information and certificate-related information, to capture the network traffic characteristics of key negotiation and authentication;
for the traffic obfuscation features, in the case of HTTP-obfuscated traffic, designing three statistical features: the number of HTTP-obfuscated packets, the HTTP-obfuscated payload size and the HTTP-obfuscated request header; in the case of TLS-obfuscated traffic, designing statistical features for the proportion of TLS-obfuscated packets and the TLS-obfuscated header length; and counting the proportion of obfuscated packets among all packets as an additional obfuscation feature;
S2, interpolating the data sample set with the SMOTE oversampling algorithm to increase the number of minority-class samples, obtaining a SMOTE-balanced data sample set;
S3, constructing a multi-view-feature-based ensemble learning classification model, MvBoost, from the SMOTE-balanced data sample set to obtain the final classification and identification result for the encrypted proxy protocol;
specifically: the ensemble learning classification model MvBoost consists of five weak-learning base classifiers and one strong-learning classifier, the base classifiers being built on the extracted multi-view features of each encrypted proxy protocol network flow:
the first weak-learning base classifier is built on the subspace of the spatio-temporal features;
the second weak-learning base classifier is built on the subspace of the spatio-temporal features and the connection management features;
the third weak-learning base classifier is built on the subspace of the spatio-temporal features, the connection management features and the traffic encapsulation features;
the fourth weak-learning base classifier is built on the subspace of the spatio-temporal features, the connection management features, the traffic encapsulation features and the authentication mode features;
the fifth weak-learning base classifier is built on the subspace of the spatio-temporal features, the connection management features, the traffic encapsulation features, the authentication mode features and the traffic obfuscation features;
the strong-learning classifier serves as the decision maker over the classification and identification results of the five base classifiers: the predicted classification and identification results of the five base classifiers are concatenated, and the decision maker is trained with the concatenated prediction data as input to obtain the final classification and identification result for the encrypted proxy protocol.
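As an illustration of the stacking structure recited in step S3, the following is a minimal sketch, not the patented implementation. The claim does not name the learning algorithms, so a nearest-centroid learner is used here as a stand-in for both the base classifiers and the decision maker. The per-view feature counts in `VIEW_SIZES`, the class labels, and the use of per-class distance scores as each base classifier's prediction output are all assumptions made for the example.

```python
from itertools import accumulate

# Hypothetical per-view feature counts; the claim fixes only the 135-dimension total.
VIEW_SIZES = {
    "spatio_temporal": 2,
    "connection_management": 1,
    "traffic_encapsulation": 1,
    "authentication_mode": 1,
    "traffic_obfuscation": 1,
}


class NearestCentroid:
    """Stand-in learner (an assumption): picks the class whose mean vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def scores(self, row):
        # Squared distance to every class centroid, in sorted class order.
        return [sum((a - b) ** 2 for a, b in zip(row, self.centroids[c]))
                for c in sorted(self.centroids)]

    def predict(self, X):
        classes = sorted(self.centroids)
        result = []
        for row in X:
            s = self.scores(row)
            result.append(classes[s.index(min(s))])
        return result


class MvBoostSketch:
    def __init__(self, view_sizes):
        # Cumulative column cut-offs: view 1, views 1-2, ..., views 1-5.
        self.cutoffs = list(accumulate(view_sizes.values()))

    def _stack(self, X):
        # Concatenate the outputs of all five base classifiers for each sample.
        return [[s for base, cut in zip(self.bases, self.cutoffs)
                 for s in base.scores(row[:cut])]
                for row in X]

    def fit(self, X, y):
        # One base classifier per cumulative feature subspace.
        self.bases = [NearestCentroid().fit([row[:cut] for row in X], y)
                      for cut in self.cutoffs]
        # The decision maker is trained on the concatenated base outputs.
        self.decider = NearestCentroid().fit(self._stack(X), y)
        return self

    def predict(self, X):
        return self.decider.predict(self._stack(X))


# Toy samples with hypothetical labels.
X = [
    [1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    [1.2, 0.9, 0.0, 0.1, 1.0, 0.0],
    [5.0, 5.0, 1.0, 1.0, 0.0, 1.0],
    [5.1, 4.8, 1.0, 0.9, 0.0, 1.0],
]
y = ["proxy_a", "proxy_a", "proxy_b", "proxy_b"]
model = MvBoostSketch(VIEW_SIZES).fit(X, y)
print(model.predict([[1.1, 1.0, 0.0, 0.0, 1.0, 0.0],
                     [5.0, 4.9, 1.0, 1.0, 0.0, 1.0]]))
```

The sketch trains the decision maker on base outputs computed from the same training samples; whether the method uses held-out predictions for this stage is not stated in the claim.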
2. The encrypted proxy protocol identification method based on multi-view features and ensemble learning according to claim 1, wherein in step S1 the spatio-temporal features are extracted with the CICFlowMeter feature extraction tool, and the multi-view feature extraction algorithm is obtained by extending the CICFlowMeter feature extraction tool;
the extension proceeds as follows:
Input: all packet sequences from a PCAP file or captured in the network environment
Output: a CSV file containing the multi-view features of each network flow
initialize the network flow dictionary flows = dict()
for each packet P in the packet sequence do
    compute flow_key = hash(P) from the five-tuple of packet P
    if flows[flow_key] = None then
        create a new network flow new_flow = Flow() from the five-tuple
        flows[flow_key] = new_flow
    end
    add packet P to the network flow flows[flow_key]
    extract and update the spatio-temporal, authentication mode, connection management, encapsulation protocol and obfuscated traffic features
end
output the extracted multi-view network flow features to the CSV file
return the CSV file containing the multi-view features of the network flows.
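The extension procedure in claim 2 can be sketched in Python as follows. This is an illustrative sketch only, and it does not use CICFlowMeter. The `Packet` record, the handful of features tracked in `Flow`, and the CSV column names are hypothetical. Sorting the two endpoints so that both directions of a connection share one flow key is also an assumption, since the claim only says the key is computed from the five-tuple.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical minimal packet record; a real tool would parse these from a PCAP.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str          # "TCP" or "UDP"
    timestamp: float
    length: int
    tcp_flags: str = ""  # e.g. "S", "SA", "A"


@dataclass
class Flow:
    packets: int = 0
    total_bytes: int = 0
    first_seen: float = 0.0
    last_seen: float = 0.0
    syn_count: int = 0
    uses_udp: int = 0

    def add(self, p):
        # Update a few example features each time a packet joins the flow.
        if self.packets == 0:
            self.first_seen = p.timestamp
        self.packets += 1
        self.total_bytes += p.length
        self.last_seen = p.timestamp
        self.syn_count += int("S" in p.tcp_flags)
        self.uses_udp = int(p.proto == "UDP")


def flow_key(p):
    # Sort the endpoints so both directions map to the same flow (an assumption).
    a, b = sorted([(p.src_ip, p.src_port), (p.dst_ip, p.dst_port)])
    return (a, b, p.proto)


def extract_flows(packets):
    flows = {}
    for p in packets:
        key = flow_key(p)
        if key not in flows:
            flows[key] = Flow()
        flows[key].add(p)
    return flows


def write_csv(flows, out):
    writer = csv.writer(out)
    writer.writerow(["endpoint_a", "endpoint_b", "proto", "packets",
                     "bytes", "duration", "syn_count", "uses_udp"])
    for (a, b, proto), f in flows.items():
        writer.writerow([f"{a[0]}:{a[1]}", f"{b[0]}:{b[1]}", proto, f.packets,
                         f.total_bytes, round(f.last_seen - f.first_seen, 6),
                         f.syn_count, f.uses_udp])


# Synthetic packets using documentation-range addresses.
packets = [
    Packet("10.0.0.2", 50000, "203.0.113.5", 443, "TCP", 0.00, 60, "S"),
    Packet("203.0.113.5", 443, "10.0.0.2", 50000, "TCP", 0.05, 60, "SA"),
    Packet("10.0.0.2", 50000, "203.0.113.5", 443, "TCP", 0.10, 52, "A"),
    Packet("10.0.0.2", 50001, "203.0.113.5", 53, "UDP", 0.20, 80),
]
flows = extract_flows(packets)
buffer = io.StringIO()
write_csv(flows, buffer)
print(buffer.getvalue())
```

A tuple is used directly as the dictionary key in place of the `hash(P)` call in the claim; Python hashes the tuple internally, so the lookup behaves the same way.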
3. The method according to claim 2, wherein in step S2, in the SMOTE oversampling algorithm, for each minority-class sample in the data sample set, K neighboring samples around that sample are randomly selected, and interpolation is performed based on the distances between the K neighboring samples to generate new synthetic samples, which are stored in the data sample set and integrated into the SMOTE-balanced data sample set.
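The interpolation step of SMOTE can be illustrated with a short standard-library sketch. It follows the commonly described form of the algorithm, in which each minority-class sample is interpolated toward a randomly chosen one of its K nearest minority-class neighbours. The function name `smote`, its parameters and the toy data are hypothetical, and the sketch is not claimed to match the patented procedure in every detail.

```python
import random


def smote(minority, per_sample=1, k=5, seed=0):
    """Return synthetic samples interpolated between minority samples and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        others = minority[:i] + minority[i + 1:]
        # K nearest minority-class neighbours of x by squared Euclidean distance.
        neighbours = sorted(
            others,
            key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)),
        )[:k]
        for _ in range(per_sample):
            nb = rng.choice(neighbours)   # random neighbour among the K nearest
            gap = rng.random()            # interpolation factor in [0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy minority class with two features per sample.
minority = [[1.0, 1.0], [1.5, 1.2], [2.0, 2.0]]
new_samples = smote(minority, per_sample=2, k=2)
balanced_minority = minority + new_samples
print(len(balanced_minority))
```

Each synthetic point lies on the line segment between an existing minority sample and one of its neighbours, so the new samples stay inside the region already occupied by the minority class.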
CN202310879928.0A 2023-07-18 2023-07-18 Encryption agent protocol identification method based on multi-view features and ensemble learning Active CN116668186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310879928.0A CN116668186B (en) 2023-07-18 2023-07-18 Encryption agent protocol identification method based on multi-view features and ensemble learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310879928.0A CN116668186B (en) 2023-07-18 2023-07-18 Encryption agent protocol identification method based on multi-view features and ensemble learning

Publications (2)

Publication Number Publication Date
CN116668186A (en) 2023-08-29
CN116668186B (en) 2024-02-02

Family

ID=87722612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310879928.0A Active CN116668186B (en) 2023-07-18 2023-07-18 Encryption agent protocol identification method based on multi-view features and ensemble learning

Country Status (1)

Country Link
CN (1) CN116668186B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104270392A (en) * 2014-10-24 2015-01-07 中国科学院信息工程研究所 Method and system for network protocol recognition based on tri-classifier cooperative training learning
CN111626336A (en) * 2020-04-29 2020-09-04 南京理工大学 Subway fault data classification method based on unbalanced data set
CN111817982A (en) * 2020-07-27 2020-10-23 南京信息工程大学 Encrypted flow identification method for category imbalance
CN112671757A (en) * 2020-12-22 2021-04-16 无锡江南计算技术研究所 Encrypted flow protocol identification method and device based on automatic machine learning
CN116232642A (en) * 2022-12-12 2023-06-06 国家电网有限公司客户服务中心 Automatic encryption flow analysis method crossing multiple protocols and protocol combinations

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8311956B2 (en) * 2009-08-11 2012-11-13 At&T Intellectual Property I, L.P. Scalable traffic classifier and classifier training system
US11455569B2 (en) * 2019-01-09 2022-09-27 International Business Machines Corporation Device discovery and classification from encrypted network traffic

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
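Detection and analysis of malicious network behavior based on imbalance algorithms; Chen Gong; Information Technology and Informatization (No. 08); pp. 121-125 *
Survey and prospects of machine-learning-based traffic identification techniques; Zhao Shuang et al.; Computer Engineering & Science (No. 10); pp. 34-44 *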

Also Published As

Publication number Publication date
CN116668186A (en) 2023-08-29

Similar Documents

Publication Publication Date Title
Shen et al. Classification of encrypted traffic with second-order markov chains and application attribute bigrams
Rezaei et al. Large-scale mobile app identification using deep learning
Shen et al. Certificate-aware encrypted traffic classification using second-order markov chain
Saber et al. Encrypted traffic classification: Combining over-and under-sampling through a pca-svm
KR20190121666A (en) Method and apparatus for analyzing traffic based on flow in cloud system
CN110868409A (en) Passive operating system identification method and system based on TCP/IP protocol stack fingerprint
Korczyński et al. Classifying service flows in the encrypted skype traffic
CN109525508A (en) Encryption stream recognition method, device and the storage medium compared based on flow similitude
CN111147394A (en) Multi-stage classification detection method for remote desktop protocol traffic behavior
CN114257428B (en) Encryption network traffic identification and classification method based on deep learning
Himura et al. Synoptic graphlet: Bridging the gap between supervised and unsupervised profiling of host-level network traffic
CN116668186B (en) Encryption agent protocol identification method based on multi-view features and ensemble learning
Lee et al. A machine learning approach to predicting block cipher security
Gomez et al. Efficient network telemetry based on traffic awareness
Pereira et al. ITCM: A real time internet traffic classifier monitor
Luo et al. Behavior-based method for real-time identification of encrypted proxy traffic
Hejun et al. Online and automatic identification and mining of encryption network behavior in big data environment
Fan et al. Identify OS from encrypted traffic with TCP/IP stack fingerprinting
Lu et al. Comparison and analysis of flow features at the packet level for traffic classification
CN114338070B (en) Shadowsocks (R) identification method based on protocol attribute
Bacquet et al. An investigation of multi-objective genetic algorithms for encrypted traffic identification
Zhang et al. Skype traffic identification based SVM using optimized feature set
Luckner Conversion of decision tree into deterministic finite automaton for high accuracy online syn flood detection
Carela-Espanol et al. Traffic classification with sampled netflow
Nieminen et al. A framework for classifying IPFIX flow data, case KNN classifier

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant