CN113141364B

CN113141364B - Encrypted traffic classification method, system, equipment and readable storage medium

Info

Publication number: CN113141364B
Application number: CN202110438554.XA
Authority: CN
Inventors: 马小博; 安冰玉; 瞿建; 潘鹏宇; 李森; 王鑫; 卞华峰
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2021-04-22
Filing date: 2021-04-22
Publication date: 2022-07-12
Anticipated expiration: 2041-04-22
Also published as: CN113141364A

Abstract

The invention discloses a method, system, device and readable storage medium for classifying encrypted traffic. For each flow, the indices of the leaf nodes reached in the decision trees of a flow-based classification model are collected. These indices form a k-dimensional vector that also carries the flow's dual label. These k-dimensional vectors are used to train a K-nearest-neighbour classifier, from which L2I values are calculated for each original encrypted traffic class and for each flow. To classify a given encrypted traffic sample, the meta-feature vectors of all of its extracted flows are input into the flow-based classification model to predict their encrypted traffic type labels. The L2I values corresponding to the predicted labels are summed, and each sum is compared with the recorded L2I value of the original encrypted traffic class. Encrypted traffic is thus classified through a flow dual-label mechanism. The method can classify complete website visits and avoids the problem of traffic from different visits crossing during access. It is suitable for classifying the encrypted traffic of web-oriented and stream-oriented network behaviors.

Description

Encrypted traffic classification method, system, equipment and readable storage medium

Technical Field

The invention belongs to the field of network security and user privacy, and particularly relates to an encrypted traffic classification method, system, equipment and readable storage medium.

Background

In recent years the Internet has developed rapidly, and networks have become tightly integrated into production and daily life, so network security is a problem that cannot be ignored. Public awareness of network security has gradually increased, and more users and enterprises now pay attention to protecting information and transmitting it securely. Network behavior identification based on encrypted traffic can support security supervision of networks, in particular the supervision of illegal services and harmful information. Encrypted traffic analysis infers a user's current Internet access behavior from characteristics of the traffic itself rather than from the content of the data packets. Its most important current application is website fingerprinting. Website fingerprinting extracts features of network traffic and combines them with a supervised classification model to classify user behavior, and can accurately determine which website the current user is visiting. The key problem for website fingerprinting is how to classify websites accurately and apply the technique in a real network environment.

Most current encrypted traffic analysis techniques remain at the academic research stage, and their application in real network environments has not been studied. The reason is that existing website fingerprinting techniques still use, as the basic recognition unit for training a classification model, the complete traffic generated by visiting one website. That complete traffic cannot be delimited in a real network environment. A real network may contain NAT or similar setups in which several websites are accessed simultaneously, so that their traffic becomes intermixed (traffic crossing). Once traffic crossing occurs, it is no longer possible to determine accurately which traffic belongs to the visit to a particular website.

In summary, encrypted traffic analysis, both in China and abroad, uses as its basic recognition unit the complete traffic generated by visiting one website. This traffic must be generated and collected in a clean network environment. The visiting time of each website must also be strictly controlled so that the traffic is not cross-contaminated. Such a methodology suits research and study. In a real network, however, traffic is intermixed, so the traffic of a single complete website visit cannot be isolated. As a result, the methodology cannot yet be applied to real networks, and the application of encrypted traffic analysis in real network environments has so far not been studied.

Disclosure of Invention

The object of the invention is to provide a method, system, device and readable storage medium for classifying encrypted traffic that overcome the above shortcomings of the prior art.

To achieve this object, the invention adopts the following technical solution:

An encrypted traffic classification method comprises the following steps:

S1, generating a flow-based training set of encrypted traffic samples;

S2, training a random forest classification model on the flow-based training set to obtain a flow-based classification model for encrypted traffic identification;

S3, forming, from the indices of the leaf nodes reached by the decision trees of the flow-based classification model, k-dimensional vectors that also carry the dual label of the corresponding flow; training a K-nearest-neighbour classifier with these k-dimensional vectors as input; and calculating the L2I values of the original encrypted traffic classes and of the flows;

S4, splitting the encrypted traffic sample to be classified into flows of packets that share the same port, according to the port information contained in the data packets, and extracting a meta-feature vector for each flow; inputting the meta-feature vectors of all the flows into the flow-based classification model to predict their encrypted traffic type labels; grouping the predicted labels that share the same first-dimension label and calculating the sum of the L2I values corresponding to the labels in each group; and comparing each sum with the recorded L2I value of the corresponding original encrypted traffic class. If the ratio of the two is greater than a threshold preset by the user, the encrypted traffic label with the largest ratio is output as the classification result. If the ratio is smaller than the threshold, no classification result is output. This completes the encrypted traffic classification.

Further, a set of user encrypted traffic samples is collected. Each encrypted traffic sample in the set is a raw traffic file containing data packets and has a unique encrypted traffic type label. Each complete encrypted traffic sample is split into several flow samples according to the port information contained in the data packets. Each flow is then labeled using the file in the sample set that contains the flow log information. The flows in each encrypted traffic sample are vectorized using the meta-feature vector defined for the encrypted traffic sample set. After all flows in the set have been vectorized, the encrypted traffic type label and the flow label of each flow are retained, giving the flow-based training set of encrypted traffic samples.
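The port-based split and flow labeling of step S1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation:

- packets are assumed to be already parsed into a hypothetical `Packet` record;
- a flow is assumed to be keyed by the client-side port;
- the flow log is modelled as a hypothetical dictionary `port_log` that maps that port to the flow label (for example, the address the connection goes to).

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical parsed packet; a real pipeline would read these from a pcap file."""
    timestamp: float
    src_port: int
    dst_port: int
    size: int        # packet length in bytes
    outgoing: bool   # True if sent by the monitored client


def split_into_flows(packets: List[Packet]) -> Dict[int, List[Packet]]:
    """Group packets into flows keyed by the client-side port (assumption)."""
    flows: Dict[int, List[Packet]] = defaultdict(list)
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        client_port = pkt.src_port if pkt.outgoing else pkt.dst_port
        flows[client_port].append(pkt)
    return dict(flows)


def label_flows(
    flows: Dict[int, List[Packet]],
    sample_label: str,
    port_log: Dict[int, str],
) -> List[Tuple[Tuple[str, str], List[Packet]]]:
    """Attach the dual label (sample label, flow label); flows missing from the log are skipped."""
    labeled = []
    for port, pkts in flows.items():
        if port in port_log:
            labeled.append(((sample_label, port_log[port]), pkts))
    return labeled
```

Skipping flows that have no log entry is a design assumption. An alternative would be to give them a catch-all flow label.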

Further, d-dimensional sequence features are extracted from each encrypted traffic sample in the encrypted traffic sample set and recorded as $[f_1, f_2, \dots, f_d]$. Let there be p classes of encrypted traffic samples in total, and let the encrypted traffic type label of the i-th class be $\text{label}_i$. After an encrypted traffic sample is split into flows by port, each flow is labeled $\text{label}_{i\text{-}j}$ according to the log file. The range of j depends on the number of flows in the encrypted traffic samples of each class. The training set of encrypted traffic samples is denoted T:

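$$T = \{(\text{label}_1, \text{label}_{1\text{-}1}) : [f_1, f_2, \dots, f_d],\ (\text{label}_1, \text{label}_{1\text{-}2}) : [f_1, f_2, \dots, f_d],\ \dots,\ (\text{label}_p, \text{label}_{p\text{-}j}) : [f_1, f_2, \dots, f_d]\}$$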

where $\text{label}_p$ is the first-dimension label, a label at the encrypted-traffic-sample level that corresponds to the network address of each monitored website, and $\text{label}_{p\text{-}j}$ is the second-dimension label, a flow-level label.
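The following sketch builds the dual-label training set T from already-split flows. The patent does not specify which d sequence features make up the meta-feature vector. Purely as an assumption, `meta_features` uses the signed sizes of the first d packets of a flow (positive for outgoing, negative for incoming), zero-padded to length d.

```python
from typing import List, Sequence, Tuple

DualLabel = Tuple[str, str]  # (first-dimension sample label, second-dimension flow label)


def meta_features(sizes: Sequence[int], outgoing: Sequence[bool], d: int) -> List[int]:
    """Hypothetical d-dim sequence feature: signed sizes of the first d packets, zero-padded."""
    signed = [size if out else -size for size, out in zip(sizes, outgoing)][:d]
    return signed + [0] * (d - len(signed))


def build_training_set(
    flows: Sequence[Tuple[DualLabel, Sequence[int], Sequence[bool]]], d: int
) -> List[Tuple[DualLabel, List[int]]]:
    """Return T as (dual label, [f_1, ..., f_d]) pairs, one per flow."""
    return [(label, meta_features(sizes, out, d)) for label, sizes, out in flows]
```

For example, `build_training_set([(("site_a", "site_a-1"), [517, 1400], [True, False])], d=4)` yields `[(("site_a", "site_a-1"), [517, -1400, 0, 0])]`.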

Further, the flow-based training set obtained in step S1 is used as input to train a random forest classification model composed of k decision trees. The indices of the leaf nodes reached by the decision results of the k trees form a k-dimensional vector that also carries the dual label of the corresponding flow.

Further, each flow sample in the flow-based training set T is used as input to the flow-based classification model C. The index of the leaf node reached by the decision result of the v-th decision tree in C is recorded as a new one-dimensional feature $F_v$ of the sample. Over all k trees this gives a k-dimensional composite feature vector $[F_1, F_2, \dots, F_k]$. Finally, k-dimensional new features are generated for every sample in the flow-based training set T, giving a fingerprint set P:

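$$P = \{(\text{label}_1, \text{label}_{1\text{-}1}) : [F_1, F_2, \dots, F_k],\ (\text{label}_1, \text{label}_{1\text{-}2}) : [F_1, F_2, \dots, F_k],\ \dots,\ (\text{label}_p, \text{label}_{p\text{-}j}) : [F_1, F_2, \dots, F_k]\}$$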
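The fingerprint construction can be illustrated with hand-built toy trees so that the sketch stays dependency-free. The dict-based tree format and the `leaf_index`, `fingerprint` and `build_fingerprint_set` helpers are hypothetical. With a trained scikit-learn `RandomForestClassifier`, the `apply(X)` method returns the same kind of information directly. It gives an array of shape (n_samples, n_estimators) holding, for each sample, the index of the leaf it reaches in each tree.

```python
from typing import Any, Dict, List, Sequence, Tuple

Node = Dict[str, Any]  # internal: {"feature", "threshold", "left", "right"}; leaf: {"leaf": index}
DualLabel = Tuple[str, str]


def leaf_index(tree: Node, x: Sequence[float]) -> int:
    """Walk one decision tree and return the index of the leaf that x reaches."""
    node = tree
    while "leaf" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


def fingerprint(forest: List[Node], x: Sequence[float]) -> List[int]:
    """[F_1, ..., F_k]: F_v is the leaf index reached in the v-th of the k trees."""
    return [leaf_index(tree, x) for tree in forest]


def build_fingerprint_set(
    forest: List[Node], T: List[Tuple[DualLabel, Sequence[float]]]
) -> List[Tuple[DualLabel, List[int]]]:
    """P: each flow keeps its dual label; its meta-features are replaced by its fingerprint."""
    return [(label, fingerprint(forest, x)) for label, x in T]
```

Because only leaf indices are kept, two flows have similar fingerprints exactly when the forest routes them to the same leaves in many trees.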

Further, if each class contains n encrypted traffic samples, the number of neighbours K in the K-nearest-neighbour algorithm is set to n-1. Suppose that $\mathrm{Num}_{p\text{-}j}$ of the K samples nearest to a fingerprint sample labeled $(\text{label}_p, \text{label}_{p\text{-}j})$ carry the same label. The L2I value for this type of flow is then:

$$L2I_{p\text{-}j} = \frac{\mathrm{Num}_{p\text{-}j}}{K}$$

The L2I value of the encrypted traffic class whose first-dimension label is $\text{label}_p$ is the sum of the L2I values of all flows whose first-dimension label is $\text{label}_p$.
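The L2I bookkeeping can be sketched as a leave-one-out K-nearest-neighbour pass over the fingerprint set P. The patent does not name a distance measure for fingerprints. As an assumption, this sketch uses the Hamming distance, i.e. the number of trees in which two flows land in different leaves. Following the detailed description, it averages the per-fingerprint ratio Num/K over all fingerprints with the same dual label, then sums the flow values per first-dimension label. Ties between equally distant neighbours are broken by list order, which is also an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

DualLabel = Tuple[str, str]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of trees in which two fingerprints reach different leaves."""
    return sum(1 for u, v in zip(a, b) if u != v)


def flow_l2i(P: List[Tuple[DualLabel, List[int]]], K: int) -> Dict[DualLabel, float]:
    """L2I per flow type: mean over its fingerprints of Num/K among the K nearest others."""
    ratios: Dict[DualLabel, List[float]] = defaultdict(list)
    for i, (label_i, fp_i) in enumerate(P):
        others = [(hamming(fp_i, fp_j), label_j) for j, (label_j, fp_j) in enumerate(P) if j != i]
        others.sort(key=lambda item: item[0])  # stable sort: ties keep list order
        num_same = sum(1 for _, label_j in others[:K] if label_j == label_i)
        ratios[label_i].append(num_same / K)
    return {label: sum(r) / len(r) for label, r in ratios.items()}


def class_l2i(flow_values: Dict[DualLabel, float]) -> Dict[str, float]:
    """L2I per encrypted traffic class: sum of the L2I values of its flow types."""
    totals: Dict[str, float] = defaultdict(float)
    for (sample_label, _flow_label), value in flow_values.items():
        totals[sample_label] += value
    return dict(totals)
```

In the patent's setting K = n - 1, where n is the number of encrypted traffic samples per class. Here K is passed in explicitly.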

Further, a given encrypted traffic sample is split into flows by port, and each flow is vectorized using the meta-feature vector. The vectors are input into the flow-based classification model C to obtain the predicted labels of all flows.

An encrypted traffic classification system comprises:

an input module, configured to:
- split the encrypted traffic sample to be classified into flows of packets that share the same port, according to the port information contained in the data packets;
- extract the meta-feature vector of each flow;
- input the meta-feature vectors of all flows into the flow-based classification model to predict their encrypted traffic type labels;
- group the predicted labels that share the same first-dimension label and calculate the sum of the L2I values corresponding to the labels in each group;
- pass the sums to the classification comparison module; and

a classification comparison module, configured to form, from the indices of the leaf nodes reached by the decision trees of the flow-based classification model, k-dimensional vectors that also carry the dual labels of the corresponding flows. The module trains a K-nearest-neighbour classifier with these k-dimensional vectors as input and calculates the L2I values of the original encrypted traffic classes and of the flows. It then compares the sum of the L2I values calculated for each group with the L2I value of the corresponding original encrypted traffic class. If the ratio of the two is greater than the threshold preset by the user, the encrypted traffic label with the largest ratio is output as the classification result. If the ratio is smaller than the threshold, no classification result is output.

A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above-mentioned encrypted traffic classification method when executing the computer program.

A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned encrypted traffic classification method.

Compared with the prior art, the invention has the following beneficial technical effects:

In the encrypted traffic classification method of the invention, the leaf-node indices reached in the decision trees of the flow-based classification model form k-dimensional vectors that also carry the dual labels of the corresponding flows. These vectors are used to train a K-nearest-neighbour classifier, from which the L2I values of the original encrypted traffic classes and of the flows are calculated. To classify a given encrypted traffic sample, the sample is split into flows of packets that share the same port, according to the port information in the data packets. A meta-feature vector is extracted for each flow, and the vectors are input into the flow-based classification model to predict the encrypted traffic type label of each flow. Predicted labels with the same first-dimension label are grouped, and the sum of the corresponding L2I values is calculated for each group. Each sum is then compared with the recorded L2I value of the corresponding original encrypted traffic class. If the ratio of the two is greater than the threshold set by the user, the encrypted traffic label with the largest ratio is output as the classification result. Otherwise, no classification result is output, and the encrypted traffic classification is complete. The method is suitable for web-page-oriented and stream-oriented network behaviors and classifies encrypted traffic quickly. It also makes it possible to identify the traffic of one complete website visit within the intermixed traffic of a real network environment.

The encrypted traffic classification system can quickly classify encrypted traffic and identify the traffic of a complete website visit in a real network environment. This provides strong support for network security and allows the website that the current user of a real network is visiting to be determined accurately.

Drawings

FIG. 1 is a flowchart illustrating an embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the accompanying drawings:

As shown in FIG. 1, a method for classifying encrypted traffic comprises the following steps:

S1: generating a flow-based training set of encrypted traffic samples;

A set of user encrypted traffic samples is collected. Each encrypted traffic sample in the set is a raw traffic file containing data packets and has a unique encrypted traffic type label. Each complete encrypted traffic sample is split into several flow samples according to the port information contained in the data packets. Each flow is then labeled using the file in the sample set that contains the flow log information. The d-dimensional feature vector defined for the encrypted traffic sample set is recorded as the meta-feature vector, and the flows in each encrypted traffic sample are vectorized according to it. After all flows in the set have been vectorized with the meta-feature vector, the encrypted traffic type label and the flow label of each flow are retained. They are combined into the dual label (encrypted traffic label, flow label) of each flow, giving the flow-based training set of encrypted traffic samples.

d-dimensional sequence features are extracted from each encrypted traffic sample in the encrypted traffic sample set and recorded as $[f_1, f_2, \dots, f_d]$. Let there be p classes of encrypted traffic samples in total, and let the encrypted traffic type label of the i-th class be $\text{label}_i$. After an encrypted traffic sample is split into flows by port, each flow is labeled $\text{label}_{i\text{-}j}$ according to the log file. The range of j depends on the number of flows in the encrypted traffic samples of each class. The meta-feature vector thus contains the d-dimensional sequence features $[f_1, f_2, \dots, f_d]$, and the training set T of encrypted traffic samples is:

$$T = \{(\text{label}_1, \text{label}_{1\text{-}1}) : [f_1, f_2, \dots, f_d],\ (\text{label}_1, \text{label}_{1\text{-}2}) : [f_1, f_2, \dots, f_d],\ \dots,\ (\text{label}_p, \text{label}_{p\text{-}j}) : [f_1, f_2, \dots, f_d]\}$$

where $\text{label}_p$ is the first-dimension label, a label at the encrypted-traffic-sample level that corresponds to the network address of each monitored website. $\text{label}_{p\text{-}j}$ is the second-dimension label, a flow-level label that corresponds to the network address the flow connects to within the website. The resulting T serves as the flow-based training set.

S2: training a random forest classification model on the flow-based training set to obtain a flow-based classification model for encrypted traffic identification;

Specifically, the flow-based training set obtained in step S1 is used as input to train a random forest classification model composed of k decision trees, each of which produces an independent decision. The model combines the independent decisions of all trees and outputs an integrated decision. At the same time, the indices of the leaf nodes reached by the decisions of the k trees form a k-dimensional vector, called a fingerprint, which also carries the dual label of the corresponding flow;
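The patent does not detail how the random forest combines the k independent decisions into one integrated decision. The minimal sketch below assumes hard majority voting over the dual labels predicted by the individual trees, with a deterministic tie-break that is also an assumption. Note that scikit-learn's `RandomForestClassifier` instead averages the trees' class-probability estimates, so its output can differ from a plain vote.

```python
from collections import Counter
from typing import Sequence, Tuple

DualLabel = Tuple[str, str]


def integrate_votes(tree_predictions: Sequence[DualLabel]) -> DualLabel:
    """Majority vote over per-tree dual labels; ties go to the lexicographically smallest label."""
    counts = Counter(tree_predictions)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)
```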

S3: calculating and recording the flow-oriented label indication index (Label-Indication Index, hereinafter the L2I value) for encrypted traffic identification. The fingerprints obtained in step S2, which carry the dual labels of the flows, are used as input to train a K-nearest-neighbour (KNN) classifier. K is set to the number of encrypted traffic samples per class minus 1. After training, each fingerprint is assigned the proportion of its K nearest neighbours that carry the same label as it does. The average of these proportions over all fingerprints with the same label is the L2I value of that type of flow. Then, using the dual labels (encrypted traffic label, flow label) of the flows, the L2I values of all flows belonging to the same encrypted traffic class are summed. This sum is the L2I value of that class of encrypted traffic samples;

Specifically, each flow sample in the flow-based training set T is used as input to the random forest classification model C. The index of the leaf node reached by the decision result of the v-th decision tree in C is recorded as a new one-dimensional feature $F_v$ of the sample, giving a k-dimensional composite feature vector $[F_1, F_2, \dots, F_k]$. Finally, k-dimensional new features are generated for every sample in the initial flow-based training set T, giving a set of fingerprints P:

$$P = \{(\text{label}_1, \text{label}_{1\text{-}1}) : [F_1, F_2, \dots, F_k],\ (\text{label}_1, \text{label}_{1\text{-}2}) : [F_1, F_2, \dots, F_k],\ \dots,\ (\text{label}_p, \text{label}_{p\text{-}j}) : [F_1, F_2, \dots, F_k]\}$$

K-nearest-neighbour model training: the L2I values of the original encrypted traffic classes and of the flows are calculated. If each class contains n encrypted traffic samples, K is set to n-1. Suppose that $\mathrm{Num}_{p\text{-}j}$ of the K samples nearest to a fingerprint sample labeled $(\text{label}_p, \text{label}_{p\text{-}j})$ carry the same label. The L2I value for this type of flow is then:

$$L2I_{p\text{-}j} = \frac{\mathrm{Num}_{p\text{-}j}}{K}$$

The L2I value of the encrypted traffic class whose first-dimension label is $\text{label}_p$ is the sum of the L2I values of all flows whose first-dimension label is $\text{label}_p$. The calculated L2I values of the original encrypted traffic classes and of the flows are recorded for later use.
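The following is a purely hypothetical numerical illustration; the numbers are invented and are not experimental results. With n = 5 samples per class, K = 4. Suppose 3 of the 4 nearest neighbours of a fingerprint labeled $(\text{label}_1, \text{label}_{1\text{-}1})$ share its label; that fingerprint then contributes a ratio of 0.75. If class $\text{label}_1$ has two flow types whose L2I values are 0.75 and 0.50, the class L2I value is their sum:

$$K = n - 1 = 5 - 1 = 4, \qquad \frac{\mathrm{Num}_{1\text{-}1}}{K} = \frac{3}{4} = 0.75$$

$$L2I_1 = L2I_{1\text{-}1} + L2I_{1\text{-}2} = 0.75 + 0.50 = 1.25$$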

S4: aggregating flows back to their original encrypted traffic labels to classify encrypted traffic. Take an encrypted traffic sample to be classified that has already been split into flows and whose meta-feature vectors have been extracted. The meta-feature vectors of all its flows are input into the flow-based classification model C to predict their encrypted traffic type labels, denoted $(\text{label}_x, \text{label}_{x\text{-}1}), (\text{label}_x, \text{label}_{x\text{-}2}), \dots, (\text{label}_y, \text{label}_{y\text{-}j})$;

Specifically, a given encrypted traffic sample is split as a whole into flows of packets that share the same port, according to the port information contained in the data packets. Each flow is input into the flow-based classification model obtained in step S2, which outputs a dual label. Next, the flow L2I values recorded in step S3 are used to sum the L2I values of all flows in the result that share the same first-dimension label (i.e. encrypted traffic sample label). Each sum is compared with the L2I value of that encrypted traffic class obtained in step S3. If the ratio of the two is greater than the threshold t preset by the user, the encrypted traffic label with the largest ratio is output as the classification result. If the ratio is not greater than t, no classification result is output.
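The integration and threshold test of step S4 can be sketched as follows. The recorded flow and class L2I values from step S3 are passed in as plain dictionaries. The function name `classify_sample` and the choice to return `None` when no result is output are assumptions for illustration. Predicted labels without a recorded L2I value are assumed to contribute 0.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

DualLabel = Tuple[str, str]


def classify_sample(
    predicted: List[DualLabel],
    flow_l2i: Dict[DualLabel, float],
    class_l2i: Dict[str, float],
    t: float,
) -> Optional[str]:
    """Return the encrypted traffic label with the largest ratio above t, or None (no output)."""
    sums: Dict[str, float] = defaultdict(float)
    for label in predicted:  # group predicted flows by first-dimension label
        sums[label[0]] += flow_l2i.get(label, 0.0)
    ratios = {c: s / class_l2i[c] for c, s in sums.items() if class_l2i.get(c, 0.0) > 0}
    if not ratios:
        return None
    best = max(ratios, key=ratios.__getitem__)
    return best if ratios[best] > t else None
```

Normalising each sum by the class's recorded L2I value makes classes with many flow types comparable to classes with few.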

Let there be p classes of encrypted traffic samples in total, and let the encrypted traffic type of the i-th class be labeled $\text{label}_i$. After an encrypted traffic sample is split into flows by port, each flow is labeled $\text{label}_{i\text{-}j}$ according to the log file. The range of j depends on the number of flows in the encrypted traffic samples of each class. The meta-feature vector contains the d-dimensional features $[f_1, f_2, \dots, f_d]$. The first-stage training set of encrypted traffic samples is denoted T:

$$T = \{(\text{label}_1, \text{label}_{1\text{-}1}) : [f_1, f_2, \dots, f_d],\ (\text{label}_1, \text{label}_{1\text{-}2}) : [f_1, f_2, \dots, f_d],\ \dots,\ (\text{label}_p, \text{label}_{p\text{-}j}) : [f_1, f_2, \dots, f_d]\}$$

where $\text{label}_p$ is the first-dimension label, a label at the encrypted-traffic-sample level, and $\text{label}_{p\text{-}j}$ is the second-dimension label, a flow-level label.

The fingerprint vector contains the k-dimensional features $[F_1, F_2, \dots, F_k]$. Let there be p classes of encrypted traffic samples in total, and let the encrypted traffic type of the i-th class be labeled $\text{label}_i$. After an encrypted traffic sample is split into flows by port, the j-th flow of a sample of the i-th class is labeled $\text{label}_{i\text{-}j}$. The range of j depends on the number of flows in the encrypted traffic samples of each class. The set of fingerprints generated from the leaf indices of the decision trees is denoted P:

$$P = \{(\text{label}_1, \text{label}_{1\text{-}1}) : [F_1, F_2, \dots, F_k],\ (\text{label}_1, \text{label}_{1\text{-}2}) : [F_1, F_2, \dots, F_k],\ \dots,\ (\text{label}_p, \text{label}_{p\text{-}j}) : [F_1, F_2, \dots, F_k]\}$$

Specifically, a given encrypted traffic sample is split into flows by port, and each flow is vectorized using the meta-feature vector. The vectors are input into the flow-based classification model C to obtain the labels of all flows, for example $(\text{label}_x, \text{label}_{x\text{-}m})$ and $(\text{label}_y, \text{label}_{y\text{-}s})$. For each first-dimension label, the flow L2I values are summed to give $\text{sum}_x$ and $\text{sum}_y$. The ratios $e_x = \text{sum}_x / L2I_x$ and $e_y = \text{sum}_y / L2I_y$ are then computed against the L2I values of the corresponding encrypted traffic classes obtained in step S3. If the ratios exceed the user-set threshold t, the category with the larger ratio is output. If they are all smaller than t, nothing is output and the sample is marked as invalid. These steps are repeated until all encrypted traffic samples have been classified based on the flow dual labels.
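The following is a purely hypothetical illustration of this rule with invented numbers. Suppose the recorded class values are $L2I_x = 2.0$ and $L2I_y = 1.5$, the flows of the test sample give $\text{sum}_x = 1.2$ and $\text{sum}_y = 0.3$, and the user sets t = 0.5. Then:

$$e_x = \frac{\text{sum}_x}{L2I_x} = \frac{1.2}{2.0} = 0.6 > t, \qquad e_y = \frac{\text{sum}_y}{L2I_y} = \frac{0.3}{1.5} = 0.2 < t$$

The sample is therefore classified as $\text{label}_x$. Had both ratios been below 0.5, it would have been marked as invalid.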

In one embodiment of the invention, a terminal device is provided that comprises a processor and a memory. The memory stores a computer program comprising program instructions, and the processor executes the program instructions stored in the computer storage medium. The processor may be any of the following:
- a central processing unit (CPU) or another general-purpose processor;
- a digital signal processor (DSP);
- an application-specific integrated circuit (ASIC);
- a field-programmable gate array (FPGA) or another programmable logic device;
- a discrete gate or transistor logic device, a discrete hardware component, or the like.

The processor is the computing and control core of the terminal. It is adapted to implement one or more instructions, in particular to load and execute them so as to carry out the corresponding method flow or function. The processor described in the embodiments of the invention may be used to perform the encrypted traffic classification method.

An encrypted traffic classification system can be used to implement the encrypted traffic classification method of the above embodiment. It specifically comprises an input module and a classification comparison module;

the classification comparison module is configured to form, from the indices of the leaf nodes reached by the decision trees of the flow-based classification model, k-dimensional vectors that also carry the dual labels of the corresponding flows. The module trains a K-nearest-neighbour classifier with these k-dimensional vectors as input and calculates the L2I values of the original encrypted traffic classes and of the flows. It then compares the sum of the L2I values calculated for each group with the L2I value of the corresponding original encrypted traffic class. If the ratio of the two is greater than the threshold preset by the user, the encrypted traffic label with the largest ratio is output as the classification result. If the ratio is smaller than the threshold, no classification result is output.

In yet another embodiment, the invention further provides a storage medium, specifically a computer-readable storage medium (memory), which is a memory device in the terminal device used to store programs and data. The computer-readable storage medium may include a built-in storage medium of the terminal device, which provides storage space and stores the operating system of the terminal. It may also include an extended storage medium supported by the terminal device. This storage space holds one or more instructions suitable for being loaded and executed by the processor; these may be one or more computer programs (including program code). The computer-readable storage medium may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. The processor may load and execute the one or more instructions stored in the computer-readable storage medium to perform the corresponding steps of the encrypted traffic classification method in the above embodiments.

The encrypted traffic sample set and the meta-feature vector are provided by the user. The user supplies the raw data file of each encrypted traffic sample together with its encrypted traffic type label. The user also sets the number k of decision trees of the random forest and the threshold t used in the integration stage. Encrypted traffic classification based on the flow dual-label mechanism can classify complete website visits and avoids the problem of traffic crossing during access. The method is suitable for web-oriented and stream-oriented network behaviors. It can be applied to different kinds of encrypted traffic, including the HTTPS protocol, the Tor network and Shadowsocks networks.

Claims

1. A method for classifying encrypted traffic, characterized in that it comprises the following steps:

S1, collecting a set of user encrypted traffic samples and splitting each complete encrypted traffic sample into several flow samples according to the port information contained in the data packets; labeling each flow using the file in the sample set that contains the flow log information; recording the d-dimensional feature vector of the encrypted traffic sample set as the meta-feature vector and vectorizing the flows in each encrypted traffic sample according to it; and, after all flows in the set have been vectorized with the meta-feature vector, retaining the encrypted traffic type label and the flow label of each flow and combining them into the dual label of the flow, thereby obtaining a flow-based training set of encrypted traffic samples;

S3, forming, from the indices of the leaf nodes reached by the decision trees of the flow-based classification model, k-dimensional vectors that also carry the encrypted traffic type label and the flow label of the corresponding flow; training a K-nearest-neighbour classifier with these k-dimensional vectors as input; and calculating the L2I values of the original encrypted traffic classes and of the flows;

training the K-nearest-neighbour (KNN) classifier with the obtained fingerprints, which carry the dual labels of the flows, as input; and taking the average of the label indication proportions of the fingerprints with the same label as the L2I value of that type of flow;

S4, splitting the encrypted traffic sample to be classified into flows of packets that share the same port, according to the port information contained in the data packets, and extracting their meta-feature vectors; inputting the meta-feature vectors of all the flows into the flow-based classification model to predict their encrypted traffic type labels; grouping the predicted labels that share the same first-dimension label and calculating the sum of the L2I values corresponding to the labels in each group; and comparing each sum with the L2I value of the corresponding original encrypted traffic class. If the ratio of the two is greater than the threshold preset by the user, the encrypted traffic label with the largest ratio is output as the classification result. If the ratio is smaller than the threshold, no classification result is output, which completes the encrypted traffic classification;

specifically, for any encrypted traffic sample to be classified that has been split into flows and whose meta-feature vectors have been extracted, the meta-feature vectors of all its flows are input into the flow-based classification model C and the encrypted traffic type labels are predicted.

2. The encrypted traffic classification method according to claim 1, wherein:
- a set of user encrypted traffic samples is collected, each encrypted traffic sample in the set being a raw traffic file containing data packets and having a unique encrypted traffic type label;
- each complete encrypted traffic sample is split into several flow samples according to the port information contained in the data packets;
- each flow is then labeled using the file in the sample set that contains the flow log information;
- the flows in each encrypted traffic sample are vectorized according to the meta-feature vector of the encrypted traffic sample set; and
- after all flows in the set have been vectorized with the meta-feature vector, the encrypted traffic type label and the flow label of each flow are retained, giving a flow-based training set of encrypted traffic samples.

3. The encrypted traffic classification method according to claim 2, characterized in that d-dimensional sequence features, recorded as $[f_1, f_2, \ldots, f_d]$, are extracted from each encrypted traffic sample in the encrypted traffic sample set; there are p types of encrypted traffic samples in total, and the encrypted traffic type label of the i-th type is $label_i$; after an encrypted traffic sample is split into streams according to ports, each stream is labeled $label_{i\text{-}j}$ according to the log file, where the values of j are determined by the number of streams in the encrypted traffic samples of each class; the stream-based training set of encrypted traffic samples is denoted as T:
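The dual-label layout of the training set T can be sketched as a flat list of records, each holding a stream's d-dimensional feature vector, its first-dimension label $label_i$ and its second-dimension label $label_{i\text{-}j}$. The `StreamRecord` type, the input format, and the `"label-j"` string encoding of the stream label are hypothetical choices made for illustration; the stream index j is assumed to come from the log file.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StreamRecord:
    features: Tuple[float, ...]  # d-dimensional feature vector [f_1, ..., f_d]
    traffic_label: str           # first-dimension label, label_i
    stream_label: str            # second-dimension label, label_{i-j}


def build_training_set(samples) -> List[StreamRecord]:
    """samples: list of (label_i, [(j, features), ...]) pairs, where the stream
    index j is assumed to be read from the log file.
    Returns the stream-based training set T as a flat list of records."""
    T: List[StreamRecord] = []
    for label_i, streams in samples:
        for j, features in streams:
            T.append(StreamRecord(tuple(features), label_i, f"{label_i}-{j}"))
    return T
```

Keeping both labels on every record is what later lets the fingerprint and L2I steps reason about a stream type and its parent traffic type at the same time.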

4. The encrypted traffic classification method according to claim 1, characterized in that a random forest classification model consisting of k decision trees is trained with the stream-based training set obtained in step S1 as input; and the indices of the leaf nodes reached by the decision of each decision tree form a k-dimensional vector that simultaneously carries the dual labels of the corresponding stream.

5. The encrypted traffic classification method according to claim 4, characterized in that each stream sample in the stream-based training set T is used as input to the stream-based classification model C, and the index of the leaf node in which the decision of the v-th decision tree of C falls is recorded as a new one-dimensional feature $F_v$ of the encrypted traffic sample, giving a k-dimensional composite feature vector $[F_1, F_2, \ldots, F_k]$; finally, k-dimensional new features are generated for every encrypted traffic sample in the stream-based training set T, yielding a fingerprint set denoted as P:
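The leaf-index fingerprint $[F_1, \ldots, F_k]$ can be sketched without any external library by storing each tree as parallel arrays in the style of scikit-learn's `tree_` attribute, where a child index of -1 marks a leaf and samples with `x[feature] <= threshold` go left. The `ArrayTree` class and `fingerprint` function are hypothetical names; with a fitted scikit-learn `RandomForestClassifier`, the `apply(X)` method returns the same kind of leaf-index matrix, of shape `(n_samples, n_estimators)`.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ArrayTree:
    # Parallel arrays describing one decision tree; children_left[n] == -1
    # marks node n as a leaf (same convention as scikit-learn's tree_).
    feature: List[int]
    threshold: List[float]
    children_left: List[int]
    children_right: List[int]

    def leaf_index(self, x: Sequence[float]) -> int:
        """Follow the decision path of x from the root and return the leaf id."""
        node = 0
        while self.children_left[node] != -1:
            if x[self.feature[node]] <= self.threshold[node]:
                node = self.children_left[node]
            else:
                node = self.children_right[node]
        return node


def fingerprint(forest: List[ArrayTree], x: Sequence[float]) -> List[int]:
    """Composite feature [F_1, ..., F_k]: F_v is the leaf index reached in tree v."""
    return [tree.leaf_index(x) for tree in forest]
```

Two streams that land in the same leaves of most trees get fingerprints that agree in most positions, which is what the subsequent K-nearest-neighbor step exploits.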

6. The encrypted traffic classification method according to claim 5, wherein, if each class contains n encrypted traffic samples, K in the K-nearest-neighbor algorithm is set to n-1; assuming that, among the K samples surrounding a fingerprint sample labeled $(label_p, label_{p\text{-}j})$, the number of samples carrying the same label is $Num_{p\text{-}j}$, the L2I value of this type of stream is:

$$L2I_{p\text{-}j} = \frac{Num_{p\text{-}j}}{K};$$

the L2I value of the encrypted traffic sample whose first-dimension label is $label_p$ is the sum of the L2I values of all streams whose first-dimension label is $label_p$, i.e. $L2I_p = \sum_j L2I_{p\text{-}j}$;
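The L2I computation can be sketched as a brute-force K-nearest-neighbor pass over the fingerprint set P. Several details are assumptions not fixed by the claim: distance is taken as the Hamming distance between leaf-index vectors (the number of trees whose leaves differ), ties are broken by position in P, the sample itself is excluded from its neighbors, and the per-sample ratios of one stream type are averaged into a single $L2I_{p\text{-}j}$. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# One fingerprint entry: (leaf-index vector, label_p, label_{p-j}).
Fingerprint = Tuple[Sequence[int], str, str]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    # Assumed distance: number of trees whose leaf indices differ.
    return sum(x != y for x, y in zip(a, b))


def l2i_values(P: List[Fingerprint], K: int):
    """Return (stream_l2i, class_l2i) computed from the fingerprint set P."""
    per_stream: Dict[str, List[float]] = defaultdict(list)
    for i, (fp, _, stream_label) in enumerate(P):
        # K nearest other fingerprints; ties are broken by position in P.
        ranked = sorted((hamming(fp, P[m][0]), m) for m in range(len(P)) if m != i)
        num = sum(1 for _, m in ranked[:K] if P[m][2] == stream_label)
        per_stream[stream_label].append(num / K)  # Num_{p-j} / K for this sample
    # Assumption: a stream type's L2I is the mean of its per-sample ratios.
    stream_l2i = {s: sum(v) / len(v) for s, v in per_stream.items()}
    # L2I_p is the sum of L2I_{p-j} over the stream types of class p.
    class_of = {stream_label: label_p for _, label_p, stream_label in P}
    class_l2i: Dict[str, float] = defaultdict(float)
    for s, value in stream_l2i.items():
        class_l2i[class_of[s]] += value
    return stream_l2i, dict(class_l2i)
```

A stream type whose fingerprints cluster tightly with their own kind gets an L2I near 1, while a type whose fingerprints mix with other labels gets a lower value and therefore contributes less evidence at classification time.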

the fingerprint set generated from the leaf indices of the decision trees is denoted as P and is represented as follows:

7. The encrypted traffic classification method according to claim 5, wherein any given encrypted traffic sample is split into streams according to ports, each stream is vectorized using the meta-feature vector, and the resulting vectors are input into the stream-based classification model C to obtain the type labels of all streams.

8. An encrypted traffic classification system for use in the encrypted traffic classification method according to claim 1, comprising:

a classification comparison module, used for forming, from the indices of the leaf nodes reached by the decision of each decision tree in the stream-based classification model, a k-dimensional vector that simultaneously carries the encrypted traffic type label and the stream label of each encrypted traffic sample, and for training a K-nearest-neighbor classification algorithm with this k-dimensional vector as input to calculate the L2I values of the original encrypted traffic samples and of the streams; and for comparing the sum of the L2I values corresponding to the labels in each group of prediction labels with the L2I value of the original encrypted traffic sample; if the largest ratio is greater than the threshold preset by the user, the encrypted traffic label with the largest ratio is output as the classification result, and if it is smaller than the threshold, no classification result is output.

9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.