CN115334005B - Encryption flow identification method based on pruning convolutional neural network and machine learning - Google Patents
- Publication number: CN115334005B (application CN202210337870.2A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H04L47/2441 — Traffic characterised by specific attributes, e.g. priority or QoS, relying on flow classification, e.g. using integrated services [IntServ]
- H04L47/2483 — Traffic characterised by specific attributes, e.g. priority or QoS, involving identification of individual flows
- H04L63/0428 — Network architectures or network communication protocols for network security wherein the data content is protected, e.g. by encrypting or encapsulating the payload
- G06N3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Abstract
The invention discloses an encrypted traffic identification method based on a pruned convolutional neural network and machine learning, comprising the steps of data preprocessing, CNN model construction, model pruning, CNN extraction of high-level feature vectors, and LightGBM classification. The method requires no manual feature extraction: the CNN model automatically extracts high-level features from the raw traffic file and classifies them. Pruning the convolutional neural network reduces the number of model parameters and the computational cost, and the LightGBM classifies the encrypted traffic according to its high-level features, combining weak classifiers into a strong classifier and thereby improving accuracy. The final model achieves higher performance and accuracy than other classification models.
Description
Technical Field
The invention relates to the technical field of network traffic identification, in particular to an encrypted traffic identification method based on a pruned convolutional neural network and machine learning.
Background
Network traffic identification plays an important role in applications such as network quality-of-service control, traffic billing, network resource planning, and malware detection. With the continuous development of network information technology, more and more software uses encryption or port-obfuscation technologies such as SSL, SSH, VPN, and Tor, so the share of encrypted traffic keeps growing.
The research statistics agency Netmarketshare reported that by October 2019 the proportion of encrypted Web traffic had exceeded 90%: 90 of the top 100 non-Google Web sites on the Internet enabled HTTPS by default, and the proportion of HTTPS traffic was 92% in the United States, 85% in Russia, 80% in Japan, and 74% in Indonesia. This shift presents new challenges to current traffic detection methods, making network traffic identification and analysis increasingly difficult.
The premise of traffic classification is that different traffic types have distinguishing characteristics. Current traffic classification methods can be roughly divided into the following categories:
1) Port-based classification methods. On the premise that an application service uses the ports allocated by the IANA and keeps them unchanged, different traffic types are distinguished by the port numbers the flows use.
2) Payload-based classification methods. This approach, also known as deep packet inspection, distinguishes protocols based on static payload characteristics and can be used for some coarse-grained traffic classification.
3) Statistics-based classification methods. These methods apply machine learning techniques to distinguish traffic types by their statistical characteristics. The features can be broadly divided into the packet level (packet length, packet inter-arrival time, direction, etc.) and the flow level (number of upstream and downstream packets, network flow duration, proportion of different packet types, etc.).
The current traffic classification methods have the following disadvantages:
1) Port-based classification loses much of its accuracy when applications use ports outside the IANA allocation or random/dynamic ports, and it cannot identify malware traffic.
2) Payload-based classification depends on payload characteristics that encryption destroys, so it is only suitable for coarse-grained classification or for scenarios where traffic is not fully encrypted.
3) Deep-learning-based classification produces models with huge numbers of parameters, which limits where the models can be deployed.
Disclosure of Invention
Aiming at the technical problem that classification models trained by deep-learning-based methods have huge numbers of parameters, the invention provides an encrypted traffic identification method based on a pruned convolutional neural network and machine learning. The method requires no manual feature extraction: the convolutional neural network automatically extracts high-level features from the raw traffic file and classifies them, pruning reduces the number of model parameters, and the LightGBM combines weak classifiers into a strong classifier. The final model achieves higher performance and accuracy than other classification models and is suitable for efficient detection of encrypted traffic.
In order to achieve the above object, the present invention provides the following technical solutions:
The invention provides an encrypted traffic identification method based on a pruned convolutional neural network and machine learning, which comprises the following steps:
S1: data preprocessing;
S2: constructing the CNN model; the convolutional neural network mainly comprises the following layers: an input layer, convolutional layers, ReLU layers, pooling layers, and a fully connected layer;
S3: pruning the model and retraining it; an optimized CNN model is obtained after several iterations;
S4: the optimized CNN model outputs a 256-dimensional feature vector as the input of the LightGBM classifier;
S5: LightGBM classification; the gradient decision trees in the LightGBM algorithm are obtained by iterating over a given training data set multiple times: at each iteration, a new tree is fitted to the gradient information and added to the trees from previous iterations. In function space, this is an additive linear combination process. The LightGBM uses the weights of all leaf nodes as a reference for building a tree, then determines the split points and computes the first-order and second-order gradients; after multiple iterations, the performance of the LightGBM classifier is optimized.
Compared with the prior art, the invention has the beneficial effects that:
according to the encryption flow identification method based on the pruned convolutional neural network and the machine learning, the characteristics are not required to be manually extracted, the CNN model is utilized to automatically extract the high-level characteristics from the original flow file and classify the high-level characteristics, meanwhile, the pruned convolutional neural network model is constructed, the model parameter is reduced, the calculation cost is reduced, the LightGBM is used for classifying according to the high-level characteristics of the encryption flow, the strong classification effect is achieved through the weak classifier, the accuracy is improved, and the final model can achieve higher performance and accuracy than other classification models.
Drawings
In order to more clearly illustrate the embodiments of the present application and the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is evident that the following drawings cover only some embodiments of the invention; a person of ordinary skill in the art may derive other drawings from them.
Fig. 1 is a flowchart of an encryption traffic identification method based on a pruned convolutional neural network and machine learning according to an embodiment of the present invention.
Fig. 2 is a flow chart of data preprocessing according to an embodiment of the present invention.
Fig. 3 is a flowchart of a pruning step according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides an encryption flow identification method based on a pruned convolutional neural network and machine learning, which is shown in figure 1 and comprises the following steps:
s1: data preprocessing: processing the original flow file to be suitable for standard input of the CNN model;
the encrypted traffic entered in step S1 uses the public data set ISCXVPN2016, which contains 6 conventional encrypted traffic: email, chat, streaming, file transfer, voIP and P2P,6 corresponding VPN encrypted traffic: VPN-Email, VPN-Chat, VPN-Streaming, VPN-File transfer, VPN-VoIP, and VPN-P2P. Traffic data were obtained by both the Wireshark and tcpdump tools in a real environment for a total of 28GB.
The specific flow of the data preprocessing step is shown in fig. 2. The key points are as follows:
removing irrelevant messages: i.e. removing packets that affect model predictions or have empty payloads. Traffic in the real environment may contain some packets for TCP connection and disconnection, such as packets containing SYN, ACK or FIN flags, and some packets for domain name resolution and packets with empty payload, which are not effective for traffic classification, but rather affect classification accuracy, so that removal is required.
Removing the Ethernet frame header: the Ethernet header contains MAC addresses, which identify network device locations and are used to transfer packets between network nodes, but they contribute little to traffic classification, so the Ethernet header is deleted.
Masking the IP address: IP addresses cause the model to overfit in traffic classification, so the source and destination IP addresses are set to 0.
Checking the packet length: the method uses a convolutional neural network, which requires fixed-size input, but packet lengths vary. The packet length is therefore checked: if it is smaller than the prescribed input size, the packet is zero-padded at the end; if it is larger, the packet is truncated. This ensures that the traffic packet matches the input size of the CNN model.
Normalization: different evaluation indicators often have different scales. To make the data comparable, the packet is normalized: each byte is divided by 255, so that every input value lies between 0 and 1.
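The padding/truncation and normalization steps above can be sketched as follows (a minimal sketch: the 30×30 input size is taken from Table 1, and the sample packet bytes are hypothetical):

```python
import numpy as np

INPUT_SIZE = 30 * 30  # 900 bytes, matching the 30*30 CNN input in Table 1

def preprocess_packet(payload: bytes) -> np.ndarray:
    """Pad/truncate a raw packet to a fixed length and scale bytes to [0, 1]."""
    data = payload[:INPUT_SIZE]                       # truncate if too long
    data = data + b"\x00" * (INPUT_SIZE - len(data))  # zero-pad at the end if too short
    arr = np.frombuffer(data, dtype=np.uint8).astype(np.float32)
    return (arr / 255.0).reshape(30, 30)              # normalize and reshape to a grayscale image

img = preprocess_packet(b"\x10\xff" * 100)  # hypothetical 200-byte packet
```

Masking would additionally zero the source/destination IP address bytes before this step.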
S2: constructing the CNN model.
A convolutional neural network is a feedforward neural network with convolution computations and a deep structure, and is one of the most popular deep learning algorithms today. With the deepening of learning theory and the improvement of computing performance, convolutional neural networks have developed rapidly and been applied to computer vision, natural language processing, and other fields. The convolutional neural network is mainly composed of the following layers: an input layer, convolutional layers, ReLU layers, pooling layers, and a fully connected layer. Stacking these layers forms a complete convolutional neural network, which finally outputs a 256-dimensional feature vector for the subsequent LightGBM classifier. Too high a feature dimension easily causes overfitting and increases cost, while too low a dimension reduces classification accuracy. The structure of the CNN model used in the invention is shown in Table 1; the key points are as follows:
Convolution layer: Conv2D, two-dimensional convolution. A traffic packet can be converted into a grayscale image, which is better suited to two-dimensional convolution.
Activation function: reLU, as shown in equation (1), activates a node only when the input is greater than 0, outputs 0 when the input is less than 0, and outputs equal to the input when the input is greater than 0. The function may remove negative values from the convolution result, leaving positive values unchanged.
ReLU(x)=max(0,x) (1)
Batch normalization: Batch Normalization, like conventional data normalization, is a way to unify scattered data and to optimize the neural network, splitting the data into small batches for stochastic gradient descent. It is shown in equation (2):

$\hat{\alpha}_i = \dfrac{\alpha_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$  (2)

where $\alpha_i$ is the original activation value of a neuron, $\hat{\alpha}_i$ is its normalized value, and $\mu_B$ and $\sigma_B^2$ are the mean and variance over the mini-batch.
Loss function: the cross-entropy loss (CrossEntropyLoss) measures the difference between the true probability distribution and the predicted probability distribution, as shown in equation (3); the smaller the cross entropy, the better the model's predictions.

$L = -\sum_{c} y_c \log(p_c)$  (3)

where $y_c$ is the true probability of class $c$ and $p_c$ the predicted probability.
Activation function of the output layer: Softmax. When a sample passes through the Softmax layer, a T×1 vector is output, and the index of the largest value in the vector is taken as the sample's predicted label. The formula is shown in (4):

$S_j = \dfrac{e^{z_j}}{\sum_{t=1}^{T} e^{z_t}}$  (4)
Dropout: during training, some neurons are randomly deactivated, which improves the robustness of the model; this model sets dropout to 0.5.
TABLE 1. Main parameters of the CNN model

| Layer | Operation | Input | Kernel | Stride | Padding | Output | Weights |
|---|---|---|---|---|---|---|---|
| 1 | Conv2D + ReLU + BN | 30×30 | 3×3 | 1 | Same | 8×30×30 | 80 |
| 2 | Conv2D + ReLU + BN | 8×30×30 | 3×3 | 2 | Same | 16×14×14 | 1168 |
| 3 | Conv2D + ReLU + BN | 16×14×14 | 3×3 | 2 | Same | 32×6×6 | 4640 |
| 4 | Conv2D + ReLU + BN | 32×6×6 | 3×3 | 1 | Same | 64×4×4 | 18496 |
| 5 | Fully connected + Dropout | 64×4×4 | — | — | None | 256 | 262400 |
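As an illustrative sketch, the stack in Table 1 can be expressed in PyTorch. The stride/padding combinations listed in Table 1 do not all reproduce the stated output shapes exactly, so this sketch uses padding of 1 throughout and strides chosen to reach the 64×4×4 feature map; the class count of 12 matches the ISCXVPN2016 categories, but the exact layer hyperparameters here are assumptions:

```python
import torch
import torch.nn as nn

class PrunableCNN(nn.Module):
    """Conv stack loosely following Table 1; emits a 256-d feature vector."""
    def __init__(self, n_classes: int = 12):
        super().__init__()
        def block(c_in, c_out, stride):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
                nn.ReLU(),
                nn.BatchNorm2d(c_out),
            )
        # 1x30x30 -> 8x30x30 -> 16x15x15 -> 32x8x8 -> 64x4x4
        self.features = nn.Sequential(
            block(1, 8, 1), block(8, 16, 2), block(16, 32, 2), block(32, 64, 2)
        )
        self.fc = nn.Sequential(nn.Flatten(), nn.Dropout(0.5), nn.Linear(64 * 4 * 4, 256))
        self.head = nn.Linear(256, n_classes)  # Softmax is folded into CrossEntropyLoss

    def forward(self, x, return_features=False):
        feats = self.fc(self.features(x))
        return feats if return_features else self.head(feats)

model = PrunableCNN().eval()
x = torch.rand(2, 1, 30, 30)            # two preprocessed packets as grayscale images
feats = model(x, return_features=True)  # 256-d vectors for the LightGBM stage (step S4)
logits = model(x)
```

During training the `head` output would be fed to `nn.CrossEntropyLoss`; after training, `return_features=True` yields the vectors passed to the LightGBM classifier.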
S3: pruning the model, retraining the model, and obtaining an optimized CNN model after a plurality of iterations;
Generally speaking, the more layers and parameters a neural network has, the better the result, but also the more computational resources it consumes. Pruning removes parameters that have little influence on the prediction: the model's neurons are ranked by their contribution to the output, and low-contribution neurons are discarded, making the model run faster and the model file smaller. As shown in fig. 3, assume the first layer has 4 neurons and the second layer has 5 neurons, so the corresponding weight matrix is 4×5. The pruning process is as follows:
sorting weights of two adjacent layers of neurons according to absolute values;
the weight with smaller absolute value (e.g. 0.4) is clipped according to pruning rate P, i.e. set to 0.
After pruning, the model is retrained, and an optimized CNN model is obtained after a plurality of iterations.
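The magnitude-based pruning steps above can be sketched with a weight mask. This is a minimal sketch: the 4×5 matrix mirrors the example in fig. 3, and the 40% pruning rate is an illustrative choice, not a value fixed by the patent:

```python
import numpy as np

def magnitude_prune(w: np.ndarray, prune_rate: float) -> np.ndarray:
    """Zero out the fraction `prune_rate` of weights with the smallest |w|."""
    k = int(w.size * prune_rate)
    if k == 0:
        return w.copy()
    # Sort all |weights| and take the k-th smallest as the clipping threshold.
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    mask = np.abs(w) > threshold       # keep only weights above the threshold
    return w * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 5))            # 4x5 weight matrix, as in the fig. 3 example
pruned = magnitude_prune(w, 0.4)       # prune 40% of the weights
```

In practice the retraining step would follow, with the mask reapplied so clipped weights stay zero.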
S4: the optimized CNN model outputs a 256-dimensional feature vector as input of the LightGBM classifier;
s5: lightGBM classification.
LightGBM is a framework implementing the GBDT algorithm. GBDT is a long-standing model in machine learning whose main idea is to train weak classifiers (decision trees) iteratively to obtain an optimal model; it trains well and resists overfitting. Compared with the conventional CNN fully connected layer used as a classifier, the LightGBM classifier supports efficient parallel training, trains faster, consumes less memory, supports distributed processing of massive data, and lowers the deployment requirements of the detection model.
The gradient decision trees in the LightGBM algorithm are obtained by iterating over a given training data set multiple times: at each iteration, a new tree is fitted to the gradient information and added to the trees from previous iterations. In function space, this process is an additive linear combination, as shown in equation (6):

$\hat{y}_i = \sum_{q=1}^{Q} f_q(x_i), \quad f_q \in \chi$  (6)

where χ is the function space of the iteration trees and $f_q(x_i)$ is the predicted value of the i-th instance in the q-th tree.
Each split node of a tree uses the optimal split point; building the tree model is in fact a greedy method. The LightGBM uses the weights of all leaf nodes as a reference for building a tree, then determines the split points and computes the first-order and second-order gradients.
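For illustration, split-point selection from first- and second-order gradients can be sketched as below. This follows the standard second-order GBDT gain formula (the XGBoost/LightGBM-style objective); the regularization constant `lam` and the toy gradients are assumptions, not values from the patent:

```python
import numpy as np

def split_gain(g: np.ndarray, h: np.ndarray, order: np.ndarray, lam: float = 1.0):
    """Scan candidate split points of one feature; return (best_gain, best_pos).

    g, h are per-instance first- and second-order gradients; `order` sorts
    the instances by the feature value being scanned."""
    g, h = g[order], h[order]
    G, H = g.sum(), h.sum()
    gl = np.cumsum(g)[:-1]   # left-side gradient sums for each candidate split
    hl = np.cumsum(h)[:-1]
    # Standard second-order gain: left score + right score - unsplit score.
    gain = gl**2 / (hl + lam) + (G - gl)**2 / (H - hl + lam) - G**2 / (H + lam)
    best = int(np.argmax(gain))
    return float(gain[best]), best + 1   # split between positions best and best+1

# Toy check: gradients flip sign halfway, so the best split is in the middle.
g = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
h = np.ones(6)
gain, pos = split_gain(g, h, np.arange(6))
```

A real implementation would evaluate this over histogram bins per feature rather than every instance boundary.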
For any given tree structure, the LightGBM defines, as measures of feature importance, the total number of times each feature is used for splitting across the iteration trees, T_split, and the sum of the gains the feature brings when used for splitting in all decision trees, T_gain:

$T\_split_j = \sum_{k=1}^{K} n_{split}(j, k), \qquad T\_gain_j = \sum_{k=1}^{K} Gain(j, k)$

where K is the number of decision trees generated by K rounds of iteration, $n_{split}(j,k)$ is the number of splits on feature j in the k-th tree, and $Gain(j,k)$ is the gain those splits bring.
And after multiple iterations, the performance of the LightGBM classifier is optimized.
Compared with classification by the original CNN model, the LightGBM improves both accuracy and recall, and recognition is also faster.
The encrypted traffic identification method based on a pruned convolutional neural network and machine learning requires no manual feature extraction: the CNN model automatically extracts high-level features from the raw traffic file and classifies them. Constructing a pruned convolutional neural network model reduces the number of model parameters and the computational cost, and the LightGBM classifies the encrypted traffic according to its high-level features, combining weak classifiers into a strong classifier and improving accuracy. The final model achieves higher performance and accuracy than other classification models (see Table 2).
TABLE 2. Comparison of the present model with other classification models

| Method | Accuracy | Recall | F1 |
|---|---|---|---|
| 1D CNN | 0.89 | 0.89 | 0.89 |
| CNN+LSTM | 0.91 | 0.91 | 0.91 |
| SAE | 0.92 | 0.92 | 0.92 |
| 2D-CNN | 0.91 | 0.91 | 0.91 |
| Model before pruning | 0.90 | 0.86 | 0.88 |
| Pruned model (this application) | 0.94 | 0.93 | 0.93 |
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus embodiments, the electronic device embodiments, the computer-readable storage medium embodiments, and the computer program product embodiments, the description is relatively simple, as relevant to the description of the method embodiments in part, since they are substantially similar to the method embodiments.
The foregoing examples are merely specific embodiments of the present application and are not intended to limit its protection scope. Any person skilled in the art may, within the technical scope disclosed in the present application, modify or easily conceive of changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the corresponding technical solutions and are intended to be encompassed within the protection scope of this application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (7)
1. An encrypted traffic identification method based on a pruned convolutional neural network and machine learning is characterized by comprising the following steps:
S1: data preprocessing; the encrypted traffic input in step S1 uses the public data set ISCXVPN2016, which contains 6 conventional encrypted traffic types: Email, Chat, Streaming, File Transfer, VoIP, and P2P, and 6 corresponding VPN encrypted traffic types: VPN-Email, VPN-Chat, VPN-Streaming, VPN-File Transfer, VPN-VoIP, and VPN-P2P;
S2: constructing the CNN model; the convolutional neural network mainly comprises the following layers: an input layer, convolutional layers, ReLU layers, pooling layers, and a fully connected layer;
S3: pruning the model and retraining it; an optimized CNN model is obtained after several iterations;
S4: the optimized CNN model outputs a 256-dimensional feature vector as the input of the LightGBM classifier;
S5: LightGBM classification; the gradient decision trees in the LightGBM algorithm are obtained by iterating over a given training data set multiple times: at each iteration, a new tree is fitted to the gradient information and added to the trees from previous iterations; in function space, this is an additive linear combination process; the LightGBM uses the weights of all leaf nodes as a reference for building a tree, then determines the split points and computes the first-order and second-order gradients; after multiple iterations, the performance of the LightGBM classifier is optimized.
2. The encrypted traffic identification method based on a pruned convolutional neural network and machine learning according to claim 1, wherein the traffic data input in step S1 are captured with the Wireshark and tcpdump tools in a real environment, totaling 28 GB.
3. The encrypted traffic recognition method based on a pruned convolutional neural network and machine learning according to claim 1, wherein the step S1 data preprocessing process comprises:
s11: reading a pcap file;
s12: removing irrelevant messages;
s13: removing the Ethernet frame header;
s14: masking the IP address;
s15: checking whether the packet length is larger than the specified input size, if so, cutting off the data packet, otherwise, performing zero padding at the end of the data packet to generate a byte matrix;
s16: the data packet is normalized by dividing 255 in bytes so that the input size is between 0 and 1.
4. The encrypted traffic recognition method based on pruned convolutional neural network and machine learning according to claim 1, wherein step S2 forms a complete convolutional neural network by superimposing an input layer, a convolutional layer, a ReLU layer, a pool layer, and a full-connection layer;
wherein the convolution layer is a two-dimensional convolution;
the activation function ReLU is shown in equation (1):
ReLU(x)=max(0,x) (1)
batch normalization is shown in equation (2):

$\hat{\alpha}_i = \dfrac{\alpha_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$  (2)

wherein $\alpha_i$ is the original activation value of a neuron and $\hat{\alpha}_i$ is its normalized value after the normalization operation;
the loss function is shown in equation (3):

$L = -\sum_{c} y_c \log(p_c)$  (3)

the activation function Softmax of the output layer is shown in equation (4):

$S_j = \dfrac{e^{z_j}}{\sum_{t=1}^{T} e^{z_t}}$  (4)

the present model sets dropout to 0.5.
5. The encrypted traffic recognition method based on pruning convolutional neural network and machine learning according to claim 1, wherein the pruning process of step S3 is as follows:
s31: sorting weights of two adjacent layers of neurons according to absolute values;
s32: cutting out a weight with an absolute value smaller than 0.4 according to the pruning rate P, namely setting the weight to 0;
s33: after pruning, the model is retrained, and an optimized CNN model is obtained after a plurality of iterations.
6. The encrypted traffic recognition method based on a pruned convolutional neural network and machine learning according to claim 1, wherein the additive linear combination process of step S5 is as shown in formula (6):
ŷ_i = Σ_{q=1}^{Q} f_q(x_i), f_q ∈ χ (6)
wherein χ is the function space of the iteration trees, and f_q(x_i) represents the predicted value of the i-th instance given by the q-th tree.
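The additive combination of formula (6) can be sketched as summing per-tree predictions; the trees here are stand-in callables for illustration, not actual fitted decision trees:

```python
def ensemble_predict(trees, x):
    # Formula (6), sketched: the ensemble output for instance x is the sum of
    # the predictions f_q(x) of the Q iteration trees
    return sum(tree(x) for tree in trees)
```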
7. The encrypted traffic recognition method based on a pruned convolutional neural network and machine learning according to claim 1, wherein, for any given tree structure, the LightGBM defines the total number of times each feature is used for splitting in the iterative trees, T_split, and the sum of the gains obtained by the feature when used for splitting in all decision trees, T_gain, as measures of feature importance, specifically defined as follows:
wherein K denotes the K decision trees generated by K rounds of iteration.
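A hypothetical sketch of accumulating T_split and T_gain over the K trees is given below; representing each tree as a list of (feature, gain) split records is an assumption for illustration — LightGBM itself exposes these statistics through its built-in importance API rather than requiring manual accumulation:

```python
from collections import defaultdict


def feature_importance(trees):
    """Accumulate T_split (split counts) and T_gain (total split gain) per feature.

    `trees` is assumed to be a list of K trees, each given as a list of
    (feature, gain) pairs describing the splits made in that tree.
    """
    t_split = defaultdict(int)
    t_gain = defaultdict(float)
    for tree in trees:
        for feature, gain in tree:
            t_split[feature] += 1    # times the feature is used for splitting
            t_gain[feature] += gain  # total gain contributed by the feature
    return dict(t_split), dict(t_gain)
```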
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210337870.2A CN115334005B (en) | 2022-03-31 | 2022-03-31 | Encryption flow identification method based on pruning convolutional neural network and machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115334005A CN115334005A (en) | 2022-11-11 |
CN115334005B true CN115334005B (en) | 2024-03-22 |
Family
ID=83916441
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210337870.2A Active CN115334005B (en) | 2022-03-31 | 2022-03-31 | Encryption flow identification method based on pruning convolutional neural network and machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115334005B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116743506B (en) * | 2023-08-14 | 2023-11-21 | 南京信息工程大学 | Encrypted flow identification method and device based on quaternion convolutional neural network |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110472778A (en) * | 2019-07-29 | 2019-11-19 | 上海电力大学 | Short-term load forecasting method based on Blending ensemble learning |
CN112380781A (en) * | 2020-11-30 | 2021-02-19 | 中国人民解放军国防科技大学 | Satellite observation completion method based on reanalysis data and unbalanced learning |
WO2021088499A1 (en) * | 2019-11-04 | 2021-05-14 | 西安交通大学 | False invoice issuing identification method and system based on dynamic network representation |
CN113159109A (en) * | 2021-03-04 | 2021-07-23 | 北京邮电大学 | Wireless network flow prediction method based on data driving |
WO2021190379A1 (en) * | 2020-03-25 | 2021-09-30 | 第四范式(北京)技术有限公司 | Method and device for realizing automatic machine learning |
CN113489751A (en) * | 2021-09-07 | 2021-10-08 | 浙江大学 | Network traffic filtering rule conversion method based on deep learning |
CN113537497A (en) * | 2021-06-07 | 2021-10-22 | 贵州优联博睿科技有限公司 | Gradient lifting decision tree model construction optimization method based on dynamic sampling |
CN113779608A (en) * | 2021-09-17 | 2021-12-10 | 神谱科技(上海)有限公司 | Data protection method based on WOE mask in multi-party longitudinal federal learning LightGBM training |
CN113901448A (en) * | 2021-09-03 | 2022-01-07 | 燕山大学 | Intrusion detection method based on convolutional neural network and light gradient boosting machine |
WO2022041394A1 (en) * | 2020-08-28 | 2022-03-03 | 南京邮电大学 | Method and apparatus for identifying network encrypted traffic |
CN114189350A (en) * | 2021-10-20 | 2022-03-15 | 北京交通大学 | LightGBM-based train communication network intrusion detection method |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108932480B (en) * | 2018-06-08 | 2022-03-15 | 电子科技大学 | Distributed optical fiber sensing signal feature learning and classifying method based on 1D-CNN |
CN111860628A (en) * | 2020-07-08 | 2020-10-30 | 上海乘安科技集团有限公司 | Deep learning-based traffic identification and feature extraction method |
Non-Patent Citations (4)
Title |
---|
Research on optimizing the nonlinear fitting performance of Fourier neural networks; 陈诗雨; 李小勇; 杜杨杨; 谢福起; Journal of Wuhan University (Engineering Edition), No. 03; full text *
Research on condition assessment for distribution vacuum switch cabinets based on multi-source information fusion; chaoqun kang; 2015 5th International Conference on Electric Utility Deregulation and Restructuring and Power Technologies (DRPT); full text *
Network traffic classification method based on a one-dimensional convolutional neural network; 李道全; 王雪; 于波; 黄泰铭; Computer Engineering and Applications, No. 03; full text *
Ensemble learning and resampling balanced classification method for traffic; 顾兆军; 吴优; 赵春迪; 周景贤; Computer Engineering and Applications, No. 06; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113162908B (en) | Encrypted flow detection method and system based on deep learning | |
CN108768986B (en) | Encrypted traffic classification method, server and computer readable storage medium | |
CN114615093B (en) | Anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning | |
CN111565156B (en) | Method for identifying and classifying network traffic | |
CN112769752B (en) | Network intrusion detection method based on machine learning integration model | |
CN109831392A (en) | Semi-supervised net flow assorted method | |
CN108173704A (en) | A kind of method and device of the net flow assorted based on representative learning | |
CN111817971B (en) | Data center network flow splicing method based on deep learning | |
CN113992349B (en) | Malicious traffic identification method, device, equipment and storage medium | |
Coelho et al. | BACKORDERS: using random forests to detect DDoS attacks in programmable data planes | |
CN115334005B (en) | Encryption flow identification method based on pruning convolutional neural network and machine learning | |
CN110351303B (en) | DDoS feature extraction method and device | |
CN113472751A (en) | Encrypted flow identification method and device based on data packet header | |
Chen et al. | Ride: Real-time intrusion detection via explainable machine learning implemented in a memristor hardware architecture | |
Yan et al. | Principal Component Analysis Based Network Traffic Classification. | |
Min et al. | Online Internet traffic identification algorithm based on multistage classifier | |
CN112839051A (en) | Encryption flow real-time classification method and device based on convolutional neural network | |
CN113746707B (en) | Encrypted traffic classification method based on classifier and network structure | |
CN114124437B (en) | Encrypted flow identification method based on prototype convolutional network | |
CN116248530A (en) | Encryption flow identification method based on long-short-time neural network | |
CN115473748A (en) | DDoS attack classification detection method, device and equipment based on BiLSTM-ELM | |
CN112367325A (en) | Unknown protocol message clustering method and system based on closed frequent item mining | |
Chen et al. | Encapsulated and Anonymized Network Video Traffic Classification With Generative Models | |
Kong et al. | Fast abnormal identification for large scale internet traffic | |
Tatarnikova et al. | Detection of network attacks by deep learning method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||