CN115334005A - Encrypted flow identification method based on pruning convolution neural network and machine learning - Google Patents
- Publication number
- CN115334005A (Application CN202210337870.2A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- pruning
- convolutional neural
- model
- machine learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2483—Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
Abstract
The invention discloses an encrypted traffic identification method based on a pruned convolutional neural network and machine learning, comprising the steps of data preprocessing, CNN model construction, model pruning, extraction of high-level feature vectors by the CNN, and LightGBM classification. The method requires no manual feature extraction: the CNN model automatically extracts high-level features from raw traffic files and classifies them. Pruning the convolutional neural network model reduces the number of model parameters and the computational overhead. The LightGBM then classifies the encrypted traffic from these high-level features, achieving a strong classification effect with weak classifiers and improving accuracy, so that the final model attains higher performance and accuracy than other classification models.
Description
Technical Field
The invention relates to the technical field of network traffic identification, in particular to an encrypted traffic identification method based on a pruning convolutional neural network and machine learning.
Background
Network traffic identification techniques play an important role in applications such as network quality-of-service control, traffic charging, network resource planning, and malware detection. With the continuous development of network information technology, more and more software uses SSL, SSH, VPN, Tor, and other encryption or port-obfuscation technologies, and the proportion of encrypted traffic keeps rising.
According to the survey agency NetMarketShare, by October 2019 the proportion of encrypted Web traffic had exceeded ninety percent; 90 of the top 100 non-Google websites on the Internet use HTTPS by default; and globally the HTTPS proportion is 92% in the United States, 85% in Russia, 80% in Japan, and 74% in Indonesia. This change presents new challenges to current traffic-detection methods, making network traffic identification and analysis increasingly difficult.
The premise of traffic classification is that different traffic types exhibit distinctive characteristics. Current traffic classification methods can be roughly divided into the following categories:
1) Port-based classification. This method distinguishes traffic types by the port numbers the traffic uses, on the premise that application services use the ports allocated by the IANA and keep them unchanged.
2) Payload-based classification, also called deep packet inspection. Protocols are distinguished by static payload signatures, and the method can be used for some coarse-grained traffic classification.
3) Statistics-based classification. This category mostly uses machine learning techniques and distinguishes traffic types by statistical characteristics, which fall roughly into two levels: packet-level features such as packet length, inter-arrival time, and direction; and flow-level features such as the number of uplink and downlink packets, flow duration, and the proportions of different packet types.
Current traffic classification methods have the following disadvantages:
1) Port-based classification suffers a sharp drop in accuracy when application software uses ports outside the IANA assignments, and it cannot identify malware traffic, which typically uses random or dynamic ports.
2) Payload-based classification fails once encryption destroys the payload signatures it depends on; it is only suitable for coarse-grained classification or incompletely encrypted scenarios.
3) Deep-learning-based classification produces trained models with a huge number of parameters, which restricts where the model can be deployed.
Disclosure of Invention
Aiming at the technical problem that the classification model trained by a deep-learning-based method has a huge number of parameters, the invention provides an encrypted traffic identification method based on a pruned convolutional neural network and machine learning. Features need not be extracted manually: high-level features are automatically extracted from raw traffic files and classified. The model is pruned to reduce its parameter count, the convolutional neural network extracts features automatically, and the LightGBM achieves a strong classification effect with weak classifiers, so that the final model attains higher performance and accuracy than other classification models and is suitable for efficient detection of encrypted traffic.
In order to achieve the above purpose, the invention provides the following technical scheme:
the invention provides an encrypted flow identification method based on a pruning convolutional neural network and machine learning, which comprises the following steps of:
s1: preprocessing data;
S2: a CNN model is constructed; the convolutional neural network mainly comprises the following layers: an input layer, convolutional layers, ReLU layers, pooling layers, and a fully connected layer;
s3: pruning the model, retraining the model, and obtaining an optimized CNN model after a plurality of iterations;
s4: outputting a 256-dimensional characteristic vector serving as the input of the LightGBM classifier by the optimized CNN model;
s5: the LightGBM classification is characterized in that a gradient decision tree in a LightGBM algorithm is obtained by carrying out multiple iterations on a given training data set, during each iteration, a new tree is readjusted by using gradient information to add into a previous iteration tree, the process is a continuously-changing linear combination process in a function space, the LightGBM integrates weights of all leaf nodes as references for building the tree, then partition points are determined, a first-order gradient and a second-order gradient are calculated, and after multiple iterations, the performance of the LightGBM classifier is enabled to reach the optimum.
Compared with the prior art, the invention has the following beneficial effects:
According to the encrypted traffic identification method based on the pruned convolutional neural network and machine learning, no manual feature extraction is needed: high-level features are automatically extracted from raw traffic files by the CNN model and classified. Meanwhile, pruning the convolutional neural network model reduces the number of model parameters and the computational cost. The LightGBM classifies the encrypted traffic from these high-level features, achieving a strong classification effect with weak classifiers and improving accuracy, so that the final model attains higher performance and accuracy than other classification models.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an encrypted traffic identification method based on a pruning convolutional neural network and machine learning according to an embodiment of the present invention.
Fig. 2 is a flow chart of data preprocessing according to an embodiment of the present invention.
Fig. 3 is a flowchart of pruning steps provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention provides an encrypted traffic identification method based on a pruning convolutional neural network and machine learning, which comprises the following steps as shown in figure 1:
s1: data preprocessing: processing the original flow file to be suitable for standard input of a CNN model;
the encrypted traffic input at step S1 uses public data set iscxnvpn 2016, which contains 6 traditional encrypted traffic types: email, chat, streaming, file transfer, voIP and P2P,6 corresponding VPN encrypted flows: VPN-Email, VPN-Chat, VPN-Streaming, VPN-File transfer, VPN-VoIP, and VPN-P2P. The traffic data are obtained by Wireshark and tcpdump tools in real environment, and the total volume is 28GB.
The specific flow of the data preprocessing step is shown in fig. 2. The key points are as follows:
removing irrelevant messages: i.e. removing packets that affect the model prediction or that are payload empty. The traffic in the real environment may include some packets for establishing and disconnecting TCP, such as packets including SYN, ACK, or FIN flag bits, and some packets for domain name resolution and packets with empty payload, which do not work for traffic classification, but rather affect classification accuracy, and therefore need to be removed.
Removing the Ethernet frame header: the Ethernet frame header contains the MAC addresses used to locate network devices and to transfer packets between network nodes, but it is of little use for traffic classification, so it is deleted.
Masking the IP address: IP addresses cause the model to overfit in traffic classification, so the source and destination IP addresses are set to 0.
Checking the packet length: the method uses a convolutional neural network, which requires a fixed-size input, but packet lengths vary. Each packet's length is therefore checked: if it is smaller than the specified input size, the packet is zero-padded at the end; if it is larger, the packet is truncated. This ensures that every traffic packet matches the input size of the CNN model.
Normalization: different features often have different scales. To make the inputs comparable, each packet is normalized by dividing every byte by 255, so that all input values lie between 0 and 1.
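The length check, truncation/zero-padding, and byte-wise normalization above can be sketched as follows. This is a minimal illustration: the 30×30 input size is taken from Table 1, and the function name is an assumption, not from the patent.

```python
import numpy as np

PACKET_LEN = 900  # 30*30 bytes, matching the CNN input size in Table 1 (assumption)

def preprocess_packet(raw_bytes: bytes) -> np.ndarray:
    """Truncate or zero-pad a packet to PACKET_LEN bytes, then normalize to [0, 1]."""
    data = bytearray(raw_bytes[:PACKET_LEN])         # truncate if too long
    data.extend(b"\x00" * (PACKET_LEN - len(data)))  # zero-pad at the end if too short
    arr = np.frombuffer(bytes(data), dtype=np.uint8).astype(np.float32)
    return (arr / 255.0).reshape(30, 30)             # byte-wise division by 255

# A 20-byte packet is zero-padded up to 900 bytes and scaled into [0, 1].
img = preprocess_packet(b"\x10\xff" * 10)
```

Masking the IP addresses and stripping the Ethernet header would happen before this step, on the raw pcap records.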
S2: and (5) constructing a CNN model.
A convolutional neural network is a feed-forward neural network with a deep structure that includes convolution operations, and it is one of the most popular deep-learning algorithms today. With deepening learning theory and improving computing performance, convolutional neural networks have developed rapidly and are applied to computer vision, natural language processing, and other fields. The convolutional neural network mainly consists of the following layers: an input layer, convolutional layers, ReLU layers, pooling layers, and a fully connected layer. Stacking these layers forms a complete convolutional neural network, which finally outputs a 256-dimensional feature vector for the subsequent LightGBM classifier. Too high an output dimension overfits the result and increases cost, while too low a dimension reduces classification accuracy. The structure of the CNN model used in the invention is shown in Table 1; the key points are as follows:
a convolutional layer: conv2D, two-dimensional convolution, the flow data packet can be converted into a gray image, more suitable for processing with two-dimensional convolution.
Activation function: reLU, as shown in equation (1), activates a node only when the input is greater than 0, the output is 0 when the input is less than 0, and the output is equal to the input when the input is greater than 0. The function can remove negative values in the convolution result, leaving positive values unchanged.
ReLU(x)=max(0,x) (1)
Batch standardization: batch Normalization, similar to normal data Normalization, is a method for unifying scattered data and optimizing a neural network, and divides data into small batches for random gradient descent. As shown in formula (2), wherein α i Is the value of the original activation of a certain neuron,is a standard value after standardized operation.
Loss function: the cross-entropy loss (CrossEntropy Loss) measures the difference between the true probability distribution p and the predicted probability distribution q, as shown in formula (3); the smaller the cross-entropy, the better the model's predictions:

L = −Σ_x p(x) log q(x)    (3)

Activation function of the output layer: Softmax. When a sample passes through the Softmax layer, a T×1 vector is output, and the index of its largest entry is taken as the sample's predicted label. The formula is shown in (4):

Softmax(z)_j = e^{z_j} / Σ_{t=1}^{T} e^{z_t}    (4)
Dropout: some neurons are randomly deactivated during training to improve the robustness of the model; the model sets dropout to 0.5.
TABLE 1 CNN model Main parameters
Network layer | Operation | Input | Convolution kernel | Stride | Padding | Output | Number of weights
---|---|---|---|---|---|---|---
1 | Conv2D+ReLU+BN | 30×30 | 3×3 | 1 | Same | 8×30×30 | 80
2 | Conv2D+ReLU+BN | 8×30×30 | 3×3 | 2 | Same | 16×14×14 | 1168
3 | Conv2D+ReLU+BN | 16×14×14 | 3×3 | 2 | Same | 32×6×6 | 4640
4 | Conv2D+ReLU+BN | 32×6×6 | 3×3 | 1 | Same | 64×4×4 | 18496
5 | Fully connected + Dropout | 64×4×4 | — | — | — | 256 | 262400
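As a cross-check on Table 1, the layer shapes follow the standard convolution output-size formula. The small helper below is hypothetical, not part of the patent; note that the stride-2 rows (16×14×14 and 32×6×6) match valid (zero) padding rather than "same" padding, so the "Same" entries on those rows appear to be a translation artifact.

```python
def conv2d_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 0) -> int:
    """Standard convolution output size: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Layer 1: 30x30 input, 3x3 kernel, stride 1, "same" padding (p=1) -> 30x30
layer1 = conv2d_out(30, 3, 1, 1)
# Layer 2: stride 2 -- the table's 14 matches valid padding (p=0), not "same"
layer2 = conv2d_out(30, 3, 2, 0)
# Layer 3: stride 2, valid padding -> 6
layer3 = conv2d_out(14, 3, 2, 0)
# Layer 4: stride 1, valid padding -> 4
layer4 = conv2d_out(6, 3, 1, 0)
```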
S3: pruning the model, retraining the model, and obtaining an optimized CNN model after a plurality of iterations;
Generally speaking, the more layers and parameters a neural network has, the better its results, but also the more computing resources it consumes. Pruning can therefore remove parameters that have little influence on the prediction: neurons are ranked by their contribution to the output, low-contribution neurons are discarded, and the model gains a higher running speed and a smaller model file. As shown in Fig. 3, assuming the first layer has 4 neurons and the second layer has 5, the corresponding weight matrix is of size 4×5. The pruning process is as follows:
sorting the weights of the adjacent two layers of neurons according to the absolute value;
the smaller absolute value (e.g. 0.4) weight is pruned, i.e. set to 0, based on the pruning rate P.
After pruning, the model is retrained and an optimized CNN model is obtained after a plurality of iterations.
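The magnitude-based pruning in the steps above can be sketched in NumPy. This is a minimal illustration, not the patent's implementation: the function name is assumed, retraining is not shown, and ties at the threshold may prune slightly more than the nominal rate.

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, prune_rate: float) -> np.ndarray:
    """Zero out the fraction `prune_rate` of weights with the smallest |w|."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * prune_rate)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    # Strict comparison: weights tied with the threshold are pruned as well.
    mask = (np.abs(weights) > threshold).astype(weights.dtype)
    return weights * mask

# Toy example: with a 50% pruning rate, the two smallest-|w| weights
# (0.4 and 0.05) are set to 0, and the large weights are kept.
w = np.array([[0.4, -1.2], [0.05, 2.0]])
pruned = prune_by_magnitude(w, 0.5)
```

After zeroing, the model would be retrained with the mask held fixed, then pruned again for several iterations.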
S4: outputting a 256-dimensional characteristic vector serving as the input of the LightGBM classifier by the optimized CNN model;
s5: lightGBM classification.
LightGBM is a framework implementing the GBDT algorithm. GBDT is a long-standing model in machine learning whose main idea is to iteratively train weak classifiers (decision trees) to obtain an optimal model; it trains well and is hard to overfit. Compared with a conventional CNN fully connected layer used as a classifier, the LightGBM classifier supports efficient parallel training, trains faster with lower memory consumption, supports distributed fast processing of massive data, and lowers the deployment requirements of the detection model.
The gradient decision trees in the LightGBM algorithm are obtained by iterating multiple times over a given training data set; in each iteration, a new tree is fitted using gradient information and added to the trees from previous iterations. In function space, this is an evolving linear combination, as shown in formula (6):

ŷ_i = Σ_{q=1}^{Q} f_q(x_i),  f_q ∈ χ    (6)

where χ is the function space of the iterated trees and f_q(x_i) denotes the predicted value of the i-th instance in the q-th tree.
Each split node of a tree uses the optimal split point, so the tree-building process is in fact greedy. LightGBM aggregates the weights of all leaf nodes as the reference for building a tree, then determines the split points and computes the first-order and second-order gradients.
For any given tree structure, LightGBM defines two metrics of feature importance: T_Split, the total number of times each feature is used for splitting across the iterated trees, and T_Gain, the sum of the gains obtained whenever the feature is used for splitting in all decision trees, where K is the number of decision trees generated by K rounds of iteration.
After multiple iterations, the LightGBM classifier performance is optimized.
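The additive process of formula (6) can be illustrated with a toy pure-Python sketch that fits squared-error regression stumps to residuals. This shows only the gradient-fitting idea; it is not LightGBM itself, which uses histogram-based splits, leaf-wise tree growth, and both first- and second-order gradients.

```python
def fit_stump(xs, residuals):
    """Fit the best single-split regression stump minimizing squared error."""
    best = None
    for split in xs:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, split, lv, rv)
    _, split, lv, rv = best
    return lambda x: lv if x <= split else rv

def boost(xs, ys, rounds=20, lr=0.5):
    """F_q(x) = F_{q-1}(x) + lr * f_q(x): each stump fits the current residuals,
    which are the negative gradients of the squared loss."""
    trees, preds = [], [0.0] * len(xs)
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, resid)
        trees.append(tree)
        preds = [p + lr * tree(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * t(x) for t in trees)

# The ensemble of weak stumps converges toward the step-shaped target.
F = boost([0, 1, 2, 3], [0.0, 0.0, 1.0, 1.0])
```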
Compared with the original CNN model classification, the LightGBM improves the accuracy and recall rate and increases the recognition speed.
According to the encrypted traffic identification method based on the pruned convolutional neural network and machine learning provided by the invention, no manual feature extraction is needed: the CNN model automatically extracts high-level features from raw traffic files and classifies them. Meanwhile, pruning the convolutional neural network model reduces the number of model parameters and the computational cost. The LightGBM classifies the encrypted traffic from these high-level features, achieving a strong classification effect with weak classifiers and improving accuracy, so that the final model attains higher performance and accuracy than other classification models (see Table 2).
TABLE 2 comparison of the models of the present application with other classification models
Method | Accuracy | Recall | F1 score
---|---|---|---
1D-CNN | 0.89 | 0.89 | 0.89
CNN+LSTM | 0.91 | 0.91 | 0.91
SAE | 0.92 | 0.92 | 0.92
2D-CNN | 0.91 | 0.91 | 0.91
Model before pruning | 0.90 | 0.86 | 0.88
Model after pruning (this application) | 0.94 | 0.93 | 0.93
It should be noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between them. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element preceded by "comprising a/an" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, apparatus embodiments, electronic device embodiments, computer-readable storage medium embodiments, and computer program product embodiments are described with relative simplicity as they are substantially similar to method embodiments, where relevant only as described in portions of the method embodiments.
The above embodiments are merely specific embodiments of the present application, used to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited to them. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications, changes, or equivalent substitutions to the technical solutions described in the foregoing embodiments remain within the technical scope of the present disclosure and do not depart from its spirit and scope; such modifications, changes, or substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (9)
1. A method for identifying encrypted traffic based on a pruning convolutional neural network and machine learning is characterized by comprising the following steps:
s1: preprocessing data;
S2: a CNN model is constructed; the convolutional neural network mainly comprises the following layers: an input layer, convolutional layers, ReLU layers, pooling layers, and a fully connected layer;
s3: pruning the model, retraining the model, and obtaining an optimized CNN model after a plurality of iterations;
s4: outputting a 256-dimensional characteristic vector serving as input of the LightGBM classifier by the optimized CNN model;
s5: the LightGBM classification is characterized in that a gradient decision tree in a LightGBM algorithm is obtained by carrying out multiple iterations on a given training data set, during each iteration, a new tree is readjusted by using gradient information to add into a previous iteration tree, the process is a continuously-changing linear combination process in a function space, the LightGBM integrates weights of all leaf nodes as references for building the tree, then partition points are determined, a first-order gradient and a second-order gradient are calculated, and after multiple iterations, the performance of the LightGBM classifier is enabled to reach the optimum.
2. The method for identifying encrypted traffic based on a pruned convolutional neural network and machine learning of claim 1, wherein the encrypted traffic input in step S1 uses the public data set ISCXVPN2016, which contains 6 conventional encrypted traffic types: Email, Chat, Streaming, File transfer, VoIP, and P2P, and 6 corresponding VPN-encrypted flows: VPN-Email, VPN-Chat, VPN-Streaming, VPN-File transfer, VPN-VoIP, and VPN-P2P.
3. The encrypted traffic identification method based on the pruned convolutional neural network and machine learning of claim 2, wherein the traffic data input in step S1 are all captured with the Wireshark and tcpdump tools in a real environment, totaling 28 GB.
4. The encrypted traffic identification method based on the pruning convolutional neural network and the machine learning according to claim 1, wherein the data preprocessing process in the step S1 comprises:
s11: reading a pcap file;
s12: irrelevant messages are removed;
s13: removing the Ethernet frame header;
s14: covering the IP address;
s15: checking whether the packet length is larger than a specified input size, if so, truncating the data packet, otherwise, performing zero padding at the tail of the data packet to generate a byte matrix;
s16: the packets are normalized and divided by 255 in bytes so that the input sizes are all between 0 and 1.
5. The encrypted traffic identification method based on pruning convolutional neural network and machine learning according to claim 1, wherein step S2 is to form a complete convolutional neural network by stacking the input layer, convolutional layer, reLU layer, pool layer and full connection layer;
wherein the convolutional layer is a two-dimensional convolution;
the activation function ReLU is shown in equation (1):
ReLU(x)=max(0,x) (1)
batch normalization is shown in equation (2):
α̂_i = (α_i − μ) / √(σ² + ε)    (2)
where α_i is the original activation of a neuron, μ and σ² are the mini-batch mean and variance, ε is a small constant, and α̂_i is the value after the normalization operation;
the loss function is shown in equation (3):
L = −Σ_x p(x) log q(x)    (3)
the activation function Softmax of the output layer is shown in equation (4):
Softmax(z)_j = e^{z_j} / Σ_{t=1}^{T} e^{z_t}    (4)
the model sets dropout to 0.5.
6. The encrypted traffic identification method based on the pruning convolutional neural network and the machine learning according to claim 1, wherein the main parameters of the CNN model in the step S2 are:
7. The encrypted traffic identification method based on the pruned convolutional neural network and machine learning according to claim 1, wherein the pruning process in step S3 is as follows:
S31: sort the weights of the neurons in each pair of adjacent layers by absolute value;
S32: according to the pruning rate P, prune the weights whose absolute value is less than 0.4, i.e., set them to 0;
S33: after pruning, retrain the model; an optimized CNN model is obtained after several iterations.
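The magnitude pruning of S31–S32 can be sketched in NumPy (the function name is illustrative; sorting by absolute value is implicit in the thresholding, and the retraining of S33 is omitted):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, threshold: float = 0.4) -> np.ndarray:
    """S32: set to zero every weight whose absolute value is below the threshold."""
    pruned = weights.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

w = np.array([0.05, -0.6, 0.39, 0.8, -0.2])
pw = magnitude_prune(w)   # -> [0.0, -0.6, 0.0, 0.8, 0.0]
```

After this step the model would normally be retrained and the prune/retrain cycle repeated, as S33 describes.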
8. The encrypted traffic identification method based on the pruned convolutional neural network and machine learning of claim 1, wherein the continuously varying linear combination process of step S5 is as shown in equation (6):
ŷ_i = Σ_q f_q(x_i), f_q ∈ χ (6)
wherein χ is the function space of the iterated trees and f_q(x_i) denotes the predicted value of the i-th example in the q-th tree.
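The linear combination of equation (6) — the model output as the sum of all iterated trees' predictions — can be sketched as follows (the trees are stubbed as simple functions purely for illustration):

```python
# Each "tree" is stubbed as a function of the input that returns its score f_q(x).
trees = [lambda x: 0.5 * x, lambda x: x - 1.0, lambda x: 0.1]

def ensemble_predict(x, trees):
    # Equation (6): sum the per-tree predictions f_q(x) over all trees
    return sum(f(x) for f in trees)

y = ensemble_predict(2.0, trees)   # 0.5*2 + (2 - 1) + 0.1 = 2.1
```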
9. The encrypted traffic identification method based on the pruned convolutional neural network and machine learning of claim 1, wherein in step S5, for any given tree structure, LightGBM defines the total number of times T_Split that each feature is split in the iterated trees and the total gain T_Gain of all splits of the feature in all decision trees as the metrics for measuring feature importance, specifically defined as follows:
wherein K is the number of decision trees generated by K rounds of iteration.
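The T_Split and T_Gain definitions can be illustrated with a toy tree representation (the `(feature_index, split_gain)` records are a hypothetical structure; in LightGBM itself these quantities correspond to `Booster.feature_importance(importance_type="split")` and `importance_type="gain"`):

```python
from collections import Counter

# Hypothetical toy structure: each tree is a list of (feature_index, split_gain) records.
trees = [
    [(0, 1.2), (1, 0.5), (0, 0.3)],   # tree 1 splits feature 0 twice, feature 1 once
    [(1, 0.9), (2, 0.4)],             # tree 2 splits features 1 and 2 once each
]

t_split, t_gain = Counter(), Counter()
for tree in trees:                    # K = 2 decision trees
    for feat, gain in tree:
        t_split[feat] += 1            # T_Split: total number of splits on the feature
        t_gain[feat] += gain          # T_Gain: total gain of the feature over all trees

# t_split[0] == 2, t_split[1] == 2, t_split[2] == 1
```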
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210337870.2A CN115334005B (en) | 2022-03-31 | 2022-03-31 | Encryption flow identification method based on pruning convolutional neural network and machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115334005A true CN115334005A (en) | 2022-11-11 |
CN115334005B CN115334005B (en) | 2024-03-22 |
Family
ID=83916441
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210337870.2A Active CN115334005B (en) | 2022-03-31 | 2022-03-31 | Encryption flow identification method based on pruning convolutional neural network and machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115334005B (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180357542A1 (en) * | 2018-06-08 | 2018-12-13 | University Of Electronic Science And Technology Of China | 1D-CNN-Based Distributed Optical Fiber Sensing Signal Feature Learning and Classification Method |
CN110472778A (en) * | 2019-07-29 | 2019-11-19 | 上海电力大学 | A kind of short-term load forecasting method based on Blending integrated study |
CN111860628A (en) * | 2020-07-08 | 2020-10-30 | 上海乘安科技集团有限公司 | Deep learning-based traffic identification and feature extraction method |
CN112380781A (en) * | 2020-11-30 | 2021-02-19 | 中国人民解放军国防科技大学 | Satellite observation completion method based on reanalysis data and unbalanced learning |
WO2021088499A1 (en) * | 2019-11-04 | 2021-05-14 | 西安交通大学 | False invoice issuing identification method and system based on dynamic network representation |
CN113159109A (en) * | 2021-03-04 | 2021-07-23 | 北京邮电大学 | Wireless network flow prediction method based on data driving |
WO2021190379A1 (en) * | 2020-03-25 | 2021-09-30 | 第四范式(北京)技术有限公司 | Method and device for realizing automatic machine learning |
CN113489751A (en) * | 2021-09-07 | 2021-10-08 | 浙江大学 | Network traffic filtering rule conversion method based on deep learning |
CN113537497A (en) * | 2021-06-07 | 2021-10-22 | 贵州优联博睿科技有限公司 | Gradient lifting decision tree model construction optimization method based on dynamic sampling |
CN113779608A (en) * | 2021-09-17 | 2021-12-10 | 神谱科技(上海)有限公司 | Data protection method based on WOE mask in multi-party longitudinal federal learning LightGBM training |
CN113901448A (en) * | 2021-09-03 | 2022-01-07 | 燕山大学 | Intrusion detection method based on convolutional neural network and lightweight gradient elevator |
WO2022041394A1 (en) * | 2020-08-28 | 2022-03-03 | 南京邮电大学 | Method and apparatus for identifying network encrypted traffic |
CN114189350A (en) * | 2021-10-20 | 2022-03-15 | 北京交通大学 | LightGBM-based train communication network intrusion detection method |
Non-Patent Citations (5)
Title |
---|
CHAOQUN KANG: "Research on condition assessment for distribution vacuum switch cabinets based on multi-source information fusion", 2015 5th International Conference on Electric Utility Deregulation and Restructuring and Power Technologies (DRPT) * |
LI Daoquan; WANG Xue; YU Bo; HUANG Taiming: "Network traffic classification method based on one-dimensional convolutional neural network", Computer Engineering and Applications, no. 03 * |
DONG Hao; LI Ye: "Encrypted traffic identification in complex networks based on convolutional neural networks", Software Guide, no. 09 * |
CHEN Shiyu; LI Xiaoyong; DU Yangyang; XIE Fuqi: "Research on nonlinear fitting performance optimization of Fourier neural networks", Engineering Journal of Wuhan University, no. 03 * |
GU Zhaojun; WU You; ZHAO Chundi; ZHOU Jingxian: "Ensemble learning and resampling balanced classification method for traffic", Computer Engineering and Applications, no. 06 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116743506A (en) * | 2023-08-14 | 2023-09-12 | 南京信息工程大学 | Encrypted flow identification method and device based on quaternion convolutional neural network |
CN116743506B (en) * | 2023-08-14 | 2023-11-21 | 南京信息工程大学 | Encrypted flow identification method and device based on quaternion convolutional neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110730140B (en) | Deep learning flow classification method based on combination of space-time characteristics | |
CN109951444B (en) | Encrypted anonymous network traffic identification method | |
CN111191767B (en) | Vectorization-based malicious traffic attack type judging method | |
CN112769752B (en) | Network intrusion detection method based on machine learning integration model | |
CN111565156B (en) | Method for identifying and classifying network traffic | |
CN113989583A (en) | Method and system for detecting malicious traffic of internet | |
CN113472751B (en) | Encrypted flow identification method and device based on data packet header | |
CN114615093A (en) | Anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning | |
CN111817971B (en) | Data center network flow splicing method based on deep learning | |
CN113821793B (en) | Multi-stage attack scene construction method and system based on graph convolution neural network | |
Zhang et al. | Autonomous model update scheme for deep learning based network traffic classifiers | |
Soleymanpour et al. | An efficient deep learning method for encrypted traffic classification on the web | |
CN111367908A (en) | Incremental intrusion detection method and system based on security assessment mechanism | |
CN114513367B (en) | Cellular network anomaly detection method based on graph neural network | |
CN115334005A (en) | Encrypted flow identification method based on pruning convolution neural network and machine learning | |
Chen et al. | Ride: Real-time intrusion detection via explainable machine learning implemented in a memristor hardware architecture | |
Yujie et al. | End-to-end android malware classification based on pure traffic images | |
CN112839051A (en) | Encryption flow real-time classification method and device based on convolutional neural network | |
US11461590B2 (en) | Train a machine learning model using IP addresses and connection contexts | |
Dener et al. | RFSE-GRU: Data balanced classification model for mobile encrypted traffic in big data environment | |
CN111291078A (en) | Domain name matching detection method and device | |
CN113746707B (en) | Encrypted traffic classification method based on classifier and network structure | |
Wanode et al. | Optimal feature set selection for IoT device fingerprinting on edge infrastructure using machine intelligence | |
Li et al. | Fden: Mining effective information of features in detecting network anomalies | |
Nigmatullin et al. | Accumulated Generalized Mean Value-a New Approach to Flow-Based Feature Generation for Encrypted Traffic Characterization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||