CN115334005A - Encrypted flow identification method based on pruning convolution neural network and machine learning - Google Patents

Encrypted flow identification method based on pruning convolution neural network and machine learning Download PDF

Info

Publication number
CN115334005A
CN115334005A (application CN202210337870.2A)
Authority
CN
China
Prior art keywords
neural network
pruning
convolutional neural
model
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210337870.2A
Other languages
Chinese (zh)
Other versions
CN115334005B (en
Inventor
李小勇
栗仕超
刘芸杉
亢超群
李二霞
李灵慧
苑洁
高雅丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Online Shanghai Energy Internet Research Institute Co ltd
Beijing University of Posts and Telecommunications
Original Assignee
China Online Shanghai Energy Internet Research Institute Co ltd
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Online Shanghai Energy Internet Research Institute Co ltd, Beijing University of Posts and Telecommunications filed Critical China Online Shanghai Energy Internet Research Institute Co ltd
Priority to CN202210337870.2A priority Critical patent/CN115334005B/en
Publication of CN115334005A publication Critical patent/CN115334005A/en
Application granted granted Critical
Publication of CN115334005B publication Critical patent/CN115334005B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/24Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/24Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2483Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/04Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an encrypted traffic identification method based on a pruned convolutional neural network and machine learning, comprising the steps of data preprocessing, CNN model construction, model pruning, extraction of high-level feature vectors by the CNN, and LightGBM classification. The method requires no manual feature extraction: the CNN model automatically extracts high-level features from raw traffic files and classifies them. At the same time, the pruned convolutional neural network model reduces the number of model parameters and the computational overhead, and LightGBM classifies the encrypted traffic from its high-level features, using weak classifiers to achieve a strong classification effect and improve accuracy. The final model achieves higher performance and accuracy than other classification models.

Description

Encrypted flow identification method based on pruning convolution neural network and machine learning
Technical Field
The invention relates to the technical field of network traffic identification, in particular to an encrypted traffic identification method based on a pruning convolutional neural network and machine learning.
Background
Network traffic identification techniques play an important role in applications such as network quality-of-service control, traffic charging, network resource usage planning, and malware detection. With the continuous development of network information technology, more and more software uses SSL, SSH, VPN, Tor and other encryption or port-obfuscation technologies, and the proportion of encrypted traffic keeps rising.
According to the survey agency NetMarketShare, by October 2019 the proportion of encrypted Web traffic had exceeded ninety percent, 90 of the top 100 non-Google websites on the internet used HTTPS by default, and globally the HTTPS proportion was 92% in the United States, 85% in Russia, 80% in Japan and 74% in Indonesia. This change poses new challenges to current traffic detection methods and makes network traffic identification and analysis increasingly difficult.
Traffic classification rests on the premise that different traffic types have distinguishable characteristics. Current traffic classification methods can be roughly divided into the following categories:
1) Port-based classification methods. These distinguish traffic types according to the port numbers used, on the premise that application services all use the ports allocated by the IANA and keep them unchanged.
2) Payload-based classification methods. Also called deep packet inspection, these distinguish protocols according to static payload characteristics and can be used for some coarse-grained traffic classification.
3) Statistics-based classification methods. These mostly adopt machine learning techniques and distinguish traffic types according to statistical characteristics, which fall roughly into two groups, packet level and flow level: the former includes packet length, packet inter-arrival time and direction, while the latter includes the number of uplink and downlink packets, the duration of the network flow, the proportion of different packet types, and so on.
Current traffic classification methods have the following disadvantages:
1) The accuracy of the port-based classification method drops sharply when application software uses ports beyond those assigned by the IANA, and malware traffic often uses random or dynamic ports, so the method cannot identify it.
2) The payload-based classification method fails once encryption destroys the payload characteristics it relies on, and it is only suitable for coarse-grained classification or scenarios where traffic is not fully encrypted.
3) The deep-learning-based classification method produces trained classification models with a huge number of parameters, which limits where the models can be deployed.
Disclosure of Invention
Aiming at the technical problem that classification models trained with deep-learning-based methods have a huge number of parameters, the invention provides an encrypted traffic identification method based on a pruned convolutional neural network and machine learning. Features need not be extracted manually: high-level features are automatically extracted from raw traffic files and classified, the model is pruned to reduce its parameter count, the convolutional neural network performs the feature extraction automatically, and LightGBM achieves a strong classification effect from weak classifiers. The final model achieves higher performance and accuracy than other classification models and is suitable for efficient detection of encrypted traffic.
In order to achieve the above purpose, the invention provides the following technical scheme:
The invention provides an encrypted traffic identification method based on a pruned convolutional neural network and machine learning, which comprises the following steps:
s1: preprocessing data;
s2: constructing a CNN model; the convolutional neural network mainly comprises the following layers: an input layer, convolutional layers, ReLU layers, pooling layers and a fully connected layer;
s3: pruning the model, retraining the model, and obtaining an optimized CNN model after a plurality of iterations;
s4: the optimized CNN model outputs a 256-dimensional feature vector, which serves as the input of the LightGBM classifier;
s5: LightGBM classification, wherein the gradient decision trees in the LightGBM algorithm are obtained by iterating multiple times over a given training data set; in each iteration a new tree is fitted using gradient information and added to the trees of the previous iterations, so that in function space the process is a continuously growing linear combination; LightGBM aggregates the weights of all leaf nodes as a reference for building the tree, then determines the split points and computes the first-order and second-order gradients; after multiple iterations, the performance of the LightGBM classifier reaches its optimum.
Compared with the prior art, the invention has the following beneficial effects:
according to the encrypted flow identification method based on the pruning convolutional neural network and machine learning, manual feature extraction is not needed, high-level features are automatically extracted from original flow files by using a CNN model and are classified, meanwhile, the pruning convolutional neural network model is constructed, the number of model parameters is reduced, the calculation cost is reduced, the lightGBM is used for classification according to the high-level features of encrypted flow, a weak classifier is used for achieving a strong classification effect, the accuracy is improved, and the final model can achieve higher performance and accuracy than other classification models.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a flowchart of an encrypted traffic identification method based on a pruning convolutional neural network and machine learning according to an embodiment of the present invention.
Fig. 2 is a flow chart of data preprocessing according to an embodiment of the present invention.
Fig. 3 is a flowchart of pruning steps provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention provides an encrypted traffic identification method based on a pruned convolutional neural network and machine learning which, as shown in fig. 1, comprises the following steps:
S1: data preprocessing: processing the raw traffic files so that they are suitable as standard input for the CNN model.
The encrypted traffic input in step S1 uses the public data set ISCXVPN2016, which contains 6 traditional encrypted traffic types: Email, Chat, Streaming, File transfer, VoIP and P2P, and 6 corresponding VPN-encrypted traffic types: VPN-Email, VPN-Chat, VPN-Streaming, VPN-File transfer, VPN-VoIP and VPN-P2P. The traffic data were all captured with the Wireshark and tcpdump tools in a real environment and total 28 GB.
The specific flow of the data preprocessing step is shown in fig. 2. The key points are as follows:
Removing irrelevant messages: i.e. removing packets that would disturb the model's prediction or whose payload is empty. Traffic in a real environment may include packets used for establishing and tearing down TCP connections, such as packets carrying the SYN, ACK or FIN flag bits, as well as domain-name-resolution packets and packets with an empty payload; these contribute nothing to traffic classification and instead harm classification accuracy, so they need to be removed.
Removing the Ethernet frame header: the Ethernet frame header contains the MAC addresses used to locate network devices and to transmit data packets between network nodes, but it has little value in traffic classification, so it is deleted.
Masking the IP address: IP addresses cause the model to overfit in traffic classification, so the source and destination IP addresses are set to 0.
Checking the packet length: the method uses a convolutional neural network, which requires fixed-size input, but packet lengths vary. The packet length is therefore checked: if it is smaller than the specified input size, zeros are padded at the end of the packet; if it is larger, the packet is truncated. This ensures that every traffic packet matches the input size of the CNN model.
Normalization: different evaluation indexes often have different scales. To make the data comparable, each packet is normalized byte by byte, dividing every byte by 255 so that all inputs lie between 0 and 1.
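By way of illustration only, a minimal preprocessing sketch following the steps above might look like the following. It assumes a 30×30-byte input (matching the CNN input of Table 1) and uses Scapy for packet parsing; the function names and the particular control-packet filter are hypothetical and are not taken from the patent.

```python
# Hypothetical preprocessing sketch; input size and filtering rules are assumptions.
import numpy as np
from scapy.all import rdpcap, Ether, IP, TCP, DNS

INPUT_SIZE = 30 * 30  # 900 bytes, assuming the 30*30 CNN input of Table 1

def packet_to_vector(pkt):
    """Turn one packet into a normalized fixed-length vector, or None if it is irrelevant."""
    # Remove irrelevant messages: DNS packets and TCP control packets with no payload (SYN/ACK/FIN).
    if pkt.haslayer(DNS):
        return None
    if pkt.haslayer(TCP) and len(bytes(pkt[TCP].payload)) == 0:
        return None
    # Mask source and destination IP addresses to avoid overfitting on endpoints.
    if pkt.haslayer(IP):
        pkt[IP].src = "0.0.0.0"
        pkt[IP].dst = "0.0.0.0"
    # Remove the Ethernet frame header by keeping only its payload.
    raw = bytes(pkt[Ether].payload) if pkt.haslayer(Ether) else bytes(pkt)
    # Pad with zeros or truncate to the fixed CNN input length.
    raw = raw[:INPUT_SIZE].ljust(INPUT_SIZE, b"\x00")
    # Normalize byte values to [0, 1].
    return np.frombuffer(raw, dtype=np.uint8).astype(np.float32) / 255.0

def pcap_to_tensor(path):
    """Read a pcap file and return an (N, 1, 30, 30) array of preprocessed packets."""
    vectors = [packet_to_vector(p) for p in rdpcap(path)]
    return np.stack([v for v in vectors if v is not None]).reshape(-1, 1, 30, 30)
```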
S2: constructing the CNN model.
A convolutional neural network is a feed-forward neural network with a deep structure that includes convolution operations, and it is one of the most popular deep learning algorithms today. With advances in learning theory and improvements in computing performance, convolutional neural networks have developed rapidly and are applied to computer vision, natural language processing and other fields. The convolutional neural network is mainly composed of the following layers: input layer, convolutional layer, ReLU layer, pooling layer and fully connected layer. Stacking these layers forms a complete convolutional neural network, which finally outputs a 256-dimensional feature vector for the subsequent LightGBM classifier. If the output feature vector dimension is too high, the result overfits and the cost rises; if it is too low, classification accuracy drops. The structure of the CNN model used in the invention is shown in Table 1, with the key points as follows:
Convolutional layer: Conv2D, i.e. two-dimensional convolution; a traffic packet can be converted into a grayscale image, which is better suited to processing with two-dimensional convolution.
Activation function: reLU, as shown in equation (1), activates a node only when the input is greater than 0, the output is 0 when the input is less than 0, and the output is equal to the input when the input is greater than 0. The function can remove negative values in the convolution result, leaving positive values unchanged.
ReLU(x)=max(0,x) (1)
Batch standardization: Batch Normalization, like ordinary data normalization, is a method for unifying scattered data and optimizing a neural network; the data are divided into small batches for stochastic gradient descent. It is shown in formula (2), where α_i is the original activation value of a neuron, α̂_i is its value after the standardization operation, μ_B and σ_B² are the mean and variance of the activations over the mini-batch, and ε is a small constant:
α̂_i = (α_i − μ_B) / √(σ_B² + ε)  (2)
Loss function: the cross-entropy loss (CrossEntropy Loss), shown in formula (3), measures the difference between the true probability distribution p and the predicted probability distribution q; the smaller the cross entropy, the better the model's predictions:
L = − Σ_i p(x_i) log q(x_i)  (3)
Activation function of the output layer: Softmax. When a sample passes through the Softmax layer, a T×1 vector is output, and the index of the largest element of the vector is taken as the predicted label of the sample, as shown in formula (4), where z_i is the i-th input of the Softmax layer and T is the number of classes:
S_i = e^{z_i} / Σ_{j=1}^{T} e^{z_j}  (4)
Dropout: some neurons are randomly deactivated during training to improve the robustness of the model; the model sets dropout to 0.5.
TABLE 1 Main parameters of the CNN model

Layer | Operation | Input | Kernel | Stride | Padding | Output | Weights
1 | Conv2D+ReLU+BN | 30×30 | 3×3 | 1 | Same | 8×30×30 | 80
2 | Conv2D+ReLU+BN | 8×30×30 | 3×3 | 2 | Same | 16×14×14 | 1168
3 | Conv2D+ReLU+BN | 16×14×14 | 3×3 | 2 | Same | 32×6×6 | 4640
4 | Conv2D+ReLU+BN | 32×6×6 | 3×3 | 1 | Same | 64×4×4 | 18496
5 | Fully connected + Dropout | 64×4×4 | None | None | None | 256 | 262400
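For illustration only, a PyTorch sketch of a network consistent with Table 1 might look as follows. It is a minimal sketch, assuming a single-channel 30×30 input and, for example, 12 output classes; the per-layer paddings are chosen so that the output shapes and weight counts reproduce the table, and none of the names come from the patent.

```python
# Hypothetical PyTorch sketch of the Table 1 CNN (an assumption, not the patent's code).
import torch
import torch.nn as nn

class TrafficCNN(nn.Module):
    def __init__(self, num_classes=12, feature_dim=256):
        super().__init__()
        def block(cin, cout, stride, padding):
            # Conv2D + ReLU + Batch Normalization, as in Table 1
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=3, stride=stride, padding=padding),
                nn.ReLU(inplace=True),
                nn.BatchNorm2d(cout),
            )
        self.features = nn.Sequential(
            block(1, 8, stride=1, padding=1),    # 1x30x30 -> 8x30x30   (80 weights)
            block(8, 16, stride=2, padding=0),   # -> 16x14x14          (1168 weights)
            block(16, 32, stride=2, padding=0),  # -> 32x6x6            (4640 weights)
            block(32, 64, stride=1, padding=0),  # -> 64x4x4            (18496 weights)
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, feature_dim),  # 256-d high-level feature vector (262400 weights)
            nn.Dropout(0.5),
        )
        self.classifier = nn.Linear(feature_dim, num_classes)  # trained with nn.CrossEntropyLoss

    def forward(self, x, return_features=False):
        feats = self.fc(self.features(x))
        return feats if return_features else self.classifier(feats)
```

During CNN training the classifier head and cross-entropy loss would be used; for step S4, calling the model with return_features=True exposes the 256-dimensional vector for the LightGBM stage.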
S3: pruning the model, retraining the model, and obtaining an optimized CNN model after a plurality of iterations;
Generally speaking, the more layers and parameters a neural network has, the better its results, but also the more computing resources it consumes. Parameters that have little influence on the prediction result can therefore be removed with a pruning technique: neurons are ranked by their contribution to the model's output, and those with low contribution are discarded, giving a model that runs faster and produces smaller model files. As shown in fig. 3, assuming the first layer has 4 neurons and the second layer has 5 neurons, the corresponding weight matrix has size 4 × 5. The pruning process is as follows:
sorting the weights between the two adjacent layers of neurons by absolute value;
pruning the weights with small absolute value (e.g. 0.4) according to the pruning rate P, i.e. setting them to 0.
After pruning, the model is retrained, and an optimized CNN model is obtained after several iterations.
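A minimal sketch of this prune-and-retrain loop, under the assumption that the pruning is unstructured magnitude (L1) pruning applied with PyTorch's pruning utilities, might look as follows; the train_fn retraining callback, the pruning rate and the iteration count are illustrative placeholders, not values from the patent.

```python
# Hypothetical magnitude-pruning sketch; the pruning criterion and schedule are assumptions.
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_and_retrain(model, train_fn, prune_rate=0.5, iterations=3):
    """Zero the smallest-magnitude weights, then retrain so the remaining weights compensate."""
    prunable = [(m, "weight") for m in model.modules()
                if isinstance(m, (nn.Conv2d, nn.Linear))]
    for _ in range(iterations):
        for module, name in prunable:
            # Zero the fraction `prune_rate` of weights with the smallest absolute value.
            prune.l1_unstructured(module, name=name, amount=prune_rate)
        train_fn(model)  # retrain (fine-tune) the pruned model
    for module, name in prunable:
        prune.remove(module, name)  # make the zeroed weights permanent
    return model
```

With the TrafficCNN sketch above, prune_and_retrain(model, train_fn) would correspond to steps S31–S33 for a chosen pruning rate.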
S4: the optimized CNN model outputs a 256-dimensional feature vector, which serves as the input of the LightGBM classifier;
s5: lightGBM classification.
LightGBM is a framework implementing the GBDT algorithm. GBDT is a long-established model in machine learning; its main idea is to train weak classifiers (decision trees) iteratively to obtain an optimal model, and it has the advantages of training well and being hard to overfit. Compared with the conventional CNN fully connected layer used as a classifier, the LightGBM classifier supports efficient parallel training, trains faster, consumes less memory, supports distributed processing of massive data, and lowers the deployment requirements of the detection model.
The gradient decision trees in the LightGBM algorithm are obtained by iterating multiple times over a given training data set. In each iteration a new tree is fitted using gradient information and added to the trees of the previous iterations, so that in function space the process is a continuously growing linear combination, as shown in formula (6):
ŷ_i = Σ_{q=1}^{Q} f_q(x_i),  f_q ∈ χ  (6)
where ŷ_i is the prediction for the i-th instance after Q iterations, χ is the function space of the iteration trees, and f_q(x_i) denotes the predicted value of the i-th instance in the q-th tree.
Each split node of a tree uses the optimal split point, so building the tree model is effectively a greedy procedure. LightGBM aggregates the weights of all leaf nodes as a reference for constructing the tree, then determines the split points and computes the first-order and second-order gradients.
For any given tree structure, LightGBM defines, for each feature, the total number of times the feature is used for splitting in the iteration trees, T_Split, and the sum of the gains obtained when the feature is used for splitting in all decision trees, T_Gain, as the metrics of feature importance:
T_Split_j = Σ_{k=1}^{K} split_j^(k)  (7)
T_Gain_j = Σ_{k=1}^{K} gain_j^(k)  (8)
where K is the number of decision trees generated by K rounds of iteration, and split_j^(k) and gain_j^(k) are the number of splits on feature j and the gain obtained from feature j in the k-th tree.
After multiple iterations, the performance of the LightGBM classifier reaches its optimum.
Compared with classification by the original CNN model, LightGBM improves the accuracy and recall and increases the recognition speed.
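A sketch of steps S4 and S5 is given below, under the assumption that the pruned CNN (the TrafficCNN sketch above) is used as a fixed feature extractor and that train_loader and test_loader are PyTorch DataLoaders over the preprocessed packets; the LightGBM hyperparameters shown are illustrative only, not the patent's settings.

```python
# Hypothetical S4/S5 sketch: 256-d CNN features fed to a LightGBM classifier.
import numpy as np
import torch
import lightgbm as lgb

@torch.no_grad()
def extract_features(model, loader):
    """Run the pruned CNN over a data loader and collect 256-d feature vectors and labels."""
    model.eval()
    feats, labels = [], []
    for x, y in loader:
        feats.append(model(x, return_features=True).cpu().numpy())
        labels.append(y.numpy())
    return np.concatenate(feats), np.concatenate(labels)

X_train, y_train = extract_features(cnn, train_loader)  # cnn: pruned TrafficCNN
X_test, y_test = extract_features(cnn, test_loader)

clf = lgb.LGBMClassifier(objective="multiclass", n_estimators=200, num_leaves=31)
clf.fit(X_train, y_train)
print("accuracy:", (clf.predict(X_test) == y_test).mean())

# Split- and gain-based importances, corresponding to the T_Split / T_Gain measures of (7) and (8).
t_split = clf.booster_.feature_importance(importance_type="split")
t_gain = clf.booster_.feature_importance(importance_type="gain")
```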
According to the encrypted traffic identification method based on a pruned convolutional neural network and machine learning provided by the invention, features need not be extracted manually: the CNN model automatically extracts high-level features from raw traffic files and classifies them. At the same time, pruning the convolutional neural network reduces the number of model parameters and the computational cost, and LightGBM classifies the encrypted traffic from its high-level features, using weak classifiers to achieve a strong classification effect and improve accuracy. The final model achieves higher performance and accuracy than other classification models (see Table 2).
TABLE 2 Comparison of the model of the present application with other classification models

Method | Accuracy | Recall | F1 score
1D CNN | 0.89 | 0.89 | 0.89
CNN+LSTM | 0.91 | 0.91 | 0.91
SAE | 0.92 | 0.92 | 0.92
2D-CNN | 0.91 | 0.91 | 0.91
Model before pruning | 0.90 | 0.86 | 0.88
Model after pruning (this application) | 0.94 | 0.93 | 0.93
It should be noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between those entities or actions. Also, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises the element.
All the embodiments in this specification are described in a progressive manner; the same or similar parts of the embodiments can be referred to each other, and each embodiment focuses on its differences from the others. In particular, the apparatus, electronic device, computer-readable storage medium and computer program product embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The above embodiments are only specific embodiments of the present application, used to illustrate its technical solutions rather than limit them, and the scope of protection of the present application is not limited to them. Although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, easily conceive of changes to them, or replace some of their technical features with equivalents within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the present disclosure and are intended to be covered by the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A method for identifying encrypted traffic based on a pruned convolutional neural network and machine learning, characterized by comprising the following steps:
s1: preprocessing data;
s2: constructing a CNN model; the convolutional neural network mainly comprises the following layers: an input layer, convolutional layers, ReLU layers, pooling layers and a fully connected layer;
s3: pruning the model, retraining the model, and obtaining an optimized CNN model after a plurality of iterations;
s4: the optimized CNN model outputs a 256-dimensional feature vector, which serves as the input of the LightGBM classifier;
s5: LightGBM classification, wherein the gradient decision trees in the LightGBM algorithm are obtained by iterating multiple times over a given training data set; in each iteration a new tree is fitted using gradient information and added to the trees of the previous iterations, so that in function space the process is a continuously growing linear combination; LightGBM aggregates the weights of all leaf nodes as a reference for building the tree, then determines the split points and computes the first-order and second-order gradients; after multiple iterations, the performance of the LightGBM classifier reaches its optimum.
2. The method for identifying encrypted traffic based on a pruned convolutional neural network and machine learning of claim 1, wherein the encrypted traffic input in step S1 uses the public data set ISCXVPN2016, which contains 6 traditional encrypted traffic types: Email, Chat, Streaming, File transfer, VoIP and P2P, and 6 corresponding VPN-encrypted traffic types: VPN-Email, VPN-Chat, VPN-Streaming, VPN-File transfer, VPN-VoIP and VPN-P2P.
3. The encrypted traffic identification method based on a pruned convolutional neural network and machine learning of claim 2, wherein the traffic data input in step S1 are all captured with the Wireshark and tcpdump tools in a real environment and total 28 GB.
4. The encrypted traffic identification method based on the pruning convolutional neural network and the machine learning according to claim 1, wherein the data preprocessing process in the step S1 comprises:
s11: reading a pcap file;
s12: irrelevant messages are removed;
s13: removing the Ethernet frame header;
s14: covering the IP address;
s15: checking whether the packet length is larger than the specified input size; if so, truncating the data packet, otherwise padding zeros at the end of the data packet, to generate a byte matrix;
s16: normalizing the packets by dividing each byte by 255, so that the inputs all lie between 0 and 1.
5. The encrypted traffic identification method based on a pruned convolutional neural network and machine learning of claim 1, wherein in step S2 a complete convolutional neural network is formed by stacking the input layer, convolutional layers, ReLU layers, pooling layers and fully connected layer;
wherein the convolutional layer is a two-dimensional convolution;
the activation function ReLU is shown in equation (1):
ReLU(x)=max(0,x)  (1)
batch normalization is shown in equation (2):
α̂_i = (α_i − μ_B) / √(σ_B² + ε)  (2)
wherein α_i is the original activation value of a neuron, α̂_i is its value after the standardization operation, μ_B and σ_B² are the mean and variance of the activations over the mini-batch, and ε is a small constant;
the loss function is shown in equation (3):
L = − Σ_i p(x_i) log q(x_i)  (3)
wherein p is the true probability distribution and q is the predicted probability distribution;
the activation function Softmax of the output layer is shown in equation (4):
S_i = e^{z_i} / Σ_{j=1}^{T} e^{z_j}  (4)
the model sets dropout to 0.5.
6. The encrypted traffic identification method based on the pruning convolutional neural network and the machine learning according to claim 1, wherein the main parameters of the CNN model in the step S2 are:
network layer Operation of Input device Convolution kernel Step size Filling in Output the output Number of weights 1 Conv2D+ReLU+BN 30*30 3*3 1 Same 8*30*30 80 2 Conv2D+ReLU+BN 8*30*30 3*3 2 Same 16*14*14 1168 3 Conv2D+ReLU+BN 16*14*14 3*3 2 Same 32*6*6 4640 4 Conv2D+ReLU+BN 32*6*6 3*3 1 Same 64*4*4 18496 5 Full connection + Dropout 64*4*4 Null Null None 256 262400
7. The encrypted traffic identification method based on the pruning convolutional neural network and the machine learning according to claim 1, wherein the pruning process in the step S3 is as follows:
s31: sorting the weights of the adjacent two layers of neurons according to the absolute value;
s32: according to the pruning rate P, pruning the weights whose absolute value is less than 0.4, i.e. setting them to 0;
s33: after pruning, the model is retrained and an optimized CNN model is obtained after a plurality of iterations.
8. The encrypted traffic identification method based on a pruned convolutional neural network and machine learning of claim 1, wherein the continuously growing linear combination process of step S5 is as shown in formula (6):
ŷ_i = Σ_{q=1}^{Q} f_q(x_i),  f_q ∈ χ  (6)
wherein ŷ_i is the prediction for the i-th instance after Q iterations, χ is the function space of the iteration trees, and f_q(x_i) denotes the predicted value of the i-th instance in the q-th tree.
9. The method for identifying encrypted traffic based on a pruned convolutional neural network and machine learning of claim 1, wherein in step S5, for any given tree structure, LightGBM defines the total number of times each feature is used for splitting in the iteration trees, T_Split, and the total gain obtained when the feature is used for splitting in all decision trees, T_Gain, as the metrics of feature importance, specifically defined as follows:
T_Split_j = Σ_{k=1}^{K} split_j^(k)  (7)
T_Gain_j = Σ_{k=1}^{K} gain_j^(k)  (8)
wherein K is the number of decision trees generated by K rounds of iteration, and split_j^(k) and gain_j^(k) are the number of splits on feature j and the gain from feature j in the k-th tree.
CN202210337870.2A 2022-03-31 2022-03-31 Encryption flow identification method based on pruning convolutional neural network and machine learning Active CN115334005B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210337870.2A CN115334005B (en) 2022-03-31 2022-03-31 Encryption flow identification method based on pruning convolutional neural network and machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210337870.2A CN115334005B (en) 2022-03-31 2022-03-31 Encryption flow identification method based on pruning convolutional neural network and machine learning

Publications (2)

Publication Number Publication Date
CN115334005A true CN115334005A (en) 2022-11-11
CN115334005B CN115334005B (en) 2024-03-22

Family

ID=83916441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210337870.2A Active CN115334005B (en) 2022-03-31 2022-03-31 Encryption flow identification method based on pruning convolutional neural network and machine learning

Country Status (1)

Country Link
CN (1) CN115334005B (en)



Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180357542A1 (en) * 2018-06-08 2018-12-13 University Of Electronic Science And Technology Of China 1D-CNN-Based Distributed Optical Fiber Sensing Signal Feature Learning and Classification Method
CN110472778A (en) * 2019-07-29 2019-11-19 上海电力大学 A kind of short-term load forecasting method based on Blending integrated study
WO2021088499A1 (en) * 2019-11-04 2021-05-14 西安交通大学 False invoice issuing identification method and system based on dynamic network representation
WO2021190379A1 (en) * 2020-03-25 2021-09-30 第四范式(北京)技术有限公司 Method and device for realizing automatic machine learning
CN111860628A (en) * 2020-07-08 2020-10-30 上海乘安科技集团有限公司 Deep learning-based traffic identification and feature extraction method
WO2022041394A1 (en) * 2020-08-28 2022-03-03 南京邮电大学 Method and apparatus for identifying network encrypted traffic
CN112380781A (en) * 2020-11-30 2021-02-19 中国人民解放军国防科技大学 Satellite observation completion method based on reanalysis data and unbalanced learning
CN113159109A (en) * 2021-03-04 2021-07-23 北京邮电大学 Wireless network flow prediction method based on data driving
CN113537497A (en) * 2021-06-07 2021-10-22 贵州优联博睿科技有限公司 Gradient lifting decision tree model construction optimization method based on dynamic sampling
CN113901448A (en) * 2021-09-03 2022-01-07 燕山大学 Intrusion detection method based on convolutional neural network and lightweight gradient elevator
CN113489751A (en) * 2021-09-07 2021-10-08 浙江大学 Network traffic filtering rule conversion method based on deep learning
CN113779608A (en) * 2021-09-17 2021-12-10 神谱科技(上海)有限公司 Data protection method based on WOE mask in multi-party longitudinal federal learning LightGBM training
CN114189350A (en) * 2021-10-20 2022-03-15 北京交通大学 LightGBM-based train communication network intrusion detection method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CHAOQUN KANG: "Research on condition assessment for distribution vacuum switch cabinets based on multi-source information fusion", 2015 5th International Conference on Electric Utility Deregulation and Restructuring and Power Technologies (DRPT)
李道全; 王雪; 于波; 黄泰铭: "Network traffic classification method based on a one-dimensional convolutional neural network" (in Chinese), Computer Engineering and Applications, no. 03
董浩; 李烨: "Encrypted traffic identification in complex networks based on convolutional neural networks" (in Chinese), Software Guide, no. 09
陈诗雨; 李小勇; 杜杨杨; 谢福起: "Research on optimizing the nonlinear fitting performance of Fourier neural networks" (in Chinese), Engineering Journal of Wuhan University, no. 03
顾兆军; 吴优; 赵春迪; 周景贤: "Ensemble learning and resampling-based balanced classification method for traffic" (in Chinese), Computer Engineering and Applications, no. 06

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116743506A (en) * 2023-08-14 2023-09-12 南京信息工程大学 Encrypted flow identification method and device based on quaternion convolutional neural network
CN116743506B (en) * 2023-08-14 2023-11-21 南京信息工程大学 Encrypted flow identification method and device based on quaternion convolutional neural network

Also Published As

Publication number Publication date
CN115334005B (en) 2024-03-22

Similar Documents

Publication Publication Date Title
CN110730140B (en) Deep learning flow classification method based on combination of space-time characteristics
CN109951444B (en) Encrypted anonymous network traffic identification method
CN111191767B (en) Vectorization-based malicious traffic attack type judging method
CN112769752B (en) Network intrusion detection method based on machine learning integration model
CN111565156B (en) Method for identifying and classifying network traffic
CN113989583A (en) Method and system for detecting malicious traffic of internet
CN113472751B (en) Encrypted flow identification method and device based on data packet header
CN114615093A (en) Anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning
CN111817971B (en) Data center network flow splicing method based on deep learning
CN113821793B (en) Multi-stage attack scene construction method and system based on graph convolution neural network
Zhang et al. Autonomous model update scheme for deep learning based network traffic classifiers
Soleymanpour et al. An efficient deep learning method for encrypted traffic classification on the web
CN111367908A (en) Incremental intrusion detection method and system based on security assessment mechanism
CN114513367B (en) Cellular network anomaly detection method based on graph neural network
CN115334005A (en) Encrypted flow identification method based on pruning convolution neural network and machine learning
Chen et al. Ride: Real-time intrusion detection via explainable machine learning implemented in a memristor hardware architecture
Yujie et al. End-to-end android malware classification based on pure traffic images
CN112839051A (en) Encryption flow real-time classification method and device based on convolutional neural network
US11461590B2 (en) Train a machine learning model using IP addresses and connection contexts
Dener et al. RFSE-GRU: Data balanced classification model for mobile encrypted traffic in big data environment
CN111291078A (en) Domain name matching detection method and device
CN113746707B (en) Encrypted traffic classification method based on classifier and network structure
Wanode et al. Optimal feature set selection for IoT device fingerprinting on edge infrastructure using machine intelligence
Li et al. Fden: Mining effective information of features in detecting network anomalies
Nigmatullin et al. Accumulated Generalized Mean Value-a New Approach to Flow-Based Feature Generation for Encrypted Traffic Characterization

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant