CN114615093B - Anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning - Google Patents

Publication number
CN114615093B
Authority
CN
China
Prior art keywords
traffic
feature
layer
reconstruction
inheritance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210506848.6A
Other languages
Chinese (zh)
Other versions
CN114615093A (en)
Inventor
肖滕龙
翟江涛
许成程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202210506848.6A priority Critical patent/CN114615093B/en
Publication of CN114615093A publication Critical patent/CN114615093A/en
Application granted granted Critical
Publication of CN114615093B publication Critical patent/CN114615093B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416Event detection, e.g. attack signature detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425Traffic logging, e.g. anomaly detection
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/20Network architectures or network communication protocols for network security for managing network security; network security policies in general

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Computer Hardware Design (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning. The method comprises the following steps: collecting original network traffic, preliminarily screening it, and removing non-Tor traffic; reconstructing the screened traffic and converting it into grayscale feature maps; processing the reconstructed feature maps with a convolutional neural network model and a recurrent neural network model to extract an interaction information feature vector, a packet spatial feature vector and a flow timing feature vector, and fusing the three feature vectors; inputting the fused features into a multi-classifier for application classification, wherein the multi-classifier updates its parameters through an inheritance learning mechanism when a new traffic class is detected; and determining the application to which the traffic belongs by majority rule. The invention simplifies feature design, makes the features more comprehensive, supports online updating of model parameters, lets the model retain what it learned in past training, and requires only small-scale training each time a new category is added.

Description

Anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning
Technical Field
The invention relates to network traffic identification and network application classification, in particular to an anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning.
Background
With the continuous development of the Internet, the types of network traffic have become increasingly complex, and new kinds of applications keep emerging. Applications generate large amounts of network traffic, and different types of traffic exhibit different characteristics. The goal of traffic classification is to identify the class of traffic from its distinguishing characteristics, which is essential to network operators. From the perspective of quality of service, traffic classification is the first step in guaranteeing service quality and a prerequisite for providing differentiated services according to the requirements of different service types; from the perspective of security, it is the first step in detecting abnormal network traffic, so that network security can be better protected. In recent years, with users' growing demand for privacy protection and the continuous development of anonymization and encryption technology, more and more traffic is specially processed, which poses new challenges for network traffic classification.
Classification methods in the field of traffic identification have gone through several changes. Conventional traffic classification methods fall mainly into two categories. The first is the port-based method, which identifies traffic by the protocol associated with its port number; with the advent of port obfuscation in anonymous networks, this method is becoming ineffective. The second is identification based on Deep Packet Inspection (DPI), which matches packet payloads against regular expressions for the different categories to determine the class; this method is no longer feasible now that traffic anonymization and encryption technology has matured. As the traditional methods lost effectiveness, researchers began to look for new traffic classification methods, and machine learning, which has progressed rapidly in recent years, has received considerable attention. Compared with the traditional methods, machine learning is more intelligent and convenient, and by classifying on the statistical characteristics of the traffic it can effectively avoid the influence of encryption. Researchers have therefore proposed traffic classification algorithms based on machine learning; widely used algorithms include support vector machines, decision trees, random forests and XGBoost. These methods achieve good classification accuracy and are widely accepted. However, machine-learning-based traffic classification relies on expert experience to extract and screen traffic features, which is time-consuming and labor-intensive and still yields features that are not comprehensive enough; it also places high demands on how representative the features are, and classification accuracy is low when they are not.
Models based on deep learning have become a research hotspot, and end-to-end models are favored by researchers. In actual deployment, however, when a new traffic identification scenario is encountered the model must be retrained, which consumes a large amount of time; this is the difficulty currently faced in classifying anonymous network traffic by application.
Disclosure of Invention
Purpose of the invention: the invention aims to provide an anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning, which at least partially solve the problems described in the background art.
Technical solution: an anonymous network traffic identification method based on traffic reconstruction and inheritance learning comprises the following steps:
collecting original network flow, primarily screening the flow, and removing non-Tor flow;
reconstructing the preliminarily screened traffic and converting it into grayscale feature maps, comprising: original byte feature reconstruction: taking the standard length as L bytes, zero-padding data packets shorter than L bytes, truncating data packets longer than L bytes, and normalizing to generate an i*i packet byte matrix, thereby converting the packet byte matrix into a grayscale image; and uplink and downlink interaction behavior feature reconstruction: constructing horizontal and vertical coordinates from the size and direction of the data packets and the time intervals, and taking the number of data packets in each time interval as the gray value of a pixel to form a feature map that simulates uplink and downlink interaction behavior;
taking a data packet as the unit, inputting the corresponding uplink and downlink interaction behavior feature map into a convolutional neural network to extract the interaction information feature vector V_s; inputting the original byte feature map into the convolutional neural network to extract the packet spatial feature vector V_n; grouping the packet spatial feature vectors and inputting them into a recurrent neural network to extract the flow timing feature vector V_m; and fusing the three feature vectors;
inputting the fusion characteristics into a multi-classifier for application classification, wherein the multi-classifier updates classifier parameters through an inheritance learning mechanism when detecting a new flow category;
determining the application to which the traffic belongs by majority rule.
The invention also provides an anonymous network flow identification device based on flow reconstruction and inheritance learning, which comprises the following components:
the data acquisition and filtering module is used for acquiring original network flow, primarily screening the flow and eliminating non-Tor flow;
the traffic reconstruction module, which reconstructs the preliminarily screened traffic and converts it into grayscale feature maps, and comprises: an original byte feature reconstruction unit: taking the standard length as L bytes, zero-padding data packets shorter than L bytes, truncating data packets longer than L bytes, and normalizing to generate an i*i packet byte matrix, thereby converting the packet byte matrix into a grayscale image; and an uplink and downlink interaction behavior feature reconstruction unit: constructing horizontal and vertical coordinates from the size and direction of the data packets and the time intervals, and taking the number of data packets in each time interval as the gray value of a pixel to form a feature map that simulates uplink and downlink interaction behavior;
the feature extraction and fusion module, which takes a data packet as the unit, inputs the corresponding uplink and downlink interaction behavior feature map into the convolutional neural network to extract the interaction information feature vector V_s, inputs the original byte feature map into the convolutional neural network to extract the packet spatial feature vector V_n, inputs a group of packet spatial feature vectors into a recurrent neural network to extract the flow timing feature vector V_m, and fuses the three feature vectors;
the application classification module is used for inputting the fusion characteristics into a multi-classifier for application classification, and the multi-classifier updates classifier parameters through an inheritance learning mechanism when detecting a new flow category;
and the class judgment module, which is used for determining the application to which the traffic belongs by majority rule.
The present invention also provides a computer apparatus comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured for execution by the one or more processors, which when executed by the processors, implement the steps of the anonymous network traffic identification method based on traffic reconstruction and inheritance learning as described above.
The present invention also provides a computer readable storage medium having stored thereon a computer program which, when being executed by a processor, implements the steps of the anonymous network traffic identification method based on traffic reconstruction and inheritance learning as described above.
Beneficial effects: by reconstructing traffic into feature maps, the method extracts feature vectors of different dimensions that carry interaction information, packet-level spatial information and flow-level timing information, and classifies applications on that basis. This addresses the low classification accuracy that results when features are not representative enough, simplifies feature design, makes the features more comprehensive, and supports online updating of model parameters. In addition, the inheritance learning mechanism lets the classifier model retain what it learned in past training, so that only small-scale training is needed each time a new category is added. The method can classify anonymous network traffic by application efficiently, accurately and at low cost.
Drawings
FIG. 1 is a general flow diagram of a Tor traffic identification method of the present invention;
FIG. 2 is a flowchart of an embodiment of a Tor traffic application identification method of the present invention;
FIG. 3 is a schematic diagram of interactive behavior traffic reconstruction in accordance with the present invention;
FIG. 4 is a schematic diagram of a convolutional neural network structure employed in the present invention;
FIG. 5 is a schematic diagram of a recurrent neural network architecture employed in the present invention;
FIG. 6 is a schematic diagram of an online updating method for inherited learning mechanism parameters in the present invention;
FIG. 7 is a schematic diagram of determining the application to which a flow belongs by the majority principle in the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings.
Referring to fig. 1 and fig. 2, the anonymous network traffic identification method based on traffic reconstruction and inheritance learning provided by the present invention includes the following steps:
Step 1: collect the original network traffic, perform preliminary screening, and remove non-Tor traffic.
In the embodiment of the invention, a traffic detector is deployed in the network, accounts are created for various applications, and the Tor (The Onion Router) network is used to simulate users' behavior in those applications, thereby generating Tor traffic, i.e. anonymous network traffic. The traffic is captured with Wireshark and stored in PCAP format, and the original traffic is divided into bidirectional flows according to the five-tuple {SrcIP, SrcPort, DstIP, DstPort, Protocol} and then saved. In the five-tuple, SrcIP is the source IP address, SrcPort the source port, DstIP the destination IP address, DstPort the destination port, and Protocol the protocol type. Packets sharing the same five-tuple form a unidirectional flow, whereas in a bidirectional flow the source and destination IP addresses and the source and destination ports may be swapped simultaneously. For example, a flow containing only packets from A to B is unidirectional, and a flow containing packets from A to B and from B to A is bidirectional. The network carries mainly two types of protocol flows, TCP flows and UDP flows: a TCP flow uses the SYN flag to mark the beginning of transmission and the FIN flag to mark its end, while for a UDP flow the inter-packet time interval is used as the criterion. The invention uses the dpkt library to parse and split the PCAP file, retaining the information of all layers of each session flow.
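The five-tuple grouping described above can be illustrated with a minimal sketch. It is not the patented implementation: the function names `flow_key` and `group_bidirectional` and the tuple layout of a packet are hypothetical, and PCAP parsing is deliberately left out so the example stays self-contained. The idea is only that sorting the two endpoints makes A to B and B to A map to the same key.

```python
from collections import defaultdict


def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Return a direction-independent key for a five-tuple (hypothetical helper)."""
    # Sorting the two (ip, port) endpoints makes A->B and B->A identical.
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (low, high, proto)


def group_bidirectional(packets):
    """Group packets into bidirectional flows.

    packets: iterable of (src_ip, src_port, dst_ip, dst_port, proto, payload).
    """
    flows = defaultdict(list)
    for src_ip, src_port, dst_ip, dst_port, proto, payload in packets:
        flows[flow_key(src_ip, src_port, dst_ip, dst_port, proto)].append(payload)
    return dict(flows)


packets = [
    ("10.0.0.1", 5000, "10.0.0.2", 443, "TCP", b"a"),  # A -> B
    ("10.0.0.2", 443, "10.0.0.1", 5000, "TCP", b"b"),  # B -> A, same session
    ("10.0.0.1", 5001, "10.0.0.2", 443, "TCP", b"c"),  # different source port
]
flows = group_bidirectional(packets)
```

In a real pipeline the flow would additionally be closed on a TCP FIN flag or on a UDP inactivity timeout, as the text describes; that part is omitted here.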
Features are extracted from the divided network flows with the feature extraction tool CICFlowMeter, discretized by equal-depth histogram binning, and input into an extreme gradient boosting (XGBoost) decision tree. The value of each feature is traversed in turn, and an objective function consisting of a loss function and a regularization penalty term is evaluated to find the feature split points that minimize the objective, thereby filtering out non-Tor traffic and reducing the subsequent workload. The objective function $Obj^{(t)}$ is given by formula (1), where $L^{(t)}$ is the loss function and $\Omega(f_t)$ is the penalty function:
$$Obj^{(t)} = L^{(t)} + \Omega(f_t) \qquad (1)$$
$$L^{(t)} \approx \sum_{i} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] \qquad (2)$$
$$\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2} \qquad (3)$$
In these formulas, $l$ in $L^{(t)}$ describes the difference between the true value and the predicted value, $f_t(x_i)$ is the decision tree model fitted in round $t$ for sample $i$, $g_i$ is the first derivative of $l$, and $h_i$ is the second derivative of $l$. $T$ is the number of leaves of the decision tree model, $\gamma$ is the learning rate, $\hat{y}_i$ is the decision tree's prediction for the input sample, $\lambda$ is a constant parameter controlling the size of the penalty term, and $w_j$ is the predicted value of the $j$-th leaf node of the decision tree. The objective function represents the error between the predicted value and the true value. After the training samples are input, the decision tree evaluates the different values of each feature for Tor and non-Tor traffic and measures how each value affects whether a sample is judged to be Tor or non-Tor traffic, i.e. it computes the loss function. When a certain value of a feature always leads to samples being judged non-Tor, that value is taken as a split point, so that non-Tor traffic can be filtered out.
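To make the roles of the first and second derivatives g_i and h_i concrete, the following sketch evaluates the standard closed-form leaf weight and split gain used in gradient-boosted trees. Several things here are assumptions rather than statements from the patent: a logistic loss is chosen for illustration, Tor is labelled 1 and non-Tor 0, `lam` stands for the constant that controls the penalty term, and `gamma` is treated as the per-leaf penalty as in standard XGBoost. All function names are hypothetical.

```python
import math


def grad_hess_logistic(y_true, raw_score):
    """First and second derivative of the logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y_true, p * (1.0 - p)


def leaf_weight(g_sum, h_sum, lam):
    """Leaf value that minimises the second-order objective for one leaf."""
    return -g_sum / (h_sum + lam)


def leaf_score(g_sum, h_sum, lam):
    """Structure score of a leaf; larger means a larger loss reduction."""
    return g_sum * g_sum / (h_sum + lam)


def split_gain(left, right, lam, gamma):
    """Gain of splitting a node into (g_sum, h_sum) pairs `left` and `right`."""
    gl, hl = left
    gr, hr = right
    parent = leaf_score(gl + gr, hl + hr, lam)
    return 0.5 * (leaf_score(gl, hl, lam) + leaf_score(gr, hr, lam) - parent) - gamma


# Two Tor samples (label 1) and two non-Tor samples (label 0), all at raw score 0.
g_tor, h_tor = grad_hess_logistic(1, 0.0)
g_non, h_non = grad_hess_logistic(0, 0.0)
gain = split_gain((2 * g_tor, 2 * h_tor), (2 * g_non, 2 * h_non), lam=1.0, gamma=0.0)
```

A candidate feature value that separates the two classes cleanly, as in this toy example, yields a positive gain, which is why such a value would be selected as a split point.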
Step 2: reconstruct the preliminarily screened traffic and convert it into grayscale feature maps.
Traffic reconstruction in the invention comprises two parts: the packet original byte feature map and the uplink and downlink interaction behavior feature map.
For original byte feature reconstruction, the standard length is taken as L bytes; data packets shorter than L bytes are zero-padded, data packets longer than L bytes are truncated, and after standardization/normalization an i*i packet byte matrix is generated and converted into a grayscale image, where i is determined by the packet size distribution. For example, in the invention the packet sizes are distributed within 1400 bytes, so the standard packet size is taken as 1444 bytes and a 38 × 38 packet byte matrix is generated, giving a grayscale image. Each image is input individually into the convolutional neural network to obtain a spatial feature vector, and the resulting packet spatial vectors are grouped by flow and passed through the recurrent neural network to obtain a timing feature vector.
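The padding, truncation and reshaping just described can be sketched in a few lines. This is only an illustration: the function name `packet_to_gray_matrix` is hypothetical, and dividing each byte by 255 is an assumed normalization, since the text does not state which one is used.

```python
def packet_to_gray_matrix(packet_bytes, side=38):
    """Convert raw packet bytes into a side x side grayscale matrix.

    The standard length is side * side bytes (1444 for side=38).
    """
    length = side * side
    data = packet_bytes[:length]              # truncate packets that are too long
    data = data + bytes(length - len(data))   # zero-pad packets that are too short
    pixels = [b / 255.0 for b in data]        # assumed normalization to [0, 1]
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


short_packet = bytes([255, 128]) + bytes(10)  # 12 bytes, will be zero-padded
matrix = packet_to_gray_matrix(short_packet)

long_packet = bytes([7]) * 2000               # 2000 bytes, will be truncated
big = packet_to_gray_matrix(long_packet)
```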
For uplink and downlink interaction behavior feature reconstruction, the size and direction of the packets of a network flow and their arrival time intervals form a three-dimensional feature map: the number of packets in each time interval is used as the gray value of a pixel, producing a grayscale image that simulates the uplink and downlink interaction information. As shown in FIG. 3, the abscissa of the grayscale image is the packet size: the maximum and minimum packet sizes in the flow sample are found and used as the two end positions of the abscissa, and all packet sizes are normalized onto it. The ordinate is divided into two equal parts, which hold the arrival times of the uplink and downlink packets respectively, and the gray depth of the pixel at each intersection of abscissa and ordinate represents the number of packets. In the description of the invention, uplink and downlink refer to bidirectional transmission between two network nodes, and uplink and downlink interaction behavior information refers to paired data packets whose source and destination IP addresses are reversed. For example, when A and B transmit to each other, the direction of the first packet generated is uplink and the opposite direction is downlink.
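One possible reading of this construction is sketched below. The grid resolution (`width`, `bins_per_dir`), the linear binning in `scale`, and the packet tuple layout are all assumptions made for illustration; the patent only specifies that the abscissa is the normalized packet size, that the ordinate is split equally between uplink and downlink arrival times, and that the pixel value is a packet count.

```python
def scale(value, low, high, n_bins):
    """Map value in [low, high] to a bin index in [0, n_bins - 1]."""
    if high == low:
        return 0
    return min(n_bins - 1, int((value - low) / (high - low) * n_bins))


def interaction_map(packets, width=8, bins_per_dir=4):
    """Build a count image from (size, direction, arrival_time) tuples.

    direction is +1 for uplink and -1 for downlink. The upper half of the
    image holds uplink packets, the lower half downlink packets.
    """
    sizes = [size for size, _, _ in packets]
    times = [t for _, _, t in packets]
    grid = [[0] * width for _ in range(2 * bins_per_dir)]
    for size, direction, t in packets:
        x = scale(size, min(sizes), max(sizes), width)
        y = scale(t, min(times), max(times), bins_per_dir)
        if direction < 0:
            y += bins_per_dir  # shift downlink packets into the lower half
        grid[y][x] += 1        # pixel value is the number of packets
    return grid


sample = [(100, +1, 0.0), (100, +1, 0.1), (750, -1, 0.5), (1400, -1, 0.9)]
grid = interaction_map(sample)
```

In this toy flow the two small uplink packets that arrive close together fall into the same pixel, so that pixel has gray value 2.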
Step 3: construct the neural network models and extract features.
A convolutional neural network (CNN) is a multi-layer supervised learning neural network in which the convolutional and pooling layers are the core of feature extraction. The weight parameters of the network are adjusted backward layer by layer with a gradient descent algorithm to minimize a loss function, and the accuracy of the network is improved through repeated iterative training. A convolutional neural network consists of alternating convolutional and pooling layers, followed by fully connected layers and a classifier such as a Softmax layer. The input of the first fully connected layer is the feature map produced by the convolutional and pooling layers. A recurrent neural network (RNN) is a neural network structure in which the current output of a sequence also depends on the previous outputs: the network memorizes earlier information and applies it when computing the current output.
As shown in FIG. 4, the convolutional neural network model constructed by the invention has the structure input layer - convolutional layer (CONV1) - pooling layer (POOL1) - convolutional layer (CONV2) - pooling layer (POOL2) - convolutional layer (CONV3) - fully connected layer (FC1) - fully connected layer (FC2); FC3 in the figure is the feature simplification step described below. The input is a 38 × 38 × 1 grayscale image. After convolution by CONV1 the number of channels is 32 and the dimension is 38 × 38 × 32; after 2 × 2 down-sampling by POOL1 the dimension is 19 × 19 × 32; after convolution by CONV2 the number of channels is 64; after 2 × 2 down-sampling by POOL2 the output dimension is 10 × 10 × 64. After convolution by CONV3, the output is flattened to one dimension by a Flatten function so that it can be input into the fully connected layers. Finally, the number of neurons of the fully connected layer FC2 is n, which is set to 64, i.e. FC2 outputs a 1 × 64 feature vector.
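The spatial sizes quoted above (38, then 19, then 10) can be checked with simple shape arithmetic. The sketch below assumes that the convolutions preserve the spatial size ("same" padding) and that 2 × 2 pooling rounds up, which is what makes 19 become 10; neither detail is stated in the patent, and the helper names are hypothetical.

```python
import math


def conv_same(shape, out_channels):
    """Convolution with 'same' padding: spatial size kept, channels replaced."""
    height, width, _ = shape
    return (height, width, out_channels)


def pool_ceil(shape, k=2):
    """k x k pooling with stride k, rounding the spatial size up."""
    height, width, channels = shape
    return (math.ceil(height / k), math.ceil(width / k), channels)


shapes = {"input": (38, 38, 1)}
shapes["CONV1"] = conv_same(shapes["input"], 32)
shapes["POOL1"] = pool_ceil(shapes["CONV1"])
shapes["CONV2"] = conv_same(shapes["POOL1"], 64)
shapes["POOL2"] = pool_ceil(shapes["CONV2"])

for name, shape in shapes.items():
    print(name, shape)
```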
As shown in FIG. 5, the recurrent neural network model constructed by the invention has the structure BiGRU layer (BiGRU1) - BiGRU layer (BiGRU2) - fully connected layer (FC4); FC5 in the figure is the feature simplification step described below. The model takes the packet spatial features as grouped input: the packet spatial features obtained by passing several packet feature maps through the CNN model form one group, and the group of packet feature vectors is input into the BiGRU (bidirectional gated recurrent unit) layers to extract a high-level timing feature vector. The number of neurons of the fully connected layer FC4 is m, which is set to 64 in the invention, i.e. the flow timing feature vector based on the grouped packets has dimension 1 × 64.
The specific process of extracting the features is as follows:
(a) extracting characteristic vectors of uplink and downlink interactive information;
The uplink and downlink interaction behavior reconstruction map is input into the convolutional neural network. The first two convolutional layers and pooling layers extract a spatial feature map; after the third convolutional layer a Flatten function converts the feature map into a one-dimensional vector so that it can be input into the fully connected layers; and a 1*s uplink and downlink interaction behavior feature vector is extracted from the fully connected layer FC2. FC2 has 64 neurons, so a 1 × 64 one-dimensional feature vector is obtained. The interaction behavior feature vector is saved.
(b) Extracting packet-level spatial feature vectors;
The invention feeds the CNN one data packet at a time to extract packet-level spatial features, i.e. the packet-level spatial features are extracted from a single data packet. Each data packet is truncated or zero-padded to k standard bytes, and each byte is converted into an l-dimensional vector by one-hot encoding, so that the k bytes form an l*k grayscale image. In an embodiment of the invention, the set of grayscale images is divided into a training set and a test set at a ratio of 9:1. The convolutional neural network is trained with a batch size of 64, the cross-entropy function as the loss function, stochastic gradient descent as the optimization method, 200 training epochs and a learning rate of 0.001; the activation function is Tanh and the pooling operation is max pooling. After the grayscale image generated from a data packet is input, a 1*n one-dimensional feature vector is extracted from the fully connected layer as the packet-level feature vector; the fully connected layer FC2 has 64 neurons, so the extracted packet-level feature vector has dimension 1 × 64. Here n = s; n and s are distinguished to indicate that the two are features of a different nature. The feature vector of each data packet extracted by the CNN model is saved.
(c) Extracting the flow-level time sequence characteristics of the grouped data packets;
The packet feature vectors are grouped by flow and input into the recurrent neural network for training. As shown in FIG. 5, taking a group size of 10 packets, the 10 packet feature vectors form a 1 × 320 input; with the required training parameters set, the BiGRU layers operate on this input to obtain the flow-level timing features. The timing features extracted by the BiGRU layers are converted into a one-dimensional vector by a Flatten function and input into the fully connected layer FC4, from which a 1*m one-dimensional feature vector is extracted. FC4 has 64 neurons, so the extracted flow-level timing feature vector has dimension 1 × 64. The timing feature vector is saved.
Referring to FIG. 2 to FIG. 4, the invention performs feature fusion in units of data packets after feature simplification. Feature simplification means that the 1*n spatial features, the 1*m timing features and the 1*s interaction behavior features are each converted into a one-dimensional feature vector of lower dimension by adding a fully connected layer to the original model. In the invention, the 1 × 64 packet-level spatial features and the 1 × 64 flow-level timing features are simplified to 1 × 32 through the fully connected layers FC3 and FC5 respectively, and the 1 × 64 uplink and downlink interaction behavior features are simplified to 1 × 26 through a fully connected layer FC3 with 26 neurons. Feature simplification makes it possible to adjust the weights of the three types of features while also facilitating subsequent processing. Feature fusion in units of data packets means that, with a single data packet as a sample, the simplified interaction behavior features, packet spatial features and flow timing features of its flow are fused into a feature vector of dimension 1 × 90, which is passed to the multi-classifier as input.
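The fused dimension follows from the simplified sizes: 32 + 32 + 26 = 90. A minimal sketch of the fusion step is given below, assuming that fusion is plain concatenation and that the three vectors are joined in the order interaction, packet, flow. The patent states neither the fusion operator nor the order, and the function name `fuse_features` is hypothetical.

```python
def fuse_features(interaction_vec, packet_vec, flow_vec):
    """Concatenate the three simplified feature vectors into one 1 x 90 vector."""
    expected = (26, 32, 32)
    actual = (len(interaction_vec), len(packet_vec), len(flow_vec))
    if actual != expected:
        raise ValueError(f"expected lengths {expected}, got {actual}")
    # Assumed order: interaction behavior, packet spatial, flow timing.
    return list(interaction_vec) + list(packet_vec) + list(flow_vec)


interaction = [0.1] * 26  # simplified uplink/downlink interaction features
packet = [0.2] * 32       # simplified packet-level spatial features
flow = [0.3] * 32         # simplified flow-level timing features
fused = fuse_features(interaction, packet, flow)
```

Because the flow-level and interaction vectors are shared by all packets of a flow, each packet sample would reuse them and differ only in its own packet-level part.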
Step 4: classify the traffic by application using the multi-classifier.
The multi-classifier is a one-dimensional convolutional neural network with the structure: first convolutional layer, first pooling layer, second convolutional layer, second pooling layer, first fully connected layer, second fully connected layer and Softmax layer. The input is the fused multi-dimensional feature of dimension 90 × 1. The first convolutional layer has 32 channels and an output dimension of 90 × 32, which the first pooling layer reduces to 30 × 32; the second convolutional layer has 64 channels and an output dimension of 30 × 64, which the second pooling layer reduces to 10 × 64. The two fully connected layers have 128 and c neurons, where c is the desired number of categories. Training uses a mini-batch size of 50, a cross-entropy loss function, stochastic gradient descent as the optimization method, a learning rate of 0.001 and 40 epochs. The fused features of each of the first N packets of the flow are classified by application.
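The layer dimensions quoted above can be checked with a small shape trace. The sketch below assumes stride-1 convolutions with "same" padding and non-overlapping pooling with a window of 3; neither is stated in the description, the pooling window is merely inferred from the reduction 90 to 30 to 10. The function name `classifier_shapes` and the example class count of 8 are hypothetical.

```python
def classifier_shapes(num_classes, pool_size=3):
    """Trace (length, channels) shapes through the 1D-CNN multi-classifier."""
    shapes = {"input": (90, 1)}
    # Assumption: stride-1 convolution with 'same' padding keeps the length.
    shapes["conv1"] = (shapes["input"][0], 32)
    # Assumption: non-overlapping pooling; pool_size=3 is inferred from 90 -> 30.
    shapes["pool1"] = (shapes["conv1"][0] // pool_size, 32)
    shapes["conv2"] = (shapes["pool1"][0], 64)
    shapes["pool2"] = (shapes["conv2"][0] // pool_size, 64)
    length, channels = shapes["pool2"]
    shapes["flatten"] = (length * channels,)
    shapes["fc1"] = (128,)
    shapes["fc2_softmax"] = (num_classes,)
    return shapes

for name, shape in classifier_shapes(num_classes=8).items():
    print(name, shape)
```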
In the embodiment of the invention, when a new traffic category appears, the parameters of the multi-classifier are updated through an inheritance learning mechanism. The specific process is as follows:
(1) sample data preprocessing;
The new-class samples and a small number of old-class samples are divided into a training set, a validation set and a test set in the ratio 9 : 1 : 1. The original classifier predicts the samples and outputs, from its final fully connected layer, the normalized classification result vector V_a; the new classifier produces the normalized classification result vector V_b for the same samples; and the true class label vector is denoted V_c.
(2) Using an inheritance loss function to learn the parameters of the original model and adapt to the new category at the same time;
Referring to FIG. 6, the inheritance loss function is defined as a weighted sum of the true loss function and the difference loss function, as shown in equation (4) below. The true loss function describes how well the new classifier's prediction V_b fits the true class V_c during training, which corresponds to learning new knowledge; the invention uses the cross-entropy loss function, as shown in equation (5). The difference loss function describes the degree of difference between the normalized prediction vector V_a of the original classifier and the normalized prediction vector V_b of the new classifier, which corresponds to retaining the previously learned weight information, so that the classifier update completes more quickly. The invention uses the KL divergence loss function, equation (6), to describe the difference between the two probability distributions. The ratio of old and new classes serves as the weight of the difference loss function, with a second coefficient as the weight of the true loss function; in the invention these weights are 0.375 and 0.625 respectively. p(x_i) and q(x_i) are the two probability distributions of the prediction results for a sample x of the random variable.
(4) [equation image: inheritance loss function]
(5) [equation image: cross-entropy true loss function]
(6) [equation image: KL divergence difference loss function]
(3) Defining retention coefficients to control the learning degree of the parameters of the original classifier;
The output of the original classifier represents not only the classification result for a sample but also the degree to which the sample is similar to, or different from, the other classes. A retention coefficient is therefore used to assign a different importance to learning the normalized classification result vector. It is set between 0 and 1 according to the required degree of learning: a larger retention coefficient is used when the original classifier extracts sufficiently detailed features, and a smaller one otherwise.
(4) Using linear mapping to balance the classification preferences of different classes at the fully connected layer;
When predicting samples, the parameters of the classifier's fully connected layer tend to fit the most recent category best. To balance the degree of fit between new and old categories, a linear mapping model is defined for the new-category outputs and applied to the classification result vector of the new classes, as shown in equation (7). The two parameters a and b of the linear mapping model are determined using the validation set with a cross-entropy loss function, and the parameters are saved as a weight file.
(7) [equation image: linear mapping model]
Here out is the probability given by the classifier.
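Steps (2) and (4) of the inheritance learning process can be illustrated with a short sketch. The equation images are not available in this text, so the forms used here are assumptions: the inheritance loss is taken as 0.625 × cross-entropy(V_c, V_b) + 0.375 × KL(V_a ‖ V_b), the old prediction V_a is assumed to be zero-padded to the length of V_b, and the linear mapping is assumed to be a·out + b applied to new-class outputs followed by renormalization. The names `inheritance_loss` and `rebalance` are hypothetical, and the fitting of a and b on the validation set is not shown.

```python
import math

EPS = 1e-12  # guards against log(0)

def cross_entropy(v_true, v_pred):
    """True loss: fit between the label vector V_c and the new prediction V_b."""
    return -sum(t * math.log(p + EPS) for t, p in zip(v_true, v_pred))

def kl_divergence(p, q):
    """Difference loss: KL(p || q) between old prediction V_a and new prediction V_b."""
    return sum(pi * math.log((pi + EPS) / (qi + EPS))
               for pi, qi in zip(p, q) if pi > 0)

def inheritance_loss(v_a, v_b, v_c, w_diff=0.375, w_true=0.625):
    """Weighted sum of the true loss and the difference loss (assumed form)."""
    return w_true * cross_entropy(v_c, v_b) + w_diff * kl_divergence(v_a, v_b)

def rebalance(probs, new_class_ids, a, b):
    """Apply out' = a * out + b to new-class outputs, then renormalize (assumed form)."""
    adjusted = [a * p + b if i in new_class_ids else p
                for i, p in enumerate(probs)]
    adjusted = [max(x, 0.0) for x in adjusted]
    total = sum(adjusted)
    return [x / total for x in adjusted] if total > 0 else list(probs)

# Two old classes plus one new class (index 2); V_a is zero-padded.
v_a = [0.7, 0.3, 0.0]   # original classifier
v_b = [0.6, 0.3, 0.1]   # new classifier
v_c = [1.0, 0.0, 0.0]   # true label
print(inheritance_loss(v_a, v_b, v_c))
print(rebalance([0.2, 0.2, 0.6], {2}, a=0.5, b=0.0))
```

With a below 1 the mapping damps the new-class probability, which is one way a preference for the most recently learned category could be counteracted.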
Step 5: determine the application to which the traffic belongs based on the majority principle.
Determining the flow category by the majority principle means that, once the classification results of the first N packets of the flow have been obtained, a vote is taken: if most of the N packet classification results fall into a certain application, the flow is determined to be traffic of that application. As shown in FIG. 7, the vote is taken over the first N packets; if the numbers of packets assigned to several categories are equal, the probability sums are compared and the category with the larger probability sum is taken as the final category of the data flow.
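The voting rule with its tie-break can be sketched as below. The function name `vote` and the example probabilities are hypothetical, and the "probability sum" of a category is assumed here to be the sum of that category's predicted probability over all N packets, which the description does not spell out.

```python
from collections import defaultdict

def vote(packet_probs):
    """Return the winning class index for a flow.

    packet_probs holds one probability vector per packet (the first N packets).
    """
    counts = defaultdict(int)       # packets assigned to each class
    prob_sums = defaultdict(float)  # summed probability per class (tie-break)
    for probs in packet_probs:
        for cls, p in enumerate(probs):
            prob_sums[cls] += p
        predicted = max(range(len(probs)), key=probs.__getitem__)
        counts[predicted] += 1
    best = max(counts.values())
    tied = [cls for cls, n in counts.items() if n == best]
    # With a single leader this returns it; with a tie the larger sum wins.
    return max(tied, key=lambda cls: prob_sums[cls])

clear_majority = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
tie = [[0.9, 0.1], [0.6, 0.4], [0.45, 0.55], [0.4, 0.6]]
print(vote(clear_majority), vote(tie))
```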
Based on the same technical concept as the method embodiment, the invention also provides an anonymous network traffic identification device based on traffic reconstruction and inheritance learning, comprising:
a data acquisition and filtering module, configured to collect raw network traffic, perform preliminary traffic screening, and eliminate non-Tor traffic;
a traffic reconstruction module, configured to reconstruct the screened traffic and convert it into grayscale feature maps, including: an original byte feature reconstruction unit, which takes L as the standard number of bytes, zero-pads packets of fewer than L bytes, truncates packets of more than L bytes, and after normalization generates an i*i packet byte matrix that is converted into a grayscale image; and an uplink and downlink interaction behavior feature reconstruction unit, which constructs the abscissa and ordinate from the packet size, direction and time interval and uses the number of packets in each time interval as the pixel gray value, forming a feature map that simulates uplink and downlink interaction behavior;
a feature extraction and fusion module, configured to, taking the packet as the unit, input the corresponding uplink and downlink interaction behavior feature map into a convolutional neural network to extract the interaction information feature vector V_s, input the original byte feature map into the convolutional neural network to extract the packet spatial feature vector V_n, input a group of packet spatial feature vectors into a recurrent neural network to extract the flow timing feature vector V_m, and fuse the three feature vectors;
an application classification module, configured to input the fused features into a multi-classifier for application classification, wherein the multi-classifier updates its parameters through an inheritance learning mechanism when a new traffic category is detected;
and a category determination module, configured to determine the application to which the traffic belongs based on the majority principle.
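The two reconstruction units of the traffic reconstruction module can be sketched as follows. This is an illustration under assumptions rather than the device's implementation: L is taken to equal i*i, byte values are normalized by dividing by 255, the image width and number of time slots are free parameters, and packets are represented as hypothetical (size, direction, timestamp) tuples. The function names are likewise hypothetical.

```python
def bytes_to_gray_matrix(payload, side):
    """Pad or truncate packet bytes to L = side * side, scale to [0, 1], reshape."""
    length = side * side
    data = list(payload[:length])          # truncate packets longer than L
    data += [0] * (length - len(data))     # zero-pad packets shorter than L
    scaled = [b / 255.0 for b in data]     # assumed normalization
    return [scaled[r * side:(r + 1) * side] for r in range(side)]

def interaction_map(packets, width, slots):
    """Build a (2 * slots) x width image whose pixels count packets.

    Columns: packet size normalized between the flow's min and max size.
    Rows: upper half for uplink time slots, lower half for downlink time slots.
    """
    sizes = [p[0] for p in packets]
    times = [p[2] for p in packets]
    s_min, s_max = min(sizes), max(sizes)
    t_min, t_max = min(times), max(times)
    image = [[0] * width for _ in range(2 * slots)]
    for size, direction, ts in packets:
        col = 0 if s_max == s_min else int((size - s_min) / (s_max - s_min) * (width - 1))
        slot = 0 if t_max == t_min else int((ts - t_min) / (t_max - t_min) * (slots - 1))
        row = slot if direction == "up" else slots + slot
        image[row][col] += 1
    return image

gray = bytes_to_gray_matrix(b"\xff\x00\x80", side=2)
packets = [(60, "up", 0.0), (1500, "down", 0.1), (60, "up", 0.05), (800, "down", 0.2)]
image = interaction_map(packets, width=8, slots=4)
print(gray)
print(image)
```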
It should be understood that the anonymous network traffic identification apparatus provided in this embodiment may implement all technical solutions of the anonymous network traffic identification method, functions of each functional module of the anonymous network traffic identification apparatus may be implemented according to the method in the foregoing method embodiment, and a specific implementation process may refer to relevant descriptions in the foregoing embodiment, which is not described herein again.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (10)

1. An anonymous network traffic identification method based on traffic reconstruction and inheritance learning, characterized in that the method comprises the following steps:
collecting raw network traffic and performing preliminary traffic screening to eliminate non-Tor traffic;
reconstructing the screened traffic and converting it into grayscale feature maps, including: original byte feature reconstruction: taking L as the standard number of bytes, zero-padding packets of fewer than L bytes, truncating packets of more than L bytes, and after normalization generating an i*i packet byte matrix that is converted into a grayscale image; and uplink and downlink interaction behavior feature reconstruction: constructing the abscissa and ordinate from the packet size, direction and time interval, and using the number of packets in each time interval as the pixel gray value, to form a feature map simulating uplink and downlink interaction behavior;
taking the packet as the unit, inputting the corresponding uplink and downlink interaction behavior feature map into a convolutional neural network to extract an interaction information feature vector, inputting the original byte feature map into the convolutional neural network to extract a packet spatial feature vector, inputting a group of packet spatial feature vectors into a recurrent neural network to extract a flow timing feature vector, and fusing the three feature vectors;
inputting the fused features into a multi-classifier for application classification, wherein the multi-classifier updates its parameters through an inheritance learning mechanism when a new traffic category is detected;
determining the application to which the traffic belongs based on the majority principle.
2. The anonymous network traffic identification method based on traffic reconstruction and inheritance learning according to claim 1, characterized in that collecting raw network traffic and performing preliminary traffic screening comprises:
capturing raw traffic with a network traffic collection tool and dividing the raw traffic into flows by five-tuple;
extracting features from the divided network flows with a feature extraction tool, discretizing the features by equal-depth histogram binning, and inputting them into an extreme gradient boosting decision tree, wherein the values of each feature are traversed in turn under an objective function composed of a loss function and a regularization penalty term to find the feature points that minimize the objective function, thereby filtering out non-Tor traffic.
3. The anonymous network traffic identification method based on traffic reconstruction and inheritance learning according to claim 1, characterized in that constructing the abscissa and ordinate from the packet size, direction and time interval comprises: taking the packet size as the abscissa, finding the maximum and minimum packet sizes in the flow sample as the start and end positions of the abscissa, and normalizing the sizes of all packets onto the whole abscissa; dividing the ordinate equally into two parts, which are the arrival times of uplink packets and downlink packets respectively; the depth of the pixel at the intersection of the abscissa and ordinate represents the number of packets.
4. The anonymous network traffic identification method based on traffic reconstruction and inheritance learning according to claim 1, characterized in that the structure of the convolutional neural network is input layer, convolutional layer CONV1, pooling layer POOL1, convolutional layer CONV2, pooling layer POOL2, convolutional layer CONV3, fully connected layer FC1, fully connected layer FC2;
the interaction information feature vector is obtained as follows: the uplink and downlink interaction behavior feature map is input into the convolutional neural network, the first two convolutional layers and pooling layers extract a spatial feature map, the Flatten function of convolutional layer CONV3 converts the feature map into a one-dimensional vector that is input into the fully connected layers, and a one-dimensional feature vector V_s of dimension 1*s is extracted from the fully connected layer FC2, where s is the number of neurons in FC2;
the packet spatial feature vector is obtained as follows: the grayscale image obtained by processing the original bytes of the packet is input into the convolutional neural network model for training, the first two convolutional layers and pooling layers extract a spatial feature map, the Flatten function of convolutional layer CONV3 converts the feature map into a one-dimensional vector that is input into the fully connected layers, and a one-dimensional feature vector V_n of dimension 1*n is extracted from the fully connected layer FC2, where n = s.
5. The anonymous network traffic identification method based on traffic reconstruction and inheritance learning according to claim 1, characterized in that the structure of the recurrent neural network model is BiGRU layer BiGRU1, BiGRU layer BiGRU2, fully connected layer FC4;
the flow timing feature vector is obtained as follows: the grayscale images of the grouped packets are input in batches into the recurrent neural network model for training, the BiGRU layers compute a timing feature map, a Flatten function converts it into a one-dimensional vector that is input into the fully connected layer, and a one-dimensional feature vector of dimension 1*m is extracted from the fully connected layer FC4, where m is the number of neurons in FC4.
6. The anonymous network traffic identification method based on traffic reconstruction and inheritance learning according to claim 1, characterized in that fusing the three feature vectors comprises: performing feature fusion in units of packets, converting the 1*s uplink and downlink interaction behavior feature vector, the 1*n spatial feature vector and the 1*m timing feature vector each into a one-dimensional feature vector of lower dimension by means of a fully connected layer, and then fusing the three lower-dimensional feature vectors to obtain the fused feature.
7. The anonymous network traffic identification method based on traffic reconstruction and inheritance learning according to claim 1, characterized in that the multi-classifier is a one-dimensional convolutional neural network comprising a convolutional layer, a pooling layer, a Flatten layer, a fully connected layer and a Softmax layer, and classifies by application the fused features of each of the first N packets of the traffic;
the multi-classifier updating its parameters through the inheritance learning mechanism when a new traffic category is detected comprises: retaining part of the feature parameters learned when the classifier was pre-trained while learning samples of the new traffic category, using the inheritance loss function to compute the difference in classifier parameters before and after learning, updating the classifier parameters jointly with the loss function on the new traffic category samples, using a retention coefficient to determine the degree of parameter learning, and using a linear mapping at the final fully connected layer to balance the classification preferences of different categories.
8. An anonymous network traffic identification device based on traffic reconstruction and inheritance learning, characterized by comprising:
a data acquisition and filtering module, which collects raw network traffic and performs preliminary traffic screening to eliminate non-Tor traffic;
a traffic reconstruction module, which reconstructs the screened traffic and converts it into grayscale feature maps, including: an original byte feature reconstruction unit, which takes L as the standard number of bytes, zero-pads packets of fewer than L bytes, truncates packets of more than L bytes, and after normalization generates an i*i packet byte matrix that is converted into a grayscale image; and an uplink and downlink interaction behavior feature reconstruction unit, which constructs the abscissa and ordinate from the packet size, direction and time interval and uses the number of packets in each time interval as the pixel gray value, forming a feature map simulating uplink and downlink interaction behavior;
a feature extraction and fusion module, which, taking the packet as the unit, inputs the corresponding uplink and downlink interaction behavior feature map into a convolutional neural network to extract an interaction information feature vector, inputs the original byte feature map into the convolutional neural network to extract a packet spatial feature vector, inputs a group of packet spatial feature vectors into a recurrent neural network to extract a flow timing feature vector, and fuses the three feature vectors;
an application classification module, which inputs the fused features into a multi-classifier for application classification, wherein the multi-classifier updates its parameters through an inheritance learning mechanism when a new traffic category is detected;
a category determination module, which determines the application to which the traffic belongs based on the majority principle.
9. A computer device, characterized by comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs, when executed by the processors, implement the steps of the anonymous network traffic identification method based on traffic reconstruction and inheritance learning according to any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the anonymous network traffic identification method based on traffic reconstruction and inheritance learning according to any one of claims 1-7.
CN202210506848.6A 2022-05-11 2022-05-11 Anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning Active CN114615093B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210506848.6A CN114615093B (en) 2022-05-11 2022-05-11 Anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning

Publications (2)

Publication Number Publication Date
CN114615093A CN114615093A (en) 2022-06-10
CN114615093B true CN114615093B (en) 2022-07-26

Family

ID=81870459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210506848.6A Active CN114615093B (en) 2022-05-11 2022-05-11 Anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning

Country Status (1)

Country Link
CN (1) CN114615093B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108200006A (en) * 2017-11-21 2018-06-22 中国科学院声学研究所 A kind of net flow assorted method and device based on the study of stratification space-time characteristic
CN112367334A (en) * 2020-11-23 2021-02-12 中国科学院信息工程研究所 Network traffic identification method and device, electronic equipment and storage medium
CN112910853A (en) * 2021-01-18 2021-06-04 南京信息工程大学 Encryption flow classification method based on mixed characteristics
CN113037730A (en) * 2021-02-27 2021-06-25 中国人民解放军战略支援部队信息工程大学 Network encryption traffic classification method and system based on multi-feature learning
CN113162908A (en) * 2021-03-04 2021-07-23 中国科学院信息工程研究所 Encrypted flow detection method and system based on deep learning
CN114301636A (en) * 2021-12-10 2022-04-08 南京理工大学 VPN communication behavior analysis method based on multi-scale spatiotemporal feature fusion of traffic

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
VoIP Traffic Detection in Tunneled and Anonymous Networks Using Deep Learning;FAIZ UL ISLAM et al.;《IEEE Access》;20210419;全文 *

Also Published As

Publication number Publication date
CN114615093A (en) 2022-06-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant