CN114448906A - Network traffic identification method and system - Google Patents
- Publication number
- CN114448906A (application CN202210101924.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- image
- network
- resnet
- network traffic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2483—Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
Abstract
The application provides a network traffic identification method and system. Hexadecimal first network traffic data is converted into binary second network traffic data, and the second network traffic data is mapped to grayscale image data through a mapping rule; features are extracted from the grayscale image data using an L2-constrained triplet residual network (L2-triplet ResNet) to obtain first feature data; linear dimensionality reduction is performed on the first feature data using the Principal Component Analysis (PCA) algorithm to obtain second feature data; nonlinear dimensionality reduction is performed on the second feature data using the t-SNE algorithm to obtain visualized feature data; and the visualized feature data is clustered and identified using the K-means algorithm, and an identification result is output.
Description
Technical Field
The present application relates to the field of network security technologies, and in particular, to a method and a system for identifying network traffic.
Background
In the field of network security, network traffic carries the behavioral characteristics of network applications and is an important carrier for characterizing their properties. With the rapid growth of network traffic, more and more encrypted traffic and a large number of private protocols have emerged. A network traffic identification method that improves both identification accuracy and the degree of automation therefore has important practical significance for network security control, and is a problem to be solved by those skilled in the art.
Disclosure of Invention
The application provides a network traffic identification method and a network traffic identification system, which can improve the accuracy and the automation degree of network traffic identification.
In order to achieve the above object, the present application provides the following technical solutions:
a network traffic identification method, comprising the following steps:
converting hexadecimal first network traffic data into binary second network traffic data, and mapping the second network traffic data to grayscale image data through a mapping rule;
performing feature extraction on the grayscale image data using an L2-constrained triplet residual network (L2-triplet ResNet) to obtain first feature data;
performing linear dimensionality reduction on the first feature data using the Principal Component Analysis (PCA) algorithm to obtain second feature data;
performing nonlinear dimensionality reduction on the second feature data using the t-SNE algorithm to obtain visualized feature data;
and performing clustering identification on the visualized feature data using the K-means algorithm, and outputting an identification result.
Preferably, converting the hexadecimal first network traffic data into the binary second network traffic data includes:
determining whether the bit-stream length of the first network traffic data exceeds 1024 bits, and if so, deleting the portion of the first network traffic data beyond 1024 bits to obtain the second network traffic data;
if not, zero-padding the first network traffic data to obtain the second network traffic data, the bit-stream length of the second network traffic data being 1024 bits.
Preferably, the L2-triplet ResNet network includes: a deep residual network ResNet-18;
the deep residual network ResNet-18 comprises:
17 convolutional layers and 1 fully-connected layer;
the deep residual network ResNet-18 does not include a classification layer.
Preferably, the L2-triplet ResNet network includes: deep residual networks ResNet-18, and an L2 constraint and scaling module;
performing feature extraction on the grayscale image data using the L2-constrained triplet residual network (L2-triplet ResNet) to obtain the first feature data comprises:
inputting three images in the grayscale image data into three deep residual networks ResNet-18 in the L2-triplet ResNet network, respectively, to obtain three embedded features;
adding an L2 constraint to the three embedded features through the L2 constraint and scaling module to obtain the first feature data corresponding to each of the three embedded features, according to the following formula:
$$\|f(x_i)\|_2 = r, \quad \forall i \in N$$
wherein $f(x_i)$ is the embedded feature of image $x_i$, $r$ is the scaling parameter of the L2 constraint, $N$ is the set of natural numbers, and the embedded feature whose norm $\|f(x_i)\|_2$ is constrained in this way is the first feature data.
Preferably, the method further comprises:
calculating the similarity of images $x_i$ and $x_j$ in the grayscale image dataset by:
$$L_p(x_i, x_j) = \left( \sum_{l=1}^{d} \left| f(x_i)^{(l)} - f(x_j)^{(l)} \right|^p \right)^{1/p}$$
wherein $L_p$ is the Minkowski distance and $p$ is the order of the norm, with $p \ge 1$; when $p = 2$, the distance between images $x_i$ and $x_j$ is the Euclidean distance; the smaller $L_p$ is, the more similar image $x_i$ and image $x_j$ are; the grayscale image dataset is $\chi$, and $x_i, x_j \in \chi$ are two different images in the dataset; $d$ is the dimension of the Euclidean space, and
$$f(x_i) = \left(f(x_i)^{(1)}, f(x_i)^{(2)}, \ldots, f(x_i)^{(d)}\right)^T, \quad f(x_j) = \left(f(x_j)^{(1)}, f(x_j)^{(2)}, \ldots, f(x_j)^{(d)}\right)^T;$$
calculating the distance between the positive and negative image pairs by:
$$\|f(x_i) - f(x_i^+)\|_2 + \alpha < \|f(x_i) - f(x_i^-)\|_2$$
wherein $x_i$, $x_i^+$ and $x_i^-$ are respectively a sample image, a positive image and a negative image; $(x_i, x_i^+)$ is the positive image pair and $(x_i, x_i^-)$ is the negative image pair; $\|f(x_i) - f(x_i^+)\|_2$ is the Euclidean distance of the positive image pair and $\|f(x_i) - f(x_i^-)\|_2$ is the Euclidean distance of the negative image pair; the positive image is an image that belongs to the same application class as the sample image but comes from different network traffic data; the negative image is an image that does not belong to the same application class as the sample image and comes from different network traffic data; and $\alpha$ is the margin between the positive image pair and the negative image pair;
calculating a triplet loss function from the similarity of images $x_i$ and $x_j$ in the grayscale image dataset, the distance between the positive and negative image pairs, and the hinge loss $[\cdot]_+$:
$$L = \min \sum_{i} \left[ \|f(x_i) - f(x_i^+)\|_2 - \|f(x_i) - f(x_i^-)\|_2 + \alpha \right]_+$$
wherein $L$ is the minimum value of the triplet loss function;
updating the L2-triplet ResNet network with the minimum value of the triplet loss function.
A network traffic identification system, comprising:
a data acquisition module, configured to convert hexadecimal first network traffic data into binary second network traffic data, and to map the second network traffic data to grayscale image data through a mapping rule;
a feature extraction module, configured to perform feature extraction on the grayscale image data using an L2-constrained triplet residual network (L2-triplet ResNet) to obtain first feature data, to perform linear dimensionality reduction on the first feature data using the Principal Component Analysis (PCA) algorithm to obtain second feature data, and to perform nonlinear dimensionality reduction on the second feature data using the t-SNE algorithm to obtain visualized feature data;
and a traffic identification module, configured to perform clustering identification on the visualized feature data using the K-means algorithm and to output an identification result.
Preferably, the data acquisition module is configured for:
determining whether the bit-stream length of the first network traffic data exceeds 1024 bits, and if so, deleting the portion of the first network traffic data beyond 1024 bits to obtain the second network traffic data;
if not, zero-padding the first network traffic data to obtain the second network traffic data, the bit-stream length of the second network traffic data being 1024 bits.
Preferably, the L2-triplet ResNet network in the feature extraction module includes: a deep residual network ResNet-18;
the deep residual network ResNet-18 comprises:
17 convolutional layers and 1 fully-connected layer;
the deep residual network ResNet-18 does not include a classification layer.
Preferably, the L2-triplet ResNet network in the feature extraction module includes: deep residual networks ResNet-18, and an L2 constraint and scaling module;
performing feature extraction on the grayscale image data using the L2-constrained triplet residual network (L2-triplet ResNet) to obtain the first feature data comprises:
inputting three images in the grayscale image data into three deep residual networks ResNet-18 in the L2-triplet ResNet network, respectively, to obtain three embedded features;
adding, by the L2 constraint and scaling module, an L2 constraint to the three embedded features to obtain the first feature data corresponding to each of the three embedded features, according to the following formula:
$$\|f(x_i)\|_2 = r, \quad \forall i \in N$$
wherein $f(x_i)$ is the embedded feature of image $x_i$, $r$ is the scaling parameter of the L2 constraint, $N$ is the set of natural numbers, and the embedded feature whose norm $\|f(x_i)\|_2$ is constrained in this way is the first feature data.
Preferably, the feature extraction module is further configured for:
calculating the similarity of images $x_i$ and $x_j$ in the grayscale image dataset by:
$$L_p(x_i, x_j) = \left( \sum_{l=1}^{d} \left| f(x_i)^{(l)} - f(x_j)^{(l)} \right|^p \right)^{1/p}$$
wherein $L_p$ is the Minkowski distance and $p$ is the order of the norm, with $p \ge 1$; when $p = 2$, the distance between images $x_i$ and $x_j$ is the Euclidean distance; the smaller $L_p$ is, the more similar image $x_i$ and image $x_j$ are; the grayscale image dataset is $\chi$, and $x_i, x_j \in \chi$ are two different images in the dataset; $d$ is the dimension of the Euclidean space, and
$$f(x_i) = \left(f(x_i)^{(1)}, f(x_i)^{(2)}, \ldots, f(x_i)^{(d)}\right)^T, \quad f(x_j) = \left(f(x_j)^{(1)}, f(x_j)^{(2)}, \ldots, f(x_j)^{(d)}\right)^T;$$
calculating the distance between the positive and negative image pairs by:
$$\|f(x_i) - f(x_i^+)\|_2 + \alpha < \|f(x_i) - f(x_i^-)\|_2$$
wherein $x_i$, $x_i^+$ and $x_i^-$ are respectively a sample image, a positive image and a negative image; $(x_i, x_i^+)$ is the positive image pair and $(x_i, x_i^-)$ is the negative image pair; $\|f(x_i) - f(x_i^+)\|_2$ is the Euclidean distance of the positive image pair and $\|f(x_i) - f(x_i^-)\|_2$ is the Euclidean distance of the negative image pair; the positive image is an image that belongs to the same application class as the sample image but comes from different network traffic data; the negative image is an image that does not belong to the same application class as the sample image and comes from different network traffic data; and $\alpha$ is the margin between the positive image pair and the negative image pair;
calculating a triplet loss function from the similarity of images $x_i$ and $x_j$ in the grayscale image dataset, the distance between the positive and negative image pairs, and the hinge loss $[\cdot]_+$:
$$L = \min \sum_{i} \left[ \|f(x_i) - f(x_i^+)\|_2 - \|f(x_i) - f(x_i^-)\|_2 + \alpha \right]_+$$
wherein $L$ is the minimum value of the triplet loss function;
updating the L2-triplet ResNet network with the minimum value of the triplet loss function.
The application provides a network traffic identification method and system. Hexadecimal first network traffic data is converted into binary second network traffic data, and the second network traffic data is mapped to grayscale image data through a mapping rule; features are extracted from the grayscale image data using an L2-constrained triplet residual network (L2-triplet ResNet) to obtain first feature data; linear dimensionality reduction is performed on the first feature data using the Principal Component Analysis (PCA) algorithm to obtain second feature data; nonlinear dimensionality reduction is performed on the second feature data using the t-SNE algorithm to obtain visualized feature data; and the visualized feature data is clustered and identified using the K-means algorithm, and an identification result is output. The L2-triplet ResNet network improves the efficiency and precision of feature extraction. Combining PCA linear dimensionality reduction with t-SNE nonlinear dimensionality reduction keeps the structure of the data stable while reducing the amount of calculation, and enables visual analysis of unknown feature data. Finally, the K-means algorithm iterates quickly to classify the network traffic, so the accuracy and degree of automation of network traffic identification can be improved.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them.
Fig. 1 is a schematic flowchart of a network traffic identification method according to an embodiment of the present application;
Fig. 2a is a schematic structural diagram of a single deep residual network ResNet-18 according to an embodiment of the present application;
Fig. 2b is a schematic structural diagram of a Layer in the deep residual network ResNet-18 according to an embodiment of the present application;
Fig. 2c is a schematic structural diagram of a basic block in a Layer according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a network traffic identification system according to an embodiment of the present application.
Detailed Description
The application provides a network traffic identification method and a network traffic identification system, which are used for improving the accuracy and the automation degree of network traffic identification.
In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1, fig. 1 is a flowchart of a network traffic identification method according to an embodiment of the present application, where the embodiment of the present application at least includes the following steps:
101. Converting hexadecimal first network traffic data into binary second network traffic data, and mapping the second network traffic data to grayscale image data through a mapping rule.
In the embodiment of the present application, because the bit-stream lengths of network traffic data differ, the grayscale image data generated from them would also differ in size. The first network traffic data is therefore converted into binary second network traffic data of uniform bit-stream length, and the binary second network traffic data is then mapped to grayscale image data according to a mapping rule: the binary value 1 corresponds to the image gray value 255, and the binary value 0 corresponds to the image gray value 0. Unifying the bit-stream length of the network traffic data in this way also unifies the grayscale image data, which improves the efficiency of network traffic data processing.
Specifically, in an optional manner, step 101 may be implemented as follows:
determining whether the bit-stream length of the first network traffic data exceeds 1024 bits; if so, deleting the portion of the first network traffic data beyond 1024 bits to obtain the second network traffic data; if not, zero-padding the first network traffic data to obtain the second network traffic data, the bit-stream length of the second network traffic data being 1024 bits.
In this implementation, network traffic identification experiments were performed in order to unify the bit-stream lengths of the network traffic data, and a length of 1024 bits was finally selected as the unified standard. When first network traffic data is acquired, it is determined whether its bit-stream length exceeds 1024 binary bits. If it does, the portion of the first network traffic data beyond 1024 binary bits is deleted to obtain the second network traffic data; if it does not, the first network traffic data is zero-padded to obtain the second network traffic data. The bit-stream length of the second network traffic data is thus 1024 bits, and processing the second network traffic data after the bit-stream length is unified improves the processing efficiency of the network traffic data.
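As a minimal sketch of step 101, and not the applicant's implementation, the conversion, length unification, and gray-value mapping could look as follows. The function name `hex_to_gray_image`, the choice to pad at the end of the bit stream, and the 32 x 32 row-by-row layout of the 1024 pixels are assumptions made for illustration; the text fixes only the 1024-bit length and the 1 to 255, 0 to 0 mapping rule.

```python
import numpy as np

TARGET_BITS = 1024  # unified bit-stream length stated in the text
SIDE = 32           # assumption: 1024 pixels laid out as a 32 x 32 image


def hex_to_gray_image(hex_payload: str) -> np.ndarray:
    """Convert a hexadecimal string into a fixed-size grayscale image."""
    # Each hexadecimal digit expands to exactly four binary digits.
    bits = "".join(format(int(ch, 16), "04b") for ch in hex_payload)
    # Drop everything past 1024 bits; otherwise pad with zeros
    # (padding at the end is an assumption, the text does not say where).
    bits = bits[:TARGET_BITS].ljust(TARGET_BITS, "0")
    # Mapping rule from the text: binary 1 -> gray 255, binary 0 -> gray 0.
    pixels = np.array([255 if b == "1" else 0 for b in bits], dtype=np.uint8)
    return pixels.reshape(SIDE, SIDE)


image = hex_to_gray_image("ff00a5")        # short payload, zero-padded
long_image = hex_to_gray_image("f" * 300)  # 1200 bits, truncated to 1024
```

Both calls return arrays of the same shape, which is the point of unifying the bit-stream length before the images are fed to the feature extractor.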
102. Performing feature extraction on the grayscale image data using the L2-constrained triplet residual network (L2-triplet ResNet) to obtain first feature data.
In this embodiment of the present application, the L2-triplet ResNet network includes an L2 constraint and scaling module and three deep residual networks ResNet-18. For the composition of the deep residual network ResNet-18, refer to Fig. 2a, Fig. 2b, and Fig. 2c. Fig. 2a is a structural diagram of a single deep residual network ResNet-18, Fig. 2b is a structural diagram of a Layer in the deep residual network ResNet-18, and Fig. 2c is a structural diagram of a basic block in the Layer.
The deep residual network ResNet-18 does not contain a classification layer; instead, it directly outputs the 32-dimensional embedded feature of the fully-connected layer.
As can be seen from Fig. 2a, the deep residual network ResNet-18 includes: one 3 × 3 convolutional layer, Layer1, Layer2, Layer3, Layer4, an average pooling layer, and a fully-connected layer.
As can be seen from Fig. 2b, each Layer contains two basic blocks.
As can be seen from Fig. 2c, each basic block contains two convolutional layers.
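The layer counts given for Fig. 2a to Fig. 2c can be checked against the stated total of 17 convolutional layers and 1 fully-connected layer. The tally below counts only the convolutions named in the text:

```latex
\underbrace{1}_{\text{initial } 3\times3 \text{ conv}}
+ \underbrace{4}_{\text{Layers}}
\times \underbrace{2}_{\text{basic blocks per Layer}}
\times \underbrace{2}_{\text{conv layers per block}}
= 17 \text{ convolutional layers},
\qquad
17 + \underbrace{1}_{\text{fully-connected layer}} = 18
```

This is where the "18" in ResNet-18 comes from; the average pooling layer has no learned weights and is not counted.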
The specific parameters of each layer of the single depth residual network Resnet-18 are shown in table 1:
TABLE 1
Specifically, in an optional implementation, step 102 includes the following steps A1-A2:
Step A1: inputting three images in the grayscale image data into three deep residual networks ResNet-18 in the L2-triplet ResNet network, respectively, to obtain three embedded features.
In this implementation, three images in the grayscale image data are acquired: $x_i$, $x_i^+$ and $x_i^-$, which are respectively a sample image, a positive image and a negative image. The positive image is an image that belongs to the same application class as the sample image but comes from different network traffic data, and the negative image is an image that does not belong to the same application class as the sample image and comes from different network traffic data.
Step A2: adding an L2 constraint to the three embedded features through the L2 constraint and scaling module to obtain the first feature data corresponding to each of the three embedded features, according to the following formula:
$$\|f(x_i)\|_2 = r, \quad \forall i \in N$$
wherein $f(x_i)$ is the embedded feature of image $x_i$, $r$ is the scaling parameter of the L2 constraint, $N$ is the set of natural numbers, and the embedded feature whose norm $\|f(x_i)\|_2$ is constrained in this way is the first feature data.
In this implementation, the scaling parameter $r$ of the L2 constraint places the embedded features on a hypersphere of fixed radius $r$, and the scaling parameter $r$ can reduce the triplet loss. Adding the L2 constraint to the three embedded features through the L2 constraint and scaling module therefore allows the embedded features to converge quickly and minimizes the triplet loss of the three embedded features.
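The effect of the L2 constraint and scaling module can be illustrated with a short NumPy sketch. It assumes the module simply rescales each embedded feature to norm $r$; the function name `l2_constrain`, the value `r=10.0`, and the random stand-in embeddings are hypothetical, and the text does not state how $r$ is chosen or whether it is learned.

```python
import numpy as np


def l2_constrain(embeddings: np.ndarray, r: float, eps: float = 1e-12) -> np.ndarray:
    """Rescale each row so that its L2 norm equals r (a point on a hypersphere)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # eps guards against division by zero for an all-zero embedding.
    return r * embeddings / np.maximum(norms, eps)


rng = np.random.default_rng(0)
# Stand-ins for the 32-dimensional outputs of the three ResNet-18 branches
# (sample, positive, negative); real embeddings would come from the network.
raw = rng.normal(size=(3, 32))
constrained = l2_constrain(raw, r=10.0)  # r = 10.0 is an illustrative value
```

After this step all three features have the same length, so distances between them depend only on their direction, which is what makes the margin in the triplet loss comparable across triplets.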
103. Performing linear dimensionality reduction on the first feature data using the Principal Component Analysis (PCA) algorithm to obtain second feature data.
In the embodiment of the application, because the PCA algorithm is computationally fast, using it to reduce the dimensionality of the first feature data avoids excessive resource consumption when the data dimensionality is high, reduces the data to a lower dimension, and improves the computational efficiency of the dimensionality reduction.
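For illustration, the linear reduction of step 103 can be sketched with a plain SVD-based PCA. The function name `pca_reduce`, the target of 8 components, and the random input are assumptions; the text does not state how many components are kept.

```python
import numpy as np


def pca_reduce(features: np.ndarray, n_components: int) -> np.ndarray:
    """Project centered features onto their top principal directions."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
first_features = rng.normal(size=(100, 32))  # stand-in for the first feature data
second_features = pca_reduce(first_features, n_components=8)
```

The projected columns are uncorrelated and sorted by variance, so truncating them discards the directions that carry the least information before the more expensive nonlinear step.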
104. Performing nonlinear dimensionality reduction on the second feature data using the t-SNE algorithm to obtain visualized feature data.
In the embodiment of the application, because the t-SNE (t-distributed Stochastic Neighbor Embedding) algorithm can project high-dimensional data into a low-dimensional space for visualization while preserving the local structure of the network traffic data, the t-SNE algorithm is used to reduce the second feature data to 2 dimensions for visualization. This performs nonlinear dimensionality reduction on the second feature data while preserving its local structure, realizes visualization of the feature data, and addresses the crowding problem and the difficulty of optimization.
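For reference, the standard t-SNE formulation is summarized below. The symbols are not taken from the application and are introduced here only for illustration: $z_i$ denotes a PCA-reduced feature vector, $y_i$ its 2-dimensional image, $n$ the number of points, and $\sigma_i$ the per-point bandwidth set by the perplexity.

```latex
% Pairwise similarities in the high-dimensional (PCA-reduced) space
p_{j|i} = \frac{\exp\!\left(-\|z_i - z_j\|^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\|z_i - z_k\|^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

% Pairwise similarities in the 2-D map, using a heavy-tailed Student-t kernel
q_{ij} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \|y_k - y_l\|^2\right)^{-1}}

% Cost minimized by gradient descent over the map points y_i
C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

The heavy tail of the Student-t kernel lets moderately distant points sit far apart in the 2-dimensional map, which is how t-SNE relieves crowding while keeping near neighbors close.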
105. Performing clustering identification on the visualized feature data using the K-means algorithm, and outputting an identification result.
In the embodiment of the application, because the K-means clustering algorithm iterates quickly, is convenient to use, and clusters well, it is used to iteratively train on the visualized feature data to determine the centroid of each cluster of visualized feature data. The distance between each data point participating in the iterative training and the centroid of each cluster is calculated, and the first network traffic data corresponding to the data point is then identified according to the distance between the data point and the centroids.
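The centroid iteration and the nearest-centroid identification described above can be sketched as follows. This is a plain Lloyd-style K-means written for illustration; the function names, the random initialization, and the synthetic two-cluster data are assumptions, and the application does not specify the initialization or the number of clusters.

```python
import numpy as np


def assign(points: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Label each point with the index of its nearest centroid."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Alternate assignment and centroid update until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize centroids from k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        new_centroids = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


rng = np.random.default_rng(1)
# Synthetic stand-in for 2-D visualized feature data from two applications.
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(20, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(20, 2))
points = np.vstack([blob_a, blob_b])
centroids, labels = kmeans(points, k=2)
```

A new traffic sample would be identified by passing its 2-dimensional feature through `assign` with the trained centroids and reading off the cluster index.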
On the basis of the above method, the method further comprises: updating the L2-triplet ResNet network with the minimum value of the triplet loss function, specifically as follows:
defining a grayscale image dataset $\chi$, where $x_i, x_j \in \chi$ are two different images in the dataset, and $f(x) \in \mathbb{R}^d$ is a feature embedding function that maps the images in the grayscale image dataset to feature points in a $d$-dimensional Euclidean space such that similar images in the dataset are closer together:
$$f(x_i) = \left(f(x_i)^{(1)}, f(x_i)^{(2)}, \ldots, f(x_i)^{(d)}\right)^T, \quad f(x_j) = \left(f(x_j)^{(1)}, f(x_j)^{(2)}, \ldots, f(x_j)^{(d)}\right)^T;$$
using the distance between the feature points of images $x_i$ and $x_j$ in the $d$-dimensional Euclidean space, the similarity of images $x_i$ and $x_j$ in the grayscale image dataset is calculated by:
$$L_p(x_i, x_j) = \left( \sum_{l=1}^{d} \left| f(x_i)^{(l)} - f(x_j)^{(l)} \right|^p \right)^{1/p}$$
wherein $L_p$ is the Minkowski distance and $p$ is the order of the norm, with $p \ge 1$; when $p = 2$, the distance between images $x_i$ and $x_j$ is the Euclidean distance; the smaller $L_p$ is, the more similar image $x_i$ and image $x_j$ are; the grayscale image dataset is $\chi$, and $x_i, x_j \in \chi$ are two different images in the dataset; and $d$ is the dimension of the Euclidean space;
the distance between the positive and negative image pairs is then calculated by:
$$\|f(x_i) - f(x_i^+)\|_2 + \alpha < \|f(x_i) - f(x_i^-)\|_2$$
wherein $x_i$, $x_i^+$ and $x_i^-$ are respectively a sample image, a positive image and a negative image; $(x_i, x_i^+)$ is the positive image pair and $(x_i, x_i^-)$ is the negative image pair; $\|f(x_i) - f(x_i^+)\|_2$ is the Euclidean distance of the positive image pair and $\|f(x_i) - f(x_i^-)\|_2$ is the Euclidean distance of the negative image pair; the positive image is an image that belongs to the same application class as the sample image but comes from different network traffic data; the negative image is an image that does not belong to the same application class as the sample image and comes from different network traffic data; and $\alpha$ is the margin between the positive image pair and the negative image pair;
a triplet loss function is then calculated from the similarity of images $x_i$ and $x_j$ in the grayscale image dataset, the distance between the positive and negative image pairs, and the hinge loss $[\cdot]_+$:
$$L = \min \sum_{i} \left[ \|f(x_i) - f(x_i^+)\|_2 - \|f(x_i) - f(x_i^-)\|_2 + \alpha \right]_+$$
wherein $L$ is the minimum value of the triplet loss function;
and finally, the L2-triplet ResNet network is updated with the minimum value $L$ of the triplet loss function, which improves the accuracy with which the L2-triplet ResNet network extracts feature data.
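The loss computation described above can be sketched in NumPy as a Minkowski distance followed by a hinge. This is an illustrative sketch under stated assumptions: it uses the unsquared Euclidean distance ($p = 2$) as the text describes, the function names and the margin values are hypothetical, and the gradient-based update of the network itself is not shown.

```python
import numpy as np


def minkowski(u: np.ndarray, v: np.ndarray, p: float = 2.0) -> np.ndarray:
    """L_p distance between corresponding rows of u and v (p >= 1)."""
    return np.sum(np.abs(u - v) ** p, axis=-1) ** (1.0 / p)


def triplet_loss(anchor, positive, negative, alpha: float) -> float:
    """Sum of hinge terms [d(anchor, pos) - d(anchor, neg) + alpha]_+ over triplets."""
    d_pos = minkowski(anchor, positive)
    d_neg = minkowski(anchor, negative)
    # Hinge: a triplet contributes only while the margin alpha is violated.
    return float(np.sum(np.maximum(d_pos - d_neg + alpha, 0.0)))


# One toy triplet of 2-D embeddings: d_pos = 5, d_neg = 10.
anchor = np.array([[0.0, 0.0]])
positive = np.array([[3.0, 4.0]])
negative = np.array([[6.0, 8.0]])

satisfied = triplet_loss(anchor, positive, negative, alpha=1.0)  # margin already met
violated = triplet_loss(anchor, positive, negative, alpha=7.0)   # margin not met
```

With a margin of 1 the negative is already far enough away and the triplet contributes nothing; with a margin of 7 the same triplet is penalized by the amount of the shortfall, which is the signal used to pull positives closer and push negatives away during training.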
In summary, the network traffic identification method and system provided in this embodiment convert hexadecimal first network traffic data into binary second network traffic data, and map the second network traffic data to grayscale image data according to a mapping rule; perform feature extraction on the grayscale image data using the L2-constrained triplet residual network (L2-triplet ResNet) to obtain first feature data; perform linear dimensionality reduction on the first feature data using the Principal Component Analysis (PCA) algorithm to obtain second feature data; perform nonlinear dimensionality reduction on the second feature data using the t-SNE algorithm to obtain visualized feature data; and perform clustering identification on the visualized feature data using the K-means algorithm and output an identification result.
Fig. 3 is a schematic structural diagram of a network traffic identification system provided in an embodiment of the present application. The system is described below; for related details, refer to the foregoing method embodiment. The network traffic identification system includes:
the data acquisition module 201, configured to convert hexadecimal first network traffic data into binary second network traffic data, and to map the second network traffic data into grayscale image data according to a mapping rule;
the feature extraction module 202, configured to perform feature extraction on the grayscale image data using the L2-triplot Resnet network to obtain first feature data, perform linear dimensionality reduction on the first feature data using the principal component analysis (PCA) algorithm to obtain second feature data, and perform nonlinear dimensionality reduction on the second feature data using the t-SNE algorithm to obtain visualized feature data;
the traffic identification module 203, configured to perform clustering identification on the visualized feature data using the K-means algorithm and to output an identification result.
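The clustering step performed by the last module can be sketched as follows. This is a minimal version of Lloyd's algorithm on two-dimensional points with caller-supplied initial centroids; the function name, the toy points, and the initialization strategy are assumptions for illustration, not details taken from the text.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], centroids: List[Point],
           iters: int = 20) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm on 2-D points with caller-supplied initial centroids."""
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                + (p[1] - centroids[k][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical visualized feature data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, [points[0], points[3]])
```

Each resulting label would correspond to one identified traffic class; how the number of clusters is chosen is not covered by this sketch.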
Optionally, the data acquisition module is configured to:
determine whether the bit stream length of the first network traffic data exceeds 1024 bits; if so, delete the portion of the first network traffic data that exceeds 1024 bits to obtain the second network traffic data;
if not, zero-pad the first network traffic data to obtain the second network traffic data, so that the bit stream length of the second network traffic data is 1024 bits.
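The truncation and zero-padding rule above can be sketched as follows. The 1024-bit length comes from the text; the placement of the padding at the end of the stream, the 32 x 32 image layout, and the bit-to-pixel mapping (0 to 0, 1 to 255) are assumptions for illustration, as are all function names.

```python
from typing import List

TARGET_BITS = 1024  # fixed bit stream length stated in the text
SIDE = 32           # assumption: 1024 bits arranged as a 32 x 32 image


def hex_to_bits(hex_payload: str) -> str:
    """Convert a hexadecimal string to a bit string, 4 bits per hex digit."""
    return "".join(format(int(ch, 16), "04b") for ch in hex_payload)


def fit_to_length(bits: str, target: int = TARGET_BITS) -> str:
    """Drop bits beyond `target`, or zero-pad at the end up to `target`."""
    if len(bits) > target:
        return bits[:target]
    return bits.ljust(target, "0")


def bits_to_gray_image(bits: str, side: int = SIDE) -> List[List[int]]:
    """Hypothetical mapping rule: bit 0 -> pixel 0, bit 1 -> pixel 255."""
    pixels = [255 if b == "1" else 0 for b in bits]
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


short_bits = fit_to_length(hex_to_bits("4500003c"))  # 32 bits, gets padded
long_bits = fit_to_length(hex_to_bits("ff" * 200))   # 1600 bits, gets truncated
image = bits_to_gray_image(short_bits)
```

Fixing the length before the mapping step means every flow yields an image of the same size, which is what a convolutional feature extractor expects as input.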
Optionally, the L2-triplot Resnet network in the feature extraction module includes: a deep residual network Resnet-18;
the deep residual network Resnet-18 includes:
17 convolutional layers and 1 fully-connected layer;
the deep residual network Resnet-18 does not include a classification layer.
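The layer count above matches the standard ResNet-18 layout, which the following short calculation reproduces. That the network here follows the standard layout (one stem convolution plus four stages of two basic blocks) is an assumption; the 1x1 shortcut convolutions are conventionally left out of the count.

```python
# Standard ResNet-18 layout (assumed to be the variant meant in the text):
# one stem convolution, then four stages of two basic blocks,
# each basic block holding two 3x3 convolutions.
STEM_CONVS = 1
STAGES = 4
BLOCKS_PER_STAGE = 2
CONVS_PER_BLOCK = 2

conv_layers = STEM_CONVS + STAGES * BLOCKS_PER_STAGE * CONVS_PER_BLOCK
fc_layers = 1  # outputs the embedding; no classification layer follows it
weighted_layers = conv_layers + fc_layers
```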
Optionally, in the feature extraction module, performing feature extraction on the grayscale image data using the L2-constrained triplet residual network (the L2-triplot Resnet network) to obtain the first feature data includes:
inputting three images from the grayscale image data into the three deep residual networks Resnet-18 of the L2-triplot Resnet network, respectively, to obtain three embedded features;
applying the L2 constraint to the three embedded features through the L2 constraint and scaling module to obtain three first feature data corresponding to the three embedded features, expressed by the following formula:

$$\left\| f(x_i) \right\|_2 = r, \quad i \in N$$

where $x_i$ is the embedded feature, $r$ is the scaling parameter of the L2 constraint, $N$ is the set of natural numbers, and $\| f(x_i) \|_2$ is the constraint on the first feature data.
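The effect of the L2 constraint and scaling module can be sketched as a rescaling of each embedding so that its L2 norm equals a scaling parameter, here called `r` as in the claims. The function name, the `eps` guard, and the example values are assumptions for illustration.

```python
import math
from typing import List


def l2_constrain(embedding: List[float], r: float = 1.0,
                 eps: float = 1e-12) -> List[float]:
    """Rescale `embedding` so that its L2 norm equals `r`."""
    norm = math.sqrt(sum(v * v for v in embedding))
    scale = r / max(norm, eps)  # eps guards against division by zero
    return [v * scale for v in embedding]


# Hypothetical 2-D embedding with norm 5, rescaled to norm 10.
constrained = l2_constrain([3.0, 4.0], r=10.0)
norm_after = math.sqrt(sum(v * v for v in constrained))
```

After this step all embeddings lie on a sphere of radius `r`, so distances between them depend only on direction, not on the magnitude of the raw network output.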
Optionally, the feature extraction module is further configured to:
calculate the similarity of images $x_i$, $x_j$ in the grayscale image dataset:

$$L_p(x_i, x_j) = \left( \sum_{l=1}^{d} \left| f(x_i)^{(l)} - f(x_j)^{(l)} \right|^p \right)^{1/p}$$

where $L_p$ is the Minkowski distance and $p$ is the order of the norm, with $p \ge 1$; when $p = 2$, the distance between images $x_i$ and $x_j$ is the Euclidean distance; the smaller $L_p$ is, the more similar image $x_i$ and image $x_j$ are; the grayscale image dataset is $\chi$, with $x_i, x_j \in \chi$ being two different images in the grayscale image dataset; $d$ is the dimension of the Euclidean space, and
$f(x_i) = (f(x_i)^{(1)}, f(x_i)^{(2)}, \dots, f(x_i)^{(d)})^T$, $f(x_j) = (f(x_j)^{(1)}, f(x_j)^{(2)}, \dots, f(x_j)^{(d)})^T$;
calculate the distance between the positive and negative image pairs:

$$L_2(x_i, x_i^p) + \alpha < L_2(x_i, x_i^n)$$

where $x_i$, $x_i^p$, $x_i^n$ are respectively a sample image, a positive image, and a negative image; $(x_i, x_i^p)$ is the positive image pair and $(x_i, x_i^n)$ is the negative image pair; $L_2(x_i, x_i^p)$ is the Euclidean distance of the positive image pair and $L_2(x_i, x_i^n)$ is the Euclidean distance of the negative image pair; the positive image is an image that belongs to the same application class as the sample image but is derived from different network traffic data; the negative image is an image that does not belong to the same application class as the sample image and is derived from different network traffic data; the positive image pair comprises the sample image and the positive image, and the negative image pair comprises the sample image and the negative image; $\alpha$ is the margin between the positive image pair and the negative image pair;
calculate the ternary (triplet) loss function from the similarity of images $x_i$, $x_j$ in the grayscale image dataset, the distance between the positive and negative image pairs, and the hinge loss $[\cdot]_+$:

$$L = \min \sum_{i} \left[ L_2(x_i, x_i^p) - L_2(x_i, x_i^n) + \alpha \right]_+$$

where L is the minimum value of the ternary loss function;
update the L2-triplot Resnet network using the minimum value of the ternary loss function.
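The distance and hinge-loss steps above can be sketched for a single triplet as follows. The function names and example embeddings are hypothetical, the margin value is chosen only for illustration, and summing over a batch and the gradient update of the network are not shown.

```python
from typing import Sequence


def minkowski(u: Sequence[float], v: Sequence[float], p: float = 2.0) -> float:
    """L_p distance between two embeddings; p = 2 gives the Euclidean distance."""
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def triplet_hinge_loss(sample: Sequence[float], positive: Sequence[float],
                       negative: Sequence[float], alpha: float = 1.0) -> float:
    """Hinge applied to (positive-pair distance - negative-pair distance + alpha)."""
    gap = minkowski(sample, positive) - minkowski(sample, negative) + alpha
    return max(gap, 0.0)


sample = [0.0, 0.0]
# Negative pair is farther than the positive pair by more than the margin.
satisfied = triplet_hinge_loss(sample, [3.0, 4.0], [6.0, 8.0])
# Negative pair is closer than the positive pair, so the loss is positive.
violated = triplet_hinge_loss(sample, [3.0, 4.0], [0.0, 1.0])
```

A triplet contributes zero loss once the negative pair is at least `alpha` farther apart than the positive pair, so training effort concentrates on the triplets that still violate the margin.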
In summary, the network traffic identification method and system provided in this embodiment convert hexadecimal first network traffic data into binary second network traffic data and map the second network traffic data into grayscale image data according to a mapping rule; perform feature extraction on the grayscale image data using an L2-constrained triplet residual network (the L2-triplot Resnet network) to obtain first feature data; perform linear dimensionality reduction on the first feature data using the principal component analysis (PCA) algorithm to obtain second feature data; perform nonlinear dimensionality reduction on the second feature data using the t-SNE algorithm to obtain visualized feature data; and perform clustering identification on the visualized feature data using the K-means algorithm and output an identification result.
The embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief; for relevant details, refer to the description of the method.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
Claims (10)
1. A network traffic identification method, characterized by comprising the following steps:
converting hexadecimal first network traffic data into binary second network traffic data, and mapping the second network traffic data into grayscale image data according to a mapping rule;
performing feature extraction on the grayscale image data using an L2-constrained triplet residual network (the L2-triplot Resnet network) to obtain first feature data;
performing linear dimensionality reduction on the first feature data using a principal component analysis (PCA) algorithm to obtain second feature data;
performing nonlinear dimensionality reduction on the second feature data using a t-SNE algorithm to obtain visualized feature data;
and performing clustering identification on the visualized feature data using a K-means algorithm, and outputting an identification result.
2. The method of claim 1, wherein converting the hexadecimal first network traffic data into the binary second network traffic data comprises:
determining whether the bit stream length of the first network traffic data exceeds 1024 bits; if so, deleting the portion of the first network traffic data that exceeds 1024 bits to obtain the second network traffic data;
if not, zero-padding the first network traffic data to obtain the second network traffic data, the bit stream length of the second network traffic data being 1024 bits.
3. The method of claim 1, wherein the L2-triplot Resnet network comprises: a deep residual network Resnet-18;
the deep residual network Resnet-18 comprises:
17 convolutional layers and 1 fully-connected layer;
the deep residual network Resnet-18 does not include a classification layer.
4. The method of claim 1, wherein the L2-triplot Resnet network comprises: a deep residual network Resnet-18, and an L2 constraint and scaling module;
performing feature extraction on the grayscale image data using the L2-constrained triplet residual network (the L2-triplot Resnet network) to obtain the first feature data comprises:
inputting three images from the grayscale image data into the three deep residual networks Resnet-18 of the L2-triplot Resnet network, respectively, to obtain three embedded features;
applying the L2 constraint to the three embedded features through the L2 constraint and scaling module to obtain the first feature data corresponding to each of the three embedded features, expressed by the following formula:

$$\left\| f(x_i) \right\|_2 = r, \quad i \in N$$

where $x_i$ is the embedded feature, $r$ is the scaling parameter of the L2 constraint, $N$ is the set of natural numbers, and $\| f(x_i) \|_2$ is the constraint on the first feature data.
5. The method of claim 1, further comprising:
calculating the similarity of images $x_i$, $x_j$ in the grayscale image dataset by the following formula:

$$L_p(x_i, x_j) = \left( \sum_{l=1}^{d} \left| f(x_i)^{(l)} - f(x_j)^{(l)} \right|^p \right)^{1/p}$$

where $L_p$ is the Minkowski distance and $p$ is the order of the norm, with $p \ge 1$; when $p = 2$, the distance between images $x_i$ and $x_j$ is the Euclidean distance; the smaller $L_p$ is, the more similar image $x_i$ and image $x_j$ are; the grayscale image dataset is $\chi$, with $x_i, x_j \in \chi$ being two different images in the grayscale image dataset; $d$ is the dimension of the Euclidean space, and
$f(x_i) = (f(x_i)^{(1)}, f(x_i)^{(2)}, \dots, f(x_i)^{(d)})^T$, $f(x_j) = (f(x_j)^{(1)}, f(x_j)^{(2)}, \dots, f(x_j)^{(d)})^T$;
calculating the distance between the positive and negative image pairs by the following formula:

$$L_2(x_i, x_i^p) + \alpha < L_2(x_i, x_i^n)$$

where $x_i$, $x_i^p$, $x_i^n$ are respectively a sample image, a positive image, and a negative image; $(x_i, x_i^p)$ is the positive image pair and $(x_i, x_i^n)$ is the negative image pair; $L_2(x_i, x_i^p)$ is the Euclidean distance of the positive image pair and $L_2(x_i, x_i^n)$ is the Euclidean distance of the negative image pair; the positive image is an image that belongs to the same application class as the sample image but is derived from different network traffic data; the negative image is an image that does not belong to the same application class as the sample image and is derived from different network traffic data; the positive image pair comprises the sample image and the positive image, and the negative image pair comprises the sample image and the negative image; $\alpha$ is the margin between the positive image pair and the negative image pair;
calculating the ternary (triplet) loss function from the similarity of images $x_i$, $x_j$ in the grayscale image dataset, the distance between the positive image pair and the negative image pair, and the hinge loss $[\cdot]_+$:

$$L = \min \sum_{i} \left[ L_2(x_i, x_i^p) - L_2(x_i, x_i^n) + \alpha \right]_+$$

where L is the minimum value of the ternary loss function;
updating the L2-triplot Resnet network using the minimum value of the ternary loss function.
6. A network traffic identification system, comprising:
a data acquisition module, configured to convert hexadecimal first network traffic data into binary second network traffic data and to map the second network traffic data into grayscale image data according to a mapping rule;
a feature extraction module, configured to perform feature extraction on the grayscale image data using an L2-constrained triplet residual network (the L2-triplot Resnet network) to obtain first feature data, perform linear dimensionality reduction on the first feature data using a principal component analysis (PCA) algorithm to obtain second feature data, and perform nonlinear dimensionality reduction on the second feature data using a t-SNE algorithm to obtain visualized feature data;
and a traffic identification module, configured to perform clustering identification on the visualized feature data using a K-means algorithm and to output an identification result.
7. The system of claim 6, wherein the data acquisition module is configured to:
determine whether the bit stream length of the first network traffic data exceeds 1024 bits; if so, delete the portion of the first network traffic data that exceeds 1024 bits to obtain the second network traffic data;
if not, zero-pad the first network traffic data to obtain the second network traffic data, the bit stream length of the second network traffic data being 1024 bits.
8. The system according to claim 6, wherein the L2-triplot Resnet network in the feature extraction module comprises: a deep residual network Resnet-18;
the deep residual network Resnet-18 comprises:
17 convolutional layers and 1 fully-connected layer;
the deep residual network Resnet-18 does not include a classification layer.
9. The system according to claim 6, wherein the L2-triplot Resnet network in the feature extraction module comprises: a deep residual network Resnet-18, and an L2 constraint and scaling module;
performing feature extraction on the grayscale image data using the L2-constrained triplet residual network (the L2-triplot Resnet network) to obtain the first feature data comprises:
inputting three images from the grayscale image data into the three deep residual networks Resnet-18 of the L2-triplot Resnet network, respectively, to obtain three embedded features;
applying the L2 constraint to the three embedded features through the L2 constraint and scaling module to obtain the first feature data corresponding to each of the three embedded features, expressed by the following formula:

$$\left\| f(x_i) \right\|_2 = r, \quad i \in N$$

where $x_i$ is the embedded feature, $r$ is the scaling parameter of the L2 constraint, $N$ is the set of natural numbers, and $\| f(x_i) \|_2$ is the constraint on the first feature data.
10. The system of claim 6, wherein the feature extraction module is further configured to:
calculate the similarity of images $x_i$, $x_j$ in the grayscale image dataset by the following formula:

$$L_p(x_i, x_j) = \left( \sum_{l=1}^{d} \left| f(x_i)^{(l)} - f(x_j)^{(l)} \right|^p \right)^{1/p}$$

where $L_p$ is the Minkowski distance and $p$ is the order of the norm, with $p \ge 1$; when $p = 2$, the distance between images $x_i$ and $x_j$ is the Euclidean distance; the smaller $L_p$ is, the more similar image $x_i$ and image $x_j$ are; the grayscale image dataset is $\chi$, with $x_i, x_j \in \chi$ being two different images in the grayscale image dataset; $d$ is the dimension of the Euclidean space, and
$f(x_i) = (f(x_i)^{(1)}, f(x_i)^{(2)}, \dots, f(x_i)^{(d)})^T$, $f(x_j) = (f(x_j)^{(1)}, f(x_j)^{(2)}, \dots, f(x_j)^{(d)})^T$;
calculate the distance between the positive and negative image pairs by the following formula:

$$L_2(x_i, x_i^p) + \alpha < L_2(x_i, x_i^n)$$

where $x_i$, $x_i^p$, $x_i^n$ are respectively a sample image, a positive image, and a negative image; $(x_i, x_i^p)$ is the positive image pair and $(x_i, x_i^n)$ is the negative image pair; $L_2(x_i, x_i^p)$ is the Euclidean distance of the positive image pair and $L_2(x_i, x_i^n)$ is the Euclidean distance of the negative image pair; the positive image is an image that belongs to the same application class as the sample image but is derived from different network traffic data; the negative image is an image that does not belong to the same application class as the sample image and is derived from different network traffic data; the positive image pair comprises the sample image and the positive image, and the negative image pair comprises the sample image and the negative image; $\alpha$ is the margin between the positive image pair and the negative image pair;
calculate the ternary (triplet) loss function from the similarity of images $x_i$, $x_j$ in the grayscale image dataset, the distance between the positive image pair and the negative image pair, and the hinge loss $[\cdot]_+$:

$$L = \min \sum_{i} \left[ L_2(x_i, x_i^p) - L_2(x_i, x_i^n) + \alpha \right]_+$$

where L is the minimum value of the ternary loss function;
update the L2-triplot Resnet network using the minimum value of the ternary loss function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210101924.5A CN114448906A (en) | 2022-01-27 | 2022-01-27 | Network traffic identification method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210101924.5A CN114448906A (en) | 2022-01-27 | 2022-01-27 | Network traffic identification method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114448906A (en) | 2022-05-06 |
Family
ID=81369649
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210101924.5A Pending CN114448906A (en) | 2022-01-27 | 2022-01-27 | Network traffic identification method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114448906A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112580590A (en) * | 2020-12-29 | 2021-03-30 | 杭州电子科技大学 | Finger vein identification method based on multi-semantic feature fusion network |
CN112633154A (en) * | 2020-12-22 | 2021-04-09 | 云南翼飞视科技有限公司 | Method and system for converting heterogeneous face feature vectors |
Non-Patent Citations (1)
Title |
---|
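薛靖靓: "Research on Network Traffic Identification Technology Based on Deep Metric Learning" (基于深度度量学习的网络流量识别技术研究), China Excellent Master's Theses Electronic Journal Network (《中国优秀硕士论文电子期刊网》) * |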
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106980641B (en) | Unsupervised Hash quick picture retrieval system and unsupervised Hash quick picture retrieval method based on convolutional neural network | |
CN111368920B (en) | Quantum twin neural network-based classification method and face recognition method thereof | |
CN110222218B (en) | Image retrieval method based on multi-scale NetVLAD and depth hash | |
CN112511555A (en) | Private encryption protocol message classification method based on sparse representation and convolutional neural network | |
JP2018511109A (en) | Learning from distributed data | |
CN113989583A (en) | Method and system for detecting malicious traffic of internet | |
CN111681132B (en) | Typical power consumption mode extraction method suitable for massive class unbalanced load data | |
CN114510732A (en) | Encrypted traffic classification method based on incremental learning | |
Yang et al. | One-class classification using generative adversarial networks | |
Zhang et al. | Deep unsupervised self-evolutionary hashing for image retrieval | |
WO2014118978A1 (en) | Learning method, image processing device and learning program | |
CN112884121A (en) | Traffic identification method based on generation of confrontation deep convolutional network | |
CN112990371B (en) | Unsupervised night image classification method based on feature amplification | |
CN113254649B (en) | Training method of sensitive content recognition model, text recognition method and related device | |
CN112348108A (en) | Sample labeling method based on crowdsourcing mode | |
CN114448906A (en) | Network traffic identification method and system | |
Castellanos et al. | Prototype generation in the string space via approximate median for data reduction in nearest neighbor classification | |
Ma et al. | Toward making unsupervised graph hashing discriminative | |
CN116127400A (en) | Sensitive data identification system, method and storage medium based on heterogeneous computation | |
Dłotko et al. | Euler characteristic curves and profiles: a stable shape invariant for big data problems | |
WO2022127124A1 (en) | Meta learning-based entity category recognition method and apparatus, device and storage medium | |
CN114021637A (en) | Decentralized application encrypted flow classification method and device based on measurement space | |
CN112367325A (en) | Unknown protocol message clustering method and system based on closed frequent item mining | |
CN112884046A (en) | Image classification method and device based on incomplete supervised learning and related equipment | |
CN113256507A (en) | Attention enhancement method for generating image aiming at binary flux data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20220506 |