CN111614514A - Network traffic identification method and device - Google Patents

Publication number
CN111614514A
Authority
CN
China
Prior art keywords
value
discrete
continuous
features
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010366325.7A
Other languages
Chinese (zh)
Other versions
CN111614514B (en)
Inventor
胡博
陈山枝
朱轶凡
汪劲希
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202010366325.7A
Publication of CN111614514A
Application granted
Publication of CN111614514B
Current legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/08Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/04Processing captured monitoring data, e.g. for logfile generation
    • H04L43/045Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data

Abstract

The invention provides a network traffic identification method and device. The method comprises: obtaining a continuous flow chart by applying multivariate correlation analysis to the acquired continuous features of the flow to be identified; obtaining a discrete flow graph by applying one-hot encoding and entity embedding to the acquired discrete features of the flow to be identified; inputting the continuous flow chart into a convolutional neural network to obtain a continuous flow chart value; inputting the discrete flow graph into a factorization machine to obtain a discrete flow graph value; and inputting the continuous flow chart value and the discrete flow graph value into a normalized exponential function to obtain the type of the flow to be identified. The method is suitable not only for raw network traffic in byte form but also for non-byte network traffic obtained by feature extraction and combination on the basis of the raw network traffic, which enlarges the application range of the network traffic identification method and improves the accuracy of network traffic identification.

Description

Network traffic identification method and device
Technical Field
The present invention relates to the field of communications network technologies, and in particular, to a network traffic identification method and apparatus.
Background
With the development of network technology, service types have become increasingly diverse, which poses a great challenge. Identifying the type of each service, that is, identifying the type of each network flow, has become a key concern for both academic network research and network deployment and operation.
Network traffic is an important carrier that records and reflects the activities of the network and its users, and network traffic identification can be used for evaluating the network situation, developing and analyzing applications, fine-grained operation, and the like. With the diversification of service types, the emergence of dynamic ports and encrypted traffic limits port-based and payload-based identification techniques, so intelligent network traffic identification, for example identification by traditional machine learning methods, has become a key research direction. Traditional machine learning methods identify network traffic based on the extraction and combination of statistical features and thereby avoid the limitations imposed by dynamic ports and encrypted traffic. However, they require the selection and combination of features to be determined manually and subjectively, so the efficiency and accuracy of traffic identification are low.
Therefore, a newer research idea is to introduce image classification from deep learning into traffic identification and to use representation learning to convert traffic features into pixel points. For example, a raw traffic data packet in byte form is expressed as a grayscale image, and the network traffic is classified by classifying the grayscale image. This approach is simple to implement, but when the input data is non-byte network traffic obtained by feature extraction and combination on the basis of the raw network traffic, it cannot accurately identify the type of the network traffic, so its application range is small and the accuracy of network traffic identification is low.
Disclosure of Invention
The invention aims to provide a network traffic identification method and a network traffic identification device, which are used for enlarging the application range of the network traffic identification method and improving the accuracy of network traffic identification. The specific technical scheme is as follows:
in order to achieve the above object, an embodiment of the present invention provides a network traffic identification method, where the method includes:
based on the obtained continuous features of the flow to be identified, mapping the continuous features into pixel values by adopting multivariate correlation analysis to obtain a continuous flow chart;
mapping the discrete features into pixel values by adopting one-hot coding and entity embedding based on the acquired discrete features of the flow to be identified to obtain a discrete flow graph;
inputting the continuous flow chart into a convolutional neural network to obtain a continuous flow chart value, wherein the convolutional neural network is used for carrying out high-order combination on the continuous features;
inputting the discrete flow diagram into a factorization machine to obtain a discrete flow diagram value, wherein the factorization machine is used for carrying out low-order combination on the discrete features;
and inputting the continuous flow chart value and the discrete flow chart value into a normalized exponential function to obtain the type of the flow to be identified.
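The final step above can be illustrated with a minimal sketch of the normalized exponential (softmax) function. The source does not state how the continuous flow chart value and the discrete flow chart value are combined before the softmax; the sketch assumes, purely for illustration, that each branch emits one score per traffic type and that the two score vectors are summed. The names `softmax`, `classify`, `y_cnn`, and `y_fm` are hypothetical.

```python
import numpy as np


def softmax(scores: np.ndarray) -> np.ndarray:
    """Normalized exponential function over the last axis."""
    # Subtracting the maximum does not change the result but avoids overflow.
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def classify(y_continuous: np.ndarray, y_discrete: np.ndarray) -> int:
    """Return the index of the most probable traffic type.

    Assumption (not from the source): both branches output one score per
    traffic type, and the scores are summed before the softmax.
    """
    probabilities = softmax(y_continuous + y_discrete)
    return int(np.argmax(probabilities))


# Hypothetical per-type scores from the CNN branch and the FM branch.
y_cnn = np.array([1.2, 0.3, -0.5])
y_fm = np.array([0.1, 0.9, 0.0])
predicted_type = classify(y_cnn, y_fm)
```

Because the softmax is monotonic, the predicted type is simply the index with the largest combined score; the probabilities matter mainly when a training loss is computed.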
Optionally, the step of mapping the continuous features into pixel values by using multivariate correlation analysis based on the obtained continuous features of the flow to be identified to obtain a continuous flow chart includes:
inputting each continuous characteristic of the flow to be identified into a standard deviation standardization function to obtain a standardized continuous characteristic;
calculating the correlation between every two normalized continuous features;
determining a correlation matrix corresponding to each two normalized continuous features according to the correlation between the continuous features;
mapping the value of each element in the correlation matrix to the value range of a pixel point in a gray scale image to obtain a pixel value corresponding to each element in the correlation matrix;
and generating the continuous flow chart by using the pixel value corresponding to each element in the correlation matrix.
Optionally, the step of calculating the correlation between each two normalized continuous features includes:
the correlation between each two normalized consecutive features is calculated using the following formula:
Figure BDA0002476635760000021
where r_{i,j} is the correlation between the normalized value of the continuous feature x_i and the normalized value of the continuous feature x_j, x_{normalization_i} is the normalized value of the continuous feature x_i, x_{normalization_j} is the normalized value of the continuous feature x_j, and p is the total number of continuous features of the flow to be identified.
Optionally, the step of mapping the discrete features into pixel values by using one-hot coding and entity embedding based on the obtained discrete features of the flow to be identified to obtain a discrete flow graph includes:
vectorizing a first class of features in the discrete features by adopting entity embedding to obtain a first processed feature value, wherein the first class of features are discrete features of which the value number is greater than a preset value number threshold;
coding a second type of characteristics in the discrete characteristics by adopting one-hot coding to obtain a second processed characteristic value, wherein the second type of characteristics is discrete characteristics with the value number less than or equal to the preset value number threshold;
mapping the first processed characteristic value and the second processed characteristic value to a value range of a pixel point in a gray scale image respectively to obtain a pixel value corresponding to each discrete characteristic;
and generating the discrete flow chart by using the pixel value corresponding to each discrete feature.
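The split between the two classes of discrete features can be sketched as follows. This is a minimal illustration, not the patented implementation: the threshold value, the feature names, and the embedding dimension are hypothetical, and a randomly initialized table stands in for an embedding table that would normally be learned during training.

```python
import numpy as np


def one_hot(value_index: int, num_values: int) -> np.ndarray:
    """Encode one value of a second-class (low-cardinality) feature."""
    vec = np.zeros(num_values)
    vec[value_index] = 1.0
    return vec


def encode_discrete(value_index, num_values, threshold, embedding_table=None):
    """Pick entity embedding or one-hot coding by the number of values."""
    if num_values > threshold:
        # First-class feature: look up the row of the embedding table.
        if embedding_table is None:
            raise ValueError("first-class features need an embedding table")
        return embedding_table[value_index]
    # Second-class feature: plain one-hot coding.
    return one_hot(value_index, num_values)


rng = np.random.default_rng(0)
THRESHOLD = 10  # hypothetical preset value-number threshold

# A feature with 3 possible values (e.g. a protocol field) is one-hot coded.
protocol_code = encode_discrete(1, 3, THRESHOLD)

# A feature with 1000 possible values is embedded; the table is a random
# stand-in for learned embeddings, and dimension 4 is chosen arbitrarily.
port_table = rng.normal(size=(1000, 4))
port_code = encode_discrete(443, 1000, THRESHOLD, port_table)
```

The design motivation is that one-hot vectors grow linearly with the number of values, whereas an embedding keeps the representation of a high-cardinality feature compact.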
Optionally, the step of performing vectorization processing on the first type of features in the discrete features by using entity embedding to obtain a first processed feature value includes:
calculating the vector dimension of each first-class feature according to the value quantity of each first-class feature;
obtaining a vector element value of the value of each first type feature on the vector dimension;
and normalizing the vector element value corresponding to each first-class feature to obtain a normalized vector element value, wherein the normalized vector element value lies within the value range of pixel points in a grayscale image, and the normalized vector element value is the first processed feature value.
Optionally, the step of calculating the vector dimension of each first-class feature according to the number of values of each first-class feature includes:
for each first class feature, calculating the vector dimension of the first class feature by using the following formula:
Figure BDA0002476635760000031
where dimensions is the vector dimension of the first-class feature, passive values is the number of values of the first-class feature, and ⌈ ⌉ denotes rounding up;
the step of normalizing the vector element value corresponding to each first type feature to obtain a normalized vector element value includes:
for each first-class feature, the vector element value corresponding to the first-class feature is normalized by using the following formula to obtain a normalized vector element value corresponding to the first-class feature:
Figure BDA0002476635760000042
where v_{normalization_i_k} is the normalized vector element value corresponding to the k-th value of the first-class feature x_i, v_{i_k} is the vector element value corresponding to the k-th value of the first-class feature x_i, v_{i_min} is the minimum vector element value corresponding to the first-class feature x_i, and v_{i_max} is the maximum vector element value corresponding to the first-class feature x_i.
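The normalization formula itself is only available as an image in the source. Given that its symbols are an element value, a minimum, and a maximum, a plausible reading is min-max scaling; the sketch below assumes that form, v_norm = (v - v_min) / (v_max - v_min), and a target range of [0, 1]. Both the form and the range are assumptions, and `min_max_normalize` is a hypothetical name.

```python
import numpy as np


def min_max_normalize(values: np.ndarray) -> np.ndarray:
    """Scale embedding vector elements into [0, 1] (assumed formula)."""
    v_min, v_max = values.min(), values.max()
    if v_max == v_min:
        # All elements equal: avoid division by zero.
        return np.zeros_like(values, dtype=float)
    return (values - v_min) / (v_max - v_min)


# Hypothetical embedding vector elements of one first-class feature.
embedding_values = np.array([-1.0, 0.0, 3.0])
normalized = min_max_normalize(embedding_values)
```

If 8-bit grayscale pixels are wanted instead, the result can be multiplied by 255 and rounded.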
Optionally, the convolutional neural network includes a convolutional layer, a pooling layer, and a full-link layer;
the step of inputting the continuous flow chart into a convolutional neural network to obtain a continuous flow chart value comprises:
inputting the continuous flow graph into the convolutional neural network;
the convolution layer extracts continuous features in the continuous flow chart through a convolution kernel and conducts convolution operation on the continuous features in the continuous flow chart to obtain a convolution result;
the pooling layer performs characteristic screening on the convolution result to obtain a pooling result;
the full connection layer rectifies the pooling result and outputs a continuous flow chart value;
and acquiring the continuous flow chart value output by the full connection layer.
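The convolution and pooling stages can be sketched in plain NumPy. This is a single-channel, single-kernel illustration under assumptions not stated in the source: no padding, stride 1, and 2x2 max pooling as the "feature screening". As in most deep learning frameworks, the "convolution" is computed as a cross-correlation (the kernel is not flipped).

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide the kernel over the image without padding (stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


def max_pool(feature_map: np.ndarray, size: int = 2) -> np.ndarray:
    """Keep the maximum of each non-overlapping size x size block."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]  # drop rows/columns that do not fill a block
    blocks = trimmed.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))


# A toy 5x5 "continuous flow chart" and a 2x2 summing kernel.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((2, 2))
conv_result = conv2d_valid(image, kernel)   # shape (4, 4)
pool_result = max_pool(conv_result)         # shape (2, 2)
```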
Optionally, the step of rectifying the pooling result by the full connection layer and outputting a continuous flow chart value includes:
calculating the continuous flow chart value by using the following formula:
y_out = σ(y_in * ω + b);
where σ denotes the linear rectification (ReLU) function, y_in is the pooling result, y_out is the continuous flow chart value, ω is the weight of the pooling result, and b is the offset (bias) term corresponding to the full connection layer.
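The full connection layer formula can be written out directly. The sketch assumes that the product of the pooling result and the weight is a matrix product over a flattened pooling result; the numeric values are made up for illustration.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    """Linear rectification: negative inputs become zero."""
    return np.maximum(z, 0.0)


def fully_connected(y_in: np.ndarray, omega: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Compute y_out = ReLU(y_in * omega + b) with a matrix product."""
    return relu(y_in @ omega + b)


# Hypothetical flattened pooling result (3 values) mapped to 2 outputs.
y_in = np.array([1.0, -2.0, 0.5])
omega = np.array([[0.2, -0.1],
                  [0.4, 0.3],
                  [-1.0, 0.6]])
b = np.array([0.1, 0.5])
y_out = fully_connected(y_in, omega, b)
```

Here the first pre-activation is -1.0 and is clipped to 0, while the second, 0.1, passes through unchanged.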
Optionally, the step of inputting the discrete flow chart into a factorization machine to obtain a discrete flow chart value includes:
calculating to obtain a discrete flow chart value by using the following formula:
Figure BDA0002476635760000051
where y_discrete is the discrete flow chart value, ω is the parameter of a discrete feature, d is the total number of discrete features of the network flow to be processed, and V_i and V_j are the parameters of the product of the discrete feature x_i and the discrete feature x_j.
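The patent's formula is only available as an image, so the sketch below uses the standard second-order factorization machine, y = w0 + sum_i w_i x_i + sum_{i<j} <V_i, V_j> x_i x_j, as a stand-in. Whether the patented formula includes the global bias w0 is an assumption, and the names `w0`, `w`, and `V` are illustrative.

```python
import numpy as np


def factorization_machine(x: np.ndarray, w0: float, w: np.ndarray, V: np.ndarray) -> float:
    """Second-order factorization machine output for one input vector.

    x: shape (d,), encoded discrete features
    w: shape (d,), first-order weights
    V: shape (d, k), one k-dimensional latent vector per feature
    """
    linear = w0 + w @ x
    # Pairwise term via the O(d*k) identity:
    # sum_{i<j} <V_i, V_j> x_i x_j
    #   = 0.5 * sum_f ((sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2)
    xv = x @ V
    pairwise = 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))
    return float(linear + pairwise)


x = np.array([1.0, 0.0, 1.0])
w0 = 0.5
w = np.array([0.1, 0.2, 0.3])
V = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])
y_discrete = factorization_machine(x, w0, w, V)
```

In this example only features 0 and 2 are active, so the pairwise term reduces to the inner product of their latent vectors (1*3 + 2*(-1) = 1), and the output is 0.9 + 1 = 1.9.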
In order to achieve the above object, an embodiment of the present invention further provides a network traffic identification apparatus, where the apparatus includes:
the continuous flow chart generation module is used for mapping the continuous features into pixel values by adopting multivariate correlation analysis based on the obtained continuous features of the flow to be identified to obtain a continuous flow chart;
the discrete flow map generation module is used for mapping the discrete features into pixel values by adopting one-hot coding and entity embedding based on the acquired discrete features of the flow to be identified to obtain a discrete flow map;
the continuous flow chart value calculation module is used for inputting the continuous flow chart into a convolutional neural network to obtain a continuous flow chart value, and the convolutional neural network is used for performing high-order combination on the continuous features;
the discrete flow graph value calculation module is used for inputting the discrete flow graph into a factorization machine to obtain discrete flow graph values, and the factorization machine is used for carrying out low-order combination on the discrete features;
and the output module is used for inputting the continuous flow chart value and the discrete flow chart value into a normalized exponential function to obtain the type of the flow to be identified.
Optionally, the continuous flow chart generating module includes:
the standardization processing submodule is used for inputting each continuous characteristic of the flow to be identified into a standard deviation standardization function to obtain a standardized continuous characteristic;
the calculation submodule is used for calculating the correlation between every two normalized continuous features;
the correlation determination submodule is used for determining a correlation matrix corresponding to each two normalized continuous features according to the correlation between the continuous features;
the first mapping submodule is used for mapping the value of each element in the correlation matrix to the value range of a pixel point in a gray scale map to obtain a pixel value corresponding to each element in the correlation matrix;
and the continuous flow chart generation submodule is used for generating the continuous flow chart by utilizing the pixel value corresponding to each element in the correlation matrix.
Optionally, the calculation sub-module is specifically configured to:
the correlation between each two normalized consecutive features is calculated using the following formula:
Figure BDA0002476635760000061
where r_{i,j} is the correlation between the normalized value of the continuous feature x_i and the normalized value of the continuous feature x_j, x_{normalization_i} is the normalized value of the continuous feature x_i, x_{normalization_j} is the normalized value of the continuous feature x_j, and p is the total number of continuous features of the flow to be identified.
Optionally, the discrete flow map generating module includes:
the first-class feature processing sub-module is used for carrying out vectorization processing on first-class features in the discrete features by adopting entity embedding to obtain a first processed feature value, wherein the first-class features are discrete features of which the value number is greater than a preset value number threshold;
the second-class feature processing sub-module is used for encoding a second-class feature in the discrete features by using one-hot encoding to obtain a second processed feature value, wherein the second-class feature is the discrete feature of which the value number is less than or equal to the preset value number threshold;
the second mapping submodule is used for mapping the first processed characteristic value and the second processed characteristic value to a value range of a pixel point in a gray scale map respectively to obtain a pixel value corresponding to each discrete characteristic;
and the discrete flow chart generation submodule is used for generating the discrete flow chart by utilizing the pixel value corresponding to each discrete feature.
Optionally, the first-class feature processing sub-module includes:
the calculation unit is used for calculating the vector dimension of each first-class feature according to the value quantity of each first-class feature;
the acquisition unit is used for acquiring a vector element value of each first-class feature, wherein the value of each first-class feature is on the vector dimension;
and the processing unit is used for carrying out standardization processing on the vector element value corresponding to each first-class feature to obtain a standardized vector element value, wherein the standardized vector element value is between the value ranges of the pixel points in the gray level image, and the standardized vector element value is the first-processed feature value.
Optionally, the computing unit is specifically configured to:
for each first class feature, calculating the vector dimension of the first class feature by using the following formula:
Figure BDA0002476635760000071
where dimensions is the vector dimension of the first-class feature, passive values is the number of values of the first-class feature, and ⌈ ⌉ denotes rounding up;
the processing unit is specifically configured to:
for each first-class feature, the vector element value corresponding to the first-class feature is normalized by using the following formula to obtain a normalized vector element value corresponding to the first-class feature:
Figure BDA0002476635760000073
where v_{normalization_i_k} is the normalized vector element value corresponding to the k-th value of the first-class feature x_i, v_{i_k} is the vector element value corresponding to the k-th value of the first-class feature x_i, v_{i_min} is the minimum vector element value corresponding to the first-class feature x_i, and v_{i_max} is the maximum vector element value corresponding to the first-class feature x_i.
Optionally, the continuous flow chart value calculating module includes:
an input submodule for inputting the continuous flow graph into the convolutional neural network if the convolutional neural network includes a convolutional layer, a pooling layer, and a fully-connected layer;
the convolution layer extracts continuous features in the continuous flow chart through a convolution kernel and conducts convolution operation on the continuous features in the continuous flow chart to obtain a convolution result;
the pooling layer performs characteristic screening on the convolution result to obtain a pooling result;
the full connection layer rectifies the pooling result and outputs a continuous flow chart value;
and the acquisition submodule is used for acquiring the continuous flow chart value output by the full connection layer.
Optionally, the input sub-module is specifically configured to:
calculating the continuous flow chart value by using the following formula:
y_out = σ(y_in * ω + b);
where σ denotes the linear rectification (ReLU) function, y_in is the pooling result, y_out is the continuous flow chart value, ω is the weight of the pooling result, and b is the offset (bias) term corresponding to the full connection layer.
Optionally, the discrete flow map value calculation module is specifically configured to:
calculating to obtain a discrete flow chart value by using the following formula:
Figure BDA0002476635760000081
where y_discrete is the discrete flow graph value, ω is the parameter of a discrete feature, d is the total number of discrete features of the network flow to be processed, and V_i and V_j are the parameters of the product of the discrete feature x_i and the discrete feature x_j.
In order to achieve the above object, an embodiment of the present invention further provides an electronic device, where the electronic device includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing any of the above method steps when executing a program stored in the memory.
To achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the above method steps.
To achieve the above object, an embodiment of the present invention further provides a computer program product containing instructions, which when run on a computer, causes the computer to perform any of the above method steps.
The technical scheme provided by the embodiment of the invention has the beneficial effects that:
the method acquires the continuous features and the discrete features of the flow to be identified, processes the continuous features by multivariate correlation analysis to obtain a continuous flow graph, and processes the discrete features by one-hot coding and entity embedding to obtain a discrete flow graph. The continuous flow graph is input into a convolutional neural network to obtain a continuous flow graph value, the discrete flow graph is input into a factorization machine to obtain a discrete flow graph value, and the two values are input into a normalized exponential function, which outputs the type of the flow to be identified. In the technical scheme provided by the embodiment of the invention, the continuous features and the discrete features of the network traffic are processed separately, and the type of the flow to be identified is identified based on the processed continuous and discrete features.
The embodiment of the invention realizes the conversion from the continuous characteristic and the discrete characteristic of the network flow to the pixel points and the combination learning of the pixel points, converts the identification problem of the network flow into the image classification problem and improves the identification accuracy of the network flow. In addition, the continuous features and the discrete features of the network traffic do not change with the processing of the network traffic, so the technical scheme provided by the embodiment of the invention for identifying the type of the traffic to be identified based on the continuous features and the discrete features is not only suitable for the original network traffic, but also suitable for the non-byte network traffic after feature extraction and combination based on the original network traffic, and the application range of the network traffic identification method is enlarged while the network traffic identification is realized, and the accuracy of the network traffic identification is further improved.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a network traffic identification method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method for generating a continuous flow chart according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a method for generating a discrete flow chart according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a method for processing discrete features using entity embedding techniques according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a default recognition model according to an embodiment of the present invention;
FIG. 6 is another schematic flow chart of a network traffic identification method according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a traffic identification device according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to increase the application range of the network traffic identification method and improve the accuracy of network traffic identification while implementing network traffic identification, embodiments of the present invention provide a network traffic identification method, an apparatus, an electronic device, and a storage medium, and a detailed description will be given below of the network traffic identification method, the apparatus, the electronic device, and the storage medium provided in embodiments of the present invention with reference to fig. 1 to 8.
Referring to fig. 1, fig. 1 is a schematic flow chart of a network traffic identification method according to an embodiment of the present invention, where the method includes the following steps.
Step 101, mapping the continuous features into pixel values by adopting multivariate correlation analysis based on the obtained continuous features of the flow to be identified, to obtain a continuous flow chart.
In this embodiment of the present invention, the traffic to be identified may include one or more data packets to be identified, where the identification of the traffic to be identified is a process of identifying the type of the one or more data packets to be identified, and associating the one or more data packets to be identified with a service type that generates the one or more data packets to be identified, for example, the service type of the data packet sent by the communication software may be classified as a communication service type.
A continuous feature is a feature whose values are continuous and regular, such as the size of the network flow or the time at which the network flow is generated. Each flow to be identified may include a plurality of continuous features.
In the embodiment of the invention, multivariate correlation analysis mines the correlation between continuous features by means of the Euclidean distance. For example, the absolute distance between two continuous features is calculated by the Euclidean distance formula, and the correlation between the two continuous features is reflected by this absolute distance.
In one embodiment, as shown in fig. 2, step 101 may be subdivided into the following steps to more accurately determine the correlation between the continuous features and further more accurately identify and classify the network traffic corresponding to the continuous features.
Step 1011, inputting each continuous characteristic of the flow to be identified into a standard deviation standardization function to obtain the standardized continuous characteristic.
In the embodiment of the present invention, since the continuous features of the network traffic have different categories, such as the size of the network traffic, the number of packets corresponding to the network traffic, and the like, in order to process the continuous features in the same feature space, all the continuous features are standardized uniformly, and the continuous features are converted into numerical values which are in accordance with normal distribution and have no unit under a uniform standard, so as to process the continuous features of the network traffic under the uniform standard conveniently.
In one embodiment, the standard deviation normalization function may be:
x_normalization = (x - μ) / σ
where x_normalization is the normalized continuous feature, x is the continuous feature, μ is the mean of all values of the continuous feature, and σ is the standard deviation of all values of the continuous feature.
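Standard deviation standardization is commonly understood as z-score normalization, that is, subtracting the mean and dividing by the standard deviation. A minimal sketch under that reading follows; the sample values and the feature meanings (bytes per flow, packets per flow) are hypothetical.

```python
import numpy as np


def standardize(features: np.ndarray) -> np.ndarray:
    """Z-score each column; rows are flows, columns are continuous features."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    # A constant feature has zero deviation; divide by 1 to avoid NaN.
    sigma = np.where(sigma == 0, 1.0, sigma)
    return (features - mu) / sigma


# Three hypothetical flows described by (bytes, packet count).
samples = np.array([[1500.0, 10.0],
                    [500.0, 2.0],
                    [1000.0, 6.0]])
normalized = standardize(samples)
```

After this step every feature has zero mean and unit standard deviation, so features with large raw magnitudes such as byte counts no longer dominate the distance computation.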
Step 1012, calculating the correlation between each two normalized continuous features.
In the embodiment of the present invention, the correlation between each two continuous features may be represented by the distance between them, and this distance may be calculated by the Euclidean distance formula.
In one embodiment, the correlation between each two normalized continuous features may be calculated using the following formula:
Figure BDA0002476635760000112
wherein $r_{i,j}$ is the correlation between the normalized value of continuous feature $x_i$ and the normalized value of continuous feature $x_j$, $x_{normalization\_i}$ is the normalized value of continuous feature $x_i$, $x_{normalization\_j}$ is the normalized value of continuous feature $x_j$, and $p$ is the total number of continuous features of the flow to be identified.
The above formula for calculating the correlation between two normalized features can also be understood as treating the distance between the two normalized features as the length of the hypotenuse of a right triangle, with the two normalized features as the two end points of the hypotenuse, so that the distance between the two normalized features is calculated by determining the right triangle they form.
Taking features $x_i$ and $x_j$ sampled from the input continuous features as an example, to calculate the correlation between $x_i$ and $x_j$, data conversion is first performed on $x_i$ and $x_j$ to obtain the right triangle they form. Specifically, $x_i$ and $x_j$ are projected into the $(i, j)$-th two-dimensional Euclidean subspace, and their coordinate point within this subspace is determined by the following formulas:
$$point_{i,j} = [\varepsilon_i \ \varepsilon_j]^T X = [x_i \ x_j]^T \quad (1 \le i, j \le p,\ i \ne j);$$
$$\varepsilon_i = [e_{i,1}, e_{i,2}, \ldots, e_{i,p}]^T$$
$$\varepsilon_j = [e_{j,1}, e_{j,2}, \ldots, e_{j,p}]^T$$
wherein $point_{i,j}$ is the coordinate point of features $x_i$ and $x_j$ within the above two-dimensional Euclidean subspace, $\varepsilon_i$ is the vector of feature $x_i$ within the two-dimensional Euclidean subspace, $\varepsilon_j$ is the vector of feature $x_j$ within the two-dimensional Euclidean subspace, $p$ is the total number of continuous features of the sample, and $X$ is the input set of samples.
In one example, assume that in the vector $\varepsilon_i$ all elements are 0 except $e_{i,i}$, which is 1, and that in the vector $\varepsilon_j$ all elements are 0 except $e_{j,j}$, which is 1. The coordinate point of feature $i$ and feature $j$ in the two-dimensional Euclidean subspace can then be expressed as $(x_i, x_j)$. Projecting $(x_i, x_j)$ onto the $i$ axis and the $j$ axis gives two projection points which, together with the coordinate origin, form the triangle $\Delta f_i O f_j$, so the triangle formed by feature $i$ and feature $j$ is determined to be $\Delta f_i O f_j$. Here $O$ represents the coordinate origin, $f_i$ represents the vertex corresponding to the coordinate $x_i$ of feature $i$, and $f_j$ represents the vertex corresponding to the coordinate $x_j$ of feature $j$.
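The projection and the triangle can be illustrated with a short sketch. It assumes that each selector vector holds a single 1 at the position of its own feature and 0 elsewhere, and that the hypotenuse of the right triangle is the side joining the two projection points on the axes. The function names `project` and `hypotenuse` and the sample values are hypothetical.

```python
import math


def project(sample, i, j):
    """Select features i and j (1-based) from a sample via unit selector vectors."""
    p = len(sample)
    # Assumption: selector vectors have a single 1 at position i (or j).
    e_i = [1.0 if k == i else 0.0 for k in range(1, p + 1)]
    e_j = [1.0 if k == j else 0.0 for k in range(1, p + 1)]
    x_i = sum(e * x for e, x in zip(e_i, sample))
    x_j = sum(e * x for e, x in zip(e_j, sample))
    return x_i, x_j


def hypotenuse(x_i, x_j):
    """Length of the side joining (x_i, 0) and (0, x_j); the right angle is at the origin."""
    return math.hypot(x_i, x_j)


sample = [3.0, -1.0, 4.0]        # hypothetical normalized continuous features
point = project(sample, 1, 3)    # (3.0, 4.0)
length = hypotenuse(*point)      # 5.0
```

Because the right angle sits at the origin, the two legs have lengths |x_i| and |x_j|, so the hypotenuse depends only on the two selected features.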
In one embodiment, the correlation between each two normalized consecutive features can be calculated by using the following formula:
$$r_{mn} = x_{normalization\_m} * x_{normalization\_n} \quad (1 \le m, n \le p,\ m \ne n);$$
wherein $r_{mn}$ is the correlation between the normalized value of continuous feature $x_m$ and the normalized value of continuous feature $x_n$, $x_{normalization\_m}$ is the normalized value of continuous feature $x_m$, $x_{normalization\_n}$ is the normalized value of continuous feature $x_n$, and $p$ is the total number of normalized continuous features of the flow to be identified.
In the embodiment of the present invention, the correlation between every two normalized continuous features may also be calculated in other manners, which is not specifically limited, as long as the calculated correlation is guaranteed to lie between 0 and 255.
Step 1013, determining a correlation matrix corresponding to the continuous features according to the correlation between every two normalized continuous features.
In the embodiment of the invention, if the sample has a plurality of continuous features, a plurality of normalized continuous features are correspondingly obtained, and the correlation between every two normalized continuous features in the plurality of normalized continuous features is calculated, so that the correlation matrix can be obtained.
Taking p continuous features of the sample as an example, p normalized continuous features are correspondingly obtained, and according to the correlation between every two normalized continuous features in the p normalized continuous features, a correlation matrix R can be obtained as follows:
Figure BDA0002476635760000131
The correlation matrix makes the correlation between the continuous features of the sample directly observable, so that the continuous features of the sample can be converted into an image and the network traffic can be identified and classified by classifying the network traffic features.
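As a hedged sketch of Step 1013, the matrix can be filled pairwise. The example uses the product form of the correlation given in one of the embodiments above; setting the diagonal to 0 is an assumption, since the formula is only defined for two different features. The function name `correlation_matrix` and the input values are hypothetical.

```python
def correlation_matrix(normalized):
    """Build a p x p matrix whose (m, n) entry is the product of features m and n."""
    p = len(normalized)
    R = [[0.0] * p for _ in range(p)]
    for m in range(p):
        for n in range(p):
            if m != n:
                R[m][n] = normalized[m] * normalized[n]
            # Assumption: the diagonal (m == n) is left at 0.
    return R


normalized = [1.0, -2.0, 0.5]    # hypothetical normalized continuous features
R = correlation_matrix(normalized)
```

The resulting matrix is symmetric, so a sample with p continuous features yields a p x p image in the later steps.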
Step 1014, mapping the value of each element in the correlation matrix to the value range of the pixel points in the gray-scale image to obtain the pixel value corresponding to each element in the correlation matrix.
Step 1015, generating a continuous flow chart by using the pixel value corresponding to each element in the correlation matrix.
Because the correlation between the continuous features differs from one network flow to another, the correlation matrices corresponding to different network flows are different. The network flows can therefore be identified and classified according to the categories of the continuous flow charts, by processing and classifying the continuous flow charts obtained from the correlation matrices. Since the value of each element in the correlation matrix is mapped to the value range of the pixel points in the gray-scale image, the continuous flow chart is also a gray-scale image.
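Steps 1014 and 1015 can be sketched as follows. The embodiment does not state which mapping is used, so linear min-max scaling to the range 0 to 255 is an assumption here, as are the function name `to_gray` and the sample matrix.

```python
def to_gray(matrix):
    """Linearly rescale all matrix entries to integer pixel values in 0..255."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant matrix carries no contrast; render it as black.
        return [[0 for _ in row] for row in matrix]
    scale = 255.0 / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in matrix]


# Hypothetical 2 x 2 correlation matrix.
gray = to_gray([[0.0, 1.0], [5.0, 0.0]])
```

Each row of `gray` is one row of pixels of the gray-scale continuous flow chart.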
In the embodiment of the invention, the multivariate correlation analysis is adopted to process the continuous features, so that the correlation between every two continuous features can be obtained, meanwhile, the feature space is not enlarged too much, the complexity of network flow identification is reduced, and the identification efficiency of the network flow is improved.
Step 102, mapping the discrete features into pixel values by adopting one-hot coding and entity embedding based on the acquired discrete features of the flow to be identified to obtain a discrete flow graph.
In the embodiment of the invention, a discrete feature is a feature whose values are discrete and unordered, such as the IP address of the network traffic or the communication protocol of the network traffic. Because the values of the discrete features are discrete and unordered, they can be expanded directly into Euclidean space by adopting one-hot coding and entity embedding, so that the distance between the discrete features can be calculated conveniently and the correlation between the discrete features can be obtained from that distance.
One-hot encoding, also called one-bit-effective encoding, encodes N states (here, N discrete feature values) with an N-bit state register: each value has its own register bit, and only one bit is active at any time. For example, assuming that the communication protocol version of the network traffic has three values, the three versions are represented after one-hot coding as 001, 010 and 100 in turn. Encoding the values of a discrete feature with one-hot coding makes the distance between every two values of the feature equal, so that the calculated distance between the discrete features, and therefore the calculated correlation between them, is reasonable.
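The three-value example can be reproduced with a short sketch. The function names `one_hot` and `hamming` and the value labels are hypothetical; the bit order is chosen only so that the output matches the codes 001, 010 and 100 mentioned in the text.

```python
def one_hot(values):
    """Map each distinct value to an N-bit code with exactly one bit set."""
    categories = sorted(set(values))
    n = len(categories)
    codes = {}
    for index, category in enumerate(categories):
        bits = ["0"] * n
        bits[n - 1 - index] = "1"   # first category -> 001, then 010, then 100
        codes[category] = "".join(bits)
    return codes


def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical protocol versions of the network traffic.
codes = one_hot(["v1", "v2", "v3"])
```

Any two distinct codes differ in exactly two bit positions, which is the equal-distance property the paragraph relies on.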
Entity embedding assigns different vector values to different categories, that is, it assigns different vector element values to the different values of a discrete feature. Calculating the distance between discrete features that have many values through entity embedding improves the calculation efficiency.
In one embodiment, to obtain the correlation between discrete features more accurately, the discrete features with different values are processed differently, as shown in fig. 3, and step 102 can be subdivided into the following steps.
Step 1021, performing vectorization processing on a first class of features in the discrete features by adopting entity embedding to obtain a first processed feature value, wherein the first class of features are discrete features whose value number is greater than a preset value number threshold.
The first type of feature may be one or more. The first category is the one with a large number of values. Vectorization processing of the first class of features is to convert the first class of features into vector values. The entity embedding is adopted to process the features with a large number of values in the discrete features, for example, IP addresses, ports and the like corresponding to network traffic are processed, so that the burden of a training model can be reduced, and in addition, the feature dimension of the discrete features can be reduced through the entity embedding, so that the calculation difficulty of the distance between the discrete features is reduced. The preset value quantity threshold value may be set according to an actual situation, which is not limited in the embodiment of the present invention.
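The split between the two classes of discrete features can be sketched as below. The function name `split_discrete_features`, the feature names, the sample values, and the threshold of 3 are all hypothetical; the embodiment leaves the threshold to the actual situation.

```python
def split_discrete_features(features, threshold):
    """Separate features by their number of distinct values.

    Returns (first_class, second_class): first_class holds features with more
    distinct values than the threshold (entity embedding), second_class holds
    the rest (one-hot coding).
    """
    first_class, second_class = {}, {}
    for name, values in features.items():
        if len(set(values)) > threshold:
            first_class[name] = values
        else:
            second_class[name] = values
    return first_class, second_class


# Hypothetical discrete features of four flows.
features = {
    "src_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "protocol": ["tcp", "udp", "tcp", "tcp"],
}
first_class, second_class = split_discrete_features(features, threshold=3)
```

Here `src_ip` has four distinct values and goes to entity embedding, while `protocol` has two and goes to one-hot coding.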
In one embodiment, as shown in fig. 4, step 1021 can be further subdivided into the following steps.
Step 10211, calculating the vector dimension of each first-class feature according to the value number of each first-class feature.
In the embodiment of the invention, the vector dimension of the first type of characteristics is related to the value number of the characteristics. And determining the vector dimension of each discrete feature with a large number of values, and assigning the values of the discrete features on the vector dimension, wherein a plurality of values of each discrete feature are positioned on the same vector dimension, so that the feature dimension of the discrete features is reduced, and the overlarge feature space of the discrete features is prevented.
In one embodiment, for each first class feature, the vector dimension of the first class feature may be calculated using the following formula:
Figure BDA0002476635760000151
wherein dimensions is the vector dimension of the first-class feature, passive values is the value number of the first-class feature, and the operator
Figure BDA0002476635760000152
denotes rounding up.
In one embodiment, for each first class feature, the vector dimension of the first class feature may be further calculated by using the following formula:
Figure BDA0002476635760000153
wherein dimensions is the vector dimension of the first-class feature, passive values is the value number of the first-class feature, λ is a preset parameter, and the operator
Figure BDA0002476635760000154
denotes rounding up.
In the embodiment of the present invention, the way of calculating the vector dimension of the first-class feature is not particularly limited.
Step 10212, obtain the vector element value of each first type feature in the vector dimension.
In entity embedding, the vector element value of each value of a first-class feature can be specified manually. When a discrete feature has n values, the n vector element values are different from one another, and the value range of the n values is 1-n-1. Each first-class feature can have a plurality of values, and the vector element values of all values of a first-class feature lie in the same vector dimension.
Step 10213, standardizing the vector element value corresponding to each first-class feature to obtain a standardized vector element value, wherein the standardized vector element value lies within the value range of the pixel points in the gray-scale image and serves as the first processed feature value.
Because the first type of features have a large number of values, when each value is given with a vector element value, the vector element value may exceed the value range (0-255) of a pixel point in the gray-scale image, and therefore, the vector element values corresponding to all the values of the features need to be uniformly compressed, so that the correlation between the vector element values corresponding to all the values is not changed, and the vector element values corresponding to all the values are within the value range of the pixel point in the gray-scale image, that is, between 0 and 255.
In an embodiment, for each first-class feature, the following formula may be used to normalize a vector element value corresponding to the first-class feature to obtain a normalized vector element value corresponding to the first-class feature:
Figure BDA0002476635760000161
wherein $v_{normalization\_i\_k}$ is the normalized vector element value corresponding to the $k$-th value of the first-class feature $x_i$, $v_{i\_k}$ is the vector element value corresponding to the $k$-th value of the first-class feature $x_i$, $v_{i\_min}$ is the minimum vector element value corresponding to the first-class feature $x_i$, and $v_{i\_max}$ is the maximum vector element value corresponding to the first-class feature $x_i$.
In an embodiment, for each first-class feature, the following formula may be further used to normalize a vector element value corresponding to the first-class feature to obtain a normalized vector element value corresponding to the first-class feature:
Figure BDA0002476635760000162
wherein $v_{normalization\_i\_k}$ is the normalized vector element value corresponding to the $k$-th value of the first-class feature $x_i$, $v_{i\_k}$ is the vector element value corresponding to the $k$-th value of the first-class feature $x_i$, $v_{i\_min}$ is the minimum vector element value corresponding to the first-class feature $x_i$, $v_{i\_max}$ is the maximum vector element value corresponding to the first-class feature $x_i$, and $\lambda$ is a preset parameter.
In the embodiment of the present invention, a manner of normalizing the vector element value corresponding to the first-type feature is not particularly limited.
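One possible normalization is sketched below. Since the exact formula is given only as a figure, the sketch assumes plain min-max scaling of the vector element values of one feature onto the range 0 to 255; the function name `normalize_elements`, the `upper` parameter, and the sample values are hypothetical.

```python
def normalize_elements(elements, upper=255.0):
    """Rescale the vector element values of one feature into [0, upper].

    Assumption: min-max scaling, (v - v_min) / (v_max - v_min) * upper.
    """
    v_min, v_max = min(elements), max(elements)
    if v_max == v_min:
        # Only one distinct element value; nothing to spread out.
        return [0.0 for _ in elements]
    return [(v - v_min) / (v_max - v_min) * upper for v in elements]


# Hypothetical vector element values assigned to the values of one feature.
pixels = normalize_elements([0, 100, 400])
```

Min-max scaling keeps the ordering and the relative spacing of the element values, which matches the requirement that compression must not change the correlation between them.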
Step 1022, encoding a second type of feature in the discrete features by using one-hot encoding to obtain a second processed feature value, where the second type of feature is a discrete feature whose value number is less than or equal to the preset value number threshold.
The second type of feature may be one or more. The second type of feature is a feature with a small number of values. Discrete features with a small number of values, for example the communication protocol version of the network traffic, are processed with one-hot coding: the values of the discrete feature are encoded directly to obtain coded values, and the discrete flow graph is generated from the coded values.
Step 1023, mapping the first processed feature value and the second processed feature value respectively to the value range of the pixel points in the gray-scale image to obtain the pixel value corresponding to each discrete feature.
Step 1024, generating a discrete flow chart by using the pixel value corresponding to each discrete feature.
Because the values of the discrete features of different network flows are different, the values of the discrete features of different network flows after being coded are different, so that the discrete flow graphs corresponding to different network flows are different, and the network flows can be identified and classified according to the types of the discrete flow graphs.
Step 103, inputting the continuous flow chart into a convolutional neural network to obtain a continuous flow chart value, wherein the convolutional neural network is used for performing high-order combination on the continuous features.
In one embodiment, a convolutional neural network may include a convolutional layer, a pooling layer, and a fully-connected layer.
Inputting the continuous flow chart into the convolutional neural network to obtain the continuous flow chart value may proceed as follows. The continuous flow chart is input into the convolutional neural network; the convolutional layer extracts the continuous features in the continuous flow chart through a convolution kernel and performs a convolution operation on them to obtain a convolution result; the pooling layer performs feature screening on the convolution result to obtain a pooling result; and the fully-connected layer rectifies the pooling result and outputs the continuous flow chart value. The continuous flow chart value output by the fully-connected layer is then acquired.
The pooling layer follows the convolutional layer and is used for extracting and screening the convolution result output by the convolutional layer. The convolutional neural network may include one convolutional layer and one pooling layer, or a plurality of convolutional layers and a plurality of pooling layers.
In the embodiment of the invention, the convolutional layer extracts the continuous features in the continuous flow chart through the convolution kernel. When the continuous flow chart is too large or contains too many continuous features, the continuous flow chart may be divided into n regions; the convolution kernel extracts and convolves each of the n regions, and the n processed regions are sent to the pooling layer for further processing.
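The region-by-region convolution can be illustrated with a small sketch in plain Python. It assumes a single kernel, stride 1, no padding, and the sliding-window form without kernel flipping that is common in convolutional neural networks; the function name `convolve2d`, the image, and the kernel are hypothetical.

```python
def convolve2d(image, kernel):
    """Slide the kernel over every region of the image and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    result = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            # One region: the kh x kw window whose top-left corner is (r, c).
            for dr in range(kh):
                for dc in range(kw):
                    acc += image[r + dr][c + dc] * kernel[dr][dc]
            row.append(acc)
        result.append(row)
    return result


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # hypothetical 3 x 3 flow chart
kernel = [[1, 0], [0, 1]]                   # hypothetical 2 x 2 kernel
conv = convolve2d(image, kernel)
```

Each output entry summarizes one region of the flow chart, so a 3 x 3 input and a 2 x 2 kernel give four regions and a 2 x 2 convolution result.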
The pooling layer may extract and screen the convolution result as follows: after the convolution result output by the convolutional layer is obtained, each value interval in the convolution result is replaced by an important value contained in that interval, thereby compressing the convolution result. The important value may be the maximum value in the value interval, or a value with special meaning in the value interval.
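Taking the maximum as the important value gives ordinary max pooling, sketched below. The window size of 2, the non-overlapping windows, the function name `max_pool2d`, and the sample feature map are assumptions made for illustration.

```python
def max_pool2d(feature_map, size=2):
    """Replace each non-overlapping size x size window by its maximum value."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [
                feature_map[r + dr][c + dc]
                for dr in range(size)
                for dc in range(size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4 x 4 convolution result.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 5, 1],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
pooled = max_pool2d(feature_map)
```

The 4 x 4 convolution result is compressed to 2 x 2 while the strongest response of each window is kept.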
In one embodiment, the continuous flow map values may be calculated using the following equations.
$$y_{out} = \sigma(y_{in} * \omega + b);$$
wherein $\sigma$ is the ReLU (Rectified Linear Unit) function, $y_{in}$ is the pooling result, $y_{out}$ is the continuous flow chart value, $\omega$ is the weight of the pooling result, and $b$ is the offset term corresponding to the fully-connected layer.
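A minimal sketch of this fully-connected computation follows. It assumes that σ denotes the ReLU activation and that the product of the pooling result and the weight is a weighted sum per output unit; the function names and all numeric values are hypothetical.

```python
def relu(x):
    """Rectified Linear Unit: negative inputs become 0."""
    return max(0.0, x)


def fully_connected(y_in, weights, biases):
    """Compute relu(sum(w * y) + b) for every output unit.

    weights[k] is the weight vector of output unit k (an assumed layout).
    """
    return [
        relu(sum(w * y for w, y in zip(w_row, y_in)) + b)
        for w_row, b in zip(weights, biases)
    ]


y_in = [1.0, 2.0]                        # hypothetical flattened pooling result
weights = [[0.5, 0.5], [-1.0, -1.0]]     # hypothetical weights
biases = [0.5, 1.0]                      # hypothetical offset terms
y_out = fully_connected(y_in, weights, biases)
```

The second unit shows the rectification: its weighted sum is negative, so the output is clipped to 0.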
In one embodiment, the continuous flow chart value may also be calculated using the following formula.
$$y_{out} = \sigma(y_{in} * \omega + b) + \lambda;$$
wherein $\sigma$ is the ReLU (Rectified Linear Unit) function, $y_{in}$ is the processed continuous flow chart, $y_{out}$ is the continuous flow chart value, $\omega$ is the weight of the processed continuous flow chart, $b$ is the offset term corresponding to the fully-connected layer, and $\lambda$ is a preset parameter.
Step 104, inputting the discrete flow diagram into a factorization machine to obtain a discrete flow diagram value, wherein the factorization machine is used for carrying out low-order combination on the discrete features.
In the embodiment of the present invention, the low-order combination may be to perform first-order (linear interaction) and second-order (pairwise feature interaction) feature combination processing on the discrete flow graph and the continuous flow chart, and then output the processed discrete flow graph.
In one embodiment, the discrete flow map values may be calculated using the following equations.
Figure BDA0002476635760000181
wherein $y_{discrete}$ is the discrete flow diagram value, $\omega$ is the parameter of the discrete features, $d$ is the total number of discrete features of the network traffic to be processed, and $V_i$ and $V_j$ are the parameters of the product of discrete feature $x_i$ and discrete feature $x_j$.
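For orientation, a generic second-order factorization machine score can be sketched as below. The formula of the embodiment is given only as a figure, so the sketch assumes the common form: a linear term per feature plus, for every pair of features, the inner product of their latent vectors times the product of the two feature values, with no global bias. The function name `fm_score` and all numbers are hypothetical.

```python
def fm_score(x, w, V):
    """Second-order factorization machine score (assumed standard form).

    x: feature values, w: first-order weights, V: one latent vector per feature.
    """
    d = len(x)
    # First-order (linear) combination.
    linear = sum(w[i] * x[i] for i in range(d))
    # Second-order (pairwise) combination over all pairs i < j.
    pairwise = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return linear + pairwise


x = [1.0, 0.0, 1.0]                          # hypothetical encoded discrete features
w = [0.1, 0.2, 0.3]                          # hypothetical first-order parameters
V = [[1.0, 2.0], [0.0, 1.0], [3.0, 4.0]]     # hypothetical latent vectors
y_discrete = fm_score(x, w, V)
```

With one-hot style inputs most feature values are 0, so only the pairs of active features contribute to the second-order term.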
In one embodiment, the discrete flow map values may also be calculated using the following equations.
Figure BDA0002476635760000191
wherein $y_{discrete}$ is the discrete flow map value, $\omega$ is the parameter of the discrete features, $d$ is the total number of discrete features of the network traffic to be processed, $V_i$ and $V_j$ are the parameters of the product of discrete feature $x_i$ and discrete feature $x_j$, and $\lambda$ is a preset parameter.
Step 105, inputting the continuous flow chart value and the discrete flow chart value into a normalized exponential (softmax) function to obtain the type of the flow to be identified.
In the embodiment of the invention, the continuous flow chart value and the discrete flow chart value are processed through the normalized exponential function, and the type of the flow to be identified is output. The normalized exponential function thereby realizes multi-class classification of the network traffic.
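The output step can be sketched with a plain normalized exponential function. How the two values are combined before the function is not spelled out in the text; the sketch assumes that each branch yields one score per traffic type and that the scores are added element-wise. The function names, the scores, and the traffic type labels are hypothetical.

```python
import math


def softmax(scores):
    """Normalized exponential function; subtracting the max avoids overflow."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(continuous_values, discrete_values, labels):
    """Pick the traffic type with the highest softmax probability."""
    # Assumption: the two branches are combined by element-wise addition.
    combined = [c + d for c, d in zip(continuous_values, discrete_values)]
    probs = softmax(combined)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


labels = ["web", "video", "p2p"]             # hypothetical traffic types
traffic_type, probs = classify([0.2, 1.5, 0.1], [0.3, 0.5, 0.4], labels)
```

The probabilities sum to 1, and the type with the largest probability is reported as the type of the flow to be identified.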
In one embodiment, steps 103-105 can be understood as inputting the continuous flow chart and the discrete flow chart into a preset identification model. The preset identification model performs high-order combination on the continuous flow chart to obtain the continuous flow chart value, performs low-order combination on the discrete flow graph to obtain the discrete flow graph value, processes the continuous flow chart value and the discrete flow chart value through the normalized exponential function, and outputs the type of the flow to be identified.
The preset identification model may be a combination model obtained by combining a convolutional neural network model and a factorization machine model, that is, a combined model obtained by combining a Deep (convolutional neural network) model with an FM (Factorization Machine) model. The Deep model performs the high-order combination, and the FM model performs the low-order combination.
The preset identification model may adopt the structure shown in fig. 5. The circles in the discrete feature dotted-line frame represent the discrete features extracted from a sample, and the circles in the continuous feature dotted-line frame represent the continuous features extracted from the sample. The circles in the one-hot coding/entity embedding dotted-line frame represent the discrete features processed by one-hot coding/entity embedding, and the circles in the multivariate correlation analysis dotted-line frame represent the continuous features processed by multivariate correlation analysis. The circles in the FM layer dotted-line frame represent the discrete features combined by a low order, and the circles in the Deep layer dotted-line frame represent the continuous features combined by a high order and a low order. The circle in the output layer dotted-line frame represents an output processing function, such as the normalized exponential function.
The network traffic identification method provided by the embodiment of the invention respectively carries out corresponding processing on the continuous characteristic and the discrete characteristic of the network traffic, and identifies the type of the traffic to be identified based on the processed continuous characteristic and the discrete characteristic. The network flow identification method provided by the embodiment of the invention realizes the conversion from continuous characteristics and discrete characteristics of the network flow to the pixel points and the combination learning of the pixel points, converts the identification problem of the network flow into the image classification problem, and improves the identification accuracy of the network flow. In addition, the continuous features and the discrete features of the network traffic do not change with the processing of the network traffic, so the technical scheme provided by the embodiment of the invention for identifying the type of the traffic to be identified based on the continuous features and the discrete features is not only suitable for the original network traffic, but also suitable for the non-byte network traffic after feature extraction and combination based on the original network traffic, and the application range of the network traffic identification method is enlarged while the network traffic identification is realized, and the accuracy of the network traffic identification is further improved.
In addition, the network traffic identification method provided by the embodiment of the invention divides the characteristics of the traffic to be identified into the continuous characteristics and the discrete characteristics, and respectively processes the continuous characteristics and the discrete characteristics by adopting different technologies, thereby reducing the identification difficulty of the network traffic and improving the identification efficiency of the network traffic.
As shown in fig. 6, fig. 6 is another schematic flow chart of a network traffic identification method according to an embodiment of the present invention, where the method includes:
Step 601, processing the traffic to be identified to obtain continuous features and discrete features of the traffic to be identified.
Step 602, processing the continuous features by adopting multivariate correlation analysis to obtain a continuous flow chart.
Step 603, processing the discrete features by adopting one-hot coding and entity embedding to obtain a discrete flow graph.
Step 604, performing high-order combination on the continuous flow chart to obtain a continuous flow chart value.
Step 605, performing low-order combination on the discrete flow graph to obtain a discrete flow graph value.
Step 606, processing the continuous flow chart value and the discrete flow chart value through a normalized exponential function, and outputting the type of the flow to be identified.
Steps 601-606 are described only briefly; for details, reference may be made to steps 101-105.
To achieve the above object, as shown in fig. 7, an embodiment of the present invention further provides a network traffic identification apparatus, where the apparatus includes:
A continuous flow chart generating module 701, configured to map the continuous features into pixel values by using multivariate correlation analysis based on the obtained continuous features of the flow to be identified, so as to obtain a continuous flow chart.
A discrete flow map generating module 702, configured to map the discrete features into pixel values by using one-hot coding and entity embedding based on the obtained discrete features of the flow to be identified, so as to obtain a discrete flow map.
A continuous flow chart value calculation module 703, configured to input the continuous flow chart into a convolutional neural network to obtain a continuous flow chart value, where the convolutional neural network is configured to perform high-order combination on the continuous features.
A discrete flow map value calculation module 704, configured to input the discrete flow map into a factorization machine to obtain a discrete flow map value, where the factorization machine is configured to perform low-order combination on the discrete features.
An output module 705, configured to input the continuous flow chart value and the discrete flow chart value into the normalized exponential function to obtain the type of the flow to be identified.
In one embodiment, the continuous flow chart generating module 701 may include:
A standardization processing submodule, configured to input each continuous feature of the flow to be identified into a standard deviation standardization function to obtain the standardized continuous features.
A calculation submodule, configured to calculate the correlation between every two normalized continuous features.
A correlation determination submodule, configured to determine a correlation matrix corresponding to the continuous features according to the correlation between every two normalized continuous features.
A first mapping submodule, configured to map the value of each element in the correlation matrix to the value range of the pixel points in the gray-scale image to obtain the pixel value corresponding to each element in the correlation matrix.
A continuous flow chart generation submodule, configured to generate a continuous flow chart by using the pixel value corresponding to each element in the correlation matrix.
In an embodiment, the computation submodule may be specifically configured to:
calculate the correlation between every two normalized continuous features using the following formula:
Figure BDA0002476635760000211
wherein $r_{i,j}$ is the correlation between the normalized value of continuous feature $x_i$ and the normalized value of continuous feature $x_j$, $x_{normalization\_i}$ is the normalized value of continuous feature $x_i$, $x_{normalization\_j}$ is the normalized value of continuous feature $x_j$, and $p$ is the total number of continuous features of the flow to be identified.
In one embodiment, the discrete flow graph generation module 702 may include:
A first-class feature processing submodule, configured to perform vectorization processing on the first-class features in the discrete features by adopting entity embedding to obtain a first processed feature value, where the first-class features are discrete features whose value number is greater than a preset value number threshold.
A second-class feature processing submodule, configured to encode the second-class features in the discrete features by adopting one-hot encoding to obtain a second processed feature value, where the second-class features are discrete features whose value number is less than or equal to the preset value number threshold.
A second mapping submodule, configured to map the first processed feature value and the second processed feature value respectively to the value range of the pixel points in the gray-scale image to obtain the pixel value corresponding to each discrete feature.
A discrete flow chart generation submodule, configured to generate a discrete flow chart by using the pixel value corresponding to each discrete feature.
In one embodiment, the first-class feature processing sub-module may include:
A calculating unit, configured to calculate the vector dimension of each first-class feature according to the value number of each first-class feature.
An acquisition unit, configured to acquire the vector element value of each first-class feature in the vector dimension.
A processing unit, configured to standardize the vector element value corresponding to each first-class feature to obtain a standardized vector element value, where the standardized vector element value lies within the value range of the pixel points in the gray-scale image and serves as the first processed feature value.
In an embodiment, the computing unit may be specifically configured to:
for each first-class feature, calculate the vector dimension of the first-class feature using the following formula:
Figure BDA0002476635760000221
wherein dimensions is the vector dimension of the first-class feature, passive values is the value number of the first-class feature, and the operator
Figure BDA0002476635760000231
denotes rounding up.
The processing unit may be specifically configured to:
for each first-class feature, standardize the vector element value corresponding to the first-class feature by using the following formula to obtain the standardized vector element value corresponding to the first-class feature:
Figure BDA0002476635760000232
wherein $v_{normalization\_i\_k}$ is the normalized vector element value corresponding to the $k$-th value of the first-class feature $x_i$, $v_{i\_k}$ is the vector element value corresponding to the $k$-th value of the first-class feature $x_i$, $v_{i\_min}$ is the minimum vector element value corresponding to the first-class feature $x_i$, and $v_{i\_max}$ is the maximum vector element value corresponding to the first-class feature $x_i$.
In one embodiment, the continuous flow map value calculation module 703 may include:
an input submodule, configured to input the continuous flow chart into the convolutional neural network, where the convolutional neural network comprises a convolutional layer, a pooling layer, and a fully connected layer. The convolutional layer extracts the continuous features in the continuous flow chart through a convolution kernel and performs a convolution operation on them to obtain a convolution result. The pooling layer performs feature screening on the convolution result to obtain a pooling result. The fully connected layer rectifies the pooling result and outputs a continuous flow chart value; and
an acquisition submodule, configured to acquire the continuous flow chart value output by the fully connected layer.
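The convolution and pooling stages described above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions rather than the patented implementation: it assumes a single-channel map, one kernel, stride 1 without padding, and non-overlapping max pooling as the feature screening step. The names `conv2d_valid`, `max_pool`, and the sample 5x5 map and 2x2 kernel are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (stride 1, no padding) and sum products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map: Matrix, size: int = 2) -> Matrix:
    """Keep the maximum of each non-overlapping size x size window.

    Trailing rows or columns that do not fill a whole window are dropped.
    """
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 5x5 continuous flow chart and 2x2 convolution kernel.
flow_chart = [[float(5 * r + c + 1) for c in range(5)] for r in range(5)]
kernel = [[1.0, 0.0], [0.0, 1.0]]

conv_result = conv2d_valid(flow_chart, kernel)   # 4x4 convolution result
pool_result = max_pool(conv_result, size=2)      # 2x2 pooling result
print(pool_result)
```

In a trained network the kernel weights would be learned; the fixed kernel here only makes the arithmetic easy to follow.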
In one embodiment, the input submodule may be specifically configured to:
calculate the continuous flow chart value using the following formula:
$y_{out} = \sigma(y_{in} * \omega + b)$;
where $\sigma$ is the ReLU (Rectified Linear Unit) function, $y_{in}$ is the pooling result, $y_{out}$ is the continuous flow chart value, $\omega$ is the weight of the pooling result, and $b$ is the bias term of the fully connected layer.
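The fully connected layer formula can be illustrated with a short sketch. It assumes that the product in the formula is an ordinary matrix-vector product between the flattened pooling result and a weight matrix, which the text does not state explicitly. The names `relu` and `fully_connected` and all numeric values are hypothetical.

```python
from typing import List


def relu(x: float) -> float:
    """Rectified linear unit: pass positive values, clamp negatives to 0."""
    return max(0.0, x)


def fully_connected(y_in: List[float],
                    weights: List[List[float]],
                    bias: List[float]) -> List[float]:
    """Compute y_out[j] = relu(sum_i y_in[i] * weights[i][j] + bias[j])."""
    return [
        relu(sum(y_in[i] * weights[i][j] for i in range(len(y_in))) + bias[j])
        for j in range(len(bias))
    ]


# Hypothetical flattened pooling result with 3 elements and 2 output units.
pooled = [1.0, 2.0, 3.0]
weights = [[1.0, -1.0],
           [0.5, 0.0],
           [0.0, -1.0]]
bias = [0.5, 1.0]

y_out = fully_connected(pooled, weights, bias)
print(y_out)
```

The second output unit has a negative pre-activation, so the ReLU clamps it to zero, which is the rectification the layer description refers to.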
In one embodiment, the discrete flow map value calculation module 704 may be specifically configured to:
calculate the discrete flow map value using the following formula:
Figure BDA0002476635760000241
where $y_{discrete}$ is the discrete flow map value, $\omega$ is the parameter of the discrete features, $d$ is the total number of discrete features of the network traffic to be processed, and $V_i$ and $V_j$ are the parameters of the product of discrete feature $x_i$ and discrete feature $x_j$.
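For illustration, a second-order factorization machine score can be computed as sketched below. The formula image is not reproduced in this text, so the sketch assumes the standard factorization machine form: a global bias, a linear term over the $d$ features, and a pairwise term in which the weight of the product $x_i x_j$ is the inner product of the latent vectors $V_i$ and $V_j$. The function name `fm_score`, the bias `w0`, and all numeric values are hypothetical.

```python
from typing import List


def fm_score(x: List[float],
             w0: float,
             w: List[float],
             V: List[List[float]]) -> float:
    """Second-order factorization machine score (assumed standard form).

    score = w0 + sum_i w[i] * x[i] + sum_{i<j} <V[i], V[j]> * x[i] * x[j]
    """
    d = len(x)
    linear = w0 + sum(w[i] * x[i] for i in range(d))
    pairwise = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            # Inner product of the two latent vectors weights the x_i * x_j term.
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return linear + pairwise


# Hypothetical input: d = 3 discrete feature values, latent dimension 2.
x = [1.0, 0.0, 2.0]
w0 = 0.5
w = [1.0, 2.0, -1.0]
V = [[1.0, 2.0],
     [0.5, 0.5],
     [2.0, 1.0]]

y_discrete = fm_score(x, w0, w, V)
print(y_discrete)
```

Because the pairwise weights are factorized into latent vectors, an interaction between two features can be estimated even when that exact pair is rare in the training data, which suits sparse one-hot encoded inputs.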
To achieve the above object, an embodiment of the present invention further provides a terminal device which, as shown in Fig. 8, includes a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 communicate with each other through the communication bus 804;
the memory 803 is configured to store a computer program;
the processor 801 is configured to implement the network traffic identification method according to the embodiment of the present invention when executing the program stored in the memory 803.
The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment provided by the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements a network traffic identification method provided by an embodiment of the present invention.
In another embodiment, the present invention further provides a computer program product containing instructions, which when executed on a computer, causes the computer to implement a network traffic identification method provided by the embodiment of the present invention.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)).
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the embodiments of the apparatus, the electronic device, and the computer-readable storage medium are substantially similar to the method embodiments, they are described briefly, and the relevant parts of the method embodiments may be consulted for details.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for identifying network traffic, the method comprising:
based on the obtained continuous features of the flow to be identified, mapping the continuous features into pixel values by adopting multivariate correlation analysis to obtain a continuous flow chart;
mapping the discrete features into pixel values by adopting one-hot coding and entity embedding based on the acquired discrete features of the flow to be identified to obtain a discrete flow graph;
inputting the continuous flow chart into a convolutional neural network to obtain a continuous flow chart value, wherein the convolutional neural network is used for carrying out high-order combination on the continuous features;
inputting the discrete flow diagram into a factorization machine to obtain a discrete flow diagram value, wherein the factorization machine is used for carrying out low-order combination on the discrete features;
and inputting the continuous flow chart value and the discrete flow chart value into a normalized exponential function to obtain the type of the flow to be identified.
2. The method according to claim 1, wherein the step of mapping the continuous features into pixel values by using multivariate correlation analysis based on the obtained continuous features of the flow to be identified to obtain a continuous flow chart comprises:
inputting each continuous characteristic of the flow to be identified into a standard deviation standardization function to obtain a standardized continuous characteristic;
calculating the correlation between every two normalized continuous features;
determining a correlation matrix corresponding to each two normalized continuous features according to the correlation between the continuous features;
mapping the value of each element in the correlation matrix to the value range of a pixel point in a gray scale image to obtain a pixel value corresponding to each element in the correlation matrix;
and generating the continuous flow chart by using the pixel value corresponding to each element in the correlation matrix.
3. The method of claim 2, wherein the step of calculating the correlation between each two normalized consecutive features comprises:
the correlation between each two normalized consecutive features is calculated using the following formula:
Figure FDA0002476635750000021
wherein $r_{i,j}$ is the correlation between the normalized value of continuous feature $x_i$ and the normalized value of continuous feature $x_j$, $x_{normalization\_i}$ is the normalized value of said continuous feature $x_i$, $x_{normalization\_j}$ is the normalized value of continuous feature $x_j$, and $p$ is the total number of continuous features of the flow to be identified.
4. The method according to claim 1, wherein the step of mapping the discrete features into pixel values by using one-hot coding and entity embedding based on the obtained discrete features of the traffic to be identified to obtain a discrete traffic map comprises:
vectorizing a first class of features in the discrete features by adopting entity embedding to obtain a first processed feature value, wherein the first class of features are discrete features of which the value number is greater than a preset value number threshold;
coding a second type of characteristics in the discrete characteristics by adopting one-hot coding to obtain a second processed characteristic value, wherein the second type of characteristics is discrete characteristics with the value number less than or equal to the preset value number threshold;
mapping the first processed characteristic value and the second processed characteristic value to a value range of a pixel point in a gray scale image respectively to obtain a pixel value corresponding to each discrete characteristic;
and generating the discrete flow chart by using the pixel value corresponding to each discrete feature.
5. The method according to claim 4, wherein the step of vectorizing the first class of features in the discrete features by using entity embedding to obtain a first processed feature value comprises:
calculating the vector dimension of each first-class feature according to the value quantity of each first-class feature;
obtaining a vector element value of the value of each first type feature on the vector dimension;
and normalizing the vector element value corresponding to each first-class feature to obtain a normalized vector element value, wherein the normalized vector element value lies within the value range of pixel points in a grayscale image, and the normalized vector element value is the first processed feature value.
6. The method according to claim 5, wherein the step of calculating the vector dimension of each first class feature according to the value number of each first class feature comprises:
for each first class feature, calculating the vector dimension of the first class feature by using the following formula:
Figure FDA0002476635750000031
wherein dimensions is the vector dimension of the first-class feature, and passive values is the number of values of the first-class feature;
the step of normalizing the vector element value corresponding to each first type feature to obtain a normalized vector element value includes:
for each first-class feature, the vector element value corresponding to the first-class feature is normalized by using the following formula to obtain a normalized vector element value corresponding to the first-class feature:
Figure FDA0002476635750000032
wherein $v_{normalization\_i\_k}$ is the normalized vector element value corresponding to the $k$-th value of the first-class feature $x_i$, $v_{i\_k}$ is the vector element value corresponding to the $k$-th value of said first-class feature $x_i$, $v_{i\_min}$ is the minimum vector element value corresponding to said first-class feature $x_i$, and $v_{i\_max}$ is the maximum vector element value corresponding to said first-class feature $x_i$.
7. The method of claim 1, wherein the convolutional neural network comprises a convolutional layer, a pooling layer, and a fully-connected layer;
the step of inputting the continuous flow chart into a convolutional neural network to obtain a continuous flow chart value comprises:
inputting the continuous flow graph into the convolutional neural network;
the convolution layer extracts continuous features in the continuous flow chart through a convolution kernel and conducts convolution operation on the continuous features in the continuous flow chart to obtain a convolution result;
the pooling layer performs characteristic screening on the convolution result to obtain a pooling result;
the full connection layer rectifies the pooling result and outputs a continuous flow chart value;
and acquiring the continuous flow chart value output by the full connection layer.
8. The method of claim 7, wherein the step in which the fully connected layer rectifies the pooling result and outputs a continuous flow chart value comprises:
calculating the continuous flow chart value by using the following formula:
$y_{out} = \sigma(y_{in} * \omega + b)$;
where $\sigma$ is the rectified linear unit (ReLU) function, $y_{in}$ is said pooling result, $y_{out}$ is the continuous flow chart value, $\omega$ is the weight of the pooling result, and $b$ is the bias term corresponding to the fully connected layer.
9. The method of claim 1, wherein the step of inputting the discrete flow map into a factorizer to obtain discrete flow map values comprises:
calculating to obtain a discrete flow chart value by using the following formula:
Figure FDA0002476635750000041
wherein $y_{discrete}$ is the discrete flow graph value, $\omega$ is a parameter of a discrete feature, $d$ is the total number of discrete features of the network flow to be processed, and $V_i$ and $V_j$ are the parameters of the product of said discrete feature $x_i$ and said discrete feature $x_j$.
10. A network traffic identification apparatus, the apparatus comprising:
the continuous flow chart generation module is used for mapping the continuous features into pixel values by adopting multivariate correlation analysis based on the obtained continuous features of the flow to be identified to obtain a continuous flow chart;
the discrete flow map generation module is used for mapping the discrete features into pixel values by adopting one-hot coding and entity embedding based on the acquired discrete features of the flow to be identified to obtain a discrete flow map;
the continuous flow chart value calculation module is used for inputting the continuous flow chart into a convolutional neural network to obtain a continuous flow chart value, and the convolutional neural network is used for performing high-order combination on the continuous features;
the discrete flow graph value calculation module is used for inputting the discrete flow graph into a factorization machine to obtain discrete flow graph values, and the factorization machine is used for carrying out low-order combination on the discrete features;
and the output module is used for inputting the continuous flow chart value and the discrete flow chart value into a normalized exponential function to obtain the type of the flow to be identified.
CN202010366325.7A 2020-04-30 2020-04-30 Network traffic identification method and device Expired - Fee Related CN111614514B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010366325.7A CN111614514B (en) 2020-04-30 2020-04-30 Network traffic identification method and device

Publications (2)

Publication Number Publication Date
CN111614514A true CN111614514A (en) 2020-09-01
CN111614514B CN111614514B (en) 2021-09-24

Family

ID=72196738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010366325.7A Expired - Fee Related CN111614514B (en) 2020-04-30 2020-04-30 Network traffic identification method and device

Country Status (1)

Country Link
CN (1) CN111614514B (en)

Also Published As

Publication number Publication date
CN111614514B (en) 2021-09-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210924
