CN116545944A

CN116545944A - Network traffic classification method and system

Info

Publication number: CN116545944A
Application number: CN202310627413.1A
Authority: CN
Inventors: 刘兰; 余永杰; 吴亚峰; 陈桂铭; 惠占发; 林依婷
Original assignee: Guangdong Polytechnic Normal University
Current assignee: Guangdong Polytechnic Normal University
Priority date: 2023-05-30
Filing date: 2023-05-30
Publication date: 2023-08-04

Abstract

The invention relates to the technical field of traffic data processing and discloses a network traffic classification method and system. The method comprises the following steps: acquiring network traffic data and extracting the corresponding flows from it; dividing each flow into a number of sessions; aggregating sessions having the same communication direction into one packet block; converting the packet blocks into two-dimensional grayscale packet images; performing modality conversion and image recognition on the two-dimensional grayscale packet images in sequence to obtain an image recognition text sequence; and classifying the image recognition text sequence with a classifier to obtain the application type of the network traffic. The invention can accurately detect and identify unknown network traffic and improves the effectiveness and efficiency of network traffic classification.

Description

Network traffic classification method and system

Technical Field

The present invention relates to the field of traffic data processing technologies, and in particular, to a method and a system for classifying network traffic.

Background

Network traffic classification has a wide range of applications in today's Internet, such as resource allocation, QoS provisioning, ISP billing and anomaly detection. Accurate classification and identification of network traffic allows network resources to be managed precisely and reused effectively, and enables personalized services.

Existing network traffic classification methods are based on traditional machine learning techniques: they depend on hand-designed packet features or statistical features and require large labeled datasets for training. As unknown network traffic keeps changing and growing more complex, methods based on traditional machine learning cannot accurately detect and identify unknown traffic, and their classification performance is poor.

Disclosure of Invention

To overcome the defects of the prior art, namely that unknown traffic cannot be accurately detected and identified and that classification performance is poor, the invention provides the following technical solution:

In a first aspect, the present invention proposes a network traffic classification method, including:

acquiring network traffic data;

extracting a corresponding flow of the network traffic data; the flow is a set of communication packets having the same source IP address, source port, destination IP address, and destination port for a period of time;

dividing the flow into a plurality of sessions according to a preset rule;

aggregating sessions having the same communication direction into one packet block;

converting the packet blocks into two-dimensional grayscale packet images;

recognizing the two-dimensional grayscale packet images with an image recognizer that integrates a modality conversion mechanism and a self-attention mechanism, to obtain an image recognition text sequence;

and classifying the image recognition text sequence by using a classifier to obtain the application type of the network traffic.

In a second aspect, the present invention proposes a network traffic classification system comprising:

the acquisition module is used for acquiring network traffic data;

an extraction module for extracting a corresponding flow of network traffic data; the flow is a set of communication packets having the same source IP address, source port, destination IP address, and destination port for a period of time;

the dividing module is used for dividing the flow into a plurality of sessions according to a preset rule;

an aggregation module for aggregating sessions having the same communication direction into one packet block;

the conversion module is used for converting the packet blocks into two-dimensional grayscale packet images;

the recognition module is used for recognizing the two-dimensional grayscale packet images with an image recognizer that integrates a modality conversion mechanism and a self-attention mechanism, to obtain an image recognition text sequence;

and the classification module is used for classifying the image recognition text sequence by using a classifier to obtain the application type of the network traffic.

Compared with the prior art, the technical scheme of the invention has the beneficial effects that:

(1) Flows with the same source IP address, source port, destination IP address and destination port are divided into a plurality of sessions, and sessions with the same communication direction are aggregated into one packet block for identification. This better aggregates the deep information of the traffic, which improves the accuracy of traffic detection and identification and thus the classification performance.

(2) The packet blocks are converted into two-dimensional grayscale packet images, which are then modality-converted and recognized by an image recognizer that integrates a modality conversion mechanism and a self-attention mechanism. This makes it easier to capture the shift-invariant features of the two-dimensional grayscale packet images, offers higher parallelism and lower complexity, and can greatly improve the efficiency of network traffic classification.

Drawings

Fig. 1 is a flow chart of a network traffic classification method according to an embodiment of the present application.

Fig. 2 is a schematic diagram of communication transmission performed by a packet block in an embodiment of the present application.

Fig. 3 is a two-dimensional distribution matrix diagram of packet blocks in an embodiment of the present application.

Fig. 4 is an overall architecture diagram of the image recognizer in an embodiment of the present application.

Fig. 5 is a framework diagram of the image recognizer for classifying network traffic in an embodiment of the present application.

Fig. 6 is a block diagram of a network traffic classification system according to an embodiment of the present application.

Detailed Description

Further advantages and effects of the present invention will become readily apparent to those skilled in the art from the disclosure of this specification, with reference to the drawings and the preferred embodiments. The invention may also be practiced or applied through other, different embodiments, and the details of this specification may be modified or varied in various ways without departing from the spirit and scope of the present invention. It should be understood that the preferred embodiments are intended only to illustrate the present invention and not to limit its scope of protection.

It should be noted that the illustrations provided in the following embodiments only show the basic concept of the present invention schematically. The drawings show only the components related to the present invention and are not drawn according to the number, shape and size of the components in an actual implementation; in practice, the form, number and proportion of each component may be changed arbitrarily, and the component layout may be more complex.

In the following description, numerous details are set forth to provide a more thorough explanation of the embodiments of the present invention. It will be apparent to one skilled in the art, however, that embodiments of the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form rather than in detail, to avoid obscuring the embodiments of the present invention.

Specifically, Fig. 1 is a flowchart of a network traffic classification method according to an embodiment of the present application.

As shown in Fig. 1, the network traffic classification method includes the following steps:

S10: network traffic data is acquired.

Optionally, in one embodiment of the present application, the network traffic data includes at least one of voice call traffic, video call traffic, file transfer traffic, instant messaging traffic, and web browsing traffic.

In a specific implementation, this embodiment uses a publicly available OpenVPN dataset that contains traffic of five application types: VoIP (voice calls), Video (video calls), FT (file transfer), Chat (instant messaging) and Browsing (web browsing). Traffic of each application type is available in both normal and encrypted form, the encrypted traffic being protected with the TLS protocol.

S20: extracting a corresponding flow of the network traffic data; the flow is a set of communication packets having the same source IP address, source port, destination IP address, and destination port over a period of time.

It is understood that a flow is a set of communication packets with the same source IP address, source port, destination IP address and destination port within a period of time, and can be regarded as one communication process. This embodiment uses the four-tuple (source IP address, source port, destination IP address, destination port) to distinguish different flows and saves each flow as a separate file.
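The four-tuple grouping can be sketched as follows. This is a minimal Python illustration, not the embodiment's implementation: the `Packet` record and the `extract_flows` function are hypothetical, packets are assumed to be already parsed from a capture, and writing each flow to its own file is omitted. The optional `bidirectional` flag is an assumption beyond the text; it places both directions of a conversation under one key, which the later uplink/downlink features would need.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical parsed packet record (e.g. produced by a pcap reader)."""
    timestamp: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    size: int


FlowKey = Tuple[str, int, str, int]


def extract_flows(packets: List[Packet], bidirectional: bool = False) -> Dict[FlowKey, List[Packet]]:
    """Group packets into flows keyed by (src IP, src port, dst IP, dst port)."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        a, b = (pkt.src_ip, pkt.src_port), (pkt.dst_ip, pkt.dst_port)
        if bidirectional and b < a:
            a, b = b, a  # assumption: merge both directions under one canonical key
        flows[(a[0], a[1], b[0], b[1])].append(pkt)
    return dict(flows)
```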

S30: dividing the stream into a plurality of sessions according to preset rules.

Optionally, in one embodiment of the present application, the extracted sessions are divided into a training set and a test set for training and testing the machine learning model. The training set is used to train the model, and the test set is used to evaluate the model's generalization ability on unseen data. The training and test sets are divided randomly in a 7:3 ratio, i.e. 70% of the traffic of each application type is used as the training set and 30% as the test set.

Optionally, in one embodiment of the present application, step S30 specifically includes:

S31: a communication time threshold T is set.
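A 7:3 random split performed separately within each application type (a stratified split) can be sketched as below. The function name `split_per_class`, the fixed seed, and rounding the cut point to the nearest integer are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple

Sample = Tuple[Hashable, str]  # (sample id, application type)


def split_per_class(samples: Sequence[Sample], train_ratio: float = 0.7,
                    seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Randomly split samples train_ratio : (1 - train_ratio) within each application type."""
    by_type = defaultdict(list)
    for sample_id, app_type in samples:
        by_type[app_type].append(sample_id)
    rng = random.Random(seed)  # fixed seed for reproducibility (an assumption)
    train: List[Sample] = []
    test: List[Sample] = []
    for app_type, ids in by_type.items():
        ids = list(ids)
        rng.shuffle(ids)
        cut = round(len(ids) * train_ratio)
        train.extend((i, app_type) for i in ids[:cut])
        test.extend((i, app_type) for i in ids[cut:])
    return train, test
```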

S32: two flows with a communication time interval less than or equal to the communication time threshold T are classified as one session.

It is appreciated that session tracking is a technique for tracking user operations in a Web application. Each user corresponds to a session, and identifying the session to which a user belongs maintains the continuity and traceability of that user's data. When the four-tuple is used directly for session tracking, consecutive four-tuples of the same user need to be grouped into the same session to avoid creating new sessions without limit.

In a specific implementation, the communication time threshold T is used to determine whether two flows belong to the same session. In this embodiment T = 30 seconds, i.e. if the communication time interval between two flows does not exceed 30 seconds, the two flows are classified into the same session.
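The threshold rule can be illustrated with a small gap-based segmentation. As an assumption, the sketch takes the timestamps of the records to be grouped (for example the packets or sub-flows of one flow) and starts a new session whenever the gap to the previous record exceeds T; the name `split_sessions` is hypothetical, and T defaults to the 30 seconds of this embodiment.

```python
from typing import List, Sequence


def split_sessions(timestamps: Sequence[float], threshold: float = 30.0) -> List[List[float]]:
    """Group time-ordered records; a gap larger than `threshold` seconds opens a new session."""
    sessions: List[List[float]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= threshold:
            sessions[-1].append(ts)  # gap <= T: same session
        else:
            sessions.append([ts])  # first record, or gap > T: new session
    return sessions
```

For example, records at 0, 10, 35, 100 and 125 s give two sessions, [0, 10, 35] and [100, 125], because only the 65 s gap exceeds T.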

S40: sessions having the same communication direction are aggregated into one packet block.

Optionally, in one embodiment of the present application, sessions whose communication direction is uplink are aggregated into one packet block, or sessions whose communication direction is downlink are aggregated into one packet block.

Optionally, in one embodiment of the present application, a packet block is represented by a two-tuple [length, size], where length is the number of communication packets in the packet block and size is the average byte size of those packets.

Fig. 2 is a schematic diagram of communication transmission by packet blocks in an embodiment of the present application. In a specific implementation, two features are extracted for each packet in each session: its direction (positive for uplink traffic, negative for downlink traffic) and its size (in bytes). Sessions with the same communication direction are then aggregated into one packet block, and the number of communication packets in the block (length) and their average byte size (size) are calculated, so that each packet block can be represented by a tuple [length, size]. For example, two consecutive uplink packets in a session of 100 B and 200 B can be aggregated into one packet block [2, 150]. In this way, each session is converted into a sequence of packet blocks.

It can be understood that dividing flows with the same source IP address, source port, destination IP address and destination port into a plurality of sessions, and aggregating sessions with the same communication direction into one packet block, better aggregates the deep information of the traffic for identification, which improves detection and identification accuracy and thus classification performance.
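Following the worked example above, the aggregation can be sketched as a run-length merge over signed packet sizes: consecutive packets with the same sign (direction) form one [length, size] block, with size taken as the signed mean. The function name and the signed-size input format are assumptions, and zero-byte packets are assumed not to occur.

```python
from typing import List, Sequence, Tuple


def to_packet_blocks(signed_sizes: Sequence[int]) -> List[Tuple[int, float]]:
    """Merge consecutive same-direction packets into (length, mean signed size) blocks.

    Positive sizes are uplink packets and negative sizes are downlink packets.
    """
    blocks: List[Tuple[int, float]] = []
    run: List[int] = []
    for s in signed_sizes:
        if run and (s > 0) != (run[-1] > 0):  # direction changed: close the current block
            blocks.append((len(run), sum(run) / len(run)))
            run = []
        run.append(s)
    if run:
        blocks.append((len(run), sum(run) / len(run)))
    return blocks
```

With the example from the text, `to_packet_blocks([100, 200])` yields a single block (2, 150.0).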

S50: the packet blocks are converted into two-dimensional grayscale packet images.

In a specific implementation, Fig. 3 shows the two-dimensional distribution matrix of packet blocks in an embodiment of the present application. The X-axis represents the length of a packet block and ranges from 0 to L. If L is too large, the active part of the image is compressed toward the left; if L is too small, many packet blocks fall outside the image. A reasonable choice of L is therefore important; based on observations of the packet blocks of different traffic types, this embodiment sets L between 10 and 150. The Y-axis represents the size of a packet block and ranges over (-1500, 1500], for a total of 3000 values, where -1500 and 1500 correspond to the minimum and maximum signed sizes allowed by the 1500-byte Ethernet MTU.

Optionally, in one embodiment of the present application, the two-dimensional grayscale packet image is normalized so that each pixel value is mapped into the interval [0, 1].

It will be appreciated that the packet image is normalized to keep its pixel values within a reasonable range. The normalized two-dimensional grayscale packet image serves as a grayscale map representing the characteristics of each session in the network traffic.
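One way to build the two-dimensional distribution matrix is to count how many packet blocks fall into each (size, length) cell. The sketch assumes that each pixel holds such a count, that sizes are rounded to whole bytes, and that blocks outside the ranges above are dropped. `blocks_to_image`, the parameter `max_len` (playing the role of L) and the row/column layout are illustrative choices, not taken from the embodiment.

```python
from typing import Iterable, Tuple

import numpy as np

MTU = 1500  # Ethernet MTU in bytes; signed sizes range over (-MTU, MTU]


def blocks_to_image(blocks: Iterable[Tuple[int, float]], max_len: int = 150) -> np.ndarray:
    """Count packet blocks on a (size, length) grid.

    Rows: signed sizes -1499..1500 (3000 rows). Columns: lengths 0..max_len.
    Blocks longer than max_len or with a size outside (-MTU, MTU] are dropped.
    """
    image = np.zeros((2 * MTU, max_len + 1), dtype=np.float64)
    for length, size in blocks:
        s = int(round(size))
        if length > max_len or not (-MTU < s <= MTU):
            continue  # outside the image range
        row = s + MTU - 1  # maps -1499 -> row 0 and 1500 -> row 2999
        image[row, length] += 1
    return image
```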
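The text does not say which mapping into [0, 1] is used. The sketch below assumes min-max scaling and maps a constant image to all zeros; `normalize_image` is a hypothetical name.

```python
import numpy as np


def normalize_image(image: np.ndarray) -> np.ndarray:
    """Min-max scale pixel values into [0, 1]; a constant image maps to zeros."""
    lo, hi = float(image.min()), float(image.max())
    if hi == lo:
        return np.zeros_like(image, dtype=np.float64)
    return (image - lo) / (hi - lo)
```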

S60: the two-dimensional grayscale packet images are recognized by an image recognizer that integrates a modality conversion mechanism and a self-attention mechanism, to obtain an image recognition text sequence.

Optionally, in one embodiment of the present application, an image recognizer MTED (Modality-Transform block with integrated Encoder and Decoder), i.e. an encoder-decoder model with an integrated modality conversion module, is proposed that integrates a modality conversion mechanism and a self-attention mechanism. Fig. 4 is the overall architecture diagram of the image recognizer in the embodiment of the present application, and Fig. 5 is the framework diagram of the image recognizer for classifying network traffic. The image recognizer comprises a modality conversion module, an encoder and a decoder connected in sequence. The modality conversion module converts an input two-dimensional grayscale packet image into an image sequence and passes it to the encoder; the encoder extracts features from the input image sequence to obtain an image feature sequence; and the decoder recognizes the input image feature sequence to obtain an image recognition text sequence.

Optionally, in one embodiment of the present application, the modality conversion module includes a plurality of convolution layers. The stride of each layer is set to 2, and the number of channels doubles from one layer to the next. This design keeps the product of each layer's height and its number of channels constant at $d_{model}$. Given the width and height $(w_0, h_0)$ of the input, the width, height and number of channels $(w, h, c)$ of the n-th layer are obtained as $(w_0/2^n,\ h_0/2^n,\ 2^n c_0)$, where $c_0$ is the number of input channels. After the last layer, a concatenation operation reshapes the features of the different channels into an image sequence, each element of which has dimension $d_{model}$. This design unifies the input dimensions when processing different types of input (e.g. images and text), thereby simplifying the design and training of the model. Positional encoding is used in the modality conversion module to represent each position in the image sequence; it provides context information for each position, which is important for processing sequence data. The positional encoding function generates periodically varying codes from sine and cosine functions, producing a unique code for each position in the sequence.

Each element in the image sequence output by the modality conversion module has dimension $d_{model}$. This image sequence can be regarded as an encoding of the input image: it contains the spatial structure information of the image and is suitable for processing by a self-attention model.
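The shape bookkeeping and the position coding can be illustrated as follows. This sketch assumes that the input sides are divisible by 2^n (so h * c stays equal to d_model), that the sequence runs along the width axis with channels and height concatenated into each element, and that the position coding is the standard Transformer sinusoidal encoding. The convolutions themselves are not implemented, and all function names are hypothetical.

```python
import numpy as np


def layer_shapes(w0: int, h0: int, c0: int, n_layers: int):
    """(w, h, c) after each stride-2 layer whose channel count doubles."""
    shapes = []
    w, h, c = w0, h0, c0
    for _ in range(n_layers):
        w, h, c = w // 2, h // 2, c * 2  # stride 2 halves w and h; channels double
        shapes.append((w, h, c))
    return shapes


def features_to_sequence(feature_map: np.ndarray) -> np.ndarray:
    """Reshape a (c, h, w) map into a length-w sequence of (c*h)-dimensional elements."""
    c, h, w = feature_map.shape
    return feature_map.transpose(2, 0, 1).reshape(w, c * h)


def sinusoidal_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard Transformer sine/cosine position encoding (sin on even, cos on odd dims)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle[:, : d_model // 2])
    return pe
```

As a usage example, a 64-wide, 32-high single-channel input passed through three layers ends as an (8, 4, 8) feature map, i.e. a sequence of 8 elements of dimension 32, to which `sinusoidal_encoding(8, 32)` is added.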

It will be appreciated that by converting images into sequences, a powerful self-attention model can be used to process these data. The self-attention model is able to pay global attention to the elements in the input sequence, which means that the self-attention model can take into account all other elements in the sequence when understanding one element. This capability enables the model to capture long-range dependencies, which is important in both image and text processing. Through position coding, the self-attention model can learn the position information of each element in the input sequence. This is critical to understanding the order and structural information in the sequence data, especially when processing such data as images and text.

Optionally, in one embodiment of the present application, the encoder includes N encoding blocks connected in sequence; each encoding block comprises a multi-head self-attention module, a feed-forward fully connected layer and a normalization layer connected in sequence. The multi-head self-attention module performs the scaled dot-product attention operation on the input image sequence to obtain an attention matrix; the attention matrix passes through the feed-forward fully connected layer and the normalization layer in turn to be transformed and normalized, yielding a second image sequence; and the image sequence and the second image sequence are added to obtain the image feature sequence.

It will be appreciated that the multi-head self-attention module allows the encoder to jointly attend to information from different representation subspaces at different positions of the image sequence, similar to a convolutional layer that applies a set of filters to extract various features. For each scaled dot-product attention head, the module first uses three different linear projections to project the image sequence into a query matrix, a key matrix and a value matrix with a more discriminative representation. The b scaled dot-product attention heads are then computed in parallel, and their concatenated outputs are passed through a linear layer to produce the final output:
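A single encoding block, in the order given above (multi-head self-attention, feed-forward layer, normalization, then adding the input), can be sketched in NumPy. Assumptions: randomly initialized weights without biases, a ReLU feed-forward layer, and layer normalization without a learnable gain; the class and function names are hypothetical. This ordering differs from the common Transformer arrangement, which wraps each sub-layer separately in a residual connection and normalization.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def multi_head_self_attention(x, wq, wk, wv, wo, n_heads):
    """b = n_heads scaled dot-product heads, concatenated and projected by W_o."""
    d_head = x.shape[1] // n_heads
    q, k, v = x @ wq, x @ wk, x @ wv
    heads = []
    for h in range(n_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        scores = q[:, sl] @ k[:, sl].T / np.sqrt(d_head)
        heads.append(softmax(scores) @ v[:, sl])
    return np.concatenate(heads, axis=-1) @ wo


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


class EncoderBlock:
    def __init__(self, d_model: int, n_heads: int, d_ff: int, rng: np.random.Generator):
        s = 1 / np.sqrt(d_model)
        self.n_heads = n_heads
        self.wq, self.wk, self.wv, self.wo = (rng.normal(0, s, (d_model, d_model)) for _ in range(4))
        self.w1 = rng.normal(0, s, (d_model, d_ff))
        self.w2 = rng.normal(0, 1 / np.sqrt(d_ff), (d_ff, d_model))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        attn = multi_head_self_attention(x, self.wq, self.wk, self.wv, self.wo, self.n_heads)
        ffn = np.maximum(attn @ self.w1, 0) @ self.w2  # feed-forward layer with ReLU
        second = layer_norm(ffn)  # the "second image sequence"
        return x + second  # add the input image sequence


def encoder(x: np.ndarray, blocks) -> np.ndarray:
    for block in blocks:  # N encoding blocks connected in sequence
        x = block(x)
    return x
```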

Performing the scaled dot-product attention operation on the input image sequence gives the attention matrix H:

$$H = W_o \begin{bmatrix} h_1 \\ \vdots \\ h_b \end{bmatrix}, \qquad h_i = f\left(W_i^{(q)} q,\ W_i^{(k)} k,\ W_i^{(v)} v\right)$$

where $W_o$ is a learnable linear transformation parameter, $h_i$ is the output of the i-th attention head in the multi-head self-attention module, b is the number of attention heads, $f(\cdot)$ is the attention pooling function, q, k and v are the query, key and value of the image sequence, and $W_i^{(q)}$, $W_i^{(k)}$ and $W_i^{(v)}$ are the query, key and value weight matrices of the i-th attention head, respectively.

Given a query $q \in \mathbb{R}^{q}$ and m key-value pairs $(k_1, v_1), \ldots, (k_m, v_m)$, where $k_i \in \mathbb{R}^{k}$ and $v_i \in \mathbb{R}^{v}$, the attention pooling function f is expressed as a weighted sum of the values:

$$f\left(q, (k_1, v_1), \ldots, (k_m, v_m)\right) = \sum_{i=1}^{m} \alpha(q, k_i)\, v_i$$

where the attention weight (a scalar) between query q and key $k_i$ is obtained by mapping the two vectors to a scalar with an attention scoring function a and then applying a softmax:

$$\alpha(q, k_i) = \operatorname{softmax}\left(a(q, k_i)\right) = \frac{\exp\left(a(q, k_i)\right)}{\sum_{j=1}^{m} \exp\left(a(q, k_j)\right)}$$

Here m is the number of key-value pairs and $a(\cdot)$ is the attention scoring function; for scaled dot-product attention, $a(q, k) = q^{\top} k / \sqrt{d}$, where d is the common dimension of the query and the key.
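For a single query, the attention pooling f and the softmax weights alpha can be computed directly. The sketch assumes the scaled dot-product scoring function a(q, k) = q . k / sqrt(d), with queries and keys of the same dimension d; `attention_pooling` is a hypothetical name.

```python
import numpy as np


def attention_pooling(q: np.ndarray, keys: np.ndarray, values: np.ndarray):
    """Return (alpha, f): softmax weights over the m keys and the weighted sum of values."""
    d = q.shape[-1]
    scores = keys @ q / np.sqrt(d)  # a(q, k_i) for i = 1..m
    weights = np.exp(scores - scores.max())  # shift by the max for numerical stability
    weights /= weights.sum()  # alpha(q, k_i), sums to 1
    return weights, weights @ values  # f = sum_i alpha(q, k_i) v_i
```

With q = [1, 0] and keys [1, 0], [0, 1], [1, 0], the first and third keys score equally and each receives a larger weight than the second.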

Optionally, in one embodiment of the present application, the decoder generates the image recognition text sequence from the image feature sequence output by the encoder and the original input labels. A learnable character-level embedding is applied to each original input label, converting each character into a multidimensional vector, which is combined with a positional encoding to form the decoder input.

The decoder consists of N identical decoding blocks connected in sequence. Like an encoding block, each decoding block includes a second multi-head self-attention module, a feed-forward fully connected layer and a normalization layer connected in sequence, with two differences. First, exploiting the autoregressive property, a masking mechanism is added to the self-attention in each decoding block so that the prediction at a position can depend only on previously known outputs; this is achieved by masking out (setting to negative infinity) all values in the softmax input that correspond to illegal connections. Second, in the second multi-head attention module the keys and values come from the encoder output, while the queries come from the output of the previous decoding block. Finally, the decoder output is converted into character-class probabilities by a linear projection followed by a softmax function.
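The masking and the encoder-decoder attention can be sketched as below, assuming single-head attention without learned projections; `causal_mask` and `attention` are hypothetical names. Masked scores are set to negative infinity, so they receive zero weight after the softmax, and every row keeps at least its diagonal entry, so no row becomes all-masked.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)  # exp(-inf) = 0 for masked entries
    return e / e.sum(axis=-1, keepdims=True)


def causal_mask(n: int) -> np.ndarray:
    """True above the diagonal: position i may not attend to any later position j > i."""
    return np.triu(np.ones((n, n), dtype=bool), k=1)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, mask=None):
    """Scaled dot-product attention; returns (output, weights)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, -np.inf, scores)  # illegal connections -> -inf
    weights = softmax(scores)
    return weights @ v, weights
```

For masked self-attention, the decoder states serve as q, k and v together with `causal_mask(n)`; for the encoder-decoder attention, the queries are the decoder states and the keys and values are the encoder output, with no mask.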

It can be appreciated that converting the packet blocks into two-dimensional grayscale packet images and processing them with an image recognizer that integrates a modality conversion mechanism and a self-attention mechanism makes it easier to capture the shift-invariant features of these images, offers higher parallelism and lower complexity, and can greatly improve the efficiency of network traffic classification.

S70: the image recognition text sequence is classified by a classifier to obtain the application type of the network traffic.

In this embodiment, a trained softmax classifier converts the image recognition text sequence output by the decoder into a probability for each class, and the application type of the network traffic is determined as the class with the highest probability.
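The final decision step can be sketched as a softmax followed by an argmax over the five application types of the dataset. The text does not specify how the decoder output is reduced to one score vector per sample; the sketch simply assumes such a vector of logits is available. `classify` and the class order in `CLASSES` are hypothetical.

```python
import numpy as np

CLASSES = ["VoIP", "Video", "FT", "Chat", "Browse"]  # assumed class order


def classify(logits: np.ndarray):
    """Softmax over one score vector, then return (most probable class, probabilities)."""
    e = np.exp(logits - logits.max())  # shift by the max for numerical stability
    probs = e / e.sum()
    return CLASSES[int(np.argmax(probs))], probs
```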

In addition, the present embodiment evaluates the classification accuracy of the softmax classifier using Accuracy (ACC) as a metric.

The accuracy ACC is expressed as follows:

$$\mathrm{ACC} = \frac{\sum_{i \in A} TP_i}{\sum_{i \in A} \left(TP_i + FP_i\right)}$$

where A = {VoIP, Video, FT, Chat, Browse}, and $TP_i$ and $FP_i$ denote the numbers of true positives and false positives for class i, respectively.

It is understood that True Positives (TP) and False Positives (FP) represent the number of positive samples correctly classified and the number of negative samples incorrectly classified as positive samples, respectively.

In addition to ACC, a confusion matrix can be used to observe multi-class performance in more detail. In the confusion matrix, each row represents a true label and each column represents a predicted label, and the diagonal gives the proportion of correct predictions for each category. Precision, recall and F1 score are also used, defined as follows:

$$P_i = \frac{TP_i}{TP_i + FP_i}, \qquad R_i = \frac{TP_i}{TP_i + FN_i}, \qquad F1_i = \frac{2 P_i R_i}{P_i + R_i}$$

where $FN_i$ is the number of false negatives for class i.
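The confusion matrix, ACC and per-class precision, recall and F1 can be computed together as follows. In this sketch, FN_i (false negatives of class i) is the row sum minus the diagonal entry and FP_i the column sum minus the diagonal entry; reporting a metric as 0 when its denominator is zero is an assumption, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str], classes: List[str]) -> List[List[int]]:
    idx = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1  # row = true label, column = predicted label
    return m


def metrics(m: List[List[int]], classes: List[str]) -> Tuple[float, Dict[str, Tuple[float, float, float]]]:
    n = len(classes)
    tp = [m[i][i] for i in range(n)]
    fp = [sum(m[r][i] for r in range(n)) - tp[i] for i in range(n)]  # column sum minus diagonal
    fn = [sum(m[i]) - tp[i] for i in range(n)]  # row sum minus diagonal
    acc = sum(tp) / sum(t + f for t, f in zip(tp, fp))  # ACC = sum TP / sum (TP + FP)
    per_class = {}
    for i, c in enumerate(classes):
        p = tp[i] / (tp[i] + fp[i]) if tp[i] + fp[i] else 0.0
        r = tp[i] / (tp[i] + fn[i]) if tp[i] + fn[i] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        per_class[c] = (p, r, f1)
    return acc, per_class
```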

A network traffic classification system according to an embodiment of the present application is described next with reference to the accompanying drawings.

Fig. 6 is an architecture diagram of a network traffic classification system according to an embodiment of the present application.

As shown in Fig. 6, the classification system includes an acquisition module 100, an extraction module 200, a dividing module 300, an aggregation module 400, a conversion module 500, a recognition module 600 and a classification module 700.

The acquisition module 100 is configured to acquire network traffic data; the extraction module 200 is configured to extract the corresponding flows of the network traffic data, a flow being a set of communication packets with the same source IP address, source port, destination IP address and destination port within a period of time; the dividing module 300 is configured to divide the flow into a plurality of sessions according to a preset rule; the aggregation module 400 is configured to aggregate sessions having the same communication direction into one packet block; the conversion module 500 is configured to convert the packet blocks into two-dimensional grayscale packet images; the recognition module 600 is configured to recognize the two-dimensional grayscale packet images with an image recognizer that integrates a modality conversion mechanism and a self-attention mechanism, to obtain an image recognition text sequence; and the classification module 700 is configured to classify the image recognition text sequence with a classifier to obtain the application type of the network traffic.

It should be noted that the foregoing explanation of the network traffic classification method embodiment also applies to the network traffic classification system of this embodiment and is not repeated here.

In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, provided there is no contradiction, those skilled in the art may join and combine the different embodiments or examples described in this specification and their features.

Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the technical features concerned. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means at least two, such as two or three, unless explicitly defined otherwise.

Any process or method description in the flowcharts or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes further implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.

It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented with any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays, field-programmable gate arrays, and the like.

Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, and the program may be stored in a computer readable storage medium, where the program when executed includes one or a combination of the steps of the method embodiments.

It is to be understood that the above examples of the present invention are provided only to illustrate the invention clearly and are not intended to limit its embodiments. Other variations or modifications will be apparent to those of ordinary skill in the art in light of the foregoing description; it is neither necessary nor possible to list all embodiments exhaustively here. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims

1. A method for classifying network traffic, comprising:

acquiring network traffic data;

dividing the stream into a plurality of sessions according to a preset rule;

converting the grouping blocks into two-dimensional gray level grouping images;

2. The network traffic classification method according to claim 1, wherein the dividing the flow into a plurality of sessions according to a preset rule specifically comprises:

setting a communication time threshold T;

two flows with a communication time interval less than or equal to the communication time threshold T are classified as one session.

3. The network traffic classification method according to claim 1, wherein the aggregating sessions having the same communication direction into one packet block specifically comprises:

the session with the communication direction of uplink transmission is aggregated into a grouping block, or the session with the communication direction of downlink transmission is aggregated into a grouping block.

4. The network traffic classification method of claim 1 wherein the packet block is represented by a doublet [ length, size ], where length is the number of communication packets in the packet block and size is the average byte size of the communication packets.

5. The network traffic classification method according to claim 1, wherein after converting the packet block into a two-dimensional grayscale packet image, the method further comprises:

and carrying out normalization processing on the two-dimensional gray scale group image, and mapping each pixel value of the two-dimensional gray scale group image into a [0,1] interval.

6. The network traffic classification method according to claim 1, wherein the network traffic data comprises at least one of voice call traffic, video call traffic, file transfer traffic, instant messaging traffic, and web browsing traffic.

7. The network traffic classification method according to claim 1, wherein the image identifier comprises a modality conversion module, an encoder, and a decoder connected in sequence;

the mode conversion module is used for converting an input two-dimensional gray group image into an image sequence and transmitting the image sequence to the encoder;

the encoder is used for extracting the characteristics of the input image sequence to obtain an image characteristic sequence;

the decoder is used for identifying the input image characteristic sequence to obtain an image identification text sequence.

8. The network traffic classification method according to claim 7, wherein the encoder comprises N sequentially connected encoding blocks; each coding block comprises a multi-head self-attention module, a feedforward full-connection layer and a normalization layer which are sequentially connected;

the multi-head self-attention module is used for performing scaling dot product attention operation on an input image sequence to obtain an attention matrix;

the attention matrix sequentially passes through the feedforward full-connection layer and the normalization layer to be converted and standardized to obtain a second image sequence;

and adding the image sequence and the second image sequence to obtain an image characteristic sequence.

9. The network traffic classification method according to claim 8, wherein the expression of the attention matrix H obtained by performing the scaled dot-product attention operation on the input image sequence is as follows:

$$h_i = f\left(W_i^{(q)} q,\ W_i^{(k)} k,\ W_i^{(v)} v\right)$$

and the attention pooling function $f\left(W_i^{(q)} q,\ W_i^{(k)} k,\ W_i^{(v)} v\right)$ is expressed as follows:

10. A network traffic classification system, comprising:

the acquisition module is used for acquiring network flow data;