CN113572703B - Online traffic service classification method based on FPGA

Info

Publication number: CN113572703B
Application number: CN202110825550.7A
Authority: CN
Inventors: 胡晓艳; 刘旭辉; 程光
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2021-07-21
Filing date: 2021-07-21
Publication date: 2024-04-09
Anticipated expiration: 2041-07-21
Also published as: CN113572703A

Abstract

The invention provides an FPGA-based online traffic service classification method comprising the following steps: hashing the five-tuple of each message to identify messages belonging to the same flow; extracting and storing the features of the messages in each flow; storing the class information of the flow to which each message belongs in a stream category information RAM; storing the feature data of each flow in a stream statistics data RAM; and reading the flow feature data from the stream statistics data RAM and classifying the flow with a random forest model, which is deployed by a direct description method. The invention classifies traffic at high speed and with high accuracy, providing a precondition for QoS guarantee services in the network.

Description

Online traffic service classification method based on FPGA

Technical Field

The invention belongs to the technical field of cyberspace security and relates to an FPGA-based online traffic service classification method.

Background

With the rapid development of the Internet, network speeds and bandwidths keep increasing, and the best-effort delivery mechanism of traditional networks can no longer meet the needs of network development. QoS technology effectively improves network service quality by providing targeted service according to the requirements of different types of traffic. It can therefore greatly improve the transmission performance and flexibility of the network, supports safe and reliable network operation, and has been widely applied in devices such as switches and network processors. A switch or network processor can provide QoS guarantees for traffic at different granularities, such as application type or service type; providing them at service-type granularity strikes a balance between QoS performance and resource savings. How to distinguish different traffic service types, and thereby provide the corresponding level of service, is therefore an important problem.

The CPU in a switch or network processor is mainly used for forwarding; having it also perform traffic classification increases the load on the device and makes it hard to keep up with ever-increasing network speeds. In contrast, FPGA-based programmable switches and FPGA-based next-generation network processors can use their FPGA chips to accelerate some functions in hardware and, because FPGAs are reprogrammable, can reconfigure their hardware logic, unlike rigid ASIC forwarding chips. Implementing online traffic service classification on an FPGA therefore provides high-speed processing while accommodating continuous evolution.

On the one hand, existing work on accelerating QoS guarantee services with FPGAs mostly addresses traffic shaping, queue scheduling, and similar aspects, and lacks identification of traffic service types. On the other hand, existing work on FPGA-based traffic classification suffers from insufficient accuracy and excessive resource consumption; moreover, it classifies network traffic at application-type granularity, and no work distinguishes different types of traffic at service-type granularity.

Therefore, to identify the service type of each flow in the network online and with high accuracy, the invention uses an FPGA to accelerate traffic service classification and selects the message lengths and arrival time intervals of the first 4 messages of each flow as classification features, which are easy to extract and compute. In addition, a random forest classification model is adopted to improve classification accuracy while making full use of the FPGA's parallel computing capability. The random forest model is converted for FPGA deployment by a direct description method, which reduces resource consumption.

Disclosure of Invention

To identify the service types of traffic in a network online and with high accuracy, an FPGA-based online traffic service classification method is provided. The method classifies each incoming message at flow granularity, so the flow to which each message belongs is determined first, by hashing the message's five-tuple. Next, the features needed to classify each flow's traffic service are extracted and stored. Finally, the classification model is trained in software on a public data set, and the trained model is deployed in hardware. A random forest model is adopted to improve classification accuracy. Because the online traffic classifier is a sub-module of the switch or network processor, which contains many other working modules, it must not occupy too many hardware resources. Therefore, the random forest model is deployed on the FPGA by describing it directly, which achieves lower resource occupation.

To achieve the above purpose, the invention provides the following technical solution: an FPGA-based online traffic service classification method comprising the following steps:

(1) Training a random forest traffic service classification model on a data set containing traffic of different service types, using statistical data of the first few messages of each flow as the classification features, and deploying the model on an FPGA by a direct description method;

(2) The stream category information RAM stores the service category information of each five-tuple flow, and the stream statistics data RAM stores the feature data of each five-tuple flow. For an incoming message, the information stored in the stream category information RAM is read using the message's five-tuple to determine whether the flow to which the message belongs has been classified; if so, the message is marked with the flow's classification category; if not, it is marked with a default classification category, and the corresponding data in the stream category information RAM and the stream statistics data RAM is stored or updated;

(3) For a five-tuple flow, once enough feature data has been stored in the stream statistics data RAM, all feature data of the flow is read out and sent to the random forest traffic service classification model deployed in step (1);

(4) In the random forest traffic service classification model, the feature data passed in from step (3) first undergoes simple calculation, the service category of the flow is then obtained from the random forest model, and the service category is written into the stream category information RAM.

Further, the step (1) specifically includes the following sub-steps:

(1.1) classifying the traffic of each application in the public data set into four service types (conversational, streaming media, interactive, and background) according to the 3GPP recommendation;

(1.2) extracting the message length and arrival time interval of the first 4 messages of each five-tuple flow, a feature set chosen through experimental comparison, to train the random forest traffic service classification model;

(1.3) deploying the trained random forest traffic service classification model on the FPGA by a direct description method. A random forest consists of a series of decision trees; each non-leaf node of a decision tree holds a feature to be compared and a corresponding threshold, and each leaf node holds the corresponding classification information. Verilog is used to describe directly whether each node is a leaf node, the feature to be compared and the threshold of each non-leaf node, and other such information;

(1.4) completing the traffic service classification in two clock cycles: all decision trees are traversed in the first clock cycle to obtain the classification result of each tree, and a majority vote is taken in the second clock cycle to obtain the final classification result. The majority vote is implemented by a combinational circuit;

(1.5) after synthesis, the resulting hardware logic circuit implements the service classification function of the random forest traffic service classification model.
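The direct-description deployment of steps (1.3) and (1.4) can be modelled in software as follows. This is a minimal Python sketch, not the Verilog of the invention: the three trees, their thresholds, the feature ordering (four lengths followed by three intervals), the class codes, and the tie-breaking rule are all invented for illustration. Each tree is written out as hard-coded comparisons, mirroring how each node's feature and threshold would be described directly in hardware, and the vote is a pure function of the tree outputs, as a combinational circuit would be.

```python
from collections import Counter

# Hypothetical class codes for the four service types (not from the patent).
CONVERSATIONAL, STREAMING, INTERACTIVE, BACKGROUND = 0, 1, 2, 3

# Assumed feature layout: f[0..3] = lengths of the first four messages,
#                         f[4..6] = arrival time intervals between them.
# All thresholds below are made up for illustration.

def tree_0(f):
    if f[0] <= 100:
        return CONVERSATIONAL if f[4] <= 20 else INTERACTIVE
    return STREAMING if f[1] > 1000 else BACKGROUND

def tree_1(f):
    if f[4] <= 30:
        return CONVERSATIONAL if f[2] <= 200 else STREAMING
    return BACKGROUND

def tree_2(f):
    if f[1] > 1200:
        return STREAMING
    return CONVERSATIONAL if f[5] <= 25 else INTERACTIVE

TREES = (tree_0, tree_1, tree_2)

def classify(features):
    # Stage 1: evaluate every tree (in hardware, all trees in the same cycle).
    votes = [tree(features) for tree in TREES]
    # Stage 2: majority vote; ties go to the lowest class code (assumption).
    counts = Counter(votes)
    best = max(counts.values())
    return min(cls for cls, n in counts.items() if n == best)
```

Because every comparison is a constant baked into the logic, no node table has to be stored or walked at run time, which is one plausible reading of why direct description lowers resource use.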

Further, the step (2) specifically includes the following sub-steps:

(2.1) for an incoming message, first extracting its five-tuple, message length, and arrival time;

(2.2) hashing the five-tuple extracted in (2.1) with a CRC hash algorithm computed in parallel, to determine the address in the stream category information RAM of the flow to which the message belongs;

(2.3) reading the data in the stream category information RAM, including the classified flag, the category information, and the count of messages that have arrived for the flow. If the classified flag indicates that the flow is classified, the message is marked with the corresponding category information and processing goes to step (2.7). Otherwise, the message is marked with the default category information and processing goes to step (2.4);

(2.4) if the message count is less than 4, the feature data is to be stored, and processing goes to step (2.5). Otherwise, the feature data is to be read, and processing goes to step (2.6);

(2.5) incrementing by one the message count stored in the stream category information RAM for the flow to which the message belongs, calculating the write address of the stream statistics data RAM at the same time, and writing the message length and timestamp extracted in step (2.1) into the stream statistics data RAM. Then go to step (2.7);

(2.6) triggering the classification signal and calculating the read address of the stream statistics data RAM. Then go to step (2.7);

(2.7) sending the marked message out of the online traffic classification module.
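The per-message flow of steps (2.1) to (2.7) can be sketched as a small software model. This is an illustration under stated assumptions, not the hardware design: the table size `ADDR_BITS`, the default class code, the dictionary layout of a RAM entry, and the use of `zlib.crc32` as the CRC hash are all hypothetical choices, and hash collisions between flows are ignored.

```python
import zlib

ADDR_BITS = 10            # hypothetical table size: 2**10 flows
DEFAULT_CLASS = 0         # hypothetical default class code
FEATURES_PER_FLOW = 4

# Stream category information RAM: one entry per flow.
category_ram = [{"classified": False, "cls": DEFAULT_CLASS, "count": 0}
                for _ in range(1 << ADDR_BITS)]
# Stream statistics data RAM: four consecutive rows per flow.
stats_ram = [None] * ((1 << ADDR_BITS) * FEATURES_PER_FLOW)
classify_requests = []    # read addresses handed on to step (3)

def flow_address(five_tuple):
    # (2.2) CRC hash of the five-tuple, truncated to the RAM address width.
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) & ((1 << ADDR_BITS) - 1)

def process_message(five_tuple, length, timestamp):
    """Return the class label with which the message is marked."""
    addr = flow_address(five_tuple)                  # (2.1)-(2.2)
    entry = category_ram[addr]                       # (2.3)
    if entry["classified"]:
        return entry["cls"]
    if entry["count"] < FEATURES_PER_FLOW:           # (2.4) -> (2.5)
        stats_ram[(addr << 2) + entry["count"]] = (length, timestamp)
        entry["count"] += 1
    else:                                            # (2.4) -> (2.6)
        classify_requests.append(addr << 2)
    return DEFAULT_CLASS                             # (2.7)
```

In this model the first four messages of a flow fill its four rows, and the next message that arrives before the flow is marked as classified raises the classification request.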

Further, the step (3) specifically includes the following sub-steps:

(3.1) four pieces of feature data need to be stored for each flow; the stream statistics data RAM stores the data of one flow in a contiguous block of memory, and each row holds a message length and a timestamp;

(3.2) when the classification trigger signal is detected, pushing the stream statistics data RAM read address into a FIFO for buffering;

(3.3) if the FIFO is not empty, popping a stream statistics data RAM read address from the FIFO;

(3.3) reading all feature data of the flow to which the message belongs according to that read address: a finite state machine reads, over four clock cycles, the four consecutive rows starting at that address;

(3.4) sending the four rows read to the classification model obtained in step (1.5) for service classification, and passing in the stream statistics data RAM read address at the same time.
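The FIFO-buffered read-out described in these sub-steps can be modelled as below. This is a hedged sketch: the function name `read_flow_features` and the use of a Python `deque` as the FIFO are illustrative assumptions, and the loop stands in for the four states of the finite state machine, one row per simulated clock cycle.

```python
from collections import deque

def read_flow_features(stats_ram, addr_fifo):
    """Drain one request from the FIFO and read four consecutive rows."""
    if not addr_fifo:                 # FIFO empty: nothing to do
        return None
    base = addr_fifo.popleft()        # first address of the flow's block
    rows = []
    state = 0                         # FSM state doubles as the row offset
    while state < 4:                  # four read cycles
        rows.append(stats_ram[base + state])
        state += 1
    # Pass the rows on together with the read address without its lower
    # two bits, which identifies the flow in the category RAM.
    return rows, base >> 2
```

Buffering the addresses in a FIFO lets new classification requests arrive while an earlier four-cycle read is still in progress.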

Further, the step (4) specifically includes the following sub-steps:

(4.1) delaying the four pieces of feature data of a flow by one clock cycle, and converting the message lengths and timestamps into message lengths and arrival time intervals;

(4.2) inputting the message length and arrival time interval features into the random forest service classification model to obtain the service category;

(4.3) applying the inverse of the calculation in (2.6) to the passed-in stream statistics data RAM read address to obtain the address in the stream category information RAM at which the classification result is to be written, and writing the service category obtained in (4.2) there.
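The pre-processing in (4.1) and the address recovery in (4.3) amount to simple arithmetic, sketched here in Python. The function names are hypothetical, and the sketch assumes that four timestamps yield three intervals; the text does not state how many interval features the model consumes.

```python
def to_model_features(rows):
    """rows: four (length, timestamp) pairs in arrival order."""
    lengths = [length for length, _ in rows]
    stamps = [ts for _, ts in rows]
    # Pairing each timestamp with its one-step-delayed copy gives the
    # differences between consecutive arrivals.
    intervals = [cur - prev for prev, cur in zip(stamps, stamps[1:])]
    return lengths + intervals

def category_write_address(stats_read_addr):
    # Inverse of the two-bit left shift used in (2.6).
    return stats_read_addr >> 2
```

For example, timestamps 100, 130, 135, and 200 become the intervals 30, 5, and 65, and a statistics read address of 40 maps back to category address 10.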

Further, the step (2.5) specifically comprises the following sub-steps:

(2.5.1) incrementing the message count by one; the write address of the stream category information RAM is the stream category information RAM read address computed from the five-tuple extracted in (2.1), and the write changes only the message count field;

(2.5.2) for each flow whose classification information is stored in the stream category information RAM, four pieces of feature data need to be stored in the stream statistics data RAM. The stream statistics data RAM address for this write is therefore obtained by shifting the stream category information RAM read address left by two bits and adding the message count.

Further, in step (2.6), the stream statistics data RAM read address serves to locate the first address at which the flow's feature data is stored, so that all the feature data can be read out sequentially from that address; the read address is therefore set to the stream category information RAM read address shifted left by two bits.
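The address relationship between the two RAMs can be checked with a short worked example. The function names are hypothetical, and the sketch assumes the message count used for the offset runs from 0 to 3.

```python
def stats_write_address(category_addr, message_count):
    # (2.5.2): the two-bit left shift selects the flow's block of four rows;
    # the message count (0..3) selects the row inside the block.
    assert 0 <= message_count < 4
    return (category_addr << 2) + message_count

def stats_read_address(category_addr):
    # (2.6): first row of the flow's block.
    return category_addr << 2
```

With category address 5, the four feature rows land at statistics addresses 20 to 23, and the read for classification starts at 20.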

Compared with the prior art, the invention has the following advantages and beneficial effects:

(1) The invention classifies high-speed traffic by service type online, providing a precondition for network QoS guarantee technology and effectively improving network efficiency.

(2) The invention selects the message length and arrival time interval of the first four messages of each flow as features; these are easy to extract, simple to process, and meet real-time requirements.

(3) The invention adopts a random forest model, which improves the accuracy of service classification, and exploits the parallel computing capability of the FPGA to traverse all decision trees of the model simultaneously. Deploying the random forest by direct description achieves lower resource occupation.

Drawings

Fig. 1 shows the framework of the FPGA-based online traffic service classification method provided by the invention.

Fig. 2 shows the accuracy of different machine learning models when using the first N messages of a flow, with only the message length feature and with both the message length and arrival time interval features.

Fig. 3 compares the time required for traffic classification in software and in hardware.

Fig. 4 shows the resource consumption of the random forest model for different numbers of trees on an Xc7z100 ffg-2 FPGA.

Detailed Description

The technical scheme provided by the present invention will be described in detail with reference to the following specific examples, and it should be understood that the following specific examples are only for illustrating the present invention and are not intended to limit the scope of the present invention.

The invention provides an FPGA-based online traffic service classification method whose framework is shown in Fig. 1. The random forest service classification model is trained on a data set and then deployed by a direct description method. The five-tuple and feature extraction, five-tuple hash, stream category information RAM, marking, and related modules perform feature extraction and address calculation for each message entering the classification framework, read the service type of the flow to which the message belongs, and mark and output the message. The stream statistics data RAM stores the feature data of each flow and, when a classification signal is received, sends all data of one flow to the random forest classification model. The random forest model first converts the incoming message lengths and timestamps into message length and arrival time interval features, uses these features as the input of the classification model, and, once the classification result is obtained, writes it into the stream category information RAM as the service type of the flow.

Specifically, the method of the invention comprises the following steps:

(1) A random forest traffic service classification model is trained on a data set containing traffic of different service types, using statistical data of the first few messages of each flow as the classification features, and the model is deployed on an FPGA by a direct description method.

The specific process of the step is as follows:

(1.1) classifying the traffic of the applications in the public data set into four service types (conversational, streaming media, interactive, and background) according to the 3GPP recommendation;

(2) The stream category information RAM stores the service category information of each five-tuple flow, and the stream statistics data RAM stores the feature data of each five-tuple flow. For an incoming message, the information stored in the stream category information RAM is read using the message's five-tuple to determine whether the flow to which the message belongs has been classified; if so, the message is marked with the flow's classification category; if not, it is marked with a default classification category, and the corresponding data in the stream category information RAM and the stream statistics data RAM is stored or updated.

The specific process in the step is as follows:

(2.6) triggering the classification signal and calculating the read address of the stream statistics data RAM. The read address serves to locate the first address in the stream statistics data RAM at which the flow's feature data is stored, so that all the feature data can be read out sequentially from that address; it is therefore set to the stream category information RAM read address shifted left by two bits. Then go to step (2.7);

The method specifically comprises the following steps:

(3.4) sending the four rows read to the classification model obtained in step (1.5) for service classification, and passing in the stream statistics data RAM read address (excluding the lower two bits) at the same time.

The method specifically comprises the following steps:

The technical means disclosed by the scheme of the invention is not limited to the technical means disclosed by the embodiment, and also comprises the technical scheme formed by any combination of the technical features. It should be noted that modifications and adaptations to the invention may occur to one skilled in the art without departing from the principles of the present invention and are intended to be within the scope of the present invention.

Claims

1. An on-line traffic classification method based on FPGA is characterized by comprising the following steps:

(1.3) deploying the trained random forest traffic service classification model on an FPGA by a direct description method, wherein the random forest consists of a series of decision trees, each non-leaf node of a decision tree holds a feature to be compared and a corresponding threshold, each leaf node holds the corresponding classification information, and Verilog is used to describe directly whether each node is a leaf node and the feature to be compared and the threshold of each non-leaf node;

(1.4) completing the traffic service classification in two clock cycles, wherein all decision trees are traversed in the first clock cycle to obtain the classification result of each decision tree, a majority vote is taken in the second clock cycle to obtain the final classification result, and the majority vote is implemented by a combinational circuit;

(1.5) after synthesis, the resulting hardware logic circuit implements the service classification function of the random forest traffic service classification model;

(2) the stream category information RAM stores the service category information of each five-tuple flow and the stream statistics data RAM stores the feature data of each five-tuple flow; for an incoming message, the information stored in the stream category information RAM is read using the message's five-tuple to determine whether the flow to which the message belongs has been classified; if so, the message is marked with the flow's classification category; if not, it is marked with a default classification category, and the corresponding data in the stream category information RAM and the stream statistics data RAM is stored or updated;

(3.2) when the classification trigger signal is detected, pushing the stream statistics data RAM read address into a FIFO for buffering;

(3.4) sending the four rows read to the classification model obtained in step (1.5) for service classification, and passing in the stream statistics data RAM read address at the same time;

2. The FPGA-based on-line traffic classification method according to claim 1, wherein the step (2) specifically comprises the following sub-steps:

(2.3) reading data in the stream category information RAM, wherein the data comprises a classified mark, category information and the count of the arrived messages of the stream, if the classified mark is marked as classified, marking the messages as corresponding category information, then turning to the step (2.7), otherwise, marking the messages as default category information, and then turning to the step (2.4);

(2.4) if the message count is smaller than 4, storing the characteristic data, then turning to the step (2.5), otherwise, reading the characteristic data, and then turning to the step (2.6);

(2.5) incrementing by one the message count stored in the stream category information RAM for the flow to which the message belongs, calculating the write address of the stream statistics data RAM, writing the message length and timestamp extracted in step (2.1) into the stream statistics data RAM, and then going to step (2.7);

(2.6) triggering a classification signal, calculating a stream statistics data RAM read address, and then turning to the step (2.7);

3. The FPGA-based on-line traffic classification method according to claim 1, wherein the step (4) specifically comprises the following sub-steps:

4. The FPGA-based on-line traffic classification method according to claim 2, wherein the step (2.5) specifically comprises the following sub-steps:

(2.5.2) for each flow whose classification information is stored in the stream category information RAM, four pieces of feature data need to be stored in the stream statistics data RAM, so the stream statistics data RAM read address is obtained by shifting the stream category information RAM read address left by two bits and adding the message count.

5. The FPGA-based on-line traffic classification method according to claim 2, wherein in the step (2.6), the stream statistics data RAM read address serves to locate the first address at which the flow's feature data is stored, so that all the feature data can be read out sequentially from that address, and the stream statistics data RAM read address is therefore set to the stream category information RAM read address shifted left by two bits.