CN110768829B

CN110768829B - Method for realizing linear increase of traffic analysis service performance based on DPDK

Info

Publication number: CN110768829B
Application number: CN201911009568.9A
Authority: CN
Inventors: 张广兴; 景阳; 王伟
Original assignee: Jiangsu Future Networks Innovation Institute
Current assignee: Jiangsu Future Networks Innovation Institute
Priority date: 2019-10-23
Filing date: 2019-10-23
Publication date: 2022-02-15
Anticipated expiration: 2039-10-23
Also published as: CN110768829A; WO2021077748A1

Abstract

The invention provides a DPDK-based method for achieving linear growth in the performance of a traffic analysis service. The first technical scheme of the invention is a method for establishing a multi-core system framework that is based on DPDK features and suited to linear performance growth. The second technical scheme is a flow node storage and operation method adapted to the system framework of the first scheme. By allocating the use of the hardware resources that most affect device performance (the network card and the CPU), the invention lets system performance grow approximately linearly as network cards or CPUs are added. This addresses the shortcomings of the prior art and, in theory, means that system performance has no upper limit as long as hardware resources are sufficient.

Description

Method for realizing linear increase of traffic analysis service performance based on DPDK

Technical Field

The invention relates to the technical field of data processing, in particular to a method for realizing linear increase of traffic analysis service performance based on DPDK.

Background

As traffic analysis services based on DPDK forwarding continue to develop, the demands placed on them keep rising: the analysis is expected to be increasingly fine-grained, and the required system performance keeps increasing. In addition, to compete in the market, many device manufacturers design multiple device models and multiple switchable service combinations for the same software product, so that users can choose among them. These requirements create difficulties for the engineers who develop traffic analysis software, mainly as follows: (1) the performance of a traffic analysis service is strongly correlated with the device hardware, so when the device changes, the original service no longer performs well on the new device, and separate versions must be maintained to obtain optimal performance on each device; (2) during performance tuning, adding hardware resources is often found not to improve system performance effectively; (3) hardware resources are tight, yet some of them are wasted in the course of performance tuning.

Disclosure of Invention

The invention aims to provide a DPDK-based method for achieving linear growth in traffic analysis service performance, that is, a system whose processing performance grows approximately linearly as hardware resources are added, so that in theory system performance has no upper limit provided the hardware resources are sufficient. To this end, the invention provides two technical schemes that cooperate to achieve the final goal.

The first technical scheme of the invention is a method for establishing a multi-core system framework that is based on DPDK features and suited to linear performance growth. The second technical scheme of the invention is a flow node storage and operation method adapted to the system framework of the first technical scheme.

A DPDK-based method for achieving linear growth in traffic analysis service performance comprises: a method for establishing a multi-core system framework that is based on DPDK features and suited to linear performance growth; and a flow node storage and operation method suited to that multi-core system framework. The method for establishing the multi-core system framework comprises the following steps:

Step (1): providing a configuration file with the following contents: 1) the packet receiving thread designated for the packets of each port. The ratio of packet receiving threads to ports can be set freely, provided that the number of packet receiving threads is sufficient to handle the total traffic of all ports, and the ports are distributed across the packet receiving threads as evenly as possible according to port capacity. 2) The packet processing threads corresponding to each packet receiving thread. Packet processing threads are associated with hardware ports, and the number of packet processing threads allocated to each port follows the port rate. 3) The number of flow node scanning threads, which is one per port. Setting these parameters in the configuration file has two benefits: 1) the parameters can be set flexibly according to factors such as the deployment scenario and the device type; 2) the parameter ratios are first preset from empirical values and then adapted to each service, and this adjustment takes many attempts, so a configuration file that is easy to adjust is important;

Step (2), program initialization: reading from the configuration file of step (1) the numbers of ports, packet receiving threads, packet processing threads and flow scanning threads and the correspondence among them, then starting the threads according to the parameters read (besides the packet receiving threads and packet processing threads, the started threads also include flow node information export threads and some special service threads), and assigning specific tasks to the two types of threads;

Step (3), packet receiving and processing: after a DPDK port receives a packet, the packet is delivered to the packet receiving thread designated by the configuration parameters of step (1); then, according to the distribution policy and the packet processing threads designated in the configuration file, the packet is delivered to the packet processing thread selected by the policy, where detailed service processing is performed; finally the packet is forwarded or discarded;

Step (4), flow information output: the ultimate purpose of managing flow nodes is to export the flow information the user is interested in. The export can take many forms, for example output to a hard disk or to a designated back-end server.
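
The configuration contents of step (1) can be sketched as follows. This is a minimal illustration in Python rather than DPDK code. The field names (`rate_gbps`, `rx_thread`, `workers`, `scan_thread`) and the `validate` helper are hypothetical, the default per-thread capacity of 20G is an assumed value taken from the upper end of the 10G to 20G range given in the description, and the rule that a packet processing thread serves only one port is an assumption drawn from the embodiment.

```python
from dataclasses import dataclass


@dataclass
class PortConfig:
    port_id: int
    rate_gbps: int     # nominal port rate
    rx_thread: int     # packet receiving thread that serves this port
    workers: list      # packet processing thread ids assigned to this port
    scan_thread: int   # flow scanning thread, one per port


def validate(ports, rx_capacity_gbps=20):
    """Return a list of violations of the step (1) rules (empty if valid)."""
    errors = []
    load = {}                # total port rate handled by each rx thread
    seen_workers = set()
    seen_scanners = set()
    for p in ports:
        load[p.rx_thread] = load.get(p.rx_thread, 0) + p.rate_gbps
        if not p.workers:
            errors.append(f"port {p.port_id}: no packet processing thread")
        for w in p.workers:
            # Assumption: a packet processing thread belongs to one port only.
            if w in seen_workers:
                errors.append(f"worker {w} assigned to more than one port")
            seen_workers.add(w)
        # Scanning threads are one-to-one with ports.
        if p.scan_thread in seen_scanners:
            errors.append(f"scan thread {p.scan_thread} is not one-to-one")
        seen_scanners.add(p.scan_thread)
    for rx, gbps in sorted(load.items()):
        if gbps > rx_capacity_gbps:
            errors.append(f"rx thread {rx} overloaded: {gbps}G")
    return errors
```

Keeping the rules in a validation function reflects the point made in step (1): the ratios are tuned by trial, so a quick check on each edited configuration is useful.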

In step (2), the program initialization process comprises the following steps: 1) reading and recording the configuration parameters; 2) according to the configuration parameters and DPDK features, setting up packet receiving queues for the ports, a packet receiving cache queue for each packet receiving thread, and a packet receiving cache queue for each packet processing thread; 3) allocating memory for storing flow nodes for the service processing threads and establishing the flow node use and management mechanism, as detailed in the second technical scheme of the invention.
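
Sub-step 2) of the initialization can be sketched as below. The function name `init_rings`, its parameters and the ring size are hypothetical. In a DPDK program these queues would typically be `rte_ring` objects; here a bounded `collections.deque` stands in for them, with the caveat noted in the comments.

```python
from collections import deque


def init_rings(rx_of_port, workers_of_port, ring_size=1024):
    """Create the cache queues described in initialization sub-step 2).

    rx_of_port:      {port_id: rx_thread_id}
    workers_of_port: {port_id: [worker_id, ...]}
    Note: deque(maxlen=...) silently drops the oldest entry when full,
    whereas a real ring would refuse the enqueue; this is only a stand-in.
    """
    # One cache ring between each port and its packet receiving thread.
    port_rings = {port: deque(maxlen=ring_size) for port in rx_of_port}
    # Each rx thread gets the list of port rings it has to poll.
    rx_rings = {}
    for port, rx in rx_of_port.items():
        rx_rings.setdefault(rx, []).append(port_rings[port])
    # One input queue per packet processing thread.
    worker_rings = {w: deque(maxlen=ring_size)
                    for ws in workers_of_port.values() for w in ws}
    return port_rings, rx_rings, worker_rings
```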

In step (3), the packet receiving and processing process comprises the following steps: 1) setting up the DPDK packet receiving environment, including parameters such as the queues and packet receiving caches of each DPDK port; 2) receiving packets on the DPDK port and delivering them to the packet receiving cache ring corresponding to that port (a unique packet receiving cache ring is placed between each port and its packet receiving thread); 3) the packet receiving thread polls all of its packet receiving cache rings in turn; 4) the packet receiving thread distributes each packet to a service processing thread according to the configured relationship between the ports and the packet processing threads and the distribution policy (for each DPDK port, as many queues are set up as the port has packet processing threads, RSS is enabled for those queues, and the RSS value computed by the DPDK multi-queue mechanism is used to select the packet processing thread to which the packet is delivered); 5) the packet processing thread builds the flow table, extracts and collects information from the packet according to the specific service requirements, and stores it in the flow table.
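
The selection in sub-step 4) can be illustrated with a short sketch. In DPDK the RSS hash is computed by the network card and carried with the received packet; the `flow_hash` function below is only a software stand-in (a CRC32 over the 5-tuple, not the hash a NIC computes), and sorting the two endpoints so that both directions of a flow hash alike is an added assumption. All names are hypothetical.

```python
import zlib


def flow_hash(src_ip, dst_ip, src_port, dst_port, proto):
    """Stand-in for the NIC-computed RSS value of a packet."""
    # Assumption: order the endpoints so both directions give the same hash.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a}|{b}|{proto}".encode()
    return zlib.crc32(key)


def select_worker(workers_of_port, port_id, rss_hash):
    """Pick the packet processing thread for a packet received on port_id."""
    workers = workers_of_port[port_id]
    # Same hash -> same worker, so a flow always lands on one thread.
    return workers[rss_hash % len(workers)]
```

Because the worker is a pure function of the port and the hash, packets of one flow never migrate between packet processing threads, which is what lets each thread own its part of the flow table.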

In step (4), the output process covers only the organization and output of the data, and comprises the following steps: 1) each scanning thread scans all flow nodes in the flow table storage area of its corresponding port, organizes the information to be output, and places it in a buffer; 2) the flow node export thread periodically fetches from the buffer of step 1) the flow node information to be exported and outputs it.
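
The hand-off between the scanning thread and the export thread can be sketched as a producer and a consumer around a shared buffer. The record fields and the function names are hypothetical, and the sink is a plain list standing in for a disk file or a back-end server connection.

```python
import queue


def scan_port(port_id, flow_table, out_buffer):
    """Scanning thread body for one port: organize records and buffer them."""
    for key, node in flow_table.items():
        out_buffer.put({"port": port_id, "flow": key,
                        "packets": node["packets"], "bytes": node["bytes"]})


def export_once(out_buffer, sink):
    """One period of the export thread: drain the buffer into the sink."""
    exported = 0
    while True:
        try:
            record = out_buffer.get_nowait()
        except queue.Empty:
            return exported
        sink.append(record)
        exported += 1
```

In a running system `export_once` would be called on a timer; `queue.Queue` is thread-safe, so the scanning threads of several ports can share one buffer.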

The DPDK-based method for achieving linear growth in traffic analysis service performance also comprises a flow node storage and management method suited to linear performance growth, which mainly comprises the following steps:

Step (1), flow node storage: 1) a global two-dimensional array for storing flow nodes is established at program initialization; 2) the number of rows of the array is the number of ports in use, and the number of columns is the number of packet processing threads; 3) each element of the array is a pointer to a flow table structure; the content pointed to is a flow table area in which the maximum number of flow nodes supported by the corresponding port is allocated in advance.

Step (2), flow node management: this mainly comprises the operations of adding to, updating, deleting from and scanning the flow table.

The step (2) specifically comprises the following steps:

Step 1): at program initialization, a flow table management array indexed by port is established;

Step 2): the elements of the array are hash buckets that manage the flow tables; the hash buckets hold pointers to flow nodes, and the actual memory pointed to is the flow nodes pre-allocated in the storage step (1);

Step 3): the flow table management array is divided into areas according to the number of packet processing threads, so that the different packet processing threads of one port operate on different flow table management areas, which reduces contention between threads operating on the flow table;

Step 4), flow node addition, performed by the packet processing thread: when the flow table management array contains no flow node for the current packet, the flow node linked list is located in the array of the storage step (1) using the port number of the current packet and the packet processing thread number as indexes, an idle flow node is taken from it and filled with the flow information, a hash value is computed from the flow information, and the flow node is inserted into the flow table;

Step 5), flow node update, performed by the packet processing thread: when the flow table management array already contains the flow node of the current packet, the flow node is located and its flow information is updated;

Step 6), flow node deletion, performed by the scanning thread: the timestamp of each flow node is scanned periodically, and a flow node that has timed out is unlinked from the flow table management array and its memory is marked idle;

Step 7), flow node scanning: the number of scanning threads is set according to the number of ports, and each scanning thread handles the flow nodes in the flow table of its corresponding port. The scanning thread has two main tasks: aging out and deleting timed-out flow nodes, and collecting information from the flow nodes according to service requirements.
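
The storage and management steps above can be combined into one small sketch. It is a single-threaded Python model, not the patented implementation: the class name `FlowStore`, the node fields and the sizing parameters are hypothetical, and the worker index is taken to be local to the port (0 to `n_workers - 1`).

```python
class FlowStore:
    def __init__(self, n_ports, n_workers, nodes_per_pool, buckets_per_region):
        self.region = buckets_per_region
        # Storage step (1): 2-D array [port][worker] of pre-allocated idle nodes.
        self.pools = [[[{"key": None} for _ in range(nodes_per_pool)]
                       for _ in range(n_workers)] for _ in range(n_ports)]
        # Steps 1) to 3): per-port bucket array, one region per worker.
        self.buckets = [[[] for _ in range(n_workers * buckets_per_region)]
                        for _ in range(n_ports)]

    def _bucket(self, port, worker, key):
        # A worker only ever touches buckets inside its own region.
        idx = worker * self.region + hash(key) % self.region
        return self.buckets[port][idx]

    def add_or_update(self, port, worker, key, now, nbytes):
        bucket = self._bucket(port, worker, key)
        for node in bucket:                  # step 5): flow exists, update it
            if node["key"] == key:
                node["packets"] += 1
                node["bytes"] += nbytes
                node["last_seen"] = now
                return node
        pool = self.pools[port][worker]      # step 4): take an idle node
        if not pool:
            return None                      # pool exhausted, flow not tracked
        node = pool.pop()
        node.update(key=key, worker=worker, packets=1,
                    bytes=nbytes, last_seen=now)
        bucket.append(node)
        return node

    def scan(self, port, now, timeout):
        """Step 7): collect information, then age out timed-out nodes."""
        collected = []
        for bucket in self.buckets[port]:
            for node in list(bucket):
                collected.append((node["key"], node["packets"], node["bytes"]))
                if now - node["last_seen"] > timeout:   # step 6): delete
                    bucket.remove(node)
                    # Memory is recycled into the pool, never released.
                    self.pools[port][node["worker"]].append(node)
        return collected
```

Because each worker indexes only its own region and its own pool, two packet processing threads of the same port never touch the same bucket. A multi-threaded version would still need to synchronize a worker with the scanning thread of its port, since both can modify the same bucket.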

The invention has the beneficial effects that:

By allocating the use of the hardware resources that most affect device performance (the network card and the CPU), the invention lets system performance grow approximately linearly as network cards or CPUs are added. This addresses the shortcomings of the prior art and, in theory, means that system performance has no upper limit as long as hardware resources are sufficient. The main points are as follows:

(1) The use of ports and CPUs is set through the configuration file, so that adapting to a different device only requires modifying the configuration file;

(2) The packet receiving threads and the packet processing threads are separated. In other words, the system's capacity to receive packets and its capacity to perform traffic analysis are considered separately, which amounts to considering the processing capacity of the network card separately from the capacity of the CPU to perform service processing. This makes it possible to tune overall system performance to the actual hardware of each device. In detail: 1) the number of packet receiving threads only needs to be sufficient to receive the packets from all ports, so more packet receiving threads are used when the network card capacity is high and fewer when it is low; 2) the number of packet processing threads only needs to reflect the processing complexity of the traffic analysis service, so more packet processing threads are used when the service is complex and fewer when it is simple.

(3) The correspondence between ports and packet receiving threads can be many-to-one or one-to-many. In the current DPDK-based packet receiving framework, one packet receiving thread can receive 10G to 20G of network traffic, and most mainstream network cards used with DPDK have a single-port rate of 1G or 10G, so in general the number of packet receiving threads is smaller than the total number of ports. It is only necessary that the receiving load be balanced across the packet receiving threads and that the packets of all ports can be received.

(4) Packet processing threads are set per port. For example, one packet processing thread is configured for a 1G port; when the traffic analysis service is not very complex, this thread can perform all service processing for 1G of traffic. For a 10G port, 4 or more packet processing threads are configured. The distribution policy of the packet receiving thread ensures that the traffic is split across the different packet processing threads, and the flow table management mechanism ensures that the processing capacity of the port grows approximately linearly as packet processing threads are added for it.

(5) The flow node storage and use mechanism partitions the flow table area so that the threads operating on flow nodes conflict less, which improves flow node access efficiency.

(6) The mechanisms of (3), (4) and (5) above make it possible: 1) when ports or port traffic are added, to improve overall system performance approximately linearly by proportionally increasing the numbers of packet receiving threads and packet processing threads; 2) when single-port traffic is heavy and the traffic analysis service is complex, to improve overall system performance approximately linearly by increasing the number of packet processing threads; 3) on the basis of 1) and 2), to solve the problems of adapting one software product to different hardware devices with optimal performance and of wasted system resources.
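
The sizing reasoning in points (3) and (6) can be made concrete with a small calculation. The sketch below is illustrative only: the per-thread receive capacity is a parameter because the description gives a range of 10G to 20G, and the greedy balancing rule (fastest ports first, each to the least-loaded thread) is an assumed heuristic, not a rule stated in the description.

```python
import math


def rx_threads_needed(port_rates_gbps, rx_capacity_gbps=10):
    """Smallest number of rx threads whose combined capacity covers all ports."""
    return max(1, math.ceil(sum(port_rates_gbps) / rx_capacity_gbps))


def balance_ports(port_rates_gbps, n_rx):
    """Greedy balancing: assign the fastest ports first to the least-loaded thread.

    Returns ({port_index: rx_thread}, [load_per_rx_thread]).
    """
    loads = [0] * n_rx
    assignment = {}
    order = sorted(range(len(port_rates_gbps)),
                   key=lambda i: -port_rates_gbps[i])
    for port in order:
        rx = loads.index(min(loads))
        assignment[port] = rx
        loads[rx] += port_rates_gbps[port]
    return assignment, loads
```

For four 1G ports and two 10G ports (24G in total), an assumed capacity of 20G per thread gives 2 packet receiving threads and the greedy rule loads each with 12G; with an assumed capacity of 10G per thread, 3 threads would be needed.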

Drawings

FIG. 1 is a flow node storage diagram according to the present invention.

FIG. 2 is a flowchart of a flow node management method according to the present invention.

FIG. 3 is a diagram of message distribution and thread setup according to the present invention.

FIG. 4 is a diagram of the memory processed by the packet processing threads and the scanning thread according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in FIG. 1 to FIG. 4, a DPDK-based method for achieving linear growth in traffic analysis service performance comprises: a method for establishing a multi-core system framework that is based on DPDK features and suited to linear performance growth; and a flow node storage and operation method suited to that multi-core system framework. The method for establishing the multi-core system framework comprises the following steps:

The step (2) specifically comprises the following steps:

Example 1

The specific processing procedure is as follows:

(1) Flow node storage method: at program initialization, a global variable is established to allocate, store and manage flow nodes. For example, a global variable g_Flow_BUF is established. It is a two-dimensional array whose number of rows is the number of ports and whose number of columns is the number of packet processing threads; each element is a pointer to a variable of type I_BUF, and each I_BUF maintains a linked list of Flow-type nodes. The entire memory is allocated once at initialization. When packet processing needs to add a flow table entry, a node is simply linked in through pointers; when a flow ages out, the pointers are unlinked but the memory is not released, that is, the memory is recycled. FIG. 1 shows, as an example, a system configured with 2 ports and 4 packet processing threads.

(2) Flow node management method: at program initialization, ilink_true structures are created according to the number of ports (2 in this example). Each ilink_true structure maintains a pointer to an array of type FlowBucket. The FlowBucket array is the hash table that manages flows, and it is divided into regions according to the number of packet processing threads for managing flow nodes. FIG. 2 shows, as an example, a system configured with 2 ports and 4 packet processing threads.

(3) Method for setting up packet receiving threads and packet processing threads: at program initialization, the packet processing framework is established according to the correspondence among ports, packet receiving threads and packet processing threads in the configuration file. Taking a slightly more complex case as an example, the packet receiving environment has four 1G ports, two 10G ports, 2 packet receiving threads and 8 packet processing threads, as shown in FIG. 3. 1) Packet receiving thread rx_thread0 receives the traffic of port 0 (1G), port 1 (1G) and port 4 (10G), and packet receiving thread rx_thread1 receives the traffic of port 2 (1G), port 3 (1G) and port 5 (10G); 2) work_thread0 handles port 0 traffic, work_thread1 handles port 1 traffic, work_thread2 and work_thread3 handle port 4 traffic, work_thread4 handles port 2 traffic, work_thread5 handles port 3 traffic, and work_thread6 and work_thread7 handle port 5 traffic.

(4) After traffic arrives at the ports, taking the traffic on port 0 and port 5 in FIG. 3 as an example: the traffic of port 0 is received by rx_thread0 and delivered to work_thread0; the traffic of port 5 is received by rx_thread1 and then distributed to work_thread6 and work_thread7 on the principle that packets of the same flow, or of the same user, go to the same work_thread. At this point the traffic of different ports has already been placed, according to the processing thread, into different areas of the hash table described in (2).

(5) Flow table scanning method: at program initialization, one flow table scanning thread is created for each port and is responsible for periodically scanning the flow table of that port. As shown in FIG. 4, the flow table area scanned by a scanning thread is the hash table in the ilink_true structure of the corresponding port. A given memory area may be processed at the same time by the scanning thread and one of the packet processing threads, and the point of conflict is the addition and deletion of nodes in a hash bucket.
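
The topology of this example can be written down directly as data. The two mappings below reproduce the assignment of ports to rx_thread and work_thread given in item (3); the `dispatch` function, the CRC32 hash it uses in place of the RSS value, and the idea of numbering hash-table regions by the worker's position within its port are hypothetical illustrations.

```python
import zlib

# Port -> packet receiving thread, as in item (3) of the example.
RX_OF_PORT = {0: "rx_thread0", 1: "rx_thread0", 4: "rx_thread0",
              2: "rx_thread1", 3: "rx_thread1", 5: "rx_thread1"}

# Port -> packet processing threads, as in item (3) of the example.
WORKERS_OF_PORT = {0: ["work_thread0"], 1: ["work_thread1"],
                   4: ["work_thread2", "work_thread3"],
                   2: ["work_thread4"], 3: ["work_thread5"],
                   5: ["work_thread6", "work_thread7"]}


def dispatch(port, flow_key):
    """Return (rx thread, work thread, hash-table region) for one packet."""
    workers = WORKERS_OF_PORT[port]
    # Stand-in for the RSS value: identical flow keys give identical hashes.
    h = zlib.crc32(repr(flow_key).encode())
    region = h % len(workers)      # region of the port's hash table
    return RX_OF_PORT[port], workers[region], region
```

A packet on port 0 always yields `("rx_thread0", "work_thread0", 0)`, while packets on port 5 are received by rx_thread1 and split between work_thread6 and work_thread7, with every packet of a given flow landing on the same thread and region.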

The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and the preferred embodiments of the present invention are described in the above embodiments and the description, and are not intended to limit the present invention. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. A DPDK-based method for achieving linear growth in traffic analysis service performance, characterized by comprising: a method for establishing a multi-core system framework that is based on DPDK features and suited to linear performance growth; and a flow node storage and management method suited to that multi-core system framework; wherein the method for establishing the multi-core system framework comprises the following steps:

step (1), providing a configuration file, wherein the configuration file comprises the following contents: 1) the packet receiving thread designated for the packets of each port, wherein the ratio of packet receiving threads to ports can be set freely, provided that the number of packet receiving threads is sufficient to process the total traffic of all ports, and the ports are distributed across the packet receiving threads as evenly as possible according to port capacity; 2) the packet processing threads corresponding to each packet receiving thread, wherein packet processing threads correspond to hardware ports and the number of packet processing threads allocated to each port follows the port rate; 3) the number of flow scanning threads, which is one per port; setting these parameters in the configuration file has two advantages: 1) the parameters can be set flexibly according to the scenario and the device type; 2) the parameter ratios are preset from empirical values and then adapted to each service, and the adjustment takes many attempts, so a configuration file that is easy to adjust is important;

step (2), program initialization: reading from the configuration file of step (1) the numbers of ports, packet receiving threads, packet processing threads and flow scanning threads and the correspondence among them, then starting threads according to the parameters read and assigning specific tasks to the started threads, wherein the started threads comprise, besides the packet receiving threads and the packet processing threads, flow node information export threads and some special service threads;

step (3), packet receiving and processing: after a DPDK port receives a packet, the packet is delivered to the packet receiving thread designated by the configuration parameters in the configuration file provided in step (1); then, according to the distribution policy and the packet processing threads designated in the configuration file, the packet is delivered to the packet processing thread selected by the policy, where detailed service processing is performed; finally the packet is forwarded or discarded;

step (4), flow information output: the ultimate purpose of managing flow nodes is to export the flow information the user is interested in, and the export can take many forms, including output to a hard disk and output to a designated back-end server;

step (1), flow node storage: 1) a global two-dimensional array for storing flow nodes is established at program initialization; 2) the number of rows of the array is the number of ports in use, and the number of columns is the number of packet processing threads; 3) each element of the array is a pointer to a flow table structure, the content pointed to is a flow table area, and the maximum number of flow nodes supported by the corresponding port is allocated in the flow table in advance;

step (2), flow node management, mainly comprising the operations of adding to, updating, deleting from and scanning the flow table;

the flow node management of step (2) specifically comprises the following steps:

step 4), flow node addition, performed by the packet processing thread: when the flow table management array established in step 1) contains no flow node for the current packet, the flow node linked list is located using the port number of the current packet and the packet processing thread number as indexes, an idle flow node is taken from it and filled with the flow information, a hash value is computed from the flow information, and the flow node is inserted into the flow table;

2. The method according to claim 1, wherein in step (2) the program initialization process comprises the following steps: 1) reading and recording the configuration parameters; 2) according to the configuration parameters and DPDK features, setting up packet receiving queues for the ports, a packet receiving cache queue for each packet receiving thread, and a packet receiving cache queue for each packet processing thread; 3) allocating memory for storing flow nodes for the service processing threads, and establishing the flow node use and management mechanism.

3. The method according to claim 1, wherein in step (3) the packet receiving and processing process comprises the following steps: 1) setting up the DPDK packet receiving environment, including the queues and packet receiving cache parameters of each DPDK port; 2) receiving packets on the DPDK port and delivering them to the packet receiving cache ring corresponding to that port, wherein a unique packet receiving cache ring is placed between each port and its packet receiving thread; 3) the packet receiving thread polls all of its packet receiving cache rings in turn; 4) the packet receiving thread distributes each packet to a service processing thread according to the configured relationship between the ports and the packet processing threads and the distribution policy, wherein for each DPDK port as many queues are set up as the port has packet processing threads, RSS is enabled for those queues, and the RSS value computed by the DPDK multi-queue mechanism is used to select the packet processing thread to which the packet is delivered; 5) the packet processing thread builds the flow table, extracts and collects information from the packet according to the specific service requirements, and stores it in the flow table.

4. The method according to claim 1, wherein in step (4) the output process covers only the organization and output of the data, and comprises the following steps: 1) each scanning thread scans all flow nodes in the flow table storage area of its corresponding port, organizes the information to be output, and places it in a buffer; 2) the flow node export thread periodically fetches from the buffer the flow node information to be exported and outputs it.