CN115185878A - Multi-core packet network processor architecture and task scheduling method - Google Patents


Info

Publication number
CN115185878A
Authority
CN
China
Prior art keywords
task
message processor
message
processor core
architecture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210569718.7A
Other languages
Chinese (zh)
Inventor
原德鹏
张双林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yusur Technology Co ltd
Original Assignee
Yusur Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yusur Technology Co ltd filed Critical Yusur Technology Co ltd
Priority to CN202210569718.7A priority Critical patent/CN115185878A/en
Publication of CN115185878A publication Critical patent/CN115185878A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163: Interprocessor communication
    • G06F15/173: Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/1735: Network adapters, e.g. SCI, Myrinet
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806: Task transfer initiation or dispatching
    • G06F9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Abstract

The invention provides a multi-core packet network processor architecture and a task scheduling method. The architecture comprises a plurality of message processor core groups connected to a bus. Each message processor core group comprises a task buffer, a task distributor, and a plurality of message processor cores connected in parallel: the task buffer temporarily stores tasks to be distributed from the bus; the task distributor periodically queries the task buffer, decomposes the tasks in it, and distributes them in sequence to the message processor cores in the group; and the parallel message processor cores each process tasks of the same type. A message processor core in a group outputs a next-level task to the bus, which transmits it to the task buffer of the core group responsible for that type of task, until the task flow is complete and the message is output. The invention achieves both high flexibility and high performance in a network processor architecture.

Description

Multi-core packet network processor architecture and task scheduling method
Technical Field
The invention relates to the technical field of network processors, in particular to a multi-core packet network processor architecture and a task scheduling method.
Background
The network is the transportation hub of a data center: it connects all of the devices that run application services and underpins today's flourishing internet. Network technology evolves constantly, and improving the flexibility and performance of network processing has long been a central goal in the field.
The network processor is the hardware basis of network processing; its main function is to process network messages. Fig. 1 is a flow chart of message processing in a network processor. The processing flow can be roughly summarized as: message input and output, IP address table lookup, message header modification, and message buffer management, wherein the IP address table lookup is realized by accessing address table entries, and the message buffer management is realized in combination with off-chip storage.
Existing network processors can improve performance through architectural design and adjustment; existing architectures fall mainly into the RTC architecture (Run-to-Completion) and the pipeline architecture.
Fig. 2 is a flow chart of message processing under the RTC architecture of a network processor: an input message enters a designated message processor core through a scheduling distributor, the message processor core executes the corresponding functions a, b, c, and d on the message, and the network message is then output. The internal structure of a message processor core is generally the standard von Neumann architecture, comprising memories (an instruction memory holding the execution functions a, b, c, and d, and a data memory) together with a controller, input and output devices, and an ALU (Arithmetic and Logic Unit). Fig. 3 is a diagram of the von Neumann architecture, in which dotted lines are control-flow signals and solid arrows are data- or instruction-flow signals. Under the RTC architecture, one processor core handles the processing functions a, b, c, d, and so on of an entire network message, and overall bandwidth and performance are improved through multi-core parallel processing. Generally, the RTC processing structure can be flexibly programmed in high-level languages such as C and then compiled into a series of instructions for execution, so it offers high flexibility.
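As a minimal sketch (in Python, with made-up function names; not part of the patent), the RTC model can be illustrated as identical cores that each run the entire function chain a, b, c, d to completion on a message:

```python
# Hypothetical sketch of run-to-completion (RTC) processing: every core is
# identical and executes the WHOLE flow a -> b -> c -> d for each message.

def func_a(msg): return msg + ["a"]
def func_b(msg): return msg + ["b"]
def func_c(msg): return msg + ["c"]
def func_d(msg): return msg + ["d"]

FLOW = [func_a, func_b, func_c, func_d]

def rtc_core(message):
    """Run every processing function to completion on a single core."""
    for fn in FLOW:
        message = fn(message)
    return message

def rtc_dispatch(messages, n_cores=4):
    """Scheduling distributor: round-robin messages over identical cores."""
    results = []
    for i, msg in enumerate(messages):
        core_id = i % n_cores  # which identical core handles this message
        results.append((core_id, rtc_core(msg)))
    return results

out = rtc_dispatch([["m0"], ["m1"]])
```

Because every core carries the full program, adding a function is a pure software change, which is the flexibility the text attributes to RTC.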
Fig. 4 is a flow chart of message processing under the pipeline architecture of a network processor: a message is input into the network processor, message processor core 0 executes function a, message processor core 1 executes function b, and so on. Fig. 5 is a task-distribution-over-time diagram for the pipeline architecture: multiple tasks enter the network processor in sequence, each message processor core completes one of functions a, b, c, and d, and the tasks advance in order like an industrial assembly line. The idea of the pipeline architecture is that the tasks of a processor can be decomposed according to different software characteristics and each stage implemented as dedicated hardware, so this structure achieves better performance.
However, both the RTC architecture and the pipeline architecture have limitations. The RTC architecture is flexible, and new functions can be added later through software code, but it is limited by clock frequency and its run-to-completion mechanism yields weaker performance. The pipeline architecture performs well, but its fixed processing flow lacks flexibility: with the rapid development of protocols, a hardened processing flow cannot meet the requirements of new services. Developing a traditional chip usually takes 2 to 5 years, and whenever certain functions are upgraded a new chip must be developed, which greatly limits a chip's life cycle.
Therefore, how to provide a network processor architecture that is both flexible and high-performance, together with a task scheduling method based on it, is a problem urgently awaiting a solution.
Disclosure of Invention
In view of this, embodiments of the present invention provide a multi-core packet network processor architecture in order to overcome the limitations of existing network processor architectures and give the network processor architecture both flexibility and higher performance.
One aspect of the present invention provides a multi-core packet network processor architecture comprising a plurality of message processor core groups coupled to a bus, each core group processing a predetermined type of task. Each message processor core group comprises a task buffer, a task distributor, and a plurality of message processor cores connected in parallel: the task buffer temporarily stores tasks to be distributed from the bus; the task distributor periodically queries the task buffer, decomposes the tasks in it, and distributes them in sequence to the message processor cores in the group; and the parallel message processor cores each process tasks of the same type. A message processor core in a group outputs a next-level task to the bus, which transmits it to the task buffer of the core group that processes that type of task, until the task flow is complete and the message is output.
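The grouped architecture described above can be sketched as a small simulation. All names here (`CoreGroup`, the `"parse"`/`"lookup"` task types, the dictionary standing in for the bus) are illustrative assumptions, not identifiers from the patent:

```python
from collections import deque

class CoreGroup:
    """One message processor core group: FIFO buffer + distributor + cores."""
    def __init__(self, task_type, n_cores, next_type=None):
        self.task_type = task_type
        self.n_cores = n_cores
        self.buffer = deque()        # FIFO task buffer fed from the bus
        self.next_type = next_type   # type of the next-level task, if any
        self.rr = 0                  # round-robin pointer over the cores

    def distribute_one(self, bus):
        """Distributor: pop one buffered task, assign a core, emit next level."""
        if not self.buffer:
            return None
        task = self.buffer.popleft()
        core = self.rr % self.n_cores
        self.rr += 1
        task["trace"].append((self.task_type, core))
        if self.next_type is None:
            return task              # flow finished: output the message
        bus[self.next_type].buffer.append(task)  # next-level task via the bus
        return None

def run(groups, tasks):
    bus = {g.task_type: g for g in groups}   # the bus routes by task type
    bus[groups[0].task_type].buffer.extend(tasks)
    done = []
    while any(g.buffer for g in groups):
        for g in groups:
            out = g.distribute_one(bus)
            if out is not None:
                done.append(out)
    return done

groups = [CoreGroup("parse", 2, "lookup"), CoreGroup("lookup", 2, None)]
done = run(groups, [{"trace": []} for _ in range(3)])
```

Each task's trace records which group and which core within the group handled it, showing the pipeline of groups combined with parallel cores inside each group.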
In some embodiments of the invention, the task buffer stores tasks on a first-in-first-out basis.
In some embodiments of the invention, the process of the task distributor periodically querying tasks within the task buffer is implemented by a clock management unit of the multi-core packet network processor.
In some embodiments of the present invention, the number of packet processor cores within each packet processor core group is a predetermined value.
In some embodiments of the invention, the message processor cores are based on the von Neumann architecture, comprising a processor and a memory in which computer instructions are stored; the processor is configured to execute the instructions in the memory, and the type of task processed by each message processor core group can be customized by reprogramming those instructions.
In some embodiments of the invention, the number of packet processor core groups processing a single type of task is one or more.
In some embodiments of the invention, a message processor core within a message processor core group sends a signal onto a bus, which signal passes through the bus to the designated message processor core group.
In another aspect, the present invention provides a task scheduling method based on the multi-core packet network processor architecture of any of the above embodiments, comprising the following steps: the task buffer in a message processor core group temporarily stores tasks to be distributed from the bus; the task distributor in the group periodically queries the task buffer, decomposes the tasks in it, and distributes them in sequence to the message processor cores in the group; and the plurality of parallel message processor cores in the group each process tasks of the same type. A message processor core in the group outputs a next-level task to the bus, which transmits it to the task buffer of the core group that processes that type of task, until the task flow is complete and the message is output. In some embodiments of the invention, the computer instructions in the memory of a von Neumann-based message processor core are modified by programming to customize the type of task processed by each core group, wherein the number of core groups processing a single type of task is one or more.
In some embodiments of the invention, the step of communicating between packet processor cores comprises: the message processor cores in the message processor core group send signals to the bus, and the signals reach the designated message processor core group through the bus.
The multi-core packet network processor architecture and the task scheduling method based on the architecture can combine the high flexibility of the RTC architecture and the high performance of the pipeline architecture, and do not introduce new complex problems and designs.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the specific details set forth above, and that these and other objects that can be achieved with the present invention will be more clearly understood from the detailed description that follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
fig. 1 is a flow chart of message processing in a network processor.
FIG. 2 is a flow chart of a modern RTC architecture message processing of a network processor.
Fig. 3 is a von neumann architecture diagram.
FIG. 4 is a flow diagram of network processor pipeline architecture message processing.
FIG. 5 is a diagram of a network processor pipeline architecture message processing time task distribution.
FIG. 6 is a block diagram of a multi-core packet network processor architecture according to an embodiment of the invention.
FIG. 7 is a block diagram of a bus interconnect architecture for multiple message processor cores according to an embodiment of the invention.
Fig. 8 is a diagram illustrating an internal task scheduling architecture of a packet processor core according to an embodiment of the present invention.
Fig. 9 is a flow chart of firewall traffic processing.
Fig. 10 is a schematic diagram illustrating an architecture adjustment for firewall services according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the structures and/or processing steps closely related to the scheme according to the present invention are shown in the drawings, and other details not so relevant to the present invention are omitted.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
It is also noted that, unless otherwise specified, the term "coupled" is used herein to refer not only to a direct connection but also to an indirect connection through an intermediary.
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. In the drawings, the same reference numerals denote the same or similar components, or the same or similar steps.
The invention aims to overcome the limitations of the RTC and pipeline architectures of existing network processors and achieve high flexibility and high performance at the same time. A network processor with the existing RTC architecture cannot decompose the message processing flow. If the pipeline architecture were introduced directly on top of the RTC architecture, that is, if connections were established between individual message processor cores, the resulting network would be enormous, because existing network processors can have hundreds of cores: a single bus would need to fan out hundreds of data buses, chip placement and routing would be extremely difficult, and timing closure would be tight.
The design idea of the invention is to group the cores: each group contains a fixed number of processor cores, each group processes a single type of task, and interconnection is performed at the group level over a bus. Reducing the number of bus connections greatly lowers the implementation difficulty, and the architecture provides methods for inter-core communication and task scheduling, thereby overcoming the problems and defects of existing network processor architectures.
Therefore, in order to overcome the limitations of existing network processor architectures, the present invention provides a multi-core packet network processor architecture and a task scheduling method. Fig. 6 is a schematic diagram of the multi-core packet network processor architecture according to an embodiment of the present invention. The architecture comprises a plurality of message processor core groups connected to a bus, each of which processes a predetermined type of task and realizes one class of function. Each core group comprises a task buffer, a task distributor, and a plurality of message processor cores connected in parallel: the task buffer temporarily stores tasks to be distributed from the bus; the task distributor periodically queries the task buffer, decomposes the tasks in it, and distributes them in sequence to the cores in the group; and the parallel cores each process tasks of the same type. A message processor core in the group outputs a next-level task to the bus, which transmits it to the task buffer of the core group that processes that type of task, until the task flow is complete and the message is output.
In one embodiment of the invention, the task buffer stores tasks on a first-in-first-out basis. Storing tasks first-in-first-out keeps the order in which messages leave consistent with the order in which they arrive and avoids large numbers of message retransmissions.
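A minimal illustration of the first-in-first-out buffer, using Python's `collections.deque` as the FIFO:

```python
from collections import deque

# Tasks arrive from the bus in message order and are served in the same
# order, so packets leave the group exactly as they entered.
buffer = deque()
for msg_id in range(5):
    buffer.append(msg_id)      # enqueue at the tail (arrival from the bus)

served = [buffer.popleft() for _ in range(len(buffer))]  # dequeue at the head
```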
In an embodiment of the invention, the task distributor's periodic querying of the tasks in the task buffer is realized by a clock management unit of the multi-core packet network processor. The clock management unit comprises a clock tree and is responsible for the clocking and timing of the whole processor.
In an embodiment of the present invention, the number of message processor cores in each core group is a preset value. It should be noted that the number of cores in different core groups may be the same or different, set according to circumstances. Since the message processing flow is roughly as shown in Fig. 1, and each step of the flow places different demands on message processor cores, the number of cores in each core group can be set according to each stage's demand. For example, in one embodiment of the present invention there are 300 message processor cores in total: twenty groups of 5 cores each, ten groups of 10 cores each, and five groups of 20 cores each.
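The sizing example above can be checked with one line of arithmetic (reading it as twenty groups of 5 cores, ten groups of 10 cores, and five groups of 20 cores; this reading is an assumption about the translated text):

```python
# cores-per-group -> number of groups with that size
group_plan = {5: 20, 10: 10, 20: 5}

# 20*5 + 10*10 + 5*20 = 100 + 100 + 100
total_cores = sum(cores * n_groups for cores, n_groups in group_plan.items())
```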
In an embodiment of the present invention, the message processor core is based on the von Neumann architecture and comprises a processor and a memory in which computer instructions are stored; the processor is configured to execute those instructions, and the type of task processed by each message processor core group is customized by reprogramming the instructions in the memory.
In embodiments of the invention, different software code can be downloaded to different core groups, so that the task types processed by each core group, and the functions it realizes, are defined in software, with different hardware processing flows defined by programming according to different business flows. The architecture supports any new network protocol and separates the network protocol from the hardware architecture, giving performance superior to the RTC architecture and flexibility higher than the pipeline architecture. The architecture also reduces program storage space: under the original RTC architecture, each message processor core must handle a complete service flow and therefore needs the complete service-flow code, so as the number of cores grows the space occupied by the code multiplies. By assigning different service functions to different message processor cores, only the program code for a single core's function needs to be stored, rather than the whole service code, which greatly reduces code storage.
It should be noted that a service refers to the task type processed by a message processor core group, a task refers to a message processing task specifically allocated to a message processor core or core group, and a function is the effect achieved after a core group processes a task.
In one embodiment of the invention, the number of packet processor core groups processing a single type of task is one or more.
Fig. 7 is a schematic diagram of the bus interconnection architecture of multiple message processor cores in an embodiment of the present invention, in which message processor cores 0, 1, 2, and 3 are all connected to a bus to implement inter-core communication and task scheduling. In one embodiment of the present invention, a message processor core within a core group sends a signal onto the bus, and the signal reaches the designated core group through the bus. Each message processor core includes an output module: after processing its current task, the core sends the next-level task to the output module, and the output module, which is directly connected to the bus, distributes the next-level task to the corresponding message processor core group.
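The output-module arrangement can be sketched as follows; the `Bus` and `OutputModule` classes and the `"ip_lookup"` task type are hypothetical names for illustration, not from the patent:

```python
class Bus:
    """Routes a task to the buffer of the group that handles its type."""
    def __init__(self):
        self.group_buffers = {}   # task_type -> list acting as a group buffer

    def register(self, task_type):
        self.group_buffers[task_type] = []

    def send(self, task_type, task):
        self.group_buffers[task_type].append(task)

class OutputModule:
    """A core's single attachment point to the bus: the core hands its
    next-level task here, and the module puts it on the bus."""
    def __init__(self, bus):
        self.bus = bus

    def emit(self, next_task_type, task):
        self.bus.send(next_task_type, task)

bus = Bus()
bus.register("ip_lookup")
core0_out = OutputModule(bus)           # output module of some finished core
core0_out.emit("ip_lookup", {"pkt": 1}) # next-level task goes to that group
```

The point of the design is that only the output modules fan out to the bus, so the interconnect scales with the number of groups rather than the number of cores.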
Fig. 8 is a diagram illustrating the internal task scheduling architecture of a message processor core group according to an embodiment of the present invention. In another aspect of the present invention, as shown in Fig. 8, a task scheduling method based on the multi-core packet network processor architecture of any of the foregoing embodiments is provided, comprising the following steps: the task buffer in a message processor core group temporarily stores tasks to be distributed from the bus; the task distributor in the group periodically queries the task buffer, decomposes the tasks in it, and distributes them in sequence to the message processor cores in the group; and the plurality of parallel message processor cores in the group each process tasks of the same type. A message processor core in the group outputs a next-level task to the bus, which transmits it to the task buffer of the core group that processes that type of task, until the task flow is complete and the message is output. It should be noted that the task distributor decomposes tasks and distributes them to the message processor cores in sequence so as to preserve order, that is, to ensure that the order in which messages leave matches the order in which they arrived; otherwise large numbers of retransmitted packets would occur.
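The order-preserving dispatch described above can be sketched as follows. This is an assumed reading of the scheme, not the patent's implementation: tasks are handed to cores strictly in round-robin order, and results are drained back in the same round-robin order, so packets exit as they entered:

```python
def dispatch_round_robin(tasks, n_cores):
    """Distributor: assign task i to core i % n_cores, tagged with its seq."""
    per_core = [[] for _ in range(n_cores)]
    for seq, task in enumerate(tasks):
        per_core[seq % n_cores].append((seq, task))
    return per_core

def drain_in_order(per_core):
    """Collect results core by core in the same round-robin order as
    dispatch, which reproduces the original arrival order."""
    n = len(per_core)
    idx = [0] * n                      # next unread position in each queue
    total = sum(len(q) for q in per_core)
    out, core = [], 0
    while len(out) < total:
        q = per_core[core % n]
        if idx[core % n] < len(q):
            out.append(q[idx[core % n]][1])
            idx[core % n] += 1
        core += 1
    return out

per_core = dispatch_round_robin(["p0", "p1", "p2", "p3", "p4"], 2)
ordered = drain_in_order(per_core)
```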
As shown in Fig. 8, in an embodiment of the present invention the number of message processor cores inside a core group is 4. This number is only an example; it may also be 5, 6, or 8, and the number of cores in different core groups may be the same or different, set according to how time-consuming that type of message processing task is.
In one embodiment of the invention, the computer instructions in the memory of the von Neumann-based message processor cores are modified by programming to customize the type of task processed by each core group, wherein the number of core groups processing a single type of task is one or more. When a task type is particularly time-consuming, several message processor core groups can be programmed to process that same type of task, guaranteeing message processing efficiency.
In an embodiment of the present invention, the step of performing communication between packet processor cores includes: the message processor cores in the message processor core group send signals to the bus, and the signals reach the appointed message processor core group through the bus.
Based on the above method steps, the network processor architecture of the embodiment implements inter-core communication: in principle, any two message processor cores can communicate through this mechanism and schedule tasks between cores based on the communication signals. Inter-core task scheduling includes a core group generating, after finishing its own task, a next-level task to be processed by the next core group; it also includes scheduling between core groups that process the same type of task, in which a task is forwarded, based on communication signals, to another core group handling that task type.
In an embodiment of the present invention, a change of a multi-core packet network processor architecture for a firewall service processing flow can be implemented. Fig. 9 is a flow chart of firewall service processing, which includes the following steps:
(1) A network message enters the network processor from a port; the network processor parses it according to the network protocol to obtain its VLAN ID and network IP address (Internet Protocol address) information, including the source IP address, destination IP address, source port number, and destination port number.
(2) VLAN ID (Virtual Local Area Network Identifier) table lookup and matching is performed; if the VLAN ID matches, the flow proceeds to the next step, and if not, the network packet is discarded.
(3) IP address table lookup and matching is performed: the destination IP address is looked up in the routing table; if it matches, the network message is allowed to pass, and if not, the network packet is discarded.
(4) Message forwarding: if the network message was allowed to pass, it is forwarded; otherwise it is not processed. This ends the message processing flow.
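The four firewall steps above can be sketched as a single decision function. The table contents, field names, and the crude /8 prefix match are all made up for the example:

```python
VLAN_TABLE = {100, 200}                   # permitted VLAN IDs (made up)
ROUTE_TABLE = {"10.0.0.0/8": "port1"}     # destination prefix -> egress port

def ip_in_prefix(ip, prefix):
    """Crude match that only handles /8 prefixes; enough for this sketch."""
    net, bits = prefix.split("/")
    return bits == "8" and ip.split(".")[0] == net.split(".")[0]

def firewall(pkt):
    # (1) parse: fields are assumed already extracted into the dict
    if pkt["vlan_id"] not in VLAN_TABLE:      # (2) VLAN ID table lookup
        return "drop"
    for prefix, port in ROUTE_TABLE.items():  # (3) destination IP lookup
        if ip_in_prefix(pkt["dst_ip"], prefix):
            return f"forward:{port}"          # (4) forward the message
    return "drop"

ok = firewall({"vlan_id": 100, "dst_ip": "10.1.2.3"})
bad = firewall({"vlan_id": 999, "dst_ip": "10.1.2.3"})
```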
Fig. 10 is a schematic diagram of architecture adjustment for firewall services according to an embodiment of the present invention, where the architecture adjustment is implemented by high-level language programming and compiling of a network processor core group, and the specific steps are as follows:
(1) The network message enters at the port, i.e., the message is input to the network processor; message processor core group 0 executes network protocol parsing, obtaining the message's VLAN ID and network IP address (Internet Protocol address) information, including the source IP address, destination IP address, source port number, and destination port number.
(2) The message processor core group 1 carries out VLAN ID table lookup matching, if matching, the network message is allowed to pass, and if not matching, the network packet is discarded.
(3) And the message processor core group 2 and the message processor core group 3 carry out IP address table look-up matching, namely, the destination IP address is looked up and matched in a routing table, if the destination IP address is matched with the routing table, the network message is allowed to pass, and if the destination IP address is not matched with the routing table, the network packet is discarded.
(4) The message processor core group 4 executes message forwarding, if the network message passes through, the message is forwarded, and if the network message does not pass through, the message is not processed, so far, the message processing flow is finished.
Based on this message processing flow, the message processor core groups are combined to complete all steps of the whole flow, realizing the different functions of message parsing, VLAN ID table lookup, IP address table lookup, message forwarding and the like. It should be noted that the present invention can add or remove core groups as appropriate for the characteristics of different service types, improving concurrency and thereby reducing latency.
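The five-stage core-group mapping above can be sketched as a chain of stage functions, one per message processor core group. This is a minimal sketch under stated assumptions: the `struct message` fields, exact-match lookups, and the stage function names are illustrative, not the patented implementation.

```c
#include <stdint.h>

struct message {
    uint16_t vlan_id;
    uint32_t dst_ip;
    int dropped;    /* set by any stage that rejects the message */
    int forwarded;  /* set by the final stage */
};

/* Core group 0: protocol parsing. Here the fields are assumed to be
 * pre-extracted, so the stage is a placeholder. */
static void stage_parse(struct message *m) { (void)m; }

/* Core group 1: VLAN ID table lookup; drop on no match. */
static void stage_vlan(struct message *m, const uint16_t *vlans, int n)
{
    for (int i = 0; i < n; i++)
        if (m->vlan_id == vlans[i]) return;
    m->dropped = 1;
}

/* Core groups 2-3: destination-IP routing lookup (exact match for brevity). */
static void stage_route(struct message *m, const uint32_t *routes, int n)
{
    if (m->dropped) return;
    for (int i = 0; i < n; i++)
        if (m->dst_ip == routes[i]) return;
    m->dropped = 1;
}

/* Core group 4: forward whatever survived the earlier stages. */
static void stage_forward(struct message *m)
{
    if (!m->dropped) m->forwarded = 1;
}
```

Adding or removing a core group, as the paragraph above describes, corresponds to adding or removing a stage function in this chain without touching the other stages.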
The invention combines the RTC (run-to-completion) architecture and the pipeline architecture of the multi-core network processor. By building message processor core groups and connecting them over a bus, it realizes inter-group communication and task scheduling, constructs a novel multi-core packet network processor architecture, and provides a task scheduling method based on this architecture, increasing performance without losing the flexibility characteristic of RTC.
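Within one message processor core group, the scheduling just summarized — a first-in-first-out task buffer polled by a task distributor that hands tasks to the cores in turn — can be sketched as follows. This is a hypothetical sketch: `BUF_CAP`, the struct names, and the per-core counters standing in for real cores are all assumptions.

```c
#define BUF_CAP   16
#define MAX_CORES 8

struct task { int id; };

/* FIFO task buffer: tasks arriving from the bus are stored first-in,
 * first-out until the distributor drains them. */
struct task_buffer {
    struct task q[BUF_CAP];
    int head, count;
};

static int buf_push(struct task_buffer *b, struct task t)
{
    if (b->count == BUF_CAP) return 0;            /* buffer full */
    b->q[(b->head + b->count) % BUF_CAP] = t;
    b->count++;
    return 1;
}

static int buf_pop(struct task_buffer *b, struct task *out)
{
    if (b->count == 0) return 0;                  /* buffer empty */
    *out = b->q[b->head];
    b->head = (b->head + 1) % BUF_CAP;
    b->count--;
    return 1;
}

/* One message processor core group: a buffer, a round-robin cursor,
 * and a per-core task count standing in for the cores themselves. */
struct core_group {
    struct task_buffer buf;
    int n_cores;
    int next_core;
    int assigned[MAX_CORES];
};

/* The task distributor's periodic poll: drain the buffer, handing each
 * task to the next core in turn (round-robin). */
static void dispatch(struct core_group *g)
{
    struct task t;
    while (buf_pop(&g->buf, &t)) {
        g->assigned[g->next_core]++;
        g->next_core = (g->next_core + 1) % g->n_cores;
    }
}
```

In the described architecture this poll would be driven by the clock management unit, and a finished core would push its next-level task back onto the bus toward the next group's buffer.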
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions, or change the order between the steps, after comprehending the spirit of the present invention.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments in the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A multi-core packet network processor architecture, characterized by comprising a plurality of message processor core groups connected to a bus, wherein each message processor core group is used for processing a preset type of task;
each message processor core group comprises a task buffer area, a task distributor and a plurality of message processor cores connected in parallel, wherein the task buffer area is used for temporarily storing tasks to be distributed from the bus, the task distributor periodically queries the task buffer area, decomposes the tasks in the task buffer area and distributes them in turn to the message processor cores in the message processor core group, and the plurality of message processor cores connected in parallel are used for respectively processing tasks of the same type;
and the message processor core in the message processor core group outputs a next-level task to the bus, and the next-level task is transmitted to a task buffer area of the message processor core group for processing the type of the next-level task through the bus until the task flow processing is finished and the message is output.
2. The architecture of claim 1, wherein the task buffer stores tasks on a first-in-first-out basis.
3. The architecture of claim 1, wherein the task dispatcher periodically queries tasks in a task buffer via a clock management unit of the multi-core packet network processor.
4. The architecture of claim 1, wherein the number of packet processor cores within each packet processor core group is a predetermined value.
5. The architecture of claim 1, wherein the message processor cores are based on a von neumann architecture, comprising a processor and a memory, the memory having stored therein computer instructions, the processor being configured to execute the computer instructions stored in the memory to customize the type of task processed by each group of message processor cores by programmatically modifying the instructions in the memory.
6. The architecture of claim 5, wherein the number of groups of packet processor cores processing a single type of task is one or more.
7. The architecture of claim 1, wherein a message processor core within a message processor core group sends a signal onto a bus, the signal passing through the bus to a designated message processor core group.
8. A task scheduling method based on the multi-core packet network processor architecture of any one of claims 1 to 7, comprising the steps of:
the task buffer area in the message processor core group temporarily stores tasks to be distributed from the bus, the task distributor in the message processor core group periodically queries the task buffer area, decomposes the tasks in the task buffer area and distributes them in turn to the message processor cores in the message processor core group, and the plurality of message processor cores connected in parallel in the message processor core group are used for respectively processing tasks of the same type;
and the message processor core in the message processor core group outputs a next-level task to the bus, and the next-level task is transmitted to a task buffer area of the message processor core group for processing the type of the next-level task through the bus until the task flow processing is finished and the message is output.
9. The method of claim 8, wherein the computer instructions in the memory of the message processor cores based on the von Neumann architecture are programmatically modified to enable customization of the type of task processed by each group of message processor cores, wherein the number of groups of message processor cores processing a single type of task is one or more.
10. The method of claim 8, wherein the step of communicating between packet processor cores comprises: the message processor cores in the message processor core group send signals to the bus, and the signals reach the designated message processor core group through the bus.
CN202210569718.7A 2022-05-24 2022-05-24 Multi-core packet network processor architecture and task scheduling method Pending CN115185878A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210569718.7A CN115185878A (en) 2022-05-24 2022-05-24 Multi-core packet network processor architecture and task scheduling method


Publications (1)

Publication Number Publication Date
CN115185878A true CN115185878A (en) 2022-10-14

Family

ID=83513555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210569718.7A Pending CN115185878A (en) 2022-05-24 2022-05-24 Multi-core packet network processor architecture and task scheduling method

Country Status (1)

Country Link
CN (1) CN115185878A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101512482A (en) * 2005-02-08 2009-08-19 思科技术公司 Multi-threaded packeting processing architecture
CN102331923A (en) * 2011-10-13 2012-01-25 西安电子科技大学 Multi-core and multi-threading processor-based functional macropipeline implementing method
CN102549984A (en) * 2009-05-05 2012-07-04 思杰系统有限公司 Systems and methods for packet steering in a multi-core architecture
CN105075204A (en) * 2013-03-12 2015-11-18 高通股份有限公司 Configurable multicore network processor
CN107241305A (en) * 2016-12-28 2017-10-10 神州灵云(北京)科技有限公司 A kind of network protocol analysis system and its analysis method based on polycaryon processor
CN108494705A (en) * 2018-03-13 2018-09-04 山东超越数控电子股份有限公司 A kind of network message high_speed stamping die and method
CN109359736A (en) * 2017-04-06 2019-02-19 上海寒武纪信息科技有限公司 Network processing unit and network operations method
US20200250044A1 (en) * 2019-01-31 2020-08-06 Rubrik, Inc. Distributed streaming parallel database restores
CN112352404A (en) * 2018-07-13 2021-02-09 三星电子株式会社 Apparatus and method for processing data packets of an electronic device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨启军 (Yang Qijun) et al.: "Design and implementation of a multi-core-based intrusion prevention system", Computer Engineering and Design *

Similar Documents

Publication Publication Date Title
CN108809854B (en) Reconfigurable chip architecture for large-flow network processing
US8127112B2 (en) SIMD array operable to process different respective packet protocols simultaneously while executing a single common instruction stream
KR100754578B1 (en) A single chip protocol converter
Yang et al. Task mapping on SMART NoC: Contention matters, not the distance
US7984448B2 (en) Mechanism to support generic collective communication across a variety of programming models
CN108833299B (en) Large-scale network data processing method based on reconfigurable switching chip architecture
US5524250A (en) Central processing unit for processing a plurality of threads using dedicated general purpose registers and masque register for providing access to the registers
US20050232303A1 (en) Efficient packet processing pipeline device and method
Yi et al. Gpunfv: a gpu-accelerated nfv system
CN111984415A (en) Load balancing method and device based on pipeline forwarding model
CN112084027A (en) Network-on-chip data transmission method, device, network-on-chip, equipment and medium
CN115185878A (en) Multi-core packet network processor architecture and task scheduling method
Chen et al. Contention minimized bypassing in SMART NoC
EP4036730A1 (en) Application data flow graph execution using network-on-chip overlay
Saquetti et al. Virtp4: An architecture for p4 virtualization
Heißwolf et al. Hardware-assisted decentralized resource management for networks on chip with qos
CN111884948B (en) Assembly line scheduling method and device
US11425036B1 (en) Pipelined match-action circuitry
Qiao et al. Adaptable switch: a heterogeneous switch architecture for network-centric computing
US7010673B2 (en) Apparatus and method for processing pipelined data
Huang et al. Accelerating NoC-based MPI primitives via communication architecture customization
Schoeberl et al. S4noc: a minimalistic network-on-chip for real-time multicores
Ossen et al. Enabling stateful functions for stream processing in the programmable data plane
Laki et al. The price for asynchronous execution of extern functions in programmable software data planes
Tiwari et al. An efficient 4X4 Mesh structure with a combination of two NoC router architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination