WO2017067215A1 - Method and system for packet scheduling using many-core network processor and micro-engine thereof, and storage medium - Google Patents

Method and system for packet scheduling using many-core network processor and micro-engine thereof, and storage medium Download PDF

Info

Publication number
WO2017067215A1
WO2017067215A1 (PCT/CN2016/088163)
Authority
WO
WIPO (PCT)
Prior art keywords
mapping table
micro
microengine
flow queue
idle
Prior art date
Application number
PCT/CN2016/088163
Other languages
French (fr)
Chinese (zh)
Inventor
袁力 (Yuan Li)
Original Assignee
深圳市中兴微电子技术有限公司 (Shenzhen ZTE Microelectronics Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司 (Shenzhen ZTE Microelectronics Technology Co., Ltd.)
Publication of WO2017067215A1 publication Critical patent/WO2017067215A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00: Routing or path finding of packets in data switching networks
    • H04L 45/02: Topology update or discovery
    • H04L 47/00: Traffic control in data switching networks
    • H04L 47/10: Flow control; Congestion control
    • H04L 47/12: Avoiding congestion; Recovering from congestion
    • H04L 47/125: Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H04L 47/50: Queue scheduling

Definitions

  • the present invention relates to network processor technologies, and in particular to a packet scheduling method, system, and storage medium for a many-core network processor and its micro-engines.
  • the network processor is the core component of the forwarding plane in the data communication field; it is a solution that balances processor speed with flexibility of use, allowing service microcode to be modified flexibly to meet the needs of all kinds of basic and complex network services and making services easy to extend and upgrade.
  • the current remedies mainly include raising the system clock frequency and increasing the number of cores.
  • as for raising the system clock frequency, semiconductor process development has lagged far behind the demand for higher processing capability, so relying solely on new processes to raise the clock frequency can no longer meet that demand.
  • the processing power of high-end network processors has reached more than 500 Gbps, and the frequency of micro-engines is generally in the range of 1 GHz to 2 GHz, and several or dozens of micro-engines cannot achieve the required processing power at all. Therefore, network processors using many-core architectures have become an inevitable choice.
  • the number of microengines in a network processor can be estimated simply by the following formula:
  • Me_num is the number of micro-engines
  • Performance is the processing capability (in bps)
  • Pkt_len is the packet length
  • Instr_num is the number of service microcode instructions
  • Freq is the system clock frequency (a reconstruction of the estimate is sketched below).
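The estimation formula itself appears in the published text only as an image. The following C sketch is a plausible reconstruction from the parameter definitions above, assuming each micro-engine retires one microcode instruction per cycle; the numeric inputs are illustrative, not values from the patent.

```c
#include <math.h>
#include <stdio.h>

/* Reconstructed estimate: packets/s = Performance / (8 * Pkt_len);
 * required instructions/s = packets/s * Instr_num; one micro-engine
 * supplies Freq instructions/s under the one-instruction-per-cycle
 * assumption. */
static unsigned long estimate_me_num(double performance_bps,
                                     double pkt_len_bytes,
                                     double instr_num,
                                     double freq_hz)
{
    double pkts_per_sec = performance_bps / (pkt_len_bytes * 8.0);
    return (unsigned long)ceil(pkts_per_sec * instr_num / freq_hz);
}

int main(void)
{
    /* Illustrative values: 500 Gbps, 64-byte packets,
     * 400 instructions per packet, 1.5 GHz micro-engines. */
    printf("Me_num ~ %lu\n", estimate_me_num(500e9, 64.0, 400.0, 1.5e9));
    return 0;
}
```

With these example inputs the estimate comes out at roughly 260 micro-engines, consistent with the 256-core figure discussed later.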
  • on-chip networks include: ring, mesh, torus, tree, and disk.
  • as the number of micro-engines grows, the number of on-chip network nodes increases sharply, and the bandwidth, delay, and load-imbalance problems caused by the routing algorithm become ever more serious.
  • the mapping between packets and micro-engines is also a complex problem.
  • many mapping algorithms have been proposed, but most improve only one aspect or are too complicated to implement in hardware.
  • in network traffic, packets of the same data flow share upper-layer application locality.
  • the order of packets entering and leaving the network processor needs to be preserved as far as possible to avoid timeout retransmissions at the upper layers of the network.
  • Load balancing is related to the ability to fully utilize processing power and the length of processing delay.
  • micro-engine instruction storage generally uses a cache structure to improve fetch efficiency, and if packets of the same processing flow can be processed by the same core, instruction cache efficiency improves. A good mapping algorithm therefore needs to strike the best balance among order preservation, load balancing, and instruction cache efficiency; this is one of the difficulties and research hotspots of many-core network processors.
  • an embodiment of the present invention provides a packet scheduling method, system, and storage medium for a many-core network processor and its micro-engines.
  • when no micro-engine corresponding to the pointer is found, the pointer is stored back into the corresponding flow queue.
  • each flow queue maintains a dynamic mapping table.
  • the initial values of the entries 0 to N-1 are sequentially from microengine 0 to microengine N-1;
  • N is the total number of microengines.
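A minimal C sketch of one flow queue's dynamic mapping table follows. The type and helper names are illustrative, and since the patent does not fix how the "invalid mapping" flag is encoded, it is modeled here as a boolean on the table head.

```c
#include <stdbool.h>

#define N_ME 256  /* total number of micro-engines (example value) */

typedef struct {
    unsigned short entry[N_ME]; /* entry[0] is the table head */
    bool head_valid;            /* false models the "invalid mapping" flag */
} flow_map_t;

/* Entries 0..N-1 initially hold micro-engine 0..N-1 in order, so
 * micro-engine 0 is preferred first; a fresh flow has no valid mapping. */
static void flow_map_init(flow_map_t *m)
{
    for (unsigned i = 0; i < N_ME; i++)
        m->entry[i] = (unsigned short)i;
    m->head_valid = false;
}

/* Move the entry at pos to the table head; the entries in between
 * shift back one position, as the update rules require. */
static void flow_map_promote(flow_map_t *m, unsigned pos)
{
    unsigned short sel = m->entry[pos];
    for (unsigned i = pos; i > 0; i--)
        m->entry[i] = m->entry[i - 1];
    m->entry[0] = sel;
    m->head_valid = true;
}
```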
  • the method further includes:
  • updating the state of the micro-engines in the dynamic mapping table according to updates of the packet status includes: after the packet completes mapping or finishes processing in a micro-engine, updating the following state:
  • a flag indicating whether the micro-engine is completely idle;
  • a flag indicating whether the micro-engine is occupied by one or two flow queues;
  • the number of idle threads in the micro-engine;
  • when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies;
  • the total packet count of the flow queue across all micro-engines.
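In C terms, the monitored state might be held in structures like the following; the field names and widths are illustrative assumptions, not from the patent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-micro-engine state refreshed by the monitoring logic. */
typedef struct {
    bool    fully_idle;    /* no packets currently in this micro-engine   */
    uint8_t flow_cnt;      /* number of flow queues occupying it (0..2)   */
    uint8_t idle_threads;  /* number of idle threads                      */
    uint8_t threads_of[2]; /* per-flow thread counts when flow_cnt == 2   */
} me_state_t;

/* Per-flow counter: total packets of the flow across all micro-engines;
 * staying at 0 beyond a time threshold ages the mapping out. */
typedef struct {
    uint32_t pkts_in_flight;
    uint32_t idle_time;
} flow_state_t;
```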
  • the method further includes:
  • after the state update, the dynamic mapping table is updated according to the updated state under the following rules:
  • when the micro-engine at the head of the flow queue's mapping table has no packets or only one flow queue, and has an idle thread, the mapping table is not updated;
  • when the micro-engine at the head of the flow queue's mapping table has only one flow queue but no idle threads, a micro-engine is selected starting from the table head, in table order: a completely idle micro-engine is preferred; otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither is satisfied, an invalid flag is written into the mapping table; the selected micro-engine number is then moved to the head of the mapping table, and the remaining entries move back in order;
  • when the micro-engine at the head of the flow queue's mapping table has two flow queues and has idle threads, the mapping table is updated as follows: if one flow queue occupies more threads than the other and a completely idle micro-engine exists, the flow queue occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine nearer the front of the mapping table; if no completely idle micro-engine exists, the mapping relationship remains unchanged;
  • when the micro-engine at the head of the flow queue's mapping table has two flow queues and no idle threads, then for the mapping table of the flow queue occupying fewer threads, the first completely idle micro-engine in the table is preferred, and otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither exists, the mapping table is kept unchanged; the selected micro-engine is moved to the head of the mapping table and the remaining entries move back in order; the mapping table of the other flow queue on that micro-engine is kept unchanged;
  • when the total packet count of the flow queue across all micro-engines is 0 for longer than a certain time threshold, all entries of the flow queue's mapping table are invalidated.
  • a request unit configured to request a free pointer for a packet when the packet is input;
  • a storage unit configured to store the packet at the location in the shared cache to which the pointer points, and to store the pointer into the corresponding flow queue, the flow queues being scheduled by polling;
  • a scheduling unit configured to, when a pointer in a flow queue is scheduled, look up the micro-engine corresponding to the pointer so as to map the packet corresponding to the pointer to that micro-engine;
  • the storage unit being further configured to store the pointer back into the corresponding flow queue when no micro-engine corresponding to the pointer is found.
  • each flow queue maintains a dynamic mapping table.
  • the initial values of the entries 0 to N-1 are sequentially from microengine 0 to microengine N-1;
  • N is the total number of microengines.
  • the system further includes:
  • an update unit configured to update the dynamic mapping table according to updates of the packet status.
  • the update unit is further configured to update the following state after the packet completes mapping or finishes processing in a micro-engine:
  • a flag indicating whether the micro-engine is completely idle;
  • a flag indicating whether the micro-engine is occupied by one or two flow queues;
  • the number of idle threads in the micro-engine;
  • when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies;
  • the total packet count of the flow queue across all micro-engines.
  • the update unit is further configured to, after the state update, update the dynamic mapping table according to the updated state under the following rules:
  • when the micro-engine at the head of the flow queue's mapping table has no packets or only one flow queue, and has an idle thread, the mapping table is not updated;
  • when the micro-engine at the head of the flow queue's mapping table has only one flow queue but no idle threads, a micro-engine is selected starting from the table head, in table order: a completely idle micro-engine is preferred; otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither is satisfied, an invalid flag is written into the mapping table; the selected micro-engine number is then moved to the head of the mapping table, and the remaining entries move back in order;
  • when the micro-engine at the head of the flow queue's mapping table has two flow queues and has idle threads, the mapping table is updated as follows: if one flow queue occupies more threads than the other and a completely idle micro-engine exists, the flow queue occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine nearer the front of the mapping table; if no completely idle micro-engine exists, the mapping relationship remains unchanged;
  • when the micro-engine at the head of the flow queue's mapping table has two flow queues and no idle threads, then for the mapping table of the flow queue occupying fewer threads, the first completely idle micro-engine in the table is preferred, and otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither exists, the mapping table is kept unchanged; the selected micro-engine is moved to the head of the mapping table and the remaining entries move back in order; the mapping table of the other flow queue on that micro-engine is kept unchanged;
  • when the total packet count of the flow queue across all micro-engines is 0 for longer than a certain time threshold, all entries of the flow queue's mapping table are invalidated.
  • the many-core network processor provided by an embodiment of the present invention is composed of multiple micro-engines, where:
  • a plurality of the micro-engines form a group and a plurality of groups form a cluster, with a fully parallel structure within the groups and the clusters and a two-dimensional mesh structure between clusters;
  • in the middle of the cluster grid, a plurality of routing modules are disposed, themselves arranged in a two-dimensional mesh;
  • the routing module is configured to input and output packets to the four clusters above, below, left, and right of it, to access off-chip storage, and to access the co-processing module;
  • the routing module is configured to route packets, off-chip storage requests and returns, and co-processing-module requests and returns, completing the transfer of each packet, request, or return to its specified destination cluster or to the scheduling module.
  • the storage medium provided by an embodiment of the present invention stores a computer program configured to execute the packet scheduling method based on micro-engines in a many-core network processor.
  • the many-core network processor is composed of a plurality of micro-engines, where a plurality of micro-engines form a group, a plurality of groups form a cluster, a fully parallel structure is used within the groups and the clusters, a two-dimensional mesh structure is used between clusters, and a plurality of routing modules, themselves in a two-dimensional mesh, are disposed in the middle of the cluster grid.
  • FIG. 1 is a schematic flowchart of a packet scheduling method based on a microengine in a many-core network processor according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of a packet scheduling system based on a microengine in a many-core network processor according to an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of a 256-core network processor microengine of a conventional structure
  • FIG. 4 is a schematic diagram of a multi-level structure of a 256-core processor microengine
  • FIG. 5 is a schematic diagram of the micro-engine hierarchy inside a cluster;
  • FIG. 6 is a schematic diagram of a message mapping process according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of an implementation example of a serial mode according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of an implementation example of the serial-parallel hybrid mode according to an embodiment of the present invention.
  • as shown in FIG. 1, the packet scheduling method based on micro-engines in a many-core network processor includes the following steps:
  • Step 101: when a packet is input, request a free pointer for the packet.
  • the threads and states of all the micro engines are monitored in real time by a monitoring module to implement global scheduling based on thread granularity.
  • the virtual output queue (VOQ) mechanism is used to prevent head-of-line blocking of packet mapping between different flows.
  • after each packet is input, a free pointer is requested first, the packet is stored at the location the pointer points to in the shared cache, and the pointer is stored in the queue of the corresponding flow.
  • the flow queues are output to the mapping module by polling scheduling (fair, weighted, priority-based, or other polling policies may be used).
  • packets that fail to map are returned to the corresponding flow queue and are mapped again on a later pass (a priority or other scheme may be used).
  • Each of the flow queues maintains a dynamic mapping table.
  • the initial values of the entries 0 to N-1 are sequentially from microengine 0 to microengine N-1;
  • N is the total number of micro-engines; that is, micro-engine 0 is preferred at the start.
  • Step 102: store the packet at the location in the shared cache to which the pointer points, and store the pointer into the corresponding flow queue, where the flow queues are scheduled by polling.
  • Step 103: when a pointer in a flow queue is scheduled, look up the micro-engine corresponding to the pointer so as to map the packet corresponding to the pointer to that micro-engine.
  • Step 104: when no micro-engine corresponding to the pointer is found, store the pointer back into the corresponding flow queue (the four steps are sketched in code below).
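A compact C sketch of steps 101 to 104 follows; the queue, cache, and mapping hooks are assumed platform primitives rather than APIs named by the patent.

```c
#include <stdbool.h>
#include <stdint.h>

#define N_FLOWS 1024  /* number of flow queues (example value) */

/* Assumed platform hooks. */
extern uint32_t free_ptr_pop(void);                 /* free-pointer FIFO */
extern void shared_cache_write(uint32_t ptr, const void *pkt);
extern void flowq_push(uint32_t flow, uint32_t ptr);
extern bool flowq_pop(uint32_t flow, uint32_t *ptr);
extern bool map_to_me(uint32_t flow, uint32_t ptr); /* false: no ME found */

/* Steps 101-102: take a free pointer, store the packet at the cache
 * location the pointer names, and queue the pointer on its flow. */
void on_packet_input(uint32_t flow, const void *pkt)
{
    uint32_t ptr = free_ptr_pop();
    shared_cache_write(ptr, pkt);
    flowq_push(flow, ptr);
}

/* Steps 103-104: flow queues are polled round-robin; a pointer whose
 * packet cannot be mapped goes back to its flow queue for a later poll. */
void poll_flow_queues(void)
{
    for (uint32_t flow = 0; flow < N_FLOWS; flow++) {
        uint32_t ptr;
        if (flowq_pop(flow, &ptr) && !map_to_me(flow, ptr))
            flowq_push(flow, ptr);  /* step 104: re-enqueue and retry */
    }
}
```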
  • the method further includes: updating the dynamic mapping table according to updates of the packet status; specifically, after the packet completes mapping or finishes processing in a micro-engine, the following state is updated:
  • a flag indicating whether the micro-engine is completely idle;
  • a flag indicating whether the micro-engine is occupied by one or two flow queues;
  • the number of idle threads in the micro-engine;
  • when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies;
  • the total packet count of the flow queue across all micro-engines.
  • after the state update, the dynamic mapping table is updated according to the updated state under the following rules:
  • when the micro-engine at the head of the flow queue's mapping table has no packets or only one flow queue, and has an idle thread, the mapping table is not updated;
  • when the micro-engine at the head of the flow queue's mapping table has only one flow queue but no idle threads, a micro-engine is selected starting from the table head, in table order: a completely idle micro-engine is preferred; otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither is satisfied, an invalid flag is written into the mapping table; the selected micro-engine number is then moved to the head of the mapping table, and the remaining entries move back in order;
  • when the micro-engine at the head of the flow queue's mapping table has two flow queues and has idle threads, the mapping table is updated as follows: if one flow queue occupies more threads than the other and a completely idle micro-engine exists, the flow queue occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine nearer the front of the mapping table; if no completely idle micro-engine exists, the mapping relationship remains unchanged;
  • when the micro-engine at the head of the flow queue's mapping table has two flow queues and no idle threads, then for the mapping table of the flow queue occupying fewer threads, the first completely idle micro-engine in the table is preferred, and otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither exists, the mapping table is kept unchanged; the selected micro-engine is moved to the head of the mapping table and the remaining entries move back in order; the mapping table of the other flow queue on that micro-engine is kept unchanged;
  • when the total packet count of the flow queue across all micro-engines is 0 for longer than a certain time threshold, all entries of the flow queue's mapping table are invalidated.
  • when the head micro-engine has no packets or only one flow and has an idle thread, the mapping table does not need to be updated;
  • when the invalid flag is written into the mapping table, packets of that flow temporarily cannot be mapped to a micro-engine and remain buffered in the queue;
  • when no completely idle micro-engine exists, the mapping relationship does not change;
  • in the case of two flows with no idle threads, the mapping table of the flow occupying fewer threads prefers the first completely idle micro-engine in the table, then a micro-engine holding only one flow with the most idle threads; otherwise the table does not change; the selected micro-engine is moved to the head of the mapping table and the remaining entries move back;
  • the other flow in that micro-engine keeps its mapping relationship unchanged.
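Putting these rules together, a sketch of the per-flow update routine in C might look as follows. It reuses the flow_map_t and me_state_t sketches above; my_slot (this flow's index among the at most two flows on the head micro-engine) and the exact tie-breaking are assumptions where the patent text leaves details open.

```c
/* Index (in table order) of the first completely idle micro-engine. */
static int find_fully_idle(const flow_map_t *m, const me_state_t me[])
{
    for (unsigned i = 0; i < N_ME; i++)
        if (me[m->entry[i]].fully_idle)
            return (int)i;
    return -1;
}

/* Index of a micro-engine holding exactly one flow with the most idle
 * threads; -1 if none has an idle thread. */
static int find_single_flow_most_idle(const flow_map_t *m,
                                      const me_state_t me[])
{
    int pick = -1, best = 0;
    for (unsigned i = 0; i < N_ME; i++) {
        const me_state_t *s = &me[m->entry[i]];
        if (s->flow_cnt == 1 && (int)s->idle_threads > best) {
            best = s->idle_threads;
            pick = (int)i;
        }
    }
    return pick;
}

void flow_map_update(flow_map_t *m, const me_state_t me[], int my_slot)
{
    const me_state_t *h = &me[m->entry[0]];
    int other = 1 - my_slot;

    /* (1) head holds no packets or only this flow and has a free thread. */
    if (h->flow_cnt <= 1 && h->idle_threads > 0)
        return;

    if (h->flow_cnt <= 1) {            /* (2) one flow, no free threads */
        int pick = find_fully_idle(m, me);
        if (pick < 0)
            pick = find_single_flow_most_idle(m, me);
        if (pick < 0)
            m->head_valid = false;     /* "invalid flag": buffer in queue */
        else
            flow_map_promote(m, (unsigned)pick);
    } else if (h->idle_threads > 0) {  /* (3) two flows, free threads */
        int pick;
        if (h->threads_of[my_slot] < h->threads_of[other] &&
            (pick = find_fully_idle(m, me)) >= 0)
            flow_map_promote(m, (unsigned)pick); /* front-most idle ME */
        /* otherwise the mapping relationship stays unchanged */
    } else {                           /* (4) two flows, no free threads */
        if (h->threads_of[my_slot] < h->threads_of[other]) {
            int pick = find_fully_idle(m, me);
            if (pick < 0)
                pick = find_single_flow_most_idle(m, me);
            if (pick >= 0)
                flow_map_promote(m, (unsigned)pick);
        }
        /* the other flow's table is left unchanged */
    }
}
```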
  • as shown in FIG. 2, the packet scheduling system based on micro-engines in a many-core network processor includes:
  • a request unit 21 configured to request a free pointer for a packet when the packet is input;
  • a storage unit 22 configured to store the packet at the location in the shared cache to which the pointer points, and to store the pointer into the corresponding flow queue, the flow queues being scheduled by polling;
  • a scheduling unit 23 configured to, when a pointer in a flow queue is scheduled, look up the micro-engine corresponding to the pointer so as to map the packet corresponding to the pointer to that micro-engine;
  • the storage unit 22 being further configured to store the pointer back into the corresponding flow queue when no micro-engine corresponding to the pointer is found.
  • Each flow queue maintains a dynamic mapping table.
  • the initial values of entry 0 to entry N-1 are in order from microengine 0 to microengine N-1;
  • N is the total number of microengines.
  • the system further includes:
  • an update unit 24 configured to update the dynamic mapping table according to updates of the packet status.
  • the update unit 24 is further configured to update the following state after the packet completes mapping or finishes processing in a micro-engine:
  • a flag indicating whether the micro-engine is completely idle;
  • a flag indicating whether the micro-engine is occupied by one or two flow queues;
  • the number of idle threads in the micro-engine;
  • when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies;
  • the total packet count of the flow queue across all micro-engines.
  • the update unit 24 is further configured to, after the state update, update the dynamic mapping table according to the updated state under the following rules:
  • when the micro-engine at the head of the flow queue's mapping table has no packets or only one flow queue, and has an idle thread, the mapping table is not updated;
  • when the micro-engine at the head of the flow queue's mapping table has only one flow queue but no idle threads, a micro-engine is selected starting from the table head, in table order: a completely idle micro-engine is preferred; otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither is satisfied, an invalid flag is written into the mapping table; the selected micro-engine number is then moved to the head of the mapping table, and the remaining entries move back in order;
  • when the micro-engine at the head of the flow queue's mapping table has two flow queues and has idle threads, the mapping table is updated as follows: if one flow queue occupies more threads than the other and a completely idle micro-engine exists, the flow queue occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine nearer the front of the mapping table; if no completely idle micro-engine exists, the mapping relationship remains unchanged;
  • when the micro-engine at the head of the flow queue's mapping table has two flow queues and no idle threads, then for the mapping table of the flow queue occupying fewer threads, the first completely idle micro-engine in the table is preferred, and otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither exists, the mapping table is kept unchanged; the selected micro-engine is moved to the head of the mapping table and the remaining entries move back in order; the mapping table of the other flow queue on that micro-engine is kept unchanged;
  • when the total packet count of the flow queue across all micro-engines is 0 for longer than a certain time threshold, all entries of the flow queue's mapping table are invalidated.
  • the many-core network processor provided by an embodiment of the present invention is composed of multiple micro-engines, where:
  • a plurality of the micro-engines form a group and a plurality of groups form a cluster, with a fully parallel structure within the groups and the clusters and a two-dimensional mesh structure between clusters;
  • in the middle of the cluster grid, a plurality of routing modules are disposed, themselves arranged in a two-dimensional mesh;
  • the routing module is configured to input and output packets to the four clusters above, below, left, and right of it, to access off-chip storage, and to access the co-processing module;
  • the routing module is configured to route packets, off-chip storage requests and returns, and co-processing-module requests and returns, completing the transfer of each packet, request, or return to its specified destination cluster or to the scheduling module.
  • the interconnection structure of the microengines in the many-core network processor is first introduced.
  • the number of micro-engines for network processors has grown to hundreds. Due to the large number and numerous routing nodes, the existing multi-core processor core interconnect structure and routing algorithms are no longer sufficient.
  • at present, the most commonly used micro-engine connection structure is the mesh, connected as a two-dimensional matrix: each micro-engine connects to a routing module through an interconnect interface so that packets or other data can traverse the micro-engine network to reach the destination module (e.g. off-chip storage, a coprocessor). Routing in this structure easily causes problems such as local congestion, load imbalance, and large delay.
  • FIG. 3 shows the micro-engine interconnect structure of a 256-core network processor with the conventional structure. With 256 routing nodes, the complexity of the routing algorithm, and the difficulty of achieving reliability and bandwidth, are evidently very large.
  • the embodiment of the present invention adopts a multi-level structure scheme, and the organizational structure has been described above, that is, the "cluster-group-me” hierarchical structure is adopted.
  • the number of microengines is me_num
  • the number of clusters is cluster_num
  • the number of groups in each cluster is group_num
  • the number of microengines in each group is group_me_num
  • Me_num = cluster_num × group_num × group_me_num
  • the 256-core network processor is taken as an example.
  • the scope of protection of the embodiments of the present invention is not limited to this example.
  • in this example, four micro-engines form a group, four groups form a cluster, and the 16 clusters adopt a two-dimensional mesh structure, as shown in FIG. 4.
  • the routing module is in the middle of the grid, and also adopts a two-dimensional mesh structure.
  • each routing module is responsible for inputting and outputting packets to the four clusters around it, accessing off-chip storage and the co-processing modules, etc.; the interconnection among the routing modules completes the routing of packets, off-chip storage requests and returns, and co-processing-module requests and returns, delivering each packet, request, or return to its specified destination cluster or to the scheduling module.
  • this structure has only four routing nodes; compared with the 256 nodes of the conventional structure, the count is greatly reduced, and the benefits are a simple routing algorithm, easy hardware implementation, and good load balancing.
  • inside a cluster, the micro-engines are organized with a group layer, with four MEs per group, as shown in FIG. 5.
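As a small illustration of the resulting "cluster-group-me" addressing for the 256-core example (16 clusters of 4 groups of 4 MEs), the following C sketch decomposes a global micro-engine id; the field names are assumptions.

```c
enum { GROUP_ME_NUM = 4, GROUP_NUM = 4, CLUSTER_NUM = 16 };
enum { ME_NUM = CLUSTER_NUM * GROUP_NUM * GROUP_ME_NUM }; /* = 256 */

typedef struct { unsigned cluster, group, me; } me_addr_t;

/* Decompose a global micro-engine id (0..255) per
 * Me_num = cluster_num * group_num * group_me_num. */
static me_addr_t me_addr(unsigned global_id)
{
    me_addr_t a;
    a.me      = global_id % GROUP_ME_NUM;
    a.group   = (global_id / GROUP_ME_NUM) % GROUP_NUM;
    a.cluster = global_id / (GROUP_ME_NUM * GROUP_NUM);
    return a;
}
```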
  • the packet mapping process of the present invention is shown in FIG. 6.
  • the packet requests a pointer from the free-pointer FIFO, is stored at the corresponding address in the cache, and the pointer is stored in the corresponding flow queue.
  • the flow queues use a polling arbitration mechanism and are dispatched to the mapping module, which maps the packet to a micro-engine.
  • the mapping module looks up the corresponding flow mapping table according to the flow number flow_num in the packet to obtain the current mapping result. If the lookup result is "invalid mapping", the current flow is a new flow or one whose mapping has been aged out, and it is mapped according to the following principles:
  • first, select the first completely idle micro-engine in the order of the micro-engines in the mapping table.
  • the purpose is that this micro-engine is the most recently used among the completely idle micro-engines, so valid instructions may remain in its instruction cache, which improves instruction cache efficiency.
  • if there is no completely idle micro-engine, choose a micro-engine that holds only one flow and has the most idle threads. By guaranteeing at most two flows per core, there is almost no loss of instruction cache efficiency, while the loss of micro-engine processing capability is greatly reduced.
  • if mapping fails, the packet is re-queued and waits to be remapped at the next poll.
  • a re-queued packet is mapped preferentially the next time its queue is polled.
  • if the lookup finds a valid mapping relationship, the micro-engine at the head of the mapping table is used as the processing micro-engine.
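Continuing the earlier sketches, the mapping module's per-packet decision could then look like this; MAP_RETRY corresponds to returning the packet to its flow queue for the next poll, and the names remain illustrative.

```c
typedef enum { MAP_OK, MAP_RETRY } map_result_t;

/* Consult the flow's table (selected upstream by flow_num); on an invalid
 * mapping, pick a micro-engine by the two preferences above, otherwise
 * ask for a retry; on a valid head, that micro-engine takes the packet. */
map_result_t map_packet(flow_map_t *m, const me_state_t me[],
                        unsigned short *chosen_me)
{
    if (!m->head_valid) {              /* new flow, or aged-out mapping */
        int pick = find_fully_idle(m, me);
        if (pick < 0)
            pick = find_single_flow_most_idle(m, me);
        if (pick < 0)
            return MAP_RETRY;          /* back to the flow queue */
        flow_map_promote(m, (unsigned)pick);
    }
    *chosen_me = m->entry[0];
    return MAP_OK;
}
```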
  • the dynamic update mechanism of the mapping table has been introduced in the foregoing summary.
  • based on the monitored micro-engine state, the mapping table automatically places the latest mapping result at the header, with the older mappings moving backwards in turn.
  • the update algorithm of the mapping table has been described in detail in the foregoing summary and is not repeated in the implementation examples.
  • through the mapping step and the dynamic update mechanism of the mapping table, packets of the same flow are sent to the most recently used micro-engine as far as possible, which improves instruction cache efficiency and greatly reduces packet reordering.
  • preferring a completely idle micro-engine, then the micro-engine with the most idle threads, and guaranteeing at most two flows per core also balances the load.
  • the serial micro-engine architecture is a mainstream architecture whose advantages are guaranteed performance and freedom from packet reordering.
  • All packets are sent from the first cluster.
  • the processing cluster number of the next level is specified by the microcode.
  • the routing module passes the packet to the next cluster according to the cluster number specified by the microcode, and packet processing is complete after the packet has passed through all the clusters, as shown in FIG. 7.
  • the invention also supports a serial-parallel hybrid mode.
  • a specific implementation example is shown in FIG. 8.
  • the 16 clusters are divided into four sets of four clusters each; within each set, packets are processed serially through the four clusters.
  • the four sets of clusters operate in parallel: after a packet arrives, a 1-to-4 scheduling step sends it to one of the four sets; the packet then passes through the four clusters of the selected set in series; finally, a 4-to-1 convergence step outputs the packet.
  • an embodiment of the invention further describes a storage medium storing a computer program configured to execute the packet scheduling method based on micro-engines in a many-core network processor of the foregoing embodiments.
  • the disclosed method and device may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a division by logical function; in actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
  • the units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units, and some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • the functional units in the embodiments of the present invention may all be integrated into one processing unit, or each unit may serve as a unit on its own, or two or more units may be integrated into one unit.
  • the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • when a packet is input, a free pointer is requested for the packet; the packet is stored at the location in the shared cache to which the pointer points, and the pointer is stored into the corresponding flow queue, the flow queues being scheduled by polling;
  • when a pointer in a flow queue is scheduled, the micro-engine corresponding to the pointer is looked up so as to map the packet corresponding to the pointer to that micro-engine; when no micro-engine corresponding to the pointer is found, the pointer is stored back into the corresponding flow queue.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention discloses a method and system for packet scheduling using a many-core network processor and micro-engines thereof. The method comprises: upon input of a packet, requesting an idle pointer for the packet; storing the packet at the location in a shared buffer to which the pointer points, and storing the pointer into a corresponding stream queue, wherein scheduling of the stream queues is performed by polling; when the pointer in a stream queue is scheduled, searching for a micro-engine corresponding to the pointer to map the packet corresponding to the pointer to the micro-engine; and if a micro-engine corresponding to the pointer is not found, re-storing the pointer into the corresponding stream queue.

Description

Packet scheduling method, system, and storage medium for a many-core network processor and its micro-engines

Technical Field

The present invention relates to network processor technologies, and in particular to a packet scheduling method, system, and storage medium for a many-core network processor and its micro-engines.

Background

The network processor is the core component of the forwarding plane in the data communication field. It is a solution that balances processor speed with flexibility of use: service microcode can be modified flexibly to meet the needs of all kinds of basic and complex network services, making services easy to extend and upgrade.

Driven by the rapid development of network services, ever higher processing capability is demanded of network processors, and a single micro-engine or a small number of micro-engines is far from sufficient. The main current remedies are raising the system clock frequency and increasing the number of cores. As for raising the clock frequency, semiconductor process development has lagged far behind the demand for higher processing capability, so relying solely on new processes to raise the system clock can no longer meet that demand. The processing capability of high-end network processors has already reached more than 500 Gbps, while micro-engine clock frequencies generally lie in the 1 GHz to 2 GHz range, so a few or even a few dozen micro-engines simply cannot reach the required capability. Network processors with a many-core architecture have therefore become an inevitable choice. The number of micro-engines in a network processor can be estimated simply with the following formula:
[Formula for estimating Me_num, provided as image PCTCN2016088163-appb-000001 in the original.]
where Me_num is the number of micro-engines, Performance is the processing capability (in bps), Pkt_len is the packet length, Instr_num is the number of service microcode instructions, and Freq is the system clock frequency.

Analyzing network-processor performance requirements with the above formula, the number of micro-engines in future commercial network processors can be estimated at 256 or more, which is necessarily a many-core processor. This huge number of micro-engines brings a series of problems.

First, with so many micro-engines, the organization among them, i.e. the on-chip network and its routing, becomes one of the keys to performance. Common on-chip networks include ring, mesh, torus, tree, and disk topologies; as the number of micro-engines grows, the number of on-chip network nodes increases sharply, and the bandwidth, delay, and load-imbalance problems caused by the routing algorithm become ever more serious.

Second, the mapping of packets onto micro-engines is also a complex problem. Many mapping algorithms have been proposed, but most improve only one aspect or are too complicated to implement in hardware. In network traffic, packets of the same data flow share upper-layer application locality, and the order of packets entering and leaving the network processor must be preserved as far as possible to avoid timeout retransmissions at the upper layers. Load balancing determines whether processing capability can be fully used and how long processing latency is. In addition, micro-engine instruction storage generally uses a cache structure to improve fetch efficiency, and if packets of the same processing flow can be handled by the same core, instruction cache efficiency improves. A good mapping algorithm therefore needs to strike the best balance among order preservation, load balancing, and instruction cache efficiency; this is one of the difficulties and research hotspots of many-core network processors.
Summary of the Invention

To solve the above technical problems, embodiments of the present invention provide a packet scheduling method, system, and storage medium for a many-core network processor and its micro-engines.

The packet scheduling method based on micro-engines in a many-core network processor provided by an embodiment of the present invention includes:

when a packet is input, requesting a free pointer for the packet;

storing the packet at the location in the shared cache to which the pointer points, and storing the pointer into the corresponding flow queue, the flow queues being scheduled among themselves by polling;

when a pointer in a flow queue is scheduled, looking up the micro-engine corresponding to the pointer so as to map the packet corresponding to the pointer to that micro-engine; and

when no micro-engine corresponding to the pointer is found, storing the pointer back into the corresponding flow queue.
In an embodiment of the present invention, each flow queue maintains a dynamic mapping table; in the dynamic mapping table of each flow queue, the initial values of entry 0 through entry N-1 are micro-engine 0 through micro-engine N-1 in order, where N is the total number of micro-engines.

In an embodiment of the present invention, the method further includes: updating the dynamic mapping table according to updates of the packet status.

In an embodiment of the present invention, updating the state of the micro-engines in the dynamic mapping table according to updates of the packet status includes: after the packet completes mapping or finishes processing in a micro-engine, updating the following state:

a flag indicating whether the micro-engine is completely idle;

a flag indicating whether the micro-engine is occupied by one or two flow queues;

the number of idle threads in the micro-engine;

when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies; and

the total packet count of the flow queue across all micro-engines.
In an embodiment of the present invention, the method further includes: after the state update, updating the dynamic mapping table according to the updated state under the following rules:

when the micro-engine at the head of the flow queue's mapping table has no packets or only one flow queue, and has an idle thread, the mapping table is not updated;

when the micro-engine at the head of the flow queue's mapping table has only one flow queue but no idle threads, a micro-engine is selected starting from the table head, in table order, under the following rules:

a completely idle micro-engine is preferred; otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither is satisfied, an invalid flag is written into the mapping table; according to these rules, the selected micro-engine number is moved to the head of the mapping table, and the remaining entries in the table move back in order;

when the micro-engine at the head of the flow queue's mapping table has two flow queues and has idle threads, the mapping table is updated under the following rules:

when one flow queue in the micro-engine occupies more threads than the other, and a completely idle micro-engine exists, the flow queue occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine nearer the front of the mapping table;

when one flow queue in the micro-engine occupies more threads than the other, and no completely idle micro-engine exists, the mapping relationship remains unchanged;

when the micro-engine at the head of the flow queue's mapping table has two flow queues and no idle threads, the mapping table is updated under the following rules:

for the mapping table of the flow queue occupying fewer threads, the first completely idle micro-engine in the mapping table is preferred, and otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither exists, the mapping table is kept unchanged; the selected micro-engine is moved to the head of the mapping table and the remaining entries move back in order; for the other flow queue on that micro-engine, the mapping table is kept unchanged;

when the total packet count of the flow queue across all micro-engines is 0 for longer than a certain time threshold, all entries of the flow queue's mapping table are invalidated.
The packet scheduling system based on micro-engines in a many-core network processor provided by an embodiment of the present invention includes:

a request unit configured to request a free pointer for a packet when the packet is input;

a storage unit configured to store the packet at the location in the shared cache to which the pointer points, and to store the pointer into the corresponding flow queue, the flow queues being scheduled among themselves by polling;

a scheduling unit configured to, when a pointer in a flow queue is scheduled, look up the micro-engine corresponding to the pointer so as to map the packet corresponding to the pointer to that micro-engine; and

the storage unit being further configured to store the pointer back into the corresponding flow queue when no micro-engine corresponding to the pointer is found.
In an embodiment of the present invention, each flow queue maintains a dynamic mapping table; in the dynamic mapping table of each flow queue, the initial values of entry 0 through entry N-1 are micro-engine 0 through micro-engine N-1 in order, where N is the total number of micro-engines.

In an embodiment of the present invention, the system further includes: an update unit configured to update the dynamic mapping table according to updates of the packet status.

In an embodiment of the present invention, the update unit is further configured to update the following state after the packet completes mapping or finishes processing in a micro-engine:

a flag indicating whether the micro-engine is completely idle;

a flag indicating whether the micro-engine is occupied by one or two flow queues;

the number of idle threads in the micro-engine;

when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies; and

the total packet count of the flow queue across all micro-engines.
In an embodiment of the present invention, the update unit is further configured to, after the state update, update the dynamic mapping table according to the updated state under the following rules:

when the micro-engine at the head of the flow queue's mapping table has no packets or only one flow queue, and has an idle thread, the mapping table is not updated;

when the micro-engine at the head of the flow queue's mapping table has only one flow queue but no idle threads, a micro-engine is selected starting from the table head, in table order, under the following rules:

a completely idle micro-engine is preferred; otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither is satisfied, an invalid flag is written into the mapping table; according to these rules, the selected micro-engine number is moved to the head of the mapping table, and the remaining entries in the table move back in order;

when the micro-engine at the head of the flow queue's mapping table has two flow queues and has idle threads, the mapping table is updated under the following rules:

when one flow queue in the micro-engine occupies more threads than the other, and a completely idle micro-engine exists, the flow queue occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine nearer the front of the mapping table;

when one flow queue in the micro-engine occupies more threads than the other, and no completely idle micro-engine exists, the mapping relationship remains unchanged;

when the micro-engine at the head of the flow queue's mapping table has two flow queues and no idle threads, the mapping table is updated under the following rules:

for the mapping table of the flow queue occupying fewer threads, the first completely idle micro-engine in the mapping table is preferred, and otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither exists, the mapping table is kept unchanged; the selected micro-engine is moved to the head of the mapping table and the remaining entries move back in order; for the other flow queue on that micro-engine, the mapping table is kept unchanged;

when the total packet count of the flow queue across all micro-engines is 0 for longer than a certain time threshold, all entries of the flow queue's mapping table are invalidated.
The many-core network processor provided by an embodiment of the present invention is composed of a plurality of micro-engines, where:

a plurality of the micro-engines form a group and a plurality of groups form a cluster, with a fully parallel structure within the groups and the clusters and a two-dimensional mesh structure between clusters; in the middle of the cluster grid, a plurality of routing modules are disposed, themselves arranged in a two-dimensional mesh;

the routing module is configured to input and output packets to the four clusters above, below, left, and right of it, to access off-chip storage, and to access the co-processing module; and

the routing module is configured to route packets, off-chip storage requests and returns, and co-processing-module requests and returns, completing the transfer of each packet, request, or return to its specified destination cluster or to the scheduling module.

The storage medium provided by an embodiment of the present invention stores a computer program configured to execute the above packet scheduling method based on micro-engines in a many-core network processor.

In the technical solution of the embodiments of the present invention, the many-core network processor is composed of a plurality of micro-engines, where a plurality of micro-engines form a group, a plurality of groups form a cluster, a fully parallel structure is used within the groups and the clusters, a two-dimensional mesh structure is used between clusters, and a plurality of routing modules, themselves in a two-dimensional mesh, are disposed in the middle of the cluster grid. When a packet is input, a free pointer is requested for the packet; the packet is stored at the location in the shared cache to which the pointer points, and the pointer is stored into the corresponding flow queue, the flow queues being scheduled by polling; when a pointer in a flow queue is scheduled, the micro-engine corresponding to the pointer is looked up so as to map the corresponding packet to it; and when no corresponding micro-engine is found, the pointer is stored back into the corresponding flow queue. This reduces packet reordering within a flow, improves micro-engine instruction cache efficiency, and achieves a balance among load balancing, packet order preservation, and instruction cache efficiency, meeting the needs of high-performance forwarding.
Brief Description of the Drawings

FIG. 1 is a schematic flowchart of a packet scheduling method based on micro-engines in a many-core network processor according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a packet scheduling system based on micro-engines in a many-core network processor according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of the micro-engines of a 256-core network processor with a conventional structure;

FIG. 4 is a schematic structural diagram of the micro-engines of a 256-core processor with a multi-level structure;

FIG. 5 is a schematic diagram of the micro-engine hierarchy inside a cluster;

FIG. 6 is a schematic diagram of the packet mapping process according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of an implementation example of the serial mode according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of an implementation example of the serial-parallel hybrid mode according to an embodiment of the present invention.
Detailed Description

To make the features and technical content of the embodiments of the present invention easier to understand in detail, their implementation is described below with reference to the accompanying drawings, which are provided for reference and illustration only and are not intended to limit the embodiments of the present invention.

FIG. 1 is a schematic flowchart of the packet scheduling method based on micro-engines in a many-core network processor according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:

Step 101: when a packet is input, request a free pointer for the packet.

In the embodiment of the present invention, the threads and states of all micro-engines are monitored in real time by a monitoring module to implement global scheduling at thread granularity.

Virtual output queues (VOQ) are used to prevent head-of-line blocking of packet mapping between different flows. After each packet is input, a free pointer is requested first, the packet is stored at the location the pointer points to in the shared cache, and the pointer is stored in the queue of the corresponding flow. The flow queues are output to the mapping module by polling scheduling (fair, weighted, priority-based, or other polling policies may all be used). Packets that fail to map are returned to the corresponding flow queue and are mapped again next time (a priority or other scheme may be used).
Each flow queue maintains a dynamic mapping table; in the dynamic mapping table of each flow queue, the initial values of entry 0 through entry N-1 are micro-engine 0 through micro-engine N-1 in order, where N is the total number of micro-engines, i.e. micro-engine 0 is preferred at the start.

Step 102: store the packet at the location in the shared cache to which the pointer points, and store the pointer into the corresponding flow queue, where the flow queues are scheduled among themselves by polling.

Step 103: when a pointer in a flow queue is scheduled, look up the micro-engine corresponding to the pointer so as to map the packet corresponding to the pointer to that micro-engine.

Step 104: when no micro-engine corresponding to the pointer is found, store the pointer back into the corresponding flow queue.

In the embodiment of the present invention, the method further includes: updating the dynamic mapping table according to updates of the packet status.
Specifically, after the packet completes mapping or finishes processing in a micro-engine, the following state is updated:

a flag indicating whether the micro-engine is completely idle;

a flag indicating whether the micro-engine is occupied by one or two flow queues;

the number of idle threads in the micro-engine;

when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies; and

the total packet count of the flow queue across all micro-engines.
In the embodiment of the present invention, after the state update, the dynamic mapping table is updated according to the updated state under the following rules:

when the micro-engine at the head of the flow queue's mapping table has no packets or only one flow queue, and has an idle thread, the mapping table is not updated;

when the micro-engine at the head of the flow queue's mapping table has only one flow queue but no idle threads, a micro-engine is selected starting from the table head, in table order, under the following rules:

a completely idle micro-engine is preferred; otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither is satisfied, an invalid flag is written into the mapping table; according to these rules, the selected micro-engine number is moved to the head of the mapping table, and the remaining entries in the table move back in order;

when the micro-engine at the head of the flow queue's mapping table has two flow queues and has idle threads, the mapping table is updated under the following rules:

when one flow queue in the micro-engine occupies more threads than the other, and a completely idle micro-engine exists, the flow queue occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine nearer the front of the mapping table;

when one flow queue in the micro-engine occupies more threads than the other, and no completely idle micro-engine exists, the mapping relationship remains unchanged;

when the micro-engine at the head of the flow queue's mapping table has two flow queues and no idle threads, the mapping table is updated under the following rules:

for the mapping table of the flow queue occupying fewer threads, the first completely idle micro-engine in the mapping table is preferred, and otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither exists, the mapping table is kept unchanged; the selected micro-engine is moved to the head of the mapping table and the remaining entries move back in order; for the other flow queue on that micro-engine, the mapping table is kept unchanged;

when the total packet count of the flow queue across all micro-engines is 0 for longer than a certain time threshold, all entries of the flow queue's mapping table are invalidated.
The rules above are explained further below.
(1) If the micro-engine at the head of a flow's mapping table has no packets, or holds only one flow and has idle threads, the mapping table needs no update.
(2) If the micro-engine at the head of a flow's mapping table holds only one flow but has no idle threads, micro-engines are examined starting from the head, in table order, and selected by the following algorithm:
A completely idle micro-engine is preferred.
Failing that, a micro-engine holding only one flow and having the most idle threads is selected.
If neither exists, an invalid flag is written to the mapping table: packets of this flow temporarily cannot be mapped to any micro-engine and remain buffered in the queue.
Under these rules, the number of the selected micro-engine is moved to the head of the mapping table, and the remaining entries are shifted back one position.
(3) If the micro-engine at the head of a flow's mapping table holds two flows and has idle threads, the mapping table is updated by the following algorithm:
To keep a micro-engine from permanently holding two flows, if one flow in the micro-engine occupies more threads than the other and a completely idle micro-engine exists, the flow occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine closest to the head of the mapping table.
If one flow in the micro-engine occupies more threads than the other but no completely idle micro-engine exists, the mapping is left unchanged.
(4) If the micro-engine at the head of a flow's mapping table holds two flows but has no idle threads, the mapping table is updated by the following algorithm:
For the mapping table of the flow occupying fewer threads, the completely idle micro-engine closest to the head of the table is preferred; failing that, a micro-engine holding only one flow and having the most idle threads is selected; otherwise the mapping table is left unchanged. The selected micro-engine is moved to the head of the mapping table, and the remaining entries are shifted back.
For the other flow in the micro-engine, the mapping is left unchanged.
(5) If the total number of packets of a flow across all micro-engines is 0 for longer than a time threshold th, all of the flow's mapping table entries are invalidated. The rationale: when a flow has seen no new packets for a while, the micro-engine it previously occupied may have been taken over by other flows, so preferring a completely idle micro-engine for the flow's next packets balances the load better.
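A sketch of how rules (1), (2), and (5) could be realized is given below, reusing the MicroEngineStatus and FlowStatus fields assumed earlier; rules (3) and (4) apply the same pick-and-promote helpers to the flow holding fewer threads and are omitted for brevity. INVALID and the threshold th are illustrative stand-ins.

    INVALID = -1  # illustrative marker for an invalidated table entry

    def pick_engine(table, engines):
        """Rules (2)/(4): first fully idle engine in table order, else the
        single-flow engine with the most idle threads, else no candidate."""
        for me in table:
            if me != INVALID and engines[me].fully_idle:
                return me
        candidates = [me for me in table
                      if me != INVALID
                      and engines[me].occupying_flows == 1
                      and engines[me].idle_threads > 0]
        if candidates:
            return max(candidates, key=lambda me: engines[me].idle_threads)
        return None

    def promote(table, me):
        """Move the chosen engine to the table head, shifting the rest back."""
        table.remove(me)
        table.insert(0, me)

    def update_mapping_table(flow, table, engines, now, th):
        # Rule (5): flow drained for longer than th -> invalidate the whole table.
        if flow.packets_in_flight == 0 and now - flow.idle_since > th:
            table[:] = [INVALID] * len(table)
            return
        head = engines[table[0]] if table[0] != INVALID else None
        # Rule (1): head has no packets, or one flow with idle threads -> no change.
        if head and (head.occupying_flows == 0
                     or (head.occupying_flows == 1 and head.idle_threads > 0)):
            return
        # Rule (2): pick a better engine, or mark the head entry invalid so the
        # flow's packets stay buffered in the queue for now.
        chosen = pick_engine(table, engines)
        if chosen is not None:
            promote(table, chosen)
        else:
            table[0] = INVALID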
FIG. 2 is a schematic structural diagram of a packet scheduling system based on micro-engines in a many-core network processor according to an embodiment of the present invention. As shown in FIG. 2, the system includes:
an application unit 21, configured to request a free pointer for a packet when the packet arrives;
a storage unit 22, configured to store the packet in the shared buffer at the location the pointer points to, and to store the pointer in the corresponding flow queue, where the flow queues are scheduled among themselves in a round-robin manner;
a scheduling unit 23, configured to look up, when a pointer in a flow queue is scheduled, the micro-engine corresponding to the pointer, so as to map the packet corresponding to the pointer to that micro-engine;
the storage unit 22 being further configured to store the pointer back in the corresponding flow queue when no micro-engine corresponding to the pointer is found.
Each flow queue maintains a dynamic mapping table. In the dynamic mapping table of each flow queue, the initial values of entry 0 through entry N-1 are micro-engine 0 through micro-engine N-1, respectively;
where N is the total number of micro-engines.
The system further includes:
an update unit 24, configured to update the dynamic mapping table according to updates of the packet status.
The update unit 24 is further configured to update the following status items after a packet completes mapping or finishes processing in a micro-engine:
a flag indicating whether the micro-engine is completely idle;
a flag indicating whether the micro-engine is occupied by one flow queue or by two;
the number of idle threads in the micro-engine;
when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies;
a count of the total number of packets of the flow queue across all micro-engines.
The update unit 24 is further configured to, after the status update, update the dynamic mapping table from the new status according to the following rules:
when the micro-engine at the head of the flow queue's mapping table has no packets, or holds only one flow queue and has idle threads, the mapping table is not updated;
when the micro-engine at the head of the flow queue's mapping table holds only one flow queue but has no idle threads, micro-engines are examined starting from the head, in table order, and selected according to the following rules:
a completely idle micro-engine is preferred; failing that, a micro-engine holding only one flow queue and having the most idle threads is selected; if neither exists, an invalid flag is written to the mapping table. Under these rules, the number of the selected micro-engine is moved to the head of the mapping table, and the remaining entries are shifted back one position;
when the micro-engine at the head of the flow queue's mapping table holds two flow queues and has idle threads, the mapping table is updated according to the following rules:
when one flow queue in the micro-engine occupies more threads than the other and a completely idle micro-engine exists, the flow queue occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine closest to the head of the mapping table;
when one flow queue in the micro-engine occupies more threads than the other but no completely idle micro-engine exists, the mapping is left unchanged;
when the micro-engine at the head of the flow queue's mapping table holds two flow queues and has no idle threads, the mapping table is updated according to the following rules:
for the mapping table of the flow queue occupying fewer threads, the completely idle micro-engine closest to the head of the table is preferred; failing that, a micro-engine holding only one flow queue and having the most idle threads is selected; otherwise the mapping table is left unchanged. The selected micro-engine is moved to the head of the mapping table and the remaining entries are shifted back; for the other flow queue in the micro-engine, the mapping table is left unchanged;
when the total number of packets of the flow queue across all micro-engines is 0 for longer than a certain time threshold, all entries of the flow queue's mapping table are invalidated.
The many-core network processor provided by the embodiments of the present invention is composed of multiple micro-engines, where:
several micro-engines form a group, and several groups form a cluster; the structure between the groups and within a cluster is fully parallel; the clusters are connected in a two-dimensional mesh; in the middle of the cluster mesh, multiple routing modules are arranged, the routing modules themselves being connected in a two-dimensional mesh;
the routing module is configured to input and output packets to the four clusters above, below, left of, and right of it, to access off-chip storage, and to access the co-processing module;
the routing module is configured to route packets, off-chip storage requests and returns, and co-processing module requests and returns, completing the transfer of a packet, request, or return to the specified destination cluster or scheduling module.
To better understand the embodiments of the present invention, the interconnect structure of the micro-engines in a many-core network processor is introduced first. The number of micro-engines in a network processor has now grown into the hundreds; with so many engines and routing nodes, existing multi-core interconnect structures and routing algorithms no longer suffice. The most common micro-engine interconnect is the mesh, i.e., a two-dimensional matrix in which every micro-engine connects through an interconnect interface to a routing module, so that packets or other data can traverse the micro-engine network to a destination module (e.g., off-chip storage or a coprocessor). Routing over such a structure easily suffers from local congestion, load imbalance, and high latency. Taking a 256-core network processor as an example, FIG. 3 shows the micro-engine interconnect of a 256-core network processor with the conventional structure; the difficulty in routing-algorithm complexity, reliability, bandwidth, and realizability is evidently very high.
The embodiments of the present invention adopt a multi-level scheme, with the organizational structure already described, i.e., a cluster-group-me hierarchy. With me_num micro-engines in total, cluster_num clusters, group_num groups per cluster, and group_me_num micro-engines per group, the relationship among them is:
me_num = cluster_num × group_num × group_me_num
The parameters in the formula should be chosen with factors such as bandwidth and back-end implementation in mind.
Taking a 256-core network processor as an example (the scope of protection of the embodiments of the present invention is not limited to this example): 4 micro-engines form a group, 4 groups form a cluster, and 16 clusters are connected in a two-dimensional mesh, as shown in FIG. 4. The routing modules sit in the middle of the grid, also in a two-dimensional mesh. Each routing module inputs and outputs packets to the four clusters above, below, left of, and right of it, and accesses off-chip storage, the co-processing modules, and so on; the interconnect between routing modules routes packets, off-chip storage requests and returns, and co-processing module requests and returns, completing the transfer of a packet, request, or return to the specified destination cluster or scheduling module. This structure has only 4 routing nodes, versus 256 in the conventional structure; with far fewer nodes, the routing algorithm is simple, easy to implement in hardware, and load balancing is easy to achieve. Inside a cluster, the micro-engines are organized with a group layer, each group containing 4 micro-engines, as shown in FIG. 5.
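For the 256-core example, the formula checks out as follows (parameter names as in the formula above):

    group_me_num = 4   # micro-engines per group
    group_num = 4      # groups per cluster
    cluster_num = 16   # clusters in the 2-D mesh

    me_num = cluster_num * group_num * group_me_num
    assert me_num == 256   # the 256-core example, served by only 4 routing nodes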
The packet mapping process of the present invention is shown in FIG. 6. A packet requests a pointer from the free-pointer FIFO, the packet is stored at the corresponding address in the buffer, and the pointer is placed in the corresponding flow queue. The flow queues are arbitrated by round-robin and dispatched to the mapping module, which maps packets to micro-engines. Using the flow number flow_num in the packet, the corresponding flow mapping table is consulted to obtain the current mapping result. If the lookup yields "invalid mapping", the current flow is either a new flow or a flow whose mapping has been aged out and deleted, and it is mapped by the following rules:
If a completely idle micro-engine exists, the first completely idle micro-engine in mapping-table order is selected. The point of this is that this engine is, among the completely idle micro-engines, the most recently used one, so valid instructions may still reside in its instruction cache, improving instruction cache efficiency.
If no completely idle micro-engine exists, a micro-engine holding exactly one flow and having the most idle threads is selected. Guaranteeing at most 2 flows per core costs almost no instruction cache efficiency, while greatly reducing wasted micro-engine processing capacity.
If neither condition is satisfied, the packet is re-enqueued and waits for the next polling round to be remapped. A re-enqueued packet is mapped with priority the next time its queue is polled.
If the lookup finds a valid mapping, the micro-engine at the head of the mapping table is used as the processing micro-engine. The dynamic update mechanism of the mapping table was introduced in the summary above: based on the monitored micro-engine status, the table automatically places the latest mapping result at the head, with older mappings shifted back in turn.
The dynamic update of the mapping table has been described in detail in the summary above and is not repeated in this implementation example.
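Combining the table lookup with the new-flow rules, a scheduled pointer could be dispatched as in the sketch below, which reuses the helpers assumed in the earlier fragments (pick_engine, promote, INVALID, and the scheduler's requeue); map_packet is a hypothetical name, and the returned engine number stands in for the hand-off to the chosen micro-engine.

    def map_packet(scheduler, flow_id, ptr, tables, engines):
        """Map one scheduled packet to a micro-engine per the flow's mapping table."""
        table = tables[flow_id]
        if table[0] != INVALID:
            return table[0]                   # valid mapping: use the table head
        # Invalid mapping: a new flow, or a flow whose mapping aged out.
        chosen = pick_engine(table, engines)  # fully idle first, then single-flow engine
        if chosen is None:
            scheduler.requeue(flow_id, ptr)   # no room: back to the flow queue,
            return None                       # preferred on the next polling round
        promote(table, chosen)
        return chosen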
With the mapping steps and the dynamic mapping-table update mechanism above, packets of the same flow are sent, as far as possible, to the most recently used micro-engine for processing; instruction cache efficiency improves, and packet reordering is greatly reduced.
In addition, preferring completely idle micro-engines, preferring the micro-engine with the most idle threads, and guaranteeing at most 2 flows per core together keep packets evenly distributed across the micro-engines.
Although the micro-engine organization described above is fully parallel, another advantage of the present invention is that it can also support a serial organization. For a long time, the serial micro-engine structure was the mainstream architecture in commercial network processors, with advantages such as guaranteed performance and no reordering. An implementation of the serial structure under the present invention is as follows:
All packets enter through the first cluster. After processing, microcode designates the next-level processing cluster number; when a packet leaves a cluster, the routing module forwards it to the next cluster designated by the microcode, and so on until the packet has passed through all clusters and processing is complete, as shown in FIG. 7.
The present invention also supports a serial-parallel hybrid. A concrete example is shown in FIG. 8: the clusters are arranged in groups of 4, and within each group packets are processed serially, stage by stage, across the 4 clusters. The 4 groups of clusters run in parallel. When a packet arrives, a 1-to-4 dispatch first sends it to one of the 4 cluster groups; the packet then passes through the 4 clusters of the selected group in turn; finally, the outputs of the 4 groups pass through a 4-to-1 aggregation and the packet is emitted.
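The hybrid arrangement of FIG. 8 reduces to a few lines; the sketch below uses assumed names (chains, cluster.process, packet.flow_id), and the flow-hash dispatch is purely illustrative, since the text does not specify the 1-to-4 dispatch policy.

    def process_hybrid(packet, chains):
        """4 parallel chains, each a serial pipeline of 4 clusters (illustrative)."""
        chain = chains[hash(packet.flow_id) % len(chains)]  # 1-to-4 dispatch
        for cluster in chain:           # serial: microcode names the next cluster
            packet = cluster.process(packet)
        return packet                   # the 4-to-1 aggregation emits the result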
An embodiment of the present invention further describes a storage medium storing a computer program, the computer program being configured to execute the micro-engine-based packet scheduling method for a many-core network processor of the foregoing embodiments.
The technical solutions described in the embodiments of the present invention may be combined arbitrarily as long as there is no conflict.
In the several embodiments provided by the present invention, it should be understood that the disclosed method and intelligent device may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connections between the components shown or discussed may be indirect coupling or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments.
In addition, the functional units in the embodiments of the present invention may all be integrated into one second processing unit, or each unit may stand alone as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed by the present invention, and these should all be covered by the scope of protection of the present invention.
Industrial applicability
According to the present invention, when a packet arrives, a free pointer is requested for it; the packet is stored in the shared buffer at the location the pointer points to, and the pointer is stored in the corresponding flow queue, the flow queues being scheduled among themselves in a round-robin manner; when a pointer in a flow queue is scheduled, the micro-engine corresponding to the pointer is looked up, so as to map the packet corresponding to the pointer to that micro-engine; when no micro-engine corresponding to the pointer is found, the pointer is stored back in the corresponding flow queue. In this way, reordering of packets within the same flow is reduced and micro-engine instruction cache efficiency is improved, achieving a balance among load balancing, packet order preservation, and instruction cache efficiency and meeting the needs of high-performance forwarding.

Claims (12)

  1. A packet scheduling method based on micro-engines in a many-core network processor, the method comprising:
    when a packet arrives, requesting a free pointer for the packet;
    storing the packet in the shared buffer at the location the pointer points to, and storing the pointer in the corresponding flow queue, wherein the flow queues are scheduled among themselves in a round-robin manner;
    when a pointer in a flow queue is scheduled, looking up the micro-engine corresponding to the pointer, so as to map the packet corresponding to the pointer to the micro-engine;
    when no micro-engine corresponding to the pointer is found, storing the pointer back in the corresponding flow queue.
  2. The packet scheduling method based on micro-engines in a many-core network processor according to claim 1, wherein each flow queue maintains a dynamic mapping table, and in the dynamic mapping table of each flow queue, the initial values of entry 0 through entry N-1 are micro-engine 0 through micro-engine N-1, respectively;
    where N is the total number of micro-engines.
  3. The packet scheduling method based on micro-engines in a many-core network processor according to claim 2, wherein the method further comprises:
    updating the dynamic mapping table according to updates of the packet status.
  4. The packet scheduling method based on micro-engines in a many-core network processor according to claim 3, wherein the updating the status of the micro-engines in the dynamic mapping table according to updates of the packet status comprises:
    after a packet completes mapping or finishes processing in a micro-engine, updating the following status items:
    a flag indicating whether the micro-engine is completely idle;
    a flag indicating whether the micro-engine is occupied by one flow queue or by two;
    the number of idle threads in the micro-engine;
    when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies;
    a count of the total number of packets of the flow queue across all micro-engines.
  5. The packet scheduling method based on micro-engines in a many-core network processor according to claim 4, wherein the method further comprises:
    after the status update, updating the dynamic mapping table from the updated status according to the following rules:
    when the micro-engine at the head of the flow queue's mapping table has no packets, or holds only one flow queue and has idle threads, not updating the mapping table;
    when the micro-engine at the head of the flow queue's mapping table holds only one flow queue but has no idle threads, examining micro-engines starting from the head, in the order of the mapping table, and selecting one according to the following rules:
    preferring a completely idle micro-engine; failing that, selecting a micro-engine holding only one flow queue and having the most idle threads; if neither exists, writing an invalid flag to the mapping table; under these rules, moving the number of the selected micro-engine to the head of the mapping table, with the remaining entries shifted back one position;
    when the micro-engine at the head of the flow queue's mapping table holds two flow queues and has idle threads, updating the mapping table according to the following rules:
    when one flow queue in the micro-engine occupies more threads than the other and a completely idle micro-engine exists, remapping the flow queue occupying fewer threads to a completely idle micro-engine, preferring the idle micro-engine closest to the head of the mapping table;
    when one flow queue in the micro-engine occupies more threads than the other but no completely idle micro-engine exists, keeping the mapping unchanged;
    when the micro-engine at the head of the flow queue's mapping table holds two flow queues and has no idle threads, updating the mapping table according to the following rules:
    for the mapping table of the flow queue occupying fewer threads, preferring the completely idle micro-engine closest to the head of the mapping table, failing that, selecting a micro-engine holding only one flow queue and having the most idle threads, and otherwise keeping the mapping table unchanged; moving the selected micro-engine to the head of the mapping table, with the remaining entries shifted back; for the other flow queue in the micro-engine, keeping the mapping table unchanged;
    when the total number of packets of the flow queue across all micro-engines is 0 for longer than a certain time threshold, invalidating all entries of the flow queue's mapping table.
  6. A packet scheduling system based on micro-engines in a many-core network processor, the system comprising:
    an application unit, configured to request a free pointer for a packet when the packet arrives;
    a storage unit, configured to store the packet in the shared buffer at the location the pointer points to, and to store the pointer in the corresponding flow queue, wherein the flow queues are scheduled among themselves in a round-robin manner;
    a scheduling unit, configured to look up, when a pointer in a flow queue is scheduled, the micro-engine corresponding to the pointer, so as to map the packet corresponding to the pointer to the micro-engine;
    the storage unit being further configured to store the pointer back in the corresponding flow queue when no micro-engine corresponding to the pointer is found.
  7. The packet scheduling system based on micro-engines in a many-core network processor according to claim 6, wherein each flow queue maintains a dynamic mapping table, and in the dynamic mapping table of each flow queue, the initial values of entry 0 through entry N-1 are micro-engine 0 through micro-engine N-1, respectively;
    where N is the total number of micro-engines.
  8. The packet scheduling system based on micro-engines in a many-core network processor according to claim 7, wherein the system further comprises:
    an update unit, configured to update the dynamic mapping table according to updates of the packet status.
  9. The packet scheduling system based on micro-engines in a many-core network processor according to claim 8, wherein the update unit is further configured to update the following status items after a packet completes mapping or finishes processing in a micro-engine:
    a flag indicating whether the micro-engine is completely idle;
    a flag indicating whether the micro-engine is occupied by one flow queue or by two;
    the number of idle threads in the micro-engine;
    when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies;
    a count of the total number of packets of the flow queue across all micro-engines.
  10. The packet scheduling system based on micro-engines in a many-core network processor according to claim 9, wherein the update unit is further configured to, after the status update, update the dynamic mapping table from the updated status according to the following rules:
    when the micro-engine at the head of the flow queue's mapping table has no packets, or holds only one flow queue and has idle threads, not updating the mapping table;
    when the micro-engine at the head of the flow queue's mapping table holds only one flow queue but has no idle threads, examining micro-engines starting from the head, in the order of the mapping table, and selecting one according to the following rules:
    preferring a completely idle micro-engine; failing that, selecting a micro-engine holding only one flow queue and having the most idle threads; if neither exists, writing an invalid flag to the mapping table; under these rules, moving the number of the selected micro-engine to the head of the mapping table, with the remaining entries shifted back one position;
    when the micro-engine at the head of the flow queue's mapping table holds two flow queues and has idle threads, updating the mapping table according to the following rules:
    when one flow queue in the micro-engine occupies more threads than the other and a completely idle micro-engine exists, remapping the flow queue occupying fewer threads to a completely idle micro-engine, preferring the idle micro-engine closest to the head of the mapping table;
    when one flow queue in the micro-engine occupies more threads than the other but no completely idle micro-engine exists, keeping the mapping unchanged;
    when the micro-engine at the head of the flow queue's mapping table holds two flow queues and has no idle threads, updating the mapping table according to the following rules:
    for the mapping table of the flow queue occupying fewer threads, preferring the completely idle micro-engine closest to the head of the mapping table, failing that, selecting a micro-engine holding only one flow queue and having the most idle threads, and otherwise keeping the mapping table unchanged; moving the selected micro-engine to the head of the mapping table, with the remaining entries shifted back; for the other flow queue in the micro-engine, keeping the mapping table unchanged;
    when the total number of packets of the flow queue across all micro-engines is 0 for longer than a certain time threshold, invalidating all entries of the flow queue's mapping table.
  11. A many-core network processor, the processor being composed of multiple micro-engines, wherein:
    several of the micro-engines form a group, and several groups form a cluster; the structure between the groups and within the cluster is fully parallel; the clusters are connected in a two-dimensional mesh; in the middle of the cluster mesh, multiple routing modules are arranged, the routing modules themselves being connected in a two-dimensional mesh;
    the routing module is configured to input and output packets to the four clusters above, below, left of, and right of it, to access off-chip storage, and to access the co-processing module;
    the routing module is configured to route packets, off-chip storage requests and returns, and co-processing module requests and returns, completing the transfer of a packet, request, or return to the specified destination cluster or scheduling module.
  12. A storage medium storing a computer program, the computer program being configured to execute the packet scheduling method based on micro-engines in a many-core network processor according to any one of claims 1 to 5.
PCT/CN2016/088163 2015-10-21 2016-07-01 Method and system for packet scheduling using many-core network processor and micro-engine thereof, and storage medium WO2017067215A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510695926.1A CN106612236B (en) 2015-10-21 2015-10-21 Many-core network processor and message scheduling method and system of micro-engine thereof
CN201510695926.1 2015-10-21

Publications (1)

Publication Number Publication Date
WO2017067215A1 true WO2017067215A1 (en) 2017-04-27

Family

ID=58556696

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/088163 WO2017067215A1 (en) 2015-10-21 2016-07-01 Method and system for packet scheduling using many-core network processor and micro-engine thereof, and storage medium

Country Status (2)

Country Link
CN (1) CN106612236B (en)
WO (1) WO2017067215A1 (en)


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257280B (en) * 2017-07-14 2022-05-27 深圳市中兴微电子技术有限公司 Micro-engine and message processing method thereof
CN109391556B (en) * 2017-08-10 2022-02-18 深圳市中兴微电子技术有限公司 Message scheduling method, device and storage medium
CN107579921B (en) * 2017-09-26 2020-09-25 锐捷网络股份有限公司 Flow control method and device
CN108833299B (en) * 2017-12-27 2021-12-28 北京时代民芯科技有限公司 Large-scale network data processing method based on reconfigurable switching chip architecture
CN108762810B (en) * 2017-12-27 2021-01-08 北京时代民芯科技有限公司 Network message header processor based on parallel micro-engine
EP3893122A4 (en) * 2018-12-24 2022-01-05 Huawei Technologies Co., Ltd. Network processor and message processing method


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101072176A (en) * 2007-04-02 2007-11-14 华为技术有限公司 Report processing method and system
CN101739241A (en) * 2008-11-12 2010-06-16 中国科学院微电子研究所 On-chip multi-core DSP cluster and application extension method
WO2014183530A1 (en) * 2013-05-14 2014-11-20 华为技术有限公司 Task assigning method, task assigning apparatus, and network-on-chip
CN104394096A (en) * 2014-12-11 2015-03-04 福建星网锐捷网络有限公司 Multi-core processor based message processing method and multi-core processor

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111262792A (en) * 2020-01-17 2020-06-09 新华三信息安全技术有限公司 Message forwarding method, device, network equipment and storage medium
CN111262792B (en) * 2020-01-17 2022-04-01 新华三信息安全技术有限公司 Message forwarding method, device, network equipment and storage medium
CN114285807A (en) * 2021-12-22 2022-04-05 中国农业银行股份有限公司 Message information management method, device, server and storage medium
CN114415969A (en) * 2022-02-09 2022-04-29 杭州云合智网技术有限公司 Dynamic storage method for message of switching chip
CN114415969B (en) * 2022-02-09 2023-09-29 杭州云合智网技术有限公司 Method for dynamically storing messages of exchange chip
CN117956054A (en) * 2024-03-26 2024-04-30 上海云豹创芯智能科技有限公司 Method, system, chip and storage medium for realizing timer processing in RDMA
CN117956054B (en) * 2024-03-26 2024-06-11 上海云豹创芯智能科技有限公司 Method, system, chip and storage medium for realizing timer processing in RDMA

Also Published As

Publication number Publication date
CN106612236A (en) 2017-05-03
CN106612236B (en) 2020-02-07

Similar Documents

Publication Publication Date Title
WO2017067215A1 (en) Method and system for packet scheduling using many-core network processor and micro-engine thereof, and storage medium
US11036556B1 (en) Concurrent program execution optimization
US9571399B2 (en) Method and apparatus for congestion-aware routing in a computer interconnection network
US6393026B1 (en) Data packet processing system and method for a router
WO2017003887A1 (en) Convolutional neural networks on hardware accelerators
JP2016195375A (en) Method and apparatus for using multiple linked memory lists
CN105740199B (en) Time sequence power estimation device and method of network on chip
US10659372B2 (en) Multi-core lock-free rate limiting apparatus and method
CN112084027B (en) Network-on-chip data transmission method, device, network-on-chip, equipment and medium
US9860841B2 (en) Communications fabric with split paths for control and data packets
US9727499B2 (en) Hardware first come first serve arbiter using multiple request buckets
US9304706B2 (en) Efficient complex network traffic management in a non-uniform memory system
Daneshtalab et al. CARS: Congestion-aware request scheduler for network interfaces in NoC-based manycore systems
Cota et al. NoC basics
US20130219094A1 (en) Commonality of Memory Island Interface and Structure
US20210051116A1 (en) Efficient packet queueing for computer networks
US9588928B1 (en) Unique packet multicast packet ready command
US20150003250A1 (en) Credit-Based Resource Allocator Circuit
US9996468B1 (en) Scalable dynamic memory management in a network device
US8559436B2 (en) Processing resource management in an island-based network flow processor
US9727512B1 (en) Identical packet multicast packet ready command
Huang et al. Accelerating NoC-based MPI primitives via communication architecture customization
Chen et al. Contention minimization in emerging smart NoC via direct and indirect routes
Salah et al. Design of a 2d mesh-torus router for network on chip
US9164794B2 (en) Hardware prefix reduction circuit

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16856655; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 16856655; Country of ref document: EP; Kind code of ref document: A1)