WO2017067215A1 - Method and system for packet scheduling using many-core network processor and micro-engine thereof, and storage medium - Google Patents

Method and system for packet scheduling using many-core network processor and micro-engine thereof, and storage medium Download PDF

Info

Publication number
WO2017067215A1
WO2017067215A1 (PCT/CN2016/088163)
Authority
WO
WIPO (PCT)
Prior art keywords
mapping table
micro
microengine
flow queue
idle
Prior art date
Application number
PCT/CN2016/088163
Other languages
French (fr)
Chinese (zh)
Inventor
袁力 (Yuan Li)
Original Assignee
深圳市中兴微电子技术有限公司 (Shenzhen ZTE Microelectronics Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司 (Shenzhen ZTE Microelectronics Technology Co., Ltd.)
Publication of WO2017067215A1 publication Critical patent/WO2017067215A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00: Routing or path finding of packets in data switching networks
    • H04L 45/02: Topology update or discovery
    • H04L 47/00: Traffic control in data switching networks
    • H04L 47/10: Flow control; Congestion control
    • H04L 47/12: Avoiding congestion; Recovering from congestion
    • H04L 47/125: Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H04L 47/50: Queue scheduling

Definitions

  • the present invention relates to network processor technologies, and in particular to a packet scheduling method, system, and storage medium for a many-core network processor and its micro-engines.
  • the network processor is the core component of the forwarding plane in the data communication field; it is a solution that balances processor speed with flexibility of use, allowing service microcode to be modified flexibly to meet the needs of all kinds of basic and complex network services and making services easy to extend and upgrade.
  • the current remedies mainly include raising the system clock frequency and increasing the number of cores.
  • as for raising the system clock frequency, semiconductor process development has lagged far behind the demand for higher processing capability, so relying solely on new processes to raise the clock frequency can no longer meet that demand.
  • the processing power of high-end network processors has reached more than 500 Gbps, and the frequency of micro-engines is generally in the range of 1 GHz to 2 GHz, and several or dozens of micro-engines cannot achieve the required processing power at all. Therefore, network processors using many-core architectures have become an inevitable choice.
  • the number of microengines in a network processor can be estimated simply by the following formula:
  • Me_num is the number of micro-engines
  • Performance is the processing capability (in bps)
  • Pkt_len is the packet length
  • Instr_num is the number of service microcode instructions
  • Freq is the system clock frequency (a reconstruction of the estimate is sketched below).
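The estimation formula itself appears in the published text only as an image. The following C sketch is a plausible reconstruction from the parameter definitions above, assuming each micro-engine retires one microcode instruction per cycle; the numeric inputs are illustrative, not values from the patent.

```c
#include <math.h>
#include <stdio.h>

/* Reconstructed estimate: packets/s = Performance / (8 * Pkt_len);
 * required instructions/s = packets/s * Instr_num; one micro-engine
 * supplies Freq instructions/s under the one-instruction-per-cycle
 * assumption. */
static unsigned long estimate_me_num(double performance_bps,
                                     double pkt_len_bytes,
                                     double instr_num,
                                     double freq_hz)
{
    double pkts_per_sec = performance_bps / (pkt_len_bytes * 8.0);
    return (unsigned long)ceil(pkts_per_sec * instr_num / freq_hz);
}

int main(void)
{
    /* Illustrative values: 500 Gbps, 64-byte packets,
     * 400 instructions per packet, 1.5 GHz micro-engines. */
    printf("Me_num ~ %lu\n", estimate_me_num(500e9, 64.0, 400.0, 1.5e9));
    return 0;
}
```

With these example inputs the estimate comes out at roughly 260 micro-engines, consistent with the 256-core figure discussed later.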
  • on-chip networks include: ring, mesh, torus, tree, and disk.
  • as the number of micro-engines grows, the number of on-chip network nodes increases sharply, and the bandwidth, delay, and load-imbalance problems caused by the routing algorithm become ever more serious.
  • the mapping between packets and micro-engines is also a complex problem.
  • many mapping algorithms have been proposed, but most improve only one aspect or are too complicated to implement in hardware.
  • in network traffic, packets of the same data flow share upper-layer application locality.
  • the order of packets entering and leaving the network processor needs to be preserved as far as possible to avoid timeout retransmissions at the upper layers of the network.
  • Load balancing is related to the ability to fully utilize processing power and the length of processing delay.
  • micro-engine instruction storage generally uses a cache structure to improve fetch efficiency, and if packets of the same processing flow can be processed by the same core, instruction cache efficiency improves. A good mapping algorithm therefore needs to strike the best balance among order preservation, load balancing, and instruction cache efficiency; this is one of the difficulties and research hotspots of many-core network processors.
  • an embodiment of the present invention provides a packet scheduling method, system, and storage medium for a many-core network processor and its micro-engines.
  • when no micro-engine corresponding to the pointer is found, the pointer is stored back into the corresponding flow queue.
  • each flow queue maintains a dynamic mapping table.
  • the initial values of the entries 0 to N-1 are sequentially from microengine 0 to microengine N-1;
  • N is the total number of microengines.
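A minimal C sketch of one flow queue's dynamic mapping table follows. The type and helper names are illustrative, and since the patent does not fix how the "invalid mapping" flag is encoded, it is modeled here as a boolean on the table head.

```c
#include <stdbool.h>

#define N_ME 256  /* total number of micro-engines (example value) */

typedef struct {
    unsigned short entry[N_ME]; /* entry[0] is the table head */
    bool head_valid;            /* false models the "invalid mapping" flag */
} flow_map_t;

/* Entries 0..N-1 initially hold micro-engine 0..N-1 in order, so
 * micro-engine 0 is preferred first; a fresh flow has no valid mapping. */
static void flow_map_init(flow_map_t *m)
{
    for (unsigned i = 0; i < N_ME; i++)
        m->entry[i] = (unsigned short)i;
    m->head_valid = false;
}

/* Move the entry at pos to the table head; the entries in between
 * shift back one position, as the update rules require. */
static void flow_map_promote(flow_map_t *m, unsigned pos)
{
    unsigned short sel = m->entry[pos];
    for (unsigned i = pos; i > 0; i--)
        m->entry[i] = m->entry[i - 1];
    m->entry[0] = sel;
    m->head_valid = true;
}
```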
  • the method further includes:
  • updating the state of the micro-engines in the dynamic mapping table according to updates of the packet status includes: after the packet completes mapping or finishes processing in a micro-engine, updating the following state:
  • a flag indicating whether the micro-engine is completely idle;
  • a flag indicating whether the micro-engine is occupied by one or two flow queues;
  • the number of idle threads in the micro-engine;
  • when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies;
  • the total packet count of the flow queue across all micro-engines.
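In C terms, the monitored state might be held in structures like the following; the field names and widths are illustrative assumptions, not from the patent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-micro-engine state refreshed by the monitoring logic. */
typedef struct {
    bool    fully_idle;    /* no packets currently in this micro-engine   */
    uint8_t flow_cnt;      /* number of flow queues occupying it (0..2)   */
    uint8_t idle_threads;  /* number of idle threads                      */
    uint8_t threads_of[2]; /* per-flow thread counts when flow_cnt == 2   */
} me_state_t;

/* Per-flow counter: total packets of the flow across all micro-engines;
 * staying at 0 beyond a time threshold ages the mapping out. */
typedef struct {
    uint32_t pkts_in_flight;
    uint32_t idle_time;
} flow_state_t;
```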
  • the method further includes:
  • after the state update, the dynamic mapping table is updated according to the updated state under the following rules:
  • when the micro-engine at the head of the flow queue's mapping table has no packets or only one flow queue, and has an idle thread, the mapping table is not updated;
  • when the micro-engine at the head of the flow queue's mapping table has only one flow queue but no idle threads, a micro-engine is selected starting from the table head, in table order: a completely idle micro-engine is preferred; otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither is satisfied, an invalid flag is written into the mapping table; the selected micro-engine number is then moved to the head of the mapping table, and the remaining entries move back in order;
  • when the micro-engine at the head of the flow queue's mapping table has two flow queues and has idle threads, the mapping table is updated as follows: if one flow queue occupies more threads than the other and a completely idle micro-engine exists, the flow queue occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine nearer the front of the mapping table; if no completely idle micro-engine exists, the mapping relationship remains unchanged;
  • when the micro-engine at the head of the flow queue's mapping table has two flow queues and no idle threads, then for the mapping table of the flow queue occupying fewer threads, the first completely idle micro-engine in the table is preferred, and otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither exists, the mapping table is kept unchanged; the selected micro-engine is moved to the head of the mapping table and the remaining entries move back in order; the mapping table of the other flow queue on that micro-engine is kept unchanged;
  • when the total packet count of the flow queue across all micro-engines is 0 for longer than a certain time threshold, all entries of the flow queue's mapping table are invalidated.
  • a request unit configured to request a free pointer for a packet when the packet is input;
  • a storage unit configured to store the packet at the location in the shared cache to which the pointer points, and to store the pointer into the corresponding flow queue, the flow queues being scheduled by polling;
  • a scheduling unit configured to, when a pointer in a flow queue is scheduled, look up the micro-engine corresponding to the pointer so as to map the packet corresponding to the pointer to that micro-engine;
  • the storage unit being further configured to store the pointer back into the corresponding flow queue when no micro-engine corresponding to the pointer is found.
  • each flow queue maintains a dynamic mapping table.
  • the initial values of the entries 0 to N-1 are sequentially from microengine 0 to microengine N-1;
  • N is the total number of microengines.
  • the system further includes:
  • an update unit configured to update the dynamic mapping table according to updates of the packet status.
  • the update unit is further configured to update the following state after the packet completes mapping or finishes processing in a micro-engine:
  • a flag indicating whether the micro-engine is completely idle;
  • a flag indicating whether the micro-engine is occupied by one or two flow queues;
  • the number of idle threads in the micro-engine;
  • when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies;
  • the total packet count of the flow queue across all micro-engines.
  • the update unit is further configured to, after the state update, update the dynamic mapping table according to the updated state under the following rules:
  • when the micro-engine at the head of the flow queue's mapping table has no packets or only one flow queue, and has an idle thread, the mapping table is not updated;
  • when the micro-engine at the head of the flow queue's mapping table has only one flow queue but no idle threads, a micro-engine is selected starting from the table head, in table order: a completely idle micro-engine is preferred; otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither is satisfied, an invalid flag is written into the mapping table; the selected micro-engine number is then moved to the head of the mapping table, and the remaining entries move back in order;
  • when the micro-engine at the head of the flow queue's mapping table has two flow queues and has idle threads, the mapping table is updated as follows: if one flow queue occupies more threads than the other and a completely idle micro-engine exists, the flow queue occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine nearer the front of the mapping table; if no completely idle micro-engine exists, the mapping relationship remains unchanged;
  • when the micro-engine at the head of the flow queue's mapping table has two flow queues and no idle threads, then for the mapping table of the flow queue occupying fewer threads, the first completely idle micro-engine in the table is preferred, and otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither exists, the mapping table is kept unchanged; the selected micro-engine is moved to the head of the mapping table and the remaining entries move back in order; the mapping table of the other flow queue on that micro-engine is kept unchanged;
  • when the total packet count of the flow queue across all micro-engines is 0 for longer than a certain time threshold, all entries of the flow queue's mapping table are invalidated.
  • the many-core network processor provided by an embodiment of the present invention is composed of multiple micro-engines, where:
  • a plurality of the micro-engines form a group and a plurality of groups form a cluster, with a fully parallel structure within the groups and the clusters and a two-dimensional mesh structure between clusters;
  • in the middle of the cluster grid, a plurality of routing modules are disposed, themselves arranged in a two-dimensional mesh;
  • the routing module is configured to input and output packets to the four clusters above, below, left, and right of it, to access off-chip storage, and to access the co-processing module;
  • the routing module is configured to route packets, off-chip storage requests and returns, and co-processing-module requests and returns, completing the transfer of each packet, request, or return to its specified destination cluster or to the scheduling module.
  • the storage medium provided by an embodiment of the present invention stores a computer program configured to execute the packet scheduling method based on micro-engines in a many-core network processor.
  • the many-core network processor is composed of a plurality of micro-engines, where a plurality of micro-engines form a group, a plurality of groups form a cluster, a fully parallel structure is used within the groups and the clusters, a two-dimensional mesh structure is used between clusters, and a plurality of routing modules, themselves in a two-dimensional mesh, are disposed in the middle of the cluster grid.
  • FIG. 1 is a schematic flowchart of a packet scheduling method based on a microengine in a many-core network processor according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of a packet scheduling system based on a microengine in a many-core network processor according to an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of a 256-core network processor microengine of a conventional structure
  • FIG. 4 is a schematic diagram of a multi-level structure of a 256-core processor microengine
  • FIG. 5 is a schematic diagram of the micro-engine hierarchy inside a cluster;
  • FIG. 6 is a schematic diagram of a message mapping process according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of an implementation example of a serial mode according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of an implementation example of the serial-parallel hybrid mode according to an embodiment of the present invention.
  • as shown in FIG. 1, the packet scheduling method based on micro-engines in a many-core network processor includes the following steps:
  • Step 101: when a packet is input, request a free pointer for the packet.
  • the threads and states of all the micro engines are monitored in real time by a monitoring module to implement global scheduling based on thread granularity.
  • the virtual output queue (VOQ) mechanism is used to prevent head-of-line blocking of packet mapping between different flows.
  • after each packet is input, a free pointer is requested first, the packet is stored at the location the pointer points to in the shared cache, and the pointer is stored in the queue of the corresponding flow.
  • the flow queues are output to the mapping module by polling scheduling (fair, weighted, priority-based, or other polling policies may be used).
  • packets that fail to map are returned to the corresponding flow queue and are mapped again on a later pass (a priority or other scheme may be used).
  • Each of the flow queues maintains a dynamic mapping table.
  • the initial values of the entries 0 to N-1 are sequentially from microengine 0 to microengine N-1;
  • N is the total number of micro-engines; that is, micro-engine 0 is preferred at the start.
  • Step 102: store the packet at the location in the shared cache to which the pointer points, and store the pointer into the corresponding flow queue, where the flow queues are scheduled by polling.
  • Step 103: when a pointer in a flow queue is scheduled, look up the micro-engine corresponding to the pointer so as to map the packet corresponding to the pointer to that micro-engine.
  • Step 104: when no micro-engine corresponding to the pointer is found, store the pointer back into the corresponding flow queue (the four steps are sketched in code below).
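A compact C sketch of steps 101 to 104 follows; the queue, cache, and mapping hooks are assumed platform primitives rather than APIs named by the patent.

```c
#include <stdbool.h>
#include <stdint.h>

#define N_FLOWS 1024  /* number of flow queues (example value) */

/* Assumed platform hooks. */
extern uint32_t free_ptr_pop(void);                 /* free-pointer FIFO */
extern void shared_cache_write(uint32_t ptr, const void *pkt);
extern void flowq_push(uint32_t flow, uint32_t ptr);
extern bool flowq_pop(uint32_t flow, uint32_t *ptr);
extern bool map_to_me(uint32_t flow, uint32_t ptr); /* false: no ME found */

/* Steps 101-102: take a free pointer, store the packet at the cache
 * location the pointer names, and queue the pointer on its flow. */
void on_packet_input(uint32_t flow, const void *pkt)
{
    uint32_t ptr = free_ptr_pop();
    shared_cache_write(ptr, pkt);
    flowq_push(flow, ptr);
}

/* Steps 103-104: flow queues are polled round-robin; a pointer whose
 * packet cannot be mapped goes back to its flow queue for a later poll. */
void poll_flow_queues(void)
{
    for (uint32_t flow = 0; flow < N_FLOWS; flow++) {
        uint32_t ptr;
        if (flowq_pop(flow, &ptr) && !map_to_me(flow, ptr))
            flowq_push(flow, ptr);  /* step 104: re-enqueue and retry */
    }
}
```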
  • the method further includes: updating the dynamic mapping table according to updates of the packet status; specifically, after the packet completes mapping or finishes processing in a micro-engine, the following state is updated:
  • a flag indicating whether the micro-engine is completely idle;
  • a flag indicating whether the micro-engine is occupied by one or two flow queues;
  • the number of idle threads in the micro-engine;
  • when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies;
  • the total packet count of the flow queue across all micro-engines.
  • after the state update, the dynamic mapping table is updated according to the updated state under the following rules:
  • when the micro-engine at the head of the flow queue's mapping table has no packets or only one flow queue, and has an idle thread, the mapping table is not updated;
  • when the micro-engine at the head of the flow queue's mapping table has only one flow queue but no idle threads, a micro-engine is selected starting from the table head, in table order: a completely idle micro-engine is preferred; otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither is satisfied, an invalid flag is written into the mapping table; the selected micro-engine number is then moved to the head of the mapping table, and the remaining entries move back in order;
  • when the micro-engine at the head of the flow queue's mapping table has two flow queues and has idle threads, the mapping table is updated as follows: if one flow queue occupies more threads than the other and a completely idle micro-engine exists, the flow queue occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine nearer the front of the mapping table; if no completely idle micro-engine exists, the mapping relationship remains unchanged;
  • when the micro-engine at the head of the flow queue's mapping table has two flow queues and no idle threads, then for the mapping table of the flow queue occupying fewer threads, the first completely idle micro-engine in the table is preferred, and otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither exists, the mapping table is kept unchanged; the selected micro-engine is moved to the head of the mapping table and the remaining entries move back in order; the mapping table of the other flow queue on that micro-engine is kept unchanged;
  • when the total packet count of the flow queue across all micro-engines is 0 for longer than a certain time threshold, all entries of the flow queue's mapping table are invalidated.
  • when the head micro-engine has no packets or only one flow and has an idle thread, the mapping table does not need to be updated;
  • when the invalid flag is written into the mapping table, packets of that flow temporarily cannot be mapped to a micro-engine and remain buffered in the queue;
  • when no completely idle micro-engine exists, the mapping relationship does not change;
  • in the case of two flows with no idle threads, the mapping table of the flow occupying fewer threads prefers the first completely idle micro-engine in the table, then a micro-engine holding only one flow with the most idle threads; otherwise the table does not change; the selected micro-engine is moved to the head of the mapping table and the remaining entries move back;
  • the other flow in that micro-engine keeps its mapping relationship unchanged.
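Putting these rules together, a sketch of the per-flow update routine in C might look as follows. It reuses the flow_map_t and me_state_t sketches above; my_slot (this flow's index among the at most two flows on the head micro-engine) and the exact tie-breaking are assumptions where the patent text leaves details open.

```c
/* Index (in table order) of the first completely idle micro-engine. */
static int find_fully_idle(const flow_map_t *m, const me_state_t me[])
{
    for (unsigned i = 0; i < N_ME; i++)
        if (me[m->entry[i]].fully_idle)
            return (int)i;
    return -1;
}

/* Index of a micro-engine holding exactly one flow with the most idle
 * threads; -1 if none has an idle thread. */
static int find_single_flow_most_idle(const flow_map_t *m,
                                      const me_state_t me[])
{
    int pick = -1, best = 0;
    for (unsigned i = 0; i < N_ME; i++) {
        const me_state_t *s = &me[m->entry[i]];
        if (s->flow_cnt == 1 && (int)s->idle_threads > best) {
            best = s->idle_threads;
            pick = (int)i;
        }
    }
    return pick;
}

void flow_map_update(flow_map_t *m, const me_state_t me[], int my_slot)
{
    const me_state_t *h = &me[m->entry[0]];
    int other = 1 - my_slot;

    /* (1) head holds no packets or only this flow and has a free thread. */
    if (h->flow_cnt <= 1 && h->idle_threads > 0)
        return;

    if (h->flow_cnt <= 1) {            /* (2) one flow, no free threads */
        int pick = find_fully_idle(m, me);
        if (pick < 0)
            pick = find_single_flow_most_idle(m, me);
        if (pick < 0)
            m->head_valid = false;     /* "invalid flag": buffer in queue */
        else
            flow_map_promote(m, (unsigned)pick);
    } else if (h->idle_threads > 0) {  /* (3) two flows, free threads */
        int pick;
        if (h->threads_of[my_slot] < h->threads_of[other] &&
            (pick = find_fully_idle(m, me)) >= 0)
            flow_map_promote(m, (unsigned)pick); /* front-most idle ME */
        /* otherwise the mapping relationship stays unchanged */
    } else {                           /* (4) two flows, no free threads */
        if (h->threads_of[my_slot] < h->threads_of[other]) {
            int pick = find_fully_idle(m, me);
            if (pick < 0)
                pick = find_single_flow_most_idle(m, me);
            if (pick >= 0)
                flow_map_promote(m, (unsigned)pick);
        }
        /* the other flow's table is left unchanged */
    }
}
```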
  • as shown in FIG. 2, the packet scheduling system based on micro-engines in a many-core network processor includes:
  • a request unit 21 configured to request a free pointer for a packet when the packet is input;
  • a storage unit 22 configured to store the packet at the location in the shared cache to which the pointer points, and to store the pointer into the corresponding flow queue, the flow queues being scheduled by polling;
  • a scheduling unit 23 configured to, when a pointer in a flow queue is scheduled, look up the micro-engine corresponding to the pointer so as to map the packet corresponding to the pointer to that micro-engine;
  • the storage unit 22 being further configured to store the pointer back into the corresponding flow queue when no micro-engine corresponding to the pointer is found.
  • Each flow queue maintains a dynamic mapping table.
  • the initial values of entry 0 to entry N-1 are in order from microengine 0 to microengine N-1;
  • N is the total number of microengines.
  • the system further includes:
  • an update unit 24 configured to update the dynamic mapping table according to updates of the packet status.
  • the update unit 24 is further configured to update the following state after the packet completes mapping or finishes processing in a micro-engine:
  • a flag indicating whether the micro-engine is completely idle;
  • a flag indicating whether the micro-engine is occupied by one or two flow queues;
  • the number of idle threads in the micro-engine;
  • when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies;
  • the total packet count of the flow queue across all micro-engines.
  • the update unit 24 is further configured to, after the state update, update the dynamic mapping table according to the updated state under the following rules:
  • when the micro-engine at the head of the flow queue's mapping table has no packets or only one flow queue, and has an idle thread, the mapping table is not updated;
  • when the micro-engine at the head of the flow queue's mapping table has only one flow queue but no idle threads, a micro-engine is selected starting from the table head, in table order: a completely idle micro-engine is preferred; otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither is satisfied, an invalid flag is written into the mapping table; the selected micro-engine number is then moved to the head of the mapping table, and the remaining entries move back in order;
  • when the micro-engine at the head of the flow queue's mapping table has two flow queues and has idle threads, the mapping table is updated as follows: if one flow queue occupies more threads than the other and a completely idle micro-engine exists, the flow queue occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine nearer the front of the mapping table; if no completely idle micro-engine exists, the mapping relationship remains unchanged;
  • when the micro-engine at the head of the flow queue's mapping table has two flow queues and no idle threads, then for the mapping table of the flow queue occupying fewer threads, the first completely idle micro-engine in the table is preferred, and otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither exists, the mapping table is kept unchanged; the selected micro-engine is moved to the head of the mapping table and the remaining entries move back in order; the mapping table of the other flow queue on that micro-engine is kept unchanged;
  • when the total packet count of the flow queue across all micro-engines is 0 for longer than a certain time threshold, all entries of the flow queue's mapping table are invalidated.
  • the many-core network processor provided by an embodiment of the present invention is composed of multiple micro-engines, where:
  • a plurality of the micro-engines form a group and a plurality of groups form a cluster, with a fully parallel structure within the groups and the clusters and a two-dimensional mesh structure between clusters;
  • in the middle of the cluster grid, a plurality of routing modules are disposed, themselves arranged in a two-dimensional mesh;
  • the routing module is configured to input and output packets to the four clusters above, below, left, and right of it, to access off-chip storage, and to access the co-processing module;
  • the routing module is configured to route packets, off-chip storage requests and returns, and co-processing-module requests and returns, completing the transfer of each packet, request, or return to its specified destination cluster or to the scheduling module.
  • the interconnection structure of the microengines in the many-core network processor is first introduced.
  • the number of micro-engines for network processors has grown to hundreds. Due to the large number and numerous routing nodes, the existing multi-core processor core interconnect structure and routing algorithms are no longer sufficient.
  • at present, the most commonly used micro-engine connection structure is the mesh, connected as a two-dimensional matrix: each micro-engine connects to a routing module through an interconnect interface so that packets or other data can traverse the micro-engine network to reach the destination module (e.g. off-chip storage, a coprocessor). Routing in this structure easily causes problems such as local congestion, load imbalance, and large delay.
  • FIG. 3 shows the micro-engine interconnect structure of a 256-core network processor with the conventional structure. With 256 routing nodes, the complexity of the routing algorithm, and the difficulty of achieving reliability and bandwidth, are evidently very large.
  • the embodiment of the present invention adopts a multi-level structure scheme, and the organizational structure has been described above, that is, the "cluster-group-me” hierarchical structure is adopted.
  • the number of microengines is me_num
  • the number of clusters is cluster_num
  • the number of groups in each cluster is group_num
  • the number of microengines in each group is group_me_num
  • Me_num = cluster_num × group_num × group_me_num
  • the 256-core network processor is taken as an example.
  • the scope of protection of the embodiments of the present invention is not limited to this example.
  • in this example, four micro-engines form a group, four groups form a cluster, and the 16 clusters adopt a two-dimensional mesh structure, as shown in FIG. 4.
  • the routing module is in the middle of the grid, and also adopts a two-dimensional mesh structure.
  • each routing module is responsible for inputting and outputting packets to the four clusters around it, accessing off-chip storage and the co-processing modules, etc.; the interconnection among the routing modules completes the routing of packets, off-chip storage requests and returns, and co-processing-module requests and returns, delivering each packet, request, or return to its specified destination cluster or to the scheduling module.
  • this structure has only four routing nodes; compared with the 256 nodes of the conventional structure, the count is greatly reduced, and the benefits are a simple routing algorithm, easy hardware implementation, and good load balancing.
  • inside a cluster, the micro-engines are organized with a group layer, with four MEs per group, as shown in FIG. 5.
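As a small illustration of the resulting "cluster-group-me" addressing for the 256-core example (16 clusters of 4 groups of 4 MEs), the following C sketch decomposes a global micro-engine id; the field names are assumptions.

```c
enum { GROUP_ME_NUM = 4, GROUP_NUM = 4, CLUSTER_NUM = 16 };
enum { ME_NUM = CLUSTER_NUM * GROUP_NUM * GROUP_ME_NUM }; /* = 256 */

typedef struct { unsigned cluster, group, me; } me_addr_t;

/* Decompose a global micro-engine id (0..255) per
 * Me_num = cluster_num * group_num * group_me_num. */
static me_addr_t me_addr(unsigned global_id)
{
    me_addr_t a;
    a.me      = global_id % GROUP_ME_NUM;
    a.group   = (global_id / GROUP_ME_NUM) % GROUP_NUM;
    a.cluster = global_id / (GROUP_ME_NUM * GROUP_NUM);
    return a;
}
```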
  • the packet mapping process of the present invention is shown in FIG. 6.
  • the packet requests a pointer from the free-pointer FIFO, is stored at the corresponding address in the cache, and the pointer is stored in the corresponding flow queue.
  • the flow queues use a polling arbitration mechanism and are dispatched to the mapping module, which maps the packet to a micro-engine.
  • the mapping module looks up the corresponding flow mapping table according to the flow number flow_num in the packet to obtain the current mapping result. If the lookup result is "invalid mapping", the current flow is a new flow or one whose mapping has been aged out, and it is mapped according to the following principles:
  • first, select the first completely idle micro-engine in the order of the micro-engines in the mapping table.
  • the purpose is that this micro-engine is the most recently used among the completely idle micro-engines, so valid instructions may remain in its instruction cache, which improves instruction cache efficiency.
  • if there is no completely idle micro-engine, choose a micro-engine that holds only one flow and has the most idle threads. By guaranteeing at most two flows per core, there is almost no loss of instruction cache efficiency, while the loss of micro-engine processing capability is greatly reduced.
  • if mapping fails, the packet is re-queued and waits to be remapped at the next poll.
  • a re-queued packet is mapped preferentially the next time its queue is polled.
  • if the lookup finds a valid mapping relationship, the micro-engine at the head of the mapping table is used as the processing micro-engine.
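Continuing the earlier sketches, the mapping module's per-packet decision could then look like this; MAP_RETRY corresponds to returning the packet to its flow queue for the next poll, and the names remain illustrative.

```c
typedef enum { MAP_OK, MAP_RETRY } map_result_t;

/* Consult the flow's table (selected upstream by flow_num); on an invalid
 * mapping, pick a micro-engine by the two preferences above, otherwise
 * ask for a retry; on a valid head, that micro-engine takes the packet. */
map_result_t map_packet(flow_map_t *m, const me_state_t me[],
                        unsigned short *chosen_me)
{
    if (!m->head_valid) {              /* new flow, or aged-out mapping */
        int pick = find_fully_idle(m, me);
        if (pick < 0)
            pick = find_single_flow_most_idle(m, me);
        if (pick < 0)
            return MAP_RETRY;          /* back to the flow queue */
        flow_map_promote(m, (unsigned)pick);
    }
    *chosen_me = m->entry[0];
    return MAP_OK;
}
```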
  • the dynamic update mechanism of the mapping table has been introduced in the foregoing summary.
  • based on the monitored micro-engine state, the mapping table automatically places the latest mapping result at the header, with the older mappings moving backwards in turn.
  • the update algorithm of the mapping table has been described in detail in the foregoing summary and is not repeated in the implementation examples.
  • through the mapping step and the dynamic update mechanism of the mapping table, packets of the same flow are sent to the most recently used micro-engine as far as possible, which improves instruction cache efficiency and greatly reduces packet reordering.
  • preferring a completely idle micro-engine, then the micro-engine with the most idle threads, and guaranteeing at most two flows per core also balances the load.
  • the serial micro-engine architecture is a mainstream architecture whose advantages are guaranteed performance and freedom from packet reordering.
  • All packets are sent from the first cluster.
  • the processing cluster number of the next level is specified by the microcode.
  • the routing module passes the packet to the next cluster according to the cluster number specified by the microcode, and packet processing is complete after the packet has passed through all the clusters, as shown in FIG. 7.
  • the invention also supports a serial-parallel hybrid mode.
  • a specific implementation example is shown in FIG. 8.
  • the 16 clusters are divided into four sets of four clusters each; within each set, packets are processed serially through the four clusters.
  • the four sets of clusters operate in parallel: after a packet arrives, a 1-to-4 scheduling step sends it to one of the four sets; the packet then passes through the four clusters of the selected set in series; finally, a 4-to-1 convergence step outputs the packet.
  • an embodiment of the invention further describes a storage medium storing a computer program configured to execute the packet scheduling method based on micro-engines in a many-core network processor of the foregoing embodiments.
  • the disclosed method and device may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a division by logical function; in actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
  • the units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units, and some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • the functional units in the embodiments of the present invention may all be integrated into one processing unit, or each unit may serve as a unit on its own, or two or more units may be integrated into one unit.
  • the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • when a packet is input, a free pointer is requested for the packet; the packet is stored at the location in the shared cache to which the pointer points, and the pointer is stored into the corresponding flow queue, the flow queues being scheduled by polling;
  • when a pointer in a flow queue is scheduled, the micro-engine corresponding to the pointer is looked up so as to map the packet corresponding to the pointer to that micro-engine; when no micro-engine corresponding to the pointer is found, the pointer is stored back into the corresponding flow queue.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention discloses a method and system for packet scheduling using a many-core network processor and micro-engines thereof. The method comprises: upon input of a packet, requesting an idle pointer for the packet; storing the packet at the location in a shared buffer to which the pointer points, and storing the pointer into a corresponding stream queue, wherein scheduling of the stream queues is performed by polling; when the pointer in a stream queue is scheduled, searching for a micro-engine corresponding to the pointer to map the packet corresponding to the pointer to the micro-engine; and if a micro-engine corresponding to the pointer is not found, re-storing the pointer into the corresponding stream queue.

Description

Packet scheduling method, system, and storage medium for a many-core network processor and its micro-engines

Technical Field

The present invention relates to network processor technologies, and in particular to a packet scheduling method, system, and storage medium for a many-core network processor and its micro-engines.

Background

The network processor is the core component of the forwarding plane in the data communication field. It is a solution that balances processor speed with flexibility of use: service microcode can be modified flexibly to meet the needs of all kinds of basic and complex network services, making services easy to extend and upgrade.

Driven by the rapid development of network services, ever higher processing capability is demanded of network processors, and a single micro-engine or a small number of micro-engines is far from sufficient. The main current remedies are raising the system clock frequency and increasing the number of cores. As for raising the clock frequency, semiconductor process development has lagged far behind the demand for higher processing capability, so relying solely on new processes to raise the system clock can no longer meet that demand. The processing capability of high-end network processors has already reached more than 500 Gbps, while micro-engine clock frequencies generally lie in the 1 GHz to 2 GHz range, so a few or even a few dozen micro-engines simply cannot reach the required capability. Network processors with a many-core architecture have therefore become an inevitable choice. The number of micro-engines in a network processor can be estimated simply with the following formula:
[Formula for estimating Me_num, provided as image PCTCN2016088163-appb-000001 in the original.]
where Me_num is the number of micro-engines, Performance is the processing capability (in bps), Pkt_len is the packet length, Instr_num is the number of service microcode instructions, and Freq is the system clock frequency.

Analyzing network-processor performance requirements with the above formula, the number of micro-engines in future commercial network processors can be estimated at 256 or more, which is necessarily a many-core processor. This huge number of micro-engines brings a series of problems.

First, with so many micro-engines, the organization among them, i.e. the on-chip network and its routing, becomes one of the keys to performance. Common on-chip networks include ring, mesh, torus, tree, and disk topologies; as the number of micro-engines grows, the number of on-chip network nodes increases sharply, and the bandwidth, delay, and load-imbalance problems caused by the routing algorithm become ever more serious.

Second, the mapping of packets onto micro-engines is also a complex problem. Many mapping algorithms have been proposed, but most improve only one aspect or are too complicated to implement in hardware. In network traffic, packets of the same data flow share upper-layer application locality, and the order of packets entering and leaving the network processor must be preserved as far as possible to avoid timeout retransmissions at the upper layers. Load balancing determines whether processing capability can be fully used and how long processing latency is. In addition, micro-engine instruction storage generally uses a cache structure to improve fetch efficiency, and if packets of the same processing flow can be handled by the same core, instruction cache efficiency improves. A good mapping algorithm therefore needs to strike the best balance among order preservation, load balancing, and instruction cache efficiency; this is one of the difficulties and research hotspots of many-core network processors.
Summary of the Invention

To solve the above technical problems, embodiments of the present invention provide a packet scheduling method, system, and storage medium for a many-core network processor and its micro-engines.

The packet scheduling method based on micro-engines in a many-core network processor provided by an embodiment of the present invention includes:

when a packet is input, requesting a free pointer for the packet;

storing the packet at the location in the shared cache to which the pointer points, and storing the pointer into the corresponding flow queue, the flow queues being scheduled among themselves by polling;

when a pointer in a flow queue is scheduled, looking up the micro-engine corresponding to the pointer so as to map the packet corresponding to the pointer to that micro-engine; and

when no micro-engine corresponding to the pointer is found, storing the pointer back into the corresponding flow queue.
In an embodiment of the present invention, each flow queue maintains a dynamic mapping table; in the dynamic mapping table of each flow queue, the initial values of entry 0 through entry N-1 are micro-engine 0 through micro-engine N-1 in order, where N is the total number of micro-engines.

In an embodiment of the present invention, the method further includes: updating the dynamic mapping table according to updates of the packet status.

In an embodiment of the present invention, updating the state of the micro-engines in the dynamic mapping table according to updates of the packet status includes: after the packet completes mapping or finishes processing in a micro-engine, updating the following state:

a flag indicating whether the micro-engine is completely idle;

a flag indicating whether the micro-engine is occupied by one or two flow queues;

the number of idle threads in the micro-engine;

when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies; and

the total packet count of the flow queue across all micro-engines.
In an embodiment of the present invention, the method further includes: after the state update, updating the dynamic mapping table according to the updated state under the following rules:

when the micro-engine at the head of the flow queue's mapping table has no packets or only one flow queue, and has an idle thread, the mapping table is not updated;

when the micro-engine at the head of the flow queue's mapping table has only one flow queue but no idle threads, a micro-engine is selected starting from the table head, in table order, under the following rules:

a completely idle micro-engine is preferred; otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither is satisfied, an invalid flag is written into the mapping table; according to these rules, the selected micro-engine number is moved to the head of the mapping table, and the remaining entries in the table move back in order;

when the micro-engine at the head of the flow queue's mapping table has two flow queues and has idle threads, the mapping table is updated under the following rules:

when one flow queue in the micro-engine occupies more threads than the other, and a completely idle micro-engine exists, the flow queue occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine nearer the front of the mapping table;

when one flow queue in the micro-engine occupies more threads than the other, and no completely idle micro-engine exists, the mapping relationship remains unchanged;

when the micro-engine at the head of the flow queue's mapping table has two flow queues and no idle threads, the mapping table is updated under the following rules:

for the mapping table of the flow queue occupying fewer threads, the first completely idle micro-engine in the mapping table is preferred, and otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither exists, the mapping table is kept unchanged; the selected micro-engine is moved to the head of the mapping table and the remaining entries move back in order; for the other flow queue on that micro-engine, the mapping table is kept unchanged;

when the total packet count of the flow queue across all micro-engines is 0 for longer than a certain time threshold, all entries of the flow queue's mapping table are invalidated.
The packet scheduling system based on micro-engines in a many-core network processor provided by an embodiment of the present invention includes:

a request unit configured to request a free pointer for a packet when the packet is input;

a storage unit configured to store the packet at the location in the shared cache to which the pointer points, and to store the pointer into the corresponding flow queue, the flow queues being scheduled among themselves by polling;

a scheduling unit configured to, when a pointer in a flow queue is scheduled, look up the micro-engine corresponding to the pointer so as to map the packet corresponding to the pointer to that micro-engine; and

the storage unit being further configured to store the pointer back into the corresponding flow queue when no micro-engine corresponding to the pointer is found.
In an embodiment of the present invention, each flow queue maintains a dynamic mapping table; in the dynamic mapping table of each flow queue, the initial values of entry 0 through entry N-1 are micro-engine 0 through micro-engine N-1 in order, where N is the total number of micro-engines.

In an embodiment of the present invention, the system further includes: an update unit configured to update the dynamic mapping table according to updates of the packet status.

In an embodiment of the present invention, the update unit is further configured to update the following state after the packet completes mapping or finishes processing in a micro-engine:

a flag indicating whether the micro-engine is completely idle;

a flag indicating whether the micro-engine is occupied by one or two flow queues;

the number of idle threads in the micro-engine;

when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies; and

the total packet count of the flow queue across all micro-engines.
In an embodiment of the present invention, the update unit is further configured to, after the state update, update the dynamic mapping table according to the updated state under the following rules:

when the micro-engine at the head of the flow queue's mapping table has no packets or only one flow queue, and has an idle thread, the mapping table is not updated;

when the micro-engine at the head of the flow queue's mapping table has only one flow queue but no idle threads, a micro-engine is selected starting from the table head, in table order, under the following rules:

a completely idle micro-engine is preferred; otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither is satisfied, an invalid flag is written into the mapping table; according to these rules, the selected micro-engine number is moved to the head of the mapping table, and the remaining entries in the table move back in order;

when the micro-engine at the head of the flow queue's mapping table has two flow queues and has idle threads, the mapping table is updated under the following rules:

when one flow queue in the micro-engine occupies more threads than the other, and a completely idle micro-engine exists, the flow queue occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine nearer the front of the mapping table;

when one flow queue in the micro-engine occupies more threads than the other, and no completely idle micro-engine exists, the mapping relationship remains unchanged;

when the micro-engine at the head of the flow queue's mapping table has two flow queues and no idle threads, the mapping table is updated under the following rules:

for the mapping table of the flow queue occupying fewer threads, the first completely idle micro-engine in the mapping table is preferred, and otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither exists, the mapping table is kept unchanged; the selected micro-engine is moved to the head of the mapping table and the remaining entries move back in order; for the other flow queue on that micro-engine, the mapping table is kept unchanged;

when the total packet count of the flow queue across all micro-engines is 0 for longer than a certain time threshold, all entries of the flow queue's mapping table are invalidated.
The many-core network processor provided by an embodiment of the present invention is composed of a plurality of micro-engines, where:

a plurality of the micro-engines form a group and a plurality of groups form a cluster, with a fully parallel structure within the groups and the clusters and a two-dimensional mesh structure between clusters; in the middle of the cluster grid, a plurality of routing modules are disposed, themselves arranged in a two-dimensional mesh;

the routing module is configured to input and output packets to the four clusters above, below, left, and right of it, to access off-chip storage, and to access the co-processing module; and

the routing module is configured to route packets, off-chip storage requests and returns, and co-processing-module requests and returns, completing the transfer of each packet, request, or return to its specified destination cluster or to the scheduling module.

The storage medium provided by an embodiment of the present invention stores a computer program configured to execute the above packet scheduling method based on micro-engines in a many-core network processor.

In the technical solution of the embodiments of the present invention, the many-core network processor is composed of a plurality of micro-engines, where a plurality of micro-engines form a group, a plurality of groups form a cluster, a fully parallel structure is used within the groups and the clusters, a two-dimensional mesh structure is used between clusters, and a plurality of routing modules, themselves in a two-dimensional mesh, are disposed in the middle of the cluster grid. When a packet is input, a free pointer is requested for the packet; the packet is stored at the location in the shared cache to which the pointer points, and the pointer is stored into the corresponding flow queue, the flow queues being scheduled by polling; when a pointer in a flow queue is scheduled, the micro-engine corresponding to the pointer is looked up so as to map the corresponding packet to it; and when no corresponding micro-engine is found, the pointer is stored back into the corresponding flow queue. This reduces packet reordering within a flow, improves micro-engine instruction cache efficiency, and achieves a balance among load balancing, packet order preservation, and instruction cache efficiency, meeting the needs of high-performance forwarding.
Brief Description of the Drawings

FIG. 1 is a schematic flowchart of a packet scheduling method based on micro-engines in a many-core network processor according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a packet scheduling system based on micro-engines in a many-core network processor according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of the micro-engines of a 256-core network processor with a conventional structure;

FIG. 4 is a schematic structural diagram of the micro-engines of a 256-core processor with a multi-level structure;

FIG. 5 is a schematic diagram of the micro-engine hierarchy inside a cluster;

FIG. 6 is a schematic diagram of the packet mapping process according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of an implementation example of the serial mode according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of an implementation example of the serial-parallel hybrid mode according to an embodiment of the present invention.
Detailed Description

To make the features and technical content of the embodiments of the present invention easier to understand in detail, their implementation is described below with reference to the accompanying drawings, which are provided for reference and illustration only and are not intended to limit the embodiments of the present invention.

FIG. 1 is a schematic flowchart of the packet scheduling method based on micro-engines in a many-core network processor according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:

Step 101: when a packet is input, request a free pointer for the packet.

In the embodiment of the present invention, the threads and states of all micro-engines are monitored in real time by a monitoring module to implement global scheduling at thread granularity.

Virtual output queues (VOQ) are used to prevent head-of-line blocking of packet mapping between different flows. After each packet is input, a free pointer is requested first, the packet is stored at the location the pointer points to in the shared cache, and the pointer is stored in the queue of the corresponding flow. The flow queues are output to the mapping module by polling scheduling (fair, weighted, priority-based, or other polling policies may all be used). Packets that fail to map are returned to the corresponding flow queue and are mapped again next time (a priority or other scheme may be used).
Each flow queue maintains a dynamic mapping table; in the dynamic mapping table of each flow queue, the initial values of entry 0 through entry N-1 are micro-engine 0 through micro-engine N-1 in order, where N is the total number of micro-engines, i.e. micro-engine 0 is preferred at the start.

Step 102: store the packet at the location in the shared cache to which the pointer points, and store the pointer into the corresponding flow queue, where the flow queues are scheduled among themselves by polling.

Step 103: when a pointer in a flow queue is scheduled, look up the micro-engine corresponding to the pointer so as to map the packet corresponding to the pointer to that micro-engine.

Step 104: when no micro-engine corresponding to the pointer is found, store the pointer back into the corresponding flow queue.

In the embodiment of the present invention, the method further includes: updating the dynamic mapping table according to updates of the packet status.
Specifically, after the packet completes mapping or finishes processing in a micro-engine, the following state is updated:

a flag indicating whether the micro-engine is completely idle;

a flag indicating whether the micro-engine is occupied by one or two flow queues;

the number of idle threads in the micro-engine;

when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies; and

the total packet count of the flow queue across all micro-engines.
In the embodiment of the present invention, after the state update, the dynamic mapping table is updated according to the updated state under the following rules:

when the micro-engine at the head of the flow queue's mapping table has no packets or only one flow queue, and has an idle thread, the mapping table is not updated;

when the micro-engine at the head of the flow queue's mapping table has only one flow queue but no idle threads, a micro-engine is selected starting from the table head, in table order, under the following rules:

a completely idle micro-engine is preferred; otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither is satisfied, an invalid flag is written into the mapping table; according to these rules, the selected micro-engine number is moved to the head of the mapping table, and the remaining entries in the table move back in order;

when the micro-engine at the head of the flow queue's mapping table has two flow queues and has idle threads, the mapping table is updated under the following rules:

when one flow queue in the micro-engine occupies more threads than the other, and a completely idle micro-engine exists, the flow queue occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine nearer the front of the mapping table;

when one flow queue in the micro-engine occupies more threads than the other, and no completely idle micro-engine exists, the mapping relationship remains unchanged;

when the micro-engine at the head of the flow queue's mapping table has two flow queues and no idle threads, the mapping table is updated under the following rules:

for the mapping table of the flow queue occupying fewer threads, the first completely idle micro-engine in the mapping table is preferred, and otherwise a micro-engine with only one flow queue and the most idle threads is selected; if neither exists, the mapping table is kept unchanged; the selected micro-engine is moved to the head of the mapping table and the remaining entries move back in order; for the other flow queue on that micro-engine, the mapping table is kept unchanged;

when the total packet count of the flow queue across all micro-engines is 0 for longer than a certain time threshold, all entries of the flow queue's mapping table are invalidated.
The rules above are explained further below.
(1) If the micro-engine at the head of a flow's mapping table has no packets, or holds only one flow and has idle threads, the mapping table needs no update.
(2) If the micro-engine at the head of a flow's mapping table holds only one flow but has no idle threads, micro-engines are examined starting from the head, in table order, and selected by the following algorithm:
A completely idle micro-engine is preferred.
Failing that, a micro-engine holding only one flow and having the most idle threads is selected.
If neither exists, an invalid flag is written to the mapping table: packets of this flow temporarily cannot be mapped to any micro-engine and remain buffered in the queue.
Under these rules, the number of the selected micro-engine is moved to the head of the mapping table, and the remaining entries are shifted back one position.
(3) If the micro-engine at the head of a flow's mapping table holds two flows and has idle threads, the mapping table is updated by the following algorithm:
To keep a micro-engine from permanently holding two flows, if one flow in the micro-engine occupies more threads than the other and a completely idle micro-engine exists, the flow occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine closest to the head of the mapping table.
If one flow in the micro-engine occupies more threads than the other but no completely idle micro-engine exists, the mapping is left unchanged.
(4) If the micro-engine at the head of a flow's mapping table holds two flows but has no idle threads, the mapping table is updated by the following algorithm:
For the mapping table of the flow occupying fewer threads, the completely idle micro-engine closest to the head of the table is preferred; failing that, a micro-engine holding only one flow and having the most idle threads is selected; otherwise the mapping table is left unchanged. The selected micro-engine is moved to the head of the mapping table, and the remaining entries are shifted back.
For the other flow in the micro-engine, the mapping is left unchanged.
(5) If the total number of packets of a flow across all micro-engines is 0 for longer than a time threshold th, all of the flow's mapping table entries are invalidated. The rationale: when a flow has seen no new packets for a while, the micro-engine it previously occupied may have been taken over by other flows, so preferring a completely idle micro-engine for the flow's next packets balances the load better.
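A sketch of how rules (1), (2), and (5) could be realized is given below, reusing the MicroEngineStatus and FlowStatus fields assumed earlier; rules (3) and (4) apply the same pick-and-promote helpers to the flow holding fewer threads and are omitted for brevity. INVALID and the threshold th are illustrative stand-ins.

    INVALID = -1  # illustrative marker for an invalidated table entry

    def pick_engine(table, engines):
        """Rules (2)/(4): first fully idle engine in table order, else the
        single-flow engine with the most idle threads, else no candidate."""
        for me in table:
            if me != INVALID and engines[me].fully_idle:
                return me
        candidates = [me for me in table
                      if me != INVALID
                      and engines[me].occupying_flows == 1
                      and engines[me].idle_threads > 0]
        if candidates:
            return max(candidates, key=lambda me: engines[me].idle_threads)
        return None

    def promote(table, me):
        """Move the chosen engine to the table head, shifting the rest back."""
        table.remove(me)
        table.insert(0, me)

    def update_mapping_table(flow, table, engines, now, th):
        # Rule (5): flow drained for longer than th -> invalidate the whole table.
        if flow.packets_in_flight == 0 and now - flow.idle_since > th:
            table[:] = [INVALID] * len(table)
            return
        head = engines[table[0]] if table[0] != INVALID else None
        # Rule (1): head has no packets, or one flow with idle threads -> no change.
        if head and (head.occupying_flows == 0
                     or (head.occupying_flows == 1 and head.idle_threads > 0)):
            return
        # Rule (2): pick a better engine, or mark the head entry invalid so the
        # flow's packets stay buffered in the queue for now.
        chosen = pick_engine(table, engines)
        if chosen is not None:
            promote(table, chosen)
        else:
            table[0] = INVALID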
FIG. 2 is a schematic structural diagram of a packet scheduling system based on micro-engines in a many-core network processor according to an embodiment of the present invention. As shown in FIG. 2, the system includes:
an application unit 21, configured to request a free pointer for a packet when the packet arrives;
a storage unit 22, configured to store the packet in the shared buffer at the location the pointer points to, and to store the pointer in the corresponding flow queue, where the flow queues are scheduled among themselves in a round-robin manner;
a scheduling unit 23, configured to look up, when a pointer in a flow queue is scheduled, the micro-engine corresponding to the pointer, so as to map the packet corresponding to the pointer to that micro-engine;
the storage unit 22 being further configured to store the pointer back in the corresponding flow queue when no micro-engine corresponding to the pointer is found.
Each flow queue maintains a dynamic mapping table. In the dynamic mapping table of each flow queue, the initial values of entry 0 through entry N-1 are micro-engine 0 through micro-engine N-1, respectively;
where N is the total number of micro-engines.
The system further includes:
an update unit 24, configured to update the dynamic mapping table according to updates of the packet status.
The update unit 24 is further configured to update the following status items after a packet completes mapping or finishes processing in a micro-engine:
a flag indicating whether the micro-engine is completely idle;
a flag indicating whether the micro-engine is occupied by one flow queue or by two;
the number of idle threads in the micro-engine;
when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies;
a count of the total number of packets of the flow queue across all micro-engines.
The update unit 24 is further configured to, after the status update, update the dynamic mapping table from the new status according to the following rules:
when the micro-engine at the head of the flow queue's mapping table has no packets, or holds only one flow queue and has idle threads, the mapping table is not updated;
when the micro-engine at the head of the flow queue's mapping table holds only one flow queue but has no idle threads, micro-engines are examined starting from the head, in table order, and selected according to the following rules:
a completely idle micro-engine is preferred; failing that, a micro-engine holding only one flow queue and having the most idle threads is selected; if neither exists, an invalid flag is written to the mapping table. Under these rules, the number of the selected micro-engine is moved to the head of the mapping table, and the remaining entries are shifted back one position;
when the micro-engine at the head of the flow queue's mapping table holds two flow queues and has idle threads, the mapping table is updated according to the following rules:
when one flow queue in the micro-engine occupies more threads than the other and a completely idle micro-engine exists, the flow queue occupying fewer threads is remapped to a completely idle micro-engine, preferring the idle micro-engine closest to the head of the mapping table;
when one flow queue in the micro-engine occupies more threads than the other but no completely idle micro-engine exists, the mapping is left unchanged;
when the micro-engine at the head of the flow queue's mapping table holds two flow queues and has no idle threads, the mapping table is updated according to the following rules:
for the mapping table of the flow queue occupying fewer threads, the completely idle micro-engine closest to the head of the table is preferred; failing that, a micro-engine holding only one flow queue and having the most idle threads is selected; otherwise the mapping table is left unchanged. The selected micro-engine is moved to the head of the mapping table and the remaining entries are shifted back; for the other flow queue in the micro-engine, the mapping table is left unchanged;
when the total number of packets of the flow queue across all micro-engines is 0 for longer than a certain time threshold, all entries of the flow queue's mapping table are invalidated.
The many-core network processor provided by the embodiments of the present invention is composed of multiple micro-engines, where:
several micro-engines form a group, and several groups form a cluster; the structure between the groups and within a cluster is fully parallel; the clusters are connected in a two-dimensional mesh; in the middle of the cluster mesh, multiple routing modules are arranged, the routing modules themselves being connected in a two-dimensional mesh;
the routing module is configured to input and output packets to the four clusters above, below, left of, and right of it, to access off-chip storage, and to access the co-processing module;
the routing module is configured to route packets, off-chip storage requests and returns, and co-processing module requests and returns, completing the transfer of a packet, request, or return to the specified destination cluster or scheduling module.
To better understand the embodiments of the present invention, the interconnect structure of the micro-engines in a many-core network processor is introduced first. The number of micro-engines in a network processor has now grown into the hundreds; with so many engines and routing nodes, existing multi-core interconnect structures and routing algorithms no longer suffice. The most common micro-engine interconnect is the mesh, i.e., a two-dimensional matrix in which every micro-engine connects through an interconnect interface to a routing module, so that packets or other data can traverse the micro-engine network to a destination module (e.g., off-chip storage or a coprocessor). Routing over such a structure easily suffers from local congestion, load imbalance, and high latency. Taking a 256-core network processor as an example, FIG. 3 shows the micro-engine interconnect of a 256-core network processor with the conventional structure; the difficulty in routing-algorithm complexity, reliability, bandwidth, and realizability is evidently very high.
The embodiments of the present invention adopt a multi-level scheme, with the organizational structure already described, i.e., a cluster-group-me hierarchy. With me_num micro-engines in total, cluster_num clusters, group_num groups per cluster, and group_me_num micro-engines per group, the relationship among them is:
me_num = cluster_num × group_num × group_me_num
The parameters in the formula should be chosen with factors such as bandwidth and back-end implementation in mind.
Taking a 256-core network processor as an example (the scope of protection of the embodiments of the present invention is not limited to this example): 4 micro-engines form a group, 4 groups form a cluster, and 16 clusters are connected in a two-dimensional mesh, as shown in FIG. 4. The routing modules sit in the middle of the grid, also in a two-dimensional mesh. Each routing module inputs and outputs packets to the four clusters above, below, left of, and right of it, and accesses off-chip storage, the co-processing modules, and so on; the interconnect between routing modules routes packets, off-chip storage requests and returns, and co-processing module requests and returns, completing the transfer of a packet, request, or return to the specified destination cluster or scheduling module. This structure has only 4 routing nodes, versus 256 in the conventional structure; with far fewer nodes, the routing algorithm is simple, easy to implement in hardware, and load balancing is easy to achieve. Inside a cluster, the micro-engines are organized with a group layer, each group containing 4 micro-engines, as shown in FIG. 5.
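For the 256-core example, the formula checks out as follows (parameter names as in the formula above):

    group_me_num = 4   # micro-engines per group
    group_num = 4      # groups per cluster
    cluster_num = 16   # clusters in the 2-D mesh

    me_num = cluster_num * group_num * group_me_num
    assert me_num == 256   # the 256-core example, served by only 4 routing nodes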
The packet mapping process of the present invention is shown in FIG. 6. A packet requests a pointer from the free-pointer FIFO, the packet is stored at the corresponding address in the buffer, and the pointer is placed in the corresponding flow queue. The flow queues are arbitrated by round-robin and dispatched to the mapping module, which maps packets to micro-engines. Using the flow number flow_num in the packet, the corresponding flow mapping table is consulted to obtain the current mapping result. If the lookup yields "invalid mapping", the current flow is either a new flow or a flow whose mapping has been aged out and deleted, and it is mapped by the following rules:
If a completely idle micro-engine exists, the first completely idle micro-engine in mapping-table order is selected. The point of this is that this engine is, among the completely idle micro-engines, the most recently used one, so valid instructions may still reside in its instruction cache, improving instruction cache efficiency.
If no completely idle micro-engine exists, a micro-engine holding exactly one flow and having the most idle threads is selected. Guaranteeing at most 2 flows per core costs almost no instruction cache efficiency, while greatly reducing wasted micro-engine processing capacity.
If neither condition is satisfied, the packet is re-enqueued and waits for the next polling round to be remapped. A re-enqueued packet is mapped with priority the next time its queue is polled.
If the lookup finds a valid mapping, the micro-engine at the head of the mapping table is used as the processing micro-engine. The dynamic update mechanism of the mapping table was introduced in the summary above: based on the monitored micro-engine status, the table automatically places the latest mapping result at the head, with older mappings shifted back in turn.
The dynamic update of the mapping table has been described in detail in the summary above and is not repeated in this implementation example.
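Combining the table lookup with the new-flow rules, a scheduled pointer could be dispatched as in the sketch below, which reuses the helpers assumed in the earlier fragments (pick_engine, promote, INVALID, and the scheduler's requeue); map_packet is a hypothetical name, and the returned engine number stands in for the hand-off to the chosen micro-engine.

    def map_packet(scheduler, flow_id, ptr, tables, engines):
        """Map one scheduled packet to a micro-engine per the flow's mapping table."""
        table = tables[flow_id]
        if table[0] != INVALID:
            return table[0]                   # valid mapping: use the table head
        # Invalid mapping: a new flow, or a flow whose mapping aged out.
        chosen = pick_engine(table, engines)  # fully idle first, then single-flow engine
        if chosen is None:
            scheduler.requeue(flow_id, ptr)   # no room: back to the flow queue,
            return None                       # preferred on the next polling round
        promote(table, chosen)
        return chosen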
With the mapping steps and the dynamic mapping-table update mechanism above, packets of the same flow are sent, as far as possible, to the most recently used micro-engine for processing; instruction cache efficiency improves, and packet reordering is greatly reduced.
In addition, preferring completely idle micro-engines, preferring the micro-engine with the most idle threads, and guaranteeing at most 2 flows per core together keep packets evenly distributed across the micro-engines.
Although the micro-engine organization described above is fully parallel, another advantage of the present invention is that it can also support a serial organization. For a long time, the serial micro-engine structure was the mainstream architecture in commercial network processors, with advantages such as guaranteed performance and no reordering. An implementation of the serial structure under the present invention is as follows:
All packets enter through the first cluster. After processing, microcode designates the next-level processing cluster number; when a packet leaves a cluster, the routing module forwards it to the next cluster designated by the microcode, and so on until the packet has passed through all clusters and processing is complete, as shown in FIG. 7.
The present invention also supports a serial-parallel hybrid. A concrete example is shown in FIG. 8: the clusters are arranged in groups of 4, and within each group packets are processed serially, stage by stage, across the 4 clusters. The 4 groups of clusters run in parallel. When a packet arrives, a 1-to-4 dispatch first sends it to one of the 4 cluster groups; the packet then passes through the 4 clusters of the selected group in turn; finally, the outputs of the 4 groups pass through a 4-to-1 aggregation and the packet is emitted.
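The hybrid arrangement of FIG. 8 reduces to a few lines; the sketch below uses assumed names (chains, cluster.process, packet.flow_id), and the flow-hash dispatch is purely illustrative, since the text does not specify the 1-to-4 dispatch policy.

    def process_hybrid(packet, chains):
        """4 parallel chains, each a serial pipeline of 4 clusters (illustrative)."""
        chain = chains[hash(packet.flow_id) % len(chains)]  # 1-to-4 dispatch
        for cluster in chain:           # serial: microcode names the next cluster
            packet = cluster.process(packet)
        return packet                   # the 4-to-1 aggregation emits the result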
An embodiment of the present invention further describes a storage medium storing a computer program, the computer program being configured to execute the micro-engine-based packet scheduling method for a many-core network processor of the foregoing embodiments.
The technical solutions described in the embodiments of the present invention may be combined arbitrarily as long as there is no conflict.
In the several embodiments provided by the present invention, it should be understood that the disclosed method and intelligent device may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connections between the components shown or discussed may be indirect coupling or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments.
In addition, the functional units in the embodiments of the present invention may all be integrated into one second processing unit, or each unit may stand alone as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed by the present invention, and these should all be covered by the scope of protection of the present invention.
Industrial applicability
According to the present invention, when a packet arrives, a free pointer is requested for it; the packet is stored in the shared buffer at the location the pointer points to, and the pointer is stored in the corresponding flow queue, the flow queues being scheduled among themselves in a round-robin manner; when a pointer in a flow queue is scheduled, the micro-engine corresponding to the pointer is looked up, so as to map the packet corresponding to the pointer to that micro-engine; when no micro-engine corresponding to the pointer is found, the pointer is stored back in the corresponding flow queue. In this way, reordering of packets within the same flow is reduced and micro-engine instruction cache efficiency is improved, achieving a balance among load balancing, packet order preservation, and instruction cache efficiency and meeting the needs of high-performance forwarding.

Claims (12)

  1. A packet scheduling method based on micro-engines in a many-core network processor, the method comprising:
    when a packet arrives, requesting a free pointer for the packet;
    storing the packet in the shared buffer at the location the pointer points to, and storing the pointer in the corresponding flow queue, wherein the flow queues are scheduled among themselves in a round-robin manner;
    when a pointer in a flow queue is scheduled, looking up the micro-engine corresponding to the pointer, so as to map the packet corresponding to the pointer to the micro-engine;
    when no micro-engine corresponding to the pointer is found, storing the pointer back in the corresponding flow queue.
  2. The packet scheduling method based on micro-engines in a many-core network processor according to claim 1, wherein each flow queue maintains a dynamic mapping table, and in the dynamic mapping table of each flow queue, the initial values of entry 0 through entry N-1 are micro-engine 0 through micro-engine N-1, respectively;
    where N is the total number of micro-engines.
  3. The packet scheduling method based on micro-engines in a many-core network processor according to claim 2, wherein the method further comprises:
    updating the dynamic mapping table according to updates of the packet status.
  4. The packet scheduling method based on micro-engines in a many-core network processor according to claim 3, wherein the updating the status of the micro-engines in the dynamic mapping table according to updates of the packet status comprises:
    after a packet completes mapping or finishes processing in a micro-engine, updating the following status items:
    a flag indicating whether the micro-engine is completely idle;
    a flag indicating whether the micro-engine is occupied by one flow queue or by two;
    the number of idle threads in the micro-engine;
    when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies;
    a count of the total number of packets of the flow queue across all micro-engines.
  5. The packet scheduling method based on micro-engines in a many-core network processor according to claim 4, wherein the method further comprises:
    after the status update, updating the dynamic mapping table from the updated status according to the following rules:
    when the micro-engine at the head of the flow queue's mapping table has no packets, or holds only one flow queue and has idle threads, not updating the mapping table;
    when the micro-engine at the head of the flow queue's mapping table holds only one flow queue but has no idle threads, examining micro-engines starting from the head, in the order of the mapping table, and selecting one according to the following rules:
    preferring a completely idle micro-engine; failing that, selecting a micro-engine holding only one flow queue and having the most idle threads; if neither exists, writing an invalid flag to the mapping table; under these rules, moving the number of the selected micro-engine to the head of the mapping table, with the remaining entries shifted back one position;
    when the micro-engine at the head of the flow queue's mapping table holds two flow queues and has idle threads, updating the mapping table according to the following rules:
    when one flow queue in the micro-engine occupies more threads than the other and a completely idle micro-engine exists, remapping the flow queue occupying fewer threads to a completely idle micro-engine, preferring the idle micro-engine closest to the head of the mapping table;
    when one flow queue in the micro-engine occupies more threads than the other but no completely idle micro-engine exists, keeping the mapping unchanged;
    when the micro-engine at the head of the flow queue's mapping table holds two flow queues and has no idle threads, updating the mapping table according to the following rules:
    for the mapping table of the flow queue occupying fewer threads, preferring the completely idle micro-engine closest to the head of the mapping table, failing that, selecting a micro-engine holding only one flow queue and having the most idle threads, and otherwise keeping the mapping table unchanged; moving the selected micro-engine to the head of the mapping table, with the remaining entries shifted back; for the other flow queue in the micro-engine, keeping the mapping table unchanged;
    when the total number of packets of the flow queue across all micro-engines is 0 for longer than a certain time threshold, invalidating all entries of the flow queue's mapping table.
  6. A packet scheduling system based on micro-engines in a many-core network processor, the system comprising:
    an application unit, configured to request a free pointer for a packet when the packet arrives;
    a storage unit, configured to store the packet in the shared buffer at the location the pointer points to, and to store the pointer in the corresponding flow queue, wherein the flow queues are scheduled among themselves in a round-robin manner;
    a scheduling unit, configured to look up, when a pointer in a flow queue is scheduled, the micro-engine corresponding to the pointer, so as to map the packet corresponding to the pointer to the micro-engine;
    the storage unit being further configured to store the pointer back in the corresponding flow queue when no micro-engine corresponding to the pointer is found.
  7. The packet scheduling system based on micro-engines in a many-core network processor according to claim 6, wherein each flow queue maintains a dynamic mapping table, and in the dynamic mapping table of each flow queue, the initial values of entry 0 through entry N-1 are micro-engine 0 through micro-engine N-1, respectively;
    where N is the total number of micro-engines.
  8. The packet scheduling system based on micro-engines in a many-core network processor according to claim 7, wherein the system further comprises:
    an update unit, configured to update the dynamic mapping table according to updates of the packet status.
  9. The packet scheduling system based on micro-engines in a many-core network processor according to claim 8, wherein the update unit is further configured to update the following status items after a packet completes mapping or finishes processing in a micro-engine:
    a flag indicating whether the micro-engine is completely idle;
    a flag indicating whether the micro-engine is occupied by one flow queue or by two;
    the number of idle threads in the micro-engine;
    when the micro-engine is occupied by two flow queues, the number of threads each flow queue occupies;
    a count of the total number of packets of the flow queue across all micro-engines.
  10. The packet scheduling system based on micro-engines in a many-core network processor according to claim 9, wherein the update unit is further configured to, after the status update, update the dynamic mapping table from the updated status according to the following rules:
    when the micro-engine at the head of the flow queue's mapping table has no packets, or holds only one flow queue and has idle threads, not updating the mapping table;
    when the micro-engine at the head of the flow queue's mapping table holds only one flow queue but has no idle threads, examining micro-engines starting from the head, in the order of the mapping table, and selecting one according to the following rules:
    preferring a completely idle micro-engine; failing that, selecting a micro-engine holding only one flow queue and having the most idle threads; if neither exists, writing an invalid flag to the mapping table; under these rules, moving the number of the selected micro-engine to the head of the mapping table, with the remaining entries shifted back one position;
    when the micro-engine at the head of the flow queue's mapping table holds two flow queues and has idle threads, updating the mapping table according to the following rules:
    when one flow queue in the micro-engine occupies more threads than the other and a completely idle micro-engine exists, remapping the flow queue occupying fewer threads to a completely idle micro-engine, preferring the idle micro-engine closest to the head of the mapping table;
    when one flow queue in the micro-engine occupies more threads than the other but no completely idle micro-engine exists, keeping the mapping unchanged;
    when the micro-engine at the head of the flow queue's mapping table holds two flow queues and has no idle threads, updating the mapping table according to the following rules:
    for the mapping table of the flow queue occupying fewer threads, preferring the completely idle micro-engine closest to the head of the mapping table, failing that, selecting a micro-engine holding only one flow queue and having the most idle threads, and otherwise keeping the mapping table unchanged; moving the selected micro-engine to the head of the mapping table, with the remaining entries shifted back; for the other flow queue in the micro-engine, keeping the mapping table unchanged;
    when the total number of packets of the flow queue across all micro-engines is 0 for longer than a certain time threshold, invalidating all entries of the flow queue's mapping table.
  11. A many-core network processor, the processor being composed of multiple micro-engines, wherein:
    several of the micro-engines form a group, and several groups form a cluster; the structure between the groups and within the cluster is fully parallel; the clusters are connected in a two-dimensional mesh; in the middle of the cluster mesh, multiple routing modules are arranged, the routing modules themselves being connected in a two-dimensional mesh;
    the routing module is configured to input and output packets to the four clusters above, below, left of, and right of it, to access off-chip storage, and to access the co-processing module;
    the routing module is configured to route packets, off-chip storage requests and returns, and co-processing module requests and returns, completing the transfer of a packet, request, or return to the specified destination cluster or scheduling module.
  12. A storage medium storing a computer program, the computer program being configured to execute the packet scheduling method based on micro-engines in a many-core network processor according to any one of claims 1 to 5.
PCT/CN2016/088163 2015-10-21 2016-07-01 Method and system for packet scheduling using many-core network processor and micro-engine thereof, and storage medium WO2017067215A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510695926.1A CN106612236B (en) 2015-10-21 2015-10-21 Many-core network processor and message scheduling method and system of micro-engine thereof
CN201510695926.1 2015-10-21

Publications (1)

Publication Number Publication Date
WO2017067215A1 true WO2017067215A1 (en) 2017-04-27

Family

ID=58556696

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/088163 WO2017067215A1 (en) 2015-10-21 2016-07-01 Method and system for packet scheduling using many-core network processor and micro-engine thereof, and storage medium

Country Status (2)

Country Link
CN (1) CN106612236B (en)
WO (1) WO2017067215A1 (en)


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257280B (en) * 2017-07-14 2022-05-27 深圳市中兴微电子技术有限公司 Micro-engine and message processing method thereof
CN109391556B (en) * 2017-08-10 2022-02-18 深圳市中兴微电子技术有限公司 Message scheduling method, device and storage medium
CN107579921B (en) * 2017-09-26 2020-09-25 锐捷网络股份有限公司 Flow control method and device
CN108833299B (en) * 2017-12-27 2021-12-28 北京时代民芯科技有限公司 Large-scale network data processing method based on reconfigurable switching chip architecture
CN108762810B (en) * 2017-12-27 2021-01-08 北京时代民芯科技有限公司 Network message header processor based on parallel micro-engine
EP3893122A4 (en) * 2018-12-24 2022-01-05 Huawei Technologies Co., Ltd. Network processor and message processing method


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101072176A (en) * 2007-04-02 2007-11-14 华为技术有限公司 Report processing method and system
CN101739241A (en) * 2008-11-12 2010-06-16 中国科学院微电子研究所 On-chip multi-core DSP cluster and application extension method
WO2014183530A1 (en) * 2013-05-14 2014-11-20 华为技术有限公司 Task assigning method, task assigning apparatus, and network-on-chip
CN104394096A (en) * 2014-12-11 2015-03-04 福建星网锐捷网络有限公司 Multi-core processor based message processing method and multi-core processor

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111262792A (en) * 2020-01-17 2020-06-09 新华三信息安全技术有限公司 Message forwarding method, device, network equipment and storage medium
CN111262792B (en) * 2020-01-17 2022-04-01 新华三信息安全技术有限公司 Message forwarding method, device, network equipment and storage medium
CN114285807A (en) * 2021-12-22 2022-04-05 中国农业银行股份有限公司 Message information management method, device, server and storage medium
CN114415969A (en) * 2022-02-09 2022-04-29 杭州云合智网技术有限公司 Dynamic storage method for message of switching chip
CN114415969B (en) * 2022-02-09 2023-09-29 杭州云合智网技术有限公司 Method for dynamically storing messages of exchange chip
CN117956054A (en) * 2024-03-26 2024-04-30 上海云豹创芯智能科技有限公司 Method, system, chip and storage medium for realizing timer processing in RDMA
CN117956054B (en) * 2024-03-26 2024-06-11 上海云豹创芯智能科技有限公司 Method, system, chip and storage medium for realizing timer processing in RDMA

Also Published As

Publication number Publication date
CN106612236A (en) 2017-05-03
CN106612236B (en) 2020-02-07

Similar Documents

Publication Publication Date Title
WO2017067215A1 (en) Method and system for packet scheduling using many-core network processor and micro-engine thereof, and storage medium
US11036556B1 (en) Concurrent program execution optimization
US9571399B2 (en) Method and apparatus for congestion-aware routing in a computer interconnection network
US6393026B1 (en) Data packet processing system and method for a router
WO2017003887A1 (en) Convolutional neural networks on hardware accelerators
JP2016195375A (en) Method and apparatus for using multiple linked memory lists
CN105740199B (en) Time sequence power estimation device and method of network on chip
US10659372B2 (en) Multi-core lock-free rate limiting apparatus and method
CN112084027B (en) Network-on-chip data transmission method, device, network-on-chip, equipment and medium
US9860841B2 (en) Communications fabric with split paths for control and data packets
US9727499B2 (en) Hardware first come first serve arbiter using multiple request buckets
US9304706B2 (en) Efficient complex network traffic management in a non-uniform memory system
Daneshtalab et al. CARS: Congestion-aware request scheduler for network interfaces in NoC-based manycore systems
Cota et al. NoC basics
US20130219094A1 (en) Commonality of Memory Island Interface and Structure
US20210051116A1 (en) Efficient packet queueing for computer networks
US9588928B1 (en) Unique packet multicast packet ready command
US20150003250A1 (en) Credit-Based Resource Allocator Circuit
US9996468B1 (en) Scalable dynamic memory management in a network device
US8559436B2 (en) Processing resource management in an island-based network flow processor
US9727512B1 (en) Identical packet multicast packet ready command
Huang et al. Accelerating NoC-based MPI primitives via communication architecture customization
Chen et al. Contention minimization in emerging smart NoC via direct and indirect routes
Salah et al. Design of a 2d mesh-torus router for network on chip
US9164794B2 (en) Hardware prefix reduction circuit

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16856655; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 16856655; Country of ref document: EP; Kind code of ref document: A1)