WO2015131445A1 - Microengine and packet processing method therefor, and computer storage medium - Google Patents


Info

Publication number
WO2015131445A1
Authority
WO
WIPO (PCT)
Prior art keywords
message
thread
queue
level
instruction
Prior art date
Application number
PCT/CN2014/077834
Other languages
French (fr)
Chinese (zh)
Inventor
周峰
Original Assignee
深圳市中兴微电子技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司
Publication of WO2015131445A1 publication Critical patent/WO2015131445A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00 Data switching networks
    • H04L 12/64 Hybrid switching systems
    • H04L 12/6418 Hybrid transport

Abstract

Disclosed are a microengine (ME), a packet processing method therefor, and a computer storage medium. The method comprises: an ME performing thread allocation on received packets through at least five thread management queues; storing the packets, according to the allocated threads, in a packet memory having dual read/write ports; and controlling the allocated threads to process the packets stored in the packet memory in an eight-stage pipeline manner. An ME for processing packets is also disclosed.

Description

A microengine and a packet processing method therefor, and a computer storage medium

Technical Field
The present invention relates to network processor technology, and in particular to a microengine (ME, Micro Engine), a packet processing method therefor, and a computer storage medium.

Background
To meet the needs of future network development and to improve router performance, the core routers at the backbone of the Internet have undergone one technological change after another. Especially in the high-end router market, the network processor, with its outstanding packet processing performance and programmability, has become an irreplaceable part of the routing and forwarding engine. At present the industry generally uses multi-threaded network processor architectures, and the management and scheduling of multiple threads is a key factor affecting the performance of a multi-threaded network processor.
In a network processor system, the ME is the core component of the network processor. A multi-threaded architecture is an effective way to improve the performance of a network processor ME, but it also brings problems such as the complexity of thread management and system frequency bottlenecks. It is therefore necessary to design a reasonable scheme that achieves high-frequency, efficient ME thread scheduling and management while giving the ME high processing performance.
Some traditional multi-threaded network processors use MEs based on coarse-grained scheduling. Although such an ME can guarantee that the instructions of one thread execute at full speed, the loading and saving of data at every thread switch leave the core pipeline idle, degrading ME performance.
In addition, since only one thread is executing in the ME pipeline, the design has to solve the problem of data hazards. When a data-forwarding design is adopted, the logic becomes more complex, and when two packet instructions with dependent results are processed back to back the combinational logic path grows longer, lowering the system frequency.

Summary
To solve the existing technical problems, embodiments of the present invention are expected to provide an ME, a packet processing method therefor, and a computer storage medium, capable of overcoming the problem that existing MEs do not reach a high frequency and high performance.
The technical solutions of the embodiments of the present invention are implemented as follows.
An embodiment of the present invention provides a method for an ME to process packets. The method includes: the ME performs thread allocation on received packets through at least five thread management queues, stores the packets in a packet memory having dual read/write ports according to the allocated threads, and controls the allocated threads, in an eight-stage pipeline manner, to process the packets stored in the packet memory.
An embodiment of the present invention provides an ME. The ME includes: a thread management module, a packet storage module having dual read/write ports, and a core module, wherein the thread management module is configured to perform thread allocation on received packets through at least five thread management queues; the packet storage module is configured to store the packets according to the allocated threads; and the core module is configured to control the allocated threads, in an eight-stage pipeline manner, to process the packets stored in the packet storage module.
An embodiment of the present invention further provides a computer storage medium storing a computer program, the computer program being used to perform the above method for an ME to process packets.
It can thus be seen that the embodiments of the present invention provide an ME, a packet processing method therefor, and a computer storage medium. The ME performs thread allocation on received packets through at least five thread management queues, stores the packets in a packet memory having dual read/write ports according to the allocated threads, and controls the allocated threads, in an eight-stage pipeline manner, to process the packets stored in the packet storage module. Data hazards are avoided by the hardware structure itself, which simplifies the logic and removes the need for hazard-detection logic, and access conflicts on internal ME resources are also avoided. This effectively raises the ME operating frequency and performance and ensures that the ME processes packets at high frequency with high performance; the scheme is also relatively simple to implement, reducing coding complexity and therefore labor cost.

Brief Description of the Drawings
Fig. 1 is a schematic flowchart of a method for an ME to process packets according to Embodiment 1 of the present invention; Fig. 2 is a schematic flowchart of a method for an ME to process packets according to Embodiment 2 of the present invention; Fig. 3 is a schematic diagram of the working process of the ME pipeline processing one packet according to Embodiment 2 of the present invention;
Fig. 4 is a schematic diagram of the working process of the ME pipeline processing multiple packets according to Embodiment 2 of the present invention;
Fig. 5 is a schematic structural diagram of an ME according to Embodiment 3 of the present invention.

Detailed Description

In the embodiments of the present invention, an ME performs thread allocation on received packets through at least five thread management queues, stores the packets in a packet memory having dual read/write ports according to the allocated threads, and controls the allocated threads, in an eight-stage pipeline manner, to process the packets stored in the packet memory.
The technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments.
Embodiment 1
Fig. 1 is a schematic flowchart of a method for an ME to process packets according to Embodiment 1 of the present invention. As shown in Fig. 1, the method includes the following steps.
Step 101: the ME performs thread allocation on received packets through at least five thread management queues.
Specifically, taking five thread management queues as an example: when the ME receives a new packet, the free queue free_queue allocates a thread number to the packet in a first-in-first-out manner, and the allocated thread number and the instruction-fetch address carried by the packet itself are written into the pending queue rdy_queue. When the ME has an idle pipeline resource, the ME schedules the thread number of one pending packet, together with the fetch address corresponding to that thread number, from rdy_queue into the operation queue work_queue; work_queue stores only the thread numbers and fetch addresses of packets the ME is currently processing. When a packet needs a table lookup, the thread number and fetch address of the packet are written into the lookup queue srh_queue; when a packet has been processed completely, its thread number and fetch address are written into the packet output queue pkt_out_queue. When a packet needs a table lookup or has been processed completely, its thread number and fetch address are deleted from work_queue at the same time as they are written into srh_queue or pkt_out_queue.
The thread number allocated to a packet by free_queue corresponds to the packet one to one, so the corresponding packet can be determined from the allocated thread number.
When the ME is processing fewer than eight packets, the ME has an idle pipeline resource, so the thread number of a pending packet and the corresponding fetch address are scheduled from rdy_queue into work_queue; the idle pipeline resource is assigned to the packet whose thread number has just entered work_queue, and the thread corresponding to that thread number processes the packet using the idle pipeline resource. Here, the total number of thread numbers of in-flight packets stored in work_queue is 8, matching the eight-stage pipeline; when work_queue holds 8 thread numbers, the ME is processing 8 packets, each packet corresponding to one thread, and the eight threads of the ME's eight-stage pipeline circulate through the eight stages.
While the eight-stage pipeline processes packets, each stage of the pipeline corresponds to one thread and each thread processes one packet, so the ME pipeline can process 8 packets simultaneously. When one of the 8 packets has been processed completely, the thread number of the finished packet is written into pkt_out_queue, and its thread number and fetch address are deleted from work_queue, so the number of thread numbers stored in work_queue drops below 8; correspondingly, because that packet is finished, the pipeline resource allocated to it becomes idle once its processing completes and is reused to process other packets.
After a packet has been processed, the ME releases its thread number and the thread corresponding to that thread number; the released thread number and its thread are later allocated to packets that the ME receives afterwards.
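To make the hand-off concrete, the following is a minimal software sketch of how a thread number moves between the five queues during steps 101 to 103. It is not part of the patent text: the queue names and hand-off rules come from the description above, while the C type queue_t and the helpers push and pop_front are hypothetical, and the fetch address is carried alongside the thread number only for illustration.

```c
#include <stdio.h>

enum { QUEUE_CAP = 16 };                       /* 16 allocatable thread numbers        */

typedef struct { int tid[QUEUE_CAP]; int pc[QUEUE_CAP]; int n; } queue_t;

static void push(queue_t *q, int tid, int pc) {
    q->tid[q->n] = tid; q->pc[q->n] = pc; q->n++;
}

static void pop_front(queue_t *q, int *tid, int *pc) {
    *tid = q->tid[0]; *pc = q->pc[0];
    for (int i = 1; i < q->n; i++) { q->tid[i - 1] = q->tid[i]; q->pc[i - 1] = q->pc[i]; }
    q->n--;
}

int main(void) {
    queue_t free_queue = {0}, rdy_queue = {0}, work_queue = {0},
            srh_queue  = {0}, pkt_out_queue = {0};
    for (int t = 0; t < QUEUE_CAP; t++)
        push(&free_queue, t, 0);               /* all thread numbers start unallocated */

    int tid, pc;

    /* A new packet arrives carrying fetch address 0x40: take the front thread
     * number from free_queue and place (thread number, fetch address) into rdy_queue. */
    pop_front(&free_queue, &tid, &pc);
    push(&rdy_queue, tid, 0x40);

    /* The pipeline has a free slot (fewer than 8 packets in flight):
     * move the pending entry from rdy_queue into work_queue.          */
    if (work_queue.n < 8) {
        pop_front(&rdy_queue, &tid, &pc);
        push(&work_queue, tid, pc);
    }

    /* The packet asks for a table lookup: its entry leaves work_queue and is
     * written into srh_queue; on completion it would go to pkt_out_queue.  */
    pop_front(&work_queue, &tid, &pc);
    push(&srh_queue, tid, pc);

    printf("thread %d waiting for table lookup, fetch address 0x%x\n", tid, (unsigned)pc);
    (void)pkt_out_queue;
    return 0;
}
```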
Step 102: the ME stores the packet in a packet memory having dual read/write ports according to the allocated thread.
Here, after the ME allocates a thread number to the received packet in step 101, a corresponding thread has been assigned to the packet, so the ME first stores the received packet according to the allocated thread, placing the packet in the packet memory having dual read/write ports.
In practical applications, the packet memory is a random access memory (RAM, Random Access Memory) with dual read/write ports.
Step 103: the ME controls the allocated threads, in an eight-stage pipeline manner, to process the packets stored in the packet memory.
Specifically, once a packet's thread number and corresponding fetch address have been written into work_queue, the ME controls the thread allocated in step 101, in an eight-stage pipeline manner, to process the packet stored in the packet memory.
Here, the eight-stage pipeline supports eight threads working simultaneously, with each stage of the pipeline corresponding to one thread, wherein:
in the first stage, the thread sends a fetch request for a packet instruction according to the packet's fetch address; in the second stage, the thread receives the packet instruction;
in the third stage, the thread parses the packet instruction and obtains the source operands of the packet instruction;
in the fourth stage, the thread performs bit adjustment on the source operands;
in the fifth stage, the thread performs, according to the bit-adjusted source operands, the arithmetic operation corresponding to the packet instruction and the calculation of the corresponding memory address;
in the sixth stage, the thread issues a read/write operation request according to the memory address;
in the seventh stage, the thread obtains the response to the operation request;
in the eighth stage, the thread writes back the result of the arithmetic operation or the response to the operation request as the processing result of the packet instruction;
wherein, after the eighth stage, when it is determined that the packet does not need a table lookup and still contains unprocessed packet instructions, the flow returns to the first stage according to the packet's thread number to process the unprocessed packet instructions, until all packet instructions of the packet have been processed.
For a single packet, processing one of its packet instructions requires passing through the first to the eighth stage in turn. In general a packet contains multiple packet instructions, so after the first to eighth stages have completed it is necessary to judge whether the packet needs a table lookup and whether it still contains unprocessed packet instructions. When the packet does not need a table lookup and contains unprocessed packet instructions, its thread number and the corresponding fetch address remain stored in work_queue, and the thread corresponding to that thread number continues to process the unprocessed packet instructions through the eight-stage pipeline. When the packet needs a table lookup, its thread number is written into srh_queue and the thread corresponding to the packet is suspended until the ME receives the lookup response for the packet, after which that thread continues to process the packet's instructions. When the packet has been processed completely, that is, no table lookup is needed and no unprocessed packet instructions remain, its thread number is written into pkt_out_queue.
In practical applications, when the sixth stage of the eight-stage pipeline sends a read/write operation request to the dual-port RAM, it uses only one of the two read/write ports; the other port is used for receiving packets into and sending packets out of the ME when the ME receives a packet load or fetch request.
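A minimal sketch of the dual-port arrangement follows, assuming hypothetical types and sizes; only the rule that the pipeline's sixth stage uses one port while packet input/output uses the other comes from the description.

```c
#include <stdio.h>
#include <stdint.h>

#define PKT_RAM_WORDS 1024
enum { PORT_PIPELINE = 0, PORT_PKT_IO = 1 };     /* port roles per the description */

typedef struct {
    uint32_t mem[PKT_RAM_WORDS];
    unsigned accesses[2];                        /* per-port access counters, for illustration */
} pkt_ram_t;

static void pkt_ram_write(pkt_ram_t *r, int port, uint32_t addr, uint32_t data) {
    r->accesses[port]++;
    r->mem[addr % PKT_RAM_WORDS] = data;
}

static uint32_t pkt_ram_read(pkt_ram_t *r, int port, uint32_t addr) {
    r->accesses[port]++;
    return r->mem[addr % PKT_RAM_WORDS];
}

int main(void) {
    static pkt_ram_t ram;                        /* zero-initialised */
    /* Same cycle: the sixth pipeline stage writes through its own port while a
     * packet is read out through the other port; neither access waits.        */
    pkt_ram_write(&ram, PORT_PIPELINE, 0x10, 0x12345678);
    uint32_t word = pkt_ram_read(&ram, PORT_PKT_IO, 0x10);
    printf("word 0x%08x, port accesses: pipeline=%u, packet-io=%u\n",
           (unsigned)word, ram.accesses[PORT_PIPELINE], ram.accesses[PORT_PKT_IO]);
    return 0;
}
```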
Embodiment 2
In Embodiment 2, the ME completes packet management and thread scheduling through the following five queues: the free queue free_queue, the pending queue rdy_queue, the operation queue work_queue, the lookup queue srh_queue, and the packet output queue pkt_out_queue. free_queue allocates thread numbers to packets newly entering the ME; rdy_queue stores the thread numbers and fetch addresses of pending packets and can hold at most 8 pending thread numbers and fetch addresses; work_queue stores the thread numbers and fetch addresses of packets being processed; srh_queue stores the thread numbers and fetch addresses of packets that need a table lookup; and pkt_out_queue stores the thread numbers and fetch addresses of packets that have been processed and need to be sent out. A packet carries its own fetch address when it enters the ME.
Fig. 2 shows a method for an ME to process packets according to Embodiment 2 of the present invention. As shown in Fig. 2, the method includes the following steps.
Step 201: the ME allocates a thread number to the packet from free_queue.
Specifically, when the ME receives a new packet, free_queue allocates a thread number to the packet. The depth of free_queue is 16, and this depth determines the number of thread numbers that can be allocated; that is, free_queue can allocate at most 16 thread numbers to packets. The number of threads available in the ME for processing packets matches the number of thread numbers that free_queue can allocate, which is also 16. Here, the depth of 16 for free_queue is derived from overall performance test results and resource cost.
free_queue allocates thread numbers to received packets in a first-in-first-out manner; its depth is 16, and the 16 thread numbers may be 0 to 15. When a packet is received, free_queue allocates the thread number at the front of the queue to the received packet. When a packet has been processed and sent out, the thread number corresponding to the finished packet is released, the released thread number is stored back into free_queue, and free_queue places the released thread number at the front of the queue so that, when a new packet is next received, the released thread number is allocated to the new packet again. When the ME is powered on or started after reset, all 16 thread numbers of free_queue are unallocated; thread numbers are allocated to received packets starting from the thread number at the front of the queue, and packets received afterwards are allocated thread numbers in a first-in-first-out manner.
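Complementing the earlier queue sketch, the following hypothetical C fragment models only the thread-number lifecycle of the depth-16 free_queue as just described: all 16 numbers unallocated after power-up or reset, allocation from the front of the queue, and a released number placed back at the front once its packet has left the ME. Names and helpers other than the depth of 16 are assumptions.

```c
#include <stdio.h>

#define FREE_QUEUE_DEPTH 16

static int free_q[FREE_QUEUE_DEPTH];
static int head, count;

static void reset_free_queue(void) {
    for (int i = 0; i < FREE_QUEUE_DEPTH; i++)
        free_q[i] = i;                            /* thread numbers 0..15, all unallocated */
    head = 0;
    count = FREE_QUEUE_DEPTH;
}

static int alloc_thread(void) {                   /* a new packet has arrived */
    if (count == 0)
        return -1;                                /* all 16 thread numbers are in use */
    int tid = free_q[head];
    head = (head + 1) % FREE_QUEUE_DEPTH;
    count--;
    return tid;
}

static void release_thread(int tid) {             /* the packet has been sent out of the ME */
    head = (head + FREE_QUEUE_DEPTH - 1) % FREE_QUEUE_DEPTH;
    free_q[head] = tid;                           /* released number goes to the front of the queue */
    count++;
}

int main(void) {
    reset_free_queue();
    int tid = alloc_thread();
    printf("new packet bound to thread %d\n", tid);
    release_thread(tid);                          /* thread number becomes reusable */
    printf("thread %d will be allocated to the next packet\n", free_q[head]);
    return 0;
}
```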
By allocating thread numbers to packets, the ME makes the packets independent of one another, and the thread number establishes a mapping between a packet and the storage resources that the ME allocates to the packet.
Step 202: the ME writes the thread number allocated to the packet and the corresponding fetch address into rdy_queue. Here, after a thread number is allocated to the packet, free_queue writes the allocated thread number and the fetch address into rdy_queue; at this point the packet corresponding to the thread number entering rdy_queue is a pending packet. rdy_queue stores the thread numbers and fetch addresses of pending packets and can hold at most the thread numbers and fetch addresses of 8 pending packets.
Step 203: when there is an idle pipeline resource, the ME schedules the thread number of a pending packet and its corresponding fetch address from rdy_queue into work_queue.
Specifically, when the ME has an idle pipeline resource, the ME schedules the thread number of a pending packet and the corresponding fetch address from rdy_queue into work_queue; the thread corresponding to the thread number allocated to that packet then processes the packet using the idle pipeline resource. work_queue stores only the thread numbers of packets the ME is currently processing. After a packet needs a table lookup or has been processed completely, the ME deletes the packet's thread number and corresponding fetch address from work_queue and writes them into srh_queue or pkt_out_queue, which allows the thread number and fetch address of a pending packet in rdy_queue to enter work_queue.
Here, the threads in the core process packets using an eight-stage pipeline structure. The eight-stage pipeline fetches the packet instructions of a packet according to the thread number stored in work_queue and the fetch address corresponding to that thread number. After a packet instruction has been fetched, it is decoded and its source operands are extracted. After the source operands are extracted, the logic computation unit in the ME performs the shifting, concatenation, addition, subtraction and other calculations on the source operands required by the packet instruction, and writes the result into the destination register or memory. After a packet passes through the eight-stage pipeline, there are three possible cases (see the sketch after this list):
first, when the packet does not need a table lookup and still contains unprocessed packet instructions, the packet is not yet finished, and the eight-stage pipeline continues to process the next packet instruction of the packet;
second, when the packet needs a table lookup, the packet is not yet finished, and step 204 is performed; third, when the packet has been processed completely, step 205 is performed.
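The following hypothetical fragment restates the three cases as a single routing decision; the structure and function names are illustrative only, while the three outcomes come from the description above.

```c
#include <stdio.h>

typedef enum {
    ROUTE_NEXT_INSTRUCTION,   /* case 1: stay in work_queue, process the next instruction */
    ROUTE_TABLE_LOOKUP,       /* case 2: step 204, write thread number into srh_queue      */
    ROUTE_PACKET_OUT          /* case 3: step 205, write thread number into pkt_out_queue  */
} route_t;

typedef struct {
    int needs_lookup;         /* the last instruction requested a table lookup   */
    int instrs_remaining;     /* unprocessed packet instructions left            */
} pkt_status_t;

static route_t route_after_pipeline(const pkt_status_t *s) {
    if (s->needs_lookup)
        return ROUTE_TABLE_LOOKUP;
    if (s->instrs_remaining > 0)
        return ROUTE_NEXT_INSTRUCTION;
    return ROUTE_PACKET_OUT;
}

int main(void) {
    pkt_status_t s = { .needs_lookup = 0, .instrs_remaining = 3 };
    printf("route = %d\n", (int)route_after_pipeline(&s));   /* 0: fetch the next instruction */
    return 0;
}
```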
Step 204: the ME writes the thread number of the packet needing a table lookup and its corresponding fetch address into srh_queue.
Specifically, when a packet needs a table lookup, the packet's thread number and the fetch address corresponding to that thread number are written into srh_queue to wait for the lookup. At this point the packet is not yet finished; its thread number has been written into srh_queue, and when the ME sends the packet out for the lookup, the outgoing lookup message carries the packet's thread number, so the thread number remains occupied. Meanwhile, the thread processing the packet is suspended, waiting for the lookup response to return. Because the thread corresponding to the packet under lookup is suspended, the eight-stage pipeline has an idle pipeline resource, and this idle resource processes the pending packets whose thread numbers are stored in rdy_queue.
When the ME receives a lookup response, the response carries the thread number of the lookup packet; using that thread number, the packet's fetch address is read from the lookup instruction memory srh_pc_ram corresponding to the thread. After the instruction address is extracted, the flow returns to step 202. The only difference in step 202 between receiving a lookup response and receiving a new packet is the following: when the ME receives a lookup response for a packet, the packet's thread number and the extracted fetch address are written directly into rdy_queue, and no thread number is allocated by free_queue; when the ME receives a new packet, free_queue writes the thread number it allocates to the packet, together with the fetch address, into rdy_queue.
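A hedged sketch of this resume path is shown below; the types and the rdy_queue stub are hypothetical, while the use of the thread number carried by the response to index srh_pc_ram and re-enter rdy_queue follows the description.

```c
#include <stdio.h>

#define NUM_THREADS 16

static int srh_pc_ram[NUM_THREADS];      /* per-thread fetch address saved at suspension */

static void rdy_queue_push(int tid, int fetch_addr) {     /* stand-in for the real queue */
    printf("rdy_queue <- thread %d, fetch address 0x%x\n", tid, (unsigned)fetch_addr);
}

static void on_lookup_response(int tid /*, lookup result payload omitted */) {
    int fetch_addr = srh_pc_ram[tid];    /* where the suspended thread should resume */
    rdy_queue_push(tid, fetch_addr);     /* back to step 202, no new allocation from free_queue */
}

int main(void) {
    srh_pc_ram[3] = 0x80;                /* thread 3 was suspended at fetch address 0x80 */
    on_lookup_response(3);               /* the lookup response for thread 3 arrives     */
    return 0;
}
```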
Step 205: the ME writes the thread number of the finished packet and the corresponding fetch address into pkt_out_queue.
Specifically, when a packet has been processed completely, the packet's thread number and the fetch address corresponding to that thread number are written into pkt_out_queue to wait for output. At this point, although the packet has been processed, its thread number has been written into pkt_out_queue and is still in use; when the ME sends the packet out, the packet's thread number is released so that it can be allocated to a new packet and no longer occupies an allocatable thread-number resource in the ME.
The thread number released in step 205 is written back into free_queue as a queue entry, to be allocated again.
It should be noted that when the total number of thread numbers of pending packets stored in rdy_queue is less than 8, empty packets are generated automatically in the ME to keep the number of thread numbers stored in rdy_queue and work_queue at 8, so that the ME's eight-stage pipeline processes 8 packets simultaneously and every stage of the pipeline can execute normally. Here, the processing result of an empty packet is step 205.
After step 201, the ME saves the packet in the packet memory pkt_ram having dual read/write ports according to the thread number allocated to the packet in step 201, that is, according to the thread allocated to the packet. Because pkt_ram has dual read/write ports, the eight-stage pipeline uses the two ports to process the packets stored in pkt_ram.
When the ME's eight-stage pipeline is processing packet instructions, it needs to access pkt_ram; at the same time, the packets written into pkt_out_queue and waiting to be output are also stored in pkt_ram, and outputting a packet from the ME likewise requires access to pkt_ram, which causes read/write conflicts on pkt_ram. To avoid this conflict one would have to stall the pipeline processing or the fetching of packets to be output, degrading ME performance. In this embodiment of the present invention, because pkt_ram has dual read/write ports, pipeline stalls caused by contention for the pkt_ram ports never occur, the pipeline can run at full speed, and the packet processing performance of the ME is improved.
Fig. 3 is a schematic diagram of the workflow in which the ME pipeline of step 203 processes one packet. As shown in Fig. 3, the pipeline processes a packet as follows.
When a packet's thread number and the fetch address carried by the packet itself enter work_queue, an idle pipeline resource in the pipeline processes the packet. The pipeline can run at most 8 threads at the same time, that is, it can process 8 packets simultaneously.
The packet first enters the first stage of the pipeline, the Instruction Fetch 1 (IF1) stage, in which the thread processing the packet sends a fetch request for a packet instruction according to the fetch address thread_pc stored for the packet in work_queue. The thread sends the fetch request to the instruction memory instrmem used to store packet instructions. Here, instrmem is a RAM independent of the RAM storing packets, so instruction read/write accesses are fast, latency is small, and access misses do not occur.
In the second stage, the Instruction Fetch 2 (IF2) stage, the packet instruction is received from instrmem and saved into the instruction register if_instr used to hold fetched packet instructions.
In the third stage, the Instruction Decode (ID) stage, the packet instruction received in if_instr is parsed and decoded, a register file (RF) read command and read address are generated, and the source operands needed by the execution units that execute the packet instruction are obtained from the RF. Here, the ME allocates a corresponding RF to each thread in the pipeline to store the data related to that thread.
In the fourth stage, the Execute 1 (EX1) stage, bit adjustment is performed on the source operands. Because the ME supports many operation types, such as Arithmetic Logical Unit (ALU) computations, the values of the source operands need to be aligned in preparation for the fifth-stage operation. This stage mainly ensures that the arithmetic units executing the packet instruction do not need to compute on the source operands themselves; they only perform bit adjustment according to the obtained operands and operation type, which improves timing.
In the fifth stage, the Execute 2 (EX2) stage, the ALU computes on the source operands bit-adjusted in the fourth stage, performing the arithmetic operation corresponding to the packet instruction and the calculation of the corresponding memory address. This part is pure combinational logic that completes, from the source operands, the arithmetic operation corresponding to the packet instruction and the computation of the memory address.
In the sixth stage, the Memory Access 1 (MA1) stage, the operation request corresponds to the packet instruction: when the packet instruction is an arithmetic operation, the arithmetic result is written into the result unit; when the packet instruction operates on a memory address, a read/write operation request is issued to pkt_ram through one of pkt_ram's read/write ports.
In the seventh stage, the Memory Access 2 (MA2) stage, the data read from pkt_ram in response to the read/write operation request is obtained; at the same time, the result unit and the data read from pkt_ram are sent as the pipeline output to the data decision unit Wb_mux, so that before the eighth-stage write-back it can be judged from the pipeline output which of the three cases of step 203 applies to the packet after pipeline processing. The pipeline output is the processing result of the current packet instruction.
In the eighth stage, the Write Back (WB) stage, the pipeline output selected by Wb_mux is written back into the RF, making the processing result of the packet instruction take effect.
In summary, the IF1 and IF2 stages fetch the packet instruction from instrmem, the ID stage parses the packet instruction, the EX1 stage completes the preparation of the packet instruction's source operands, the EX2 stage has the logic computation unit perform the shifting, concatenation, addition, subtraction and other calculations on the source operands required by the packet instruction, and the MA1, MA2 and WB stages write the result of the packet instruction into the RF for the packet instruction. After these eight pipeline stages have executed, the processing of one packet instruction is complete.
In practical applications, when the IF1 stage issues a packet instruction fetch request, the packet instruction corresponding to the current fetch address is fetched; after the packet instruction is fetched, the fetch address changes accordingly, advancing by one position, so that the next packet instruction can be fetched correctly.
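For reference, the eight stage names and their actions described above can be collected in a small table-like sketch; the enum and the summary strings below are only a restatement of the description, not an implementation.

```c
#include <stdio.h>

typedef enum { IF1, IF2, ID, EX1, EX2, MA1, MA2, WB, NUM_STAGES } stage_t;

static const char *stage_action[NUM_STAGES] = {
    [IF1] = "send instruction fetch request to instrmem using thread_pc",
    [IF2] = "receive the instruction and latch it into if_instr",
    [ID]  = "decode the instruction, read source operands from the thread's RF",
    [EX1] = "align/bit-adjust the source operands for the ALU",
    [EX2] = "ALU arithmetic and memory-address computation",
    [MA1] = "write the ALU result, or issue a pkt_ram read/write on one port",
    [MA2] = "collect the pkt_ram response, feed result and data to Wb_mux",
    [WB]  = "write the selected result back to the RF",
};

int main(void) {
    for (int s = IF1; s < NUM_STAGES; s++)
        printf("stage %d: %s\n", s + 1, stage_action[s]);
    return 0;
}
```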
As shown in Fig. 4, each subsequent packet enters each stage of the pipeline in order; each stage of the pipeline corresponds to one thread, and the pipeline supports 8 threads working simultaneously. The first packet fetches a packet instruction at the IF1 stage and completes the write-back of the processing result at the WB stage, finishing the processing of one packet instruction; each following packet trails the preceding packet by one pipeline stage. In the eight-stage pipeline, at any one time each stage performs a different operation, completing the function of that stage. When 8 threads work simultaneously, each thread works at a different pipeline stage in sequence. For example: at time T1, thread Thread0 works at the IF1 stage; at T2, Thread0 works at IF2 and Thread1 at IF1; at T3, Thread0 works at ID, Thread1 at IF2 and Thread2 at IF1; and so on, until at T8 Thread0 works at WB, Thread1 at MA2, Thread2 at MA1, Thread3 at EX2, Thread4 at EX1, Thread5 at ID, Thread6 at IF2 and Thread7 at IF1. When a packet reaches the WB stage, the processing of one instruction of that packet is complete; at this point, if the packet does not need a table lookup and the current packet instruction is not the last packet instruction of the packet, the ME keeps the packet's thread number and fetch address in work_queue and processes the packet's next packet instruction with the thread that is handling the packet.
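The staggered T1 to T8 schedule above can be reproduced with a short loop; the code below is purely illustrative and assumes that thread t enters IF1 at cycle t+1.

```c
#include <stdio.h>

#define NUM_STAGES 8
#define NUM_ACTIVE_THREADS 8

int main(void) {
    static const char *stage_name[NUM_STAGES] =
        { "IF1", "IF2", "ID", "EX1", "EX2", "MA1", "MA2", "WB" };

    for (int cycle = 1; cycle <= NUM_STAGES; cycle++) {
        printf("T%d:", cycle);
        for (int t = 0; t < NUM_ACTIVE_THREADS; t++) {
            int stage = cycle - t - 1;            /* thread t starts at cycle t+1 */
            if (stage >= 0 && stage < NUM_STAGES)
                printf("  Thread%d=%s", t, stage_name[stage]);
        }
        printf("\n");                             /* at T8: Thread0=WB ... Thread7=IF1 */
    }
    return 0;
}
```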
For two consecutive dependent ALU packet instructions, that is, where the result computed by the first packet instruction is a source operand of the second, the result written back by the first instruction at the WB stage takes effect in the RF, and the next instruction needs to read that result from the RF as a source operand at the ID stage; the write-back of the result and its read are separated by 5 cycles. In other words, the next packet instruction can only use the result of the previous packet instruction after five cycles, otherwise a data hazard would occur. The pipeline has 8 stages, each stage corresponds to one thread, and a thread finishes one packet instruction 8 cycles before it executes the next packet instruction; since those 8 cycles are more than 5 cycles, data hazards are avoided.
For thread Thread0, the first packet instruction executed by Thread0 enters the pipeline at time T1 and only reaches the WB stage after 8 cycles. Only then does Thread0's second packet instruction enter the pipeline; that is, for Thread0, two consecutive packet instructions enter the pipeline 8 cycles apart. In the eight-stage pipeline, the WB stage performs the RF write and the ID stage performs the RF read, and these two stages are 5 cycles apart; the later instruction has not yet reached the ID stage at that point, which avoids data hazards.
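A small check of this timing argument follows, with the constants taken from the description and everything else purely illustrative.

```c
#include <assert.h>
#include <stdio.h>

int main(void) {
    const int pipeline_stages      = 8;  /* issue gap between a thread's consecutive instructions */
    const int cycles_from_id_to_wb = 5;  /* RF write (WB) happens 5 cycles after RF read (ID)     */

    /* 8 > 5: by the time the next instruction of the same thread reaches ID, the
     * previous instruction's result is already in the RF, so no forwarding logic
     * and no hazard-detection logic are needed.                                  */
    assert(pipeline_stages > cycles_from_id_to_wb);
    printf("issue gap %d cycles > %d-cycle write-back delay: no data hazard\n",
           pipeline_stages, cycles_from_id_to_wb);
    return 0;
}
```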
Embodiment 3
Fig. 5 is a schematic structural diagram of an ME according to Embodiment 3 of the present invention. As shown in Fig. 5, the ME 50 includes a thread management module 51, a packet storage module 52 having dual read/write ports, and a core module 53.
The thread management module 51 may be implemented by a central processing unit (CPU, Central Processing Unit) cooperating with a memory chip, and is configured to perform thread allocation on received packets through at least five thread management queues.
Specifically, taking five thread management queues as an example, the thread management module 51 may allocate a thread number to a packet in a first-in-first-out manner through the free queue free_queue, and write the allocated thread number and the fetch address carried by the packet itself into the pending queue rdy_queue; when there is an idle pipeline resource, it schedules the thread number of a pending packet and the corresponding fetch address from rdy_queue into the operation queue work_queue, which stores only the thread numbers and fetch addresses of packets being processed; when a packet needs a table lookup, it writes the packet's thread number and fetch address into the lookup queue srh_queue; and when a packet has been processed completely, it writes the packet's thread number and fetch address into the packet output queue pkt_out_queue. When a packet needs a table lookup or has been processed completely, the packet's thread number and fetch address are deleted from work_queue.
The packet storage module 52 may be implemented by a RAM and is configured to store the packets according to the allocated threads.
The core module 53 may be implemented by a CPU cooperating with a signal processing chip, and is configured to control the allocated threads, in an eight-stage pipeline manner, to process the packets.
Specifically, the core module 53 is configured so that each stage of the eight-stage pipeline corresponds to one thread, wherein:
in the first stage, the thread sends a fetch request for a packet instruction according to the packet's fetch address; in the second stage, the thread receives the packet instruction;
in the third stage, the thread parses the packet instruction and obtains the source operands of the packet instruction;
in the fourth stage, the thread performs bit adjustment on the source operands;
in the fifth stage, the thread performs, according to the bit-adjusted source operands, the arithmetic operation corresponding to the packet instruction and the calculation of the corresponding memory address;
in the sixth stage, the thread issues a read/write operation request according to the memory address;
in the seventh stage, the thread obtains the response to the operation request; in the eighth stage, the thread writes back the result of the arithmetic operation or the response to the operation request as the processing result of the packet instruction;
wherein, after the eighth stage, when it is determined that the packet does not need a table lookup and still contains unprocessed packet instructions, the flow returns to the first stage according to the packet's thread number to process the unprocessed packet instructions in the packet.
The thread management module 51 is further configured to release the packet's thread number after the processing of the packet is completed.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to any one of Embodiments 1 to 3 of the present invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Correspondingly, Embodiments 1 and 2 of the present invention further provide a computer storage medium storing a computer program, the computer program being used to perform the method for an ME to process packets according to either of Embodiments 1 and 2 of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention; any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for a microengine (ME) to process packets, the method comprising:
the ME performing thread allocation on received packets through at least five thread management queues, storing the packets in a packet memory having dual read/write ports according to the allocated threads, and controlling the allocated threads, in an eight-stage pipeline manner, to process the packets stored in the packet memory.
2. The method according to claim 1, wherein the ME performing thread allocation on the received packets through at least five thread management queues comprises:
when the ME receives a new packet, a free queue free_queue allocating a thread number to the packet in a first-in-first-out manner, and the allocated thread number and the fetch address carried by the packet itself being written into a pending queue rdy_queue; when the ME has an idle pipeline resource, the ME scheduling the thread number of one pending packet and the fetch address corresponding to the thread number from rdy_queue into an operation queue work_queue, work_queue storing the thread numbers and fetch addresses of the packets that the ME is processing; when a packet needs a table lookup, writing the thread number and fetch address of the packet into a lookup queue srh_queue; and when a packet has been processed completely, writing the thread number and fetch address of the packet into a packet output queue pkt_out_queue; wherein, when a packet needs a table lookup or has been processed completely, the thread number and fetch address corresponding to the packet are deleted from work_queue.
3. The method according to claim 2, wherein controlling the allocated threads, in an eight-stage pipeline manner, to process the packets stored in the packet memory comprises:
the eight-stage pipeline supporting eight threads working simultaneously, each stage of the eight-stage pipeline corresponding to one thread, wherein:
in the first stage, the thread sends a fetch request for a packet instruction according to the packet's fetch address; in the second stage, the thread receives the packet instruction;
in the third stage, the thread parses the packet instruction and obtains the source operands of the packet instruction;
in the fourth stage, the thread performs bit adjustment on the source operands; in the fifth stage, the thread performs, according to the bit-adjusted source operands, the arithmetic operation corresponding to the packet instruction and the calculation of the corresponding memory address;
in the sixth stage, the thread issues a read/write operation request according to the memory address;
in the seventh stage, the thread obtains the response to the operation request;
in the eighth stage, the thread writes back the result of the arithmetic operation or the response to the operation request as the processing result of the packet instruction;
wherein, after the eighth stage, when it is determined that the packet does not need a table lookup and still contains unprocessed packet instructions, the flow returns to the first stage according to the packet's thread number to process the unprocessed packet instructions in the packet.
4. The method according to claim 3, further comprising:
releasing the thread number of the packet after the processing of the packet is completed.
5. An ME, comprising: a thread management module, a packet storage module having dual read/write ports, and a core module, wherein:
the thread management module is configured to perform thread allocation on received packets through at least five thread management queues;
the packet storage module is configured to store the packets according to the allocated threads; and the core module is configured to control the allocated threads, in an eight-stage pipeline manner, to process the packets stored in the packet storage module.
6、 根据权利要求 5所述的 ME, 其中, 所述线程管理模块, 配置为通 过空闲队列 free— queue 以先进先出的方式为 4艮文分配线程号, 并将分配的 线程号和报文自身携带的取指地址写到待处理队列 rdy— queue, 当有空闲的 流水线资源时,从 rdy— queue中调度一个待处理的报文的线程号和所述线程 号对应的取指地址写到操作队列 work— queue, work— queue中存储的都是正 在处理的报文的线程号和取指地址, 当一个报文需要查表时, 将所述报文 的线程号和取指地址写入到查表队列 srh— queue中,当一个报文处理完毕时, 将所述报文的线程号和取指地址写入到报文输出队列 pkt— out— queue中; 其 中, 当一个报文需要查表或处理完毕时, 从 work— queue中删除所述报文对 应的线程号和取指地址。 6. The ME according to claim 5, wherein the thread management module is configured to allocate thread numbers to messages in a first-in-first-out manner through the free queue, and combine the allocated thread numbers and messages The fetch address carried by itself is written to the queue to be processed rdy_queue. When there are idle pipeline resources, the thread number of a message to be processed is scheduled from rdy_queue and the fetch address corresponding to the thread number is written to Operation queue work_queue, work_queue stores the thread number and fetch address of the message being processed. When a message needs to be looked up in the table, write the thread number and fetch address of the message. Go to the table lookup queue srh_queue. When a message is processed, Write the thread number and fetch address of the message into the message output queue pkt_out_queue; wherein, when a message needs to be looked up in the table or is processed, the message is deleted from the work_queue. The corresponding thread number and instruction address.
7、 根据权利要求 6所述的 ME, 其中, 所述内核模块, 配置为将八级 流水线中每一级对应一个线程; 其中, 7. The ME according to claim 6, wherein the kernel module is configured to correspond to one thread for each level of the eight-level pipeline; wherein,
第一级, 线程根据报文的取指地址发送报文指令的获取请求; 第二级, 线程接收报文指令; At the first level, the thread sends a request to obtain the message instruction according to the fetch address of the message; at the second level, the thread receives the message instruction;
第三级, 线程解析报文指令并获取报文指令的源操作数; At the third level, the thread parses the message instruction and obtains the source operand of the message instruction;
第四级, 线程对源操作数进行位调整; At the fourth level, the thread performs bit adjustments on the source operand;
第五级, 线程根据所述位调整后的源操作数执行报文指令对应的算术 运算和对应的存储地址的计算; At the fifth level, the thread performs the arithmetic operation corresponding to the message instruction and the calculation of the corresponding storage address based on the bit-adjusted source operand;
第六级, 线程根据所述存储地址发出读写操作请求; At the sixth level, the thread issues read and write operation requests based on the storage address;
第七级, 线程获得所述操作请求的响应; At the seventh level, the thread obtains the response to the operation request;
第八级, 线程将所述算术运算的结果或所述操作请求的响应作为所述 报文指令的处理结果回写; At the eighth level, the thread writes back the result of the arithmetic operation or the response to the operation request as the processing result of the message instruction;
其中, 在第八级之后, 确定报文不需要查表且包括有未处理的报文指 令时, 根据所述报文的线程号返回第一级处理所述报文中未处理的报文指 令。 Among them, after the eighth level, when it is determined that the message does not require table lookup and contains unprocessed message instructions, return to the first level to process the unprocessed message instructions in the message according to the thread number of the message. .
8、 根据权利要求 7所述的 ME, 其中, 所述线程管理模块, 还配置为 在所述报文处理完成后, 将所述 文的线程号幹放。 8. The ME according to claim 7, wherein the thread management module is further configured to dry the thread number of the message after the message processing is completed.
9、 一种计算机存储介质, 其中存储有计算机程序, 该计算机程序用于 执行权利要求 1至 4任一项所述的 ME处理 4艮文的方法。 9. A computer storage medium in which a computer program is stored, the computer program being used to execute the ME processing method described in any one of claims 1 to 4.
PCT/CN2014/077834 2014-03-07 2014-05-19 Microengine and packet processing method therefor, and computer storage medium WO2015131445A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410084619.5A CN104901901B (en) 2014-03-07 2014-03-07 Microengine and packet processing method therefor
CN201410084619.5 2014-03-07

Publications (1)

Publication Number Publication Date
WO2015131445A1 true WO2015131445A1 (en) 2015-09-11

Family

ID=54034300

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/077834 WO2015131445A1 (en) 2014-03-07 2014-05-19 Microengine and packet processing method therefor, and computer storage medium

Country Status (2)

Country Link
CN (1) CN104901901B (en)
WO (1) WO2015131445A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257280B (en) * 2017-07-14 2022-05-27 深圳市中兴微电子技术有限公司 Micro-engine and message processing method thereof
CN109298923B (en) * 2018-09-14 2019-11-29 中科驭数(北京)科技有限公司 Deep pipeline task processing method and device
CN117331655A (en) * 2022-06-27 2024-01-02 深圳市中兴微电子技术有限公司 Multithreading scheduling method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5430850A (en) * 1991-07-22 1995-07-04 Massachusetts Institute Of Technology Data processing system with synchronization coprocessor for multiple threads
US6829697B1 (en) * 2000-09-06 2004-12-07 International Business Machines Corporation Multiple logical interfaces to a shared coprocessor resource

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1767502A (en) * 2004-09-29 2006-05-03 英特尔公司 Updating instructions executed by a multi-core processor
CN101763285A (en) * 2010-01-15 2010-06-30 西安电子科技大学 Zero-overhead switching multithread processor and thread switching method thereof
CN102752198A (en) * 2012-06-21 2012-10-24 北京星网锐捷网络技术有限公司 Multi-core message forwarding method, multi-core processor and network equipment

Also Published As

Publication number Publication date
CN104901901B (en) 2019-03-12
CN104901901A (en) 2015-09-09

Similar Documents

Publication Publication Date Title
US11675598B2 (en) Loop execution control for a multi-threaded, self-scheduling reconfigurable computing fabric using a reenter queue
US11567766B2 (en) Control registers to store thread identifiers for threaded loop execution in a self-scheduling reconfigurable computing fabric
US11531543B2 (en) Backpressure control using a stop signal for a multi-threaded, self-scheduling reconfigurable computing fabric
US11915057B2 (en) Computational partition for a multi-threaded, self-scheduling reconfigurable computing fabric
US11275710B2 (en) Loop thread order execution control of a multi-threaded, self-scheduling reconfigurable computing fabric
US11782710B2 (en) Execution or write mask generation for data selection in a multi-threaded, self-scheduling reconfigurable computing fabric
US11119768B2 (en) Conditional branching control for a multi-threaded, self-scheduling reconfigurable computing fabric
US10990392B2 (en) Efficient loop execution for a multi-threaded, self-scheduling reconfigurable computing fabric
US11635959B2 (en) Execution control of a multi-threaded, self-scheduling reconfigurable computing fabric
US20230153258A1 (en) Multi-Threaded, Self-Scheduling Reconfigurable Computing Fabric
JP4472339B2 (en) Multi-core multi-thread processor
US7376952B2 (en) Optimizing critical section microblocks by controlling thread execution
WO2011032327A1 (en) Parallel processor and method for thread processing thereof
WO2011063574A1 (en) Stream data processing method and stream processor
WO2016101664A1 (en) Instruction scheduling method and device
WO2013185571A1 (en) Thread control and invoking method of multi-thread virtual assembly line processor, and processor thereof
WO2021218633A1 (en) Cpu instruction processing method, controller, and central processing unit
US20200319893A1 (en) Booting Tiles of Processing Units
TWI639951B (en) Central processing unit based on simultaneous multiple threads (SMT) and device for detecting data correlation of instructions
WO2015131445A1 (en) Microengine and packet processing method therefor, and computer storage medium
US10289306B1 (en) Data storage system with core-affined thread processing of data movement requests
US20170147345A1 (en) Multiple operation interface to shared coprocessor
WO2010137092A1 (en) Multi-operating system control method and processor system
US7127589B2 (en) Data processor
US11822472B2 (en) Memory management unit for multi-threaded architecture

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14884589

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07/02/2017)

122 Ep: pct application non-entry in european phase

Ref document number: 14884589

Country of ref document: EP

Kind code of ref document: A1