WO2015131445A1 - Microengine and packet processing method therefor, and computer storage medium - Google Patents


Info

Publication number
WO2015131445A1
Authority
WO
WIPO (PCT)
Prior art keywords
message
thread
queue
level
instruction
Prior art date
Application number
PCT/CN2014/077834
Other languages
French (fr)
Chinese (zh)
Inventor
周峰
Original Assignee
深圳市中兴微电子技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司
Publication of WO2015131445A1 publication Critical patent/WO2015131445A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00 Data switching networks
    • H04L 12/64 Hybrid switching systems
    • H04L 12/6418 Hybrid transport

Abstract

Disclosed are a microengine (ME), a packet processing method therefor, and a computer storage medium. The method comprises: an ME performing thread allocation on received packets through at least five thread management queues; storing the packets, according to the allocated threads, in a packet memory having dual read/write ports; and controlling the allocated threads to process the packets stored in the packet memory in an eight-stage pipeline manner. An ME for processing packets is also disclosed.

Description

A microengine and a packet processing method therefor, and a computer storage medium

Technical Field
The present invention relates to network processor technology, and in particular to a microengine (ME, Micro Engine), a packet processing method therefor, and a computer storage medium.

Background
To meet the needs of future network development and to improve router performance, the core routers at the backbone of the Internet have undergone one technological change after another. Especially in the high-end router market, the network processor, with its outstanding packet processing performance and programmability, has become an irreplaceable part of the routing and forwarding engine. At present the industry generally uses multi-threaded network processor architectures, and the management and scheduling of multiple threads is a key factor affecting the performance of a multi-threaded network processor.
In a network processor system, the ME is the core component of the network processor. A multi-threaded architecture is an effective way to improve the performance of a network processor ME, but it also brings problems such as the complexity of thread management and system frequency bottlenecks. It is therefore necessary to design a reasonable scheme that achieves high-frequency, efficient ME thread scheduling and management while giving the ME high processing performance.
Some traditional multi-threaded network processors use MEs based on coarse-grained scheduling. Although such an ME can guarantee that the instructions of one thread execute at full speed, the loading and saving of data at every thread switch leave the core pipeline idle, degrading ME performance.
In addition, since only one thread is executing in the ME pipeline, the design has to solve the problem of data hazards. When a data-forwarding design is adopted, the logic becomes more complex, and when two packet instructions with dependent results are processed back to back the combinational logic path grows longer, lowering the system frequency.

Summary
To solve the existing technical problems, embodiments of the present invention are expected to provide an ME, a packet processing method therefor, and a computer storage medium, capable of overcoming the problem that existing MEs do not reach a high frequency and high performance.
The technical solutions of the embodiments of the present invention are implemented as follows.
An embodiment of the present invention provides a method for an ME to process packets. The method includes: the ME performs thread allocation on received packets through at least five thread management queues, stores the packets in a packet memory having dual read/write ports according to the allocated threads, and controls the allocated threads, in an eight-stage pipeline manner, to process the packets stored in the packet memory.
An embodiment of the present invention provides an ME. The ME includes: a thread management module, a packet storage module having dual read/write ports, and a core module, wherein the thread management module is configured to perform thread allocation on received packets through at least five thread management queues; the packet storage module is configured to store the packets according to the allocated threads; and the core module is configured to control the allocated threads, in an eight-stage pipeline manner, to process the packets stored in the packet storage module.
An embodiment of the present invention further provides a computer storage medium storing a computer program, the computer program being used to perform the above method for an ME to process packets.
It can thus be seen that the embodiments of the present invention provide an ME, a packet processing method therefor, and a computer storage medium. The ME performs thread allocation on received packets through at least five thread management queues, stores the packets in a packet memory having dual read/write ports according to the allocated threads, and controls the allocated threads, in an eight-stage pipeline manner, to process the packets stored in the packet storage module. Data hazards are avoided by the hardware structure itself, which simplifies the logic and removes the need for hazard-detection logic, and access conflicts on internal ME resources are also avoided. This effectively raises the ME operating frequency and performance and ensures that the ME processes packets at high frequency with high performance; the scheme is also relatively simple to implement, reducing coding complexity and therefore labor cost.

Brief Description of the Drawings
Fig. 1 is a schematic flowchart of a method for an ME to process packets according to Embodiment 1 of the present invention; Fig. 2 is a schematic flowchart of a method for an ME to process packets according to Embodiment 2 of the present invention; Fig. 3 is a schematic diagram of the working process of the ME pipeline processing one packet according to Embodiment 2 of the present invention;
Fig. 4 is a schematic diagram of the working process of the ME pipeline processing multiple packets according to Embodiment 2 of the present invention;
Fig. 5 is a schematic structural diagram of an ME according to Embodiment 3 of the present invention.

Detailed Description

In the embodiments of the present invention, an ME performs thread allocation on received packets through at least five thread management queues, stores the packets in a packet memory having dual read/write ports according to the allocated threads, and controls the allocated threads, in an eight-stage pipeline manner, to process the packets stored in the packet memory.
The technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments.
Embodiment 1
Fig. 1 is a schematic flowchart of a method for an ME to process packets according to Embodiment 1 of the present invention. As shown in Fig. 1, the method includes the following steps.
Step 101: the ME performs thread allocation on received packets through at least five thread management queues.
Specifically, taking five thread management queues as an example: when the ME receives a new packet, the free queue free_queue allocates a thread number to the packet in a first-in-first-out manner, and the allocated thread number and the instruction-fetch address carried by the packet itself are written into the pending queue rdy_queue. When the ME has an idle pipeline resource, the ME schedules the thread number of one pending packet, together with the fetch address corresponding to that thread number, from rdy_queue into the operation queue work_queue; work_queue stores only the thread numbers and fetch addresses of packets the ME is currently processing. When a packet needs a table lookup, the thread number and fetch address of the packet are written into the lookup queue srh_queue; when a packet has been processed completely, its thread number and fetch address are written into the packet output queue pkt_out_queue. When a packet needs a table lookup or has been processed completely, its thread number and fetch address are deleted from work_queue at the same time as they are written into srh_queue or pkt_out_queue.
The thread number allocated to a packet by free_queue corresponds to the packet one to one, so the corresponding packet can be determined from the allocated thread number.
When the ME is processing fewer than eight packets, the ME has an idle pipeline resource, so the thread number of a pending packet and the corresponding fetch address are scheduled from rdy_queue into work_queue; the idle pipeline resource is assigned to the packet whose thread number has just entered work_queue, and the thread corresponding to that thread number processes the packet using the idle pipeline resource. Here, the total number of thread numbers of in-flight packets stored in work_queue is 8, matching the eight-stage pipeline; when work_queue holds 8 thread numbers, the ME is processing 8 packets, each packet corresponding to one thread, and the eight threads of the ME's eight-stage pipeline circulate through the eight stages.
While the eight-stage pipeline processes packets, each stage of the pipeline corresponds to one thread and each thread processes one packet, so the ME pipeline can process 8 packets simultaneously. When one of the 8 packets has been processed completely, the thread number of the finished packet is written into pkt_out_queue, and its thread number and fetch address are deleted from work_queue, so the number of thread numbers stored in work_queue drops below 8; correspondingly, because that packet is finished, the pipeline resource allocated to it becomes idle once its processing completes and is reused to process other packets.
After a packet has been processed, the ME releases its thread number and the thread corresponding to that thread number; the released thread number and its thread are later allocated to packets that the ME receives afterwards.
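To make the hand-off concrete, the following is a minimal software sketch of how a thread number moves between the five queues during steps 101 to 103. It is not part of the patent text: the queue names and hand-off rules come from the description above, while the C type queue_t and the helpers push and pop_front are hypothetical, and the fetch address is carried alongside the thread number only for illustration.

```c
#include <stdio.h>

enum { QUEUE_CAP = 16 };                       /* 16 allocatable thread numbers        */

typedef struct { int tid[QUEUE_CAP]; int pc[QUEUE_CAP]; int n; } queue_t;

static void push(queue_t *q, int tid, int pc) {
    q->tid[q->n] = tid; q->pc[q->n] = pc; q->n++;
}

static void pop_front(queue_t *q, int *tid, int *pc) {
    *tid = q->tid[0]; *pc = q->pc[0];
    for (int i = 1; i < q->n; i++) { q->tid[i - 1] = q->tid[i]; q->pc[i - 1] = q->pc[i]; }
    q->n--;
}

int main(void) {
    queue_t free_queue = {0}, rdy_queue = {0}, work_queue = {0},
            srh_queue  = {0}, pkt_out_queue = {0};
    for (int t = 0; t < QUEUE_CAP; t++)
        push(&free_queue, t, 0);               /* all thread numbers start unallocated */

    int tid, pc;

    /* A new packet arrives carrying fetch address 0x40: take the front thread
     * number from free_queue and place (thread number, fetch address) into rdy_queue. */
    pop_front(&free_queue, &tid, &pc);
    push(&rdy_queue, tid, 0x40);

    /* The pipeline has a free slot (fewer than 8 packets in flight):
     * move the pending entry from rdy_queue into work_queue.          */
    if (work_queue.n < 8) {
        pop_front(&rdy_queue, &tid, &pc);
        push(&work_queue, tid, pc);
    }

    /* The packet asks for a table lookup: its entry leaves work_queue and is
     * written into srh_queue; on completion it would go to pkt_out_queue.  */
    pop_front(&work_queue, &tid, &pc);
    push(&srh_queue, tid, pc);

    printf("thread %d waiting for table lookup, fetch address 0x%x\n", tid, (unsigned)pc);
    (void)pkt_out_queue;
    return 0;
}
```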
Step 102: the ME stores the packet in a packet memory having dual read/write ports according to the allocated thread.
Here, after the ME allocates a thread number to the received packet in step 101, a corresponding thread has been assigned to the packet, so the ME first stores the received packet according to the allocated thread, placing the packet in the packet memory having dual read/write ports.
In practical applications, the packet memory is a random access memory (RAM, Random Access Memory) with dual read/write ports.
Step 103: the ME controls the allocated threads, in an eight-stage pipeline manner, to process the packets stored in the packet memory.
Specifically, once a packet's thread number and corresponding fetch address have been written into work_queue, the ME controls the thread allocated in step 101, in an eight-stage pipeline manner, to process the packet stored in the packet memory.
Here, the eight-stage pipeline supports eight threads working simultaneously, with each stage of the pipeline corresponding to one thread, wherein:
in the first stage, the thread sends a fetch request for a packet instruction according to the packet's fetch address; in the second stage, the thread receives the packet instruction;
in the third stage, the thread parses the packet instruction and obtains the source operands of the packet instruction;
in the fourth stage, the thread performs bit adjustment on the source operands;
in the fifth stage, the thread performs, according to the bit-adjusted source operands, the arithmetic operation corresponding to the packet instruction and the calculation of the corresponding memory address;
in the sixth stage, the thread issues a read/write operation request according to the memory address;
in the seventh stage, the thread obtains the response to the operation request;
in the eighth stage, the thread writes back the result of the arithmetic operation or the response to the operation request as the processing result of the packet instruction;
wherein, after the eighth stage, when it is determined that the packet does not need a table lookup and still contains unprocessed packet instructions, the flow returns to the first stage according to the packet's thread number to process the unprocessed packet instructions, until all packet instructions of the packet have been processed.
For a single packet, processing one of its packet instructions requires passing through the first to the eighth stage in turn. In general a packet contains multiple packet instructions, so after the first to eighth stages have completed it is necessary to judge whether the packet needs a table lookup and whether it still contains unprocessed packet instructions. When the packet does not need a table lookup and contains unprocessed packet instructions, its thread number and the corresponding fetch address remain stored in work_queue, and the thread corresponding to that thread number continues to process the unprocessed packet instructions through the eight-stage pipeline. When the packet needs a table lookup, its thread number is written into srh_queue and the thread corresponding to the packet is suspended until the ME receives the lookup response for the packet, after which that thread continues to process the packet's instructions. When the packet has been processed completely, that is, no table lookup is needed and no unprocessed packet instructions remain, its thread number is written into pkt_out_queue.
In practical applications, when the sixth stage of the eight-stage pipeline sends a read/write operation request to the dual-port RAM, it uses only one of the two read/write ports; the other port is used for receiving packets into and sending packets out of the ME when the ME receives a packet load or fetch request.
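A minimal sketch of the dual-port arrangement follows, assuming hypothetical types and sizes; only the rule that the pipeline's sixth stage uses one port while packet input/output uses the other comes from the description.

```c
#include <stdio.h>
#include <stdint.h>

#define PKT_RAM_WORDS 1024
enum { PORT_PIPELINE = 0, PORT_PKT_IO = 1 };     /* port roles per the description */

typedef struct {
    uint32_t mem[PKT_RAM_WORDS];
    unsigned accesses[2];                        /* per-port access counters, for illustration */
} pkt_ram_t;

static void pkt_ram_write(pkt_ram_t *r, int port, uint32_t addr, uint32_t data) {
    r->accesses[port]++;
    r->mem[addr % PKT_RAM_WORDS] = data;
}

static uint32_t pkt_ram_read(pkt_ram_t *r, int port, uint32_t addr) {
    r->accesses[port]++;
    return r->mem[addr % PKT_RAM_WORDS];
}

int main(void) {
    static pkt_ram_t ram;                        /* zero-initialised */
    /* Same cycle: the sixth pipeline stage writes through its own port while a
     * packet is read out through the other port; neither access waits.        */
    pkt_ram_write(&ram, PORT_PIPELINE, 0x10, 0x12345678);
    uint32_t word = pkt_ram_read(&ram, PORT_PKT_IO, 0x10);
    printf("word 0x%08x, port accesses: pipeline=%u, packet-io=%u\n",
           (unsigned)word, ram.accesses[PORT_PIPELINE], ram.accesses[PORT_PKT_IO]);
    return 0;
}
```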
Embodiment 2
In Embodiment 2, the ME completes packet management and thread scheduling through the following five queues: the free queue free_queue, the pending queue rdy_queue, the operation queue work_queue, the lookup queue srh_queue, and the packet output queue pkt_out_queue. free_queue allocates thread numbers to packets newly entering the ME; rdy_queue stores the thread numbers and fetch addresses of pending packets and can hold at most 8 pending thread numbers and fetch addresses; work_queue stores the thread numbers and fetch addresses of packets being processed; srh_queue stores the thread numbers and fetch addresses of packets that need a table lookup; and pkt_out_queue stores the thread numbers and fetch addresses of packets that have been processed and need to be sent out. A packet carries its own fetch address when it enters the ME.
Fig. 2 shows a method for an ME to process packets according to Embodiment 2 of the present invention. As shown in Fig. 2, the method includes the following steps.
Step 201: the ME allocates a thread number to the packet from free_queue.
Specifically, when the ME receives a new packet, free_queue allocates a thread number to the packet. The depth of free_queue is 16, and this depth determines the number of thread numbers that can be allocated; that is, free_queue can allocate at most 16 thread numbers to packets. The number of threads available in the ME for processing packets matches the number of thread numbers that free_queue can allocate, which is also 16. Here, the depth of 16 for free_queue is derived from overall performance test results and resource cost.
free_queue allocates thread numbers to received packets in a first-in-first-out manner; its depth is 16, and the 16 thread numbers may be 0 to 15. When a packet is received, free_queue allocates the thread number at the front of the queue to the received packet. When a packet has been processed and sent out, the thread number corresponding to the finished packet is released, the released thread number is stored back into free_queue, and free_queue places the released thread number at the front of the queue so that, when a new packet is next received, the released thread number is allocated to the new packet again. When the ME is powered on or started after reset, all 16 thread numbers of free_queue are unallocated; thread numbers are allocated to received packets starting from the thread number at the front of the queue, and packets received afterwards are allocated thread numbers in a first-in-first-out manner.
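Complementing the earlier queue sketch, the following hypothetical C fragment models only the thread-number lifecycle of the depth-16 free_queue as just described: all 16 numbers unallocated after power-up or reset, allocation from the front of the queue, and a released number placed back at the front once its packet has left the ME. Names and helpers other than the depth of 16 are assumptions.

```c
#include <stdio.h>

#define FREE_QUEUE_DEPTH 16

static int free_q[FREE_QUEUE_DEPTH];
static int head, count;

static void reset_free_queue(void) {
    for (int i = 0; i < FREE_QUEUE_DEPTH; i++)
        free_q[i] = i;                            /* thread numbers 0..15, all unallocated */
    head = 0;
    count = FREE_QUEUE_DEPTH;
}

static int alloc_thread(void) {                   /* a new packet has arrived */
    if (count == 0)
        return -1;                                /* all 16 thread numbers are in use */
    int tid = free_q[head];
    head = (head + 1) % FREE_QUEUE_DEPTH;
    count--;
    return tid;
}

static void release_thread(int tid) {             /* the packet has been sent out of the ME */
    head = (head + FREE_QUEUE_DEPTH - 1) % FREE_QUEUE_DEPTH;
    free_q[head] = tid;                           /* released number goes to the front of the queue */
    count++;
}

int main(void) {
    reset_free_queue();
    int tid = alloc_thread();
    printf("new packet bound to thread %d\n", tid);
    release_thread(tid);                          /* thread number becomes reusable */
    printf("thread %d will be allocated to the next packet\n", free_q[head]);
    return 0;
}
```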
By allocating thread numbers to packets, the ME makes the packets independent of one another, and the thread number establishes a mapping between a packet and the storage resources that the ME allocates to the packet.
Step 202: the ME writes the thread number allocated to the packet and the corresponding fetch address into rdy_queue. Here, after a thread number is allocated to the packet, free_queue writes the allocated thread number and the fetch address into rdy_queue; at this point the packet corresponding to the thread number entering rdy_queue is a pending packet. rdy_queue stores the thread numbers and fetch addresses of pending packets and can hold at most the thread numbers and fetch addresses of 8 pending packets.
Step 203: when there is an idle pipeline resource, the ME schedules the thread number of a pending packet and its corresponding fetch address from rdy_queue into work_queue.
Specifically, when the ME has an idle pipeline resource, the ME schedules the thread number of a pending packet and the corresponding fetch address from rdy_queue into work_queue; the thread corresponding to the thread number allocated to that packet then processes the packet using the idle pipeline resource. work_queue stores only the thread numbers of packets the ME is currently processing. After a packet needs a table lookup or has been processed completely, the ME deletes the packet's thread number and corresponding fetch address from work_queue and writes them into srh_queue or pkt_out_queue, which allows the thread number and fetch address of a pending packet in rdy_queue to enter work_queue.
Here, the threads in the core process packets using an eight-stage pipeline structure. The eight-stage pipeline fetches the packet instructions of a packet according to the thread number stored in work_queue and the fetch address corresponding to that thread number. After a packet instruction has been fetched, it is decoded and its source operands are extracted. After the source operands are extracted, the logic computation unit in the ME performs the shifting, concatenation, addition, subtraction and other calculations on the source operands required by the packet instruction, and writes the result into the destination register or memory. After a packet passes through the eight-stage pipeline, there are three possible cases (see the sketch after this list):
first, when the packet does not need a table lookup and still contains unprocessed packet instructions, the packet is not yet finished, and the eight-stage pipeline continues to process the next packet instruction of the packet;
second, when the packet needs a table lookup, the packet is not yet finished, and step 204 is performed; third, when the packet has been processed completely, step 205 is performed.
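The following hypothetical fragment restates the three cases as a single routing decision; the structure and function names are illustrative only, while the three outcomes come from the description above.

```c
#include <stdio.h>

typedef enum {
    ROUTE_NEXT_INSTRUCTION,   /* case 1: stay in work_queue, process the next instruction */
    ROUTE_TABLE_LOOKUP,       /* case 2: step 204, write thread number into srh_queue      */
    ROUTE_PACKET_OUT          /* case 3: step 205, write thread number into pkt_out_queue  */
} route_t;

typedef struct {
    int needs_lookup;         /* the last instruction requested a table lookup   */
    int instrs_remaining;     /* unprocessed packet instructions left            */
} pkt_status_t;

static route_t route_after_pipeline(const pkt_status_t *s) {
    if (s->needs_lookup)
        return ROUTE_TABLE_LOOKUP;
    if (s->instrs_remaining > 0)
        return ROUTE_NEXT_INSTRUCTION;
    return ROUTE_PACKET_OUT;
}

int main(void) {
    pkt_status_t s = { .needs_lookup = 0, .instrs_remaining = 3 };
    printf("route = %d\n", (int)route_after_pipeline(&s));   /* 0: fetch the next instruction */
    return 0;
}
```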
Step 204: the ME writes the thread number of the packet needing a table lookup and its corresponding fetch address into srh_queue.
Specifically, when a packet needs a table lookup, the packet's thread number and the fetch address corresponding to that thread number are written into srh_queue to wait for the lookup. At this point the packet is not yet finished; its thread number has been written into srh_queue, and when the ME sends the packet out for the lookup, the outgoing lookup message carries the packet's thread number, so the thread number remains occupied. Meanwhile, the thread processing the packet is suspended, waiting for the lookup response to return. Because the thread corresponding to the packet under lookup is suspended, the eight-stage pipeline has an idle pipeline resource, and this idle resource processes the pending packets whose thread numbers are stored in rdy_queue.
When the ME receives a lookup response, the response carries the thread number of the lookup packet; using that thread number, the packet's fetch address is read from the lookup instruction memory srh_pc_ram corresponding to the thread. After the instruction address is extracted, the flow returns to step 202. The only difference in step 202 between receiving a lookup response and receiving a new packet is the following: when the ME receives a lookup response for a packet, the packet's thread number and the extracted fetch address are written directly into rdy_queue, and no thread number is allocated by free_queue; when the ME receives a new packet, free_queue writes the thread number it allocates to the packet, together with the fetch address, into rdy_queue.
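A hedged sketch of this resume path is shown below; the types and the rdy_queue stub are hypothetical, while the use of the thread number carried by the response to index srh_pc_ram and re-enter rdy_queue follows the description.

```c
#include <stdio.h>

#define NUM_THREADS 16

static int srh_pc_ram[NUM_THREADS];      /* per-thread fetch address saved at suspension */

static void rdy_queue_push(int tid, int fetch_addr) {     /* stand-in for the real queue */
    printf("rdy_queue <- thread %d, fetch address 0x%x\n", tid, (unsigned)fetch_addr);
}

static void on_lookup_response(int tid /*, lookup result payload omitted */) {
    int fetch_addr = srh_pc_ram[tid];    /* where the suspended thread should resume */
    rdy_queue_push(tid, fetch_addr);     /* back to step 202, no new allocation from free_queue */
}

int main(void) {
    srh_pc_ram[3] = 0x80;                /* thread 3 was suspended at fetch address 0x80 */
    on_lookup_response(3);               /* the lookup response for thread 3 arrives     */
    return 0;
}
```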
Step 205: the ME writes the thread number of the finished packet and the corresponding fetch address into pkt_out_queue.
Specifically, when a packet has been processed completely, the packet's thread number and the fetch address corresponding to that thread number are written into pkt_out_queue to wait for output. At this point, although the packet has been processed, its thread number has been written into pkt_out_queue and is still in use; when the ME sends the packet out, the packet's thread number is released so that it can be allocated to a new packet and no longer occupies an allocatable thread-number resource in the ME.
The thread number released in step 205 is written back into free_queue as a queue entry, to be allocated again.
It should be noted that when the total number of thread numbers of pending packets stored in rdy_queue is less than 8, empty packets are generated automatically in the ME to keep the number of thread numbers stored in rdy_queue and work_queue at 8, so that the ME's eight-stage pipeline processes 8 packets simultaneously and every stage of the pipeline can execute normally. Here, the processing result of an empty packet is step 205.
After step 201, the ME saves the packet in the packet memory pkt_ram having dual read/write ports according to the thread number allocated to the packet in step 201, that is, according to the thread allocated to the packet. Because pkt_ram has dual read/write ports, the eight-stage pipeline uses the two ports to process the packets stored in pkt_ram.
When the ME's eight-stage pipeline is processing packet instructions, it needs to access pkt_ram; at the same time, the packets written into pkt_out_queue and waiting to be output are also stored in pkt_ram, and outputting a packet from the ME likewise requires access to pkt_ram, which causes read/write conflicts on pkt_ram. To avoid this conflict one would have to stall the pipeline processing or the fetching of packets to be output, degrading ME performance. In this embodiment of the present invention, because pkt_ram has dual read/write ports, pipeline stalls caused by contention for the pkt_ram ports never occur, the pipeline can run at full speed, and the packet processing performance of the ME is improved.
Fig. 3 is a schematic diagram of the workflow in which the ME pipeline of step 203 processes one packet. As shown in Fig. 3, the pipeline processes a packet as follows.
When a packet's thread number and the fetch address carried by the packet itself enter work_queue, an idle pipeline resource in the pipeline processes the packet. The pipeline can run at most 8 threads at the same time, that is, it can process 8 packets simultaneously.
The packet first enters the first stage of the pipeline, the Instruction Fetch 1 (IF1) stage, in which the thread processing the packet sends a fetch request for a packet instruction according to the fetch address thread_pc stored for the packet in work_queue. The thread sends the fetch request to the instruction memory instrmem used to store packet instructions. Here, instrmem is a RAM independent of the RAM storing packets, so instruction read/write accesses are fast, latency is small, and access misses do not occur.
In the second stage, the Instruction Fetch 2 (IF2) stage, the packet instruction is received from instrmem and saved into the instruction register if_instr used to hold fetched packet instructions.
In the third stage, the Instruction Decode (ID) stage, the packet instruction received in if_instr is parsed and decoded, a register file (RF) read command and read address are generated, and the source operands needed by the execution units that execute the packet instruction are obtained from the RF. Here, the ME allocates a corresponding RF to each thread in the pipeline to store the data related to that thread.
In the fourth stage, the Execute 1 (EX1) stage, bit adjustment is performed on the source operands. Because the ME supports many operation types, such as Arithmetic Logical Unit (ALU) computations, the values of the source operands need to be aligned in preparation for the fifth-stage operation. This stage mainly ensures that the arithmetic units executing the packet instruction do not need to compute on the source operands themselves; they only perform bit adjustment according to the obtained operands and operation type, which improves timing.
In the fifth stage, the Execute 2 (EX2) stage, the ALU computes on the source operands bit-adjusted in the fourth stage, performing the arithmetic operation corresponding to the packet instruction and the calculation of the corresponding memory address. This part is pure combinational logic that completes, from the source operands, the arithmetic operation corresponding to the packet instruction and the computation of the memory address.
In the sixth stage, the Memory Access 1 (MA1) stage, the operation request corresponds to the packet instruction: when the packet instruction is an arithmetic operation, the arithmetic result is written into the result unit; when the packet instruction operates on a memory address, a read/write operation request is issued to pkt_ram through one of pkt_ram's read/write ports.
In the seventh stage, the Memory Access 2 (MA2) stage, the data read from pkt_ram in response to the read/write operation request is obtained; at the same time, the result unit and the data read from pkt_ram are sent as the pipeline output to the data decision unit Wb_mux, so that before the eighth-stage write-back it can be judged from the pipeline output which of the three cases of step 203 applies to the packet after pipeline processing. The pipeline output is the processing result of the current packet instruction.
In the eighth stage, the Write Back (WB) stage, the pipeline output selected by Wb_mux is written back into the RF, making the processing result of the packet instruction take effect.
In summary, the IF1 and IF2 stages fetch the packet instruction from instrmem, the ID stage parses the packet instruction, the EX1 stage completes the preparation of the packet instruction's source operands, the EX2 stage has the logic computation unit perform the shifting, concatenation, addition, subtraction and other calculations on the source operands required by the packet instruction, and the MA1, MA2 and WB stages write the result of the packet instruction into the RF for the packet instruction. After these eight pipeline stages have executed, the processing of one packet instruction is complete.
In practical applications, when the IF1 stage issues a packet instruction fetch request, the packet instruction corresponding to the current fetch address is fetched; after the packet instruction is fetched, the fetch address changes accordingly, advancing by one position, so that the next packet instruction can be fetched correctly.
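For reference, the eight stage names and their actions described above can be collected in a small table-like sketch; the enum and the summary strings below are only a restatement of the description, not an implementation.

```c
#include <stdio.h>

typedef enum { IF1, IF2, ID, EX1, EX2, MA1, MA2, WB, NUM_STAGES } stage_t;

static const char *stage_action[NUM_STAGES] = {
    [IF1] = "send instruction fetch request to instrmem using thread_pc",
    [IF2] = "receive the instruction and latch it into if_instr",
    [ID]  = "decode the instruction, read source operands from the thread's RF",
    [EX1] = "align/bit-adjust the source operands for the ALU",
    [EX2] = "ALU arithmetic and memory-address computation",
    [MA1] = "write the ALU result, or issue a pkt_ram read/write on one port",
    [MA2] = "collect the pkt_ram response, feed result and data to Wb_mux",
    [WB]  = "write the selected result back to the RF",
};

int main(void) {
    for (int s = IF1; s < NUM_STAGES; s++)
        printf("stage %d: %s\n", s + 1, stage_action[s]);
    return 0;
}
```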
As shown in Fig. 4, each subsequent packet enters each stage of the pipeline in order; each stage of the pipeline corresponds to one thread, and the pipeline supports 8 threads working simultaneously. The first packet fetches a packet instruction at the IF1 stage and completes the write-back of the processing result at the WB stage, finishing the processing of one packet instruction; each following packet trails the preceding packet by one pipeline stage. In the eight-stage pipeline, at any one time each stage performs a different operation, completing the function of that stage. When 8 threads work simultaneously, each thread works at a different pipeline stage in sequence. For example: at time T1, thread Thread0 works at the IF1 stage; at T2, Thread0 works at IF2 and Thread1 at IF1; at T3, Thread0 works at ID, Thread1 at IF2 and Thread2 at IF1; and so on, until at T8 Thread0 works at WB, Thread1 at MA2, Thread2 at MA1, Thread3 at EX2, Thread4 at EX1, Thread5 at ID, Thread6 at IF2 and Thread7 at IF1. When a packet reaches the WB stage, the processing of one instruction of that packet is complete; at this point, if the packet does not need a table lookup and the current packet instruction is not the last packet instruction of the packet, the ME keeps the packet's thread number and fetch address in work_queue and processes the packet's next packet instruction with the thread that is handling the packet.
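The staggered T1 to T8 schedule above can be reproduced with a short loop; the code below is purely illustrative and assumes that thread t enters IF1 at cycle t+1.

```c
#include <stdio.h>

#define NUM_STAGES 8
#define NUM_ACTIVE_THREADS 8

int main(void) {
    static const char *stage_name[NUM_STAGES] =
        { "IF1", "IF2", "ID", "EX1", "EX2", "MA1", "MA2", "WB" };

    for (int cycle = 1; cycle <= NUM_STAGES; cycle++) {
        printf("T%d:", cycle);
        for (int t = 0; t < NUM_ACTIVE_THREADS; t++) {
            int stage = cycle - t - 1;            /* thread t starts at cycle t+1 */
            if (stage >= 0 && stage < NUM_STAGES)
                printf("  Thread%d=%s", t, stage_name[stage]);
        }
        printf("\n");                             /* at T8: Thread0=WB ... Thread7=IF1 */
    }
    return 0;
}
```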
For two consecutive dependent ALU packet instructions, that is, where the result computed by the first packet instruction is a source operand of the second, the result written back by the first instruction at the WB stage takes effect in the RF, and the next instruction needs to read that result from the RF as a source operand at the ID stage; the write-back of the result and its read are separated by 5 cycles. In other words, the next packet instruction can only use the result of the previous packet instruction after five cycles, otherwise a data hazard would occur. The pipeline has 8 stages, each stage corresponds to one thread, and a thread finishes one packet instruction 8 cycles before it executes the next packet instruction; since those 8 cycles are more than 5 cycles, data hazards are avoided.
For thread Thread0, the first packet instruction executed by Thread0 enters the pipeline at time T1 and only reaches the WB stage after 8 cycles. Only then does Thread0's second packet instruction enter the pipeline; that is, for Thread0, two consecutive packet instructions enter the pipeline 8 cycles apart. In the eight-stage pipeline, the WB stage performs the RF write and the ID stage performs the RF read, and these two stages are 5 cycles apart; the later instruction has not yet reached the ID stage at that point, which avoids data hazards.
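A small check of this timing argument follows, with the constants taken from the description and everything else purely illustrative.

```c
#include <assert.h>
#include <stdio.h>

int main(void) {
    const int pipeline_stages      = 8;  /* issue gap between a thread's consecutive instructions */
    const int cycles_from_id_to_wb = 5;  /* RF write (WB) happens 5 cycles after RF read (ID)     */

    /* 8 > 5: by the time the next instruction of the same thread reaches ID, the
     * previous instruction's result is already in the RF, so no forwarding logic
     * and no hazard-detection logic are needed.                                  */
    assert(pipeline_stages > cycles_from_id_to_wb);
    printf("issue gap %d cycles > %d-cycle write-back delay: no data hazard\n",
           pipeline_stages, cycles_from_id_to_wb);
    return 0;
}
```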
Embodiment 3
Fig. 5 is a schematic structural diagram of an ME according to Embodiment 3 of the present invention. As shown in Fig. 5, the ME 50 includes a thread management module 51, a packet storage module 52 having dual read/write ports, and a core module 53.
The thread management module 51 may be implemented by a central processing unit (CPU, Central Processing Unit) cooperating with a memory chip, and is configured to perform thread allocation on received packets through at least five thread management queues.
Specifically, taking five thread management queues as an example, the thread management module 51 may allocate a thread number to a packet in a first-in-first-out manner through the free queue free_queue, and write the allocated thread number and the fetch address carried by the packet itself into the pending queue rdy_queue; when there is an idle pipeline resource, it schedules the thread number of a pending packet and the corresponding fetch address from rdy_queue into the operation queue work_queue, which stores only the thread numbers and fetch addresses of packets being processed; when a packet needs a table lookup, it writes the packet's thread number and fetch address into the lookup queue srh_queue; and when a packet has been processed completely, it writes the packet's thread number and fetch address into the packet output queue pkt_out_queue. When a packet needs a table lookup or has been processed completely, the packet's thread number and fetch address are deleted from work_queue.
The packet storage module 52 may be implemented by a RAM and is configured to store the packets according to the allocated threads.
The core module 53 may be implemented by a CPU cooperating with a signal processing chip, and is configured to control the allocated threads, in an eight-stage pipeline manner, to process the packets.
Specifically, the core module 53 is configured so that each stage of the eight-stage pipeline corresponds to one thread, wherein:
in the first stage, the thread sends a fetch request for a packet instruction according to the packet's fetch address; in the second stage, the thread receives the packet instruction;
in the third stage, the thread parses the packet instruction and obtains the source operands of the packet instruction;
in the fourth stage, the thread performs bit adjustment on the source operands;
in the fifth stage, the thread performs, according to the bit-adjusted source operands, the arithmetic operation corresponding to the packet instruction and the calculation of the corresponding memory address;
in the sixth stage, the thread issues a read/write operation request according to the memory address;
in the seventh stage, the thread obtains the response to the operation request; in the eighth stage, the thread writes back the result of the arithmetic operation or the response to the operation request as the processing result of the packet instruction;
wherein, after the eighth stage, when it is determined that the packet does not need a table lookup and still contains unprocessed packet instructions, the flow returns to the first stage according to the packet's thread number to process the unprocessed packet instructions in the packet.
The thread management module 51 is further configured to release the packet's thread number after the processing of the packet is completed.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to any one of Embodiments 1 to 3 of the present invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Correspondingly, Embodiments 1 and 2 of the present invention further provide a computer storage medium storing a computer program, the computer program being used to perform the method for an ME to process packets according to either of Embodiments 1 and 2 of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention; any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for a microengine (ME) to process packets, the method comprising:
the ME performing thread allocation on received packets through at least five thread management queues, storing the packets in a packet memory having dual read/write ports according to the allocated threads, and controlling the allocated threads, in an eight-stage pipeline manner, to process the packets stored in the packet memory.
2. The method according to claim 1, wherein the ME performing thread allocation on the received packets through at least five thread management queues comprises:
when the ME receives a new packet, a free queue free_queue allocating a thread number to the packet in a first-in-first-out manner, and the allocated thread number and the fetch address carried by the packet itself being written into a pending queue rdy_queue; when the ME has an idle pipeline resource, the ME scheduling the thread number of one pending packet and the fetch address corresponding to the thread number from rdy_queue into an operation queue work_queue, work_queue storing the thread numbers and fetch addresses of the packets that the ME is processing; when a packet needs a table lookup, writing the thread number and fetch address of the packet into a lookup queue srh_queue; and when a packet has been processed completely, writing the thread number and fetch address of the packet into a packet output queue pkt_out_queue; wherein, when a packet needs a table lookup or has been processed completely, the thread number and fetch address corresponding to the packet are deleted from work_queue.
3. The method according to claim 2, wherein controlling the allocated threads, in an eight-stage pipeline manner, to process the packets stored in the packet memory comprises:
the eight-stage pipeline supporting eight threads working simultaneously, each stage of the eight-stage pipeline corresponding to one thread, wherein:
in the first stage, the thread sends a fetch request for a packet instruction according to the packet's fetch address; in the second stage, the thread receives the packet instruction;
in the third stage, the thread parses the packet instruction and obtains the source operands of the packet instruction;
in the fourth stage, the thread performs bit adjustment on the source operands; in the fifth stage, the thread performs, according to the bit-adjusted source operands, the arithmetic operation corresponding to the packet instruction and the calculation of the corresponding memory address;
in the sixth stage, the thread issues a read/write operation request according to the memory address;
in the seventh stage, the thread obtains the response to the operation request;
in the eighth stage, the thread writes back the result of the arithmetic operation or the response to the operation request as the processing result of the packet instruction;
wherein, after the eighth stage, when it is determined that the packet does not need a table lookup and still contains unprocessed packet instructions, the flow returns to the first stage according to the packet's thread number to process the unprocessed packet instructions in the packet.
4. The method according to claim 3, further comprising:
releasing the thread number of the packet after the processing of the packet is completed.
5. An ME, comprising: a thread management module, a packet storage module having dual read/write ports, and a core module, wherein:
the thread management module is configured to perform thread allocation on received packets through at least five thread management queues;
the packet storage module is configured to store the packets according to the allocated threads; and the core module is configured to control the allocated threads, in an eight-stage pipeline manner, to process the packets stored in the packet storage module.
6、 根据权利要求 5所述的 ME, 其中, 所述线程管理模块, 配置为通 过空闲队列 free— queue 以先进先出的方式为 4艮文分配线程号, 并将分配的 线程号和报文自身携带的取指地址写到待处理队列 rdy— queue, 当有空闲的 流水线资源时,从 rdy— queue中调度一个待处理的报文的线程号和所述线程 号对应的取指地址写到操作队列 work— queue, work— queue中存储的都是正 在处理的报文的线程号和取指地址, 当一个报文需要查表时, 将所述报文 的线程号和取指地址写入到查表队列 srh— queue中,当一个报文处理完毕时, 将所述报文的线程号和取指地址写入到报文输出队列 pkt— out— queue中; 其 中, 当一个报文需要查表或处理完毕时, 从 work— queue中删除所述报文对 应的线程号和取指地址。 6. The ME according to claim 5, wherein the thread management module is configured to allocate thread numbers to messages in a first-in-first-out manner through the free queue, and combine the allocated thread numbers and messages The fetch address carried by itself is written to the queue to be processed rdy_queue. When there are idle pipeline resources, the thread number of a message to be processed is scheduled from rdy_queue and the fetch address corresponding to the thread number is written to Operation queue work_queue, work_queue stores the thread number and fetch address of the message being processed. When a message needs to be looked up in the table, write the thread number and fetch address of the message. Go to the table lookup queue srh_queue. When a message is processed, Write the thread number and fetch address of the message into the message output queue pkt_out_queue; wherein, when a message needs to be looked up in the table or is processed, the message is deleted from the work_queue. The corresponding thread number and instruction address.
7、 根据权利要求 6所述的 ME, 其中, 所述内核模块, 配置为将八级 流水线中每一级对应一个线程; 其中, 7. The ME according to claim 6, wherein the kernel module is configured to correspond to one thread for each level of the eight-level pipeline; wherein,
第一级, 线程根据报文的取指地址发送报文指令的获取请求; 第二级, 线程接收报文指令; At the first level, the thread sends a request to obtain the message instruction according to the fetch address of the message; at the second level, the thread receives the message instruction;
第三级, 线程解析报文指令并获取报文指令的源操作数; At the third level, the thread parses the message instruction and obtains the source operand of the message instruction;
第四级, 线程对源操作数进行位调整; At the fourth level, the thread performs bit adjustments on the source operand;
第五级, 线程根据所述位调整后的源操作数执行报文指令对应的算术 运算和对应的存储地址的计算; At the fifth level, the thread performs the arithmetic operation corresponding to the message instruction and the calculation of the corresponding storage address based on the bit-adjusted source operand;
第六级, 线程根据所述存储地址发出读写操作请求; At the sixth level, the thread issues read and write operation requests based on the storage address;
第七级, 线程获得所述操作请求的响应; At the seventh level, the thread obtains the response to the operation request;
第八级, 线程将所述算术运算的结果或所述操作请求的响应作为所述 报文指令的处理结果回写; At the eighth level, the thread writes back the result of the arithmetic operation or the response to the operation request as the processing result of the message instruction;
其中, 在第八级之后, 确定报文不需要查表且包括有未处理的报文指 令时, 根据所述报文的线程号返回第一级处理所述报文中未处理的报文指 令。 Among them, after the eighth level, when it is determined that the message does not require table lookup and contains unprocessed message instructions, return to the first level to process the unprocessed message instructions in the message according to the thread number of the message. .
8、 根据权利要求 7所述的 ME, 其中, 所述线程管理模块, 还配置为 在所述报文处理完成后, 将所述 文的线程号幹放。 8. The ME according to claim 7, wherein the thread management module is further configured to dry the thread number of the message after the message processing is completed.
9、 一种计算机存储介质, 其中存储有计算机程序, 该计算机程序用于 执行权利要求 1至 4任一项所述的 ME处理 4艮文的方法。 9. A computer storage medium in which a computer program is stored, the computer program being used to execute the ME processing method described in any one of claims 1 to 4.
PCT/CN2014/077834 2014-03-07 2014-05-19 Microengine and packet processing method therefor, and computer storage medium WO2015131445A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410084619.5A CN104901901B (en) 2014-03-07 2014-03-07 Microengine and packet processing method therefor
CN201410084619.5 2014-03-07

Publications (1)

Publication Number Publication Date
WO2015131445A1 true WO2015131445A1 (en) 2015-09-11

Family

ID=54034300

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/077834 WO2015131445A1 (en) 2014-03-07 2014-05-19 Microengine and packet processing method therefor, and computer storage medium

Country Status (2)

Country Link
CN (1) CN104901901B (en)
WO (1) WO2015131445A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257280B (en) * 2017-07-14 2022-05-27 深圳市中兴微电子技术有限公司 Micro-engine and message processing method thereof
CN109298923B (en) * 2018-09-14 2019-11-29 中科驭数(北京)科技有限公司 Deep pipeline task processing method and device
CN117331655A (en) * 2022-06-27 2024-01-02 深圳市中兴微电子技术有限公司 Multithreading scheduling method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5430850A (en) * 1991-07-22 1995-07-04 Massachusetts Institute Of Technology Data processing system with synchronization coprocessor for multiple threads
US6829697B1 (en) * 2000-09-06 2004-12-07 International Business Machines Corporation Multiple logical interfaces to a shared coprocessor resource

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1767502A (en) * 2004-09-29 2006-05-03 英特尔公司 Updating instructions executed by a multi-core processor
CN101763285A (en) * 2010-01-15 2010-06-30 西安电子科技大学 Zero-overhead switching multithread processor and thread switching method thereof
CN102752198A (en) * 2012-06-21 2012-10-24 北京星网锐捷网络技术有限公司 Multi-core message forwarding method, multi-core processor and network equipment

Also Published As

Publication number Publication date
CN104901901B (en) 2019-03-12
CN104901901A (en) 2015-09-09

Similar Documents

Publication Publication Date Title
US11675598B2 (en) Loop execution control for a multi-threaded, self-scheduling reconfigurable computing fabric using a reenter queue
US11567766B2 (en) Control registers to store thread identifiers for threaded loop execution in a self-scheduling reconfigurable computing fabric
US11531543B2 (en) Backpressure control using a stop signal for a multi-threaded, self-scheduling reconfigurable computing fabric
US11915057B2 (en) Computational partition for a multi-threaded, self-scheduling reconfigurable computing fabric
US11275710B2 (en) Loop thread order execution control of a multi-threaded, self-scheduling reconfigurable computing fabric
US11782710B2 (en) Execution or write mask generation for data selection in a multi-threaded, self-scheduling reconfigurable computing fabric
US11119768B2 (en) Conditional branching control for a multi-threaded, self-scheduling reconfigurable computing fabric
US10990392B2 (en) Efficient loop execution for a multi-threaded, self-scheduling reconfigurable computing fabric
US11635959B2 (en) Execution control of a multi-threaded, self-scheduling reconfigurable computing fabric
US20230153258A1 (en) Multi-Threaded, Self-Scheduling Reconfigurable Computing Fabric
JP4472339B2 (en) Multi-core multi-thread processor
US7376952B2 (en) Optimizing critical section microblocks by controlling thread execution
WO2011032327A1 (en) Parallel processor and method for thread processing thereof
WO2011063574A1 (en) Stream data processing method and stream processor
WO2016101664A1 (en) Instruction scheduling method and device
WO2013185571A1 (en) Thread control and invoking method of multi-thread virtual assembly line processor, and processor thereof
WO2021218633A1 (en) Cpu instruction processing method, controller, and central processing unit
US20200319893A1 (en) Booting Tiles of Processing Units
TWI639951B (en) Central processing unit based on simultaneous multiple threads (SMT) and device for detecting data correlation of instructions
WO2015131445A1 (en) Microengine and packet processing method therefor, and computer storage medium
US10289306B1 (en) Data storage system with core-affined thread processing of data movement requests
US20170147345A1 (en) Multiple operation interface to shared coprocessor
WO2010137092A1 (en) Multi-operating system control method and processor system
US7127589B2 (en) Data processor
US11822472B2 (en) Memory management unit for multi-threaded architecture

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14884589

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07/02/2017)

122 Ep: pct application non-entry in european phase

Ref document number: 14884589

Country of ref document: EP

Kind code of ref document: A1