CN104901901A - Micro-engine and method for processing message therewith - Google Patents


Info

Publication number
CN104901901A
CN104901901A · CN201410084619.5A · CN201410084619A · CN104901901B
Authority
CN
China
Prior art keywords
message
thread
queue
instruction
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410084619.5A
Other languages
Chinese (zh)
Other versions
CN104901901B (en)
Inventor
Zhou Feng (周峰)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen ZTE Microelectronics Technology Co Ltd
Original Assignee
Shenzhen ZTE Microelectronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen ZTE Microelectronics Technology Co Ltd
Priority to CN201410084619.5A (granted as CN104901901B)
Priority to PCT/CN2014/077834 (WO2015131445A1)
Publication of CN104901901A
Application granted
Publication of CN104901901B
Legal status: Active
Anticipated expiration: not listed

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00: Data switching networks
    • H04L 12/64: Hybrid switching systems
    • H04L 12/6418: Hybrid transport

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses a micro-engine (ME) and a method for processing messages with it. The ME distributes threads to received messages via at least five thread-management queues, stores each message in a message memory with dual read/write ports according to its assigned thread, and controls the assigned threads, in the manner of an eight-stage pipeline, to process the messages stored in the message memory. The invention further discloses an ME for processing the messages.

Description

Micro-engine and method for processing messages therewith
Technical field
The present invention relates to network-processor technology, and in particular to a micro-engine (ME, Micro Engine) and a method for processing messages with it.
Background technology
To meet the needs of future network development and improve router performance, the core routers at key positions in the Internet have undergone one technological change after another. Especially in the high-end router market, the network processor, with its outstanding message-processing performance and programmability, has become an irreplaceable part of the routing and forwarding engine. Current industry practice is essentially to use multithreaded network-processor architectures, and the management and scheduling of the threads is a key factor affecting the performance of a multithreaded network processor.
Within a network-processor system, the ME is the core component. A multithreaded architecture is an effective way to improve ME performance, but it also brings problems such as complex thread management and a system-frequency bottleneck. A well-designed scheme is therefore needed to achieve high-frequency, efficient thread scheduling and management in the ME while keeping its processing performance high.
Some traditional multithreaded network processors adopt an ME based on coarse-grained scheduling. Although such an ME can guarantee that the instructions of one thread execute at full speed, every thread switch requires loading and saving data, which leaves the kernel pipeline idle and degrades ME performance.
In addition, because only one thread is executing in the ME pipeline at a time, the design must solve the problem of data hazards. If data forwarding is adopted, the logic becomes more complex, and processing two consecutive message instructions whose results are dependent lengthens the combinational-logic path, lowering the system frequency.
Summary of the invention
In view of this, embodiments of the present invention are expected to provide an ME and a method for processing messages with it, which can overcome the low operating frequency and performance of existing MEs.
The technical scheme of the present invention is achieved as follows:
The invention provides a method for an ME to process messages. The method comprises: the ME distributes threads to received messages via at least five thread-management queues, stores each message in a packet memory with dual read/write ports according to its assigned thread, and controls the assigned threads, in the manner of an eight-stage pipeline, to process the messages stored in the packet memory.
In the above scheme, the ME distributes threads to received messages via at least five thread-management queues as follows: when the ME receives a new message, the idle queue free_queue allocates it a thread number in first-in-first-out order, and the allocated thread number and the fetch address carried by the message are written into the pending queue rdy_queue; when the ME has an idle pipeline resource, it schedules the thread number of one pending message from rdy_queue and writes that thread number and its fetch address into the working queue work_queue, which stores the thread numbers and fetch addresses of all messages the ME is currently processing; when a message needs a table lookup, its thread number and fetch address are written into the lookup queue srh_queue; when a message has been fully processed, its thread number and fetch address are written into the message output queue pkt_out_queue; and when a message needs a table lookup or has been fully processed, its thread number and fetch address are deleted from work_queue.
In the above scheme, controlling the assigned threads in the manner of an eight-stage pipeline means that the eight-stage pipeline supports eight threads working simultaneously, each stage corresponding to one thread. In the first stage, the thread sends a request to fetch a message instruction according to the message's fetch address; in the second stage, the thread receives the instruction; in the third stage, the thread decodes the instruction and obtains its source operands; in the fourth stage, the thread position-adjusts the source operands; in the fifth stage, the thread performs the arithmetic operation of the instruction and the corresponding memory-address calculation using the adjusted operands; in the sixth stage, the thread issues a read/write request to the memory address; in the seventh stage, the thread receives the response to the request; in the eighth stage, the thread writes back the arithmetic result or the response as the result of the instruction. After the eighth stage, if the message does not need a table lookup and still contains unprocessed instructions, processing returns to the first stage for the next unprocessed instruction, identified by the message's thread number.
In the above scheme, after the message has been fully processed, its thread number is released.
The invention also provides an ME comprising a thread-management module, a packet-storage module with dual read/write ports, and a kernel module. The thread-management module distributes threads to received messages via at least five thread-management queues; the packet-storage module stores messages according to their assigned threads; and the kernel module controls the assigned threads, in the manner of an eight-stage pipeline, to process the messages stored in the packet-storage module.
In the above scheme, the thread-management module is specifically configured to: allocate a thread number to a message through the idle queue free_queue in first-in-first-out order, and write the allocated thread number and the fetch address carried by the message into the pending queue rdy_queue; when there is an idle pipeline resource, schedule the thread number of one pending message from rdy_queue and write that thread number and its fetch address into the working queue work_queue, which stores the thread numbers and fetch addresses of all messages being processed; when a message needs a table lookup, write its thread number and fetch address into the lookup queue srh_queue; when a message has been fully processed, write its thread number and fetch address into the message output queue pkt_out_queue; and when a message needs a table lookup or has been fully processed, delete its thread number and fetch address from work_queue.
In the above scheme, the kernel module is specifically configured to make each stage of the eight-stage pipeline correspond to one thread. In the first stage, the thread sends a request to fetch a message instruction according to the message's fetch address; in the second stage, the thread receives the instruction; in the third stage, the thread decodes the instruction and obtains its source operands; in the fourth stage, the thread position-adjusts the source operands; in the fifth stage, the thread performs the arithmetic operation of the instruction and the corresponding memory-address calculation using the adjusted operands; in the sixth stage, the thread issues a read/write request to the memory address; in the seventh stage, the thread receives the response to the request; in the eighth stage, the thread writes back the arithmetic result or the response as the result of the instruction. After the eighth stage, if the message does not need a table lookup and still contains unprocessed instructions, processing returns to the first stage for the next unprocessed instruction, identified by the message's thread number.
In the above scheme, the thread-management module is further configured to release the thread number of a message after that message has been fully processed.
As can be seen, embodiments of the present invention provide an ME and a method for processing messages with it: the ME distributes threads to received messages via at least five thread-management queues, stores each message in a packet memory with dual read/write ports according to its assigned thread, and controls the assigned threads in the manner of an eight-stage pipeline to process the stored messages. Data hazards are avoided by the hardware structure itself, which simplifies the logic: no hazard-detection logic is needed, internal resource-access conflicts in the ME are avoided, and the ME's operating frequency and performance are effectively improved, guaranteeing high-frequency, high-performance message processing. The scheme is also relatively simple to implement, reducing coding complexity and thus labor cost.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the method for processing a message by an ME according to Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of the method for processing a message by an ME according to Embodiment 2 of the present invention;
Fig. 3 is a schematic diagram of the working process of the ME pipeline processing a message according to Embodiment 2 of the present invention;
Fig. 4 is a schematic diagram of the working process of the ME pipeline processing multiple messages according to Embodiment 2 of the present invention;
Fig. 5 is a schematic structural diagram of the ME provided by Embodiment 3 of the present invention.
Detailed description of the embodiments
In embodiments of the present invention, the ME distributes threads to received messages via at least five thread-management queues, stores each message in a packet memory with dual read/write ports according to its assigned thread, and controls the assigned threads, in the manner of an eight-stage pipeline, to process the messages stored in the packet memory.
The technical scheme of the present invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment one
Fig. 1 is a schematic flowchart of the method for processing a message by an ME according to Embodiment 1 of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step 101: the ME distributes threads to received messages via at least five thread-management queues.
Specifically, taking five thread-management queues as an example: when the ME receives a new message, the idle queue free_queue allocates it a thread number in first-in-first-out order, and the allocated thread number and the fetch address carried by the message are written into the pending queue rdy_queue. When the ME has an idle pipeline resource, it schedules the thread number of one pending message from rdy_queue and writes that thread number and its fetch address into the working queue work_queue, which stores the thread numbers and fetch addresses of all messages the ME is currently processing. When a message needs a table lookup, its thread number and fetch address are written into the lookup queue srh_queue; when a message has been fully processed, its thread number and fetch address are written into the message output queue pkt_out_queue. In either case, at the same time as the write into srh_queue or pkt_out_queue, the message's thread number and fetch address are deleted from work_queue.
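The five-queue lifecycle just described can be sketched as a small software model (illustrative Python, not from the patent; the class shape and method names are assumptions, while the queue names follow the text):

```python
from collections import deque

NUM_THREADS = 16    # depth of free_queue, per the description
PIPELINE_SLOTS = 8  # one resident thread per pipeline stage

class ThreadManager:
    """Minimal model of the five thread-management queues."""

    def __init__(self):
        self.free_queue = deque(range(NUM_THREADS))  # idle thread numbers (FIFO)
        self.rdy_queue = deque()      # (thread_no, fetch_addr) awaiting a slot
        self.work_queue = {}          # thread_no -> fetch_addr, in flight
        self.srh_queue = deque()      # waiting on a table lookup
        self.pkt_out_queue = deque()  # processed, waiting to be sent

    def on_new_message(self, fetch_addr):
        """free_queue allocates a thread number in FIFO order."""
        thread_no = self.free_queue.popleft()
        self.rdy_queue.append((thread_no, fetch_addr))
        return thread_no

    def schedule(self):
        """Move one pending message into work_queue when a slot is idle."""
        if self.rdy_queue and len(self.work_queue) < PIPELINE_SLOTS:
            thread_no, fetch_addr = self.rdy_queue.popleft()
            self.work_queue[thread_no] = fetch_addr
            return thread_no
        return None

    def on_lookup(self, thread_no):
        """Message needs a table lookup: leave work_queue, enter srh_queue."""
        self.srh_queue.append((thread_no, self.work_queue.pop(thread_no)))

    def on_done(self, thread_no):
        """Message fully processed: leave work_queue, enter pkt_out_queue."""
        self.pkt_out_queue.append((thread_no, self.work_queue.pop(thread_no)))

    def on_sent(self):
        """Message left the ME: recycle its thread number."""
        thread_no, _ = self.pkt_out_queue.popleft()
        self.free_queue.append(thread_no)
        return thread_no
```

A thread number thus cycles free_queue → rdy_queue → work_queue → (srh_queue or pkt_out_queue) → free_queue, exactly one queue at a time.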
The thread number that free_queue allocates to a message corresponds one-to-one with the message itself, so the allocated thread number identifies its corresponding message.
When the ME is processing fewer than 8 messages, it has an idle pipeline resource; it then schedules the thread number of one pending message from rdy_queue, writes the thread number and its fetch address into work_queue, and assigns the idle pipeline resource to the corresponding message, which is processed by the thread identified by that number. The thread numbers stored in work_queue total at most 8, matching the eight-stage pipeline: when work_queue holds 8 thread numbers, the ME is processing 8 messages, each with its own thread, and 8 threads work in rotation across the eight pipeline stages.
While the eight-stage pipeline processes messages, each stage corresponds to one thread and each thread processes one message, so the pipeline can process 8 messages simultaneously. When one of the 8 messages finishes processing, its thread number is written into pkt_out_queue, and its thread number and fetch address are deleted from work_queue, so the total number of thread numbers in work_queue drops below 8. Correspondingly, because that message's processing is finished, the pipeline resource allocated to it is set idle and reused for another message.
After a message has been processed, the ME releases its thread number and the corresponding thread; the released thread number and thread are later reassigned to a message received afterwards.
Step 102: the ME stores the message in a packet memory with dual read/write ports according to its assigned thread.
Here, once the ME has allocated a thread number to a received message in step 101, the message has been assigned a corresponding thread; the ME then stores the received message, according to its assigned thread, in the packet memory with dual read/write ports.
In practical applications, the packet memory is a random-access memory (RAM, Random Access Memory) with dual read/write ports.
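A dual-port packet RAM can be modelled as one storage array reachable through two independent ports (an illustrative Python sketch, not from the patent; class and parameter names are assumptions):

```python
class DualPortPacketRAM:
    """Toy model of the dual read/write port packet memory (pkt_ram).

    One port can serve the pipeline's read/write requests while the
    other serves message reception and transmission, so in the same
    cycle neither side has to stall waiting for the other (sizes and
    port labels are illustrative).
    """

    def __init__(self, words=1024):
        self.mem = [0] * words

    def access(self, port, addr, data=None):
        """One access on the given port: read if data is None, else write."""
        assert port in ("A", "B")
        if data is None:
            return self.mem[addr]  # read
        self.mem[addr] = data      # write
        return None
```

For example, the receive path can write a packet word through port B while the pipeline reads through port A without contention.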
Step 103: the ME controls the assigned threads, in the manner of an eight-stage pipeline, to process the messages stored in the packet memory.
Specifically, once a message's thread number and fetch address have been written into work_queue, the ME controls the thread assigned in step 101, in the manner of an eight-stage pipeline, to process the message stored in the packet memory.
Here, the eight-stage pipeline supports eight threads working simultaneously, with each stage corresponding to one thread:
In the first stage, the thread sends a request to fetch a message instruction according to the message's fetch address;
In the second stage, the thread receives the message instruction;
In the third stage, the thread decodes the instruction and obtains its source operands;
In the fourth stage, the thread position-adjusts the source operands;
In the fifth stage, the thread performs the arithmetic operation of the instruction and the corresponding memory-address calculation using the adjusted operands;
In the sixth stage, the thread issues a read/write request to the memory address;
In the seventh stage, the thread receives the response to the request;
In the eighth stage, the thread writes back the arithmetic result or the response as the result of the instruction.
After the eighth stage, if the message does not need a table lookup and still contains unprocessed instructions, processing returns to the first stage for the next unprocessed instruction, identified by the message's thread number, until all of the message's instructions have been processed.
For one message, each message instruction must pass through the first through eighth stages in turn, and in general a message contains many instructions. After one pass through the eight stages, it must therefore be determined whether the message needs a table lookup and whether it still contains unprocessed instructions. If it needs no lookup and has unprocessed instructions, its thread number and fetch address remain stored in work_queue, and the thread identified by that number continues processing the remaining instructions through the eight-stage pipeline. If the message needs a lookup, its thread number is written into srh_queue and the corresponding thread is suspended until the ME receives the lookup response, after which the same thread resumes processing the message's instructions. When the message is fully processed, i.e. it needs no lookup and has no unprocessed instructions, its thread number is written into pkt_out_queue.
In practical applications, when the sixth pipeline stage sends a read/write request to the dual-port RAM, it uses only one of the two read/write ports; the other port serves message reception into and transmission out of the ME when the ME receives a message load or fetch request.
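The eight stages above can be sketched as a barrel-style rotation in which each stage holds a different thread (illustrative Python; the text of Embodiment 2 explicitly names only IF1, IF2, ID, EX1, and EX2, so the last three stage names here are assumptions):

```python
from enum import IntEnum

class Stage(IntEnum):
    IF1 = 0       # send the instruction-fetch request (message's fetch address)
    IF2 = 1       # receive the message instruction
    ID = 2        # decode; read source operands
    EX1 = 3       # position-adjust the source operands
    EX2 = 4       # ALU operation and memory-address calculation
    MEM_REQ = 5   # issue the read/write request (name assumed)
    MEM_RSP = 6   # receive the response (name assumed)
    WB = 7        # write back result or response (name assumed)

def advance(pipeline):
    """One clock tick: every resident thread moves to the next stage.

    `pipeline` maps Stage -> thread number. With 8 threads resident,
    each stage holds a different thread every cycle, so consecutive
    instructions of one thread are 8 cycles apart and no data hazard
    arises between them -- the property the scheme relies on.
    """
    return {Stage((int(s) + 1) % 8): t for s, t in pipeline.items()}
```

Because the hazard avoidance comes from the rotation itself, no forwarding or hazard-detection logic is needed, which is the frequency benefit the description claims.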
Embodiment two
In Embodiment 2, the ME completes message management and thread scheduling through the following five queues:
the idle queue free_queue, the pending queue rdy_queue, the working queue work_queue, the lookup queue srh_queue, and the message output queue pkt_out_queue. free_queue allocates thread numbers to messages newly entering the ME; rdy_queue stores the thread numbers and fetch addresses of pending messages, holding at most 8 of each; work_queue stores the thread numbers and fetch addresses of messages being processed; srh_queue stores those of messages that need a table lookup; and pkt_out_queue stores those of processed messages waiting to be sent. A message carries its own fetch address when it enters the ME.
Fig. 2 shows the method for an ME to process a message provided by Embodiment 2 of the present invention. As shown in Fig. 2, it comprises the following steps:
Step 201: the ME allocates a thread number to the message through free_queue.
Specifically, when the ME receives a new message, free_queue allocates it a thread number. The depth of free_queue is 16, which determines how many thread numbers can be allocated; that is, free_queue can allocate at most 16 thread numbers. The number of threads available in the ME for processing messages matches this, and is also 16. The depth of 16 was derived from comprehensive performance-test results and resource-cost calculations.
free_queue allocates thread numbers to received messages in first-in-first-out order; with a depth of 16, the 16 thread numbers can be 0-15. When a message is received, free_queue gives it the thread number at the front of the queue. When a message has been processed and sent out, its thread number is released and returned to free_queue, so that when a new message arrives the released number can be allocated again. When the ME powers up or comes out of reset, all 16 thread numbers in free_queue are unallocated; the number at the front of the queue is allocated to the first received message, and subsequent messages are allocated numbers in first-in-first-out order.
By allocating thread numbers to messages, the ME keeps messages independent of one another, and the thread number establishes the mapping between a message and the storage resources the ME allocates to it.
Step 202: the ME writes the thread number allocated to the message, together with the corresponding fetch address, into rdy_queue.
Here, after allocating a thread number to the message, free_queue writes the allocated thread number and the fetch address into rdy_queue; a message whose thread number has entered rdy_queue is now a pending message. rdy_queue stores the thread numbers and fetch addresses of pending messages, at most 8 of each.
Step 203: when there is an idle pipeline resource, the ME schedules the thread number of one pending message, with its corresponding fetch address, from rdy_queue into work_queue.
Specifically, when the ME has an idle pipeline resource, it schedules the thread number of one pending message, together with its fetch address, from rdy_queue into work_queue; the message is then processed by the idle pipeline resource under the thread allocated to it. work_queue stores only the thread numbers of messages the ME is currently processing. Once a message needs a table lookup or has been fully processed, the ME deletes its thread number and fetch address from work_queue and writes them into srh_queue or pkt_out_queue respectively, so that work_queue can admit the thread number and fetch address of a pending message from rdy_queue.
Here, the kernel threads process messages through an eight-stage pipeline structure. The pipeline fetches a message instruction according to the thread number and fetch address stored in work_queue; after the fetch completes, it decodes the instruction and extracts its source operands. The ME's logic unit then performs the shifts, splices, additions and subtractions, and other calculations the instruction requires, and writes the result to the destination register or memory. After a pass through the eight-stage pipeline, a message is in one of three situations:
First, if the message needs no table lookup and still contains unprocessed instructions, it is not yet fully processed, and the eight-stage pipeline continues with its next instruction;
Second, if the message needs a table lookup, it is not yet fully processed, and step 204 is performed;
Third, if the message has been fully processed, step 205 is performed.
Step 204: the ME writes the thread number of the message that needs a lookup, with its corresponding fetch address, into srh_queue.
Specifically, when a message needs a table lookup, its thread number and the corresponding fetch address are written into srh_queue to wait for the lookup. The message is not yet fully processed; when the ME sends the lookup request, the outgoing request carries the message's thread number, so the number remains occupied. Meanwhile, the thread processing the message is suspended to wait for the lookup response. Because that thread is suspended, the eight-stage pipeline gains an idle resource, which processes a pending message whose thread number is stored in rdy_queue.
When the ME receives the lookup response, the response carries the thread number of the message that was looked up; using that thread number, the message's fetch address is extracted from the table-lookup instruction memory srh_pc_ram of the corresponding thread, and after extraction the flow returns to step 202. The only difference between the two cases of step 202 is this: when the ME receives a lookup response, it writes the message's thread number and the extracted fetch address directly into rdy_queue, without allocating a thread number through free_queue; when the ME receives a new message, free_queue allocates the thread number that is written, with the fetch address, into rdy_queue.
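The lookup round trip can be sketched as follows (illustrative Python; srh_pc_ram is the per-thread lookup memory named in the text, while the class shape and method names are assumptions):

```python
class LookupHandler:
    """Suspend a thread for a table lookup; resume it on the response."""

    def __init__(self):
        self.srh_pc_ram = {}  # thread_no -> fetch address saved at suspend time
        self.rdy_queue = []   # (thread_no, fetch_addr) ready for scheduling

    def suspend_for_lookup(self, thread_no, fetch_addr):
        # The outgoing lookup request carries thread_no, so the thread
        # number stays occupied while the thread itself is suspended.
        self.srh_pc_ram[thread_no] = fetch_addr

    def on_lookup_response(self, thread_no):
        # The response carries the thread number; recover the saved fetch
        # address and re-enter rdy_queue directly -- no new thread number
        # is allocated from free_queue for a resumed message.
        fetch_addr = self.srh_pc_ram.pop(thread_no)
        self.rdy_queue.append((thread_no, fetch_addr))
        return fetch_addr
```

The key design point is that a resumed message re-enters scheduling with its original thread number, so lookups never consume extra entries from the 16-deep free_queue.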
Step 205: the ME writes the thread number of the fully processed message, with its corresponding fetch address, into pkt_out_queue.
Specifically, when a message has been fully processed, its thread number and the corresponding fetch address are written into pkt_out_queue to wait for output. Although the message is processed, its thread number, now in pkt_out_queue, is still in use; only when the ME sends the message out is the thread number released, so that it can be allocated to a new message and does not tie up the ME's pool of allocatable thread numbers.
The thread number released in step 205 is written back into free_queue as a queue entry, to be allocated again.
It should be noted that when the total number of thread numbers for pending messages stored in rdy_queue is less than 8, the ME automatically generates empty messages to keep the number of thread numbers stored in rdy_queue and work_queue at 8, so that the eight-stage pipeline processes 8 messages simultaneously and all stages can execute normally. The processing result for an empty message is step 205.
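The padding behaviour can be sketched like this (hypothetical Python; the representation of an empty message is an assumption, the slot count of 8 follows the text):

```python
def pad_with_empty(pending, slots=8):
    """Pad the pending entries with 'empty' messages so the eight-stage
    pipeline always sees 8 resident threads, keeping every stage busy.

    `pending` holds (thread_no, fetch_addr) pairs; the ("empty", None)
    placeholder stands in for an ME-generated empty message, which is
    processed straight through to the output step (step 205).
    """
    pending = list(pending)
    while len(pending) < slots:
        pending.append(("empty", None))
    return pending
```

Keeping all 8 slots occupied is what lets the barrel rotation run every cycle even under light load.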
After step 201, the ME saves the message in the packet memory pkt_ram, which has dual read/write ports, according to the thread number allocated in step 201, i.e. according to the thread assigned to the message. pkt_ram has two read/write ports, and the eight-stage pipeline uses them to process the messages stored in pkt_ram.
When the ME's eight-stage pipeline is processing a message instruction, it needs to access pkt_ram; at the same time, a message written into pkt_out_queue and waiting to be output is also stored in pkt_ram, and outputting it from the ME likewise requires accessing pkt_ram, which creates a read/write conflict on pkt_ram. Avoiding this conflict by suspending either the pipeline's processing or the fetching of the message to be output would reduce ME performance. In Embodiment 3 of the present invention, the dual-port pkt_ram prevents pipeline stalls caused by contention for a pkt_ram port, so the pipeline runs at full speed and the ME's message-processing performance improves.
Fig. 3 is a schematic diagram of the workflow of the ME pipeline processing a message in step 203. As shown in Fig. 3, the pipeline processes a message as follows:
When a message's thread number and its self-carried fetch address enter work_queue, an idle pipeline resource processes the message. The pipeline can run at most 8 threads simultaneously and can therefore process 8 messages simultaneously.
First, the message enters the first pipeline stage, the instruction fetch 1 (IF1, Instruction Fetch 1) stage: the thread processing the message sends a fetch request for a message instruction according to the fetch address thread_pc of the message stored in work_queue. The fetch request is sent to the instruction memory instrmem, which stores message instructions. Here, instrmem is a RAM separate from the RAM storing messages, so instruction fetches are fast, have low latency, and do not miss.
In the second stage, the instruction fetch 2 (IF2, Instruction Fetch 2) stage, the message instruction is received from instrmem and saved in the instruction register if_instr, which holds fetched message instructions.
In the third stage, the instruction decode (ID, Instruction Decode) stage, the message instruction held in if_instr is parsed and decoded, producing a register file (RF, register file) read command and read address, and the source operands needed by each execution unit to execute the message instruction are fetched from the RF. Here, the ME allocates a corresponding RF for each thread in the pipeline, to store the data associated with that thread.
In the fourth stage, the execute 1 (EX1, execute 1) stage, the positions of the source operands are adjusted. Because the ME supports many operation types, such as arithmetic logic unit (ALU, Arithmetic Logical Unit) calculations, the values of the source operands must be aligned in preparation for the computation in the fifth stage. This stage mainly ensures that the arithmetic units executing the message instruction do not need to manipulate the source operands themselves; only the position adjustment performed here, according to the fetched operands and the operation type, is needed to improve timing.
In the fifth stage, the execute 2 (EX2, execute 2) stage, the ALU operates on the position-adjusted source operands from the fourth stage, performing the arithmetic operation corresponding to the message instruction and computing the corresponding memory address. This part is pure combinational logic: from the source operands it completes the arithmetic operation of the message instruction and the calculation of the memory address.
In the sixth stage, the memory access 1 (MA1, Memory Access 1) stage, an operation request corresponding to the message instruction is issued: when the message instruction is an arithmetic operation, the arithmetic result is written into the result unit; when the message instruction computes a memory address, a read/write request is sent to pkt_ram through one of its two read/write ports.
In the seventh stage, the memory access 2 (MA2, Memory Access 2) stage, the data read from pkt_ram by the read/write request is obtained; at the same time, the contents of the result unit and the data read from pkt_ram are sent, as the output of the pipeline, to the data decision unit Wb_mux, which determines from the pipeline output, before the write-back in the eighth stage, which of the three cases of step 203 applies to the processed message. The pipeline output is the result of this message instruction.
In the eighth stage, the write-back (WB, Write Back) stage, the pipeline output selected by Wb_mux is written back to the RF, so that the result of the message instruction takes effect.
In summary, the IF1 and IF2 stages fetch the message instruction from instrmem; the ID stage parses the message instruction and fetches its source operands; the EX1 stage adjusts the positions of the source operands; the EX2 stage completes, through the arithmetic logic unit and according to the requirements of the message instruction, calculations such as shifting, splicing, addition and subtraction of the source operands; and the MA1, MA2 and WB stages write the result of the message instruction into the RF of the thread executing it. After these eight pipeline stages, the processing of one message instruction is complete.
In practice, when the IF1 stage sends a fetch request, the message instruction at the current fetch address is fetched; after the fetch, the fetch address is updated accordingly, advancing by one instruction, so that the next message instruction can be fetched correctly.
As shown in Fig. 4, the messages enter each stage of the pipeline in order, one after another. Each stage of the pipeline corresponds to one thread, and the pipeline supports 8 threads working simultaneously. The first message fetches its message instruction in the IF1 stage and completes the write-back of the result in the WB stage, completing the processing of one message instruction; each following message lags one pipeline stage behind the previous one. At any given time, each of the 8 pipeline stages performs a different operation, completing the function of that stage. When 8 threads work simultaneously, the threads occupy different pipeline stages in order. For example: at time T1, thread Thread0 works in the IF1 stage; at time T2, Thread0 works in the IF2 stage and Thread1 works in the IF1 stage; at time T3, Thread0 works in the ID stage, Thread1 in the IF2 stage and Thread2 in the IF1 stage; and so on, until at time T8 Thread0 works in the WB stage, Thread1 in the MA2 stage, Thread2 in the MA1 stage, Thread3 in the EX2 stage, Thread4 in the EX1 stage, Thread5 in the ID stage, Thread6 in the IF2 stage and Thread7 in the IF1 stage.
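The staggering described above can be captured in a few lines of Python. This is an illustrative model (not from the patent): thread n enters the pipeline at cycle n + 1, so at any cycle from T8 onward every stage holds a different thread.

```python
# Illustrative model of how 8 threads stagger through the 8 pipeline stages.
STAGES = ["IF1", "IF2", "ID", "EX1", "EX2", "MA1", "MA2", "WB"]

def stage_of(thread, t):
    """Stage occupied by `thread` at cycle t (T1 = 1), or None if not in flight."""
    idx = t - 1 - thread  # thread n enters the pipeline at cycle n + 1
    return STAGES[idx] if 0 <= idx < len(STAGES) else None

# At T8 the occupancy matches the text: Thread0 in WB, ..., Thread7 in IF1.
occupancy_t8 = [stage_of(n, 8) for n in range(8)]
```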
When a message reaches the WB stage, the processing of the current message instruction is complete. At that point, if the message does not need a table lookup and the current message instruction is not the last message instruction of the message, the ME keeps the thread number and fetch address of the message in work_queue, and the thread processing the message goes on to process the next message instruction of the message.
Consider two consecutive dependent ALU message instructions, i.e. the result computed by the first message instruction is a source operand of the second. The result of the first instruction takes effect when it is written back to the RF in the WB stage, and the second instruction must read that result from the RF as a source operand in the ID stage. The write-back and the read are 5 cycles apart; that is, the second message instruction can only use the result of the first after five cycles, otherwise a data hazard occurs. Since the pipeline has 8 stages, each stage corresponding to one thread, a thread executes its next message instruction 8 cycles after the previous one entered the pipeline; 8 cycles is greater than 5 cycles, so the data hazard is avoided.
Take the first message instruction executed by thread Thread0: Thread0 enters the pipeline at time T1, and the first message instruction reaches the WB stage only after 8 cycles. Only then can the second message instruction of Thread0 enter the pipeline; that is, for Thread0, two successive message instructions enter the pipeline 8 cycles apart. In the eight-stage pipeline, the WB stage performs the RF write and the ID stage performs the RF read, and these two stages are 5 cycles apart; since the second instruction has not yet reached the ID stage at write-back time, the data hazard is avoided.
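The hazard argument reduces to simple arithmetic over the stage list, which can be checked directly (an illustrative check, using the stage order given in the text):

```python
# Data-hazard check: RF write happens in WB (stage 8), RF read in ID (stage 3).
STAGES = ["IF1", "IF2", "ID", "EX1", "EX2", "MA1", "MA2", "WB"]

hazard_window = STAGES.index("WB") - STAGES.index("ID")  # 5 cycles between write and read

# A thread issues its next instruction only after the previous one has
# traversed all 8 stages.
reentry_interval = len(STAGES)  # 8 cycles

# 8 > 5: by the time the dependent instruction reaches ID, the result is in the RF.
no_data_hazard = reentry_interval > hazard_window
```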
Embodiment three
Fig. 5 is a schematic structural diagram of an ME provided by embodiment three of the present invention. As shown in Fig. 5, the ME 50 comprises a thread management module 51, a packet storage module 52 with two read/write ports, and a kernel module 53.
The thread management module 51 can be implemented by a central processing unit (CPU, Central Processing Unit) in cooperation with a memory chip, and is configured to perform thread allocation for received messages through at least five thread management queues.
Specifically, taking five thread management queues as an example, the thread management module 51 can allocate a thread number to a message from the idle queue free_queue in first-in-first-out order, and write the allocated thread number and the fetch address carried by the message into the pending queue rdy_queue. When an idle pipeline resource is available, the thread number of one pending message, together with the fetch address corresponding to that thread number, is scheduled out of rdy_queue and written into the work queue work_queue; work_queue stores the thread numbers and fetch addresses of all messages being processed. When a message needs a table lookup, its thread number and fetch address are written into the lookup queue srh_queue; when a message has been fully processed, its thread number and fetch address are written into the message output queue pkt_out_queue. In either case, when a message needs a table lookup or has been fully processed, its thread number and fetch address are deleted from work_queue.
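The lifecycle through the five queues can be illustrated with a small Python model; the class and method names are our own, not the patent's:

```python
from collections import deque

class ThreadManager:
    """Toy model of the five thread-management queues: free_queue, rdy_queue,
    work_queue, srh_queue and pkt_out_queue."""
    def __init__(self, n_threads=8):
        self.free_queue = deque(range(n_threads))  # idle thread numbers (FIFO)
        self.rdy_queue = deque()      # (thread, fetch_addr) awaiting a pipeline slot
        self.work_queue = {}          # thread -> fetch_addr, currently in the pipeline
        self.srh_queue = deque()      # messages parked for a table lookup
        self.pkt_out_queue = deque()  # finished messages awaiting output

    def admit(self, fetch_addr):
        """New message: allocate a thread number FIFO-style and mark it pending."""
        thread = self.free_queue.popleft()
        self.rdy_queue.append((thread, fetch_addr))
        return thread

    def schedule(self):
        """An idle pipeline resource picks up one pending message."""
        thread, addr = self.rdy_queue.popleft()
        self.work_queue[thread] = addr

    def lookup(self, thread):
        """Message needs a table lookup: move it from work_queue to srh_queue."""
        self.srh_queue.append((thread, self.work_queue.pop(thread)))

    def finish(self, thread):
        """Message fully processed: move it from work_queue to pkt_out_queue."""
        self.pkt_out_queue.append((thread, self.work_queue.pop(thread)))

    def release(self, thread):
        """After output, return the thread number to free_queue."""
        self.free_queue.append(thread)

tm = ThreadManager()
t = tm.admit(0x100)   # thread 0 allocated, message pending
tm.schedule()         # message enters the pipeline
tm.finish(t)          # processing complete -> pkt_out_queue, removed from work_queue
tm.release(t)         # thread number returned to free_queue
```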
The packet storage module 52 can be implemented by RAM, and is configured to store messages according to their allocated threads.
The kernel module 53 can be implemented by a CPU in cooperation with a signal processing chip, and is configured to control the allocated threads to process messages in an eight-stage pipeline.
Specifically, the kernel module 53 is configured to map each stage of the eight-stage pipeline to a corresponding thread, wherein:
in the first stage, the thread sends a fetch request for a message instruction according to the fetch address of the message;
in the second stage, the thread receives the message instruction;
in the third stage, the thread parses the message instruction and obtains its source operands;
in the fourth stage, the thread adjusts the positions of the source operands;
in the fifth stage, the thread performs, according to the position-adjusted source operands, the arithmetic operation corresponding to the message instruction and the calculation of the corresponding memory address;
in the sixth stage, the thread sends a read/write operation request according to the memory address;
in the seventh stage, the thread obtains the response to the operation request;
in the eighth stage, the thread writes back the result of the arithmetic operation or the response to the operation request as the result of the message instruction;
wherein, after the eighth stage, when it is determined that the message does not need a table lookup and contains an unprocessed message instruction, the unprocessed message instruction of the message is returned to the first stage for processing according to the thread number of the message.
The thread management module 51 is further configured to release the thread number of the message after the processing of the message is complete.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to any of embodiments one to three of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific way, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Accordingly, any of embodiments one and two of the present invention also provides a computer storage medium storing a computer program, the computer program being used to execute the ME message-processing method of any of embodiments one and two of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention; any modifications, equivalent replacements, improvements and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A method for a micro-engine (ME) to process messages, characterized in that the method comprises:
the ME performing thread allocation for a received message through at least five thread management queues, storing the message, according to the allocated thread, in a packet memory with two read/write ports, and controlling the allocated thread in an eight-stage pipeline to process the message stored in the packet memory.
2. The method according to claim 1, characterized in that the ME performing thread allocation for the received message through at least five thread management queues comprises:
when the ME receives a new message, allocating a thread number to the message from the idle queue free_queue in first-in-first-out order, and writing the allocated thread number and the fetch address carried by the message into the pending queue rdy_queue; when an idle pipeline resource is available in the ME, the ME scheduling the thread number of one pending message out of rdy_queue and writing it, together with the fetch address corresponding to the thread number, into the work queue work_queue, wherein work_queue stores the thread numbers and fetch addresses of all messages the ME is processing; when a message needs a table lookup, writing the thread number and fetch address of the message into the lookup queue srh_queue; when a message has been fully processed, writing the thread number and fetch address of the message into the message output queue pkt_out_queue; wherein, when a message needs a table lookup or has been fully processed, the thread number and fetch address corresponding to the message are deleted from work_queue.
3. The method according to claim 2, characterized in that controlling the allocated thread in an eight-stage pipeline to process the message stored in the packet memory comprises:
the eight-stage pipeline supporting 8 threads working simultaneously, each stage of the eight-stage pipeline corresponding to one thread; wherein,
in the first stage, the thread sends a fetch request for a message instruction according to the fetch address of the message;
in the second stage, the thread receives the message instruction;
in the third stage, the thread parses the message instruction and obtains its source operands;
in the fourth stage, the thread adjusts the positions of the source operands;
in the fifth stage, the thread performs, according to the position-adjusted source operands, the arithmetic operation corresponding to the message instruction and the calculation of the corresponding memory address;
in the sixth stage, the thread sends a read/write operation request according to the memory address;
in the seventh stage, the thread obtains the response to the operation request;
in the eighth stage, the thread writes back the result of the arithmetic operation or the response to the operation request as the result of the message instruction;
wherein, after the eighth stage, when it is determined that the message does not need a table lookup and contains an unprocessed message instruction, the unprocessed message instruction of the message is returned to the first stage for processing according to the thread number of the message.
4. The method according to claim 3, characterized in that the method further comprises:
after the processing of the message is complete, releasing the thread number of the message.
5. An ME, characterized in that the ME comprises: a thread management module, a packet storage module with two read/write ports, and a kernel module; wherein,
the thread management module is configured to perform thread allocation for received messages through at least five thread management queues;
the packet storage module is configured to store messages according to their allocated threads;
the kernel module is configured to control the allocated threads in an eight-stage pipeline to process the messages stored in the packet storage module.
6. The ME according to claim 5, characterized in that the thread management module is specifically configured to: allocate a thread number to a message from the idle queue free_queue in first-in-first-out order, and write the allocated thread number and the fetch address carried by the message into the pending queue rdy_queue; when an idle pipeline resource is available, schedule the thread number of one pending message out of rdy_queue and write it, together with the fetch address corresponding to the thread number, into the work queue work_queue, wherein work_queue stores the thread numbers and fetch addresses of all messages being processed; when a message needs a table lookup, write the thread number and fetch address of the message into the lookup queue srh_queue; when a message has been fully processed, write the thread number and fetch address of the message into the message output queue pkt_out_queue; wherein, when a message needs a table lookup or has been fully processed, the thread number and fetch address corresponding to the message are deleted from work_queue.
7. The ME according to claim 6, characterized in that the kernel module is specifically configured to map each stage of the eight-stage pipeline to a corresponding thread; wherein,
in the first stage, the thread sends a fetch request for a message instruction according to the fetch address of the message;
in the second stage, the thread receives the message instruction;
in the third stage, the thread parses the message instruction and obtains its source operands;
in the fourth stage, the thread adjusts the positions of the source operands;
in the fifth stage, the thread performs, according to the position-adjusted source operands, the arithmetic operation corresponding to the message instruction and the calculation of the corresponding memory address;
in the sixth stage, the thread sends a read/write operation request according to the memory address;
in the seventh stage, the thread obtains the response to the operation request;
in the eighth stage, the thread writes back the result of the arithmetic operation or the response to the operation request as the result of the message instruction;
wherein, after the eighth stage, when it is determined that the message does not need a table lookup and contains an unprocessed message instruction, the unprocessed message instruction of the message is returned to the first stage for processing according to the thread number of the message.
8. The ME according to claim 7, characterized in that the thread management module is further configured to release the thread number of the message after the processing of the message is complete.
CN201410084619.5A 2014-03-07 2014-03-07 A kind of micro engine and its method for handling message Active CN104901901B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410084619.5A CN104901901B (en) 2014-03-07 2014-03-07 A kind of micro engine and its method for handling message
PCT/CN2014/077834 WO2015131445A1 (en) 2014-03-07 2014-05-19 Microengine and packet processing method therefor, and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410084619.5A CN104901901B (en) 2014-03-07 2014-03-07 A kind of micro engine and its method for handling message

Publications (2)

Publication Number Publication Date
CN104901901A true CN104901901A (en) 2015-09-09
CN104901901B CN104901901B (en) 2019-03-12

Family

ID=54034300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410084619.5A Active CN104901901B (en) 2014-03-07 2014-03-07 A kind of micro engine and its method for handling message

Country Status (2)

Country Link
CN (1) CN104901901B (en)
WO (1) WO2015131445A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257280A (en) * 2017-07-14 2019-01-22 深圳市中兴微电子技术有限公司 A kind of micro engine and its method for handling message
CN109298923A (en) * 2018-09-14 2019-02-01 中科驭数(北京)科技有限公司 Deep pipeline task processing method and device
WO2024001411A1 (en) * 2022-06-27 2024-01-04 深圳市中兴微电子技术有限公司 Multi-thread scheduling method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5560029A (en) * 1991-07-22 1996-09-24 Massachusetts Institute Of Technology Data processing system with synchronization coprocessor for multiple threads
US6829697B1 (en) * 2000-09-06 2004-12-07 International Business Machines Corporation Multiple logical interfaces to a shared coprocessor resource
CN1767502A (en) * 2004-09-29 2006-05-03 英特尔公司 Updating instructions executed by a multi-core processor
CN101763285A (en) * 2010-01-15 2010-06-30 西安电子科技大学 Zero-overhead switching multithread processor and thread switching method thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102752198B (en) * 2012-06-21 2014-10-29 北京星网锐捷网络技术有限公司 Multi-core message forwarding method, multi-core processor and network equipment


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257280A (en) * 2017-07-14 2019-01-22 深圳市中兴微电子技术有限公司 A kind of micro engine and its method for handling message
CN109257280B (en) * 2017-07-14 2022-05-27 深圳市中兴微电子技术有限公司 Micro-engine and message processing method thereof
CN109298923A (en) * 2018-09-14 2019-02-01 中科驭数(北京)科技有限公司 Deep pipeline task processing method and device
CN109298923B (en) * 2018-09-14 2019-11-29 中科驭数(北京)科技有限公司 Deep pipeline task processing method and device
WO2024001411A1 (en) * 2022-06-27 2024-01-04 深圳市中兴微电子技术有限公司 Multi-thread scheduling method and device

Also Published As

Publication number Publication date
CN104901901B (en) 2019-03-12
WO2015131445A1 (en) 2015-09-11

Similar Documents

Publication Publication Date Title
KR102333341B1 (en) Exception handling in microprocessor systems
CN101739242B (en) Stream data processing method and stream processor
US10318297B2 (en) Method and apparatus for operating a self-timed parallelized multi-core processor
WO2009006607A1 (en) Dynamically composing processor cores to form logical processors
CN105279017B (en) Method for allocating tasks, apparatus and system based on cloud test platform
CN106575220B (en) Multiple clustered VLIW processing cores
CA2533741A1 (en) Programmable delayed dispatch in a multi-threaded pipeline
US10049043B2 (en) Flushing control within a multi-threaded processor
CN105183564A (en) Equipment scheduling method, device and system based on cloud testing platform
TW201719398A (en) Scheduling method and processing device using the same
US20200319893A1 (en) Booting Tiles of Processing Units
CN115129480B (en) Scalar processing unit and access control method thereof
CN104901901A (en) Micro-engine and method for processing message therewith
TW201224991A (en) Next-instruction-type field
CN105094747B (en) The device of central processing unit based on SMT and the data dependence for detection instruction
CN116414541B (en) Task execution method and device compatible with multiple task working modes
CN104915258A (en) Distributed task scheduling method and device
KR20240025019A (en) Provides atomicity for complex operations using near-memory computing
KR102210765B1 (en) A method and apparatus for long latency hiding based warp scheduling
CN103019844A (en) Method and device supporting calling of MPI (Message Passing Interface) function through multiple threads
US9223699B2 (en) Cache management in managed runtime environments
US8683181B2 (en) Processor and method for distributing load among plural pipeline units
Kwon et al. Mobile GPU shader processor based on non-blocking coarse grained reconfigurable arrays architecture
CN112732416B (en) Parallel data processing method and parallel processor for effectively eliminating data access delay
CN103019843A (en) MPI (message passing interface) function calling method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20150909

Assignee: Xi'an Chris Semiconductor Technology Co. Ltd.

Assignor: SHENZHEN ZTE MICROELECTRONICS TECHNOLOGY CO., LTD.

Contract record no.: 2019440020036

Denomination of invention: Micro-engine and method for processing message therewith

Granted publication date: 20190312

License type: Common License

Record date: 20190619