CN104901901B - A kind of micro engine and its method for handling message - Google Patents
- Publication number: CN104901901B
- Application number: CN201410084619.5A
- Authority
- CN
- China
- Prior art keywords
- message
- thread
- queue
- thread number
- instruction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/64—Hybrid switching systems
- H04L12/6418—Hybrid transport
Abstract
The invention discloses a micro engine (ME) and a method for the ME to process messages. The ME allocates threads to received messages through at least five thread-management queues, stores each message in a packet memory with dual read-write ports according to the allocated thread, and controls the allocated threads, in an eight-stage-pipeline manner, to process the messages stored in the packet memory. The invention also discloses a corresponding ME for processing messages.
Description
Technical field
The present invention relates to network processor technology, and more particularly to a micro engine (ME, Micro Engine) and a method for the ME to process messages.
Background technique
To meet the needs of future network development and improve router performance, the core routers at the backbone of the Internet have undergone one technological change after another. Especially in the high-end router market, network processors, with their outstanding message-processing performance and programmability, have become an irreplaceable part of the routing and forwarding engine. At present, the industry's network processors basically adopt a multithreaded structure, and the management and scheduling of multiple threads is a key factor affecting the performance of a multithreaded network processor.
In a network processor system, the ME is the core component of the network processor. A multithreaded architecture is an effective way to improve ME performance, but it also brings problems such as thread-management complexity and system-frequency bottlenecks. It is therefore necessary to design a reasonable scheme that achieves efficient, high-frequency thread scheduling and management in the ME while giving the ME higher processing performance.
Some traditional multithreaded network processors use MEs based on coarse-grained scheduling. Although such an ME can guarantee that the instructions of one thread execute at full speed, each thread switch requires loading and saving data, which idles the core pipeline and degrades ME performance.
In addition, because only one thread executes in the ME pipeline, the design must resolve data hazards. A data-forwarding design increases logic complexity, and when two consecutive result-dependent message instructions are processed, the combinational-logic path lengthens and the system frequency drops.
Summary of the invention
In view of this, embodiments of the present invention are intended to provide an ME and a method for processing messages that can overcome the low frequency and performance of existing MEs.
The technical scheme of the present invention is realized as follows:
The present invention provides a method for an ME to process messages. The method includes: the ME allocates threads to received messages through at least five thread-management queues, stores the messages in a packet memory with dual read-write ports according to the allocated threads, and controls the allocated threads, in an eight-stage-pipeline manner, to process the messages stored in the packet memory.
In the above scheme, the ME allocates threads to received messages through at least five thread-management queues as follows: when the ME receives a new message, the idle queue free_queue allocates a thread number to the message in first-in-first-out order, and the allocated thread number and the fetch address carried by the message are written into the pending queue rdy_queue. When the ME has an idle pipeline resource, the ME dispatches the thread number of one pending message and the corresponding fetch address from rdy_queue into the work queue work_queue; work_queue stores only the thread numbers and fetch addresses of messages the ME is processing. When a message needs a table lookup, its thread number and fetch address are written into the lookup queue srh_queue; when a message has been processed, its thread number and fetch address are written into the message output queue pkt_out_queue. In either case, the message's thread number and fetch address are deleted from work_queue.
In the above scheme, controlling the allocated threads in an eight-stage-pipeline manner to process the messages stored in the packet memory is as follows: the eight-stage pipeline supports eight threads working simultaneously, with each stage corresponding to one thread. In the first stage, the thread sends a fetch request for a message instruction according to the message's fetch address; in the second stage, the thread receives the message instruction; in the third stage, the thread decodes the message instruction and obtains its source operands; in the fourth stage, the thread performs bit alignment on the source operands; in the fifth stage, the thread executes the arithmetic operation corresponding to the message instruction and computes the corresponding memory address using the aligned source operands; in the sixth stage, the thread issues a read-write request according to the memory address; in the seventh stage, the thread obtains the response to the request; in the eighth stage, the thread writes back the arithmetic result or the request response as the processing result of the message instruction. After the eighth stage, if the message does not need a table lookup and still contains unprocessed message instructions, processing returns to the first stage according to the message's thread number to handle the remaining instructions.
In the above scheme, after the message has been processed, the thread number of the message is released.
The present invention also provides an ME. The ME includes a thread-management module, a packet-storage module with dual read-write ports, and a kernel module. The thread-management module is configured to allocate threads to received messages through at least five thread-management queues; the packet-storage module is configured to store the messages according to the allocated threads; the kernel module is configured to control the allocated threads, in an eight-stage-pipeline manner, to process the messages stored in the packet-storage module.
In the above scheme, the thread-management module is specifically configured to: allocate a thread number to a message through the idle queue free_queue in first-in-first-out order, and write the allocated thread number and the fetch address carried by the message into the pending queue rdy_queue; when there is an idle pipeline resource, dispatch the thread number of one pending message and the corresponding fetch address from rdy_queue into the work queue work_queue, where work_queue stores only the thread numbers and fetch addresses of messages being processed; when a message needs a table lookup, write its thread number and fetch address into the lookup queue srh_queue; and when a message has been processed, write its thread number and fetch address into the message output queue pkt_out_queue. In either of the last two cases, the message's thread number and fetch address are deleted from work_queue.
In the above scheme, the kernel module is specifically configured so that each stage of the eight-stage pipeline corresponds to one thread. In the first stage, the thread sends a fetch request for a message instruction according to the message's fetch address; in the second stage, the thread receives the message instruction; in the third stage, the thread decodes the message instruction and obtains its source operands; in the fourth stage, the thread performs bit alignment on the source operands; in the fifth stage, the thread executes the arithmetic operation corresponding to the message instruction and computes the corresponding memory address using the aligned source operands; in the sixth stage, the thread issues a read-write request according to the memory address; in the seventh stage, the thread obtains the response to the request; in the eighth stage, the thread writes back the arithmetic result or the request response as the processing result of the message instruction. After the eighth stage, if the message does not need a table lookup and still contains unprocessed message instructions, processing returns to the first stage according to the message's thread number to handle the remaining instructions.
In the above scheme, the thread-management module is further configured to release the thread number of the message after the message has been processed.
It can be seen that embodiments of the present invention provide an ME and a method for the ME to process messages: the ME allocates threads to received messages through at least five thread-management queues, stores the messages in a packet memory with dual read-write ports according to the allocated threads, and controls the allocated threads, in an eight-stage-pipeline manner, to process the messages stored in the packet-storage module. Data hazards are prevented by the hardware structure itself, which simplifies the logic: no hazard-detection logic is required, internal ME resource-access conflicts are avoided, and the ME's working frequency and performance are effectively improved, guaranteeing high-frequency, high-performance message processing. The scheme is also relatively simple to implement, which reduces coding complexity and thus labor cost.
Detailed description of the invention
Fig. 1 is a flow diagram of the method by which the ME provided by Embodiment 1 of the present invention processes messages;
Fig. 2 is a flow diagram of the method by which the ME provided by Embodiment 2 of the present invention processes messages;
Fig. 3 is a schematic diagram of the working process of the ME pipeline of Embodiment 2 of the present invention processing one message;
Fig. 4 is a schematic diagram of the working process of the ME pipeline of Embodiment 2 of the present invention processing multiple messages;
Fig. 5 is a structural schematic diagram of the ME provided by Embodiment 3 of the present invention.
Specific embodiment
In embodiments of the present invention, the ME allocates threads to received messages through at least five thread-management queues, stores the messages in a packet memory with dual read-write ports according to the allocated threads, and controls the allocated threads, in an eight-stage-pipeline manner, to process the messages stored in the packet memory.
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Embodiment one
Fig. 1 is a flow diagram of the method by which the ME provided by Embodiment 1 of the present invention processes messages. As shown in Fig. 1, the method includes the following steps:
Step 101: the ME allocates threads to received messages through at least five thread-management queues.
Specifically, taking five thread-management queues as an example: when the ME receives a new message, the idle queue free_queue allocates a thread number to the message in first-in-first-out order, and the allocated thread number and the fetch address carried by the message are written into the pending queue rdy_queue. When the ME has an idle pipeline resource, the ME dispatches the thread number of one pending message and the corresponding fetch address from rdy_queue into the work queue work_queue; work_queue stores only the thread numbers and fetch addresses of messages the ME is processing. When a message needs a table lookup, its thread number and fetch address are written into the lookup queue srh_queue; when a message has been processed, its thread number and fetch address are written into the message output queue pkt_out_queue. In either case, while the thread number and fetch address are written into srh_queue or pkt_out_queue, they are simultaneously deleted from work_queue.
The thread number that free_queue allocates to a message corresponds one-to-one with the message itself, so the allocated thread number identifies its corresponding message.
When the ME is processing fewer than 8 messages, the ME has an idle pipeline resource; it then dispatches the thread number of one pending message and the corresponding fetch address from rdy_queue into work_queue and allocates the idle pipeline resource to the message corresponding to that thread number, whose thread processes the message using the idle resource. The total number of thread numbers stored in work_queue is at most 8, matching the eight-stage pipeline; when work_queue holds 8 thread numbers, the ME is processing 8 messages, each corresponding to one thread, and the 8 threads in the ME's eight-stage pipeline cycle through the pipeline stages.
While the eight-stage pipeline processes messages, each stage corresponds to one thread and each thread processes one message, so the ME pipeline can process 8 messages simultaneously. When one of the 8 messages has been processed, its thread number is written into pkt_out_queue and its thread number and fetch address are deleted from work_queue, so the number of thread numbers stored in work_queue falls below 8; correspondingly, because that message's processing is finished, the pipeline resource allocated to it becomes idle and is reused to process other messages.
After a message has been processed, the ME releases its thread number and the corresponding thread; the released thread is later reallocated to a newly received message.
Step 102: the ME stores the message in the packet memory with dual read-write ports according to the allocated thread.
Here, after the ME has allocated a thread number, and thus a corresponding thread, to the received message in step 101, the ME first stores the received message in the packet memory with dual read-write ports according to the allocated thread.
In practical applications, the packet memory with dual read-write ports is a random access memory (RAM, Random Access Memory).
Step 103: the ME controls the allocated thread, in an eight-stage-pipeline manner, to process the message stored in the packet memory.
Specifically, once the message's thread number and corresponding fetch address have been written into work_queue, the ME controls the thread allocated in step 101, in an eight-stage-pipeline manner, to process the message stored in the packet memory.
Here, the eight-stage pipeline supports eight threads working simultaneously, with each stage corresponding to one thread:
In the first stage, the thread sends a fetch request for a message instruction according to the message's fetch address;
In the second stage, the thread receives the message instruction;
In the third stage, the thread decodes the message instruction and obtains its source operands;
In the fourth stage, the thread performs bit alignment on the source operands;
In the fifth stage, the thread executes the arithmetic operation corresponding to the message instruction and computes the corresponding memory address using the aligned source operands;
In the sixth stage, the thread issues a read-write request according to the memory address;
In the seventh stage, the thread obtains the response to the request;
In the eighth stage, the thread writes back the arithmetic result or the request response as the processing result of the message instruction.
After the eighth stage, if the message does not need a table lookup and still contains unprocessed message instructions, processing returns to the first stage according to the message's thread number, until all of the message's instructions have been processed.
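As an illustrative aid only, the eight stages enumerated above can be sketched in software as eight threads rotating through eight stage slots. The mnemonics for the seventh and eighth stages (`MA2`, `WB`) and all data structures below are hypothetical simplifications for this sketch, not names from the patent:

```python
# Hypothetical software sketch of the eight-stage pipeline described above.
STAGES = [
    "IF1",  # stage 1: send a fetch request for a message instruction
    "IF2",  # stage 2: receive the message instruction
    "ID",   # stage 3: decode the instruction and read source operands
    "EX1",  # stage 4: bit-align the source operands
    "EX2",  # stage 5: arithmetic operation / memory-address calculation
    "MA1",  # stage 6: issue the read-write request
    "MA2",  # stage 7 (name assumed): obtain the response to the request
    "WB",   # stage 8 (name assumed): write back the processing result
]

def run_pipeline(threads):
    """Each cycle, each of the 8 threads occupies a different stage, so every
    stage corresponds to exactly one thread, as the text requires."""
    trace = []
    for cycle in range(len(STAGES)):
        # thread i sits in stage (cycle - i) mod 8: threads rotate through stages
        occupancy = {STAGES[(cycle - i) % len(STAGES)]: t
                     for i, t in enumerate(threads)}
        trace.append(occupancy)
    return trace

trace = run_pipeline(list(range(8)))
assert all(len(cycle) == 8 for cycle in trace)  # all stages busy every cycle
```

With eight threads in flight, every stage is occupied on every cycle, which is why no two instructions of the same message are ever adjacent in the pipeline and data hazards cannot arise.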
For a single message instruction, processing passes through the first to the eighth stage in turn. A message normally contains multiple message instructions, so after the first through eighth stages have executed, the ME must judge whether the message needs a table lookup and whether it still contains unprocessed instructions. If the message needs no lookup and contains unprocessed instructions, its thread number and corresponding fetch address remain stored in work_queue, and its thread continues processing the remaining instructions through the eight-stage pipeline. If the message needs a lookup, its thread number is written into srh_queue and its thread is suspended until the ME receives the lookup response, after which the thread resumes processing the message's instructions. When the message is finished, that is, it needs no lookup and has no unprocessed instructions, its thread number is written into pkt_out_queue.
In practical applications, when the sixth stage of the eight-stage pipeline sends a read-write request to the dual-port RAM, only one of the two ports is used; the other port is used for receiving messages into, and sending messages out of, the ME when the ME receives a message load or fetch request.
Embodiment two
In Embodiment 2, the ME completes message management and thread scheduling through the following five queues: the idle queue free_queue, the pending queue rdy_queue, the work queue work_queue, the lookup queue srh_queue, and the message output queue pkt_out_queue. free_queue allocates thread numbers to new messages entering the ME; rdy_queue stores the thread numbers and fetch addresses of pending messages, holding at most eight of them; work_queue stores the thread numbers and fetch addresses of messages being processed; srh_queue stores the thread numbers and fetch addresses of messages that need a table lookup; pkt_out_queue stores the thread numbers and fetch addresses of processed messages waiting to be sent out. A message carries its own fetch address when it enters the ME.
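As a rough software analogy, assuming Python deques stand in for the hardware queues, the five queues and the entry of a new message could be sketched as follows (all function and variable names other than the queue names are illustrative):

```python
from collections import deque

# The five thread-management queues named in the text (deques as stand-ins).
free_queue = deque(range(16))   # idle thread numbers (depth 16, numbers 0-15)
rdy_queue = deque()             # (thread number, fetch address) of pending messages
work_queue = deque()            # messages the ME is currently processing (<= 8)
srh_queue = deque()             # messages waiting for a table lookup
pkt_out_queue = deque()         # processed messages waiting to be sent out

def receive_message(fetch_addr):
    """A new message gets a thread number from free_queue in FIFO order; the
    thread number and the message's self-carried fetch address enter rdy_queue."""
    tid = free_queue.popleft()
    rdy_queue.append((tid, fetch_addr))
    return tid

tid = receive_message(0x40)
assert tid == 0 and rdy_queue[0] == (0, 0x40)
```

The thread number thus acts as the handle that follows a message from queue to queue for its whole lifetime in the ME.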
Fig. 2 shows a method by which the ME provided by Embodiment 2 of the present invention processes messages. As shown in Fig. 2, the method includes the following steps:
Step 201: the ME allocates a thread number to the message through free_queue.
Specifically, when the ME receives a new message, free_queue allocates a thread number to it. The depth of free_queue is 16; this depth determines the number of thread numbers that can be allocated, that is, free_queue can allocate at most 16 thread numbers. The number of threads in the ME available for processing messages matches the number of thread numbers free_queue can allocate, and is also 16. Here, the depth of 16 was obtained from comprehensive performance testing and resource-cost calculation.
free_queue allocates thread numbers to received messages in first-in-first-out order; with a depth of 16, the 16 thread numbers may be 0-15. When a message is received, free_queue allocates the thread number at the front of the queue to it. After a message has been processed and output, its thread number is released and stored back into free_queue, which places the released thread number at the front of the queue so that, when a new message is received, the released thread number is allocated to it again. When the ME is powered on or reset, all 16 thread numbers of free_queue are unallocated; thread numbers are allocated to received messages starting from the front of the queue, and subsequent messages receive thread numbers in first-in-first-out order.
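A minimal sketch of the free_queue behavior just described, assuming a deque models the hardware queue (allocation from the front, and a released number placed back at the front so it is reused first):

```python
from collections import deque

# At power-on or reset, all 16 thread numbers are unallocated.
free_queue = deque(range(16))

def allocate():
    # allocate the thread number at the front of the queue
    return free_queue.popleft()

def release(tid):
    # per the text, a released number is placed back at the FRONT of the
    # queue, so it is the next number handed to a newly received message
    free_queue.appendleft(tid)

a = allocate()          # thread number 0
b = allocate()          # thread number 1
release(a)              # 0 returns to the front of free_queue
assert allocate() == 0  # the released number is reallocated first
```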
By allocating thread numbers to messages, the ME keeps messages independent of one another, and through the thread number establishes a mapping between each message and the storage resources the ME allocates to it.
Step 202: the ME writes the thread number allocated to the message, together with the corresponding fetch address, into rdy_queue.
Here, after a thread number has been allocated to the message, free_queue writes the allocated thread number and the fetch address into rdy_queue; a message whose thread number enters rdy_queue is a pending message. rdy_queue stores the thread numbers and fetch addresses of pending messages, at most eight of them.
Step 203: when there is an idle pipeline resource, the ME dispatches the thread number of one pending message and the corresponding fetch address from rdy_queue into work_queue.
Specifically, when the ME has an idle pipeline resource, it dispatches the thread number of one pending message and the corresponding fetch address from rdy_queue into work_queue; the thread corresponding to that thread number then processes the message using the idle pipeline resource. work_queue stores only the thread numbers of messages the ME is processing. When a message needs a table lookup or has been processed, the ME deletes its thread number and corresponding fetch address from work_queue and writes them into srh_queue or pkt_out_queue respectively, so that work_queue can admit the thread number and fetch address of another pending message from rdy_queue.
Here, the kernel threads process messages using the eight-stage pipeline structure. According to the thread numbers and corresponding fetch addresses stored in work_queue, the eight-stage pipeline fetches message instructions; after an instruction is fetched, it is decoded and its source operands are extracted. The logic computing unit in the ME then shifts, splices, adds, subtracts, etc., the source operands as required by the instruction, and writes the result into the destination register or memory. After processing by the eight-stage pipeline, a message is in one of three situations:
First, if the message does not need a table lookup and still contains unprocessed instructions, the message is not finished, and the eight-stage pipeline continues with its next instruction;
Second, if the message needs a table lookup, the message is not finished, and step 204 is executed;
Third, if the message has been processed, step 205 is executed.
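The scheduling just described, where work_queue admits a pending entry whenever a lookup or a completion frees one of the eight slots, might be sketched as follows (the `schedule`/`retire` helpers and all concrete values are illustrative assumptions, not the patent's hardware):

```python
from collections import deque

PIPELINE_DEPTH = 8  # work_queue holds at most 8 in-flight messages

rdy_queue = deque((t, 0x10 * t) for t in range(10))  # 10 pending messages
work_queue = deque()
srh_queue = deque()
pkt_out_queue = deque()

def schedule():
    """Admit pending messages into work_queue while a pipeline slot is idle."""
    while len(work_queue) < PIPELINE_DEPTH and rdy_queue:
        work_queue.append(rdy_queue.popleft())

def retire(entry, needs_lookup):
    """On a lookup or on completion, the entry leaves work_queue (freeing a
    slot) and moves to srh_queue or pkt_out_queue respectively."""
    work_queue.remove(entry)
    (srh_queue if needs_lookup else pkt_out_queue).append(entry)
    schedule()  # the freed slot admits the next pending message

schedule()
assert len(work_queue) == 8 and len(rdy_queue) == 2
retire((0, 0x00), needs_lookup=False)   # one message finishes
assert len(work_queue) == 8 and pkt_out_queue[0] == (0, 0x00)
```

The invariant the sketch illustrates is that work_queue stays full whenever pending work exists, so the eight pipeline slots are never left idle unnecessarily.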
Step 204: the ME writes the thread number and corresponding fetch address of a message that needs a table lookup into srh_queue.
Specifically, when a message needs a table lookup, its thread number and corresponding fetch address are written into srh_queue to wait for the lookup. At this point the message is not yet finished; the lookup message the ME sends out carries the message's thread number, so the thread number remains occupied. Meanwhile, the thread handling the message is suspended to wait for the lookup response. Because that thread is suspended, the eight-stage pipeline gains an idle pipeline resource, which is used to process a pending message whose thread number is stored in rdy_queue.
When the ME receives a lookup response, the response carries the thread number of the lookup message; using that thread number, the ME extracts the message's fetch address from the thread's lookup-instruction memory srh_pc_ram, and after extracting the fetch address returns to step 202. The only difference between receiving a lookup response and receiving a new message at step 202 is this: on a lookup response, the ME writes the message's thread number and the extracted fetch address directly into rdy_queue, without allocating a thread number through free_queue; for a new message, the thread number allocated by free_queue and the fetch address are written into rdy_queue.
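The lookup-response path described above, where the fetch address comes from srh_pc_ram and the entry re-enters rdy_queue without passing through free_queue, could be sketched as follows (modeling srh_pc_ram as a dictionary; the thread numbers and addresses are illustrative):

```python
from collections import deque

rdy_queue = deque()
# Per-thread lookup-instruction memory; a dict stands in for srh_pc_ram.
srh_pc_ram = {5: 0x2C}

def on_lookup_response(tid):
    """A lookup response carries the thread number of the lookup message.
    The fetch address is read from srh_pc_ram, and the pair re-enters
    rdy_queue directly, without a new allocation from free_queue."""
    rdy_queue.append((tid, srh_pc_ram[tid]))

on_lookup_response(5)
assert rdy_queue[0] == (5, 0x2C)
```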
Step 205: the ME writes the thread number and corresponding fetch address of a processed message into pkt_out_queue.
Specifically, when a message has been processed, its thread number and corresponding fetch address are written into pkt_out_queue to wait for output. Although the message has been processed, its thread number, written into pkt_out_queue, is still in use; when the ME sends the message out, the thread number is released so that it can be allocated to a new message, without tying up the ME's thread-number resources.
The thread number released in step 205 is written back into free_queue, queue-fashion, for reallocation.
It should be noted that when the total number of thread numbers of pending messages stored in rdy_queue is less than 8, the ME automatically generates empty messages to keep the number of thread numbers stored in rdy_queue and work_queue at 8, so that the eight-stage pipeline of the ME processes 8 messages simultaneously and all pipeline stages execute normally. Here, the processing result for an empty message is step 205.
After step 201, the ME stores the message in the packet memory pkt_ram, which has dual read-write ports, according to the thread number allocated in step 201, that is, according to the thread allocated to the message. Because pkt_ram has dual read-write ports, the eight-stage pipeline can use them to process the messages stored in pkt_ram.
When the eight-stage pipeline of the ME is processing message instructions, it must access pkt_ram; at the same time, messages written into pkt_out_queue and waiting to be output are also stored in pkt_ram, and outputting a message from the ME likewise requires access to pkt_ram, which causes a read-write conflict on pkt_ram. Avoiding this conflict would otherwise require pausing the pipeline or delaying the fetch of the message to be output, reducing ME performance. In Embodiment 3 of the present invention, giving pkt_ram dual read-write ports prevents pipeline stalls caused by contention for a pkt_ram port, lets the pipeline run at full speed, and improves the ME's message-processing performance.
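A toy model of why the dual-ported pkt_ram removes this conflict: one port serves the pipeline while the other serves message input and output, so neither side ever has to wait for the other. The class and method names below are illustrative, not from the patent:

```python
class DualPortRam:
    """Toy dual-ported memory: both ports reach the same cells, so the
    pipeline (port A) and message input/output (port B) never contend."""

    def __init__(self, size):
        self.cells = [0] * size

    def port_a(self, addr, value=None):   # used by the sixth pipeline stage
        if value is not None:
            self.cells[addr] = value
        return self.cells[addr]

    def port_b(self, addr, value=None):   # used for message receive/send
        if value is not None:
            self.cells[addr] = value
        return self.cells[addr]

pkt_ram = DualPortRam(256)
pkt_ram.port_b(0, 0xAB)            # a message is loaded through port B
assert pkt_ram.port_a(0) == 0xAB   # the pipeline reads it through port A
```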
Fig. 3 is a schematic diagram of the working process of the pipeline processing one message in step 203. As shown in Fig. 3, the pipeline processes a message as follows:
When the message's thread number and its self-carried fetch address enter work_queue, an idle pipeline resource processes the message. The pipeline can run at most 8 threads simultaneously and can therefore process 8 messages simultaneously.
The message first enters the first pipeline stage, the instruction fetch 1 (IF1, Instruction Fetch 1) stage: the thread processing the message sends a fetch request for a message instruction according to the message's fetch address thread_pc stored in work_queue. The fetch request is sent to instrmem, the instruction memory that stores message instructions. Here, instrmem is a RAM independent of the RAM that stores the messages, so instruction read-write access is fast and, when no access miss occurs, latency is small.
The second stage, instruction fetch 2 (IF2): the message instruction is received from instrmem and stored in the instruction register if_instr used to hold the fetched message instruction.
The third stage, instruction decode (ID): the message instruction received in if_instr is parsed and decoded, a register file (RF) read command and read address are generated, and the source operands needed by each execution unit for executing the message instruction are obtained from the RF. Here, the ME allocates a corresponding RF for each thread in the pipeline to store data related to that thread.
The fourth stage, execute 1 (EX1): bit adjustment is performed on the source operands. Because the ME supports many operation types — for example, arithmetic logic unit (ALU, Arithmetic Logic Unit) operations — the values of the source operands need to be aligned in preparation for the fifth-stage operation. This stage mainly guarantees the timing of the arithmetic units that execute the message instruction: it performs no computation on the source operands, only bit adjustment according to the fetched operands and operation type, to improve timing.
The fifth stage, execute 2 (EX2): the ALU computes on the source operands bit-adjusted in the fourth stage, executing the arithmetic operation corresponding to the message instruction and the calculation of the corresponding storage address. This part is pure combinational logic: from the source operands, it completes the arithmetic operation corresponding to the message instruction and the computation of the storage address.
The sixth stage, memory access 1 (MA1): the operation request here corresponds to the message instruction. When the message instruction is an arithmetic operation, the arithmetic result is written into the result unit; when the message instruction is a storage-address operation, a read/write request is issued to pkt_ram through one of the read/write ports of pkt_ram.
The seventh stage, memory access 2 (MA2): the data read from pkt_ram in response to the read/write request is obtained; meanwhile, the result unit and the data read from pkt_ram are sent as pipeline output to the data decision unit Wb_mux. Before the eighth-stage write-back, Wb_mux decides which of the three cases in step 203 applies to the message after pipeline processing; the pipeline output is the processing result of this message instruction.
The eighth stage, write back (WB): the pipeline output selected by Wb_mux is written back into the RF, so that the processing result of the message instruction takes effect.
In summary, the IF1 and IF2 stages fetch the message instruction from instrmem, the ID stage parses the message instruction, the EX1 stage completes the extraction of the source operands of the message instruction, the EX2 stage completes the shift, splice, addition/subtraction and other calculations on the source operands in the arithmetic logic unit according to the requirements of the message instruction, and the MA1, MA2 and WB stages write the result of the message instruction into the RF of the message instruction. Through the execution of the above eight pipeline stages, the processing of one message instruction is completed.
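The flow of one message instruction through the eight stages can be sketched as follows. This is an assumed, purely illustrative Python model of the stage sequence described above, not the patent's hardware; the instruction encoding `(op, source registers, destination)` is hypothetical.

```python
# Minimal sketch of one message instruction flowing through the
# eight pipeline stages: IF1/IF2 fetch, ID decodes and reads the RF,
# EX1 aligns operands, EX2 computes, MA1/MA2 route the result, WB
# writes it back so that it takes effect.

STAGES = ["IF1", "IF2", "ID", "EX1", "EX2", "MA1", "MA2", "WB"]

def process_instruction(instr_mem, rf, thread_pc):
    trace = ["IF1"]                      # IF1: issue fetch request for thread_pc
    instr = instr_mem[thread_pc]         # IF2: instruction lands in if_instr
    trace.append("IF2")
    op, src_names, dst = instr           # ID: decode; read source operands from RF
    srcs = [rf[s] for s in src_names]
    trace.append("ID")
    trace.append("EX1")                  # EX1: bit-adjust/align operands, no compute
    result = srcs[0] + srcs[1] if op == "add" else None
    trace.append("EX2")                  # EX2: pure combinational ALU compute
    trace.append("MA1")                  # MA1: arithmetic result -> result unit
    trace.append("MA2")                  # MA2: Wb_mux selects the pipeline output
    rf[dst] = result                     # WB: write back; result takes effect
    trace.append("WB")
    return trace

rf = {"r0": 2, "r1": 3, "r2": 0}
trace = process_instruction({0: ("add", ["r0", "r1"], "r2")}, rf, 0)
assert trace == STAGES and rf["r2"] == 5
```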
In practical applications, when the IF1 stage issues a fetch request for a message instruction, the message instruction at the current fetch address is fetched; after the message instruction is fetched, the fetch address changes accordingly, advancing by one, so that the next message instruction can be correctly fetched.
As shown in Fig. 4, the messages enter each stage of the pipeline sequentially, in order; each pipeline stage corresponds to one thread, and the pipeline supports eight threads working simultaneously. The first message fetches its instruction at the IF1 stage and completes the write-back of its processing result at the WB stage, completing the processing of the message instruction; each subsequent message lags one pipeline stage behind the previous one. In the eight-stage pipeline, at any one time, each stage performs a different operation and completes the function of that stage. When eight threads work simultaneously, each thread works in a different pipeline stage, in order. For example: at time T1, thread Thread0 works at the IF1 stage; at time T2, Thread0 works at IF2 and Thread1 works at IF1; at time T3, Thread0 works at ID, Thread1 at IF2, and Thread2 at IF1; and so on, until at time T8, Thread0 works at WB, Thread1 at MA2, Thread2 at MA1, Thread3 at EX2, Thread4 at EX1, Thread5 at ID, Thread6 at IF2, and Thread7 at IF1.
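The staggered schedule above follows from a simple rule: thread n enters IF1 at time T(n+1) and advances one stage per cycle. A small sketch (assumed Python, not from the patent) makes the T1–T8 example checkable:

```python
# Sketch of the staggered thread schedule: thread n enters IF1 at
# time n+1 (1-based), so at time T8 all eight stages are occupied,
# each by a distinct thread.

STAGES = ["IF1", "IF2", "ID", "EX1", "EX2", "MA1", "MA2", "WB"]

def stage_of(thread, t):
    """Stage occupied by `thread` at time t (1-based), or None."""
    idx = t - 1 - thread
    return STAGES[idx] if 0 <= idx < len(STAGES) else None

assert stage_of(0, 1) == "IF1"   # T1: Thread0 in IF1
assert stage_of(0, 8) == "WB"    # T8: Thread0 in WB
assert stage_of(7, 8) == "IF1"   # T8: Thread7 in IF1
# At T8 every stage is busy with a different thread:
assert [stage_of(n, 8) for n in range(8)] == list(reversed(STAGES))
```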
When a message reaches the WB stage, the processing of one instruction of that message is complete. At this point, if the message does not need to look up a table and the current message instruction is not the last message instruction of the message, the ME keeps the thread number and fetch address of the message stored in work_queue, and the thread processing the message processes the next message instruction of the message.
For two consecutive dependent ALU message instructions — that is, when the result computed by the previous message instruction is a source operand of the next one — the processing result of the previous instruction takes effect when it is written back into the RF at the WB stage, and the next instruction needs to obtain that result from the RF as a source operand at the ID stage. The write-back of the processing result and its reading are separated by 5 cycles; that is, the next message instruction can use the processing result of the previous one only after five cycles, otherwise a data hazard occurs. The pipeline has 8 stages, each stage corresponding to one thread, so a thread that has executed one message instruction waits 8 cycles before executing the next; 8 cycles is greater than 5 cycles, which avoids the data hazard.
Taking thread Thread0 as an example: the first message instruction executed by Thread0 enters the pipeline at time T1 and reaches the WB stage only after 8 cycles; only then does the second message instruction of Thread0 enter the pipeline. That is, for Thread0, two consecutive message instructions executed by Thread0 enter the pipeline 8 cycles apart. In the eight-stage pipeline, the WB stage completes the RF write and the ID stage completes the RF read, and these two stages are 5 cycles apart; the later instruction has not yet reached the ID stage at that point, so the data hazard is avoided.
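The hazard arithmetic above can be checked with a few lines (an assumed Python sketch, not part of the patent): the RF-write-to-RF-read distance is WB − ID = 5 cycles, while one thread re-issues only every 8 cycles, so the read always happens after the write.

```python
# Sketch of the data-hazard arithmetic: a result written back at WB
# is needed at ID; a thread re-enters the pipeline every 8 cycles,
# which exceeds the 5-cycle write-to-read distance.

STAGES = ["IF1", "IF2", "ID", "EX1", "EX2", "MA1", "MA2", "WB"]
ID_STAGE, WB_STAGE = STAGES.index("ID"), STAGES.index("WB")

raw_distance = WB_STAGE - ID_STAGE   # cycles between RF read and RF write
reissue_distance = len(STAGES)       # cycles between two instructions of one thread

assert raw_distance == 5
assert reissue_distance == 8
# First instruction enters at T1 and writes back at T8; the second
# enters at T9 and reads the RF at its ID stage, T11 -- after the write:
first_wb = 1 + WB_STAGE                        # T8
second_id = (1 + reissue_distance) + ID_STAGE  # T11
assert second_id > first_wb                    # no data hazard
```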
Embodiment three
Fig. 5 is a schematic structural diagram of an ME provided by the third embodiment of the present invention. As shown in Fig. 5, the ME 50 includes a thread management module 51, a packet storage module 52 with dual read/write ports, and a core module 53;
The thread management module 51 may be implemented by a central processing unit (CPU, Central Processing Unit) cooperating with a memory chip, and is configured to perform thread allocation for received messages through at least five thread management queues;
Specifically, taking five thread management queues as an example, the thread management module 51 allocates a thread number to a message through the idle queue free_queue in first-in-first-out order, and writes the allocated thread number and the fetch address carried by the message into the pending queue rdy_queue. When an idle pipeline resource is available, it schedules the thread number of one pending message and the fetch address corresponding to that thread number from rdy_queue into the work queue work_queue; work_queue stores the thread numbers and fetch addresses of all messages being processed. When a message needs to look up a table, the thread number and fetch address of the message are written into the lookup queue srh_queue; when a message has been processed, the thread number and fetch address of the message are written into the message output queue pkt_out_queue. When a message needs to look up a table or has been processed, the thread number and fetch address corresponding to the message are deleted from work_queue.
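The five-queue bookkeeping described above can be sketched as follows. This is a hypothetical Python model (the class and method names are illustrative, not from the patent) showing how a thread number moves between free_queue, rdy_queue, work_queue, srh_queue and pkt_out_queue:

```python
# Hypothetical sketch of the five-queue thread management:
# free_queue hands out thread numbers FIFO-style, rdy_queue holds
# pending messages, work_queue holds in-flight ones, and a message
# leaves work_queue for srh_queue (table lookup) or pkt_out_queue.
from collections import deque

class ThreadManager:
    def __init__(self, n_threads=8):
        self.free_queue = deque(range(n_threads))  # idle thread numbers, FIFO
        self.rdy_queue = deque()                   # (thread_no, fetch_addr) pending
        self.work_queue = {}                       # thread_no -> fetch_addr in flight
        self.srh_queue = deque()                   # messages waiting on a table lookup
        self.pkt_out_queue = deque()               # finished messages awaiting output

    def admit(self, fetch_addr):
        """A new message arrives: allocate a thread number FIFO-style."""
        tn = self.free_queue.popleft()
        self.rdy_queue.append((tn, fetch_addr))
        return tn

    def schedule(self):
        """An idle pipeline slot appeared: move one message to work_queue."""
        tn, pc = self.rdy_queue.popleft()
        self.work_queue[tn] = pc

    def finish(self, tn, needs_lookup=False):
        """The message needs a lookup or is done; it leaves work_queue either way."""
        pc = self.work_queue.pop(tn)
        (self.srh_queue if needs_lookup else self.pkt_out_queue).append((tn, pc))

    def release(self, tn):
        """After output, the thread number returns to the idle pool."""
        self.free_queue.append(tn)

mgr = ThreadManager()
t = mgr.admit(fetch_addr=0x100)
mgr.schedule()
mgr.finish(t)
assert (t, 0x100) in mgr.pkt_out_queue and t not in mgr.work_queue
mgr.release(t)
```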
The packet storage module 52 may be implemented by RAM and is configured to store the messages according to the allocated threads.
The core module 53 may be implemented by a CPU and a signal processing chip, and is configured to control the allocated threads to process the messages in an eight-stage pipeline manner;
Specifically, in the core module 53, each stage of the eight-stage pipeline corresponds to one thread, wherein:
in the first stage, the thread sends a fetch request for the message instruction according to the fetch address of the message;
in the second stage, the thread receives the message instruction;
in the third stage, the thread parses the message instruction and obtains the source operands of the message instruction;
in the fourth stage, the thread performs bit adjustment on the source operands;
in the fifth stage, the thread executes, according to the bit-adjusted source operands, the arithmetic operation corresponding to the message instruction and the calculation of the corresponding storage address;
in the sixth stage, the thread issues a read/write operation request according to the storage address;
in the seventh stage, the thread obtains the response to the operation request;
in the eighth stage, the thread writes back the result of the arithmetic operation or the response to the operation request as the processing result of the message instruction;
wherein, after the eighth stage, when it is determined that the message does not need to look up a table and contains an unprocessed message instruction, the processing returns to the first stage according to the thread number of the message to process the unprocessed message instruction in the message.
The thread management module 51 is further configured to release the thread number of the message after the processing of the message is completed.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to any one of the first to third embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; thus the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Correspondingly, any one of the first and second embodiments of the present invention also provides a computer storage medium storing a computer program, where the computer program is used to execute the method for processing a message by the ME according to any one of the first and second embodiments of the present invention.
The foregoing are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention; any modifications, equivalent replacements, improvements and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (4)
1. A method for processing a message by a micro engine (ME), characterized in that the method comprises:
the ME performing thread allocation for a received message through at least five thread management queues, storing the message in a packet storage device with dual read/write ports according to the allocated thread, and controlling the allocated thread to process the message stored in the packet storage device in an eight-stage pipeline manner;
wherein the ME performing thread allocation for the received message through the at least five thread management queues comprises:
when the ME receives a new message, allocating a thread number to the message through an idle queue free_queue in first-in-first-out order, and writing the allocated thread number and a fetch address carried by the message into a pending queue rdy_queue; when an idle pipeline resource is available in the ME, the ME scheduling the thread number of one pending message and the fetch address corresponding to the thread number from rdy_queue into a work queue work_queue, wherein work_queue stores the thread numbers and fetch addresses of all messages being processed by the ME; when a message needs to look up a table, writing the thread number and fetch address of the message into a lookup queue srh_queue; when a message has been processed, writing the thread number and fetch address of the message into a message output queue pkt_out_queue; wherein, when a message needs to look up a table or has been processed, deleting the thread number and fetch address corresponding to the message from work_queue;
wherein controlling the allocated thread to process the message stored in the packet storage device in the eight-stage pipeline manner comprises:
the eight-stage pipeline supporting eight threads working simultaneously, each stage of the eight-stage pipeline corresponding to one thread; wherein,
in the first stage, the thread sends a fetch request for a message instruction according to the fetch address of the message;
in the second stage, the thread receives the message instruction;
in the third stage, the thread parses the message instruction and obtains source operands of the message instruction;
in the fourth stage, the thread performs bit adjustment on the source operands;
in the fifth stage, the thread executes, according to the bit-adjusted source operands, an arithmetic operation corresponding to the message instruction and a calculation of a corresponding storage address;
in the sixth stage, the thread issues a read/write operation request according to the storage address;
in the seventh stage, the thread obtains a response to the operation request;
in the eighth stage, the thread writes back a result of the arithmetic operation or the response to the operation request as a processing result of the message instruction;
wherein, after the eighth stage, when it is determined that the message does not need to look up a table and contains an unprocessed message instruction, the method returns to the first stage according to the thread number of the message to process the unprocessed message instruction in the message.
2. The method according to claim 1, characterized in that the method further comprises:
releasing the thread number of the message after the processing of the message is completed.
3. An ME, characterized in that the ME comprises: a thread management module, a packet storage module with dual read/write ports, and a core module; wherein,
the thread management module is configured to perform thread allocation for received messages through at least five thread management queues;
the packet storage module is configured to store the messages according to the allocated threads;
the core module is configured to control the allocated threads to process the messages stored in the packet storage module in an eight-stage pipeline manner;
wherein the thread management module is specifically configured to: allocate a thread number to a message through an idle queue free_queue in first-in-first-out order, and write the allocated thread number and a fetch address carried by the message into a pending queue rdy_queue; when an idle pipeline resource is available, schedule the thread number of one pending message and the fetch address corresponding to the thread number from rdy_queue into a work queue work_queue, wherein work_queue stores the thread numbers and fetch addresses of all messages being processed; when a message needs to look up a table, write the thread number and fetch address of the message into a lookup queue srh_queue; when a message has been processed, write the thread number and fetch address of the message into a message output queue pkt_out_queue; wherein, when a message needs to look up a table or has been processed, delete the thread number and fetch address corresponding to the message from work_queue;
the core module is specifically configured such that each stage of the eight-stage pipeline corresponds to one thread; wherein,
in the first stage, the thread sends a fetch request for a message instruction according to the fetch address of the message;
in the second stage, the thread receives the message instruction;
in the third stage, the thread parses the message instruction and obtains source operands of the message instruction;
in the fourth stage, the thread performs bit adjustment on the source operands;
in the fifth stage, the thread executes, according to the bit-adjusted source operands, an arithmetic operation corresponding to the message instruction and a calculation of a corresponding storage address;
in the sixth stage, the thread issues a read/write operation request according to the storage address;
in the seventh stage, the thread obtains a response to the operation request;
in the eighth stage, the thread writes back a result of the arithmetic operation or the response to the operation request as a processing result of the message instruction;
wherein, after the eighth stage, when it is determined that the message does not need to look up a table and contains an unprocessed message instruction, the processing returns to the first stage according to the thread number of the message to process the unprocessed message instruction in the message.
4. The ME according to claim 3, characterized in that the thread management module is further configured to release the thread number of the message after the processing of the message is completed.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410084619.5A CN104901901B (en) | 2014-03-07 | 2014-03-07 | A kind of micro engine and its method for handling message |
PCT/CN2014/077834 WO2015131445A1 (en) | 2014-03-07 | 2014-05-19 | Microengine and packet processing method therefor, and computer storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410084619.5A CN104901901B (en) | 2014-03-07 | 2014-03-07 | A kind of micro engine and its method for handling message |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104901901A CN104901901A (en) | 2015-09-09 |
CN104901901B true CN104901901B (en) | 2019-03-12 |
Family
ID=54034300
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410084619.5A Active CN104901901B (en) | 2014-03-07 | 2014-03-07 | A kind of micro engine and its method for handling message |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104901901B (en) |
WO (1) | WO2015131445A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109257280B (en) * | 2017-07-14 | 2022-05-27 | 深圳市中兴微电子技术有限公司 | Micro-engine and message processing method thereof |
CN109298923B (en) * | 2018-09-14 | 2019-11-29 | 中科驭数(北京)科技有限公司 | Deep pipeline task processing method and device |
CN117331655A (en) * | 2022-06-27 | 2024-01-02 | 深圳市中兴微电子技术有限公司 | Multithreading scheduling method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5560029A (en) * | 1991-07-22 | 1996-09-24 | Massachusetts Institute Of Technology | Data processing system with synchronization coprocessor for multiple threads |
US6829697B1 (en) * | 2000-09-06 | 2004-12-07 | International Business Machines Corporation | Multiple logical interfaces to a shared coprocessor resource |
CN1767502A (en) * | 2004-09-29 | 2006-05-03 | 英特尔公司 | Updating instructions executed by a multi-core processor |
CN101763285A (en) * | 2010-01-15 | 2010-06-30 | 西安电子科技大学 | Zero-overhead switching multithread processor and thread switching method thereof |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102752198B (en) * | 2012-06-21 | 2014-10-29 | 北京星网锐捷网络技术有限公司 | Multi-core message forwarding method, multi-core processor and network equipment |
-
2014
- 2014-03-07 CN CN201410084619.5A patent/CN104901901B/en active Active
- 2014-05-19 WO PCT/CN2014/077834 patent/WO2015131445A1/en active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5560029A (en) * | 1991-07-22 | 1996-09-24 | Massachusetts Institute Of Technology | Data processing system with synchronization coprocessor for multiple threads |
US6829697B1 (en) * | 2000-09-06 | 2004-12-07 | International Business Machines Corporation | Multiple logical interfaces to a shared coprocessor resource |
CN1767502A (en) * | 2004-09-29 | 2006-05-03 | 英特尔公司 | Updating instructions executed by a multi-core processor |
CN101763285A (en) * | 2010-01-15 | 2010-06-30 | 西安电子科技大学 | Zero-overhead switching multithread processor and thread switching method thereof |
Also Published As
Publication number | Publication date |
---|---|
CN104901901A (en) | 2015-09-09 |
WO2015131445A1 (en) | 2015-09-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11003489B2 (en) | Cause exception message broadcast between processing cores of a GPU in response to indication of exception event | |
CN102752198B (en) | Multi-core message forwarding method, multi-core processor and network equipment | |
US9965274B2 (en) | Computer processor employing bypass network using result tags for routing result operands | |
CN105450618B (en) | A kind of operation method and its system of API server processing big data | |
US20150149744A1 (en) | Data processing apparatus and method for performing vector processing | |
US11256507B2 (en) | Thread transition management | |
JP2018501564A (en) | Execution unit circuit for processor core, processor core, and method of executing program instructions in processor core | |
US20070226696A1 (en) | System and method for the execution of multithreaded software applications | |
CN105159768A (en) | Task management method and cloud data center management platform | |
WO2009006607A1 (en) | Dynamically composing processor cores to form logical processors | |
US20080155197A1 (en) | Locality optimization in multiprocessor systems | |
RU2008138707A (en) | DECLARATIVE MODEL FOR MANAGING PARALLEL PERFORMANCE OF LIGHTWEIGHT PERFORMANCE FLOWS | |
CN103197916A (en) | Methods and apparatus for source operand collector caching | |
CN109032668A (en) | Stream handle with high bandwidth and low-power vector register file | |
US11507386B2 (en) | Booting tiles of processing units | |
KR20180095652A (en) | Data Processing with Dynamic Partitioning | |
CN104901901B (en) | A kind of micro engine and its method for handling message | |
CN110308982A (en) | A kind of shared drive multiplexing method and device | |
CN106575220A (en) | Multiple clustered very long instruction word processing core | |
CN106406820B (en) | A kind of multi-emitting parallel instructions processing method and processing device of network processor micro-engine | |
CN115129480A (en) | Scalar processing unit and access control method thereof | |
US11875425B2 (en) | Implementing heterogeneous wavefronts on a graphics processing unit (GPU) | |
RU2694153C2 (en) | Stream processing using virtual processing agents | |
He et al. | Real-time scheduling in mapreduce clusters | |
CN116414541B (en) | Task execution method and device compatible with multiple task working modes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract |
Application publication date: 20150909 Assignee: Xi'an Chris Semiconductor Technology Co. Ltd. Assignor: SHENZHEN ZTE MICROELECTRONICS TECHNOLOGY CO., LTD. Contract record no.: 2019440020036 Denomination of invention: Micro-engine and method for processing message therewith Granted publication date: 20190312 License type: Common License Record date: 20190619 |
|
EE01 | Entry into force of recordation of patent licensing contract |