GB2454818A

GB2454818A - Request filtering in multi-stage arbiter circuitry to reduce latency

Info

Publication number: GB2454818A
Application number: GB0822309A
Authority: GB
Inventors: Markus Helms; Daniel Sentler; Manfred Walz
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2008-01-15
Filing date: 2008-12-08
Publication date: 2009-05-20
Anticipated expiration: 2028-12-08
Also published as: GB0822309D0; GB2454818B

Abstract

Arbiter circuitry 11 includes at least one request filter 12, a plurality of requestor latches 14, at least two staged arbiters 13 arranged directly behind the requestor latches, and an arbitration result latch 15 arranged behind the arbiters. Request filter 12 is arranged behind the arbitration result latch 15 in a non-timing critical path, e.g. pipeline stage 16. A latency reduction is achieved by avoiding stage latches (06, fig. 1). Moving filter 12 to pipeline stage 16 means that incorrect arbitration results may occur so, preferably, it is possible to rollback incorrect arbitration results. To allow rollback, two-staged grants may be provided, e.g. a preliminary grant (22) in a first cycle and a final grant (24) in a second cycle. Preferably, arbitration circuitry 11 is operated below its maximum throughput capacity. The invention may be applied to processing direct memory access (DMA) requests of input/output (I/O) devices attached to an I/O adapter of a host device having main memory.

Description

Arbiter circuitry and method to reduce latency when processing direct memory access requests on a main memory of a data processing system within an arbitration circuitry

Technical field

The present invention relates to processing of direct memory access (DMA) requests of input/output (I/O) devices attached to an I/O adapter of a host data processing system with a main memory, wherein the DMA requests are executed on the main memory and wherein the I/O adapter converts between timing and protocol requirements of the I/O device's memory and those of the host data processing system's main memory.

Background of the invention

Adapters, switches, routers and the like typically use arbitration means in forwarding traffic, e.g. packets, from multiple sources onto one destination. The required arbitrations might be quite complex: e.g. requests come in from multiple classes/areas. Often some additional rules apply, e.g. ordering requirements, resource availability, and the like. This often results in multistaged arbiters with complex request filters. In parallel to complexity, the design point might choose an aggressive frequency, resulting in short cycle times for the arbitration. On top of that, it is always a goal to minimize the latency in forwarding the traffic.

One is therefore typically faced with the task of resolving the conflict between implementing as much as possible of the filtering and arbitration in one cycle on the one hand and keeping the logic cones small on the other hand.

Fig. 1 shows an example of a two-staged arbiter circuitry 01 according to the state of the art, wherein parts of the design, e.g. grants, are not shown to simplify the figure. Request filters 02, which implement some design specific rules, are arranged in front of the arbiters 03. The problematic cone is from the requestor latches 04 into the arbitration result latch 05, because the hierarchical structure requires a long cycle. If the filtering plus arbitration cannot be done in one cycle, the standard solution is to add extra latches 06, so-called stage latches 06, in front of the arbiter 03, at the price of an extra cycle of latency.
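As a rough illustration of the trade-off described above, the following Python sketch models whether the filter and arbiter cones of Fig. 1 fit into a single cycle or need stage latches 06. The function name and all delay values are hypothetical and are not taken from the source; the sketch only assumes that the stage latches split the path into a filter cone and an arbiter cone.

```python
def prior_art_latency_cycles(filter_ns, arbiter_ns, cycle_ns):
    """Cycles from requestor latch to arbitration result latch (Fig. 1 style).

    Assumption: stage latches split the path into a filter cone and an
    arbiter cone, each of which must fit into one cycle on its own.
    """
    if filter_ns + arbiter_ns <= cycle_ns:
        # Filtering plus arbitration fit into one cone: no stage latch needed.
        return 1
    if max(filter_ns, arbiter_ns) <= cycle_ns:
        # Stage latches are inserted: one extra cycle of latency.
        return 2
    raise ValueError("a single cone exceeds the cycle time")


# Hypothetical delays: 1.5 ns of filtering and 2.5 ns of arbitration.
relaxed = prior_art_latency_cycles(1.5, 2.5, cycle_ns=4.0)
aggressive = prior_art_latency_cycles(1.5, 2.5, cycle_ns=3.0)
print(relaxed, aggressive)
```

With these assumed numbers the relaxed cycle time needs one cycle, while the aggressive cycle time forces the stage latch and therefore two cycles.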

Object of the invention

An object of the invention is to provide an arbiter circuitry with reduced latency and a method to reduce latency when processing direct memory access requests on a main memory of a data processing system within an arbiter circuitry.

Summary of the invention

A first subject matter of the invention concerns an arbiter circuitry comprising at least one request filter, a plurality of requestor latches, at least two staged arbiters connected with the requestor latches and an arbitration result latch arranged behind the arbiters. According to the invention, the request filters as well as the stage latches arranged and needed between the requestor latches and the arbiters according to the state of the art, are removed and the arbiters are arranged directly behind the requestor latches and a request filter is arranged behind the arbitration result latch in a non-timing critical path.

Said arbiter circuitry according to the invention has the advantage over the state of the art that a latency reduction is achieved by avoiding the stage latches. According to the invention this is possible by moving the timing critical cone, that is the request filtering, into the next cycle.

The advantages of the invention are achieved by moving the time consuming filtering into a different cycle without increasing the latency. If the filtering is moved into the next cycle, one has achieved that goal. To do so, according to the invention the request filter is arranged behind the arbitration result latch into a non-timing critical path.

Compared to the state of the art, the request filter is moved into a non-timing critical path, e.g. a pipeline stage, which preferably had almost no logic in it. The new request filter fits into a gap, which according to the state of the art was required to cover a separate parallel logic cone/pipe, e.g. a lookup for an address translation.

According to a preferred embodiment of the arbiter circuitry according to the invention, the non-timing critical path is a pipeline stage.

According to another preferred embodiment of the invention, the arbiter circuitry is a two-staged arbiter circuitry, where at least two arbiters are staged or series connected.

A second subject matter of the invention concerns a method to reduce latency when processing direct memory access requests on a main memory of a data processing system within an arbitration circuitry comprising a request filter, a plurality of requestor latches, at least two staged arbiters connected with the requestor latches and an arbitration result latch arranged behind the arbiters. According to the invention, said method is characterized by moving the time consuming filtering into a different cycle without increasing the latency by arranging the request filter behind the arbitration result latch into a non-timing critical path.

Since that move changes the logical behavior of the two-staged arbiter circuit such that incorrect arbitration results violating some rules could happen, preferably a rollback ability is added into the design. Thus in a preferred embodiment of said method it is provided that, if an unfiltered arbitration violates the rules, the arbitration result is considered as an invalid result, wherein it is possible to revert a decision, i.e. to rollback an incorrect arbitration.

To revert a decision i.e. to rollback an incorrect arbitration, preferably a grant indication is divided into a two-staged grant, wherein in a first cycle a preliminary grant is given and in a second cycle the grant is either the final grant or withdrawn, wherein the logic in the requestors is changed in a way that in a state after the preliminary grant it can go back to a previous state.

According to a particularly preferred embodiment of the invention, the arbitration circuitry is operated such that a typical load on the system is below its maximum throughput capacity. For such cases there often is no contention among multiple requestors. Also some rules, e.g. ordering, apply only to a subset of operations or commands. Therefore most of the time a request can go straight through without any delay or filtering. The filtering would have applied only in rare cases. If the rate of mispredictions is low or if the incorrect speculative arbitration hits unused cycles and if the rate of valid speculative arbitration has helped to reduce the latency for the majority of operations, one has clearly reached an overall faster design point.

Such a rollback-able arbitration reduces the overall latency with aggressive cycle times and small logic cones by allowing a low rate of incorrect speculative grants.

The invention is applicable e.g. for DMA access of I/O adapters into memory like e.g. a requestor having a DMA Read or DMA Write Operation pending targeting an interface to a main memory.

A preferred embodiment of said method is characterized in that

-a requestor's request is routed to an arbiter without any filtering to match a standard case,

-the arbiter will eventually grant a request,

-the requestor enters a next state based on the grant from the arbiter and checks if the filter/delay condition applies, wherein

-if the filter/delay condition applies, the requestor does not exploit the grant and goes back to its previous requesting state, and

-if the filter/delay condition does not apply, the requestor proceeds as in a standard case.

In a logic implementation view, moving the filter/delay condition into a different sequential spot allows the logic to be changed in a way that:

-The arbitration/grant logic cone gets smaller.

-The filter/delay cone is implemented in different cone with huge cycle time margin.

-Only minor changes in requestor logic are necessary compared to the state of the art, like e.g. the state machine of the requestor logic gets one additional condition allowing a state transition after incorrect speculative grant, forming a very simple rollback capability.

-The logic is better distributed and thus the average cone is smaller, which allows a shorter cycle time of the circuit and therefore an overall faster operation, because it is operated at a higher frequency.

-Potentially wasted preliminary speculative grants can be tolerated, because the impact due to their rare occurrences is lower than the gain due to frequency increase.

The foregoing, together with other objects, features, and advantages of this invention can be better appreciated with reference to the following specification, claims and drawings.

Brief description of the drawings, with

Fig. 1 showing a scheme of a two-staged arbiter according to the state of the art;

Fig. 2 showing a scheme of a two-staged arbiter according to the invention;

Fig. 3 showing a flowchart of requestor states according to the invention;

Fig. 4 showing a flowchart of filtering and arbitration requestor states according to the state of the art;

Fig. 5 showing a flowchart of rollback-able filtering and arbitration requestor states according to the invention.

Detailed description of the drawings

A basic idea of the invention is to move the time consuming filtering into a different cycle without increasing the latency.

If the filtering is moved into the next cycle, one has achieved that goal. To do so, according to the invention a new request filter is arranged behind the arbitration result latch into a non-timing critical path.

Fig. 2 shows an example of a two-staged arbiter circuitry 11 according to the invention, wherein parts of the design, e.g. grants, are not shown to simplify the figure. The request filters 02 as well as the stage latches 06 needed according to the state of the art shown in Fig. 1 are removed, since they are not required any more. Now according to the invention arbiters 13 are arranged directly behind the requestor latches 14 and a new request filter 12 is arranged behind the arbitration result latch into a non-timing critical path.

Comparing the two-staged arbiter 01 according to the state of the art shown in Fig. 1 and the two-staged arbiter 11 according to the invention shown in Fig. 2, the filter 02 is moved into a pipeline stage 16, where it becomes the filter 12. This pipeline stage 16 had almost no logic in it. The new filter 12 fits into a gap, which was required to cover a separate parallel logic cone/pipe, e.g. a lookup for an address translation.

According to the invention the latency reduction is achieved by avoiding the stage latches 06 required according to the state of the art shown in Fig. 1. According to the invention this is possible by moving the timing critical cone into the next cycle.

That move changes the logical behavior of the circuit of the two-staged arbiter 11 such, that incorrect arbitration results violating some rules could happen. Thus preferably a rollback ability is added into the design.

Moving the filter into the later cycle introduces new scenarios that have to be handled. For example, if an arbitration violates the rules, the arbitration result has to be considered as an invalid result. So it must be possible to revert the decision, i.e. to rollback an incorrect arbitration.

With reference to Fig. 3, showing a flowchart 20 of requestor states, this is achieved as follows:

-The grant indication is divided into a 2-staged grant:

a) First a preliminary grant 22 is given.

b) In the second cycle the grant is either the final grant 24 or withdrawn 26.

-The logic in the requestors is changed slightly: in the state 32 after the preliminary grant 22, it can go back to its previous state 30.

-For an incorrect speculative arbitration, the price of a wasted cycle is paid.
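The two-staged grant listed above can be sketched as a small requestor state machine in Python. This is only an illustrative model: the state names follow the labels used for Fig. 3, while the class, method and parameter names are hypothetical, and state 30 'IDLE' is assumed to be the state in which the request is pending.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"          # state 30: request pending (assumption)
    HDR_REQU = "HDR_REQU"  # state 32: entered on preliminary grant 22
    NEXT_HDR = "NEXT_HDR"  # state 34: reached after final grant 24


class Requestor:
    def __init__(self):
        self.state = State.IDLE
        self.wasted_cycles = 0

    def step(self, preliminary_grant=False, filter_applies=False):
        """Advance one cycle and return the new state."""
        if self.state is State.IDLE:
            if preliminary_grant:
                # First cycle: speculative, unfiltered grant.
                self.state = State.HDR_REQU
        elif self.state is State.HDR_REQU:
            if filter_applies:
                # Second cycle: grant withdrawn 26, roll back to state 30.
                self.state = State.IDLE
                self.wasted_cycles += 1
            else:
                # Second cycle: final grant 24, continue as usual.
                self.state = State.NEXT_HDR
        return self.state


r = Requestor()
r.step(preliminary_grant=True)   # speculative grant
r.step(filter_applies=True)      # filter hits: rollback
r.step(preliminary_grant=True)   # granted again
r.step(filter_applies=False)     # final grant
print(r.state, r.wasted_cycles)
```

The only addition relative to a non-speculative requestor is the transition from HDR_REQU back to IDLE, which matches the "slight" change to the requestor logic described in the text.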

Requestor states representing this procedure are depicted in Fig. 3. In a typical or usual case the preliminary grant 22 is correct.

In the usual case a header processing is performed in the state 32 'HDR_REQU'. In the rare case of an invalid grant 26 the prepared header is discarded. The final grant 24 indicates that the process continues as usual.

Figs. 4 and 5 depict more detailed views of the processing from a requestor state 40, 30 'IDLE' via a requestor state 42, 32 'HDR_REQU' to a requestor state 44, 34 'NEXT_HDR'. Thereby the requestor states 30, 32, 34 in Fig. 5 are the same as in Fig. 3, as indicated by the dotted box 28 in Fig. 3.

In Figs. 4 and 5 rectangles show state-containing requestors with flip-flops defining the cone boundaries. Round-cornered and rhombus shapes show regular concurrent logic without flip-flops.

Fig. 4 shows the requestor processing according to the state of the art, i.e. with filtering 45 taking place before arbitration 41. The query 43 takes place after the arbitration 41. On the right, arrows 47, 49 indicate the time consumption of the processing. The arrow 47 indicates that according to the state of the art 4 ns are required from requestor state 40 'IDLE' to requestor state 42 'HDR_REQU'. The arrow 49 indicates that the cycle from requestor state 42 'HDR_REQU' to the requestor state 44 'NEXT_HDR' is not utilized according to the state of the art.

In Fig. 4 it is recognizable that a huge cone of logic defines the longest cycle time required and therefore limits the maximum frequency.

Fig. 5 shows the re-structured "rollback-able" requestor processing with a smaller cone on the top and a new small cone in a previously unutilized location. This now allows shorter cycle times and thus a higher frequency.

Fig. 5 shows the requestor processing according to the invention, i.e. with filtering 35 taking place after arbitration 31. The query 33 takes place directly after the arbitration 31. On the right, arrows 37, 39 indicate the time consumption of the processing. The arrow 37 indicates that according to the invention 3 ns are required from requestor state 30 'IDLE' to requestor state 32 'HDR_REQU'. The arrow 39 indicates that the cycle from requestor state 32 'HDR_REQU' to the requestor state 34 'NEXT_HDR' is now, according to the invention, utilized for the filtering 35.
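To make the 4 ns and 3 ns figures concrete, the following Python sketch converts them into maximum clock frequencies. It assumes that the cone from 'IDLE' to 'HDR_REQU' is the longest cone and therefore sets the cycle time in both designs; the function name is hypothetical.

```python
def max_frequency_mhz(cycle_ns):
    """Maximum clock frequency in MHz for a given longest-cone delay in ns."""
    return 1000.0 / cycle_ns


prior_art_mhz = max_frequency_mhz(4.0)   # Fig. 4: 4 ns cone
invention_mhz = max_frequency_mhz(3.0)   # Fig. 5: 3 ns cone
gain = invention_mhz / prior_art_mhz

print(f"{prior_art_mhz:.1f} MHz -> {invention_mhz:.1f} MHz (x{gain:.2f})")
```

Under that assumption the cycle time drops from 4 ns (250 MHz) to 3 ns (about 333 MHz), i.e. roughly a one-third increase in frequency.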

Implementations can be operated such that the typical load on the system is below its maximum throughput capacity. For such cases there often is no contention among multiple requestors. Also some rules, e.g. ordering, apply only to a subset of operations or commands. Therefore most of the time a request can go straight through without any delay or filtering. The filtering would have applied only in rare cases. If the rate of mispredictions is low or if the incorrect speculative arbitration hits unused cycles and if the rate of valid speculative arbitration has helped to reduce the latency for the majority of operations, one has clearly reached an overall faster design point.

Such a rollback-able arbitration reduces the overall latency with aggressive cycle times and small logic cones by allowing a low rate of incorrect speculative grants.
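The argument that a low rate of incorrect speculative grants still yields a faster design point can be illustrated with a simple expected-latency model in Python. The model is an assumption, not part of the source: each withdrawn grant costs exactly one wasted cycle, withdrawals are independent with a fixed probability per attempt, and contention between requestors is ignored. All function names are hypothetical.

```python
def expected_speculative_latency_ns(cycle_ns, withdraw_rate):
    """Expected time until a grant is final under the assumed model.

    The expected number of attempts with independent withdrawals is
    1 / (1 - withdraw_rate), and each attempt costs one cycle.
    """
    if not 0.0 <= withdraw_rate < 1.0:
        raise ValueError("withdraw_rate must be in [0, 1)")
    return cycle_ns / (1.0 - withdraw_rate)


def break_even_withdraw_rate(speculative_cycle_ns, baseline_latency_ns):
    """Withdraw rate at which speculation stops being faster than a baseline."""
    return 1.0 - speculative_cycle_ns / baseline_latency_ns


# 3 ns speculative cycle compared with a 4 ns single-cycle baseline.
print(expected_speculative_latency_ns(3.0, 0.05))
print(break_even_withdraw_rate(3.0, 4.0))
```

Under these assumptions the speculative design with a 3 ns cycle stays ahead of a 4 ns non-speculative cycle as long as fewer than a quarter of the preliminary grants are withdrawn.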

The invention is applicable e.g. for DMA access of I/O adapters into memory like e.g. a requestor means having a DMA Read or DMA Write Operation pending targeting an interface to a main memory means.

Thereby the following conditions occur:

-Multiple requestors are targeting a single resource, like e.g. an interface.

-The requestors keep their request stored in local memory means, e.g. flip-flops, until granted.

-There are special filter/delay conditions, but in most cases no filter/delay is required, depending e.g. on traffic characteristics and the like.

In a sequential view this works as follows:

-requestor's request is routed to arbiter without any filtering => this matches the standard case

-arbiter will eventually grant a request

-requestor enters a next state based on the grant from arbiter

-new: requestor now does the checking, if the filter/delay condition does apply. If yes: don't exploit the grant and go back to previous (requesting) state, if no: proceed as in standard case

In a logic implementation view this works as follows:

-the movement of the filter/delay condition into a different "sequential" spot allows the logic to be changed:

-arbitration/grant logic-cone is getting smaller

-filter/delay cone is implemented in different cone (with huge cycle time margin), see figures in original document

-minor change in requestor logic: its state machine gets one additional condition (state transition after incorrect speculative grant) => very simple rollback capability

-logic is better distributed, average cone is smaller, this allows shorter cycle time of circuit, therefore overall faster operation, because operated at higher frequency

-potentially wasted preliminary "speculative" grants can be tolerated, because the impact due to their rare occurrences is lower than the gain due to frequency increase

While the present invention has been described in detail, in conjunction with specific preferred embodiments, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art in light of the foregoing description. It is therefore contemplated that the appended claims will embrace any such alternatives, modifications and variations as falling within the true scope and spirit of the present invention.
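The sequential view described above can be sketched as a small cycle-by-cycle simulation in Python. Several details are assumptions made for illustration only: the source does not specify the arbitration policy, so a round-robin choice is used here; the filter/delay condition is modelled by a hypothetical `blocked_until` cycle number; and the arbiter and the post-arbitration filter are assumed to work in the same cycle on different requests.

```python
from dataclasses import dataclass


@dataclass
class Requestor:
    name: str
    blocked_until: int = 0   # hypothetical: filter applies while cycle < blocked_until
    requesting: bool = True  # request is held locally until finally granted


def simulate(requestors, max_cycles=32):
    """Return a log of (cycle, requestor name, event) tuples."""
    log = []
    pending = None           # requestor holding a preliminary grant
    next_idx = 0             # round-robin pointer (assumed policy)
    n = len(requestors)
    for cycle in range(max_cycles):
        # Stage 1: unfiltered arbitration among all current requests.
        winner = None
        for offset in range(n):
            cand = requestors[(next_idx + offset) % n]
            if cand.requesting:
                winner = cand
                next_idx = (next_idx + offset + 1) % n
                break
        # Stage 2: filter checks the preliminary grant of the previous cycle.
        if pending is not None:
            if cycle < pending.blocked_until:
                pending.requesting = True   # rollback to requesting state
                log.append((cycle, pending.name, "withdrawn"))
            else:
                log.append((cycle, pending.name, "final"))
        if winner is not None:
            winner.requesting = False
            log.append((cycle, winner.name, "preliminary"))
        pending = winner
        if pending is None and not any(r.requesting for r in requestors):
            break
    return log


events = simulate([Requestor("A"), Requestor("B", blocked_until=3)])
for event in events:
    print(event)
```

In this run requestor A goes straight through, while requestor B receives one preliminary grant that is withdrawn and is finally granted on its second attempt, at the cost of the wasted cycle mentioned in the text.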

Claims

1. Arbiter circuitry (11) comprising at least one request filter (12), a plurality of requestor latches (14), at least two staged arbiters (13) connected with the requestor latches (14) and an arbitration result latch (15) arranged behind the arbiters (13), characterized in that the arbiters (13) are arranged directly behind the requestor latches (14) and a request filter (12) is arranged behind the arbitration result latch (15) in a non-timing critical path (16).

2. Arbiter circuitry (11) according to claim 1, characterized in that the non-timing critical path (16) is a pipeline stage (16).

3. Arbiter circuitry (11) according to claim 1 or 2, characterized in that the arbiter circuitry (11) is a two-staged arbiter circuitry (11), where at least two arbiters (13) are staged.

4. Method to reduce latency when processing direct memory access requests on a main memory of a data processing system within an arbitration circuitry (11) comprising a request filter (12), a plurality of requestor latches (14), at least two staged arbiters (13) connected with the requestor latches (14) and an arbitration result latch (15) arranged behind the arbiters (13), characterized by moving the time consuming filtering (12) into a different cycle without increasing the latency by arranging the request filter (12) behind the arbitration result latch (15) into a non-timing critical path (16).

5. Method according to claim 4, characterized in that, if an arbitration violates the rules, the arbitration result is considered as an invalid result, wherein it is possible to rollback an incorrect arbitration.

6. Method according to claim 5, characterized in that to rollback an incorrect arbitration a grant indication is divided into a two-staged grant, wherein in a first cycle a preliminary grant (22) is given and in a second cycle the grant is either the final grant (24) or withdrawn (26), wherein the logic in the requestors is changed in a way that in a state (32) after the preliminary grant (22) it can go back to a previous state (30).

7. Method according to claim 4, 5 or 6, characterized in that the arbitration circuitry (11) is operated such that a typical load on the system is below its maximum throughput capacity.

8. Method according to one of the claims 4 to 7, characterized in that

-a requestor's request is routed to an arbiter (13) without any filtering,

-the arbiter (13) will eventually grant a request,

-the requestor enters a next state based on the grant from the arbiter (13) and checks, if the filter/delay condition does apply, wherein

-if the filter/delay condition does apply the requestor does not exploit the grant and goes back to its previous requesting state, and

-if the filter/delay condition does not apply, the requestor proceeds as in a standard case.