GB2454818A - Request filtering in multi-stage arbiter circuitry to reduce latency - Google Patents

Info

Publication number
GB2454818A
Authority
GB
United Kingdom
Prior art keywords
arbitration
requestor
grant
filter
arbiter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0822309A
Other versions
GB0822309D0 (en)
GB2454818B (en)
Inventor
Markus Helms
Daniel Sentler
Manfred Walz
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of GB0822309D0
Publication of GB2454818A
Application granted
Publication of GB2454818B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/36 Handling requests for interconnection or transfer for access to common bus or bus system
    • G06F13/362 Handling requests for interconnection or transfer for access to common bus or bus system with centralised access control
    • G06F13/364 Handling requests for interconnection or transfer for access to common bus or bus system with centralised access control using independent requests or grants, e.g. using separated request and grant lines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G06F13/30 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal with priority control

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bus Control (AREA)

Abstract

Arbiter circuitry 11 includes at least one request filter 12, a plurality of requestor latches 14, at least two staged arbiters 13 arranged directly behind the requestor latches, and an arbitration result latch 15 arranged behind the arbiters. Request filter 12 is arranged behind the arbitration result latch 15 in a non-timing critical path, e.g. pipeline stage 16. A latency reduction is achieved by avoiding stage latches (06, fig. 1). Moving filter 12 to pipeline stage 16 means that incorrect arbitration results may occur so, preferably, it is possible to rollback incorrect arbitration results. To allow rollback, two-staged grants may be provided, e.g. a preliminary grant (22) in a first cycle and a final grant (24) in a second cycle. Preferably, arbitration circuitry 11 is operated below its maximum throughput capacity. The invention may be applied to processing direct memory access (DMA) requests of input/output (I/O) devices attached to an I/O adapter of a host device having main memory.

Description

DESCRIPTION
Arbiter circuitry and method to reduce latency when processing direct memory access requests on a main memory of a data processing system within an arbitration circuitry
Technical field
The present invention relates to processing of direct memory access (DMA) requests of input/output (I/O) devices attached to an I/O adapter of a host data processing system with a main memory, wherein the DMA requests are executed on the main memory and wherein the I/O adapter converts between timing and protocol requirements of the I/O device's memory and those of the host data processing system's main memory.
Background of the invention
Adapters, switches, routers and the like typically use arbitration means in forwarding traffic, e.g. packets, from multiple sources onto one destination. The required arbitrations might be quite complex, e.g. requests come in from multiple classes/areas. Often some additional rules apply, e.g. ordering requirements, resource availability, and the like. This often results in multistaged arbiters with complex request filters. In addition to this complexity, the design point might choose an aggressive frequency, resulting in short cycle times for the arbitration. On top of that, it is always a goal to minimize the latency in forwarding the traffic.
So one typically ends up with the task of resolving the conflict between implementing as much as possible of the filtering and arbitration in one cycle on the one hand and coming up with small logic cones on the other hand.
Fig. 1 shows an example of a two-staged arbiter circuitry 01 according to the state of the art, wherein parts of the design, e.g. grants, are not shown to simplify the figure. Request filters 02, which implement some design specific rules, are arranged in front of the arbiters 03. The problematic cone is from the requestor latches 04 into the arbitration result latch 05, because the hierarchical structure requires a long cycle. If the filtering plus arbitration cannot be done in one cycle, the standard solution is to add extra latches 06, so-called stage latches 06, in front of the arbiter 03 at the price of an extra cycle of latency.
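The prior-art ordering described above (filter first, then staged arbitration, all inside one logic cone) can be sketched in software as follows. This is only an illustrative model, not the circuit of Fig. 1: the fixed-priority policy, the grouping of requestors and the names `fixed_priority`, `filtered_two_stage_arbitrate`, `blocked` and `group_size` are assumptions made for the example.

```python
from typing import Optional, Sequence


def fixed_priority(requests: Sequence[bool]) -> Optional[int]:
    """Return the index of the first active request, or None if there is none."""
    for index, active in enumerate(requests):
        if active:
            return index
    return None


def filtered_two_stage_arbitrate(
    requests: Sequence[bool],
    blocked: Sequence[bool],
    group_size: int,
) -> Optional[int]:
    """Prior-art ordering: filter the requests, then arbitrate in two stages."""
    # Request filters (02): mask every request that currently violates a rule.
    # 'blocked' is a hypothetical stand-in for the design specific rules.
    filtered = [req and not blk for req, blk in zip(requests, blocked)]
    # First-stage arbiters: one local winner per group of requestors.
    groups = [filtered[i:i + group_size] for i in range(0, len(filtered), group_size)]
    local_winners = [fixed_priority(group) for group in groups]
    # Second-stage arbiter: choose among the groups that produced a winner.
    top = fixed_priority([winner is not None for winner in local_winners])
    if top is None:
        return None
    return top * group_size + local_winners[top]
```

In hardware all three steps sit between the requestor latches 04 and the arbitration result latch 05, which is why this cone is the long one.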
Object of the invention
An object of the invention is to provide an arbiter circuitry with reduced latency and a method to reduce latency when processing direct memory access requests on a main memory of a data processing system within an arbiter circuitry.
Summary of the invention
A first subject matter of the invention concerns an arbiter circuitry comprising at least one request filter, a plurality of requestor latches, at least two staged arbiters connected with the requestor latches and an arbitration result latch arranged behind the arbiters. According to the invention, the request filters as well as the stage latches arranged and needed between the requestor latches and the arbiters according to the state of the art, are removed and the arbiters are arranged directly behind the requestor latches and a request filter is arranged behind the arbitration result latch in a non-timing critical path.
Said arbiter circuitry according to the invention has the advantage over the state of the art that a latency reduction is achieved by avoiding the stage latches. According to the invention this is possible by moving the timing critical cone, that is the request filtering, into the next cycle.
The advantages of the invention are achieved by moving the time consuming filtering into a different cycle without increasing the latency. If the filtering is moved into the next cycle, one has achieved that goal. To do so, according to the invention the request filter is arranged behind the arbitration result latch into a non-timing critical path.
Compared to the state of the art, the request filter is moved into a non-timing critical path, e.g. a pipeline stage, which preferably had almost no logic in it. The new request filter fits into a gap, which according to the state of the art was required to cover a separate parallel logic cone/pipe, e.g. a lookup for an address translation.
According to a preferred embodiment of the arbiter circuitry according to the invention, the non-timing critical path is a pipeline stage.
According to another preferred embodiment of the invention, the arbiter circuitry is a two-staged arbiter circuitry, where at least two arbiters are staged or series connected.
A second subject matter of the invention concerns a method to reduce latency when processing direct memory access requests on a main memory of a data processing system within an arbitration circuitry comprising a request filter, a plurality of requestor latches, at least two staged arbiters connected with the requestor latches and an arbitration result latch arranged behind the arbiters. According to the invention, said method is characterized by moving the time consuming filtering into a different cycle without increasing the latency by arranging the request filter behind the arbitration result latch into a non-timing critical path.
Since that move changes the logical behavior of the circuit of the two-staged arbiter such that incorrect arbitration results violating some rules could happen, preferably a rollback ability is added into the design. Thus in a preferred embodiment of said method it is foreseen that, if an unfiltered arbitration violates the rules, the arbitration result is considered as an invalid result, wherein it is possible to revert a decision, i.e. to roll back an incorrect arbitration.
To revert a decision i.e. to rollback an incorrect arbitration, preferably a grant indication is divided into a two-staged grant, wherein in a first cycle a preliminary grant is given and in a second cycle the grant is either the final grant or withdrawn, wherein the logic in the requestors is changed in a way that in a state after the preliminary grant it can go back to a previous state.
According to a particularly preferred embodiment of the invention, the arbitration circuitry is operated such that a typical load on the system is below its maximum throughput capacity. For such cases there often is no battle between multiple requestors. Also some rules, e.g. ordering, apply only to a subset of operations or commands. Therefore most of the time a request can go straight through without any delay or filtering. The filtering would have applied only in rare cases. If the rate of mispredictions is low or if the incorrect speculative arbitration hits unused cycles and if the rate of valid speculative arbitration has helped to reduce the latency for the majority of operations, one has clearly reached an overall faster design point.
Such a rollback-able arbitration reduces the overall latency with aggressive cycle times and small logic cones by allowing a low rate of incorrect speculative grants.
The invention is applicable e.g. for DMA access of I/O adapters into memory, e.g. a requestor having a DMA Read or DMA Write Operation pending targeting an interface to a main memory.
A preferred embodiment of said method is characterized in that
-a requestor's request is routed to an arbiter without any filtering to match a standard case,
-the arbiter will eventually grant a request,
-the requestor enters a next state based on the grant from the arbiter and checks if the filter/delay condition does apply, wherein
-if the filter/delay condition does apply, the requestor does not exploit the grant and goes back to its previous requesting state, and
-if the filter/delay condition does not apply, the requestor proceeds as in a standard case.
In a logic implementation view the movement of the filter/delay condition into a different sequential spot allows the logic to be changed in a way that:
-The arbitration/grant logic-cone is getting smaller.
-The filter/delay cone is implemented in a different cone with huge cycle time margin.
-Only minor changes in requestor logic are necessary compared to the state of the art, e.g. the state machine of the requestor logic gets one additional condition allowing a state transition after an incorrect speculative grant, forming a very simple rollback capability.
-The logic is better distributed and thus the average cone is smaller, which allows a shorter cycle time of the circuit and therefore an overall faster operation, because it is operated at a higher frequency.
-Potentially wasted preliminary speculative grants can be tolerated, because the impact due to their rare occurrences is lower than the gain due to the frequency increase.
The foregoing, together with other objects, features, and advantages of this invention can be better appreciated with reference to the following specification, claims and drawings.
Brief description of the drawings, with
Fig. 1 showing a scheme of a two-staged arbiter according to the state of the art;
Fig. 2 showing a scheme of a two-staged arbiter according to the invention;
Fig. 3 showing a flowchart of requestor states according to the invention;
Fig. 4 showing a flowchart of filtering and arbitration requestor states according to the state of the art;
Fig. 5 showing a flowchart of rollback-able filtering and arbitration requestor states according to the invention.
Detailed description of the drawings
A basic idea of the invention is to move the time consuming filtering into a different cycle without increasing the latency. If the filtering is moved into the next cycle, one has achieved that goal. To do so, according to the invention a new request filter is arranged behind the arbitration result latch into a non-timing critical path.
Fig. 2 shows an example of a two-staged arbiter circuitry 11 according to the invention, wherein parts of the design, e.g. grants, are not shown to simplify the figure. The request filters 02 as well as the stage latches 06 needed according to the state of the art shown in Fig. 1 are removed, since they are not required any more. Now according to the invention arbiters 13 are arranged directly behind the requestor latches 14 and a new request filter 12 is arranged behind the arbitration result latch into a non-timing critical path.
Comparing the two-staged arbiter 01 according to the state of the art shown in Fig. 1 and the two-staged arbiter 11 according to the invention shown in Fig. 2, the filter 02 is moved into a pipeline stage 16, where it becomes the filter 12; this pipeline stage 16 had almost no logic in it. The new filter 12 fits into a gap, which was required to cover a separate parallel logic cone/pipe, e.g. a lookup for an address translation.
According to the invention the latency reduction is achieved by avoiding the stage latches 06 required according to the state of the art shown in Fig. 1. According to the invention this is possible by moving the timing critical cone into the next cycle.
That move changes the logical behavior of the circuit of the two-staged arbiter 11 such that incorrect arbitration results violating some rules could happen. Thus preferably a rollback ability is added into the design.
With the filter moved into the later cycle, new scenarios have to be solved. For example, if an arbitration violates the rules, the arbitration result has to be considered as an invalid result. So it must be possible to revert the decision, i.e. to roll back an incorrect arbitration.
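The re-ordered structure can be modelled cycle by cycle as in the following sketch. It is a simplified illustration under stated assumptions: a single fixed-priority pick stands in for the staged arbiters 13, and the class name `SpeculativeArbiter`, the callback `violates_rule` and the tuple returned by `clock` are hypothetical names chosen for the example, not taken from the patent.

```python
from typing import Callable, Optional, Sequence, Tuple


class SpeculativeArbiter:
    """Arbitrate on unfiltered requests; check the latched winner one cycle later."""

    def __init__(self, violates_rule: Callable[[int], bool]) -> None:
        # Hypothetical rule check standing in for the request filter 12.
        self.violates_rule = violates_rule
        # Models the arbitration result latch 15.
        self.result_latch: Optional[int] = None

    def clock(self, requests: Sequence[bool]) -> Tuple[Optional[int], Optional[int], bool]:
        """Advance one cycle.

        Returns (preliminary_winner, checked_winner, checked_is_final).
        """
        # Pipeline stage 16: the filter looks only at last cycle's latched winner.
        checked = self.result_latch
        is_final = checked is not None and not self.violates_rule(checked)
        # Timing-critical cone: arbitration straight from the requestor latches,
        # with no filter in front (fixed priority is an assumption).
        preliminary = next((i for i, req in enumerate(requests) if req), None)
        self.result_latch = preliminary
        return preliminary, checked, is_final
```

A caller that sees `checked_is_final == False` for a checked winner would treat that grant as withdrawn and let the requestor raise its request again; that re-request logic is deliberately left outside this sketch.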
With reference to Fig. 3 showing a flowchart 20 of requestor states this is achieved as follows:
-The grant indication is divided into a 2-staged grant:
a) First a preliminary grant 22 is given.
b) In the second cycle the grant is either the final grant 24 or withdrawn 26.
-The logic in the requestors is changed slightly: in the state 32 after the preliminary grant 22, it can go back to its previous state 30.
-For an incorrect speculative arbitration, the price for a wasted cycle is paid.
Requestor states representing this proceeding are depicted in Fig. 3. In a typical or usual case the preliminary grant 22 is correct.
In the usual case a header processing is performed in the state 32 'HDR_REQU'. In a seldom or unusual case of an invalid grant 26 the prepared header is discarded. The final grant 24 means that the process continues as usual.
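The requestor behaviour of Fig. 3 amounts to a three-state machine with one extra rollback transition. The sketch below is a minimal model of it, assuming that a requestor in 'IDLE' has a request pending; the function name `next_state` and its boolean inputs `preliminary_grant` and `filter_applies` are illustrative names, not signals named in the patent.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"          # state 30
    HDR_REQU = "HDR_REQU"  # state 32
    NEXT_HDR = "NEXT_HDR"  # state 34


def next_state(state: State, preliminary_grant: bool, filter_applies: bool) -> State:
    """One clock step of the rollback-able requestor."""
    if state is State.IDLE:
        # Preliminary grant 22: start header processing speculatively.
        return State.HDR_REQU if preliminary_grant else State.IDLE
    if state is State.HDR_REQU:
        # Second cycle: a withdrawn grant 26 rolls back to state 30,
        # a final grant 24 lets the requestor continue as usual.
        return State.IDLE if filter_applies else State.NEXT_HDR
    # Behaviour after NEXT_HDR is outside the scope of this sketch.
    return state
```

The only addition compared with a non-speculative requestor is the `HDR_REQU` to `IDLE` edge, which matches the "one additional condition" the text describes.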
Figs. 4 and 5 depict more detailed views of the processing from a requestor state 40, 30 'IDLE' via a requestor state 42, 32 'HDR_REQU' to a requestor state 44, 34 'NEXT_HDR'. Thereby the requestor states 30, 32, 34 in Fig. 5 are the same as in Fig. 3 as indicated by the dotted box 28 in Fig. 3.
In Figs. 4 and 5 rectangles show state-containing requestors with flip-flops defining the cone boundaries. Round-cornered and rhombus shapes show regular concurrent logic without flip-flops.
Fig. 4 shows the requestor processing according to the state of the art, i.e. with filtering 45 taking place before arbitration 41. The query 43 takes place after the arbitration 41. On the right, arrows 47, 49 indicate the time consumption of the processing. The arrow 47 indicates that according to the state of the art 4 ns are required from requestor state 40 'IDLE' to requestor state 42 'HDR_REQU'. The arrow 49 indicates that the cycle from requestor state 42 'HDR_REQU' to the requestor state 44 'NEXT_HDR' is not utilized according to the state of the art.
In Fig. 4 it is recognizable that a huge cone of logic defines the longest cycle time required and therefore defines the maximum frequency.
Fig. 5 shows the re-structured "rollback-able" requestor processing with a shrunk cone on the top and a new small cone in a previously unutilized location. This now allows shorter cycle times and thus a higher frequency.
Fig. 5 shows the requestor processing according to the invention, i.e. with filtering 35 taking place after arbitration 31. The query 33 takes place directly after the arbitration 31. On the right, arrows 37, 39 indicate the time consumption of the processing. The arrow 37 indicates that according to the invention 3 ns are required from requestor state 30 'IDLE' to requestor state 32 'HDR_REQU'. The arrow 39 indicates that the cycle from requestor state 32 'HDR_REQU' to the requestor state 34 'NEXT_HDR' now according to the invention is utilized for the filtering 35.
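As a worked illustration of what the two figures imply, assume that the 4 ns and 3 ns paths are the longest cones of their respective designs and therefore set the clock period (an assumption; the text gives only the two path delays). The achievable clock frequencies then compare as:

```latex
f_{\text{prior}} = \frac{1}{T_{\text{prior}}} = \frac{1}{4\,\text{ns}} = 250\,\text{MHz},
\qquad
f_{\text{inv}} = \frac{1}{T_{\text{inv}}} = \frac{1}{3\,\text{ns}} \approx 333\,\text{MHz},
\qquad
\frac{f_{\text{inv}}}{f_{\text{prior}}} = \frac{4}{3} \approx 1.33
```

The symbols $T_{\text{prior}}$, $T_{\text{inv}}$, $f_{\text{prior}}$ and $f_{\text{inv}}$ are introduced here for the example only.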
Implementations can be operated such that the typical load on the system is below its maximum throughput capacity. For such cases there often is no battle between multiple requestors. Also some rules, e.g. ordering, apply only to a subset of operations or commands. Therefore most of the time a request can go straight through without any delay or filtering. The filtering would have applied only in rare cases. If the rate of mispredictions is low or if the incorrect speculative arbitration hits unused cycles and if the rate of valid speculative arbitration has helped to reduce the latency for the majority of operations, one has clearly reached an overall faster design point.
Such a rollback-able arbitration reduces the overall latency with aggressive cycle times and small logic cones by allowing a low rate of incorrect speculative grants.
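The trade-off between a shorter cycle and occasional wasted grants can be made concrete with a small back-of-the-envelope model. The sketch below assumes that every withdrawn grant costs a fixed number of wasted cycles and that the retry itself succeeds; both the model and the names `expected_step_time_ns` and `break_even_rollback_rate` are assumptions for illustration, not part of the patent.

```python
def expected_step_time_ns(
    cycle_time_ns: float,
    rollback_rate: float,
    wasted_cycles: float = 1.0,
) -> float:
    """Average time per granted request when a fraction of grants is withdrawn."""
    if not 0.0 <= rollback_rate < 1.0:
        raise ValueError("rollback_rate must be in [0, 1)")
    # Each request pays one cycle, plus the wasted cycles in the rollback case.
    return cycle_time_ns * (1.0 + rollback_rate * wasted_cycles)


def break_even_rollback_rate(
    baseline_ns: float,
    speculative_ns: float,
    wasted_cycles: float = 1.0,
) -> float:
    """Rollback rate at which the speculative design stops being faster."""
    return (baseline_ns / speculative_ns - 1.0) / wasted_cycles
```

With the 4 ns and 3 ns figures quoted for Figs. 4 and 5, `break_even_rollback_rate(4.0, 3.0)` evaluates to one third under this simple model, so the speculative design stays ahead as long as clearly fewer than one grant in three is withdrawn.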
The invention is applicable e.g. for DMA access of I/O adapters into memory, e.g. a requestor means having a DMA Read or DMA Write Operation pending targeting an interface to a main memory means.
Thereby the following conditions occur:
-Multiple requestors are targeting a single resource, e.g. an interface.
-The requestors keep their request stored in local memory means, e.g. flip-flops, until granted.
-There are special filter/delay conditions, but in most cases no filter/delay is required, depending e.g. on traffic characteristics and the like.
In a sequential view this works as follows:
-a requestor's request is routed to the arbiter without any filtering => this matches the standard case
-the arbiter will eventually grant a request
-the requestor enters a next state based on the grant from the arbiter
-new: the requestor now checks if the filter/delay condition does apply. If yes: don't exploit the grant and go back to the previous (requesting) state; if no: proceed as in the standard case
In a logic implementation view this works as follows:
-the movement of the filter/delay condition into a different "sequential" spot allows the logic to be changed:
-arbitration/grant logic-cone is getting smaller
-filter/delay cone is implemented in a different cone (with huge cycle time margin), see figures in original document
-minor change in requestor logic: its state machine gets one additional condition (state transition after incorrect speculative grant) => very simple rollback capability
-logic is better distributed, average cone is smaller, this allows shorter cycle time of circuit, therefore overall faster operation, because operated at higher frequency
-potentially wasted preliminary "speculative" grants can be tolerated, because the impact due to their rare occurrences is lower than the gain due to frequency increase
While the present invention has been described in detail, in conjunction with specific preferred embodiments, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art in light of the foregoing description. It is therefore contemplated that the appended claims will embrace any such alternatives, modifications and variations as falling within the true scope and spirit of the present invention.

Claims (8)

1. Arbiter circuitry (11) comprising at least one request filter (12), a plurality of requestor latches (14), at least two staged arbiters (13) connected with the requestor latches (14) and an arbitration result latch (15) arranged behind the arbiters (13), characterized in that the arbiters (13) are arranged directly behind the requestor latches (14) and a request filter (12) is arranged behind the arbitration result latch (15) in a non-timing critical path (16).
2. Arbiter circuitry (11) according to claim 1, characterized in that the non-timing critical path (16) is a pipeline stage (16).
3. Arbiter circuitry (11) according to claim 1 or 2, characterized in that the arbiter circuitry (11) is a two-staged arbiter circuitry (11), where at least two arbiters (13) are staged.
4. Method to reduce latency when processing direct memory access requests on a main memory of a data processing system within an arbitration circuitry (11) comprising a request filter (12), a plurality of requestor latches (14), at least two staged arbiters (13) connected with the requestor latches (14) and an arbitration result latch (15) arranged behind the arbiters (13), characterized by moving the time consuming filtering (12) into a different cycle without increasing the latency by arranging the request filter (12) behind the arbitration result latch (15) into a non-timing critical path (16).
5. Method according to claim 4, characterized in that, if an arbitration violates the rules, the arbitration result is considered as an invalid result, wherein it is possible to rollback an incorrect arbitration.
6. Method according to claim 5, characterized in that to rollback an incorrect arbitration a grant indication is divided into a two-staged grant, wherein in a first cycle a preliminary grant (22) is given and in a second cycle the grant is either the final grant (24) or withdrawn (26), wherein the logic in the requestors is changed in a way that in a state (32) after the preliminary grant (22) it can go back to a previous state (30).
7. Method according to claim 4, 5 or 6, characterized in that the arbitration circuitry (11) is operated such that a typical load on the system is below its maximum throughput capacity.
8. Method according to one of the claims 4 to 7, characterized in that
-a requestor's request is routed to an arbiter (13) without any filtering,
-the arbiter (13) will eventually grant a request,
-the requestor enters a next state based on the grant from the arbiter (13) and checks if the filter/delay condition does apply, wherein
-if the filter/delay condition does apply, the requestor does not exploit the grant and goes back to its previous requesting state, and
-if the filter/delay condition does not apply, the requestor proceeds as in a standard case.
GB0822309.1A 2008-01-15 2008-12-08 Arbiter circuitry and method to reduce latency when processing direct memory access requests on a main memory of a data processing system Active GB2454818B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP08100475 2008-01-15

Publications (3)

Publication Number Publication Date
GB0822309D0 GB0822309D0 (en) 2009-01-14
GB2454818A GB2454818A (en) 2009-05-20
GB2454818B GB2454818B (en) 2012-09-19

Family

ID=40289621

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0822309.1A Active GB2454818B (en) 2008-01-15 2008-12-08 Arbiter circuitry and method to reduce latency when processing direct memory access requests on a main memory of a data processing system

Country Status (1)

Country Link
GB (1) GB2454818B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0464237A1 (en) * 1990-07-03 1992-01-08 International Business Machines Corporation Bus arbitration scheme
US20020083250A1 (en) * 2000-12-26 2002-06-27 Do-Young Kim Apparatus and method for arbitrating use authority for system bus in a multi-stage connection
US20050246463A1 (en) * 2004-04-29 2005-11-03 International Business Machines Corporation Transparent high-speed multistage arbitration system and method
EP1811394A1 (en) * 2004-10-28 2007-07-25 Magima Digital Information Co., Ltd. An arbitrator and its arbitration method
GB2447690A (en) * 2007-03-22 2008-09-24 Advanced Risc Mach Ltd A data processing apparatus and method for performing multi-cycle arbitration

Also Published As

Publication number Publication date
GB0822309D0 (en) 2009-01-14
GB2454818B (en) 2012-09-19

Legal Events

Date Code Title Description
746 Register noted 'licences of right' (sect. 46/1977)

Effective date: 20121029