US8826058B1 - Delay tolerant asynchronous interface (DANI) - Google Patents
- Publication number
- US8826058B1
- Authority
- US
- United States
- Prior art keywords
- wrapper
- destination
- data
- integrated circuit
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/04—Generating or distributing clock signals or signals derived directly therefrom
- G06F1/12—Synchronisation of different clock signals provided by a plurality of clock generators
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F5/00—Methods or arrangements for data conversion without changing the order or content of the data handled
- G06F5/06—Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F5/00—Methods or arrangements for data conversion without changing the order or content of the data handled
- G06F5/06—Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
- G06F5/10—Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor having a sequence of storage locations each being individually accessible for both enqueue and dequeue operations, e.g. using random access memory
- G06F5/12—Means for monitoring the fill level; Means for resolving contention, i.e. conflicts between simultaneous enqueue and dequeue operations
- G06F5/14—Means for monitoring the fill level; Means for resolving contention, i.e. conflicts between simultaneous enqueue and dequeue operations for overflow or underflow handling, e.g. full or empty flags
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/7825—Globally asynchronous, locally synchronous, e.g. network on chip
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2205/00—Indexing scheme relating to group G06F5/00; Methods or arrangements for data conversion without changing the order or content of the data handled
- G06F2205/10—Indexing scheme relating to groups G06F5/10 - G06F5/14
- G06F2205/102—Avoiding metastability, i.e. preventing hazards, e.g. by using Gray code counters
Definitions
- the present disclosure relates generally to the design of computer and communication systems; and in particular, but not limited to, delay-tolerant asynchronous interfaces that provide a reliable communications interface between systems, such as, but not limited to synchronous cores on an integrated circuit chip.
- GALS globally-asynchronous, locally-synchronous
- the GALS approach is to partition a system design into decoupled clock-independent modules that can be designed to meet their individual requirements. These independent modules can then be coupled using an asynchronous interconnect network or an asynchronous network-on-chip (ANoC), which improves reliability by simplifying clock-domain crossing timing by using delay-tolerant connection modules.
- ANoC asynchronous network-on-chip
- FIG. 1A illustrates a sending system according to one embodiment
- FIG. 1B illustrates a receiving system according to one embodiment
- FIG. 2 illustrates a wrapper destination control according to one embodiment
- FIG. 3A illustrates a head of queue write address unit according to one embodiment
- FIG. 3B illustrates a tail of queue read address unit according to one embodiment
- FIG. 4 illustrates an asynchronous first-in, first-out queue (FIFO) according to one embodiment
- FIG. 5A illustrates a sending system according to one embodiment
- FIG. 5B illustrates a receiving system according to one embodiment
- FIG. 6 illustrates a wrapper source control according to one embodiment
- FIG. 7 illustrates a wrapper destination control according to one embodiment
- FIG. 8 illustrates a token-based flow control according to one embodiment
- FIG. 9A illustrates an additional stage synchronization unit according to one embodiment.
- FIG. 9B illustrates an additional stage synchronization unit according to one embodiment.
- One embodiment includes an integrated circuit, comprising: a source wrapper providing an asynchronous sending interface to a sending system on the integrated circuit, with the asynchronous sending interface producing a write clock output signal and a data output signal; a destination wrapper providing an asynchronous receiving interface to a receiving system on the integrated circuit, with the asynchronous receiving interface receiving a write clock input signal and a data input signal; and signal paths on the integrated circuit communicatively coupling the write clock output signal and the write clock input signal, and the data output signal and the data input signal, with the signal paths providing said received write clock input and data input signals with a relative timing said produced between said write clock output and data output signals.
- the destination wrapper includes an asynchronous first-in, first-out queue (aFIFO) providing an intermediate storage of information received on the data input signal lines from a first clock domain with timing corresponding to the write clock input signal and provided to the receiving system operating in a different clock domain that is timed by a read clock received from the receiving system.
- aFIFO asynchronous first-in, first-out queue
- the destination wrapper uses a unary code, not a Gray code, to determine locations within the aFIFO.
- the destination wrapper produces token-based flow control information provided to the source wrapper over a flow control signal path for controlling sending of information from the source wrapper to the destination wrapper.
- each of the sending and receiving systems is synchronous.
- a Delay-tolerant Asynchronous Interface is typically used to make the clock domains for reusable silicon intellectual property (IP) cores completely independent of each other.
- IP silicon intellectual property
- a DANI-wrapped IP core usually appears to its environment as if it were clockless. This property is necessary to address the variability in data transmission-time between source and destination. This variability is a result of the lack of predictability of the properties of transistors and their interconnections in today's leading-edge, integrated-circuit manufacturing processes.
- the term “asynchronous” is used in referring to the wrappers because they provide a non-synchronous interface between sending and receiving systems.
- One embodiment employs dual clocking of components in the asynchronous interfaces.
- a DANI wrapper is applied to the IP core that is the source of data to be transmitted or it can be applied to the IP core that is the destination of that data.
- the transmission time over the route between source and destination may vary, both within and among integrated circuits and be more than a single clock period in duration.
- the source of data may be synchronous and the destination for that data may also be synchronous, but may be operating at a different clock frequency and/or phase. However, this invention also applies if the source, destination or both have an irregular clock and/or are asynchronous.
- One embodiment is expressed as a hierarchical set of block diagrams. At the top level there are two alternative cases:
- Section 1 reviews the case without flow control.
- the flow-control case in Section 2 then requires only a few additional ideas.
- Section 3 reviews some synchronization issues.
- Section 4 discusses some practical issues related to signal integrity.
- Section 5 reminds the reader of the vast number of embodiments of the teachings described herein.
- FIG. 1A illustrates a source wrapper and sending system 100 of a first clocked domain according to one embodiment.
- FIG. 1B illustrates a destination wrapper and receiving system 150 of a second clock domain according to one embodiment.
- One embodiment communicatively couples write clock signal 131 and w data lines 141 of FIGS. 1A and 1B to provide a reliable interface between two independently clocked domains. This design problem is called “clock domain crossing” and is a notoriously difficult task. Conventional solutions compromise either reliability or efficiency.
- sending system 110 produces the three signals of data 113 (w-bits wide), a free-running source clock 111 , and a data available signal 112 , reporting that information is being communicated over data 113 .
- Source wrapper 120 receives these signals.
- Source control 130 converts source clock 111 and data available 112 to a gated write clock 131 signal for transmission to the destination (e.g., destination wrapper 160 of FIG. 1B ).
- Source wrapper 120 also includes a w-bit wide source data register 140 that drives the w-bit wide data bus 141 to the destination (e.g., destination wrapper 160 of FIG. 1B ).
- This arrangement ensures that data transitions and the escorting-clock transitions have a well-defined phase relationship at the source. If setup, hold, and clock-to-Q times were zero, then setting clock transitions to take place exactly one-half clock period after data transitions would allow for the largest maximum skew constraint and ensure that the clock and data transitions arrive at the destination in a timely way. An actual case typically will require a somewhat smaller maximum skew constraint.
- there are several source-synchronous write clock 131 embodiments, such as, but not limited to, those using two-phase or four-phase clocking.
- signal integrity issues will dictate which of them should be used for a particular integrated circuit.
- Two-phase embodiments transmit the clock at half the frequency of source clock 111 , either on one or two wires. These two-phase embodiments are more complicated at the destination than four-phase. Therefore, we delay their discussion until Section 4 and assume here the four-phase option that sets write clock 131 equal in frequency to source clock 111 .
- Destination wrapper and receiving system 150 of FIG. 1B includes destination wrapper 160 and receiving system 190 of one embodiment.
- Receiving system 190 generates a read clock 191 for synchronizing the receiving data 181 into the clock domain of receiving system 190 .
- Destination control 170 of destination wrapper 160 provides, based on write clock 131 , enabling signals (read enable 172 and write enable 171 ) for reading and writing the appropriate w-bit wide register of an asynchronous FIFO 180 (aFIFO) of depth d (meaning it can store d different words of w-bits wide).
- the source-synchronous write clock 131 drives the writing process at the aFIFO 180 while the destination's read clock 191 drives the reading process.
- the empty signal 173, when asserted, indicates that the aFIFO 180 is not empty and that data words are available to be read.
- the write enable 171 and read enable 172 signals are d-bit wide pointers that indicate the appropriate aFIFO 180 registers for writing and reading, respectively.
- Words can be concurrently written to and read from the aFIFO 180 without interference so long as the two pointers differ (e.g., they are concurrently accessing different registers). This is the case so long as empty 173 is asserted and the aFIFO 180 does not overflow, a condition that can never occur if the destination clock is at least as fast as the source clock. Destination control 170 and the aFIFO 180 make up the DANI wrapper at the destination, which provides empty signal 173 (as a data available signal) and the w-bit wide data 181 from the entry of aFIFO 180 selected by read enable signal 172 .
- One embodiment includes multiple instances of the source control 130 and source data register 140 within the source wrapper 120 . Similarly, one embodiment includes multiple instances of the destination control 170 and the aFIFO 180 within the destination wrapper.
- FIG. 2 illustrates one embodiment of a DANI wrapper destination control 270 for generating signals for controlling the timing of communications operations to ensure reliability.
- DANI wrapper destination control 270 includes the head of queue write address register (H W ) 272 and the tail of queue read address register (T R ) 280 .
- These write and read address registers record, in coded form, the position of the next aFIFO register to be written and the next aFIFO register to be read, respectively (e.g., to or from aFIFO 180 of FIG. 1B ).
- the former is written on the write clock 131 (e.g., from source control 130 of FIG. 1A ) and the latter on the read clock 191 (e.g., from receiving system 190 of FIG. 1B ).
- H W register 272 and T R register 280 must be synchronized first because they are advanced on different clocks—e.g., from the source (write) domain (denoted by subscript “W”) and the destination (read) domain (denoted by subscript “R”).
- This synchronization is done in H R register 274 receiving H W signal 273 so that the synchronized write register output 275 and read register output 285 can be compared by comparator 290 in the domain of the read clock ( 191 ).
- When H R 275 and T R 285 are different, data 181 (from aFIFO 180 of FIG. 1B ) are available and the aFIFO 180 is not empty. Comparator 290 generates the appropriate empty 173 signal. Conversion from the coding scheme used in H W 272 and T R 280 to the decoded pointers, write enable 171 and read enable 172, is carried out by the two U→X decoding blocks 276 and 286.
- Shown in FIG. 3A is Head register H W 372 and shown in FIG. 3B is Tail register T R 380, used in one embodiment to synchronize communications between two independently clocked domains.
- the Head register H W 372 is composed of a shift register with d flip-flops (e.g., typically corresponding to the maximum number of entries that can be stored in an aFIFO 180 of FIG. 1B ).
- the first d−1 shift-register flip-flops, FF 1 , FF 2 , . . . FF d−1 , shift their Q outputs to the D input to the right.
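- FF d shifts its inverted (Q̄) output back to the D input of FF 1, so that, for d = 4, a register starting at 0000 steps through 0000→1000→1100→1110→1111→0111→0011→0001→0000.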
- This sequence is a unary code that is fixed in length and repeats cyclically, stepping forward on each rising edge of write clock 131 .
- H W 372 contains a code for which a transition from 1 to 0 or from 0 to 1 in the example sequence of four bits identifies a unique aFIFO location that is used to construct a four-bit address pointer. This rule applies except for the 1111 and 0000 cases when the right-most bit is the pointer.
- in one embodiment, a Gray code, lookup table, and/or other sequence generator is used instead of the unary code described supra.
- This particular, fixed-length unary code has the property that only one bit changes at each step in the sequence and can be easily generalized to any number of bits d.
- the property of the code wherein only a single-bit changes on each rising edge of the write clock facilitates the synchronization that takes place in H R 274 .
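The shift-register behavior described above can be sketched in Python. This is a minimal behavioral model under stated assumptions, not the circuit of FIG. 3A: the names `unary_sequence` and `hamming` are hypothetical, FF 1 is taken to be the left-most bit, and the last flip-flop is assumed to feed its inverted output back to FF 1, which is what the cyclic sequence for d = 4 requires.

```python
def unary_sequence(d):
    """Return one full cycle of the d-bit cyclic unary code as bit strings.

    Assumption: FF1 is the left-most bit and its D input is the inverted
    output of FFd; every other flip-flop takes the Q of its left neighbor.
    """
    state = [0] * d
    cycle = []
    for _ in range(2 * d):
        cycle.append("".join(map(str, state)))
        state = [1 - state[-1]] + state[:-1]  # shift right, inverted feedback
    return cycle


def hamming(a, b):
    """Number of bit positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


seq = unary_sequence(4)
print(" -> ".join(seq))
# Check that exactly one bit changes per step, including the wrap-around step.
print(all(hamming(seq[i], seq[(i + 1) % len(seq)]) == 1 for i in range(len(seq))))
```

The code has 2d states for d flip-flops, and the single-bit-change check is the property that the synchronization in H R relies on.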
- H W 272 which is H W 372 of one embodiment
- H R 274 a register synchronized to the receive clock (e.g., read clock 191 ).
- This synchronization step assumes that a single read clock cycle allows sufficient settling time to achieve the desired mean time between failures (MTBF). However, if an increased MTBF is required, added clock cycles can be inserted to increase the effective settling time. Alternative such schemes are described in Section 3. It is important to recall that only one bit of H W 372 of FIG. 3A changes at a time in one embodiment. It does not matter if a transition is missed because the next clock will catch it. However, if the changing bit of H W 372 remains metastable throughout the allowed settling-time, a synchronization failure may occur.
- tail register T R 380 is like H W 372 (of FIG. 3A ), except it steps on read clock 191 and has an active enable signal instead of being fixed high.
- T R 380 uses the same d-bit unary code, as do H W 372 and H R 274 (of FIG. 2 ).
- when the codes in H R 274 and T R 380 are identical and both synchronized to read clock 191, the empty signal 173 (of FIG. 2 ) is false (i.e., the aFIFO is empty).
- when the codes in H R 274 and T R 280 are not identical, they can be compared and an empty signal 173 generated.
- This empty signal 173 is used to enable the T R register 380 so that it does not move ahead in its cycle unless the aFIFO 180 has data to be read.
- the U→X decoders 276 and 286 take the codes used in the H R 274 and T R 380 registers and decode them by converting to a “one-hot” code suitable for enabling a single register in the aFIFO 480 .
- H W register 372 shifts on every rising edge of write clock 131 .
- the details of the T R register 380 are similar except that it shifts on the rising edge of read clock 191 unless the empty signal 173 is not asserted.
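The interaction of the head and tail registers described above can be illustrated with a small behavioral model. This is a sketch under assumptions, not the logic of FIG. 2: the class and method names are hypothetical, synchronization is idealized as a single sampling of H W on each read clock edge, metastability is not modeled, and all three registers are assumed to start at the all-zeros code.

```python
def step(code):
    """Advance a cyclic unary code word (tuple of bits) by one position."""
    return (1 - code[-1],) + code[:-1]


class DestinationControl:
    """Idealized model of H_W, H_R and T_R; metastability is not modeled."""

    def __init__(self, d):
        zero = (0,) * d
        self.h_w = zero  # head (write) pointer, write-clock domain
        self.h_r = zero  # head pointer as sampled in the read-clock domain
        self.t_r = zero  # tail (read) pointer, read-clock domain

    def write_edge(self):
        """Rising edge of the gated write clock: one word was written."""
        self.h_w = step(self.h_w)

    def read_edge(self):
        """Rising edge of the read clock; returns True if a word is read."""
        not_empty = self.h_r != self.t_r  # comparator output before the edge
        if not_empty:
            self.t_r = step(self.t_r)     # tail advances only when data exists
        self.h_r = self.h_w               # resample the head pointer
        return not_empty


ctl = DestinationControl(4)
ctl.write_edge()
ctl.write_edge()
print([ctl.read_edge() for _ in range(4)])
```

In this model the first read edge only captures the head pointer, so the two written words are read on the second and third read edges, which reflects the one-cycle synchronization latency.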
- Shown in FIG. 4 is an aFIFO 480 used in one embodiment.
- aFIFO 480 uses d registers, each w-bits wide.
- when w-wide data ( 141 ) are transmitted on the rising edge of write clock 131 , only one of the d registers is write-enabled, as determined by the d-bit write enable signal 171 .
- the Q outputs of all the registers 482 are multiplexed ( 490 ) together and only the register selected by the d-bit read enable signal 172 is presented as output w-bit wide data 181 .
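The register array and output multiplexer described above can be modeled in a few lines. The sketch below is illustrative only: the class name `AFifo` and its methods are hypothetical, the enables are represented as lists of d bits, and clocking is reduced to explicit method calls.

```python
class AFifo:
    """d registers of w bits; one-hot enables select the register to use."""

    def __init__(self, d, w):
        self.w = w
        self.regs = [0] * d

    def write(self, write_enable, data):
        """Write-clock edge: store data in the single register enabled."""
        assert sum(write_enable) == 1, "write enable must be one-hot"
        self.regs[write_enable.index(1)] = data & ((1 << self.w) - 1)

    def read(self, read_enable):
        """Output multiplexer: return the register selected by read enable."""
        assert sum(read_enable) == 1, "read enable must be one-hot"
        return self.regs[read_enable.index(1)]


fifo = AFifo(d=4, w=8)
fifo.write([0, 0, 0, 1], 0xAB)
print(hex(fifo.read([0, 0, 0, 1])))
```

Because writing and reading address registers through separate one-hot pointers, the two operations do not interfere as long as the pointers select different registers.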
- the temporal relationship among the w-bit data lines 141 input to the destination wrapper 160 may be overly skewed.
- the temporal relationship between the write clock 131 and these data lines 141 may also be overly skewed. Too much skew in any of these relationships may lead to setup or hold violations at the inputs to the d registers of aFIFO 480 . These violations may, in turn, lead to data errors.
- Design tools generally use synchronous timing constraints that utilize absolute values of time measured with respect to the root of the clock tree. These constraints are ineffective in controlling the skew in data and clock signals input to destination wrapper 160 .
- relative timing constraints applied, in one embodiment, at the destination wrapper 160 between the data lines 141 and the write clock 131 can minimize this skew.
- Application of said relative constraints can yield reliable performance of the resulting integrated circuit.
- satisfaction of these relative constraints is accomplished by iteratively rerouting problem paths until static timing analysis determines that skew is within acceptable limits.
- FIGS. 5A-B show a top-level diagram of one embodiment with flow control added to the circuits of FIG. 1A-B .
- An acknowledgement token (ACK) 532 is generated whenever the destination 590 ( FIG. 5B ) reads a word from aFIFO 580 .
- Source control 530 ( FIG. 5A ) keeps track of these ACK tokens and only allows data to be transmitted when the destination aFIFO 580 has room for it.
- Data words 581 ( FIG. 5B ) are read at the destination when empty 573 is asserted (indicating that the aFIFO is not empty) and read data 592 is asserted. Otherwise, the action at this top level of one embodiment of FIGS. 5A and 5B is the same as one embodiment without flow control of FIGS. 1A and 1B (typically, when the last two digits of a reference number appear in two figures, they refer to the same thing but possibly in a different embodiment).
- The details of one embodiment 630 of source control 530 of FIG. 5A are shown in FIG. 6 , where most of the elements are mirror images of those in the wrapper destination control 270 (of FIG. 2 ) without flow control. However, in the source control there is no need to compute either the write or the read enable.
- a write clock control block 692 is added similar to that shown in the source control 130 (of FIG. 1A ). In the embodiment of FIG. 6 , however, write control block 692 converts the free-running source clock 511 and empty signal 533 to a gated write clock 531 for transmission to the destination. As shown in FIG. 5A , only when empty 533 is asserted are data words 541 delivered by the source wrapper 520 to the data bus and sending system 510 is enabled to send data ( 513 ) to source wrapper 520 .
- One embodiment with flow control includes multiple instances of the source control 530 , the source data register 540 , the destination control 570 and the aFIFO 580 within the source and destination wrappers 520 and 560 .
- wrapper destination control 770 (of FIG. 7 ) also includes ACK Control 760 at the upper right; that block functions similarly to source control 130 of FIG. 1A .
- Tokens are also associated with each data word 541 ( FIG. 5A ) transmitted by the source wrapper 520 . It can be shown that only d tokens are contained in the system ( 500 of FIG. 5A coupled to 550 of FIG. 5B ) so that the depth d of the aFIFO 580 ( FIG. 5B ) is always sufficient to store the data words transmitted by source wrapper 520 ( FIG. 5A ).
- the synchronization process at the source is identical to that at the destination. As a result, the phase and period of the source and destination clocks can be independent of each other.
- This method of flow control of one embodiment can be understood from examination of the Petri net 800 shown in FIG. 8 .
- the transition 802 (vertical bar) models the launching of a w-wide data word 541 from source wrapper 520 .
- a token is removed from the left-hand place 801 and inserted in the lower-middle place 803 to indicate a data word in flight on the bus from source wrapper 520 ( FIG. 5A ) to destination wrapper 560 ( FIG. 5B ).
- the firing of the upper left transition 808 restocks the tokens in the left-hand place 801 indicating that aFIFO 580 has freed up a w-bit wide entry so it can accept new data 541 from source wrapper 520 ( FIG. 5A )
- the system conserves the number of tokens in the Petri net. As a result there can never be more than d tokens in the right-hand place modeling the number of data words in the destination aFIFO 580 of destination wrapper 560 ( FIG. 5B ). This ensures that the aFIFO 580 can never overflow despite variations in delays en route and the timing of the consumption of words by the destination system 590 (e.g., IP core). This is an essential property of the flow control system of one embodiment because it avoids the need to calculate a full signal at the aFIFO 580 , a tricky business at best and impossible to do on a timely basis.
- the Petri net initial condition of d tokens in the left-hand place 801 of FIG. 8 corresponds to initializing the T R register 680 and H R register 694 ( FIG. 6 ) to all 1's and all 0's, respectively in the wrapper source control 630 of FIG. 6 .
- d data words can be sent by the source wrapper 520 ( FIG. 5A ) before T R register 680 and H R register 694 are both all 0s.
- the empty signal 533 of source wrapper 520 is then de-asserted curtailing the transmissions.
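The token-conservation argument above can be checked with a small simulation. The sketch below is not a transcription of Petri net 800: it assumes a ring of four places (credits at the source, words in flight, words in the aFIFO, and ACKs returning), and the place names, the function name `simulate`, and the random firing order are all hypothetical choices made for illustration.

```python
import random


def simulate(d, steps, seed=0):
    """Fire randomly chosen enabled transitions; return the peak FIFO fill."""
    rng = random.Random(seed)
    # Places: source credits, words on the bus, words in the aFIFO, ACKs returning.
    marking = {"credits": d, "in_flight": 0, "fifo": 0, "acks": 0}
    # Each transition moves one token from its input place to its output place.
    transitions = [
        ("credits", "in_flight"),  # source launches a data word
        ("in_flight", "fifo"),     # word arrives and is written to the aFIFO
        ("fifo", "acks"),          # destination reads a word, emitting an ACK
        ("acks", "credits"),       # ACK reaches the source
    ]
    peak = 0
    for _ in range(steps):
        enabled = [t for t in transitions if marking[t[0]] > 0]
        src, dst = rng.choice(enabled)
        marking[src] -= 1
        marking[dst] += 1
        assert sum(marking.values()) == d  # tokens are conserved
        peak = max(peak, marking["fifo"])
    return peak


print(simulate(d=4, steps=10_000))
```

Whatever order the transitions fire in, the total stays at d, so the fill level of the modeled aFIFO cannot exceed d and no full signal is needed.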
- One embodiment shows one source and one destination.
- One embodiment has one or more sources and one or more destinations.
- One embodiment includes one or more intermediate router modules to direct the flow of data words.
- these router modules are synchronous.
- these router modules are asynchronous.
- point-to-point routes include a DANI wrapper at the destination.
- not all point-to-point routes include a DANI wrapper.
- typically without flow control, a single source broadcasts to multiple destinations.
- a router that implements an asynchronous data branch uses a DANI wrapper.
- a router that implements an asynchronous data branch does not use a DANI wrapper.
- a DANI wrapper may include multiple source and destination interfaces.
- the logic 290 in FIG. 2 that computes H R ≠ T R determines empty 173 , signals that a data word 181 is available at the output of the aFIFO 180 of destination wrapper 160 of FIG. 1B and enables the advancement of T R 280 on the next read clock 191 .
- metastability in H R 274 can produce erroneous results for empty 173 .
- MTBF mean time between failures
- τ is the settling time-constant of the flip-flops in H R 274
- T W is their metastability window
- f W is the frequency of write clock ( 131 ) transitions
- f R is the read clock ( 191 ) frequency.
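A widely used expression for synchronizer MTBF in terms of the symbols defined above is sketched below. Treating it as the relation intended here is an assumption; $t_S$ denotes the available settling time, and the single-stage value of $t_S$ is inferred from the two-stage case, which is described as larger by one read clock period $t_R$.

```latex
\mathrm{MTBF} = \frac{e^{\,t_S/\tau}}{T_W \, f_W \, f_R},
\qquad
t_S = t_R - t_L - t_{SU} \quad \text{(single stage)}
```

Because $t_S$ appears in the exponent, any added settling time increases MTBF far more than comparable changes in $T_W$, $f_W$, or $f_R$.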
- the available settling time t S is made as large as possible. This time is compromised by both t L and t SU .
- the logic delay t L through the H R ≠ T R block 290 is at best equivalent to two gates in an ASIC or a single LUT in an FPGA. The logic family used will fix the setup time t SU . As a result, one embodiment may not achieve an adequate MTBF with the design shown in FIG. 2 .
- Two embodiments for additional synchronization settling-time are shown in FIGS. 9A-B .
- Each shown embodiment 900 , 920 introduces an additional stage and an additional clock period of delay in the availability of the empty signal 173 ( FIG. 2 ).
- This additional stage increases the latency of arrival of data words by one clock tick and dramatically enhances MTBF. Which of the two provides the largest increase in MTBF will depend on circuit parameters and can be determined by simulation.
- Embodiment 900 is a familiar two-stage synchronizer 900 instantiated for each of the d bits in H R 274 ( FIG. 2 ). It replaces the H R block 274 in FIG. 2 .
- the MTBF is much larger because of a larger t S and a smaller T W .
- $t_S = 2t_R - t_L - t_{SU}$, an increase of $t_R$ over the single-stage case.
- the smaller value of T W and the value of τ have to be determined from simulation using specific circuit parameters. However, these changes are small compared to the effect of the increase in the value of the exponent.
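The effect of the larger exponent can be made concrete with a short calculation. The sketch below assumes the commonly used model MTBF = exp(t_S/τ) / (T_W · f_W · f_R), which is an assumption rather than a formula quoted from this description, and every numeric parameter is hypothetical and chosen only for illustration. The same T_W and τ are used for both cases, so only the change in settling time is shown.

```python
import math


def mtbf(t_s, tau, t_w, f_w, f_r):
    """Synchronizer MTBF in seconds: exp(t_s / tau) / (t_w * f_w * f_r)."""
    return math.exp(t_s / tau) / (t_w * f_w * f_r)


def settling_time(stages, t_r, t_l, t_su):
    """Available settling time: stages * t_r - t_l - t_su."""
    return stages * t_r - t_l - t_su


# Hypothetical parameters, chosen only for illustration.
tau, t_w = 20e-12, 30e-12            # seconds
f_w = f_r = 500e6                    # hertz
t_r, t_l, t_su = 1 / f_r, 200e-12, 100e-12

one = mtbf(settling_time(1, t_r, t_l, t_su), tau, t_w, f_w, f_r)
two = mtbf(settling_time(2, t_r, t_l, t_su), tau, t_w, f_w, f_r)
print(f"improvement factor: {two / one:.3e}")  # equals exp(t_r / tau)
```

Under this model the improvement factor depends only on the ratio of the read clock period to τ, which is why one extra stage has such a large effect.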
- the extra stage of synchronization follows the logic used to calculate inequality between H R 274 and T R 280 .
- the value of t S is unchanged from that of FIG. 9A , but the values of T W and τ may be different. Simulation is used to determine their values in one embodiment. If additional settling time is required, a synchronizer with more than two stages may be used in either embodiment 900 of FIG. 9A or embodiment 920 of FIG. 9B .
- embodiment 920 of FIG. 9B requires only one additional flip-flop, whereas embodiment 900 of FIG. 9A requires d extra flip-flops. For example, for d = 4, embodiment 920 of FIG. 9B requires only one additional flip-flop, while embodiment 900 of FIG. 9A requires four.
- the increase of d flip-flops for embodiment 900 of FIG. 9A is only a small fractional increase in required resources.
- wrapper destination control 770 including logic 790 , of FIG. 7
- wrapper destination control 270 including logic 290 , of FIG. 2
- wrapper source control including logic 696 , of FIG. 6 .
- the write clock line 131 and data bus 141 may travel over a substantial portion of the integrated circuit as indicated by the ellipsis in the lines. Transitions on data bus 141 occur at the frequency of rising edges of the clock. However, transitions on write clock line 131 occur at twice that frequency and as a result may be subject to threats to signal integrity, particularly for long runs. It is desirable that write clock line 131 and data bus 141 have the same upper frequency limit.
- the source wrapper 520 launch the data 541 and the write clock 531 with a well-defined phase relationship to each other. This simplifies the application of relative timing constraints and can be done if all signals are similarly registered at the source wrapper 520 . However, registering the data is difficult to do when the clock line must have twice as many transitions as the data lines.
- two toggle flip-flops are included at the source control 530 of FIG. 5A , one toggling on the rising clock edge and one on the falling edge.
- the two half-frequency clock lines are transmitted to the destination and, by combining them in an XOR gate, the original clock frequency can be recovered.
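The two-wire, two-phase scheme described above can be checked with a simple discrete-time model. This is a sketch under assumptions: the function name `recover_clock` is hypothetical, the source clock is sampled once per half period, both toggle flip-flops are assumed to reset to 0, and wire delay and skew are ignored.

```python
def recover_clock(half_periods):
    """Simulate two toggle flip-flops and XOR recombination.

    The source clock is sampled once per half period (0, 1, 0, 1, ...).
    Returns (source_clock, line_a, line_b, recovered) as lists of bits.
    """
    clk = [k % 2 for k in range(half_periods)]
    a = b = 0          # toggle flip-flop outputs, both assumed to reset to 0
    line_a, line_b = [], []
    for k, level in enumerate(clk):
        if k > 0 and clk[k - 1] == 0 and level == 1:
            a ^= 1     # toggles on the rising edge
        if k > 0 and clk[k - 1] == 1 and level == 0:
            b ^= 1     # toggles on the falling edge
        line_a.append(a)
        line_b.append(b)
    recovered = [x ^ y for x, y in zip(line_a, line_b)]
    return clk, line_a, line_b, recovered


clk, line_a, line_b, recovered = recover_clock(16)
print(recovered == clk)
```

Each transmitted line changes only once per source clock period, so its transition rate matches that of the data lines, while the XOR at the destination restores the full-rate clock.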
- the write clock 531 and ACK 532 lines shown in FIGS. 5A-B may be gated by data available signal 512 and read data signal 592 , respectively. For high clock rates this gating may be problematic and an enable signal escorting these clock lines may be required. This will allow write clock 531 and ACK 532 to be continuously active, but their transitions ignored when the enable signal is not asserted.
- a very wide data bus 141 of FIGS. 1A-B and 541 of FIGS. 5A-B may, even with the application of relative timing constraints, have skew that is too large to satisfy the setup and hold constraints at the aFIFO input 180 of FIG. 1B and 580 of FIG. 5B .
- This problem can be resolved by dividing the bus 141 , 541 into a number of smaller busses each of whose skew is tolerable. The skew between busses can then be absorbed by an individual aFIFO on each bus. Only when all portions of a word have been received will the destination core read the entire word.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Information Transfer Systems (AREA)
Abstract
Description
-
- DANI without flow control. A wrapper for the destination IP core that can be used when the source clock frequency is never greater than the destination clock frequency. A trivial wrapper for the source may also be included.
- DANI with flow control. Wrappers applied to both source and destination IP cores that can be used no matter the relationship between source and destination clock frequencies.
-
- 0000→1000→1100→1110→1111→0111→0011→0001→0000
$X_i = U_i \oplus U_{i+1}, \quad i = 1, 2, \ldots, d-1$
$X_d = U_d \oplus \overline{U}_1$
An example conversion from U→X for d=4 is
-
- 0000→0001, 1000→1000, 1100→0100, 1110→0010,
- 1111→0001, 0111→1000, 0011→0100, 0001→0010.
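The U→X conversion can be expressed directly in code. The function below is a minimal sketch with a hypothetical name, `unary_to_one_hot`; it takes the left-most character of the string as U 1 and inverts U 1 in the last term, which is the reading that reproduces the d = 4 example.

```python
def unary_to_one_hot(u):
    """Decode a d-bit unary code word (string of '0'/'1') to a one-hot pointer.

    Implements X_i = U_i XOR U_{i+1} for i = 1..d-1 and
    X_d = U_d XOR (NOT U_1), with U_1 as the left-most character.
    """
    bits = [int(c) for c in u]
    d = len(bits)
    x = [bits[i] ^ bits[i + 1] for i in range(d - 1)]
    x.append(bits[d - 1] ^ (1 - bits[0]))
    return "".join(map(str, x))


for word in ["0000", "1000", "1100", "1110", "1111", "0111", "0011", "0001"]:
    print(word, "->", unary_to_one_hot(word))
```

Each of the d one-hot pointers is produced by two of the 2d code words, so the code distinguishes more states than there are aFIFO registers.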
Claims (24)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/025,677 US8826058B1 (en) | 2012-09-16 | 2013-09-12 | Delay tolerant asynchronous interface (DANI) |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201261701704P | 2012-09-16 | 2012-09-16 | |
| US14/025,677 US8826058B1 (en) | 2012-09-16 | 2013-09-12 | Delay tolerant asynchronous interface (DANI) |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US8826058B1 (en) | 2014-09-02 |
Family
ID=51400181
Family Applications (1)
| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| US14/025,677 | Active | US8826058B1 (en) | 2012-09-16 | 2013-09-12 | Delay tolerant asynchronous interface (DANI) |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US8826058B1 (en) |
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6850092B2 (en) * | 2000-06-09 | 2005-02-01 | The Trustees Of Columbia University | Low latency FIFO circuits for mixed asynchronous and synchronous systems |
| US7310396B1 (en) * | 2003-03-28 | 2007-12-18 | Xilinx, Inc. | Asynchronous FIFO buffer for synchronizing data transfers between clock domains |
| US20070097771A1 (en) * | 2005-10-28 | 2007-05-03 | Yeh-Lin Chu | Asynchronous first-in first-out cell |
| US20090019193A1 (en) * | 2007-07-09 | 2009-01-15 | Luk King W | Buffer circuit |
| US20090323876A1 (en) * | 2008-06-30 | 2009-12-31 | Sun Microsystems, Inc. | Adaptive synchronization circuit |
| US8559576B2 (en) | 2008-06-30 | 2013-10-15 | Oracle America, Inc. | Adaptive synchronization circuit |
Non-Patent Citations (2)
| Title |
|---|
| Quinton et al., "Practical Asynchronous Interconnect Network Design," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, May 2008, pp. 579-588, vol. 16, No. 5, IEEE, New York, NY. |
| Santosh Sood, "A Novel Interleaved and Distributed FIFO," Thesis, Nov. 2005, The University of British Columbia, Vancouver, BC, CA (115 pages). |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20140376569A1 (en) * | 2013-06-19 | 2014-12-25 | Netspeed Systems | Multiple clock domains in noc |
| US10027433B2 (en) * | 2013-06-19 | 2018-07-17 | Netspeed Systems | Multiple clock domains in NoC |
| US9722767B2 (en) | 2015-06-25 | 2017-08-01 | Microsoft Technology Licensing, Llc | Clock domain bridge static timing analysis |
| US10447461B2 (en) * | 2015-12-01 | 2019-10-15 | Infineon Technologies Austria Ag | Accessing data via different clocks |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: BLENDICS, INC., A CORPORATION OF DELAWARE, MISSOUR. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:COX, JEROME R., JR.;ENGEL, GEORGE;MOSCOLA, JAMES;AND OTHERS;REEL/FRAME:031197/0474. Effective date: 20130912 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551). Year of fee payment: 4 |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2552); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY. Year of fee payment: 8 |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2553); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY. Year of fee payment: 12 |