US9002693B2 - Wire like link for cycle reproducible and cycle accurate hardware accelerator - Google Patents
- Publication number
- US9002693B2 (application US13/342,128)
- Authority
- US
- United States
- Prior art keywords
- programmable gate
- field programmable
- clock frequency
- signals
- gate array
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
- G06F17/5027
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/32—Circuit design at the digital level
- G06F30/33—Design verification, e.g. functional simulation or model checking
- G06F30/3308—Design verification, e.g. functional simulation or model checking using simulation
- G06F30/331—Design verification, e.g. functional simulation or model checking using simulation with hardware acceleration, e.g. by using field programmable gate array [FPGA] or emulation
Definitions
- the present invention relates to the electrical, electronic and computer arts, and, more particularly, to simulation of integrated circuit (IC) chips and the like.
- FPGA Field Programmable Gate Array
- an exemplary method includes the steps of providing first and second field programmable gate arrays implementing, respectively, first and second blocks of a circuit design to be simulated; operating the first and second field programmable gate arrays at a first clock frequency; and providing a wire like link to send a plurality of signals between the first and second field programmable gate arrays.
- the wire like link includes a serializer, on the first field programmable gate array, to serialize the plurality of signals; a deserializer on the second field programmable gate array, to deserialize the plurality of signals; and a connection between the serializer and the deserializer.
- a further step includes operating the serializer and the deserializer at a second clock frequency, greater than the first clock frequency, the second clock frequency being selected such that latency of transmission and reception of the plurality of signals is less than a period corresponding to the first clock frequency.
- an exemplary apparatus for simulating a circuit design includes first and second field programmable gate arrays implementing, respectively, first and second blocks of the circuit design to be simulated; at least a first clock source which clocks the first and second field programmable gate arrays such that they operate at a first clock frequency; and a wire like link configured to send a plurality of signals between the first and second field programmable gate arrays.
- the wire like link in turn includes a serializer, on the first field programmable gate array, to serialize the plurality of signals; a deserializer on the second field programmable gate array, to deserialize the plurality of signals; and a connection between the serializer and the deserializer.
- Also included in the apparatus is at least a second clock source which clocks the serializer and the deserializer such that they operate at a second clock frequency, greater than the first clock frequency, the second clock frequency having a value such that latency of transmission and reception of the plurality of signals is less than a period corresponding to the first clock frequency.
- an exemplary design structure tangibly embodied in a non-transitory manner in a machine readable medium, includes instructions which cause first and second field programmable gate arrays to implement, respectively, first and second blocks of a circuit design to be simulated.
- the first field programmable gate array has as a macro thereon at least a portion of a serializer to serialize a plurality of signals to be sent over a wire like link between the first and second field programmable gate arrays.
- the second field programmable gate array has as a macro thereon at least a portion of a deserializer to deserialize the plurality of signals.
- the design structure also includes instructions which cause the first and second field programmable gate arrays to implement at least one port for receiving a signal from at least a first clock source which clocks the first and second field programmable gate arrays such that they operate at a first clock frequency; and instructions which cause the first and second field programmable gate arrays to implement at least one port for receiving a signal from at least a second clock source which clocks the serializer and the deserializer such that they operate at a second clock frequency, greater than the first clock frequency.
- the second clock frequency has a value such that latency of transmission and reception of the plurality of signals is less than a period corresponding to the first clock frequency.
- another exemplary method includes the steps of providing a design structure of the kind just described, and transmitting instructions corresponding to the design structure.
- facilitating includes performing the action, making the action easier, helping to carry the action out, or causing the action to be performed.
- instructions executing on one processor might facilitate an action carried out by instructions executing on a remote processor, by sending appropriate data or commands to cause or aid the action to be performed.
- the action is nevertheless performed by some entity or combination of entities.
- One or more embodiments of the invention or elements thereof can be implemented in the form of means for carrying out one or more of the method steps described herein.
- Some embodiments of the invention are directed to design structures for circuits used in simulation of integrated circuit designs and/or to the circuit designs.
- Techniques of the present invention can provide substantial beneficial technical effects.
- one or more embodiments may provide one or more of the following advantages:
- FIG. 1 shows an exemplary system including two FPGAs and a wire like link, according to an aspect of the invention;
- FIG. 2 shows an exemplary waveform diagram for the system of FIG. 1, according to an aspect of the invention;
- FIG. 3 shows an exemplary time domain multiplexer scheme, according to an aspect of the invention;
- FIG. 4 shows an exemplary transmitter with a training pattern generator and a receiver with a bit alignment block, a word alignment block, and a head latency block, according to an aspect of the invention;
- FIG. 5 is a flow diagram of a design process used in semiconductor design, manufacture, and/or test;
- FIG. 6 depicts a computer system that may be useful in implementing some aspects and/or elements of the invention, such as in automating a design process as shown in FIG. 5;
- FIG. 7 shows an exemplary embodiment of a delay chain used for bit alignment and a corresponding delay chain selection mechanism, according to an aspect of the invention.
- the FPGA based hardware accelerator should exactly mimic the behavior of the DUT on a cycle by cycle basis, which means that, if the DUT were simulated on a software simulator or when the DUT is built into a single or multiple chips, at any given DUT clock cycle, all three systems, namely, the hardware accelerator, the software simulator, and the DUT chip should be in the same state. This becomes an issue in the design of hardware accelerators, as the DUT design is now partitioned over multiple FPGAs communicating with each other. Because of pin limitations on the FPGAs, the signals between the FPGAs have to be multiplexed. This multiplexing and demultiplexing of signals consumes clock cycles.
- Another issue in building hardware simulation accelerators is cycle reproducibility, which is defined as follows: multiple executions starting from the same initial condition(s) shall yield identical trace(s) for all DUT state(s). This property is quite significant for enabling efficient debugging of the simulation. This requirement constrains how clocking and reset of the entire acceleration system is implemented. It also constrains the design of the wire like links.
- These high speed multiplexers and demultiplexers, also known as SerDes links, require link training, which generally involves bit, word and block alignment. These alignment techniques can take variable amounts of time to complete, depending upon the physical and electrical properties of the link. To have a cycle reproducible link, special circuits typically have to be designed into the system, in order to mitigate the effect of this variability.
- One or more embodiments provide innovative circuits and techniques used in the design of wire like links (WLLs) for a cycle reproducible and cycle accurate hardware accelerator.
- FIG. 1 shows two FPGAs, FPGA1 and FPGA2, designated as 102 and 104 , respectively, connected to each other on a printed circuit board (PCB) 106 .
- a hardware accelerator can have hundreds of such FPGAs connected to each other.
- the DUT logic which needs to be simulated using this hardware accelerator is partitioned between the two FPGAs.
- the DUT logic can be represented by a combinatorial logic cloud 108 , 110 and a register 112 , 114 .
- Signals in FPGA1 102 are multiplexed or serialized, in serializer 116 , before sending them to FPGA2 104 .
- the signals are demultiplexed or deserialized in deserializer 118 before sending them to the DUT register 114 .
- CLK11 represents the clock signal which updates the DUT state in FPGA1 102.
- CLK12 is the clock signal which drives the DUT state in FPGA2 104. Both CLK11 and CLK12 should be designed to have the same frequency, although they can have different phases.
- CLK2 represents the clock signal which drives the serialized or multiplexed data out of FPGA1 102 and demultiplexes it at FPGA2 104 . For a clock forwarded system, CLK2 is forwarded from FPGA1 102 to FPGA2 104 .
- In a non-clock-forwarded system, also known as a clock data recovery (CDR) system, there is a transmitter version of CLK2 and a receiver version of CLK2; these have similar frequencies (within a certain tolerance) and different phases.
- This exemplary embodiment describes a clock forwarded system, although the techniques described to achieve cycle accuracy and reproducibility can also be applied to a CDR based system, given the teachings herein.
- F_t is the maximum frequency at which these C wires can be operated.
- F_t is limited by several factors, such as the quality of the printed circuit board material, transmitter/receiver design, packaging, cross-talk between wires, inter symbol interference, and the like.
- In FIG. 1, to maximize the performance of the system, it would be desirable for the frequency of CLK2 to equal F_t.
- N is the number of F_t cycles required for the signal to propagate from FPGA1 102 to FPGA2 104. This includes the time to multiplex in the transmitter, the flight time between chips, and the propagation delay in the chip output driver of FPGA1 102 and the input receiver of FPGA2 104. N is also referred to as the latency of the link.
- the data is transferred from FPGA1 102 to FPGA2 104 in less than one DUT cycle (CLK11, CLK12), by choosing to operate the DUT logic at 50 MHz.
- the CLK2 waveform, at 1 GHz, is shown at 221.
- Twenty pulses of CLK2 occur in one pulse of CLK12; the waveform for the latter, at 50 MHz, is shown at 225 .
- DATA 1, designated as 223, begins transmission at the rising edge of pulse 1 in waveform 221.
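- The timing relationship in this example can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the patent: the function names are hypothetical, and the latency value of 12 CLK2 cycles is an assumed figure chosen only for illustration. Only the 1 GHz and 50 MHz frequencies come from the text.

```python
def link_cycles_per_dut_cycle(f_link_hz: int, f_dut_hz: int) -> int:
    """Number of whole CLK2 cycles that fit in one DUT clock period."""
    return f_link_hz // f_dut_hz


def fits_in_one_dut_cycle(latency_link_cycles: int, f_link_hz: int, f_dut_hz: int) -> bool:
    """True if the link latency (in CLK2 cycles) is shorter than one DUT period.

    latency / f_link < 1 / f_dut  is rewritten as  latency * f_dut < f_link
    so that the comparison stays in exact integer arithmetic.
    """
    return latency_link_cycles * f_dut_hz < f_link_hz


F_LINK_HZ = 1_000_000_000  # CLK2 at 1 GHz (from the text)
F_DUT_HZ = 50_000_000      # CLK11 / CLK12 at 50 MHz (from the text)
ASSUMED_LATENCY = 12       # hypothetical latency in CLK2 cycles

cycles = link_cycles_per_dut_cycle(F_LINK_HZ, F_DUT_HZ)
ok = fits_in_one_dut_cycle(ASSUMED_LATENCY, F_LINK_HZ, F_DUT_HZ)
```

- With these frequencies the first function yields the twenty CLK2 pulses per CLK12 pulse described above; any latency below twenty CLK2 cycles satisfies the check.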
- Link training is appropriate because the signals travelling from one FPGA to another through a printed circuit board or cables undergo several forms of delay. Since the SerDes macros are not intended to work in more than the 1-2 Gb/s range, the FPGA manufacturers, to reduce the power and area cost, typically do not provide any signal conditioning circuits. This makes it difficult to capture the center of the data eye using the forwarded clock. The process of computing the center of the eye with respect to the sampling clock is commonly referred to as bit alignment. There are several techniques for doing this. The FPGA manufacturers typically provide fine delay elements to help solve this problem. These delay elements can be placed in the clock path or the data path, thus moving one edge with respect to the other. The FPGA manufacturers usually recommend placing the delay elements in the clock path.
- One or more embodiments advantageously add the delay elements in the data path.
- a step that must typically be taken in eye measurement using delay elements is averaging. Instead of measuring the width of the eye once, one should measure the eye multiple times and an average should be taken to decide the final set of delay elements. This averaging compensates for long term jitter events.
- Metastability is another issue which the bit alignment circuits typically suffer from. As the data edge is continuously moved with respect to the clock edge, setup and hold violations can occur at the capture latch. Although this cannot be avoided, it should be detected, or else it can give misleading results in delay computation.
- a metastability detection circuit can be implemented by sending a low frequency square wave training pattern from the transmitter, which, when received, will look like a thermometer code at the output of the demultiplexer. Metastability at the first receiver latch will typically appear in the form of bubbles in the thermometer code.
- the bubbles can be very easily detected by XORing (that is, applying an eXclusive OR logic function to) the adjacent bits of the demultiplexer output. Once metastability is detected, the sampling point can be changed by adding or removing a delay element from the data path.
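- The XOR-based bubble check can be illustrated in software. This is a minimal sketch under stated assumptions, not the patented circuit: it assumes each demultiplexer output word contains at most one edge of the square wave, so a clean thermometer code has at most one 0/1 transition and any additional transition is treated as a bubble. The function names are hypothetical.

```python
def adjacent_xor(bits):
    """XOR each bit with its right-hand neighbour; a 1 marks a transition."""
    return [a ^ b for a, b in zip(bits, bits[1:])]


def has_bubble(bits):
    """Flag a word whose adjacent-bit XOR shows more than one transition.

    Assumption: a clean thermometer code (e.g. 00001111) has at most one
    transition, so extra transitions indicate a metastability bubble.
    """
    return sum(adjacent_xor(bits)) > 1


clean = [0, 0, 0, 0, 1, 1, 1, 1]    # thermometer code, one transition
bubbled = [0, 0, 0, 1, 0, 1, 1, 1]  # isolated flipped bit near the edge
```

- When `has_bubble` reports a bubble, the text above calls for moving the sampling point by adding or removing a delay element and measuring again.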
- the bit alignment block is configured to perform bit alignment by, inter alia, detecting metastability in a test pattern.
- the process comprises sending a training pattern (for example, a low frequency square wave) which at the transmitting end (T in FIG. 4) will look like x00 x00 x0F xFF xFF.
- the bit alignment procedure will try to add delays in the data path, so that the received pattern at one time instance, say x0F, changes to x07 (note the shift-right in the pattern as the sampling eye moves from one data window to another). Once this happens, one notes the number of delay elements required to cause this shift.
- the process just described, in and of itself, is known to the skilled artisan to perform delay alignment.
- one or more embodiments provide metastability detection—as the “appearance of shifting” can be caused by a metastable event, so one or more embodiments employ a metastable detection circuit.
- known techniques in and of themselves do not address long term jitter issues, so one or more embodiments carry out the eye measurement several times and for reliability, the average number of delay elements is used.
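- The averaging step can be sketched as follows. This is an illustrative assumption about how the average might be reduced to a whole number of delay elements (rounding half up); the patent text only states that the average is used. The function name and the sample measurements are hypothetical.

```python
def averaged_delay_taps(measurements):
    """Return the mean of repeated delay-element counts, rounded half up.

    Integer arithmetic is used so the result does not depend on
    floating-point rounding behaviour.
    """
    if not measurements:
        raise ValueError("at least one eye measurement is required")
    total = sum(measurements)
    count = len(measurements)
    return (2 * total + count) // (2 * count)


# Hypothetical delay-element counts from five repeated eye measurements.
samples = [5, 6, 5, 7, 5]
final_taps = averaged_delay_taps(samples)
```

- A single measurement disturbed by jitter (the 7 in the sample list) shifts the result far less than it would if that measurement alone were used.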
- the next step in the link alignment process is called word alignment. It involves shifting the bits received at the demultiplexer output, so as to align the first incoming bit to the desired location of the word, which could be, for example, the most significant bit location.
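- Word alignment by bit rotation can be modelled in a few lines. The sketch below is a software illustration only: it assumes the receiver knows the expected training word and searches for the number of one-bit rotations that restores it. The function names and the example word are hypothetical.

```python
def rotate_left(word, k):
    """Rotate a list of bits left by k positions."""
    k %= len(word)
    return word[k:] + word[:k]


def find_bit_slip(received, expected):
    """Return how many one-bit left rotations map `received` onto `expected`.

    Returns None if no rotation matches. The training word should not be
    rotationally symmetric, otherwise the answer is ambiguous.
    """
    for k in range(len(received)):
        if rotate_left(received, k) == expected:
            return k
    return None


expected_word = [1, 0, 0, 0, 0, 0, 0, 0]  # first bit belongs at the MSB
received_word = [0, 0, 0, 1, 0, 0, 0, 0]  # arrived three positions late
slip = find_bit_slip(received_word, expected_word)
```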
- FPGA manufacturers typically provide a word alignment feature for the SerDes.
- the multiplex ratios can be as high as 100 or more. Current multiplex ratios available in FPGA SerDes macros range from 8 to 16.
- time domain multiplexers (TDMs) and demultiplexers can be used to extend the multiplex ratio of an existing SerDes block. For example, and referring now to FIG.
- the FPGA's pre-existing SerDes 351 provides a multiplex ratio of 8 to 1, whereas the DUT requires a multiplex ratio of 96 to 1.
- element 353 is a demultiplexer.
- a similar demultiplexer can be built at the receiver, as shown at 357 . Note that only the first and eighth 12 to 1 TDMs on the input side, i.e., TDM(1) and TDM(8) are shown at 355 , with the second through seventh TDMs omitted to avoid clutter and symbolized by the ellipsis.
- time division demultiplex units 1-to-12(1) and 1-to-12(8) are shown, with the second through seventh time division demultiplex units 1-to-12(2) through 1-to-12(7) omitted to avoid clutter and symbolized by the ellipsis.
- the hard macro serializer 351 together with the eight TDMs 355 correspond to serializer 116 in FIG. 1 .
- the hard macro deserializer 353 together with the eight time division demultiplexers 357 correspond to deserializer 118 in FIG. 1 .
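- The 96 to 1 arrangement can be modelled as a round trip in software. This is a minimal sketch: the assignment of signals to TDMs and the order in which bits appear on the wire are assumptions made for illustration (TDM(i) is taken to carry twelve consecutive signals, and in each of the twelve time slots the eight TDM outputs form one word for the 8 to 1 serializer). The patent text gives the ratios but not this ordering.

```python
TDM_RATIO = 12     # each TDM multiplexes 12 signals onto 1 output
SERDES_RATIO = 8   # the hard macro serializer takes 8 inputs
TOTAL = TDM_RATIO * SERDES_RATIO  # 96 signals per wire


def serialize(signals):
    """Flatten 96 parallel signals into the assumed on-wire bit order."""
    if len(signals) != TOTAL:
        raise ValueError("expected 96 signals")
    stream = []
    for slot in range(TDM_RATIO):         # twelve TDM time slots
        for lane in range(SERDES_RATIO):  # eight serializer inputs per slot
            stream.append(signals[lane * TDM_RATIO + slot])
    return stream


def deserialize(stream):
    """Invert serialize(): rebuild the 96 parallel signals."""
    if len(stream) != TOTAL:
        raise ValueError("expected 96 bits")
    signals = [0] * TOTAL
    for index, bit in enumerate(stream):
        slot, lane = divmod(index, SERDES_RATIO)
        signals[lane * TDM_RATIO + slot] = bit
    return signals


pattern = [(i // 3) % 2 for i in range(TOTAL)]
round_trip = deserialize(serialize(pattern))
```

- The round trip returning the original list corresponds to the receiver-side 1-to-12 units undoing what the transmitter-side TDMs did, provided both ends agree on where a frame starts, which is the job of word alignment.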
- an additional circuit is provided in one or more embodiments to perform the word alignment. Indeed, one or more embodiments provide a simple technique to perform this additional bit slip with minimal overhead. To illustrate this, refer again to the example in FIG. 3 of a 12 to 1 multiplexer built on top of an 8 to 1 multiplexer to achieve an overall multiplex ratio of 96 to 1.
- the receiver includes bit alignment block 469, word alignment block 467, and head latency detection block 465.
- This word alignment function can be achieved by sending an eight bit pattern from the training block 463 in FIG. 4 exactly once. On detecting this pattern at the node R in the receiver, block 465 sends signal 471 to start the shift operation in the 12 bit shift register 461 . This shift operation should then be stopped after 12 shifts. This shift operation can then be repeated periodically for every DUT data front.
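- The single pass mechanism can be pictured with a small software model. This is a sketch under assumptions, not the circuit itself: the training word 0xA5 is a hypothetical choice, the stream is modelled as a list of eight bit words seen at node R, and the shift is assumed to begin with the word immediately following the detected pattern.

```python
def capture_after_head(words, head, depth=12):
    """Locate the one-time head pattern, then collect the next `depth` words.

    Returns (start_index, frame). start_index plays the role of the
    measured head latency; frame models the 12 shifts into the register.
    """
    start = words.index(head) + 1  # raises ValueError if head is absent
    frame = words[start:start + depth]
    if len(frame) < depth:
        raise ValueError("stream ended before the frame was complete")
    return start, frame


HEAD = 0xA5  # hypothetical eight bit training pattern, sent exactly once
stream = [0x00, 0x00, HEAD] + list(range(1, 13)) + [0x00]
start_index, frame = capture_after_head(stream, HEAD)
```

- Stopping after exactly twelve words mirrors the requirement that the shift operation end after 12 shifts; repeating the capture every frame models the periodic repetition for each DUT data front.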
- periodicity is defined by the denominator of equation 2, i.e., M+N+B. This is a single pass word alignment mechanism, which is extremely fast and minimal in area utilization.
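- The relationship between the link clock and the DUT clock implied by this denominator can be evaluated directly. The sketch below assumes Equation 2 has the form F_s = F_t / (M + N + B), which is consistent with the denominator quoted above. Reading M as the multiplex ratio is an assumption, since this excerpt does not define it, and the numeric values are hypothetical.

```python
def max_dut_frequency_hz(f_t_hz, m, n, b):
    """Upper bound on the DUT clock: F_s = F_t / (M + N + B).

    f_t_hz: link clock frequency F_t
    m:      assumed to be the multiplex ratio
    n:      link latency in F_t cycles
    b:      extra cycles deliberately burned at the receiver
    """
    return f_t_hz / (m + n + b)


# Hypothetical values chosen so that the denominator is 20.
f_s = max_dut_frequency_hz(1_000_000_000, m=12, n=6, b=2)
```

- With a denominator of 20 and a 1 GHz link clock, the bound comes out at 50 MHz, matching the DUT frequency used in the waveform example.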
- After the bit and word alignment is achieved by blocks 469, 467, one could have several data lanes aligned to a single forwarded clock. Define a combination of multiple data lanes aligned to a single clock as a bank. Within a bank, different data lanes will align at different times. In a hardware accelerator, one could have thousands of such banks across multiple FPGAs. As a result, data lanes in different banks will also align at different times. It is possible to design very complex circuits which accurately predict this time. However, inasmuch as this is a problem which is encountered only at startup, one or more embodiments provide a simple time-out mechanism. Both bit alignment 469 and word alignment 467 are allowed to run for a specific time duration. At the end of this time, if the links are aligned, they are marked as good, and if they are not aligned, they are marked as bad.
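- The time-out policy can be summarised in a short sketch. The data structure, link names, and cycle counts below are hypothetical; the only behaviour taken from the text is that every link gets the same fixed training window and is marked good or bad when it ends.

```python
def mark_links(alignment_cycles, timeout_cycles):
    """Classify each link after a fixed training window.

    alignment_cycles maps a link name to the number of cycles it needed
    to align, or None if it never aligned.
    """
    status = {}
    for name, cycles in alignment_cycles.items():
        aligned_in_time = cycles is not None and cycles <= timeout_cycles
        status[name] = "good" if aligned_in_time else "bad"
    return status


# Hypothetical training results for three links.
results = {"bank0_lane0": 1200, "bank0_lane1": 4800, "bank1_lane0": None}
status = mark_links(results, timeout_cycles=4000)
```

- Because the window length is fixed, the moment at which training ends is the same on every run regardless of how quickly individual links align, which is what keeps startup cycle reproducible.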
- FIFOs are a scarce resource, and having thousands of them can cause routing issues.
- one or more embodiments provide a so-called “burning time” technique. Data transfers between two clocks of the same frequency and different phase can cause an ambiguity of ±1 clock cycle, that is, the data could arrive a clock cycle early or a clock cycle late depending on where the two clock edges were placed with respect to each other.
- instead of inserting FIFOs, one could choose a larger value for the variable B in Equation 2. As a result, the data at the receiver is guaranteed to be stable before the next DUT clock edge.
- an additional multiplexer 467 is provided to select whether the actual data or the training pattern is provided to the hard macro serializer 351 , and thus transmitted to FPGA2 104 .
- the “D” port of unit 467 connects to the eight outputs of the array 355 of TDM(1) to TDM(8).
- the “T” port of unit 467 connects to the eight outputs of the training pattern generator.
- the eight outputs of unit 467 connect to the eight inputs of unit 351 .
- either the eight outputs of the array 355 of TDM(1) to TDM(8) or the eight outputs of the training pattern generator are provided to the eight inputs of unit 351 .
- one or more embodiments employ three discrete timing events: starting training, resetting the DUT state, and starting the DUT cycles.
- One or more embodiments thus provide a Wire Like Link (WLL) for a high performance, cycle accurate, multi-chip hardware accelerator.
- One or more instances implement a WLL using a source synchronous transfer mechanism.
- data sent on link C 120 is synchronous to forwarded clock CLK2, hence called the clock forwarded scheme or source synchronous data transfer scheme.
- a WLL may include bit alignment, word alignment, head latency detection and block alignment circuits in the receiver.
- a WLL includes transmitter circuits to generate training patterns for bit alignment, word alignment, and head latency detection. Given the teachings herein, the skilled artisan can select appropriate patterns. In some instances, a WLL may be programmable to handle multiple multiplex ratios and variable transfer delays.
- the bit alignment may, in some cases, perform eye measurement several times and use the average for reliable operation. The bit alignment may also, for example, perform bubble error detection to increase the reliability of eye measurement.
- a word alignment circuit may be used to correct for word orientation in the receiver.
- a head latency detection circuit may also be used in some instances to compute the variable latency of a link and mark the boundaries of received data.
- a block alignment circuit may be used in one or more instances to align all the received signals across all the links to a single clock edge. The block alignment circuit may, for example, burn dead cycles in the receiver to align all the received signals.
- a timeout mechanism may be used in some instances in order to remove the ambiguity of training time; a timeout mechanism may be used in training to remove the need of back channel status indication; and/or three discrete time events may be introduced to start training, reset the DUT state, and start the DUT cycles in order to maintain cycle reproducibility.
- Bit alignment block 469 detects metastability as described above, by XORing the adjacent bits of the output of demultiplexer 353 ; the sampling point is then changed by block 469 by adding or removing a delay element from the data path, as also described above.
- Bit alignment block 469 may be implemented, for example, by suitably programming the FPGA to implement the logic described elsewhere herein.
- Word alignment block 467 works in conjunction with head latency detection block 465 to detect the training pattern and implement the above-described single pass word alignment. In one or more embodiments, word alignment is a two part process. In the first part, one can use the bit slip mechanism provided by the FPGA vendor to rotate the bits at the output of element 353 until they align with the desired location. In the second part, an additional circuit disclosed herein, implementing the head latency detection technique, is used to perform word alignment at the output of the additional demultiplexer, block 461.
- Word alignment block 467 may be implemented, for example, by programming the FPGA to implement the logic described elsewhere herein; for example, to carry out a switching function on block 353 to cause slipping to align the words at the deserializer output.
- Head latency detection block 465 may be implemented, for example, by suitably programming the FPGA to implement the logic described elsewhere herein.
- Training pattern generator block 463 may be implemented, in a non-limiting example, by suitably programming the FPGA to provide a plurality of flip-flops arranged to generate a desired pattern; as noted elsewhere herein, the skilled artisan, given the teachings herein, will be able to select appropriate test/training patterns; for example, a slow square wave may be used for bit alignment.
- One or more embodiments thus address design of inter-FPGA links and/or achieving low latency of such links.
- One or more embodiments are cycle accurate and/or cycle reproducible.
- One or more embodiments provide non-packetized links which can be, for example, extremely low latency non-packetized links.
- the links are point-to-point links.
- One or more embodiments provide a head latency detector for word alignment.
- an exemplary method includes the step of providing first and second field programmable gate arrays FPGA1, designated as 102 , and FPGA2, designated as 104 .
- the FPGAs implement, respectively, first and second blocks of a circuit design to be simulated.
- a further step includes operating the first and second field programmable gate arrays at a first clock frequency (the frequency of CLK11 and CLK12).
- a still further step includes providing a wire like link to send a plurality of signals, P, between the first and second field programmable gate arrays.
- the wire like link includes a serializer 116 or 351 plus 355 , on the first field programmable gate array, to serialize the plurality of signals; a deserializer 118 or 353 plus 357 on the second field programmable gate array, to deserialize the plurality of signals; and a connection 120 (e.g., via printed circuit board, cable, optical fiber, or the like) between the serializer and the deserializer.
- An even further step includes operating the serializer and the deserializer at a second clock frequency, F_t, greater than the first clock frequency; the second clock frequency is selected such that latency of transmission and reception of the plurality of signals is less than the period corresponding to the first clock frequency, as best seen in FIG. 2. This aspect advantageously provides cycle accurate links as described elsewhere herein.
- the first and second field programmable gate arrays 102, 104 are clocked at the first clock frequency by first and second clock signals CLK11 and CLK12 which are potentially out of phase with each other, and, in the operating steps, the first clock frequency is no greater than F_s from Equation 2; that is, the first clock frequency is no greater than the second clock frequency F_t divided by the sum M+N+B, the denominator of Equation 2.
- bit alignment is performed at the deserializer on the second field programmable gate array by detecting metastability in a test pattern, and adding or removing a delay element in a data path between the first and second field programmable gate arrays to change a sampling point to remove the metastability in the test pattern.
- the data path refers to the data signal travelling over the physical wire C 120 in FIG. 1 .
- an additional step includes generating the test pattern in the first field programmable gate array (e.g., at block 463 ) and transmitting the test pattern to the second field programmable gate array over the wire like link.
- many FPGAs have a delay chain that can be programmed to be placed into the data path; for example, immediately upstream of the deserializer 353 or within the deserializer 353 immediately downstream of the input thereof, as seen in FIG. 7 .
- the data passes through the connection 120 and through the delay chain.
- the chain includes delay elements 791 , 793 , and 795 , interconnected between input 789 and multiplexer 797 .
- Multiplexer 797 has output 787 and select lines 785. If the select lines 785 cause multiplexer 797 to connect port 0 to output 787, three delay elements will be in the chain. If the select lines 785 cause multiplexer 797 to connect port 1 to output 787, two delay elements will be in the chain.
- If the select lines 785 cause multiplexer 797 to connect port 2 to output 787, one delay element will be in the chain. If the select lines 785 cause multiplexer 797 to connect port 3 to output 787, no delay elements will be in the chain. Thus, by changing the signal on select lines 785, one can change the point at which the clock will sample delayed data.
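- The port-to-delay mapping described for FIG. 7 can be written as a lookup. This is a minimal sketch with hypothetical function names; the mapping itself (port 0 selects three delay elements, port 3 selects none) is taken from the description above.

```python
NUM_DELAY_ELEMENTS = 3  # elements 791, 793, and 795
NUM_PORTS = NUM_DELAY_ELEMENTS + 1


def delay_elements_in_path(select):
    """Number of delay elements in the data path for a given mux port."""
    if not 0 <= select < NUM_PORTS:
        raise ValueError("select must be a port number from 0 to 3")
    return NUM_DELAY_ELEMENTS - select


def select_for_delay(elements):
    """Inverse mapping: which port yields the requested number of delays."""
    if not 0 <= elements <= NUM_DELAY_ELEMENTS:
        raise ValueError("delay count must be between 0 and 3")
    return NUM_DELAY_ELEMENTS - elements


mapping = [delay_elements_in_path(port) for port in range(NUM_PORTS)]
```

- A bit alignment loop would step the select value up or down by one to add or remove a single delay element, then re-check the received training pattern.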
- further steps include providing the serializer as macro serializer 351 with a plurality of time domain multiplexers 355 at its inputs; and providing the deserializer as a macro deserializer with a corresponding plurality of time domain multiplexers 357 at its outputs.
- the corresponding plurality of time domain multiplexers includes a corresponding plurality of shift registers 461 each sized with a number of bits equal to the multiplex ratio of the plurality of time domain multiplexers.
- the multiplex ratio is 12 to 1 and the shift registers are 12 bit registers.
- any suitable multiple can be chosen, and the value of 12 is entirely exemplary and not intended to be limiting.
- an additional step includes generating the test pattern in the first field programmable gate array (e.g., at block 463 ) and transmitting the test pattern to the second field programmable gate array over the wire like link.
- further steps include carrying out a bit alignment process with a bit alignment block 469 of the second field programmable gate array, for a predetermined amount of time; carrying out a word alignment process with a word alignment block 467 of the second field programmable gate array, for the predetermined amount of time; and designating the wire like link as good if the wire like link is aligned at the end of the predetermined amount of time. This aspect advantageously provides cycle reproducibility as described elsewhere herein: different links may get aligned in different numbers of clock cycles, so a predetermined amount of time is allowed. This aspect also advantageously allows for removal of bad channels.
- time is burned by increasing B; that is, picking the first clock frequency to be no greater than the second clock frequency divided by the sum M+N+B, the denominator of Equation 2.
- an apparatus for simulating a circuit design includes first and second field programmable gate arrays FPGA1 and FPGA2, designated respectively as 102 and 104 . These FPGAs implement, respectively, first and second blocks of the circuit design to be simulated. Also provided is at least a first clock source which clocks the first and second field programmable gate arrays such that they operate at a first clock frequency (the frequency of CLK11 and CLK12). Further elements include a wire like link configured to send a plurality of signals between the first and second field programmable gate arrays.
- the wire like link in turn includes a serializer 116 or 351 plus 355 , on the first field programmable gate array, to serialize the plurality of signals; a deserializer 118 or 353 plus 357 , on the second field programmable gate array, to deserialize the plurality of signals; and a connection 120 between the serializer and the deserializer (again, e.g., printed circuit board, cable, or the like).
- a still further element includes at least a second clock source CLK2 which clocks the serializer and the deserializer such that they operate at a second clock frequency, greater than the first clock frequency.
- the second clock frequency has a value such that latency of transmission and reception of the plurality of signals is less than a period corresponding to the first clock frequency.
- the first clock source further comprises a third clock source; i.e., the first clock source includes CLK11 and the third clock source CLK12. Terms such as first, second, third, etc. are used for linguistic convenience only.
- the first and second field programmable gate arrays are clocked at the first clock frequency (50 MHz in the non-limiting example of FIG. 2 ) by the first and third clock sources CLK11 and CLK12 respectively, and the first and third clock sources are potentially out of phase with each other.
- the first clock frequency is no greater than F_s from Equation 2, as described above with respect to the exemplary method.
- a further element includes a bit alignment block 469 at the deserializer on the second field programmable gate array.
- the bit alignment block is configured to perform bit alignment by: detecting metastability in a test pattern, and adding or removing a delay element in a data path between the first and second field programmable gate arrays to change a sampling point to remove the metastability in the test pattern. See also discussion of FIG. 7 .
- a still further element includes a training pattern generation block 463 in the first field programmable gate array which generates the test pattern in the first field programmable gate array and transmits the test pattern to the second field programmable gate array over the wire like link.
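The bit alignment loop described above can be sketched in software. The following is a minimal model, not the patented circuit: the `TEST_PATTERN` value, the `sample_at` callable, the tap numbering, and the `FakeLink` stand-in are all assumptions introduced for illustration, and "metastability" is approximated here as repeated reads of the test pattern that disagree with the expected value.

```python
TEST_PATTERN = (1, 0, 1, 0, 1, 1, 0, 0)  # hypothetical training pattern


def is_metastable(sample_at, tap, reads=4):
    """Treat a tap as metastable if any repeated read of the pattern is corrupted."""
    return any(tuple(sample_at(tap)) != TEST_PATTERN for _ in range(reads))


def bit_align(sample_at, start_tap, num_taps):
    """Add delay elements first, then try removing them, until the pattern reads cleanly."""
    candidates = list(range(start_tap, num_taps)) + list(range(start_tap - 1, -1, -1))
    for tap in candidates:
        if not is_metastable(sample_at, tap):
            return tap  # sampling point where the test pattern is stable
    return None  # no stable sampling point found


class FakeLink:
    """Deterministic stand-in for hardware: taps below first_good flip a bit on every other read."""

    def __init__(self, first_good):
        self.first_good = first_good
        self.reads = 0

    def __call__(self, tap):
        self.reads += 1
        pattern = list(TEST_PATTERN)
        if tap < self.first_good and self.reads % 2 == 0:
            pattern[0] ^= 1  # corrupt one bit to imitate an unstable sample
        return pattern
```

For example, `bit_align(FakeLink(3), 0, 8)` walks the delay setting upward from tap 0 and stops at the first tap whose reads are all clean.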
- the serializer includes a macro serializer 351 and a plurality of time domain multiplexers 355 at inputs of the macro serializer; and the deserializer includes a macro deserializer 353 and a corresponding plurality of time domain multiplexers 357 at outputs of the macro deserializer.
- the corresponding plurality of time domain multiplexers include a corresponding plurality of shift registers 461, each sized with a number of bits equal to the multiplex ratio of the plurality of time domain multiplexers (as discussed above, 12 to 1 in the non-limiting example, such that the register 461 is a 12 bit register).
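As an illustration of the time domain multiplexing and the receive-side shift register, the sketch below sends one 12 bit parallel word over a single wire and rebuilds it after 12 shifts. It is a behavioral model only; the function and class names are hypothetical, and the ratio of 12 is the non-limiting example value from the text.

```python
MUX_RATIO = 12  # non-limiting example multiplex ratio from the text


def tdm_serialize(word):
    """Emit the bits of one parallel word, one bit per fast-clock cycle."""
    assert len(word) == MUX_RATIO
    for bit in word:
        yield bit


class ShiftRegister:
    """Receive-side register sized to the multiplex ratio."""

    def __init__(self, size=MUX_RATIO):
        self.bits = [0] * size

    def shift_in(self, bit):
        # Drop the oldest bit and append the newest one.
        self.bits = self.bits[1:] + [bit]


def tdm_deserialize(stream):
    """Shift in every received bit; after MUX_RATIO shifts the word is restored."""
    reg = ShiftRegister()
    for bit in stream:
        reg.shift_in(bit)
    return reg.bits
```

Because the register holds exactly `MUX_RATIO` bits, the first bit sent ends up at index 0 after the twelfth shift, so the parallel word comes out in its original order.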
- a still further element includes a head latency circuit 465 , located at an output node R of the demultiplexer, and configured to detect a test pattern, and, responsive to detection of the test pattern, send a signal 471 to cause the plurality of shift registers to commence a shift operation for a number of shifts equal to the multiplex ratio of the plurality of time domain multiplexers.
- a still further element includes a training pattern generation block 463 in the first field programmable gate array which generates the test pattern in the first field programmable gate array and transmits the test pattern to the second field programmable gate array over the wire like link.
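The head latency behavior (detect the test pattern, then shift for a number of shifts equal to the multiplex ratio) can be modeled as follows. This is a sketch under stated assumptions: the 4 bit `TEST_PATTERN`, the function name `capture_word`, and the list-based stream are illustrative and do not come from the patent.

```python
from itertools import islice

MUX_RATIO = 12               # non-limiting example multiplex ratio from the text
TEST_PATTERN = (1, 1, 1, 0)  # hypothetical pattern marking the start of valid data


def capture_word(stream, pattern=TEST_PATTERN, ratio=MUX_RATIO):
    """Watch the serial stream; once the pattern is seen, capture exactly `ratio` bits."""
    bits = iter(stream)
    window = []
    for bit in bits:
        # Keep only the most recent len(pattern) bits for comparison.
        window = (window + [bit])[-len(pattern):]
        if tuple(window) == pattern:
            # Models the start signal: shift for `ratio` cycles, then stop.
            word = list(islice(bits, ratio))
            return word if len(word) == ratio else None
    return None  # pattern never detected
```

Returning `None` when the stream ends early keeps a truncated capture from being mistaken for a valid word.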
- Some embodiments include a bit alignment block 469 in the second field programmable gate array, configured to carry out a bit alignment process for a predetermined amount of time; and a word alignment block 467 in the second field programmable gate array, configured to carry out a word alignment process for the predetermined amount of time.
- the wire like link is designated as good if the wire like link is aligned at the end of the predetermined amount of time.
- each alignment procedure has a flag at the end, which is raised when successful or lowered when the process failed. This status flag is then used to mark the faulty links.
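One way to picture the fixed-time alignment and the status flags is the sketch below. It assumes each alignment procedure can be represented by a hypothetical `step` callable that reports whether the link is currently aligned; the helper names and the cycle budget are illustrative only.

```python
def run_for(step, budget_cycles):
    """Call `step` once per cycle for the whole budget; the flag is the state at the end."""
    flag = False
    for _ in range(budget_cycles):
        flag = step()
    return flag


def classify_links(links, budget_cycles):
    """Return the names of links whose bit or word alignment flag is lowered."""
    faulty = []
    for name, (bit_step, word_step) in links.items():
        bit_flag = run_for(bit_step, budget_cycles)
        word_flag = run_for(word_step, budget_cycles)
        if not (bit_flag and word_flag):
            faulty.append(name)
    return faulty


def aligns_after(n):
    """Test helper: a step that reports success from its n-th call onward."""
    calls = 0

    def step():
        nonlocal calls
        calls += 1
        return calls >= n

    return step
```

Running every procedure for the same predetermined budget, rather than stopping at first success, keeps the training time identical on every run, which matches the cycle reproducible goal of the design.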
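- time is burned by increasing B; that is, the first clock frequency is selected to be no greater than the second clock frequency divided by the sum of: the number of cycles at the second clock frequency required for a given one of the plurality of signals to propagate from the first field programmable gate array to the second field programmable gate array; a number greater than the number of extra cycles at the second clock frequency required to compensate for phase differences between the first and second clock signals; and the total number of signals in the plurality of signals, divided by the number of wires available in the connection between the serializer and the deserializer.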
- time is burned in the second field programmable gate array so that data received at the second field programmable gate array is stabilized prior to a next clock edge of the third clock source CLK12.
- Embodiments of the invention also contemplate one or more design structures, discussed further below with respect to FIG. 5 .
- Such design structure(s) are, in one or more embodiments, tangibly embodied in a non-transitory manner in a machine readable medium.
- a design structure includes instructions which cause first and second field programmable gate arrays to implement, respectively, first and second blocks of a circuit design to be simulated.
- the first field programmable gate array has as a macro thereon at least a portion of a serializer to serialize a plurality of signals to be sent over a wire like link between the first and second field programmable gate arrays.
- instructions may be provided to cause the first FPGA to implement other portions of the serializer, such as 355 .
- the second field programmable gate array has as a macro thereon at least a portion of a deserializer to deserialize the plurality of signals.
- instructions may be provided to cause the second FPGA to implement other portions of the deserializer, such as 357 .
- the design structure further includes instructions which cause the first and second field programmable gate arrays to implement at least one port for receiving a signal from at least a first clock source which clocks the first and second field programmable gate arrays such that they operate at a first clock frequency, and instructions which cause the first and second field programmable gate arrays to implement at least one port for receiving a signal from at least a second clock source which clocks the serializer and the deserializer such that they operate at a second clock frequency, greater than the first clock frequency.
- the second clock frequency having a value such that latency of transmission and reception of the plurality of signals is less than a period corresponding to the first clock frequency.
- the clocks per se may be located on, and/or external to, the FPGAs.
- the design structure further includes instructions which cause the second field programmable gate array to implement a bit alignment block which carries out a bit alignment process for a predetermined amount of time; instructions which cause the second field programmable gate array to implement a word alignment block which carries out a word alignment process for the predetermined amount of time; and instructions which cause the wire like link to be designated as good if the wire like link is aligned at an end of the predetermined amount of time.
- a method includes providing a design structure of the kind just described, with or without any one, some, or all of the optional features, and transmitting instructions corresponding to the design structure. They may be transmitted over a network, over a cable, by sending a tangible storage medium, or the like. For example, they may be transmitted from a computer to one or more FPGAs over a cable; from a flash memory to an FPGA; or over a local or wide area network. In some cases, where even further speed enhancement is desired, in the transmitting step, the transmitting is to an application specific integrated circuit fabricator, and a further step includes receiving, from the application specific integrated circuit fabricator, an application specific integrated circuit which mimics the programmed first and second field programmable gate arrays.
- the integrated circuit chips that are ultimately manufactured in accordance with the design simulations can be distributed by the fabricator in raw wafer form (that is, as a single wafer that has multiple unpackaged chips), as a bare die, or in a packaged form. In the latter case the chip is mounted in a single chip package (such as a plastic carrier, with leads that are affixed to a motherboard or other higher level carrier) or in a multichip package (such as a ceramic carrier that has either or both surface interconnections or buried interconnections).
- the chip is then integrated with other chips, discrete circuit elements, and/or other signal processing devices as part of either (a) an intermediate product, such as a motherboard, or (b) an end product.
- the end product can be any product that includes integrated circuit chips, ranging from toys and other low-end applications to advanced computer products having a display, a keyboard or other input device, and a central processor.
- FIG. 5 shows a block diagram of an exemplary design flow 500 used, for example, in semiconductor IC logic design, simulation, test, layout, and manufacture.
- the design flow shown in FIG. 5 has relevance to one or more embodiments of the invention in three ways; namely:
- the FPGA programming that implements the structures in FIGS. 1-4 and 7 could itself be a design structure that is sent to a fabrication house to fabricate an ASIC for even faster simulations.
- Design flow 500 includes processes, machines and/or mechanisms for processing design structures or devices to generate logically or otherwise functionally equivalent representations of design structures and/or devices.
- the design structures processed and/or generated by design flow 500 may be encoded on machine-readable transmission or storage media to include data and/or instructions that when executed or otherwise processed on a data processing system generate a logically, structurally, mechanically, or otherwise functionally equivalent representation of hardware components, circuits, devices, or systems.
- Machines include, but are not limited to, any machine used in an IC design process, such as designing, manufacturing, or simulating a circuit, component, device, or system.
- machines may include: lithography machines, machines and/or equipment for generating masks (e.g. e-beam writers), computers or equipment for simulating design structures, any apparatus used in the manufacturing or test process, or any machines for programming functionally equivalent representations of the design structures into any medium (e.g. a machine for programming a programmable gate array).
- Design flow 500 may vary depending on the type of representation being designed. For example, a design flow 500 for building an application specific IC (ASIC) may differ from a design flow 500 for designing a standard component or from a design flow 500 for instantiating the design into a programmable array, for example a programmable gate array (PGA) or a field programmable gate array (FPGA) offered by Altera® Inc. or Xilinx® Inc.
- FIG. 5 illustrates multiple such design structures including an input design structure 520 that is preferably processed by a design process 510 .
- Design structure 520 may be a logical simulation design structure generated and processed by design process 510 to produce a logically equivalent functional representation of a hardware element.
- Design structure 520 may also or alternatively comprise data and/or program instructions that when processed by design process 510 , generate a functional representation of the physical structure of a hardware element. Whether representing functional and/or structural design features, design structure 520 may be generated using electronic computer-aided design (ECAD) such as implemented by a core developer/designer. When encoded on a machine-readable data transmission, gate array, or storage medium, design structure 520 may be accessed and processed by one or more hardware and/or software modules within design process 510 to simulate or otherwise functionally represent an electronic component, circuit, electronic or logic module, apparatus, device, or system.
- design structure 520 may comprise files or other data structures including human and/or machine-readable source code, compiled structures, and computer-executable code structures that when processed by a design or simulation data processing system, functionally simulate or otherwise represent circuits or other levels of hardware logic design.
- data structures may include hardware-description language (HDL) design entities or other data structures conforming to and/or compatible with lower-level HDL design languages such as Verilog and VHDL, and/or higher level design languages such as C or C++.
- Design process 510 preferably employs and incorporates hardware and/or software modules for synthesizing, translating, or otherwise processing a design/simulation functional equivalent of the components, circuits, devices, or logic structures to generate a Netlist 580 which may contain design structures such as design structure 520 .
- Netlist 580 may comprise, for example, compiled or otherwise processed data structures representing a list of wires, discrete components, logic gates, control circuits, I/O devices, models, etc. that describes the connections to other elements and circuits in an integrated circuit design.
- Netlist 580 may be synthesized using an iterative process in which netlist 580 is resynthesized one or more times depending on design specifications and parameters for the device.
- netlist 580 may be recorded on a machine-readable data storage medium or programmed into a programmable gate array.
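To make the idea of a netlist concrete, the following sketch shows one possible in-memory representation of components and the wires (nets) that connect them. The class names, field names, and cell labels are hypothetical and are not taken from any particular ECAD tool or from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    kind: str                                  # illustrative cell label, e.g. "AND2" or "DFF"
    pins: dict = field(default_factory=dict)   # pin name -> net (wire) name


@dataclass
class Netlist:
    components: list = field(default_factory=list)

    def nets(self):
        """Map each net to the (component, pin) pairs attached to it."""
        table = {}
        for comp in self.components:
            for pin, net in comp.pins.items():
                table.setdefault(net, []).append((comp.name, pin))
        return table
```

Given an AND gate `u1` whose output `y` and a flip-flop `u2` whose input `d` both reference net `n3`, `nets()["n3"]` lists both attachments, which is the connectivity information a netlist is meant to carry.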
- the medium may be a non-volatile storage medium such as a magnetic or optical disk drive, a programmable gate array, a compact flash, or other flash memory. Additionally, or in the alternative, the medium may be a system or cache memory, buffer space, or electrically or optically conductive devices and materials on which data packets may be transmitted and intermediately stored via the Internet, or other networking suitable means.
- Design process 510 may include hardware and software modules for processing a variety of input data structure types including Netlist 580 .
- Such data structure types may reside, for example, within library elements 530 and include a set of commonly used elements, circuits, and devices, including models, layouts, and symbolic representations, for a given manufacturing technology (e.g., different technology nodes, 32 nm, 45 nm, 90 nm, etc.).
- the data structure types may further include design specifications 540 , characterization data 550 , verification data 560 , design rules 570 , and test data files 585 which may include input test patterns, output test results, and other testing information.
- Design process 510 may further include, for example, standard mechanical design processes such as stress analysis, thermal analysis, mechanical event simulation, process simulation for operations such as casting, molding, and die press forming, etc.
- One of ordinary skill in the art of mechanical design can appreciate the extent of possible mechanical design tools and applications used in design process 510 without deviating from the scope and spirit of the invention.
- Design process 510 may also include modules for performing standard circuit design processes such as timing analysis, verification, design rule checking, place and route operations, etc.
- Design process 510 employs and incorporates logic and physical design tools such as HDL compilers and simulation model build tools to process design structure 520 together with some or all of the depicted supporting data structures along with any additional mechanical design or data (if applicable), to generate a second design structure 590 .
- Design structure 590 resides on a storage medium or programmable gate array in a data format used for the exchange of data of mechanical devices and structures (e.g. information stored in an IGES, DXF, Parasolid XT, JT, DRG, or any other suitable format for storing or rendering such mechanical design structures).
- design structure 590 preferably comprises one or more files, data structures, or other computer-encoded data or instructions that reside on transmission or data storage media and that when processed by an ECAD system generate a logically or otherwise functionally equivalent form of one or more circuits, physical structures, or the like.
- design structure 590 may comprise a compiled, executable HDL simulation model that functionally simulates the circuits, physical structures, or the like.
- Design structure 590 may also employ a data format used for the exchange of layout data of integrated circuits and/or symbolic data format (e.g. information stored in a GDSII (GDS2), GL1, OASIS, map files, or any other suitable format for storing such design data structures).
- Design structure 590 may comprise information such as, for example, symbolic data, map files, test data files, design content files, manufacturing data, layout parameters, wires, levels of metal, vias, shapes, data for routing through the manufacturing line, and any other data required by a manufacturer or other designer/developer to produce a device or structure.
- Design structure 590 may then proceed to a stage 595 where, for example, design structure 590 : proceeds to tape-out, is released to manufacturing, is released to a mask house, is sent to another design house, is sent back to the customer, etc.
- a general purpose computer or workstation, with appropriate software, can be used to initially program an FPGA (after which the programming can simply be included in a flash memory accessible to the FPGA); and a general purpose computer or workstation can be used to run software aspects of the process shown in FIG. 5.
- a general purpose computer might include, for example, a processor 602 , a memory 604 , and an input/output interface formed, for example, by a display 606 and a keyboard 608 .
- the term "processor," as used in connection with FIG. 6, is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) and/or other forms of processing circuitry.
- the term "processor" in connection with FIG. 6 may refer to more than one individual processor.
- the term "memory" in connection with FIG. 6 is intended to include memory associated with a processor or CPU, such as, for example, RAM (random access memory), ROM (read only memory), a fixed memory device (for example, hard drive), a removable memory device (for example, diskette), a flash memory and the like.
- the processor 602, memory 604, and input/output interface such as display 606 and keyboard 608 can be interconnected, for example, via bus 610 as part of data processing unit 612. Suitable interconnections, for example via bus 610, can also be provided to a network interface 614, such as a network card, which can be provided to interface with a computer network, and to a media interface 616, such as a diskette or CD-ROM drive, which can be provided to interface with media 618. Interface 614 or a different interface can be used to program the FPGA(s), for example.
- Computer software including instructions or code for performing FPGA programming and/or software aspects of the design process of FIG. 5 may be stored in one or more of the associated memory devices (for example, ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (for example, into RAM) and implemented by a CPU.
- Such software could include, but is not limited to, firmware, resident software, microcode, and the like.
- a data processing system suitable for storing and/or executing program code will include at least one processor 602 coupled directly or indirectly to memory elements 604 through a system bus 610 .
- the memory elements can include local memory employed during actual implementation of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during implementation.
- Input/output (I/O) devices, including but not limited to keyboards 608, displays 606, pointing devices, and the like, can be coupled to the system either directly (such as via bus 610) or through intervening I/O controllers (omitted for clarity).
- Network adapters such as network interface 614 may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
- a “server” includes a physical data processing system (for example, system 612 as shown in FIG. 6 ) running a server program. It will be understood that such a physical server may or may not include a display and keyboard.
- Computer instructions may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon. Any combination of one or more computer readable medium(s) may be utilized.
- the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
- a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
- Media block 618 is a non-limiting example.
- a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.
- a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
- Computer program code may be written, for example, in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- Software aspects such as software which facilitates programming of FPGA(s) and/or carrying out software-related aspects of FIG. 5 may include providing a system comprising distinct software modules embodied on a computer readable storage medium to implement appropriate functionality.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Design And Manufacture Of Integrated Circuits (AREA)
Abstract
Description
- speed advantage of up to 100,000-fold over conventional simulators;
- ability to run benchmarks on a hardware simulator to verify performance of a microprocessor design early-on in the design process rather than awaiting expensive and time-consuming fabrication of a hardware prototype of the microprocessor (because software simulations of such benchmarks are unfeasible due to excessive execution times);
- substantial reduction in development times and/or development costs by finding problems early-on prior to prototyping.
$$M = \frac{P}{C} \tag{1}$$
$$F_s \le \frac{F_t}{M + N + B} \tag{2}$$
where B represents the extra cycles required to compensate for phase differences between CLK11 and CLK12. These extra cycles can also be used to prevent any setup and hold violations that may arise because of any jitter riding on the clock. It will be further illustrated how the selection of different values of B can, in some embodiments, yield several benefits, such as a universal block alignment circuit and a FIFO-less design (FIFO=First-In-First-Out).
$$P = 10, \quad C = 1, \quad M = \frac{10}{1} = 10$$
$$N = 8, \quad B = 2$$
$$F_t = 1\ \text{GHz}$$
$$F_s = \frac{1\ \text{GHz}}{10 + 8 + 2} = 50\ \text{MHz}$$
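The worked example for Equations 1 and 2 can be reproduced with a few lines of code. This is a minimal sketch: the function and parameter names are hypothetical, and rounding M up when P is not a multiple of C is an assumption that the text does not state.

```python
from math import ceil


def max_fpga_clock_hz(p_signals, c_wires, n_prop_cycles, b_extra_cycles, f_t_hz):
    """Upper bound on the FPGA clock F_s given the link clock F_t (Equations 1 and 2)."""
    # Equation 1: signals per wire. Rounding up is an assumption for non-integer P/C.
    m = ceil(p_signals / c_wires)
    # Equation 2: F_s <= F_t / (M + N + B)
    return f_t_hz / (m + n_prop_cycles + b_extra_cycles)


# Values from the non-limiting example: P=10, C=1, N=8, B=2, F_t=1 GHz.
f_s = max_fpga_clock_hz(10, 1, 8, 2, 1e9)
print(f_s / 1e6)  # 50.0 (MHz)
```

Raising `b_extra_cycles` lowers the bound, which is the sense in which time is "burned" by increasing B.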
- N, the number of cycles at the second clock frequency required for a given one of the plurality of signals to propagate from the first field programmable gate array to the second field programmable gate array;
- B, the number of extra cycles at the second clock frequency required to compensate for phase differences between the first and second clock signals; and
- M, which is the total number of signals in the plurality of signals, P, divided by the number of wires, C, available in the connection between the serializer and the deserializer.
-
- the number of cycles at the second clock frequency required for the given one of the plurality of signals to propagate from the first field programmable gate array to the second field programmable gate array;
- a number greater than the number of extra cycles at the second clock frequency required to compensate for phase differences between the first and second clock signals; and
- the total number of signals in the plurality of signals, divided by the number of wires available in the connection between the serializer and the deserializer.
Claims (24)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/342,128 US9002693B2 (en) | 2012-01-02 | 2012-01-02 | Wire like link for cycle reproducible and cycle accurate hardware accelerator |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/342,128 US9002693B2 (en) | 2012-01-02 | 2012-01-02 | Wire like link for cycle reproducible and cycle accurate hardware accelerator |
Publications (2)
Publication Number | Publication Date |
---|---|
US20130170525A1 US20130170525A1 (en) | 2013-07-04 |
US9002693B2 true US9002693B2 (en) | 2015-04-07 |
Family
ID=48694757
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/342,128 Active 2033-01-22 US9002693B2 (en) | 2012-01-02 | 2012-01-02 | Wire like link for cycle reproducible and cycle accurate hardware accelerator |
Country Status (1)
Country | Link |
---|---|
US (1) | US9002693B2 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140109035A1 (en) * | 2012-10-12 | 2014-04-17 | Mediatek Inc. | Layout method for printed circuit board |
CN106844256A (en) * | 2017-02-22 | 2017-06-13 | 天津大学 | A kind of active power distribution network real-time simulator internal interface method for designing based on many FPGA |
US10652043B2 (en) | 2016-06-14 | 2020-05-12 | G.E. Aviation Systems, LLC | Communication regulation in computing systems |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2497314A (en) * | 2011-12-06 | 2013-06-12 | St Microelectronics Grenoble 2 | Independent blocks to control independent busses or a single combined bus |
US9081925B1 (en) * | 2012-02-16 | 2015-07-14 | Xilinx, Inc. | Estimating system performance using an integrated circuit |
US9286423B2 (en) | 2012-03-30 | 2016-03-15 | International Business Machines Corporation | Cycle accurate and cycle reproducible memory for an FPGA based hardware accelerator |
US9230046B2 (en) * | 2012-03-30 | 2016-01-05 | International Business Machines Corporation | Generating clock signals for a cycle accurate, cycle reproducible FPGA based hardware accelerator |
US8972806B2 (en) * | 2012-10-18 | 2015-03-03 | Applied Micro Circuits Corporation | Self-test design for serializer / deserializer testing |
US9529946B1 (en) | 2012-11-13 | 2016-12-27 | Xilinx, Inc. | Performance estimation using configurable hardware emulation |
US9454630B1 (en) * | 2013-02-26 | 2016-09-27 | Xilinx, Inc. | Graphical representation of integrated circuits |
US9846587B1 (en) * | 2014-05-15 | 2017-12-19 | Xilinx, Inc. | Performance analysis using configurable hardware emulation within an integrated circuit |
US9608871B1 (en) | 2014-05-16 | 2017-03-28 | Xilinx, Inc. | Intellectual property cores with traffic scenario data |
US11153191B2 (en) | 2018-01-19 | 2021-10-19 | Intel Corporation | Technologies for timestamping with error correction |
TWI774295B (en) * | 2021-03-29 | 2022-08-11 | 瑞昱半導體股份有限公司 | Method for data transmission control of inter field programmable gate array and associated apparatus |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050081170A1 (en) | 2003-10-14 | 2005-04-14 | Hyduke Stanley M. | Method and apparatus for accelerating the verification of application specific integrated circuit designs |
US20070116465A1 (en) * | 2005-11-21 | 2007-05-24 | Tellabs Operations, Inc. | Systems and methods for dynamic alignment of data bursts conveyed over a passive optical net work |
US7506297B2 (en) | 2004-06-15 | 2009-03-17 | University Of North Carolina At Charlotte | Methodology for scheduling, partitioning and mapping computational tasks onto scalable, high performance, hybrid FPGA networks |
US20090254505A1 (en) | 2008-04-08 | 2009-10-08 | Microsoft Corporation | Reconfigurable hardware accelerator for boolean satisfiability solver |
US7603540B2 (en) | 2003-10-30 | 2009-10-13 | International Business Machines Corporation | Using field programmable gate array (FPGA) technology with a microprocessor for reconfigurable, instruction level hardware acceleration |
US7640528B1 (en) | 2006-08-04 | 2009-12-29 | Altera Corporation | Hardware acceleration of functional factoring |
US7969187B1 (en) | 2006-04-18 | 2011-06-28 | Xilinx, Inc. | Hardware interface in an integrated circuit |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050081170A1 (en) | 2003-10-14 | 2005-04-14 | Hyduke Stanley M. | Method and apparatus for accelerating the verification of application specific integrated circuit designs |
CN1867893A (en) | 2003-10-14 | 2006-11-22 | 史坦利·M·海德克 | Method and apparatus for accelerating the verification of application specific integrated circuit designs |
US7603540B2 (en) | 2003-10-30 | 2009-10-13 | International Business Machines Corporation | Using field programmable gate array (FPGA) technology with a microprocessor for reconfigurable, instruction level hardware acceleration |
US7506297B2 (en) | 2004-06-15 | 2009-03-17 | University Of North Carolina At Charlotte | Methodology for scheduling, partitioning and mapping computational tasks onto scalable, high performance, hybrid FPGA networks |
US20070116465A1 (en) * | 2005-11-21 | 2007-05-24 | Tellabs Operations, Inc. | Systems and methods for dynamic alignment of data bursts conveyed over a passive optical net work |
US7969187B1 (en) | 2006-04-18 | 2011-06-28 | Xilinx, Inc. | Hardware interface in an integrated circuit |
US7640528B1 (en) | 2006-08-04 | 2009-12-29 | Altera Corporation | Hardware acceleration of functional factoring |
US7797667B1 (en) | 2006-08-04 | 2010-09-14 | Altera Corporation | Hardware acceleration of functional factoring |
US20090254505A1 (en) | 2008-04-08 | 2009-10-08 | Microsoft Corporation | Reconfigurable hardware accelerator for boolean satisfiability solver |
Non-Patent Citations (10)
Title |
---|
Altera Corporation: 5. High-Speed Differential I/O Interfaces in Stratix Devices; Stratix Device Handbook, vol. 1; Altera Corporation; 2005; pp. 5-1 to 5-76. * |
Athavale et al.; High-Speed Serial I/O Made Simple, Designer's Guide with FPGA Applications; Connectivity Solutions Edition 1.0; Xilinx, Inc; 2005; 210 pages. * |
Chao Wang, et al. "An Application Mapping Scheme over Distributed Reconfigurable System" IEEE 2009 15th Intl. Conf. Parallel Dist. Systems., pp. 535-542 (IEEE 2009). |
Patel: The basics of SerDes (serializers/deserializers) for interfacing; Planet Analog-Articles-The basics of SerDes (serializers/deserializers) for interfacing; Sep. 2010; pp. 1-8. * |
Schleupen et al.; Dynamic Partial FPGA Reconfiguration in a Prototype Microprocessor System; IEEE; 2007; 533-536. * |
SERDES Handbook; Lattice Semiconductor Corporation; 2003; 98 pages. * |
Sides et al.: SERDES Interfaces in an FPGA World: What Do I Need to Know to Get Started?; Published on Wireless Design & Development (http://www.wirelessdesignmag.com); 2007; pp. 1-9. * |
Stauffer et al.; High Speed Serdes Devices and Applications; 2008 Springer Science; 495 pages. * |
Sunter: An Automated, Complete, Structural Test Solution for SERDES; 2004 ITC International Test Conference; 2004, pp. 95-104. * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140109035A1 (en) * | 2012-10-12 | 2014-04-17 | Mediatek Inc. | Layout method for printed circuit board |
US9158880B2 (en) * | 2012-10-12 | 2015-10-13 | Mediatek Inc. | Layout method for printed circuit board |
US9846756B2 (en) | 2012-10-12 | 2017-12-19 | Mediatek Inc | Layout method for printed circuit board |
US10652043B2 (en) | 2016-06-14 | 2020-05-12 | G.E. Aviation Systems, LLC | Communication regulation in computing systems |
CN106844256A (en) * | 2017-02-22 | 2017-06-13 | 天津大学 | Active power distribution network real-time simulator internal interface design method based on multiple FPGAs |
CN106844256B (en) * | 2017-02-22 | 2020-09-11 | 天津大学 | Active power distribution network real-time simulator internal interface design method based on multiple FPGAs |
Also Published As
Publication number | Publication date |
---|---|
US20130170525A1 (en) | 2013-07-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9002693B2 (en) | Wire like link for cycle reproducible and cycle accurate hardware accelerator | |
US7908574B2 (en) | Techniques for use with automated circuit design and simulations | |
US7904859B2 (en) | Method and apparatus for determining a phase relationship between asynchronous clock signals | |
US7984400B2 (en) | Techniques for use with automated circuit design and simulations | |
US8640070B2 (en) | Method and infrastructure for cycle-reproducible simulation on large scale digital circuits on a coordinated set of field-programmable gate arrays (FPGAs) | |
US8756557B2 (en) | Techniques for use with automated circuit design and simulations | |
US11386250B2 (en) | Detecting timing violations in emulation using field programmable gate array (FPGA) reprogramming | |
US7289946B1 (en) | Methodology for verifying multi-cycle and clock-domain-crossing logic using random flip-flop delays | |
US10796048B1 (en) | Adding delay elements to enable mapping a time division multiplexing circuit on an FPGA of a hardware emulator | |
US8352242B2 (en) | Communication scheme between programmable sub-cores in an emulation environment | |
CN113642285A (en) | Determining and verifying metastability in clock domain crossings | |
US11275877B2 (en) | Hardware simulation systems and methods for reducing signal dumping time and size by fast dynamical partial aliasing of signals having similar waveform | |
US8631364B1 (en) | Constraining VLSI circuits | |
Ono et al. | A modular synchronizing FIFO for NoCs | |
US9449127B1 (en) | System for verifying timing constraints of IC design | |
JP2014001937A (en) | Scan test method, program and scan test circuit | |
Chaturvedi | Static analysis of asynchronous clock domain crossings | |
Poornima et al. | Functional verification of clock domain crossing in register transfer level | |
WO2024148502A1 (en) | Circuitry for staggering capture clocks in testing electronic circuits with multiple clock domains | |
US20230035693A1 (en) | Clock signal realignment for emulation of a circuit design | |
Sabparie et al. | Design and simulation of serial peripheral interface core with APB interfacing | |
US8701061B2 (en) | Semiconductor design support apparatus | |
Semba et al. | A study on the optimization of asynchronous circuits during RTL conversion from synchronous circuits | |
US11860751B1 (en) | Deterministic data latency in serializer/deserializer-based design for test systems | |
Semba et al. | A Design Support Tool Set for Interface Circuits Between Synchronous and Asynchronous Modules |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ASAAD, SAMEH;KAPUR, MOHIT;PARKER, BENJAMIN D.;SIGNING DATES FROM 20111220 TO 20111221;REEL/FRAME:027468/0055 |
| AS | Assignment | Owner name: U.S. DEPARTMENT OF ENERGY, DISTRICT OF COLUMBIA; Free format text: CONFIRMATORY LICENSE;ASSIGNOR:INTERNATIONAL BUSINESS MACHINES CORPORATION;REEL/FRAME:027607/0929; Effective date: 20120104 |
| STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); Year of fee payment: 4 |
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 8 |