US20190097636A1 - Dynamic multicycles for core-periphery timing closure - Google Patents
- Publication number: US20190097636A1
- Application number: US15/719,194
- Authority: US (United States)
- Prior art keywords: memory element, clock, edge, data transfer, latency
- Legal status: Granted (the listed status is an assumption, not a legal conclusion)
Classifications
- H03K19/17764: Structural details of configuration resources for reliability
- G06F17/5054
- G06F30/34: Circuit design for reconfigurable circuits, e.g. field programmable gate arrays [FPGA] or programmable logic devices [PLD]
- H03K5/15006: Arrangements in which pulses are delivered at different times at several outputs, i.e. pulse distributors with two programmable outputs
- G06F2119/12: Timing analysis or timing optimisation
- H03K19/1735: Controllable logic circuits by wiring, e.g. uncommitted logic arrays
- H03K19/1737: Controllable logic circuits using multiplexers
- H03K19/17728: Reconfigurable logic blocks, e.g. lookup tables
Abstract
Description
- The present invention relates generally to synthesis of digital circuitry and, more specifically, to systems and methods for obtaining timing closure in digital circuitry design.
- This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present invention, which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
- Certain electrical devices, such as programmable logic devices (PLDs) and application-specific integrated circuits (ASICs), may have circuitry elements that exchange data via a bus or a wire with large latencies. For example, certain field-programmable gate arrays (FPGAs) may have a programmable fabric region (e.g., core) that may be customized by a user, and a hardened circuitry region (e.g., hardened logic region, fixed circuitry, periphery) that may provide interface functionality to the FPGA for use by the custom logic. The synchronous logic in the programmable fabric region may be clocked by a clock tree, which may be generated by the user during the FPGA synthesis process. As such, the latency of the clock provided to the programmable fabric region may vary with the FPGA design. The hardened logic, by contrast, may have a fixed clock latency that is determined during the synthesis of the hardened logic circuitry and that may differ from the clock latency of the programmable fabric region. The difference between the clock latencies of the programmable fabric region and the hardened region may lead to clock skew, which may degrade performance or cause circuit failure. While certain synthesis processes in computer-aided design (CAD) tools may reduce clock skew, the variable latency of the programmable fabric region may lead to unavoidably large clock skews, which may significantly interfere with the transfer of data between registers in the programmable fabric region and registers in the hardened logic region.
- Advantages of the invention may become apparent upon reading the following detailed description and upon reference to the drawings in which:
- FIG. 1 illustrates an electrical device that has a hardened logic region and a programmable fabric region and may benefit from dynamic multicycles for core-periphery transfers, in accordance with an embodiment;
- FIG. 2 illustrates a method for synthesis of circuitry, which may incorporate multicycles for clock synthesis, in accordance with an embodiment;
- FIG. 3 illustrates a simple configurable clock network that may be used to provide clock signals for the programmable fabric of the FPGA of FIG. 1, in accordance with an embodiment;
- FIG. 4A illustrates a small clock tree that may be implemented in the configurable clock network of FIG. 3, in accordance with an embodiment;
- FIG. 4B illustrates a large clock tree that may be implemented in the configurable clock network of FIG. 3 and may present a different clock latency from the clock tree of FIG. 4A, in accordance with an embodiment;
- FIG. 5 illustrates a diagram of a transfer of data from a core register to the periphery, which may benefit from the use of multicycles for transfers, in accordance with an embodiment;
- FIG. 6 illustrates a timing diagram that may use multicycle constraints for timing synthesis of data transfer between core and periphery circuitry, in accordance with an embodiment;
- FIG. 7 is a flowchart of a method for dynamic multicycle determination with reduced iteration by employing latency information, in accordance with an embodiment;
- FIG. 8 illustrates a timing diagram that may use destination multicycle constraints for timing synthesis of data transfer between core and periphery circuitry, in accordance with an embodiment;
- FIG. 9 illustrates a timing diagram for circuitry that uses destination multicycle constraints for timing synthesis of data transfer between core and periphery circuitry with a different skew from that of FIG. 8, in accordance with an embodiment;
- FIG. 10 illustrates a method that may be used in the timing synthesis process to determine the application of multicycle constraints by comparing clock latencies, in accordance with an embodiment; and
- FIG. 11 illustrates a method that may be used in the timing synthesis process to determine the application of multicycle constraints by maximizing positive slack data transfers, in accordance with an embodiment.
- One or more specific embodiments of the present invention will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
- Many electrical devices may include integrated circuits, such as field-programmable gate arrays (FPGAs), to perform certain functions of the electrical device. These integrated circuits may be produced by creating a logic design or a register-transfer level (RTL) design and generating logic circuitry from it through a synthesis process. In application-specific integrated circuits (ASICs), the process may generate hardened logic circuitry. In programmable logic devices (PLDs), the process may generate instructions to program the configurable circuitry to implement the desired logic. Some programmable logic devices may also include certain functionalities that are provided by hardened circuitry. For example, certain FPGAs may have a programmable fabric (e.g., a core), which may be customized by a user, and hardened logic (e.g., a periphery) that may implement certain routine functionalities for the user's convenience. Examples of hardened logic include circuitry that implements communication protocols (e.g., Ethernet, Bluetooth, Peripheral Component Interconnect Express or PCIe, etc.), memory interface protocols (e.g., Double Data Rate or DDR), and other communication standards such as low-voltage differential signaling (LVDS).
- During the synthesis process, design tools may take timing constraints into account when generating the logic circuitry. Timing constraints may ensure proper synchronization between different elements of the circuitry to prevent certain types of failure. For example, if an RTL design implements a transfer of data between two registers, the data provided by the source register should be available and stable when the destination register latches the data. Note that the clocks of the two registers may not be perfectly synchronized, because the clock latency at each register may differ; this difference is the clock skew. Embodiments described herein relate to methods and systems that may be used to satisfy timing constraints during the logic synthesis process in the presence of substantial and/or unmitigated clock skew. For example, the hardened circuitry in an FPGA may have a fixed latency that cannot be changed by the user during synthesis of the custom logic. This latency may be substantially different from the variable clock latency that may appear in custom logic, as detailed below. Embodiments may allow timing constraints to be satisfied for, for example, data transfers between registers in hardened logic and programmable fabric, in which clock skew may be substantial. In certain embodiments, the timing constraints may be satisfied with the use of multicycles, that is, constraints under which a data transfer is allowed multiple clock cycles to complete. Moreover, certain embodiments employ destination multicycles, whereby a circuit-design tool may determine the number of cycles used for a data transfer based on the latencies and/or skews.
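To show where skew enters these constraints, the conventional static-timing setup and hold inequalities for a register-to-register transfer can be written as follows. The symbols are assumptions introduced here, not notation from this document: T is the clock period; L_s and L_d are the clock latencies at the source and destination registers; δ = L_d − L_s is the skew; d_max and d_min are the longest and shortest data-path delays, including clock-to-output; t_su and t_h are the setup and hold times; and N is the multicycle count. The hold check is assumed to sit one period before the setup (capture) edge.

$$
\begin{aligned}
\text{setup:}\quad & L_s + d_{\max} \le N T + L_d - t_{su} && \iff\; d_{\max} + t_{su} \le N T + \delta \\
\text{hold:}\quad & L_s + d_{\min} \ge (N-1) T + L_d + t_h && \iff\; d_{\min} \ge (N-1) T + t_h + \delta
\end{aligned}
$$

As a hypothetical example, take T = 2000 ps, L_s = 3000 ps (a large programmable clock tree), and L_d = 500 ps (a hardened clock tree), so δ = −2500 ps, with d_max + t_su = 900 ps. The setup check fails for N = 1, because 900 ≤ −500 is false. It passes for N = 2, because 900 ≤ 1500.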
- With the foregoing in mind, FIG. 1 provides an example of an FPGA 40 that may be programmed based on a circuit design developed using logic synthesis. The FPGA 40 may include interface circuitry 44 for driving signals off of the FPGA 40 and for receiving signals from other devices. Interface circuitry 44 may include analog circuitry (e.g., transceiver circuitry) and hardened logic circuitry to implement certain routine instructions related to the specific protocol used by interface circuitry 44. Data may be exchanged within the FPGA 40 through interconnection resources 46, which may be used to route signals, such as clock or data signals, through the FPGA 40. The FPGA 40 of FIG. 1 may include a number of programmable fabric elements 48. Each programmable fabric element 48 may include a number of programmable logic elements 50 having operations defined by configuration memory 52 (e.g., configuration random access memory, or CRAM). The programmable logic elements 50 may include combinational or sequential logic circuitry. For example, the programmable logic elements 50 may include look-up tables (LUTs), registers, multiplexers, routing wires, and so forth. A user may program the programmable logic elements 50 to perform a variety of desired functions. For example, a user may program a programmable logic element 50 to send data to and/or receive data from a register in interface circuitry 44, in order to exchange data with an external device.
- A power supply 54 may provide a source of voltage and current to a power distribution network (PDN) 56 that distributes electrical power to the various components of the FPGA 40. Operating the circuitry of the FPGA 40 causes power to be drawn from the power distribution network 56. Furthermore, the FPGA 40 may be electrically programmed. With electrical programming arrangements, the programmable elements 50 may include one or more logic elements (wires, gates, registers, etc.). For example, during programming, configuration data is loaded into the configuration memory 52 using interface circuitry 44 and/or input/output circuitry 42. In one example, the configuration memory 52 may be implemented as configuration random-access-memory (CRAM) cells. The use of configuration memory 52 based on RAM technology described herein is intended to be only one example. Moreover, configuration memory 52 may be distributed (e.g., as RAM cells) throughout the various programmable fabric elements 48 of the FPGA 40. The configuration memory 52 may provide a corresponding static control output signal that controls the state of an associated programmable logic element 50 or programmable component of the interconnection resources 46. The output signals of the configuration memory 52 may be applied to the gates of metal-oxide-semiconductor (MOS) transistors that control the states of the programmable logic elements 50 or programmable components of the interconnection resources 46.
- The programming of the programmable fabric elements 48, of the power distribution network 56, and of the interconnection resources 46, which may include clocking, may take place through electrical programming as described above. The flow chart 100 in FIG. 2 illustrates a method to generate instructions for programming an FPGA device from a logic design. In a process 102, a logic design may be generated. The logic design may include a high-level description of the functions that may be performed by the programmable fabric elements and/or the design. The logic design may be an algorithmic description of a desired behavior. In general, the logic design may be provided to a logic synthesis tool in a computer-readable format, such as a hardware description language. In some situations, the logic design may be automatically generated by the logic synthesis tool from a more abstract description. A process 104 may receive the logic design from process 102 to produce a register-transfer level (RTL) design. In process 104, the logic design may be translated into an RTL design that may include memory elements (e.g., look-up table, register, flip-flop, latch, etc.) that may be used to perform a desired function.
- Electronic elements described in the RTL design may be associated with logic elements of an FPGA in a routing and placement process 106. Note that process 106 may incorporate certain physical constraints 108 related to the number of logic elements and/or memory employed, bandwidth constraints, power and thermal constraints, data path, and total wire length. The routing and placement process 106 may also include, precede, or follow a timing analysis process 110. Timing analysis process 110 may be performed by a static timing analysis (STA) tool. Timing analysis process 110 may take into account certain timing constraints 112 associated with the RTL design. For example, the operating frequency of the RTL design may limit the distance between two registers that operate synchronously. Timing constraints 112 may also include setup and hold constraints, which may help ensure the validity of data transferred between two registers. In order to satisfy timing constraints 112, the STA tool may incorporate certain rules and/or strategies, such as multicycle 114 and destination multicycle 116 strategies, which are detailed below.
- Following the routing and placement process 106 and timing analysis process 110, a programming instruction may be generated in a process 118. The programming instruction may determine the placement and operation of gates, LUTs, and memory elements of the FPGA. The programming instruction may also configure the clock tree, which provides timing to the different regions of the FPGA, and the PDN, as discussed above.
- A diagram in FIG. 3 illustrates a configurable clocking network 150 for an FPGA. Clocking network 150 may have a plurality of clock switch boxes 152, which may allow clock signals to be routed in a programmable manner. The configuration of the switch boxes 152 of clocking network 150 may be produced as a result of the timing analysis process 110. For example, clock switch boxes 152 may be programmed to provide certain regions with reduced clock skew by providing balanced clock latency.
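The following sketch makes the link between a configured clock route and skew concrete. It models a configured clock network as a tree in which each node (a switch box or a register clock pin) records its parent and the delay of the segment that feeds it. This is an illustrative model only, not the routing representation of any CAD tool. The node names, the integer-picosecond delays, and the helper names `insertion_latency` and `skew` are hypothetical. A register's latency is the sum of segment delays back to the clock source, and the skew among a set of registers is the spread of their latencies.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical configured clock route: node -> (parent node, segment delay in ps).
# "root" stands for the clock source (e.g., a PLL output).
ClockTree = Dict[str, Tuple[str, int]]


def insertion_latency(tree: ClockTree, node: str, root: str = "root") -> int:
    """Sum the segment delays from `node` back to the clock source."""
    latency = 0
    visited = set()
    while node != root:
        if node in visited:
            raise ValueError(f"clock route contains a cycle at {node!r}")
        visited.add(node)
        parent, delay = tree[node]
        latency += delay
        node = parent
    return latency


def skew(tree: ClockTree, sinks: Iterable[str]) -> int:
    """Worst-case skew among the sinks: largest latency minus smallest latency."""
    latencies = [insertion_latency(tree, s) for s in sinks]
    return max(latencies) - min(latencies)


# reg_a and reg_b sit behind symmetric switch boxes (a balanced branch);
# reg_c taps the route early, so it sees a much smaller latency.
tree: ClockTree = {
    "sb0": ("root", 100),
    "sb1": ("sb0", 150),
    "sb2": ("sb0", 150),
    "reg_a": ("sb1", 50),
    "reg_b": ("sb2", 50),
    "reg_c": ("sb0", 20),
}

print(insertion_latency(tree, "reg_a"))  # 300
print(skew(tree, ["reg_a", "reg_b"]))    # 0
print(skew(tree, ["reg_a", "reg_c"]))    # 180
```

A balanced branch such as reg_a/reg_b keeps skew low within a region. As FIGS. 4A and 4B illustrate, however, two individually balanced trees of different sizes can still leave a large skew between regions.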
- FIGS. 4A and 4B provide examples of two latency-balanced clock trees that may be implemented by the switch boxes 152 described in FIG. 3. An FPGA device 180 in FIG. 4A may have a configured clock tree 182 with a clock signal source on a node 181. Clock tree 182 may reach all nodes of a region 184 of the FPGA device 180 and, thus, may be suitable for an RTL design that has logic elements placed in region 184. Note, further, that due to the particular layout of the clock tree 182, the latencies in the region 184 are balanced, reducing the clock skew between the logic elements of region 184. An FPGA device 190 in FIG. 4B shows a differently configured clock tree 192 with a clock signal source on a node 191. Clock tree 192 covers a larger region 194 of the FPGA 190. As a result, the clock tree 192 may be larger. Note that the layout of clock tree 192 has a particular structure such that latencies in region 194 may be balanced. This may result in a region 194 with reduced clock skew between its logic elements. Note, however, that the latencies in clock tree 182 may be much smaller than the latencies in clock tree 192. As a result, if an FPGA includes a first region that receives a small clock tree similar to clock tree 182, and a second region that receives a large clock tree similar to clock tree 192, there may be a clock skew between registers in the first and the second region, similar to the above-described clock skew between registers in a programmable fabric and regions having hardened circuitry.
- The electrical diagram 200 in FIG. 5 illustrates a system in which clock skew may affect management of data transfer, as discussed above. Electrical diagram 200 may represent an FPGA device having a first region 202, which may be hardened circuitry, and a second region 204, which may be a programmable region. Both first region 202 and second region 204 may receive clock signals that may originate in a clock 210, which may be a phase-locked loop (PLL). First region 202 may receive clock signals 212 through a hardened clock tree 213. Second region 204 may receive clock signals 214 through programmable clock tree 215. Clock signal 212 may present a latency that is substantially smaller than the latency of clock signal 214, as discussed above. As a result, a clock skew between first region 202 and second region 204 may interfere with data transfers from core to periphery (C2P transfer) 216 and/or data transfers from periphery to core (P2C transfer) 217. A C2P transfer 216 may take place between a register 222 in the programmable second region 204 and a register 224 in the hardened first region 202. A P2C transfer 217 may take place between a register 226 in the second region 204 and the first region 202. Due to the clock skew between first region 202 and second region 204, C2P transfer 216 and/or P2C transfer 217 may fail to meet setup time and/or hold time constraints. Note that in this example, first region 202 may be hardened circuitry and second region 204 may be programmable, but the system may behave similarly for data transfers between two programmable regions having different clock latencies or between two hardened circuitry regions having different clock latencies.
- The timing diagram 250 in FIG. 6 illustrates the effect of clock skew on the data transfer and the use of multicycles to satisfy timing constraints, using a C2P transfer as an example. A source clock 260 may have a waveform 262 that corresponds to the signal measured at the clock source (e.g., clock 210). A core region 264 may have a waveform 266 that corresponds to the clock signal received in a programmable region (e.g., second region 204). A periphery region 268 may have a waveform 270 that corresponds to the clock signal received in a hardened region (e.g., first region 202). Note that edges 269 in waveforms … waveforms …
- In the example of the timing diagram 250, a C2P transfer may occur as triggered by edge 271. In a C2P transfer, the core may make the data available, as triggered by edge 271, and the periphery may latch the data, as triggered by edge 271. However, due to the latency indicated by arrow 272, the core may only make the data available at time 273, while the periphery expects the data to be available at time 275. If the periphery clock is configured to latch the data 1 clock period after the C2P edge 271 (i.e., a multicycle of 1), it will expect data to be available during the window 276. This leads to a timing failure, as the core would require a negative setup time 278. This failure may be solved by configuring C2P transfers to follow a multicycle of 2, in which the periphery clock is configured to latch the data 2 clock periods after the C2P edge. With a multicycle of 2, the periphery register may latch data in the window 279, allowing a positive setup time 283. Note that for P2C transfers, multicycles may be used to satisfy hold time requirements when there is clock skew.
- A logic synthesis tool and/or an STA tool may identify situations in which multicycles may be used to satisfy timing requirements. To that end, the logic synthesis tool may implement a clock tree for the logic circuitry associated with the RTL design, identify the latencies of the various modules, identify data transfers and associated clock skews, and apply multicycles to the design accordingly. However, such a process may be cumbersome and may involve several iterations of the routing and placement process: at least one iteration to identify clock latencies and clock skews, and further iterations to determine whether a chosen multicycle strategy satisfies the timing constraints.
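The multicycle selection illustrated in FIG. 6 can be sketched as a search for the smallest multicycle that passes the setup check once the clock latencies are known. This is a minimal sketch under simplifying assumptions. It does not reproduce the method of this document or of any particular STA tool. Latencies and delays are integer picoseconds, the data delay includes clock-to-output, the function names are hypothetical, and the hold check is assumed to sit one period before the setup edge. The sketch covers only the setup-side C2P case. It does not model the hold-side adjustment mentioned for P2C transfers.

```python
from typing import Optional


def setup_slack(period: int, launch_lat: int, capture_lat: int,
                data_delay: int, t_setup: int, n: int) -> int:
    """Setup slack (ps) when capture happens n periods after the launching source edge."""
    arrival = launch_lat + data_delay
    required = n * period + capture_lat - t_setup
    return required - arrival


def hold_slack(period: int, launch_lat: int, capture_lat: int,
               data_delay: int, t_hold: int, n: int) -> int:
    """Hold slack (ps), assuming the hold check sits one period before the setup edge."""
    arrival = launch_lat + data_delay
    required = (n - 1) * period + capture_lat + t_hold
    return arrival - required


def smallest_multicycle(period: int, launch_lat: int, capture_lat: int,
                        d_max: int, d_min: int, t_setup: int, t_hold: int,
                        n_max: int = 8) -> Optional[int]:
    """Return the smallest multicycle n that meets both setup and hold, or None."""
    for n in range(1, n_max + 1):
        if setup_slack(period, launch_lat, capture_lat, d_max, t_setup, n) < 0:
            continue  # capture edge still too early: try one more period
        if hold_slack(period, launch_lat, capture_lat, d_min, t_hold, n) >= 0:
            return n
        return None  # in this model a later capture edge only makes hold worse
    return None


# Hypothetical C2P numbers (ps): large core clock tree, small periphery clock tree.
PERIOD, CORE_LAT, PERI_LAT = 2000, 3000, 500
print(setup_slack(PERIOD, CORE_LAT, PERI_LAT, 800, 100, 1))  # -1400
print(smallest_multicycle(PERIOD, CORE_LAT, PERI_LAT,
                          d_max=800, d_min=600, t_setup=100, t_hold=100))  # 2
```

Setup slack grows with each added period while hold slack shrinks. The first multicycle that passes setup is therefore the only candidate worth checking for hold.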
Method 400 inFIG. 7 illustrates a system that may allow a dynamic multicycle strategy that determines the multicycle determination with reduced iterations. To that end,method 400 may employ latency information for hardened circuitry or from pre-designed soft circuitry (e.g., soft IP). The latency information may be determined during the design process of the hardened circuitry and/or the soft IP, and provided to a user of the FPGA along with circuitry code and/or specifications.Method 400 may have aprocess 402 in which the STA tool receives a timing constraint, which may be associated with a data transfer. The STA tool may then retrieve latency information in aprocess 404 related to circuitry that may be associated with the data transfer, such as a register, a memory device, a LUT, or any other. The information may be a pre-calculated latency that is stored in a database and/or in a file that is accessible by the STA tool or by the synthesis tool. In some embodiments, the retrieval may be implemented through a procedural call by the synthesis tool when processing a timing constraint file to a timing file and/or database that holds latency information about clocks. Based on the retrieved information, the STA tool may dynamically determine the appropriate multicycles for hold and setup edges and satisfy the timing requirements for data transfers in the FPGA design without further iterations. - In using dynamic multicycle constraints, as described above, the relationship between clock edges used by the STA tool may be based on clock edges at the source of the clock tree. Multicycles are designed using as reference an ideal edge from source clock. The STA tool may, instead, use as references the edge as of the clock signal at the end of the clock tree to determine multicycles, leading to destination multicycle constraints. The timing diagram 280 of
FIG. 8 illustrates the use of destination multicycle of 1 to satisfy the timing relationships for data transfers between regions with substantial, by means of a C2P transfer. In this example,source clock 260 may have awaveform 282 that corresponds to the signal measured in the clock source (e.g., clock 210),core region 264 may have awaveform 284 that corresponds to the clock signal received in a programmable region (e.g., second region 204), andperiphery region 268 may have awaveform 286 that corresponds to the clock signal received in a hardened region (e.g., first region 202). Note that theedges 287 inwaveforms FIG. 6 , the phase difference betweenwaveforms - In this example, an RTL design may include a C2P transfer that may be triggered by
edge 290 at the source clock. To implement a destination multicycle constraint, the STA tool may use the clock latency at the register to identify, as illustrated witharrow 292, thecorresponding edge 293 at the core. From theedge 293 and the known latency at the periphery region, the STA tool may identify, as illustrated witharrow 296, aprevious edge 297 to use as a hold edge for this transfer. The STA tool may also identify, as illustrated witharrow 298, anext edge 299 to be used as a setup edge for this C2P transfer. Since the edge ofwaveform 286 that corresponds to edge 290 that triggers the C2P transfer isedge 295, this transfer having a destination multicycle of 1 may be similar to an implementation of a multicycle of 2. However, since the determination of thehold edge 297 and thesetup edge 299 used thedestination edge 293 as reference, the design may be simplified earlier in the process, when the skews and clock latencies are not yet known. - The effect of changes in skew on the destination multicycle constraint is illustrated in the timing diagram 300 of
FIG. 9 . Timing diagram 300 illustrates an example that is similar to the example illustrated by timing diagram 280, but in which the latency of in thecore region 264 is reduced. As illustrated, thecore region 264 shows awaveform 302 that has a smaller latency, when compared towaveform 284 inFIG. 8 . As in timing diagram 280, thesource 260 may present thewaveform 282 associated with the clock source (e.g., clock 210) andperiphery region 268 may have awaveform 286 that corresponds to the clock signal received in a hardened region (e.g., first region 202).Edges 287 inwaveforms waveforms waveforms FIG. 6 . As in timing diagram 280, the C2P transfer may be triggered by thesource clock edge 290. - As discussed above, the STA tool performing an analysis using a destination multicycle of 1 may identify the edges used for the C2P transfer. As discussed above, the STA tool may identify, as illustrated with
arrow 292, theedge 293 at the core that corresponds to edge 290, based on the clock latency atcore region 264. Usingedge 293 as a reference, the STA tool may identify, as illustrated witharrow 296, aprevious edge 297 and use it as a hold edge. The STA tool may also identify, as illustrated witharrow 298, anext edge 299 and use it as setup edge. Since the edge ofwaveform 286 that triggers the C2P transfer and corresponds to edge 290 is thehold edge 297, this transfer having a destination multicycle of 1 may be similar to an implementation of a multicycle of 1. Note that in the example ofFIG. 8 , by contrast, the destination multicycle of 1 led to an implementation of a multicycle of 2. This contrast further illustrates that destination multicycles constraints may simplify the design, as it may be employed without knowledge of the specific clock skews, as may be the case during the design of hardened circuitry of an FPGA. - In the examples illustrated in
FIGS. 8 and 9, the destination multicycle of 1 was determined relative to the core region 264, which is the origin of the illustrated C2P data transfer. In general, destination multicycles may be configured to use either the data source or the data destination of the transfer as the reference. For example, if a destination multicycle is configured to use the data source as the reference, the STA tool may first determine the latency between the source clock and the data source register (e.g., a core register in a C2P transfer, a periphery register in a P2C transfer), identify the corresponding edge, and determine the setup and/or hold edges from the identified edge. If a destination multicycle is configured to use the data destination as the reference, the STA tool may first determine the latency between the source clock and the data destination register (e.g., a periphery register in a C2P transfer, a core register in a P2C transfer), identify the corresponding setup and/or hold edge in the destination clock, and determine the launch, setup, and/or hold edges accordingly. Note further that destination multicycles may span more than a single cycle (e.g., two cycles, three cycles, etc.).
- The flow chart in
FIG. 10 illustrates a method 340 to implement destination multicycle constraints. This method may be performed by an STA tool or by a logic synthesis tool. Method 340 may be performed for each data transfer, such as one between registers located in distinct and/or distant regions. For each transfer, a data source element (e.g., a register) and a data destination element (e.g., a register) may be identified. In a process 342, a launch edge of the data source may be identified based on the latency of the region of the data source. For example, in a C2P transfer, the launch edge may be that of a register in the programmable region of the FPGA device, while in a P2C transfer, the launch edge may be that of a register in the hardened circuitry of the FPGA device. In a process 344, latch edges of the data destination may be identified based on the latency of the region of the data destination. Latch edges may be a setup edge and/or a hold edge. As discussed above, the destination region may be hardened circuitry in a C2P transfer or programmable circuitry in a P2C transfer.
- Based on the latency of the launch region and that of the latch region, a phase shift (e.g., clock skew) between the two regions may be determined, and the multicycle timing may then be calculated from that clock skew. If the destination multicycle is configured to use the data source as the reference, the identified launch edge may be set as the reference edge, and the setup and hold edges may be determined based on the clock skew. For example, consistent with FIGS. 8 and 9, the hold edge may be identified as the edge in the destination clock waveform that immediately precedes the launch edge, and the setup edge may be identified as the edge that immediately follows it. If the destination multicycle is configured to use the data destination as the reference, a setup edge may be chosen as the reference and a hold edge may be determined based on that choice. Based on that choice and on the clock skew, the launch edge in the data source may be determined as an edge that follows the hold edge and precedes the setup edge. Following the determination and assignment of edges,
method 340 may, in a process 348, adjust the logic circuitry to employ the identified edges as the data transfer edges, thereby implementing the destination multicycle.
- The flow chart in
FIG. 11 illustrates a method 360 to implement destination multicycles based on a global optimization of a parameter. This method may be performed by an STA tool or by a logic synthesis tool. Method 360 may be performed globally, to satisfy multiple destination multicycle and/or multicycle constraints in a single application. In an initialization process 362, the method may provide an initial multicycle configuration. This configuration may cover, for example, multiple data transfers, each of which may have a positive or negative transfer slack. Generally, a data transfer slack may refer to the difference between the time at which data may be available (e.g., the launch edge) and the time at which data may be latched (e.g., the hold edge). In a process 364, a figure of merit, such as the number of data transfers with positive slack, may be determined and compared with a threshold (e.g., a fraction of data transfers with positive slack, or a total number of data transfers with positive slack).
- If the threshold is not met,
method 360 may enter a new iteration 366, in which the multicycle configuration for the data transfers may be changed in process 362. These changes may be based on the data transfers that were found to have negative slack. Moreover, since these data transfers may be connected to other data transfers, certain data transfers that have positive slack may also have their destination multicycle configuration changed. Following these multicycle shifts, the data transfers may again be compared with the threshold in process 364. If the threshold is met, method 360 may enter a process 368, in which the destination multicycle and/or multicycle configuration that maximizes the positive slack is implemented by the configurable logic. This process may, for example, configure the logic circuitry in the programmable fabric to provide the data and/or the triggers according to the identified edges. Note that, while this example employed the number of transfers with positive slack as the figure of merit, other figures of merit may be employed. For example, method 360 may instead minimize the number of transfers with negative slack. Method 360 may also maximize the total slack (e.g., the sum of all positive and negative slack), maximize the sum of all positive slack, minimize the sum of all negative slack, minimize the worst negative slack, maximize an absolute negative slack, or use other metrics related to the timing analysis performed.
- The techniques presented and claimed herein are referenced and applied to material objects and concrete examples of a practical nature that demonstrably improve the present technical field and, as such, are not abstract, intangible or purely theoretical. Further, if any claims appended to the end of this specification contain one or more elements designated as “means for [perform]ing [a function] . . . ” or “step for [perform]ing [a function] . . . ,” it is intended that such elements are to be interpreted under 35 U.S.C. 112(f). However, for any claims containing elements designated in any other manner, it is intended that such elements are not to be interpreted under 35 U.S.C. 112(f).
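The source-referenced edge selection of FIGS. 8 and 9 (processes 342 and 344 of method 340) can be illustrated with a minimal Python sketch. This is not the patent's implementation; it assumes a single clock of period T shared by both registers, ideal clock edges at integer multiples of T, one clock latency per register, and a destination multicycle referenced to the data source register. The first destination edge strictly after the launch is taken as m = floor((L_src - L_dst) / T) + 1. The names `destination_multicycle_edges` and `TransferEdges`, the picosecond values, and the handling of multicycles above 1 (setup edge pushed later, hold edge kept just before the launch) are hypothetical. Edge index 0 denotes the destination edge that corresponds to the launching source edge, so the setup index reads as the equivalent conventional multicycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferEdges:
    """Destination-clock edge indices selected for one data transfer.

    Index 0 is the destination edge corresponding to the launching (ideal)
    source edge, so the setup index doubles as the equivalent multicycle.
    """
    hold_edge: int
    setup_edge: int

    @property
    def equivalent_multicycle(self) -> int:
        return self.setup_edge


def destination_multicycle_edges(period_ps: int,
                                 source_latency_ps: int,
                                 dest_latency_ps: int,
                                 cycles: int = 1) -> TransferEdges:
    """Select hold/setup edges for a source-referenced destination multicycle.

    Hypothetical model: ideal clock edges at k * period_ps; each register sees
    them delayed by its own latency; the transfer is launched by ideal edge 0.
    """
    if period_ps <= 0:
        raise ValueError("period_ps must be positive")
    if cycles < 1:
        raise ValueError("cycles must be at least 1")

    # Reference edge: the launching edge as seen at the source register
    # (edge 293 at the core in the C2P example).
    launch_time_ps = source_latency_ps

    # First destination edge strictly after the launch: the smallest m with
    # m * period_ps + dest_latency_ps > launch_time_ps. Python's floor
    # division also handles a negative skew correctly.
    first_after = (launch_time_ps - dest_latency_ps) // period_ps + 1

    # Hold edge: destination edge at or immediately before the launch.
    # Setup edge: the next edge, pushed out by any extra cycles (assumption).
    return TransferEdges(hold_edge=first_after - 1,
                         setup_edge=first_after + cycles - 1)


if __name__ == "__main__":
    # FIG. 8-like case (hypothetical numbers): large core latency.
    fig8 = destination_multicycle_edges(1000, 1300, 200)
    print(fig8.hold_edge, fig8.setup_edge, fig8.equivalent_multicycle)  # 1 2 2
    # FIG. 9-like case: core latency reduced, same constraint.
    fig9 = destination_multicycle_edges(1000, 500, 200)
    print(fig9.hold_edge, fig9.setup_edge, fig9.equivalent_multicycle)  # 0 1 1
```

With the same destination multicycle of 1, reducing the core latency moves the setup edge from destination edge 2 to edge 1, mirroring the contrast between FIG. 8 (behaving like a multicycle of 2) and FIG. 9 (behaving like a multicycle of 1), without the skews having to be known when the constraint is written.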
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/719,194 US10320393B2 (en) | 2017-09-28 | 2017-09-28 | Dynamic multicycles for core-periphery timing closure |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/719,194 US10320393B2 (en) | 2017-09-28 | 2017-09-28 | Dynamic multicycles for core-periphery timing closure |
Publications (2)
Publication Number | Publication Date |
---|---|
US20190097636A1 (en) | 2019-03-28 |
US10320393B2 (en) | 2019-06-11 |
Family
ID=65808059
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/719,194 Active 2037-10-17 US10320393B2 (en) | 2017-09-28 | 2017-09-28 | Dynamic multicycles for core-periphery timing closure |
Country Status (1)
Country | Link |
---|---|
US (1) | US10320393B2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220294598A1 (en) * | 2021-03-11 | 2022-09-15 | Xilinx, Inc. | Reconfigurable mixer design enabling multiple radio architectures |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5455931A (en) * | 1993-11-19 | 1995-10-03 | International Business Machines Corporation | Programmable clock tuning system and method |
US5933623A (en) * | 1995-10-26 | 1999-08-03 | Hitachi, Ltd. | Synchronous data transfer system |
US6425114B1 (en) * | 2000-01-31 | 2002-07-23 | Lsi Logic Corporation | Systematic skew reduction through buffer resizing |
US7831856B1 (en) * | 2008-04-03 | 2010-11-09 | Lattice Semiconductor Corporation | Detection of timing errors in programmable logic devices |
US7605604B1 (en) * | 2008-07-17 | 2009-10-20 | Xilinx, Inc. | Integrated circuits with novel handshake logic |
US8001504B1 (en) * | 2008-07-29 | 2011-08-16 | Xilinx, Inc. | Determining clock skew between nodes of an integrated circuit |
CN102955869B (en) * | 2011-08-30 | 2015-04-08 | 国际商业机器公司 | Method and device for evaluating clock skew |
US9477635B1 (en) * | 2012-12-03 | 2016-10-25 | Google Inc. | Generating an identifier for a device using application information |
CN106464601B (en) * | 2014-03-28 | 2020-05-19 | 维格尔传播公司 | Channel bundling |
- 2017-09-28: US application US15/719,194 (US10320393B2), active
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220294598A1 (en) * | 2021-03-11 | 2022-09-15 | Xilinx, Inc. | Reconfigurable mixer design enabling multiple radio architectures |
US11695535B2 (en) * | 2021-03-11 | 2023-07-04 | Xilinx, Inc. | Reconfigurable mixer design enabling multiple radio architectures |
Also Published As
Publication number | Publication date |
---|---|
US10320393B2 (en) | 2019-06-11 |
Similar Documents
Publication | Title |
---|---|
US11480993B2 (en) | Methods for optimizing circuit performance via configurable clock skews |
US6617877B1 (en) | Variable data width operation in multi-gigabit transceivers on a programmable logic device |
US9270279B2 (en) | Apparatus and methods for time-multiplex field-programmable gate arrays |
US10162918B1 (en) | Integrated circuit retiming with selective modeling of flip-flop secondary signals |
US20190179990A1 (en) | Distributed programmable delay lines in a clock tree |
EP3089059A1 (en) | Implementing integrated circuit designs using depopulation and repopulation operations |
JP5120785B2 (en) | Logic circuit design apparatus, logic circuit design method and logic circuit design program for asynchronous logic circuit |
US10320393B2 (en) | Dynamic multicycles for core-periphery timing closure |
US6518788B2 (en) | Logic circuit design method and logic circuit |
CN112906338B (en) | Method, system, and medium for clock design for physical partition structure |
US10699045B2 (en) | Methods and apparatus for regulating the supply voltage of an integrated circuit |
US10181001B2 (en) | Methods and apparatus for automatically implementing a compensating reset for retimed circuitry |
US10169518B1 (en) | Methods for delaying register reset for retimed circuits |
US8659318B1 (en) | Systems and methods for implementing tristate signaling by using encapsulated unidirectional signals |
US11681324B2 (en) | Synchronous reset deassertion circuit |
US20180349544A1 (en) | Methods for performing register retiming with hybrid initial states |
US20200348717A1 (en) | Digital circuits for radically reduced power and improved timing performance on advanced semiconductor manufacturing processes |
US20180181684A1 (en) | Concurrently optimized system-on-chip implementation with automatic synthesis and integration |
US10354038B1 (en) | Methods for bounding the number of delayed reset clock cycles for retimed circuits |
US9330217B2 (en) | Holdtime correction using input/output block delay |
US11030369B2 (en) | Superconducting circuit with virtual timing elements and related methods |
JP2004127012A (en) | Synchronous circuit and its design method |
Kudo et al. | Comparison of Pipelined Asynchronous Circuits Designed for FPGA |
US7002369B2 (en) | Implementing complex clock designs in field programmable devices |
US20190050517A1 (en) | Integrated circuit design system with automatic timing margin reduction |
Legal Events
Code | Title | Description |
---|---|---|
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
AS | Assignment | Owner name: INTEL CORPORATION, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:AZIZI, NAVID;KUMARASWAMY, ADITI;NG, EMILY ALEXANDRA;REEL/FRAME:044099/0698. Effective date: 20170928 |
STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |
AS | Assignment | Owner name: ALTERA CORPORATION, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTEL CORPORATION;REEL/FRAME:066353/0886. Effective date: 20231219 |