CN109800468B - Register retiming-based multi-pipeline sequential circuit boxing operation method - Google Patents

Info

Publication number
CN109800468B
CN109800468B CN201811587501.9A CN201811587501A CN109800468B CN 109800468 B CN109800468 B CN 109800468B CN 201811587501 A CN201811587501 A CN 201811587501A CN 109800468 B CN109800468 B CN 109800468B
Authority
CN
China
Prior art keywords
circuit
register
ble
pipeline
retiming
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811587501.9A
Other languages
Chinese (zh)
Other versions
CN109800468A (en
Inventor
李鹏
李运娣
郭小波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Institute of Engineering
Original Assignee
Henan Institute of Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Institute of Engineering
Priority to CN201811587501.9A
Publication of CN109800468A
Application granted
Publication of CN109800468B
Legal status: Active
Anticipated expiration

Landscapes

  • Design And Manufacture Of Integrated Circuits (AREA)
  • Logic Circuits (AREA)

Abstract

The invention provides a register retiming-based packing operation method for multi-pipeline sequential circuits, comprising the following steps: using the FPGA design flow, the hardware description language design written by the user is processed through the logic synthesis and mapping stages to generate a look-up table circuit; the look-up table circuit is packed using a packing algorithm; the type of the packed pipelined sequential circuit is determined from the direction of its critical path, and its critical path delay is calculated using the critical path delay calculation method for packed sequential circuits; the intermediate registers in the circuit are retimed according to the critical path delay. Through register retiming, the invention distributes delay across the pipeline stages as evenly as possible, which improves the throughput of pipelined applications so that the application system processes more data per unit time; because registers packed into logic blocks can be moved, the critical path delay of the whole pipelined sequential circuit can be reduced.

Description

Register retiming-based multi-pipeline sequential circuit boxing operation method
Technical Field
The invention relates to the technical field of FPGA (field programmable gate array) design flows, in particular to a register retiming-based multi-pipeline sequential circuit packing operation method.
Background
In the FPGA design flow, a design written by a hardware programmer in a Hardware Description Language (HDL) is converted by logic synthesis into a gate-level netlist (a NAND-gate netlist); the gate-level netlist is mapped into a Look-Up Table (LUT) circuit; the LUT circuit is packed into logic blocks, the larger units of the FPGA; and placement and routing then produce a bitstream file that can be downloaded to the FPGA, as shown in fig. 1. A hardware circuit design thus passes through logic synthesis, mapping, packing, and placement and routing on its way from the programmer's hardware description language to the netlist downloaded into the chip, and packing is the stage that assigns the look-up table circuit generated by mapping to FPGA logic blocks.
Packing absorbs most of the nets of the look-up table circuit into logic blocks. The literature [Betz V, Rose J. VPR: a new packing, placement and routing tool for FPGA research [C]. Proceedings of the International Workshop on Field-Programmable Logic and Applications, London, UK, 1997: 213-]. The document [Marquardt A, Betz V, Rose J. Using cluster-based logic blocks and timing-driven packing to improve FPGA speed and density [C]. Proceedings of the International ACM Symposium on Field-Programmable Gate Arrays, Monterey, California, USA, 1999: 37-46] attempts to pack the critical path first while also taking the connectivity between nets into account, so as to reduce the critical path delay. The document (Li Peng et al., FPGA packing algorithm based on net absorption and port occupation analysis, Journal of Computer-Aided Design & Computer Graphics, vol. 3, 2011, September) considers both net absorption and port occupation, and can effectively improve the routability of the subsequent circuit. Because the wire delay inside a packed logic block is much smaller than the wire delay outside it, the path delays of the circuit change substantially after packing, and the critical path of each pipeline stage changes as well. The above schemes do not take this into account: packing leaves the pipeline stages with unequal delays while the whole pipeline is constrained by a single unified critical path delay, so in the original methods the best critical path delay of the packed sequential circuit is determined by the longest path among all pipeline stages.
Disclosure of Invention
To address the technical problems in the prior art that circuit path delays change substantially after packing and that the whole pipeline is constrained by a single unified critical path delay, the invention provides a register retiming-based packing operation method for multi-pipeline sequential circuits. The method makes full use of the differences among the path delays of the pipeline stages and uses register retiming to even out the delay of each stage, thereby effectively reducing the unified critical path delay of the pipelined sequential circuit and improving the throughput of the circuit system.
To achieve this purpose, the technical scheme of the invention is realized as follows: a register retiming-based packing operation method for multi-pipeline sequential circuits comprises the following steps:
step one: using the FPGA design flow, processing the hardware description language design written by the user through the logic synthesis and mapping stages to generate a look-up table circuit;
step two: packing the look-up table circuit using a packing algorithm;
step three: determining the type of the packed pipelined sequential circuit from the direction of its critical path, and calculating the critical path delay of the packed pipelined sequential circuit using the critical path delay calculation method for packed sequential circuits;
step four: retiming the intermediate registers in the circuit according to the critical path delay calculated in step three.
Combinational circuits lie between the registers of a pipelined sequential circuit. Pipelined sequential circuits comprise standard pipeline sequential circuits, branch pipeline sequential circuits, and sequential circuits with an output port inside a pipeline stage; a branch pipeline sequential circuit branches at the register level, and a sequential circuit with an output port inside a pipeline stage has an output port located within one of its pipeline stages.
The critical path delay calculation method for packed sequential circuits is as follows:
(1) standard pipeline sequential circuit: calculating the longest circuit path delay of each pipeline stage between adjacent registers, adding these delays, and dividing the sum by the number of pipeline stages to obtain the critical path delay of the pipelined sequential circuit;
(2) branch pipeline sequential circuit: calculating the critical path delay of each pipeline formed by the registers using the method for the standard pipeline sequential circuit, and taking the larger value as the critical path delay of the whole pipelined sequential circuit;
(3) sequential circuit with an output port inside a pipeline stage: calculating the critical path delay A of the pipeline without regard to the output port using the method for the standard pipeline sequential circuit; calculating, by the same method, the average delay B of all pipeline stages up to the stage containing the intermediate output port, with the end of that stage fixed at the output port of the intermediate look-up table; and taking the larger of the critical path delay A and the average delay B as the critical path delay of the whole pipelined sequential circuit.
In step four, the register retiming direction is calculated from the critical path delay as follows: according to the critical path delay calculated in step three, the path delay of the whole pipeline circuit is distributed evenly among the pipeline stages, and the demarcation point of each pipeline stage is marked in the circuit; the retiming direction of each register between pipeline stages is then determined by comparing it with its demarcation point: if the demarcation point is in front of the corresponding register, the register needs to be retimed toward the circuit input until it reaches the demarcation point; if the demarcation point is behind the corresponding register, the register needs to be retimed toward the circuit output until it reaches the demarcation point.
For each register that needs retiming, the logic cell block register retiming method is first used to determine whether the retiming takes place between logic cell blocks or inside a logic cell block and to select the target logic cell block; the packed-circuit BLE register retiming method is then used to retime the register in the corresponding basic logic cell within the target logic cell block toward the front end or the back end of the circuit.
The logic cell block register retiming method is as follows: because the basic logic cells are ultimately packed into the logic cell blocks of the FPGA, register retiming falls into two cases from the perspective of the logic cell block:
(1) retiming between logic cell blocks: a register in a BLE within logic cell block I is moved into a BLE within logic cell block II;
(2) retiming inside a logic cell block: a register in one BLE within a logic cell block is moved into another BLE within the same logic cell block.
In step two, the packing algorithm packs the FPGA logic cell blocks as follows:
(a) packing the look-up table circuit netlist into BLEs, with the two-way selector in each BLE configured according to the actual look-up table circuit;
(b) selecting one BLE as a seed and placing it into the target FPGA logic cell block;
(c) selecting BLEs according to the attraction function value and continuing to place them into the target FPGA logic cell block until the BLE resources in the logic cell block are full or the external ports reach the physical upper limit of the FPGA logic cell block;
(d) selecting a new seed BLE and continuing with the next empty logic cell block until all BLEs are packed.
In step (a) of the packing algorithm: if a LUT is packed into a BLE on its own, the register resource in the BLE is idle and the two-way selector is configured to connect to the output of the LUT; if the output of a LUT drives one register, both the LUT and the register resource in the BLE are used and the two-way selector is configured to connect to the output of the register; if the output of a LUT drives two paths, that is, the look-up table circuit has two output ports at the same time, the circuit must be placed in two BLEs because a BLE has only one output port. In the first BLE the LUT resource is configured, the register resource is idle, and the two-way selector is configured to connect to the output of the LUT; in the second BLE the LUT resource is configured as a direct wire connection, the register resource is configured, and the two-way selector is configured to connect to the output of the register.
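As a non-limiting illustration, the three BLE configuration cases of step (a) may be sketched as follows. The `BLE` record, its field names, and the function `pack_lut` are hypothetical names introduced only for this sketch; they are not part of the described method or of any existing tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BLE:
    lut: Optional[str]       # LUT packed here, or None when the LUT is a pass-through wire
    register: Optional[str]  # register packed here, or None when the register is idle
    mux_select: str          # "lut" or "register": which signal the two-way selector outputs


def pack_lut(lut: str, register: Optional[str], lut_output_also_used: bool) -> List[BLE]:
    """Pack one LUT, plus the register it drives (if any), into one or two BLEs."""
    if register is None:
        # Case 1: LUT alone; register idle; selector outputs the LUT.
        return [BLE(lut=lut, register=None, mux_select="lut")]
    if not lut_output_also_used:
        # Case 2: LUT drives a register; both resources used; selector outputs the register.
        return [BLE(lut=lut, register=register, mux_select="register")]
    # Case 3: the LUT output is needed both directly and registered.
    # A BLE has a single output, so two BLEs are used.
    return [
        BLE(lut=lut, register=None, mux_select="lut"),
        BLE(lut=None, register=register, mux_select="register"),
    ]
```

In this sketch, the second BLE of case 3 models the LUT configured as a direct wire connection by setting `lut=None`.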
The packed-circuit BLE register retiming method is as follows: because the main resource of a pipelined sequential circuit is the look-up table circuit, most register resources in the BLEs are idle, which creates the conditions for retiming circuit registers after packing; where two registers of the pipeline circuit are directly connected, the registers can be retimed one after the other;
according to the register retiming direction, two types can be distinguished:
(1) retiming a BLE register toward the circuit front end: the register in the second BLE needs to be retimed into the first BLE; if the register resource in the first BLE is idle, the register in the second BLE can be retimed into the first BLE, the two-way selector in the first BLE is then configured to connect to the output of the register, and the two-way selector in the second BLE is configured to connect to the output of the look-up table; if the register resource in the first BLE is occupied, the register in the first BLE can first be retimed toward the front end or the back end of the circuit to free the resource;
(2) retiming BLE registers toward the circuit back end: the registers in the first BLE and the second BLE need to be retimed into the third BLE; if the register resource in the third BLE is idle, the registers in the first BLE and the second BLE can be retimed into the third BLE, the two-way selector in the third BLE is then configured to connect to the output of the register, and the two-way selectors in the first BLE and the second BLE are configured to connect to the output of the look-up table; if the register resource in the third BLE is occupied, the register in the third BLE can first be retimed toward the back end or the front end of the circuit to free the resource.
The invention has the beneficial effects that: by utilizing the characteristic that registers packed into the logic block can be moved, the time delay of a critical path of the whole sequential circuit pipeline can be reduced. In a specific pipeline circuit design, a Critical Path Delay (CPD) (i.e. the Delay of the longest pipeline stage in all pipeline stages included in a pipeline) determines the throughput rate of a system, and the smaller the CPD, the higher the throughput rate of the system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a design flow of an FPGA.
FIG. 2 is a flow chart of the present invention.
Fig. 3 is a block diagram of an FPGA logic cell, wherein (a) is an FPGA, (b) is a logic cell block, and (c) is a basic logic cell.
Fig. 4 is an example of BLE packing of look-up table circuits, where (a) is BLE packing of a look-up table alone, (b) is BLE packing of a look-up table whose output drives a register, and (c) is BLE packing of a look-up table whose output drives two paths.
FIG. 5 is a standard pipelined lookup table circuit.
FIG. 6 is a branch pipeline look-up table circuit.
FIG. 7 is a look-up table circuit with output ports in a pipeline stage.
Fig. 8 is a schematic diagram of retiming a BLE register of the packed look-up table circuit toward the front end of the circuit.
Fig. 9 is a schematic diagram of retiming BLE registers of the packed look-up table circuit toward the back end of the circuit.
Fig. 10 is a diagram illustrating logic cell block register retiming according to the present invention.
Fig. 11 is a flow chart for determining Critical Path Delay (CPD) and time margin for a single pipeline stage.
FIG. 12 is a diagram illustrating the determination of the pipeline stage register retiming direction.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 2, a register retiming-based packing operation method for multi-pipeline sequential circuits includes the following steps:
Step one: using the FPGA design flow, the hardware description language design written by the user is processed through the logic synthesis and mapping stages to generate a look-up table circuit.
As shown in fig. 1, generating the look-up table circuit through the FPGA design flow includes: converting the hardware description language design written by the user into a gate-level netlist (a NAND-gate netlist) in the logic synthesis stage, mapping the gate-level netlist into a Look-Up Table (LUT) circuit, and confirming that the look-up table circuit meets the requirements of a multi-pipeline sequential circuit.
Step two: the look-up table circuit is packed using a packing algorithm.
As shown in fig. 3, the FPGA logic cell block has a two-layer structure: the logic cell block at the first layer and the basic logic cell (Basic Logic Element, BLE) at the second layer. The FPGA is composed of logic cell blocks, as shown in fig. 3(a), and each logic cell block is composed of BLEs that share routing resources, as shown in fig. 3(b). A single BLE consists internally of one LUT, one register resource, and one two-way selector, as shown in fig. 3(c). Typically, the look-up table and register resources within a BLE are configured according to the specific circuit.
The traditional FPGA logic cell block packing steps are as follows:
(1) Packing the look-up table (LUT) circuit netlist into BLEs
When the look-up table circuit is packed into BLEs, the two-way selector in each BLE is configured according to the actual circuit.
As shown in fig. 4(a), when a LUT is packed into a BLE on its own, the register resource in the BLE is idle and the two-way selector is configured to connect to the output of the LUT. As shown in fig. 4(b), if the output of a LUT drives one register, both the LUT and the register resource in the BLE are used and the two-way selector is configured to connect to the output of the register. As shown in fig. 4(c), if the output of a LUT drives two paths, that is, the look-up table circuit has two output ports at the same time, the circuit must be placed in two BLEs because a BLE has only one output port. In BLE1 the LUT resource is configured, the register resource is idle, and the two-way selector is configured to connect to the output of the LUT; in BLE2 the LUT resource is configured as a direct wire connection, the register resource is configured, and the two-way selector is configured to connect to the output of the register.
(2) Selecting one BLE as a seed and placing it into the target FPGA logic cell block;
(3) selecting BLEs according to the attraction function value and continuing to place them into the target FPGA logic cell block;
the attraction function is calculated according to the formula given in the document cited in the Background (Li Peng et al., FPGA packing algorithm based on net absorption and port occupation analysis, Journal of Computer-Aided Design & Computer Graphics, 23 vol. 3, 2011, September). BLE packing continues until the BLE resources in the logic cell block are full or the external ports reach the physical upper limit of the FPGA logic cell block;
(4) selecting a new seed BLE and continuing with the next empty logic cell block until all BLEs are packed.
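As a non-limiting illustration, the seed-and-attract loop of steps (2) to (4) may be sketched as follows. The attraction function used here (the number of nets a candidate shares with the block) is only a placeholder assumption and is not the formula of the cited document; the seed choice (first unpacked BLE in name order), the port model (input nets only), and all names such as `pack`, `attraction`, and `external_inputs` are likewise hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Each BLE is described by (input nets, output net); a simplified, assumed model.
Ble = Tuple[Set[str], str]


def external_inputs(cluster: List[str], bles: Dict[str, Ble]) -> Set[str]:
    """Nets that enter the block from outside (used inside but not driven inside)."""
    driven = {bles[b][1] for b in cluster}
    used = set().union(*(bles[b][0] for b in cluster))
    return used - driven


def attraction(candidate: str, cluster: List[str], bles: Dict[str, Ble]) -> int:
    """Placeholder attraction: number of nets shared between candidate and block."""
    cand_nets = bles[candidate][0] | {bles[candidate][1]}
    cluster_nets: Set[str] = set()
    for b in cluster:
        cluster_nets |= bles[b][0] | {bles[b][1]}
    return len(cand_nets & cluster_nets)


def pack(bles: Dict[str, Ble], max_bles: int, max_inputs: int) -> List[List[str]]:
    unpacked = sorted(bles)              # deterministic order for the sketch
    clusters: List[List[str]] = []
    while unpacked:
        cluster = [unpacked.pop(0)]      # step (2): choose a seed BLE
        while len(cluster) < max_bles and unpacked:
            # step (3): keep only candidates that respect the port limit
            feasible = [b for b in unpacked
                        if len(external_inputs(cluster + [b], bles)) <= max_inputs]
            if not feasible:
                break
            best = max(feasible, key=lambda b: attraction(b, cluster, bles))
            cluster.append(best)
            unpacked.remove(best)
        clusters.append(cluster)         # step (4): continue with a new block
    return clusters
```

The sketch does not check whether a seed BLE alone already exceeds the port limit; a fuller implementation would need to handle that case.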
Step three: the type of the packed pipelined sequential circuit is determined from the direction of its critical path, and the critical path delay of the packed pipelined sequential circuit is calculated using the critical path delay calculation method for packed sequential circuits.
Solving the Critical Path Delay (CPD) and time margin (slack) of a single pipeline stage:
The method of fig. 11 is used to find the Critical Path Delay (CPD) and time margin of a single pipeline stage. The specific method is as follows:
(a) calculating the delay of each edge between the registers;
(b) setting the $T_{arrival}$ value of the input register nodes of the pipeline stage to 0;
(c) calculating the $T_{arrival}$ values of the other nodes:
$$T_{arrival}(j) = \max_{i \in fanin(j)} \{ T_{arrival}(i) + delay(i, j) \}$$
where i is the start of any path in the pipeline stage, j is the end of that path, $T_{arrival}(i)$ is the signal arrival time of node i, $T_{arrival}(j)$ is the signal arrival time of node j, $fanin(j)$ denotes any node that drives node j, and $delay(i, j)$ is the delay of the path (i, j);
(d) setting the $T_{required}$ value of the registers at all pipeline output ports to the critical path delay:
$$T_{required}(register) = CPD$$
where register is the register at any output port;
(e) calculating the $T_{required}$ values of the other nodes using the following formula:
$$T_{required}(i) = \min_{j \in fanout(i)} \{ T_{required}(j) - delay(i, j) \}$$
where $fanout(i)$ denotes any node driven by node i, $T_{required}(i)$ is the latest allowed signal arrival time of node i, and $T_{required}(j)$ is the latest allowed signal arrival time of node j;
(f) calculating the time margin of any connection in the circuit using the following formula: $slack(i, j) = T_{required}(j) - T_{arrival}(i) - delay(i, j)$.
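As a non-limiting illustration, the arrival-time, required-time, and time-margin computation of steps (a) to (f) may be sketched as follows for a single pipeline stage modelled as a directed acyclic graph. The function name `analyze_stage`, the edge-dictionary representation, and the use of a topological ordering are assumptions of this sketch, not details taken from fig. 11.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def analyze_stage(edges: Dict[Tuple[str, str], float],
                  sources: List[str], sinks: List[str]):
    """Return (cpd, arrival, required, slack) for one pipeline stage.

    edges maps a connection (i, j) to its delay; sources are the stage input
    registers and sinks are the stage output registers.
    """
    fanin, fanout, nodes = defaultdict(list), defaultdict(list), set()
    for (i, j) in edges:
        fanout[i].append(j)
        fanin[j].append(i)
        nodes |= {i, j}

    # Topological order (Kahn's algorithm) so drivers are visited first.
    indeg = {n: len(fanin[n]) for n in nodes}
    ready = sorted(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for m in fanout[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)

    # Steps (b) and (c): forward pass for arrival times.
    arrival = {}
    for n in order:
        if n in sources or not fanin[n]:
            arrival[n] = 0.0
        else:
            arrival[n] = max(arrival[i] + edges[(i, n)] for i in fanin[n])
    cpd = max(arrival[s] for s in sinks)

    # Steps (d) and (e): backward pass for required times.
    required = {}
    for n in reversed(order):
        if n in sinks or not fanout[n]:
            required[n] = cpd
        else:
            required[n] = min(required[j] - edges[(n, j)] for j in fanout[n])

    # Step (f): time margin of every connection.
    slack = {(i, j): required[j] - arrival[i] - d for (i, j), d in edges.items()}
    return cpd, arrival, required, slack
```

Connections with zero time margin lie on the critical path of the stage; connections with a positive margin indicate how much delay could be added without lengthening it.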
Solving the critical path delay (CPD) of the pipeline circuit:
Combinational circuits lie between the registers of a pipeline circuit, and pipelined sequential circuits comprise standard pipeline sequential circuits, branch pipeline sequential circuits, and sequential circuits with an output port inside a pipeline stage. Pipelines are classified into the following types according to the direction of their paths:
(1) Standard pipeline type
As shown in fig. 5, registers a, b and c divide the circuit into two pipeline stages connected in series. Using the single-stage critical path delay (CPD) and time margin solving process, the longest circuit path delays from register a to register b and from register b to register c are obtained; adding them and dividing the sum by the number of pipeline stages, 2, gives the critical path delay of the pipelined sequential circuit. Both pipeline stages can then be made to meet this critical path delay by retiming register b. The method for calculating the critical path delay of the standard pipeline sequential circuit is the basis for calculating the critical path delay of each of the following types of sequential circuit.
(2) Branch pipeline type
A branch pipeline sequential circuit branches at the register level. As shown in fig. 6, registers a, c, e, f and registers b, d, e, f form two pipelines whose starting points are registers a and b, respectively, and whose common end point is register f. The standard pipeline method shown in fig. 5 can be used to calculate the critical path delays of the two pipelines formed by registers a, c, e, f and registers b, d, e, f, and the larger value is then taken as the critical path delay of the whole pipelined sequential circuit shown in fig. 6. All pipeline stages on the two branches can then be made to meet this critical path delay by retiming registers c, d and e.
(3) Output port inside a pipeline stage
As shown in fig. 7, the output of a look-up table in the pipeline stage between registers b and c drives two paths: one leads to the stage's terminal register c, and the other is an output port. Because the following register is blocked by the output port and cannot be retimed past it toward the front end of the circuit, the critical path delay of this sequential circuit is calculated in the following steps:
First, the standard pipeline method shown in fig. 5 is used to find the critical path delay A of the pipeline formed by registers a, b, c and d. Next, the average delay of all pipeline stages up to the stage containing the intermediate output port is calculated, with the end of that stage fixed at the output port of the intermediate look-up table, which gives the average pipeline stage critical path delay B. The larger of the critical path delay A and the average pipeline stage critical path delay B is taken as the critical path delay of the whole pipelined sequential circuit shown in fig. 7. All pipeline stages of the whole sequential circuit can then be made to meet this critical path delay by retiming registers b and c.
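As a non-limiting illustration, the three critical path delay calculations above may be sketched as follows, taking as input the longest path delay of each pipeline stage. The function names and the way delay B is supplied (a list of stage delays up to the output port, the last one measured to the port of the intermediate look-up table) reflect one reading of the description and are assumptions of this sketch.

```python
from typing import List


def cpd_standard(stage_delays: List[float]) -> float:
    """Standard pipeline: sum of the longest stage delays divided by the stage count."""
    return sum(stage_delays) / len(stage_delays)


def cpd_branch(branches: List[List[float]]) -> float:
    """Branch pipeline: treat each branch as a standard pipeline, keep the larger value."""
    return max(cpd_standard(b) for b in branches)


def cpd_with_output_port(stage_delays: List[float],
                         delays_up_to_port: List[float]) -> float:
    """Output port inside a stage: larger of delay A and average delay B.

    stage_delays      : longest delays of all stages of the full pipeline (gives A)
    delays_up_to_port : longest delays of the stages up to the output port, with the
                        last stage ending at the port of the intermediate LUT (gives B)
    """
    a = cpd_standard(stage_delays)
    b = cpd_standard(delays_up_to_port)
    return max(a, b)
```

For example, two stages with longest delays 4.0 and 6.0 give a standard pipeline critical path delay of 5.0, which both stages can meet once the register between them is retimed.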
Step four: the intermediate registers in the circuit are retimed according to the critical path delay calculated in step three.
Calculating the register retiming direction from the critical path delay:
According to the critical path delay calculated in step three, the path delay of the whole pipeline circuit is distributed evenly among the pipeline stages, and the demarcation point of each pipeline stage is marked in the circuit; as shown in fig. 12, the critical path delay shows that the pipeline stage demarcation points in the pipeline circuit should be A and B.
The retiming direction of each register between pipeline stages is determined by comparing it with its demarcation point. If the demarcation point is in front of the corresponding register, the register needs to be retimed toward the circuit input until it reaches the demarcation point. As shown in fig. 12, demarcation point A is in front of the corresponding register b, so register b needs to be retimed toward the circuit input until it reaches demarcation point A. If the demarcation point is behind the corresponding register, the register needs to be retimed toward the circuit output until it reaches the demarcation point. As shown in fig. 12, demarcation point B is behind the corresponding register c, so register c needs to be retimed toward the circuit output until it reaches demarcation point B.
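As a non-limiting illustration, the direction decision may be sketched for a standard pipeline as follows. The sketch idealizes the circuit as a line on which positions are measured in accumulated delay from the circuit input; in an actual circuit a register can only move across whole look-up tables. The function name `retiming_directions` and the direction labels are hypothetical.

```python
from typing import List, Tuple


def retiming_directions(stage_delays: List[float]) -> List[Tuple[str, float]]:
    """For each intermediate register, return (direction, delay to cross).

    stage_delays are the longest path delays of the stages before retiming.
    """
    n = len(stage_delays)
    target = sum(stage_delays) / n        # evenly distributed stage delay
    result = []
    position = 0.0
    for k in range(1, n):                 # intermediate registers only
        position += stage_delays[k - 1]   # where register k sits now
        boundary = k * target             # demarcation point for register k
        if boundary < position:
            # demarcation point is in front of the register
            result.append(("toward_input", position - boundary))
        elif boundary > position:
            # demarcation point is behind the register
            result.append(("toward_output", boundary - position))
        else:
            result.append(("stay", 0.0))
    return result
```

With stage delays of 6.0, 1.0 and 5.0, the first intermediate register moves toward the input and the second toward the output, which mirrors the situation of registers b and c in fig. 12.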
For each register that needs retiming, the logic cell block register retiming method is first used to determine whether the retiming takes place between logic cell blocks or inside a logic cell block and to select the target logic cell block; the packed-circuit BLE register retiming method is then used to retime the register in the corresponding basic logic cell within the target logic cell block toward the front end or the back end of the circuit.
Packed-circuit BLE register retiming: because the main resource of a pipelined sequential circuit is the LUT, most register resources in the BLEs are idle, which creates the conditions for retiming circuit registers after packing. In an actual circuit, two registers of the pipeline circuit are rarely connected directly; if this does occur, the registers can be retimed one after the other, so that the retiming of circuit registers after packing can still be carried out.
According to the register retiming direction, packed-circuit BLE register retiming can be divided into the following two types:
(1) Retiming a BLE register toward the circuit front end
As shown in fig. 8, the register in BLE2 needs to be retimed into BLE1. If the register resource in BLE1 is idle, the register in BLE2 can be retimed into BLE1; the two-way selector in BLE1 is then configured to connect to the output of the register, and the two-way selector in BLE2 is configured to connect to the output of the look-up table. If the register resource in BLE1 is occupied, the register in BLE1 can first be retimed toward the front end or the back end of the circuit to free the resource.
(2) Retiming BLE registers toward the circuit back end
As shown in fig. 9, the registers in BLE1 and BLE2 need to be retimed into BLE3. If the register resource in BLE3 is idle, the registers in BLE1 and BLE2 can be retimed into BLE3; the two-way selector in BLE3 is then configured to connect to the output of the register, and the two-way selectors in BLE1 and BLE2 are configured to connect to the output of the look-up table. If the register resource in BLE3 is occupied, the register in BLE3 can first be retimed toward the back end or the front end of the circuit to free the resource.
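As a non-limiting illustration, the two BLE-level moves may be sketched as follows. The sketch assumes that each driver BLE feeds only the target BLE, so that moving a register does not alter any other path; the `BLE` record and the functions `retime_to_front` and `retime_to_back` are hypothetical names, and the fallback of first freeing an occupied register is reported to the caller rather than performed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BLE:
    has_register: bool  # True when the register resource is in use
    mux_select: str     # "lut" or "register": signal chosen by the two-way selector


def _set_register(ble: BLE, used: bool) -> None:
    """Update the register resource and reconfigure the two-way selector to match."""
    ble.has_register = used
    ble.mux_select = "register" if used else "lut"


def retime_to_front(bles: Dict[str, BLE], drivers: List[str], target: str) -> bool:
    """Move the register of `target` into its driver BLEs (fig. 8: BLE2 into BLE1)."""
    if not bles[target].has_register:
        return False
    if any(bles[d].has_register for d in drivers):
        return False  # a driver register is occupied and must be freed first
    for d in drivers:
        _set_register(bles[d], True)
    _set_register(bles[target], False)
    return True


def retime_to_back(bles: Dict[str, BLE], drivers: List[str], target: str) -> bool:
    """Merge the registers of the driver BLEs into `target` (fig. 9: BLE1, BLE2 into BLE3)."""
    if not all(bles[d].has_register for d in drivers):
        return False
    if bles[target].has_register:
        return False  # the target register is occupied and must be freed first
    for d in drivers:
        _set_register(bles[d], False)
    _set_register(bles[target], True)
    return True
```

A return value of `False` signals that the move cannot be made directly, which corresponds to the case in which the occupied register must first be retimed elsewhere.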
Logic cell block register retiming:
Because the basic logic cells (BLEs) are ultimately packed into the logic cell blocks of the FPGA, register retiming falls into two cases from the perspective of the logic cell block.
(1) Retiming between logic cell blocks: as shown in fig. 10, moving a register in BLE c within logic cell block A into BLE d within logic cell block B is register retiming between logic cell blocks. The changes that the retiming produces inside the BLEs are as described under packed-circuit BLE register retiming.
(2) Retiming inside a logic cell block: as shown in fig. 10, moving the register in BLE a within logic cell block B into BLE b within logic cell block B is retiming inside a logic cell block. The changes that the retiming produces inside the BLEs are as described under packed-circuit BLE register retiming.
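As a non-limiting illustration, the distinction between the two cases may be sketched as a lookup of the logic cell block that contains each BLE. The mapping `block_of`, the function `classify_retiming`, and the example labels are hypothetical and only loosely follow the labelling of fig. 10.

```python
from typing import Dict


def classify_retiming(block_of: Dict[str, str], src_ble: str, dst_ble: str) -> str:
    """Return "intra_block" when both BLEs lie in the same logic cell block,
    otherwise "inter_block"."""
    if block_of[src_ble] == block_of[dst_ble]:
        return "intra_block"
    return "inter_block"


# Assumed example placement: BLE c in block A; BLEs a, b and d in block B.
example_blocks = {"a": "B", "b": "B", "c": "A", "d": "B"}
```

In either case the selected target block is then handed to the BLE-level retiming step, which performs the actual register move.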
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (7)

1. A register retiming-based multi-pipeline-stage sequential circuit packing operation method, characterized by comprising the following steps:
step one: using the FPGA design flow, processing the hardware description language design written by the user through the logic synthesis and mapping stages to generate a lookup table circuit;
step two: packing the lookup table circuit using a packing algorithm;
step three: determining the type of the packed pipeline-stage sequential circuit according to the course of the critical path of the sequential circuit, and calculating the critical path delay of the packed pipeline-stage sequential circuit using the critical path delay calculation method for packed sequential circuits;
the critical path delay calculation method for packed sequential circuits is as follows:
(1) standard pipelined sequential circuit: calculating the longest circuit path delay of each pipeline stage between adjacent registers, summing these delays, and dividing the sum by the number of pipeline stages to obtain the critical path delay of the pipeline-stage sequential circuit;
(2) branch pipelined sequential circuit: calculating the critical path delay of each pipeline formed by the registers using the method for the standard pipelined sequential circuit, and taking the largest value as the critical path delay of the whole pipeline-stage sequential circuit;
(3) sequential circuit with an output port inside a pipeline stage: calculating the critical path delay A of each pipeline that has no intermediate output port using the method for the standard pipelined sequential circuit; calculating, by the same method, the average delay B of all pipeline stages up to the pipeline stage in which the intermediate output port is located, with the end of that pipeline stage fixed at the output port of the intermediate lookup table; and taking the larger of the critical path delay A and the average delay B as the critical path delay of the whole pipeline-stage sequential circuit;
step four: retiming the intermediate registers in the circuit according to the critical path delay calculated in step three;
the BLE register retiming method for the packed circuit is as follows: because the main resource of a pipeline-stage sequential circuit is the lookup table circuit, most register resources in the BLEs are idle, which creates the conditions for retiming circuit registers after packing; where two registers of the pipeline-stage circuit are directly connected, the registers can be retimed one after the other;
according to the register retiming direction, two cases are distinguished:
(1) retiming a BLE register toward the circuit front end: the register in the second BLE needs to be retimed into the first BLE; if the register resource in the first BLE is idle, the register in the second BLE can be retimed into the first BLE, the two-way selector in the first BLE is configured to connect to the output port of the register, and the two-way selector in the second BLE is configured to connect to the output port of the lookup table; if the register resource in the first BLE is occupied, the register occupying the first BLE can first be retimed out of it to free the resource;
(2) retiming BLE registers toward the circuit back end: the registers in the first BLE and the second BLE need to be retimed into the third BLE; if the register resource in the third BLE is idle, the registers in the first BLE and the second BLE can be retimed into the third BLE, the two-way selector in the third BLE is configured to connect to the output port of the register, and the two-way selectors in the first BLE and the second BLE are configured to connect to the output ports of their lookup tables; if the register resource in the third BLE is occupied, the register in the third BLE can first be retimed toward the circuit back end to free the resource.
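As an illustration of the three delay calculations named in step three, the following Python sketch computes each of them from per-stage longest-path delays. It reflects one reading of the claim language, not a definitive implementation: the function names are hypothetical, a branch pipeline is modelled as a list of independent stage-delay lists, and for case (3) the caller is assumed to supply the stage delays up to the intermediate output port with the last stage already truncated at that port.

```python
from typing import List


def standard_delay(stage_delays: List[float]) -> float:
    """Case (1): sum of the longest path delay of each stage, divided by the stage count."""
    return sum(stage_delays) / len(stage_delays)


def branch_delay(branches: List[List[float]]) -> float:
    """Case (2): apply the standard method to every branch pipeline and keep the largest value."""
    return max(standard_delay(b) for b in branches)


def mid_output_delay(full_pipeline: List[float], stages_to_port: List[float]) -> float:
    """Case (3): larger of A (pipeline without the intermediate port) and
    B (average over the stages that end at the intermediate output port)."""
    a = standard_delay(full_pipeline)
    b = standard_delay(stages_to_port)
    return max(a, b)


# Made-up delays in arbitrary time units.
print(standard_delay([4.0, 2.0, 3.0]))                  # 3.0
print(branch_delay([[4.0, 2.0], [5.0, 3.0]]))           # 4.0
print(mid_output_delay([4.0, 2.0, 3.0], [4.0, 3.0]))    # 3.5
```

The averaged value is the stage delay that an ideally balanced pipeline would have, which is why it serves as the target when the registers are retimed in step four.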
2. The register retiming-based multi-pipeline-stage sequential circuit packing operation method of claim 1, wherein combinational circuits are arranged between the registers of the pipeline-stage sequential circuit, and the pipeline-stage sequential circuit comprises a standard pipelined sequential circuit, a branch pipelined sequential circuit, and a sequential circuit with an output port inside a pipeline stage; the branch pipelined sequential circuit branches at a register level, and the sequential circuit with an output port inside a pipeline stage has an output port located within a pipeline stage.
3. The register retiming-based multi-pipeline-stage sequential circuit packing operation method of claim 1 or 2, wherein the method for determining the register retiming direction from the critical path delay in step four comprises: according to the critical path delay calculated in step three, distributing the path delay evenly over the pipeline stages of the whole pipeline circuit, and marking the demarcation point of each pipeline stage in the circuit; then determining the retiming direction of each register between pipeline stages by comparing it with its demarcation point: if the demarcation point is in front of the corresponding register, the register needs to be retimed toward the circuit input end until it reaches the demarcation point; if the demarcation point is behind the corresponding register, the register needs to be retimed toward the circuit output end until it reaches the demarcation point.
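The direction rule of claim 3 can be illustrated for a single linear pipeline. The sketch below is a simplified model under stated assumptions: positions are measured as accumulated combinational delay from the circuit input, the register that closes stage k sits at the sum of the first k stage delays, and the function name and the returned strings are hypothetical.

```python
from typing import List


def retiming_directions(stage_delays: List[float]) -> List[str]:
    """Return the retiming direction of each register between adjacent stages."""
    n = len(stage_delays)
    target = sum(stage_delays) / n  # evenly distributed stage delay
    directions = []
    position = 0.0
    for k in range(1, n):
        position += stage_delays[k - 1]  # current location of register k
        boundary = k * target            # demarcation point for register k
        if boundary < position:
            directions.append("toward input")   # demarcation point is in front
        elif boundary > position:
            directions.append("toward output")  # demarcation point is behind
        else:
            directions.append("stay")           # already on the demarcation point
    return directions


print(retiming_directions([6.0, 2.0, 4.0]))  # ['toward input', 'stay']
print(retiming_directions([2.0, 6.0, 4.0]))  # ['toward output', 'stay']
```

In the first call the total delay is 12 units over three stages, so the demarcation points lie at 4 and 8; the first register sits at 6 and must move toward the input, while the second already sits at 8.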
4. The register retiming-based multi-pipeline-stage sequential circuit packing operation method of claim 3, wherein for each register requiring retiming, the logic cell block register retiming method is first used to determine whether the retiming is between logic cell blocks or inside a logic cell block, and the target logic cell block is selected; the BLE register retiming method for the packed circuit is then used to retime the register in the corresponding basic logic cell of the target logic cell block toward the front end or the back end of the circuit.
5. The register retiming-based multi-pipeline-stage sequential circuit packing operation method of claim 4, wherein the logic cell block register retiming method is as follows: since the basic logic cells are finally packed into logic cell blocks of the FPGA, there are two cases of register retiming from the perspective of the logic cell block:
(1) retiming between logic cell blocks: moving the register in a BLE within logic cell block I into a BLE within logic cell block II;
(2) retiming inside a logic cell block: moving the register in one BLE within a logic cell block into another BLE within the same logic cell block.
6. The register retiming-based multi-pipeline-stage sequential circuit packing operation method of claim 1 or 5, wherein the packing algorithm in step two packs the FPGA logic cell blocks as follows:
(a) packing the lookup table circuit netlist into BLEs, and configuring the two-way selector in each BLE according to the actual condition of the lookup table circuit;
(b) selecting one BLE as a seed and loading it into the target FPGA logic cell block;
(c) selecting BLEs according to the attraction function value and continuing to load them into the target FPGA logic cell block until the BLE resources in the logic cell block are full or the external ports reach the physical upper limit of the FPGA logic cell block;
(d) selecting a new seed BLE and continuing to pack the remaining unpacked BLEs into logic cell blocks until all BLEs are packed.
7. The register retiming-based multi-pipeline-stage sequential circuit packing operation method of claim 6, wherein in step (a) of the packing algorithm: if an independent LUT is packed into a BLE, the register resource in the BLE is idle and the two-way selector is configured to connect to the output port of the LUT; if a LUT whose output drives one register is packed into a BLE, both the LUT resource and the register resource in the BLE are used, and the two-way selector is configured to connect to the output port of the register; if the output of the LUT drives two paths, that is, the lookup table circuit has two output ports at the same time, the circuit needs to be placed in two BLEs because a BLE can provide only one output port: in one BLE the LUT resource is configured, the register resource is idle, and the two-way selector is configured to connect to the output port of the LUT; in the other BLE the LUT resource is configured as a direct wire connection, the register resource is configured, and the two-way selector is configured to connect to the output port of the register.
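Steps (b) to (d) of the packing algorithm in claim 6 follow a seed-and-grow pattern, sketched below in Python. Several details are assumptions made only for illustration, since the claims do not fix them: each BLE is represented by the set of net names it touches, the attraction function is taken to be the number of nets shared with the block under construction, the seed is simply the first unpacked BLE, and the external-port limit is approximated by the number of distinct nets touching the block. All names (`pack`, `capacity`, `max_pins`) are hypothetical.

```python
from typing import Dict, List, Set


def pack(bles: Dict[str, Set[str]], capacity: int, max_pins: int) -> List[List[str]]:
    """Greedy seed-and-grow packing of BLEs into logic cell blocks."""
    unpacked = dict(bles)
    blocks: List[List[str]] = []
    while unpacked:
        # (b) choose a seed BLE and open a new logic cell block with it.
        seed = next(iter(unpacked))
        nets = set(unpacked.pop(seed))
        block = [seed]
        # (c) keep adding the most attractive BLE until the block is full
        #     or the pin limit would be exceeded.
        while unpacked and len(block) < capacity:
            best = max(unpacked, key=lambda b: len(unpacked[b] & nets))
            if len(nets | unpacked[best]) > max_pins:
                break
            nets |= unpacked.pop(best)
            block.append(best)
        blocks.append(block)
        # (d) the outer loop picks a new seed until every BLE is packed.
    return blocks


bles = {
    "b1": {"a", "b", "x"},
    "b2": {"x", "c", "y"},
    "b3": {"p", "q", "r"},
    "b4": {"y", "d", "z"},
}
print(pack(bles, capacity=2, max_pins=6))  # [['b1', 'b2'], ['b3', 'b4']]
```

Here `b2` joins `b1` because the two share net `x`; once the first block reaches its capacity of two BLEs, `b3` becomes the next seed.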
CN201811587501.9A 2018-12-25 2018-12-25 Register retiming-based multi-pipeline sequential circuit boxing operation method Active CN109800468B (en)

Publications (2)

Publication Number Publication Date
CN109800468A (en) 2019-05-24
CN109800468B (en) 2022-09-30

Family

ID=66557496

Country Status (1)

Country Link
CN (1) CN109800468B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400169B (en) * 2020-02-25 2023-04-18 中科亿海微电子科技(苏州)有限公司 Method and system for automatically generating netlist file for testing software and hardware
CN117634383B (en) * 2023-12-26 2024-06-07 苏州异格技术有限公司 Critical path delay optimization method, device, computer equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101515312A (en) * 2008-12-03 2009-08-26 复旦大学 On-site programmable device FPGA logic unit model and general bin packing algorithm thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8813013B2 (en) * 2012-10-19 2014-08-19 Altera Corporation Partitioning designs to facilitate certification

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
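Sequential circuit resynthesis algorithm with timing slack as a parameter; Li Peng et al.; Journal of Computer-Aided Design & Computer Graphics; 2010-09-15 (No. 09); full text *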

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant