Detailed description of the invention
To make the technical solution and advantages of the embodiments of the present invention clearer, the technical solution is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a flow chart of the placement algorithm disclosed by the invention. As shown in Fig. 1, the present invention applies a new method of handling carry chain constraints in an FPGA placement algorithm. While the global placement algorithm is being solved, bisection is performed according to the carry chain length and carry chains are fixed during solving; when they are fixed, non-overlap between them is taken into account. In the local placement algorithm, carry chains are sorted by length: shorter chains are moved first, and the longer a chain is, the later it is moved. The specific steps are as follows:
Step 110, synthesis and library mapping;
Specifically, before the placement algorithm is run, the user circuit must be converted into a gate-level circuit. In the present invention, the user circuit is written in a high-level hardware description language (Verilog). It is synthesized into a low-level gate-level circuit, and the gate-level circuit is mapped onto look-up tables (LUT) and registers (FF).
Step 120, packing algorithm;
Specifically, the look-up tables and registers from step 110 are combined into basic cells of three forms: a four-input look-up table together with a register, a standalone four-input look-up table, and a standalone register. Multiple basic cells are then packed into a logic element (Logic Element, LE).
Fig. 2 is a structural diagram of the basic composition of the basic logic element disclosed by the invention. As shown in Fig. 2, one basic logic element (Logic Element, LE) consists of 4 LPs (Logic Parcel), a carry skip chain input (Carryskipin), a carry skip chain output (Carryskipout), and an LBUF. Each LP contains two LUT4s, one LUT4C (a LUT4 with a carry chain), and two registers. One LE therefore has 12 LUT4s and 8 registers in total, a LUT4-to-register ratio of 3:2. The carry skip chain input and the carry skip chain output implement the carry skip chain function, and the LBUF generates the control and clock signals of the registers in the logic element.
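The resource counts stated for one LE can be checked with a few lines of arithmetic. The following Python sketch is only an illustration; the constant and function names are hypothetical and are not part of the disclosed design.

```python
from fractions import Fraction

# Per-LP resources as described for Fig. 2 (names are illustrative).
LP_PER_LE = 4
LUT4_PER_LP = 2      # plain four-input look-up tables
LUT4C_PER_LP = 1     # four-input look-up table with carry chain
REGS_PER_LP = 2


def le_resources():
    """Return (look-up tables, registers) contained in one LE."""
    luts = LP_PER_LE * (LUT4_PER_LP + LUT4C_PER_LP)
    regs = LP_PER_LE * REGS_PER_LP
    return luts, regs


def lut4c_in_chain(n_le):
    """Number of LUT4Cs on a carry chain that spans n_le logic elements."""
    return n_le * LP_PER_LE * LUT4C_PER_LP


luts, regs = le_resources()
ratio = Fraction(luts, regs)   # LUT4-to-register ratio of one LE
```

With these counts, `le_resources()` gives 12 look-up tables and 8 registers, and `ratio` reduces to 3/2, which agrees with the 3:2 ratio stated above.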
Step 130, global placement algorithm;
Specifically, the global placement algorithm is a global analytical algorithm based on chip partitioning. When the carry chain length is greater than the partition granularity, the carry chain can move freely while being solved, which enlarges the solution space of the algorithm and improves its performance. As the global placement algorithm proceeds, the carry chain length gradually decreases. When the carry chain length is less than or equal to the partition granularity, the algorithm fixes the carry chain and ensures that fixed carry chains do not overlap one another, so the global placement algorithm can guarantee that carry chains are pairwise non-overlapping. In the present invention, the carry chain length refers to a carry chain formed by logic elements connected in series vertically or horizontally: one logic element (LE) forms one carry chain, and multiple logic elements (LE) connected in series vertically or horizontally form multiple carry chains. The minimum partition granularity in the present invention is 2 for vertical carry chains and 1 for horizontal carry chains.
Step 140, local placement algorithm;
When a carry chain of a certain length is connected to a logic element that does not belong to a carry chain, the logic element without a carry chain is moved preferentially, which raises the success rate of moving logic elements.
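The move priority used in local placement (logic elements without a carry chain first, then carry chains from shortest to longest) can be sketched as a simple sort. This is a minimal illustration under stated assumptions: each candidate is a hypothetical `(name, chain_length)` pair, and a chain length of 0 marks a logic element that is not part of any carry chain.

```python
def move_order(candidates):
    """Order candidates for moving during local placement.

    candidates: list of (name, chain_length) pairs, where chain_length 0
    means the logic element is not on a carry chain (an assumption made
    for this sketch). Python's sort is stable, so candidates with equal
    chain length keep their original relative order.
    """
    return sorted(candidates, key=lambda item: item[1])


candidates = [("chain_a", 8), ("le_free", 0), ("chain_b", 2), ("chain_c", 4)]
ordered_names = [name for name, _ in move_order(candidates)]
```

Here the free logic element is tried first and the length-8 chain last, which matches the stated preference for moving the least constrained objects before the most constrained ones.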
Step 150, routing;
After the placement algorithm of step 130 and step 140, the look-up tables and registers have been solved or fixed, and the connections between the look-up tables and registers are made through multiplexers (MUX).
Fig. 3 is a schematic diagram of the carry chain connections between multiple logic elements disclosed by the invention. Fig. 3 shows a carry chain made up of 4 LEs, with the carry chain drawn as dashed lines. The 4 LUT4Cs inside one LE are connected by a ripple carry chain, while different LEs are connected through the carry skip chain input and carry skip chain output structures. The connection of the 4 LUT4Cs in each LE forms one carry chain, and the LUT4C at the top of LE0 is connected to the LUT4C at the bottom of LE1 by the dashed line at the side of Fig. 3, forming multiple carry chains.
Fig. 4 shows the basic structure of the LUT4C with carry chain structure disclosed by the invention. As shown in Fig. 4, the carry chain between LUT4Cs is a ripple carry chain (ripplecarrychain) structure. The ripple carry chain connects the LUT4Cs within one LE, and the ripple carry chains of multiple LEs are connected through the carry skip chain input and the carry skip chain output, which realizes the connection between LEs.
The ripple carry chain structure mainly comprises an XOR gate (xor), which produces the sum output, a multiplexer (mux_co), and a multiplexer (mux_ca). The input of multiplexer (mux_ca) can be connected to GND, to a direct input, to the output of the LUT4, or to an input of the LUT4. Different selections of (mux_ca) implement different functions: selecting the direct input or the LUT4 input implements basic addition and subtraction, while selecting GND or the LUT4 output implements multi-input AND or OR functions.
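A behavioral sketch may help to show how one XOR gate and two multiplexers yield both an adder and a wide logic function. The model below is an assumption, not the disclosed circuit: it treats the LUT4 output as a propagate bit `p`, lets the XOR gate form the sum from `p` and the carry input, and lets the carry output be the carry input when `p` is 1 and the value chosen by mux_ca otherwise. All function names are hypothetical.

```python
def lut4c_cell(p, ca, cin):
    """One carry cell (assumed behavior).

    p   : LUT4 output, used as the propagate bit
    ca  : value selected by mux_ca (direct input, LUT4 input or GND)
    cin : carry arriving from the previous cell
    Returns (sum_bit, carry_out).
    """
    sum_bit = p ^ cin                 # XOR gate produces the sum output
    carry_out = cin if p else ca      # carry multiplexer
    return sum_bit, carry_out


def ripple_add(a, b, width=4, cin=0):
    """Add two width-bit numbers with a chain of lut4c_cell stages."""
    total, carry = 0, cin
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        # LUT4 programmed as ai XOR bi; mux_ca selects the direct input ai.
        s, carry = lut4c_cell(ai ^ bi, ai, carry)
        total |= s << i
    return total, carry


def wide_and(bits):
    """Multi-input AND: mux_ca selects GND and the chain starts at 1."""
    carry = 1
    for bit in bits:
        _, carry = lut4c_cell(bit, 0, carry)
    return carry
```

With mux_ca tied to the direct input the chain behaves as a 4-bit adder, while with mux_ca tied to GND the carry survives only if every stage propagates, which is a multi-input AND of the LUT4 outputs.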
The detailed working process of step 130 in Fig. 1 is described below with reference to Fig. 5. Fig. 5 is a flow chart of the global placement algorithm disclosed by the invention. As shown in Fig. 5, when the carry chain length is greater than the partition granularity, the carry chain can move freely while being solved, which enlarges the solution space of the algorithm and improves its performance. As the global placement algorithm proceeds, the carry chain length gradually decreases. When the carry chain length is less than or equal to the partition granularity, the algorithm fixes the carry chain and ensures that fixed carry chains do not overlap one another, so that carry chains are guaranteed to be pairwise non-overlapping in the global placement algorithm. The specific steps are as follows:
Step 510, obtain the carry chain length;
Specifically, the carry chain length is obtained according to the way carry chains are formed in the aforementioned step 120. Fig. 6 is a schematic diagram of the distribution of PLBs in the chip disclosed by the invention. As shown in Fig. 6, each PLB (Programmable Logic Block) contains one basic logic element (LE), and each column of PLBs contains one continuous carry chain, arranged in order from bottom to top. In the PLB distribution shown in Fig. 6, the maximum number of PLBs in a column is 32 and the maximum number of PLBs in a row is 16. The carry chain length of a column of PLBs is therefore at most 32, containing at most 32*4=128 LUT4Cs; the carry chain length of a row of PLBs is at most 16, containing at most 16*4=64 LUT4Cs.
Step 520, determine whether the carry chain length is greater than the partition granularity;
Specifically, the present invention is described in detail using a column carry chain length of 8 and a row carry chain length of 4 as an example. Fig. 7 is a schematic diagram of carry chain length partitioning disclosed by the invention. As shown in Fig. 7, the column carry chain length is 8, i.e. the chain consists of 8 PLBs; 8 PLBs contain 8 LEs, and 8 LEs contain 32 LUT4Cs. The row carry chain length is 4, i.e. the chain consists of 4 PLBs; 4 PLBs contain 4 LEs, and 4 LEs contain 16 LUT4Cs.
When the carry chain length is greater than the partition granularity, i.e. when the carry chain length can be bisected, the carry chain can move freely while being solved. In Fig. 7, the column carry chain length is 8 and the row carry chain length is 4, so both can be bisected. The column and the row are bisected at the same time, which divides the region into 4 identical parts, namely I1, I2, I3, and I4. At this point the column carry chain length of I1 is 4 and its row carry chain length is 2. Both lengths are still greater than the partition granularity, i.e. both can be bisected again, so the column carry chain length and the row carry chain length continue to be bisected as described above. As the global placement algorithm proceeds, the column and row carry chain lengths decrease. When the carry chain length is less than the partition granularity, i.e. when the carry chain length is 1 and can no longer be bisected, the global placement algorithm fixes the carry chain and ensures that fixed carry chains do not overlap one another. In this way the global placement algorithm guarantees that carry chains are pairwise non-overlapping, enlarges the solution space of the algorithm, and improves its performance.
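The repeated bisection in this example can be traced with a short Python sketch. It is an illustration under stated assumptions: the function names are hypothetical, the column and row lengths are halved together as in Fig. 7, and the default granularities (2 for columns, 1 for rows) are the minimum values given earlier for vertical and horizontal carry chains.

```python
def bisection_levels(col_len, row_len, col_gran=2, row_gran=1):
    """Trace (column length, row length) of one region per bisection level.

    Bisection continues while both lengths exceed their granularity;
    the last entry is the size at which carry chains would be fixed.
    """
    levels = [(col_len, row_len)]
    while col_len > col_gran and row_len > row_gran:
        col_len //= 2
        row_len //= 2
        levels.append((col_len, row_len))
    return levels


def regions_at_level(level):
    """Each simultaneous column and row bisection quarters every region."""
    return 4 ** level


levels = bisection_levels(8, 4)
```

For the 8 by 4 example, `levels` is `[(8, 4), (4, 2), (2, 1)]`: the first bisection gives the four parts I1 to I4 of size 4 by 2, and one further bisection reaches the granularity at which the chains are fixed. Passing other granularity values models a different stopping rule.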
Step 530, the carry chain moves freely and is solved;
Specifically, when the carry chain length is greater than the partition granularity, i.e. when the carry chain length can be bisected, the carry chain can move freely while being solved. The global solving algorithm uses an analytical solver. Given the user's design netlist N(E, V), whose nets E and nodes V are determined by the design netlist, the global solver finds the node positions (x_i, y_i) that minimize the half-perimeter wirelength (Half-Perimeter Wire Length, HPWL) of the interconnect. The node coordinates are collected into an x vector and a y vector.
A quadratic optimization objective can be written by considering a graph G = (E_g, V) with edges E_g and nodes V, in which the edge weights satisfy w_{i,j} > 0.
The quadratic objective is a differentiable function and can be written in matrix form separately for the x part and the y part.
The minimum of the x part and the minimum of the y part are attained where the respective derivative equals zero, which yields one system of linear equations for x and one for y.
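The formulas for this step are not legible in the text as received. As a hedged reconstruction, the standard quadratic placement formulation that matches the description reads as follows. The symbols Q_x, Q_y, c_x, c_y and the factor 1/2 are assumptions introduced for illustration; Q collects the edge weights between movable nodes and c collects the contributions of fixed nodes.

```latex
\mathbf{x} = (x_1, \dots, x_n)^{T}, \qquad \mathbf{y} = (y_1, \dots, y_n)^{T}

\Phi(\mathbf{x}, \mathbf{y}) = \frac{1}{2} \sum_{(i,j) \in E_g} w_{i,j}
    \left[ (x_i - x_j)^2 + (y_i - y_j)^2 \right]

\Phi_x(\mathbf{x}) = \frac{1}{2}\, \mathbf{x}^{T} Q_x \mathbf{x} + \mathbf{c}_x^{T} \mathbf{x} + \text{const},
\qquad
\Phi_y(\mathbf{y}) = \frac{1}{2}\, \mathbf{y}^{T} Q_y \mathbf{y} + \mathbf{c}_y^{T} \mathbf{y} + \text{const}

\nabla \Phi_x = Q_x \mathbf{x} + \mathbf{c}_x = \mathbf{0},
\qquad
\nabla \Phi_y = Q_y \mathbf{y} + \mathbf{c}_y = \mathbf{0}
```

Each of the last two conditions is a linear system of the form Ax = b, with A = Q_x and b = -c_x for the x part and likewise for the y part.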
Jacobi iteration (Jacobi Method) and Gauss-Seidel relaxation (Successive Over-Relaxation Method) are applied to the two systems of linear equations above.
In Jacobi iteration, the i-th equation of the system Ax = b is solved for x_i.
Writing A = D - L - U, where D is the diagonal part of A and -L and -U are its strictly lower and strictly upper triangular parts, the matrix form of the iteration is: x^{(k)} = D^{-1}(L + U)x^{(k-1)} + D^{-1}b
Gauss-Seidel relaxation (Successive Over-Relaxation Method) extends this iteration with a relaxation coefficient ω.
A value of ω less than 1 gives under-relaxation and a value greater than 1 gives over-relaxation; ω must be less than 2 for the algorithm to converge. The x vector and the y vector of the nodes can each be solved with the global solver algorithm described above.
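The two iterations can be sketched in plain Python as follows. This is a textbook illustration, not the solver of the invention: the function names, the fixed iteration count, and the small test system are assumptions. The test system is the x part of a one-dimensional toy placement with two movable nodes connected in a row between fixed pads at 0 and 3, all edge weights equal to 1, which gives 2*x1 - x2 = 0 and -x1 + 2*x2 = 3.

```python
def jacobi(A, b, iters=200):
    """Jacobi iteration: solve the i-th equation of Ax = b for x_i,
    using only values from the previous sweep."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x


def sor(A, b, omega=1.2, iters=200):
    """Successive over-relaxation: a Gauss-Seidel sweep blended with the
    previous value by the relaxation coefficient omega (0 < omega < 2)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i][i]
    return x


A = [[2.0, -1.0], [-1.0, 2.0]]
b = [0.0, 3.0]
x_jacobi = jacobi(A, b)
x_sor = sor(A, b)
```

Both iterations converge to x1 = 1 and x2 = 2, i.e. the two movable nodes are spread evenly between the pads. Setting `omega=1.0` in `sor` reduces it to plain Gauss-Seidel.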
Step 540, the carry chain is fixed;
Specifically, when the carry chain length is less than the partition granularity, i.e. when the carry chain length is 1 and can no longer be bisected, the global placement algorithm fixes the carry chain and ensures that fixed carry chains do not overlap one another. In this way the global placement algorithm guarantees that carry chains are pairwise non-overlapping, enlarges the solution space of the algorithm, and improves its performance.
It should be noted that the partitioning described above is illustrated with the case in which both vertical and horizontal carry chains are connected; in practical applications, a vertical carry chain or a horizontal carry chain may also be partitioned on its own.
The working of the carry skip chain in the present invention is described in detail below with Fig. 8 as an example. Fig. 8 is a structural diagram of the carry skip chain disclosed by the invention. As shown in Fig. 8,
the carry skip chain comprises three parts: the skip input part, the skip output part, and the carry chain between the LUT4Cs within an LE. Fig. 8 shows 4 LUT4Cs, which can be connected by a ripple carry chain, drawn as dashed lines. LEs are connected to one another by the carry skip chain, which makes possible fast carries across, for example, 4 bits or 8 bits. For two vertically adjacent LEs, the output of the skip output part of the lower LE can be connected directly to the input of the skip input part of the upper LE.
Like a conventional look-up table, the LUT4C produces an output signal, which is output through the multiplexer (mux_dy) at the corresponding port among dy[0], dy[1], dy[2], dy[3]. The LUT4C also passes the carry produced by addition and subtraction along the carry chain, and this carry is output through c4_out.
The role of the skip input part is to select a suitable carry input for the current LE. Its inputs fall into three groups. One group is the local carry inputs of the current LE, including ground Gnd, byp[4], and byp[16]. One group is the carry chain inputs from the LE below: c4_in, c_skip4_in, c_skip8_in. The third group is the outputs of the skip output part of the LE below: r4_in_b, p4_in_b, p8_in_b, which serve as the selection signals for the carry chain input of the current LE. The output of the skip input part is the carry chain input of the current LE. Under the control of the selection signals, the skip input part selects one signal, from the local carry inputs and the carry chain inputs from the LE below, as the carry input of the current LE.
The role of the skip output part is to provide the selection signals for the carry signal to the adjacent LE above (i.e. downstream on the carry chain). Its inputs fall into two groups: one group is the outputs of the 4 LUT4Cs, and the other is an input signal shared with the skip input part, p4_in_b. Its outputs serve as input signals to the skip input part of the LE above, namely r4_in_b, p4_in_b, p8_in_b.
When the outputs of all four LUT4Cs of the current LE are 1 during an addition, the skip output part produces a valid selection signal p4_in_b. When this selection signal is invalid, the skip input part of the LE above selects c4_out of the current LE as the carry input of that LE; when this selection signal is valid, the skip input part of the LE above selects c_skip4_out or c_skip8_out of the current LE as the carry input of that LE.
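The selection rule can be illustrated with a behavioral carry-skip adder. The sketch below is an assumption-laden model rather than the disclosed circuit: each LE is reduced to a 4-bit block, the LUT4C outputs are taken to be the propagate bits a XOR b, and the skip path simply forwards the block's carry input when all four propagate bits are 1. It models only the 4-bit skip, not the 8-bit skip, and the function names are hypothetical.

```python
def add_block(a4, b4, cin):
    """Ripple through the 4 LUT4Cs of one LE (assumed behavior).

    Returns (4-bit sum, ripple carry out, all_propagate), where
    all_propagate plays the role of a valid selection signal.
    """
    total, carry, all_propagate = 0, cin, True
    for i in range(4):
        ai = (a4 >> i) & 1
        bi = (b4 >> i) & 1
        p = ai ^ bi                     # LUT4C output for addition
        total |= (p ^ carry) << i
        carry = carry if p else ai      # ripple carry through this bit
        all_propagate = all_propagate and p == 1
    return total, carry, all_propagate


def carry_skip_add(a, b, n_le=2):
    """Add two (4 * n_le)-bit numbers, one LE per 4-bit block."""
    result, carry = 0, 0
    for k in range(n_le):
        a4 = (a >> (4 * k)) & 0xF
        b4 = (b >> (4 * k)) & 0xF
        cin = carry
        s, ripple_out, skip = add_block(a4, b4, cin)
        result |= s << (4 * k)
        # Selection valid: the next LE takes the skipped carry (cin).
        # Selection invalid: the next LE takes the ripple carry out.
        carry = cin if skip else ripple_out
    return result, carry
```

The skipped carry and the rippled carry always have the same value when every bit propagates, so the skip path does not change the result; its benefit in hardware is that the carry does not have to wait for the four-stage ripple inside the LE.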
Further, for the specific function of the carry skip chain, reference may also be made to the Chinese invention patent application entitled "A carry skip chain" filed by the present applicant, attorney docket number CP11397.
Those skilled in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, in computer software, or in a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of function. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. Skilled practitioners may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The specific embodiments described above further explain the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the foregoing describes only specific embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.