CN106682258B

CN106682258B - Multi-operand addition optimization method and system in high-level synthesis tool

Info

Publication number: CN106682258B
Application number: CN201611009866.4A
Authority: CN
Inventors: 王自鑫; 陈弟虎; 衣杨; 张晓强
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2016-11-16
Filing date: 2016-11-16
Publication date: 2020-04-24
Anticipated expiration: 2036-11-16
Also published as: CN106682258A

Abstract

The invention discloses a method and a system for optimizing multi-operand addition in a high-level synthesis tool. The method comprises the following steps: acquiring a high-level functional description of a circuit design, and from it the operations and operands contained in the design; analyzing the operations to determine whether 3 or more operands are added consecutively, and if so proceeding to the next step, otherwise ending; reading an optimization target from a user configuration file, building a compression tree according to that target, and saving the compression tree information; and generating synthesizable compression-tree HDL code from the saved compression tree information. The invention allows design space optimization of multi-operand addition to be carried out at the high-level synthesis stage according to the optimization target in the user configuration file, which helps generate better-performing multi-operand addition circuits and improves the performance of the high-level synthesis tool. The method and system can be widely applied in the field of computer and circuit design.

Description

Multi-operand addition optimization method and system in high-level synthesis tool

Technical Field

The invention relates to the field of computer and circuit design, in particular to a multi-operand addition optimization method and system in a high-level synthesis tool.

Background

In digital circuit design, multi-operand addition is widely used in digital signal processing, image and video processing, high-performance computing and other areas, and its speed and resource overhead often have an important influence on the quality of the circuit design.

High-level synthesis converts a high-level language directly into a hardware description language through compilation, scheduling, resource allocation and similar processes, which effectively improves design efficiency and saves design time. Efficient algorithms and hardware circuit design methods both help improve the performance of a high-level synthesis tool. Multi-operand addition can be implemented in hardware with a variety of architectures. However, conventional high-level synthesis systems usually implement multi-operand addition with full adders, half adders or conventional adder trees, and do not consider design space exploration and related optimization of multi-operand addition in depth. On the one hand, this causes a large carry propagation delay; on the other hand, the result does not map well onto the logic structure of the target platform, especially when the target platform is a Field Programmable Gate Array (FPGA). Therefore, when a hardware design generated automatically by a conventional high-level synthesis system contains large-scale multi-operand addition, it often has a large delay and occupies a large amount of hardware resources, which degrades the overall quality of the hardware design.

Disclosure of Invention

To solve the above technical problems, one object of the invention is to provide a high-performance multi-operand addition optimization method, based on generalized parallel counters, in a high-level synthesis tool.

Another object of the invention is to provide a high-performance multi-operand addition optimization system, based on generalized parallel counters, in a high-level synthesis tool.

The technical scheme adopted by the invention is as follows: a multi-operand addition optimization method in a high-level synthesis tool comprises the following steps:

A. acquiring high-level function description of a circuit design, and further acquiring operation and operand contained in the circuit design;

B. judging whether the operations obtained in step A include a consecutive addition of 3 or more operands; if so, loading an addition optimization processing unit and proceeding to step C to execute it; otherwise, ending;

C. reading optimized target data in a user configuration file, establishing a compression tree according to the optimized target data, and storing compression tree information;

D. generating synthesizable compressed tree HDL code according to the compressed tree information saved in step C.

Further, the step C specifically includes:

C1. reading the user configuration file to obtain the optimization target data, and sorting the generalized parallel counters by priority according to the optimization target;

C2. processing the operands with the priority-sorted generalized parallel counters, generating a compression tree and saving the compression tree information.

Further, in the step B, the operand is represented by a two-dimensional dot matrix diagram.

Further, in step C2, the compression tree is used to sum a plurality of numbers and output the sum, and the saved compression tree information includes the number of stages of the compression tree, the type and number of generalized parallel counters used at each stage, and the input and output information of the final adder.

Further, in the step C, the input of the compression tree is an operand of the multi-operand addition, the output of the compression tree is a sum of the operands of the multi-operand addition, and the function of the compression tree is the same as the addition function of the multi-operand addition.

The other technical scheme adopted by the invention is as follows: a system for multi-operand addition optimization in a high-level synthesis tool, the system comprising:

an acquisition unit, configured to acquire a high-level functional description of a circuit design and thereby obtain the operations and operands contained in the design;

a judging unit, configured to judge whether the operations obtained by the acquisition unit include a consecutive addition of 3 or more operands; if so, the addition optimization processing unit is loaded and executed; otherwise, processing ends;

an addition optimization processing unit, configured to read the optimization target data in the user configuration file, build a compression tree according to the optimization target data, and save the compression tree information;

and a code generating unit, configured to generate synthesizable compression-tree HDL code according to the compression tree information saved by the addition optimization processing unit.

Further, the addition optimization processing unit includes:

the sequencing module is used for reading the user configuration file, obtaining design optimization target data and carrying out priority sequencing on the generalized parallel counter according to the optimization target data;

and the generating module is used for processing the operands by using the generalized parallel counter subjected to priority sequencing in the sequencing module, generating a compression tree and storing compression tree information.

Furthermore, in the judging unit, the operand is represented by a two-dimensional dot matrix diagram.

Further, in the generating module, the compression tree is used to sum a plurality of numbers and output the sum, and the saved compression tree information includes the number of stages of the compression tree, the type and number of generalized parallel counters used at each stage, and the input and output information of the final adder.

Further, in the addition optimization processing unit, the input of the compression tree is the operand of the multi-operand addition, the output of the compression tree is the sum of the operands of the multi-operand addition, and the function of the compression tree is the same as the addition function of the multi-operand addition.

The invention has the beneficial effects that: by using the method, the design space optimization of the multi-operand addition can be carried out according to the optimization target in the user configuration file in the high-level synthesis stage, the generation of a multi-operand addition circuit with better performance is facilitated, and the improvement of the performance of a high-level synthesis tool is facilitated.

The invention has another beneficial effect that: by using the system of the invention, the design space optimization of the multi-operand addition can be carried out in high-level synthesis according to the optimization target in the user configuration file, which is beneficial to generating a multi-operand addition circuit with better performance and is beneficial to improving the performance of a high-level synthesis tool.

Drawings

The following further describes embodiments of the present invention with reference to the accompanying drawings:

FIG. 1 is a flow chart of the steps of the method of the present invention;

FIG. 2 is a flow chart of the steps of a particular embodiment of the method of the present invention;

FIG. 3 is a schematic diagram of an addition in an embodiment of the method of the present invention;

FIG. 4 is a two-dimensional lattice diagram of an embodiment of the method of the present invention;

FIG. 5 is a schematic illustration of a portion of a GPC bitmap of the method of the present invention;

FIG. 6 is a flow diagram of compressed tree generation in an embodiment of the method of the present invention;

FIG. 7 is a block diagram of the architecture of the system of the present invention;

FIG. 8 is a block diagram of the architecture of an embodiment of the system of the present invention.

Detailed Description

Referring to FIG. 1, a method for optimizing multi-operand addition in a high-level synthesis tool includes the following steps:

In this embodiment, five 4-bit unsigned numbers are added, so the operations and operands obtained are 4 additions and 5 operands.

In this embodiment, 5 operands are detected as being added consecutively, so step C is performed. The addition of the five 4-bit unsigned numbers is shown in FIG. 3, where a_ij denotes bit j of the i-th operand and s_k denotes the k-th bit of the addition result.
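The detection described above can be sketched as follows. This is a minimal illustration only: the expression-tree classes and the function names (`Add`, `flatten_add_chain`, `count_additions`, `needs_compression_tree`) are hypothetical and are not taken from the patent, which does not specify the tool's internal representation.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Add:
    """Binary addition node in a hypothetical expression tree."""
    left: "Expr"
    right: "Expr"


Expr = Union[Add, str]  # a leaf is simply an operand name


def flatten_add_chain(expr: Expr) -> List[str]:
    """Collect the leaf operands of a chain of nested additions."""
    if isinstance(expr, Add):
        return flatten_add_chain(expr.left) + flatten_add_chain(expr.right)
    return [expr]


def count_additions(expr: Expr) -> int:
    """Count the binary addition nodes in the tree."""
    if isinstance(expr, Add):
        return 1 + count_additions(expr.left) + count_additions(expr.right)
    return 0


def needs_compression_tree(expr: Expr, threshold: int = 3) -> bool:
    """Step B: true when 3 or more operands are added consecutively."""
    return len(flatten_add_chain(expr)) >= threshold


# s = a0 + a1 + a2 + a3 + a4, parsed left to right as nested binary additions
expr = Add(Add(Add(Add("a0", "a1"), "a2"), "a3"), "a4")
operands = flatten_add_chain(expr)
print(count_additions(expr), "additions,", len(operands), "operands")
print("optimize:", needs_compression_tree(expr))
```

For the five-operand sum of this embodiment the sketch finds 4 additions and 5 operands, so the addition would be handed to step C; a plain two-operand addition would not.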

Referring to fig. 2, as a further preferred embodiment, the step C specifically includes:

C1. reading the user configuration file to obtain the optimization target data, and sorting the Generalized Parallel Counters (GPCs for short) by priority according to the optimization target;

A specific GPC input-output relationship is illustrated by GPC (1,4,1,5; 5). It has 5 inputs with a weight of 0, 1 input with a weight of 1, 4 inputs with a weight of 2 and 1 input with a weight of 3, and its output is a 5-bit unsigned number R. When all inputs are 1:

R = 5×2⁰ + 1×2¹ + 4×2² + 1×2³ = (11111)₂ = (31)₁₀
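The weighted sum above can be checked with a short sketch. The helper names `gpc_max_output` and `gpc_output` are illustrative assumptions; the only facts taken from the text are the GPC notation (input counts listed from the highest weight down to weight 0) and the rule that an input of weight W contributes its bit times 2^W.

```python
from typing import Dict, Sequence


def gpc_max_output(counts_msb_first: Sequence[int]) -> int:
    """Largest value a GPC can produce, given its input counts
    written as in the notation (m_{k-1}, ..., m_1, m_0)."""
    k = len(counts_msb_first)
    return sum(m * (1 << (k - 1 - pos)) for pos, m in enumerate(counts_msb_first))


def gpc_output(inputs_by_weight: Dict[int, Sequence[int]]) -> int:
    """Behavioural model: every input bit of weight W contributes bit * 2**W."""
    return sum(sum(bits) << weight for weight, bits in inputs_by_weight.items())


# GPC (1,4,1,5; 5) with every input set to 1
all_ones = {0: [1] * 5, 1: [1], 2: [1] * 4, 3: [1]}
R = gpc_output(all_ones)
print(R, format(R, "b"), R.bit_length())
```

With all inputs at 1 the model returns the same maximum as `gpc_max_output((1, 4, 1, 5))`, and the result needs exactly 5 bits, which matches the 5 outputs in the GPC's name.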

Further as a preferred embodiment, the design optimization objective includes area optimization, timing optimization or timing area product optimization.

Hardware resources occupied by different GPCs in the FPGA and time delay from input to output of the GPCs are different, and the GPCs are prioritized according to different optimization targets by using different comparison criteria.

For example, in a Xilinx FPGA, GPC (2,6; 4) uses 3 LUTs, has a maximum input-to-output delay of 0.316 ns, and has a difference between its numbers of inputs and outputs of 2+6-4 = 4. GPC (6; 3) uses 2 LUTs, has a maximum input-to-output delay of 0.293 ns, and has a difference between its numbers of inputs and outputs of 6-3 = 3.

If the optimization objective is timing optimization, the ratio of the difference between the numbers of GPC inputs and outputs to the maximum input-to-output delay (denoted PD) is used as the ranking criterion. GPC (6; 3) has a PD value of 3/0.293 = 10.239 and GPC (2,6; 4) has a PD value of 4/0.316 = 12.658; since 12.658 > 10.239, GPC (2,6; 4) has higher priority than GPC (6; 3).

If the optimization objective is area optimization, the ratio of the difference between the numbers of GPC inputs and outputs to the resources the GPC uses (usually LUTs) (denoted AD) is used as the ranking criterion. GPC (6; 3) has an AD value of 3/2 = 1.5 and GPC (2,6; 4) has an AD value of 4/3 = 1.333; since 1.5 > 1.333, GPC (6; 3) has higher priority than GPC (2,6; 4).

If the optimization objective is timing-area product optimization, the product of PD and AD (denoted APD) is the ranking criterion. For example, the APD of GPC (6; 3) is 10.239 × 1.5 = 15.3585 and the APD of GPC (2,6; 4) is 12.658 × 1.333 = 16.8731; since 16.8731 > 15.3585, GPC (2,6; 4) has higher priority than GPC (6; 3).
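The three ranking criteria can be sketched as a sort key chosen by the optimization target. The LUT counts and delays are the figures quoted above; the class `Gpc`, the function `prioritize` and the target names `"timing"`, `"area"` and `"timing_area"` are assumptions made for illustration, since the format of the user configuration file is not given.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gpc:
    name: str
    inputs: int      # total number of input bits
    outputs: int     # number of output bits
    luts: int        # LUTs occupied on the target FPGA
    delay_ns: float  # maximum input-to-output delay

    @property
    def reduction(self) -> int:
        """Difference between the numbers of inputs and outputs."""
        return self.inputs - self.outputs

    def pd(self) -> float:
        return self.reduction / self.delay_ns

    def ad(self) -> float:
        return self.reduction / self.luts

    def apd(self) -> float:
        return self.pd() * self.ad()


# Hypothetical target names mapped to the criteria PD, AD and APD
CRITERIA = {"timing": Gpc.pd, "area": Gpc.ad, "timing_area": Gpc.apd}


def prioritize(gpcs: List[Gpc], target: str) -> List[Gpc]:
    """Sort GPCs from highest to lowest priority for the given target."""
    return sorted(gpcs, key=CRITERIA[target], reverse=True)


library = [
    Gpc("GPC(6;3)", inputs=6, outputs=3, luts=2, delay_ns=0.293),
    Gpc("GPC(2,6;4)", inputs=8, outputs=4, luts=3, delay_ns=0.316),
]

for target in CRITERIA:
    print(target, [g.name for g in prioritize(library, target)])
```

Keeping the criterion as a lookup table means a further optimization target only needs one more entry, without touching the sorting code.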

In this embodiment of the invention, the design optimization target is area optimization, and the ratio E of the difference between the numbers of GPC inputs and outputs to the resources used serves as the sorting criterion: the larger the ratio, the more inputs the GPC compresses per unit of resource. The GPCs used in this example are GPC (1,4,1,5; 5), GPC (4; 3) and GPC (3; 2), which occupy 4, 2 and 1 LUTs respectively and have input-output differences of 6, 1 and 1 respectively. Their E values are therefore 6/4 = 1.5, 1/2 = 0.5 and 1/1 = 1. Since 1.5 > 1 > 0.5, the three GPCs in order of priority from high to low are GPC (1,4,1,5; 5), GPC (3; 2) and GPC (4; 3).

In a further preferred embodiment, in the step B, the operands are represented by a two-dimensional dot matrix, as shown in fig. 4.

Fig. 4 is a two-dimensional lattice diagram corresponding to fig. 3, which abstracts the operands participating in the operation into a two-dimensional lattice, where each row represents an operand, each point represents a certain bit (value is 0 or 1) of the operand, the leftmost point is the most significant bit of the operand in the row, the rightmost point is the least significant bit of the operand in the row, and all points in any column represent the same weight.
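A small sketch of this abstraction, assuming unsigned operands of a common width. The function names and the sample operand values are illustrative and do not come from the patent. It builds one row per operand with the most significant bit on the left, then checks that weighting each column by its power of two recovers the ordinary sum.

```python
from typing import List


def dot_diagram(operands: List[int], width: int) -> List[List[int]]:
    """One row per operand; each row lists its bits from MSB (left) to LSB (right)."""
    return [[(x >> bit) & 1 for bit in range(width - 1, -1, -1)] for x in operands]


def weighted_sum(diagram: List[List[int]]) -> int:
    """Sum the dots column by column; every dot in a column shares one weight."""
    width = len(diagram[0])
    total = 0
    for col in range(width):
        weight = width - 1 - col          # leftmost column has the highest weight
        ones = sum(row[col] for row in diagram)
        total += ones << weight
    return total


operands = [9, 3, 15, 6, 12]              # five 4-bit unsigned sample values
diagram = dot_diagram(operands, width=4)
for row in diagram:
    print(row)
print(weighted_sum(diagram), sum(operands))
```

Because only the column a dot sits in matters, a compressor is free to regroup dots within a column, which is what the GPCs exploit.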

FIG. 5 lists the dot-diagram representations of several different GPCs. In this embodiment, the output dot diagram of the GPC network has at most 2 points per column, i.e., the output of the GPC network can be arranged into two new operands that are fed to the subsequent adder.

Further preferably, in step C1, a GPC is a circuit structure with an m-bit input and an n-bit output. Its function is to sum the number of 1s represented by all of its inputs and to output the result as an n-bit unsigned number. Each input has a weight, which determines how many 1s the input actually represents: if the value of an input is A (which can only be 0 or 1) and its weight is W, the input represents A × 2^W ones. A GPC is written as (m_{k-1}, m_{k-2}, …, m_1, m_0; n), where m_{k-1} > 0, i denotes the weight of an input, m_i denotes the number of inputs with weight i, k denotes the number of input weight positions (columns), n denotes the number of output bits, and having:
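The relation that followed this definition is not recoverable from this text. As a hedged reconstruction from the definitions alone (it is not necessarily the formula the patent printed), the total number of inputs m and the condition that n output bits can hold the largest possible result can be written as:

```latex
m = \sum_{i=0}^{k-1} m_i ,
\qquad
\sum_{i=0}^{k-1} m_i \, 2^{i} \;\le\; 2^{n} - 1
```

For GPC (1,4,1,5; 5) this gives m = 5 + 1 + 4 + 1 = 11 and 5 + 2 + 16 + 8 = 31 ≤ 2^5 - 1 = 31, so 5 output bits are exactly sufficient.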

GPCs can repeatedly compress the two-dimensional dot diagram abstracted from a plurality of operands until the required number of operands remains. Because different GPCs differ in the number of inputs they eliminate, the hardware resources they use and their input-to-output delay, the GPCs can be prioritized according to the design optimization objective, and compression uses the highest-priority GPCs as far as possible.

Further as a preferred embodiment, in the step C2, the compression tree is used to sum a plurality of numbers and output the sum, and the stored compression tree information includes the number of stages of the compression tree, the type and number of the generalized parallel counters used at each stage, and the input and output information of the final adder.

The compression tree in step C2 is a structure that sums a plurality of numbers and outputs the sum. It consists of two parts, a GPC network and an adder. The GPC network is divided into multiple stages (say N stages), and each stage can select different GPCs, according to the algorithm strategy, to compress the input of that stage. The input of stage 1 is the original input formed by the operands; for every other stage, the input consists of the outputs of all preceding stages that have not yet been consumed together with the part of the original input that has not yet been consumed. The N-stage GPC network thus compresses the original dot diagram into a dot diagram in which no column has more than the required number of points. Finally, the output dot diagram of the GPC network is fed to the adder, which produces the sum of the operands.
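The two-part structure (a staged GPC network followed by a final adder) can be illustrated with a bit-level sketch. To keep it short, this example assumes a library containing only GPC (3; 2), i.e. a full adder, and does not apply the priority-based selection described earlier; all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Columns = Dict[int, List[int]]  # weight -> bits currently sitting in that column


def to_columns(operands: List[int], width: int) -> Columns:
    """Spread unsigned operands into columns of equally weighted bits."""
    cols: Columns = {}
    for x in operands:
        for w in range(width):
            cols.setdefault(w, []).append((x >> w) & 1)
    return cols


def compress_stage(cols: Columns) -> Columns:
    """One stage: apply GPC (3; 2) to each group of 3 bits in every column."""
    nxt: Columns = {}
    for w, bits in cols.items():
        full = len(bits) // 3
        for g in range(full):
            total = sum(bits[3 * g:3 * g + 3])
            nxt.setdefault(w, []).append(total & 1)       # sum bit, same weight
            nxt.setdefault(w + 1, []).append(total >> 1)  # carry bit, next weight
        nxt.setdefault(w, []).extend(bits[3 * full:])     # unconsumed bits pass through
    return nxt


def compression_tree_sum(operands: List[int], width: int) -> Tuple[int, int]:
    """Compress until every column holds at most 2 bits, then run the final adder."""
    cols = to_columns(operands, width)
    stages = 0
    while max(len(bits) for bits in cols.values()) > 2:
        cols = compress_stage(cols)
        stages += 1
    row0 = sum(bits[0] << w for w, bits in cols.items() if len(bits) >= 1)
    row1 = sum(bits[1] << w for w, bits in cols.items() if len(bits) >= 2)
    return row0 + row1, stages  # final two-operand addition


total, stages = compression_tree_sum([9, 3, 15, 6, 12], width=4)
print(total, "after", stages, "stage(s)")
```

Each GPC (3; 2) replaces three bits by two while preserving their weighted value, so the loop terminates and the final two-operand addition returns the same result as adding the operands directly.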

Referring to FIG. 6, take the GPC network generated in this embodiment as an example. A rectangle with a solid border represents GPC (1,4,1,5; 5), and the dots joined by a solid line are its outputs; a rectangle with a dashed border represents GPC (4; 3), and the dots joined by a dashed line are its outputs; a rectangle with a dash-dot border represents GPC (3; 2), and the dots joined by a dash-dot line are its outputs. The GPC network in this example has 3 stages in total: the first stage uses 1 GPC (1,4,1,5; 5) and 2 GPC (4; 3), the second stage uses 2 GPC (3; 2), and the third stage uses 1 GPC (3; 2), as shown in FIG. 6. The output of the third stage is the input of the adder, which produces the result of the multi-operand addition.
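The stage-by-stage GPC counts given above can be checked by tracking only the number of dots per column. In the sketch below the GPC types and the per-stage counts come from the text, while the column on which each GPC is placed is an assumption (FIG. 6 is not reproduced here), chosen so that every GPC receives exactly the inputs it needs.

```python
from typing import Dict, List, Tuple

# name -> (inputs per weight, listed from weight 0 upward; number of output bits)
GPCS: Dict[str, Tuple[Tuple[int, ...], int]] = {
    "GPC(1,4,1,5;5)": ((5, 1, 4, 1), 5),
    "GPC(4;3)": ((4,), 3),
    "GPC(3;2)": ((3,), 2),
}


def apply_stage(heights: List[int], placements: List[Tuple[str, int]]) -> List[int]:
    """Apply one stage; each placement is (GPC name, lowest column it reads)."""
    remaining = list(heights)
    produced = [0] * len(heights)
    for name, base in placements:
        inputs_lsb_first, n_out = GPCS[name]
        for offset, m in enumerate(inputs_lsb_first):
            if remaining[base + offset] < m:
                raise ValueError(f"{name} at column {base} lacks inputs")
            remaining[base + offset] -= m
        for offset in range(n_out):
            produced[base + offset] += 1  # one output dot per output weight
    return [r + p for r, p in zip(remaining, produced)]


# Five 4-bit operands: columns 0..3 hold 5 dots each; columns 4..6 start empty.
stages = [
    [("GPC(1,4,1,5;5)", 0), ("GPC(4;3)", 1), ("GPC(4;3)", 3)],  # assumed placement
    [("GPC(3;2)", 2), ("GPC(3;2)", 3)],                          # assumed placement
    [("GPC(3;2)", 4)],                                           # assumed placement
]
trace = [[5, 5, 5, 5, 0, 0, 0]]
for placements in stages:
    trace.append(apply_stage(trace[-1], placements))
for heights in trace:
    print(heights)
```

Under this assumed placement, no column holds more than 2 dots after the third stage, which is the condition for handing the result to the final two-operand adder.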

In a further preferred embodiment, in the step C, the input of the compression tree is an operand of a multi-operand addition, the output of the compression tree is a sum of operands of the multi-operand addition, and the function of the compression tree is the same as the addition function of the multi-operand addition.

Referring to fig. 7, a system for multi-operand addition optimization in a high-level synthesis tool, the system comprising:

Referring to fig. 8, further as a preferred embodiment, the addition optimization processing unit includes:

In a further preferred embodiment, the judgment unit represents the operand in a two-dimensional dot matrix.

Further preferably, in the generating module, the compression tree is configured to sum a plurality of numbers and output the sum, and the stored compression tree information includes the number of stages of the compression tree, the type and number of the generalized parallel counters used at each stage, and input and output information of the final adder.

In a further preferred embodiment, in the addition optimization processing unit, the input of the compression tree is an operand of multi-operand addition, the output of the compression tree is a sum of operands of multi-operand addition, and the function of the compression tree is the same as the addition function of the multi-operand addition.

While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A multi-operand addition optimization method in a high-level synthesis tool, characterized in that the method comprises the following steps:

B. judging whether the operations obtained in step A include a consecutive addition of 3 or more operands; if so, loading an addition optimization processing unit and proceeding to step C to execute it; otherwise, ending;

D. generating synthesizable compression-tree HDL code according to the compression tree information saved in step C;

the compression tree is used for summing a plurality of numbers and taking the sum as output, and comprises a GPC network and an adder, wherein the GPC network is divided into a plurality of stages, each stage can select different GPCs according to algorithm strategies to compress the input of the stage, and the output of the GPC network is taken as the input of the adder to be summed to obtain the sum of a plurality of operands, and the GPC is a generalized parallel counter.

2. The method of claim 1, wherein step C specifically comprises:

3. The method of claim 1 or 2, wherein in step B the operands are represented by a two-dimensional dot matrix diagram.

4. The method of claim 2, wherein in step C2 the saved compression tree information includes the number of stages of the compression tree, the type and number of generalized parallel counters used at each stage, and the input and output information of the final adder.

5. The method of claim 1 or 2, wherein in step C the input of the compression tree is the operands of the multi-operand addition, the output of the compression tree is the sum of the operands of the multi-operand addition, and the function of the compression tree is the same as the addition function of the multi-operand addition.

6. A multi-operand addition optimization system in a high-level synthesis tool, characterized in that the system comprises: an acquisition unit, configured to acquire a high-level functional description of a circuit design and thereby obtain the operations and operands contained in the design;

a judging unit, configured to judge whether the operations obtained by the acquisition unit include a consecutive addition of 3 or more operands; if so, the addition optimization processing unit is loaded and executed; otherwise, processing ends;

a code generating unit for generating a synthesizable compressed tree HDL code based on the compressed tree information held by the addition optimization processing unit;

7. The system of claim 6, wherein the addition optimization processing unit includes:

8. The system for multi-operand addition optimization in a high-level synthesis tool according to claim 6 or 7, wherein the judging unit represents the operands by a two-dimensional dot matrix diagram.

9. The system for multi-operand addition optimization in a high-level synthesis tool according to claim 6 or 7, wherein in the generating module the saved compression tree information includes the number of stages of the compression tree, the type and number of generalized parallel counters used at each stage, and the input and output information of the final adder.

10. The system for multi-operand addition optimization in a high-level synthesis tool according to claim 6 or 7, wherein: in the addition optimization processing unit, the input of the compression tree is the operand of the multi-operand addition, the output of the compression tree is the sum of the operands of the multi-operand addition, and the function of the compression tree is the same as the addition function of the multi-operand addition.