CN108255463B

CN108255463B - Digital logic operation method, circuit and FPGA chip

Info

Publication number: CN108255463B
Application number: CN201711464809.XA
Authority: CN
Inventors: 蒲迪锋
Original assignee: Shenzhen Ziguang Tongchuang Electronics Co ltd
Current assignee: Shenzhen Ziguang Tongchuang Electronics Co ltd
Priority date: 2017-12-28
Filing date: 2017-12-28
Publication date: 2020-12-22
Anticipated expiration: 2037-12-28
Also published as: CN108255463A

Abstract

The invention provides a digital logic operation method, a digital logic operation circuit and an FPGA chip, in which a first pipeline register is arranged inside the multiplication unit to reduce the delay between adjacent registers. The input unit comprises a plurality of input registers for respectively receiving input data; the multiplication unit comprises an encoder, a Wallace tree structure module and an adder connected in sequence, a first pipeline register is arranged between any two devices in the multiplication unit, and a second pipeline register is arranged after the adder; the output unit comprises an accumulator and an output register connected in sequence, and the output of the accumulator and the output of the multiplication unit are used as the input of the accumulator. By implementing the invention, the delay between the first pipeline register and the second pipeline register is reduced and the maximum delay of the digital logic operation circuit as a whole is reduced, so that the system operation period is shortened and the operation rate of the system is improved.

Description

Digital logic operation method, circuit and FPGA chip

Technical Field

The invention relates to the technical field of FPGA devices, in particular to a digital logic operation method, a digital logic operation circuit and an FPGA chip.

Background

The digital logic operation unit, i.e. the DSP, is an important component of an FPGA chip and is widely used in digital systems; it is indispensable for high-speed computation on the FPGA, especially signal processing. Because the DSP is an indispensable component in high-performance microprocessors, digital signal processors, graphics and image systems, scientific computing, and some dedicated data processing devices, its performance is often the bottleneck of system performance. The multiplier is an important digital module in the DSP. At present, essentially all multipliers use a Booth structure and consist of an encoder, a Wallace tree structure and an adder. Referring to FIG. 1, which shows a block diagram of the circuit structure of a digital logic operation unit in the prior art, the unit includes three input registers Input_reg, a pre-adder Pre_adder, a first pipeline register Pipe_adder1, a multiplier, a second pipeline register Pipe_adder2, an accumulator Post_adder and an output register Output_reg. The multiplier occupies a very important position in the DSP: the time the multiplier takes to complete one multiplication essentially determines the operation period of the DSP, and the maximum operating frequency of a basic logic operation unit is essentially limited by the operation speed of the multiplier. Referring to FIG. 2, which shows the maximum delay of the circuit structure of FIG. 1, the largest register-to-register delay lies between the first pipeline register and the second pipeline register. It amounts to 9 unit durations, made up of 1 unit duration for the encoder Mult_encoder, 2 unit durations for the Wallace tree structure module Mult_tree, and 6 unit durations for the adder Mult_adder. The operation period of the whole logic operation unit is therefore 9 unit durations, which is long and limits the operation rate of the system.
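As a worked check of the prior-art figure, the following assumes that the register-to-register delay is simply the sum of the delays of the combinational blocks chained between the two pipeline registers. The symbols $d_{\text{enc}}$, $d_{\text{tree}}$, $d_{\text{add}}$ and $T_{\text{prior}}$ are introduced here for illustration only and do not appear in the original text.

$$
T_{\text{prior}} = d_{\text{enc}} + d_{\text{tree}} + d_{\text{add}} = 1 + 2 + 6 = 9 \ \text{unit durations}
$$

Under this assumption the clock period cannot be shorter than $T_{\text{prior}}$, which is why the multiplier stage sets the operation period of the whole unit.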

Disclosure of Invention

The invention provides a digital logic operation method, a digital logic operation circuit and an FPGA chip, and aims to solve the problems of large time delay, long period and low system operation rate in a digital logic operation circuit in the prior art.

In order to solve the above technical problem, the present invention provides a digital logic operation circuit, which comprises an input unit, a multiplication unit and an output unit, which are connected in sequence;

the input unit comprises a plurality of input registers for respectively receiving input data;

the multiplication unit comprises an encoder, a Wallace tree structure module and an adder which are connected in sequence, a first pipeline register is arranged between any two devices in the multiplication unit, and a second pipeline register is arranged behind the adder;

the output unit comprises an accumulator and an output register which are connected in sequence, and the output of the accumulator and the output of the multiplication unit are used as the input of the accumulator.

Optionally, the input unit further includes at least one pre-adder, an input of the pre-adder is connected to at least two of the input registers, and an output of the pre-adder is used as an input of the multiplication unit.

Optionally, the input registers include three, where the outputs of two input registers are used as the inputs of the pre-adder, and the output of the other input register is directly used as the input of the multiplication unit.

Optionally, the first pipeline register is disposed between the Wallace tree structure module and the adder.

The invention also provides an FPGA chip which is characterized by comprising the digital logic operation circuit.

The invention also provides a digital logic operation method, which comprises the following steps:

an input register in the input unit receives input data;

the input data is input into a multiplication unit, and multiplication operation is carried out sequentially through an encoder, a Wallace tree structure module and an adder in the multiplication unit to obtain operation data; a first pipeline register is arranged between any two devices in the multiplication unit, and a second pipeline register is arranged behind the adder;

the operation data is input into the output unit, the output unit comprises an accumulator and an output register which are sequentially connected, and the output of the accumulator and the output of the multiplication unit are used as the input of the accumulator.

Optionally, when there are three input data, two of them are fed through two input registers into the pre-adder, whose output serves as an input of the multiplication unit; the other input data is directly used as an input of the multiplication unit;

when the input data comprises two data, one of the input data passes through the input register and the pre-adder once to be used as the input of the multiplication unit; the other input data is directly used as the input of the multiplication unit.

The invention has the beneficial effects that:

The invention provides a digital logic operation method, a digital logic operation circuit and an FPGA chip, in which a first pipeline register is arranged inside the multiplication unit to reduce the delay between adjacent registers. Specifically, the digital logic operation circuit comprises an input unit, a multiplication unit and an output unit connected in sequence. The input unit comprises a plurality of input registers for respectively receiving input data; the multiplication unit comprises an encoder, a Wallace tree structure module and an adder connected in sequence, the first pipeline register is arranged between any two devices in the multiplication unit, and a second pipeline register is arranged after the adder; the output unit comprises an accumulator and an output register connected in sequence, and the output of the accumulator and the output of the multiplication unit are used as the input of the accumulator. By implementing the invention, the delay between the first pipeline register and the second pipeline register is reduced and the maximum delay of the digital logic operation circuit as a whole is reduced, so that the system operation period is shortened and the operation rate of the system is improved.

Drawings

FIG. 1 is a block diagram of a digital logic unit in the prior art;

FIG. 2 is a schematic diagram of the maximum delay of the circuit structure of the digital logic operation unit in FIG. 1;

FIG. 3 is a schematic diagram of a digital logic circuit according to a first embodiment of the present invention;

FIG. 4 is a diagram illustrating a maximum delay of a digital logic operation circuit according to a first embodiment of the present invention;

FIG. 5 is a flowchart of a digital logic operation method according to a second embodiment of the present invention.

Detailed Description

The embodiments of the present invention will be described in detail with reference to the accompanying drawings; it should be noted that the contents of the embodiments are only for explaining the present invention, and do not limit the present invention.

First embodiment

Referring to fig. 3, fig. 3 is a schematic circuit diagram of a digital logic operation circuit according to a first embodiment of the present invention, which includes an input unit 31, a multiplication unit 32, and an output unit 33 connected in sequence;

the input unit 31 includes a plurality of input registers 311 for respectively receiving input data;

the multiplication unit 32 comprises an encoder 321, a Wallace tree structure module 322 and an adder 323 which are connected in sequence, wherein a first pipeline register 324 is arranged between any two devices in the multiplication unit 32, and a second pipeline register 325 is arranged behind the adder 323;

the output unit 33 includes an accumulator 331 and an output register 332 connected in this order, and an output of the accumulator 331 and an output of the multiplication unit 32 serve as inputs of the accumulator 331.

The digital logic operation circuit in this embodiment roughly includes three parts, an input unit 31, a multiplication unit 32, and an output unit 33, where the input unit 31 receives input data, and the multiplication unit 32 receives the input data, processes the input data, and outputs the processed input data to the output unit 33.

The input unit 31 includes a plurality of input registers 311, which receive input data supplied from outside; each input register 311 receives one input datum. The input unit 31 may further comprise at least one pre-adder 312, whose inputs are connected to at least two input registers 311 and whose output serves as an input of the multiplication unit 32. That is, the outputs of at least two input registers 311 are the inputs of the pre-adder 312. The pre-adder 312 preprocesses the input data so that, when there are multiple input data, the subsequent multiplication unit 32 can perform the multiplication more conveniently.

Optionally, in this embodiment, the input registers 311 may include three input registers 311, and when there are three input registers 311, the outputs of two of the input registers 311 are used as the inputs of the pre-adder 312, and the output of the other input register 311 is directly used as the input of the multiplication unit 32.

The output unit 33 comprises an accumulator 331 and an output register 332 connected in sequence, wherein the input of the accumulator 331 comprises the output of the multiplication unit 32 and the output of the output register 332, and the output of the accumulator 331 is the input of the output register 332. The accumulator 331 is a register for storing intermediate results generated by the calculation.
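The data flow through the input unit and the output unit can be sketched behaviourally. This is a minimal functional model, not the circuit itself: it assumes three non-negative integer inputs, ignores pipeline latency, and treats the accumulator as an addition of the multiplier output and the fed-back output register value. The class name `MacModel` and its method names are hypothetical.

```python
class MacModel:
    """Functional model of pre-adder -> multiplier -> accumulator -> output register.

    Pipeline registers and their latency are deliberately left out; only the
    arithmetic relationship between the blocks is modelled.
    """

    def __init__(self):
        self.output_reg = 0  # contents of the output register, initially cleared

    def step(self, a, b, c):
        pre_sum = a + b                        # pre-adder combines two input registers
        product = pre_sum * c                  # multiplication unit
        accumulated = self.output_reg + product  # accumulator: product + fed-back output
        self.output_reg = accumulated          # output register captures the accumulator
        return self.output_reg


mac = MacModel()
results = [mac.step(a, b, c) for a, b, c in [(1, 2, 3), (0, 4, 5), (2, 2, 2)]]
print(results)  # [9, 29, 37]
```

Each call adds one new product to the running total, which is the feedback from the output register to the accumulator described above.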

In this embodiment, the multiplication unit 32, also called the multiplier, is built on the structure of the adder 323, and is already an essential part of modern computers. The multiplier is modelled on the "shift and add" algorithm, in which each bit of the multiplier produces one partial product. The first partial product is generated by the LSB of the multiplier, the second by the second bit of the multiplier, and so on. If the corresponding multiplier bit is 1, the partial product is the value of the multiplicand; if it is 0, the partial product is all 0s. Each partial product is shifted one bit to the left relative to the previous one.
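The shift-and-add model can be illustrated in software. The sketch below assumes non-negative integer operands and uses hypothetical function names (`partial_products`, `shift_and_add`); it shows the arithmetic only and says nothing about how the encoder 321 is actually built.

```python
def partial_products(multiplicand, multiplier):
    """Return one partial product per multiplier bit, LSB first.

    A multiplier bit of 1 contributes the multiplicand shifted left by the
    bit position; a multiplier bit of 0 contributes 0.
    """
    products = []
    bit_index = 0
    while multiplier >> bit_index:
        bit = (multiplier >> bit_index) & 1
        products.append((multiplicand << bit_index) if bit else 0)
        bit_index += 1
    return products


def shift_and_add(multiplicand, multiplier):
    """Multiply two non-negative integers by summing the partial products."""
    if multiplicand < 0 or multiplier < 0:
        raise ValueError("this sketch handles non-negative operands only")
    return sum(partial_products(multiplicand, multiplier))


print(partial_products(13, 11))  # [13, 26, 0, 104]
print(shift_and_add(13, 11))     # 143
```

The multiplier 11 is 1011 in binary, so bits 0, 1 and 3 each contribute a shifted copy of 13 and bit 2 contributes 0.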

The multiplier can be represented in a more general way. Each input, the number of partial products, and the result are assigned a logical name, and these names are used as signal names in the schematic of the circuit.

The multiplication unit 32 comprises an encoder 321, a Wallace tree structure module 322 and an adder 323 connected in sequence. The Wallace tree structure module 322 uses the Wallace tree algorithm, a tree algorithm for reducing partial products that shortens the delay of the multiplier as far as possible. The process is as follows. In the first step, the partial products in each column are grouped three bits at a time, and each group is passed through a carry-save adder (CSA) component built from full adders, which reduces the number of addends. In the second step, the pseudo-sum and local carry signals of equal weight produced by the first step are again grouped three bits at a time and passed through CSA components, further reducing the number of addends; this is repeated until only two outputs remain. The final pseudo sum and local carry are then added by the carry-propagate adder 323 to obtain the true result. In this approach, the pseudo-sum operations in all columns are performed in parallel. Because a full adder is used as the adding component, three input signals of weight $2^0$ are processed at a time to produce one local carry signal of weight $2^1$ and one pseudo-sum signal of weight $2^0$, so the number of operands is reduced by 1/3 at each level. The intermediate pseudo sums are processed in the same way, and the final pseudo sum and local carry signals are obtained after $O(\log_{3/2} N)$ time.
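The reduction described above can be sketched with word-level carry-save addition. This is a simplification under stated assumptions: operands are non-negative integers, partial products are generated by plain AND-style selection rather than Booth encoding, and three whole rows are compressed at once, which is equivalent to applying a full adder in every column in parallel. All function names are hypothetical.

```python
def carry_save_add(x, y, z):
    """3:2 compression of three rows into (pseudo sum, local carry).

    Per bit position this is a full adder: the sum bit keeps its weight and
    the carry bit moves one position to the left.
    """
    pseudo_sum = x ^ y ^ z
    local_carry = ((x & y) | (x & z) | (y & z)) << 1
    return pseudo_sum, local_carry


def wallace_reduce(addends):
    """Reduce a list of rows to at most two rows; also return the level count."""
    rows = list(addends)
    levels = 0
    while len(rows) > 2:
        next_rows = []
        full_groups = len(rows) // 3
        for g in range(full_groups):
            pseudo_sum, local_carry = carry_save_add(*rows[3 * g:3 * g + 3])
            next_rows += [pseudo_sum, local_carry]
        next_rows += rows[3 * full_groups:]  # rows without a full group pass through
        rows = next_rows
        levels += 1
    return rows, levels


def wallace_multiply(a, b, width=8):
    """Multiply using partial products, CSA reduction, and one final addition."""
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    rows, levels = wallace_reduce(partials)
    return sum(rows), levels  # sum() stands in for the carry-propagate adder


print(wallace_multiply(13, 11))  # (143, 4)
```

With eight partial products the row count goes 8, 6, 4, 3, 2, so four CSA levels are needed before the single carry-propagate addition.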

The adder 323 is a device that produces the sum of numbers. A device that takes the addend and the augend as inputs and produces the sum and the carry as outputs is a half adder. A device that takes the addend, the augend and the carry from the lower-order bit as inputs and produces the sum and the carry as outputs is a full adder. Adders are commonly used in a computer's arithmetic logic unit to perform logic operations, shifts and instruction calls. The adder 323 also forms the basis of the multiplier and is arranged after the Wallace tree structure module 322.
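The half adder and full adder definitions translate directly into bit-level logic. The sketch below chains full adders into a ripple-carry adder purely as one simple example of a carry-propagate adder; the text does not state which adder architecture the adder 323 uses, and the function names are hypothetical.

```python
def half_adder(a, b):
    """Two input bits -> (sum bit, carry bit)."""
    return a ^ b, a & b


def full_adder(a, b, carry_in):
    """Two input bits plus the carry from the lower-order bit -> (sum, carry)."""
    partial_sum, carry_a = half_adder(a, b)
    total_sum, carry_b = half_adder(partial_sum, carry_in)
    return total_sum, carry_a | carry_b


def ripple_carry_add(x, y, width):
    """Add two width-bit non-negative integers; return (result, carry out)."""
    carry = 0
    result = 0
    for i in range(width):
        bit_sum, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit_sum << i
    return result, carry


print(ripple_carry_add(13, 11, 5))  # (24, 0)
print(ripple_carry_add(13, 11, 4))  # (8, 1)
```

In the 4-bit case the true sum 24 does not fit, so the result wraps to 8 and the carry out is 1.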

In this embodiment, a first pipeline register 324 is further disposed between the Wallace tree structure module 322 and the adder 323. Registers are high-speed storage elements of limited capacity that can temporarily hold instructions, data and addresses, and their presence aligns incoming data for subsequent operations. The registers in each unit of this embodiment, such as the input register 311, the first pipeline register 324, the second pipeline register 325 and the output register 332, all function similarly. The first pipeline register 324 is disposed inside the multiplication unit 32 between any two devices, for example between the encoder 321 and the Wallace tree structure module 322, or between the Wallace tree structure module 322 and the adder 323. Preferably, in this embodiment, the first pipeline register 324 is disposed between the Wallace tree structure module 322 and the adder 323; it receives the output of the Wallace tree structure module 322 and passes it to the adder 323. Referring to FIG. 4, which shows the maximum delay of the digital logic operation circuit provided in this embodiment: the maximum delay between the input register 311 and the first pipeline register 324 is given as 6 unit durations, formed by combining 2 unit durations for the pre-adder 312, 1 unit duration for the encoder 321, and 2 unit durations for the Wallace tree structure module 322; the maximum delay between the first pipeline register 324 and the second pipeline register 325 is the 6 unit durations of the adder 323; and the maximum delay between the second pipeline register 325 and the output register 332 is the 6 unit durations of the accumulator 331. The maximum delay between any two adjacent registers is therefore 6 unit durations, and so is the maximum delay of the whole digital logic operation circuit. Compared with the prior art, the delay is greatly shortened, the operation period is reduced, and the operation speed of the digital logic operation circuit is improved.
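The effect of where the first pipeline register is placed can be compared with a short calculation. The sketch assumes that a stage delay is the plain sum of the block delays inside the stage, and it covers only the three blocks of the multiplication unit with the unit delays quoted in the text (encoder 1, Wallace tree 2, adder 6). The pre-adder is left out: adding its stated 2 units to the first stage gives 5 units by direct summation, which stays within the 6-unit bound and does not change which placement is best. The names `BLOCK_DELAYS` and `max_stage_delay` are hypothetical.

```python
# (block name, delay in unit durations), in data-flow order
BLOCK_DELAYS = [("encoder", 1), ("wallace_tree", 2), ("adder", 6)]


def max_stage_delay(blocks, cut_after):
    """Largest stage delay when one pipeline register follows block `cut_after`.

    `cut_after` is an index into `blocks`; None means no internal register,
    so all blocks form a single stage.
    """
    delays = [delay for _, delay in blocks]
    if cut_after is None:
        return sum(delays)
    first_stage = sum(delays[:cut_after + 1])
    second_stage = sum(delays[cut_after + 1:])
    return max(first_stage, second_stage)


no_register = max_stage_delay(BLOCK_DELAYS, None)  # prior art: 1 + 2 + 6
after_encoder = max_stage_delay(BLOCK_DELAYS, 0)   # stages of 1 and 2 + 6
after_tree = max_stage_delay(BLOCK_DELAYS, 1)      # stages of 1 + 2 and 6
print(no_register, after_encoder, after_tree)      # 9 8 6
```

Under these assumptions, placing the register between the Wallace tree and the adder isolates the slowest block in its own stage, which matches the preferred placement in this embodiment.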

In addition, the embodiment further provides an FPGA chip, which specifically includes the digital logic operation circuit described in the embodiment.

Second embodiment

Referring to fig. 5, fig. 5 is a flowchart of a digital logic operation method according to a second embodiment of the present invention, including:

S501, an input register in the input unit receives input data;

S502, the input data is input into a multiplication unit, and a multiplication operation is carried out sequentially through an encoder, a Wallace tree structure module and an adder in the multiplication unit to obtain operation data; a first pipeline register is arranged between any two devices in the multiplication unit, and a second pipeline register is arranged behind the adder;

S503, the operation data is input into the output unit, wherein the output unit comprises an accumulator and an output register which are connected in sequence, and the output of the accumulator and the output of the multiplication unit are used as the input of the accumulator.
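Steps S501 to S503 can be sketched as a cycle-level software model in which every register updates once per clock from the values held before the clock edge. Several things here are assumptions made for illustration: the class and function names are hypothetical, operands are non-negative integers, the first pipeline register sits between the Wallace tree and the adder, partial products are generated by simple bit selection rather than Booth encoding, and the reduction is done sequentially rather than as a tree.

```python
def encode_and_reduce(x, y, width=16):
    """Stand-in for encoder + Wallace tree: return two rows that sum to x * y."""
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(width)]
    while len(rows) > 2:
        a, b, c = rows[:3]
        # carry-save step: three rows in, pseudo sum and local carry out
        rows = rows[3:] + [a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1]
    return tuple(rows)


class PipelinedMac:
    """Input registers -> pipe1 -> pipe2 -> output register, one hop per clock."""

    def __init__(self):
        self.input_regs = None  # (a, b, c) captured in S501
        self.pipe1 = None       # (pseudo sum, local carry) after the Wallace tree
        self.pipe2 = None       # product after the adder
        self.output_reg = 0     # accumulated result

    def clock(self, new_inputs=None):
        # Compute every next value from the current register contents first,
        # so all registers appear to update on the same clock edge.
        next_output = self.output_reg + (self.pipe2 if self.pipe2 is not None else 0)
        next_pipe2 = sum(self.pipe1) if self.pipe1 is not None else None
        if self.input_regs is not None:
            a, b, c = self.input_regs
            next_pipe1 = encode_and_reduce(a + b, c)  # pre-adder feeds the multiplier
        else:
            next_pipe1 = None
        self.input_regs = new_inputs
        self.pipe1, self.pipe2, self.output_reg = next_pipe1, next_pipe2, next_output
        return self.output_reg


mac = PipelinedMac()
stimulus = [(1, 2, 3), (0, 4, 5), (2, 2, 2), None, None, None]
outputs = [mac.clock(item) for item in stimulus]
print(outputs)  # [0, 0, 0, 9, 29, 37]
```

The first result appears on the fourth clock because the data crosses four registers, after which one accumulated result is produced per clock.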

The digital logic operation circuit in this embodiment roughly includes three parts, an input unit, a multiplication unit, and an output unit, where the input unit is to receive input data, and the multiplication unit is to receive the input data, process the input data, and then output the processed input data to the output unit.

The input unit comprises a plurality of input registers, which receive input data supplied from outside; each input register receives one input datum. The input unit may further comprise at least one pre-adder, whose inputs are connected to at least two input registers and whose output serves as an input of the multiplication unit. That is, the outputs of at least two input registers serve as the inputs of the pre-adder. The pre-adder preprocesses the input data so that, when there are multiple input data, the subsequent multiplication unit can perform the multiplication more conveniently.

Optionally, in this embodiment, there may be three input registers; in that case the outputs of two of them serve as the inputs of the pre-adder, and the output of the third serves directly as an input of the multiplication unit. Depending on the input data, the method specifically includes:

when the input data comprises three data, two input data are respectively input into the pre-adder through two input registers to serve as the input of the multiplication unit; the other input data is directly used as the input of the multiplication unit;

when the input data comprises two data, one of the data passes through the input register and the pre-adder once and is used as the input of the multiplication unit; the other input data is directly used as the input of the multiplication unit.

The output unit comprises an accumulator and an output register which are connected in sequence, wherein the input of the accumulator comprises the output of the multiplication unit and the output of the output register, and the output of the accumulator is the input of the output register. An accumulator is a register used to store intermediate results generated by computations.

In this embodiment, the multiplication unit, also called the multiplier, is built on an adder structure, and is already an essential part of modern computers. The multiplier is modelled on the "shift and add" algorithm, in which each bit of the multiplier produces one partial product. The first partial product is generated by the LSB of the multiplier, the second by the second bit of the multiplier, and so on. If the corresponding multiplier bit is 1, the partial product is the value of the multiplicand; if it is 0, the partial product is all 0s. Each partial product is shifted one bit to the left relative to the previous one.

The multiplication unit comprises an encoder, a Wallace tree structure module and an adder connected in sequence. The Wallace tree structure module uses the Wallace tree algorithm, a tree algorithm for reducing partial products that shortens the delay of the multiplier as far as possible. The process is as follows. In the first step, the partial products in each column are grouped three bits at a time, and each group is passed through a carry-save adder (CSA) component built from full adders, which reduces the number of addends. In the second step, the pseudo-sum and local carry signals of equal weight produced by the first step are again grouped three bits at a time and passed through CSA components, further reducing the number of addends; this is repeated until only two outputs remain. The final pseudo sum and local carry are then added by a carry-propagate adder to obtain the true result. In this approach, the pseudo-sum operations in all columns are performed in parallel. Because a full adder is used as the adding component, three input signals of weight $2^0$ are processed at a time to produce one local carry signal of weight $2^1$ and one pseudo-sum signal of weight $2^0$, so the number of operands is reduced by 1/3 at each level. The intermediate pseudo sums are processed in the same way, and the final pseudo sum and local carry signals are obtained after $O(\log_{3/2} N)$ time.

An adder is a device that produces the sum of numbers. A device that takes the addend and the augend as inputs and produces the sum and the carry as outputs is a half adder. A device that takes the addend, the augend and the carry from the lower-order bit as inputs and produces the sum and the carry as outputs is a full adder. Adders are commonly used in a computer's arithmetic logic unit to perform logic operations, shifts and instruction calls. The adder also forms the basis of the multiplier and is arranged after the Wallace tree structure module.

In this embodiment, a first pipeline register is further disposed between the Wallace tree structure module and the adder. Registers are high-speed storage elements of limited capacity that can temporarily hold instructions, data and addresses, and their presence aligns incoming data for subsequent operations. The registers in each unit of this embodiment, such as the input register, the first pipeline register, the second pipeline register and the output register, all function similarly. The first pipeline register is disposed inside the multiplication unit between any two devices, for example between the encoder and the Wallace tree structure module, or between the Wallace tree structure module and the adder. Preferably, in this embodiment, the first pipeline register is disposed between the Wallace tree structure module and the adder; it receives the output of the Wallace tree structure module and passes it to the adder. Referring to FIG. 4, which shows the maximum delay of the digital logic operation circuit provided in this embodiment: the maximum delay between the input register and the first pipeline register is given as 6 unit durations, formed by combining 2 unit durations for the pre-adder, 1 unit duration for the encoder, and 2 unit durations for the Wallace tree structure module; the maximum delay between the first pipeline register and the second pipeline register is the 6 unit durations of the adder; and the maximum delay between the second pipeline register and the output register is the 6 unit durations of the accumulator. The maximum delay between any two adjacent registers is therefore 6 unit durations, and so is the maximum delay of the whole digital logic operation circuit. Compared with the prior art, the delay is greatly shortened, the operation period is reduced, and the operation speed of the digital logic operation circuit is improved.

It will be apparent to those skilled in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented in program code executable by a computing device, such that they may be stored on a storage medium (ROM/RAM, magnetic disk, optical disk) and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The foregoing is a more detailed description of the present invention that is presented in conjunction with specific embodiments, and the practice of the invention is not to be considered limited to those descriptions. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims

1. A digital logic operation circuit is characterized by comprising an input unit, a multiplication unit and an output unit which are connected in sequence;

2. The digital logic operation circuit of claim 1, wherein the input unit further comprises at least one pre-adder, the inputs of the pre-adder being connected to at least two of the input registers, the output of the pre-adder being the input of the multiplication unit.

3. The digital logic operation circuit of claim 2, wherein the input registers comprise three, wherein the outputs of two of the input registers are inputs to the pre-adder and the output of the other of the input registers is directly input to the multiplication unit.

4. The digital logic operation circuit of any of claims 1-3, wherein the first pipeline register is disposed between the Wallace tree structure module and the adder.

5. An FPGA chip comprising the digital logic operation circuit of any one of claims 1-4.

6. A method of digital logic operation, comprising:

an input register in the input unit receives input data;

the operation data is input into the output unit, wherein the output unit comprises an accumulator and an output register which are connected in sequence, and the output of the accumulator and the output of the multiplication unit are used as the input of the accumulator.

7. The method of digital logic operation of claim 6 wherein the input unit further comprises at least one pre-adder, the inputs of the pre-adder being connected to at least two of the input registers, the output of the pre-adder being the input to the multiplication unit.

8. The method of digital logic operation of claim 7 wherein the input registers comprise three, with the outputs of two of the input registers being inputs to the pre-adder and the output of the other of the input registers being directly inputs to the multiplication unit.

9. The digital logic operation method according to claim 8, wherein when the input data includes three, two of the input data are input into the pre-adder as inputs of the multiplication unit through two input registers, respectively; the other input data is directly used as the input of the multiplication unit;

10. A method of digital logic operation according to any of claims 6 to 9, wherein the first pipeline register is located between the Wallace tree structure module and the adder.