CN107092462A

CN107092462A - A 64-bit asynchronous multiplier based on FPGA

Info

Publication number: CN107092462A
Application number: CN201710214226.5A
Authority: CN
Inventors: 何安平; 吴尽昭; 刘晓庆; 冯广博; 郭慧波; 熊菊霞; 王娟
Original assignee: Individual
Current assignee: Individual
Priority date: 2017-04-01
Filing date: 2017-04-01
Publication date: 2017-08-25
Anticipated expiration: 2037-04-01
Also published as: CN107092462B

Abstract

The invention discloses a 64-bit asynchronous multiplier based on an FPGA. The multiplier comprises an 8*64 multiplier, selectors MUX0, MUX1 and MUX2, a compressor, counters Count0, Count1 and Count2, a number of registers, a carry lookahead adder (CLA), and a control unit. The control unit is a pipeline built from Click asynchronous controllers; it resolves the handshake signals exchanged between the asynchronous controllers and produces four groups of trigger signals in sequence. In response to these four groups of trigger signals, the selectors, compressor, counters, registers and carry lookahead adder perform the corresponding data transfer, compression, accumulation and output operations. The invention computes faster and consumes less energy.

Description

A 64-bit asynchronous multiplier based on FPGA

Technical field

The present invention relates to a 64-bit asynchronous multiplier based on a field programmable gate array (FPGA).

Background art

Since transistor technology emerged in the 1970s, synchronous design has been almost synonymous with digital system design. Current process technology, however, is approaching its manufacturing limits: the transition from 12 nm to 7 nm has slowed, and the industry is "very likely to depart from Moore's Law for the first time" (John Gustafson, chief designer at AMD). The clock skew, power distribution and related problems brought about by advances in the manufacturing process pose a severe challenge to synchronous design, which cannot by itself solve them. In practice, designers rely heavily on the GALS (globally asynchronous, locally synchronous) approach, that is, multi-core designs that use a small amount of asynchronous circuitry, to alleviate these challenges.

Modern asynchronous design was introduced on the basis of the micropipeline design method. The core of this method is the asynchronous controller circuit, which implements the handshake protocol and coordinates circuit functions. In contrast to clocked schemes, asynchronous circuits communicate locally and use a handshake protocol for control, so they need no large clock distribution network, which removes the clock skew problem. An asynchronous circuit consumes almost no power when idle, which keeps the power consumption of the whole system under control. Asynchronous design therefore has clear advantages in low power consumption, low electromagnetic emission, low heat dissipation and modularity.

A digital multiplier is a binary arithmetic logic unit. Because digital circuits are built on Boolean logic, a mechanism is needed to convert arithmetic into logic, and this mechanism is the essence of a digital multiplier algorithm. Multiplier algorithms are relatively mature. The most intuitive is the array algorithm: starting from the least significant bit of the multiplier, it computes the partial product of the multiplicand with each multiplier bit in turn and then adds the partial products to obtain the product. For an n-bit multiplier this requires n(n+1) full adders and n² AND gates, so a multiplier built this way is slow and costly in area and power.
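As an illustration of the array algorithm described above, the following Python sketch forms one partial product per multiplier bit and accumulates them. It is a behavioural model only and is not taken from the patent; the function name array_multiply and the parameter n (operand width in bits) are assumptions introduced here.

```python
def array_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Shift-and-add model of an n-bit array multiplier (unsigned operands)."""
    product = 0
    for i in range(n):
        bit = (multiplier >> i) & 1  # i-th multiplier bit, starting at the LSB
        # In hardware a row of AND gates gates the multiplicand with this bit;
        # the row is weighted by 2**i before it is accumulated.
        partial_product = (multiplicand if bit else 0) << i
        product += partial_product   # adder rows accumulate the partial products
    return product
```

The loop runs once per multiplier bit, which is why the hardware cost and delay of this scheme grow quickly with the operand width.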

The Booth algorithm is a widely used and efficient way to implement a multiplier. It first computes the partial product of the multiplicand with each segment of the multiplier and then compresses and sums these partial products to obtain the final product. Generating and merging the partial products is the key step: it affects the computation speed and also determines the size of the whole multiplier. The Booth algorithm is first improved here. The basic shift, compress and sum framework of the classical Booth algorithm is retained, but the method of obtaining a multiplier-segment partial product by subtracting after shifting is eliminated; instead, several partial products are kept during shifting, compressed several times, and then added to obtain the product. This improvement strengthens the cohesion inside each functional module, reduces the coupling between modules, and simplifies the multiplier control circuit.

However, the Booth algorithm divides the multiplier into several segments and reduces the multiplication to the sum of the partial products of the multiplicand with each segment. Specifically, according to the binary pattern of each multiplier segment, the multiplication of that segment by the multiplicand is mapped to equivalent shift and subtraction operations to obtain the partial product for the segment. The product is then obtained either by repeated addition or by a single addition after several compressions. Compared with the improved algorithm, this approach is slower, and its speed is severely limited. In addition, most digital designs today are synchronous; a synchronous clocking scheme needs a large clock distribution network and suffers from clock skew and a series of related problems.
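For comparison, the classical shift-and-subtract behaviour can be sketched with textbook radix-4 Booth recoding. This is the standard technique, not the improved algorithm of the invention, and the names booth_radix4_digits and booth_multiply are hypothetical. Each recoded digit lies in {-2, -1, 0, 1, 2}; a negative digit corresponds to a subtraction of a shifted multiplicand.

```python
def booth_radix4_digits(multiplier: int, bits: int) -> list:
    """Recode an unsigned multiplier of even width `bits` into radix-4 Booth digits."""
    padded = multiplier << 1                    # append the implicit bit b[-1] = 0
    digits = []
    for j in range(bits // 2 + 1):              # one extra digit keeps the value unsigned
        window = (padded >> (2 * j)) & 0b111    # bits b[2j+1], b[2j], b[2j-1]
        b_prev = window & 1
        b_low = (window >> 1) & 1
        b_high = (window >> 2) & 1
        digits.append(b_prev + b_low - 2 * b_high)
    return digits


def booth_multiply(multiplicand: int, multiplier: int, bits: int) -> int:
    """Sum the shifted partial products selected by the Booth digits."""
    product = 0
    for j, digit in enumerate(booth_radix4_digits(multiplier, bits)):
        # digit < 0 means the shifted multiplicand is subtracted, not added
        product += (digit * multiplicand) << (2 * j)
    return product
```

Radix-4 recoding roughly halves the number of partial products relative to the bit-by-bit array scheme, at the price of handling negative multiples.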

Summary of the invention

The object of the present invention is to provide an FPGA-based 64-bit asynchronous multiplier that computes faster and consumes less energy.

The present invention is realized as follows. A 64-bit asynchronous multiplier based on an FPGA comprises an 8*64 multiplier, a selector MUX0, a selector MUX1, a selector MUX2, a compressor, a counter Count0, a counter Count1, a counter Count2, a number of registers, a carry lookahead adder CLA, and a control unit, wherein:

The control unit is a pipeline built from Click asynchronous controllers; it resolves the handshake signals through the handshake communication of the asynchronous controllers and produces four groups of trigger signals in sequence;

The counter Count0, after receiving the first group of trigger signals from the control unit, controls the selector MUX0 so that the input signals are operated on in the 8*64 multiplier, and the results are stored in 8 registers respectively;

The registers store the output values of the preceding 8*64 multiplier and, after receiving the second group of trigger signals from the control unit, pass the output values of the 8*64 multiplier on to the next stage;

The counter Count1, after receiving the third group of trigger signals from the control unit, uses the selector MUX1 to feed the values in the 8 registers into the compressor, where they are compressed in a set order;

The counter Count2, after receiving the fourth group of trigger signals, controls the selector MUX2 to select the output value of the preceding compressor and, according to a decision result, either feeds the output value back to the preceding compressor to continue compression with the data in the 8 registers, or passes the output value to the carry lookahead adder CLA;

The carry lookahead adder CLA sums the output values it receives and outputs the result.

Preferably, for the counter Count0, the input signals of the 8*64 multiplier are a 64-bit input signal A and an 8-bit input signal B.

A good digital multiplier is a core component of processors and algorithm chips. It is the foundation of all kinds of complex computation and is especially critical for high-performance real-time digital signal processing and image processing; the efficiency of the multiplier directly affects the performance of the chip. The efficiency of a digital multiplier is reflected mainly in two aspects, area and speed, both of which depend strongly on the chosen design method and implementation algorithm.

The present invention proposes an improved Booth multiplication algorithm whose core idea is to shift first, then compress, and finally sum. This reduces the coupling between modules and simplifies the control circuit.

In addition, following a purely asynchronous design method, the present invention uses Click micropipelines with the "bundled data" two-phase handshake protocol and, by separating control from data processing, implements a 64-bit asynchronous multiplier based on the improved algorithm and verifies it on an FPGA.

1. Asynchronous control principle based on micropipelines

The core of asynchronous design is the asynchronous controller circuit, which implements the handshake protocol and coordinates circuit functions. There are currently three mainstream classes of asynchronous controller unit: the C-element (CElement), GasP and Click. The C-element, proposed by Muller in the 1950s, is the most widely used asynchronous control unit and implements a "data-bound" handshake protocol. Because this circuit places no constraint on the data during the handshake, a large amount of timing verification is needed later to guarantee that the circuit is correct. GasP and Click circuits use the "bundled data" handshake protocol, which separates communication and data management into different events. This event separation guarantees the timing of the circuit in principle, and the resulting timing analysis and guarantees greatly simplify the use of asynchronous design. We use a pipeline built from Click asynchronous controllers as the control unit. The micropipeline invokes the modules of the multiplier repeatedly to perform the computation, and in this way the multiplier algorithm is completed.

2. Click circuits and the two-phase single-track handshake protocol

Click circuits were first proposed in 2010 by Peeters, Willem et al. and implement the "bundled data" two-phase handshake protocol. Asynchronous controllers communicate by handshaking with the Req (request) and Ack (acknowledge) signals; data is transferred between the transitions of these two signals, and the Fire signal governs the data transfer, as shown in Fig. 1.
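The two-phase behaviour can be illustrated with a small behavioural model. The sketch below assumes the commonly described simplified Click stage, in which a single phase bit drives both the upstream Ack and the downstream Req, and Fire is asserted when a new request is pending and the previous output has been acknowledged. The class ClickStage and its member names are assumptions for illustration and are not taken from Fig. 1.

```python
class ClickStage:
    """Simplified two-phase Click stage (behavioural model, assumed structure)."""

    def __init__(self) -> None:
        self.phase = 0  # state bit; toggles once per handshake event

    @property
    def in_ack(self) -> int:
        return self.phase  # Ack returned to the upstream stage

    @property
    def out_req(self) -> int:
        return self.phase  # Req sent to the downstream stage

    def step(self, in_req: int, out_ack: int) -> bool:
        """Evaluate the Fire condition once; toggle the phase bit if it holds."""
        fire = (in_req != self.phase) and (out_ack == self.phase)
        if fire:
            # In hardware, Fire also clocks the data register of this stage.
            self.phase ^= 1
        return fire
```

Because the protocol is two-phase, a rising edge and a falling edge of Req each count as one complete request: after the first Fire the stage idles with Req and Ack both at 1, and the next transfer begins when Req returns to 0.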

3. Asynchronous micropipeline control circuit

The 64-bit asynchronous multiplier uses an asynchronous pipeline control circuit to strictly control the operation timing of each module. The micropipeline in the multiplier contains 19 Click circuits in total, which produce 19 corresponding Fire signals, as shown in Fig. 2. Because an asynchronous circuit uses the handshake protocol to generate a local clock for each pipeline stage, it replaces the global clock of a synchronous integrated circuit and needs no large clock distribution network. This naturally avoids the clock skew and high power consumption of synchronous integrated circuits, yields average-case performance, and gives good reusability and robustness. When the request signal in_R enters the micropipeline, the request propagates forward stage by stage and the acknowledge signal in_A is eventually obtained. The arithmetic operation of the whole multiplier is completed by invoking the micropipeline control unit repeatedly.
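The propagation of one request through the 19 stages can be sketched as follows. The model assumes a linear chain of simplified two-phase stages, each holding one phase bit, and an output environment that acknowledges immediately; both are assumptions made for illustration, and the function name run_micropipeline is hypothetical. The actual wiring is the one shown in Fig. 2.

```python
def run_micropipeline(n_stages: int = 19):
    """Propagate one request through a chain of two-phase stages.

    Returns the order in which the Fire signals occur and the final in_A value.
    """
    phase = [0] * n_stages  # one phase bit per stage
    in_req = 1              # the environment toggles in_R to start a computation
    order = []
    progressed = True
    while progressed:
        progressed = False
        for i in range(n_stages):
            req = in_req if i == 0 else phase[i - 1]
            # Assumption: the last stage sees an environment that acknowledges at once.
            ack = phase[i + 1] if i + 1 < n_stages else phase[i]
            if req != phase[i] and ack == phase[i]:
                phase[i] ^= 1  # the stage fires and toggles its phase bit
                order.append(f"fire{i}")
                progressed = True
    return order, phase[0]     # phase[0] is what the environment sees as in_A
```

In this model the Fire signals occur strictly in the order fire0 to fire18, after which in_A matches in_R and the pipeline is ready for the next request.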

The asynchronous pipeline control circuit does not only output the Fire trigger signals; the micropipeline control unit also contains control elements such as counters. The whole multiplier needs 3 counters to drive the different selectors, and the selectors in turn control the data path, realizing a cyclic pipeline structure.

Compared with the shortcomings and deficiencies of the prior art, the present invention has the following beneficial effects:

(1) Compared with a synchronous multiplier of the same architecture, the asynchronous multiplier proposed by the present invention computes faster with substantially the same energy consumption and area. Each computation takes about 150 ns, and the product of any two 64-bit binary operands is obtained quickly;

(2) The design is not limited by the intrinsic frequency of the FPGA. The communication delay between modules inside the micropipeline can be as low as 1.5 ns, no large clock distribution network is needed, and there is no clock skew problem;

(3) The present invention is highly modular and lends itself to hierarchical design.

Brief description of the drawings

Fig. 1 is a schematic diagram of the "bundled data" two-phase handshake protocol;

Fig. 2 is a schematic structural diagram of the micropipeline control circuit;

Fig. 3 is a structural diagram of the logic modules in the FPGA-based 64-bit asynchronous multiplier of the present invention;

Fig. 4 is a structural diagram of the logic functions of the 8*64 multiplier;

Fig. 5 is the simulation waveform of the multiplier.

Detailed description of the embodiments

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.

The invention discloses a 64-bit asynchronous multiplier based on an FPGA. As shown in Fig. 3, the 64-bit asynchronous multiplier comprises an 8*64 multiplier, a selector MUX0, a selector MUX1, a selector MUX2, a compressor, a counter Count0, a counter Count1, a counter Count2, a number of registers, a carry lookahead adder CLA, and a control unit (the micropipeline in Fig. 3), wherein,

The counter Count0, after receiving the first group of trigger signals from the control unit, controls the selector MUX0 so that the 64-bit input signal A and the 8-bit input signal B are operated on in the 8*64 multiplier, and the results are stored in 8 registers respectively;

The counter Count2, after receiving a series of trigger signals, controls the selector MUX2 to select whether the output value of the preceding compressor is fed back to the preceding compressor to continue compression with the data in the 8 registers, or is passed to the carry lookahead adder CLA.

In an embodiment of the present invention, as shown in Fig. 3, the multiplier is built from functional modules including the 8*64 multiplier, the selectors MUX0, MUX1 and MUX2, the compressor (Compressor), the 3 counters Count0, Count1 and Count2, and the final carry lookahead adder CLA. Count0 controls the selector so that A is divided into groups of 8 bits, 8 groups in total; each group is operated on with B in the 8*64 multiplier, and the results are stored in 8 registers respectively. Count1 controls the compression of the values in the 8 registers in the compressor, 7 compressions in total. Count2 selects whether the compressor output is fed back or passed to the carry lookahead adder CLA for summation. The compressor first compresses the values in FF1 and FF2, then compresses the result with the value in FF3, and so on. When the last compression is complete, the compressed value is passed to the carry lookahead adder CLA, which produces the final output of the 64-bit multiplier.
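The data flow of this embodiment can be summarised by the following behavioural sketch in Python. It is a software model under stated assumptions, not the Verilog of the invention: each register is modelled as a sum/carry pair whose carry part is initially zero, each partial product is shifted to its weight before it is stored, and the word-level compressor is built from two carry-save steps. The names csa, compress_4_2 and multiply_64 are hypothetical.

```python
def csa(a: int, b: int, c: int):
    """Carry-save step: three operands in, a (sum, carry) pair out."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def compress_4_2(a: int, b: int, c: int, d: int):
    """Word-level 4-2 compression: four operands in, two out, total preserved."""
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)


def multiply_64(a: int, b: int) -> int:
    """Model of the flow: slice A, multiply by B, compress 7 times, add in the CLA."""
    registers = []
    for i in range(8):                          # Count0 steps MUX0 through the 8 slices
        a_slice = (a >> (8 * i)) & 0xFF         # A[8i+7 : 8i]
        partial = (a_slice * b) << (8 * i)      # 8*64 multiplier output, shifted to weight
        registers.append((partial, 0))          # FF1..FF8 as (S, C); C = 0 is an assumption
    s, c = registers[0]
    for i in range(1, 8):                       # Count1 counts the 7 compressions
        s, c = compress_4_2(s, c, *registers[i])  # MUX2 feeds (s, c) back each round
    return s + c                                # the CLA performs the only full addition
```

Keeping the running result in sum/carry form means that carries are propagated only once, in the final CLA addition.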

In an embodiment of the present invention, the control unit is realized through the handshake communication of the asynchronous controllers, as shown in Fig. 3, and specifically comprises:

(1) fire0: The register FF0 holds two values, wi_a_64bit and wi_b_64bit. fire0 triggers FF0 to pass these two values on: wi_a_64bit goes to the selector MUX0 and wi_b_64bit goes directly to the 8*64 multiplier, where the two values wait for a trigger signal to begin the first computation. At the same time, the micropipeline passes the handshake signal on and produces the fire1 signal.

(2) fire1, fire3, fire5 through fire15: These 8 trigger signals drive the counter Count0. The counter in turn controls the selector MUX0, which divides the wi_a_64bit value held in the selector; the 8 output groups go to the 8*64 multiplier, where each is operated on with the value of wi_b_64bit, and the results are stored in 8 registers.

(3) fire2, fire4, fire6 through fire16: The registers FF1-FF8 store the 8 output values of the preceding multiplier. These 8 fire signals trigger the registers so that the output values of the 8*64 multiplier are passed on to the next stage.

(4) fire4, fire6, fire8 through fire16: These 7 trigger signals drive the counter Count1, which counts from 0 to 6. When the counter is 0, the values in FF1 and FF2 are passed through the selector to the compressor (Compressor) for compression. The main function of the selector MUX1 is to select the input values to be compressed. The data path uses a 7-stage cyclic pipeline structure with a compressor tree (Compressor_128bit). Because the parallelism of an ordinary adder is limited, the present invention uses 4-2 compressors. This circuit compresses the sum of 4 inputs into 2 outputs in parallel, halving the number of partial products. A 4-2 compressor consists of two one-bit full adders in series; compression of a higher bit does not depend on the carry from the lower bit, so parallelism is high, circuit complexity is low and speed is high, which improves the overall efficiency of the multiplier.

(5) fire5, fire7, fire9 through fire17: These signals mainly control whether the selector MUX2 feeds the output of the preceding compressor back or passes it to the carry lookahead adder CLA. For example, when the fire5 trigger signal arrives, the first compressed value is fed back to the preceding selector MUX1, and MUX1 selects the value in FF3 to be compressed with the fed-back value. This operation continues until the fire15 signal arrives. When the fire17 signal arrives, the compressed value is passed on to the CLA for further computation, and the computed value is stored in the register that follows the adder CLA.

(6) fire18: This last signal triggers the register FF1 of the carry lookahead adder CLA, which outputs the final product data.
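The assignment of the 19 Fire signals in items (1) to (6) can be tabulated as follows. The dictionary keys and the helper targets_of are labels chosen here for illustration; the index ranges are taken directly from the items above.

```python
# Fire-signal indices per target, as listed in items (1) to (6).
FIRE_GROUPS = {
    "FF0 input register": [0],
    "Count0 / MUX0": list(range(1, 16, 2)),  # fire1, fire3, ..., fire15 (8 signals)
    "FF1-FF8": list(range(2, 17, 2)),        # fire2, fire4, ..., fire16 (8 signals)
    "Count1 / MUX1": list(range(4, 17, 2)),  # fire4, fire6, ..., fire16 (7 signals)
    "Count2 / MUX2": list(range(5, 18, 2)),  # fire5, fire7, ..., fire17 (7 signals)
    "CLA output register": [18],
}


def targets_of(k: int) -> list:
    """Return every target that Fire signal k drives."""
    return [name for name, indices in FIRE_GROUPS.items() if k in indices]
```

Several signals serve two groups at once: fire4 through fire16 (even) both latch a register and advance the compression counter, and fire5 through fire15 (odd) both advance Count0 and steer MUX2.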

In an embodiment of the present invention, the process by which the 64-bit input signal A and the 8-bit input signal B are operated on in the 8*64 multiplier under the counter Count0 is shown in Fig. 4.

As can be seen from Fig. 4, the inputs of the multiplier are A and B, where A is 64 bits wide and B is 8 bits wide. The input A is divided into 8 groups of 8 bits each, from [7:0] to [63:56]. These 8 groups of A are fed together with B into 8 multipliers, each of which consists of 4 shifter (Shifter) circuits and a compressor (Compressor), as in the structure of Multiplier1 in Fig. 4.

Multiplier1 is one of the main components of the 8*64 multiplier. Its inputs are A[7:0] and B. A[7:0] is first divided into groups of two bits, giving the 4 groups (A₇A₆)(A₅A₄)(A₃A₂)(A₁A₀). Each group is fed together with B into one of 4 shift encoders, which finally yields two 15-bit binary values. The whole 8*64 multiplier contains 8 such multipliers in total; after the 8 groups of A have each been computed with B in this first stage, 16 15-bit binary values are obtained, which completes the first stage of the operation.
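The first stage can be modelled in software as shown below. This is a sketch under assumptions: each shift encoder is taken to multiply B by a two-bit group using shifts only, with no subtraction, and ordinary addition stands in for the carry-save form that the hardware keeps. The function names shift_encode, multiply_8x8 and multiply_8x64 are hypothetical.

```python
def shift_encode(group: int, b: int):
    """Multiply b by a two-bit group (0..3) using shifts only; returns two terms."""
    low = b if group & 1 else 0          # weight-1 bit of the group selects b
    high = (b << 1) if group & 2 else 0  # weight-2 bit of the group selects 2*b
    return low, high


def multiply_8x8(a8: int, b8: int) -> int:
    """Model of one sub-multiplier such as Multiplier1: A[7:0] in four 2-bit groups."""
    total = 0
    for j in range(4):
        group = (a8 >> (2 * j)) & 0b11   # (A1A0), (A3A2), (A5A4), (A7A6)
        low, high = shift_encode(group, b8)
        total += (low + high) << (2 * j)  # hardware would compress instead of adding
    return total


def multiply_8x64(a64: int, b8: int) -> int:
    """Model of the 8*64 multiplier: eight byte slices of A, each multiplied by B."""
    total = 0
    for i in range(8):
        a_byte = (a64 >> (8 * i)) & 0xFF  # A[8i+7 : 8i]
        total += multiply_8x8(a_byte, b8) << (8 * i)
    return total
```

Because every two-bit group is non-negative, all partial terms are added; this is consistent with the stated aim of removing the subtraction step of the classical Booth scheme.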

In the second stage, these 16 binary values pass through a 4-2 compressor tree, which compresses them. A 4-2 compressor compresses 4 inputs into 2 outputs in parallel, halving the number of partial products. It consists of two one-bit full adders in series; compression of a higher bit does not depend on the carry from the lower bit, so parallelism is high, circuit complexity is low and speed is high. After this series of computations in the multiplier, the compressor finally produces 2 output values, S and C. These values are stored in the 8 registers respectively and continue to be processed in the 64-bit multiplier.
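A single bit position of a 4-2 compressor built from two full adders in series can be modelled as follows. This is the standard textbook cell and is offered as an illustration; the function and signal names are assumptions and do not come from Fig. 4.

```python
def full_adder(a: int, b: int, c: int):
    """One-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int):
    """One bit slice of a 4-2 compressor made of two full adders in series.

    Returns (sum, carry, cout), where carry and cout both have weight 2.
    """
    s1, cout = full_adder(x1, x2, x3)       # cout depends only on x1, x2, x3
    total, carry = full_adder(s1, x4, cin)  # cin comes from the neighbouring lower slice
    return total, carry, cout
```

The cell satisfies x1 + x2 + x3 + x4 + cin = sum + 2 * (carry + cout). Because cout does not depend on cin, no carry ripples across bit positions, which is the property that the text describes as the higher bit not depending on the lower carry.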

The present invention implements this improved Booth algorithm with an asynchronous design method. The control part is a micropipeline built from Click asynchronous controllers, which are easy to analyze for timing; the functional circuits are implemented in combinational logic; and the two are joined by flip-flops. That is, the asynchronous micropipeline indirectly manages the computation time of the combinational circuits by controlling when the flip-flops are enabled. The three cooperate to complete one or more multiplications, forming a data-path (Data-Path) style computing structure.

The present invention implements the asynchronous circuit with Click micropipelines that use the "bundled data" two-phase handshake protocol. The asynchronous circuit communicates locally and uses the handshake protocol for asynchronous control.

The present invention proposes a low-coupling Booth algorithm that increases the number of partial-product compressions and defers the addition (subtraction) to the end. By separating the functions of the modules, this algorithm improves computational efficiency and is particularly well suited to asynchronous control. Furthermore, the present invention uses the asynchronous micropipeline mechanism together with combinational functional modules to perform the shift, compression and addition functions; the design is highly modular and the flow is simple and clear.

Compared with a synchronous multiplier of the same architecture, the asynchronous multiplier proposed by the present invention computes faster with substantially the same energy consumption and area. Each computation takes about 150 ns, and the product of any two 64-bit binary operands is obtained quickly.

The 64-bit asynchronous multiplier was designed and simulated on the Vivado platform, with Verilog-1995 as the hardware description language. (Vivado is Xilinx's complete design flow tool from RTL to bitstream.) The FPGA (Field-Programmable Gate Array) used is a Xilinx Virtex-7 (xc7vx550tffg1158-2). Fig. 5 shows the waveform of one simulation, in which wi_a_64bit = 103741655961231 is multiplied by wi_b_64bit = 112381656513586. Test code was written in the Vivado simulation file TestBench, and a timing simulation was then run to obtain the final result.

As can be seen from Fig. 5, when inR goes high, the handshake communication of the asynchronous controllers starts and the multiplier begins to compute; a total of 19 fire signals and 2 counters control the multiplier data path. In terms of resources, the circuit occupies 3695 LUTs, 1.07% of the total, and 3335 registers, 0.48% of the total.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A 64-bit asynchronous multiplier based on an FPGA, characterized in that the 64-bit asynchronous multiplier comprises an 8*64 multiplier, a selector MUX0, a selector MUX1, a selector MUX2, a compressor, a counter Count0, a counter Count1, a counter Count2, a number of registers, a carry lookahead adder CLA, and a control unit, wherein,

The control unit is a pipeline built from Click asynchronous controllers; it resolves the handshake signals through the handshake communication of the asynchronous controllers and produces four groups of trigger signals in sequence;

The counter Count0, after receiving the first group of trigger signals from the control unit, controls the selector MUX0 so that the input signals are operated on in the 8*64 multiplier, and the results are stored in 8 registers respectively;

The registers store the output values of the preceding 8*64 multiplier and, after receiving the second group of trigger signals from the control unit, pass the output values of the 8*64 multiplier on to the next stage;

The counter Count1, after receiving the third group of trigger signals from the control unit, uses the selector MUX1 to feed the values in the 8 registers into the compressor, where they are compressed in a set order;

The counter Count2, after receiving the fourth group of trigger signals, controls the selector MUX2 to select the output value of the preceding compressor and, according to a decision result, either feeds the output value back to the preceding compressor to continue compression with the data in the 8 registers, or passes the output value to the carry lookahead adder CLA;

2. The FPGA-based 64-bit asynchronous multiplier according to claim 1, characterized in that, for the counter Count0, the input signals of the 8*64 multiplier are a 64-bit input signal A and an 8-bit input signal B.