CN107092462A - 64-bit asynchronous multiplier based on FPGA - Google Patents
A 64-bit asynchronous multiplier based on FPGA
- Publication number: CN107092462A
- Application number: CN201710214226.5A
- Authority: CN (China)
- Prior art keywords: counter, multipliers, asynchronous, selector, control unit
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Logic Circuits (AREA)
Abstract
The invention discloses a 64-bit asynchronous multiplier based on an FPGA. The 64-bit asynchronous multiplier comprises an 8*64 multiplier, selectors MUX0, MUX1 and MUX2, a compressor, counters Count0, Count1 and Count2, several registers, a carry-lookahead adder (CLA), and a control unit. The control unit is a pipeline built from Click asynchronous controllers; it resolves the handshake signals through handshake communication between the asynchronous controllers and generates four groups of trigger signals in sequence. The selectors MUX0, MUX1 and MUX2, the compressor, the counters Count0, Count1 and Count2, the registers, and the carry-lookahead adder CLA perform the corresponding data transfer, compression, accumulation, and output operations according to the four groups of trigger signals. The invention computes faster and consumes less energy.
Description
Technical field
The present invention relates to a 64-bit asynchronous multiplier based on a field-programmable gate array (FPGA).
Background art
Since transistor technology emerged in the 1970s, synchronous design has become almost synonymous with digital system design. However, current process technology is approaching its manufacturing limits, and the transition from 12 nm to 7 nm has slowed; the industry is "very likely to depart from Moore's Law first" (John Gustafson, chief designer at AMD). Clock skew, power distribution, and similar problems brought about by the great advances in manufacturing processes pose severe challenges to synchronous design. Synchronous design cannot by itself solve these problems, so designers rely heavily on GALS (globally asynchronous, locally synchronous) design, that is, multi-core techniques that use a small amount of asynchronous circuitry, to mitigate them.
Modern asynchronous design is based on the micropipeline design method. The core of this method is the asynchronous controller circuit, which implements the handshake communication protocol and coordinates circuit functions. Compared with clocked schemes, asynchronous circuits use local communication and complete asynchronous control through a handshake protocol; they need no large clock distribution network, which removes the clock skew problem. An asynchronous circuit consumes almost no power when idle, so the power consumption of the whole system is effectively controlled. This design method has clear advantages in low power consumption, low electromagnetic radiation, low heat dissipation, and modularity.
A digital multiplier is a binary arithmetic logic unit. Because digital circuits are built on Boolean logic, a mechanism is needed to convert arithmetic into logic, and this mechanism is the essence of a digital multiplier algorithm. Digital multiplier algorithms are fairly mature. The most intuitive is the array algorithm: starting from the least significant bit of the multiplier, each bit is multiplied by the multiplicand in turn to form a partial product, and the partial products are then added to obtain the product. For an n-bit multiplier this requires n(n+1) full adders and n² AND gates; a multiplier built this way is slow and costly in area and power.
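The array algorithm described above can be sketched in a few lines of Python. This is only a behavioral illustration, not the circuit from the patent: the function name `array_multiply` and the width parameter `n` are hypothetical, and Python's integer addition stands in for the rows of full adders.

```python
def array_multiply(a: int, b: int, n: int) -> int:
    """Multiply two unsigned n-bit integers the way an array multiplier does.

    Hypothetical helper for illustration only.
    """
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must be unsigned n-bit values")
    product = 0
    for i in range(n):  # walk the multiplier from its least significant bit
        bit = (b >> i) & 1
        # One row of n AND gates: the multiplicand gated by one multiplier bit.
        partial = a if bit else 0
        # A row of adders accumulates the partial product at weight 2**i.
        product += partial << i
    return product
```

Each loop iteration corresponds to one row of the array, which is why the hardware cost grows with the square of the operand width.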
The Booth algorithm is a widely used and efficient way to implement a multiplier. It first computes the partial product of the multiplicand with each segment of the multiplier, then compresses and sums the partial products to obtain the final product. Generating and merging the partial products is the key step: partial-product computation affects not only the speed but also the size of the whole multiplier. The Booth algorithm is first improved here. The improvement keeps the basic shift, compress, and sum framework of the classical Booth algorithm, but drops the step of subtracting after shifting to obtain the partial product of a multiplier segment; instead it retains several products during shifting and, after compressing them several times, adds them to obtain the product. This improvement strengthens cohesion inside each functional module, reduces coupling between modules, and simplifies the multiplier's control circuit.
However, the Booth algorithm divides the multiplier into several segments and reduces the multiplication to the sum of the partial products of the multiplicand with each multiplier segment. Specifically, based on the binary pattern of each multiplier segment, the Booth algorithm maps the multiplication of that segment by the multiplicand to an equivalent shift and subtraction to obtain the segment's partial product; the product is then obtained either by repeated addition or by a single addition after several compressions. This procedure is slower than the improved algorithm, and its speed is severely limited. In addition, most digital designs currently follow a synchronous approach; a synchronous clocking scheme requires a large clock distribution network and suffers from clock skew and a series of related problems.
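To make the "shift and subtract per multiplier segment" idea concrete, the following Python sketch shows the textbook radix-4 Booth recoding. The text does not say which radix the classical algorithm under discussion uses, so the radix-4 choice is an assumption, and the names `booth_radix4_digits` and `booth_multiply` are hypothetical.

```python
def booth_radix4_digits(b: int, n: int) -> list[int]:
    """Recode an unsigned n-bit multiplier into digits from {-2, -1, 0, 1, 2}.

    Digit k is -2*b[2k+1] + b[2k] + b[2k-1], with b[-1] = 0 and zero bits
    above position n-1 (assumed textbook radix-4 recoding).
    """
    if not 0 <= b < (1 << n):
        raise ValueError("multiplier must be an unsigned n-bit value")
    padded = b << 1  # append the implicit b[-1] = 0 below the LSB
    digits = []
    for k in range(n // 2 + 1):
        window = (padded >> (2 * k)) & 0b111  # bits b[2k+1], b[2k], b[2k-1]
        hi, mid, lo = (window >> 2) & 1, (window >> 1) & 1, window & 1
        digits.append(-2 * hi + mid + lo)
    return digits


def booth_multiply(a: int, b: int, n: int) -> int:
    """Sum the partial products; each one is a shift plus an optional negate."""
    total = 0
    for k, digit in enumerate(booth_radix4_digits(b, n)):
        partial = (a << (abs(digit) - 1)) if digit else 0  # |digit| is 1 or 2
        if digit < 0:
            partial = -partial  # the subtraction step of the classical scheme
        total += partial << (2 * k)
    return total
```

Negative digits are where the subtraction enters; the improved algorithm described in this document is said to avoid that step after shifting.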
Summary of the invention
The object of the invention is to provide a 64-bit FPGA-based asynchronous multiplier that computes faster and consumes less energy.
The invention is realized as follows. A 64-bit asynchronous multiplier based on an FPGA comprises an 8*64 multiplier, selectors MUX0, MUX1 and MUX2, a compressor, counters Count0, Count1 and Count2, several registers, a carry-lookahead adder CLA, and a control unit, wherein:
The control unit is a pipeline built from Click asynchronous controllers; it resolves the handshake signals through handshake communication between the asynchronous controllers and generates four groups of trigger signals in sequence;
The counter Count0, on receiving the first group of trigger signals from the control unit, controls the selector MUX0 so that the input signals are operated on in the 8*64 multiplier, and the results are stored in 8 registers respectively;
The registers store the output values of the preceding 8*64 multiplier and, on receiving the second group of trigger signals from the control unit, pass those output values onward;
The counter Count1, on receiving the third group of trigger signals from the control unit, uses the selector MUX1 to select the values in the 8 registers so that they are compressed in the compressor in a set order;
The counter Count2, on receiving the fourth group of trigger signals, controls the selector MUX2 to select the output value of the preceding compressor and, according to the decision result, either feeds the output value back to the compressor for further compression with the data of the 8 registers, or passes it to the carry-lookahead adder CLA;
The carry-lookahead adder CLA sums the output values it receives and outputs the result.
Preferably, under the counter Count0, the input signals of the 8*64 multiplier are a 64-bit input signal A and an 8-bit input signal B.
A good digital multiplier is a core component of processors and algorithm chips and the foundation of all kinds of complex computation. It is particularly critical for high-performance real-time digital signal processing and image processing, and its efficiency directly affects chip performance. The efficiency of a digital multiplier shows mainly in two respects, area and speed. The choice of design method and implementation algorithm has a large effect on the multiplier's area and speed.
The invention proposes an improved Booth multiplication algorithm whose core idea is to shift first, then compress, and finally sum. This reduces coupling between modules and helps simplify the control circuit.
In addition, the invention follows a purely asynchronous circuit design method. It uses Click micropipelines with the two-phase "bundled data" handshake protocol and, keeping control separate from data processing, implements a 64-bit asynchronous multiplier for this improved algorithm, which is verified on an FPGA.
1. Asynchronous control principle based on the micropipeline
The core of asynchronous design is the asynchronous controller circuit, which implements the handshake communication protocol and coordinates circuit functions. There are currently three mainstream classes of asynchronous controller unit: C-element, GasP, and Click. The C-element, proposed by Muller in the 1950s, is the most widely used asynchronous control unit and implements a handshake protocol based on "data binding". Because this circuit places no constraints on the data during handshaking, extensive timing verification is needed later to guarantee that the circuit is correct. GasP and Click circuits instead use a "bundled data" handshake protocol that separates communication and data management into different events. This event separation guarantees circuit timing in principle, and the accompanying timing analysis and guarantees greatly simplify asynchronous design. We use a pipeline built from Click asynchronous controllers as the control unit; the micropipeline invokes the multiplier's modules repeatedly, thereby completing the multiplier algorithm.
2. Click circuits and the two-phase single-track handshake protocol
The Click circuit was first proposed by Peeters, Willem, and colleagues in 2010 and implements the two-phase "bundled data" handshake protocol. Asynchronous controllers communicate by handshaking with Req (request) and Ack (acknowledge) signals; data is transferred between the transitions of these two signals, and the Fire signal governs the data transfer, as shown in Fig. 1.
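The two-phase handshake can be illustrated with a small behavioral model in Python. This is a simplified sketch under stated assumptions, not the circuit of Fig. 1: each stage keeps a single phase bit that drives both its acknowledge to the left and its request to the right, and it fires when the incoming request differs from the phase while the downstream acknowledge equals it. The names `ClickStage` and `run_token`, and the sink that acknowledges immediately, are hypothetical.

```python
class ClickStage:
    """Simplified behavioral model of one Click controller (assumed)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.phase = False  # drives both in_ack (left) and out_req (right)

    def try_fire(self, in_req: bool, out_ack: bool) -> bool:
        # Two-phase protocol: a new request is a *transition* on in_req, and
        # the next stage is free when its acknowledge matches our phase.
        if in_req != self.phase and out_ack == self.phase:
            self.phase = not self.phase  # the Fire pulse toggles the phase bit
            return True
        return False


def run_token(stages, in_req: bool, sink_ack: bool):
    """Propagate one request until no stage can fire.

    Returns the names of the stages in firing order and the sink's new ack.
    """
    fired = []
    progress = True
    while progress:
        progress = False
        for i, stage in enumerate(stages):
            req = in_req if i == 0 else stages[i - 1].phase
            ack = stages[i + 1].phase if i + 1 < len(stages) else sink_ack
            if stage.try_fire(req, ack):
                fired.append(stage.name)
                progress = True
        sink_ack = stages[-1].phase  # assumed sink: acknowledges at once
    return fired, sink_ack
```

Toggling `in_req` once sends one token through the chain, and each stage emits exactly one fire event per token; no return-to-zero phase is needed, which is what "two-phase" refers to.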
3. Asynchronous micropipeline control circuit
The 64-bit asynchronous multiplier uses an asynchronous pipeline control circuit to strictly control the operation timing of each module. The multiplier's micropipeline contains 19 Click circuits in total and generates 19 corresponding Fire signals, as shown in Fig. 2. Because an asynchronous circuit uses the handshake protocol to generate a local clock for each pipeline stage in place of the global clock of a synchronous integrated circuit, it needs no large clock distribution network. It therefore naturally avoids the clock skew and high power consumption of synchronous integrated circuits, can achieve average-case performance, and has good reusability and robustness. When the request signal in_R enters the micropipeline, the request propagates stage by stage and the acknowledge signal in_A is eventually returned. The operation of the whole multiplier is completed by invoking the micropipeline control unit repeatedly.
The asynchronous pipeline control circuit not only outputs the Fire trigger signals; the micropipeline control unit also involves control elements such as counters. The whole multiplier needs 3 counters to drive the different selectors, which in turn control the data path and realize a cyclic pipeline structure.
Compared with the shortcomings and deficiencies of the prior art, the invention has the following advantages:
(1) compared with a synchronous multiplier of the same architecture, the proposed asynchronous multiplier computes faster with roughly unchanged energy consumption and area: each computation takes about 150 ns, and the product of any two 64-bit binary numbers is obtained quickly;
(2) the design is not affected by the FPGA's intrinsic frequency; the communication delay between modules inside the micropipeline is as low as 1.5 ns, and no large clock distribution network is needed, so there is no clock skew problem;
(3) the invention is highly modular and lends itself to hierarchical design.
Brief description of the drawings
Fig. 1 is a schematic diagram of the two-phase "bundled data" handshake protocol;
Fig. 2 is a structural schematic of the micropipeline control circuit;
Fig. 3 is a structure diagram of the logic modules in the FPGA-based 64-bit asynchronous multiplier of the invention;
Fig. 4 is a structure diagram of the logic functions of the 8*64 multiplier;
Fig. 5 is the multiplier simulation diagram.
Detailed description of the embodiments
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the invention and do not limit it.
The invention discloses a 64-bit asynchronous multiplier based on an FPGA. As shown in Fig. 3, the 64-bit asynchronous multiplier comprises an 8*64 multiplier, selectors MUX0, MUX1 and MUX2, a compressor, counters Count0, Count1 and Count2, several registers, a carry-lookahead adder CLA, and a control unit (the micropipeline in Fig. 3), wherein:
The control unit is a pipeline built from Click asynchronous controllers; it resolves the handshake signals through handshake communication between the asynchronous controllers and generates four groups of trigger signals in sequence;
The counter Count0, on receiving the first group of trigger signals from the control unit, controls the selector MUX0 so that the 64-bit input signal A and the 8-bit input signal B are operated on in the 8*64 multiplier, and the results are stored in 8 registers respectively;
The registers store the output values of the preceding 8*64 multiplier and, on receiving the second group of trigger signals from the control unit, pass those output values onward;
The counter Count1, on receiving the third group of trigger signals from the control unit, uses the selector MUX1 to select the values in the 8 registers so that they are compressed in the compressor in a set order;
The counter Count2, on receiving a series of trigger signals, controls the selector MUX2 to choose whether the output value of the preceding compressor is fed back for further compression with the data of the 8 registers or is passed to the carry-lookahead adder CLA;
The carry-lookahead adder CLA sums the output values it receives and outputs the result.
In the embodiment, as shown in Fig. 3, the multiplier is built from functional modules including the 8*64 multiplier, the selectors MUX0, MUX1 and MUX2, the compressor (Compressor), the three counters Count0, Count1 and Count2, and the final carry-lookahead adder CLA. Count0 controls the selector to divide A into groups of 8 bits, 8 groups in total; each group is operated on with B in the 8*64 multiplier, and the results are stored in 8 registers. Count1 controls the compression of the values in the 8 registers in the compressor, 7 compressions in total. Count2 selects whether the compressor output is fed back or passed to the carry-lookahead adder CLA for summation. The compressor first compresses the values in FF1 and FF2, then compresses the result with the value in FF3, and so on. When the last compression completes, the compressed value is passed to the carry-lookahead adder CLA, which produces the final output of the 64-bit multiplier.
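The data flow of this embodiment can be modeled at word level as follows. This is a sketch under several assumptions rather than the patented circuit: each register is modeled as a sum/carry pair whose carry starts at zero, the weight shift of each 8-bit group is applied when the register is written, Python's `*` stands in for the 8*64 multiplier, and a plain addition stands in for the CLA. The names `carry_save`, `compress_4_2`, and `multiply_model` are hypothetical.

```python
def carry_save(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save step: x + y + z == sum_word + carry_word."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def compress_4_2(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Word-level 4-2 compression: four addends in, two out, same total."""
    s1, c1 = carry_save(a, b, c)
    return carry_save(s1, c1, d)


def multiply_model(a: int, b: int) -> tuple[int, int]:
    """Return (product, number of compressions) for a 64-bit A.

    Behavioral sketch only; B's width is left open here.
    """
    if not (0 <= a < (1 << 64)) or b < 0:
        raise ValueError("a must be an unsigned 64-bit value, b non-negative")
    registers = []
    for group in range(8):  # Count0 steps MUX0 through A[7:0] .. A[63:56]
        a_slice = (a >> (8 * group)) & 0xFF
        # Stand-in for the 8*64 multiplier; stored as an (S, C) pair.
        registers.append(((a_slice * b) << (8 * group), 0))
    s, c = registers[0]  # FF1
    compressions = 0
    for s_next, c_next in registers[1:]:  # Count1 steps MUX1 through FF2..FF8
        s, c = compress_4_2(s, c, s_next, c_next)  # MUX2 feeds the result back
        compressions += 1
    return s + c, compressions  # final addition plays the role of the CLA
```

Running the model on the two operands used in the simulation described later, `multiply_model(103741655961231, 112381656513586)`, gives the same value as Python's built-in multiplication and reports 7 compressions, matching the count stated above.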
In the embodiment, the control unit works through handshake communication between the asynchronous controllers, as shown in Fig. 3, specifically:
(1) fire0: Register FF0 holds the two values wi_a_64bit and wi_b_64bit. fire0 triggers FF0 to pass both values onward: wi_a_64bit goes to the selector MUX0, and wi_b_64bit goes directly to the 8*64 multiplier, where the two values wait for the trigger signal that starts the first calculation. Meanwhile the micropipeline passes the handshake signal onward and generates the fire1 signal.
(2) fire1, fire3, fire5, ..., fire15: These 8 trigger signals drive the counter Count0. The counter in turn controls the selector MUX0, which divides the wi_a_64bit value; the 8 resulting groups are sent to the 8*64 multiplier and operated on with wi_b_64bit, and the results are stored in 8 registers.
(3) fire2, fire4, fire6, ..., fire16: Registers FF1-FF8 store the 8 output values of the preceding multiplier. These 8 fire signals trigger the registers so that the output values of the 8*64 multiplier are passed onward.
(4) fire4, fire6, fire8, ..., fire16: These 7 trigger signals drive the counter to count from 0 to 6. When the counter is 0, the values in FF1 and FF2 are passed through the selector to the compressor (Compressor) for compression. The main function of the selector MUX1 is to select the inputs to be compressed. The data path uses a 7-stage cyclic pipeline structure and a compressor tree (Compressor_128bit). Because an ordinary adder has limited parallelism, the invention uses 4-2 compressors, which compress the sum of 4 inputs into 2 outputs in parallel and so halve the number of partial products. A 4-2 compressor consists of two one-bit full adders in series; compression of the high-order bits does not depend on the low-order carry, so parallelism is high, circuit complexity is low, and operation is fast, which improves the overall efficiency of the multiplier.
(5) fire5, fire7, fire9, ..., fire17: These signals mainly control the selector MUX2, which chooses whether the output value of the preceding compressor is fed back or passed to the carry-lookahead adder CLA. For example, when fire5 arrives, the first compressed value is fed back to the preceding selector MUX1, which selects the value in FF3 to be compressed with the fed-back value; this continues until fire15 arrives. When fire17 arrives, the compressed value is passed on to the CLA for further calculation, and the result is stored in the register following the adder CLA.
(6) fire18: The last signal triggers the FF1 register of the carry-lookahead adder CLA, and the final product is output.
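The six groups of fire signals listed above can be summarized as a lookup from fire index to the actions it triggers. The Python sketch below only encodes the index ranges stated in items (1) to (6); the function name `fire_actions` and the short action labels are hypothetical.

```python
def fire_actions(i: int) -> list[str]:
    """Return the actions triggered by fire<i>, per the ranges listed above."""
    if not 0 <= i <= 18:
        raise ValueError("the micropipeline has fire0 .. fire18")
    actions = []
    if i == 0:
        actions.append("load")      # FF0 releases wi_a_64bit and wi_b_64bit
    if i % 2 == 1 and 1 <= i <= 15:
        actions.append("multiply")  # Count0/MUX0 select a group of A
    if i % 2 == 0 and 2 <= i <= 16:
        actions.append("store")     # FF1..FF8 capture a multiplier output
    if i % 2 == 0 and 4 <= i <= 16:
        actions.append("compress")  # counter/MUX1 feed the compressor
    if i % 2 == 1 and 5 <= i <= 17:
        actions.append("select")    # MUX2: feed back or pass on to the CLA
    if i == 18:
        actions.append("output")    # final product leaves the CLA register
    return actions
```

Counting the indices per label reproduces the figures in the text: 8 multiply signals, 8 store signals, 7 compress signals, and 7 select signals. It also makes visible that some fire signals, such as fire4 and fire5, serve two groups at once.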
In the embodiment, the calculation that the 64-bit input signal A and the 8-bit input signal B undergo in the 8*64 multiplier under the counter Count0 is shown in Fig. 4.
As Fig. 4 shows, the multiplier inputs are A and B, where A is 64 bits and B is 8 bits. Input A is divided into 8 groups of 8 bits each, from [7:0] to [63:56]. Each of the 8 groups of A is fed with B into one of 8 multipliers; each of these multipliers consists of 4 shifter (Shifter) circuits and a compressor (Compressor), as in the structure of Multiplier1 in Fig. 4.
Multiplier1 is one of the main components of the 8*64 multiplier. Its inputs are A[7:0] and B. A[7:0] is first divided into groups of two bits, giving 4 groups (A7A6), (A5A4), (A3A2), (A1A0); each group is fed with B into one of 4 shift encoders for computation, finally yielding two 15-bit binary values. The whole 8*64 multiplier contains 8 such multipliers; after the 8 groups of A have each been processed with B in this first stage, 16 15-bit binary numbers are obtained, which completes the first stage.
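One way to picture what the 4 shift encoders of Multiplier1 compute is sketched below in Python. The text does not give the encoder's internal logic, so the mapping used here (a two-bit group value g produces g times B using shifts and at most one addition) is an assumption, and the names `shift_encode` and `slice_partials` are hypothetical.

```python
def shift_encode(group: int, b: int) -> int:
    """Multiply B by a two-bit group value using shifts only (assumed logic)."""
    if group == 0:
        return 0
    if group == 1:
        return b
    if group == 2:
        return b << 1
    if group == 3:
        return (b << 1) + b  # 3*B as 2*B + B
    raise ValueError("group must be a two-bit value")


def slice_partials(a_slice: int, b: int) -> list[int]:
    """Return the four weighted partial products of an 8-bit slice of A with B."""
    if not 0 <= a_slice <= 0xFF:
        raise ValueError("a_slice must be an 8-bit value")
    partials = []
    for k in range(4):  # groups (A1A0), (A3A2), (A5A4), (A7A6)
        group = (a_slice >> (2 * k)) & 0b11
        partials.append(shift_encode(group, b) << (2 * k))
    return partials
```

The four values sum to `a_slice * b`; in the described hardware they would be reduced by the compressor to two outputs rather than added outright.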
The second stage passes these 16 binary values through a 4-2 compressor tree to compress them. A 4-2 compressor compresses 4 inputs into 2 outputs in parallel, which halves the number of partial products. It consists of two one-bit full adders in series; compression of the high-order bits does not depend on the low-order carry, so parallelism is high, circuit complexity is low, and operation is fast. After this series of calculations in the multiplier, the compressor finally yields 2 output values, S and C; these values are stored in the 8 registers respectively and go on to further calculation in the 64-bit multiplier.
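The statement that high-order compression does not depend on the low-order carry can be checked on a one-bit model of the 4-2 compressor. The Python sketch below uses the common textbook arrangement of two full adders in series, which is assumed to match the description; the names `full_adder`, `compressor_4_2_bit`, and `check_cell` are hypothetical.

```python
from itertools import product


def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry)."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def compressor_4_2_bit(x1: int, x2: int, x3: int, x4: int, cin: int):
    """One bit slice of a 4-2 compressor built from two full adders in series.

    Returns (sum, carry, cout). cout comes from the first adder alone, so it
    never depends on cin: carries do not ripple along the word.
    """
    s1, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


def check_cell() -> bool:
    """Exhaustively verify x1+x2+x3+x4+cin == sum + 2*(carry + cout)."""
    for bits in product((0, 1), repeat=5):
        total, carry, cout = compressor_4_2_bit(*bits)
        if sum(bits) != total + 2 * (carry + cout):
            return False
    return True
```

Because `cout` is produced before `cin` is consumed, every bit position can be evaluated in parallel, which is the property the text credits for the compressor's speed.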
The invention implements this improved Booth algorithm with an asynchronous design method. The control part is a micropipeline built from Click asynchronous controllers, which are easy to analyze for timing; the functional circuits are implemented in combinational logic; and the two are linked by flip-flops. That is, the asynchronous micropipeline indirectly maintains the calculation timing of the combinational circuits by managing when the flip-flops are enabled. The three cooperate to complete one or more multiplications, forming a data-path (Data-Path) style computing structure.
The invention uses Click micropipelines with the two-phase "bundled data" handshake protocol to implement the asynchronous circuit. The asynchronous circuit uses local communication and completes asynchronous control through the handshake protocol.
The invention proposes a low-coupling Booth algorithm that increases the number of partial-product compressions and defers the addition (subtraction) to the end. By separating the functions of the modules, the algorithm improves computational efficiency and is well suited to asynchronous control. Furthermore, the invention uses the asynchronous micropipeline mechanism together with combinational function modules to perform the shift, compress, and add functions; the design is highly modular, and its flow is simple and clear.
Compared with a synchronous multiplier of the same architecture, the proposed asynchronous multiplier computes faster with roughly unchanged energy consumption and area: each computation takes about 150 ns, and the product of any two 64-bit binary numbers is obtained quickly.
The 64-bit asynchronous multiplier was designed and simulated on the Vivado platform, with Verilog-1995 as the hardware description language (Vivado is Xilinx's complete design flow tool from RTL to bitstream). The FPGA (Field-Programmable Gate Array) used is a Xilinx Virtex-7 (xc7vx550tffg1158-2). Fig. 5 shows the waveform of one simulation in which wi_a_64bit = 103741655961231 and wi_b_64bit = 112381656513586 are multiplied: test code was written in the Vivado simulation testbench, and a timing simulation was then run to obtain the final result.
As Fig. 5 shows, when inR goes high, the asynchronous controllers begin handshaking and the multiplier starts computing; a total of 19 fire signals and 2 counters control the multiplier data path. In terms of resources, the circuit uses 3695 LUTs, 1.07% of the total, and 3335 registers, 0.48% of the total.
The above are only preferred embodiments of the invention and do not limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (2)
1. A 64-bit asynchronous multiplier based on an FPGA, characterized in that the 64-bit asynchronous multiplier comprises an 8*64 multiplier, selectors MUX0, MUX1 and MUX2, a compressor, counters Count0, Count1 and Count2, several registers, a carry-lookahead adder CLA, and a control unit, wherein:
the control unit is a pipeline built from Click asynchronous controllers, resolves the handshake signals through handshake communication between the asynchronous controllers, and generates four groups of trigger signals in sequence;
the counter Count0, on receiving the first group of trigger signals from the control unit, controls the selector MUX0 so that the input signals are operated on in the 8*64 multiplier, and the results are stored in 8 registers respectively;
the registers store the output values of the preceding 8*64 multiplier and, on receiving the second group of trigger signals from the control unit, pass those output values onward;
the counter Count1, on receiving the third group of trigger signals from the control unit, uses the selector MUX1 to select the values in the 8 registers so that they are compressed in the compressor in a set order;
the counter Count2, on receiving the fourth group of trigger signals, controls the selector MUX2 to select the output value of the preceding compressor and, according to the decision result, either feeds the output value back to the compressor for further compression with the data of the 8 registers, or passes it to the carry-lookahead adder CLA;
the carry-lookahead adder CLA sums the output values it receives and outputs the result.
2. The 64-bit asynchronous multiplier based on an FPGA according to claim 1, characterized in that, under the counter Count0, the input signals of the 8*64 multiplier are a 64-bit input signal A and an 8-bit input signal B.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710214226.5A CN107092462B (en) | 2017-04-01 | 2017-04-01 | 64-bit asynchronous multiplier based on FPGA |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107092462A (en) | 2017-08-25 |
CN107092462B (en) | 2020-10-09 |
Family
ID=59646295
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710214226.5A Active CN107092462B (en) | 2017-04-01 | 2017-04-01 | 64-bit asynchronous multiplier based on FPGA |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107092462B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108537331A (en) * | 2018-04-04 | 2018-09-14 | 清华大学 | A kind of restructural convolutional neural networks accelerating circuit based on asynchronous logic |
CN113407239A (en) * | 2021-06-09 | 2021-09-17 | 中山大学 | Assembly line processor based on asynchronous single track |
WO2023179325A1 (en) * | 2022-03-24 | 2023-09-28 | 华为技术有限公司 | Chip, signal processing method, and electronic device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060008080A1 (en) * | 2004-07-09 | 2006-01-12 | Nec Electronics Corporation | Modular-multiplication computing unit and information processing unit |
CN101504599A (en) * | 2009-03-16 | 2009-08-12 | 西安电子科技大学 | Special instruction set micro-processing system suitable for digital signal processing application |
Non-Patent Citations (1)
Title |
---|
肖鹏: "基于FPGA的高速双精度浮点乘法器设计", 《微电子学与计算机》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108537331A (en) * | 2018-04-04 | 2018-09-14 | 清华大学 | A kind of restructural convolutional neural networks accelerating circuit based on asynchronous logic |
CN113407239A (en) * | 2021-06-09 | 2021-09-17 | 中山大学 | Assembly line processor based on asynchronous single track |
CN113407239B (en) * | 2021-06-09 | 2023-06-13 | 中山大学 | Pipeline processor based on asynchronous monorail |
WO2023179325A1 (en) * | 2022-03-24 | 2023-09-28 | 华为技术有限公司 | Chip, signal processing method, and electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN107092462B (en) | 2020-10-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gong et al. | MALOC: A fully pipelined FPGA accelerator for convolutional neural networks with all layers mapped on chip | |
CN104899182B (en) | A kind of Matrix Multiplication accelerated method for supporting variable partitioned blocks | |
CN103677739B (en) | A kind of configurable multiply accumulating arithmetic element and composition thereof multiply accumulating computing array | |
CN100470464C (en) | Multiplier based on improved Montgomey's algorithm | |
CN107092462A (en) | A kind of 64 Asynchronous Multipliers based on FPGA | |
CN109828744A (en) | A kind of configurable floating point vector multiplication IP kernel based on FPGA | |
CN104145281A (en) | Neural network computing apparatus and system, and method therefor | |
CN105183425B (en) | A kind of fixation bit wide multiplier with high-precision low complex degree characteristic | |
CN101221491B (en) | Point addition system of elliptic curve cipher system | |
CN110058840A (en) | A kind of low-consumption multiplier based on 4-Booth coding | |
Stevens et al. | Energy and performance models for synchronous and asynchronous communication | |
CN107544942A (en) | A kind of VLSI design methods of Fast Fourier Transform (FFT) | |
CN106775577B (en) | A kind of design method of the non-precision redundant manipulators multiplier of high-performance | |
CN102364456A (en) | 64-point fast Fourier transform (FFT) calculator | |
CN109472734A (en) | A kind of target detection network and its implementation based on FPGA | |
Yin et al. | FPGA-based high-performance CNN accelerator architecture with high DSP utilization and efficient scheduling mode | |
CN103078729A (en) | Dual-precision chaotic signal generator based on FPGA (field programmable gate array) | |
CN104407836A (en) | Device and method of carrying out cascaded multiply accumulation operation by utilizing fixed-point multiplier | |
Ranganathan et al. | A linear array processor with dynamic frequency clocking for image processing applications | |
CN202395792U (en) | Double precision chaotic signal generator based on FPGA | |
CN108108812A (en) | For the efficiently configurable convolutional calculation accelerator of convolutional neural networks | |
CN107368459A (en) | The dispatching method of Reconfigurable Computation structure based on Arbitrary Dimensions matrix multiplication | |
CN106168897A (en) | Resources conservation circuit structure for deep stream aquation pulsation finite impulse response filter | |
Srivastava et al. | Operation-dependent frequency scaling using desynchronization | |
Msadaa et al. | A SoPC FPGA implementing of an enhanced parallel CFAR architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||