CN110647309A - High-speed big bit width multiplier - Google Patents

Info

Publication number
CN110647309A
CN110647309A CN201910934899.7A CN201910934899A CN110647309A CN 110647309 A CN110647309 A CN 110647309A CN 201910934899 A CN201910934899 A CN 201910934899A CN 110647309 A CN110647309 A CN 110647309A
Authority
CN
China
Prior art keywords
bit
multiplier
result
multiplication
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910934899.7A
Other languages
Chinese (zh)
Other versions
CN110647309B (en)
Inventor
吴冰瑞
俞艳东
张培勇
陆玲霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201910934899.7A priority Critical patent/CN110647309B/en
Publication of CN110647309A publication Critical patent/CN110647309A/en
Application granted granted Critical
Publication of CN110647309B publication Critical patent/CN110647309B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a high-speed large-bit-width multiplier. The multiplier comprises two complementary clocks, a CLA adder, an overflow processing module, a decoder, a K-bit multiplying unit and a data operation module. Its operation method is as follows: the partial products are divided into two groups, each group is controlled by a different clock, and the two groups are processed in parallel; multiplication and shift-add operations are performed on the rising edges of the two complementary clocks to obtain the final multiplication result. The multiplier halves the number of clock cycles consumed and so increases operation speed. It can be used in integrated circuits, programmable logic devices, digital signal processing, communications and similar fields, and features a simple circuit structure, low resource usage, high speed, and support for multiplying large-bit-width operands.

Description

High-speed big bit width multiplier
Technical Field
The invention belongs to the field of computers and integrated circuits, and particularly relates to a design of a high-speed large-bit-width multiplier which can be applied to the fields of digital image processing, communication and the like.
Background
With the rapid development of artificial intelligence, cloud computing and internet of things technologies, the performance requirements of various processors are higher and higher. The multiplier, which is a core component in the processor, largely determines the operating frequency of the whole system due to the long delay time, and a large-bit-width multiplier consumes a larger chip area. Therefore, the speed and area of the multiplier will determine the performance and cost of the whole processor, and is the key point of system optimization.
Several high-speed multipliers have been proposed in recent years. They fall into four main types: adder tree multipliers, parallel multipliers, look-up table multipliers, and shift-and-add multipliers. Adder tree and parallel multipliers are fast, but the hardware resources they consume grow rapidly as the number of multiplier bits increases. A look-up table multiplier uses the operands as addresses into a memory that stores multiplication results; its speed depends on the memory access speed, and the memory size grows sharply as the number of multiplier bits increases. A shift-and-add multiplier consumes few resources but is slow.
Multiplication has two main parts: generating the partial products and accumulating them. Speeding up a multiplier therefore mainly means reducing the number of partial products and accelerating their accumulation. Many multiplier schemes have been proposed that focus on increasing speed, but they neglect the joint optimization of speed and resource consumption for large-bit-width multipliers.
Disclosure of Invention
The invention aims to provide a high-speed large-bit-width multiplier which is characterized by simple circuit structure, less occupied resources and high speed, and can realize large-bit-width operand multiplication operation. The multiplier can be used in the fields of integrated circuits, programmable logic devices, digital signal processing, communication and the like. The multiplier of the invention can reduce the consumption of clock period by half and improve the operation speed of the multiplier.
In order to achieve the purpose, the invention adopts the technical scheme that:
a high-speed big-bit-width multiplier comprises two complementary clocks, a CLA adder, an overflow processing module, a decoder, a K-bit multiplying unit and a data operation module;
the two complementary clocks have the same frequency and opposite phases and are used for controlling the two counters;
the CLA adder, namely the carry-lookahead adder, generates the carry signals of all stages simultaneously, which greatly reduces the time needed to produce the carries. Here the adder adds the parts of the divided X multiplier $\{a_{m-1} a_{m-2} \ldots a_1 a_0\}$ pairwise, e.g. $a_0+a_1,\ a_0+a_2,\ a_0+a_3,\ a_1+a_2,\ \ldots,\ a_{m-2}+a_{m-1}$. Similarly, for the Y multiplier $\{b_{m-1} b_{m-2} \ldots b_1 b_0\}$ it computes $b_0+b_1,\ b_0+b_2,\ b_0+b_3,\ b_1+b_2,\ \ldots,\ b_{m-2}+b_{m-1}$.
The overflow processing module mainly judges the result of the first-stage CLA adder, and when the bit width of the result is greater than K bits, the output result of the module is sent to a decoder and a data operation module at the lower stage under the control of a clock.
The decoder mainly decodes the code of the counter, selects the data sent from the front stage, stores the data in a corresponding register, and then sends the data to the K bit multiplication unit for operation.
The K-bit multiplying unit is a partial product generator that takes two K-bit multipliers and outputs a 2K-bit result; the generator may be, for example, of the pipelined type or the Booth type.
The data operation module is mainly used for carrying out shift addition on the operation result of the K bit multiplication unit under the control of a clock and a counter, the output result of the data operation module enters the next-stage adder for final calculation, and the second-stage CLA adder outputs a correct operation result under the control of the counter.
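The carry-lookahead idea described above can be illustrated in software. The following is a minimal sketch, not the patent's circuit: the function name `cla_add` and the bit-list representation are assumptions made for illustration. Each carry is computed directly from the generate and propagate signals and the carry-in, so no carry waits on the previous stage's carry.

```python
def cla_add(a, b, width, carry_in=0):
    """Add two non-negative `width`-bit integers using lookahead carries.

    Hypothetical helper: every carry c_i is expressed only in terms of the
    generate bits g, the propagate bits p and carry_in, which is what allows
    hardware to evaluate all carries at the same time.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]      # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]    # propagate

    carries = [carry_in]
    for i in range(1, width + 1):
        c = 0
        # c_i = OR over j < i of (g_j AND p_{j+1} AND ... AND p_{i-1})
        for j in range(i):
            term = g[j]
            for t in range(j + 1, i):
                term &= p[t]
            c |= term
        # ... OR (p_0 AND ... AND p_{i-1} AND carry_in)
        term = carry_in
        for t in range(i):
            term &= p[t]
        carries.append(c | term)

    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i        # sum bit i
    return total | (carries[width] << width)     # carry-out becomes bit `width`
```

With `width = 64` the result can be 65 bits wide, which is the case the overflow processing module has to handle.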
Further, the method for sending the module output result to the lower-level decoder and the data operation module by the overflow processing module is as follows: the low K bits of the result of the CLA adder are sent to a decoder, and the highest bit of the result is sent to a data operation module, so that the multiplication result is correct.
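One way to read this split is that only the low K bits of each sum go through the K-bit multiplying unit, while the top bit contributes shifted additions. The sketch below works under that assumption; `split_sum` and `mul_with_overflow` are hypothetical names, and the exact shift-add arrangement in the data operation module is not spelled out in the text.

```python
def split_sum(s, k):
    """Split a (k+1)-bit sum into (top bit, low k bits)."""
    return s >> k, s & ((1 << k) - 1)


def mul_with_overflow(s, t, k):
    """Multiply two sums of up to k+1 bits using a single k x k multiplication.

    Writing s = cs*2^k + ls and t = ct*2^k + lt with cs, ct in {0, 1}:
        s*t = ls*lt + (cs*lt + ct*ls)*2^k + (cs*ct)*2^(2k)
    Only ls*lt needs the K-bit multiplying unit; the rest is shift and add.
    """
    cs, ls = split_sum(s, k)
    ct, lt = split_sum(t, k)
    result = ls * lt                 # the only k x k multiplication
    if cs:
        result += lt << k            # correction for the top bit of s
    if ct:
        result += ls << k            # correction for the top bit of t
    if cs and ct:
        result += 1 << (2 * k)       # both top bits set
    return result
```

For K = 64, two 64-bit parts can sum to a 65-bit value, and `mul_with_overflow` then reproduces the full product without a wider multiplier.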
Further, the operation method of the multiplier is as follows: the Karatsuba algorithm is applied to the hardware of a parallel multiplier, the partial products are divided into two groups, and the two clocks control their corresponding counters to form different codes that control the decoder; the coding may be one-hot or Gray code. On each rising clock edge, the corresponding multipliers are sent in sequence to the K-bit multiplying unit, which completes the multiplication and shift-add operations to produce the final multiplication result.
Further, applying the Karatsuba algorithm to the hardware of a parallel multiplier reduces the number of partial products in large-bit-width multiplication.
The Karatsuba algorithm is a fast multiplication algorithm used mainly for multiplying two large numbers. It is recursive, and it greatly improves efficiency and lowers complexity compared with ordinary multiplication. Its basic principle is to split the two large numbers x and y, each with many bits, into numbers with fewer bits. After this split, the product reduces to three multiplications plus a small number of additions and shifts.
Although the Toom-Cook algorithm is faster, the Karatsuba algorithm is simpler to implement in hardware; taking area cost and hardware realizability together, it is better suited to multipliers inside a processor.
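The "three multiplications" principle can be sketched in a few lines. This is a generic textbook-style recursive Karatsuba for non-negative integers, not the hardware arrangement of the invention; the `threshold_bits` parameter is an assumption chosen for illustration.

```python
def karatsuba(x, y, threshold_bits=64):
    """Multiply non-negative integers x and y with recursive Karatsuba.

    Below `threshold_bits` (an assumed cut-off), fall back to the native
    multiplication, which stands in for a small hardware multiplier.
    """
    if x.bit_length() <= threshold_bits or y.bit_length() <= threshold_bits:
        return x * y

    k = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << k) - 1
    x1, x0 = x >> k, x & mask          # x = x1 * 2^k + x0
    y1, y0 = y >> k, y & mask          # y = y1 * 2^k + y0

    z0 = karatsuba(x0, y0, threshold_bits)                       # low product
    z2 = karatsuba(x1, y1, threshold_bits)                       # high product
    z1 = karatsuba(x0 + x1, y0 + y1, threshold_bits) - z0 - z2   # cross terms

    # Three multiplications, then only shifts and additions.
    return (z2 << (2 * k)) + (z1 << k) + z0
```

The cross term `z1` equals `x1*y0 + x0*y1`, obtained with one multiplication instead of two.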
The specific implementation method is as follows: each of the two n-bit large-bit-width inputs x and y is divided into m k-bit numbers, where m > 0 and k > 0. Let $X = \{a_{m-1} a_{m-2} \ldots a_1 a_0\}$ and $Y = \{b_{m-1} b_{m-2} \ldots b_1 b_0\}$. Then:
$$X \cdot Y = a_0 b_0 + (a_1 b_0 + a_0 b_1) \cdot 2^{k} + \cdots + 2^{2(m-1)k}\, a_{m-1} b_{m-1} \tag{1}$$
where, according to the Karatsuba algorithm:
$$a_1 b_0 + a_0 b_1 = (a_0 + a_1)(b_0 + b_1) - a_0 b_0 - a_1 b_1 \tag{2}$$
Replacing the coefficients in formula (1) with the three terms on the right-hand side of formula (2) yields:
$$X \cdot Y = a_0 b_0 + \left[(a_0 + a_1)(b_0 + b_1) - a_0 b_0 - a_1 b_1\right] \cdot 2^{k} + \cdots + 2^{2(m-1)k}\, a_{m-1} b_{m-1} \tag{3}$$
The number of partial products is thereby reduced from $m^2$ to the expression in Figure BDA0002221295910000031, that is, the expression in Figure BDA0002221295910000032.
In the invention, because the coefficient terms in formula (3) do not affect one another, whether taken from the first term or from the last term, the coefficient multiplications can be carried out from the head and from the tail simultaneously.
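The reduction in partial products can be checked by enumeration. The sketch below assumes that the distinct products are the m "diagonal" products $a_i b_i$ plus one pair-sum product $(a_i + a_j)(b_i + b_j)$ for each pair $i < j$, giving $m + m(m-1)/2 = m(m+1)/2$ in total. That count is an inference from the 16-to-10 figure the description gives for m = 4, since the formula images themselves are not reproduced here; `distinct_products` is a hypothetical helper.

```python
from itertools import combinations


def distinct_products(m):
    """List the index pairs of the distinct multiplications for m parts.

    (i, i) stands for a_i * b_i; (i, j) with i < j stands for
    (a_i + a_j) * (b_i + b_j). Assumed decomposition, see lead-in.
    """
    diagonal = [(i, i) for i in range(m)]
    cross = list(combinations(range(m), 2))
    return diagonal + cross


for m in (2, 4, 8):
    # schoolbook count m*m versus the assumed Karatsuba-style count
    print(m, m * m, len(distinct_products(m)))
```

For m = 4 this lists 10 products against 16 for the schoolbook method, and the gap widens as m grows.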
The invention adopts a pair of complementary clocks with the same frequency and opposite phases, so that the data change of the counters controlled by the two clocks can be separated by half a period, and no interference is generated when the data are input to a decoder. The decoder selects proper preceding stage data to be sent to the K bit multiplication unit in real time according to the two counters, and the data sent to the K bit multiplication unit every time are spaced by half a clock period. If the first-stage CLA adder has data overflow, a data overflow processing module is called. The data operation module sends the processed data to the second-stage CLA adder, and the second-stage CLA adder outputs a correct operation result under the control of the counter.
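The half-period spacing can be pictured with a small timing model. This is only a sketch under stated assumptions: time is measured in half clock periods, clk1 rising edges fall on even ticks and clk2 (its complement) on odd ticks, and the assignment of a "head" group to clk1 and a "tail" group to clk2 is hypothetical. The product labels `P0` to `P5` are placeholders.

```python
from itertools import zip_longest


def interleaved_issue(head_group, tail_group):
    """Return sorted (tick, clock, item) tuples, tick in half clock periods.

    The counter driven by clk1 advances on even ticks and the counter driven
    by clk2 on odd ticks, so the two never change at the same instant.
    """
    events = []
    for step, (h, t) in enumerate(zip_longest(head_group, tail_group)):
        if h is not None:
            events.append((2 * step, "clk1", h))       # rising edge of clk1
        if t is not None:
            events.append((2 * step + 1, "clk2", t))   # rising edge of clk2
    return sorted(events)


head = ["P0", "P1", "P2"]   # taken from the first term onward
tail = ["P5", "P4", "P3"]   # taken from the last term backward
schedule = interleaved_issue(head, tail)
for tick, clk, item in schedule:
    print(f"t = {tick / 2:.1f} T  {clk}  {item}")
```

In this model a new operand set reaches the multiplying stage every half period, alternating between the two groups, which is the behaviour the paragraph above describes.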
The invention has the beneficial effects that:
compared with the prior art, the invention reduces the number of partial products by using the Karatsuba algorithm, which increases multiplication speed, lowers resource consumption and critical-path delay, and saves chip cost. In addition, the complementary-clock optimization halves the number of clock cycles the operation consumes, improving operation efficiency.
Compared with the traditional parallel multiplier, the invention improves operation speed by a factor of 3 to 4; compared with other types of multipliers, it is faster, its circuit is simpler, and it occupies fewer resources.
Description of the drawings:
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a diagram of an implementation architecture of the present invention.
The specific implementation mode is as follows:
the invention is further described with reference to the accompanying drawings and specific examples.
Fig. 2 shows a high-speed large-bit-width multiplier according to the present invention, which includes two complementary clocks, a CLA adder, an overflow processing module, a decoder, a K-bit multiplying unit, and a data operation module;
the two complementary clocks have the same frequency and opposite phases and are used for controlling the two counters;
the CLA adder, namely the carry-lookahead adder, generates the carry signals of all stages simultaneously, which greatly reduces the time needed to produce the carries. Here the adder adds the parts of the divided X multiplier $\{a_{m-1} a_{m-2} \ldots a_1 a_0\}$ pairwise, e.g. $a_0+a_1,\ a_0+a_2,\ a_0+a_3,\ a_1+a_2,\ \ldots,\ a_{m-2}+a_{m-1}$; similarly, for the Y multiplier $\{b_{m-1} b_{m-2} \ldots b_1 b_0\}$ it computes $b_0+b_1,\ b_0+b_2,\ b_0+b_3,\ b_1+b_2,\ \ldots,\ b_{m-2}+b_{m-1}$;
The overflow processing module mainly judges the result of the first-stage CLA adder, and when the bit width of the result is more than K bits, the output result of the module is sent to a decoder and a data operation module at the lower stage under the control of a clock;
the decoder mainly decodes the code of the counter, selects the data sent from the front stage, stores the data in a corresponding register and sends the data to the K bit multiplication unit for operation;
the K-bit multiplying unit is a partial product generator that takes two K-bit multipliers and outputs a 2K-bit result; the generator may be, for example, of the pipelined type or the Booth type;
the data operation module is mainly used for carrying out shift addition on the operation result of the K bit multiplication unit under the control of a clock and a counter, the output result of the data operation module enters the next-stage adder for final calculation, and the second-stage CLA adder outputs a correct operation result under the control of the counter.
The high-speed large-bit-width multiplier applies the Karatsuba algorithm to the hardware implementation of a parallel multiplier, divides the partial products into two groups, and controls the decoder with the different codes formed by counter 1 and counter 2, which are driven by the two clocks. On each rising clock edge, the corresponding multipliers are sent in sequence to the K-bit multiplying unit, which completes the multiplication and shift-add operations to produce the final multiplication result.
The invention adopts a pair of complementary clocks with the same frequency and opposite phases, so that the data change of the counter 1 and the counter 2 controlled by the two clocks in the figure 2 can be separated by half a period, and the data change can not generate interference when being input to a decoder. The decoder selects proper preceding stage data to be sent to the K bit multiplication unit in real time according to the counter 1 and the counter 2, and the data sent to the K bit multiplication unit every time are separated by half clock period. In FIG. 2, if the first stage CLA adder has data overflow, the data overflow handling module is called. The overflow processing module sends the low-K bits of the result of the CLA adder to the decoder, and the highest bit of the result is sent to the data operation module, so that the multiplication result is correct. The data operation module sends the processed data to the second-stage CLA adder, and the second-stage CLA adder outputs a correct operation result under the control of the counter 1.
The high-speed large-bit-width multiplier of the invention uses the Karatsuba algorithm to reduce the number of partial products in large-bit-width multiplication. The specific implementation method is as follows: each of the two n-bit large-bit-width inputs x and y is divided into m k-bit numbers, where m > 0 and k > 0. Let $X = \{a_{m-1} a_{m-2} \ldots a_1 a_0\}$ and $Y = \{b_{m-1} b_{m-2} \ldots b_1 b_0\}$. Then:
$$X \cdot Y = a_0 b_0 + (a_1 b_0 + a_0 b_1) \cdot 2^{k} + \cdots + 2^{2(m-1)k}\, a_{m-1} b_{m-1} \tag{1}$$
where, according to the Karatsuba algorithm:
$$a_1 b_0 + a_0 b_1 = (a_0 + a_1)(b_0 + b_1) - a_0 b_0 - a_1 b_1 \tag{2}$$
Replacing the coefficients in formula (1) with the three terms on the right-hand side of formula (2) yields:
$$X \cdot Y = a_0 b_0 + \left[(a_0 + a_1)(b_0 + b_1) - a_0 b_0 - a_1 b_1\right] \cdot 2^{k} + \cdots + 2^{2(m-1)k}\, a_{m-1} b_{m-1} \tag{3}$$
The number of partial products is thereby reduced from $m^2$ to the expression in Figure BDA0002221295910000051, that is, the expression in Figure BDA0002221295910000052.
In the invention, because the coefficient terms in formula (3) do not affect one another, whether taken from the first term or from the last term, the coefficient multiplications can be carried out from the head and from the tail simultaneously.
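In this example the two input multipliers X and Y are each 256 bits wide and the output Q is 512 bits wide. Each input multiplier is divided into four 64-bit numbers, namely $X = \{a_3 a_2 a_1 a_0\}$ and $Y = \{b_3 b_2 b_1 b_0\}$, so that:
$$\begin{aligned} Q = X \cdot Y ={} & a_0 b_0 + (a_1 b_0 + a_0 b_1) \cdot 2^{64} + (a_1 b_1 + a_2 b_0 + a_0 b_2) \cdot 2^{128} \\ & + (a_3 b_0 + a_0 b_3 + a_1 b_2 + a_2 b_1) \cdot 2^{192} + (a_3 b_1 + a_1 b_3 + a_2 b_2) \cdot 2^{256} \\ & + (a_3 b_2 + a_2 b_3) \cdot 2^{320} + 2^{384}\, a_3 b_3 \end{aligned} \tag{4}$$
After substitution according to the Karatsuba algorithm, the expression becomes:
$$\begin{aligned} Q ={} & a_0 b_0 + \left[(a_0 + a_1)(b_0 + b_1) - a_0 b_0 - a_1 b_1\right] \cdot 2^{64} \\ & + \left[a_1 b_1 + (a_0 + a_2)(b_0 + b_2) - a_0 b_0 - a_2 b_2\right] \cdot 2^{128} \\ & + \left[(a_0 + a_3)(b_0 + b_3) - a_3 b_3 - a_0 b_0 + (a_1 + a_2)(b_1 + b_2) - a_1 b_1 - a_2 b_2\right] \cdot 2^{192} \\ & + \left[(a_1 + a_3)(b_1 + b_3) - a_1 b_1 - a_3 b_3 + a_2 b_2\right] \cdot 2^{256} \\ & + \left[(a_3 + a_2)(b_3 + b_2) - a_3 b_3 - a_2 b_2\right] \cdot 2^{320} + 2^{384}\, a_3 b_3 \end{aligned} \tag{5}$$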
Comparing the two expressions shows that, because each distinct product needs to be computed only once, the number of partial products falls from 16 to 10; the reduction becomes larger as the bit width of the multiplier increases further.
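Formula (5) can be transcribed directly and checked against ordinary multiplication. The sketch below is a software check only, with `multiply_256` as a hypothetical name; it computes the 4 products $a_i b_i$ and the 6 pair-sum products once each and reuses them across the coefficients.

```python
def multiply_256(x, y, k=64):
    """Evaluate formula (5) for two 256-bit operands split into four k-bit parts.

    Returns (product, number_of_multiplications).
    """
    mask = (1 << k) - 1
    a = [(x >> (k * i)) & mask for i in range(4)]
    b = [(y >> (k * i)) & mask for i in range(4)]

    # 4 diagonal products a_i * b_i
    d = [a[i] * b[i] for i in range(4)]
    # 6 pair-sum products (a_i + a_j) * (b_i + b_j), i < j
    s = {(i, j): (a[i] + a[j]) * (b[i] + b[j])
         for i in range(4) for j in range(i + 1, 4)}

    # Coefficients of 2^64 ... 2^320, term by term as in formula (5)
    c1 = s[0, 1] - d[0] - d[1]
    c2 = d[1] + s[0, 2] - d[0] - d[2]
    c3 = s[0, 3] - d[3] - d[0] + s[1, 2] - d[1] - d[2]
    c4 = s[1, 3] - d[1] - d[3] + d[2]
    c5 = s[2, 3] - d[3] - d[2]

    q = (d[0] + (c1 << k) + (c2 << 2 * k) + (c3 << 3 * k)
         + (c4 << 4 * k) + (c5 << 5 * k) + (d[3] << 6 * k))
    return q, len(d) + len(s)
```

Calling `multiply_256(x, y)` on any pair of 256-bit values should return the same product as `x * y` together with a multiplication count of 10.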
Based on the above principle, this example designs the multiplier as follows. According to FIG. 1, when the input enable signal req_valid is invalid, the output is always 0. When req_valid is valid, the eight 64-bit numbers obtained by dividing the two multipliers are fed into 8 registers. The CLA adder then calculates $a_0+a_1,\ a_0+a_2,\ a_0+a_3,\ a_1+a_2,\ a_1+a_3,\ a_2+a_3,\ b_0+b_1,\ b_0+b_2,\ b_0+b_3,\ b_1+b_2,\ b_1+b_3,\ b_2+b_3$, and the sums go into another 8 registers. Because the bottom-level multiplication unit used in this example is a 64-bit pipelined multiplication unit and the additions above are 64-bit additions, a sum may reach 65 bits, which is a data overflow. An overflow check is therefore placed here, and overflowing data enters the overflow processing module. Its principle is the same as that of the multiplier: the 65-bit data is split into 1 bit and 64 bits for multiplication; once all multipliers are 64 bits wide, processing moves to the next stage, and the remaining operation is completed by shifting.
The decoder is controlled by a pair of complementary clocks clk1 and clk2 present in the multiplier of this example, which control the respective counter 1 and counter 2 to form different codes. When the rising edge of the clock comes, the corresponding 64-bit multiplier is sent to the multiplying unit in sequence to finish the multiplication operation. Since the coefficient multiplications of the formula (5) are different from each other and do not generate interference, the operation of the clk1 and the clk2 is completed from the head end and the tail end respectively, so that the operation speed is doubled, the clock efficiency is improved, and the consumption of resources is reduced. Finally, the data output by the two 64-bit multipliers passes through the data operation unit and the CLA adder, and correct data is output when the output enable is valid.
The simulation result of the multiplier of the invention is as follows:
1. the experimental environment is as follows:
the multiplier of this embodiment is coded in Verilog HDL, verified by simulation in vcs_vM-2017.03, synthesized in a 55 nm CMOS process with the synthesis tool DC-2014, and automatically placed and routed with INNOVUS.
Three groups of experimental data were randomly selected for pre-layout simulation, and the simulation results were correct. Static timing analysis with DC-2014 shows that the whole chip works correctly with a 100 MHz clock.
To ensure the reliability of the chip, INNOVUS was used to extract the signal delays caused by the standard cells and interconnect in the layout, and the same three groups of data from the pre-layout simulation were used for post-layout simulation. All results were verified to be correct.
Comparative experiment:
the invention is compared with a traditional parallel multiplier; to keep the comparison fair, only two 64-bit multiplying units are used.
2. Results of the experiment
Item                                      Multiplier of the present example   Conventional parallel multiplier
Operand bit width                         256 bits                            256 bits
Number of bottom-level multiplying units  2 × 64-bit                          2 × 64-bit
Process                                   55 nm CMOS                          55 nm CMOS
Operation speed                           2.5 clock cycles                    8 clock cycles
Number of logic gates                     70,000                              75,000
The simulation results show that the multiplier of this embodiment outputs the correct result in 2.5 clock cycles, whereas the conventional parallel multiplier needs 8 clock cycles. The multiplier of this embodiment is therefore about 3 times as fast as the traditional parallel multiplier (8 / 2.5 = 3.2); compared with other types of multipliers, it is faster, its circuit is simpler, and it occupies fewer resources.

Claims (4)

1. A high-speed big bit width multiplier is characterized by comprising two complementary clocks, a CLA adder, an overflow processing module, a decoder, a K bit multiplying unit and a data operation module;
the two complementary clocks have the same frequency and opposite phases and are used for controlling the two counters;
the CLA adder is used for adding each part of the divided multipliers pairwise;
the overflow processing module is used for judging the result of the first-stage CLA adder, and when the bit width of the result is more than K bits, the output result of the module is sent to a decoder and a data operation module at the lower stage under the control of a counter;
the decoder selects the data sent from the front stage by decoding the code of the counter, stores the data in a corresponding register and then sends the data to the K bit multiplication unit for operation;
the K bit multiplying unit adopts a partial product generator with two multipliers of K bits and an output result of 2 x K bits;
the data operation module carries out shift addition on the operation result of the K bit multiplication unit under the control of the counter, the output result enters the second-stage CLA adder for final calculation, and the second-stage CLA adder outputs the correct operation result under the control of the counter.
2. The circuit structure of a high-speed large-bit-width multiplier according to claim 1, wherein said overflow handling module sends the output result of the module to the decoder and data operation module at the lower stage by: the low K bits of the result of the CLA adder are sent to a decoder, and the highest bit of the result is sent to a data operation module.
3. A high-speed large-bit-width multiplier according to claim 1, wherein the operation method of said multiplier is: the Karatsuba algorithm is applied to the hardware of a parallel multiplier, the partial products are divided into two groups, and the two complementary clocks control their corresponding counters to form different codes that control the decoder; on each rising clock edge, the corresponding multipliers are sent in sequence to the K-bit multiplying unit, which completes the multiplication and shift-add operations to produce the final multiplication result.
4. A high-speed large-bit-width multiplier according to claim 3, wherein the karatsuba algorithm is applied to the hardware of the parallel multiplier, and can be used to reduce the partial product during the large-bit-width multiplication, and the specific method is as follows:
dividing each of the two n-bit large-bit-width inputs x and y into m k-bit numbers, wherein m > 0 and k > 0;
letting $X = \{a_{m-1} a_{m-2} \ldots a_1 a_0\}$ and $Y = \{b_{m-1} b_{m-2} \ldots b_1 b_0\}$,
then:
$$X \cdot Y = a_0 b_0 + (a_1 b_0 + a_0 b_1) \cdot 2^{k} + \cdots + 2^{2(m-1)k}\, a_{m-1} b_{m-1} \tag{1}$$
wherein, according to the Karatsuba algorithm:
$$a_1 b_0 + a_0 b_1 = (a_0 + a_1)(b_0 + b_1) - a_0 b_0 - a_1 b_1 \tag{2}$$
replacing the coefficients in formula (1) with the three terms on the right-hand side of formula (2) yields:
$$X \cdot Y = a_0 b_0 + \left[(a_0 + a_1)(b_0 + b_1) - a_0 b_0 - a_1 b_1\right] \cdot 2^{k} + \cdots + 2^{2(m-1)k}\, a_{m-1} b_{m-1} \tag{3}$$
the number of partial products is thereby reduced from $m^2$ to the expression in Figure FDA0002221295900000022, that is, the expression in Figure FDA0002221295900000021;
because the coefficient terms in formula (3) do not affect one another, whether taken from the first term or from the last term, the coefficient multiplications can be carried out from the head and from the tail simultaneously.
CN201910934899.7A 2019-09-29 2019-09-29 High-speed big bit width multiplier Expired - Fee Related CN110647309B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910934899.7A CN110647309B (en) 2019-09-29 2019-09-29 High-speed big bit width multiplier

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910934899.7A CN110647309B (en) 2019-09-29 2019-09-29 High-speed big bit width multiplier

Publications (2)

Publication Number Publication Date
CN110647309A true CN110647309A (en) 2020-01-03
CN110647309B CN110647309B (en) 2020-10-13

Family

ID=68993313

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910934899.7A Expired - Fee Related CN110647309B (en) 2019-09-29 2019-09-29 High-speed big bit width multiplier

Country Status (1)

Country Link
CN (1) CN110647309B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112230886A (en) * 2020-09-11 2021-01-15 清华大学 Processing device free of Toom-Cook and modular multiplication acquisition method based on same
CN114371828A (en) * 2022-01-05 2022-04-19 华中科技大学 Polynomial multiplier and processor with same
CN114666038A (en) * 2022-05-12 2022-06-24 广州万协通信息技术有限公司 Large-bit-width data processing method, device, equipment and storage medium
CN117692126A (en) * 2023-12-14 2024-03-12 哈尔滨理工大学 Paillier homomorphic encryption method and system based on low-complexity modular multiplication algorithm

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1155117A (en) * 1996-01-19 1997-07-23 张胤微 High-speed multiplication device
CN1553310A (en) * 2003-05-28 2004-12-08 中国科学院微电子中心 Symmetric cutting algorithm for high-speed low loss multiplier and circuit strucure thereof
US20090164546A1 (en) * 2007-12-21 2009-06-25 Vinodh Gopal Method and apparatus for efficient programmable cyclic redundancy check (crc)
CN101957739A (en) * 2010-09-10 2011-01-26 清华大学 Sub-quadratic polynomial multiplier based on divide and conquer
CN104375802A (en) * 2014-09-23 2015-02-25 上海晟矽微电子股份有限公司 Multiplication and division device and operational method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
C. PREMA et al.: "Enhanced high speed modular multiplier using", 《2013 INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND INFORMATICS》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112230886A (en) * 2020-09-11 2021-01-15 清华大学 Processing device free of Toom-Cook and modular multiplication acquisition method based on same
CN112230886B (en) * 2020-09-11 2022-11-08 清华大学 Processing device free of Toom-Cook and modular multiplication acquisition method based on same
CN114371828A (en) * 2022-01-05 2022-04-19 华中科技大学 Polynomial multiplier and processor with same
CN114666038A (en) * 2022-05-12 2022-06-24 广州万协通信息技术有限公司 Large-bit-width data processing method, device, equipment and storage medium
CN117692126A (en) * 2023-12-14 2024-03-12 哈尔滨理工大学 Paillier homomorphic encryption method and system based on low-complexity modular multiplication algorithm

Also Published As

Publication number Publication date
CN110647309B (en) 2020-10-13

Similar Documents

Publication Publication Date Title
CN110647309B (en) High-speed big bit width multiplier
Abdelgawad et al. High speed and area-efficient multiply accumulate (MAC) unit for digital signal prossing applications
CN109144469B (en) Pipeline structure neural network matrix operation architecture and method
Olivieri Design of synchronous and asynchronous variable-latency pipelined multipliers
Pinto et al. Low-power modified shift-add multiplier design using parallel prefix adder
Srikanth et al. Low power array multiplier using modified full adder
US5133069A (en) Technique for placement of pipelining stages in multi-stage datapath elements with an automated circuit design system
Neeraja et al. Design of an area efficient braun multiplier using high speed parallel prefix adder in cadence
Unwala et al. Superpipelined adder designs
CN111178492A (en) Computing device, related product and computing method for executing artificial neural network model
Sangwan et al. Design and implementation of single precision pipelined floating point co-processor
Givaki et al. High-performance deterministic stochastic computing using residue number system
Shawl et al. Implementation of Area and Power efficient components of a MAC unit for DSP Processors
Lo et al. Building a multi-fpga virtualized restricted boltzmann machine architecture using embedded mpi
Yang et al. Lane shared bit-pragmatic deep neural network computing architecture and circuit
Bharathi et al. Area Efficient Self Timed Adders for Low Power Applications in VLSI
Sasipriya et al. Vedic Multiplier Design Using Modified Carry Select Adder with Parallel Prefix Adder
Sulieman et al. Design and Simulation of a Nanoscale Threshold-Logic Multiplier
Tang et al. A Low-Power Area-Efficient Precision Scalable Multiplier with an Input Vector Systolic Structure. Electronics 2022, 11, 2685
Samanth et al. A novel approach to develop low power MACs for 2D image filtering
Jing-yu et al. Multiply-accumulator using modified booth encoders designed for application in 16-bit RISC processor
Raavi et al. Implementation of High-Speed Hybrid Carry Select Adder using Binary to Excess-1 Converter
Jeong et al. A Study on multiplier architecture optimized for 32-bit processor with 3-stage pipeline
Hussein et al. Low-Latency Deterministic Multiplier for Stochastic Computing
Yugandhar et al. Power-Delay Efficient Array Multiplier for Lifting-Scheme 1D Discrete Wavelet Transform

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20201013

CF01 Termination of patent right due to non-payment of annual fee