CN110647309A - High-speed big bit width multiplier - Google Patents
- Publication number
- CN110647309A
- Authority
- CN
- China
- Prior art keywords
- bit
- multiplier
- result
- multiplication
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Complex Calculations (AREA)
Abstract
The invention provides a high-speed large-bit-width multiplier comprising two complementary clocks, a CLA adder, an overflow processing module, a decoder, a K-bit multiplication unit and a data operation module. The multiplier operates as follows: the partial products are divided into two groups, each group is controlled by a different clock, and the two groups are processed in parallel; multiplication and shift-addition are performed on the rising edges of the two complementary clocks to obtain the final product. The multiplier halves the number of clock cycles consumed and thereby increases operation speed. It can be used in integrated circuits, programmable logic devices, digital signal processing, communications and similar fields, and features a simple circuit structure, low resource usage, high speed, and the ability to multiply operands of large bit width.
Description
Technical Field
The invention belongs to the field of computers and integrated circuits, and in particular relates to the design of a high-speed large-bit-width multiplier applicable to digital image processing, communications and similar fields.
Background
With the rapid development of artificial intelligence, cloud computing and Internet of Things technologies, the performance demanded of processors keeps rising. The multiplier is a core processor component: because of its long delay it largely determines the operating frequency of the whole system, and a large-bit-width multiplier also occupies a large chip area. The speed and area of the multiplier therefore determine the performance and cost of the whole processor, which makes it a key target of system optimization.
Several high-speed multipliers have been proposed in recent years; they fall mainly into four types: adder-tree multipliers, parallel multipliers, look-up-table multipliers, and shift-and-add multipliers. Adder-tree and parallel multipliers are fast, but their hardware cost grows rapidly as the operand bit width increases. A look-up-table multiplier uses the operands as an address into a memory that stores precomputed products, so its speed depends on the memory access time, and the required memory grows sharply as the bit width increases. A shift-and-add multiplier consumes few resources but is slow.
Multiplication has two main parts: generating the partial products and accumulating them. Speeding up a multiplier therefore mainly means reducing the number of partial products and accelerating their accumulation. Many existing multiplier schemes focus on speed alone and neglect the joint optimization of speed and resource consumption for large-bit-width multipliers.
Disclosure of Invention
The purpose of the invention is to provide a high-speed large-bit-width multiplier with a simple circuit structure, low resource usage and high speed that can multiply operands of large bit width. It can be used in integrated circuits, programmable logic devices, digital signal processing, communications and similar fields. The multiplier halves the number of clock cycles consumed and thereby increases operation speed.
To achieve this purpose, the invention adopts the following technical solution:
a high-speed large-bit-width multiplier comprises two complementary clocks, a CLA adder, an overflow processing module, a decoder, a K-bit multiplication unit and a data operation module;
the two complementary clocks have the same frequency and opposite phases and are used for controlling the two counters;
the CLA adder, i.e. the carry-lookahead adder, generates the carry signals of all stages simultaneously, which greatly reduces carry-generation time. Here the adder adds the $m$ parts of operand $X=\{a_{m-1}a_{m-2}\ldots a_1a_0\}$ pairwise, e.g. $a_0+a_1, a_0+a_2, a_0+a_3, a_1+a_2, \ldots, a_{m-2}+a_{m-1}$. Similarly, for operand $Y=\{b_{m-1}b_{m-2}\ldots b_1b_0\}$ it computes $b_0+b_1, b_0+b_2, b_0+b_3, b_1+b_2, \ldots, b_{m-2}+b_{m-1}$.
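The lookahead idea and the pairwise pre-addition can be sketched at the bit level in Python. This is a minimal software model, assuming unsigned operands of a fixed `width`; the names `cla_add` and `pairwise_sums` are hypothetical and not taken from the patent. Each carry is computed directly from the generate/propagate bits rather than from the previous carry, which is what lets hardware form all carries in parallel.

```python
def cla_add(a: int, b: int, width: int, carry_in: int = 0) -> int:
    """Add two unsigned `width`-bit integers with carry-lookahead equations.

    Each carry c[i+1] is formed directly from generate (g) and propagate (p)
    bits: c[i+1] = g[i] | p[i]&g[i-1] | ... | p[i]&...&p[1]&g[0] | p[i]&...&p[0]&c0,
    so no carry waits for the previous one, unlike a ripple-carry adder.
    Returns a (width+1)-bit sum whose top bit is the carry-out.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate bits
    carries = [carry_in & 1]
    for i in range(width):
        c = 0
        for j in range(i + 1):
            term = g[j]                      # g[j] propagated through bits j+1..i
            for t in range(j + 1, i + 1):
                term &= p[t]
            c |= term
        term = carry_in & 1                  # carry-in propagated through bits 0..i
        for t in range(i + 1):
            term &= p[t]
        carries.append(c | term)
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i    # sum bit = p XOR incoming carry
    return total | (carries[width] << width)


def pairwise_sums(parts, width):
    """Hypothetical helper: all pairwise sums parts[i] + parts[j] for i < j."""
    return {(i, j): cla_add(parts[i], parts[j], width)
            for i in range(len(parts)) for j in range(i + 1, len(parts))}
```

With the embodiment's 64-bit parts, `cla_add(x, y, 64)` can return a 65-bit value; that is exactly the overflow case passed to the overflow processing module.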
The overflow processing module checks the result of the first-stage CLA adder; when the result is wider than K bits, the module's output is sent, under clock control, to the downstream decoder and data operation module.
The decoder decodes the counter code, selects the data sent from the preceding stage, stores it in the corresponding register, and then sends it to the K-bit multiplication unit.
The K-bit multiplication unit is a partial-product multiplier that takes two K-bit operands and produces a 2K-bit result; it may be of a pipelined type, a Booth type, or another type.
The data operation module performs shift-addition on the results of the K-bit multiplication unit under the control of the clock and counter; its output enters the next-stage adder for the final calculation, and the second-stage CLA adder outputs the correct result under the control of the counter.
Further, the overflow processing module sends its output to the downstream decoder and data operation module as follows: the low K bits of the CLA adder result are sent to the decoder and the most significant bit is sent to the data operation module, which ensures that the product is correct.
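The patent leaves the type of K-bit unit open. As one illustration of the Booth option, the hypothetical sketch below models a radix-4 (modified) Booth multiplier for unsigned K-bit operands in Python. It is not the unit used in the embodiment, which is described only as a 64-bit pipelined multiplier.

```python
def booth_radix4_multiply(x: int, y: int, k: int) -> int:
    """Multiply two unsigned k-bit integers using radix-4 (modified) Booth recoding.

    The multiplier y is scanned in overlapping 3-bit windows
    (y[2i+1], y[2i], y[2i-1]) and recoded into digits d_i in {-2,-1,0,1,2}
    with d_i = -2*y[2i+1] + y[2i] + y[2i-1], so y = sum(d_i * 4**i).
    Each digit selects one partial product (0, +-x, +-2x), so there are
    about k/2 partial products instead of one per bit.
    """
    if not (0 <= x < (1 << k) and 0 <= y < (1 << k)):
        raise ValueError("operands must be unsigned k-bit values")
    digits = (k + 2) // 2      # covers y plus one zero bit, so y stays non-negative
    y_ext = y << 1             # implicit y[-1] = 0 below the LSB
    product = 0
    for i in range(digits):
        window = (y_ext >> (2 * i)) & 0b111
        d = -2 * (window >> 2) + ((window >> 1) & 1) + (window & 1)
        product += (d * x) << (2 * i)   # partial product weighted by 4**i
    return product                      # fits in 2k bits
```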
Further, the multiplier operates as follows: the Karatsuba algorithm is applied in the hardware of a parallel multiplier, and the partial products are divided into two groups. The two clocks drive their respective counters, which produce different codes that control the decoder; the code may be one-hot or Gray code. On each rising clock edge, the corresponding operands are fed in turn into the K-bit multiplication unit to perform multiplication and shift-addition, yielding the final product.
Further, applying the Karatsuba algorithm in the hardware of a parallel multiplier reduces the number of partial products in large-bit-width multiplication.
The Karatsuba algorithm is a fast, recursive multiplication algorithm used mainly for multiplying two large numbers. It has lower complexity than ordinary (schoolbook) multiplication, about $n^{\log_2 3}\approx n^{1.585}$ single-digit multiplications instead of $n^2$, which greatly improves efficiency. Its basic principle is to split the two large numbers x and y into numbers with fewer bits; after this step, the product requires only three smaller multiplications plus a few additions and shifts.
Compared with the asymptotically faster Toom-Cook algorithm, Karatsuba is simpler to implement in hardware; considering area cost and hardware feasibility together, the Karatsuba algorithm is better suited to multipliers in processors.
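The two counter encodings mentioned above can be modelled as follows. This sketch assumes that each counter value indexes one operand register; the helper names and the register list are hypothetical. Gray code changes exactly one bit per step, a common reason to choose it for counters feeding a decoder, while one-hot uses one flip-flop per state but decodes trivially.

```python
def binary_to_gray(n: int) -> int:
    """Binary counter value -> Gray code (adjacent values differ in one bit)."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Gray code -> binary, by XOR-ing all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def one_hot(index: int) -> int:
    """One-hot code: a single 1 at bit position `index`."""
    return 1 << index


def decode_select(code: int, registers: list, gray: bool = True):
    """Hypothetical decoder: return the register selected by a counter code."""
    if gray:
        index = gray_to_binary(code)
    else:
        if code <= 0 or code & (code - 1):
            raise ValueError("not a one-hot code")
        index = code.bit_length() - 1   # position of the single set bit
    return registers[index]
```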
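For reference, the classic recursive software form of Karatsuba multiplication is sketched below in Python. The patent applies the split in hardware rather than recursing, so this is not the patent's circuit; `cutoff_bits` is a hypothetical parameter standing in for the width of a fixed hardware multiplication unit.

```python
def karatsuba(x: int, y: int, cutoff_bits: int = 64) -> int:
    """Multiply non-negative integers with the recursive Karatsuba method.

    Splitting x = x1*2^h + x0 and y = y1*2^h + y0 gives
        x*y = z2*2^(2h) + (z1 - z2 - z0)*2^h + z0,
    with z2 = x1*y1, z0 = x0*y0, z1 = (x1+x0)*(y1+y0):
    three half-size products instead of four.
    Operands of at most `cutoff_bits` bits use the built-in multiply,
    standing in for a fixed-width multiplication unit.
    """
    if cutoff_bits < 4:
        raise ValueError("cutoff_bits must be at least 4 for the recursion to shrink")
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y
    h = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << h) - 1
    x1, x0 = x >> h, x & mask
    y1, y0 = y >> h, y & mask
    z0 = karatsuba(x0, y0, cutoff_bits)
    z2 = karatsuba(x1, y1, cutoff_bits)
    z1 = karatsuba(x0 + x1, y0 + y1, cutoff_bits)
    return (z2 << (2 * h)) + ((z1 - z2 - z0) << h) + z0
```

Note that `x0 + x1` can be one bit wider than either half; the hardware counterpart of that extra bit is the overflow the patent handles separately.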
The specific method is as follows: the two n-bit inputs x and y are each divided into m k-bit numbers, where m > 0 and k > 0. Let $X=\{a_{m-1}a_{m-2}\ldots a_1a_0\}$ and $Y=\{b_{m-1}b_{m-2}\ldots b_1b_0\}$; then

$$X\cdot Y=a_0b_0+(a_1b_0+a_0b_1)\cdot 2^{k}+\cdots+2^{2(m-1)k}a_{m-1}b_{m-1}\tag{1}$$

By the Karatsuba algorithm,

$$a_1b_0+a_0b_1=(a_0+a_1)(b_0+b_1)-a_0b_0-a_1b_1\tag{2}$$

Substituting the right-hand side of (2) for the corresponding coefficient in (1) gives

$$X\cdot Y=a_0b_0+\bigl[(a_0+a_1)(b_0+b_1)-a_0b_0-a_1b_1\bigr]\cdot 2^{k}+\cdots+2^{2(m-1)k}a_{m-1}b_{m-1}\tag{3}$$
In the invention, the coefficient multiplications in equation (3) are independent of one another, so they can be performed simultaneously starting from the first term and from the last term.
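Applying identity (2) to every cross pair $a_ib_j+a_jb_i$ with $i<j$, not only to the pair written out in (3), gives the general form below. This generalization is an interpretation of equation (3); it agrees term by term with the $m=4$ case worked out in equation (5) of the embodiment.

$$X\cdot Y=\sum_{i=0}^{m-1}a_ib_i\,2^{2ik}+\sum_{0\le i<j\le m-1}\bigl[(a_i+a_j)(b_i+b_j)-a_ib_i-a_jb_j\bigr]\,2^{(i+j)k}$$

It needs $m$ products $a_ib_i$ and $m(m-1)/2$ products $(a_i+a_j)(b_i+b_j)$, i.e. $m(m+1)/2$ multiplications instead of the $m^2$ of the schoolbook form (1).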
The invention uses a pair of complementary clocks with the same frequency and opposite phases, so the values of the two counters they control change half a period apart and do not interfere when fed to the decoder. Based on the two counters, the decoder selects the appropriate preceding-stage data in real time and sends it to the K-bit multiplication unit, with successive inputs spaced half a clock period apart. If the first-stage CLA adder overflows, the overflow processing module is invoked. The data operation module sends the processed data to the second-stage CLA adder, which outputs the correct result under the control of the counter.
The beneficial effects of the invention are as follows:
compared with the prior art, the invention uses the Karatsuba algorithm to reduce the number of partial products, which increases multiplication speed, reduces resource consumption and critical-path delay, and saves chip cost. In addition, the complementary-clock optimization halves the number of clock cycles required for an operation, improving operation efficiency.
Compared with a conventional parallel multiplier, the invention is 3 to 4 times faster; compared with other types of multipliers, it is faster, has a simpler circuit and uses fewer resources.
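The half-period interleaving can be modelled as a timeline counted in half clock periods. The sketch below is hypothetical: it assumes the product jobs sit in one list, that counter 1 (on clk1) walks it from the head and counter 2 (on clk2) from the tail, and that issuing stops when the counters meet. The patent does not spell out the exact job order.

```python
def complementary_schedule(jobs):
    """Hypothetical issue order for two complementary-clock counters.

    Time t is counted in half clock periods. clk1 rises on even t and clk2,
    180 degrees out of phase, rises on odd t. Counter 1 walks `jobs` from the
    head and counter 2 from the tail, so consecutive issues are half a period
    apart and the two counters never select the same job.
    Returns a list of (t, clock_name, job).
    """
    head, tail = 0, len(jobs) - 1
    timeline = []
    t = 0
    while head <= tail:
        if t % 2 == 0:                       # rising edge of clk1
            timeline.append((t, "clk1", jobs[head]))
            head += 1
        else:                                # rising edge of clk2
            timeline.append((t, "clk2", jobs[tail]))
            tail -= 1
        t += 1
    return timeline
```

In this model, n jobs occupy n half-periods of issue slots, half as many full clock cycles as issuing on the rising edge of a single clock.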
Description of the drawings:
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is an implementation architecture diagram of the present invention.
Detailed description of the embodiments:
the invention is further described below with reference to the accompanying drawings and specific examples.
FIG. 2 shows a high-speed large-bit-width multiplier according to the present invention, which includes two complementary clocks, a CLA adder, an overflow processing module, a decoder, a K-bit multiplication unit and a data operation module;
the two complementary clocks have the same frequency and opposite phases and are used for controlling the two counters;
the CLA adder, i.e. the carry-lookahead adder, generates the carry signals of all stages simultaneously, which greatly reduces carry-generation time. Here the adder adds the $m$ parts of operand $X=\{a_{m-1}a_{m-2}\ldots a_1a_0\}$ pairwise, e.g. $a_0+a_1, a_0+a_2, a_0+a_3, a_1+a_2, \ldots, a_{m-2}+a_{m-1}$; similarly, for operand $Y=\{b_{m-1}b_{m-2}\ldots b_1b_0\}$ it computes $b_0+b_1, b_0+b_2, b_0+b_3, b_1+b_2, \ldots, b_{m-2}+b_{m-1}$;
the overflow processing module checks the result of the first-stage CLA adder; when the result is wider than K bits, the module's output is sent, under clock control, to the downstream decoder and data operation module;
the decoder decodes the counter code, selects the data sent from the preceding stage, stores it in the corresponding register and sends it to the K-bit multiplication unit;
the K-bit multiplication unit is a partial-product multiplier that takes two K-bit operands and produces a 2K-bit result, and may be of a pipelined type, a Booth type, or another type;
the data operation module performs shift-addition on the results of the K-bit multiplication unit under the control of the clock and counter; its output enters the next-stage adder for the final calculation, and the second-stage CLA adder outputs the correct result under the control of the counter.
The high-speed large-bit-width multiplier applies the Karatsuba algorithm in the hardware implementation of a parallel multiplier and divides the partial products into two groups; counter 1 and counter 2, driven by the two clocks, produce different codes that control the decoder. On each rising clock edge, the corresponding operands are fed in turn into the K-bit multiplication unit to perform multiplication and shift-addition, yielding the final product.
The invention uses a pair of complementary clocks with the same frequency and opposite phases, so the values of counter 1 and counter 2 in FIG. 2, controlled by the two clocks, change half a period apart and do not interfere when fed to the decoder. Based on counter 1 and counter 2, the decoder selects the appropriate preceding-stage data in real time and sends it to the K-bit multiplication unit, with successive inputs spaced half a clock period apart. In FIG. 2, if the first-stage CLA adder overflows, the overflow processing module is invoked. It sends the low K bits of the CLA adder result to the decoder and the most significant bit to the data operation module, which ensures that the product is correct. The data operation module sends the processed data to the second-stage CLA adder, which outputs the correct result under the control of counter 1.
The high-speed large-bit-width multiplier of the invention uses the Karatsuba algorithm to reduce the partial products in large-bit-width multiplication. The specific method is as follows: the two n-bit inputs x and y are each divided into m k-bit numbers, where m > 0 and k > 0. Let $X=\{a_{m-1}a_{m-2}\ldots a_1a_0\}$ and $Y=\{b_{m-1}b_{m-2}\ldots b_1b_0\}$; then

$$X\cdot Y=a_0b_0+(a_1b_0+a_0b_1)\cdot 2^{k}+\cdots+2^{2(m-1)k}a_{m-1}b_{m-1}\tag{1}$$

By the Karatsuba algorithm,

$$a_1b_0+a_0b_1=(a_0+a_1)(b_0+b_1)-a_0b_0-a_1b_1\tag{2}$$

Substituting the right-hand side of (2) for the corresponding coefficient in (1) gives

$$X\cdot Y=a_0b_0+\bigl[(a_0+a_1)(b_0+b_1)-a_0b_0-a_1b_1\bigr]\cdot 2^{k}+\cdots+2^{2(m-1)k}a_{m-1}b_{m-1}\tag{3}$$
In the invention, the coefficient multiplications in equation (3) are independent of one another, so they can be performed simultaneously starting from the first term and from the last term.
In this example, the two inputs X and Y are 256 bits wide and the output Q is 512 bits wide. Each input is divided into four 64-bit numbers, $X=\{a_3a_2a_1a_0\}$ and $Y=\{b_3b_2b_1b_0\}$, so that

$$\begin{aligned}Q=X\cdot Y={}&a_0b_0+(a_1b_0+a_0b_1)\cdot 2^{64}+(a_1b_1+a_2b_0+a_0b_2)\cdot 2^{128}\\&+(a_3b_0+a_0b_3+a_1b_2+a_2b_1)\cdot 2^{192}+(a_3b_1+a_1b_3+a_2b_2)\cdot 2^{256}\\&+(a_3b_2+a_2b_3)\cdot 2^{320}+2^{384}a_3b_3\end{aligned}\tag{4}$$

After substitution according to the Karatsuba algorithm, the expression becomes:

$$\begin{aligned}Q={}&a_0b_0+\bigl[(a_0+a_1)(b_0+b_1)-a_0b_0-a_1b_1\bigr]\cdot 2^{64}+\bigl[a_1b_1+(a_0+a_2)(b_0+b_2)-a_0b_0-a_2b_2\bigr]\cdot 2^{128}\\&+\bigl[(a_0+a_3)(b_0+b_3)-a_3b_3-a_0b_0+(a_1+a_2)(b_1+b_2)-a_1b_1-a_2b_2\bigr]\cdot 2^{192}\\&+\bigl[(a_1+a_3)(b_1+b_3)-a_1b_1-a_3b_3+a_2b_2\bigr]\cdot 2^{256}+\bigl[(a_3+a_2)(b_3+b_2)-a_3b_3-a_2b_2\bigr]\cdot 2^{320}+2^{384}a_3b_3\end{aligned}\tag{5}$$
Comparing the two expressions: because each distinct product needs to be computed only once, the number of partial products drops from 16 to 10 (the four products $a_ib_i$ plus the six products $(a_i+a_j)(b_i+b_j)$), and the reduction grows further as the operand bit width increases.
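The count of 10 products for $m=4$ can be checked numerically. The Python sketch below (function name hypothetical) splits two operands into $m$ parts of $k$ bits, evaluates the bracketed coefficients of equation (5) in their general pairwise form, and reports how many multiplications it used. It assumes both operands fit in $mk$ bits.

```python
def karatsuba_coefficients(x: int, y: int, m: int, k: int):
    """Evaluate X*Y from m k-bit parts with one product per part pair.

    Returns (product, number_of_multiplications). The diagonal products
    a_i*b_i are computed once and reused in every bracket, as in equation (5).
    """
    mask = (1 << k) - 1
    a = [(x >> (i * k)) & mask for i in range(m)]
    b = [(y >> (i * k)) & mask for i in range(m)]
    diag = [a[i] * b[i] for i in range(m)]  # a_i*b_i, one multiplication each
    mults = m
    q = sum(d << (2 * i * k) for i, d in enumerate(diag))
    for i in range(m):
        for j in range(i + 1, m):
            # (a_i+a_j)(b_i+b_j) - a_i*b_i - a_j*b_j == a_i*b_j + a_j*b_i
            cross = (a[i] + a[j]) * (b[i] + b[j]) - diag[i] - diag[j]
            mults += 1
            q += cross << ((i + j) * k)
    return q, mults
```

With `m=4, k=64` the function performs 10 multiplications, against 16 for the expanded form (4).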
Based on the above principle, this example designs the multiplier as follows. As shown in FIG. 1, when the input enable signal req_valid is invalid, the output is always 0; when req_valid is valid, the eight 64-bit numbers obtained by splitting the two operands are loaded into eight registers. The CLA adder then computes $a_0+a_1, a_0+a_2, a_0+a_3, a_1+a_2, a_1+a_3, a_2+a_3, b_0+b_1, b_0+b_2, b_0+b_3, b_1+b_2, b_1+b_3, b_2+b_3$, and the results are stored in another 8 registers. Because the underlying multiplication unit in this example is a 64-bit pipelined multiplier, these additions are 64-bit additions whose results may reach 65 bits, causing data overflow. An overflow check is therefore placed here, and overflowed data enters the overflow processing module. The principle is the same as for the multiplier: the 65-bit value is split into 1 bit and 64 bits for multiplication; once all operands are 64 bits wide they enter the next stage, and the remaining operations are completed by shifting.
The decoder is controlled by the pair of complementary clocks clk1 and clk2 in this example's multiplier, which drive counter 1 and counter 2 respectively to form different codes. On each rising clock edge, the corresponding 64-bit operands are fed in turn into the multiplication unit. Because the coefficient multiplications in equation (5) are independent and do not interfere with one another, clk1 and clk2 process them from the head and the tail respectively, which doubles the operation speed, improves clock efficiency and reduces resource consumption. Finally, the data output by the two 64-bit multipliers passes through the data operation unit and the CLA adder, and the correct result is output when the output enable is valid.
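The 1-bit/64-bit split for an overflowed sum can be sketched as follows, assuming unsigned sums of at most $k+1$ bits; `mul_with_overflow` and `mul_k` are hypothetical names, with `mul_k` standing in for the $k$-bit multiplication unit. Because the overflow bits are single bits, only the low-part product needs the multiplier; the other terms reduce to shifted additions, in line with the remark that the remaining operations are completed by shifting.

```python
def mul_with_overflow(sa: int, sb: int, k: int, mul_k=lambda u, v: u * v) -> int:
    """Multiply two (k+1)-bit sums using only one k-bit by k-bit multiplication.

    Each sum is split into its top (overflow) bit h and its low k bits l:
        sa*sb = la*lb + 2^k*(ha*lb + hb*la) + 2^(2k)*ha*hb.
    Since ha and hb are single bits, the last three terms are conditional
    shifted additions; only la*lb goes through the k-bit unit `mul_k`.
    """
    mask = (1 << k) - 1
    ha, la = sa >> k, sa & mask
    hb, lb = sb >> k, sb & mask
    if ha > 1 or hb > 1:
        raise ValueError("inputs must fit in k+1 bits")
    result = mul_k(la, lb)          # the only real k-bit multiplication
    if ha:
        result += lb << k           # ha*lb*2^k
    if hb:
        result += la << k           # hb*la*2^k
    if ha and hb:
        result += 1 << (2 * k)      # ha*hb*2^(2k)
    return result
```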
The simulation result of the multiplier of the invention is as follows:
1. Experimental environment:
the multiplier of this embodiment was designed in Verilog HDL, verified by simulation in vcs_vM-2017.03, synthesized in a 55 nm CMOS process with the synthesis tool DC-2014, and automatically placed and routed with INNOVUS.
Three randomly selected sets of test data were used for pre-layout simulation, and the results were correct. Static timing analysis with DC-2014 showed that the whole chip operates correctly with a 100 MHz clock.
To ensure chip reliability, INNOVUS was used to extract the signal delays caused by the standard cells and interconnect in the layout, and post-layout simulation was run with the same three data sets used in pre-layout simulation. All results were verified to be correct.
Comparative experiment:
for a fair comparison with the conventional parallel multiplier, the invention uses only two 64-bit multiplication units.
2. Experimental results
Metric | Multiplier of this example | Conventional parallel multiplier |
---|---|---|
Operation bit width | 256 bits | 256 bits |
Underlying multiplication units | 2 × 64-bit | 2 × 64-bit |
Process | 55 nm CMOS | 55 nm CMOS |
Operation speed | 2.5 clock cycles | 8 clock cycles |
Number of logic gates | 70,000 | 75,000 |
The simulation results show that the multiplier of this example outputs the correct result in 2.5 clock cycles, whereas the conventional parallel multiplier needs 8 clock cycles. The multiplier of this example is therefore about 3 times as fast as the conventional parallel multiplier (8 / 2.5 = 3.2), and compared with other types of multipliers it is faster, has a simpler circuit and uses fewer resources.
Claims (4)
1. A high-speed large-bit-width multiplier, characterized by comprising two complementary clocks, a CLA adder, an overflow processing module, a decoder, a K-bit multiplication unit and a data operation module;
the two complementary clocks have the same frequency and opposite phases and are used for controlling the two counters;
the CLA adder adds the parts of the divided operands pairwise;
the overflow processing module checks the result of the first-stage CLA adder, and when the result is wider than K bits, the module's output is sent, under the control of a counter, to the downstream decoder and data operation module;
the decoder decodes the counter code to select the data sent from the preceding stage, stores it in the corresponding register and then sends it to the K-bit multiplication unit;
the K-bit multiplication unit is a partial-product multiplier that takes two K-bit operands and produces a 2K-bit result;
the data operation module performs shift-addition on the results of the K-bit multiplication unit under the control of the counter; its output enters the second-stage CLA adder for the final calculation, and the second-stage CLA adder outputs the correct result under the control of the counter.
2. The circuit structure of the high-speed large-bit-width multiplier according to claim 1, wherein the overflow processing module sends its output to the downstream decoder and data operation module as follows: the low K bits of the CLA adder result are sent to the decoder, and the most significant bit is sent to the data operation module.
3. The high-speed large-bit-width multiplier according to claim 1, wherein the multiplier operates as follows: the Karatsuba algorithm is applied in the hardware of a parallel multiplier, the partial products are divided into two groups, and the two complementary clocks drive their respective counters to produce different codes that control the decoder; on each rising clock edge, the corresponding operands are fed in turn into the K-bit multiplication unit to perform multiplication and shift-addition, yielding the final product.
4. The high-speed large-bit-width multiplier according to claim 3, wherein the Karatsuba algorithm applied in the hardware of the parallel multiplier reduces the partial products in large-bit-width multiplication, specifically:
the two n-bit inputs x and y are each divided into m k-bit numbers, where m > 0 and k > 0;
let $X=\{a_{m-1}a_{m-2}\ldots a_1a_0\}$ and $Y=\{b_{m-1}b_{m-2}\ldots b_1b_0\}$;
then:

$$X\cdot Y=a_0b_0+(a_1b_0+a_0b_1)\cdot 2^{k}+\cdots+2^{2(m-1)k}a_{m-1}b_{m-1}\tag{1}$$

where, according to the Karatsuba algorithm:

$$a_1b_0+a_0b_1=(a_0+a_1)(b_0+b_1)-a_0b_0-a_1b_1\tag{2}$$

substituting the right-hand side of (2) for the corresponding coefficient in (1) yields:

$$X\cdot Y=a_0b_0+\bigl[(a_0+a_1)(b_0+b_1)-a_0b_0-a_1b_1\bigr]\cdot 2^{k}+\cdots+2^{2(m-1)k}a_{m-1}b_{m-1}\tag{3}$$

since the coefficient multiplications in equation (3) are independent of one another, they can be performed simultaneously starting from the first term and from the last term.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910934899.7A CN110647309B (en) | 2019-09-29 | 2019-09-29 | High-speed big bit width multiplier |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910934899.7A CN110647309B (en) | 2019-09-29 | 2019-09-29 | High-speed big bit width multiplier |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110647309A true CN110647309A (en) | 2020-01-03 |
CN110647309B CN110647309B (en) | 2020-10-13 |
Family
ID=68993313
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910934899.7A Expired - Fee Related CN110647309B (en) | 2019-09-29 | 2019-09-29 | High-speed big bit width multiplier |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110647309B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1155117A (en) * | 1996-01-19 | 1997-07-23 | Zhang Yinwei | High-speed multiplication device |
CN1553310A (en) * | 2003-05-28 | 2004-12-08 | Microelectronics Center, Chinese Academy of Sciences | Symmetric cutting algorithm for high-speed low loss multiplier and circuit strucure thereof |
US20090164546A1 (en) * | 2007-12-21 | 2009-06-25 | Vinodh Gopal | Method and apparatus for efficient programmable cyclic redundancy check (crc) |
CN101957739A (en) * | 2010-09-10 | 2011-01-26 | Tsinghua University | Sub-quadratic polynomial multiplier based on divide and conquer |
CN104375802A (en) * | 2014-09-23 | 2015-02-25 | Shanghai Shengxi Microelectronics Co., Ltd. | Multiplication and division device and operational method |
Non-Patent Citations (1)
Title |
---|
C. PREMA et al.: "Enhanced high speed modular multiplier using", 2013 International Conference on Computer Communication and Informatics * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112230886A (en) * | 2020-09-11 | 2021-01-15 | Tsinghua University | Processing device free of Toom-Cook and modular multiplication acquisition method based on same |
CN112230886B (en) * | 2020-09-11 | 2022-11-08 | Tsinghua University | Processing device free of Toom-Cook and modular multiplication acquisition method based on same |
CN114371828A (en) * | 2022-01-05 | 2022-04-19 | Huazhong University of Science and Technology | Polynomial multiplier and processor with same |
CN114666038A (en) * | 2022-05-12 | 2022-06-24 | Guangzhou Wanxietong Information Technology Co., Ltd. | Large-bit-width data processing method, device, equipment and storage medium |
CN117692126A (en) * | 2023-12-14 | 2024-03-12 | Harbin University of Science and Technology | Paillier homomorphic encryption method and system based on low-complexity modular multiplication algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN110647309B (en) | 2020-10-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110647309B (en) | High-speed big bit width multiplier | |
Abdelgawad et al. | High speed and area-efficient multiply accumulate (MAC) unit for digital signal prossing applications | |
CN109144469B (en) | Pipeline structure neural network matrix operation architecture and method | |
Olivieri | Design of synchronous and asynchronous variable-latency pipelined multipliers | |
Pinto et al. | Low-power modified shift-add multiplier design using parallel prefix adder | |
Srikanth et al. | Low power array multiplier using modified full adder | |
US5133069A (en) | Technique for placement of pipelining stages in multi-stage datapath elements with an automated circuit design system | |
Neeraja et al. | Design of an area efficient braun multiplier using high speed parallel prefix adder in cadence | |
Unwala et al. | Superpipelined adder designs | |
CN111178492A (en) | Computing device, related product and computing method for executing artificial neural network model | |
Sangwan et al. | Design and implementation of single precision pipelined floating point co-processor | |
Givaki et al. | High-performance deterministic stochastic computing using residue number system | |
Shawl et al. | Implementation of Area and Power efficient components of a MAC unit for DSP Processors | |
Lo et al. | Building a multi-fpga virtualized restricted boltzmann machine architecture using embedded mpi | |
Yang et al. | Lane shared bit-pragmatic deep neural network computing architecture and circuit | |
Bharathi et al. | Area Efficient Self Timed Adders for Low Power Applications in VLSI | |
Sasipriya et al. | Vedic Multiplier Design Using Modified Carry Select Adder with Parallel Prefix Adder | |
Sulieman et al. | Design and Simulation of a Nanoscale Threshold-Logic Multiplier | |
Tang et al. | A Low-Power Area-Efficient Precision Scalable Multiplier with an Input Vector Systolic Structure. Electronics 2022, 11, 2685 | |
Samanth et al. | A novel approach to develop low power MACs for 2D image filtering | |
Jing-yu et al. | Multiply-accumulator using modified booth encoders designed for application in 16-bit RISC processor | |
Raavi et al. | Implementation of High-Speed Hybrid Carry Select Adder using Binary to Excess-1 Converter | |
Jeong et al. | A Study on multiplier architecture optimized for 32-bit processor with 3-stage pipeline | |
Hussein et al. | Low-Latency Deterministic Multiplier for Stochastic Computing | |
Yugandhar et al. | Power-Delay Efficient Array Multiplier for Lifting-Scheme 1D Discrete Wavelet Transform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |
 | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20201013 |