CN114115803B - Approximate floating-point multiplier based on partial product probability analysis - Google Patents
- Publication number
- CN114115803B
- Authority
- CN
- China
- Prior art keywords
- module
- bit
- approximate
- compressor
- compression
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
Abstract
The invention discloses an approximate floating-point multiplier based on partial product probability analysis. The multiplier comprises a sign bit XOR module, a mantissa approximate multiplication module, a normalization module, a rounding module, an exponent addition module, an exponent adjustment module, a special case processing module and a result output module. The mantissa approximate multiplication module comprises a truncation and compensation unit, a low-order OR gate compression unit, an approximate 4-2 compressor and an exact compressor. It truncates the low-weight bits and applies compensation at the highest low-weight bit; every two partial products of the compensation bit and of the first middle-weight bit are compressed into one bit by an OR gate; every four partial products of the second middle-weight bits are approximately compressed; and the high-weight bits are compressed exactly. The invention simplifies the compressor structure while introducing as little error as possible, reordering the compressor inputs introduces no additional error, and the precision of the multiplier is preserved while the complexity of the compression structure is reduced.
Description
Technical Field
The invention relates to the technical field of approximate circuits, and in particular to an approximate floating-point multiplier based on partial product probability analysis.
Background
As progress in semiconductor process technology has slowed and Dennard scaling has broken down, the power consumption and efficiency of integrated circuits face significant challenges. Approximate computing, an emerging computing paradigm, offers a new way to address the high power consumption of integrated circuits: it trades an acceptable loss of accuracy for considerable savings in power and area. Applications with some inherent error tolerance, such as data recognition, image processing, machine learning and wireless communication, can still produce reasonable results even when approximate computing reduces precision somewhat. The floating-point multiplier is an arithmetic unit widely used in fields such as high-dynamic-range (HDR) image processing and wireless communication; it is highly complex and consumes substantial hardware resources. By exploiting the error tolerance of the application, an approximate floating-point multiplier can be designed that produces a specific error without exceeding the tolerance limit of the application. An approximate floating-point multiplier realized by removing or simplifying internal circuitry can greatly reduce overall hardware resources and power consumption at the cost of some precision.
Existing research on approximate floating-point multipliers is limited. Existing designs mainly approximate the mantissa multiplication part but do not consider how the probability of each partial product being 1 is distributed in mantissa multiplication, so extra error is introduced when those probabilities differ from one partial product to another. The probability of a partial product being 1 therefore needs to be taken into account in the approximate design in order to obtain an approximate floating-point multiplier with better performance and higher precision.
Disclosure of Invention
To address the deficiencies of the prior art, the invention provides an approximate floating-point multiplier based on partial product probability analysis. It simplifies the compressor structure while introducing as little error as possible, reordering the compressor inputs introduces no additional error, and the precision of the multiplier is preserved while the complexity of the compression structure is reduced.
In order to achieve the purpose, the invention adopts the following technical scheme:
an approximate floating-point multiplier based on partial product probability analysis comprises a sign bit XOR module, a mantissa approximate multiplication module, a normalization module, a rounding module, an exponent addition module, an exponent adjustment module, a special case processing module and a result output module;
the input signals of the approximate floating-point multiplier are a multiplier and a multiplicand in half-precision format, and the bits of the multiplier and the multiplicand are divided, in order from low to high, into low-weight bits, a first middle-weight bit, second middle-weight bits and high-weight bits;
the sign bit XOR module XORs the sign bits of the multiplier and the multiplicand and sends the result to the result output module as the result sign bit;
the mantissa approximate multiplication module is connected to the result output module through the normalization module and the rounding module in sequence; it performs approximate multiplication on the bits of the multiplier and the multiplicand, the result is normalized by the normalization module and rounded by the rounding module, and the normalized and rounded result is sent to the result output module as the result mantissa bits;
the mantissa approximate multiplication module comprises a truncation and compensation unit, a low-order OR gate compression unit, an approximate 4-2 compressor and an exact compressor; the truncation and compensation unit truncates the low-weight bits and applies compensation at the highest low-weight bit; the low-order OR gate compression unit takes the partial products of the compensation bit and of the first middle-weight bit in order of their probability of being 1, from low to high, and compresses every two of them into one bit; the approximate 4-2 compressor approximately compresses every four partial products of the second middle-weight bits; and the exact compressor compresses the high-weight bits exactly;
the exponent addition module adds the exponents of the multiplier and the multiplicand; the sum is then adjusted according to the normalized and rounded result output by the rounding module, and the adjusted result is sent to the result output module as the result exponent bits;
and when the exponent bits of an input signal are all 0 or all 1, the special case processing module is used to multiply the input signals.
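As a rough illustration of the sign and exponent paths described above, the following Python sketch splits a 16-bit half-precision word into its fields, XORs the signs, adds the exponents and flags the special case. The helper names (`unpack_half`, `is_special`, `sign_and_exponent`) are hypothetical, and the 1/5/10 field layout and the bias of 15 are taken from the IEEE 754 binary16 format rather than from the patent text.

```python
FRAC_BITS = 10   # binary16 fraction width (assumed IEEE 754 layout)
EXP_MASK = 0x1F  # 5 exponent bits
BIAS = 15        # binary16 exponent bias


def unpack_half(word):
    """Split a 16-bit word into (sign, biased exponent, fraction)."""
    sign = (word >> 15) & 0x1
    exponent = (word >> FRAC_BITS) & EXP_MASK
    fraction = word & 0x3FF
    return sign, exponent, fraction


def is_special(exponent):
    """True when the exponent field is all 0s or all 1s."""
    return exponent == 0 or exponent == EXP_MASK


def sign_and_exponent(a, b):
    """Return (result sign, biased exponent before adjustment, special flag)."""
    sa, ea, _ = unpack_half(a)
    sb, eb, _ = unpack_half(b)
    result_sign = sa ^ sb          # sign bit XOR module
    result_exp = ea + eb - BIAS    # exponent addition, one bias removed
    special = is_special(ea) or is_special(eb)
    return result_sign, result_exp, special
```

For instance, `0x3C00` encodes 1.0 and `0xC000` encodes -2.0, so `sign_and_exponent(0x3C00, 0xC000)` gives a negative sign and a biased exponent of 16 before any normalization adjustment.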
In order to optimize the technical scheme, the specific measures adopted further comprise:
further, counting from low to high, the 1st to 10th bits are the low-weight bits, the 11th bit is the first middle-weight bit, the 12th to 14th bits are the second middle-weight bits, and the remaining bits are the high-weight bits.
Further, the mantissa approximate multiplication module comprises three stages of compression;
in the first stage, the truncation and compensation unit processes the partial products of the 1st to 10th bits; the low-order OR gate compression unit performs probability-based OR gate compression on the 10th and 11th bits; the approximate 4-2 compressor approximately compresses the 12th to 14th bits; and the exact compressor compresses the remaining high-weight bits exactly;
in the second stage, the truncation and compensation unit applies compensation at the 10th bit, the approximate 4-2 compressor approximately compresses the 10th to 14th bits, and the exact compressor compresses the remaining high-weight bits exactly;
in the third stage, the approximate 4-2 compressor approximately compresses the 11th and 12th bits, and half adders are used on the remaining part to obtain two rows of partial products; the two rows of partial products are added in a final summing section to produce the final product.
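The bit ranges above can be sketched as a small lookup. This is only an illustration under an assumption: the bit positions are read here as 1-based columns of the partial product array of two 11-bit mantissas, which is not stated explicitly in the text. The names `column_zone` and `column_height` are hypothetical; the column height formula is the usual one for an unsigned array multiplier.

```python
MANT_BITS = 11  # 10 fraction bits plus the implied leading 1


def column_zone(bit):
    """Map a 1-based bit position to the weight zone named in the text."""
    if bit < 1:
        raise ValueError("bit positions are 1-based")
    if bit <= 10:
        return "low"
    if bit == 11:
        return "first-middle"
    if bit <= 14:
        return "second-middle"
    return "high"


def column_height(bit):
    """Number of partial products in a column of an 11x11 array (assumed layout)."""
    if not 1 <= bit <= 2 * MANT_BITS - 1:
        raise ValueError("an 11x11 array has columns 1 to 21")
    return min(bit, 2 * MANT_BITS - bit)


if __name__ == "__main__":
    for bit in range(1, 2 * MANT_BITS):
        print(bit, column_zone(bit), column_height(bit))
```

Under this reading the tallest columns sit around bit 11, which is where the scheme switches from truncation to OR gate and approximate 4-2 compression.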
Further, the approximate 4-2 compressor has four inputs and two outputs, a sum and a carry, each given by a logic expression (the input and output symbols and both expressions are not reproduced in this text).
The invention has the beneficial effects that:
First, the approximate floating-point multiplier of the invention is built on partial product probability analysis: the low-order OR gate compression method compresses every two partial products in order of their probability of being 1, from low to high, which simplifies the compressor structure while introducing as little error as possible.
Second, the approximate 4-2 compressor, which is insensitive to input order, suits the case where the partial products have different probabilities of being 1. It produces an error, of -2, only when all inputs are 1, and reordering the inputs introduces no additional error (a special case processing module is used for that situation), so the precision of the multiplier is preserved while the complexity of the compression structure is reduced.
Drawings
FIG. 1 is a schematic diagram of an approximate floating-point multiplier based on partial product probability analysis according to the present invention.
FIG. 2 shows the probability of 1 at each bit of the mantissa, including the implied bit, for Gaussian-distributed half-precision floating-point numbers.
FIG. 3 shows the distribution of the probability that each partial product is 1 in mantissa multiplication.
FIG. 4 is a diagram of an implementation of the mantissa approximate multiplier.
FIG. 5 is a schematic diagram of the low-order OR gate compression method.
FIG. 6 is a schematic diagram of the structure of the approximate 4-2 compressor that is insensitive to input order.
FIG. 7 is the truth table of the approximate 4-2 compressor that is insensitive to input order.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
It should be noted that terms such as "upper", "lower", "left", "right", "front" and "back" are used in the present invention only for clarity of description and are not intended to limit its scope; changes or adjustments to the relative relationships they describe, without substantive change to the technical content, are also regarded as within the scope of the invention.
FIG. 1 is a schematic diagram of an approximate floating-point multiplier based on partial product probability analysis according to the present invention. The approximate floating-point multiplier comprises a sign bit exclusive OR module, a mantissa approximate multiplication module, a normalization module, a rounding module, an exponent addition module, an exponent adjusting module, a special case processing module and a result output module.
The input signals of the approximate floating-point multiplier are a multiplier and a multiplicand in half-precision format, and the bits of the multiplier and the multiplicand are divided, in order from low to high, into low-weight bits, a first middle-weight bit, second middle-weight bits and high-weight bits.
The sign bit XOR module XORs the sign bits of the multiplier and the multiplicand and sends the result to the result output module as the result sign bit.
The mantissa approximate multiplication module is connected to the result output module through the normalization module and the rounding module in sequence. It performs approximate multiplication on the bits of the multiplier and the multiplicand; the result is normalized by the normalization module and rounded by the rounding module, and the normalized and rounded result is sent to the result output module as the result mantissa bits.
The mantissa approximate multiplication module comprises a truncation and compensation unit, a low-order OR gate compression unit, an approximate 4-2 compressor and an exact compressor. The truncation and compensation unit truncates the low-weight bits and applies compensation at the highest low-weight bit. The low-order OR gate compression unit takes the partial products of the compensation bit and of the first middle-weight bit in order of their probability of being 1, from low to high, and compresses every two of them into one bit. The approximate 4-2 compressor approximately compresses every four partial products of the second middle-weight bits, and the exact compressor compresses the high-weight bits exactly.
The exponent addition module adds the exponents of the multiplier and the multiplicand; the sum is then adjusted according to the normalized and rounded result output by the rounding module, and the adjusted result is sent to the result output module as the result exponent bits.
When the exponent bits of an input signal are all 0 or all 1, the special case processing module is used to multiply the input signals.
In this embodiment the input signals are a multiplier and a multiplicand in half-precision format. The approximate floating-point multiplier XORs their sign bits, performs approximate multiplication on their mantissas, adds their exponents and adjusts the sum according to the normalized and rounded result of the mantissa approximate multiplication, and outputs the final product.
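To make the link between normalization and exponent adjustment concrete, here is a minimal sketch. It assumes two normalized 11-bit mantissas (implied bit set), so their product lies in [2^20, 2^22), and it simply truncates the discarded low bits; the rounding module is not modeled because the text does not specify a rounding rule. The name `normalize` is hypothetical.

```python
MANT_BITS = 11  # 10 fraction bits plus the implied leading 1


def normalize(product):
    """Reduce a mantissa product to 11 bits.

    Returns (mantissa, exponent_increment). Discarded bits are truncated,
    which is an assumption made for this sketch only.
    """
    lowest = 1 << (2 * MANT_BITS - 2)   # 2**20, i.e. 1.0 * 1.0
    limit = 1 << (2 * MANT_BITS)        # 2**22, exclusive upper bound
    if not lowest <= product < limit:
        raise ValueError("expected the product of two normalized 11-bit mantissas")
    if product >> (2 * MANT_BITS - 1):  # bit 21 set: value in [2, 4)
        return product >> MANT_BITS, 1  # shift one extra place, bump exponent
    return product >> (MANT_BITS - 1), 0
```

For example, `normalize(1536 * 1536)` corresponds to 1.5 x 1.5 = 2.25: the product overflows into the [2, 4) range, so the mantissa is shifted one extra place and the exponent is incremented by 1.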
Because the Gaussian distribution is so widespread, data in applications such as HDR image processing and wireless communication is taken to follow it. The probability of 1 at each mantissa bit of Gaussian-distributed half-precision floating-point numbers was therefore measured experimentally; the resulting distribution is shown in FIG. 2, where A is the 11-bit mantissa including the hidden bit. The highest bit of the mantissa has the highest probability of being 1, and the probability then falls off gradually as the numerical value increases, consistent with the distribution of the original data. Because the bits of the mantissa (including the implied bit) have different probabilities of being 1, the partial products of the mantissa approximate multiplier also have different probabilities of being 1; their distribution is shown in FIG. 3.
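A small Monte Carlo experiment along these lines can be sketched in Python. It is not the experiment behind FIG. 2: the sample count, standard deviation and seed are arbitrary, the function names are hypothetical, and the partial product probability is computed under an independence assumption that the text does not state. The `"e"` format of the standard `struct` module is used to obtain the binary16 encoding.

```python
import random
import struct


def mantissa_bits(x):
    """Return the 11 mantissa bits of x as binary16 (index 0 = LSB, 10 = implied bit)."""
    (word,) = struct.unpack("<H", struct.pack("<e", x))
    exponent = (word >> 10) & 0x1F
    fraction = word & 0x3FF
    implied = 1 if exponent != 0 else 0  # subnormals and zero have no implied 1
    mantissa = (implied << 10) | fraction
    return [(mantissa >> i) & 1 for i in range(11)]


def bit_one_probability(samples=2000, sigma=1.0, seed=0):
    """Estimate P(bit = 1) for each mantissa bit of Gaussian samples."""
    rng = random.Random(seed)
    counts = [0] * 11
    for _ in range(samples):
        for i, bit in enumerate(mantissa_bits(rng.gauss(0.0, sigma))):
            counts[i] += bit
    return [c / samples for c in counts]


def partial_product_probability(p, i, j):
    """P(a_i AND b_j = 1), assuming the two operand bits are independent."""
    return p[i] * p[j]


if __name__ == "__main__":
    probs = bit_one_probability()
    for i, value in enumerate(probs):
        print(f"bit {i}: {value:.3f}")
```

Since each partial product is the AND of one multiplier bit and one multiplicand bit, unequal bit probabilities translate directly into unequal partial product probabilities, which is the property the compression scheme exploits.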
FIG. 4 shows an implementation of the mantissa approximate multiplier. The mantissa approximate multiplication module includes a low-order OR gate compression unit, an approximate 4-2 compressor that is insensitive to input order, and an exact compressor. The low-order OR gate compression unit is used in the partial product compression stage: taking the partial products in order of their probability of being 1, from low to high, it uses an OR gate to compress every two partial products into one bit. The approximate 4-2 compressor, also used in the partial product compression stage, approximately compresses every four partial products at the middle weights. Truncating and compensating the low-weight bits, applying low-order OR gate compression and approximate 4-2 compression to the middle-weight bits, and exactly compressing the high-weight bits together form the mantissa approximate multiplication module, which serves as the mantissa multiplication part of the floating-point multiplier.
For example, the partial products at bits 1 to 10 are truncated, with compensation applied at bit 10; bits 10 to 14 are approximately compressed; and the remaining high bits are compressed exactly. In the first stage of the partial product array, bits 10 and 11 undergo probability-based OR gate compression, as shown in FIG. 5. When compression follows FIG. 5, the probability that the OR gate compression introduces an error is at most 5.88%, and because these errors fall on low-weight bits their effect on the final result is small. Bits 12 to 14 are compressed with the approximate 4-2 compressor of the invention, which does not require the probability of 1 in the partial products to be considered, and the high bits use exact compression. After the first stage of compression, the probability that a partial product in the low-weight part is 1 becomes larger, so continuing to use OR gate compression would greatly increase the error. In the second stage, therefore, bits 10 to 14 use the approximate 4-2 compressor of the invention, and the remaining high bits are compressed as in the first stage. In the third stage, the approximate 4-2 compressor is used at bits 11 and 12, which still have more than two partial products, and half adders are used on the remaining part, giving two rows of partial products. Finally, the two rows of partial products are added in the final summing section to produce the final product.
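The reason OR gate compression is tolerable only where partial products are rarely 1 can be shown with a short sketch. Replacing the sum of two bits by their OR is wrong only when both bits are 1, in which case the result is one too small. The error probability below assumes the two partial products are independent, and `pair_ascending` is one assumed reading of "from low to high" pairing; neither the function names nor the pairing rule are taken from FIG. 5.

```python
def or_compress(a, b):
    """Compress two partial products of equal weight into one bit with an OR gate."""
    return a | b


def or_error(a, b):
    """Error of OR compression relative to the exact sum a + b (0 or -1)."""
    return or_compress(a, b) - (a + b)


def or_error_probability(pa, pb):
    """P(error) = P(a = 1 and b = 1), assuming independent partial products."""
    return pa * pb


def pair_ascending(probabilities):
    """Pair partial products in ascending order of their probability of being 1.

    With an odd count, the highest-probability item is left unpaired.
    """
    ordered = sorted(probabilities)
    return [(ordered[k], ordered[k + 1]) for k in range(0, len(ordered) - 1, 2)]
```

Because the error probability is a product of the two input probabilities, it grows quickly once the partial products become more likely to be 1, which matches the switch away from OR gates after the first stage.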
The structure of the approximate 4-2 compressor that is insensitive to input order is shown in FIG. 6. The compressor has four inputs and two outputs, a sum and a carry, each given by a logic expression (the symbols and expressions are not reproduced in this text). The compressor produces an error, of -2, only when all inputs are 1, and reordering the inputs does not change its output; the full truth table is shown in FIG. 7. The approximate 4-2 compressor is used at the middle-weight bits of the mantissa approximate multiplication module, and the exact range over which it is applied can be adjusted to suit the required precision.
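The compressor's logic expressions are not legible here, so the following is not the patent's circuit. It is a behavioral sketch of one function that matches the stated properties: four inputs, a sum and a carry output, exact for every input except all ones (where the error is -2), and unaffected by input order. The choice of a parity sum and an "at least two ones" carry is an assumption derived from those properties, and all names are hypothetical.

```python
from itertools import permutations, product


def approx_compressor_4_2(x1, x2, x3, x4):
    """Return (carry, sum) for four equal-weight input bits."""
    ones = x1 + x2 + x3 + x4
    total = x1 ^ x2 ^ x3 ^ x4      # parity of the inputs
    carry = 1 if ones >= 2 else 0  # set when at least two inputs are 1
    return carry, total


def compressor_error(bits):
    """Encoded output value (2 * carry + sum) minus the exact count of ones."""
    carry, total = approx_compressor_4_2(*bits)
    return (2 * carry + total) - sum(bits)


def truth_table():
    """Map every 4-bit input combination to its error."""
    return {bits: compressor_error(bits) for bits in product((0, 1), repeat=4)}


def is_order_insensitive():
    """Check that permuting the inputs never changes the outputs."""
    for bits in product((0, 1), repeat=4):
        reference = approx_compressor_4_2(*bits)
        if any(approx_compressor_4_2(*p) != reference for p in permutations(bits)):
            return False
    return True
```

Two outputs can encode at most the value 3, so the all-ones input (exact value 4) cannot be represented; in this model it maps to 2, which is the single -2 error the text describes.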
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may be made by those skilled in the art without departing from the principle of the invention.
Claims (4)
1. The approximate floating-point multiplier based on partial product probability analysis is characterized by comprising a sign bit exclusive OR module, a mantissa approximate multiplication module, a normalization module, a rounding module, an exponent addition module, an exponent adjustment module, a special case processing module and a result output module;
the input signals of the approximate floating-point multiplier are a multiplier and a multiplicand in a half-precision format, and the digits of the multiplier and the multiplicand are divided into a low weight digit, a first middle weight digit, a second middle weight digit and a high weight digit according to a sequence from low to high;
the sign bit XOR module is used for XOR-ing the sign bits of the multiplier and the multiplicand and sending the XOR result as a result sign bit to the result output module;
the mantissa approximate multiplication module is connected to the result output module through the normalization module and the rounding module in sequence; it performs approximate multiplication on the bits of the multiplier and the multiplicand, the result is normalized by the normalization module and rounded by the rounding module, and the normalized and rounded result is sent to the result output module as the result mantissa bits;
the mantissa approximate multiplication module comprises a truncation and compensation unit, a low-order OR gate compression unit, an approximate 4-2 compressor and an exact compressor; the truncation and compensation unit truncates the low-weight bits and applies compensation at the highest low-weight bit; the low-order OR gate compression unit takes the partial products of the compensation bit and of the first middle-weight bit in order of their probability of being 1, from low to high, and compresses every two of them into one bit; the approximate 4-2 compressor approximately compresses every four partial products of the second middle-weight bits; and the exact compressor compresses the high-weight bits exactly;
the exponent adding module adds the exponents of the multiplier and the multiplicand, then adjusts according to the normalized rounding result output by the rounding module, and sends the adjustment result as a result exponent bit to the result output module;
and when the exponent bits of an input signal are all 0 or all 1, the special case processing module is used to multiply the input signals.
2. The approximate floating-point multiplier based on partial product probability analysis of claim 1, wherein, for the multiplier or the multiplicand, counting from low to high, the 1st to 10th bits are the low-weight bits, the 11th bit is the first middle-weight bit, the 12th to 14th bits are the second middle-weight bits, and the remaining bits are the high-weight bits.
3. The partial product probability analysis based approximate floating point multiplier of claim 2 wherein the mantissa approximate multiplication module includes three stages of compression;
in the first stage, the truncation and compensation unit processes the partial products of the 1st to 10th bits; the low-order OR gate compression unit performs probability-based OR gate compression on the 10th and 11th bits; the approximate 4-2 compressor approximately compresses the 12th to 14th bits; and the exact compressor compresses the remaining high-weight bits exactly;
when the second-stage compression is carried out, the truncation and compensation unit carries out compensation processing on the 10 th bit, the approximate 4-2 compressor carries out approximate compression on the 10 th bit to the 14 th bit, and the accurate compressor carries out accurate compression on the rest high-weight bits;
when the third-stage compression is carried out, the approximate 4-2 compressor carries out approximate compression on the 11 th bit to the 12 th bit, and the rest part uses a half adder to obtain a partial product of two rows; the two rows of partial products are added in a final summing section to produce a final product.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210076195.2A CN114115803B (en) | 2022-01-24 | 2022-01-24 | Approximate floating-point multiplier based on partial product probability analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114115803A (en) | 2022-03-01 |
CN114115803B (en) | 2022-05-03 |
Family
ID=80361118
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210076195.2A (Active, CN114115803B) | 2022-01-24 | 2022-01-24 | Approximate floating-point multiplier based on partial product probability analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114115803B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114647399B (en) * | 2022-05-19 | 2022-08-16 | 南京航空航天大学 | Low-energy-consumption high-precision approximate parallel fixed-width multiplication accumulation device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101840324A (en) * | 2010-04-28 | 2010-09-22 | 中国科学院自动化研究所 | 64-bit fixed and floating point multiplier unit supporting complex operation and subword parallelism |
CN109542393A (en) * | 2018-11-19 | 2019-03-29 | 电子科技大学 | A kind of approximation 4-2 compressor and approximate multiplier |
CN112732221A (en) * | 2019-10-14 | 2021-04-30 | 安徽寒武纪信息科技有限公司 | Multiplier, method, integrated circuit chip and computing device for floating-point operation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10241756B2 (en) * | 2017-07-11 | 2019-03-26 | International Business Machines Corporation | Tiny detection in a floating-point unit |
Non-Patent Citations (2)
Title |
---|
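Multiplier design based on a novel Booth selector and compressor; Wang Jiale et al.; 《微电子学与计算机》; 2020-03-05 (No. 03); full text * |
FPGA implementation of a high-speed floating-point arithmetic unit; Zhang Xiaoyan et al.; 《信息化研究》; 2009-11-20 (No. 11); full text * |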
Also Published As
Publication number | Publication date |
---|---|
CN114115803A (en) | 2022-03-01 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20210349692A1 (en) | Multiplier and multiplication method | |
CN110362292B (en) | Approximate multiplication method and approximate multiplier based on approximate 4-2 compressor | |
US10491239B1 (en) | Large-scale computations using an adaptive numerical format | |
CN110852434B (en) | CNN quantization method, forward calculation method and hardware device based on low-precision floating point number | |
CN111488133B (en) | High-radix approximate Booth coding method and mixed-radix Booth coding approximate multiplier | |
CN114647399B (en) | Low-energy-consumption high-precision approximate parallel fixed-width multiplication accumulation device | |
CN109542393A (en) | A kind of approximation 4-2 compressor and approximate multiplier | |
CN113076083B (en) | Data multiply-add operation circuit | |
Yin et al. | Designs of approximate floating-point multipliers with variable accuracy for error-tolerant applications | |
CN114115803B (en) | Approximate floating-point multiplier based on partial product probability analysis | |
CN111221499B (en) | Approximate multiplier based on approximate 6-2 and 4-2 compressors and calculation method | |
CN116400883A (en) | Floating point multiply-add device capable of switching precision | |
Yang et al. | An approximate multiply-accumulate unit with low power and reduced area | |
CN110187866B (en) | Hyperbolic CORDIC-based logarithmic multiplication computing system and method | |
WO2022170811A1 (en) | Fixed-point multiply-add operation unit and method suitable for mixed-precision neural network | |
CN111966323B (en) | Approximate multiplier based on unbiased compressor and calculation method | |
CN110825346B (en) | Low logic complexity unsigned approximation multiplier | |
Yang et al. | A low-power approximate multiply-add unit | |
CN113986194A (en) | Neural network approximate multiplier implementation method and device based on preprocessing | |
US7840628B2 (en) | Combining circuitry | |
CN115033204A (en) | High-energy-efficiency approximate multiplier with reconfigurable precision and bit width | |
CN114691086A (en) | High-performance approximate multiplier based on operand clipping and calculation method thereof | |
Madadum et al. | A resource-efficient convolutional neural network accelerator using fine-grained logarithmic quantization | |
CN116048455B (en) | Insertion type approximate multiplication accumulator | |
Ge et al. | An energy-efficient approximate floating-point multipliers for wireless communications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||