CN114115803B - Approximate floating-point multiplier based on partial product probability analysis - Google Patents

Info

Publication number
CN114115803B
CN114115803B CN202210076195.2A CN202210076195A CN114115803B CN 114115803 B CN114115803 B CN 114115803B CN 202210076195 A CN202210076195 A CN 202210076195A CN 114115803 B CN114115803 B CN 114115803B
Authority
CN
China
Prior art keywords
module
bit
approximate
compressor
compression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210076195.2A
Other languages
Chinese (zh)
Other versions
CN114115803A (en)
Inventor
刘伟强
赵轩
闫成刚
陈珂
徐宸宇
王成华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-01-24
Filing date
2022-01-24
Publication date
2022-05-03
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202210076195.2A
Publication of CN114115803A
Application granted
Publication of CN114115803B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only

Abstract

The invention discloses an approximate floating-point multiplier based on partial product probability analysis. It comprises a sign bit exclusive OR module, a mantissa approximate multiplication module, a normalization module, a rounding module, an exponent addition module, an exponent adjustment module, a special case processing module and a result output module. The mantissa approximate multiplication module comprises a truncation and compensation unit, a low-order OR gate compression unit, an approximate 4-2 compressor and a precise compressor. It truncates the low weight bits and applies compensation at the highest of the low weight bits; every two partial products of the compensation bit and of the first middle weight bit are compressed into one bit by an OR gate; every four partial products of the second middle weight bits are approximately compressed; and the high weight bits are compressed precisely. The invention effectively simplifies the compressor structure while generating as few errors as possible, introduces no additional error when the input order is changed, and preserves the precision of the multiplier while reducing the complexity of the compression structure.

Description

Approximate floating-point multiplier based on partial product probability analysis
Technical Field
The invention relates to the technical field of approximate circuits, in particular to an approximate floating-point multiplier based on partial product probability analysis.
Background
As progress in semiconductor process technology has slowed and Dennard scaling has begun to break down, the power consumption and efficiency of integrated circuits face significant challenges. As an emerging computing paradigm, approximate computing offers a new way to address the high power consumption of integrated circuits: it trades an acceptable loss of accuracy for considerable savings in power and area. Applications with some error tolerance, such as data recognition, image processing, machine learning and wireless communication, can still produce reasonable results even when approximate computing reduces precision somewhat. The floating-point multiplier, an arithmetic unit widely used in fields such as high-dynamic-range (HDR) image processing and wireless communication, has high complexity and consumes substantial hardware resources. By exploiting the error tolerance of the application, an approximate floating-point multiplier can be designed that introduces a specific error without exceeding the application's tolerance limit. An approximate floating-point multiplier realized by deleting or simplifying internal circuits can greatly reduce overall hardware resources and power consumption at the cost of some precision.
Little research exists on approximate floating-point multipliers. Existing designs mainly approximate the mantissa multiplication part, but they do not consider how the probability of each partial product being 1 is distributed in mantissa multiplication, so extra errors are introduced when these probabilities differ across partial products. The probability that a partial product is 1 therefore needs to be considered in the approximate design, in order to obtain an approximate floating-point multiplier with better performance and higher precision.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides an approximate floating-point multiplier based on partial product probability analysis that effectively simplifies the compressor structure while generating as few errors as possible, introduces no additional error when the input order is changed, and preserves the precision of the multiplier while reducing the complexity of the compression structure.
To achieve this purpose, the invention adopts the following technical solution:
an approximate floating-point multiplier based on partial product probability analysis comprises a sign bit exclusive OR module, a mantissa approximate multiplication module, a normalization module, a rounding module, an exponent addition module, an exponent adjustment module, a special case processing module and a result output module;
the input signals of the approximate floating-point multiplier are a multiplier and a multiplicand in half-precision format, and the bits of the multiplier and the multiplicand are divided, from low to high, into low weight bits, a first middle weight bit, second middle weight bits and high weight bits;
the sign bit XOR module is used for XOR-ing the sign bits of the multiplier and the multiplicand and sending the XOR result as a result sign bit to the result output module;
the mantissa approximate multiplication module is connected to the result output module through the normalization module and the rounding module in sequence; it performs approximate multiplication on the bits of the multiplier and the multiplicand, the result is normalized and rounded by the normalization module and the rounding module, and the normalized, rounded result is sent to the result output module as the result mantissa bits;
the mantissa approximate multiplication module comprises a truncation and compensation unit, a low-order OR gate compression unit, an approximate 4-2 compressor and a precise compressor; the truncation and compensation unit truncates the low weight bits and applies compensation at the highest of the low weight bits; the low-order OR gate compression unit compresses every two partial products of the compensation bit and of the first middle weight bit into one bit, taking the partial products in order of increasing probability of being 1; the approximate 4-2 compressor approximately compresses every four partial products of the second middle weight bits; the precise compressor precisely compresses the high weight bits;
the exponent adding module adds the exponents of the multiplier and the multiplicand, then adjusts according to the normalized rounding result output by the rounding module, and sends the adjustment result as a result exponent bit to the result output module;
and when the exponent bits of the input signals are all 0 or all 1, the input signals are multiplied by the special case processing module.
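The field handling described above can be sketched in Python. This is a minimal illustration only, assuming the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits, bias 15) for the half-precision inputs; the function names are hypothetical and do not come from the patent.

```python
import struct

# Assumed IEEE 754 binary16 layout: 1 sign bit, 5 exponent bits, 10 fraction bits.
EXP_BITS, FRAC_BITS, BIAS = 5, 10, 15

def split_half(x: float):
    """Return the (sign, exponent, fraction) fields of x encoded as binary16."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    sign = bits >> 15
    exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    fraction = bits & ((1 << FRAC_BITS) - 1)
    return sign, exponent, fraction

def is_special(exponent: int) -> bool:
    """Exponent field all 0s or all 1s: routed to special case processing."""
    return exponent == 0 or exponent == (1 << EXP_BITS) - 1

def sign_and_exponent(a: float, b: float):
    """Sign-bit XOR and biased exponent sum, before any normalization adjustment."""
    sa, ea, _ = split_half(a)
    sb, eb, _ = split_half(b)
    return sa ^ sb, ea + eb - BIAS
```

The exponent returned here is the unadjusted sum; the later adjustment according to the normalized, rounded mantissa is a separate step.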
In order to optimize the technical scheme, the specific measures adopted further comprise:
Further, counting from low to high, the 1st to 10th bits are the low weight bits, the 11th bit is the first middle weight bit, the 12th to 14th bits are the second middle weight bits, and the remaining bits are the high weight bits.
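As a rough illustration of this partition, the sketch below labels each column of the partial product array and counts how many partial products it holds. It assumes that the bit numbers refer to product columns counted from 1 at the least significant end, and that the mantissa is 11 bits wide including the implied bit; both the reading and the helper names are assumptions made for illustration.

```python
N = 11  # mantissa width including the implied bit (half precision)

def column_height(col: int) -> int:
    """Number of partial products a_i * b_j in column col (1-indexed, 1..21)."""
    if not 1 <= col <= 2 * N - 1:
        raise ValueError("column out of range")
    return col if col <= N else 2 * N - col

def region(col: int) -> str:
    """Weight region of a column under the partition stated above."""
    if col <= 10:
        return "low"
    if col == 11:
        return "first_middle"
    if col <= 14:
        return "second_middle"
    return "high"

if __name__ == "__main__":
    for col in range(1, 2 * N):
        print(col, column_height(col), region(col))
```

Under these assumptions the tallest column is column 11, which is where the first middle weight bit sits.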
Further, the mantissa approximation multiplication module comprises three stages of compression;
in the first stage of compression, the truncation and compensation unit processes the partial products of the 1st to 10th bits; the low-order OR gate compression unit performs probability-based OR gate compression on the 10th and 11th bits; the approximate 4-2 compressor approximately compresses the 12th to 14th bits; the precise compressor precisely compresses the remaining high weight bits;
in the second stage of compression, the truncation and compensation unit applies compensation at the 10th bit, the approximate 4-2 compressor approximately compresses the 10th to 14th bits, and the precise compressor precisely compresses the remaining high weight bits;
in the third stage of compression, the approximate 4-2 compressor approximately compresses the 11th and 12th bits, and half adders are used for the remaining part to obtain two rows of partial products; the two rows of partial products are added in a final summing section to produce the final product.
Further, the inputs of the approximate 4-2 compressor are P1, P2, P3 and P4, and the outputs are Sum' and Cout'. The expressions for Sum' and for the carry Cout' are given as equation images in the original publication and are not reproduced in this text.
The invention has the beneficial effects that:
First, the approximate floating-point multiplier of the invention is based on partial product probability analysis: the low-order OR gate compression method compresses every two partial products in order of increasing probability of being 1, which effectively simplifies the compressor structure while generating as few errors as possible.
Second, the input-order-insensitive approximate 4-2 compressor suits scenarios in which the partial products have different probabilities of being 1. It produces an error of -2 only when all inputs are 1, and changing the input order introduces no additional error (for this case, the special case processing module is used), so the precision of the multiplier is preserved while the complexity of the compression structure is reduced.
Drawings
FIG. 1 is a schematic diagram of an approximate floating-point multiplier based on partial product probability analysis according to the present invention.
FIG. 2 shows the probability of each bit being 1 in the mantissa (including the implied bit) of Gaussian-distributed half-precision floating-point numbers.
FIG. 3 shows the probability distribution of the partial products being 1 in mantissa multiplication.
Fig. 4 is a diagram of an implementation of a mantissa approximation multiplier.
FIG. 5 is a schematic diagram of the low-order OR gate compression method.
Fig. 6 is a schematic diagram of an approximate 4-2 compressor architecture that is insensitive to input order.
Fig. 7 is a truth table diagram for an approximate 4-2 compressor insensitive to input order.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
It should be noted that terms such as "upper", "lower", "left", "right", "front" and "back" used in the present invention are for clarity of description only and are not intended to limit the scope in which the invention can be implemented; changes or adjustments to their relative relationships, without substantive change to the technical content, are also regarded as within that scope.
FIG. 1 is a schematic diagram of an approximate floating-point multiplier based on partial product probability analysis according to the present invention. The approximate floating-point multiplier comprises a sign bit exclusive OR module, a mantissa approximate multiplication module, a normalization module, a rounding module, an exponent addition module, an exponent adjusting module, a special case processing module and a result output module.
The input signals of the approximate floating-point multiplier are a multiplier and a multiplicand in half-precision format, and the bits of the multiplier and the multiplicand are divided, from low to high, into low weight bits, a first middle weight bit, second middle weight bits and high weight bits.
And the sign bit XOR module is used for XOR-ing the sign bits of the multiplier and the multiplicand and sending the XOR result as a result sign bit to the result output module.
The mantissa approximate multiplication module is connected to the result output module through the normalization module and the rounding module in sequence. It performs approximate multiplication on the bits of the multiplier and the multiplicand, the result is normalized and rounded by the normalization module and the rounding module, and the normalized, rounded result is sent to the result output module as the result mantissa bits.
The mantissa approximate multiplication module comprises a truncation and compensation unit, a low-order OR gate compression unit, an approximate 4-2 compressor and a precise compressor. The truncation and compensation unit truncates the low weight bits and applies compensation at the highest of the low weight bits; the low-order OR gate compression unit compresses every two partial products of the compensation bit and of the first middle weight bit into one bit, taking the partial products in order of increasing probability of being 1; the approximate 4-2 compressor approximately compresses every four partial products of the second middle weight bits; the precise compressor precisely compresses the high weight bits.
The exponent adding module adds the exponents of the multiplier and the multiplicand, then adjusts according to the normalized rounding result output by the rounding module, and sends the adjustment result as a result exponent bit to the result output module.
When the exponent bits of the input signals are all 0 or all 1, the input signals are multiplied by the special case processing module.
The input signals of this embodiment are two operands in half-precision format. The approximate floating-point multiplier XORs the sign bits of the two operands, performs approximate multiplication on the mantissas, adds the exponents and adjusts the sum according to the normalized, rounded result of the mantissa approximate multiplication, and outputs the final product.
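The exponent adjustment that follows mantissa multiplication can be illustrated with a small Python sketch. It assumes 11-bit mantissas with the implied leading 1, so the product occupies 21 or 22 bits, and it truncates instead of rounding because the text does not state a rounding mode; the function name is hypothetical.

```python
def normalize(product: int, exponent: int):
    """Normalize the product of two 11-bit mantissas (each with a leading 1).

    Returns (mantissa, exponent), where mantissa is again 11 bits wide with a
    leading 1. Discarded low bits are truncated (an assumption; the text does
    not specify the rounding mode).
    """
    if product >> 21:              # product in [2, 4): shift one extra place
        return product >> 11, exponent + 1
    return product >> 10, exponent  # product in [1, 2): no adjustment needed

if __name__ == "__main__":
    # 1.5 * 1.5 = 2.25 = 1.125 * 2: mantissa 1536 stands for 1.5 (1536 / 1024).
    print(normalize(1536 * 1536, 0))
```

In the approximate design the `product` argument would come from the approximate compression tree; here an exact integer product stands in for it.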
Because Gaussian distributions are so common, data in applications such as HDR image processing and wireless communication follow a Gaussian distribution. The probability of each bit being 1 in the mantissas of Gaussian-distributed half-precision floating-point numbers was measured experimentally; the distribution is shown in FIG. 2, where A is the 11-bit mantissa including the hidden bit. The most significant mantissa bit has the highest probability of being 1, and the probability gradually decreases as the value increases, which is consistent with the distribution of the original data. Because the bits of the mantissa (including the implied bit) have different probabilities of being 1, the partial products of the mantissa approximate multiplier also have different probabilities of being 1; their distribution is shown in FIG. 3.
Fig. 4 shows an implementation of the mantissa approximate multiplier. The mantissa approximate multiplication module includes a low-order OR gate compression unit, an approximate 4-2 compressor insensitive to input order, and a precise compressor. The low-order OR gate compression unit is used in the partial product compression module: taking the partial products in order of increasing probability of being 1, it compresses every two partial products into one bit with an OR gate. The approximate 4-2 compressor insensitive to input order is also used in the partial product compression module, where it approximately compresses every four partial products of the middle weight bits. Truncating and compensating the low weight bits, applying low-order OR gate compression and approximate 4-2 compression to the middle weight bits, and precisely compressing the high weight bits together form the mantissa approximate multiplication module. The mantissa approximate multiplier is used in the mantissa multiplication part of the floating-point multiplier.
For example, the partial products of the 1st to 10th bits are truncated and compensated at the 10th bit, the 10th to 14th bits are approximately compressed, and the remaining high bits are precisely compressed. In the first stage of the partial product array, the 10th and 11th bits undergo probability-based OR gate compression as shown in FIG. 5. When compression follows FIG. 5, the OR gate compression produces an error with a probability of at most 5.88%, and because these errors occur at lower weight bits their effect on the final result is small. The 12th to 14th bits are compressed with the approximate 4-2 compressor of the invention, which does not need to take the probability of the partial products being 1 into account, and the high bits use precise compression. After the first stage of compression, the probability that a partial product in the low weight part is 1 becomes large, so the error introduced by OR gate compression would increase greatly. In the second stage of compression, therefore, the 10th to 14th bits use the approximate 4-2 compressor of the invention, and the remaining high bits are compressed as in the first stage. In the third stage of compression, the approximate 4-2 compressor of the invention is used at the 11th and 12th bits, which have more than two partial products, and half adders are used for the remaining part to obtain two rows of partial products. Finally, the two rows of partial products are added in a final summing section to produce the final product.
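A bit-probability measurement of this kind can be sketched with the Python standard library. This is only an illustrative Monte Carlo estimate, assuming zero-mean Gaussian samples with a hypothetical standard deviation `sigma` and the IEEE 754 binary16 encoding; it is not the experiment behind FIG. 2 and is not expected to reproduce its numbers.

```python
import random
import struct

def mantissa_bit_probabilities(samples: int = 20000, sigma: float = 1.0, seed: int = 0):
    """Estimate P(bit k = 1) for the 11-bit mantissa (implied bit included).

    Index 0 is the least significant fraction bit and index 10 is the implied
    bit. Samples are drawn from N(0, sigma^2) and rounded to binary16.
    """
    rng = random.Random(seed)
    counts = [0] * 11
    for _ in range(samples):
        (bits,) = struct.unpack("<H", struct.pack("<e", rng.gauss(0.0, sigma)))
        exponent = (bits >> 10) & 0x1F
        # The implied bit is 1 for normalized numbers and 0 for subnormals.
        mantissa = (bits & 0x3FF) | ((1 << 10) if exponent else 0)
        for k in range(11):
            counts[k] += (mantissa >> k) & 1
    return [c / samples for c in counts]

if __name__ == "__main__":
    for k, p in enumerate(mantissa_bit_probabilities()):
        print(f"bit {k}: {p:.3f}")
```

The per-bit probabilities of the two operands then determine how likely each partial product a_i AND b_j is to be 1, which is the quantity the compression scheme relies on.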
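The OR gate compression step can be modelled with a few lines of Python. This is a sketch under two assumptions that the text does not spell out: partial products are paired with their neighbour after sorting by probability of being 1, and the two bits in a pair are treated as independent when estimating the error probability. The helper names are hypothetical.

```python
def or_compress(p: int, q: int) -> int:
    """Compress two partial-product bits of equal weight into one bit."""
    return p | q

def or_error(p: int, q: int) -> int:
    """Approximate value minus exact sum; nonzero only when both bits are 1."""
    return (p | q) - (p + q)

def pair_by_probability(probs):
    """Pair partial products in order of increasing probability of being 1.

    Returns (i, j, p_err) tuples, where p_err = probs[i] * probs[j] is the
    chance that both bits are 1 (assuming independence), i.e. the chance that
    the OR gate loses one unit of weight. With an odd count, the partial
    product with the highest probability is left uncompressed.
    """
    order = sorted(range(len(probs)), key=lambda k: probs[k])
    return [(i, j, probs[i] * probs[j]) for i, j in zip(order[0::2], order[1::2])]

if __name__ == "__main__":
    for i, j, p_err in pair_by_probability([0.5, 0.1, 0.3, 0.2]):
        print(f"pair ({i}, {j}): error probability {p_err:.3f}")
```

Pairing low-probability bits together keeps the chance of the both-ones case small, which is the reason OR gates are confined to the stage and columns where the partial products are rarely 1.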
The structure of the approximate 4-2 compressor insensitive to input order is shown in FIG. 6. Its inputs are P1, P2, P3 and P4 and its outputs are Sum' and Cout'; the expressions for Sum' and for the carry are given as equation images in the original publication and are not reproduced in this text. The approximate 4-2 compressor produces an error of -2 only when all inputs are 1, and changing the input order does not affect its final output; the truth table is shown in FIG. 7. The approximate 4-2 compressor is used at the middle weight bits of the mantissa approximate multiplication module, and the exact range in which it is used can be adjusted according to the required precision.
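The stated behaviour of the compressor (exact except for an error of -2 when all four inputs are 1, and independent of input order) is enough to write a behavioural model. The Python sketch below is derived from those two properties only; it is not the gate-level expression of the patent, whose equation images are not available here, and it assumes that Sum' has weight 1 and Cout' has weight 2.

```python
from itertools import product

def approx_4_2(p1: int, p2: int, p3: int, p4: int):
    """Behavioural model: return (sum_, cout), encoding sum_ + 2 * cout."""
    ones = p1 + p2 + p3 + p4
    value = 2 if ones == 4 else ones  # two output bits cannot encode 4
    return value & 1, value >> 1

def truth_table():
    """Rows of (inputs, sum_, cout, error) for all 16 input combinations."""
    rows = []
    for bits in product((0, 1), repeat=4):
        s, c = approx_4_2(*bits)
        rows.append((bits, s, c, s + 2 * c - sum(bits)))
    return rows

if __name__ == "__main__":
    for bits, s, c, err in truth_table():
        print(bits, s, c, err)
```

Because the model depends only on the number of 1s among the inputs, permuting the inputs cannot change the output, which matches the input-order insensitivity described above.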
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may be made by those skilled in the art without departing from the principle of the invention.

Claims (4)

1. The approximate floating-point multiplier based on partial product probability analysis is characterized by comprising a sign bit exclusive OR module, a mantissa approximate multiplication module, a normalization module, a rounding module, an exponent addition module, an exponent adjustment module, a special case processing module and a result output module;
the input signals of the approximate floating-point multiplier are a multiplier and a multiplicand in half-precision format, and the bits of the multiplier and the multiplicand are divided, from low to high, into low weight bits, a first middle weight bit, second middle weight bits and high weight bits;
the sign bit XOR module is used for XOR-ing the sign bits of the multiplier and the multiplicand and sending the XOR result as a result sign bit to the result output module;
the mantissa approximate multiplication module is connected to the result output module through the normalization module and the rounding module in sequence; it performs approximate multiplication on the bits of the multiplier and the multiplicand, the result is normalized and rounded by the normalization module and the rounding module, and the normalized, rounded result is sent to the result output module as the result mantissa bits;
the mantissa approximate multiplication module comprises a truncation and compensation unit, a low-order OR gate compression unit, an approximate 4-2 compressor and a precise compressor; the truncation and compensation unit truncates the low weight bits and applies compensation at the highest of the low weight bits; the low-order OR gate compression unit compresses every two partial products of the compensation bit and of the first middle weight bit into one bit, taking the partial products in order of increasing probability of being 1; the approximate 4-2 compressor approximately compresses every four partial products of the second middle weight bits; the precise compressor precisely compresses the high weight bits;
the exponent adding module adds the exponents of the multiplier and the multiplicand, then adjusts according to the normalized rounding result output by the rounding module, and sends the adjustment result as a result exponent bit to the result output module;
and when the exponent bits of the input signals are all 0 or all 1, the input signals are multiplied by the special case processing module.
2. The approximate floating-point multiplier based on partial product probability analysis of claim 1, wherein, counting from low to high in the multiplier or multiplicand, the 1st to 10th bits are the low weight bits, the 11th bit is the first middle weight bit, the 12th to 14th bits are the second middle weight bits, and the remaining bits are the high weight bits.
3. The partial product probability analysis based approximate floating point multiplier of claim 2 wherein the mantissa approximate multiplication module includes three stages of compression;
in the first stage of compression, the truncation and compensation unit processes the partial products of the 1st to 10th bits; the low-order OR gate compression unit performs probability-based OR gate compression on the 10th and 11th bits; the approximate 4-2 compressor approximately compresses the 12th to 14th bits; the precise compressor precisely compresses the remaining high weight bits;
in the second stage of compression, the truncation and compensation unit applies compensation at the 10th bit, the approximate 4-2 compressor approximately compresses the 10th to 14th bits, and the precise compressor precisely compresses the remaining high weight bits;
in the third stage of compression, the approximate 4-2 compressor approximately compresses the 11th and 12th bits, and half adders are used for the remaining part to obtain two rows of partial products; the two rows of partial products are added in a final summing section to produce the final product.
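4. The approximate floating-point multiplier based on partial product probability analysis of claim 1, wherein the inputs of the approximate 4-2 compressor are P1, P2, P3 and P4 and the outputs are Sum' and Cout'; the expression for Sum' is given in equation image FDA0003542568560000021 and the expression for the carry is given in equation image FDA0003542568560000022 of the original publication (images not reproduced in this text).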
CN202210076195.2A 2022-01-24 2022-01-24 Approximate floating-point multiplier based on partial product probability analysis Active CN114115803B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210076195.2A CN114115803B (en) 2022-01-24 2022-01-24 Approximate floating-point multiplier based on partial product probability analysis

Publications (2)

Publication Number Publication Date
CN114115803A (en) 2022-03-01
CN114115803B (en) 2022-05-03

Family

ID=80361118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210076195.2A Active CN114115803B (en) 2022-01-24 2022-01-24 Approximate floating-point multiplier based on partial product probability analysis

Country Status (1)

Country Link
CN (1) CN114115803B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114647399B (en) * 2022-05-19 2022-08-16 Nanjing University of Aeronautics and Astronautics Low-energy-consumption high-precision approximate parallel fixed-width multiplication accumulation device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101840324A (en) * 2010-04-28 2010-09-22 Institute of Automation, Chinese Academy of Sciences 64-bit fixed and floating point multiplier unit supporting complex operation and subword parallelism
CN109542393A (en) * 2018-11-19 2019-03-29 University of Electronic Science and Technology of China A kind of approximation 4-2 compressor and approximate multiplier
CN112732221A (en) * 2019-10-14 2021-04-30 Anhui Cambricon Information Technology Co., Ltd. Multiplier, method, integrated circuit chip and computing device for floating-point operation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10241756B2 (en) * 2017-07-11 2019-03-26 International Business Machines Corporation Tiny detection in a floating-point unit

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
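Multiplier design based on a novel Booth selector and compressor; Wang Jiale (王佳乐) et al.; Microelectronics & Computer (《微电子学与计算机》); 2020-03-05 (No. 03); full text *
FPGA implementation of a high-speed floating-point arithmetic unit; Zhang Xiaoyan (张小妍) et al.; Informatization Research (《信息化研究》); 2009-11-20 (No. 11); full text *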

Also Published As

Publication number Publication date
CN114115803A (en) 2022-03-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant