WO2017185203A1 - Device and method for adding up plurality of floating point numbers - Google Patents


Publication number
WO2017185203A1
WO2017185203A1 PCT/CN2016/080126 CN2016080126W WO2017185203A1 WO 2017185203 A1 WO2017185203 A1 WO 2017185203A1 CN 2016080126 W CN2016080126 W CN 2016080126W WO 2017185203 A1 WO2017185203 A1 WO 2017185203A1
Authority
WO
WIPO (PCT)
Prior art keywords
bit
floating point
point number
mantissa
bits
Prior art date
Application number
PCT/CN2016/080126
Other languages
French (fr)
Chinese (zh)
Inventor
郭崎
周聖元
李震
陈云霁
陈天石
Original Assignee
北京中科寒武纪科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京中科寒武纪科技有限公司
Priority to PCT/CN2016/080126
Publication of WO2017185203A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F7/485 Adding; Subtracting

Definitions

  • the present invention provides an apparatus and method for performing a plurality of floating point number additions, which can be used for image processors, digital processors, smart devices, and on-chip network data operations.
  • the existing accelerators for adding operands fall mainly into two types: the serial carry addition tree and the carry-save addition tree.
  • Figure 1 shows the structure of the serial carry addition tree, that is, the structure of the binary tree is used, the operands to be operated are added two by two, and then passed up until the final result is obtained.
  • the structure supports parallel addition of multiple floating point numbers, which accelerates the addition, but carry propagation consumes a large amount of clock delay, the result depends to some extent on the order of the operands, and the precision loss of the result is relatively large.
  • Figure 2 shows the structure of the carry-save addition tree. That is, using the structure of the Wallace tree, the carry output generated by each stage of full adders is connected to the next higher bit of the next stage, so carries are passed by wiring alone, which avoids complex
  • carry transfer logic and reduces the delay of carry transfer.
  • this method cannot be applied directly to the addition of floating point numbers, and a different order of the operands may also introduce calculation errors.
  • the present invention provides an apparatus for performing a plurality of floating point number additions, the floating point number including a sign bit, an exponent bit, and a mantissa bit, and the apparatus includes:
  • a preprocessing module configured to preprocess the plurality of floating point numbers such that exponential bits and sign bits of the plurality of floating point numbers are consistent
  • An adding module configured to add a plurality of floating-point numbers after the pre-processing, to obtain an accumulated result and a value to be shifted of the accumulated result, where the accumulated result includes a sign bit, an exponent bit, and a mantissa bit;
  • a normalization processing module configured to shift the sign bit, the exponent bit, and the mantissa bit of the accumulated result according to the value to be shifted, to obtain a normalized accumulated result.
  • the preprocessing module includes:
  • a comparison selection module configured to compare the exponential bits of the plurality of floating point numbers in a binary tree to select a maximum exponent bit
  • calculating the number of bits n of the logical shift obtained by the shift module includes:
  • the shift module calculates a logical shift of the mantissa of the floating point number, including:
  • the lowest bit of the shifted mantissa bits is used as a sticky bit; the sticky bit is ORed with the discarded n bits, and the sticky bit is updated with the result to obtain the final mantissa bits of the floating point number.
  • addition module includes:
  • the Wallace Tree module is used to add multiple floating point numbers using the Wallace tree structure until it is reduced to two numbers;
  • the final result accumulation module is configured to add the two numbers to obtain a first accumulated result, to add the one's complements of the two numbers to obtain a second accumulated result, and to select the first accumulated result or the second accumulated result as the accumulated result according to the highest bit of the first accumulated result;
  • the normalization processing module logically shifts the accumulated result according to the number of bits to be shifted, so that the first significant bit of the accumulated result is in the highest position, and normalizes the shifted accumulated result to obtain the sign bit, exponent bits and mantissa bits of the accumulated result.
  • the present invention also provides a method for performing a plurality of floating point number additions, the floating point number including a sign bit, an exponent bit, and a mantissa bit, and the method includes:
  • step S1 includes:
  • step S12 the number of bits n of the logical shift is obtained, including:
  • step S12 logically shifting the mantissa bits of the floating point number includes:
  • a 1-bit hidden bit is prepended before the highest bit of the mantissa bits of the floating point number, where the value of the hidden bit is 1 for a normalized floating point number and 0 for a denormalized floating point number;
  • the lowest bit of the shifted mantissa bits is used as a sticky bit; the sticky bit is ORed with the discarded n bits, and the sticky bit is updated with the result to obtain the final mantissa bits of the floating point number.
  • step S2 includes:
  • step S3 includes: logically shifting the accumulated result according to its number of bits to be shifted, so that the first significant bit of the accumulated result is in the highest position, and normalizing the shifted accumulated result to obtain the sign bit, exponent bits and mantissa bits of the accumulated result.
  • the invention can add multiple floating point numbers of the same standard, solves the problem of adding operations of multiple operands in one operation, and adds effective digital bits and sticky bits to reduce the precision loss of the operation result;
  • the calculation of the structure such as the tree reduces the complexity of the hardware and improves the operation speed.
  • FIG. 1 is a schematic structural diagram of a serial carry addition tree in the prior art.
  • FIG. 2 is a schematic view showing the structure of a Wallace tree in the prior art.
  • FIG. 3 is a schematic diagram of an apparatus for performing multiple floating point number additions provided by the present invention.
  • Figure 4 is a schematic illustration of the pairwise comparison of exponent bits in the present invention.
  • Figure 5 is a schematic illustration of the selection of the largest exponent bits in the present invention.
  • Figure 6 is a schematic diagram of a calculation shifting module in the present invention.
  • Figure 7 is a schematic illustration of the final result accumulation module of the present invention.
  • the apparatus includes a preprocessing module, an adding operation module, and a normalization processing module.
  • the preprocessing module includes a comparison selection module and a calculation shift module.
  • the addition module includes a Wallace tree module, a final result accumulation module, and a leading zero prediction module.
  • the comparison selection module performs pairwise comparison and selection, as shown in FIG. 4: if e_a > e_b, a is selected; otherwise b is selected. Then, as shown in FIG. 5, a binary tree structure is used to select, level by level, the floating point number f_max with the largest exponent bits, whose sign bit, exponent bits and mantissa bits are s_max, e_max and m_max, respectively.
  • the specific operation is to first prepend a hidden bit before the highest bit of the mantissa bits m_i.
  • the value of the hidden bit is 1; when the floating point number f_i is a denormalized floating point number,
  • the hidden bit is 0. After the lowest bit of the mantissa bits, k "0"s are appended as additional significant bits.
  • the total number of mantissa bits equals the total number of bits after shifting, that is, the original number of mantissa bits + the number of hidden bits + the number of added significant bits.
  • each floating point number f_i is then shifted by the previously determined number of bits n: it is first shifted right by n bits, discarding the lowest n bits of the mantissa bits; the lowest bit of the shifted mantissa bits is then taken as the sticky bit and ORed with the discarded n bits, and the sticky bit is updated with the result, which gives the final shifted mantissa bits.
  • the mantissas of the shifted floating point numbers are added until they are reduced to two numbers, denoted sum_1 and carry_1, which are output to the final result accumulation module and the leading zero prediction module.
  • the Wallace tree structure uses simple hardware to quickly reduce the sum of the preprocessed floating point numbers to the sum of two numbers: each layer uses i full adders to convert the sum of j i-bit numbers into the sum of 2*j/3 (i+1)-bit numbers, then another layer of full adders converts these into 4*j/9 numbers, and so on until only 2 numbers remain.
  • the final result accumulation module uses two channels to calculate the operation result.
  • the structure is shown in Fig. 7. One path adds sum_1 and carry_1 directly, and the other path adds the one's complements of the two; the most significant bit of the result of the first path then decides the output: if it is 0, the result of the first path is selected as the final result tmp_sum of the accumulation part and output; otherwise, the result of the second path is selected as the final result tmp_sum of the accumulation part and output.
  • the leading zero prediction module uses the leading zero anticipator (LZA) method: it first computes, bit by bit, the propagation function of the inputs sum_1 and carry_1.
  • the final result tmp_sum is logically shifted by num_shift bits, where num_shift is the position of the first significant bit given by the leading zero prediction module, and is then normalized to obtain the sign bit
  • s_result, the exponent bits e_result and the mantissa bits m_result, which are combined to give the final result sum_result = {s_result, e_result, m_result}.
  • the floating point standard used is the IEEE 754 half-precision format, in which each floating point number consists of 1 sign bit, 5 exponent bits and 10 mantissa bits.
  • the mantissa part of the above is 1010000001, and one hidden bit is prepended before its highest bit.
  • f_1 is a normalized floating point number,
  • so the value of the hidden bit is 1, which gives 11010000001; three 0s are then appended after the lowest bit, and the lowest bit is defined as the sticky bit, which gives 11010000001000.
  • n_1 = 5
  • 5 bits must be shifted out, so the rightmost 5 bits, 01000, are discarded, giving 00000110100000; the discarded 5 bits 01000 are ORed with the sticky bit 0, giving 1,
  • and the sticky bit is updated with this result, that is, the sticky bit becomes 1 and the result after shifting is 00000110100001.
  • the mantissa part of the above is 0011110000, and one hidden bit is prepended before its highest bit.
  • f_2 is a normalized floating point number,
  • so the value of the hidden bit is 1; three 0s are appended after the lowest bit, and the lowest bit is defined as the sticky bit, which gives 10011110000000.
  • the sticky bit is updated with the result, that is, the sticky bit is 0 and the result after shifting is 00100111100000.
  • the result of the preprocessing is input to the addition module.
  • Four 14-bit preprocessed mantissas are processed using the Wallace tree structure shown in FIG.
  • the leading zero prediction part is to calculate the output result of the first stage 4-2 Wallace tree by using the leading zero prediction algorithm (LZA algorithm) to calculate the final result of the accumulated part.
  • the addition operation of multiple floating-point numbers of the same standard can be completed quickly and efficiently, the number of operands supported by one operation is increased, the operation delay is reduced, the operation process is accelerated, and the precision loss of the operation result is reduced.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Nonlinear Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

A device and method for adding up a plurality of floating point numbers. The device comprises a preprocessing module, an addition operation module and a normalizing module. The preprocessing module pre-processes a plurality of floating point numbers such that the exponent bits and the sign bits of the plurality of floating point numbers are aligned. The addition operation module adds up the plurality of pre-processed floating point numbers, so as to obtain an accumulation result and the number of bits to be shifted for the accumulation result. The normalizing module shifts the sign bit, exponent bits and mantissa bits of the accumulation result according to the number of bits to be shifted so as to obtain a normalized accumulation result. The present device and method have the advantages of low operation delay and small precision loss for the result when a plurality of floating point numbers are added up.

Description

Apparatus and method for performing addition of multiple floating point numbers
Technical field
The present invention provides an apparatus and method for performing addition of multiple floating point numbers, which can be used in image processors, digital processors, smart devices, on-chip network data operations, and the like.
Background art
With the advent of the big data era, the volume of data computation has increased substantially, which places higher demands on computation speed. Processors of every kind, whether image processors or digital processors, must meet requirements for low latency and high accuracy. Floating point addition is one of the most basic and most commonly used floating point operations, so how to accelerate it is particularly important and has attracted extensive discussion and research.
Existing accelerators for adding operands fall mainly into two types: the serial carry addition tree and the carry-save addition tree.
Figure 1 shows the structure of the serial carry addition tree: a binary tree in which the operands are added in pairs and the partial sums are passed upward until the final result is obtained. This structure clearly supports parallel addition of multiple floating point numbers and thereby accelerates the addition, but carry propagation consumes a large amount of clock delay; moreover, the result depends to some extent on the order of the operands, and the precision loss of the result is relatively large.
Figure 2 shows the structure of the carry-save addition tree, which uses a Wallace tree: the carry output generated by each stage of full adders is connected to the next higher bit of the next stage, so carries are passed by wiring alone, which avoids complex carry propagation logic and reduces the carry delay. However, this method cannot be applied directly to floating point addition, and a different order of the operands may also introduce calculation errors.
In addition, most commonly used algorithms mix floating point addition with floating point accumulation. Such mixed operation requires the arithmetic unit to support both operations at the same time, and requires the result to be independent of the given order of the operands.
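The dependence of a rounded sum on operand order, noted in the background above, can be reproduced in ordinary software. The following is a minimal sketch in Python using double-precision values chosen only for illustration; it is not part of the described apparatus.

```python
import math

# Three operands whose rounded sum depends on how they are grouped.
a, b, c = 1e16, -1e16, 1.0

# The two large terms cancel first, so the small term survives.
left_to_right = (a + b) + c

# The small term is rounded away when added to b, before the cancellation.
right_to_left = a + (b + c)

# math.fsum keeps track of the low-order parts lost to rounding,
# so its result does not depend on the order of the operands.
order_independent = math.fsum([a, b, c])

print(left_to_right, right_to_left, order_independent)
```

Pairwise addition trees round at every node, which is why the grouping matters; aligning all operands to a common exponent before a single integer-style accumulation avoids the intermediate roundings.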
Summary of the invention
(1) Technical problem to be solved
The object of the present invention is to provide an apparatus and method capable of adding multiple floating point numbers, with the advantages of low operation delay and small loss of precision in the result.
(2) Technical solution
The present invention provides an apparatus for performing addition of multiple floating point numbers, each floating point number including a sign bit, exponent bits and mantissa bits, the apparatus including:
a preprocessing module, configured to preprocess the plurality of floating point numbers so that the exponent bits and the sign bits of the plurality of floating point numbers are consistent;
an addition operation module, configured to add the preprocessed floating point numbers to obtain an accumulated result and the number of bits to be shifted for that accumulated result, the accumulated result including a sign bit, exponent bits and mantissa bits;
a normalization processing module, configured to shift the sign bit, exponent bits and mantissa bits of the accumulated result according to the number of bits to be shifted, to obtain a normalized accumulated result.
Further, the preprocessing module includes:
a comparison selection module, configured to compare the exponent bits of the plurality of floating point numbers pairwise in the form of a binary tree and select the largest exponent bits;
a calculation shift module, configured to determine, from the relationship between the exponent bits of each floating point number and those of the floating point number with the largest exponent bits, the number of bits n by which each floating point number must be logically shifted, and to logically shift the mantissa bits of that floating point number accordingly, so that the exponent bits of every floating point number equal the largest exponent bits; at the same time, the sign bit of every floating point number is made consistent with the sign bit of the floating point number with the largest exponent bits, where a floating point number whose sign bit is changed has its mantissa bits replaced by their two's complement.
Further, the calculation shift module determines the number of bits n of the logical shift as follows:
compute the difference Δe between the largest exponent bits and the exponent bits of the floating point number to be logically shifted;
if the floating point number with the largest exponent bits is a normalized floating point number and the floating point number to be logically shifted is a denormalized floating point number, let n=Δe-1; otherwise, let n=Δe.
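The rule for n can be sketched as a small Python function. The function name and the test for denormalized numbers (an all-zero exponent field, as in IEEE 754) are assumptions made for illustration; the text itself only distinguishes normalized from denormalized operands.

```python
def shift_amount(e_max: int, e_i: int) -> int:
    """Return the right-shift count n for one operand.

    e_max and e_i are raw (biased) exponent fields.
    Assumption: an exponent field of 0 marks a denormalized number.
    """
    delta_e = e_max - e_i
    max_is_normalized = e_max != 0
    operand_is_denormalized = e_i == 0
    if max_is_normalized and operand_is_denormalized:
        # A denormalized number has the same effective exponent as
        # exponent field 1, so one shift fewer is needed.
        return delta_e - 1
    return delta_e
```

The subtraction of 1 reflects that, under IEEE 754, a denormalized number is scaled as if its exponent field were 1 while its hidden bit is 0.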
Further, the calculation shift module logically shifts the mantissa bits of a floating point number as follows:
prepend a 1-bit hidden bit before the highest bit of the mantissa bits of the floating point number, where the hidden bit is 1 for a normalized floating point number and 0 for a denormalized floating point number;
append k "0"s after the lowest bit of the mantissa bits as additional significant bits;
shift the mantissa bits, extended with the hidden bit and the additional significant bits, right by n bits, discarding the lowest n bits;
take the lowest bit of the shifted mantissa bits as the sticky bit, OR the sticky bit with the discarded n bits, and update the sticky bit with the result, which yields the final mantissa bits of the floating point number.
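The four steps above can be sketched in Python as follows. The function name and its default parameters (10 mantissa bits and k = 3, matching a half-precision layout) are assumptions chosen for illustration.

```python
def align_mantissa(mantissa: int, is_normalized: bool, n: int,
                   mant_bits: int = 10, k: int = 3) -> int:
    """Extend a mantissa, shift it right by n and fold the lost bits
    into the sticky bit. Returns a (1 + mant_bits + k)-bit integer."""
    hidden = 1 if is_normalized else 0
    # Step 1 and 2: hidden bit in front, k zero bits behind.
    extended = ((hidden << mant_bits) | mantissa) << k
    # Step 3: right shift by n, remembering what falls off.
    discarded = extended & ((1 << n) - 1)
    shifted = extended >> n
    # Step 4: the lowest bit is the sticky bit; OR it with the discarded bits.
    if discarded != 0:
        shifted |= 1
    return shifted


# Mantissa 1010000001 of a normalized number, shifted by 5 bits.
print(format(align_mantissa(0b1010000001, True, 5), "014b"))
# Mantissa 0011110000 of a normalized number, shifted by 2 bits.
print(format(align_mantissa(0b0011110000, True, 2), "014b"))
```

Keeping the OR of all discarded bits in the lowest position records that something nonzero was lost, which is what limits the precision loss of the later accumulation.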
Further, the addition operation module includes:
a Wallace tree module, configured to add the plurality of floating point numbers using a Wallace tree structure until they are reduced to two numbers;
a final result accumulation module, configured to add the two numbers to obtain a first accumulated result, to add the one's complements of the two numbers to obtain a second accumulated result, and to select the first accumulated result or the second accumulated result as the accumulated result according to the highest bit of the first accumulated result;
a leading zero prediction module, configured to perform logical operations on the two numbers and determine the position of the first significant bit of the accumulated result, thereby obtaining the number of bits to be shifted for the accumulated result. Specifically, let the two numbers be A and B. First, the propagation function
Figure PCTCN2016080126-appb-000001
the generate function G=AB and the kill function Z=(AB)' are each evaluated for every bit; then an indicator is set for every bit,
Figure PCTCN2016080126-appb-000002
so that the position parameter is obtained as
Figure PCTCN2016080126-appb-000003
The first position parameter that is not 0 gives the position of the first significant bit sought, and its subscript is output in binary form.
Further, the normalization processing module logically shifts the accumulated result according to the number of bits to be shifted, so that the first significant bit of the accumulated result is in the highest position, and normalizes the shifted accumulated result to obtain the sign bit, exponent bits and mantissa bits of the accumulated result.
The present invention also provides a method for performing addition of multiple floating point numbers, each floating point number including a sign bit, exponent bits and mantissa bits, the method including:
S1, preprocessing the plurality of floating point numbers so that the exponent bits and the sign bits of the plurality of floating point numbers are consistent;
S2, adding the preprocessed floating point numbers to obtain an accumulated result and the number of bits to be shifted for that accumulated result, the accumulated result including a sign bit, exponent bits and mantissa bits;
S3, shifting the sign bit, exponent bits and mantissa bits of the accumulated result according to the number of bits to be shifted, to obtain a normalized accumulated result.
Further, step S1 includes:
S11, comparing the exponent bits of the plurality of floating point numbers pairwise in the form of a binary tree and selecting the largest exponent bits;
S12, determining, from the relationship between the exponent bits of each floating point number and those of the floating point number with the largest exponent bits, the number of bits n by which each floating point number must be logically shifted, and logically shifting the mantissa bits of that floating point number accordingly, so that the exponent bits of every floating point number equal the largest exponent bits; at the same time, the sign bit of every floating point number is made consistent with the sign bit of the floating point number with the largest exponent bits, where a floating point number whose sign bit is changed has its mantissa bits replaced by their two's complement.
Further, in step S12, determining the number of bits n of the logical shift includes:
computing the difference Δe between the largest exponent bits and the exponent bits of the floating point number to be logically shifted;
if the floating point number with the largest exponent bits is a normalized floating point number and the floating point number to be logically shifted is a denormalized floating point number, letting n=Δe-1; otherwise, letting n=Δe.
Further, in step S12, logically shifting the mantissa bits of a floating point number includes:
prepending a 1-bit hidden bit before the highest bit of the mantissa bits of the floating point number, where the hidden bit is 1 for a normalized floating point number and 0 for a denormalized floating point number;
appending k "0"s after the lowest bit of the mantissa bits as additional significant bits;
shifting the mantissa bits, extended with the hidden bit and the additional significant bits, right by n bits, discarding the lowest n bits;
taking the lowest bit of the shifted mantissa bits as the sticky bit, ORing the sticky bit with the discarded n bits, and updating the sticky bit with the result, which yields the final mantissa bits of the floating point number.
Further, step S2 includes:
S21, adding the plurality of floating point numbers using a Wallace tree structure until they are reduced to two numbers;
S22, adding the two numbers to obtain a first accumulated result, adding the one's complements of the two numbers to obtain a second accumulated result, and selecting the first accumulated result or the second accumulated result as the accumulated result according to the sign bit of the floating point number with the largest exponent bits;
S23, performing logical operations on the two numbers to determine the position of the first significant bit of the accumulated result, thereby obtaining the number of bits to be shifted for the accumulated result.
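The reduction in step S21 can be modelled in software with carry-save (3:2) compression, which is the building block of a Wallace tree. The sketch below is a behavioural model under assumed names and a fixed word width; it does not reproduce the exact wiring of the tree in the figures.

```python
def carry_save_add(a: int, b: int, c: int, width: int) -> tuple[int, int]:
    """One layer of full adders: three words in, a sum word and a carry word out."""
    mask = (1 << width) - 1
    sum_word = (a ^ b ^ c) & mask
    # Each carry bit feeds the next higher bit position, hence the shift.
    carry_word = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return sum_word, carry_word


def reduce_to_two(operands: list[int], width: int) -> tuple[int, int]:
    """Repeat 3:2 compression until only two words remain."""
    mask = (1 << width) - 1
    values = [v & mask for v in operands]
    while len(values) > 2:
        next_level = []
        full = len(values) - len(values) % 3
        for i in range(0, full, 3):
            next_level.extend(
                carry_save_add(values[i], values[i + 1], values[i + 2], width))
        next_level.extend(values[full:])  # leftovers pass through unchanged
        values = next_level
    while len(values) < 2:
        values.append(0)
    return values[0], values[1]


operands = [0b00000110100001, 0b00100111100000, 5, 1000]
sum_word, carry_word = reduce_to_two(operands, 16)
print((sum_word + carry_word) & 0xFFFF == sum(operands) & 0xFFFF)
```

No carry ripples across bit positions inside the loop; the single carry-propagating addition is deferred to the final two words, which is where the delay saving comes from.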
Further, step S3 includes: logically shifting the accumulated result according to its number of bits to be shifted, so that the first significant bit of the accumulated result is in the highest position, and normalizing the shifted accumulated result to obtain the sign bit, exponent bits and mantissa bits of the accumulated result.
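The shift in step S3 can be sketched in Python as follows. Here the shift count is obtained by counting leading zeros directly rather than by leading zero prediction, and the exponent adjustment is left out because the text at this point does not spell it out; the function name and the handling of an all-zero input are assumptions.

```python
def normalize_shift(accumulated: int, width: int) -> tuple[int, int]:
    """Return (shift_count, shifted) so that the first 1 bit of a
    width-bit magnitude ends up in the highest bit position."""
    if not 0 <= accumulated < (1 << width):
        raise ValueError("accumulated must fit in width bits")
    if accumulated == 0:
        # Assumption: nothing to align when the result is zero.
        return width, 0
    shift_count = width - accumulated.bit_length()
    return shift_count, accumulated << shift_count


shift_count, shifted = normalize_shift(0b00000110100001, 14)
print(shift_count, format(shifted, "014b"))
```

In hardware the shift count is predicted from the two partial words in parallel with the final addition, so the shifter does not have to wait for a leading zero count on the finished sum.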
(3) Beneficial effects
The present invention can add multiple floating point numbers of the same standard, which solves the problem of adding multiple operands in a single operation; it also adds extra significant bits and a sticky bit to reduce the precision loss of the result; and it uses structures such as the Wallace tree for the computation, which reduces hardware complexity and increases operation speed.
Brief description of the drawings
Figure 1 is a schematic structural diagram of a serial carry addition tree in the prior art.
Figure 2 is a schematic structural diagram of a Wallace tree in the prior art.
Figure 3 is a schematic diagram of the apparatus for adding multiple floating point numbers provided by the present invention.
Figure 4 is a schematic diagram of the pairwise comparison of exponent bits in the present invention.
Figure 5 is a schematic diagram of the selection of the largest exponent bits in the present invention.
Figure 6 is a schematic diagram of the calculation shift module in the present invention.
Figure 7 is a schematic diagram of the final result accumulation module in the present invention.
Detailed description
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
Figure 3 is a schematic diagram of the apparatus for adding multiple floating point numbers provided by the present invention. As shown in Figure 3, the apparatus includes a preprocessing module, an addition operation module and a normalization processing module; the preprocessing module includes a comparison selection module and a calculation shift module, and the addition operation module includes a Wallace tree module, a final result accumulation module and a leading zero prediction module.
Suppose x y-bit floating point numbers of the same standard are to be added, the i-th being denoted f_i, where x, y and i are positive integers and 1≤i≤x.
In the preprocessing module, each floating point number f_i is split into a sign bit part s_i, an exponent bits part e_i and a mantissa bits part m_i, that is, f_i=(s_i,e_i,m_i). The comparison selection module performs pairwise comparison and selection as shown in Figure 4: if e_a>e_b, a is selected; otherwise b is selected. Then, as shown in Figure 5, a binary tree structure is used to select, level by level, the floating point number f_max with the largest exponent bits, whose sign bit, exponent bits and mantissa bits are s_max, e_max and m_max, respectively.
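The pairwise selection can be sketched in Python as a tournament over (s, e, m) tuples. The function name and the pass-through of an unpaired operand at odd-sized levels are assumptions; the comparison rule itself (select a if e_a > e_b, otherwise b) is the one stated above.

```python
def select_max_exponent(operands: list[tuple[int, int, int]]) -> tuple[int, int, int]:
    """Pick the (s, e, m) tuple with the largest exponent field
    using levels of pairwise comparisons, as in a binary tree."""
    if not operands:
        raise ValueError("at least one operand is required")
    level = list(operands)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            a, b = level[i], level[i + 1]
            # Rule from the text: select a if e_a > e_b, otherwise b.
            next_level.append(a if a[1] > b[1] else b)
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # unpaired operand advances as is
        level = next_level
    return level[0]


f_max = select_max_exponent([(0, 12, 5), (1, 17, 3), (0, 17, 9), (0, 3, 1)])
print(f_max)
```

With x operands the tree needs about log2(x) comparison levels, whereas a sequential scan would need x-1 dependent comparisons.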
Figure 6 is a schematic diagram of the calculation and shift module of the present invention. The module first computes, for each floating-point number f_i, the difference Δe between the exponent of f_max and the exponent of f_i. If f_max is a normalized number and f_i is a denormalized number, the mantissa of f_i is logically shifted by n = Δe-1 bits; otherwise n = Δe. The mantissa m_i of each f_i is then shifted accordingly. After the shift, the x numbers effectively share the same exponent, so their mantissas can be operated on directly. In detail: first, one hidden bit is prepended to the most significant bit of m_i; its value is 1 when f_i is normalized and 0 when f_i is denormalized. Next, k zeros are appended after the least significant bit as additional significant bits. The total mantissa width is now the width kept after shifting, namely original mantissa bits + hidden bit + added bits. Each mantissa is then shifted right by n bits, discarding its lowest n bits. The lowest bit of the shifted mantissa serves as the sticky bit: it is ORed with the n discarded bits and the result is written back as the sticky bit, which gives the final shifted mantissa. Finally, the sign s_i of each f_i is compared with s_max. If they are equal, nothing further is done; if they differ, the mantissa is replaced by its two's complement so that the adders can operate on it directly.
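The shift-amount rule and the sign handling can be modelled with two small helpers. This is a minimal sketch under stated assumptions: `shift_amount` and `match_sign` are hypothetical names, a stored exponent of 0 is taken to mark a denormalized number (the IEEE 754 convention), and the operand width used for the two's complement is left as a parameter because the text does not fix it.

```python
def shift_amount(e_max: int, e_i: int) -> int:
    """Right-shift count n for the mantissa of f_i (stored exponent fields)."""
    if e_i > e_max:
        raise ValueError("e_max must be the largest exponent")
    delta = e_max - e_i
    # Assumption: IEEE 754 encoding, where a stored exponent of 0 marks a
    # denormalized number whose effective exponent equals that of exponent 1.
    if e_max != 0 and e_i == 0:
        return delta - 1
    return delta


def match_sign(mantissa: int, s_i: int, s_max: int, width: int) -> int:
    """Return the mantissa unchanged if the signs agree, otherwise its
    two's complement within `width` bits."""
    mask = (1 << width) - 1
    if s_i == s_max:
        return mantissa & mask
    return (-mantissa) & mask
```

The Δe-1 case follows from the encoding: a denormalized number is stored with exponent 0 but scaled like exponent 1, so it needs one shift fewer than the raw difference suggests.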
In the addition module, the Wallace tree structure shown in Figure 2 adds the shifted mantissas and reduces them to two numbers, denoted sum_1 and carry_1, which are output to the final result accumulation module and the leading zero prediction module. The Wallace tree uses simple hardware to reduce the addition of many numbers to the addition of two: each level uses i full adders to convert the addition of j i-bit numbers into the addition of 2*j/3 (i+1)-bit numbers, a further level of full adders converts this into the addition of 4*j/9 numbers, and so on until only two numbers remain.
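The reduction performed by each level of full adders can be illustrated with a bitwise carry-save model. This is a sketch, not the circuit of Figure 2: `carry_save_add` and `wallace_reduce` are hypothetical names, operands are non-negative Python integers of unbounded width, and the grouping of operands into threes is one possible arrangement.

```python
from typing import List, Tuple


def carry_save_add(a: int, b: int, c: int) -> Tuple[int, int]:
    """One row of full adders: three operands in, (sum, carry) out.

    Per bit, sum is the XOR of the inputs and carry is their majority,
    moved one position to the left. The identity a + b + c == sum + carry
    holds for all non-negative integers.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def wallace_reduce(operands: List[int]) -> Tuple[int, int]:
    """Apply carry-save levels until only two numbers remain."""
    nums = list(operands)
    while len(nums) > 2:
        nxt: List[int] = []
        full = len(nums) - len(nums) % 3
        for i in range(0, full, 3):  # every group of three becomes two
            nxt.extend(carry_save_add(nums[i], nums[i + 1], nums[i + 2]))
        nxt.extend(nums[full:])      # leftovers wait for the next level
        nums = nxt
    while len(nums) < 2:             # pad so the caller always gets a pair
        nums.append(0)
    return nums[0], nums[1]
```

No carry propagates inside a level, so each level costs one full-adder delay regardless of operand width; only the final two-operand addition needs a carry-propagate adder.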
The final result accumulation module computes the result over two paths, as shown in Figure 7. One path adds sum_1 and carry_1 directly; the other adds their ones' complements. The most significant bit of the first path's result decides the output: if it is 0, the result of the first path is output as the final accumulated result tmp_sum; otherwise the result of the second path is output as tmp_sum. The leading zero prediction module uses the leading zero anticipator (LZA) method. It first computes, bit by bit from the inputs sum_1 and carry_1, the propagate function
Figure PCTCN2016080126-appb-000004
the generate function G = sum_1·carry_1, and the kill function Z = (sum_1·carry_1)'. It then computes, for each bit,
Figure PCTCN2016080126-appb-000005
from which the position parameter is obtained as
Figure PCTCN2016080126-appb-000006
The subscript of the first nonzero position parameter is the position num_shift of the first significant bit of the final accumulated result tmp_sum, which is output in binary.
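The behaviour of these two blocks can be approximated in software as follows. This is a reference model only, with hypothetical names `dual_path_accumulate` and `leading_zero_count`. The first function follows the two-path description literally. The second does not implement the anticipator logic, whose equations are only available as images; it simply counts the leading zeros of the finished sum, which is the value an LZA is meant to predict in parallel with the addition.

```python
def dual_path_accumulate(sum1: int, carry1: int, width: int) -> int:
    """Compute both paths in `width` bits and select one by the top bit."""
    mask = (1 << width) - 1
    path1 = (sum1 + carry1) & mask
    # Ones' complement of each operand, then add. In modular arithmetic this
    # equals -(sum1 + carry1) - 2; the text describes no correction term,
    # so none is applied here.
    path2 = ((~sum1 & mask) + (~carry1 & mask)) & mask
    msb = (path1 >> (width - 1)) & 1
    return path1 if msb == 0 else path2


def leading_zero_count(value: int, width: int) -> int:
    """Number of zero bits above the first 1 in a `width`-bit value."""
    if value < 0 or value >> width:
        raise ValueError("value does not fit in the given width")
    return width - value.bit_length()
```

For the value tmp_sum = 0011100101001000 given in the embodiment below, `leading_zero_count` returns 2, which is 10 in binary and agrees with the num_shift stated there.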
In the normalization module, the final result tmp_sum is logically shifted by num_shift bits, where num_shift is the position of the first significant bit supplied by the leading zero prediction module. The shifted value is then normalized to obtain the sign s_result, exponent e_result, and mantissa m_result of the final result, which are combined into sum_result = {s_result, e_result, m_result}.
The present invention provides an embodiment in which four 16-bit floating-point numbers are added, that is, x = 4 and y = 16. The numbers follow the IEEE 754 half-precision format: each consists of 1 sign bit, 5 exponent bits, and 10 mantissa bits.
In the device shown in Figure 3, four floating-point numbers are input, written in binary as f_1 = 0001001010000001, f_2 = 0001110011110000, f_3 = 00011001011111111, and f_4 = 0010010011011001. Split into sign, exponent, and mantissa, {s, e, m}, they are f_1 = {0, 00100, 1010000001}, f_2 = {0, 00111, 0011110000}, f_3 = {0, 00110, 01011111111}, and f_4 = {0, 01001, 0011011001}. The device of Figure 4 compares the exponents e_1 = 00100 and e_2 = 00111 of f_1 and f_2 and selects the larger, e_max(e1,e2) = 00111; it likewise compares the exponents e_3 = 00110 and e_4 = 01001 of f_3 and f_4 and selects e_max(e3,e4) = 01001. The tree structure of Figure 5 then compares e_max(e1,e2) = 00111 with e_max(e3,e4) = 01001 and selects the larger exponent, e_max = 01001. The corresponding number is f_max = f_4 = 0010010011011001, whose sign and mantissa are s_max = 0 and m_max = 0011011001.
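To see which real values these bit patterns stand for, they can be decoded in Python. The sketch below assumes the standard IEEE 754 half-precision interpretation (exponent bias 15) and cross-checks a hand-written decoder, the hypothetical `half_value`, against the standard library's `struct` format `e`. Only f_1, f_2, and f_4 are usable here, because f_3 is printed with 17 digits in the text.

```python
import struct


def half_value(bits: str) -> float:
    """Decode a 16-character binary string as an IEEE 754 half float."""
    if len(bits) != 16 or set(bits) - {"0", "1"}:
        raise ValueError("expected a 16-character binary string")
    s, e, m = int(bits[0], 2), int(bits[1:6], 2), int(bits[6:], 2)
    if e == 31:
        raise ValueError("infinities and NaNs are not handled in this sketch")
    if e == 0:                       # denormalized: hidden bit is 0
        magnitude = (m / 1024) * 2.0 ** -14
    else:                            # normalized: hidden bit is 1
        magnitude = (1 + m / 1024) * 2.0 ** (e - 15)
    return -magnitude if s else magnitude


def half_value_struct(bits: str) -> float:
    """Same decoding using the standard library's half-precision format."""
    return struct.unpack(">e", int(bits, 2).to_bytes(2, "big"))[0]


if __name__ == "__main__":
    for name, bits in [("f_1", "0001001010000001"),
                       ("f_2", "0001110011110000"),
                       ("f_4", "0010010011011001")]:
        print(name, half_value(bits), half_value_struct(bits))
```

All three operands are small positive numbers, and f_4 is the largest, consistent with its selection as f_max.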
Next, the differences between e_max and the exponents e_1, e_2, e_3, e_4 of f_1, f_2, f_3, f_4 are computed: Δe_1 = 5, Δe_2 = 2, Δe_3 = 3, Δe_4 = 0. Because f_1 to f_4 are all normalized, the shift amount is n = Δe, so n_1 = 5, n_2 = 2, n_3 = 3, n_4 = 0. To reduce the loss of precision during the computation, three extra significant bits are added, that is, k = 3, and the lowest bit is taken as the sticky bit. Because this embodiment uses the IEEE 754 standard, one hidden bit is first prepended to the mantissa of each of f_max, f_1, f_2, f_3, f_4, after checking whether each number is normalized. All of them are normalized, so every hidden bit is 1. Three zeros are then appended after the least significant bit of each mantissa, which gives the preset total width: original mantissa bits + hidden bit + added bits = 10 + 1 + 3 = 14 bits. Each mantissa is then shifted right by its n, discarding the lowest n bits; the discarded bits are ORed with the sticky bit, and the result replaces the sticky bit to give the final shifted mantissa. Take f_1 as an example. Its mantissa is 1010000001. Prepending the hidden bit, which is 1 because f_1 is normalized, gives 11010000001; appending three zeros, with the lowest bit defined as the sticky bit, gives 11010000001000. Since n_1 = 5, the rightmost 5 bits, 01000, are discarded, giving 00000110100000. ORing the discarded bits 01000 with the sticky bit 0 gives 1, so the sticky bit becomes 1 and the shifted result is 00000110100001. Now take f_2. Its mantissa is 0011110000. Prepending the hidden bit 1 and appending three zeros gives 10011110000000. Since n_2 = 2, the rightmost 2 bits, 00, are discarded, giving 00100111100000. ORing the discarded bits 00 with the sticky bit 0 gives 0, so the sticky bit stays 0 and the shifted result is 00100111100000. Finally, the signs s_1, s_2, s_3, s_4 are compared with s_max. All are 0, that is, all the numbers are positive, so no mantissa needs to be complemented.
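The alignment of f_1 and f_2 worked through above can be reproduced with a short function. This is a sketch assuming the half-precision layout of this embodiment (10 mantissa bits, k = 3); `align_mantissa` is a hypothetical name, and the sticky update is modelled as an OR-reduction of all discarded bits into the lowest kept bit, which is how the two examples behave.

```python
def align_mantissa(m: int, e: int, n: int,
                   frac_bits: int = 10, k: int = 3) -> str:
    """Return the aligned mantissa as a binary string of 1 + frac_bits + k bits.

    m: stored mantissa field, e: stored exponent field, n: right-shift count.
    """
    hidden = 0 if e == 0 else 1              # 0 for denormalized numbers
    width = 1 + frac_bits + k
    extended = ((hidden << frac_bits) | m) << k   # hidden bit + m + k zeros
    discarded = extended & ((1 << n) - 1)         # the n bits shifted out
    shifted = extended >> n
    if discarded:                                 # OR discarded bits into sticky
        shifted |= 1
    return format(shifted, f"0{width}b")


if __name__ == "__main__":
    print(align_mantissa(0b1010000001, 0b00100, 5))  # f_1
    print(align_mantissa(0b0011110000, 0b00111, 2))  # f_2
```

Run as a script, this prints 00000110100001 for f_1 and 00100111100000 for f_2, matching the values derived in the text.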
As shown in Figure 3, the preprocessed results are input to the addition module, where the Wallace tree structure shown in Figure 2 processes the four 14-bit preprocessed mantissas. This embodiment uses a two-level Wallace tree. A first-level 4-2 Wallace tree performs the first addition, and its outputs are fed both to a second-level 3-2 Wallace tree and to the leading zero prediction part. The 3-2 Wallace tree reduces the operands to two numbers, sum_1 = 11011000000100 and carry_1 = 110100010, which are output to the final result accumulation part. That part computes the result over two paths: one adds sum_1 and carry_1 directly, and the other takes the ones' complement of each and then adds them. Because the most significant bit of the first path's result is 0, the result of the first path is selected as the final accumulated result, tmp_sum = 0011100101001000, and output to the third module. The leading zero prediction part applies the leading zero prediction algorithm (LZA) to the outputs of the first-level 4-2 Wallace tree and obtains the number of bits by which the accumulated result must be shifted for normalization, num_shift = 10 in binary, which is also output to the third module. The leading zero prediction part and the second-level Wallace tree run in parallel.
As shown in Figure 3, the normalization module uses the LZA algorithm to perform logic operations on tmp_sum and the f_max obtained by the first module, giving the sign of the final result, s_result = 0. It performs logic operations on f_max from the first module, tmp_sum from the accumulation part of the second module, and num_shift from the leading zero prediction part, giving the exponent of the final result, e_result = 01001. Using num_shift and f_max, it shifts and normalizes the tmp_sum obtained by the second module to get the mantissa of the final result, m_result = 11001100101001. Finally, the three parts are combined into sum_result = {s_result, e_result, m_result} = {0, 01001, 11001100101001} = 00100111001100101001.
In summary, the device and method complete the addition of a plurality of floating-point numbers of the same standard quickly and efficiently. They increase the number of operands supported by a single operation, reduce latency, speed up the computation, and reduce the loss of precision in the result.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that they are only specific embodiments of the invention and do not limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention falls within its scope of protection.

Claims (12)

  1. A device for adding a plurality of floating-point numbers, each floating-point number comprising a sign bit, exponent bits, and mantissa bits, characterized in that the device comprises:
    a preprocessing module, configured to preprocess the plurality of floating-point numbers so that their exponent bits and sign bits are consistent;
    an addition module, configured to add the preprocessed floating-point numbers to obtain an accumulated result and a to-be-shifted value of the accumulated result, the accumulated result comprising a sign bit, exponent bits, and mantissa bits;
    a normalization module, configured to shift the sign bit, exponent bits, and mantissa bits of the accumulated result according to the to-be-shifted value to obtain a normalized accumulated result.
  2. The device for adding a plurality of floating-point numbers according to claim 1, characterized in that the preprocessing module comprises:
    a comparison and selection module, configured to compare the exponent bits of the plurality of floating-point numbers in pairs in the form of a binary tree and to select the largest exponent;
    a calculation and shift module, configured to determine, from the relationship between the exponent of each floating-point number and the exponent of the floating-point number having the largest exponent, the number of bits n by which each floating-point number is to be logically shifted, and to logically shift the mantissa bits of the corresponding floating-point number so that the exponent of every floating-point number equals the largest exponent, and further to make the sign bit of every floating-point number consistent with the sign bit of the floating-point number having the largest exponent, wherein the two's complement of the mantissa bits is taken when the sign bit of a floating-point number is changed.
  3. The device for adding a plurality of floating-point numbers according to claim 2, characterized in that the calculation and shift module determines the number of bits n of the logical shift by:
    calculating the difference Δe between the largest exponent and the exponent of the floating-point number to be logically shifted;
    if the floating-point number having the largest exponent is a normalized floating-point number and the floating-point number to be logically shifted is a denormalized floating-point number, setting n = Δe-1; otherwise, setting n = Δe.
  4. The device for adding a plurality of floating-point numbers according to claim 3, characterized in that the calculation and shift module logically shifts the mantissa bits of a floating-point number by:
    prepending one hidden bit to the most significant bit of the mantissa bits of the floating-point number, wherein the hidden bit is 1 for a normalized floating-point number and 0 for a denormalized floating-point number;
    appending k zeros after the least significant bit of the mantissa bits of the floating-point number as significant bits;
    shifting the mantissa bits, with the added significant bits and hidden bit, right by n bits so as to discard the lowest n bits;
    taking the lowest bit of the shifted mantissa bits as a sticky bit, ORing the sticky bit with the n discarded bits, and updating the sticky bit with the result to obtain the final mantissa bits of the floating-point number.
  5. The device for adding a plurality of floating-point numbers according to claim 1, characterized in that the addition module comprises:
    a Wallace tree module, configured to add the plurality of floating-point numbers using a Wallace tree structure until they are reduced to two numbers;
    a final result accumulation module, configured to add the two numbers to obtain a first accumulated result, to add the ones' complements of the two numbers to obtain a second accumulated result, and to select, according to the most significant bit of the first accumulated result, either the first accumulated result or the second accumulated result as the accumulated result;
    a leading zero prediction module, configured to perform logic operations on the two numbers to determine the position of the first significant bit of the accumulated result and thereby obtain the to-be-shifted value of the accumulated result.
  6. The device for adding a plurality of floating-point numbers according to claim 5, characterized in that the normalization module logically shifts the accumulated result according to the to-be-shifted value so that the first significant bit of the accumulated result is in the most significant position, and normalizes the shifted accumulated result to obtain the sign bit, exponent bits, and mantissa bits of the accumulated result.
  7. A method for adding a plurality of floating-point numbers, each floating-point number comprising a sign bit, exponent bits, and mantissa bits, characterized in that the method comprises:
    S1, preprocessing the plurality of floating-point numbers so that their exponent bits and sign bits are consistent;
    S2, adding the preprocessed floating-point numbers to obtain an accumulated result and a to-be-shifted value of the accumulated result, the accumulated result comprising a sign bit, exponent bits, and mantissa bits;
    S3, shifting the sign bit, exponent bits, and mantissa bits of the accumulated result according to the to-be-shifted value to obtain a normalized accumulated result.
  8. The method for adding a plurality of floating-point numbers according to claim 7, characterized in that step S1 comprises:
    S11, comparing the exponent bits of the plurality of floating-point numbers in pairs in the form of a binary tree and selecting the largest exponent;
    S12, determining, from the relationship between the exponent of each floating-point number and the exponent of the floating-point number having the largest exponent, the number of bits n by which each floating-point number is to be logically shifted, and logically shifting the mantissa bits of the corresponding floating-point number so that the exponent of every floating-point number equals the largest exponent, and further making the sign bit of every floating-point number consistent with the sign bit of the floating-point number having the largest exponent, wherein the two's complement of the mantissa bits is taken when the sign bit of a floating-point number is changed.
  9. The method for adding a plurality of floating-point numbers according to claim 8, characterized in that, in step S12, determining the number of bits n of the logical shift comprises:
    calculating the difference Δe between the largest exponent and the exponent of the floating-point number to be logically shifted;
    if the floating-point number having the largest exponent is a normalized floating-point number and the floating-point number to be logically shifted is a denormalized floating-point number, setting n = Δe-1; otherwise, setting n = Δe.
  10. The method for adding a plurality of floating-point numbers according to claim 8, characterized in that, in step S12, logically shifting the mantissa bits of a floating-point number comprises:
    prepending one hidden bit to the most significant bit of the mantissa bits of the floating-point number, wherein the hidden bit is 1 for a normalized floating-point number and 0 for a denormalized floating-point number;
    appending k zeros after the least significant bit of the mantissa bits of the floating-point number as significant bits;
    shifting the mantissa bits, with the added significant bits and hidden bit, right by n bits so as to discard the lowest n bits;
    taking the lowest bit of the shifted mantissa bits as a sticky bit, ORing the sticky bit with the n discarded bits, and updating the sticky bit with the result to obtain the final mantissa bits of the floating-point number.
  11. The method for adding a plurality of floating-point numbers according to claim 7, characterized in that step S2 comprises:
    S21, adding the plurality of floating-point numbers using a Wallace tree structure until they are reduced to two numbers;
    S22, adding the two numbers to obtain a first accumulated result, adding the ones' complements of the two numbers to obtain a second accumulated result, and selecting, according to the most significant bit of the first accumulated result, either the first accumulated result or the second accumulated result as the accumulated result;
    S23, performing logic operations on the two numbers to determine the position of the first significant bit of the accumulated result and thereby obtain the to-be-shifted value of the accumulated result.
  12. The method for adding a plurality of floating-point numbers according to claim 11, characterized in that step S3 comprises logically shifting the accumulated result according to its to-be-shifted value so that the first significant bit of the accumulated result is in the most significant position, and normalizing the shifted accumulated result to obtain the sign bit, exponent bits, and mantissa bits of the accumulated result.
PCT/CN2016/080126 2016-04-25 2016-04-25 Device and method for adding up plurality of floating point numbers WO2017185203A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/080126 WO2017185203A1 (en) 2016-04-25 2016-04-25 Device and method for adding up plurality of floating point numbers


Publications (1)

Publication Number Publication Date
WO2017185203A1 (en) 2017-11-02

Family

ID=60160595


Country Status (1)

Country Link
WO (1) WO2017185203A1 (en)


Similar Documents

Publication Publication Date Title
CN107305485B (en) Device and method for performing addition of multiple floating point numbers
US9483232B2 (en) Data processing apparatus and method for multiplying floating point operands
WO2017181342A1 (en) Non-linear function computing device and method
WO2017185203A1 (en) Device and method for adding up plurality of floating point numbers
US10140092B2 (en) Closepath fast incremented sum in a three-path fused multiply-add design
CN107608655B (en) Method for executing FMA instruction in microprocessor and microprocessor
CN1928809A (en) System, apparatus and method for performing floating-point operations
CN110955406A (en) Floating point dynamic range extension
US11106431B2 (en) Apparatus and method of fast floating-point adder tree for neural networks
CN106250098B (en) Apparatus and method for controlling rounding when performing floating point operations
US8788561B2 (en) Arithmetic circuit, arithmetic processing apparatus and method of controlling arithmetic circuit
Wahba et al. Area efficient and fast combined binary/decimal floating point fused multiply add unit
KR102481418B1 (en) Method and apparatus for fused multiply-add
WO2023124372A1 (en) Floating-point number processing apparatus and method, electronic device, storage medium, and chip
KR100465371B1 (en) Apparatus and method for designing a floating point ALU that performs addition and rounding operations in parallel
Vázquez et al. Iterative algorithm and architecture for exponential, logarithm, powering, and root extraction
CN111796798A (en) Fixed-point and floating-point converter, processor, method and storage medium
JP2015531927A (en) Modal interval calculation based on decoration composition
CN114077419A (en) Method and system for processing floating point numbers
CN113377334B (en) Floating point data processing method and device and storage medium
KR101922462B1 (en) A data processing apparatus and method for performing a shift function on a binary number
US20160085508A1 (en) Optimized structure for hexadecimal and binary multiplier array
US20120259903A1 (en) Arithmetic circuit, arithmetic processing apparatus and method of controlling arithmetic circuit
Ravi et al. Segmentation of blood vessels and optic disc in retinal images
US20230334117A1 (en) Method and system for calculating dot products

Legal Events

Date Code Title Description
NENP Non-entry into the national phase. Ref country code: DE
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 16899718; Country of ref document: EP; Kind code of ref document: A1
122 EP: PCT application non-entry in European phase. Ref document number: 16899718; Country of ref document: EP; Kind code of ref document: A1