CN114327361B - 21-bit floating-point adder - Google Patents

Publication number
CN114327361B
CN114327361B CN202210217664.8A CN202210217664A CN114327361B CN 114327361 B CN114327361 B CN 114327361B CN 202210217664 A CN202210217664 A CN 202210217664A CN 114327361 B CN114327361 B CN 114327361B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210217664.8A
Other languages
Chinese (zh)
Other versions
CN114327361A (en)
Inventor
尚德龙
郝美琪
乔树山
周玉梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Nanjing Intelligent Technology Research Institute
Original Assignee
Zhongke Nanjing Intelligent Technology Research Institute

Abstract

The invention relates to a 21-bit floating-point adder comprising an order-matching shift addition circuit, a normalization circuit and an output circuit. The output of the order-matching shift addition circuit is connected to the input of the normalization circuit, and the output of the normalization circuit is connected to the input of the output circuit. The order-matching shift addition circuit performs order-matching shift addition on the addend and the summand to obtain a temporary addition result; the normalization circuit performs sign bit calculation and integer part shifting according to the temporary addition result to obtain a temporary output result; the output circuit normalizes the temporary output result and outputs it. The addend and the summand are both 21-bit numbers, each comprising, from high to low, a 1-bit sign bit, a 6-bit order code, a 4-bit integer part and a 10-bit fractional part. The invention improves the working frequency of the adder.

Description

21-bit floating-point adder
Technical Field
The invention relates to the technical field of adders, in particular to a 21-bit floating-point adder.
Background
The addition operation is the basis of other operations, and the performance of the whole operation circuit can be effectively improved by improving the operation performance of the adder, so that the speed and the energy efficiency of the convolutional neural network are improved, and the performance of a chip is finally improved. At present, the operation performance of the adder needs to be improved.
Disclosure of Invention
The invention aims to provide a 21-bit floating-point adder, which improves the working frequency of the adder.
In order to achieve the purpose, the invention provides the following scheme:
a 21-bit floating-point adder, comprising: an order-matching shift addition circuit, a normalization circuit and an output circuit;
the output of the order-matching shift addition circuit is connected to the input of the normalization circuit, and the output of the normalization circuit is connected to the input of the output circuit; the order-matching shift addition circuit is used for performing order-matching shift addition on the addend and the summand to obtain a temporary addition result; the normalization circuit is used for performing sign bit calculation and integer part shifting according to the temporary addition result to obtain a temporary output result; the output circuit is used for normalizing the temporary output result and outputting the normalized result;
the addend and the summand are both 21-bit numbers, and each 21-bit number comprises, from high to low, a 1-bit sign bit, a 6-bit order code, a 4-bit integer part and a 10-bit fractional part;
the decimal expression of the 21-bit number is as follows:
[Formula image not reproduced in the source text]
where mix21(10) denotes the decimal value of the 21-bit number, S denotes the sign bit, E denotes the order code, I denotes the integer part, and M denotes the fractional part.
Optionally, the order-matching shift addition circuit performing order-matching shift addition on the addend and the summand to obtain a temporary addition result specifically includes:
recording the larger of the order code of the addend and the order code of the summand as a third order code;
if the order code of the addend is smaller than the order code of the summand, the order-matching shift concatenation result of the addend is the concatenation of the integer part and the fractional part of the addend shifted right by the difference between the two order codes, and the order-matching shift concatenation result of the summand is the concatenation of the integer part and the fractional part of the summand;
if the order code of the addend is greater than the order code of the summand, the order-matching shift concatenation result of the addend is the concatenation of the integer part and the fractional part of the addend, and the order-matching shift concatenation result of the summand is the concatenation of the integer part and the fractional part of the summand shifted right by the difference between the two order codes, the difference being computed by complement addition of the order codes;
if the order code of the addend is equal to the order code of the summand, the order-matching shift concatenation result of the addend is the concatenation of the integer part and the fractional part of the addend, and the order-matching shift concatenation result of the summand is the concatenation of the integer part and the fractional part of the summand;
if the sign bit of the addend is the same as the sign bit of the summand, the temporary addition result is the sum of the order-matching shift concatenation result of the addend and the order-matching shift concatenation result of the summand;
if the sign bit of the addend is 0, the complement of the order-matching shift concatenation result of the addend is equal to that result itself, and if the sign bit of the addend is 1, the complement is that result inverted plus 1; if the sign bit of the summand is 0, the complement of the order-matching shift concatenation result of the summand is equal to that result itself, and if the sign bit of the summand is 1, the complement is that result inverted plus 1;
and if the sign bit of the addend is different from the sign bit of the summand, the temporary addition result is the sum of the complement of the order-matching shift concatenation result of the addend and the complement of the order-matching shift concatenation result of the summand.
Optionally, the normalization circuit performing sign bit calculation and integer part shifting according to the temporary addition result to obtain a temporary output result specifically includes:
when the addend and the summand are both positive, the sign bit to be calculated is 0; when the addend and the summand are both negative, the sign bit to be calculated is 1; when one of the addend and the summand is positive and the other is negative, the sign bit to be calculated is the sum of the sign bit of the addend, the sign bit of the summand and the most significant bit of the temporary addition result;
when the sign bit of the addend is the same as the sign bit of the summand, if the most significant bit of the temporary addition result is 1 and the third order code is 111111, the temporary output result is the temporary addition result, and if the most significant bit of the temporary addition result is 1 and the third order code is not 111111, the temporary output result is the temporary addition result shifted right by 4 bits;
when the sign bit of the addend is different from the sign bit of the summand, if the most significant bit of the temporary addition result is 1, the temporary output result is the temporary addition result shifted right by 4 bits;
when the sign bit of the addend is different from the sign bit of the summand, if the most significant bit of the temporary addition result is not 1, it is determined whether the 4 bits starting from the second-highest bit of the temporary addition result are all 0; if so, the temporary output result is the temporary addition result shifted left by 4 bits, and if not, the temporary output result is the temporary addition result.
Optionally, the output circuit normalizing the temporary output result and outputting the normalized result specifically includes:
if the lower 14 bits of the temporary output result are all 0, the output of the output circuit is 0_000000_0000_0000000000;
if the lower 14 bits of the temporary output result are not all 0, and the sign bit to be calculated and the most significant bit of the temporary output result are both 0, the output of the output circuit is the concatenation of the sign bit to be calculated, the third order code and the lower 14 bits of the temporary output result;
if the lower 14 bits of the temporary output result are not all 0, the sign bit to be calculated is 0 and the most significant bit of the temporary output result is 1, the output of the output circuit is 0_111111_0000_0000000000;
if the lower 14 bits of the temporary output result are not all 0, the sign bit to be calculated is 1, the most significant bit of the temporary output result is 0, and the sign bit of the addend is the same as the sign bit of the summand, the output of the output circuit is the concatenation of the sign bit to be calculated, the third order code and the lower 14 bits of the temporary output result;
if the lower 14 bits of the temporary output result are not all 0, the sign bit to be calculated is 1, the most significant bit of the temporary output result is 0, and the sign bit of the addend is different from the sign bit of the summand, the output of the output circuit is the concatenation of the sign bit to be calculated, the third order code and the lower 14 bits of the temporary output result inverted plus 1;
if the lower 14 bits of the temporary output result are not all 0, the sign bit to be calculated is 1 and the most significant bit of the temporary output result is 1, the output of the output circuit is 1_111111_0000_0000000000.
Optionally, the adder further comprises a first complement generator and a second complement generator; the output of the first complement generator and the output of the second complement generator are both connected to the order-matching shift addition circuit; the first complement generator is used for calculating the complement of the concatenation of the integer part and the fractional part of the addend, and also the complement of the concatenation of the integer part and the fractional part of the summand; the second complement generator is used for calculating the complement of the order code.
Optionally, the adder further comprises a clock signal, and the clock signal is connected to the order-matching shift addition circuit, the normalization circuit and the output circuit respectively.
Optionally, the adder further comprises an enable input signal, and the enable input signal is connected to the order-matching shift addition circuit, the normalization circuit and the output circuit respectively.
Optionally, the adder further comprises a reset signal, and the reset signal is connected to the order-matching shift addition circuit, the normalization circuit and the output circuit respectively.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the addend and the summand are both 21-bit numbers, each comprising, from high to low, a 1-bit sign bit, a 6-bit order code, a 4-bit integer part and a 10-bit fractional part; the format therefore has fewer exponent bits and requires fewer shifts, so the addition is faster and the working frequency of the adder is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of a 21-bit floating-point adder according to the present invention;
FIG. 2 is a diagram illustrating an operation process of a 21-bit floating-point adder according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a 21-bit floating-point adder, which improves the working frequency of the adder.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
FIG. 1 is a schematic diagram of a 21-bit floating-point adder according to the present invention. As shown in FIG. 1, the 21-bit floating-point adder includes: an order-matching shift addition circuit 103, a normalization circuit 104 and an output circuit 105;
the output of the order-matching shift addition circuit 103 is connected to the input of the normalization circuit 104, and the output of the normalization circuit 104 is connected to the input of the output circuit 105; the order-matching shift addition circuit 103 is used for performing order-matching shift addition on the addend and the summand to obtain a temporary addition result; the normalization circuit 104 is used for performing sign bit calculation and integer part shifting according to the temporary addition result to obtain a temporary output result; the output circuit 105 normalizes the temporary output result and outputs the normalized result.
The addend and the summand are both 21-bit numbers, and each 21-bit number comprises, from high to low, a 1-bit sign bit, a 6-bit order code, a 4-bit integer part and a 10-bit fractional part.
The decimal expression of the 21-bit number is:
[Formula image not reproduced in the source text]
where mix21(10) denotes the decimal value of the 21-bit number, S denotes the sign bit, E denotes the order code, I denotes the integer part, and M denotes the fractional part.
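The formula itself survives only as an image reference in this text. The following is a hedged reconstruction, not the patent's own formula: it is one reading that agrees with the field definitions above and with the maximum positive value 2^124 × 15.9990234 quoted later in the description (S = 0, E = 63, I = 15, M = 1023). The bias of 32 and the factor of 4 on the order code are inferred assumptions.

```latex
\mathrm{mix21}_{(10)} \;=\; (-1)^{S} \times 2^{\,4\,(E-32)} \times \left( I + \frac{M}{2^{10}} \right)
```

Under this reading, a change of 1 in the order code corresponds to a 4-bit shift of the concatenated integer and fractional parts, which matches the 4-bit shifts used in the normalization step.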
The 21-bit floating-point adder further comprises a first complement generator 101 and a second complement generator 102. The output of the first complement generator 101 and the output of the second complement generator 102 are both connected to the order-matching shift addition circuit 103. The first complement generator 101 is used for calculating the complement of the concatenation of the integer part and the fractional part of the addend, and also the complement of the concatenation of the integer part and the fractional part of the summand; the second complement generator 102 is used for calculating the complement of the order code.
The 21-bit floating-point adder further comprises a clock signal, which is connected to the order-matching shift addition circuit 103, the normalization circuit 104 and the output circuit 105 respectively.
The 21-bit floating-point adder further comprises an enable input signal, which is connected to the order-matching shift addition circuit 103, the normalization circuit 104 and the output circuit 105 respectively.
The 21-bit floating-point adder further comprises a reset signal, which is connected to the order-matching shift addition circuit 103, the normalization circuit 104 and the output circuit 105 respectively.
The input signals include a clock signal clock, a reset signal reset, an enable signal enable1, the 21-bit addend add1 and the 21-bit summand add2; the output signals include an enable output signal enable2 and the addition result. The 21-bit floating-point adder includes 5 components: the first complement generator 101, the second complement generator 102, the order-matching shift addition circuit 103, the normalization circuit 104 and the output circuit 105.
The order-matching shift addition circuit 103 performs the order code shift operation, the shift operation on the integer and fractional parts and the sign bit calculation, and obtains the addition result.
The normalization circuit 104 performs order code accumulation and, when the integer part is 0, the mantissa shift.
The output circuit 105 handles the boundary cases based on the result of the normalization performed in the second step, and then produces the addition result signal and the output enable signal of the whole 21-bit adder circuit.
The 21-bit floating-point adder (21-bit adder) is referred to as the mix21 adder. The 21-bit addend and the 21-bit summand are defined as the mix21 data type, shown in Table 1. The 21 bits are divided into four parts: a sign bit (S), an order code (E), an integer part (I) and a fractional part (M). The most significant bit is the sign bit, where 0 represents a positive number and 1 represents a negative number; bits 19 to 14 are the order code; bits 13 to 10 are the integer part; the lower 10 bits are the fractional part.
TABLE 1 mix21 data type
Type | S | E | I | M
Bit position | 20 | 19:14 | 13:10 | 9:0
Bit width | 1 | 6 | 4 | 10
Definition | Sign bit | Order code | Integer part | Fractional part
Condition | Any value | Any value | Any value other than 0 | Any value
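The bit layout in Table 1 can be illustrated with a short Python sketch. The function names `unpack_mix21` and `pack_mix21` are hypothetical and chosen only for illustration; the bit positions are taken directly from the table.

```python
def unpack_mix21(word: int) -> tuple:
    """Split a 21-bit mix21 word into (S, E, I, M) following Table 1."""
    if not 0 <= word < (1 << 21):
        raise ValueError("a mix21 word is 21 bits wide")
    s = (word >> 20) & 0x1    # bit 20: sign bit
    e = (word >> 14) & 0x3F   # bits 19:14: order code
    i = (word >> 10) & 0xF    # bits 13:10: integer part
    m = word & 0x3FF          # bits 9:0: fractional part
    return s, e, i, m


def pack_mix21(s: int, e: int, i: int, m: int) -> int:
    """Concatenate the four fields back into one 21-bit word."""
    return ((s & 0x1) << 20) | ((e & 0x3F) << 14) | ((i & 0xF) << 10) | (m & 0x3FF)


word = 0b0_100001_0001_1000000000
print(unpack_mix21(word))  # (0, 33, 1, 512)
```

Packing the unpacked fields again returns the original word, which is a quick way to check that the field widths add up to 21 bits.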
From the data type in Table 1, the decimal expression of mix21 can be calculated:
[Formula image not reproduced in the source text]
that is:
[Formula image not reproduced in the source text]
For example, for the mix21 number 0_100001_0001_1000000000, Table 1 gives S = 0, E = 100001, I = 0001 and M = 1000000000, so that:
[Formula image not reproduced in the source text]
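Because the formula images are missing here, the decoding below is only a sketch under an assumption: the value is taken as (-1)^S × 2^(4 × (E - 32)) × (I + M / 1024), a reading chosen because it reproduces the maximum positive value 2^124 × 15.9990234 stated in the description. The function name `mix21_to_float` is hypothetical, and the special encodings (0, NaN, Infinity, min) are deliberately not handled.

```python
def mix21_to_float(word: int) -> float:
    """Decode a mix21 word to a Python float (assumed formula, see note)."""
    s = (word >> 20) & 0x1
    e = (word >> 14) & 0x3F
    i = (word >> 10) & 0xF
    m = word & 0x3FF
    # ASSUMPTION: the order code is biased by 32 and each step is worth
    # 4 binary places. This is inferred from the stated maximum value,
    # not read from the patent's formula.
    magnitude = (i + m / 1024.0) * 2.0 ** (4 * (e - 32))
    return -magnitude if s else magnitude


# The example word from the text: S = 0, E = 33, I = 1, M = 512.
print(mix21_to_float(0b0_100001_0001_1000000000))  # 24.0
```

Under this assumption the example word decodes to 1.5 × 2^4 = 24, and the all-ones positive word decodes to 15.9990234375 × 2^124.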
The conditions on the mix21 data type in Table 1 are: S is any value, E is any value, I is any value other than 0, and M is any value. Under these conditions, the mix21 representation is ambiguous in 6 boundary cases.
For example, to express the decimal number 0 in the mix21 data type of Table 1, S is any value, E is any value, I is 0 and M is 0. Both 1_100001_0000_0000000000 and 0_100101_0000_0000000000 would then represent 0, so 0 is fixed to the single representation 0_000000_0000_0000000000.
As a further example, a positive decimal value greater than the maximum positive value representable in mix21 (2^124 × 15.9990234, where S is 0, E is the maximum value 63, I is the maximum value 15 and M is the maximum value 1023) cannot be represented. Such a number is fixed to +Infinity and denoted 0_111111_0000_0000000000.
The boundary cases of the mix21 data type are defined as follows:
0 can only be expressed in the format of Table 2.
TABLE 2 Expression of 0 in the mix21 data type
Type | S | E | I | M
Value | 0 | 0 | 0 | 0
NaN indicates that the number is not a valid value and is used for skipping operations, as shown in Table 3.
TABLE 3 Expression of NaN in the mix21 data type
Type | S | E | I | M
Value | 0 | 32 | 0 | 0
+Infinity, i.e., a positive overflow value that cannot be represented by this type and that can be obtained by dividing any positive number by 0, is shown in Table 4.
TABLE 4 Expression of +Infinity in the mix21 data type
Type | S | E | I | M
Value | 0 | 63 | 0 | 0
-Infinity, i.e., a negative overflow value that cannot be represented by this type and that can be obtained by dividing any negative number by 0, is shown in Table 5.
TABLE 5 Expression of -Infinity in the mix21 data type
Type | S | E | I | M
Value | 1 | 63 | 0 | 0
+min, i.e., a positive underflow value that cannot be represented by this type, is shown in Table 6.
TABLE 6 Expression of +min in the mix21 data type
Type | S | E | I | M
Value | 0 | 1 | 0 | 0
-min, i.e., a negative underflow value that cannot be represented by this type, is shown in Table 7.
TABLE 7 Expression of -min in the mix21 data type
Type | S | E | I | M
Value | 1 | 1 | 0 | 0
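The six reserved encodings of Tables 2 to 7 can be collected into a small lookup, sketched below in Python. The helper `pack` and the names `SPECIALS` and `special_name` are hypothetical; only the field values come from the tables.

```python
def pack(s: int, e: int, i: int, m: int) -> int:
    """Concatenate S (1 bit), E (6 bits), I (4 bits), M (10 bits)."""
    return (s << 20) | (e << 14) | (i << 10) | m


# Reserved encodings, all with I = 0 and M = 0 (Tables 2 to 7).
SPECIALS = {
    pack(0, 0, 0, 0): "0",
    pack(0, 32, 0, 0): "NaN",
    pack(0, 63, 0, 0): "+Infinity",
    pack(1, 63, 0, 0): "-Infinity",
    pack(0, 1, 0, 0): "+min",
    pack(1, 1, 0, 0): "-min",
}


def special_name(word: int):
    """Return the name of a reserved encoding, or None for other words."""
    return SPECIALS.get(word)


for word, name in SPECIALS.items():
    print(format(word, "021b"), name)
```

Every reserved word has an integer part of 0, so none of them collides with an ordinary value, for which Table 1 requires a nonzero integer part.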
The operation process of the 21-bit floating-point adder is shown in FIG. 2. The actual addition takes at least three cycles, and each operation is performed on the rising edge of the clock signal.
The 21-bit floating-point adder of the invention first enters the initialization state (3'd111) and, according to the clock signal, the reset signal, the enable signal, the 21-bit addend and the 21-bit summand, initializes the sign bit (s1), order code (e1), integer part (i1) and fractional part (m1) of the addend and the sign bit (s2), order code (e2), integer part (i2) and fractional part (m2) of the summand.
The actual operation process of the 21-bit floating-point adder comprises three cycles: 3'd001, 3'd010 and 3'd100.
The first cycle (3'd001) performs the calculation of the order-matching shift addition circuit 103, that is, the calculation of the order code and the shifting of the integer and fractional parts according to the order code calculation result.
The order-matching shift addition circuit 103 performs order-matching shift addition on the addend and the summand to obtain a temporary addition result, which specifically includes:
The larger of the addend order code e1 and the summand order code e2 is recorded as the third order code e3.
If the addend order code e1 is smaller than the summand order code e2, the order-matching shift concatenation result reg_A of the addend is the concatenation of the integer part i1 and the fractional part m1 of the addend shifted right by the sum of the complement of the addend order code e1 and the summand order code e2; that is, reg_A is obtained by concatenating i1 and m1 and shifting the result right by a number of bits equal to the difference between e1 and e2. The order-matching shift concatenation result reg_B of the summand is the concatenation of the integer part i2 and the fractional part m2 of the summand.
If the addend order code e1 is greater than the summand order code e2, the order-matching shift concatenation result reg_A of the addend is the concatenation of the integer part i1 and the fractional part m1 of the addend. The order-matching shift concatenation result reg_B of the summand is obtained by concatenating the integer part i2 and the fractional part m2 of the summand and shifting the result right by a number of bits equal to the difference between e1 and e2, the difference being computed by complement addition of the order codes.
If the addend order code e1 is equal to the summand order code e2, the order-matching shift concatenation result reg_A of the addend is the concatenation of the integer part i1 and the fractional part m1 of the addend, and the order-matching shift concatenation result reg_B of the summand is the concatenation of the integer part i2 and the fractional part m2 of the summand.
The complement com_A of the order-matching shift concatenation result reg_A of the addend and the complement com_B of the order-matching shift concatenation result reg_B of the summand are then calculated.
If the sign bit s1 of the addend is 0, the complement com_A is equal to reg_A itself, i.e., com_A = reg_A; if the sign bit s1 of the addend is 1, com_A is reg_A inverted plus 1. Likewise, if the sign bit s2 of the summand is 0, com_B = reg_B; if the sign bit s2 of the summand is 1, com_B is reg_B inverted plus 1.
If the sign bit s1 of the addend is the same as the sign bit s2 of the summand, the temporary addition result temp_add is the sum of the order-matching shift concatenation result reg_A of the addend and the order-matching shift concatenation result reg_B of the summand.
If the sign bit s1 of the addend is different from the sign bit s2 of the summand, the temporary addition result temp_add is the sum of the complement com_A of reg_A and the complement com_B of reg_B.
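The first-cycle steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's circuit: the register widths are not given in the text, so the sketch assumes that the complements are taken over the 14 bits of the concatenated integer and fractional parts and that temp_add is 15 bits wide. The shift count follows the text literally (the difference between e1 and e2). The function name `align_and_add` is hypothetical.

```python
REG_BITS = 14  # width of the concatenation {I, M}: 4 + 10 bits
ADD_BITS = 15  # ASSUMPTION: temp_add carries one extra bit above {I, M}


def align_and_add(s1, e1, i1, m1, s2, e2, i2, m2):
    """Return (e3, temp_add) for the order-matching shift addition step."""
    e3 = max(e1, e2)               # third order code: the larger one
    reg_a = (i1 << 10) | m1        # concatenation of i1 and m1
    reg_b = (i2 << 10) | m2        # concatenation of i2 and m2
    if e1 < e2:
        reg_a >>= e2 - e1          # shift the operand with the smaller order code
    elif e1 > e2:
        reg_b >>= e1 - e2
    mask = (1 << REG_BITS) - 1
    # "inverted plus 1" is the two's complement, taken here over 14 bits
    com_a = (-reg_a) & mask if s1 else reg_a
    com_b = (-reg_b) & mask if s2 else reg_b
    if s1 == s2:
        temp_add = reg_a + reg_b   # same sign: add the magnitudes
    else:
        temp_add = com_a + com_b   # opposite signs: add the complements
    return e3, temp_add & ((1 << ADD_BITS) - 1)


# 1.5 + 2.25 at the same order code: fields (I=1, M=512) and (I=2, M=256)
print(align_and_add(0, 33, 1, 512, 0, 33, 2, 256))  # (33, 3840)
```

With these widths, two 14-bit magnitudes can never overflow the 15-bit sum, and in the opposite-sign case the top bit of temp_add records whether the positive operand was at least as large as the negative one.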
The second cycle (3'd010) is the normalization state, in which the sign bit is calculated, the integer part is shifted for normalization, and the order code is accumulated.
The normalization circuit 104 performs sign bit calculation and integer part shifting according to the temporary addition result to obtain a temporary output result, which specifically includes:
When the addend and the summand are both positive, the sign bit s3 to be calculated is 0; when the addend and the summand are both negative, the sign bit s3 to be calculated is 1; when one of the addend and the summand is positive and the other is negative, the sign bit s3 to be calculated is the sum of the sign bit s1 of the addend, the sign bit s2 of the summand and the most significant bit of the temporary addition result.
When the sign bit s1 of the addend is the same as the sign bit s2 of the summand, if the most significant bit of the temporary addition result is 1 and the third order code e3 is 111111, the temporary output result is the temporary addition result; if the most significant bit of the temporary addition result is 1 and the third order code e3 is not 111111, the temporary output result is the temporary addition result shifted right by 4 bits.
When the sign bit s1 of the addend is different from the sign bit s2 of the summand, if the most significant bit of the temporary addition result is 1, the temporary output result is the temporary addition result shifted right by 4 bits.
When the sign bit s1 of the addend is different from the sign bit s2 of the summand, if the most significant bit of the temporary addition result is not 1, it is determined whether the 4 bits starting from the second-highest bit of the temporary addition result are all 0; if so, the temporary output result is the temporary addition result shifted left by 4 bits, and if not, the temporary output result is the temporary addition result.
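Two parts of this step are sketched below in Python: the sign bit rule and the 4-bit right shift for the same-sign carry case. The sketch assumes a 15-bit temp_add whose top bit (bit 14) is the "most significant bit" of the text, and it reads the "sum" of three 1-bit values as a 1-bit sum, i.e. modulo 2; both are assumptions. The opposite-sign shifts are left out because the text does not pin down the register layout they act on. The function names are hypothetical.

```python
MSB = 14  # ASSUMPTION: temp_add is 15 bits wide, so its top bit is bit 14


def result_sign(s1: int, s2: int, temp_add: int) -> int:
    """Sign bit s3 of the result, following the three cases in the text."""
    if s1 == s2:
        return s1                    # both positive -> 0, both negative -> 1
    msb = (temp_add >> MSB) & 1
    return (s1 + s2 + msb) & 1       # 1-bit sum of s1, s2 and the top bit


def shift_on_carry(temp_add: int, e3: int) -> int:
    """Same-sign case: shift right by 4 bits when the top bit is set."""
    msb = (temp_add >> MSB) & 1
    if msb == 1 and e3 != 0b111111:
        return temp_add >> 4         # carry out of the integer part
    return temp_add                  # no carry, or order code already at maximum


print(result_sign(0, 1, 15616))      # 1
print(shift_on_carry(19968, 33))     # 1248
```

In the second call, the carry bit moves into the lowest integer-part position and the old integer part becomes the top of the fractional part, which is what a division by 16 of the concatenated value looks like.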
The third cycle (3'd100) is the output state, in which the normalized output is produced. Because of the boundary cases of the data type and the constraint on the integer part, the temporary result is normalized to obtain the final result F.
The output circuit 105 normalizes the temporary output result and outputs the result, which specifically includes:
If the lower 14 bits of the temporary output result are all 0, the output of the output circuit 105 is 0_000000_0000_0000000000.
If the lower 14 bits of the temporary output result are not all 0, and the sign bit to be calculated and the most significant bit of the temporary output result are both 0, the output of the output circuit 105 is the concatenation of the sign bit to be calculated, the third order code and the lower 14 bits of the temporary output result.
If the lower 14 bits of the temporary output result are not all 0, the sign bit to be calculated is 0 and the most significant bit of the temporary output result is 1, the output of the output circuit 105 is 0_111111_0000_0000000000.
If the lower 14 bits of the temporary output result are not all 0, the sign bit to be calculated is 1, the most significant bit of the temporary output result is 0, and the sign bit of the addend is the same as the sign bit of the summand, the output of the output circuit 105 is the concatenation of the sign bit to be calculated, the third order code and the lower 14 bits of the temporary output result.
If the lower 14 bits of the temporary output result are not all 0, the sign bit to be calculated is 1, the most significant bit of the temporary output result is 0, and the sign bit of the addend is different from the sign bit of the summand, the output of the output circuit 105 is the concatenation of the sign bit to be calculated, the third order code and the lower 14 bits of the temporary output result inverted plus 1.
If the lower 14 bits of the temporary output result are not all 0, the sign bit to be calculated is 1 and the most significant bit of the temporary output result is 1, the output of the output circuit 105 is 1_111111_0000_0000000000.
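The output-stage case analysis can be transcribed into a short Python sketch. It assumes a 15-bit temporary output result whose bit 14 is the "most significant bit" of the text; the function name `output_stage` and the constant names are hypothetical.

```python
POS_INF = 0b0_111111_0000_0000000000
NEG_INF = 0b1_111111_0000_0000000000


def output_stage(temp_out: int, s3: int, e3: int, s1: int, s2: int) -> int:
    """Select the 21-bit output word F from the temporary output result."""
    low14 = temp_out & 0x3FFF        # lower 14 bits: integer + fractional part
    msb = (temp_out >> 14) & 1       # ASSUMPTION: bit 14 is the top bit
    if low14 == 0:
        return 0                     # fixed encoding of 0
    if s3 == 0:
        if msb == 1:
            return POS_INF           # positive overflow
        return (e3 << 14) | low14    # {s3 = 0, e3, low14}
    if msb == 1:
        return NEG_INF               # negative overflow
    if s1 != s2:
        low14 = (-low14) & 0x3FFF    # inverted plus 1 over 14 bits
    return (1 << 20) | (e3 << 14) | low14


# Opposite-sign example: temp_out 15616 holds the 14-bit complement of 768.
print(format(output_stage(15616, 1, 33, 0, 1), "021b"))
```

The last branch recovers the magnitude of a negative opposite-sign result, since in that case the lower 14 bits hold a two's complement value rather than a magnitude.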
The three cycles of the actual operation of the 21-bit floating-point adder of the invention are briefly described below.
The first cycle (3'd001) calculates the order code and shifts the integer and fractional parts according to the order code calculation result.
In the order-matching shift addition stage, the order code of the addition result is the larger of e1 and e2 and is denoted e3. The shifted integer and fractional parts of the addend are denoted reg_A: if the addend order code e1 is greater than the summand order code e2, reg_A is the concatenation of i1 and m1; if e1 is smaller than e2, reg_A is obtained by concatenating i1 and m1 and shifting the result right by a number of bits equal to the difference between e1 and e2.
The shifted integer and fractional parts of the summand are denoted reg_B: if the addend order code e1 is smaller than the summand order code e2, reg_B is the concatenation of i2 and m2; if e1 is greater than e2, reg_B is obtained by concatenating i2 and m2 and shifting the result right by a number of bits equal to the difference between e1 and e2. The complements of reg_A and reg_B are com_A and com_B respectively, and each is determined by the sign bit of the corresponding operand: if the sign bit is 1, reg_A or reg_B is inverted and 1 is added to obtain com_A or com_B; if the sign bit is 0, com_A and com_B are reg_A and reg_B themselves. The temporary addition result is temp_add, with temp_add = (s1 == s2) ? reg_A + reg_B : com_A + com_B. That is, if s1 and s2 have the same sign, the temporary result is the sum of reg_A and reg_B; if s1 and s2 have opposite signs, the temporary result is the sum of com_A and com_B. The state then jumps to the normalization state.
The second cycle (3'd 010) is a normalized state, and sign bit calculation, shift operation of normalized integer part, and operation of order code accumulation are performed.
The sign bit s3 of the addition result is calculated, s3=0 when both the addend and the addend are positive, s3=1 when both the addend and the addend are negative, and s3 is the sum of the sign bit of the addend, and the highest bit of the provisional addition result when both the addend and the addend are positive and negative.
And judging whether the temporary addition result needs to be shifted or not according to whether the highest bit of the temporary addition result temp _ add is 1 or not.
Addend and summand are signed: when the highest bit of temp _ add is 1, if the level code e3 is the maximum value (111111), the state is directly transited to the third period output state, and the output enable signal is pulled high. If e3 is not the maximum value, a zero padding operation is performed, and the temporary addition result temp _ add is shifted to the right by 4 bits to obtain a new temp _ add ((temp _ add [ regwidth:0] = {4' b0, temp _ add [ mwidth: 0) ]), the output state is jumped to, and the output enable signal is pulled high.
Addend and summand have opposite signs: when the highest bit of temp_add is 1, a zero padding operation is performed on temp_add, that is, the temporary addition result temp_add is shifted to the right by 4 bits to obtain a new temp_add (temp_add[regwidth:0] = {4'b0, temp_add[mwidth:0]}). If the highest bit of temp_add is not 1, it is judged whether the 4 bits of temp_add starting from the second-highest bit are 0. If they are 0, temp_add is shifted to the left by 4 bits; otherwise, the state jumps to the output state and the output enable signal is pulled high.
The third cycle (3'd100) is the output state, in which the normalized output is produced. Because of the boundary conditions of the data type and the definition of the integer part, the final result F is obtained by normalizing the temporary addition result.
When the output enable signal is high, the following operation is performed: if the lower 14 bits of the temporary addition result temp_add are all 0, the output F is 0_000000_0000_0000000000; if the lower 14 bits of temp_add are not all 0, the output is determined by the sign bit s3 and the highest bit of temp_add. When the sign bit s3 and the highest bit of temp_add are 00, the output F is the splicing of s3, e3 and the lower 14 bits of temp_add. When the sign bit s3 and the highest bit of temp_add are 01, the output F is 0_111111_0000_0000000000. When the sign bit s3 and the highest bit of temp_add are 10, if s1 and s2 have the same sign, F is the splicing of s3, e3 and the lower 14 bits of temp_add, and if s1 and s2 have different signs, F is the splicing of s3, e3 and the lower 14 bits of temp_add inverted and incremented by 1. When the sign bit s3 and the highest bit of temp_add are 11, the output F is 1_111111_0000_0000000000.
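The case analysis of the output state can be sketched as follows. The function name output_stage and the constant LOW14 are hypothetical; the sketch follows only the cases listed above and returns F as a 21-bit integer.

```python
LOW14 = (1 << 14) - 1              # mask for the lower 14 bits of temp_add


def output_stage(s1, s2, s3, e3, temp_add):
    """Third-cycle sketch: build the 21-bit output F."""
    low = temp_add & LOW14
    msb = (temp_add >> 14) & 1
    if low == 0:
        return 0                                  # 0_000000_0000_0000000000
    if (s3, msb) == (0, 1):
        return 0b0_111111_0000_0000000000
    if (s3, msb) == (1, 1):
        return 0b1_111111_0000_0000000000
    if (s3, msb) == (1, 0) and s1 != s2:
        low = (~low + 1) & LOW14                  # invert and add 1
    return (s3 << 20) | (e3 << 14) | low          # splice s3, e3, lower 14 bits
```

For instance, with s3 = 0, e3 = 100010 and temp_add = 000010000100000, the sketch returns 0_100010_0001_0000100000.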
Example 1: decimal calculation (+240) + (+24) = (+264). The mix21 data type representations of +240 and +24 are shown in Table 8.
Table 8. mix21 data type representation of +240 and +24
Decimal    mix21 data type representation
+240       0_100001_1111_0000000000
+24        0_100001_0001_1000000000
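The encodings in Table 8 can be checked with a short decoding sketch. The function name decode_mix21 is hypothetical, and the formula used, value = (-1)^S × 16^(E - 32) × (I + M / 2^10), is an assumption inferred from the worked examples in this description; it is not a quotation of the decimal expression of mix21.

```python
def decode_mix21(bits):
    """Decode a mix21 bit string such as '0_100001_1111_0000000000'.

    ASSUMPTION: value = (-1)**S * 16**(E - 32) * (I + M / 2**10),
    inferred from the worked examples.
    """
    raw = bits.replace("_", "")
    if len(raw) != 21:
        raise ValueError("a mix21 number has exactly 21 bits")
    s = int(raw[0], 2)             # 1-bit sign bit
    e = int(raw[1:7], 2)           # 6-bit order code
    i = int(raw[7:11], 2)          # 4-bit integer part
    m = int(raw[11:], 2)           # 10-bit decimal part
    return (-1) ** s * 16.0 ** (e - 32) * (i + m / 1024)


print(decode_mix21("0_100001_1111_0000000000"))   # Table 8 entry for +240
print(decode_mix21("0_100001_0001_1000000000"))   # Table 8 entry for +24
```

Under this assumed formula the two rows of Table 8 decode to 240.0 and 24.0.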
In the first cycle, the order-matching shift addition stage, the order code of +24 is subtracted from the order code of +240; the difference is 0, so e3 = 100001. reg_A = com_A = 1111_0000000000, reg_B = com_B = 0001_1000000000, and the temporary addition result is temp_add = 1_0000_1000000000.
In the second cycle, the addend and the summand are both positive, so the sign bit of the addition result is s3 = 0. The normalization shift gives temp_add = 0_0001_0000100000.
In the third cycle, F = 0_100010_0001_0000100000.
Example 2: decimal calculation (+24) + (-1.125) = (+22.875). The mix21 data type representations of +24 and -1.125 are shown in Table 9.
Table 9. mix21 data type representation of +24 and -1.125
Decimal    mix21 data type representation
+24        0_100001_0001_1000000000
-1.125     1_100000_0001_0010000000
In the first cycle, the order-matching shift addition stage, the order code of -1.125 is subtracted from the order code of +24; the difference is 000001, and e3 = 100001, the larger of the two order codes. The integer part and decimal part of +24 do not need to be shifted: reg_A = com_A = 0001_1000000000. The integer part and decimal part of -1.125 are shifted, giving reg_B = 0000_0001001000, whose complement is com_B = 1111_1110111000. The temporary result of adding the two complements is temp_add = 1_0001_0110111000.
In the second cycle, the addend and the summand have opposite signs, so the sign bit of the addition result is the sum of the sign bit of the addend, the sign bit of the summand, and the highest bit of the temporary addition result: s3 = 1 + 0 + 1 = 0 (a 1-bit sum, so the carry is discarded).
In the third cycle, the output is determined according to s3 and temp_add.
{s3, temp_add} = 00, so:
F = {s3, e3, temp_add[regwidth-1:0]} = 0_100001_0001_0110111000.
Thus the calculation result F = 0_100001_0001_0110111000 is obtained, which evaluates to +22.875 according to the expression of mix21.
In the present invention, a 21-bit data type is designed and evaluated by inference on a MobileNet network, the hardware circuit is designed and described in a hardware description language, and finally the circuit is simulated and synthesized. It can be seen that the 21-bit adder has two advantages.
The present invention defines a new data type with fewer exponent bits, which results in fewer shifts and a higher speed. A simpler complement generation circuit reduces the chip area and raises the working frequency. The 21-bit floating-point adder is suitable for convolutional neural networks, improving their speed and energy efficiency and ultimately the performance of the chip.
The 21-bit adder of the new data type of the present invention thus has a higher operating efficiency.
The embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (8)

1. A 21-bit floating-point adder, comprising: an order-matching shift addition circuit, a normalization circuit and an output circuit;
the output of the order-matching shift addition circuit is connected with the input of the normalization circuit, and the output of the normalization circuit is connected with the input of the output circuit; the order-matching shift addition circuit is used for performing order-matching shift addition on the addend and the summand to obtain a temporary addition result; the normalization circuit is used for performing sign bit calculation and integer part shifting according to the temporary addition result to obtain a temporary output result; the output circuit is used for normalizing the temporary output result and outputting the normalized temporary output result;
the addend and the summand are both 21-bit numbers, and each 21-bit number comprises, in order from high to low, a 1-bit sign bit, a 6-bit order code, a 4-bit integer part and a 10-bit decimal part;
the decimal expression of the 21-bit number is as follows:
[Formula image: decimal expression of the mix21 number; not reproduced in this text]
wherein mix21(10) represents the decimal value of the 21-bit number, S represents the sign bit, E represents the order code, I represents the integer part, and M represents the decimal part.
2. The 21-bit floating-point adder according to claim 1, wherein the order-matching shift addition circuit performing order-matching shift addition on the addend and the summand to obtain a temporary addition result specifically comprises:
recording the larger of the order code of the addend and the order code of the summand as a third order code;
if the order code of the addend is smaller than the order code of the summand, the order-matching shift splicing result of the addend is the splicing of the integer part and the decimal part of the addend shifted to the right by the difference between the order code of the summand and the order code of the addend, and the order-matching shift splicing result of the summand is the splicing of the integer part and the decimal part of the summand;
if the order code of the addend is greater than the order code of the summand, the order-matching shift splicing result of the addend is the splicing of the integer part and the decimal part of the addend, and the order-matching shift splicing result of the summand is the splicing of the integer part and the decimal part of the summand shifted to the right by the sum of the order code of the addend and the complement of the order code of the summand;
if the order code of the addend is equal to the order code of the summand, the order-matching shift splicing result of the addend is the splicing of the integer part and the decimal part of the addend, and the order-matching shift splicing result of the summand is the splicing of the integer part and the decimal part of the summand;
if the sign bit of the addend is the same as the sign bit of the summand, the temporary addition result is the sum of the order-matching shift splicing result of the addend and the order-matching shift splicing result of the summand;
if the sign bit of the addend is 0, the complement of the order-matching shift splicing result of the addend is equal to the order-matching shift splicing result of the addend, and if the sign bit of the addend is 1, the complement of the order-matching shift splicing result of the addend is the bitwise negation of the order-matching shift splicing result of the addend plus 1; if the sign bit of the summand is 0, the complement of the order-matching shift splicing result of the summand is equal to the order-matching shift splicing result of the summand, and if the sign bit of the summand is 1, the complement of the order-matching shift splicing result of the summand is the bitwise negation of the order-matching shift splicing result of the summand plus 1;
and if the sign bit of the addend is different from the sign bit of the summand, the temporary addition result is the sum of the complement of the order-matching shift splicing result of the addend and the complement of the order-matching shift splicing result of the summand.
3. The 21-bit floating-point adder according to claim 2, wherein the normalization circuit performing sign bit calculation and integer part shifting according to the temporary addition result to obtain a temporary output result specifically comprises:
when the addend and the summand are both positive numbers, the sign bit to be calculated is 0; when the addend and the summand are both negative numbers, the sign bit to be calculated is 1; when one of the addend and the summand is a positive number and the other is a negative number, the sign bit to be calculated is the sum of the sign bit of the addend, the sign bit of the summand and the highest bit of the temporary addition result;
when the sign bit of the addend is the same as the sign bit of the summand, if the highest bit of the temporary addition result is 1 and the third order code is 111111, the temporary output result is the temporary addition result, and if the highest bit of the temporary addition result is 1 and the third order code is not 111111, the temporary output result is the temporary addition result shifted to the right by 4 bits;
when the sign bit of the addend is different from the sign bit of the summand, if the highest bit of the temporary addition result is 1, the temporary output result is the temporary addition result shifted to the right by 4 bits;
when the sign bit of the addend is different from the sign bit of the summand, if the highest bit of the temporary addition result is not 1, it is judged whether the 4 bits of the temporary addition result starting from the second-highest bit are 0; if so, the temporary output result is the temporary addition result shifted to the left by 4 bits, and if not, the temporary output result is the temporary addition result.
4. The 21-bit floating-point adder according to claim 3, wherein the output circuit normalizing the temporary output result and outputting the normalized result specifically comprises:
if the lower 14 bits of the temporary output result are all 0, the output of the output circuit is 0_000000_0000_0000000000;
if the lower 14 bits of the temporary output result are not all 0, and the sign bit to be calculated and the highest bit of the temporary output result are both 0, the output of the output circuit is the splicing of the sign bit to be calculated, the third order code and the lower 14 bits of the temporary output result;
if the lower 14 bits of the temporary output result are not all 0, and the sign bit to be calculated is 0 and the highest bit of the temporary output result is 1, the output of the output circuit is 0_111111_0000_0000000000;
if the lower 14 bits of the temporary output result are not all 0, the sign bit to be calculated is 1 and the highest bit of the temporary output result is 0, and the sign bit of the addend is the same as the sign bit of the summand, the output of the output circuit is the splicing of the sign bit to be calculated, the third order code and the lower 14 bits of the temporary output result;
if the lower 14 bits of the temporary output result are not all 0, the sign bit to be calculated is 1 and the highest bit of the temporary output result is 0, and the sign bit of the addend is different from the sign bit of the summand, the output of the output circuit is the splicing of the sign bit to be calculated, the third order code, and the lower 14 bits of the temporary output result inverted and incremented by 1;
if the lower 14 bits of the temporary output result are not all 0, and the sign bit to be calculated is 1 and the highest bit of the temporary output result is 1, the output of the output circuit is 1_111111_0000_0000000000.
5. The 21-bit floating-point adder according to claim 1, further comprising a first complement generator and a second complement generator; the output of the first complement generator and the output of the second complement generator are both connected with the order-matching shift addition circuit; the first complement generator is used for calculating the complement of the splicing of the integer part and the decimal part of the addend, and is also used for calculating the complement of the splicing of the integer part and the decimal part of the summand; the second complement generator is used for calculating the complement of the order code of the summand.
6. The 21-bit floating-point adder according to claim 1, further comprising a clock signal connected respectively to the order-matching shift addition circuit, the normalization circuit and the output circuit.
7. The 21-bit floating-point adder according to claim 1, further comprising an enable input signal connected respectively to the order-matching shift addition circuit, the normalization circuit and the output circuit.
8. The 21-bit floating-point adder according to claim 1, further comprising a reset signal connected respectively to the order-matching shift addition circuit, the normalization circuit and the output circuit.
CN202210217664.8A 2022-03-08 2022-03-08 21-bit floating-point adder Active CN114327361B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210217664.8A CN114327361B (en) 2022-03-08 2022-03-08 21-bit floating-point adder


Publications (2)

Publication Number Publication Date
CN114327361A CN114327361A (en) 2022-04-12
CN114327361B true CN114327361B (en) 2022-05-27

Family

ID=81031167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210217664.8A Active CN114327361B (en) 2022-03-08 2022-03-08 21-bit floating-point adder

Country Status (1)

Country Link
CN (1) CN114327361B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014134932A (en) * 2013-01-09 2014-07-24 Lapis Semiconductor Co Ltd Semiconductor device and data access method
CN106970775A (en) * 2017-03-27 2017-07-21 南京大学 A kind of general adder of restructural fixed and floating
CN109343823A (en) * 2018-11-01 2019-02-15 何安平 The addition method of floating-point adder device based on asynchronous controlling and floating number
CN110688086A (en) * 2019-09-06 2020-01-14 西安交通大学 Reconfigurable integer-floating point adder
CN112230882A (en) * 2020-10-28 2021-01-15 海光信息技术股份有限公司 Floating-point number processing device, floating-point number adding device and floating-point number processing method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
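Overflow and normalization processing of floating-point numbers in computers; Xu Aiyun (徐爱芸); 《黑龙江科技信息》 (Heilongjiang Science and Technology Information); 2013-06-15 (No. 17); full text *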

Also Published As

Publication number Publication date
CN114327361A (en) 2022-04-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant