KR20040033198A - Floating point with multiply-add unit

Info

Publication number: KR20040033198A
Application number: KR1020020062136A
Authority: KR
Inventors: 김학윤
Original assignee: 주식회사 하이닉스반도체
Priority date: 2002-10-11
Filing date: 2002-10-11
Publication date: 2004-04-21
Also published as: KR100929423B1

Abstract

PURPOSE: A floating-point multiply-add unit (MAU) is provided that performs the floating-point operation A*C+B with a single instruction. The multiplier, which has the longest operation time among the three pipeline stages of the unit, uses the Booth algorithm so that unsigned operands are multiplied in two cycles, which reduces chip size and improves operating speed. CONSTITUTION: A multiplying part determines the product of A and C and aligns the mantissa of B so that it can be added to the product. An adding part adds the product and the aligned mantissa. A normalizing part performs leading-zero or leading-one/zero anticipation in order to normalize the sum, normalizes or rounds the result, and detects overflow and underflow exceptions. The multiplying part comprises a multiplier (110) and an aligner (120). The multiplier comprises a partial product generator (112), an adder tree (114), and a carry-save adder (116).

Description

Floating-point multiply-accumulate unit {FLOATING POINT WITH MULTIPLY-ADD UNIT}

The present invention relates to a floating-point multiply-accumulate unit. More particularly, it relates to a unit that, when performing the floating-point operation A*C+B with a single instruction, uses the Booth algorithm so that the multiplier, which has the longest operation time among the unit's three pipeline stages, completes its operation in two cycles when processing unsigned numbers, thereby reducing chip size and improving operating speed.

Floating-point units are essential components of graphics accelerators, digital signal processors, and computers that require high performance.

In recent years, as advances in semiconductor technology have increased chip integration density, it has become possible to place a floating-point unit on the same chip as the central processing unit, and the floating-point unit has become an important element of the main arithmetic unit. When a floating-point unit is built into the central processing unit, only basic operators such as addition/subtraction and multiplication are included because of the area they occupy, so the floating-point multiplier has a large effect on overall floating-point performance.

In floating-point multiplication, the mantissa is processed in one of two ways. In the first, the steps are multiplication, addition of the carry and sum produced by the multiplication, normalization, and rounding, in that order. In the second, the four steps are multiplication, addition, rounding, and normalization.

The IEEE-754 standard for the binary representation of floating-point numbers, on which this processing is based, defines a 32-bit single-precision format and a 64-bit double-precision format. The single-precision format consists of a 1-bit sign, an 8-bit exponent, and a 23-bit mantissa. The double-precision format consists of a 1-bit sign, an 11-bit exponent, and a 52-bit mantissa. A normalized operand A conforming to the IEEE standard can be expressed by the following equation.

A = (-1)^S × 1.M × 2^(E - bias)

Here S denotes the sign, M the mantissa, and E the exponent.

S is the sign bit of the mantissa, M is the mantissa in absolute-value (magnitude) form, and E is the exponent in biased form. In the normalized form of the mantissa the most significant bit (MSB) is 1. Because this MSB is omitted from the floating-point representation, it is called the hidden bit.
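The field layout described above can be illustrated with a short Python sketch. It is not part of the patent: the function names `decode_single` and `value_of_normalized` are hypothetical, and the bias of 127 is the IEEE-754 single-precision value rather than something stated in this text.

```python
import struct


def decode_single(x):
    """Split a value, rounded to IEEE-754 single precision, into its fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                    # 1-bit sign S
    exponent = (bits >> 23) & 0xFF       # 8-bit biased exponent E
    fraction = bits & 0x7FFFFF           # 23 stored mantissa bits M
    return sign, exponent, fraction


def value_of_normalized(sign, exponent, fraction):
    """Rebuild a normalized value, restoring the hidden leading 1 (bias 127 assumed)."""
    significand = (1 << 23) | fraction   # 24 bits including the hidden bit
    return (-1) ** sign * significand * 2.0 ** (exponent - 127 - 23)


s, e, m = decode_single(-6.5)
print(s, e, hex(m), value_of_normalized(s, e, m))
```

The 24-bit significand that results from restoring the hidden bit is the unsigned quantity that the multiplier discussed later operates on.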

A typical floating-point multiply-accumulate (MAC) unit combines three values A, B, and C by adding one value B to, or subtracting it from, the product of the other two values A and C. An arithmetic circuit with a multiplier and an adder can perform such a MAC operation in separate steps: it multiplies A and C with the multiplier, rounds the result, and then uses the adder to add B to the product or subtract B from it.

Alternatively, a fused MAC unit performs the multiplication and the accumulation in parallel and omits rounding of the product in order to improve the performance (delay and accuracy) of the MAC operation. Hokenek et al., 'Second-Generation RISC Floating Point with Multiply-Add Fused', IEEE Journal of Solid-State Circuits, vol. 25, no. 5, October 1990, discloses a floating-point MAC unit that aligns the bits of B in parallel with the multiplication of A and C, so that B can be accumulated with the product A*C without delay once the product has been determined.

The result A*C+B is accumulated without rounding or truncating the intermediate product A*C, either of which could introduce an error. In addition, while B is being accumulated with the product A*C, a leading-zero anticipator identifies the shift needed to normalize the result A*C+B in its floating-point representation, so that the result can be normalized immediately after accumulation. A fused MAC unit is therefore generally faster and more accurate than a multiplier and an accumulator used in sequence.
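The accuracy difference between rounding the intermediate product and leaving it unrounded can be seen in a small numeric sketch. This is an illustration only, not the circuit of the cited paper or of this patent. The helper `f32` is hypothetical and emulates rounding to single precision by a round trip through `struct`. The operands are chosen so that the exact product and sum fit in a Python double, which makes the "fused" line exact.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


a = f32(1 + 2**-12)
c = f32(1 + 2**-12)
b = f32(-(1 + 2**-11))

# Separate multiplier and adder: the product is rounded before the addition.
unfused = f32(f32(a * c) + b)

# Fused: the exact product is added to b and the sum is rounded once.
fused = f32(a * c + b)

print(unfused, fused)
```

Here the exact product is 1 + 2^-11 + 2^-24. Rounding it to 24 significant bits discards the 2^-24 term, so the unfused result is 0.0, while the fused result keeps 2^-24.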

FIG. 1 is a simplified diagram of a typical multiply-accumulate unit for the floating-point operation A*C+B.

As shown in the figure, the unit consists of a multiplication unit (10), which determines the product Ma*Mc of the values A and C and aligns the mantissa Mb of the value B for accumulation with the product Ma*Mc; an accumulation unit (20), which accumulates the mantissa Mb of B with the product Ma*Mc; and a normalization unit (30), which performs leading-zero or leading-one/zero anticipation in order to normalize the result accumulated in the accumulation unit (20), normalizes or rounds the result, and detects overflow and underflow exceptions.

The multiplication unit (10) consists of a multiplier (12), which determines the product Ma*Mc of the values A and C, and an aligner (14), which aligns the mantissa Mb of the value B for accumulation with the product Ma*Mc and uses the exponent difference (Ea+Ec)-Eb, given by the exponents of A, B, and C, to classify and simplify each MAC operation according to the magnitude of B relative to the product A*C.

When the floating-point operation A*C+B is performed with a separate multiplication unit and accumulation unit in this way, two or more instructions are required, which lowers operating speed and requires a large design area. In addition, using a three-stage multiplier for unsigned multiplication increases chip size and delays the computation.

The present invention was devised to solve the above problems. Its object is to provide a floating-point multiply-accumulate unit that, when performing the floating-point operation A*C+B with a single instruction, uses the Booth algorithm so that the multiplier, which has the longest operation time among the unit's three pipeline stages, completes its operation in two cycles when processing unsigned numbers, thereby reducing chip size and improving operating speed.

FIG. 2 is a block diagram of the multiplier of the floating-point multiply-accumulate unit according to the present invention.

FIG. 4 illustrates the method of extending and removing sign bits in the floating-point multiply-accumulate unit according to the present invention.

FIG. 5 is a flowchart of the multiplication process of the floating-point multiply-accumulate unit according to the present invention.

- Explanation of reference numerals for the main parts of the drawings -

10: multiplication unit    20: accumulation unit

30: normalization unit

110, 12: multiplier    120, 14: aligner

112: partial product generation means    114: addition tree means

116: carry-save addition means

To achieve the above object, the present invention provides a floating-point multiply-accumulate unit comprising a multiplication unit, which includes a multiplier for determining the product of a first floating-point value and a second floating-point value and an aligner for aligning the mantissa of a third value for accumulation with the product; an accumulation unit, which accumulates the third value with the product of the first and second values; and a normalization unit, which normalizes the accumulated result. The multiplier of the multiplication unit comprises partial product generation means that, in a first cycle in which the multiplication begins, splits the 2n-bit multiplier operand taken from the first and second values into n-bit halves and generates partial products using the Booth algorithm; addition tree means that adds the partial products generated by the partial product generation means to produce a first sum and a first carry; and carry-save addition means that feeds the first sum and first carry output by the addition tree means in the first cycle back to the addition tree means, performs the final addition of the second sum and second carry produced in the second cycle, and outputs the result.

When the partial product generation means splits the 2n-bit multiplier operand, in the first cycle it inserts "0" values as two bits above the lower n bits, forming an (n+2)-bit field, and in the second cycle it moves the upper n bits into the lower n bit positions for processing.

The partial product generation means also applies sign-bit complementation when processing the partial products.

The carry-save addition means applies a 3:2 array technique.
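A 3:2 carry-save step reduces three operands to a sum word and a carry word without propagating carries. The following Python sketch shows the general technique on non-negative integers. The function name `carry_save_add` is hypothetical, and the patent text does not specify the bit-level structure of its array.

```python
def carry_save_add(x, y, z):
    """3:2 compression: return (sum, carry) with sum + carry == x + y + z."""
    partial_sum = x ^ y ^ z                        # per-bit sum, carries ignored
    carry = ((x & y) | (x & z) | (y & z)) << 1     # per-bit majority, shifted up
    return partial_sum, carry


s, c = carry_save_add(13, 7, 9)
print(s, c, s + c)
```

Only the final addition of the sum and carry words needs a carry-propagate adder, which is why such trees are commonly used to add many partial products.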

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. The embodiments do not limit the scope of the invention and are presented only as examples. Parts that are the same as in the conventional configuration use the same reference numerals and names.

The present invention is a floating-point arithmetic unit for single-precision unsigned numbers. It performs the multiplication and accumulation of the floating-point operation A*C+B through the same overall process as shown in FIG. 1: a multiplication unit (10) determines the product Ma*Mc of the values A and C and aligns the mantissa Mb of the value B for accumulation with the product Ma*Mc; an accumulation unit (20) accumulates the mantissa Mb of B with the product Ma*Mc; and a normalization unit (30) performs leading-zero or leading-one/zero anticipation in order to normalize the result accumulated in the accumulation unit (20), normalizes or rounds the result, and detects overflow and underflow exceptions.

As shown in FIG. 2, the multiplication unit (10) consists of a multiplier (110), which determines the product Ma*Mc of the values A and C, and an aligner (120), which aligns the mantissa Mb of the value B for accumulation with the product Ma*Mc and uses the exponent difference (Ea+Ec)-Eb, given by the exponents of A, B, and C, to classify and simplify each MAC operation according to the magnitude of B relative to the product A*C.

The multiplier (110) consists of partial product generation means (112) that, in the first cycle in which the multiplication begins, splits the 2n-bit multiplier operand taken from the first and second values into n-bit halves and generates partial products using the Booth algorithm; addition tree means (114) that adds the partial products generated by the partial product generation means (112) to produce a first sum and a first carry; and carry-save addition means (116) that feeds the first sum and first carry output by the addition tree means (114) in the first cycle back to the addition tree means (114), performs the final addition of the second sum and second carry produced in the second cycle, and outputs the result.

To reduce chip area, the partial product generation means (112) divides the 2n-bit multiplier operand into n-bit halves and performs the multiplication over two cycles. As shown in FIG. 3, in the first cycle the lower n bits of the multiplier operand are combined with two upper bits of value "0" to form n+2 bits. This corrects the erroneous result that would otherwise arise from applying the Booth algorithm, which handles signed numbers, to unsigned numbers, and it yields n/2+1 partial products. In the second cycle, the upper n bits are shifted into the lower n bit positions and seven partial products are generated.
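The split described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's circuit: it assumes radix-4 (modified) Booth recoding, a 24-bit single-precision significand so that n = 12 and n/2+1 = 7, and that the second half is also zero-extended by two bits. The function names are hypothetical.

```python
def booth_radix4_digits(x, n):
    """Radix-4 Booth digits of an unsigned n-bit value x (n even).

    Two zero bits sit above bit n-1, so the top digit is never negative
    and the signed recoding remains correct for an unsigned operand.
    """
    assert n % 2 == 0 and 0 <= x < (1 << n)
    padded = x << 1                        # implicit 0 below the LSB
    digits = []
    for i in range(n // 2 + 1):            # n/2 + 1 digits cover n + 2 bits
        b_low = (padded >> (2 * i)) & 1
        b_mid = (padded >> (2 * i + 1)) & 1
        b_high = (padded >> (2 * i + 2)) & 1
        digits.append(b_low + b_mid - 2 * b_high)   # each digit is in -2..2
    return digits


def split_partial_products(multiplicand, multiplier, n):
    """Partial products for the lower half (cycle 1) and the upper half (cycle 2)."""
    low = multiplier & ((1 << n) - 1)
    high = multiplier >> n                 # upper n bits moved to the low positions
    cycle1 = [(d * multiplicand) << (2 * i)
              for i, d in enumerate(booth_radix4_digits(low, n))]
    cycle2 = [(d * multiplicand) << (2 * i)
              for i, d in enumerate(booth_radix4_digits(high, n))]
    return cycle1, cycle2


ma, mc, n = 0xC00001, 0xFFFFFF, 12         # example 24-bit significands
cycle1, cycle2 = split_partial_products(ma, mc, n)
product = sum(cycle1) + (sum(cycle2) << n)  # upper half carries weight 2**n
print(len(cycle1), len(cycle2), product == ma * mc)
```

Because each cycle handles only seven partial products, the same adder tree can be reused in both cycles, which is consistent with the area saving the description aims for.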

In addition, to reduce the burden of sign-bit extension when processing signed partial products, sign-bit complementation is applied so that, as shown in FIG. 4, the sign bits are either extended as in (a) or the sign-bit extension is removed as in (b).
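FIG. 4 is not reproduced in this text, so the following Python sketch shows only the standard identity behind sign-bit complementation, not the patent's exact bit layout. Flipping the sign bit of a w-bit two's-complement row and subtracting 2^(w-1) gives the same value as sign-extending it, so the per-row extensions can be replaced by one precomputed correction constant. The function names and the (row, shift) representation are hypothetical.

```python
def sum_with_sign_extension(rows, w, total):
    """Add w-bit two's-complement rows after extending each sign bit to `total` bits."""
    mask = (1 << total) - 1
    acc = 0
    for p, shift in rows:
        sign = (p >> (w - 1)) & 1
        extension = (mask >> w << w) if sign else 0   # copies of the sign bit
        acc += (p | extension) << shift
    return acc & mask


def sum_with_complemented_sign(rows, w, total):
    """Same sum, using inverted sign bits plus one correction constant."""
    mask = (1 << total) - 1
    flipped = sum((p ^ (1 << (w - 1))) << shift for p, shift in rows)
    correction = -sum(1 << (w - 1 + shift) for _, shift in rows)
    return (flipped + correction) & mask


# Three 4-bit rows at radix-4 weights: -5, 6 * 4, and -1 * 16, which sum to 3.
rows = [(0b1011, 0), (0b0110, 2), (0b1111, 4)]
print(sum_with_sign_extension(rows, 4, 12), sum_with_complemented_sign(rows, 4, 12))
```

The correction depends only on the row positions, not on the data, so in hardware it can be a fixed pattern added once instead of a run of extended sign bits on every row.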

The operation of the present invention is described with reference to the flowchart of FIG. 5, which shows the multiplication process of the floating-point multiply-accumulate unit.

In the first cycle, the aligner (120) receives the exponents of the values A, B, and C, computes Ea+Ec from the exponents of A and C, determines the shift amount needed to align this with the exponent of B, and shifts the mantissa of B by the difference so that it lines up with the product Ma*Mc of A and C.
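The alignment step can be sketched in Python with integer significands. This is an illustration under assumptions not stated in the patent: biased single-precision exponents with a bias of 127, 24-bit significands that include the hidden bit, and a product scale of 46 fraction bits. A plain right shift is used, so bits shifted out are simply dropped here, whereas real hardware would normally keep them in a wider datapath or as guard and sticky bits. The name `align_addend` is hypothetical.

```python
BIAS = 127        # single-precision exponent bias (assumption)
FRAC_BITS = 23    # stored mantissa bits in single precision


def align_addend(ea, ec, eb, mb):
    """Return (shift, aligned_mb) placing B's significand on the scale of Ma*Mc."""
    shift = (ea + ec - BIAS) - eb      # exponent difference (Ea+Ec)-Eb, bias removed once
    wide = mb << FRAC_BITS             # match the 46 fraction bits of the product
    aligned = wide >> shift if shift >= 0 else wide << -shift
    return shift, aligned


# A = 1.5 * 2^1 = 3.0, C = 1.0 * 2^2 = 4.0, B = 1.0 * 2^0 = 1.0
ea, ma = 128, 0xC00000
ec, mc = 129, 0x800000
eb, mb = 127, 0x800000

shift, aligned = align_addend(ea, ec, eb, mb)
total = ma * mc + aligned                            # accumulate on a common scale
value = total * 2.0 ** (ea + ec - 2 * BIAS - 2 * FRAC_BITS)
print(shift, value)
```

For these operands the shift is 3 and the accumulated value is 3.0 * 4.0 + 1.0 = 13.0.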

Meanwhile, to generate the partial products of the values A and C, in the first cycle the partial product generation means (112) of the multiplier (110) combines the lower n bits of the multiplier operand with two upper bits of value "0" to form n+2 bits. This corrects the erroneous result that would otherwise arise from applying the Booth algorithm, which handles signed numbers, to unsigned numbers, and it yields n/2+1 partial products, from which the addition tree means (114) generates the first sum and first carry.

In the second cycle, the upper n bits are shifted into the lower n bit positions and seven partial products are generated. The first sum and first carry are fed back to the addition tree means (114), the value produced by the aligner (120) is supplied to the addition tree means (114), and the carry-save addition means (116) generates the finally added second sum and second carry.
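The two-cycle flow with feedback of the first sum and carry can be simulated with the Python sketch below. It is a behavioral model under assumptions, not the patent's datapath: radix-4 Booth recoding, n = 12, a 48-bit two's-complement datapath, cycle-2 partial products weighted by 2^n, and a final carry-propagate addition inside the function purely to check the result. The aligned addend from the aligner is left out. All names are hypothetical.

```python
WIDTH = 48                       # 24 x 24-bit product width
MASK = (1 << WIDTH) - 1


def booth_radix4_digits(x, n):
    """Radix-4 Booth digits of an unsigned n-bit x, zero-extended by two bits."""
    padded = x << 1
    return [((padded >> (2 * i)) & 1) + ((padded >> (2 * i + 1)) & 1)
            - 2 * ((padded >> (2 * i + 2)) & 1)
            for i in range(n // 2 + 1)]


def carry_save_add(x, y, z):
    """3:2 compressor on WIDTH-bit words (arithmetic modulo 2**WIDTH)."""
    return (x ^ y ^ z) & MASK, (((x & y) | (x & z) | (y & z)) << 1) & MASK


def adder_tree(terms):
    """Reduce any number of words to a (sum, carry) pair with 3:2 compressors."""
    terms = [t & MASK for t in terms]          # negative rows become two's complement
    while len(terms) > 2:
        s, c = carry_save_add(terms[0], terms[1], terms[2])
        terms = terms[3:] + [s, c]
    terms += [0] * (2 - len(terms))
    return terms[0], terms[1]


def multiply_two_cycles(ma, mc, n=12):
    low, high = mc & ((1 << n) - 1), mc >> n
    # Cycle 1: partial products of the lower half give the first sum and carry.
    pp1 = [(d * ma) << (2 * i) for i, d in enumerate(booth_radix4_digits(low, n))]
    sum1, carry1 = adder_tree(pp1)
    # Cycle 2: upper-half partial products plus the fed-back first sum and carry.
    pp2 = [(d * ma) << (2 * i + n) for i, d in enumerate(booth_radix4_digits(high, n))]
    sum2, carry2 = adder_tree(pp2 + [sum1, carry1])
    return (sum2 + carry2) & MASK              # final addition, done here for checking


print(multiply_two_cycles(0xFFFFFF, 0xFFFFFF) == 0xFFFFFF * 0xFFFFFF)
```

Keeping the intermediate result in sum and carry form between cycles means no carry-propagate adder sits on the feedback path.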

Then, in the third cycle, the second sum and second carry output from the carry-save addition means (116) and the aligned value from the aligner (120) are output to the accumulation unit (20).

The subsequent process is as described above: the accumulation unit (20) accumulates the mantissa Mb of the value B with the product Ma*Mc, and the normalization unit (30) performs leading-zero or leading-one/zero anticipation in order to normalize the accumulated result, normalizes or rounds the result, and detects overflow and underflow exceptions, which completes the multiplication and accumulation of the floating-point operation A*C+B.
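The normalization step can be illustrated with a simple Python sketch. It counts leading zeros after the addition, whereas a leading-zero anticipator predicts the same shift in parallel with the addition. The convention that the value equals `significand * 2**exponent`, the `width` parameter, and the function name `normalize` are assumptions made for illustration. Rounding and exception detection are omitted.

```python
def normalize(significand, exponent, width):
    """Shift so the leading 1 sits in the top bit of a `width`-bit field.

    The represented value significand * 2**exponent is unchanged as long as
    no nonzero bits are shifted out on the right.
    """
    if significand == 0:
        return 0, exponent                     # zero has no leading 1 to align
    leading_zeros = width - significand.bit_length()
    if leading_zeros >= 0:
        return significand << leading_zeros, exponent - leading_zeros
    # Carry out of the field: shift right instead (dropped bits would feed rounding).
    return significand >> -leading_zeros, exponent - leading_zeros


print(normalize(0b00010110, 10, 8))
```

In the example the 8-bit field has three leading zeros, so the significand is shifted left by three and the exponent is reduced by three.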

As described above, when the floating-point operation A*C+B is performed with a single instruction, the present invention uses the Booth algorithm so that the multiplier, which has the longest operation time among the unit's three pipeline stages, completes its operation in two cycles when processing unsigned numbers, which reduces chip size and improves operating speed. The unit can therefore be built into portable devices with graphics functions as a small-area, high-performance arithmetic unit.

Claims

1. A floating-point multiply-accumulate unit comprising a multiplication unit that includes a multiplier for determining the product of first and second floating-point values and an aligner for aligning the mantissa of a third value for accumulation with the product, an accumulation unit for accumulating the third value with the product of the first and second values, and a normalization unit for normalizing the accumulated result,

wherein the multiplier of the multiplication unit comprises:

partial product generation means for, in a first cycle in which the multiplication begins, splitting the 2n-bit multiplier operand taken from the first and second values into n-bit halves and generating partial products using the Booth algorithm;

addition tree means for generating a first sum and a first carry by adding the partial products generated by the partial product generation means; and

carry-save addition means for feeding the first sum and the first carry output from the addition tree means in the first cycle back to the addition tree means, performing the final addition of the second sum and the second carry generated in the second cycle, and outputting the result.

2. The floating-point multiply-accumulate unit according to claim 1, wherein, when splitting the 2n-bit multiplier operand, the partial product generation means inserts "0" values as two bits above the lower n bits in the first cycle to form n+2 bits, and moves the upper n bits into the lower n bit positions in the second cycle for processing.

3. The floating-point multiply-accumulate unit according to claim 1, wherein the partial product generation means applies sign-bit complementation when processing the partial products.

4. The floating-point multiply-accumulate unit according to claim 1, wherein the carry-save addition means applies a 3:2 array technique.