JPH03180928A

JPH03180928A - Floating point multiplier

Info

Publication number: JPH03180928A
Application number: JP1318940A
Authority: JP
Inventors: Nariya Tanaka; 成弥田中; Tetsuaki Nakamigawa; 哲明中三川; Hideo Maejima; 前島　英雄
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1989-12-11
Filing date: 1989-12-11
Publication date: 1991-08-06

Abstract

PURPOSE: To speed up multiplication through three measures. First, the addition that follows the partial product operation of the mantissa and the rounding addition are executed simultaneously. Second, the exponent correction values that may be needed, together with their overflow and underflow detection, are computed in advance. Third, when a correction is required, only the necessary part is selected. CONSTITUTION: An exponent arithmetic circuit 110 calculates the exponent. Because the normalizing circuit or the rounding circuit may apply a correction of ±1, it also calculates the corrected exponent in advance and checks both the uncorrected and the corrected data for overflow and underflow. Specifically, an adder/subtractor 2201 calculates the ordinary exponent, and a ±1 correcting circuit 2203 calculates the correction value that may arise during normalization or rounding. An overflow/underflow deciding circuit 2202 performs the check for the case where the correction is zero. If an overflow or underflow occurs, the circuit interrupts the floating-point operation as soon as this is decided. Thus, the processing can be executed at high speed.

Description

[Detailed Description of the Invention]

[Industrial Field of Application]

The present invention relates to a floating-point multiplication device.

[Prior Art]

FIG. 2 shows a typical example of a conventional floating-point multiplication circuit. When two 64-bit data X and Y are input, it executes their floating-point multiplication and outputs 64-bit data W. The fields of the input data X and Y are arranged according to a standard:

- bits 0 to 51 hold the mantissa;
- bits 52 to 62 hold the exponent;
- bit 63 holds the sign.

This standard is given in the "Floating Point Subcommittee Working Document" (IEEE, P754, 1987).
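The following short Python sketch illustrates the field layout described above. It assumes that Python's `float` is an IEEE 754 binary64 value, which is the usual case. It uses the standard `struct` module to reinterpret the bits. The function name `fields` is hypothetical.

```python
import struct


def fields(x: float) -> tuple[int, int, int]:
    """Return the (sign, exponent, mantissa) bit fields of a 64-bit double.

    Bit 63 is the sign, bits 62..52 the 11-bit exponent,
    and bits 51..0 the 52-bit stored mantissa.
    """
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    return sign, exponent, mantissa


print(fields(-1.5))  # (1, 1023, 2251799813685248)
```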

The floating-point multiplication circuit performs three operations: one on the sign, one on the exponent, and one on the mantissa. The sign of output data W is positive if the signs of input data X and Y are the same and negative if they differ. The sign operation circuit 206 therefore takes the exclusive OR of the sign bits 210X and 210Y of the input data. It outputs the result on signal line 215 as the sign bit of data W.

The exponent operation circuit 205 receives the signs 210X and 210Y and the exponents 211X and 211Y of input data X and Y. It performs addition when the signs are the same and subtraction when they differ, and outputs the result on signal line 216. The mantissa normalization circuit 202 or the rounding circuit 203 may request a correction of at most +1 or -1. In that case, the exponent correction circuit 204 corrects the exponent by +1 if the sign 215 of output W is positive and by -1 if it is negative. It then outputs the result on signal line 222 as the exponent of output data W.

The mantissa is calculated by four units: the partial product operation circuit 201, the addition unit (AU) 230, the normalization circuit 202, and the rounding circuit 203. Together they compute the mantissa of output data W from the product of the mantissas of input data X and Y.

The mantissa of input data X is represented by 52 bits. Write this sequence of 52 "0"s and "1"s as $\ast\ast\ast\cdots$. The true value $P_X$ of the mantissa of the data is then

$$P_X = 1.\ast\ast\ast\cdots\,[2] \qquad (1)$$

where [2] indicates binary notation. The same holds for the mantissa $P_Y$ of input data Y, and both are 53-bit binary numbers. The multiplication is therefore performed between two numbers $P_X$ and $P_Y$ of the form of equation (1). Its result takes one of the forms

$$11.\ast\ast\ast\cdots\,[2], \qquad 10.\ast\ast\ast\cdots\,[2], \qquad 01.\ast\ast\ast\cdots\,[2] \qquad (2)$$

where $\ast\ast\ast\cdots$ now denotes a sequence of 104 "0"s and "1"s.

Up to this point, the multiplication proceeds as follows. The mantissas 212X and 212Y of input data X and Y are input to the partial product operation circuit 201. The circuit adds a 1 above the most significant bit (MSB) of each input to obtain numbers of the form of equation (1), and computes their product. As described later, the partial product operation circuit 201 is built from CSAs (carry-save adders). Its outputs are a sum set 232 (102 bits) and a carry set 231 (106 bits). The 106-bit AU (arithmetic unit) 230 adds these two outputs. It outputs a 106-bit multiplication result of the form of equation (2) on signal line 217.
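Equations (1) and (2) can be checked with ordinary integers. In this hypothetical sketch, each significand is held as a 53-bit integer with the hidden 1 prepended. The exact product then fits in 106 bits, and its top two bits (bits 105 and 104) show which form of equation (2) it takes. The sketch models only the arithmetic, not the CSA and AU hardware.

```python
HIDDEN = 1 << 52  # the leading "1." of equation (1)


def significand(mantissa: int) -> int:
    """P = 1.*** as a 53-bit integer (equation (1))."""
    return HIDDEN | mantissa


def product_form(mx: int, my: int) -> str:
    """Return which form of equation (2) the exact product P_X * P_Y takes."""
    p = significand(mx) * significand(my)  # exact: 2**104 <= p < 2**106
    return format(p >> 104, "02b")          # the two digits left of the binary point


print(product_form(0, 0))              # 01  (1.0 * 1.0)
print(product_form(1 << 51, 1 << 51))  # 10  (1.5 * 1.5 = 2.25)
```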

If the multiplication result takes either of the first two forms of equation (2), it must be converted to the form of equation (1) to serve as the mantissa of output data W. In this case the normalization circuit 202 shifts the multiplication output 217 by one bit to the left and corrects the exponent result. At the same time, in preparation for the subsequent rounding operation, it replaces the least significant bit (LSB) after the shift. The new LSB becomes the logical OR of the LSB before the shift (which is lost by the shift) and the bit one position above it.
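A minimal sketch of this normalization step follows. It assumes the 106-bit product is held as an integer with bit 0 as its LSB. The text describes a one-bit shift in which the LSB is lost. In this integer representation that realignment is written `>> 1`, and the lost bit is ORed into the new LSB so that it still contributes to rounding. The function name `normalize` is hypothetical.

```python
def normalize(p: int) -> tuple[int, int]:
    """Normalize a 106-bit product to 105 bits with the leading 1 at bit 104.

    Returns (value, exponent_increment).
    """
    if p >> 105:                   # forms 11.*** and 10.***: the product is >= 2
        lost = p & 1               # LSB that the one-bit shift discards
        return (p >> 1) | lost, 1  # OR it into the new LSB; exponent + 1
    return p, 0                    # form 01.***: already normalized
```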

The output 218 of the normalization circuit 202 obtained in this way is 105 bits long. The rounding circuit 203 converts it to 53-bit data of the form of equation (1). For this rounding, the IEEE standard mentioned above provides four modes:

- RZ mode rounds toward 0, i.e., it truncates.
- RM mode rounds toward -∞ (rounding down).
- RP mode rounds toward +∞ (rounding up).
- RN mode rounds to the nearest value.

To define these modes, let L be the 52nd bit after the binary point, G the 53rd bit, and R the 54th bit. Let S be the OR of all bits from the 55th bit onward, and $S_s$ the sign of the result (1 for negative). For each mode, the value given by the following equation is added at the 52nd bit after the binary point, and the part from the 53rd bit onward is removed.

$$\begin{aligned}
\text{RZ mode:}\quad & 0\\
\text{RM mode:}\quad & S_s\cdot(G+R+S)\\
\text{RP mode:}\quad & \overline{S_s}\cdot(G+R+S)\\
\text{RN mode:}\quad & G\cdot(R+S) + L\cdot G\cdot\overline{(R+S)}
\end{aligned} \qquad (3)$$

Here · denotes logical AND, + logical OR, and an overbar negation. Of the 53 bits of data rounded in this way, the "1" at the MSB is automatically removed. The remaining bits are set in the mantissa of output data W.
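Equation (3) translates directly into code. The sketch below is illustrative: mode names follow the text, and `sign` is 1 for a negative result. The bit positions assume the 105-bit normalized value with its leading 1 at bit 104, so that L, G, and R are bits 52, 51, and 50. A rounded result of 2**53 would still need renormalization, which is not handled here.

```python
def rounding_increment(mode: str, sign: int, L: int, G: int, R: int, S: int) -> int:
    """Value added at the L position, per equation (3).

    All inputs are 0 or 1; sign is 1 for a negative result.
    """
    inexact = G | R | S
    if mode == "RZ":
        return 0
    if mode == "RM":
        return sign & inexact
    if mode == "RP":
        return (1 - sign) & inexact
    if mode == "RN":
        return (G & (R | S)) | (L & G & (1 - (R | S)))
    raise ValueError(f"unknown rounding mode: {mode}")


def round_mantissa(v: int, mode: str, sign: int) -> int:
    """Round a 105-bit value (leading 1 at bit 104) to 53 bits.

    The result can reach 2**53, which would need renormalizing (not shown).
    """
    L = (v >> 52) & 1
    G = (v >> 51) & 1
    R = (v >> 50) & 1
    S = int(v & ((1 << 50) - 1) != 0)
    return (v >> 52) + rounding_increment(mode, sign, L, G, R, S)
```

In RN mode an exact tie (G = 1, R = S = 0) is rounded up only when L = 1, so the result ends in an even bit.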

The special data processing circuit 207 is a control circuit. It detects the exceptional values defined by the IEEE standard, such as -∞, +∞, and NaN (Not a Number), and terminates the processing. It is unrelated to the present invention, so its description is omitted.

Next, a conventional example of the partial product operation circuit 201, which is related to the present invention, is described in detail. As in multiplication by hand, multiplying the multiplicand by each digit of the multiplier produces one partial product per digit. The multiplication is completed by aligning the digits of these partial products and adding them. Carrying out this manual method as it is means adding the first partial product to the next one, then adding the result to the following one, and so on. In that case, however, a carry propagates during each addition. The additions therefore cannot be processed in parallel, and performance does not improve. The CSA method was devised to remedy this.

Its configuration is shown in FIG. 3: the partial product operation circuit 201 is constructed by connecting the unit circuits 4(i, j) of FIG. 3(b) as shown in FIG. 3(a). P_X and P_Y are the mantissas 212X and 212Y of input data X and Y, put into the form of equation (1). The unit circuit 4(i, j) takes the AND of the j-th bit X(j) of P_X and the i-th bit Y(i) of P_Y in the AND gate AND(j, i). These are the (j+1)-th and (i+1)-th bits counted from the LSB. The full adder FA(i, j) then computes the full addition of this output and two other inputs, c(i-1, j) and s(i-1, j+1). It outputs the result as a sum s(i, j) and a carry c(i, j). The truth table of the outputs s and c of the three-input full adder FA is given in Table 1.

Table 1
Inputs | s | c
0 0 0 | 0 | 0
0 0 1 | 1 | 0
0 1 0 | 1 | 0
0 1 1 | 0 | 1
1 0 0 | 1 | 0
1 0 1 | 0 | 1
1 1 0 | 0 | 1
1 1 1 | 1 | 1

The circuit configuration of FIG. 3 operates as follows. In FIG. 3(a), Y(i-1) is input to the first row of unit circuits 4(i-1, j+1), 4(i-1, j), 4(i-1, j-1), .... These compute the bits of the partial product of the multiplicand P_X and bit Y(i-1) of the multiplier P_Y. Likewise, Y(i) is input to the second row of unit circuits 4(i, j+1), 4(i, j), 4(i, j-1), .... These compute the bits of the partial product of P_X and bit Y(i) of P_Y.

Every unit circuit in the same column receives the same bit of P_X. The output bit of each unit circuit in the first row is therefore one digit lower in weight than that of the unit circuit below it in the second row. For this reason, as shown in the figure, the sum s of each unit circuit is fed to the unit circuit one row down and one column to the right. The carry c is fed to the unit circuit directly below. Each unit circuit then adds these inputs with its full adder FA, together with the output of its AND gate. This yields the required sum of the partial products.

Moreover, with this method no carry propagates within a row of FIG. 3(a). Carries are simply added in the next row, so the partial products are added in parallel across all bits. P_Y has 53 bits, but its LSB Y(0) and the bit above it, Y(1), need no unit circuit of FIG. 3(b): AND gates alone suffice. The AND gate outputs of these two rows only need to be fed to the full adders of the third row of unit circuits, for Y(2). As a result, the addition of the partial products can be executed in the delay time of 51 stages of full adders, enabling high-speed operation.
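The row-by-row carry-save scheme of FIG. 3 can be modeled at the word level. Each row of unit circuits becomes one bitwise full-adder step on whole integers, so no carry travels along a row. This is a hypothetical Python sketch, not the patent's circuit. With 53 rows it performs 51 full-adder steps, matching the count given above. It leaves the single carry-propagating addition to the end, which is the role of AU 230.

```python
def full_adder_vec(a: int, b: int, c: int) -> tuple[int, int]:
    """One row of full adders applied bitwise: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1  # each carry moves one bit up
    return s, carry


def partial_products(px: int, py: int, n: int) -> list[int]:
    """Row i = (P_X AND Y(i)) aligned to weight 2**i (the AND gates of FIG. 3(b))."""
    return [(px if (py >> i) & 1 else 0) << i for i in range(n)]


def csa_array(px: int, py: int, n: int = 53) -> tuple[int, int]:
    """Add the rows one after another with no carry propagation within a row."""
    rows = partial_products(px, py, n)
    s, c = rows[0], rows[1]   # rows for Y(0) and Y(1): AND gates only
    for row in rows[2:]:      # n - 2 full-adder steps (51 for n = 53)
        s, c = full_adder_vec(s, c, row)
    return s, c               # sum set and carry set


def multiply(px: int, py: int, n: int = 53) -> int:
    s, c = csa_array(px, py, n)
    return s + c              # the one carry-propagating addition (AU 230)
```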

However, the unit circuits in the bottom row and in the rightmost column of FIG. 3(a) output a bit string of sums and a bit string of carries. These bit strings form the sum set 232 and the carry set 231. Adding them in the AU 230 gives the multiplication result. The overall arrangement is shown in FIG. 4(a), where for simplicity each input datum is shown as 8 bits.

Another conventional technique for speeding up the partial product operation is described next. The method of FIG. 3 adds the partial products one after another with the CSA method, as shown in FIG. 4(a). In contrast, FIG. 4(b) divides the partial products into odd-numbered and even-numbered groups. The two groups are processed simultaneously in parallel with the CSA method, yielding sum sets A1 and B1 and carry sets A2 and B2.

Next, the sum of the sum sets A1 and B1 and the sum of the carry sets A2 and B2 are formed, and these two sums are added in the AU. Details are given in ISSCC 84 (International Solid-State Circuits Conference), pp. 92-93.
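The odd/even variant can be sketched the same way. The two CSA chains below are independent and would run in parallel in hardware. The text only says that the sums are formed and then added in the AU. Merging the four vectors A1, A2, B1, and B2 into two with two further full-adder steps is therefore an assumption here. Under that assumption, 53 rows give chains of 25 and 24 steps plus 2 merge steps. That is about 27 full-adder delays instead of 51.

```python
def fa(a: int, b: int, c: int) -> tuple[int, int]:
    """Bitwise full adder on bit-vectors: a + b + c == s + carry."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def csa_chain(rows: list[int]) -> tuple[int, int]:
    """Sequential CSA over at least two rows; returns (sum set, carry set)."""
    s, c = rows[0], rows[1]
    for row in rows[2:]:
        s, c = fa(s, c, row)
    return s, c


def odd_even_multiply(px: int, py: int, n: int = 53) -> int:
    rows = [(px if (py >> i) & 1 else 0) << i for i in range(n)]
    a1, a2 = csa_chain(rows[0::2])  # even-numbered partial products
    b1, b2 = csa_chain(rows[1::2])  # odd-numbered ones (parallel in hardware)
    # Assumed merge of the four vectors into two: two more full-adder steps.
    t1, t2 = fa(a1, a2, b1)
    s, c = fa(t1, t2, b2)
    return s + c                    # final addition in the AU
```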

Another speed-up technique is called the Wallace method. It obtains the sum of the partial products by forming a tree structure for each digit. However, its irregular structure becomes a disadvantage as the bit length grows.

It is nevertheless effective for the part that combines the results of the partial products in the method of FIG. 3(b), that is, the part that reduces four sets to two. Some designs use the CSA and Wallace methods together. The Wallace method is described in IEEE Trans. Electron. Computers, vol. EC-13, pp. 14-17, Feb. 1964.
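Below is a word-level sketch of Wallace-style reduction. At each level, rows are taken three at a time and compressed to two, and the process repeats until two rows remain. The sketch works on whole integers, so it hides the per-digit wiring whose irregularity the text mentions. The function names are hypothetical. For 53 partial products it needs 9 levels, compared with the 51 sequential steps of the array in FIG. 3.

```python
def fa(a: int, b: int, c: int) -> tuple[int, int]:
    """Bitwise full adder on bit-vectors: a + b + c == s + carry."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_reduce(rows: list[int]) -> tuple[int, int, int]:
    """Compress rows three at a time per level until two remain.

    Returns (sum set, carry set, number of levels). Needs at least two rows.
    """
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt: list[int] = []
        for k in range(0, full, 3):   # all groups of one level work in parallel
            nxt.extend(fa(rows[k], rows[k + 1], rows[k + 2]))
        nxt.extend(rows[full:])       # one or two leftover rows pass through
        rows = nxt
        levels += 1
    return rows[0], rows[1], levels
```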

Another well-known method for speeding up the partial product operation is Booth's algorithm. It achieves its speedup by halving the number of partial products. The algorithm is described in Quart. J. Mech. Appl. Math., vol. 4, Part 2, 1951. FIG. 5 shows an example of a partial product operation circuit using this method.

In FIG. 5(a), the data P_Y is obtained by prefixing "1." to the mantissa of input data Y. P_Y is input to the Booth decoder 530 and converted into selection signals B(i) for the unit circuits. The number of signals B(i) is half the number of bits of P_Y. Each signal B(i) is a 3-bit selection signal carried on three signal lines. The unit circuit 5(i, j) is configured as shown in FIG. 5(b). The selector SEL(i, j) generates five data values from bits X(j) and X(j-1) of P_X, then selects one of them according to the value of the selection signal B(i). The selected value is the partial product, so CSA addition is performed with the three-input full adder FA(i, j), as in FIG. 3.

In this example there are 27 rows of partial products, and their addition can be executed in the delay time of 25 stages of full adders.
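The halving of partial products can be illustrated with radix-4 (modified) Booth recoding. This is one common form of Booth's algorithm and is assumed here, since the text does not spell out the decoder logic of FIG. 5. Each digit lies in {-2, -1, 0, 1, 2}, so each partial product is one of five values: 0, ±P_X, or ±2P_X. A 53-bit P_Y, padded with a 0 above its MSB, yields 27 digits, matching the 27 rows given above. The function names are hypothetical.

```python
def booth_digits(py: int, nbits: int) -> list[int]:
    """Radix-4 Booth recoding of an unsigned nbits-bit multiplier.

    Digit i = -2*y(2i+1) + y(2i) + y(2i-1), with y(-1) = 0, so each digit is in
    {-2, -1, 0, 1, 2} and py == sum(d * 4**i for i, d in enumerate(digits)).
    """
    ndigits = (nbits + 2) // 2  # leaves a 0 bit above the MSB: value stays unsigned
    digits = []
    prev = 0                    # y(2i-1)
    for i in range(ndigits):
        y0 = (py >> (2 * i)) & 1
        y1 = (py >> (2 * i + 1)) & 1
        digits.append(-2 * y1 + y0 + prev)
        prev = y1
    return digits


def booth_multiply(px: int, py: int, nbits: int = 53) -> int:
    """Each digit selects one of five partial products: 0, +-P_X, +-2*P_X."""
    total = 0
    for i, d in enumerate(booth_digits(py, nbits)):
        # Negative partial products are plain negative ints here; hardware
        # would use two's complement with sign extension instead.
        total += (d * px) << (2 * i)
    return total
```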

[Problems to Be Solved by the Invention]

All of the conventional techniques described above perform two additions serially during the mantissa multiplication: the addition of the sum set and the carry set, and the rounding addition. This costs extra processing time. In addition, when the exponent operation requires a correction, the correction value is calculated and checked for overflow or underflow only at that point. This is also an obstacle to higher speed.

An object of the present invention is to provide a floating-point multiplication device that can perform floating-point multiplication at higher speed.

[Means for Solving the Problems]

To achieve the above object, the present invention adopts three measures:

- In the mantissa operation, the addition following the partial product operation and the rounding addition are executed at once.
- In the exponent operation, the correction values that may arise during the exponent calculation are computed in advance, together with the detection of overflow and underflow. When a correction becomes necessary, these results are simply selected.
- In the partial product operation, full adders with five or more inputs are used to obtain the sum of two or more partial products at the same time.

[Operation]

In the mantissa operation, the partial products are added with the CSA method to obtain the sum set and the carry set. The correction value needed for rounding can be generated from these two sets by simple processing in a short time. If this correction value and the two sets are then added at the same time, no separate addition time needs to be provided for rounding.

The exponent calculation, the calculation of its correction value, and the detection of overflow or underflow can also be executed in advance. Then, once the mantissa result is available, only a selection is needed. This saves the time for exponent correction and overflow/underflow detection. Furthermore, adding several partial products in parallel with full adders that have five or more inputs shortens the processing time and raises the speed.

[Embodiments]

The present invention is described below by way of embodiments. The floating-point multiplication circuit 100 of the present invention differs from the conventional circuit of FIG. 2 in two places: the exponent operation section and the mantissa operation section. These parts are described below.

The exponent operation circuit 110 calculates the exponent from two inputs: the 11-bit exponents 211X and 211Y of input data X and Y, and the output 215 of the sign operation circuit 206. The normalization circuit or the rounding circuit may apply a correction of ±1. The circuit therefore also calculates the corrected exponent in advance and checks both the uncorrected and the corrected data for overflow and underflow.

FIG. 6 shows the configuration of the exponent operation circuit 110. The adder/subtractor 2201 calculates the exponent as in the conventional circuit. The ±1 correction circuit 2203 calculates in advance the correction value that may arise during normalization or rounding. The overflow/underflow determination circuit 2202 performs the check for the case where the correction is 0. If an overflow or underflow has occurred, the floating-point operation may be interrupted at the time of that determination. This is because an overflow or underflow that has already occurred before correction will still be present after correction.

The overflow/underflow determination circuit 2204 determines in advance whether the output of the ±1 correction circuit 2203 overflows or underflows.

Processing may be interrupted when this determination detects an overflow or underflow. In that case, it is not interrupted immediately upon detection, but only once it becomes clear that the correction is necessary.

With this processing, the correction value and the overflow/underflow determination are already complete by the time the mantissa result shows whether a correction is needed. Only a selection remains, so the overall processing time is reduced.
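The precompute-then-select idea of FIG. 6 can be sketched as follows. The bias of 1023 and the overflow/underflow limits are assumptions borrowed from IEEE double precision; the text does not give these formulas. The candidates for corrections of -1, 0, and +1 are all checked up front. Once the mantissa path reports its correction, the exponent result and its flags are merely selected. The names `ExpCandidate`, `precompute`, and `select` are hypothetical.

```python
from dataclasses import dataclass

BIAS = 1023     # assumed IEEE double-precision bias
EXP_MAX = 2047  # assumed: biased exponent 2047 is reserved, so it means overflow


@dataclass(frozen=True)
class ExpCandidate:
    value: int
    overflow: bool
    underflow: bool


def check(e: int) -> ExpCandidate:
    """Overflow/underflow test for one candidate exponent (roles of 2202 and 2204)."""
    return ExpCandidate(e, e >= EXP_MAX, e <= 0)


def precompute(ex: int, ey: int) -> dict[int, ExpCandidate]:
    """Compute the exponent and its +-1-corrected versions ahead of the mantissa."""
    base = ex + ey - BIAS                            # adder/subtractor 2201
    return {d: check(base + d) for d in (-1, 0, 1)}  # +-1 correction circuit 2203


def select(candidates: dict[int, ExpCandidate], correction: int) -> ExpCandidate:
    """Once the mantissa path reports its correction, only a selection remains."""
    return candidates[correction]
```

For example, `precompute(2046, 1023)` flags overflow only for the +1 candidate. Whether the result overflows is therefore known as soon as the correction is.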

Returning to FIG. 1, the mantissa operation is described. Several steps are the same as in the conventional circuit:

- the data P_X and P_Y of equation (1) (53 bits each) are generated from the mantissas 212X and 212Y of input data X and Y;
- the sum of the partial products is obtained with the CSA method;
- the partial product operation circuit 201 calculates the sum set 232 and the carry set 231.

The embodiment of the present invention then departs from the conventional circuit. It does not add the sets 232 and 231 in an AU to obtain a 106-bit result. Instead, a correction value generation circuit 116 generates a correction value from the lower bits of the sets 232 and 231. The upper bits of the two sets are then added together with the rounding correction in a single addition. Finally, normalization is performed to complete the processing.

FIG. 7 illustrates the generation of the correction value; the two sets 231 and 232 are the inputs. Call the bit positions of these data bit 0, bit 1, ... counting from the LSB. The binary point of the sets 231 and 232 then lies between bit 104 and bit 103.

The result of adding the two sets may have a "1" at bit 105 (the MSB). In the conventional method, this addition is performed first. If a "1" appears at the MSB, the result is normalized by a one-bit shift, and only then is it rounded. In the embodiment of the present invention, the processing for rounding is performed before both the addition and the normalization. Call the case where the MSB of the addition result is "0" the A data type, and the case where it is "1" the B data type. Rounding processing must be prepared in advance for both cases. The rounding information described with equation (3) is therefore as follows:

- for the A data type, bits 52 to 50 are L1, G1, and R1, and the OR of bits 49 to 0 is S1;
- for the B data type (not yet shifted), bits 53 to 51 are L2, G2, and R2, and the OR of bits 50 to 0 is S2.

For the A data type, the rounding processing applies two corrections at bit 52. One is the correction due to rounding. The other is the carry signal from bits 51 to 0 that the multiplication would originally produce. The correction value therefore takes the values 0, +1, or +2:

- 0 when there is neither a carry from below nor a rounding correction;
- +1 when there is either a carry from below or a rounding correction;
- +2 when there is both a carry from below and a rounding correction.

The B data type is handled in the same way as the A data type, except that each bit position moves up by one.
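The sketch below shows one way the correction value for one data type could be formed, with p = 52 for the A data type and p = 53 for the B data type. For clarity it adds the low p bits of the two sets with an ordinary addition. The text instead obtains the same information (carries and all-zero flags) with the fast flag logic of FIG. 8. The function names are hypothetical. The correction is the carry out of the low bits plus the rounding increment of equation (3), so it is 0, 1, or 2.

```python
def rounding_increment(mode: str, sign: int, L: int, G: int, R: int, S: int) -> int:
    """Equation (3); all inputs are 0 or 1, sign is 1 for a negative result."""
    inexact = G | R | S
    return {"RZ": 0,
            "RM": sign & inexact,
            "RP": (1 - sign) & inexact,
            "RN": (G & (R | S)) | (L & G & (1 - (R | S)))}[mode]


def correction(s_set: int, c_set: int, p: int, mode: str, sign: int) -> int:
    """Correction value (0, 1 or 2) to add at bit p of the upper-part sum.

    p = 52 for the A data type and 53 for the B data type.
    """
    mask = (1 << p) - 1
    low = (s_set & mask) + (c_set & mask)  # stands in for the flag logic of FIG. 8
    carry = low >> p                       # carry from bits p-1..0 into bit p
    low &= mask
    L = ((s_set >> p) ^ (c_set >> p) ^ carry) & 1  # bit p of the full sum
    G = (low >> (p - 1)) & 1
    R = (low >> (p - 2)) & 1
    S = int(low & ((1 << (p - 2)) - 1) != 0)
    return carry + rounding_increment(mode, sign, L, G, R, S)
```

For instance, take s_set = 2**52 - 1 and c_set = 2**51 + 1. The low part carries into bit 52 and leaves an exact tie with L = 1, so in RN mode the correction is 2.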

The correction value generation circuit 116 of FIG. 1 takes in bits 53 to 0, the lower half of each of the carry set 231 and the sum set 232. It first executes process 1161 of FIG. 8 to generate the information for rounding.

Here, flags a, b, and c indicate whether certain bits of the sum of the two sets 231 and 232 are all 0: bits 50 to 0, bits 51 to 0, and bits 52 to 0, respectively. They are used to obtain the bits S1 and S2 for the rounding processing. S1 and S2 are logical ORs of bits of the addition result, and the negation of such an OR is equivalent to those bits being all 0. As shown in Japanese Patent Laid-Open No. 63-208938, this all-zero detection can be done with a simple circuit, and its processing time is short.
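Below is one well-known way to test whether the low k bits of a sum A + B are all 0 without propagating carries. If every lower sum bit is 0, the carry into bit i must equal a(i-1) OR b(i-1). The test therefore reduces to comparing that value with a(i) XOR b(i) at every position in parallel. We do not claim this is the circuit of JP 63-208938; it only illustrates why such detection is cheap. The sticky bit S1, for example, would be `not sum_is_zero_mod(s_set, c_set, 50)`, where `sum_is_zero_mod` is a hypothetical name.

```python
def sum_is_zero_mod(a: int, b: int, k: int) -> bool:
    """True iff bits k-1..0 of a + b are all 0, tested without carry propagation.

    If every lower sum bit is 0, the carry into bit i equals a(i-1) | b(i-1)
    (and the carry into bit 0 is 0), so each bit i must satisfy
    a(i) ^ b(i) == a(i-1) | b(i-1). All positions are checked at once.
    """
    mask = (1 << k) - 1
    return (((a ^ b) ^ ((a | b) << 1)) & mask) == 0
```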

Next, flags d to f indicate the carry output from bits 51 to 0, 52 to 0, and 50 to 0 of the addition result, respectively. Flags g to l are the values at the individual bit positions shown in the figure, and m is the sign signal. These can also be obtained easily.

Once the above 13 data items are obtained, correction generation process 1162 is executed for the selected rounding mode. Its content is shown in FIG. 9. The correction value for each mode, as described with equation (3), is obtained from the data of FIG. 8. These correction values 233 are input to the rounding circuit 115.

FIG. 10 shows the configuration of the rounding circuit 115. Its inputs are the upper halves (bits 105 to 52) of the sum set 232 and the carry set 231 output by the partial product operation circuit 201, together with the above correction values 233. These are added in the three-input AU circuits 2101 and 2102, one for the A data type and one for the B data type. Each AU circuit thus outputs the rounded mantissa product for its data type, as outputs 234 and 235. In addition, bit 105 of the output of the three-input AU circuit 2101 is output as data 273, the select signal of the normalization circuit 114. The normalization circuit 114 is a selector that switches between the target data according to the select signal 273.

If the select signal 273 is 0, the A data type is output; if it is 1, the B data type is output. In this embodiment, calculating the correction values of FIGS. 8 and 9 is simple logic processing that takes only a very short time. The rounding circuit 115 can therefore execute the addition of the two sets 232 and 231 and the addition for rounding at the same time.

Next, several embodiments are described that speed up the partial product operation with full adders having five or more inputs. Seven-input full adders are used below; a specific configuration example is shown in FIG. 11. It is built from four three-input full adders. It adds seven inputs 1001 to 1007 to produce three outputs: a sum s, a first carry c1, and a second carry c2. This input/output relationship could be given as a truth table like the three-input one (Table 1). With 2^7 = 128 input combinations, however, the table would be long, so it is omitted. In short, let n be the number of "1"s among the seven inputs (each "0" or "1"); then n expressed in binary is (c2, c1, s).

With seven inputs the count ranges from 0 to 7 in decimal. Three bits are therefore needed in binary, and two carries, C2 and C1, are required as outputs. In the circuit configuration of FIG. 11, the delay corresponds to three stages of 3-input full adders and the area to four 3-input full adders. The 7-input full adder need not be built as in FIG. 11; it may instead be designed directly as an optimized circuit using Boolean algebra.
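The counting behavior and the three-level, four-adder cost can be checked with a short Python sketch. The wiring is one arrangement consistent with the figures stated above (three full-adder delays, four full adders). It is an assumption, since FIG. 11 itself is not reproduced here, and the function names are hypothetical.

```python
# Sketch: a 7-input full adder (7:3 counter) built from four 3-input full adders.
# The wiring is an assumed arrangement with three FA levels and four FAs;
# FIG. 11 may be wired differently. Function names are hypothetical.


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """3-input full adder on single bits: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_7_3(x: list[int]) -> tuple[int, int, int]:
    """Add seven input bits; returns (S, C1, C2) with count == S + 2*C1 + 4*C2."""
    if len(x) != 7:
        raise ValueError("expected exactly seven input bits")
    # Level 1: two FAs in parallel on inputs 1-3 and 4-6 (all weight 1).
    s1, k1 = full_adder(x[0], x[1], x[2])
    s2, k2 = full_adder(x[3], x[4], x[5])
    # Level 2: the two weight-1 sums and input 7 give S and a third weight-2 carry.
    s, k3 = full_adder(s1, s2, x[6])
    # Level 3: the three weight-2 carries give C1 (weight 2) and C2 (weight 4).
    c1, c2 = full_adder(k1, k2, k3)
    return s, c1, c2
```

Feeding all 128 input patterns through `counter_7_3` confirms that (C2, C1, S) is the binary count of 1s. For example, any six 1s give (S, C1, C2) = (0, 1, 1).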

FIG. 12 shows an embodiment of a CSA-type partial product calculation circuit using this 7-input full adder (FA). It extracts the portion that adds partial products [i] to [i+7] at digits n+2, n+1, and n (digits correspond to bit positions). Take the 7-input full adder at the lower left of the figure as an example. Inputs 1005 to 1007 receive three signals: the sum output S of the adder directly above, the carry output C1 of the adder to its right, and the carry output C2 of the adder one position further right. These are the sum and carries obtained by adding partial products [i] to [i+3].

Inputs 1001 to 1004, on the other hand, receive the n-th digit of partial products [i+4] to [i+7] directly. These are the circles in the figure; each is the AND of the corresponding multiplier and multiplicand bits.

This connection pattern is the same for every full adder and is regular, except at the periphery. Here, of the seven inputs, the outputs of the upper stage are applied to inputs 1005 to 1007 and the newly formed partial products to inputs 1001 to 1004. Logically, however, the seven inputs are symmetric, so any signal may be connected to any input. In general, the upper-stage outputs lie on the critical path, so it is preferable to connect them to the three full-adder inputs that have the shortest delay.

Next, the operation of this embodiment is described. FIG. 13 shows how the n-th digit is computed in a 53-bit × 53-bit partial product operation; only the n-th digit is extracted from the connections of FIG. 12. Assume first that 27 partial products have been generated from the 53-bit data by the Booth algorithm described with reference to FIG. 5.

In this figure, the circled numbers denote the partial products at the n-th digit. These 27 partial products are summed with 7-input full adders as follows:

- The first-stage 7-input full adder computes the sum of partial products 1 to 7.
- The second-stage adder adds partial products 8 to 11 to the outputs of the first stage.
- Each subsequent stage adds four more partial products.

In total, six stages of 7-input full adders yield the value of the n-th digit (n may be any digit). The sixth stage leaves three results for each digit, and the well-known 3-input Wallace method reduces them to two outputs. These become the two input data 232 and 231 to the correction value generation circuit 116 and the rounding circuit 115 of FIG. 1.

FIG. 12 simply shows the partial product inputs as circles. When Booth's algorithm is used, the partial-product generation circuitry shown in FIG. 5 must also be provided for the 7-input full adders. Details are shown in FIGS. 14 and 15. FIG. 14 shows the configuration for the full adders 901 and 902 of FIG. 13, all seven of whose inputs are partial products.

Data XP is the mantissa of input data X put in the form of formula (1). Each selector SEL generates five values from a pair of bits taken from eight consecutive bits X(j-1) to X(j+6) of XP; this is the same selector as in FIG. 5. The Booth decoder outputs B(i) to B(j+6) each select one of these values, and the selected values become inputs 1001 to 1007 of the 7-input full adder 1000. FIG. 15 shows the configuration used for the FA operations from the second row of FIG. 13 onward. It is the configuration of FIG. 14 with the selectors removed from the inputs that receive the carries C1 and C2 and the sum S.
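The five-value selection can be illustrated with standard radix-4 (modified) Booth recoding. This recoding turns an unsigned 53-bit multiplier into 27 digits in {-2, -1, 0, +1, +2}, each choosing 0, ±X or ±2X. We assume this is the recoding of FIG. 5, which lies outside this excerpt. The Python sketch also simplifies negative partial products to signed integers, whereas hardware would use an inverted multiplicand plus a correction bit. Function names are hypothetical.

```python
# Sketch: radix-4 (modified) Booth recoding of an unsigned multiplier.
# Assumptions: this matches FIG. 5 (not in this excerpt); negative partial
# products are plain signed Python ints; function names are hypothetical.


def booth_radix4_digits(y: int, n_bits: int = 53) -> list[int]:
    """Digits d_i in {-2,...,2} with y == sum(d_i * 4**i) for 0 <= y < 2**n_bits."""
    digits = []
    prev = 0                                # implicit bit y[-1] = 0
    for i in range(0, n_bits + 1, 2):       # one extra bit: unsigned y has a 0 sign bit
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)  # examines bits y[i+1], y[i], y[i-1]
        prev = b1
    return digits


def booth_partial_products(x: int, y: int, n_bits: int = 53) -> list[int]:
    """Each digit selects one of {0, +-X, +-2X}, shifted to its weight 4**i."""
    return [(d * x) << (2 * i) for i, d in enumerate(booth_radix4_digits(y, n_bits))]
```

With `n_bits = 53`, the loop visits bit pairs starting at 0, 2, ..., 52. This yields exactly the 27 partial products assumed in FIG. 13, and their plain sum equals `x * y`.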

According to this embodiment, the partial product operation on 27 data items can be performed within the delay of six stages of 7-input adders plus a 3-input Wallace circuit.
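The six-stage schedule of FIG. 13 can be modeled at the row level in Python:

- Each partial product is an integer already shifted to its weight, so one bitwise operation acts on every digit at once.
- A call to `compress_7_3` therefore stands for a whole row of 7-input adders.
- The C1 row lands one digit higher and the C2 row two digits higher, matching the carries taken from lower digits in FIG. 12.

The requirement of 7 + 4k rows, the modeling of the final 3-input Wallace step as one 3:2 level, and all names are assumptions of this sketch.

```python
# Sketch: FIG. 13 schedule on whole rows. Rows are Python ints already shifted
# to their weights; bitwise ops act on all digit positions in parallel.
# Assumptions: 7 + 4k rows, final 3-input Wallace step = one 3:2 level,
# hypothetical names.


def fa_rows(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compression of three rows: a + b + c == sum_row + carry_row."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def compress_7_3(rows: list[int]) -> tuple[int, int, int]:
    """Seven rows -> S, C1, C2 rows; C1 bits move up one digit, C2 bits two."""
    s1, k1 = fa_rows(rows[0], rows[1], rows[2])
    s2, k2 = fa_rows(rows[3], rows[4], rows[5])
    s, k3 = fa_rows(s1, s2, rows[6])
    c1, c2 = fa_rows(k1, k2, k3)
    return s, c1, c2


def reduce_fig13(partials: list[int]) -> tuple[int, int, int]:
    """Return (sum_set, carry_set, number of 7-input stages)."""
    if len(partials) < 7 or (len(partials) - 7) % 4:
        raise ValueError("this sketch expects 7 + 4k partial products")
    s, c1, c2 = compress_7_3(partials[:7])  # stage 1: partial products 1-7
    stages = 1
    for i in range(7, len(partials), 4):    # later stages: 4 new rows + S, C1, C2
        s, c1, c2 = compress_7_3(partials[i:i + 4] + [s, c1, c2])
        stages += 1
    sum_set, carry_set = fa_rows(s, c1, c2)  # 3-input Wallace step: 3 rows -> 2
    return sum_set, carry_set, stages
```

With 27 rows (7 + 4 × 5), `reduce_fig13` reports 6 stages. In every case `sum_set + carry_set` equals the sum of all rows, including negative rows such as Booth partial products.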

If the 7-input full adder is the circuit of FIG. 11, this delay corresponds to 19 stages of 3-input full adders. That is faster than the 27 stages obtained when the conventional Booth method and CSA method are combined.
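The stage counts can be reproduced with a simple delay model. The model is our assumption rather than a method stated in the patent:

- a 7-input adder wired as in FIG. 11 costs three full-adder delays;
- a Wallace tree costs one full-adder delay per 3:2 level, where each level turns every complete group of three rows into two and passes leftover rows through.

The model ignores wiring and the final carry-propagate addition, and the names are hypothetical.

```python
# Sketch of a delay model in units of one 3-input full adder (FA).
# Assumption: the model itself and all names are illustrative, not the patent's.

FA_LEVELS_PER_7_INPUT_ADDER = 3  # FIG. 11 wiring: three levels of 3-input FAs


def wallace_levels(rows: int) -> int:
    """3:2 levels needed to reduce `rows` operands to two."""
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3  # groups of three -> two, leftovers pass
        levels += 1
    return levels


def total_delay(stages_7_input: int, wallace_inputs: int) -> int:
    """Chained 7-input adder stages followed by one Wallace tree."""
    return stages_7_input * FA_LEVELS_PER_7_INPUT_ADDER + wallace_levels(wallace_inputs)
```

`total_delay(6, 3)` gives the 19 stages above. The same model also gives the figures stated for the later embodiments:

- 12 for the two-way scheme of FIG. 17 (three 7-input stages, 6-input Wallace);
- 11 for the variant ending in an 11-input Wallace circuit;
- 8 for the four-way scheme of FIG. 18 (one 7-input stage, 12-input Wallace).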

FIG. 16 shows another embodiment of a partial product calculation circuit using 7-input full adders; only the partial product operation for digit n is shown. The Booth decoder and the circuits of FIGS. 14 and 15 convert the 53-bit input into 27 partial products, denoted by the circled numbers 1 to 27. Two series of 7-input full adders then add these in parallel.

The 7-input full adders 901 and 902 add partial products 1 to 7 and 8 to 14, respectively. The following stages work similarly:

- The 7-input full adder 903 adds partial products 15 to 18, the two carries from other digits, and the sum S of full adder 901.
- The 7-input full adder 904 adds partial products 19 to 22, the two carries from other digits, and the sum S of full adder 902.
- Subsequent adders continue in the same way.

Here, the carries C1 and C2 from other digits are the carry outputs of the digits one and two positions lower, respectively.

Finally, a Wallace circuit (a 6-input full adder) generates the carry set 231 and the sum set 232.

FIG. 17 illustrates the operation of the above embodiment, in which the sum is computed in parallel in two systems. The steps are:

1. Partial products 1 to 7 and 8 to 14 are computed simultaneously.
2. Partial products 15 to 18 and 19 to 22 are added simultaneously, four at a time.
3. Partial products 23 and 24 and 25 to 27 are added simultaneously. At this stage there are not four per group, so 0 is input in the remaining positions.
4. The Wallace circuit combines the results and outputs them as sets 231 and 232 to the rounding circuit 115 and the correction value generation circuit 116 of FIG. 1.

The delay of this operation is three stages of 7-input full adders plus one stage of a 6-input Wallace circuit. If each of these adders is built as in FIG. 11, this corresponds to 12 stages of 3-input full adders, a further increase in speed.
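The following Python sketch shows one way partial products 1 to N can be assigned to m parallel chains of 7-input adders. The first stage of each chain takes seven partial products. Later stages take four, because three inputs carry S, C1 and C2 from the stage above, and unused inputs are fed 0. The greedy in-order filling and the helper name are assumptions of this sketch.

```python
# Sketch: assigning partial products 1..N to m parallel chains of 7-input adders.
# Assumptions: groups are filled greedily in order; the helper name is hypothetical.


def parallel_schedule(n_pp: int, m: int) -> list[list[list[int]]]:
    """schedule[stage][group] lists partial-product numbers fed to that adder (0 = zero input)."""
    schedule: list[list[list[int]]] = []
    next_pp = 1
    while next_pp <= n_pp:
        width = 7 if not schedule else 4  # later stages reserve 3 inputs for S, C1, C2
        stage = []
        for _ in range(m):
            take = list(range(next_pp, min(next_pp + width, n_pp + 1)))
            next_pp += len(take)
            stage.append(take + [0] * (width - len(take)))
        schedule.append(stage)
    return schedule
```

With 27 partial products, `parallel_schedule(27, 1)` has 6 stages, as in FIG. 13, and `parallel_schedule(27, 2)` has 3. Each chain leaves three rows, so the final Wallace circuit has 3m inputs, which is 6 for m = 2. Greedy filling makes the last stage for m = 2 take partial products 23 to 26 and 27. FIG. 17 instead splits them as 23 to 24 and 25 to 27; the number of stages, which sets the delay, is the same.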

In this embodiment, the 7-input full adders performing the third step have too few inputs, so 0 is input there. Alternatively, the CSA-type 7-input additions may be carried out only up to the second step, and partial products 23 to 27 may be added by the Wallace method. In that case an 11-input Wallace circuit is required. The total delay is then two stages of 7-input full adders plus one 11-input Wallace circuit, which corresponds to 11 stages of 3-input full adders.

Thus, when the remaining partial products do not form a group of exactly four, they can be computed by the Wallace circuit.

Next, an embodiment with a still higher degree of parallelism is described.

FIG. 18 illustrates the operation of a method that adds the partial products four ways in parallel using 7-input full adders. First, partial products 1 to 7, 8 to 14, 15 to 21, and 22 to 27 (with 0 input for the remainder) are computed simultaneously. This leaves 12 outputs for each digit (three from each of the four parallel adders), and the well-known 12-input Wallace method reduces them to two outputs. These outputs are collected for each digit and sent to the rounding circuit 115 and the other circuits.

FIG. 19(a) shows a circuit configuration that implements the method of FIG. 18. Together with FIGS. 19(b) and 19(c) below, it shows the entire partial product calculation circuit 201 and schematically shows the placement of each circuit when the circuit is integrated on an LSI.

Input data X and Y are both 53 bits wide, and the Booth decoder converts input data Y into 27 outputs.

In the figure, U7 is a unit circuit that computes the partial products and sums shown in FIG. 14. The four units arranged vertically compute, simultaneously and in a single step, the sum of the partial products for one digit shown in FIG. 18. Their results are combined by the 12-input Wallace circuit W12. The lengths of the arrows from each unit circuit U7 to the Wallace circuit W12 qualitatively represent the actual wiring lengths. If necessary, the drive capability of the output driver of each unit circuit U7 may therefore be adjusted to its wiring length.

In this embodiment, the delay is that of one stage of 7-input full adders plus a 12-input Wallace circuit. This corresponds to 8 stages of 3-input full adders, a further increase in speed.

FIG. 19(b) shows an embodiment in which the 12-input Wallace circuit W12 of FIG. 19(a) is divided into two 6-input Wallace circuits W6 and a 4-input Wallace circuit W4. The circuits are connected as follows:

- The upper 6-input Wallace circuit W6 adds the outputs of the upper two unit circuits U7.
- The lower 6-input Wallace circuit W6 adds the outputs of the lower two unit circuits U7.
- The 4-input Wallace circuit W4 combines the outputs of the two 6-input Wallace circuits W6.

As the arrows in the figure show, only the output driver of the upper 6-input Wallace circuit then needs to be a high-drive device. This reduces the driver area and also the peak current.
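The patent presents the split of FIG. 19(b) for its wiring and driver benefits. Under a simple level-counting model, the split also costs no extra 3:2 levels compared with the flat W12 of FIG. 19(a): both take five levels. That is a property of the model, not a claim made in the source. The Python sketch below checks it on whole rows. Rows are integers, each level compresses every complete group of three rows into two, and the names are hypothetical.

```python
# Sketch: flat 12-input Wallace (W12) vs. two W6 feeding one W4, on whole rows.
# Assumptions: rows are Python ints, one 3:2 level per pass, hypothetical names.


def fa_rows(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compression of three rows: a + b + c == sum_row + carry_row."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace(rows: list[int]) -> tuple[list[int], int]:
    """Reduce rows to two with 3:2 levels; returns (remaining rows, levels used)."""
    levels = 0
    while len(rows) > 2:
        nxt: list[int] = []
        full = len(rows) - len(rows) % 3
        for i in range(0, full, 3):
            nxt.extend(fa_rows(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])  # leftover rows pass through this level
        rows, levels = nxt, levels + 1
    return rows, levels


def split_w6_w6_w4(rows12: list[int]) -> tuple[list[int], int]:
    """Upper and lower W6 run in parallel, then W4 merges their four outputs."""
    upper, lu = wallace(rows12[:6])
    lower, ll = wallace(rows12[6:])
    out, l4 = wallace(upper + lower)
    return out, max(lu, ll) + l4
```

Each W6 needs three levels (6 to 4 to 3 to 2 rows) and W4 needs two. The total of five matches the five levels of a flat 12-input tree, and both arrangements preserve the sum of the rows.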

FIG. 19(c) modifies the configuration of FIG. 19(b) by moving the two 6-input Wallace circuits W6 so that the unit circuits U7 feeding them have equal output loads. This makes the circuit easier to design.

Various embodiments have been described above. Speed-up by pipelining has not been discussed, but pipelining is possible in any of the embodiments and can increase the speed further. Also, the data length has been taken as 64 bits, but the present invention clearly applies to other lengths as well.

Furthermore, 7-input full adders were used to speed up the additions in the partial product calculation circuit for the following reason. With a 3-bit output (a sum and two carries), the maximum number of inputs is 7. At each intermediate stage, four partial products can therefore be added at once to the previous results (two carries and a sum). This is more efficient than a 4- to 6-input full adder with the same 3-bit output. A full adder with an n-bit output (a sum and n-1 carries) can accept up to 2^n - 1 input bits, and the present invention can easily be extended to such full adders.
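The generalization in the preceding paragraph can be checked numerically. A full adder with n outputs counts up to 2^n - 1 input bits. Once its n outputs are fed back to the next stage, 2^n - 1 - n inputs remain for new partial products, which matches the input split stated in claim 4. The Python sketch below is behavioral (it simply counts 1s), and its function names are hypothetical.

```python
# Sketch: capacity of an n-output full adder (one sum and n-1 carries).
# Behavioral model only; function names are hypothetical.


def counter(bits: list[int], n_out: int) -> list[int]:
    """n-output full adder: returns [S, C1, ..., C(n-1)], the count of 1s in binary (LSB first)."""
    if len(bits) > 2 ** n_out - 1:
        raise ValueError("too many inputs for this many outputs")
    count = sum(bits)
    return [(count >> k) & 1 for k in range(n_out)]


def inputs_and_new_partials(n_out: int) -> tuple[int, int]:
    """(maximum inputs, new partial products absorbed per intermediate stage)."""
    max_inputs = 2 ** n_out - 1
    return max_inputs, max_inputs - n_out
```

For n = 2 (the ordinary 3-input full adder) only one new partial product enters per stage. For n = 3 the figure is four. With a fixed 3-bit output, a 4-, 5- or 6-input adder would absorb only 1, 2 or 3 new partial products per stage, which is the efficiency argument made above.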

[Effect of the invention]

According to the present invention, floating-point multiplication can be performed at high speed, for three reasons:

- Rounding can be performed simultaneously with part of the arithmetic that forms the product of the mantissas.
- The exponent computation and the overflow/underflow check on the multiplication result can be carried out in advance.
- Full adders with five or more inputs allow several partial product additions to be executed in parallel.

[Brief explanation of drawings]

FIG. 1 is a block diagram of an embodiment of the present invention. FIGS. 2 to 5 show the configuration and operation of a conventional floating-point multiplication circuit. FIG. 6 shows an example configuration of the exponent calculation circuit of FIG. 1. FIGS. 7 to 9 illustrate the operation of the correction value generation circuit of FIG. 1. FIG. 10 shows an example configuration of the rounding circuit of FIG. 1. FIG. 11 shows an example configuration of a 7-input full adder. FIGS. 12 to 19 show embodiments in which 7-input full adders are used to speed up the partial product calculation circuit of FIG. 1, together with their operation.

100: floating-point multiplication circuit; 110: exponent calculation circuit; 113: selector; 114: normalization circuit; 115: rounding circuit; 116: correction value generation circuit; 201: partial product calculation circuit.

Claims

1. A floating-point multiplication device that takes two floating-point numbers as input data and comprises:

- a sign operation unit that calculates the sign of the output data from the signs of the two input data;
- a mantissa operation unit that calculates the product of the mantissas of the two input data and normalizes and rounds the result to obtain the mantissa of the output data; and
- an exponent operation unit that adds and subtracts the exponents of the two input data and performs a correction corresponding to the result of the normalization and rounding.

The exponent operation unit comprises:

- first means for calculating in advance, from the result of the addition/subtraction, the correction values that may be needed depending on the result of the normalization and rounding;
- second means for detecting in advance whether the outputs of the first means and the result of the addition/subtraction overflow or underflow; and
- third means which operates when the mantissa operation unit outputs the result of the normalization and rounding. According to that result, the third means selects the required value from the addition/subtraction result or the precalculated correction values as the exponent of the output data, and takes out the corresponding previously detected determination of whether an overflow or underflow occurs.

2. The floating-point multiplication device according to claim 1, wherein the multiplication is interrupted in two cases:

- when the second means detects an overflow or underflow in the result of the addition/subtraction of the exponents, the multiplication is interrupted at the time of detection;
- when the second means has detected an overflow or underflow in a correction value calculated by the first means and the third means selects that correction value, the multiplication is interrupted.

3. The floating-point multiplication device according to claim 1, wherein the mantissa operation unit comprises:

- partial product calculation means for calculating, as partial products, the product of each bit or group of bits of one input datum with the whole of the other input datum;
- partial sum calculation means that adds the partial products by computing, at each bit position, the sum of the bits of the data being added as a sum bit and a carry bit for that position. When the next datum is added, this means adds the sum bits and carry bits so obtained to the corresponding bits of the next datum, and it repeats this process;
- correction value generating means for generating correction data from the set of sum bits and the set of carry bits obtained when all the partial products have been added by the partial sum calculation means;
- rounding means for producing a rounded mantissa product by adding the correction value generated by the correction value generating means to the set of sum bits and the set of carry bits; and
- normalization means for normalizing the output of the rounding means to generate the mantissa of the output data.

4. The floating-point multiplication device according to claim 3, wherein, n being an integer of 3 or more, the partial sum calculation means is composed of full adders, each having 2^n - 1 input bits and outputting one sum bit and n - 1 carry bits. The inputs of each full adder are either:

- the bits of the same digit of 2^n - 1 partial products; or
- the bits of the same digit of 2^n - n - 1 partial products, together with the sum bit output by another full adder for the corresponding digit and the carry bits output for the corresponding bit by n - 1 other full adders.

5. The floating-point multiplication device according to claim 3 or 4, wherein, m being an integer of 2 or more:

- the partial products to be added are divided into m groups;
- a partial sum is calculated for each group by partial sum calculation means provided for that group; and
- combining means then calculates the sums of the m sets of sum bits and sets of carry bits, thereby generating the set of sum bits and the set of carry bits corresponding to the addition of all the partial products.