JP2765516B2

JP2765516B2 - Multiply-accumulate unit

Info

Publication number: JP2765516B2
Application number: JP7153915A
Authority: JP
Inventors: 恒平撫原
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1995-05-29
Filing date: 1995-05-29
Publication date: 1998-06-18
Anticipated expiration: 2013-06-18
Also published as: JPH08328828A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a multiply-accumulate unit, and more particularly to a multiply-accumulate unit that has a CSA (carry-save adder) tree and performs fixed-point multiply-accumulate operations.

[0002]

[Prior Art] General-purpose microprocessors used in personal computers, engineering workstations, and the like have long included a signed integer multiply instruction in their instruction sets.

[0003] In recent years, it has become necessary to perform so-called digital signal processing at high speed, for example to compress and decompress speech, audio, or images.

[0004] Accordingly, in recent microprocessors it is becoming common to add, on top of the existing instruction set described above (one that includes a signed integer multiply instruction), a signed fixed-point multiply-accumulate instruction, which performs the signed fixed-point multiply-accumulate operation used frequently in digital signal processing.

[0005] Adding a signed fixed-point multiply-accumulate instruction to a microprocessor's instruction set has the following advantage. Processing that previously required two instructions, a signed multiply instruction and a signed add instruction, can be performed by the signed fixed-point multiply-accumulate instruction alone (that is, by one instruction), so digital signal processing can be sped up considerably.

[0006] The present invention concerns a multiply-accumulate unit (one capable of signed fixed-point multiply-accumulate operations) to be built into such a microprocessor (a microprocessor with enhanced digital signal processing performance).

[0007] The multiply-accumulate unit addressed by the present invention receives a 33-bit signed multiplicand X, a 32-bit signed multiplier Y, and a 32-bit addend Z. The multiplicand X is either an arbitrary 32-bit signed number sign-extended to 33 bits or an arbitrary 33-bit positive number.

[0008] A multiply-accumulate unit of this kind must perform processing steps (1) and (2) below (see FIG. 4). The fixed-point representation used by the multiply-accumulate unit addressed by the present invention fixes the binary point at the most-significant-bit side.

[0009] (1) Perform a signed multiplication of the multiplicand X by the multiplier Y to obtain a 65-bit intermediate result.

[0010] (2) Perform a signed addition of the addend Z, sign-extended by 1 bit to 33 bits, to the upper 33 bits of that intermediate result, and output the 65-bit result R.
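As a plain arithmetic reference for the two processing steps above, the following Python sketch computes the same 65-bit result without any Booth or CSA structure. The names `sign_extend` and `mac_reference` are illustrative and do not appear in the patent. Placing Z at bit 32 is an interpretation of the statement that Z is added to the upper 33 bits of the 65-bit intermediate result.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac_reference(x, y, z):
    """Return the 65-bit pattern of X*Y + Z*2**32 (hypothetical reference model)."""
    x = sign_extend(x, 33)           # 33-bit signed multiplicand
    y = sign_extend(y, 32)           # 32-bit signed multiplier
    z = sign_extend(z, 32)           # 32-bit signed addend
    intermediate = x * y             # step 1: fits in 65 signed bits
    result = intermediate + (z << 32)  # step 2: Z aligned with bits 32..64
    return result & ((1 << 65) - 1)


# Reading the pattern back as a signed 65-bit number:
r = sign_extend(mac_reference(-3, 5, -1), 65)
```

Keeping the result as a 65-bit pattern mirrors the fixed output width of the unit; `sign_extend(..., 65)` recovers the signed value when needed.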

[0011] The usual way to perform the multiplication in a multiply-accumulate unit of this kind is to generate partial products with a Booth decoder from a portion of the multiplier Y and the whole multiplicand X, and then to add the generated partial products with a CSA tree.

[0012] Methods of constructing a Booth decoder are described in Neil H. E. Weste et al., "Principles of CMOS VLSI Design: A Systems Perspective, Second Edition", Addison-Wesley Publishing Company, ISBN 0-201-53376-6, pp. 547-555.

[0013] The basic idea behind the Booth decoder is explained here (the equations in this explanation, as well as FIGS. 5 and 6, are also cited in the description of the embodiment of the present invention given later).

[0014] Suppose we want the product P (P = X × Y) of a signed 33-bit multiplicand X and a signed 33-bit multiplier Y.

[0015] Writing the bits of the multiplicand X and the multiplier Y as x_i and y_i respectively (the i-th bit, counting the least significant bit as bit 0; i is 0 or a positive integer), X and Y are expressed by the following equations.

[0016]

(Equation 1)

[0017] When the Booth decoder divides the multiplier Y into groups of 3 bits as follows, Y can be rewritten as shown below.

[0018]

(Equation 2)

[0019] Y_j, defined by equation (4) (j is -1, 0, or a positive integer), takes one of the values -2, -1, 0, 1, and 2 depending on the combination of three bits of the multiplier Y (including y_-1). That is, as FIG. 5 shows, even when every combination (of 0s and 1s) of the three bits y_(2j+1), y_2j, and y_(2j-1) is considered, Y_j takes only one of these five values.
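The recoding can be tried out with a short Python sketch. Because the equation images are not reproduced here, the sketch assumes that equation (4) has the standard radix-4 Booth form Y_j = -2·y_(2j+1) + y_2j + y_(2j-1) with y_-1 = 0; the function name `booth_digits` is illustrative.

```python
def booth_digits(y, bits):
    """Radix-4 Booth digits Y_j of a `bits`-bit two's-complement y (bits even).

    Assumes Y_j = -2*y_(2j+1) + y_2j + y_(2j-1) with y_-1 = 0.
    """
    def bit(i):
        if i < 0:
            return 0              # the extra bit y_-1 below the LSB
        return (y >> i) & 1       # >> is arithmetic for negative Python ints

    return [-2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1)
            for j in range(bits // 2)]


digits = booth_digits(6, 4)       # → [-2, 2], since -2 + 2*4 == 6
```

Each digit is drawn from {-2, -1, 0, 1, 2}, and weighting digit j by 4^j and summing gives back y, which is what lets the multiplier be processed two bits at a time.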

[0020] The desired product P is expressed by the following equation.

[0021]

(Equation 3)

[0022] Here, let

P_j = X · Y_j   (6)

Then P_j can be generated for each value of Y_j by the method (generation rule) shown in FIG. 6. That is, when Y_j is negative, the two's complement of the multiplicand X is obtained by inverting each bit x_i of X and adding c_j = 1.
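The invert-and-add-one rule can be illustrated as follows. FIG. 6 is not reproduced here, so this is only a sketch under the assumption that a negative Y_j is handled by inverting the bits of X (or of 2X) within a 34-bit field and carrying a separate correction bit c_j = 1. The names `partial_product`, `WIDTH`, and `MASK` are illustrative.

```python
WIDTH = 34                      # assumed field width: room for 2 * (33-bit X)
MASK = (1 << WIDTH) - 1


def partial_product(x, yj):
    """Return (bits, c_j) such that bits + c_j == x * yj modulo 2**WIDTH."""
    magnitude = (x * abs(yj)) & MASK    # 0, X, or 2X (X shifted left by one)
    if yj < 0:
        return (~magnitude) & MASK, 1   # one's complement plus deferred +1
    return magnitude, 0


bits, cj = partial_product(5, -2)       # bits is ~10 in 34 bits, cj is 1
```

Deferring the +1 as a separate bit c_j means the partial product generator needs no carry chain of its own; the correction is summed later together with the other terms.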

[0023] Written out bit by bit, P_j takes the following form (see FIG. 6 for P_ji and c_j in the equation).

[0024]

(Equation 4)

[0025] Because the multiplicand X is 33 bits long and Y_j is -2, -1, 0, 1, or 2, P_j can be up to 34 bits long.

[0026] Next, the idea of adding the addend Z to the product P obtained above to get the multiply-accumulate result R is explained.

[0027] Writing each bit of the signed fixed-point 32-bit addend Z as z_i and performing a 1-bit sign extension by setting z₃₂ = z₃₁, Z is expressed by the following equation.

[0028]

(Equation 5)

[0029] Therefore, from equations (5), (7), and (8), the multiply-accumulate result R is expressed by the following equation.

[0030]

(Equation 6)

[0031] The negative part N in equation (9) is converted to two's complement as shown in the following equations so that an adder can handle it. When the multiplicand X and the multiplier Y are both the most negative 32-bit value and the addend Z is the largest positive value, the multiply-accumulate result R reaches its maximum length of 65 bits, so this computation is carried out at a width of 65 bits.

[0032]

(Equation 7)

[0033] Using this N, the multiply-accumulate result R is expressed by the following equation.

[0034]

(Equation 8)

[0035] Equation (13) shows that the multiply-accumulate result R can be obtained by using the Booth decoder to generate every term in equation (13) and adding them with a CSA tree.

[0036] Next, the configuration and operation of a conventional multiply-accumulate unit (fixed-point multiply-accumulate unit) based on this idea are described with reference to FIGS. 7 to 9.

[0037] The multiply-accumulate unit shown in FIG. 7 is one example of this kind of unit. It comprises a first selector 101, a second selector 102, a third selector 103, a fourth selector 104, a Booth decoder 105, an 11-input CSA tree 106, a first pipeline register 107, a second pipeline register 108, a carry-propagate adder 109, and a sign extender 110.

[0038] The multiply-accumulate unit configured as in FIG. 7 basically adopts the idea described above (obtaining the result R by equation (13)), but reduces the amount of hardware by splitting the generation and addition of partial products into several passes. A technique of the same kind is disclosed in Japanese Unexamined Patent Publication No. 4-205624 (a serial-parallel multiplier). However, the serial-parallel multiplier of that publication holds, as the intermediate result of partial-product addition, a single value obtained after passing through a carry-propagate adder that follows the CSA, whereas the configuration in FIG. 7 holds the pair of half sum and half carry output by the CSA as the intermediate result, which speeds up the computation.

[0039] Next, the operation of the multiply-accumulate unit shown in FIG. 7 is described with reference to FIGS. 8 and 9.

[0040] To reduce the amount of hardware, the multiply-accumulate unit shown in FIG. 7 performs the generation and addition of partial products in two passes, a "first operation step" and a "second operation step".

[0041] First, the operation during the first operation step is described.

[0042] In the first operation step, the multiplicand X is multiplied by the lower 16 bits of the multiplier Y, producing the result of the first half of the computation (the intermediate result).

[0043] The first selector 101 refers to the operation-step switching bit (1-bit information indicating whether the first or the second operation step is to be performed; here it indicates the first operation step) and selects and outputs the lower 16 bits of the multiplier Y.

[0044] The second selector 102 refers to the operation-step switching bit, which indicates the first operation step, and selects and outputs the second constant. Here, the second constant is the value representing 1·2³² (the term 1·2^(32+2j) for j = 0) and c_n (1·2³¹) in equation (13) (see FIG. 8).

[0045] The third selector 103 refers to the operation-step switching bit, which indicates the first operation step, and selects and outputs a constant whose bits are all 0 (the constant 0).

[0046] In addition, the fourth selector 104 refers to the operation-step switching bit, which indicates the first operation step, and selects and outputs the first constant. Here, the first constant is the value representing c_z (1·2³¹), c₁₅·2³⁰ (the term c_j·2^2j for j = 15), and c₇·2¹⁴ (the term c_j·2^2j for j = 7) in equation (13) (see FIG. 8).

[0047] The Booth decoder 105 receives the multiplicand X and the output of the first selector 101. It cuts 3-bit groups out of the output of the first selector 101, moving the starting position from the most significant bit toward the lower bits 2 bits at a time, and supplies the groups to the eighth, seventh, sixth, fifth, fourth, third, second, and first partial product generators. From these groups and the value of the multiplicand X, the eighth to first partial product generators generate the eighth to first partial products (the eight partial products P₇ to P₀ obtained when the multiplicand X is multiplied by the lower 16 bits of the multiplier Y).

[0048] Each partial product is the multiplicand X multiplied by -2, -1, 0, 1, or 2 as shown in FIG. 5, and is generated as shown in FIG. 6. Bits representing 1·2^(32+2j) (j = 1 to 8) in equation (13) are appended to partial products P₀ to P₇, and bits representing c_j·2^2j (j = 0 to 6) in equation (13) are appended to partial products P₁ to P₇ (see FIG. 8; the information represented by bits appended to partial products in this way is called "additional bit information"). The value of c_j is determined from the value of Y_j as shown in FIG. 6.

[0049] The 11-input CSA tree 106 receives as its first to eleventh inputs the eight partial products (those generated by the first to eighth partial product generators in the Booth decoder 105, including the additional bit information), the output of the second selector 102, the output of the third selector 103, and the output of the fourth selector 104. It adds them and outputs the sum as a pair of half sum and half carry.

[0050] The half sum and half carry are held, as the result of the first operation step, in the first pipeline register 107 and the second pipeline register 108, respectively.
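How a CSA tree reduces many operands to a half-sum and half-carry pair can be sketched in Python. This is a generic model built from 3-input carry-save adders, not the specific wiring of tree 106, which the text does not give; the names `csa` and `csa_tree` are illustrative.

```python
def csa(a, b, c, mask):
    """3:2 carry-save adder: a + b + c == half_sum + half_carry (mod mask+1)."""
    half_sum = (a ^ b ^ c) & mask
    half_carry = (((a & b) | (b & c) | (a & c)) << 1) & mask
    return half_sum, half_carry


def csa_tree(operands, width):
    """Reduce any number of operands to a (half_sum, half_carry) pair."""
    mask = (1 << width) - 1
    values = [v & mask for v in operands]
    while len(values) > 2:
        reduced = []
        # Compress each complete group of three operands into two.
        for i in range(0, len(values) - 2, 3):
            reduced.extend(csa(values[i], values[i + 1], values[i + 2], mask))
        # Pass any leftover operands on to the next level unchanged.
        reduced.extend(values[len(values) - len(values) % 3:])
        values = reduced
    while len(values) < 2:
        values.append(0)
    return values[0], values[1]


half_sum, half_carry = csa_tree([3, 5, 7, 11, 13, 17, 19, 23, 29, 31, -37], 65)
```

No carry ripples sideways inside the tree; only one carry-propagate addition of the final pair is needed, which is why the pair can be latched in two registers between steps.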

[0051] FIG. 8 lists the inputs (bit strings) and outputs (half-sum and half-carry bit strings) of the 11-input CSA tree 106 in the first operation step.

[0052] Second, the operation during the second operation step is described.

[0053] In the second operation step, the multiplicand X is multiplied by the upper 17 bits of the multiplier Y, and that product, the result of the first operation step, and the addend Z are added together to produce the final multiply-accumulate result R.
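The split of the multiplication across the two operation steps can be checked numerically with the sketch below. It assumes the standard radix-4 Booth digit Y_j = -2·y_(2j+1) + y_2j + y_(2j-1), and it assumes that the "upper 17 bits" are bits 31 down to 15 of a 32-bit Y, so that bit 15 is shared between the two steps; both points are interpretations, and the function names are illustrative.

```python
def booth_digit(y, j):
    """Assumed radix-4 Booth digit Y_j of y, with y_-1 = 0."""
    def bit(i):
        return 0 if i < 0 else (y >> i) & 1
    return -2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1)


def two_step_product(x, y):
    """Return (step-1 partial sum, full product) for a 32-bit signed y."""
    # Step 1: digits 0..7 read bits y_-1..y_15 (the lower 16 bits of Y).
    step1 = sum((x * booth_digit(y, j)) << (2 * j) for j in range(0, 8))
    # Step 2: digits 8..15 read bits y_15..y_31 (17 bits, overlapping at y_15).
    step2 = sum((x * booth_digit(y, j)) << (2 * j) for j in range(8, 16))
    return step1, step1 + step2


step1, product = two_step_product(7, 0x12348765)
```

In this model the first step yields X times the lower 16 bits of Y read as a signed number, and the overlap bit in the second step compensates for that sign, so the two partial sums add up to X × Y.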

[0054] The sign extender 110 sign-extends the addend Z by 1 bit, making Z 33 bits long.

[0055] The first selector 101 refers to the operation-step switching bit, which now indicates the second operation step (the bit is switched at the transition from the first operation step to the second), and selects and outputs the upper 17 bits of the multiplier Y.

[0056] The second selector 102 refers to the operation-step switching bit, which indicates the second operation step, and selects and outputs the upper 35 bits of the value held in the second pipeline register 108 (the half carry from the result of the first operation step).

[0057] The third selector 103 refers to the operation-step switching bit, which indicates the second operation step, and selects and outputs the output of the sign extender 110 (the sign-extended addend Z).

[0058] In addition, the fourth selector 104 refers to the operation-step switching bit, which indicates the second operation step, and selects and outputs the upper 34 bits of the value held in the first pipeline register 107 (the half sum from the result of the first operation step).

[0059] The Booth decoder 105 receives the multiplicand X and the output of the first selector 101. It cuts 3-bit groups out of the output of the first selector 101, moving the starting position from the most significant bit toward the lower bits 2 bits at a time, and supplies the groups to the eighth, seventh, sixth, fifth, fourth, third, second, and first partial product generators. From these groups and the value of the multiplicand X, the eighth to first partial product generators generate the eighth to first partial products (the eight partial products P₁₅ to P₈ obtained when the multiplicand X is multiplied by the upper 17 bits of the multiplier Y).

[0060] Each partial product is the multiplicand X multiplied by -2, -1, 0, 1, or 2 as shown in FIG. 5, and is generated as shown in FIG. 6. Bits representing 1·2^(32+2j) (j = 9 to 15) in equation (13) are appended to partial products P₈ to P₁₄, and bits representing c_j·2^2j (j = 8 to 14) in equation (13) are appended to partial products P₉ to P₁₅ (see FIG. 9).

[0061] The 11-input CSA tree 106 receives as its first to eleventh inputs the eight partial products (those generated by the first to eighth partial product generators in the Booth decoder 105, including the additional bit information), the output of the second selector 102, the output of the third selector 103, and the output of the fourth selector 104. It adds them and outputs the sum as a pair of half sum and half carry.

[0062] The carry-propagate adder 109 adds values (1) and (2) below and outputs a single result R (the result of the multiply-accumulate operation); that is, it performs carry-propagate addition on values (1) and (2).

(1) The value formed by concatenating the 49-bit half sum (the half sum obtained in the second operation step) with the lower 16 bits of the half sum obtained in the first operation step (the lower 16 bits of the value held in the first pipeline register 107) (see FIGS. 8 and 9).

(2) The value formed by concatenating the 48-bit half carry (the half carry obtained in the second operation step) with the lower 15 bits of the half carry obtained in the first operation step (the lower 15 bits of the value held in the second pipeline register 108) (see FIGS. 8 and 9).
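The idea of finishing the low-order bits in the first step and concatenating them with the second step's output can be modeled as below. This is a deliberately simplified sketch: it splits both the half sum and the half carry at bit 16, whereas the text uses a 16-bit and 15-bit split whose exact alignment depends on FIGS. 8 and 9, which are not reproduced here. It also uses a linear chain of carry-save additions in place of a tree. All names are illustrative.

```python
def carry_save_sum(operands, width):
    """Accumulate operands into a (half_sum, half_carry) pair, modulo 2**width."""
    mask = (1 << width) - 1
    s, c = 0, 0
    for v in operands:
        v &= mask
        s, c = s ^ c ^ v, (((s & c) | (c & v) | (s & v)) << 1) & mask
    return s, c


def two_step_accumulate(low_terms, high_terms):
    """Sum low_terms + (high_terms << 16) modulo 2**65 in two carry-save steps."""
    s1, c1 = carry_save_sum(low_terms, 65)                 # first step
    # Second step: the upper parts of the first-step pair re-enter the adder.
    s2, c2 = carry_save_sum([s1 >> 16, c1 >> 16] + list(high_terms), 49)
    a = (s2 << 16) | (s1 & 0xFFFF)    # second-step half sum ++ low bits
    b = (c2 << 16) | (c1 & 0xFFFF)    # second-step half carry ++ low bits
    return (a + b) & ((1 << 65) - 1)  # the single carry-propagate addition


total = two_step_accumulate([5, -3, 1 << 40], [7, -2])
```

Because the second step only ever adds terms weighted 2^16 or higher, the low bits of the first-step pair never change again and can bypass the second pass entirely.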

[0063] FIG. 9 lists the inputs (bit strings) and outputs (half-sum and half-carry bit strings) of the 11-input CSA tree 106 in the second operation step.

[0064] Every term in equation (13) is always contained in one of the input bit strings in FIG. 8 or FIG. 9. Therefore, in either the first or the second operation step, the value of each term in equation (13) is input to the 11-input CSA tree 106 and added.

[0065]

[Problems to Be Solved by the Invention] In the conventional multiply-accumulate unit described above (such as the one shown in FIG. 7), the entire sign-extended addend Z is added in the second operation step, so the CSA tree must take in and add 11 values in that step (see FIG. 9), and a CSA tree with 11 inputs must be provided. As the number of inputs to a CSA tree grows, its structure becomes more complex, which can increase the amount of hardware needed to implement it and delay computations that use it (the larger number of adder stages lowers the operating speed of the whole multiply-accumulate unit).

[0066] In view of the above, an object of the present invention is to provide a multiply-accumulate unit that, by means such as a bit divider that splits the sign-extended addend Z in two, reduces the number of inputs to the CSA tree, requires less hardware, and can operate at high speed.

[0067] More specifically, the object of the present invention is to reduce the number of inputs to the CSA tree by having a bit divider split the addend Z, sign-extended to 33 bits, into its upper 15 bits and lower 18 bits. The lower 18 bits are concatenated with the first constant and supplied to the CSA tree in the first operation step, and the upper 15 bits are concatenated with the upper 34 bits of the half sum from the first operation step and supplied to the CSA tree in the second operation step.
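The bit divider itself is simple to model. The Python sketch below sign-extends Z to 33 bits, splits it into the upper 15 and lower 18 bits, and checks that the two pieces carry the same value as the whole. The weights 2^32 and 2^50 assume that Z is aligned with the upper 33 bits of the 65-bit result; the function names are illustrative.

```python
def sign_extend_z(z):
    """Extend a 32-bit two's-complement Z to a 33-bit pattern (z_32 = z_31)."""
    z &= 0xFFFFFFFF
    return z | ((z >> 31) << 32)


def split_z(z33):
    """Bit divider: return (upper 15 bits, lower 18 bits) of a 33-bit pattern."""
    return z33 >> 18, z33 & ((1 << 18) - 1)


z33 = sign_extend_z(-5)
upper15, lower18 = split_z(z33)
# Assumed placement: lower part at bits 32..49, upper part at bits 50..64.
recombined = (lower18 << 32) + (upper15 << 50)
```

Under that placement, the lower 18 bits occupy positions above every bit of the first constant and the upper 15 bits sit directly above the 34 bits taken from the first-step half sum, so each concatenation shares one CSA input instead of needing its own.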

[0068]

[Means for Solving the Problems] The multiply-accumulate unit of the present invention receives a 33-bit signed multiplicand X, a 32-bit signed multiplier Y, and a 32-bit addend Z, and outputs their multiply-accumulate value X × Y + Z as the result R. It comprises: a sign extender that sign-extends the addend Z by 1 bit; a bit divider that splits the output of the sign extender into its lower 18 bits and upper 15 bits; a first selector that selects and outputs the lower 16 bits of the multiplier Y in the first operation step and the upper 17 bits of the multiplier Y in the second operation step; a second selector that selects and outputs the second constant in the first operation step and the upper 35 bits of the value held in a second pipeline register in the second operation step; a third selector that selects and outputs, in the first operation step, the value formed by concatenating the lower 18 bits output by the bit divider with the first constant and, in the second operation step, the value formed by concatenating the upper 15 bits output by the bit divider with the upper 34 bits of the value held in a first pipeline register; a Booth decoder that cuts 3-bit groups out of the output of the first selector, moving the starting position from the most significant bit toward the lower bits 2 bits at a time, supplies the groups to the eighth, seventh, sixth, fifth, fourth, third, second, and first partial product generators, and generates the eighth, seventh, sixth, fifth, fourth, third, second, and first partial products from these groups and the value of the multiplicand X; a 10-input CSA tree that adds the first to eighth partial products generated by the Booth decoder, the output of the second selector, the output of the third selector, and the additional bit information group; the first pipeline register, which holds the half sum output by the 10-input CSA tree in the first operation step; the second pipeline register, which holds the half carry output by the 10-input CSA tree in the first operation step; and a carry-propagate adder that performs carry-propagate addition on the value formed by concatenating the half sum output by the 10-input CSA tree in the second operation step with the lower 16 bits of the half sum held in the first pipeline register in the first operation step, and the value formed by concatenating the half carry output by the 10-input CSA tree in the second operation step with the lower 15 bits of the half carry held in the second pipeline register in the first operation step.

【００６９】[0069]

[Operation] The multiply-accumulate unit of the present invention receives a 33-bit signed multiplicand X, a 32-bit signed multiplier Y, and a 32-bit addition value Z, and outputs their product-sum X × Y + Z as the result R. Its components operate as follows.

The sign extender sign-extends the addition value Z by one bit, and the bit divider splits the output of the sign extender into its lower 18 bits and its upper 15 bits.

The first selector selects and outputs the lower 16 bits of the multiplier Y in the first operation step and the upper 17 bits of the multiplier Y in the second operation step. The second selector selects and outputs the second constant in the first operation step and the upper 35 bits of the value held in the second pipeline register in the second operation step. The third selector selects and outputs, in the first operation step, the value obtained by concatenating the lower 18 bits of the output of the bit divider with the first constant and, in the second operation step, the value obtained by concatenating the upper 15 bits of the output of the bit divider with the upper 34 bits of the value held in the first pipeline register.

The Booth decoder extracts 3-bit groups from the output of the first selector, moving the extraction start position from the most significant bit toward the lower bits two bits at a time, supplies the extracted values to the eighth, seventh, sixth, fifth, fourth, third, second, and first partial product generators, and generates the eighth through first partial products from these values and the value of the multiplicand X.

The 10-input CSA tree adds the first through eighth partial products generated by the Booth decoder, the output of the second selector, the output of the third selector, and the additional bit information group. The first pipeline register holds the half sum output by the 10-input CSA tree in the first operation step, and the second pipeline register holds the half carry output by the 10-input CSA tree in the first operation step.

The carry propagation adder performs carry propagation addition on two values: the half sum output by the 10-input CSA tree in the second operation step concatenated with the lower 16 bits of the half sum held in the first pipeline register in the first operation step, and the half carry output by the 10-input CSA tree in the second operation step concatenated with the lower 15 bits of the half carry held in the second pipeline register in the first operation step.

【００７０】[0070]

[Embodiments] Next, the present invention will be described in detail with reference to the drawings.

【００７１】 FIG. 1 is a block diagram showing the configuration of one embodiment of the multiply-accumulate unit of the present invention.

【００７２】 The multiply-accumulate unit of this embodiment receives a 33-bit signed multiplicand X, a 32-bit signed multiplier Y, and a 32-bit addition value Z, and outputs their product-sum X × Y + Z as the result R. It comprises a first selector 1, a second selector 2, a third selector 3, a Booth decoder 4, a bit divider 5, a 10-input CSA tree 6, a first pipeline register 7, a second pipeline register 8, a carry propagation adder 9, and a sign extender 10.

【００７３】 The Booth decoder 4 comprises first through eighth partial product generators.

【００７４】 FIG. 2 lists the inputs of the 10-input CSA tree 6 in the first operation step of the multiply-accumulate unit of this embodiment (bit strings annotated with a bit start position and a bit length to indicate their bit weights) and its outputs (the bit strings of the half sum and the half carry).

【００７５】 FIG. 3 lists the inputs (bit strings) and the outputs (the bit strings of the half sum and the half carry) of the 10-input CSA tree 6 in the second operation step of the multiply-accumulate unit of this embodiment.

【００７６】 As mentioned earlier, FIGS. 5 and 6 are diagrams for explaining the operation of the multiply-accumulate unit of this embodiment (diagrams explaining the concept of the Booth decoder employed in the present invention).

【００７７】 Next, the operation of the multiply-accumulate unit of this embodiment, configured as described above, will be explained with reference to FIG. 1 together with FIGS. 2 and 3.

【００７８】 To reduce the amount of hardware, the multiply-accumulate unit of this embodiment shown in FIG. 1 generates and adds the partial products in two passes, a "first operation step" and a "second operation step" (in this respect it is the same as the conventional multiply-accumulate unit shown in FIG. 7).

【００７９】 First, the operation performed in the first operation step will be described.

【００８０】 In the first operation step, the multiply-accumulate unit of this embodiment multiplies the multiplicand X by the lower 16 bits of the multiplier Y, adds part of the addition value Z (its lower 18 bits), and produces an intermediate result.

【００８１】 The sign extender 10 sign-extends the addition value Z by one bit, which makes the addition value Z 33 bits long.

【００８２】 The bit divider 5 splits the output of the sign extender 10 (the sign-extended addition value Z) into its lower 18 bits and its upper 15 bits.
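The sign extension and the 18/15 split described above can be illustrated with a small Python sketch. This is only a behavioural model under stated assumptions: the function names `sign_extend_and_split` and `to_signed` are hypothetical, and Z is taken to be a 32-bit two's-complement value.

```python
def sign_extend_and_split(z: int):
    """Sign-extend a 32-bit two's-complement Z to 33 bits, then split it."""
    z &= 0xFFFFFFFF                  # keep the 32-bit pattern of Z
    sign = (z >> 31) & 1             # bit 31 is the sign bit
    z33 = (sign << 32) | z           # 1-bit sign extension gives 33 bits
    low18 = z33 & ((1 << 18) - 1)    # lower 18 bits (bits 17..0)
    high15 = z33 >> 18               # upper 15 bits (bits 32..18)
    return low18, high15


def to_signed(value: int, width: int) -> int:
    """Interpret a width-bit pattern as a two's-complement integer."""
    return value - (1 << width) if value >> (width - 1) else value
```

Reading the upper part as a signed 15-bit value and the lower part as an unsigned 18-bit value, `(to_signed(high15, 15) << 18) + low18` reproduces the original Z, which is why the two parts can be added in separate operation steps.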

【００８３】 The first selector 1 refers to the operation step switching bit (one bit of information indicating whether the first or the second operation step is to be performed; here it indicates the first operation step) and selects and outputs the lower 16 bits of the multiplier Y.

【００８４】 The second selector 2 likewise refers to the operation step switching bit, which indicates the first operation step, and selects and outputs the second constant. Here, the second constant is the value representing $1\cdot 2^{32}$ (that is, $1\cdot 2^{32+2j}$ for $j=0$) and $c_n$ ($1\cdot 2^{31}$) in equation (13) (see FIG. 2).

【００８５】 The third selector 3 also refers to the operation step switching bit, which indicates the first operation step, and selects and outputs the value obtained by concatenating the lower 18 bits of the output of the bit divider 5 (the "lower 18 bits of the sign-extended addition value Z") with the first constant. Here, the first constant is the value representing $c_z$ ($1\cdot 2^{31}$), $c_{15}\cdot 2^{30}$ (that is, $c_j\cdot 2^{2j}$ for $j=15$), and $c_7\cdot 2^{14}$ (that is, $c_j\cdot 2^{2j}$ for $j=7$) in equation (13) (see FIG. 2).

【００８６】 The Booth decoder 4 receives the multiplicand X and the output of the first selector 1. It extracts 3-bit groups from the output of the first selector 1, moving the extraction start position from the most significant bit toward the lower bits two bits at a time, and supplies the extracted values to the eighth through first partial product generators. From these values and the value of the multiplicand X, the eighth through first partial product generators generate the eighth through first partial products (the eight partial products $P_7$ to $P_0$ obtained when the multiplicand X is multiplied by the lower 16 bits of the multiplier Y).
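The 3-bit extraction with a 2-bit stride is the usual radix-4 Booth recoding, which can be sketched in Python as follows. This is an illustrative model, not the patent's circuit: the name `booth_digits` is hypothetical, the digit table is the standard radix-4 Booth table rather than a transcription of FIG. 5, and the implicit 0 appended below the least significant bit in the first operation step is an assumption of the sketch.

```python
def booth_digits(bits: int, width: int, pad_zero: bool) -> list:
    """Radix-4 Booth digits from 3-bit windows that advance by 2 bits.

    bits     -- selector output as an unsigned bit pattern
    width    -- number of valid bits in `bits`
    pad_zero -- append an implicit 0 below the LSB (assumed for step 1)
    """
    if pad_zero:
        bits, width = bits << 1, width + 1
    # Standard table: digit = -2*b2 + b1 + b0 for the window (b2, b1, b0).
    table = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    # Adjacent windows overlap by one bit. The list runs from the lowest
    # window upward, whereas the text walks from the most significant bit down.
    return [table[(bits >> i) & 0b111] for i in range(0, width - 2, 2)]
```

With a 32-bit pattern `y_bits`, `booth_digits(y_bits & 0xFFFF, 16, True)` gives the eight digits for the lower 16 bits and `booth_digits(y_bits >> 15, 17, False)` the eight digits for the upper 17 bits. Bit 15 appears in both slices, which explains why the two slices total 33 bits for a 32-bit multiplier.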

【００８７】 As shown in FIG. 5, each partial product is the multiplicand X multiplied by -2, -1, 0, 1, or 2, and is generated as shown in FIG. 6. As in the conventional multiply-accumulate unit shown in FIG. 7, bits representing $1\cdot 2^{32+2j}$ ($j = 1$ to $8$) in equation (13) are appended to the partial products $P_0$ to $P_7$, and bits representing $c_j\cdot 2^{2j}$ ($j = 0$ to $6$) in equation (13) are appended to the partial products $P_1$ to $P_7$ (see FIG. 2; as stated earlier, the information represented by the bits appended to each partial product is called "additional bit information"). Here, the value of $c_j$ is determined from the value of $Y_j$ as shown in FIG. 6.
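FIG. 6 and equation (13) are not reproduced in this section, so the following Python sketch rests on an assumption: that $c_j$ plays the conventional role of the "+1" that completes a two's-complement negation when a negative multiple of X is formed by bit inversion. The name `partial_product` and the 35-bit width are hypothetical choices for illustration.

```python
WIDTH = 35                 # assumed width: enough for -2X..+2X with a 33-bit X
MASK = (1 << WIDTH) - 1


def partial_product(x: int, digit: int):
    """Return (bits, c) with bits + c == digit * x modulo 2**WIDTH."""
    magnitude = (abs(digit) * x) & MASK   # bit pattern of 0, X, or 2X
    if digit < 0:
        # Negative multiple: invert the bits and let c supply the missing +1.
        return (~magnitude) & MASK, 1
    return magnitude, 0
```

Under this assumption the correction bit can be added anywhere in the CSA tree at the same bit weight, which would be consistent with the text placing the $c_j$ bits in the bit strings of other inputs.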

【００８８】 The 10-input CSA tree 6 receives, as its first through tenth inputs, the eight partial products (the partial products generated by the first through eighth partial product generators in the Booth decoder 4, including the additional bit information), the output of the second selector 2, and the output of the third selector 3. It adds them and outputs the sum as a pair consisting of a half sum and a half carry.
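A carry-save adder tree reduces many rows to a half sum and a half carry without propagating carries. The Python sketch below shows the idea on non-negative integers; the names `csa` and `csa_tree` are hypothetical, and the order in which rows are combined is chosen for simplicity and is not the wiring of the patent's 10-input tree.

```python
def csa(a: int, b: int, c: int):
    """3:2 carry-save adder: a + b + c == half_sum + half_carry."""
    half_sum = a ^ b ^ c                          # bitwise sum, no carries
    half_carry = ((a & b) | (b & c) | (a & c)) << 1  # majority, shifted up
    return half_sum, half_carry


def csa_tree(inputs):
    """Reduce non-negative rows to a (half_sum, half_carry) pair."""
    rows = list(inputs) + [0, 0]     # padding guarantees at least two rows
    while len(rows) > 2:
        s, c = csa(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, c]     # three rows in, two rows out
    return rows[0], rows[1]
```

Only the final carry propagation adder has to resolve the carries, so the pair returned here is what the pipeline registers would latch between the two operation steps.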

【００８９】 The half sum and the half carry are held, as the result of the first operation step, in the first pipeline register 7 and the second pipeline register 8, respectively.

【００９０】 Second, the operation performed in the second operation step will be described.

【００９１】 In the second operation step, the multiply-accumulate unit of this embodiment multiplies the multiplicand X by the upper 17 bits of the multiplier Y, adds the result of the first operation step and the remaining part of the addition value Z (the upper 15 bits that were not added in the first operation step), and produces the final product-sum result R.
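The division of work between the two operation steps can be checked with a value-level Python model. This sketch ignores the bit-level structure (CSA rows, constants, register widths) and only tracks integer values. The names `mac_two_step` and `to_signed` are hypothetical, and the parameter `z_shift` is an assumption: the bit weight at which Z is aligned with the product is defined by FIG. 4, which is not reproduced here, so the default of 0 simply models an integer X × Y + Z.

```python
def to_signed(value: int, width: int) -> int:
    """Interpret a width-bit pattern as a two's-complement integer."""
    return value - (1 << width) if value >> (width - 1) else value


def mac_two_step(x: int, y: int, z: int, z_shift: int = 0) -> int:
    """Value-level model of R = X*Y + (Z << z_shift) in two steps."""
    y_bits = y & 0xFFFFFFFF
    z33 = z & 0x1FFFFFFFF            # Z sign-extended to 33 bits

    # Step 1: X times the lower 16 bits of Y (Booth recoding reads them as a
    # signed value), plus the lower 18 bits of the sign-extended Z.
    y_low = to_signed(y_bits & 0xFFFF, 16)
    step1 = x * y_low + ((z33 & 0x3FFFF) << z_shift)

    # Step 2: X times the upper 17 bits of Y. Bit 15 is shared with step 1
    # and compensates for the signed reading of the lower half.
    y_high = (to_signed(y_bits >> 16, 16) + ((y_bits >> 15) & 1)) << 16
    z_high = to_signed(z33 >> 18, 15) << (18 + z_shift)
    return step1 + x * y_high + z_high
```

For any 33-bit signed `x` and 32-bit signed `y` and `z`, `mac_two_step(x, y, z)` equals `x * y + z`, which is the property the two-step scheme relies on.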

【００９２】 At the transition from the first operation step to the second operation step, the operation step switching bit is changed from "information indicating that the first operation step is to be performed" to "information indicating that the second operation step is to be performed".

【００９３】 The first selector 1 refers to the operation step switching bit, which now indicates the second operation step, and selects and outputs the upper 17 bits of the multiplier Y.

【００９４】 The second selector 2 likewise refers to the operation step switching bit, which indicates the second operation step, and selects and outputs the upper 35 bits of the value held in the second pipeline register 8 (the half carry from the result of the first operation step).

【００９５】 The third selector 3 also refers to the operation step switching bit, which indicates the second operation step, and selects and outputs the value obtained by concatenating the upper 15 bits of the output of the bit divider 5 (the "upper 15 bits of the sign-extended addition value Z") with the upper 34 bits of the value held in the first pipeline register 7 (the half sum from the result of the first operation step).

【００９６】 The Booth decoder 4 receives the multiplicand X and the output of the first selector 1. It extracts 3-bit groups from the output of the first selector 1, moving the extraction start position from the most significant bit toward the lower bits two bits at a time, and supplies the extracted values to the eighth through first partial product generators. From these values and the value of the multiplicand X, the eighth through first partial product generators generate the eighth through first partial products (the eight partial products $P_{15}$ to $P_8$ obtained when the multiplicand X is multiplied by the upper 17 bits of the multiplier Y).

【００９７】 As shown in FIG. 5, each partial product is the multiplicand X multiplied by -2, -1, 0, 1, or 2, and is generated as shown in FIG. 6. As in the conventional multiply-accumulate unit shown in FIG. 7, bits representing $1\cdot 2^{32+2j}$ ($j = 9$ to $15$) in equation (13) are appended to the partial products $P_8$ to $P_{14}$, and bits representing $c_j\cdot 2^{2j}$ ($j = 8$ to $14$) in equation (13) are appended to the partial products $P_9$ to $P_{15}$ (see FIG. 3).

【００９８】 The 10-input CSA tree 6 receives, as its first through tenth inputs, the eight partial products (the partial products generated by the first through eighth partial product generators in the Booth decoder 4, including the additional bit information), the output of the second selector 2, and the output of the third selector 3. It adds them and outputs the sum as a pair consisting of a half sum and a half carry.

【００９９】 The carry propagation adder 9 adds the following values (1) and (2), that is, performs carry propagation addition on them, and outputs the single result R of the product-sum operation.

(1) The value obtained by concatenating the 49-bit half sum obtained in the second operation step with the lower 16 bits of the half sum obtained in the first operation step (the lower 16 bits of the value held in the first pipeline register 7) (see FIGS. 2 and 3).

(2) The value obtained by concatenating the 48-bit half carry obtained in the second operation step with the lower 15 bits of the half carry obtained in the first operation step (the lower 15 bits of the value held in the second pipeline register 8) (see FIGS. 2 and 3).
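The reason the low-order bits of the first-step result can bypass the second step is that every value added in the second step has its lowest 16 bit weights empty. The Python sketch below models this on non-negative integers. The names `csa`, `reduce_rows`, and `two_step_sum` are hypothetical, and the model keeps 16 low bits of both the half sum and the half carry for simplicity; the 16-bit and 15-bit figures in the text depend on how the registers index the carry word, a detail that is not modelled here.

```python
LOW = 16
LOW_MASK = (1 << LOW) - 1


def csa(a, b, c):
    """3:2 carry-save adder on non-negative integers."""
    return a ^ b ^ c, ((a & b) | (b & c) | (a & c)) << 1


def reduce_rows(rows):
    """Reduce rows to a (half_sum, half_carry) pair without carry propagation."""
    rows = list(rows) + [0, 0]
    while len(rows) > 2:
        s, c = csa(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, c]
    return rows[0], rows[1]


def two_step_sum(step1_rows, step2_rows):
    """step2_rows are given in units of 2**16 (their low 16 bits are empty)."""
    hs1, hc1 = reduce_rows(step1_rows)            # latched after step 1
    # Step 2 feeds back only the upper parts of the latched pair.
    hs2, hc2 = reduce_rows(list(step2_rows) + [hs1 >> LOW, hc1 >> LOW])
    # Carry propagation addition on the two concatenated values.
    return ((hs2 << LOW) | (hs1 & LOW_MASK)) + ((hc2 << LOW) | (hc1 & LOW_MASK))
```

The result equals `sum(step1_rows) + (sum(step2_rows) << 16)`, so a single carry propagation addition at the end is enough.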

【０１００】 In the conventional multiply-accumulate unit shown in FIG. 7, the CSA tree (the 11-input CSA tree 106) had a dedicated input for adding the sign-extended value of the addition value Z. In contrast, in the multiply-accumulate unit of this embodiment shown in FIG. 1, the sign-extended value of the addition value Z is supplied to the CSA tree (the 10-input CSA tree 6) by the following procedure (1) to (3).

(1) The sign-extended value of the addition value Z is split by the bit divider 5 so that its bit weights do not overlap those of the other input values (the input values corresponding to the terms of equation (13)).

(2) The two values output by the bit divider 5 are each concatenated with other input values.

(3) The sign-extended value of the addition value Z, split in (1) and concatenated with other input values in (2), is supplied to the CSA tree (the 10-input CSA tree 6) in two parts, one in the first operation step and one in the second operation step.
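The procedure works because values whose bit weights do not overlap can share one CSA input: concatenating them is the same as adding them. A minimal Python sketch follows; the name `pack_fields` and the bit positions used in the example are hypothetical and chosen only so that the fields do not overlap (the actual positions are those of FIGS. 2 and 3).

```python
def pack_fields(fields):
    """Concatenate (value, lsb_position, width) fields into one input row.

    Raises ValueError if a value does not fit or two fields overlap.
    """
    row, used = 0, 0
    for value, lsb, width in fields:
        if value < 0 or value >> width:
            raise ValueError("value does not fit in its field")
        mask = ((1 << width) - 1) << lsb
        if used & mask:
            raise ValueError("fields overlap in bit weight")
        used |= mask
        row |= value << lsb          # OR equals addition for disjoint fields
    return row
```

For example, `pack_fields([(0x2ABCD, 32, 18), (1, 31, 1), (1, 14, 1)])` yields a single row whose value equals the sum of the three shifted fields, so no separate CSA input is needed for the 18-bit field.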

【０１０１】 As a result, the number of CSA tree inputs, conventionally 11, can be reduced to 10, which simplifies the hardware of the CSA tree and shortens the operation delay (speeds up operation).

【０１０２】 Referring to FIGS. 2 and 3, every term of equation (13) is included in the bit string of some input shown in FIG. 2 or FIG. 3. The value of every term of equation (13) is therefore input to and added by the 10-input CSA tree 6 in either the first operation step or the second operation step.

【０１０３】 The way in which $1\cdot 2^{32+2j}$ ($j = 0$ to $16$), $c_j\cdot 2^{2j}$ ($j = 0$ to $15$), $c_z$, and $c_n$ of equation (13) are assigned to the inputs of the 10-input CSA tree 6 is not limited to the arrangement shown in FIGS. 2 and 3. Other arrangements are acceptable as long as the bits representing these values ($1\cdot 2^{32+2j}$ ($j = 0$ to $16$), $c_j\cdot 2^{2j}$ ($j = 0$ to $15$), $c_z$, and $c_n$) are input at positions with the same bit weights as those shown in FIGS. 2 and 3. Accordingly, the values of the first constant and the second constant are not limited to those of this embodiment.

【０１０４】 In this embodiment, the additional bit information for the partial products $P_0$ to $P_{15}$ is appended on the Booth decoder 4 side, but it may instead be appended on the 10-input CSA tree 6 side.

【０１０５】 The method of informing the first selector 1, the second selector 2, and the third selector 3 whether the first operation step or the second operation step is being performed is not limited to the method of this embodiment, in which the operation step switching bit is used.

【０１０６】[0106]

[Effects of the Invention] As described above, the present invention splits the sign-extended value of the addition value Z so that its bit weights do not overlap those of the other input values, concatenates the parts with other input values, and supplies the values produced by this splitting and concatenation to the CSA tree in two parts, one in the first operation step and one in the second operation step. This reduces the number of CSA tree inputs from the 11 conventionally required to 10 (that is, it reduces the amount of hardware in the CSA tree). The invention thus makes it possible to realize a multiply-accumulate unit that requires less hardware than a conventional multiply-accumulate unit (fixed-point multiply-accumulate unit) and operates faster (with a shorter operation delay).

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of one embodiment of the multiply-accumulate unit of the present invention.

FIG. 2 lists the inputs and outputs of the 10-input CSA tree of FIG. 1 in the first operation step of the multiply-accumulate unit shown in FIG. 1.

FIG. 3 lists the inputs and outputs of the 10-input CSA tree of FIG. 1 in the second operation step of the multiply-accumulate unit shown in FIG. 1.

FIG. 4 shows the form of the product-sum operation addressed by the present invention.

FIG. 5 is a diagram for explaining the concept of the Booth decoder employed in the present invention and the operation of the multiply-accumulate units shown in FIGS. 1 and 7.

FIG. 6 is a diagram for explaining the concept of the Booth decoder employed in the present invention and the operation of the multiply-accumulate units shown in FIGS. 1 and 7.

FIG. 7 is a block diagram showing the configuration of an example of a conventional multiply-accumulate unit.

FIG. 8 lists the inputs and outputs of the 11-input CSA tree of FIG. 7 in the first operation step of the multiply-accumulate unit shown in FIG. 7.

FIG. 9 lists the inputs and outputs of the 11-input CSA tree of FIG. 7 in the second operation step of the multiply-accumulate unit shown in FIG. 7.

[Explanation of symbols]

1: First selector
2: Second selector
3: Third selector
4: Booth decoder
5: Bit divider
6: 10-input CSA tree
7: First pipeline register
8: Second pipeline register
9: Carry propagation adder
10: Sign extender
R: Result
X: Multiplicand
Y: Multiplier
Z: Addition value

Claims

(57) [Claims]

1. A multiply-accumulate unit that receives a 33-bit signed multiplicand X, a 32-bit signed multiplier Y, and a 32-bit addition value Z and outputs their product-sum X × Y + Z as a result R, the multiply-accumulate unit comprising: a sign extender that sign-extends the addition value Z by one bit; a bit divider that splits the output of the sign extender into its lower 18 bits and its upper 15 bits; a first selector that selects and outputs the lower 16 bits of the multiplier Y in a first operation step and the upper 17 bits of the multiplier Y in a second operation step; a second selector that selects and outputs a second constant in the first operation step and the upper 35 bits of the value held in a second pipeline register in the second operation step; a third selector that selects and outputs, in the first operation step, the value obtained by concatenating the lower 18 bits of the output of the bit divider with a first constant and, in the second operation step, the value obtained by concatenating the upper 15 bits of the output of the bit divider with the upper 34 bits of the value held in a first pipeline register; a Booth decoder that extracts 3-bit groups from the output of the first selector, moving the extraction start position from the most significant bit toward the lower bits two bits at a time, supplies the extracted values to an eighth partial product generator, a seventh partial product generator, a sixth partial product generator, a fifth partial product generator, a fourth partial product generator, a third partial product generator, a second partial product generator, and a first partial product generator, and generates an eighth partial product, a seventh partial product, a sixth partial product, a fifth partial product, a fourth partial product, a third partial product, a second partial product, and a first partial product from these values and the value of the multiplicand X; a 10-input CSA tree that adds the first partial product, the second partial product, the third partial product, the fourth partial product, the fifth partial product, the sixth partial product, the seventh partial product, and the eighth partial product generated by the Booth decoder, the output of the second selector, the output of the third selector, and an additional bit information group; the first pipeline register, which holds the half sum output by the 10-input CSA tree in the first operation step; the second pipeline register, which holds the half carry output by the 10-input CSA tree in the first operation step; and a carry propagation adder that performs carry propagation addition on the value obtained by concatenating the half sum output by the 10-input CSA tree in the second operation step with the lower 16 bits of the half sum held in the first pipeline register in the first operation step, and on the value obtained by concatenating the half carry output by the 10-input CSA tree in the second operation step with the lower 15 bits of the half carry held in the second pipeline register in the first operation step.

2. The multiply-accumulate unit according to claim 1, wherein an operation step switching bit indicating whether the operation of the first operation step or the operation of the second operation step is to be performed is supplied to the first selector, the second selector, and the third selector.

3. The multiply-accumulate unit according to claim 1, wherein the first constant is a value representing $c_z$, $c_{15}\cdot 2^{30}$, and $c_7\cdot 2^{14}$; the second constant is a value representing $1\cdot 2^{32}$ and $c_n$; in the first operation step, bits representing $1\cdot 2^{32+2j}$ ($j$ = 1, 2, 3, 4, 5, 6, 7, and 8) are appended to the partial products $P_0$, $P_1$, $P_2$, $P_3$, $P_4$, $P_5$, $P_6$, and $P_7$, and bits representing $c_j\cdot 2^{2j}$ ($j$ = 0, 1, 2, 3, 4, 5, and 6) are appended to the partial products $P_1$, $P_2$, $P_3$, $P_4$, $P_5$, $P_6$, and $P_7$; and in the second operation step, bits representing $1\cdot 2^{32+2j}$ ($j$ = 9, 10, 11, 12, 13, 14, and 15) are appended to the partial products $P_8$, $P_9$, $P_{10}$, $P_{11}$, $P_{12}$, $P_{13}$, and $P_{14}$, and bits representing $c_j\cdot 2^{2j}$ ($j$ = 8, 9, 10, 11, 12, 13, and 14) are appended to the partial products $P_9$, $P_{10}$, $P_{11}$, $P_{12}$, $P_{13}$, $P_{14}$, and $P_{15}$.