JPH0326857B2

Info

Publication number: JPH0326857B2
Application number: JP59179638A
Authority: JP
Inventors: Noryuki Ikumi
Original assignee: Tokyo Shibaura Electric Co Ltd
Current assignee: Toshiba Corp
Priority date: 1984-08-29
Filing date: 1984-08-29
Publication date: 1991-04-12
Also published as: JPS6158036A

Description

[Detailed description of the invention]

[Technical Field of the Invention]

The present invention relates to a multiplier for use in applications that require high-speed signal processing, such as image processing.

[Technical Background of the Invention and Its Problems]

Various multiplication schemes have been proposed for multipliers. The parallel multiplier implements in hardware exactly the same principle as multiplication done by hand. As shown in Fig. 3, its unit circuit (basic cell) 13 consists of an AND gate 11, which generates the partial product of one bit of the multiplier y and one bit of the multiplicand x, and a full adder 12, which adds that partial product x·y, the sum output S′ from the preceding stage of the same digit, and the carry signal C′ from the lower digit to produce a sum output S and a carry signal C. These basic cells are arranged in an array as shown in Fig. 4.
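As an illustration of the conventional array scheme described above, the following Python sketch models it at the behavioural level: every pair of bits (x_i, y_j) gets one AND-gate partial product, and the rows are summed with the appropriate shift. The function name `array_multiply`, the unsigned operands, and the returned count of partial-product bits are assumptions made for this example; the sketch does not reproduce the cell-level wiring of Figs. 3 and 4.

```python
def array_multiply(x, y, n=4):
    """Unsigned n x n multiplication by summing AND-gate partial products.

    Returns (product, number_of_partial_product_bits).
    Hypothetical helper for illustration only.
    """
    if not (0 <= x < (1 << n) and 0 <= y < (1 << n)):
        raise ValueError("operands must be unsigned n-bit integers")

    total = 0
    bits_formed = 0
    for j in range(n):                 # one row per multiplier bit y_j
        y_j = (y >> j) & 1
        row = 0
        for i in range(n):             # one AND gate per multiplicand bit x_i
            x_i = (x >> i) & 1
            row |= (x_i & y_j) << i    # partial product bit x_i AND y_j
            bits_formed += 1
        total += row << j              # add the row at its digit position
    return total, bits_formed
```

For 4-bit operands, `array_multiply(13, 11)` returns `(143, 16)`: the product, and one partial-product bit per bit pair.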
Fig. 4 shows a 4×4-bit multiplier, in which x₁ to x₄ are the multiplicand, y₁ to y₄ are the multiplier, 13₁ to 13₆ are basic cells, 14 is a 4-bit adder, and S₁ to S₈ are the multiplication outputs. Because partial products are generated and added in parallel, this scheme allows high-speed operation. However, forming an n×n-bit multiplier with this configuration requires n² basic cells, so the amount of hardware becomes large.

Signal propagation time is generally determined by the number of cell stages a signal passes through. In an n×n-bit multiplier of this type, a signal passes through n basic cells (counting only the cell array and excluding the final-stage adder 14). To go faster, the number of cell stages traversed should be reduced, which also reduces the amount of hardware. The Wallace tree is one scheme that reduces the number of stages; it cuts the stage count substantially and promises a further speed-up. For LSI implementation, however, the Wallace tree is difficult to lay out as a rectangle and leaves a large unused area, so it is unsuitable from the standpoint of effective chip utilization. The required wiring is also complicated, and the wiring delay cannot be ignored.

[Object of the Invention]

The present invention was made in view of the above circumstances. Its object is to provide a multiplier that operates at high speed, has a regular pattern suitable for LSI implementation, and can be highly integrated.

[Summary of the Invention]

To achieve this object by reducing the number of cell stages a signal passes through, the present invention combines two techniques: the Booth algorithm and the addition of the same digit along two paths.
The Booth algorithm halves the number of cell stages traversed, and adding the same digit along multiple paths reduces it by about half again, so the total number of stages traversed falls to about 1/4, which speeds up the multiplication. Because the circuit is still basically a parallel multiplier, the regularity of the pattern is also maintained.

[Embodiment of the Invention]

An embodiment of the present invention is described below with reference to the drawings.

Booth's algorithm is the technique most commonly used for generating partial products. Its advantage is that two's-complement multiplication can be carried out without correction. The 2-bit Booth algorithm is taken as an example here. In two's-complement notation, the multiplier Y is expressed as

$$Y = -y_n \cdot 2^{n-1} + \sum_{i=1}^{n-1} y_i \, 2^{i-1} \qquad (1)$$

where $y_n$ is the sign bit and $y_{n-1}$ to $y_1$ are the magnitude bits. With $y_0 = 0$ and n even, equation (1) can be rewritten as

$$Y = \sum_{i=0}^{n/2-1} \left( y_{2i} + y_{2i+1} - 2y_{2i+2} \right) 2^{2i} \qquad (2)$$

Therefore the product P = X·Y becomes

$$P = \sum_{i=0}^{n/2-1} \left( y_{2i} + y_{2i+1} - 2y_{2i+2} \right) X \cdot 2^{2i} \qquad (3)$$

In equation (3), the term $(y_{2i} + y_{2i+1} - 2y_{2i+2})$ takes the value 0, ±1, or ±2 depending on the three consecutive bits $(y_{2i}, y_{2i+1}, y_{2i+2})$, so each partial product is one of 0, ±X, or ±2X. As equation (3) shows, with Booth's algorithm the number of partial products is n/2, half the n required by an ordinary parallel multiplier.

A separate technique for reducing the number of stages in the addition of partial products, shown in Fig. 5, adds the same digit along two paths (for example, the even-numbered stages and the odd-numbered stages) and combines the two results in the final stage. With this scheme an n×n-bit multiplication can be computed in n/2 + 2 stages, so the benefit is small for short word lengths but grows as the word length increases. In Fig. 5, reference numerals with the suffix a denote odd-numbered stages and those with the suffix b denote even-numbered stages. For example, the sum output S of basic cell 15a skips the next-stage basic cell 15b and is supplied to cell 16a, and the carry C likewise skips one cell and is supplied to cell 18a. In the even-numbered stages, similarly, the sum output S of cell 15b is supplied to basic cell 16b and the carry C to basic cell 18b. Because the even-numbered and odd-numbered stages are added separately and the two results are combined at the end (by cells 19 to 22 in the figure), two extra stages are needed, but their influence diminishes as the word length increases.

To achieve high speed while maintaining the regularity of the pattern, the present invention forms a multiplier using both of the above techniques together. Fig. 1 shows the configuration of an 8×8-bit multiplier, which is built from basic cells 23 of the kind shown in Fig. 2.
In each basic cell, a full adder 24 receives a sum input S_in, a carry input C_in, and the multiplicand X, and produces a sum output S_out and a carry output C_out. The multiplicand X input of the full adder 24 is supplied with the output of an exclusive NOR gate 25. One input of the exclusive NOR gate 25 is supplied with the inversion signal NEGA (asserted for −2X and −X), and the other input is supplied with the output of a NOR gate 26. The inputs of the NOR gate 26 are supplied with the outputs of AND gates 27₁ and 27₂. One input of AND gate 27₁ is supplied with the multiplicand X and the other with the X select signal SELX. One input of AND gate 27₂ is supplied with the signal 2X, obtained by doubling the multiplicand X, and the other with the 2X select signal SEL2X.

The basic cells 23 are arranged in a matrix as shown in Fig. 1. Multiplicand X₀ is supplied to basic cells 23₉, 23₁₈, 23₂₇, 23₃₆ and 23₈, 23₁₇, 23₂₆, 23₃₅; multiplicand X₁ to basic cells 23₈, 23₁₇, 23₂₆, 23₃₅ and 23₇, 23₁₆, 23₂₅, 23₃₄; multiplicand X₂ to basic cells 23₇, 23₁₆, 23₂₅, 23₃₄ and 23₆, 23₁₅, 23₂₄, 23₃₃; and, in the same manner, multiplicands X₃ to X₇ to basic cells 23₁ to 23₆, 23₁₀ to 23₁₅, 23₁₉ to 23₂₄, and 23₂₈ to 23₃₃.
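The gate arrangement of basic cell 23 (Fig. 2) described above can be sketched in Python as follows. This is only a single-bit behavioural model under stated assumptions: all signals are 0 or 1, the function and argument names are chosen for this example, and `x2` stands for the bit presented at the 2X input. By Boolean algebra, the exclusive NOR of NEGA with the NOR output passes the selected bit unchanged when NEGA is 0 and inverts it when NEGA is 1.

```python
def basic_cell(x, x2, selx, sel2x, nega, s_in, c_in):
    """One-bit model of basic cell 23; every argument is 0 or 1.

    Returns (s_out, c_out). Names are hypothetical, chosen for illustration.
    """
    and1 = x & selx                 # AND gate 27-1: X gated by SELX
    and2 = x2 & sel2x               # AND gate 27-2: 2X gated by SEL2X
    nor = 1 - (and1 | and2)         # NOR gate 26
    xnor = 1 - (nega ^ nor)         # exclusive NOR gate 25
    total = xnor + s_in + c_in      # full adder 24
    return total & 1, total >> 1
```

For instance, `basic_cell(1, 0, 1, 0, 0, 0, 0)` returns `(1, 0)`: the X bit is selected and passed to the adder. With `nega=1` and the other inputs unchanged the result is `(0, 0)`, because the selected bit is inverted before it reaches the adder.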
In Fig. 1, the multiplicand entering each basic cell from the right corresponds to 2X of Fig. 2, and the multiplicand entering from the left corresponds to X. Basic cells 23₁, 23₁₀, 23₁₉, and 23₂₈ receive multiplicand X₇ as both X and 2X, and basic cells 23₉, 23₁₈, 23₂₇, and 23₃₆ receive the ground potential V_SS as 2X.

The multiplier bits Y₀ to Y₇ are supplied to decoders 28₁ to 28₄ in groups of three bits. Decoder 28₁ receives Y₀, Y₁, and the ground potential V_SS; decoder 28₂ receives Y₁, Y₂, and Y₃; decoder 28₃ receives Y₃, Y₄, and Y₅; and decoder 28₄ receives Y₅, Y₆, and Y₇. The control signals output from decoder 28₁ (X select signal SELX, 2X select signal SEL2X, and inversion signal NEGA) are supplied to basic cells 23₁ to 23₉; those from decoder 28₂ to basic cells 23₁₀ to 23₁₈; those from decoder 28₃ to basic cells 23₁₉ to 23₂₇; and those from decoder 28₄ to basic cells 23₂₈ to 23₃₆.

Basic cells 23₂ to 23₁₀, 23₁₂ to 23₁₉, 23₂₁, 23₂₂, 23₂₈, 23₃₀, and 23₃₁ receive the ground potential V_SS as the sum input S_in and the carry input C_in. Basic cell 23₁ handles the sign bit and is supplied with the ground potential V_SS and the power supply potential V_DD. Basic cells 23₁₁, 23₂₀, and 23₂₉ are likewise supplied with V_SS and V_DD.

The sum outputs of basic cells 23₁ to 23₅ are supplied to basic cells 23₂₃ to 23₂₇, whose carry inputs are supplied with the ground potential V_SS. The sum outputs of basic cells 23₆ to 23₉ are supplied to a multi-input high-speed adder 29, which generates the carry signal. The sum outputs of basic cells 23₁₀ to 23₁₄ are supplied to basic cells 23₃₂ to 23₃₆, and the outputs of basic cells 23₁₅ to 23₁₈ are supplied to the multi-input high-speed adder 29.

The carry output and sum output of basic cell 23₁₉ are supplied to adders 30₁ and 30₂, respectively; those of basic cell 23₂₀ to adders 30₂ and 30₃; those of basic cell 23₂₁ to adders 30₃ and 30₄; those of basic cell 23₂₂ to adders 30₄ and 30₅; and those of basic cell 23₂₃ to adders 30₅ and 30₆. The carry output of basic cell 23₂₄ is supplied to adder 30₆ and its sum output to the multi-input high-speed adder 29, and the carry output and sum output of basic cell 23₂₇ are both supplied to the multi-input high-speed adder 29. The sum output of basic cell 23₂₈ is supplied to adder 30₇; the sum outputs of basic cells 23₂₉ to 23₃₄ are supplied to adders 30₁ to 30₆, and their carry outputs to adders 30₇ to 30₁₂. The carry output of basic cell 23₃₅ is supplied to adder 30₁₃ and its sum output to the multi-input high-speed adder 29, and the sum output and carry output of basic cell 23₃₆ are both supplied to the multi-input high-speed adder 29.

The sum outputs of adders 30₁ to 30₆ are supplied to adders 30₈ to 30₁₃, and their carry outputs to adders 30₇ to 30₁₂. The sum outputs and carry outputs of adders 30₇ to 30₁₃ are supplied to a high-speed adder 31, which computes the final sum and consists of, for example, a CLA (Carry Look Ahead) adder. The carry output of the multi-input high-speed adder 29 is supplied to adder 30₁₃ and to the high-speed adder 31.
The multiplication outputs Z₀ to Z₇ are obtained from the multi-input high-speed adder 29, and the multiplication outputs Z₈ to Z₁₄ from the high-speed adder 31.

The operation of this configuration is as follows. When the multiplicand bits X₀ to X₇ are supplied to basic cells 23₁ to 23₃₆ and the multiplier bits Y₀ to Y₇ are supplied to decoders 28₁ to 28₄, the decoders decode the 3-bit groups of multiplier data and supply the corresponding control signals (X select signal SELX, 2X select signal SEL2X, and inversion signal NEGA) to the basic cells, which select 0, ±X, or ±2X. This generates the partial products of the multiplicand X₀ to X₇ and the multiplier Y₀ to Y₇.

The partial products are added separately for the odd-numbered and even-numbered stages. At least one of the sum output and carry output of each of basic cells 23₆ to 23₉, 23₁₅ to 23₁₈, 23₂₄ to 23₂₇, 23₃₅, and 23₃₆ is selectively supplied to the multi-input high-speed adder 29. At least one of the sum output and carry output of each of basic cells 23₁₉ to 23₂₄ and 23₂₈ to 23₃₅ is selectively supplied to adders 30₁ to 30₁₃, where the sums of the odd-stage and even-stage partial products are finally added together. The sum outputs and carry outputs of adders 30₇ to 30₁₃ are then added by the high-speed adder 31.

When the inputs of the high-speed adder 31 (the outputs of adders 30₇ to 30₁₃) have settled, the carry signal from the lower-order adder must also have settled. For this reason the lower-order side does not add cell stages to narrow the sum and carry outputs down to two. Instead it uses the multi-input high-speed adder 29, which generates the carry signal from the sum and carry outputs of the two paths of cells, so that the carry into the upper part is settled while the signals pass through all the basic cells.

With this configuration, the number of cell stages traversed can be reduced, as shown in the table below, and the speed is increased accordingly. Because the multiplier is of the parallel type, the regularity of the pattern layout is maintained, which makes it well suited to LSI implementation.
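To make the operation described above concrete, here is a minimal behavioural sketch in Python of the two ideas combined: 2-bit (radix-4) Booth recoding of the multiplier, and accumulation of alternate partial products along two separate paths that are added at the end. The function names, the mapping from each 3-bit group to SELX, SEL2X, and NEGA, and the 2n-bit result width are assumptions for illustration. The text does not give the decoder truth table, and the sketch does not model the cell array or adders 29, 30, and 31.

```python
def booth_decode(y_hi, y_mid, y_lo):
    """Map one 3-bit group to assumed control signals (SELX, SEL2X, NEGA).

    The Booth digit is y_lo + y_mid - 2*y_hi, one of 0, +/-1, +/-2.
    """
    digit = y_lo + y_mid - 2 * y_hi
    selx = int(abs(digit) == 1)     # select X
    sel2x = int(abs(digit) == 2)    # select 2X
    nega = int(digit < 0)           # negate the selected value
    return selx, sel2x, nega


def booth_multiply(x, y, n=8):
    """Multiply two n-bit two's-complement integers (n even)."""
    width = 2 * n
    mask = (1 << width) - 1
    # Python's >> is arithmetic, so this yields two's-complement bits of y.
    y_bits = [(y >> k) & 1 for k in range(n)]

    paths = [0, 0]                  # even-indexed rows, odd-indexed rows
    for i in range(n // 2):
        y_lo = y_bits[2 * i - 1] if i > 0 else 0    # grounded for the first group
        selx, sel2x, nega = booth_decode(y_bits[2 * i + 1], y_bits[2 * i], y_lo)
        magnitude = (x if selx else 0) + (2 * x if sel2x else 0)
        partial = -magnitude if nega else magnitude
        paths[i % 2] += partial << (2 * i)          # weight 2^(2i)

    total = (paths[0] + paths[1]) & mask            # final addition of both paths
    # Reinterpret the 2n-bit pattern as a signed integer.
    return total - (1 << width) if total >> (width - 1) else total
```

Only n/2 partial products are formed (four for 8-bit operands), and each path accumulates half of them before the final addition, which mirrors the stage-count reduction the text attributes to combining the two techniques.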

[Effect of the invention]

As described above, the present invention provides a multiplier that operates at high speed, has a regular pattern suitable for LSI implementation, and can be highly integrated.

[Brief explanation of drawings]

Fig. 1 is a block diagram of a multiplier according to an embodiment of the present invention; Fig. 2 is a block diagram of the basic cell in Fig. 1; Fig. 3 is a diagram of the basic configuration of a conventional parallel multiplier; Fig. 4 is a block diagram of a parallel multiplier built from the basic cell of Fig. 3; and Fig. 5 is a diagram explaining a technique for reducing the number of stages in the addition of partial products.

X₀ to X₇: multiplicand; Y₀ to Y₇: multiplier; 23₁ to 23₃₆: basic cells; 28₁ to 28₄: decoders; 29: multi-input high-speed adder; 30₁ to 30₁₃: adders; 31: high-speed adder; Z₀ to Z₁₄: multiplication outputs; SELX: X select signal; SEL2X: 2X select signal; NEGA: inversion signal; 24: full adder; 25: exclusive NOR gate; 26: NOR gate; 27₁, 27₂: AND gates; S_in: sum input; C_in: carry input; S_out: sum output; C_out: carry output.

Claims

[Claims]

1. A multiplier comprising: generating means for generating a selection signal from at least three bits of a multiplier; matrix-like calculation means in which basic cells, each of which selects a multiplicand and adds partial products according to the selection signal output from the generating means, are arranged in a matrix, and in which the same digits of the obtained partial products are added separately in even-numbered and odd-numbered stages; and adding means for adding the plurality of partial-product sums output from the matrix-like calculation means and outputting the result.

2. The multiplier according to claim 1, wherein the basic cell comprises: a first AND gate supplied with the multiplicand and a selection signal for the multiplicand; a second AND gate supplied with a signal obtained by doubling the multiplicand and a selection signal for the doubled signal; a NOR gate supplied with the outputs of the first and second AND gates; an exclusive NOR gate supplied with the output of the NOR gate and an inversion signal; and a full adder that is supplied with the output of the exclusive NOR gate, a sum signal, and a carry signal and produces a sum output and a carry output.