JP3695561B2

JP3695561B2 - Accumulator

Info

Publication number: JP3695561B2
Application number: JP16001997A
Authority: JP
Inventors: 登小林
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1997-06-17
Filing date: 1997-06-17
Publication date: 2005-09-14
Anticipated expiration: 2017-06-17
Also published as: JPH117439A

Description

【０００１】
【発明の属する技術分野】
本発明は積和器に関し、高速の積和処理を行う積和器に関する。
近年、ＤＳＰ（Digital Signal Processor) が幅広い分野で適用されている。
ＤＳＰは信号処理演算によく現われる積和処理を高速に実行することが求められており、このため、ＤＳＰ内で積和処理を実行する積和器の高速動作が要望されている。
【０００２】
【従来の技術】
図７は並列乗算器の構成図を示す。同図中、乗算器１０は入力Ａ，Ｂの部分積を作り、その部分積を加算して乗算結果を得ている。部分積の加算はキャリの伝播の発生しないように考慮されたツリー状の構成の加算器回路（ワレサのツリー）によって行われる。具体的には部分積を半加算していき、ビット数をしぼりこんでいく。
【０００３】
例えば部分積の生成にＢｏｏｔｈのアルゴリズムを用いれば１６ビット×１６ビット乗算の場合、部分積は１６×８ビット発生する。この１２８ビット（１６×８）を３入力２出力の半加算器により各桁を計算することで最終的には２つ（サム，キャリー）のデータが得られる。この２つのデータをキャリ伝播のある全加算器１２で加算することで最終結果が得られる。
【０００４】
この乗算器を使用して積和器も作られる。積和器はΣＡｉ・Ｂｉのように乗算結果を加算し続けるように構成される。図８に積和器の構成図を示す。積和器は乗算器とほぼ同じ構成をとるが、乗算器１４の乗算結果がサム（Ｓ），キャリー（Ｃ）にしぼりこまれた２つのデータと直前までの積和処理の結果Ａｃｃとを加算する必要があるため、ここでもう一度、３入力２出力の半加算器１６でサム＋キャリー＋直前の積和結果を行い、最終のサム，キャリーを計算する。最後にこれを全加算器１８で全加算して積和結果を得る。この積和器では図９に示す積和のサイクル時間ＴＳは部分積の生成及び半加算器による絞り込みを行う時間Ｔ１と、得られた結果と直前の積和結果を加算する時間Ｔ２で決定される。
【０００５】
この積和のサイクル時間を短くして積和サイクルの低減を図る方法としてパイプライン化が従来行われてきた。この方法は図１０に示すように、部分積の生成・絞り込みの部分と全加算の部分で処理をレジスタ１９，２０で時間的に分けることにより、レイテンシは増加するものの処理サイクルは短くなり、全体の処理時間を短くすることができる。図１１に示すように、部分積の生成・絞り込みの時間Ｔ１と、最終結果計算の時間Ｔ２のうちの長い方の時間をサイクルとしてパイプライン化することができる。つまり、部分積の絞り込みの時間と最終結果の加算の時間が等しい場合には、積和処理のサイクル時間は、１／２になる。
【０００６】
また別の方法として図１２に示すように単純に２つの積和器を設け、デュアル化する構成がある。この構成ではそれぞれの積和器に積和結果記憶用のレジスタ２２Ａ，２２Ｂを設け、積和演算を２つの積和器１４Ａ，１６Ａ，１８Ａと１４Ｂ，１６Ｂ，１８Ｂとに分けて求める。それぞれの積和処理が終了した時点で、この２つの積和結果を全加算器２４で加算し最終結果を得てレジスタ２６に格納する。この方法では積和に要する時間は部分積生成や最終結果加算の時間に関係なく１／２とすることができる。
【０００７】
【発明が解決しようとする課題】
図８に示す構成のパイプライン化では、一般的に部分積の絞り込みの時間Ｔ１と、最終結果の加算の時間Ｔ２とが一致せず、いずれか長い方の時間がサイクル時間となって、パイプライン化によりサイクル時間を１／２とすることはできない。
【０００８】
図１２に示す構成のデュアル化では、サイクル時間を１／２とすることができるが、最終結果を得るためにレジスタ２２Ａ，２２Ｂと、全加算器２４とを設ける必要があり、ハードウェア規模が大きくなり、また、積和処理終了後に加算処理が必要になるという問題がある。また、積和器はＤＳＰの命令により制御され、積和器のレジスタ構成が変わる（レジスタ２２Ａ，２２Ｂが増す）ことは命令セットの変更を意味し、ファームウェア資産を生かせないという問題が発生する。
【０００９】
本発明は上記の点に鑑みなされたもので、サイクル時間を半分にできると共に、ハードウェア規模が大きくなることを抑制でき、レジスタ構成の変更がない積和器を提供することを目的とする。
【００１０】
【課題を解決するための手段】
請求項１に記載の発明は、２入力の乗算の部分積を半加算してサムとキャリーを出力する第一の乗算部と、
２入力の乗算の部分積を半加算してサムとキャリーを出力する第二の乗算部と、
前記第一の乗算部の出力と、前記第二の乗算部の出力と、第二の最終結果加算部の出力とを入力し加算を行い積和結果を出力する第一の最終結果加算部と、
前記第二の乗算部の出力と、前記第一の乗算部の出力と、前記第一の最終結果加算部の出力とを入力し加算を行い積和結果を出力する第二の最終結果加算部とを
備える。
【００１１】
このように、第一、第二の最終結果加算部で第一、第二の乗算部からの今回の入力に対するサム及びキャリーと、第二、第一の乗算部からの前回の入力に対するサム及びキャリーと、第二、第一の最終結果加算部からの前前回の入力に対する積和結果を加算して、今回の入力に対する積和結果を得ることができ、従来のデュアル化と同様にサイクル時間を半分にすることができると共に、従来のデュアル化に対して半加算器を削減でき、ハードウェア規模の増大を抑制でき、かつ、レジスタ構成の変更が生じない。
【００１２】
請求項４に記載の発明は、請求項１記載の積和器において、
前記第一の乗算部と前記第二の乗算部の代りに、単一の乗算部を時分割で使用する。
このため、乗算部が単一で済み、更にハードウェア規模を小さくできる。
請求項５に記載の発明は、請求項１記載の積和器において、
前記第一の最終結果加算部と第二の最終結果加算部の代りに、単一の最終結果加算部を時分割で使用する。
【００１３】
このため、最終結果加算部が単一で済み、更にハードウェア規模を小さくできる。
【００１４】
【発明の実施の形態】
図１は本発明の第１実施例の構成図を示す。同図中、入力Ａ，Ｂは図２（Ａ）に示すタイミングで供給され、このうち奇数番目の入力Ａ，Ｂはレジスタ３０Ａ，３２Ａにラッチされて乗算器３４Ａに供給され、偶数番目の入力Ａ，Ｂはレジスタ３０Ｂ，３２Ｂにラッチされて乗算器３４Ｂに供給される。
【００１５】
乗算器（乗算部）３４Ａ，３４Ｂ夫々は入力Ａ，Ｂの部分積を作り、その部分積を加算して絞り込み、サム（Ｓ）とキャリー（Ｃ）の２つのデータを生成する。図２（Ｂ），（Ｃ）は乗算器３４Ａ，３４Ｂ夫々のデータＳ，Ｃの出力タイミングを示す。乗算器３４Ａの出力データＳ，Ｃは半加算器３６Ａ及びレジスタ３８Ｂに供給され乗算器３４Ｂの出力データＳ，Ｃは半加算器３６Ｂ及びレジスタ３８Ａに供給される。
【００１６】
半加算器３６Ａに乗算器３４Ａからｎ＋１番目の入力に対する乗算データＳ，Ｃが供給されるとき、レジスタ３８Ａからは図２（Ｅ）に示すｎ番目の入力に対する乗算データＳ，Ｃが供給され、レジスタ４０Ａからは図２（Ｇ）に示すｎ−１番目の入力に対する積和データが供給され、これらのデータに対するサム（Ｓ），キャリー（Ｃ）の２つのデータが出力される。全加算器４２Ａでは上記のデータＳ，Ｃの加算が行われ最終結果つまりｎ＋１番目の入力に対する積和データが求められる。この積和データはセレクタ４４で選択されてレジスタ４６，４０Ｂ夫々に格納される。図２（Ｈ）はレジスタ４６の内容を示す。
【００１７】
半加算器３６Ｂに乗算器３４Ｂからｎ＋２番目の入力に対する乗算データＳ，Ｃが供給されるとき、レジスタ３８Ｂからは図２（Ｄ）に示すｎ＋１番目の入力に対する乗算データＳ，Ｃが供給され、レジスタ４０Ｂからは図２（Ｆ）に示すｎ番目の入力に対する積和データが供給され、これらのデータに対するサム（Ｓ），キャリー（Ｃ）の２つのデータが出力される。全加算器４２Ｂでは上記のデータＳ，Ｃの加算が行われ最終結果つまりｎ＋２番目の入力に対する積和データが求められる。この積和データはセレクタ４４で選択されてレジスタ４６，４０Ａ夫々に格納される。図２（Ｈ）はレジスタ４６の内容を示す。
【００１８】
上記の乗算器３４Ａが第一の乗算部に対応し、乗算器３４Ｂが第二の乗算部に対応し、半加算器３６Ａと全加算器４２Ａ及びレジスタ４０Ａ，３８Ａが第一の最終結果加算部に対応し、半加算器３６Ｂと全加算器４２Ｂ及びレジスタ４０Ｂ，３８Ｂが第二の最終結果加算部に対応する。
図３は半加算器３６Ａ，３６Ｂとして使用される４ビット５入力半加算器の構成図を示す。端子５０にはレジスタ４０Ａより積和データが入来し、端子５１，５２にはレジスタ３８Ａから２つのデータＳ，Ｃが入来し、これらのデータは半加算器５４ａ〜５４ｄで加算され、これらで得られたデータＳは半加算器５６ａ〜５６ｄに供給され、またデータＣは半加算器５８ａ，５６ａ〜５６ｃ夫々に供給される。半加算器５６ａ〜５６ｄには端子５３より乗算器３４Ａ出力のデータＳが供給されて加算される。半加算器５６ａ〜５６ｄの出力するデータＳは半加算器５８ｂ〜５８ｅに供給され、半加算器５８ａ〜５８ｄの出力するデータＣは半加算器５８ａ〜５８ｄに供給される。また、半加算器５８ｂ〜５８ｅには端子５５より乗算器３４Ａ出力のデータＣが供給されて加算される。上記の半加算器５８ａ〜５８ｅ夫々の出力するデータＣ，Ｓが全加算器４２Ａに供給される。
【００１９】
ここでは、端子５０〜５２に入力するデータの加算は段数が多く遅延時間が大きいと考えられるが、レジスタ４０Ａ，３８Ａ夫々のデータは加算サイクルの開始時点で既に値が決定しており、乗算器３４Ａの出力データＳ，Ｃの加算はこれより遅れて開始されることを考慮すると、上記半加算器５８ａ〜５８ｄにおける遅延は全体に何ら影響を与えない。
【００２０】
このように乗算器３４Ａ，３４Ｂ，半加算器３６Ａ，３６Ｂ，全加算器４２Ａ，４２Ｂで構成される２つの積和回路を１８０度位相をずらして交互に動作させることで、積和処理に必要なサイクル時間をデュアル化と同様に半分にすることができる。また、デュアル化のように余分なレジスタ２２Ａ，２２Ｂ及び全加算器２４が必要ないためハードウェアが大規模化することがなく、かつ、レジスタ構成の変更が生じない。
【００２１】
図４は本発明の第２実施例の構成図を示す。同図中、入力Ａ，Ｂは図５（Ａ）に示すタイミングで供給され、乗算器３３に供給される。乗算器３３は入力Ａ，Ｂの部分積を作り、その部分積を加算して絞り込み、サム（Ｓ）とキャリー（Ｃ）の２つのデータを生成する。図５（Ｂ）は乗算器３３のデータＳ，Ｃの出力タイミングを示す。乗算器３３の出力データＳ，Ｃはレジスタ３５Ａ，３５Ｂに供給される。
【００２２】
レジスタ３５Ａは図５（Ｃ）に示すラッチクロックのローレベル期間に奇数番目の入力に対する乗算器３３出力Ｓ，Ｃを図５（Ｄ）に示すように格納し、レジスタ３５Ｂは上記ラッチクロックのハイレベル期間に偶数番目の入力に対する乗算器３３出力Ｓ，Ｃを図５（Ｇ）に示すように格納する。レジスタ３５Ａに格納された奇数番目の２つのデータＳ，Ｃは半加算器３６Ａ及びレジスタ３８Ｂに供給され、レジスタ３５Ｂに格納された偶数番目の２つのデータは半加算器３６Ｂ及びレジスタ３８Ａに供給される。
【００２３】
半加算器３６Ａにレジスタ３５Ａからｎ＋１番目の入力に対する乗算データＳ，Ｃが供給されるとき、レジスタ３８Ａからは図５（Ｅ）に示すｎ番目の入力に対する乗算データＳ，Ｃが供給され、レジスタ４０Ａからは図５（Ｆ）に示すｎ−１番目の入力に対する積和データが供給され、これらのデータに対するサム（Ｓ），キャリー（Ｃ）の２つのデータが出力される。全加算器４２Ａでは上記のデータＳ，Ｃの加算が行われ最終結果つまりｎ＋１番目の入力に対する積和データが求められる。この積和データはセレクタ４４で選択されてレジスタ４６，４０Ｂ夫々に格納される。図５（Ｊ）はレジスタ４６の内容を示す。
【００２４】
半加算器３６Ｂにレジスタ３５Ｂからｎ＋２番目の入力に対する乗算データＳ，Ｃが供給されるとき、レジスタ３８Ｂからは図５（Ｈ）に示すｎ＋１番目の入力に対する乗算データＳ，Ｃが供給され、レジスタ４０Ｂからは図５（Ｉ）に示すｎ番目の入力に対する積和データが供給され、これらのデータに対するサム（Ｓ），キャリー（Ｃ）の２つのデータが出力される。全加算器４２Ｂでは上記のデータＳ，Ｃの加算が行われ最終結果つまりｎ＋２番目の入力に対する積和データが求められる。この積和データはセレクタ４４で選択されてレジスタ４６，４０Ａ夫々に格納される。図５（Ｊ）はレジスタ４６の内容を示す。
【００２５】
部分積の絞り込みの時間が最終結果の加算時間よりも短いとした場合、乗算器３３は積和サイクルの半分以下の時間で動作することになり、この実施例では乗算器３３を時分割で使用することで乗算器を１つに削除でき回路規模を小さくしている。この実施例でも従来のデュアル化と同様に積和処理に必要なサイクル時間を半分にすることができる。
【００２６】
これとは逆に、部分積の絞り込みの時間が最終結果の加算時間よりも長い場合は図６に示す第３実施例の回路構成とする。図６において、入力Ａ，Ｂのうち奇数番目の入力Ａ，Ｂはレジスタ３０Ａ，３２Ａにラッチされて乗算器３４Ａに供給され、偶数番目の入力Ａ，Ｂはレジスタ３０Ｂ，３２Ｂにラッチされて乗算器３４Ｂに供給される。
【００２７】
乗算器３４Ａ，３４Ｂ夫々は入力Ａ，Ｂの部分積を作り、その部分積を加算して絞り込み、サム（Ｓ）とキャリー（Ｃ）の２つのデータを生成する。乗算器３４Ａの出力データＳ，Ｃ夫々はセレクタ３７，３９に供給され乗算器３４Ｂの出力データＳ，Ｃ夫々はセレクタ３７，３９に供給される。
セレクタ３７，３９は２つのデータＳ，Ｃを奇数番目、偶数番目で順次選択して半加算器４１に供給する。半加算器４１にはレジスタ４５から前サイクルで得られた積和データが供給されており、半加算器４１はこれらのデータに対するＳ，Ｃの２つのデータを出力する。全加算器４３は上記のデータＳ，Ｃの加算を行い、最終結果の積和データがレジスタ４５に格納される。
【００２８】
この場合は最終結果の加算時間が短いため、時分割多重を行って１つの半加算器４１及び１つの全加算器４３で処理している。逆に部分積の絞り込みの時間が長いため、２つの乗算器３４Ａ，３４Ｂを使用してサイクルタイムの短縮を行っている。この実施例でも従来のデュアル化と同様に積和処理に必要なサイクル時間を半分にすることができる。
【００２９】
【発明の効果】
上述の如く、請求項１に記載の発明によれば、第一、第二の最終結果加算部で第一、第二の乗算部からの今回の入力に対するサム及びキャリーと、第二、第一の乗算部からの前回の入力に対するサム及びキャリーと、第二、第一の最終結果加算部からの前前回の入力に対する積和結果を加算して、今回の入力に対する積和結果を得ることができ、従来のデュアル化と同様にサイクル時間を半分にすることができると共に、従来のデュアル化に対して半加算器を削減でき、ハードウェア規模の増大を抑制でき、かつ、レジスタ構成の変更が生じない。
【００３１】
また、請求項４に記載の発明によれば、乗算部が単一で済み、更にハードウェア規模を小さくできる。
また、請求項５に記載の発明によれば、最終結果加算部が単一で済み、更にハードウェア規模を小さくできる。
【図面の簡単な説明】
【図１】本発明の構成図である。
【図２】図１の信号タイミングチャートである。
【図３】半加算器の構成図である。
【図４】本発明の構成図である。
【図５】図４の信号タイミングチャートである。
【図６】本発明の構成図である。
【図７】従来の乗算器の構成図である。
【図８】従来の積和器の構成図である。
【図９】図７の乗算サイクルを説明するための図である。
【図１０】従来の積和器の構成図である。
【図１１】図１０の乗算サイクルを説明するための図である。
【図１２】従来の積和器の構成図である。
【符号の説明】
３０Ａ，３０Ｂ，３２Ａ，３２Ｂ，３５Ａ，３５Ｂ，３８Ａ，３８Ｂ，４０Ａ，４０Ｂ，４５，４６レジスタ
３３，３４Ａ，３４Ｂ乗算器
３６Ａ，３６Ｂ，４１半加算器
４２Ａ，４２Ｂ，４３全加算器
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a product-sum unit, and more particularly to a product-sum unit that performs product-sum processing at high speed.
In recent years, DSPs (Digital Signal Processors) have been applied in a wide range of fields.
A DSP is required to execute, at high speed, the product-sum processing that appears frequently in signal processing operations. For this reason, high-speed operation of the product-sum unit that performs product-sum processing within the DSP is desired.
[0002]
[Prior art]
FIG. 7 shows the configuration of a parallel multiplier. In the figure, a multiplier 10 generates the partial products of inputs A and B and adds them to obtain the multiplication result. The partial products are added by an adder circuit with a tree structure (a Wallace tree) designed so that no carry propagation occurs. Specifically, the partial products are half-added repeatedly to narrow down the number of bits.
[0003]
For example, if Booth's algorithm is used to generate the partial products, a 16-bit × 16-bit multiplication produces 16 × 8 bits of partial products. By processing each digit of these 128 bits (16 × 8) with half adders having 3 inputs and 2 outputs, two data words (sum and carry) are finally obtained. The final result is obtained by adding these two data words in the full adder 12, which has carry propagation.
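The narrowing-down step described above can be sketched in a few lines of Python. This is a minimal behavioral sketch, not the circuit of FIG. 7: it assumes unsigned operands, uses plain shift-and-AND partial products instead of Booth recoding (so it produces 16 partial products rather than 8), and the function names `csa`, `partial_products`, `reduce_to_sum_carry`, and `multiply` are hypothetical. What the text calls a 3-input, 2-output half adder is modeled as a bitwise carry-save step.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3-input, 2-output carry-save step: a + b + c == s + cy, no carry ripple."""
    s = a ^ b ^ c                            # per-bit sum
    cy = ((a & b) | (b & c) | (a & c)) << 1  # per-bit carry, moved up one digit
    return s, cy


def partial_products(a: int, b: int, width: int = 16) -> list[int]:
    """Shift-and-AND partial products (assumption: unsigned, not Booth-recoded)."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def reduce_to_sum_carry(terms: list[int]) -> tuple[int, int]:
    """Compress the addends to two words (sum, carry) using only carry-save steps."""
    terms = list(terms)
    while len(terms) > 2:
        nxt = []
        while len(terms) >= 3:
            x, y, z = terms.pop(), terms.pop(), terms.pop()
            nxt.extend(csa(x, y, z))  # three words in, two words out
        nxt.extend(terms)             # zero, one, or two leftovers pass through
        terms = nxt
    while len(terms) < 2:
        terms.append(0)
    return terms[0], terms[1]


def multiply(a: int, b: int, width: int = 16) -> int:
    s, cy = reduce_to_sum_carry(partial_products(a, b, width))
    return s + cy  # the only carry-propagating addition (full adder 12)
```

Every layer of `reduce_to_sum_carry` preserves the total of its inputs, which is why a single carry-propagating addition at the end is enough.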
[0004]
A product-sum unit is also built using this multiplier. The product-sum unit is configured to keep adding multiplication results, as in ΣAi · Bi. FIG. 8 shows the configuration of a product-sum unit. The product-sum unit has almost the same configuration as the multiplier. However, the two data words, sum (S) and carry (C), to which the multiplication result of the multiplier 14 has been narrowed down must be added to the result Acc of the product-sum processing up to the immediately preceding step. For this reason, a half adder 16 with 3 inputs and 2 outputs once more computes sum + carry + previous product-sum result to produce the final sum and carry. Finally, these are fully added by the full adder 18 to obtain the product-sum result. In this product-sum unit, the product-sum cycle time TS shown in FIG. 9 is determined by the time T1 for generating the partial products and narrowing them down with half adders, and the time T2 for adding the obtained result to the previous product-sum result.
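The data flow of this conventional product-sum unit can be sketched as follows. This is a behavioral model under stated assumptions: operands are unsigned, the multiplier is modeled as a linear chain of carry-save steps rather than a tree, and the names `csa`, `multiplier_sc`, and `mac_conventional` are hypothetical.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3-input, 2-output carry-save step: a + b + c == s + cy."""
    return a ^ b ^ c, ((a & b) | (b & c) | (a & c)) << 1


def multiplier_sc(a: int, b: int, width: int = 16) -> tuple[int, int]:
    """Model of multiplier 14: returns (S, C) with S + C == a * b."""
    s = c = 0
    for i in range(width):
        pp = (a << i) if (b >> i) & 1 else 0  # one partial product
        s, c = csa(s, c, pp)
    return s, c


def mac_conventional(pairs: list[tuple[int, int]]) -> int:
    acc = 0                          # previous product-sum result Acc
    for a, b in pairs:
        s, c = multiplier_sc(a, b)   # takes time T1
        s2, c2 = csa(s, c, acc)      # half adder 16: S + C + Acc, start of T2
        acc = s2 + c2                # full adder 18: carry-propagating add
    return acc
```

Each loop iteration depends on `acc` from the previous one, so in this model one iteration corresponds to one full cycle TS = T1 + T2.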
[0005]
Pipelining has conventionally been used to shorten this product-sum cycle time. In this method, as shown in FIG. 10, the processing is divided in time by registers 19 and 20 into the partial product generation and narrowing part and the full addition part. Although the latency increases, the processing cycle becomes shorter, so the overall processing time can be shortened. As shown in FIG. 11, the pipeline cycle is the longer of the partial product generation and narrowing time T1 and the final result calculation time T2. That is, when the partial product narrowing time is equal to the final result addition time, the product-sum cycle time is halved.
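The relation between the pipelined cycle and the original cycle time can be written out as a short worked inequality. The symbol $T_{\mathrm{pipe}}$ is introduced here only for illustration and does not appear in the original text; register overhead is ignored.

```latex
\begin{align}
T_S &= T_1 + T_2, \\
T_{\mathrm{pipe}} &= \max(T_1, T_2) \;\ge\; \frac{T_1 + T_2}{2} = \frac{T_S}{2},
\end{align}
```

Equality holds only when $T_1 = T_2$. For instance, with assumed values $T_1 = 6$ and $T_2 = 4$ (arbitrary units), $T_S = 10$ but $T_{\mathrm{pipe}} = 6$, so the cycle shrinks to 60% of the original rather than 50%.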
[0006]
As another method, as shown in FIG. 12, there is a configuration in which two product-sum units are simply provided (dualization). In this configuration, registers 22A and 22B for storing product-sum results are provided in the respective product-sum units, and the product-sum operation is divided between the two product-sum units 14A, 16A, 18A and 14B, 16B, 18B. When both product-sum processes are finished, the two product-sum results are added by the full adder 24 to obtain the final result, which is stored in the register 26. With this method, the time required for the product-sum operation can be halved regardless of the partial product generation time and the final result addition time.
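A behavioral sketch of this dual configuration is given below. It only models which value ends up in which register; the function name `mac_dual` is hypothetical, and the assignment of alternating inputs to the two units is an assumption, since the text says only that the operation is divided between them.

```python
def mac_dual(pairs: list[tuple[int, int]]) -> int:
    acc_a = 0  # register 22A: partial result of unit 14A, 16A, 18A
    acc_b = 0  # register 22B: partial result of unit 14B, 16B, 18B
    for i, (a, b) in enumerate(pairs):
        if i % 2 == 0:
            acc_a += a * b  # unit A handles every other input (assumed split)
        else:
            acc_b += a * b  # unit B handles the rest
    return acc_a + acc_b    # extra step: full adder 24, result to register 26
```

The two accumulators never depend on each other inside the loop, which is what lets the two units run in parallel; the cost is the two extra registers and the final addition after the loop.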
[0007]
[Problems to be solved by the invention]
In the pipelining of the configuration shown in FIG. 8, the time T1 for narrowing down the partial products and the time T2 for adding the final result generally do not coincide, and the longer of the two becomes the cycle time. Pipelining therefore cannot halve the cycle time.
[0008]
In the dual configuration shown in FIG. 12, the cycle time can be halved, but the registers 22A and 22B and the full adder 24 must be provided to obtain the final result, so the hardware scale becomes large. In addition, there is a problem that an addition process is required after the product-sum processing is completed. Furthermore, the product-sum unit is controlled by DSP instructions, and a change in its register configuration (the addition of registers 22A and 22B) means a change in the instruction set, so that existing firmware assets cannot be reused.
[0009]
The present invention has been made in view of the above points, and an object of the present invention is to provide a product-sum unit that can halve the cycle time, suppresses an increase in hardware scale, and requires no change in register configuration.
[0010]
[Means for Solving the Problems]
The invention according to claim 1 comprises a first multiplication unit that half-adds the partial products of a two-input multiplication and outputs a sum and a carry;
a second multiplication unit that half-adds the partial products of a two-input multiplication and outputs a sum and a carry;
a first final result addition unit that receives the output of the first multiplication unit, the output of the second multiplication unit, and the output of a second final result addition unit, adds them, and outputs a product-sum result; and
the second final result addition unit, which receives the output of the second multiplication unit, the output of the first multiplication unit, and the output of the first final result addition unit, adds them, and outputs a product-sum result.
[0011]
In this way, the first and second final result addition units add the sum and carry for the current input from the first and second multiplication units, respectively, the sum and carry for the previous input from the second and first multiplication units, respectively, and the product-sum result for the input before the previous one from the second and first final result addition units, respectively, to obtain the product-sum result for the current input. The cycle time can thus be halved as in the conventional dual configuration, while half adders can be reduced relative to the conventional dual configuration, the increase in hardware scale is suppressed, and no change in register configuration occurs.
[0012]
The invention according to claim 4 is the product-sum unit according to claim 1, wherein
a single multiplication unit is used in a time-division manner instead of the first multiplication unit and the second multiplication unit.
For this reason, a single multiplication unit suffices, and the hardware scale can be further reduced.
The invention according to claim 5 is the product-sum unit according to claim 1, wherein
a single final result addition unit is used in a time-division manner instead of the first final result addition unit and the second final result addition unit.
[0013]
Therefore, a single final result addition unit suffices, and the hardware scale can be further reduced.
[0014]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 shows a block diagram of a first embodiment of the present invention. In the figure, inputs A and B are supplied at the timing shown in FIG. 2(A). The odd-numbered inputs A and B are latched by registers 30A and 32A and supplied to the multiplier 34A, and the even-numbered inputs A and B are latched by registers 30B and 32B and supplied to the multiplier 34B.
[0015]
Each of the multipliers (multiplication units) 34A and 34B generates the partial products of inputs A and B, adds and narrows them down, and generates two data words, a sum (S) and a carry (C). FIGS. 2(B) and 2(C) show the output timings of the data S and C of the multipliers 34A and 34B, respectively. The output data S and C of the multiplier 34A are supplied to the half adder 36A and the register 38B, and the output data S and C of the multiplier 34B are supplied to the half adder 36B and the register 38A.
[0016]
When the multiplication data S and C for the (n+1)th input are supplied from the multiplier 34A to the half adder 36A, the multiplication data S and C for the nth input shown in FIG. 2(E) are supplied from the register 38A, and the product-sum data for the (n-1)th input shown in FIG. 2(G) is supplied from the register 40A. The half adder 36A outputs two data words, a sum (S) and a carry (C), for these data. The full adder 42A adds these data S and C to obtain the final result, that is, the product-sum data for the (n+1)th input. This product-sum data is selected by the selector 44 and stored in each of the registers 46 and 40B. FIG. 2(H) shows the contents of the register 46.
[0017]
When the multiplication data S and C for the (n+2)th input are supplied from the multiplier 34B to the half adder 36B, the multiplication data S and C for the (n+1)th input shown in FIG. 2(D) are supplied from the register 38B, and the product-sum data for the nth input shown in FIG. 2(F) is supplied from the register 40B. The half adder 36B outputs two data words, a sum (S) and a carry (C), for these data. The full adder 42B adds these data S and C to obtain the final result, that is, the product-sum data for the (n+2)th input. This product-sum data is selected by the selector 44 and stored in each of the registers 46 and 40A. FIG. 2(H) shows the contents of the register 46.
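The arithmetic performed by the two final result addition paths amounts to the recurrence Acc(n+1) = P(n+1) + P(n) + Acc(n-1), where P(k) stands for the product of the kth input pair. The sketch below checks that this recurrence reproduces the ordinary running sum. It models values only, not timing or the sum/carry split, and the names `mac_interleaved`, `prod_prev`, `acc_prev1`, and `acc_prev2` are hypothetical.

```python
def mac_interleaved(pairs: list[tuple[int, int]]) -> list[int]:
    """Return the product-sum result after each input, using only the
    current product, the previous product, and the result two inputs back."""
    prod_prev = 0  # product of the previous input (role of register 38A / 38B)
    acc_prev1 = 0  # result one input back; not used by the current step
    acc_prev2 = 0  # result two inputs back (role of register 40A / 40B)
    history = []
    for a, b in pairs:
        prod = a * b                        # current multiplier output (S + C)
        acc = prod + prod_prev + acc_prev2  # the three-operand addition
        history.append(acc)
        acc_prev2, acc_prev1 = acc_prev1, acc
        prod_prev = prod
    return history
```

Because each step reads the result from two inputs back rather than the immediately preceding one, consecutive steps do not wait on each other, which is what allows the two paths to overlap by half a cycle.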
[0018]
The multiplier 34A corresponds to the first multiplication unit, and the multiplier 34B corresponds to the second multiplication unit. The half adder 36A, the full adder 42A, and the registers 40A and 38A correspond to the first final result addition unit, and the half adder 36B, the full adder 42B, and the registers 40B and 38B correspond to the second final result addition unit.
FIG. 3 shows the configuration of a 4-bit, 5-input half adder used as the half adders 36A and 36B. The product-sum data from the register 40A arrives at terminal 50, and the two data S and C from the register 38A arrive at terminals 51 and 52. These data are added by the half adders 54a to 54d; the resulting data S is supplied to the half adders 56a to 56d, and the resulting data C is supplied to the half adders 58a and 56a to 56c, respectively. The data S output from the multiplier 34A is supplied from terminal 53 to the half adders 56a to 56d and added. The data S output by the half adders 56a to 56d is supplied to the half adders 58b to 58e, and the data C output by the half adders 58a to 58d is supplied to the half adders 58a to 58d. In addition, the data C output from the multiplier 34A is supplied from terminal 55 to the half adders 58b to 58e and added. The data C and S output by each of the half adders 58a to 58e are supplied to the full adder 42A.
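The three-row arrangement can be sketched as three chained carry-save steps that compress five words to two. This is a word-level model using Python integers of unlimited width rather than the 4-bit slice of FIG. 3, it does not reproduce the per-bit wiring between the rows, and the names `csa` and `five_input_compress` are hypothetical.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3-input, 2-output carry-save step: a + b + c == s + cy."""
    return a ^ b ^ c, ((a & b) | (b & c) | (a & c)) << 1


def five_input_compress(acc: int, s_prev: int, c_prev: int,
                        s_cur: int, c_cur: int) -> tuple[int, int]:
    """Compress five addends to a (sum, carry) pair in three rows."""
    s1, c1 = csa(acc, s_prev, c_prev)  # row 54: operands already in registers
    s2, c2 = csa(s1, c1, s_cur)        # row 56: adds the current multiplier S
    s3, c3 = csa(s2, c2, c_cur)        # row 58: adds the current multiplier C
    return s3, c3                      # handed to the full adder (42A)
```

The first row uses only registered operands (terminals 50 to 52), so in this model it can be evaluated before the current multiplier outputs arrive at the later rows.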
[0019]
Here, the addition of the data input to terminals 50 to 52 might be thought to involve many stages and a large delay. However, the data in the registers 40A and 38A are already settled at the start of the addition cycle, whereas the addition of the output data S and C of the multiplier 34A starts later. Taking this into account, the delay in the half adders 58a to 58d has no effect on the overall timing.
[0020]
As described above, by operating the two product-sum circuits composed of the multipliers 34A and 34B, the half adders 36A and 36B, and the full adders 42A and 42B alternately with a phase difference of 180 degrees, the cycle time required for product-sum processing can be halved, as with dualization. Furthermore, because the extra registers 22A and 22B and the full adder 24 of the dual configuration are not needed, the hardware does not grow in scale and the register configuration does not change.
[0021]
FIG. 4 shows a block diagram of a second embodiment of the present invention. In the figure, inputs A and B arrive at the timing shown in FIG. 5(A) and are supplied to the multiplier 33. The multiplier 33 generates the partial products of the inputs A and B, adds and narrows them down, and generates two data words, a sum (S) and a carry (C). FIG. 5(B) shows the output timing of the data S and C of the multiplier 33. The output data S and C of the multiplier 33 are supplied to the registers 35A and 35B.
[0022]
The register 35A stores the outputs S and C of the multiplier 33 for odd-numbered inputs, as shown in FIG. 5(D), during the low-level period of the latch clock shown in FIG. 5(C), and the register 35B stores the outputs S and C of the multiplier 33 for even-numbered inputs, as shown in FIG. 5(G), during the high-level period of the latch clock. The two odd-numbered data S and C stored in the register 35A are supplied to the half adder 36A and the register 38B, and the two even-numbered data stored in the register 35B are supplied to the half adder 36B and the register 38A.
[0023]
When the multiplication data S and C for the (n+1)th input are supplied from the register 35A to the half adder 36A, the multiplication data S and C for the nth input shown in FIG. 5(E) are supplied from the register 38A, and the product-sum data for the (n-1)th input shown in FIG. 5(F) is supplied from the register 40A. The half adder 36A outputs two data words, a sum (S) and a carry (C), for these data. The full adder 42A adds these data S and C to obtain the final result, that is, the product-sum data for the (n+1)th input. This product-sum data is selected by the selector 44 and stored in each of the registers 46 and 40B. FIG. 5(J) shows the contents of the register 46.
[0024]
When the multiplication data S and C for the (n+2)th input are supplied from the register 35B to the half adder 36B, the multiplication data S and C for the (n+1)th input shown in FIG. 5(H) are supplied from the register 38B, and the product-sum data for the nth input shown in FIG. 5(I) is supplied from the register 40B. The half adder 36B outputs two data words, a sum (S) and a carry (C), for these data. The full adder 42B adds these data S and C to obtain the final result, that is, the product-sum data for the (n+2)th input. This product-sum data is selected by the selector 44 and stored in each of the registers 46 and 40A. FIG. 5(J) shows the contents of the register 46.
[0025]
If the time for narrowing down the partial products is shorter than the addition time of the final result, the multiplier 33 operates in less than half of the product-sum cycle. In this embodiment, using the multiplier 33 in a time-division manner reduces the number of multipliers to one and makes the circuit scale smaller. In this embodiment as well, the cycle time required for the product-sum processing can be halved as in the conventional dual configuration.
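The statement that the multiplier finishes in less than half a cycle follows in one step from the cycle time defined for the conventional unit, $T_S = T_1 + T_2$, with $T_1$ the narrowing time and $T_2$ the final addition time:

```latex
T_1 < T_2 \;\Longrightarrow\; 2\,T_1 < T_1 + T_2 = T_S \;\Longrightarrow\; T_1 < \frac{T_S}{2}
```

A single multiplier can therefore serve both half-cycle slots. As an illustration with assumed values $T_1 = 4$ and $T_2 = 6$ (arbitrary units), $T_S = 10$ and each half-cycle slot of 5 units is long enough for one multiplication.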
[0026]
Conversely, if the time for narrowing down the partial products is longer than the addition time of the final result, the circuit configuration of the third embodiment shown in FIG. 6 is adopted. In FIG. 6, the odd-numbered inputs A and B are latched by registers 30A and 32A and supplied to the multiplier 34A, and the even-numbered inputs A and B are latched by registers 30B and 32B and supplied to the multiplier 34B.
[0027]
Each of the multipliers 34A and 34B generates the partial products of the inputs A and B, adds and narrows them down, and generates two data words, a sum (S) and a carry (C). The output data S and C of the multiplier 34A are supplied to the selectors 37 and 39, respectively, and the output data S and C of the multiplier 34B are likewise supplied to the selectors 37 and 39, respectively.
The selectors 37 and 39 select the two data S and C alternately for odd-numbered and even-numbered inputs and supply them to the half adder 41. The product-sum data obtained in the previous cycle is supplied from the register 45 to the half adder 41, and the half adder 41 outputs two data words, S and C, for these data. The full adder 43 adds the data S and C, and the resulting final product-sum data is stored in the register 45.
[0028]
In this case, since the addition time of the final result is short, time-division multiplexing is used so that a single half adder 41 and a single full adder 43 perform the processing. Conversely, since the time for narrowing down the partial products is long, two multipliers 34A and 34B are used to shorten the cycle time. In this embodiment as well, the cycle time required for the product-sum processing can be halved as in the conventional dual configuration.
[0029]
[Effect of the invention]
As described above, according to the invention of claim 1, the first and second final result addition units add the sum and carry for the current input from the first and second multiplication units, respectively, the sum and carry for the previous input from the second and first multiplication units, respectively, and the product-sum result for the input before the previous one from the second and first final result addition units, respectively, to obtain the product-sum result for the current input. The cycle time can thus be halved as in the conventional dual configuration, while half adders can be reduced relative to the conventional dual configuration, the increase in hardware scale is suppressed, and no change in register configuration occurs.
[0031]
In addition, according to the invention of claim 4, a single multiplication unit suffices, and the hardware scale can be further reduced.
Further, according to the invention of claim 5, a single final result addition unit suffices, and the hardware scale can be further reduced.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of the present invention.
FIG. 2 is a signal timing chart of FIG. 1.
FIG. 3 is a configuration diagram of a half adder.
FIG. 4 is a configuration diagram of the present invention.
FIG. 5 is a signal timing chart of FIG. 4.
FIG. 6 is a configuration diagram of the present invention.
FIG. 7 is a configuration diagram of a conventional multiplier.
FIG. 8 is a configuration diagram of a conventional product-sum unit.
FIG. 9 is a diagram for explaining the multiplication cycle of FIG. 7.
FIG. 10 is a configuration diagram of a conventional product-sum unit.
FIG. 11 is a diagram for explaining the multiplication cycle of FIG. 10.
FIG. 12 is a configuration diagram of a conventional product-sum unit.
[Explanation of symbols]
30A, 30B, 32A, 32B, 35A, 35B, 38A, 38B, 40A, 40B, 45, 46 Registers
33, 34A, 34B Multipliers
36A, 36B, 41 Half adders
42A, 42B, 43 Full adders

Claims

A product-sum unit comprising:
a first multiplication unit that half-adds the partial products of a two-input multiplication and outputs a sum and a carry;
a second multiplication unit that half-adds the partial products of a two-input multiplication and outputs a sum and a carry;
a first final result addition unit that receives the output of the first multiplication unit, the output of the second multiplication unit, and the output of a second final result addition unit, adds them, and outputs a product-sum result; and
the second final result addition unit, which receives the output of the second multiplication unit, the output of the first multiplication unit, and the output of the first final result addition unit, adds them, and outputs a product-sum result.

The product-sum unit according to claim 1, wherein
the first multiplication unit and the second multiplication unit alternately take in the two inputs and perform multiplication.

The product-sum unit according to claim 1, wherein
the first final result addition unit adds the output of the second final result addition unit for the input of the previous cycle, the output of the second multiplication unit for the input of the previous cycle, and the output of the first multiplication unit for the input of the current cycle, and outputs the sum as the product-sum result for the input of the current cycle.

The product-sum unit according to claim 1, wherein
a single multiplication unit is used in a time-division manner instead of the first multiplication unit and the second multiplication unit.

The product-sum unit according to claim 1, wherein
a single final result addition unit is used in a time-division manner instead of the first final result addition unit and the second final result addition unit.