JPH0784761A - Arithmetic processor

Info

Publication number: JPH0784761A
Application number: JP22920193A
Authority: JP
Inventors: Yukio Otaguro; 幸雄大田黒; Naoyoshi Yano; 直佳矢野
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1993-09-14
Filing date: 1993-09-14
Publication date: 1995-03-31

Abstract

PURPOSE: To provide an inexpensive arithmetic processor whose hardware amount is greatly reduced without greatly increasing the arithmetic processing time. CONSTITUTION: The arithmetic processor comprises a multiplier register SR1 that holds an n-bit multiplier and shifts by one bit in synchronization with a clock; a partial product generation part that generates the partial product of the shift output of the multiplier register and an m-bit multiplicand; an addition part that adds the output of the partial product generation part and the output of an intermediate result register; the intermediate result register REG16, which holds the output of the addition part; and an output register that holds the multiplication result of the multiplier and the multiplicand. Within one cycle of the clock, the following are performed in sequence: generating the partial product of the shift output of the multiplier register SR1 and the multiplicand, adding the output of the partial product generation part and the output of the intermediate result register, holding the output of the addition part in the intermediate result register, and writing one of the outputs of the addition part to the most significant bit of the output register.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to an arithmetic unit incorporated in a data processor, a microprocessor, or the like, and in particular to a low-cost arithmetic unit that reduces the amount of hardware while keeping the increase in arithmetic processing time small.

[0002]

[Prior Art] In data processors, microprocessors, and the like, multiplication is one of the particularly important operations. Several multiplication methods have been proposed, depending on the required performance and the allowable amount of hardware.

[0003] FIG. 10 is a block diagram of a conventional multiplier (first conventional example). The multiplier of this conventional example adopts the carry-save adder scheme. It multiplies 8-bit multiplicand data X (X<0>-X<7>) by 8-bit multiplier data Y (Y<0>-Y<7>) and outputs the multiplication result OUT (OUT<0>-OUT<15>).

[0004] In FIG. 10, the multiplier consists of an 8-bit multiplicand register XR that holds the multiplicand X, an 8-bit multiplier register YR that holds the multiplier Y, and a multiplication unit that performs the multiplication.

[0005] The multiplication unit consists of an 8-stage partial product calculation array and a carry look-ahead adder CLAADDER connected to the final stage of the array. Each stage of the partial product calculation array consists of a partial product generation unit, made up of eight AND gates that generate the partial products, and an addition unit, made up of seven full adders FADD and one half adder HADD that cumulatively add the partial products. For reference, FIG. 2(a) shows the truth table of outputs Z and CO for inputs A, B, and C of the full adder FADD, and FIG. 2(b) shows the truth table of outputs Z and CO for inputs B and C of the half adder HADD.
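FIG. 2 is not reproduced in this text, so the following is only a minimal sketch of the two cells under the usual assumption that Z is the sum output and CO is the carry output. The function names `full_adder` and `half_adder` are illustrative and do not appear in the source.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (Z, CO) for one-bit inputs A, B, C (assumed: Z = sum, CO = carry)."""
    z = a ^ b ^ c                       # sum bit
    co = (a & b) | (b & c) | (a & c)    # carry is 1 when at least two inputs are 1
    return z, co


def half_adder(b: int, c: int) -> tuple[int, int]:
    """Return (Z, CO) for one-bit inputs B, C (assumed: Z = sum, CO = carry)."""
    return b ^ c, b & c


def full_adder_truth_table() -> list[tuple[int, int, int, int, int]]:
    """Rows of (A, B, C, Z, CO) for every input combination."""
    return [(a, b, c, *full_adder(a, b, c)) for a, b, c in product((0, 1), repeat=3)]
```

In both cells the outputs satisfy Z + 2·CO = (sum of the inputs), which is the property the partial product array relies on.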

[0006] First, the first-stage partial product calculation array computes the logical AND of the outputs X<0>-X<7> of the multiplicand register XR with the output Y<0> of the multiplier register YR, and the output of the first stage is applied to the second-stage partial product calculation array. Next, the second stage performs cumulative addition of the partial products using the output of the first stage, the outputs X<0>-X<7> of the multiplicand register XR, and the output Y<1> of the multiplier register YR.

[0007] Similarly, the third through eighth stages of the partial product calculation array perform cumulative addition in sequence, and finally the carry look-ahead adder CLAADDER performs carry look-ahead addition on the output of the eighth stage. The multiplication result is generated with the output of CLAADDER as OUT<0>-OUT<8> and the outputs of the half adders HADD located at the least significant bit of the second through eighth stages as OUT<9>-OUT<15>, respectively.

[0008] In the multiplier of the first conventional example, the number of partial product calculation circuits required (each an AND gate plus a full adder or half adder) equals the product (64) of the number of bits of the multiplier Y (8 in the example of FIG. 10) and the number of bits of the multiplicand X (8). For example, a multiplier for 32-bit data requires as many as 32 × 32 = 1024 partial product addition circuits, which significantly increases the circuit area.
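The cell count quoted above is simply the product of the two operand widths. A small sketch of that arithmetic follows; the function name is hypothetical.

```python
def array_multiplier_cells(multiplier_bits: int, multiplicand_bits: int) -> int:
    """Number of partial product cells (AND gate plus full or half adder)
    in a full array multiplier, as counted in the text."""
    return multiplier_bits * multiplicand_bits


cells_8x8 = array_multiplier_cells(8, 8)       # the FIG. 10 example
cells_32x32 = array_multiplier_cells(32, 32)   # the 32-bit example
```

Quadrupling the operand width multiplies the cell count by sixteen, which is the area problem the invention targets.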

[0009] Next, FIG. 11 shows a block diagram of a multiplication device according to a second conventional example. The multiplication device of this conventional example uses repeated addition and multiplies 32-bit data X (X<0:31>) by 32-bit data Y (Y<0:31>).

[0010] In FIG. 11, the multiplier consists of a 32-bit multiplicand register XR that holds the multiplicand X; a 32-bit multiplier register YR that holds the multiplier Y; a 32-bit register ZR that holds the cumulative addition result during the multiplication; a 32-bit adder ADDER that adds the contents of the multiplicand register XR and the register ZR; and a selector SELR that selects and outputs the adder output ADD<0:31> when the least significant bit Y<31> of the multiplier register YR is "1", and the output Z<0:31> of the register ZR when Y<31> is "0". The final multiplication result is stored in the multiplier register YR.

[0011] In the multiplication device of this conventional example, the following sequence is repeated 32 times: the carry output CARRY of the adder ADDER is written to the most significant bit Z<0> of the register ZR and, at the same time, the selector output SEL<0:30> is written to Z<1:31> of the register ZR; in addition, the least significant bit SEL<31> of the selector output is written to the most significant bit Y<0> of the multiplier register YR, and Y<0:30> of the multiplier register YR is written to Y<1:31> of the multiplier register YR.

[0012] In other words, one processing unit consists of two operations. First, according to the value Y<i> (i = 0 to 31) of each digit of the multiplier Y, either the multiplicand X<0:31> is added into the register ZR, which holds the cumulative addition result, while shifting right by one bit (when Y<i> = 1), or the contents of the register ZR are shifted right by one bit without the addition (when Y<i> = 0). Second, the i-th bit from the top of the multiplication result, which is determined by the i-th processing step, is stored in the most significant bit of the multiplier register YR while the multiplier register YR is shifted right by one bit. After this processing unit has been repeated 32 times (i = 0 to 31), the lower 32 bits of the multiplication result are stored in the multiplier register YR and the upper 32 bits in the register ZR.
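The repeated-addition scheme of the second conventional example can be sketched behaviourally as follows. This is an illustration under stated assumptions, not a model of the FIG. 11 wiring: registers are modelled as Python integers (so bit 0 here is the least significant bit, whereas the text numbers the most significant bit as <0>), and the adder carry is treated as zero when the addition is skipped. The function name is hypothetical.

```python
def shift_add_multiply(x: int, y: int, n: int = 32) -> tuple[int, int]:
    """Multiply two n-bit unsigned values by n add-and-shift steps.

    Returns (zr, yr): the upper n bits (register ZR) and the lower n bits
    (register YR) of the 2n-bit product.
    """
    mask = (1 << n) - 1
    xr = x & mask   # multiplicand register XR
    yr = y & mask   # multiplier register YR (gradually refilled with result bits)
    zr = 0          # accumulator register ZR

    for _ in range(n):
        # Selector SELR: take the adder output when the multiplier LSB is 1,
        # otherwise pass ZR through. 'total' is n+1 bits wide (carry on top).
        total = zr + xr if (yr & 1) else zr
        # The LSB of the selected value is final; it enters YR at the top
        # while YR shifts right by one bit.
        yr = (yr >> 1) | ((total & 1) << (n - 1))
        # ZR takes the remaining bits shifted right, with the carry as its MSB.
        zr = total >> 1

    return zr, yr
```

Each of the n steps contains a full carry-propagating n-bit addition, which is why the text describes the overall multiplication time as long.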

[0013] In the multiplication device of the second conventional example, the above processing unit must be repeated as many times as there are bits in the multiplier (32 times). Moreover, each addition in the adder ADDER involves carry propagation and therefore takes a long time, so the overall multiplication time becomes very long.

[0014]

[Problems to Be Solved by the Invention] As described above, conventional arithmetic units that perform multiplication face a trade-off: building a high-speed multiplier as in the first conventional example makes the amount of hardware very large, while reducing the amount of hardware as in the second conventional example makes the computation time very long.

[0015] The present invention solves the above problems. Its object is to provide a low-cost arithmetic unit in which the amount of hardware is greatly reduced without greatly increasing the arithmetic processing time.

[0016]

[Means for Solving the Problems] FIG. 1 is a diagram illustrating the principle of the present invention.

[0017] To solve the above problems, the first feature of the present invention, as shown in FIG. 1, is an arithmetic unit comprising: a multiplier register SR1 that holds an n-bit multiplier SR<0:7> (n is an arbitrary positive integer; n = 8 in FIG. 1) and shifts by one bit in the direction from the most significant bit (MSB) to the least significant bit (LSB) in synchronization with a clock; a partial product generation unit that generates the partial product of the shift output of the multiplier register SR1 and an m-bit multiplicand A<0:7> (m is an arbitrary positive integer; m = 8 in FIG. 1); an addition unit that adds the output of the partial product generation unit and the output of the intermediate result register described below; an intermediate result register REG16 that holds the output of the addition unit; and an output register that holds the multiplication result OUT<0:7> of the multiplier SR<0:7> and the multiplicand A<0:7>. Within one cycle of the clock, the following operations are performed in sequence: generating the partial product of the shift output SR<7> of the multiplier register SR1 and the multiplicand A<0:7>; adding the output of the partial product generation unit and the output of the intermediate result register REG16; holding the output of the addition unit in the intermediate result register REG16; and writing one of the outputs of the addition unit to the MSB of the output register while shifting the output register by one bit in the direction from the MSB to the LSB.

[0018] The second feature of the present invention, as shown in FIG. 3, is an arithmetic unit comprising: a multiplier register SR2 that holds an n-bit multiplier SR<0:7> (n is an arbitrary positive integer; n = 8 in FIG. 3) and shifts by p bits (p is a positive integer not greater than n; p = 2 in FIG. 3) in the direction from the MSB to the LSB in synchronization with a clock; a first-stage partial product generation unit that generates the partial product of the one-bit shift output SR<7> of the multiplier register SR2 and an m-bit multiplicand A<0:7> (m is an arbitrary positive integer; m = 8 in FIG. 3); a first-stage addition unit that adds the output of the first-stage partial product generation unit and the output of the intermediate result register REG16 described below; an i-th-stage partial product generation unit (i = 2 to p) that generates the partial product of the i-th bit from the most significant bit of the multiplier register SR2 and the multiplicand; an i-th-stage addition unit that adds the output of the i-th-stage partial product generation unit and the output of the (i-1)-th-stage partial product generation unit; an intermediate result register REG16 that holds the output of the p-th-stage addition unit; and an output register that holds the multiplication result of the multiplier SR<0:7> and the multiplicand A<0:7>. Within one cycle of the clock, the following operations are performed in sequence: generating the partial product of the one-bit shift output of the multiplier register SR2 and the multiplicand; adding the output of the first-stage partial product generation unit and the output of the intermediate result register REG16; performing the operations of the second- through p-th-stage partial product generation units and addition units; holding the output of the p-th-stage addition unit in the intermediate result register REG16; and writing the p bits output by the least-significant-digit adders of the first through p-th stages to the top p bits of the output register while shifting the output register by p bits in the direction from the MSB to the LSB.

[0019] The third feature of the present invention is that, in the arithmetic unit according to claim 1 or 2, as shown in FIG. 1 or FIG. 3, the multiplier register SR1 or SR2 and the output register are implemented by a single shared register SR1 or SR2.

[0020] The fourth feature of the present invention is that, in the arithmetic unit according to claim 1, 2, or 3, as shown in FIG. 4, the arithmetic unit comprises a clock generation circuit CLKGEN that generates the clock CLOCK, and a control circuit CNT that causes the clock generation circuit CLKGEN to generate a predetermined number of clocks CLOCK based on an operation start signal START supplied from outside the arithmetic unit.

[0021] The fifth feature of the present invention is that, in the arithmetic unit according to claim 4, the clock generation circuit CLKGEN has the same configuration as part of the multiplier register SR1, the addition unit, and part of the partial product generation unit, or as part of the multiplier register SR2, the first- through p-th-stage addition units, and part of the first- through p-th-stage partial product generation units.

[0022] Furthermore, the sixth feature of the present invention is that the control circuit CNT outputs to the outside a signal READY indicating that the arithmetic unit is executing an operation, from the time it receives the operation start signal START until the operation ends.

[0023]

[Operation] In the arithmetic units of the first and third features of the present invention, as shown in FIG. 1, the following series of operations is performed in sequence within one cycle of the clock CLOCK: the partial product generation unit generates the partial product of the shift output SR<7> of the multiplier register SR1 and the multiplicand A<0:7>; the addition unit adds the output of the partial product generation unit and the output of the intermediate result register REG16; the output of the addition unit is held in the intermediate result register REG16; and one of the outputs of the addition unit (the sum output Z of the least significant digit) is written to the MSB of the output register while the output register is shifted by one bit in the direction from the MSB to the LSB. In the arithmetic unit of the third feature, the multiplier register and the output register are implemented by the shared register SR1, so only the initial shift operation is performed.

[0024] In this way, by implementing the multiplication array as a carry-save partial product calculation array (partial product generation unit and addition unit) with the minimum number of stages and repeating the above cycle, a low-cost arithmetic unit can be realized in which the time required for one cycle is shortened and the circuit scale is greatly reduced.

[0025] In the arithmetic units of the second and third features of the present invention, as shown in FIG. 3, the following series of operations is performed in sequence within one cycle of the clock CLOCK: the first-stage partial product generation unit generates the partial product of the one-bit shift output SR<7> of the multiplier register SR2 and the multiplicand A<0:7>; the first-stage addition unit adds the output of the first-stage partial product generation unit and the output of the intermediate result register REG16; the second- through p-th-stage partial product generation units and addition units perform their operations in sequence; the output of the p-th-stage addition unit is held in the intermediate result register REG16; and the p bits output by the least-significant-digit adders of the first through p-th stages are written to the top p bits of the output register while the output register is shifted by p bits in the direction from the MSB to the LSB. In the arithmetic unit of the third feature, the multiplier register and the output register are implemented by the shared register SR2, so only the initial shift operation is performed.

[0026] In this way, the multiplication array is implemented as a p-stage carry-save partial product calculation array (p stages of partial product generation units and addition units), and the multiplication is executed in (number of bits of the multiplicand)/p repeated cycles. That is, the number of stages p of the partial product calculation array can be set according to the required processing speed and amount of hardware, so that a trade-off between processing speed and hardware amount can be achieved. As a result, a low-cost arithmetic unit can be realized in which the time required for one cycle is shortened and the circuit scale is reduced without greatly increasing the overall processing time.
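The effect of the stage count p can be sketched with a behavioural model. This is an illustration under stated assumptions rather than the FIG. 3 circuit: the intermediate result register is modelled as two integer words `s` (sum) and `c` (carry), registers are Python integers with bit 0 as the LSB, p is assumed to divide n, and the cycle count is taken from the width n of the shared shift register, which equals the multiplicand width in the 8 × 8 example. All names are hypothetical.

```python
def multiply_low_bits_p(a: int, y: int, n: int = 8, p: int = 2) -> tuple[int, int]:
    """Low n bits of a*y using p carry-save stages per clock cycle.

    Returns (result, cycles). The multiplier and the result share one register.
    """
    if n % p != 0:
        raise ValueError("this sketch assumes p divides n")
    mask = (1 << n) - 1
    a &= mask
    sr = y & mask    # shared multiplier/output register
    s = c = 0        # carry-save sum and carry words (intermediate result)
    cycles = 0

    for _ in range(n // p):
        low_bits = 0
        for stage in range(p):                   # p stages evaluated in one cycle
            y_bit = (sr >> stage) & 1            # multiplier bit consumed by this stage
            pp = a if y_bit else 0               # partial product (row of AND gates)
            z = pp ^ s ^ c                       # per-bit sum outputs, no carry ripple
            co = (pp & s) | (pp & c) | (s & c)   # per-bit carry outputs
            low_bits |= (z & 1) << stage         # lowest sum bit is a final result bit
            s, c = z >> 1, co                    # realign for the next stage
        # Drop the p consumed multiplier bits and write p result bits at the top.
        sr = (sr >> p) | (low_bits << (n - p))
        cycles += 1

    return sr, cycles
```

With n = 8, p = 1 takes eight cycles with one stage of hardware, while p = 2 takes four cycles with two stages, which is the speed-versus-hardware choice described above.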

[0027] Furthermore, in the arithmetic units of the fourth, fifth, and sixth features of the present invention, as shown in FIG. 4, the control circuit CNT controls the clock generation circuit CLKGEN so that it generates a predetermined number of clocks CLOCK, based on the operation start signal START supplied from outside the arithmetic unit.

[0028] In general, a processor that includes a multiplication circuit, or is connected to one, requires an oscillator inside or near the processor to generate a high-speed clock for the multiplication circuit. This raises several problems: the oscillator is expensive, the delay from the oscillator to the multiplication circuit is large, and routing the clock signal line increases power consumption. Providing the clock generation circuit CLKGEN inside the arithmetic unit (multiplication circuit) overcomes these problems and enables high-speed arithmetic processing.

[0029] In particular, in the arithmetic unit of the fifth feature of the present invention, as shown in FIG. 5, the clock generation circuit CLKGEN is given the same configuration (built from devices of the same size) as part of the multiplier register SR1, the partial product generation unit, and part of the addition unit for the multiplication device of the first feature, and the same configuration as part of the multiplier register SR2, the first- through p-th-stage addition units, and part of the first- through p-th-stage partial product generation units for the arithmetic unit of the second feature. This gives the multiplication array and the clock generation circuit CLKGEN the same delay characteristics, so stable operation without malfunction can be guaranteed even when the manufacturing process conditions or the surrounding environment vary.

[0030] Furthermore, in the arithmetic unit of the sixth feature of the present invention, the control circuit CNT outputs to the outside the signal READY indicating that the arithmetic unit is executing an operation, from the time it receives the operation start signal START until the operation ends.

[0031] A processor that incorporates the arithmetic unit, or is connected to the multiplication device, runs on a clock different from the internal operating clock of the arithmetic unit, so the processor and the arithmetic unit must be synchronized. Implementing this synchronization with the operation start signal START and the signal READY, which indicates that an operation is in progress, makes it achievable without complex hardware.
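The START/READY handshake can be sketched as a small state machine. This is only an illustration: the class and method names are hypothetical, the polarity of READY follows the text (asserted while an operation is in progress), and the real circuit of FIG. 4 is not reproduced here.

```python
class ControlCircuit:
    """Behavioural sketch of CNT: after START, allow a fixed number of
    CLOCK pulses and report READY while the operation is in progress."""

    def __init__(self, clocks_per_operation: int) -> None:
        self.clocks_per_operation = clocks_per_operation
        self.remaining = 0   # CLOCK pulses still to be generated

    def start(self) -> None:
        """Model the external START signal."""
        self.remaining = self.clocks_per_operation

    @property
    def ready(self) -> bool:
        """READY as described in the text: asserted during execution."""
        return self.remaining > 0

    def tick(self) -> bool:
        """Advance the internal clock generator by one step.
        Returns True if a CLOCK pulse was emitted to the multiplier."""
        if self.remaining == 0:
            return False
        self.remaining -= 1
        return True


def run_operation(cnt: ControlCircuit) -> int:
    """Host-side view: issue START, then wait until READY is deasserted.
    Returns the number of CLOCK pulses that were generated."""
    cnt.start()
    pulses = 0
    while cnt.ready:
        pulses += cnt.tick()
    return pulses
```

The host never needs to see the internal clock; it only issues START and watches READY, which is why the two sides can run on unrelated clocks.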

[0032]

[Embodiments] Next, embodiments of the present invention will be described with reference to the drawings.

[0033] (First Embodiment) FIG. 1 shows a block diagram of an arithmetic unit according to a first embodiment of the present invention. The arithmetic unit of this embodiment is a multiplier that adopts the carry-save adder scheme and uses a one-stage partial product calculation array. It multiplies 8-bit multiplicand data A (A<0:7>) by 8-bit multiplier data SR (SR<0:7>) to obtain the lower 8 bits OUT (OUT<0:7>) of the multiplication result.

[0034] In FIG. 1, the multiplier consists of a multiplier register SR1, which is an 8-bit shift register that holds the multiplier SR; a cumulative addition unit that cumulatively adds the partial products; and a 16-bit intermediate result register REG16 that holds the intermediate result from the cumulative addition unit. The multiplication result OUT is stored in the multiplier register SR1.

[0035] The cumulative addition unit consists of a one-stage partial product calculation array. The partial product calculation array consists of a partial product generation unit, made up of eight AND gates AN2 that generate the partial products, and an addition unit, made up of seven full adders FADD and one half adder HADD that cumulatively add the partial products. For reference, FIG. 2(a) shows the truth table of outputs Z and CO for inputs A, B, and C of the full adder FADD, and FIG. 2(b) shows the truth table of outputs Z and CO for inputs B and C of the half adder HADD.

[0036] Next, the operation of the multiplier of this embodiment will be described.

[0037] First, the multiplier is written into the multiplier register SR1. Although not shown in FIG. 1, the multiplier register SR1 also has a function for loading the 8-bit multiplier data SR<0>-SR<7> into all of its bits simultaneously. Here, SR<0> is the most significant bit and SR<7> is the least significant bit. The intermediate result register REG16 is also cleared to "0".

[0038] Next, the multiplier register SR1 is shifted by one bit in the direction from the MSB to the LSB in synchronization with the clock CLOCK (not shown).

[0039] Next, the partial product generation unit generates the partial products of the shift output SR<7> of the multiplier register SR1 and the multiplicand A<0:7>, and the addition unit adds the output of the partial product generation unit and the output of the intermediate result register REG16. Here, the full adder FADD at the j-th bit (j = 2 to 7) performs a full addition with the output of the partial product generation unit as input A, REG<(j-1)×2> of the intermediate register REG16 held in the previous cycle as input B, and REG<(j-1)×2+1> of the intermediate register REG16 held in the previous cycle as input C, and generates a carry output CO and a sum output Z. The half adder HADD at the eighth bit performs a half addition with REG<14> of the intermediate register REG16 held in the previous cycle as input B and REG<15> of the intermediate register REG16 held in the previous cycle as input C, and generates a carry output CO and a sum output Z.

[0040] Next, the output of the addition unit is held in the intermediate result register REG16. The most significant bit REG<0> of the intermediate register REG16 holds the result of the logical AND of the most significant bit A<0> of the multiplicand and the shift output SR<7> of the multiplier register SR1; REG<1>-REG<14> hold the carry outputs CO and sum outputs Z of the full adders FADD; and REG<15> holds the carry output CO of the half adder HADD.

[0041] Furthermore, the sum output Z of the half adder HADD at the least significant digit of the addition unit is written to the most significant bit (MSB) of the output register (SR1).

【００４２】By repeating the above series of operations eight times in sequence, the lower 8 bits of the multiplication result are obtained in the output register SR1 as OUT<0> to OUT<7>.
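
The eight-cycle loop above can be modeled behaviorally as follows. This is a sketch under stated assumptions: it uses Python integers for the sum and carry vectors instead of the interleaved REG16 layout, and the name `serial_csa_multiply` is hypothetical. Each cycle adds one partial product in carry-save form and emits the lowest sum bit as the next result bit.

```python
def serial_csa_multiply(a: int, b: int, n: int = 8) -> tuple[int, int, int]:
    """Behavioral model of a one-stage carry-save serial multiplier.

    Returns (low, s, c): the lower n bits of a*b, plus the residual sum and
    carry vectors (the first embodiment itself only keeps the low bits).
    """
    s = c = 0          # carry-save intermediate result (assumed layout)
    low = 0
    for i in range(n):
        pp = a if (b >> i) & 1 else 0              # partial product (AND gates)
        sum_bits = pp ^ s ^ c                      # per-bit sum outputs Z
        carry_bits = ((pp & s) | (pp & c) | (s & c)) << 1   # carries, weight x2
        low |= (sum_bits & 1) << i                 # lowest sum bit is a result bit
        s = sum_bits >> 1                          # keep the rest for next cycle
        c = carry_bits >> 1
    return low, s, c
```

In this model the residual `s + c` equals the upper half of the product, which is the quantity a final adder would resolve if the upper bits were needed.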

【００４３】As described above, in the multiplier of this embodiment, the multiplication array is realized by a carry-save partial product calculation array (a partial product generation unit and an adder unit) with the minimum number of stages, and the above cycle is repeated. This shortens the time required for one cycle and greatly reduces the circuit scale, so that a low-cost multiplier can be realized.

【００４４】(Second Embodiment) FIG. 3 shows the configuration of an arithmetic unit according to a second embodiment of the present invention. Like the first embodiment, the arithmetic unit of this embodiment is a multiplier that adopts the carry-save adder scheme, here using a two-stage partial product calculation array. It multiplies 8-bit multiplicand data A (A<0:7>) by 8-bit multiplier data SR (SR<0> to SR<7>) and obtains the lower 8 bits OUT (OUT<0> to OUT<7>) of the multiplication result.

【００４５】In FIG. 3, the multiplier comprises a multiplier register SR2 that holds the multiplier SR, a cumulative adder unit that cumulatively adds the partial products, and a 16-bit intermediate result register REG16 that holds the intermediate result from the cumulative adder unit. The multiplication result OUT is stored in the multiplier register SR2.

【００４６】The multiplier register SR2 is an 8-bit shift register. To perform a 2-bit shift in one cycle, it is wired so that the shift output of SR<0> is connected to SR<2>, the shift output of SR<1> to SR<3>, and so on.
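
The 2-bit shift wiring can be illustrated with a small list model. This is only a sketch: the register is represented as a Python list with index 0 as the most significant bit, and the function name `shift_by_two` and its two fill-bit parameters are hypothetical.

```python
def shift_by_two(reg: list[int], new_bit0: int, new_bit1: int) -> list[int]:
    """Shift a register two positions toward the LSB in one step.

    reg[0] is the MSB. Old reg[i] moves to position i + 2, mirroring the
    wiring SR<0> -> SR<2>, SR<1> -> SR<3>, and so on. The two vacated
    MSB-side positions are filled with new_bit0 and new_bit1.
    """
    if len(reg) < 2:
        raise ValueError("register must hold at least two bits")
    return [new_bit0, new_bit1] + reg[:-2]
```

The two bits at the LSB end are dropped by the shift, so they have to be consumed before the shift takes effect.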

【００４７】The cumulative adder unit consists of two partial product calculation array stages, STAGE1 and STAGE2. Each partial product calculation array comprises a partial product generation unit made up of eight AND gates AN2 that generate the partial products, and an adder unit made up of seven full adders FADD and one half adder HADD that cumulatively add the partial products.

【００４８】Next, the operation of the multiplier of this embodiment will be described.

【００４９】First, the multiplier is written into the multiplier register SR1. Although not shown in FIG. 3, the multiplier register SR1 also has a function of storing the 8-bit multiplier data SR<0> to SR<7> into all of its bits simultaneously. Here, SR<0> is the most significant bit and SR<7> is the least significant bit. In addition, the intermediate result register REG16 is cleared to "0".

【００５０】Next, the multiplier register SR1 is shifted by 2 bits in the direction from the most significant bit MSB toward the least significant bit LSB, in synchronization with the clock CLOCK (not shown).

【００５１】Next, the first-stage partial product generation unit generates the partial products of the shift output SR<7> of the multiplier register SR1 and each bit of the multiplicand A<0:7>, and the first-stage adder unit adds the output of the first-stage partial product generation unit to the output of the intermediate result register REG16. Here, the full adder FADD of the j-th bit (j = 2 to 7) performs a full addition with the partial product generation unit output as input A, REG<(j−1)×2> of the intermediate register REG16 held in the previous cycle as input B, and REG<(j−1)×2+1> of the intermediate register REG16 held in the previous cycle as input C, and generates a carry output CO and a sum output Z. The half adder HADD of the eighth bit performs a half addition with REG<14> of the intermediate register REG16 held in the previous cycle as input B and REG<15> of the intermediate register REG16 held in the previous cycle as input C, and generates a carry output CO and a sum output Z.

【００５２】Next, the second-stage partial product generation unit generates the partial products of SR<6> of the multiplier register SR1 and each bit of the multiplicand A<0:7>, and the second-stage adder unit adds the output of the second-stage partial product generation unit to the output of the first-stage adder unit.

【００５３】Here, the full adder FADD of the j-th bit (j = 2 to 7) performs a full addition with the second-stage partial product generation unit output as input A, the sum output Z of the (j−1)-th bit full adder FADD of the first-stage adder unit as input B, and the carry output CO of the j-th bit full adder FADD of the first-stage adder unit as input C, and generates a carry output CO and a sum output Z. However, the B input of the second-bit full adder FADD is the first-bit output of the first-stage partial product generation unit.

【００５４】The half adder HADD of the eighth bit performs a half addition with the sum output Z of the seventh-bit full adder FADD of the first-stage adder unit as input B and the carry output CO of the eighth-bit half adder HADD of the first-stage adder unit as input C, and generates a carry output CO and a sum output Z.

【００５５】Next, the output of the second-stage adder unit is held in the intermediate result register REG16. The most significant position REG<0> of the intermediate register REG16 holds the logical product (AND) of the most significant bit A<0> of the multiplicand and SR<6> of the multiplier register SR1; REG<1> to REG<14> hold the carry outputs CO and sum outputs Z of the full adders FADD; and REG<15> holds the carry output CO of the half adder HADD.

【００５６】Further, the sum output Z of the half adder HADD at the least significant digit of the first-stage adder unit is written to the second bit from the most significant bit MSB (SR<1>) of the output register (SR1), and the sum output Z of the half adder HADD at the least significant digit of the second-stage adder unit is written to the most significant bit MSB (SR<0>) of the output register (SR1).

【００５７】By repeating the above series of operations four times in sequence, the lower 8 bits of the multiplication result are obtained in the output register SR1 as OUT<0> to OUT<7>.
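
The four-cycle, two-stage procedure can be sketched as a behavioral model. The assumptions are labeled in the comments: the sum and carry vectors are plain integers rather than the REG16 bit layout, the two multiplier bits are read before the 2-bit shift is applied, and the names `csa_stage` and `multiply_low8_two_stage` are hypothetical.

```python
def csa_stage(pp: int, s: int, c: int) -> tuple[int, int, int]:
    """One carry-save stage. Returns (result_bit, new_s, new_c)."""
    sum_bits = pp ^ s ^ c
    carry_bits = ((pp & s) | (pp & c) | (s & c)) << 1
    return sum_bits & 1, sum_bits >> 1, carry_bits >> 1


def multiply_low8_two_stage(a: int, b: int) -> int:
    """Lower 8 bits of a*b using two carry-save stages per cycle, 4 cycles."""
    reg = [(b >> (7 - i)) & 1 for i in range(8)]   # reg[0] = SR<0> = MSB
    s = c = 0                                      # intermediate result cleared
    for _ in range(4):
        # Stage 1 uses SR<7>, stage 2 uses SR<6> (assumed read before shifting).
        z1, s, c = csa_stage(a if reg[7] else 0, s, c)
        z2, s, c = csa_stage(a if reg[6] else 0, s, c)
        # 2-bit shift toward the LSB; stage-2 bit goes to SR<0>, stage-1 to SR<1>.
        reg = [z2, z1] + reg[:6]
    return int("".join(str(bit) for bit in reg), 2)
```

After the fourth cycle the original multiplier bits have all been shifted out and the register holds only result bits, which is why the same register can serve as both multiplier register and output register.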

【００５８】In the multiplier of the first embodiment, the multiplication time is {(delay time of the partial product calculation array) + (delay time of the intermediate result register REG16)} × (number of bits m of the multiplicand), so it has the drawback that the multiplication takes a long time.

【００５９】In contrast, in the multiplier of this embodiment, the multiplication array is realized by a (p =) two-stage carry-save partial product calculation array (two stages of partial product generation units and adder units), so the multiplication can be executed in (number of bits of the multiplicand) / (number of stages of the partial product calculation array: 2) = 4 repeated cycles. In other words, by setting the number of stages p of the partial product calculation array according to the required processing speed and hardware amount, a trade-off between processing speed and hardware amount can be realized.
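
The trade-off can be made concrete with a rough timing estimate. The sketch below assumes that the array delay grows linearly with the number of stages p (p times a per-stage delay), which the text does not state; only the cycle count m / p and the p = 1 formula come from the description. The function name `multiply_time` and its parameters are hypothetical.

```python
import math


def multiply_time(m: int, p: int, t_stage: int, t_reg: int) -> int:
    """Estimated total multiplication time in arbitrary time units.

    m       : number of bits processed serially
    p       : number of partial product calculation array stages
    t_stage : delay of one array stage (assumed equal for all stages)
    t_reg   : delay of the intermediate result register
    """
    cycles = math.ceil(m / p)                 # repeated cycles needed
    return cycles * (p * t_stage + t_reg)     # per-cycle delay times cycles
```

Under these assumptions, `multiply_time(8, 1, 10, 5)` gives 120 and `multiply_time(8, 2, 10, 5)` gives 100: the register delay is paid in fewer cycles, at the cost of a second array stage.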

【００６０】As a result, a low-cost arithmetic unit can be realized in which the time required for one cycle is shortened and the circuit scale is reduced without greatly increasing the overall processing time.

【００６１】(Third Embodiment) FIG. 4 shows the configuration of an arithmetic unit according to a third embodiment of the present invention.

【００６２】In the figure, the arithmetic unit of this embodiment comprises: a multiplier MUL that multiplies multiplicand data A by multiplier data B and obtains a multiplication result OUT; an internal clock generation circuit CLKGEN that generates an internal clock CLOCK; and a control circuit CNT that issues a clock control signal SW based on an operation start signal START supplied from outside the arithmetic unit, so that the internal clock generation circuit CLKGEN generates a predetermined number of internal clocks CLOCK, and that outputs to the outside a multiplication end signal READY indicating that the arithmetic unit is executing an operation during the period from receipt of the operation start signal START until the operation ends.

【００６３】The multiplier MUL is either the multiplier MUL1 shown in the first embodiment or the multiplier MUL2 shown in the second embodiment.

【００６４】FIG. 5 shows the circuit configuration of the internal clock generator CLKGEN when the number of stages of MUL1 → MUL2 is two. The internal clock generator CLKGEN in the figure is the configuration used when the clock is supplied to the multiplier MUL1 (first embodiment). It comprises a dummy circuit having a delay time equivalent to the delay from the multiplier register SR1 to partial product generation in the multiplier MUL1, an AND gate AN2Y that takes the logical product of the clock control signal SW from the control circuit CNT and the dummy circuit output, and clock buffers CB1 and CB2.

【００６５】The dummy circuit realizes a circuit having the same delay time as the time required for the multiplication data to pass through the multiplier MUL1 in one clock cycle. It comprises a dummy register SRX built from a circuit equivalent to one bit of the multiplier register SR1 (using elements of the same size), a dummy gate AN2X built from a circuit equivalent to the AND gate AN2, and a dummy full adder FADDX built from a circuit equivalent to the full adder FADD.

【００６６】The control circuit CNT makes the clock control signal SW active and, once the predetermined number of internal clocks CLOCK have been generated, makes the clock control signal SW inactive. That is, in FIG. 5, while the clock control signal SW is at the "H" level, the path from the dummy circuit to the AND gate AN2Y is turned on and an internal clock CLOCK is generated whose pulse width equals the delay time of the dummy circuit and the AND gate AN2Y; while the clock control signal SW is at the "L" level, the internal clock CLOCK stops.

【００６７】In general, a processor that includes an arithmetic unit, or is connected to one, requires an oscillator inside or around the processor to generate a high-speed clock for the arithmetic unit. This causes problems such as the following:
・the cost of an oscillator (clock generator) placed outside the arithmetic unit is high;
・the delay from the oscillator to the arithmetic unit is large;
・routing the clock signal line increases power consumption.
By providing the internal clock generation circuit CLKGEN inside the arithmetic unit as in this embodiment, these problems can be overcome and high-speed arithmetic processing can be realized.

【００６８】The multiplier MUL also has the problem that it does not operate if the delay time of the multiplication array becomes longer than the clock period generated inside the multiplier. On the other hand, circuit delay varies with manufacturing process conditions and with ambient conditions such as voltage and temperature, and the magnitude of the variation differs depending on the type of circuit.

【００６９】Therefore, if the internal clock generation circuit CLKGEN were built with a circuit configuration completely different from that of the multiplication array, the clock period would have to remain longer than the delay time of the multiplication array even when conditions vary, so the clock period would have to be made sufficiently longer than the delay time of the multiplication array.

【００７０】However, if, as in this embodiment, the internal clock generation circuit CLKGEN for the arithmetic unit MUL2 of the second embodiment is built from a part of the multiplier register SR2, the first-stage partial product generation unit, and a part of the first-stage adder unit, then the multiplication array and the clock generation circuit CLKGEN can be given identical delay characteristics, and stable operation without malfunction can be guaranteed even when manufacturing process conditions or the ambient environment change.

【００７１】Furthermore, in a processor that incorporates the arithmetic unit, or a processor connected to it, the processor's operating clock differs from the operating clock inside the arithmetic unit, so the processor and the arithmetic unit must be synchronized.

【００７２】In this embodiment, synchronization between the processor and the arithmetic unit is achieved by the operation start signal START and the multiplication end signal READY, which indicates that an operation is being executed. That is, by halting the processor while the multiplication end signal READY indicates that the arithmetic unit is executing, synchronization can be achieved without complex hardware.

【００７３】(Fourth Embodiment) FIG. 6 shows the configuration of an arithmetic unit according to a fourth embodiment of the present invention. The arithmetic unit of this embodiment is a multiplier that adopts the carry-save adder scheme and uses a two-stage partial product calculation array. It multiplies 32-bit multiplicand data A (A<0:31>) by 32-bit multiplier data B (B<0> to B<31>) and obtains a 64-bit multiplication result OUT (OUT<0> to OUT<63>).

【００７４】In the figure, the arithmetic unit of this embodiment comprises: a multiplier register BR that holds the multiplier B; a cumulative adder unit that cumulatively adds the partial products; a 64-bit intermediate result register REG64 that holds the intermediate result from the cumulative adder unit; a final adder unit FSA that, at the end of the operation, adds the per-bit sums and carries to compute the final result; an internal clock generation circuit CLKGEN that generates an internal clock CLOCK; and a control circuit CNT that issues a clock control signal SW based on an operation start signal START supplied from outside the arithmetic unit, so that the internal clock generation circuit CLKGEN generates a predetermined number of internal clocks CLOCK, and that outputs to the outside a multiplication end signal READY indicating that the arithmetic unit is executing an operation during the period from receipt of the operation start signal START until the operation ends. The multiplication result OUT is obtained from the output OUT<0:31> of the final adder unit FSA and the output OUT<33:63> of the multiplier register BR.

【００７５】The multiplier register SR2 is a 32-bit shift register that first stores the multiplier B<0:31> and thereafter shifts by 2 bits at a time in synchronization with the internal clock CLOCK. It has an input selector BIS on its input side and an output selector BOS on its output side.

【００７６】The intermediate result register REG64 is a 64-bit register that temporarily holds the sums and carries of the partial products (the intermediate result) output from the cumulative adder unit. It has an input selector RIS on its input side and an output selector ROS on its output side.

【００７７】The cumulative adder unit consists of two partial product calculation array stages, STAGE1 and STAGE2. Each partial product calculation array comprises a partial product generation unit made up of 32 AND gates AN2 that generate the partial products, and an adder unit made up of 31 full adders FADD and one half adder HADD that cumulatively add the partial products. FIG. 7 shows the detailed circuit configuration of the second-stage partial product calculation array.

【００７８】The internal clock generation circuit CLKGEN generates the internal clock CLOCK based on the clock control signal SW and is similar to the internal clock generation circuit CLKGEN of the third embodiment (see FIG. 5). However, the internal clock generation circuit CLKGEN of this embodiment corresponds to the configuration for the multiplier MUL2 described in the third embodiment, that is, the case where the dummy circuit has the same configuration as a part of the multiplier register SR2, the first-stage partial product generation unit, a part of the first-stage adder unit, the second-stage partial product generation unit, and a part of the second-stage adder unit.

【００７９】The control circuit CNT comprises an incrementer CINC that counts the internal clocks CLOCK, a counter register CTR, a selector CTS that selects between the incrementer CINC output and an all-zero value and supplies it to the counter register CTR, a decoder DEC that generates the control signals CTENB and ENB, an exclusive OR gate XO1, an OR gate OR1, a delay element DLY, clocked inverters CI1 and CI2, and an inverter I3.

【００８０】FIG. 8 is a timing chart explaining the operation of the arithmetic unit of this embodiment.

【００８１】First, the signals shown in the figure and the internal signals used in FIG. 6 will be described.

【００８２】The operation start signal START is supplied from outside the arithmetic unit and signals the start of an operation at its rising and falling edges. The multiplication end signal READY indicates that the arithmetic unit is executing an operation, and signals the end of the operation at its rising and falling edges. Therefore, as in the timing chart of FIG. 8, START ≠ READY during an operation and START = READY while waiting for an operation.
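
The START/READY relationship described above amounts to a two-signal toggle handshake. The following is a minimal sketch of that idea only: the class `Handshake` and its method names are hypothetical, and the model ignores all timing.

```python
class Handshake:
    """Toggle handshake: busy whenever START differs from READY."""

    def __init__(self) -> None:
        self.start = 0   # driven by the processor side
        self.ready = 0   # driven by the arithmetic unit

    def busy(self) -> bool:
        """True while an operation is in progress (START != READY)."""
        return self.start != self.ready

    def request(self) -> None:
        """Processor toggles START (a rising or a falling edge) to begin."""
        if self.busy():
            raise RuntimeError("operation already in progress")
        self.start ^= 1

    def complete(self) -> None:
        """Arithmetic unit toggles READY so that it matches START again."""
        if not self.busy():
            raise RuntimeError("no operation in progress")
        self.ready ^= 1
```

Because either edge of START counts as a request, no separate reset of START is needed between consecutive operations.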

【００８３】The counter output CT<0:4> is the 5-bit output of the counter register CTR, which temporarily stores the result of the incrementer CINC. It is incremented from "0" to "16" in synchronization with the internal clock CLOCK. While waiting for an operation it is stopped at "16", and it becomes "0" when the multiplication is initialized.

【００８４】The reset signal CTENB is a signal for correctly resetting the value of the counter output CT<0:4> at the outset. The decoder DEC generates it so that CTENB = 0 when CT<0:4> = 10000 and CTENB = 1 when CT<0:4> ≠ 10000. Consequently, in normal operation, CTENB = 1 during multiplication and CTENB = 0 during standby.

【００８５】The clock control signal SW controls the generation and stopping of the internal clock CLOCK: the internal clock CLOCK is generated when SW = 1 and stops when SW = 0. SW is generated by taking the logical sum (OR) of the reset signal CTENB and the exclusive OR (written with the symbol ^) of the operation start signal START and the multiplication end signal READY. Therefore, SW = 1 unconditionally while CTENB = 1. When CTENB ≠ 1, SW = 1 and the internal clock CLOCK is generated when START ^ READY = 1, that is, when START ≠ READY (operation in progress); when START ^ READY = 0, SW = 0 and the internal clock CLOCK is stopped.
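
The combinational logic for CTENB and SW can be written out directly. This sketch treats the counter output as a plain integer (so 10000 in binary is 16); the function names `ctenb` and `sw` and the constant `IDLE_COUNT` are hypothetical.

```python
IDLE_COUNT = 0b10000  # counter value while waiting for an operation


def ctenb(ct: int) -> int:
    """Reset signal from the decoder: 0 only when the counter reads 10000."""
    return 0 if ct == IDLE_COUNT else 1


def sw(start: int, ready: int, ct: int) -> int:
    """Clock control signal: SW = (START ^ READY) | CTENB."""
    return (start ^ ready) | ctenb(ct)
```

For instance, `sw(0, 0, 16)` is 0 (idle, clock stopped), while `sw(1, 0, 16)` and `sw(0, 0, 3)` are both 1: the first because an operation was requested, the second because the counter is not at its idle value.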

【００８６】The internal clock CLOCK is used to control the operation and is generated based on the clock control signal SW. That is, it is generated when SW = 1, and when SW = 0 it stops and is fixed at CLOCK = 0.

【００８７】The enable signal ENB uses the counter output CT<4>: ENB = 1 when the counter indicates "16" and ENB ≠ 1 otherwise. The enable signal ENB is the enable signal for initialization; when the internal clock CLOCK rises while ENB = 1, the initialization operation for starting a multiplication is performed. Once initialization has been performed, CT<0:4> = 0 ≠ 16, so ENB changes to 0. From then on, ENB = 0 until the multiplication ends and never becomes 1. In addition, the final adder unit FSA operates while ENB = 1, so the addition result is obtained when the multiplication ends.

【００８８】Output SHF<0:31> of the multiplier register BR: when the multiplication is initialized, the multiplier B<0:31> is loaded into the multiplier register BR as data. During the subsequent operation, the register is shifted by 2 bits at a time in synchronization with the internal clock CLOCK, and the lower bits SHF<31> and SHF<30> of the output SHF are output to the partial product generation units PSG1 and PSG2, respectively. At the falling edge of the internal clock CLOCK, the register receives the least significant bits SO and SO2 from the adder units SA1 and SA2, respectively, and stores them as SHF<0> and SHF<1>. At the end of the multiplication, SHF<0:31> holds the lower 32 bits of the multiplication result, and when ENB = 1 these values are output as part of the result. When ENB = 0, the lower 32 bits of the result (OUT<33:63>) are all fixed at "0".

【００８９】Sum output S<0:31> of the adder unit SA2: these are the sum outputs among the outputs of the full adders FADD and the half adder HADD of the adder unit SA2.

【００９０】Carry output C<0:31> of the adder unit SA2: these are the carry outputs among the outputs of the full adders FADD and the half adder HADD of the adder unit SA2.

【００９１】Output REG<0:63> of the intermediate result register REG64: the intermediate result register REG64 temporarily stores S<0:31> and C<0:31>, that is, the outputs of the adder unit SA2 excluding the partial sum SO2 of its least significant bit.

【００９２】Next, the operation of the arithmetic unit of this embodiment will be described. Because the cumulative addition is performed by the two partial product calculation array stages STAGE1 and STAGE2, the multiplication is executed by repeating the processing of these two stages 16 times.
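
The overall 32 × 32 flow, 16 cycles of two carry-save stages followed by one final addition, can be sketched behaviorally. The assumptions are that sums and carries are modeled as Python integers rather than the REG64 layout, that conventional LSB-is-bit-0 integer arithmetic stands in for the patent's bit numbering, and that the names `csa_stage` and `multiply32` are hypothetical.

```python
MASK32 = (1 << 32) - 1


def csa_stage(pp: int, s: int, c: int) -> tuple[int, int, int]:
    """One carry-save stage. Returns (result_bit, new_s, new_c)."""
    sum_bits = pp ^ s ^ c
    carry_bits = ((pp & s) | (pp & c) | (s & c)) << 1
    return sum_bits & 1, sum_bits >> 1, carry_bits >> 1


def multiply32(a: int, b: int) -> int:
    """64-bit product of two unsigned 32-bit values, 2 multiplier bits per cycle."""
    s = c = 0        # intermediate sums and carries (cleared at initialization)
    low = 0          # result bits collected in the multiplier register
    shf = b & MASK32
    a &= MASK32
    for k in range(16):
        z1, s, c = csa_stage(a if shf & 1 else 0, s, c)   # stage 1, lowest bit
        z2, s, c = csa_stage(a if shf & 2 else 0, s, c)   # stage 2, next bit
        low |= (z1 << (2 * k)) | (z2 << (2 * k + 1))
        shf >>= 2                                         # 2-bit shift per cycle
    high = (s + c) & MASK32    # final adder resolves sums and carries
    return (high << 32) | low
```

The carry-propagating addition appears only once, in the last line before the return, which is the role the text assigns to the final adder unit FSA.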

【００９３】(A) At reset
When, for example, the power is turned on, a reset operation must be performed and values must be set so that correct operations are carried out. This reset operation is performed automatically as needed, using the reset signal CTENB. The operation is described below.

【００９４】What must be set to a correct value by the reset operation is the value held in the counter register CTR. Normally, the value held in the register CTR is CT<0:4> = 10000 while waiting for an operation and CT<0:4> < 10000 during an operation. Therefore, when no operation is being performed, the counter value must be "10000".

【００９５】If CT<0:4> ≠ 10000, then CTENB = 1 and SW = (START ^ READY) | CTENB = 1, so the internal clock CLOCK is forcibly generated regardless of the other signals. The symbol "|" denotes logical sum (OR).

【００９６】Because the internal clock CLOCK is generated in this way, the incrementer CINC operates and counts up. When CT<0:4> = 10000 is reached, CTENB changes to 0, SW becomes 0, and the internal clock CLOCK stops (at this time, START = READY). The counter therefore also stops with CT<0:4> = 10000. At this point a result is output from the final adder unit FSA, but naturally that output has no meaning.
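
The self-reset behavior can be simulated in a few lines. This sketch assumes START = READY throughout the reset (as the text notes) and assumes that the 5-bit counter wraps around modulo 32 when it starts above 16, which the text does not spell out. The function name `reset_counter` is hypothetical.

```python
def reset_counter(ct: int) -> tuple[int, int]:
    """Run the counter from an arbitrary power-on value until the clock stops.

    Returns (final_count, clocks_generated).
    """
    ct &= 0b11111                      # 5-bit counter register
    clocks = 0
    # With START = READY, SW reduces to CTENB, which is 1 unless ct == 10000.
    while ct != 0b10000:
        ct = (ct + 1) & 0b11111        # incrementer; wrap-around is an assumption
        clocks += 1
    return ct, clocks
```

Whatever the power-on value, the loop ends with the counter at 16, so the unit always reaches the idle state before the first real operation.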

【００９７】This completes the reset operation, and the unit waits for an instruction to start a multiplication.

【００９８】(B) During operation
As shown in the timing chart of FIG. 8, the start of a multiplication is signaled by the rise of the multiplication start signal START. While waiting for an operation, START = READY, and CTENB = 0 as a result of the reset operation described above, so SW = (START ^ READY) | CTENB = 0. When the operation start signal START rises, START ≠ READY, so SW = 1 and the internal clock CLOCK is generated.

【００９９】At this time, ENB = CT<0> = 1, so the initialization of the multiplier takes place at the same time as the internal clock CLOCK rises. In this embodiment, initialization always takes place when the internal clock CLOCK rises while ENB = 1. Therefore, when ENB = 0, initialization does not occur even if the internal clock CLOCK rises.

【０１００】(B-1) Initialization operation. First, CT<0:4> = 00000 is output from the counter register CTR. Hence CT<0> ≠ 1, that is, ENB = CT<0> = 0. This means that initialization occurs only once per multiplication.

【０１０１】When the multiplier register BR is initialized, its lower bits B<31> and B<30> are output and supplied to the partial product generation units PSG1 and PSG2 via SHF<31> and SHF<30>. The intermediate result register REG64 outputs "0" from all bits at initialization.

【０１０２】In response to these outputs, the partial product generation units PSG1 and PSG2 and the addition units SA1 and SA2 operate, producing the bitwise sums S<0:31> and SO2 and the carries C<0:31>.

【０１０３】The initialization operation described above is equivalent to completing the first two of the 32 stages of the parallel multiplier (first conventional example).

【０１０４】(B-2) Other operations. Next, the operation at the falling edge of the clock is described.

【０１０５】First, the counter register CTR is incremented, and the value counted up by one is stored.

【０１０６】In the multiplier register BR, the value shifted right by 2 bits is held as SHF<2:31>. At the same time, the least significant sum bits SO and SO2 from the addition units SA1 and SA2, respectively, are loaded into the upper two bits SHF<0:1> of the multiplier register BR.

【０１０７】In the intermediate result register REG64, the sums S<0:31> and carries C<0:31> output from the second-stage addition unit SA2 are stored as the intermediate result.

【０１０８】Next, the operation at the rising edge of the internal clock CLOCK is described.

【０１０９】The counter register CTR supplies CT<0:4> to the incrementer CINC. The multiplier register BR supplies the lower two bits of SHF<0:31>, SHF<31> and SHF<30>, to the partial product generation units PSG1 and PSG2, respectively. The intermediate result register REG64 supplies REG<0:63> to the first-stage addition unit SA1.

【０１１０】On receiving these inputs, the partial product generation units PSG1 and PSG2 and the addition units SA1 and SA2 operate. One such operation is equivalent to two stages of the parallel multiplier, so repeating it 16 times generates the partial products for all 32 bits and accumulates them.
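
The 16-cycle, two-bits-per-cycle iteration can be sketched in Python as follows. This is a simplified behavioural model under stated assumptions, not the circuit of the embodiment: operands are unsigned, the accumulator is kept as an ordinary binary number rather than in carry-save form, bits are numbered with Python's convention (bit 0 is the least significant, whereas the text numbers the most significant bit as 0), and the names `multiply_two_bits_per_cycle`, `shf`, and `acc` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def multiply_two_bits_per_cycle(a: int, b: int) -> tuple[int, int]:
    """Unsigned 32x32-bit multiply that consumes two multiplier bits per cycle.

    Returns (64-bit product, number of cycles). Illustrative model only.
    """
    assert 0 <= a <= MASK32 and 0 <= b <= MASK32
    shf = b   # plays the role of the multiplier register (BR / SHF)
    acc = 0   # plays the role of the intermediate result (plain binary here)
    cycles = 0
    for _ in range(16):
        # Stage 1: add the partial product for the lowest multiplier bit.
        s1 = acc + (a if shf & 1 else 0)
        low0 = s1 & 1          # finished product bit from stage 1
        s1 >>= 1
        # Stage 2: add the partial product for the next multiplier bit.
        s2 = s1 + (a if (shf >> 1) & 1 else 0)
        low1 = s2 & 1          # finished product bit from stage 2
        s2 >>= 1
        # Shift the multiplier right by 2 bits and store the two finished
        # product bits in the two vacated upper positions.
        shf = (shf >> 2) | (low0 << 30) | (low1 << 31)
        acc = s2
        cycles += 1
    # Upper half comes from the accumulator, lower half from the shift register.
    return (acc << 32) | shf, cycles


product, cycles = multiply_two_bits_per_cycle(123456789, 987654321)
print(product == 123456789 * 987654321, cycles)
```

Because the finished low-order bits are shifted into the multiplier register as the multiplier bits are consumed, the register needs no extra storage for the lower half of the product.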

【０１１１】Therefore, when CT<0:4> = 10000 is output from the register CTR, which serves as the counter, that is, when the counter indicates "16", ENB = CT<0> = 1. When ENB = 1, the multiplication end signal READY rises, notifying the outside that the multiplication has finished. Because READY has risen, READY = START, and because CTENB = 0 at this time, SW = 0. The internal clock CLOCK therefore stops.
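
The counting behaviour described here can be illustrated with a short Python sketch. It assumes only what the text states: CT is a 5-bit counter whose bit CT<0> is the most significant bit (since CT<0:4> = 10000 gives CT<0> = 1), and ENB equals CT<0>. The function names are hypothetical, and the reset sequence and clock gating are not modelled.

```python
def enb(ct: int) -> int:
    """ENB = CT<0>, the most significant bit of the 5-bit counter value."""
    return (ct >> 4) & 1


def clocks_until_ready(ct: int = 0) -> tuple[int, int]:
    """Count up from ct until ENB becomes 1; return (final CT, clocks used)."""
    clocks = 0
    while not enb(ct):
        ct = (ct + 1) & 0b11111   # 5-bit incrementer, as CINC would do
        clocks += 1
    return ct, clocks


final_ct, clocks = clocks_until_ready()
print(format(final_ct, "05b"), clocks)
```

Starting from the initialized value 00000, ENB first becomes 1 after 16 increments, which matches the 16 iterations needed for a 32-bit multiplier processed two bits at a time.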

【０１１２】At the same time, because ENB = 1, the final addition unit FSA operates and adds the carries C<0:31> and the bitwise partial sums S<0:31> from the second-stage addition unit SA2. This yields the upper 33 bits OUT<0:32> of the multiplication result. When ENB = 1, the part of the multiplication result that has been held in the multiplier register BR can also be read out; this corresponds to the lower 31 bits OUT<33:63> of the multiplication result.

【０１１３】That is, from the above, the output is OUT<0:63> = {33-bit output of the final addition unit FSA, SHF<0:31>}.
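
The role of the final addition unit, which merges a sum vector and a carry vector into one binary number, can be illustrated with a generic carry-save adder. The sketch below is a textbook 3:2 compressor written in Python under the assumption of unsigned integers of unbounded width; it is not the FSA circuit itself, and the function names are hypothetical.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Compress three operands into a (sum, carry) pair with no carry propagation."""
    sum_bits = x ^ y ^ z                            # per-bit sum without carries
    carry_bits = ((x & y) | (x & z) | (y & z)) << 1  # per-bit carries, moved up one place
    return sum_bits, carry_bits


def accumulate(operands: list[int]) -> int:
    """Accumulate operands in carry-save form, then do one final ordinary addition."""
    s, c = 0, 0
    for op in operands:
        s, c = carry_save_add(s, c, op)
    return s + c   # the single carry-propagating addition at the end


print(accumulate([5, 9, 14, 255]) == 5 + 9 + 14 + 255)
```

Only the last step needs a carry-propagating adder, which is why the sum and carry vectors can be kept separate during the iterations and combined once at the end.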

【０１１４】After the multiplication result has been obtained, the state CT<0:4> = 10000, ENB = 1, CTENB = 0 is maintained, and the values of the multiplier register BR and the intermediate result register REG64 are held unchanged. Because the final addition unit FSA also continues to output, the multiplication result is held until the next multiplication starts.

【０１１５】Thereafter, when the multiplier B and the multiplicand A change and the multiplication start signal START next rises, the next multiplication starts, the above operation is repeated, and the multiplication result is output.

【０１１６】FIG. 9 is an overall configuration diagram of a data processing device to which the arithmetic unit of the fourth embodiment is applied.

【０１１７】The data processing device comprises a microprocessor MPU and an external memory EXTMEM.

【０１１８】The microprocessor MPU comprises: an operation unit EXU that performs operations on data; a bus interface unit BIU that handles the interface with the external memory EXTMEM; a clock buffer CBUF that takes in the external clock EXTCLK and generates the internal clock CLOCK based on the multiplication end signal READY from the multiplication device MPY; and an instruction decode unit IDU that decodes the instruction INST fetched via the bus interface unit BIU and generates the control signals inside the microprocessor MPU.

【０１１９】The operation unit EXU includes the arithmetic unit (multiplication device) MPY, an arithmetic logic unit ALU, and general-purpose registers GR.

【０１２０】In a processor MPU that incorporates the arithmetic unit MPY in this way, the operating clock CLK of the processor MPU differs from the operating clock CLOCK inside the arithmetic unit MPY, so the processor MPU and the arithmetic unit MPY must be synchronized.

【０１２１】In this embodiment, synchronization between the processor MPU and the arithmetic unit MPY is achieved by means of the operation start signal START and the multiplication end signal READY, which indicates that an operation is in progress.

【０１２２】That is, while the multiplication end signal READY indicates that the arithmetic unit MPY is executing, the arithmetic logic unit ALU in the operation unit EXU and the other units in the processor MPU, namely the bus interface unit BIU and the instruction decode unit IDU, are stopped. This simplifies control and allows synchronization without complicated hardware such as a PLL circuit.

【０１２３】[0123]

[Effects of the Invention]

As described above, according to the present invention, the multiplication array is implemented as a carry-save partial product calculation array (partial product generation units and addition units) with a minimum number of stages, and the processing in that array is performed repeatedly. This shortens the processing time in the partial product calculation array and greatly reduces the circuit scale, so that a low-cost arithmetic unit can be provided.

【０１２４】Also, according to the present invention, the multiplication array is implemented as a p-stage carry-save partial product calculation array (p stages of partial product generation units and addition units), and the multiplication is executed in (number of bits of the multiplicand)/p repeated cycles. The number of stages p of the partial product calculation array can therefore be chosen according to the required processing speed and the permissible amount of hardware, allowing a trade-off between the two. As a result, a low-cost arithmetic unit can be provided in which the time required for one cycle is shortened and the circuit scale is reduced without greatly increasing the overall processing time.
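
The trade-off can be made concrete with a small Python sketch that lists, for a 32-bit operand, the number of repeated cycles for several stage counts p. The function name `iteration_cycles` is hypothetical, and rounding up when p does not divide the operand width is an assumption of this sketch; the text itself only gives the ratio (number of bits)/p.

```python
def iteration_cycles(n_bits: int, p: int) -> int:
    """Number of repeated cycles when p bits are processed per cycle.

    Assumption: rounds up when p does not divide n_bits.
    """
    if not 1 <= p <= n_bits:
        raise ValueError("p must satisfy 1 <= p <= n_bits")
    return -(-n_bits // p)   # ceiling division


# More stages per cycle means more hardware but fewer cycles.
for p in (1, 2, 4, 8, 16, 32):
    print(f"p = {p:2d} stages -> {iteration_cycles(32, p):2d} cycles")
```

With p = 2, as in the fourth embodiment, a 32-bit multiplication takes 16 cycles; p = 32 corresponds to a fully parallel array that finishes in a single pass.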

【０１２５】Also, according to the present invention, the control circuit causes the clock generation circuit to generate a predetermined number of clocks based on an operation start signal supplied from outside the arithmetic unit. No dedicated high-speed clock generation means is therefore needed in the periphery, and high-speed arithmetic processing can be provided at low cost.

【０１２６】In particular, by giving the clock generation circuit the same configuration as the multiplier register, the partial product generation unit, and the addition unit (built from devices of the same size), the multiplication array and the clock generation circuit can be given identical delay characteristics. Stable arithmetic operation without malfunction can therefore be guaranteed even when manufacturing process conditions or the ambient environment vary.

【０１２７】Furthermore, the control circuit outputs to the outside a signal indicating that the arithmetic unit is executing an operation, from the time it receives the operation start signal until the operation is completed. Synchronization with a processor that incorporates the arithmetic unit, or with a processor connected to the multiplication device, can therefore be achieved without complicated hardware.

[Brief description of drawings]

【図１】FIG. 1 is a configuration diagram of an arithmetic unit according to a first embodiment of the present invention.

【図２】FIG. 2(a) is a truth table of the outputs Z and CO for the inputs A, B, and C of the full adder FADD, and FIG. 2(b) is a truth table of the outputs Z and CO for the inputs B and C of the half adder HADD.

【図３】FIG. 3 is a configuration diagram of an arithmetic unit according to a second embodiment of the present invention.

【図４】FIG. 4 is a configuration diagram of an arithmetic unit according to a third embodiment of the present invention.

【図５】FIG. 5 is a circuit configuration diagram of the clock generator in the third embodiment.

【図６】FIG. 6 is a configuration diagram of an arithmetic unit according to a fourth embodiment of the present invention.

【図７】FIG. 7 is a detailed circuit configuration diagram of the second-stage partial product calculation array in the fourth embodiment.

【図８】FIG. 8 is a timing chart illustrating the operation of the arithmetic unit of the fourth embodiment.

【図９】FIG. 9 is an overall configuration diagram of a data processing device to which the arithmetic unit of the fourth embodiment is applied.

【図１０】FIG. 10 is a configuration diagram of a conventional multiplier (first conventional example).

【図１１】FIG. 11 is a configuration diagram of a multiplication device according to a second conventional example.

[Explanation of symbols]

A (A<0:7>): multiplicand data
B (B<0:7>): multiplier data
SR (SR<0>-SR<7>): multiplier data
OUT (OUT<0>-OUT<7>): lower 8 bits of the multiplication result
SR1: multiplier register
REG16: intermediate result register
AN2: AND gate
FADD: full adder
HADD: half adder
SR2: multiplier register
STAGE1: first-stage partial product calculation array
STAGE2: second-stage partial product calculation array
MUL: multiplier
MUL1: multiplier shown in the first embodiment
MUL2: multiplier shown in the second embodiment
CLKGEN: internal clock generation circuit
CNT: control circuit
SW: clock control signal
AN2Y: AND gate
CB1, CB2: clock buffers
SRX: dummy register
AN2X: dummy gate
FADDX: dummy full adder
A (A<0:31>): multiplicand data
B (B<0:31>): multiplier data
OUT (OUT<0:63>): multiplication result
BR: multiplier register
REG64: intermediate result register
FSA: final addition unit
BIS: input selector of the multiplier register
BOS: output selector of the multiplier register
RIS: input selector of the intermediate result register
ROS: output selector of the intermediate result register
PSG1, PSG2: partial product generation units
SA1, SA2: addition units
S'<0:31>, C'<0:31>: result of the first-stage partial product calculation array
S<0:31>, C<0:31>: result of the second-stage partial product calculation array
REG<0:63>: output of the intermediate result register
SHF<0:31>: output of the multiplier register
CNT: control circuit
CLKGEN: internal clock generation circuit
START: multiplication start signal
READY: multiplication end signal
CLOCK: internal clock
CINC: incrementer
STS: counter input selector
CTR: counter register
DEC: decoder
CT<0:4>: counter output
ENB: enable signal
CTENB: reset signal
XO1: exclusive-OR gate
OR1: OR gate
DLY: delay
CI1, CI2: clocked inverters
I1: inverter
MPU: microprocessor
EXTMEM: external memory
EXU: operation unit
BIU: bus interface unit
CBUF: clock buffer
IDU: instruction decode unit
MPY: arithmetic unit (multiplication device)
ALU: arithmetic logic unit
GR: general-purpose registers
BUS: internal bus
EXTCLK: external clock
INST: instruction
X (X<0>-X<7>): multiplicand data
Y (Y<0>-Y<7>): multiplier data
OUT (OUT<0>-OUT<15>): multiplication result
XR: multiplicand register
YR: multiplier register
CLAADDER: carry look-ahead adder
X (X<0:31>): multiplicand data
Y (Y<0:31>): multiplier data
ZR: register holding the cumulative addition result during multiplication
Z<0:31>: output of the register ZR
ADDER: 32-bit adder
ADD<0:31>: output of the adder ADDER
SELR: selector
SEL<0:31>: output of the selector SELR

Claims

[Claims]

1. An arithmetic unit comprising: a multiplier register that holds an n-bit multiplier (n is an arbitrary positive integer) and shifts by 1 bit in the direction from the most significant bit to the least significant bit in synchronization with a clock; a partial product generation unit that generates a partial product of the shift output of the multiplier register and an m-bit multiplicand (m is an arbitrary positive integer); an addition unit that adds the output of the partial product generation unit and the output of an intermediate result register described below; the intermediate result register, which holds the output of the addition unit; and an output register that holds the multiplication result of the multiplier and the multiplicand, wherein, during one cycle of the clock, the following are performed in sequence: generation of the partial product of the shift output of the multiplier register and the multiplicand; addition of the output of the partial product generation unit and the output of the intermediate result register; holding of the output of the addition unit in the intermediate result register; writing of one of the output bits of the addition unit to the most significant bit of the output register; and shifting of the output register by 1 bit in the direction from the most significant bit to the least significant bit.

2. An arithmetic unit comprising: a multiplier register that holds an n-bit multiplier (n is an arbitrary positive integer) and shifts by p bits (p is a positive integer not greater than n) in the direction from the most significant bit to the least significant bit in synchronization with a clock; a first-stage partial product generation unit that generates a partial product of 1 bit of the shift output of the multiplier register and an m-bit multiplicand (m is an arbitrary positive integer); a first-stage addition unit that adds the output of the first-stage partial product generation unit and the output of an intermediate result register described below; an i-th stage partial product generation unit that generates a partial product of the i-th bit (i = 2 to p) from the least significant bit of the multiplier register and the multiplicand; an i-th stage addition unit that adds the output of the i-th stage partial product generation unit and the output of the (i-1)-th stage partial product generation unit; the intermediate result register, which holds the output of the p-th stage addition unit; and an output register that holds the multiplication result of the multiplier and the multiplicand, wherein, during one cycle of the clock, the following are performed in sequence: generation of the partial product of 1 bit of the shift output of the multiplier register and the multiplicand; addition of the output of the first-stage partial product generation unit and the output of the intermediate result register; the operations of the partial product generation units and addition units of the second to p-th stages; holding of the output of the p-th stage addition unit in the intermediate result register; writing of the p bits formed by the lowest-order output bits of the first- to p-th-stage addition units into the p most significant bits of the output register; and shifting of the output register by p bits in the direction from the most significant bit to the least significant bit.

3. The arithmetic unit according to claim 1, wherein the multiplier register and the output register are implemented as a single shared register.

4. The arithmetic unit according to claim 1, 2 or 3, further comprising: a clock generation circuit that generates the clock; and a control circuit that causes the clock generation circuit to generate a predetermined number of clocks based on an operation start signal supplied from outside the arithmetic unit.

5. The arithmetic unit according to claim 4, wherein the clock generation circuit has the same configuration as a part of the multiplier register, a part of the addition unit, and a part of the partial product generation unit, or as a part of the multiplier register, a part of the first- to p-th-stage addition units, and a part of the first- to p-th-stage partial product generation units.

6. The arithmetic unit according to claim 4 or 5, wherein the control circuit outputs to the outside a signal indicating that the arithmetic unit is executing an operation, from when the operation start signal is received until the operation is completed.