JPH04239909A

JPH04239909A - Method and device for arithmetic processing

Info

Publication number: JPH04239909A
Application number: JP3006655A
Authority: JP
Inventors: Hideyuki Kabuo; 蕪尾　英之; Takashi Taniguchi; 隆志谷口
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1991-01-24
Filing date: 1991-01-24
Publication date: 1992-08-27

Abstract

PURPOSE: To execute a multiplication instruction, or an instruction realized by a convergent operation, while another instruction realized by a convergent operation is being executed in an arithmetic processing device that performs pipeline processing. CONSTITUTION: The device is provided with a pipeline multiplier 3 with a three-stage configuration consisting of a low-order multiplying part 4, a high-order multiplying part 5, and a binary conversion part 6; a temporary register 7 in which data is temporarily held; and a control pipeline that has plural pipelines in the execution stage and the following stages. During execution of a first instruction processed by plural operation blocks, the period in which the operation block required by a second instruction (an instruction processed by at least one of those operation blocks) is unused is detected, and the second instruction is executed in parallel with the first instruction.

Description

[Detailed description of the invention]

[0001]

[Industrial Application Field] The present invention relates to an arithmetic processing device that performs pipeline processing. In particular, it relates to an arithmetic processing method and an arithmetic processing device for executing, in parallel, an instruction realized by a convergent operation and either a multiplication instruction or another instruction realized by a convergent operation, and to an arithmetic processing device for executing convergent operations at high speed.

[0002]

[Prior Art] FIG. 6 shows a block diagram of the data path section of a conventional arithmetic processing device. In FIG. 6, 100 is a register that supplies operands, 101 is a data ROM that stores the initial values needed for division, 102 is a multiplier that performs double-precision multiplication, 103 is a binary converter, and 104 is a temporary register for temporarily holding operation results.

[0003] FIG. 7 shows a timing chart for a conventional arithmetic processing device in which a division instruction is implemented as a convergent operation. The sequence of each arithmetic unit during execution of a division instruction is explained with reference to FIGS. 6 and 7. Division in floating-point arithmetic can be treated separately for the exponent and the mantissa; the exponent is obtained as the difference between the exponent of the dividend X and the exponent of the divisor Y. Therefore, only the mantissa operations are described below. One algorithm for mantissa division is the Newton-Raphson method, whose recurrence formula for finding the reciprocal of the divisor is given by equation (1).

[0004]

$$R_i = R_{i-1} \times (2 - R_{i-1} \times Y_m) \qquad (1)$$

Here, Ym is the mantissa of the divisor, Ri is the i-th approximation of the reciprocal of Ym, and the initial value R0 is supplied from the ROM. The product of the final approximation Rn of the reciprocal of Ym obtained from equation (1) and the mantissa Xm of the dividend is taken as the quotient. When the initial value R0 is given with a precision of 2^-9, equation (1) must be evaluated up to R3 to meet double-precision floating-point precision. The recurrence in equation (1) is realized by two multiplications, $R_{i-1} \times Y_m$ and $R_{i-1} \times (2 - R_{i-1} \times Y_m)$. In the data path section of the arithmetic processing device shown in FIG. 6, a double-precision floating-point multiplication is realized by one cycle (two clocks) of multiplication and one cycle (two clocks) of binary conversion.
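The requirement to iterate up to R3 follows from the quadratic convergence of equation (1): the relative error 1 - Ri×Ym is squared at every step, so an initial error of 2^-9 becomes 2^-18, 2^-36, and 2^-72, and only the last of these is below the 2^-53 needed for a double-precision mantissa. The following is a minimal sketch of that behavior using exact rational arithmetic. The function name, the sample divisor 3/2, and the way R0 is constructed are illustrative assumptions, not part of the patent.

```python
from fractions import Fraction


def newton_relative_errors(ym, r0, iterations=3):
    """Iterate R_i = R_{i-1} * (2 - R_{i-1} * Ym) and return |1 - R_i * Ym| for each i."""
    errors = []
    r = r0
    for _ in range(iterations + 1):
        errors.append(abs(1 - r * ym))
        r = r * (2 - r * ym)  # equation (1)
    return errors


# Hypothetical inputs: divisor mantissa 3/2, and an R0 whose relative error is exactly 2^-9
# (standing in for the ROM lookup).
ym = Fraction(3, 2)
r0 = (1 - Fraction(1, 2**9)) / ym

errors = newton_relative_errors(ym, r0)
for i, err in enumerate(errors):
    print(f"R{i}: relative error = 1/2^{err.denominator.bit_length() - 1}")
```

With these inputs the error exponents are 9, 18, 36, and 72, which is why R2 is not yet sufficient and R3 is.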

[0005] In FIGS. 6 and 7, operand Ym is read from register 100 in the L stage. In the first cycle of the E stage, operand Ym is used to access ROM 101 and read the initial value R0, and is also written to temporary register 104. In the second cycle of the E stage, R0×Ym is computed. In the third cycle of the E stage, the previous cycle's result R0×Ym is input directly to the multiplier side of multiplier 102 without passing through binary converter 103, is recoded as the value 2 - R0×Ym, and is multiplied by R0; at the same time, operand Ym is read from temporary register 104. In the fourth cycle of the E stage, the previous cycle's result R1 is sent to binary converter 103 and converted, and R1 is also input directly to the multiplier side of multiplier 102 to compute R1×Ym. In the fifth cycle of the E stage, the previous cycle's result R1×Ym is input directly to the multiplier side of multiplier 102 without passing through binary converter 103, is recoded as the value 2 - R1×Ym, and is multiplied by the output R1 of binary converter 103. In the sixth cycle of the E stage, the previous cycle's result R2 is sent to binary converter 103 and converted, and R2 is also input directly to the multiplier side of multiplier 102 to compute R2×Ym. In the seventh cycle of the E stage, the previous cycle's result R2×Ym is input directly to the multiplier side of multiplier 102 without passing through binary converter 103, is recoded as the value 2 - R2×Ym, and the multiplication R2×(2 - R2×Ym) with the output R2 of binary converter 103 is executed. In the eighth cycle of the E stage, the result R3 is input directly to the multiplier side of multiplier 102 and Xm×R3 is computed. In the ninth cycle of the E stage, the division result Xm×R3 is converted to binary and rounded, and in the S stage the result is written to register 100. As described above, a double-precision division is executed in 9 cycles (18 clocks).
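The nine E-stage cycles described above can be tallied with a short sketch: one ROM access, two multiplications per Newton-Raphson iteration for three iterations, one final multiplication by Xm, and one conversion and rounding cycle. The function name and step labels are illustrative assumptions; the two-clocks-per-cycle figure is taken from the text.

```python
def conventional_e_stage_steps(iterations=3):
    """List the E-stage cycles of the conventional division sequence, one entry per cycle."""
    steps = ["read R0 from ROM 101; save Ym in temporary register 104"]
    for i in range(iterations):
        steps.append(f"R{i} x Ym")
        steps.append(f"R{i} x (2 - R{i} x Ym) -> R{i + 1}")
    steps.append(f"Xm x R{iterations}")
    steps.append("binary conversion and rounding of the quotient")
    return steps


CLOCKS_PER_CYCLE = 2  # conventional device: each cycle takes two clocks

steps = conventional_e_stage_steps()
for cycle, step in enumerate(steps, start=1):
    print(f"E stage cycle {cycle}: {step}")
print("total clocks:", len(steps) * CLOCKS_PER_CYCLE)
```

Every multiplication in this sequence depends on the result of the one before it, so the multiplier cannot be shared with another instruction without extra control.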

[0006] FIG. 8 shows the structure of the control pipeline of a conventional arithmetic processing device. In FIG. 8, 111 is an I stage for decoding instructions, 112 is an L stage for sending the operands needed for an operation from the registers to the arithmetic unit, 113 is an E stage for executing the operation, 114 is an S stage for writing the operation result to a register, and 115 is a state machine that indicates the state of the arithmetic unit.

[0007] FIG. 9 is a timing chart showing how the pipeline advances for each instruction in a conventional arithmetic processing device. In FIG. 9, 120 to 123 are the periods in which a first division instruction is in the I stage, L stage, E stage, and S stage, respectively; 124 to 127 are the periods in which a second multiplication instruction is in the I stage, L stage, E stage, and S stage, respectively; and 128 is the I-stage wait signal.

[0008] The operation of the conventional arithmetic processing device configured as above, when executing the instruction sequence shown in FIG. 9, is described below. In FIGS. 8 and 9, when the first division instruction fdivd advances to L stage 121, state machine 115 sets wait signal 128 for I stage 111 High and keeps the second multiplication instruction fmuld locked until wait signal 128 goes Low. Next, the first division instruction fdivd advances to E stage 122 and executes the convergent division shown in FIG. 7 using multiplier 102 and binary converter 103. State machine 115 sets wait signal 128 for I stage 111 Low two cycles before S stage 123 of the first division instruction fdivd, and sends the second multiplication instruction fmuld to L stage 125 in the final cycle of E stage 122 of the first division instruction fdivd. The first division instruction fdivd enters S stage 123 and writes its result to register 100; at the same time, the second multiplication instruction fmuld advances to E stage 126 and begins executing the multiplication. Next, the second multiplication instruction fmuld enters S stage 127 and writes its result to register 100.

[0009]

[Problem to be Solved by the Invention] However, in the above configuration, while the division instruction is in the E stage, subsequent instructions remain pipeline-locked, and the next instruction is not executed until the division instruction reaches the S stage. In other words, no other instruction can be executed while an instruction realized by a convergent operation is executing, so the performance of the arithmetic processing device itself is degraded.

[0010] Furthermore, in implementing convergent operations, operations were performed at the same bit width regardless of how high or low the operation precision was, so the execution speed could not be increased.

[0011] In view of these points, an object of the present invention is to provide an arithmetic processing method and an arithmetic processing device that improve performance by allowing other multiplication instructions, or other instructions realized by convergent operations, to be executed even while an instruction realized by a convergent operation is executing, and further to provide an arithmetic processing device that executes convergent operations at high speed.

[0012]

[Means for Solving the Problems] To solve the above problems, in the arithmetic processing method of the present invention, when a second instruction that is processed by at least one of a plurality of operation blocks is fetched during execution of a first instruction that is processed by those operation blocks, an unused period of the operation block required by the second instruction is detected during execution of the first instruction, and the second instruction is executed in parallel with the first instruction.

[0013] The arithmetic processing device of the present invention comprises a pipeline multiplier with a three-stage configuration consisting of a high-order multiplication section, a low-order multiplication section, and a binary conversion section; a register for supplying operands; a temporary register for temporarily holding data; and a control pipeline that has a plurality of pipeline structures from the execution stage onward. This allows other multiplication instructions, or other instructions realized as convergent operations, to be executed even while an instruction realized as a convergent operation is executing.

[0014] The arithmetic processing device of the present invention also comprises the pipeline multiplier, the temporary register, and a control section that controls the sequencing of the multiplication sections and the binary conversion section of the pipeline multiplier. When an instruction realized as a convergent operation is executed, the control section performs the operation using either the high-order or the low-order multiplication section together with the binary conversion section while the operation precision is low, and using both the high-order and low-order multiplication sections together with the binary conversion section once the operation precision exceeds the input bit width of either the high-order or the low-order multiplication section. This allows convergent operations to be executed at high speed.

[0015]

[Operation] With the configuration described above, when a second instruction, which is a multiplication instruction or an instruction realized by a convergent operation, is fetched while a first instruction realized by a convergent operation is executing, the present invention detects the unused periods of each stage of the pipeline multiplier that arise from the repeated evaluation of the recurrence formula in the convergent operation. If the second instruction is a multiplication instruction, it is supplied to a second execution pipeline separate from the first execution pipeline in which the first instruction resides, so that the second instruction executes in parallel with the first instruction. If the second instruction is an instruction realized by a convergent operation, it is supplied to the first execution pipeline in step with the progress of the first instruction through its execution stages, so that the second instruction executes while the first instruction is executing. Instructions fetched during execution of the first instruction continue to be supplied to the first or second execution pipeline for as long as unused periods of the pipeline multiplier stages are detected, and their results are stored temporarily in the temporary register. When execution of the first instruction finishes, the results written to the temporary register are moved to the register at the same time as, or following, the writing of the first instruction's result to the register. As a result, the next operation instruction can be executed without keeping the control pipeline locked throughout the execution of an instruction realized by a convergent operation.
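The result handling described above (later instructions park their results in the temporary register, and those results reach the register only when the first instruction writes its own result, or afterwards) can be sketched as a small in-order write-back queue. This is an illustrative model only: the class name, method names, and register names such as f1 are hypothetical, and the sketch does not model cycle timing or the number of write ports.

```python
from collections import deque


class WriteBackQueue:
    """Toy model: results of later instructions wait until the first instruction retires."""

    def __init__(self):
        self.parked = deque()    # results held in the temporary register, in program order
        self.register_file = {}  # architectural registers
        self.write_order = []    # order in which destinations were actually written

    def park(self, dest, value):
        """A later instruction finishes early; hold its result instead of writing it."""
        self.parked.append((dest, value))

    def retire_first(self, dest, value):
        """The first (convergent) instruction writes back, then parked results follow in order."""
        self._write(dest, value)
        while self.parked:
            self._write(*self.parked.popleft())

    def _write(self, dest, value):
        self.register_file[dest] = value
        self.write_order.append(dest)


queue = WriteBackQueue()
queue.park("f2", 6.0)          # hypothetical multiply finished during the division
queue.park("f3", 1.5)          # a second overlapped multiply
queue.retire_first("f1", 0.5)  # the division writes back first
print(queue.write_order)
```

Writing the parked results only after the older instruction keeps register updates in program order even though the multiplications finished earlier.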

[0016] As for the implementation of convergent operations in the present invention, with the configuration described above, the operation is performed using either the high-order or the low-order multiplication section together with the binary conversion section while the operation precision is low, and using both the high-order and low-order multiplication sections together with the binary conversion section once the operation precision exceeds the input bit width of either the high-order or the low-order multiplication section. As a result, convergent operations can be executed faster by the number of cycles that can be saved while the operation precision is low.

[0017]

[Embodiments] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows a block diagram of the data path section of the arithmetic processing device in a first embodiment of the present invention. In FIG. 1, 1 is a register that supplies operands, 2 is a data ROM that stores the initial values needed for division, and 3 is a pipeline multiplier that performs double-precision multiplication; 4 to 6 are, respectively, the low-order multiplication section, the high-order multiplication section, and the binary conversion section of pipeline multiplier 3. 7 is a temporary register for temporarily holding operation results.

[0018] FIG. 2 shows a timing chart for the arithmetic processing device of the first embodiment when a division instruction is implemented as a convergent operation using a sequence in which the operation is performed with the high-order multiplication section and the binary conversion section while the operation precision is low, and with both the high-order and low-order multiplication sections and the binary conversion section once the operation precision exceeds the input bit width of the high-order multiplication section.

[0019] The sequence of each arithmetic unit of the arithmetic processing device configured as above, during execution of the double-precision division instruction shown in FIG. 2, is now described. In the data path section of the arithmetic processing device shown in FIG. 1, a double-precision floating-point multiplication is realized by two cycles (two clocks) of multiplication and one cycle (one clock) of binary conversion. In FIG. 2, operand Ym is read from register 1 in the L stage. In the first cycle of the E1 stage, operand Ym is used to access ROM 2 and read the initial value R0, and is also written to temporary register 7. In the second cycle of the E1 stage, R0×Ym is computed. Because R0 has low precision, the multiplier input does not need the full double-precision width. Therefore, with R0 as the multiplier-side (recoder-side) input and Ym as the multiplicand-side input, the operation can be executed using only high-order multiplication section 5. In the third cycle of the E1 stage, the previous cycle's result R0×Ym is input directly to the multiplier side of multiplier 3 without passing through binary converter 6, is recoded as the value 2 - R0×Ym, and is multiplied by R0; at the same time, operand Ym is read from temporary register 7. Because R0×Ym also has low precision, as in the previous cycle, the operation is executed using only high-order multiplication section 5. In the fourth cycle of the E1 stage, the previous cycle's result R1 is sent to binary converter 6 and converted, and R1 is also input directly to the multiplier side of multiplier 3 to compute R1×Ym. Because R1 also has low precision, as in the previous cycle, the operation is executed using only high-order multiplication section 5. In the fifth cycle of the E1 stage, the previous cycle's result R1×Ym is input directly to the multiplier side of multiplier 3 without passing through binary converter 6, is recoded as the value 2 - R1×Ym, and is multiplied by the output R1 of binary converter 6. At this point, the precision of the previous cycle's result R1×Ym exceeds the multiplier-side input width of high-order multiplication section 5, so the operation is first executed in low-order multiplication section 4 and then, in the next cycle, in high-order multiplication section 5.

[0020] The subsequent multiplications are performed using both the high-order multiplication section 5 and the low-order multiplication section 4 in order to raise the operation precision. In the first cycle of the E2d stage, the high-order part of the multiplication R1×(2 - R1×Ym) is executed and operand Ym is read from temporary register 7. In the second cycle of the E2d stage, the previous cycle's result R2 is sent to binary converter 6 and converted, and R2 is also input directly to the multiplier side of multiplier 3 to perform the low-order part of the multiplication R2×Ym. In the third cycle of the E2d stage, the high-order part of the multiplication R2×Ym is performed and, at the same time, R2, which was converted to binary in the previous cycle, is written to temporary register 7. In the fourth cycle of the E2d stage, the previous cycle's result R2×Ym is input directly to the multiplier side of multiplier 3 without passing through binary converter 6 and is recoded as the value 2 - R2×Ym, R2 is read from temporary register 7, and the low-order part of the multiplication R2×(2 - R2×Ym) is performed. In the fifth cycle of the E2d stage, the high-order part of the multiplication R2×(2 - R2×Ym) is performed and, at the same time, operand Xm is read from register 1. In the sixth cycle of the E2d stage, the result R3 is input directly to the multiplier side of multiplier 3 and the low-order part of the multiplication Xm×R3 is performed. In the seventh cycle of the E2d stage, the high-order part of the multiplication Xm×R3 is performed. In the E3d stage, the division result Xm×R3 is converted to binary and rounded, and in the S stage the result is written to register 1.

[0021] As described above, according to this embodiment, providing a pipeline multiplier with a three-stage configuration consisting of a high-order multiplication section, a low-order multiplication section, and a binary conversion section, together with a temporary register that temporarily holds data, makes it possible, within the sequence of a division instruction, to perform the operation using the high-order multiplication section and the binary conversion section while the operation precision is low, and using both the high-order and low-order multiplication sections and the binary conversion section once the operation precision exceeds the input bit width of the high-order multiplication section. A double-precision division can thus be executed in 13 cycles (13 clocks).
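The 13-cycle figure can be reproduced with a rough cycle-count model: one cycle for the ROM lookup, one cycle for each multiplication that fits in a single multiplication section, two cycles for each multiplication that needs both sections, and one cycle for the final conversion and rounding. The sketch below makes several assumptions that are not stated in the patent: the number of significant bits on the multiplier side is taken to double with each step (9, 18, 36, capped at 53), and the input width of one multiplication section is set to a hypothetical 27 bits. Any width from 18 to 35 bits gives the same split as the embodiment.

```python
def division_cycles(initial_bits=9, half_width=27, target_bits=53, iterations=3):
    """Estimate E-stage cycles for the precision-dependent division sequence.

    half_width is a hypothetical input width of one multiplication section.
    """
    operand_bits = []  # significant bits of the multiplier-side input of each multiplication
    bits = initial_bits
    for _ in range(iterations):
        operand_bits.append(bits)            # R_{i-1} x Ym
        bits = min(2 * bits, target_bits)
        operand_bits.append(bits)            # (2 - R_{i-1} x Ym) x R_{i-1}
    operand_bits.append(bits)                # Xm x R3

    cycles = 1                               # ROM lookup of R0
    for b in operand_bits:
        cycles += 1 if b <= half_width else 2
    cycles += 1                              # binary conversion and rounding (E3d)
    return cycles


print(division_cycles())               # precision-dependent sequence of this embodiment
print(division_cycles(half_width=0))   # the same model with every multiplication at full width
```

Under these assumptions the first three multiplications take one cycle each and the remaining four take two cycles each, giving 1 + 3 + 8 + 1 = 13 cycles, which agrees with the figure above. The second call is only what this simplified model yields when both sections are always used; it is not a figure from the patent.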

[0022] In addition, low-order multiplication section 4 is unused in the first cycle of the E2d stage, high-order multiplication section 5 is unused in the second cycle of the E2d stage, and binary converter 6 and low-order multiplication section 4 are unused in the third cycle of the E2d stage. Therefore, if operands are read in the fifth cycle of the E1 stage, a multiplication can be executed in these unused periods of multiplication sections 4 to 5 and binary conversion section 6, and its result can be written to temporary register 7. Similarly, if operands are read in the second cycle of the E2d stage, a multiplication can be executed in the unused periods of multiplication sections 4 to 6 and binary conversion section 6 in the third to fifth cycles of the E2d stage, and if operands are read in the fourth cycle of the E2d stage, a multiplication can be executed in the unused periods of multiplication sections 4 to 5 and binary conversion section 6 in the fifth to seventh cycles of the E2d stage.
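Detecting these unused periods amounts to checking a reservation table: for each E2d cycle, record which units the division occupies, then look for start cycles at which a multiplication can pass through the low-order section, the high-order section, and the binary converter in three consecutive cycles without a conflict. The sketch below is one possible reading of paragraph [0020]; the busy table, the constant names, and the function name are illustrative assumptions rather than the patent's control logic.

```python
LOW = "low-order multiplication section 4"
HIGH = "high-order multiplication section 5"
CONV = "binary converter 6"

# Units occupied by the division in E2d cycles 1 to 7, as read from paragraph [0020].
DIVISION_BUSY = {
    1: {HIGH},       # high-order part of R1 x (2 - R1 x Ym)
    2: {CONV, LOW},  # convert R2; low-order part of R2 x Ym
    3: {HIGH},       # high-order part of R2 x Ym
    4: {LOW},        # low-order part of R2 x (2 - R2 x Ym)
    5: {HIGH},       # high-order part of R2 x (2 - R2 x Ym)
    6: {LOW},        # low-order part of Xm x R3
    7: {HIGH},       # high-order part of Xm x R3
}

# A multiplication uses one unit per cycle, in this order.
MULTIPLY_PATTERN = [LOW, HIGH, CONV]


def free_start_cycles(busy, pattern):
    """Return the start cycles at which `pattern` fits entirely into unused unit slots."""
    last_cycle = max(busy)
    starts = []
    for start in range(1, last_cycle - len(pattern) + 2):
        if all(unit not in busy[start + k] for k, unit in enumerate(pattern)):
            starts.append(start)
    return starts


print(free_start_cycles(DIVISION_BUSY, MULTIPLY_PATTERN))
```

With this table a multiplication can enter the multiplier in E2d cycles 1, 3, and 5, which corresponds to reading operands one cycle earlier, in the fifth cycle of the E1 stage and the second and fourth cycles of the E2d stage.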

[0023] Although the instruction realized by a convergent operation is a division instruction in this embodiment, the same effect is obtained for other convergent operations such as square-root instructions and elementary functions. Also, although the high-order multiplication section was used for multiplications at low operation precision, the same effect is obtained by using the low-order multiplication section.

[0024] FIG. 3 shows the structure of the control pipeline of the arithmetic processing device in a second embodiment of the present invention. In FIG. 3, 11 is an I stage for decoding instructions; 12 is an L stage for sending the operands needed for an operation from the registers to the arithmetic unit; 13 is an E1 stage for executing operations; 14 and 15 are an E2d stage and an E3d stage, respectively, for executing division and square-root operations; 16 is an S stage for writing operation results to the registers; 17 and 18 are an E2m stage and an E3m stage, respectively, for executing multiplication; 19 is an SQue for temporarily writing an operation result to the temporary register and later writing it to the registers; and 20 is a state machine that indicates the state of the arithmetic unit.

[0025] FIG. 4 is a timing chart showing how the pipeline advances for each instruction in the arithmetic processing device of the second embodiment of the present invention. In FIG. 4, 31 to 36 are the periods in which a first division instruction is in the I, L, E1, E2d, E3d, and S stages, respectively; 37 to 43 are the periods in which a second multiplication instruction is in the I, L, E1, E2m, E3m, Sq0, and S stages, respectively; 44 to 49 are the periods in which a third division instruction is in the I, L, E1, E2d, E3d, and S stages, respectively; 50 to 56 are the periods in which a fourth multiplication instruction is in the I, L, E1, E2m, E3m, Sq0, and S stages, respectively; 57 to 63 are the periods in which a fifth multiplication instruction is in the I, L, E1, E2m, E3m, Sq0, and S stages, respectively; 64 to 66 are the periods in which sixth to eighth multiplication instructions are in the S stage; 67 to 72 are the periods in which a ninth multiplication instruction is in the I, L, E1, E2m, E3m, and S stages, respectively; and 73 is the I-stage wait signal.
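The stage lists above show two execution paths after the E1 stage: division instructions pass through E2d and E3d, while multiplication instructions pass through E2m and E3m and, in most of the listed cases, wait in an Sq0 stage before the S stage. The sketch below captures only that routing. The function name and its parameters are hypothetical, and the condition under which a multiplication skips Sq0 (here, no older result still pending, as for the ninth multiplication) is an assumption inferred from the stage lists rather than something the text states.

```python
DIVISION_PATH = ["I", "L", "E1", "E2d", "E3d", "S"]
MULTIPLY_PATH_QUEUED = ["I", "L", "E1", "E2m", "E3m", "Sq0", "S"]
MULTIPLY_PATH_DIRECT = ["I", "L", "E1", "E2m", "E3m", "S"]


def stage_path(kind, older_result_pending=False):
    """Return the control-pipeline stages an instruction passes through.

    kind: "div" for division or square root, "mul" for multiplication.
    older_result_pending: assumed trigger for parking the result in Sq0.
    """
    if kind == "div":
        return DIVISION_PATH
    if kind == "mul":
        return MULTIPLY_PATH_QUEUED if older_result_pending else MULTIPLY_PATH_DIRECT
    raise ValueError(f"unknown instruction kind: {kind!r}")


print(stage_path("div"))
print(stage_path("mul", older_result_pending=True))
print(stage_path("mul"))
```

Separating the paths after E1 is what lets a multiplication advance while a division occupies the E2d stage for several cycles.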

[0026] FIG. 5 shows a timing chart for the arithmetic processing device of this embodiment when a double-precision division instruction is implemented as a convergent operation using a sequence in which the operation is performed with both the high-order and low-order multiplication sections and the binary conversion section regardless of the operation precision. If a second division instruction is fetched while the first division instruction is executing the sequence shown in FIG. 2, executing the second division instruction with the sequence shown in FIG. 5 allows it to run in the unused periods of the multiplication sections and the binary conversion section from the E2d stage of FIG. 2 onward.

[0027] The operation of the arithmetic processing device according to the second embodiment of the present invention, configured as described above, is explained below for the instruction sequence shown in FIG. 4. In FIGS. 3 and 4, when the first division instruction fdivd advances to the L stage 32, the state machine 20 sets the wait signal 73 for the I stage 11 to High and keeps the second multiplication instruction fmuld locked until the wait signal 73 goes Low. Next, the first division instruction fdivd advances to the E1 stage 33 and executes the operation sequence shown in FIG. 2. In the fourth cycle of the E1 stage 33 of the first division instruction fdivd, the state machine 20 sets the wait signal 73 for the I stage 11 to Low; in the next cycle it sends the second multiplication instruction fmuld to the L stage 38 and at the same time sets the wait signal 73 to High for the I stage 11 of the third division instruction fdivd. Next, the first division instruction fdivd advances to the E2d stage 34 of the division/square-root execution pipe, the second multiplication instruction fmuld advances to the E1 stage 39, and the state machine 20 sets the wait signal 73 for the I stage 11 of the third division instruction fdivd to Low.
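The effect of the wait signal on an instruction held in the I stage can be illustrated with a small Python sketch. It assumes only what the paragraph states: an instruction stays in the I stage while the wait signal is High and moves to the L stage in the cycle after the signal is Low. The cycle indices and the waveform `wait_73` are hypothetical, since the actual timing is given by FIG. 4.

```python
def cycle_entering_l_stage(wait_signal, fetch_cycle):
    """Return the cycle in which an instruction that occupies the I stage
    from `fetch_cycle` onward moves to the L stage, or None if the wait
    signal never goes Low within the given waveform.

    wait_signal[c] is True when the I-stage wait signal is High in cycle c.
    """
    for cycle in range(fetch_cycle, len(wait_signal)):
        if not wait_signal[cycle]:
            return cycle + 1  # advances in the cycle after wait is Low
    return None


HIGH, LOW = True, False

# Hypothetical waveform: cycle 0 the first fdivd is in I, cycle 1 it is in L
# (wait goes High), cycle 5 is taken as the fourth cycle of its E1 stage
# (wait goes Low), cycle 6 wait is High again for the third fdivd,
# cycle 7 wait returns to Low.
wait_73 = [LOW, HIGH, HIGH, HIGH, HIGH, LOW, HIGH, LOW]

second_fmuld_l = cycle_entering_l_stage(wait_73, fetch_cycle=1)
third_fdivd_l = cycle_entering_l_stage(wait_73, fetch_cycle=6)
```

Under this assumed waveform the second fmuld enters the L stage in cycle 6 and the third fdivd in cycle 8.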

[0028] While the first division instruction fdivd is in the E2d stage 4 of the division/square-root execution pipe, the second multiplication instruction fmuld can proceed through the multiplication execution pipe 17 to 18 according to information from the state machine 20. While the first division instruction fdivd is in the E2d stage 14 and the E3d stage 15, the second multiplication instruction fmuld is held in the SQue 19, and its operation result is temporarily stored in the temporary register 7. At the same time that the first division instruction fdivd enters the S stage 36, the operation result of the second multiplication instruction fmuld held in the SQue 19 is written from the temporary register 7 to the register 1. While the first division instruction fdivd is in the E2d stage 34, the third division instruction fdivd advances from the I stage 44 to the E1 stage 46, and at the same time that the first division instruction fdivd advances to the E3d stage 35, the third division instruction fdivd advances to the E2d stage 47. While the first division instruction fdivd is in the E2d stage 34 and the third division instruction fdivd is in the E1 stage 46, every stage of the multiplier is busy, so no other multiplication, division, or square-root instruction can be executed. Two cycles before the third division instruction fdivd moves to the E2d stage 47, the state machine 20 sets the wait signal 73 for the I stage 11 to Low; in the next cycle it moves the fourth multiplication instruction fmuld to the L stage 51 and at the same time sets the wait signal 73 for the I stage 11 to High. At the same time that the third division instruction fdivd moves to the E2d stage 47, the fourth multiplication instruction fmuld moves to the E1 stage 52, and the wait signal 73 for the I stage 11 is set to Low. While the third division instruction fdivd is in the E2d stage 47, a multiplication instruction is accepted every two cycles; the fourth to eighth multiplication instructions fmuld are all held in the SQue 19, and their operation results are temporarily stored in the temporary register 7.

[0029] At the same time that the third division instruction fdivd enters the S stage 49, the operation result of the fourth multiplication instruction fmuld held in the SQue 19 is written from the temporary register 7 to the register 1. Next, the operation results of the fifth and sixth multiplication instructions fmuld held in the SQue 19 are written from the temporary register 7 to the register 1, followed by the operation results of the seventh and eighth multiplication instructions fmuld. The ninth multiplication instruction fmuld moves to the E1 stage 69 at the same time that the third division instruction fdivd enters the S stage 49, and the operation results of the seventh and eighth multiplication instructions fmuld are written to the register 1 while the ninth multiplication instruction fmuld is in the E3m stage 71. As a result, the operation results of all instructions executed before the ninth multiplication instruction fmuld have been written to the register 1, and the ninth multiplication instruction fmuld can move to the S stage 72 after the E3m stage 71.
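The role of the SQue and the temporary register in keeping register writes in program order can be sketched as follows. This is a simplified Python model, not the circuit of the embodiment: the class name `StoreQueue`, the register names, and the per-cycle write counts (one result alongside the division's own write, then two per cycle) are assumptions read off the order of writes described in the text.

```python
from collections import deque


class StoreQueue:
    """Holds results of multiplications that completed while an older
    division was still executing (a stand-in for SQue 19 together with
    temporary register 7)."""

    def __init__(self):
        self.pending = deque()  # (destination, value) in program order

    def hold(self, dest, value):
        self.pending.append((dest, value))

    def drain(self, registers, max_writes):
        """Write up to max_writes of the oldest held results to registers."""
        written = []
        for _ in range(min(max_writes, len(self.pending))):
            dest, value = self.pending.popleft()
            registers[dest] = value
            written.append(dest)
        return written


registers = {}
sque = StoreQueue()
# The 4th to 8th fmuld finish while the 3rd fdivd is still executing.
for n in range(4, 9):
    sque.hold(f"mul{n}", n * 10)  # hypothetical destinations and values

registers["div3"] = 1.5            # 3rd fdivd reaches its S stage
order = [sque.drain(registers, 1)]  # 4th fmuld written in the same cycle
order.append(sque.drain(registers, 2))  # then the 5th and 6th
order.append(sque.drain(registers, 2))  # then the 7th and 8th
```

Once `sque.pending` is empty, every older result has been written back, which corresponds to the condition under which the ninth multiplication instruction may proceed to its S stage.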

[0030] As described above, according to this embodiment, the control pipeline from the E2 stage onward is split into a division/square-root instruction execution pipeline and a multiplication instruction execution pipeline, and an SQue is provided for temporarily saving operation results in the temporary register. This makes it possible to detect the unused periods of each stage of the pipelined multiplier even during execution of a division instruction, which is a convergent operation, and to execute multiplication and division instructions in parallel. In addition, when a second division instruction is fetched during execution of a first division instruction, the second division instruction can be executed in parallel with the first by using, for the second division instruction, a sequence that performs the operation with the high-order and low-order multiplication units and the binary conversion unit regardless of the arithmetic precision.
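Detecting an unused period amounts to checking that the blocks a new instruction needs are free in the cycles in which it needs them. The following minimal Python sketch does this with a reservation table. The block names, the occupancy pattern of the division, and the order in which a multiplication uses the blocks are all hypothetical; the real patterns are those of FIGS. 2 and 5.

```python
def can_issue(busy, usage, start):
    """True if an instruction starting at cycle `start` never needs a block
    that is already reserved.

    busy:  dict mapping cycle -> set of reserved block names
    usage: list of (cycle offset, block name) needed by the new instruction
    """
    return all(
        block not in busy.get(start + offset, set())
        for offset, block in usage
    )


def free_issue_cycles(busy, usage, horizon):
    """All start cycles below `horizon` at which the instruction fits."""
    return [c for c in range(horizon) if can_issue(busy, usage, c)]


# Hypothetical occupancy of the multiplier blocks by a running division.
division_busy = {
    0: {"mul_hi"},
    1: {"bin_conv"},
    2: {"mul_hi"},
    3: {"bin_conv"},
    4: {"mul_hi", "mul_lo"},
    5: {"bin_conv"},
}

# Hypothetical block usage of one multiplication, one block per cycle.
multiply_usage = [(0, "mul_lo"), (1, "mul_hi"), (2, "bin_conv")]

slots = free_issue_cycles(division_busy, multiply_usage, horizon=6)
```

With this assumed pattern, `slots` is `[0, 2, 5]`. The sketch checks a candidate only against the division; a fuller model would also add each accepted multiplication to `busy` before testing the next one.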

[0031] In this embodiment, the instruction realized by a convergent operation is a division instruction, but the same effect is obtained for other convergent operations such as square-root instructions and elementary functions. Also, although this embodiment uses two pipelines from the execution stage onward, more than two may be used; even with a single pipeline, the same effect is obtained provided that the state machine can perform control such that an instruction located before the execution stage is executed. Furthermore, although this embodiment describes the case of a convergent operation, the same effect is obtained when a first instruction such as a bit-processing instruction is processed using a plurality of blocks, by detecting the unused periods of those blocks and executing a second instruction during the unused periods.

[0032]

[Effects of the Invention] As explained above, according to the present invention, the unused periods of the multiplication units and the binary conversion unit are detected even during execution of an instruction realized by a convergent operation, so the next arithmetic instruction can be executed without keeping the control pipeline locked. In addition, when executing a convergent operation, changing the combination of multiplication units according to the required precision enables high-speed execution of the convergent operation. Consequently, the arithmetic processing method and arithmetic processing device of the present invention improve performance.
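The precision-dependent choice of multiplication units can be sketched in Python as follows. The sketch assumes a quadratically convergent iteration (such as Newton-Raphson) in which the number of correct bits roughly doubles per step; the text shown here does not name the iteration, and the bit widths, block names, and function names are hypothetical.

```python
def select_blocks(precision_bits, multiplier_input_bits):
    """Pick the blocks for one iteration step.

    While the working precision fits the input width of a single
    multiplication unit, one unit plus the binary conversion unit is enough
    (the high-order unit is chosen here arbitrarily); beyond that, both
    units are used.
    """
    if precision_bits <= multiplier_input_bits:
        return ("mul_hi", "bin_conv")
    return ("mul_hi", "mul_lo", "bin_conv")


def iteration_schedule(initial_bits, target_bits, multiplier_input_bits):
    """List (precision, blocks) per step, assuming precision doubles."""
    steps = []
    bits = initial_bits
    while bits < target_bits:
        bits = min(2 * bits, target_bits)
        steps.append((bits, select_blocks(bits, multiplier_input_bits)))
    return steps


# Hypothetical numbers: 8-bit initial approximation, 53-bit target
# (double-precision significand), 32-bit multiplier input width.
schedule = iteration_schedule(8, 53, 32)
```

Under these assumed widths the first two steps (16 and 32 bits) use a single multiplication unit and only the final step (53 bits) uses both, which leaves the other unit free during the early steps.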

[Brief description of the drawings]

[FIG. 1] FIG. 1 is a block diagram of the data path section of the arithmetic processing device in the first embodiment of the present invention.

[FIG. 2] FIG. 2 is a timing chart for high-speed execution of a division instruction in the arithmetic processing device of the first embodiment.

[FIG. 3] FIG. 3 is a structural diagram of the control pipe of the arithmetic processing device in the second embodiment of the present invention.

[FIG. 4] FIG. 4 is a timing chart showing the progress of the pipeline for each instruction in the arithmetic processing device of the second embodiment.

[FIG. 5] FIG. 5 is a timing chart of a division instruction to be executed, in the arithmetic processing device of the second embodiment, in parallel with the division instruction shown in FIG. 2 of the first embodiment.

[FIG. 6] FIG. 6 is a block diagram of the data path section of a conventional arithmetic processing device.

[FIG. 7] FIG. 7 is a timing chart for the case where a division instruction is realized as a convergent operation in a conventional arithmetic processing device.

[FIG. 8] FIG. 8 is a structural diagram of the control pipe of a conventional arithmetic processing device.

[FIG. 9] FIG. 9 is a timing chart showing the progress of the pipeline for each instruction in a conventional arithmetic processing device.

[Explanation of symbols]

1 Register
2 Data ROM
3 Pipelined multiplier
4 Low-order multiplication unit of the pipelined multiplier
5 High-order multiplication unit of the pipelined multiplier
6 Binary conversion unit
7 Temporary register
11 I stage
12 L stage
13 E1 stage
14 E2d stage of the division/square-root execution pipe
15 E3d stage of the division/square-root execution pipe
16 S stage
17 E2m stage of the multiplication execution pipe
18 E3m stage of the multiplication execution pipe
19 SQue
20 State machine

Claims

[Claims]

1. An arithmetic processing method characterized in that, when a second instruction to be processed by at least one of a plurality of arithmetic blocks is fetched during execution of a first instruction processed by the plurality of arithmetic blocks, an unused period of the arithmetic block required for the second instruction is detected during execution of the first instruction, and the second instruction is executed in parallel with execution of the first instruction.

2. An arithmetic processing device characterized in that a plurality of pipelines are provided from the execution stage of a control pipeline onward, and means is provided which, when a second instruction to be processed by at least one of a plurality of arithmetic blocks is fetched during execution of a first instruction processed by the plurality of arithmetic blocks, detects an unused period of the arithmetic block required for the second instruction during execution of the first instruction, supplies the second instruction to a second execution pipeline different from the first execution pipeline in which the first instruction resides, and executes the second instruction in parallel with execution of the first instruction.

3. An arithmetic processing device characterized in that a plurality of pipelines are provided from the execution stage of a control pipeline onward, and means is provided which, when a second instruction to be processed by at least one of a plurality of arithmetic blocks is fetched during execution of a first instruction processed by the plurality of arithmetic blocks, detects an unused period of the arithmetic block required for the second instruction during execution of the first instruction, and executes the second instruction in parallel with execution of the first instruction by setting the state of the control pipeline to an execution state for the second instruction, which is located before the execution stage.

4. An arithmetic processing device comprising a pipelined multiplier having at least three stages, characterized in that means is provided which, when a second instruction that is a multiplication instruction or an instruction realized by a convergent operation is fetched during execution of a first instruction realized by a convergent operation, detects an unused period of each stage of the pipelined multiplier during execution of the first instruction and executes the second instruction during that period.

5. An arithmetic processing device comprising: a pipelined multiplier having a three-stage configuration of high-order and low-order multiplication units and a binary conversion unit; and a control unit that controls the sequence of the multiplication units and the binary conversion unit such that, when an instruction realized as a convergent operation is executed, the operation is performed using either the high-order or the low-order multiplication unit together with the binary conversion unit while the arithmetic precision is low, and using both the high-order and low-order multiplication units together with the binary conversion unit once the arithmetic precision exceeds the input bit width of either multiplication unit.