JPS6086671A

JPS6086671A - Vector processing device

Info

Publication number: JPS6086671A
Application number: JP58194249A
Authority: JP
Inventors: Yasuhiro Inagami; 稲上　泰弘; Koichiro Omoda; 面田　耕一郎; Shigeo Nagashima; 長島　重夫; Takayuki Nakagawa; 貴之中川
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1983-10-19
Filing date: 1983-10-19
Publication date: 1985-05-16
Also published as: JPH0445860B2

Abstract

PURPOSE: To prevent degradation of division performance by performing division at high speed in a pipeline when vector data is divided by the reciprocal approximation method, in which multiplication is repeated to obtain a quotient. CONSTITUTION: A dividend N and an approximate reciprocal R0 are supplied successively at a pitch of one machine cycle through data buses 13 and 19 and data buses 14 and 20, respectively, and are set in data registers 48 and 49 by way of data registers 40, 41, and so on. These values are multiplied in a pipeline by multiple generation circuits 61 and 62, CSA trees 64 and 65, and a parallel adder 67, and the result N1 is obtained in data register 77. A result R1 calculated by a pipeline multiplier is sent through data bus 16 and set in data register 100. The result N1 calculated by the pipeline division additional circuit is set in data register 101, and pipeline multiplier 3 outputs the product as the quotient Q. Division is thus performed at high speed.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of application of the invention]

The present invention relates to a device that, in a vector processing device, processes division of vector data at high speed in a pipeline.

[Background of the invention]

A vector data processing device, which performs a series of operations on sets of operands, often has pipeline arithmetic units that process successively supplied vector data in a pipeline and output results one after another at a pitch of one machine cycle. Pipeline arithmetic units that perform addition, subtraction, and multiplication are already known, but there is no example that performs division in a pipeline. Consequently, for a program executed as a combination of the four arithmetic operations, operations that do not include division are processed at high speed in the pipeline, whereas operations that include division can suffer an extreme drop in performance. Division therefore needs to be processed at high speed in a pipeline.

The reciprocal approximation method is widely used as an arithmetic method for processing division at high speed. With the dividend N, the divisor D, and the quotient Q, this method writes

$$Q = \frac{N}{D} = \frac{N \times R_0 \times R_1 \times \cdots}{D \times R_0 \times R_1 \times \cdots}$$

and finds factors R0, R1, R2, ... that bring D × R0 × R1 × ... close to 1, so that the quotient is obtained as Q ≈ N × R0 × R1 × ....

The first factor R0 is obtained by looking up the divisor D in an approximate reciprocal table, which gives an approximation of the reciprocal of D. The precision of R0, the approximate reciprocal of the divisor D, depends on the size of the approximate reciprocal table; suppose for now that it has been obtained with the precision given by the following expression.

$$D \times R_0 = 1 \pm \varepsilon, \quad 0 < \varepsilon < 1 \tag{1}$$

Here, if

$$R_1 = 2 - D \times R_0 \tag{2}$$

then R1 can be written as

$$R_1 = 2 - (1 \pm \varepsilon) = 1 \mp \varepsilon \tag{3}$$

and

$$D \times R_0 \times R_1 = (1 \pm \varepsilon)(1 \mp \varepsilon) = 1 - \varepsilon^2, \quad 0 < \varepsilon^2 < 1 \tag{4}$$

Next, if

$$R_2 = 2 - D \times R_0 \times R_1 \tag{5}$$

then

$$R_2 = 2 - (1 - \varepsilon^2) = 1 + \varepsilon^2 \tag{6}$$

$$D \times R_0 \times R_1 \times R_2 = (1 - \varepsilon^2)(1 + \varepsilon^2) = 1 - \varepsilon^4, \quad 0 < \varepsilon^4 < 1 \tag{7}$$

Repeating the same operation gives

$$D \times R_0 \times R_1 \times \cdots \times R_n \to 1 \tag{8}$$

Now, if ε⁴ falls at or below the precision of the significant digits of the data representation of the target vector processing device, D × R0 × R1 × R2 can be treated as 1, and Q = N × R0 × R1 × R2 (10) can be taken as the quotient.

Division by the reciprocal approximation method thus prepares in advance an approximate reciprocal of the divisor whose precision is lower than the number of significant digits of the data representation, and obtains the quotient by raising the precision of the reciprocal through repeated multiplication.
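The iteration described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `reciprocal_approximation_divide` and the seed value passed as `r0` are hypothetical stand-ins for the table lookup, and exact rational arithmetic is used so that the squaring of the error is visible without rounding effects.

```python
from fractions import Fraction

def reciprocal_approximation_divide(n, d, r0, iterations=2):
    """Approximate n / d by repeated multiplication.

    r0 is a rough reciprocal of d (hypothetical seed standing in for the
    approximate reciprocal table). Each pass multiplies numerator and
    denominator by the correction factor R_k = 2 - D*R0*...*R_(k-1).
    """
    n, d, r0 = Fraction(n), Fraction(d), Fraction(r0)
    q = n * r0                  # running numerator   N * R0 * R1 * ...
    x = d * r0                  # running denominator D * R0 * R1 * ...
    errors = [abs(1 - x)]       # distance of the denominator from 1
    for _ in range(iterations):
        r = 2 - x               # next correction factor
        q *= r
        x *= r
        errors.append(abs(1 - x))
    return q, errors

q, errors = reciprocal_approximation_divide(7, 3, Fraction(5, 16))
print(q, [str(e) for e in errors])
```

With the seed 5/16 for the divisor 3, the denominator error starts at 1/16 and is squared on each pass, which is the behavior the derivation describes.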

In a processing device with a high-speed multiplier, the reciprocal approximation method is an effective way to process division at high speed. When division is to be pipelined, however, as many multipliers as the number of multiplications needed to reach the desired precision must be connected in series, and the circuitry required to perform division in a pipeline becomes enormous.

For this reason, there has been an example among conventional vector processing devices in which division is realized by using a single pipeline multiplier repeatedly, as many times as the number of multiplications needed to reach the desired precision.

That is, with the dividend N and the divisor D, the quotient Q is obtained in the following four steps.

Step 1: Calculation of the approximate reciprocal: $R_0 \approx 1/D$ (11)

Step 2: Improvement of the precision of the approximate reciprocal: $R_1 = 2 - D \times R_0$ (12)

Step 3: Approximation of the numerator: $N_1 = N \times R_0$ (13)

Step 4: Improvement of the precision of the quotient: $Q = N_1 \times R_1$ (14)

FIG. 1 shows the floating-point data representation format used in this case.

Floating-point data is represented using 1 bit for the sign, 15 bits for the exponent, and 48 bits for the mantissa, 64 bits in total. In this prior art, division by the reciprocal approximation method only has to secure a precision of 48 bits, the number of significant digits of the mantissa, and for that purpose division is realized by the four-stage operation of steps 1 to 4 above.

In step 1, the approximate reciprocal R0 of the divisor D is obtained with a precision of 30 bits. In step 2, the precision is raised to 47 bits and R1 is obtained, and in steps 3 and 4, N × R0 × R1 is calculated to give the quotient Q. FIG. 2 shows how the precision improves in this division process.

In the prior art, the vector data division process consisting of the four steps above is realized by the following instructions and hardware configuration.

The step 1 process of obtaining the approximate reciprocal R0 with a precision of 30 bits is performed by the Floating Point Reciprocal Approximation instruction, which is provided solely for this process.

The step 2 process is performed by the Reciprocal Iterations instruction, which is provided solely for this process.

The processes of steps 3 and 4 are performed using ordinary floating-point vector multiplication instructions.

For the above processing, the prior art uses a floating-point multiplication unit and a floating-point reciprocal approximation unit. The floating-point multiplication unit is the arithmetic unit used to process ordinary vector multiplication instructions. It is a multiplier with a pipeline structure that processes data sent one after another at a pitch of one machine cycle and outputs products at a rate of one result per machine cycle. The floating-point reciprocal approximation unit processes the Floating Point Reciprocal Approximation instruction. It is an arithmetic unit with a pipeline structure that calculates, in a pipeline, approximate reciprocals with a precision of 30 bits for data sent one after another at a pitch of one machine cycle, and outputs them at a rate of one result per machine cycle.

As the above shows, conventional division of vector data uses the floating-point reciprocal approximation unit once and the floating-point multiplication unit three times. It executes four instructions in total, each processed in a pipeline: one Floating Point Reciprocal Approximation instruction, one Reciprocal Iterations instruction, and two floating-point multiplication instructions. This scheme has two problems. First, dividing vector data requires the execution of four instructions, which takes time, and ordinary multiplication cannot be performed while the floating-point multiplication unit is in use for division. Second, the results R1 and N1 of steps 2 and 3 must be held as intermediate results until the quotient Q is obtained, which requires extra storage area or vector registers.
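The four-instruction sequence can be illustrated as four whole-vector passes. This is a rough sketch under stated assumptions: the function name `divide_prior_art` is hypothetical, the seed is modelled by truncating 1/D to 30 fractional bits, divisors are assumed positive, and ordinary Python floats stand in for the 64-bit format, so the 47-bit figure quoted above is not reproduced.

```python
def divide_prior_art(n_vec, d_vec, seed_bits=30):
    """Divide two vectors with four passes, one per instruction."""
    scale = 1 << seed_bits
    # Instruction 1: reciprocal approximation (stand-in for the dedicated unit).
    r0 = [int(scale / d) / scale for d in d_vec]
    # Instruction 2: reciprocal iteration, R1 = 2 - D*R0 (uses the multiplier).
    r1 = [2 - d * r for d, r in zip(d_vec, r0)]
    # Instruction 3: vector multiply, N1 = N*R0.
    n1 = [n * r for n, r in zip(n_vec, r0)]
    # Instruction 4: vector multiply, Q = N1*R1.
    q = [a * b for a, b in zip(n1, r1)]
    # R1 and N1 are whole intermediate vectors that must be kept until pass 4.
    return q, {"R0": r0, "R1": r1, "N1": n1}

q, temps = divide_prior_art([0.6, 0.7, 0.55], [0.75, 0.5, 0.9])
print(q)
```

The returned dictionary makes the storage cost visible: every intermediate is a full vector, which is the extra storage area or vector register use described above.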

[Purpose of the invention]

An object of the present invention is to provide a circuit that processes division of vector data at high speed in a pipeline, in a vector processing device that employs a division method in which the quotient is obtained by repeated multiplication.

[Overview of the invention]

The present invention applies to a vector processing device equipped with a plurality of pipeline arithmetic units for high-speed operation, which divides vector data by a method that obtains the quotient through repeated multiplication. Its feature is that no multiplication circuit dedicated to the multiplications needed to obtain the quotient is provided. Instead, two pipeline multipliers provided for processing vector multiplication instructions are organically combined and operated in conjunction to perform the multiplications required for division, so that division is processed at high speed in a pipeline.

That is, two pipeline multipliers are paired and combined by a data bus that sends the output of one multiplier to the other. An additional circuit with a pipeline structure, dedicated to division, shares its data supply port with the latter multiplier. Operating these in conjunction processes division of vector data in a pipeline.

Pipelined division is performed by executing two instructions in succession: one that takes the divisor as its input operand and gives its approximate reciprocal as the output operand, and one that takes the dividend, the divisor, and the approximate reciprocal of the divisor as input operands and gives the quotient as the output operand.

[Embodiments of the invention]

The present invention is described in detail below using an embodiment.

The division method in the present invention is based on the reciprocal approximation method. This embodiment considers a vector processing device with the floating-point data format shown in FIG. 3. The particular data representation format of the vector processing device is not essential to the present invention.

As shown in FIG. 3, the data representation format handled in this embodiment uses 1 bit for the sign part, e bits for the exponent part, and m bits for the mantissa part. The radix point of the mantissa is located at the head of the mantissa.

What matters most for division by the reciprocal approximation method is the number of digits in the mantissa. In the data format of FIG. 3 the mantissa has m significant bits, so it suffices to obtain, through repeated multiplication, a quotient with a precision of m bits (precision $2^{-m}$).

In this embodiment, the first approximate reciprocal of the divisor stored in the approximate reciprocal table has a precision of t bits, and the following relationship holds between the precision of the first approximate reciprocal and the number of floating-point significant digits.

$$6t \le m < 7t \tag{15}$$

That is, to obtain a quotient that satisfies the m significant bits of the floating-point format from the first approximate reciprocal of the divisor read from the approximate reciprocal table, the precision must be raised sixfold by repeated multiplication under the reciprocal approximation method. FIG. 3 shows the relationship between the precision t of the first approximate reciprocal and the number m of significant digits of the mantissa of floating-point data.
, it is necessary to increase the accuracy by a factor of 6. FIG. 3 shows the relationship between the precision t of the first approximate reciprocal and the number m of significant digits in the mantissa part of floating point data.

In this embodiment, with the dividend N and the divisor D, the principle by which the approximate reciprocal table is looked up to obtain r, the first approximate reciprocal of the divisor D, and the precision is then raised sixfold to obtain the quotient is as follows.

The processing up to obtaining the quotient Q consists of the following six steps.

Step 1: Based on the upper t bits of the mantissa of the divisor D, the approximate reciprocal table is looked up to obtain the first approximate reciprocal r of the divisor D. Since the precision of r is t bits as stated above, only the upper t bits of the m-bit mantissa of D are needed to obtain r.

Step 2: Calculation of $r_1 = 1 + (1 - D \times r) + (1 - D \times r)^2$ (16).

Step 3: Calculation of $R_0 = r \times r_1$ (17).

Step 4: Calculation of $R_1 = 2 - D \times R_0$ (18).

Step 5: Calculation of $N_1 = N \times R_0$ (19).

Step 6: Calculation of $Q = N_1 \times R_1$ (20).

Next, it is shown that the processing of steps 1 to 6 above yields a quotient Q with a precision of m bits.

Multiplying the divisor D by the first approximate reciprocal r gives a value close to 1. If the error is ε (0 ≤ ε < 1), the following holds.

$$D \times r = 1 \pm \varepsilon \tag{21}$$

The error ε arises because the first approximate reciprocal r has a precision of only t bits, fewer than the m significant bits of the floating-point mantissa.

From equations (16) to (20), which define steps 2 to 6, and equation (21), the following series of equations is derived.

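$$1 - D \times r = 1 - (1 \pm \varepsilon) = \mp \varepsilon \tag{22}$$

$$(1 - D \times r)^2 = \varepsilon^2 \tag{23}$$

$$r_1 = 1 \mp \varepsilon + \varepsilon^2 \tag{24}$$

$$D \times R_0 = D \times r \times r_1 = (1 \pm \varepsilon)(1 \mp \varepsilon + \varepsilon^2) = 1 \pm \varepsilon^3 \tag{25}$$

$$R_1 = 2 - D \times R_0 = 2 - (1 \pm \varepsilon^3) = 1 \mp \varepsilon^3 \tag{26}$$

$$D \times R_0 \times R_1 = (1 \pm \varepsilon^3)(1 \mp \varepsilon^3) = 1 - \varepsilon^6 \tag{27}$$

From equation (27), calculating D × R0 × R1 raises the precision of the reciprocal of the divisor D from ε, that of the first approximate reciprocal r, to ε⁶, six times the precision, which is at or below the precision of the significant digits of the mantissa of floating-point data.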

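Therefore, if the quotient Q is taken as

$$Q = N \times R_0 \times R_1 \tag{28}$$

then

$$\frac{N}{D} = \frac{N \times R_0 \times R_1}{D \times R_0 \times R_1} = \frac{N \times R_0 \times R_1}{1 - \varepsilon^6} \approx N \times R_0 \times R_1 \tag{29}$$

and Q is a value that satisfies the precision of the significant digits of the mantissa of floating-point data.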

Steps 5 and 6 above are for calculating equation (28), Q = N × R0 × R1.
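The six steps and the resulting error can be checked with a short sketch in exact rational arithmetic. Everything named here is an assumption made for illustration: `table_lookup` is a hypothetical stand-in for the approximate reciprocal table, the divisor is taken as a normalized mantissa with 1/2 ≤ D < 1, and `e` is the signed value 1 − D × r rather than the unsigned ε of the text.

```python
from fractions import Fraction

def table_lookup(d, t):
    """Hypothetical table: reciprocal of the upper t bits of d, truncated to t bits."""
    top = Fraction(int(d * (1 << t)), 1 << t)        # keep the upper t bits of D
    return Fraction(int((1 << t) / top), 1 << t)     # 1/top with t fractional bits

def divide_six_steps(n, d, t=8):
    n, d = Fraction(n), Fraction(d)
    r = table_lookup(d, t)       # step 1: first approximate reciprocal r
    e = 1 - d * r                # signed error of D*r
    r1 = 1 + e + e * e           # step 2: r1 = 1 + (1 - D*r) + (1 - D*r)^2
    R0 = r * r1                  # step 3: R0 = r * r1
    R1 = 2 - d * R0              # step 4: R1 = 2 - D*R0
    N1 = n * R0                  # step 5: N1 = N * R0
    q = N1 * R1                  # step 6: Q = N1 * R1
    return q, e

q, e = divide_six_steps(Fraction(5, 8), Fraction(3, 4))
print(e, float(q))
```

For D = 3/4 and t = 8 the table error is 1/1024, and the returned quotient equals the true quotient 5/6 multiplied by 1 − e⁶, in line with the derivation.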

In this embodiment, the processing of steps 1 to 6 above, needed to obtain the quotient, is realized as follows. Steps 1 to 3 are performed by the VER instruction (Vector Elementwise Reciprocal instruction). The output of the VER instruction is R0 as given by equation (17), and from equation (25), R0 is an approximate reciprocal of the divisor D with error ε³. The VER instruction is therefore both an instruction that outputs an intermediate result toward the quotient Q and an instruction that calculates an approximate reciprocal. Steps 4 to 6 are performed by the VED instruction (Vector Elementwise Divide instruction).

In this way, the division processing of steps 1 to 6 is performed by executing two instructions, the VER instruction and the VED instruction, in succession.
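The split into two instructions can be pictured as two elementwise functions applied back to back. This is an illustrative sketch only: `ver` and `ved` are hypothetical Python functions named after the instructions, the table lookup is modelled by rounding 1/D to t fractional bits, divisors are assumed to be normalized mantissas in [1/2, 1), and Python floats replace the device's own format.

```python
def ver(divisors, t=8):
    """Sketch of VER (steps 1 to 3): returns R0 for each divisor."""
    out = []
    for d in divisors:
        r = round((1.0 / d) * (1 << t)) / (1 << t)   # step 1 (stand-in for the table)
        e = 1.0 - d * r
        r1 = 1.0 + e + e * e                          # step 2
        out.append(r * r1)                            # step 3: R0 = r * r1
    return out

def ved(dividends, divisors, r0s):
    """Sketch of VED (steps 4 to 6): returns the quotient for each element."""
    out = []
    for n, d, r0 in zip(dividends, divisors, r0s):
        r1 = 2.0 - d * r0                             # step 4
        n1 = n * r0                                   # step 5
        out.append(n1 * r1)                           # step 6
    return out

n_vec = [0.6, 0.7, 0.55]
d_vec = [0.5, 0.75, 0.9]
quotients = ved(n_vec, d_vec, ver(d_vec))
print(quotients)
```

Only the vector of R0 values passes between the two calls, which mirrors the operand lists of the two instructions: VER takes the divisor, and VED takes the dividend, the divisor, and the approximate reciprocal.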

The VER and VED instructions are processed in a pipeline by operating, in conjunction, the pipeline multipliers that process the ordinary multiplication instruction, referred to here as the VEM instruction (Vector Elementwise Multiply instruction), and a division circuit with a pipeline structure provided for division processing. The processing is described in detail below.

First, an example of the structure of a pipeline multiplier is shown.

FIG. 4 shows the configuration of a pipeline multiplier. In FIG. 4, 1 and 2 are the data buses on which the multiplicand and the multiplier are sent, respectively; 3 is the data bus that outputs the multiplication result; 10 to 17 are data registers; 20 to 23 are multiple generation circuits; 30 to 33 are CSA (carry save adder) trees; 34 is a parallel adder; 40 is the carry output register of the first partial product; 41 is the sum output register of the first partial product; 42 to 47 are likewise the carry output registers and sum output registers of the second, third, and fourth partial products; and 48 is the multiplication result register.

The multiplication method of the pipeline multiplier shown in FIG. 4 is a known technique. The multiplier is decoded in units of 2 bits to generate multiples of the multiplicand (since the mantissa of the multiplier is m bits as shown in FIG. 3, m/2 multiples are generated), and these multiples are added by the carry save adders and the parallel adder to obtain the product.

In the example of FIG. 4, the lowest-order m/4 bits of the multiplier and the multiplicand held in data register 10 are input to multiple generation circuit 20 to generate multiples. These multiples are input to CSA tree 30 and added, giving a carry output in data register 40 and a sum output in data register 41 (calculation of the first partial product). Next, the second lowest-order m/4 bits of the multiplier and the multiplicand are input to multiple generation circuit 21 to generate multiples, and these multiples and the carry and sum outputs of the first partial product are added in CSA tree 31, giving the carry and sum outputs of the second partial product in data registers 42 and 43, respectively. The same processing is repeated to obtain the carry output of the fourth partial product in data register 46 and the sum output in data register 47, and these are added by parallel adder 34 to give the final product in data register 48.
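The partial-product scheme can be imitated on non-negative integers. This is a simplified sketch and not the circuit itself: the function names are hypothetical, the multiplier is assumed to split into four equal groups of bits, each 2-bit group is decoded into a plain digit 0 to 3 (the actual multiple generation circuits may use a different recoding), and a 3-input carry-save step stands in for a whole CSA tree.

```python
def carry_save_add(a, b, c):
    """3:2 compression: returns (sum, carry) with a + b + c == sum + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry

def generate_multiples(multiplicand, chunk, shift):
    """Decode `chunk` two bits at a time into shifted multiples of the multiplicand."""
    out = []
    pos = shift
    while chunk:
        digit = chunk & 3                      # one 2-bit group of the multiplier
        out.append((multiplicand * digit) << pos)
        chunk >>= 2
        pos += 2
    return out

def multiply(multiplicand, multiplier, m=16):
    """Multiply using four partial-product stages kept in carry-save form."""
    assert m % 4 == 0 and multiplicand >= 0 and 0 <= multiplier < (1 << m)
    width = m // 4                             # multiplier bits handled per stage
    s, c = 0, 0                                # sum and carry registers
    for stage in range(4):
        chunk = (multiplier >> (stage * width)) & ((1 << width) - 1)
        for multiple in generate_multiples(multiplicand, chunk, stage * width):
            s, c = carry_save_add(s, c, multiple)
    return s + c                               # final carry-propagating (parallel) add

print(hex(multiply(0xBEEF, 0x1234)))
```

Keeping the running result as a separate sum and carry until the last addition is what lets each stage finish without waiting for carries to propagate.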

The example in FIG. 4 is structured so that the multiplication described above can be carried out in a pipeline. That is, the input data, multiplicands and multipliers, are sent one after another on data buses 1 and 2, respectively, at a rate of one data item per machine cycle, the basic processing time unit of the processing device. As soon as the first multiplicand and multiplier are set in data registers 10 and 14, calculation of the first partial product begins, and one machine cycle later the carry and sum outputs are obtained in data registers 40 and 41. At the same time, the first multiplicand is set in data register 15, and the second multiplicand and multiplier are set in data registers 10 and 14.

Likewise, when the second partial product of the first data is obtained in data registers 42 and 43, the first partial product of the second data is in data registers 40 and 41, and the third multiplicand and multiplier are set in data registers 10 and 14. When the final product of the first data is obtained in data register 48, the fourth partial product of the second data is in data registers 46 and 47, the third partial product of the third data is in data registers 44 and 45, the second partial product of the fourth data is in data registers 42 and 43, the first partial product of the fifth data is in data registers 40 and 41, and the sixth multiplicand and multiplier are set in data registers 10 and 14.

Multiplication is processed in a pipeline in this way. Once the product of the first data has been sent out on data bus 3, subsequent products are sent out one after another at a pitch of one machine cycle.
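The stage-by-stage occupancy can be traced with a small simulation. It is a sketch under the assumption that each register-to-register stage takes exactly one machine cycle; the function name `simulate` and the stage labels are hypothetical, with the register numbers taken from FIG. 4 as described.

```python
def simulate(num_items, cycles):
    """Return, for each machine cycle, which operand pair occupies each stage."""
    stages = [
        "input (10/14)",
        "partial 1 (40/41)",
        "partial 2 (42/43)",
        "partial 3 (44/45)",
        "partial 4 (46/47)",
        "product (48)",
    ]
    slots = [None] * len(stages)
    history = []
    for cycle in range(cycles):
        # Every operand pair advances one stage; a new pair enters if any remain.
        incoming = cycle + 1 if cycle < num_items else None
        slots = [incoming] + slots[:-1]
        history.append(dict(zip(stages, slots)))
    return history

for cycle, state in enumerate(simulate(num_items=10, cycles=8), start=1):
    print(cycle, state)
```

From the cycle in which the first product appears in register 48, each later cycle delivers the next product, which is the one-result-per-machine-cycle behavior stated above.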

Next, an embodiment in which the division processing of steps 1 to 6 is performed by organically combining two pipeline multipliers of the kind shown in FIG. 4 with an additional circuit, dedicated to division and having a pipeline structure, is described in detail with reference to FIG. 5.

In FIG. 5, 1 and 3 are pipeline multipliers whose structure is exactly the same as that of the pipeline multiplier shown in FIG. 4. Pipeline multipliers 1 and 3 can operate independently, and each can independently process a VEM instruction, which multiplies vector data. That is, when pipeline multiplier 1 processes a VEM instruction, multiplicand data and multiplier data are supplied one after another from data buses 10 and 11, and the products are sent out one after another on data bus 12.

When pipeline multiplier 3 processes a VEM instruction, multiplicand data and multiplier data are supplied one after another from data buses 13 and 14, and the products are sent out one after another on data bus 15. Pipeline multipliers 1 and 3 can process different VEM instructions at the same time.

Next, the operation of division processing in the embodiment shown in FIG. 5 is described. For division processing, the circuit configuration of this embodiment has the following characteristic points.

(1) In FIG. 5, 4 is the pipeline division additional circuit. It is a circuit specially provided to perform steps 2 and 5 of division processing steps 1 to 6, and it has a pipeline structure. The details of its internal configuration are described later.

(2) In FIG. 5, the input data supply port of the pipeline division additional circuit 4 is shared with the input data supply port of pipeline multiplier 3. Data is supplied to the pipeline division additional circuit 4 from data buses 13 and 14, which supply data to pipeline multiplier 3, by way of data buses 19 and 20.

(3) In FIG. 5, the output data of the pipeline division additional circuit 4 is sent to pipeline multiplier 3 via data buses 17 and 18.

(4) From (2) and (3), the pipeline division additional circuit 4 has no dedicated input data supply port or output data port of its own and has the character of an additional circuit attached to pipeline multiplier 3. Accordingly, seen from the vector processing device that contains the circuit of FIG. 5, the pipeline division additional circuit 4 is not an independent arithmetic unit; rather, the circuit combining it with pipeline multiplier 3 is treated as a single arithmetic unit. In FIG. 5, circuit 2, which combines pipeline multiplier 3 and the pipeline division additional circuit 4, is called a pipeline multiplier with division additional mechanism. The pipeline division additional circuit 4, provided exclusively for division processing in this embodiment, is thus advantageous in that, seen from the vector processing device as a whole, no new data buses for exchanging large amounts of vector data need to be provided. This is one of the features of this embodiment.
When viewed from the perspective of the vector processing device as a whole, this embodiment is advantageous in that there is no need to provide a new data bus for handling a large amount of vector data, which is one of the features of this embodiment.

(5) In FIG. 5, there is a data bus 16 that sends the output data of pipeline multiplier 1 to pipeline multiplier 3, and a bit inversion circuit 21 is inserted in it.

(6) In FIG. 5, 30 and 31 are data bus select circuits. Data bus select circuit 30 can select any one of path 13, which supplies multiplicand data to pipeline multiplier 3; path 17, which carries output data of the pipeline division additional circuit 4; and path 16, which carries the bit-inverted output data of pipeline multiplier 1. Data bus select circuit 31 can select either path 14, which supplies multiplier data to pipeline multiplier 3, or path 18, which carries output data of the pipeline division additional circuit 4.

Next, the internal configuration of the pipeline division additional circuit 4 is described. In FIG. 5, 40 to 51 and 78, 79 are data registers; 60 to 62 are circuits identical to the multiple generation circuits described for FIG. 4; 63 to 65 are circuits identical to the CSA trees described for FIG. 4; 66 and 67 are circuits identical to the parallel adder described for FIG. 4; 70 to 75 are the carry output registers and sum output registers of CSA trees 63 to 65; 76 and 77 are the output registers of parallel adders 66 and 67, respectively; and 32 and 33 are data bus select circuits. Also in FIG. 5, 80 is a storage circuit whose purpose is to hold the approximate reciprocal table. Functionally, the pipeline division additional circuit 4 with this configuration consists of the following three components.

(1) Pipeline multiplier (multiplier width: bit count illegible in the source)

The circuit made up of data register 40, multiple generating circuit 60, CSA tree 63, parallel adder 66, and data registers 70, 71, and 76 in FIG. 5 is a pipeline multiplier that takes the data held in data register 40 as the multiplicand and the data read from storage circuit 80 as the multiplier.

Its multiplier width is the same as that of the multiplier in FIG. 4, because multiple generating circuit 60 and CSA tree 63 are identical to those described with FIG. 4.

(2) Pipeline multiplier (multiplier width: bit count illegible in the source)

The circuit made up of data registers 48 to 51, multiple generating circuits 61 and 62, CSA trees 64 and 65, parallel adder 67, and data registers 72 to 75 and 77 is a pipeline multiplier that takes the data held in data register 48 as the multiplicand and the data held in data register 49 as the multiplier. Its multiplier width follows from its use of multiple generating circuits and CSA trees identical to those described with FIG. 4; the bit count itself is illegible in the source.

(3) Approximate reciprocal table

Storage circuit 80 in FIG. 5 is read using the data held in data register 40 as the address. It serves as the approximate reciprocal table that holds the first-order approximate reciprocals used in division processing.
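The table lookup of component (3) can be sketched as follows. This is a minimal illustration rather than the circuit of the patent: it assumes a binary mantissa normalized to [0.5, 1), stores the reciprocal of the midpoint of each address interval, and works in floating point. The names `build_reciprocal_table` and `lookup_reciprocal` and the width `t = 8` are hypothetical.

```python
def build_reciprocal_table(t):
    """Return a 2**t-entry table of first-order approximate reciprocals.

    Assumption (not from the source): the mantissa d is a binary fraction
    normalized to 0.5 <= d < 1, so only the upper half of the addresses
    is ever used.
    """
    size = 1 << t
    table = [0.0] * size
    for addr in range(size // 2, size):
        midpoint = (addr + 0.5) / size   # centre of the interval sharing these t bits
        table[addr] = 1.0 / midpoint
    return table


def lookup_reciprocal(table, d, t):
    """Use the upper t bits of the mantissa d as the table address."""
    addr = int(d * (1 << t))
    return table[addr]


t = 8
table = build_reciprocal_table(t)
r = lookup_reciprocal(table, 0.7, t)
print(abs(1.0 - 0.7 * r))  # relative error of the seed, below 2**-t
```

With midpoint entries the seed error 1 - d*r stays below 2**-t in magnitude under the stated normalization, which is the property the later refinement steps rely on.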

Thus, although the pipeline division additional circuit is provided specifically for division processing, its circuit configuration resembles that of an ordinary pipeline multiplier, which is an advantage for implementation.

For the embodiment of FIG. 5 with the characteristic configuration described above, the way division Steps 1 through 6 are executed in a pipeline is described next. As stated earlier, division in this embodiment is performed by two instructions: the VER instruction and the VED instruction.

(1) Processing of the VER instruction

The VER instruction takes the divisor D as input data, performs Steps 1 to 3 above, and outputs the approximate reciprocal R0 of the divisor D as output data. Processing uses the pipeline multiplier 2 with additional division mechanism in FIG. 5. Each step is detailed below.

Step 1: The input data, the divisor D, is supplied element after element at a pitch of one machine cycle over data buses 13 and 19 in FIG. 5 and set in data register 40. The upper t bits of the m-bit mantissa of the divisor D in data register 40 are used as the address to look up the approximate reciprocal registered in storage circuit 80, which yields the first-order approximate reciprocal r of the divisor D. The bit width of r is t bits.

Step 2: From the divisor D and the first-order approximate reciprocal r, the following is computed:

$$r_1 = 1 + (1 - D \times r) + (1 - D \times r)^2 \qquad (17)$$

First,

$$1 - D \times r \qquad (31)$$

is computed by the pipeline multiplier of component (1) in the pipeline division additional circuit 4.

The first-order approximate reciprocal r is t bits wide, and t and the number of floating-point mantissa bits m satisfy the relation shown in equation (32).

Therefore, in computing equation (31), D × r can be calculated by the pipeline multiplier of component (1), with D as the multiplicand and r as the multiplier.

In practice, equation (31) is rewritten and computed in the form of equation (33):

$$1 + D \times (-r) \qquad (33)$$

In computing equation (33):

- The conversion of the multiplier from r to −r is performed by bit inverting circuit 81 in FIG. 6, which takes the one's complement.
- The addition of the value 1 is performed by adding the output of value-1 generating circuit 82 together with the multiples generated by multiple generating circuit 75 when they are summed in CSA tree 70.

Through the above processing, the value of equation (33) is obtained in data register 76. This processing is pipelined: as the divisor D is supplied element after element at a pitch of one machine cycle over data buses 13 and 19, at the moment the result for the first element is set in data register 76, the partial products for the second element are in data registers 70 and 71 and the third element is set in data register 40.

Next, equation (17) is computed from the result of equation (33). This computation uses the pipeline multiplier of component (2) in the pipeline division additional circuit 4. Equation (17) is rewritten as equation (34):

$$1 + (1 - D \times r) \times \bigl(1 + (1 - D \times r)\bigr) \qquad (34)$$

That is, data select circuits 32 and 33 are controlled to select the value of data register 76. The value (1 − D × r) obtained in data register 76 is set in data register 49 as the multiplier, and the value (1 + (1 − D × r)), obtained by passing the value of data register 76 through +1 circuit 83, is set in data register 48 as the multiplicand.

When the multiplicand and multiplier are set in data registers 48 and 49, pipelined multiplication starts using multiple generating circuits 61 and 62, CSA trees 64 and 65, and parallel adder 67.

The addition of the value 1 in equation (34) uses the same technique as in the computation of equation (33): the output of value-1 generating circuit 84 is added together with the multiples generated by multiple generating circuit 61 when they are summed in CSA tree 64.

Through the above processing, the value r1 of equation (17) is obtained in data register 77.

The first-order approximate reciprocal r read from storage circuit 80 advances through data registers 42 to 47 in step with the multiplication that produces the corresponding value of equation (17). That is, when the result r1(i) of equation (17) for the i-th element (i is a natural number) is set in its data register, the first-order approximate reciprocal r(i) for the same i-th element is set in data register 47.

In computing equation (34), the magnitude of the multiplier 1 − D × r is bounded by an earlier equation (its number is illegible in the source), so t bits are enough to hold it. By the relation of equation (32), the multiplier width of this pipeline multiplier is therefore sufficient.

Step 3: The multiplication that yields R0 from r and r1 is performed by pipeline multiplier 3.

Data bus select circuit 30 is controlled to select data bus 17 and data bus select circuit 31 to select data bus 18, so that the value r obtained in data register 47 and the value r1 obtained in data register 77 are taken into data registers 100 and 101, respectively. Once the data is in data registers 100 and 101, pipeline multiplier 3 operates as described for FIG. 4, the multiplication is processed in a pipeline, and the result R0 is obtained in data register 102 and sent out over data bus 15 as the result of the VER instruction.

In the VER processing described above, the whole sequence from setting the input divisor D in data register 40 to obtaining the result R0 in data register 102 is pipelined. When vector data is supplied element after element at one machine cycle pitch over data buses 13 and 19, once the result for the first element has been sent out over data bus 15, subsequent results are sent out one per machine cycle.

(2) Processing of the VED instruction

The VED instruction takes as input data the dividend N, the divisor D, and R0, the result of the VER instruction, performs Steps 4 to 6 above, and outputs the quotient Q as output data. Processing is done by operating pipeline multiplier 1 and the pipeline multiplier 2 with additional division mechanism of FIG. 5 in conjunction. Each step is detailed below.

Step 4: The following operation is performed by pipeline multiplier 1:

$$R_1 = 2 - D \times R_0$$

In FIG. 6, the divisor D is supplied over data bus 10 and the approximate reciprocal R0 over data bus 11, element after element at a pitch of one machine cycle. When the divisor D and the approximate reciprocal R0 are set in data registers 200 and 201, respectively, pipeline multiplier 1 operates as described for FIG. 4, the multiplication D × R0 is processed in a pipeline, and the result is obtained in data register 202.

Subtracting the value of D × R0 from 2 to obtain R1 is equivalent to taking the two's complement of D × R0, which is done by bit inverting circuit 21 and +1 circuit 22.

The value R1 obtained in this way is sent over data bus 16 to the pipeline multiplier 2 with additional division mechanism.

Step 5: The following multiplication is performed by the pipeline division additional circuit 4:

$$N_1 = N \times R_0$$

In FIG. 5, the dividend N is supplied over data buses 13 and 19 and the approximate reciprocal R0 over data buses 14 and 20, element after element at a pitch of one machine cycle, and they are set in data registers 40 and 41, respectively.

In this step, data bus select circuit 32 is controlled to select data register 78, and data bus select circuit 33 to select the value of data register 79. The dividend N set in data register 40 is set in data register 48 via data register 78 and data bus select circuit 32. Likewise, the approximate reciprocal R0 set in data register 41 is set in data register 49 via data register 79 and data bus select circuit 33.

With N in data register 48 as the multiplicand and R0 in data register 49 as the multiplier, the multiplication N × R0 is carried out in a pipeline using multiple generating circuits 61 and 62, CSA trees 64 and 65, and parallel adder 67, and the result N1 is obtained in data register 77.

In this multiplication, the multiplier R0 has the precision given by an earlier equation (its number is illegible in the source); that is, it has three times the precision of the first-order approximate reciprocal r. Since r is expressed in t bits, expressing R0 in 3 × t bits loses no precision. Because t and the number of significant mantissa digits m are related by equation (32), the multiplier width available in this pipeline multiplier poses no problem.
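The factor of three can be checked with a short derivation. The symbol ε is introduced here only for illustration; the other quantities are those of Steps 1 to 6.

```latex
\begin{aligned}
\varepsilon &= 1 - D r, \qquad r_1 = 1 + \varepsilon + \varepsilon^2, \qquad R_0 = r\, r_1,\\
D R_0 &= (1 - \varepsilon)(1 + \varepsilon + \varepsilon^2) = 1 - \varepsilon^3,\\
R_1 &= 2 - D R_0 = 1 + \varepsilon^3,\\
Q &= (N R_0)\, R_1 = \frac{N}{D}\,(1 - \varepsilon^3)(1 + \varepsilon^3) = \frac{N}{D}\,(1 - \varepsilon^6).
\end{aligned}
```

If the table guarantees |ε| ≤ 2^{-t}, then R0 differs from 1/D by a relative error of at most 2^{-3t}, which is the sense in which R0 carries three times the precision of r, and the quotient after Step 6 has a relative error of at most 2^{-6t} before rounding.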

Steps 4 and 5 are executed in synchronization. That is, when input data is supplied element after element at one machine cycle pitch, the i-th divisor D, the i-th dividend N, and the i-th approximate reciprocal R0 are set in data registers 200, 100, 101, and 201 at the same time. Furthermore, pipeline multiplier 1 and the pipeline division additional circuit 4 are configured so that two delays are both five machine cycles: the time from when the i-th divisor D and approximate reciprocal R0 are set in data registers 200 and 201 until the value of D × R0 is set in data register 202 in Step 4, and the time from when the i-th dividend N and approximate reciprocal R0 are set in data registers 40 and 41 until the Step 5 result N1 is set in data register 77. Consequently, in Step 6 described next, the i-th R1 and the i-th N1 are set in data registers 100 and 101, respectively, at the same time.

Step 6: The multiplication that yields the quotient Q is processed by pipeline multiplier 3. During execution of the VED instruction, data bus select circuit 30 in FIG. 6 is controlled to select data bus 16, and data bus select circuit 31 to select data bus 18.

The Step 4 result R1, computed in a pipeline by pipeline multiplier 1, is sent over data bus 16 element after element at one machine cycle pitch and set in data register 100. The Step 5 result N1, computed in a pipeline by the pipeline division additional circuit, is sent over data bus 18 element after element at one machine cycle pitch and set in data register 101. As noted above, the i-th R1 is set in data register 100 at the same time as the i-th N1 is set in data register 101.

When R1 and N1 are set in data registers 100 and 101, respectively, pipeline multiplier 3 operates as described for FIG. 4, the multiplication is carried out in a pipeline, and the result is obtained in data register 102. The data in data register 102 is sent out over data bus 15, element after element at one machine cycle pitch, as the quotient Q, the result of the VED instruction.

In the VED processing described above, the whole sequence from setting the input data (the divisor D, the dividend N, and the approximate reciprocal R0) in data registers 200, 100, 201, and 101 to obtaining the output quotient Q in data register 102 is pipelined. When input data is supplied element after element at one machine cycle pitch, once the result for the first element has been sent out over data bus 15, subsequent results are sent out one per machine cycle.
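Steps 1 to 6 can be put together in a short numerical sketch. It is an illustration under stated assumptions, not the hardware: mantissas are taken to lie in [0.5, 1), the table holds midpoint reciprocals, floating point replaces the fixed-point datapath, and the names `T`, `TABLE`, `ver`, `ved`, and `vector_divide` are hypothetical.

```python
T = 8  # assumed number of mantissa bits used as the table address

# Approximate reciprocal table for mantissas in [0.5, 1); the lower half is unused.
TABLE = [0.0 if a < (1 << (T - 1)) else (1 << T) / (a + 0.5)
         for a in range(1 << T)]


def ver(d):
    """VER: Steps 1 to 3, divisor d -> approximate reciprocal R0."""
    r = TABLE[int(d * (1 << T))]      # Step 1: table lookup
    e = 1.0 + d * (-r)                # Step 2: 1 - d*r in the form of equation (33)
    r1 = 1.0 + e * (1.0 + e)          # Step 2: equation (34)
    return r * r1                     # Step 3: R0 = r * r1


def ved(n, d, r0):
    """VED: Steps 4 to 6, (n, d, R0) -> quotient Q."""
    big_r1 = 2.0 - d * r0             # Step 4: R1 = 2 - D * R0
    n1 = n * r0                       # Step 5: N1 = N * R0
    return n1 * big_r1                # Step 6: Q = N1 * R1


def vector_divide(ns, ds):
    r0s = [ver(d) for d in ds]                               # one VER over the vector
    return [ved(n, d, r0) for n, d, r0 in zip(ns, ds, r0s)]  # one VED over the vector


print(vector_divide([0.5, 0.75, 0.6], [0.5, 0.9, 0.7]))
```

In the hardware, Steps 4 and 5 run side by side in pipeline multiplier 1 and the additional circuit; the sketch runs them one after the other only because it is sequential code.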

In the embodiment of the present invention described above with FIG. 5, division of vector data by the reciprocal approximation method, which obtains the quotient by repeated multiplication, is performed by executing two instructions in succession, the VER instruction and the VED instruction, on the circuit shown in FIG. 5. Both the VER instruction and the VED instruction are pipelined, and each produces one result per machine cycle. In the embodiment of FIG. 5, therefore, the quotient Q is obtained at an effective rate of one result per two machine cycles.
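The effective rate of one quotient per two machine cycles can be illustrated with a simple cycle count. The sketch assumes that the VER and VED instructions run back to back without overlapping, and the latency values are made-up placeholders; the source gives no total latency for either instruction.

```python
def pipelined_cycles(length, latency):
    """Cycles for one pipelined vector instruction: fill once, then one result per cycle."""
    return latency + length - 1


def division_cycles(length, ver_latency, ved_latency):
    """VER followed by VED, assuming the two instructions do not overlap."""
    return (pipelined_cycles(length, ver_latency)
            + pipelined_cycles(length, ved_latency))


length = 1000
total = division_cycles(length, ver_latency=12, ved_latency=10)  # hypothetical latencies
print(total / length)  # approaches 2 cycles per quotient as the vector gets longer
```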

FIG. 6 shows one embodiment of a vector processing device that includes the vector division circuit shown in FIG. 5. In FIG. 6, pipeline multiplier 1, the pipeline multiplier 2 with additional division mechanism, and data buses 10 to 16 correspond to those of FIG. 5. Main memory 100 holds vector data and vector instruction sequences. 200 is a group of vector registers, placed between main memory and the pipeline arithmetic units to hold vector data temporarily. In the embodiment of FIG. 6 there are N vector registers, numbered 0, 1, 2, ..., N−1. Each vector register can hold vector data of up to a fixed maximum number of elements (the symbol for this maximum is illegible in the source). Data buses 101 to 105 transfer data between main memory and the vector registers.

206 is the vector register read/write control circuit, which controls how the data buses between the vector registers and the pipeline arithmetic units are connected.

Data buses 201 to 205 are the data buses between the vector registers and the vector register read/write control circuit.

300 is the vector instruction register (Vector Instruction Register, abbreviated VIR), which temporarily holds a vector instruction read from main memory over data bus 304.

301 is a circuit that decodes the vector instruction held in vector instruction register 300. Signal line 302 notifies the vector register read/write control circuit of the decoding result, and signal line 303 controls data select circuits 30, 31, 32, and 33 in the pipeline multiplier 2 with additional division mechanism shown in FIG. 5.

Although the embodiment of FIG. 6 shows only the two pipeline arithmetic units involved in division processing, other pipeline arithmetic units may also be present.

FIG. 7 shows an example of a vector instruction sequence for executing division on the vector processing device shown in FIG. 6. In FIG. 7, the first two instructions are Vector Load instructions (abbreviated VLD) that load the vector data in main memory, the dividend N and the divisor D, into vector registers 0 and 1, respectively. The third instruction is the VER instruction described above: it reads the divisor D loaded into vector register 1, computes the approximate reciprocal R0, and stores the result in vector register 2. The fourth instruction is the VED instruction described above: it reads the dividend N, the divisor D, and the approximate reciprocal R0 placed in vector registers 0, 1, and 2 by the first three instructions, computes the quotient Q, and stores the result in vector register 3. The VED instruction does not name vector register 2, which holds R0. This is because its operand convention assumes that the approximate reciprocal R0 of the divisor D is held in the vector register numbered one higher than the register holding D, which reduces the number of operands to be specified.
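The instruction sequence of FIG. 7, including the implicit R0 operand of VED, can be mimicked with a toy interpreter. Everything below is a hypothetical sketch: the tuple encoding of instructions, the `run` function, and the crude seed in `approx_reciprocal` are illustrative stand-ins, not the instruction format or the table of the patent.

```python
def approx_reciprocal(d):
    """Stand-in for VER: a crude seed refined as in Steps 2 and 3."""
    r = round(1.0 / d, 2)             # hypothetical seed, not the patent's table
    e = 1.0 - d * r
    return r * (1.0 + e * (1.0 + e))


def run(program, memory, num_regs=8):
    vr = [None] * num_regs            # vector registers 0 .. num_regs - 1
    for op, dst, src in program:
        if op == "VLD":
            vr[dst] = list(memory[src])
        elif op == "VER":
            vr[dst] = [approx_reciprocal(d) for d in vr[src]]
        elif op == "VED":
            n_reg, d_reg = src
            r0_reg = d_reg + 1        # implicit operand: the register after the divisor
            vr[dst] = [n * r0 * (2.0 - d * r0)
                       for n, d, r0 in zip(vr[n_reg], vr[d_reg], vr[r0_reg])]
    return vr


memory = {"N": [0.5, 0.75, 0.6], "D": [0.5, 0.9, 0.7]}
program = [
    ("VLD", 0, "N"),        # vector register 0 <- dividend N
    ("VLD", 1, "D"),        # vector register 1 <- divisor D
    ("VER", 2, 1),          # vector register 2 <- R0 computed from register 1
    ("VED", 3, (0, 1)),     # vector register 3 <- Q; R0 is read from register 1 + 1
]
print(run(program, memory)[3])
```

The convention means the compiler must place the VER result directly after the divisor register, in exchange for a VED encoding with one operand field fewer.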

Next, the execution of the vector instruction sequence of FIG. 7 on the vector processing device of FIG. 6 is described.

The first two instructions in FIG. 7, the vector loads, have no particular bearing on the present invention, so their description is omitted.

(1) Processing of the VER instruction

When the third instruction of FIG. 7, the VER instruction, is read from main memory 100 over data bus 304, it is set in vector instruction register 300 and passed at once to vector instruction decoding circuit 301. Once vector instruction decoding circuit 301 has decoded the instruction, it directs the vector register read/write control circuit, over signal line 302, to connect data bus 202 with data bus 13 and data bus 203 with data bus 15, to read data from vector register 1, and to write data to vector register 2. Over signal line 303 it directs the pipeline multiplier with additional division mechanism to process a VER instruction. The divisor D is then read element after element from vector register 1 and supplied over data buses 202 and 13 to the pipeline multiplier with additional division mechanism; R0 is computed in a pipeline and written element after element to vector register 2 over data buses 15 and 203.

(2) Processing of the VED instruction

Exactly as with the VER instruction above, the fourth instruction of FIG. 7, the VED instruction, is read from main memory 100 and decoded by vector instruction decoding circuit 301. Once the instruction has been decoded, the circuit directs the vector register read/write control circuit, over signal line 302, to connect data bus 201 with data bus 13, data bus 202 with data bus 10, and data bus 203 with data buses 11 and 14, and to read vector registers 0, 1, and 2 and write vector register 3. Over signal line 303 it directs the pipeline multiplier with additional division mechanism to process a VED instruction. The dividend N, the divisor D, and the approximate reciprocal R0 are then read element after element from vector registers 0, 1, and 2, respectively, and supplied over data buses 201 and 13, 202 and 10, and 203 and 11 and 14 to pipeline multiplier 1 and the pipeline multiplier 2 with additional division mechanism. The quotient Q is computed in a pipeline and written element after element to vector register 3 over data buses 15 and 204.

As described above, in this embodiment the pipeline multipliers used for ordinary multiplication are reused, division of vector data is processed in a pipeline using two instructions, the VER instruction and the VED instruction, and only one vector register is needed to hold R0 as the intermediate result on the way to the quotient Q.

[Effect of the invention]

As described above, when a vector processing device divides vector data by the reciprocal approximation method, which obtains the quotient by repeated multiplication, the repeated multiplications needed to reach the quotient were conventionally performed with an ordinary multiplier using ordinary multiply instructions and the like. In the present invention, by contrast, a path is provided between the two pipeline multipliers used for ordinary pipelined multiplication so that the output of one multiplier is fed directly to the input of the other, and one pipelined additional circuit dedicated to division, similar in structure to an ordinary pipeline multiplier, is attached so that it shares its input data port with a pipeline multiplier. Division of vector data can thus be processed at high speed in a pipeline without providing a large-scale circuit dedicated to division in the vector processing device.

[Brief explanation of drawings]

FIG. 1 shows a conventional floating-point data representation format. FIG. 2 illustrates the concept of precision improvement in conventional division processing. FIG. 3 shows the floating-point data representation format handled in an embodiment of the present invention. FIG. 4 is a block diagram of a pipeline multiplier handled in an embodiment of the present invention. FIGS. 5 and 6 are block diagrams showing an embodiment of the present invention. FIG. 7 shows a vector instruction sequence handled in an embodiment of the present invention.

1: pipeline multiplier; 2: pipeline multiplier with additional division mechanism; 3: pipeline multiplier; 4: pipeline division additional circuit.

Claims

A vector processing device that is equipped with at least two pipelined multipliers, provided to speed up the processing of vector data multiplication instructions, and that adopts a method of performing division by repeated multiplication, characterized in that the two multipliers are treated as a pair, and there are provided a data bus for feeding the output of one multiplier directly to the input of the other multiplier and an additional circuit dedicated to division processing that shares a data input port with the latter multiplier, whereby division of vector data is processed in a pipeline by operating the two multipliers and the division-dedicated additional circuit in conjunction.