JPH0760430B2

JPH0760430B2 - Parallel data processing method

Info

Publication number: JPH0760430B2
Application number: JP1165025A
Authority: JP
Inventors: Hiroyuki Miyata (裕行宮田)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1989-06-27
Filing date: 1989-06-27
Publication date: 1995-06-28
Anticipated expiration: 2010-06-28
Also published as: JPH0329068A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Field of Application]
The present invention relates to a processing method for executing floating-point operations in a SIMD (Single-Instruction, Multiple-Data stream) parallel data processing device. In such a device, a plurality of basic arithmetic elements of the same type (hereinafter referred to as PEs) perform the same operation under the control of a single control unit.

[Prior Art]
As shown in FIG. 11, a parallel data processing device of this kind has an arithmetic unit 14 controlled by a single control unit 13. The arithmetic unit 14 consists of a plurality of PEs 1 of the same type. All PEs 1 perform the same operation under the control of the control unit 13. The connection scheme used for data transfer between the PEs 1 is not particularly limited here.

FIG. 12 is a block diagram of the internal configuration of one PE of a conventional parallel data processing device. The device is described in K. E. Batcher, "Design of a Massively Parallel Processor," IEEE Transactions on Computers, Vol. C-29, No. 9, Sep. 1980, pp. 836-840.

In the figure, 15 to 20 are 1-bit registers that hold data: 15 is the A register, 16 the B register, 17 the C register, 18 the P register, 19 the G register, and 20 the S register. 21 is a shift register of arbitrary bit length, 22 is a 1-bit full adder, 23 is a data bus, and 24 is a memory.

Next, the operation will be described. The 1-bit full adder 22 adds the values of the A register 15, the P register 18, and the C register 17. The resulting sum is stored in the B register 16 and the carry is stored in the C register 17. In some cases, the value of the B register 16 is shifted through the shift register 21 and stored in the A register 15.

As shown in FIG. 11, the PEs 1 are generally connected in a two-dimensional grid, and data transfer between them is performed through the P register 18. The G register 19 is compared with the P register 18 for a match or mismatch, and the result is sent to the data bus 23. Data is exchanged with the memory 24 over the data bus 23. Data is input from and output to the outside through the S register 20 of each PE 1.
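The bit-serial addition described above can be sketched as follows. This is a minimal Python model with several assumptions. The two operands enter the A and P registers one bit per step, least significant bit first, and the C register starts cleared. The model covers only the full adder and the B and C registers. It does not model the shift register, the memory, or grid transfers.

```python
def bit_serial_add(a_bits, p_bits):
    """Add two equal-length bit lists (least significant bit first)
    one bit per step, as a 1-bit full adder with a carry register."""
    c_reg = 0                          # C register 17 (assumed cleared at the start)
    b_bits = []                        # successive contents of the B register 16
    for a_reg, p_reg in zip(a_bits, p_bits):
        total = a_reg + p_reg + c_reg  # 1-bit full adder 22
        b_bits.append(total & 1)       # sum bit -> B register
        c_reg = total >> 1             # carry   -> C register
    return b_bits, c_reg


def to_bits(value, width):
    """Split a non-negative integer into bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Reassemble an LSB-first bit list into an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, carry = bit_serial_add(to_bits(11, 8), to_bits(6, 8))
print(from_bits(sum_bits), carry)  # 17 0
```

An n-bit addition therefore takes n steps on every PE. Any data-dependent shift must likewise be done step by step through the shift register.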

Every PE 1 is controlled by the control signal supplied from the single control unit 13, so all PEs 1 perform the same operation. Some processing must still differ from PE to PE. For example, the data shift for digit alignment in floating-point addition and subtraction, and the data shift for normalization, have shift amounts that depend on the data values. To handle this, the shift amount is varied for each PE 1 by using the shift register 21.

[Problems to Be Solved by the Invention]
The conventional parallel data processing device is configured as described above. When the operation must change with the data, as in digit alignment and normalization for floating-point addition and subtraction, the only available method is the shift register 21. As a result, the processing takes a very long time.

The present invention has been made to solve the above problem. Its object is to provide a parallel data processing method that can execute floating-point operations in each PE at high speed.

[Means for Solving the Problems]
In the parallel data processing method according to the present invention, each PE includes the following:
- an addressable register file;
- an address register that sets an address into the register file independently in each PE;
- a 2-bit flag register that represents the state within the PE;
- a determination circuit with two functions: incrementing or decrementing the content of the address register by 1 and writing the result back to the address register, and determining whether the value of the address register is '0' and, when it is, inverting one bit of the flag register;
- a selector controlled by the values of the flag register, which outputs one of the following: the data read from the register file at the address given by the address register, unchanged; that data with every bit inverted; a value of all '0' bits; or a value of all '1' bits;
- an arithmetic circuit that receives the data read from the register file at an externally supplied address, together with the data passed through the selector.

When the exponents of the operands of a floating-point addition or subtraction in a PE differ, a value based on their difference is stored in the address register. The mantissa of the operand with the smaller exponent is then read from the register file aligned to the operand with the larger exponent. Digit alignment is thus executed with register-file read addresses that differ from PE to PE according to each PE's data.

Likewise, when the mantissa of an operand in a floating-point operation has leading '0' digits, a value based on their number is stored in the address register. The mantissa is then read from the register file starting at its most significant non-'0' digit. Normalization is thus executed with read addresses that differ from PE to PE according to each PE's data.

[Operation]
The parallel data processing method according to the present invention uses the above means in each PE.

In floating-point addition and subtraction, the mantissa of the operand with the smaller exponent is fetched aligned to the operand with the larger exponent. Digit alignment in each PE is therefore executed at high speed.

In floating-point operations, the leading '0' digits of the mantissa are skipped, and the mantissa is fetched starting from its most significant non-'0' digit. Normalization in each PE is therefore executed at high speed.

[Embodiment]
An embodiment of the present invention will now be described with reference to the drawings. The overall configuration is the same as that shown in FIG. 11.

FIG. 1 is a block diagram of the internal configuration of each PE 1 in the embodiment. In the figure, 2 is an addressable register file. It is organized, for example, as 4 bits × 32 words, as shown in FIG. 2 (addresses in FIG. 2 are given in hexadecimal). A value can be read from or written to the register file 2 at an externally supplied address.

For reading, the values at two addresses supplied at the same time can be read simultaneously. A write to a given address can be performed at the same time as these reads. In FIG. 1, the write address is the externally supplied ARA. The two read addresses are ARB and a value that selector 7 chooses from either ARC or the address register 6 described below.

3 and 4 are the RB register and the RC register, which hold the two values read simultaneously from the register file 2. 5 is the RA register, which holds the value to be written to the register file 2. When the register file 2 has the organization shown in FIG. 2, the RA register 5, the RB register 3, and the RC register 4 are all 4 bits wide.

6 is an address register used to set, for each PE individually, the address of the value read from the register file 2 into the RC register 4. With the register file 2 of FIG. 2, it is a 5-bit register. 7 is a selector that chooses the address for the read into the RC register 4 from either the external address ARC or the value of the address register 6.

8 is a determination circuit with two functions. First, it increments or decrements the content of the address register 6 by 1 and writes it back to the address register 6. Second, it checks the value of the address register 6 and, if that value is '0', sets flag F1 of the flag register 9 (described below) to '1'. 9 is a 2-bit flag register holding flags F1 and F2, which supply the control signal of selector 10.

10 is a selector that determines one data input of the ALU 12 described below. It chooses among four candidates:
- the value of the RC register 4;
- that value with every bit inverted by NOT gate 11;
- all '0' bits;
- all '1' bits.

Selector 10 is controlled by flags F1 and F2 of the flag register 9 according to the truth table shown in FIG. 3.

12 is an ALU (arithmetic logic unit) that performs addition, subtraction, logical operations, and so on, on its input data. With the register file 2 of FIG. 2, it is a 4-bit ALU.
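FIG. 3 is not reproduced here. However, the alignment and subtraction walkthroughs later in this description imply the following mapping for selector 10:
- (F1, F2) = (0, 0) passes the RC value;
- (0, 1) passes its inversion;
- (1, 0) gives all '0';
- (1, 1) gives all '1'.

The sketch below models that inferred mapping together with the determination circuit 8. The function names are hypothetical. The 5-bit wraparound of the address register is an assumption, consistent with the addresses '1F', '1E', ... that follow '0' in the alignment example.

```python
DIGIT_MASK = 0xF   # 4-bit words (register file 2, RC register 4, ALU 12)
ADDR_MASK = 0x1F   # 5-bit address register 6 for a 32-word register file


def selector10(rc_value, f1, f2):
    """ALU input chosen by selector 10 (mapping inferred from the text)."""
    if f1 == 0:
        # F2 picks the RC value itself or its inversion through NOT gate 11.
        return rc_value if f2 == 0 else (~rc_value) & DIGIT_MASK
    # Once F1 is set: all '0' for addition, all '1' for subtraction.
    return 0x0 if f2 == 0 else DIGIT_MASK


def determination_circuit8(addr, step, f1):
    """Set F1 if address register 6 holds 0, then add step (+1 or -1)
    with 5-bit wraparound and return the new (address, F1)."""
    if addr == 0:
        f1 = 1
    return (addr + step) & ADDR_MASK, f1


addr, f1 = determination_circuit8(0x01, -1, 0)   # -> (0x00, 0)
addr, f1 = determination_circuit8(addr, -1, f1)  # -> (0x1F, 1)
```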

The operation of each PE 1 of the parallel data processing device configured as above will now be described in detail.

First, FIG. 4 shows the floating-point format used in this embodiment. A number is 32 bits wide. From the most significant bit, it consists of the mantissa sign (S, 1 bit), the exponent (E, 7 bits), and the mantissa (M, 24 bits). The exponent carries a bias of 64 added to its actual value, the mantissa is in sign-magnitude (absolute-value) form, and the radix is 16. The value N of the floating-point number in FIG. 4 is therefore N = M × 16^(E-64).
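The field layout can be illustrated with a small decoder. The source gives only N = M × 16^(E-64). This sketch makes two further assumptions:
- M is read as a six-digit hexadecimal fraction 0.M, as in the IBM System/360 hexadecimal floating-point format;
- S selects the sign.

Treat it as one plausible reading rather than the patent's definition.

```python
def decode_hex_float(word):
    """Decode a 32-bit word laid out as S (1 bit) | E (7 bits) | M (24 bits)."""
    s = (word >> 31) & 0x1
    e = (word >> 24) & 0x7F          # biased by 64
    m = word & 0xFFFFFF              # six hexadecimal digits
    fraction = m / float(1 << 24)    # assumption: M is the fraction 0.M
    magnitude = fraction * 16.0 ** (e - 64)
    return -magnitude if s else magnitude


print(decode_hex_float(0x41100000))  # 1.0
print(decode_hex_float(0xC2640000))  # -100.0
```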

The present invention is characterized by performing two operations at high speed on a SIMD parallel data processing device: the digit alignment and the normalization in floating-point addition and subtraction, whose shift amounts differ from datum to datum. These two operations are described separately below.

(1) Digit alignment
When two floating-point numbers are added or subtracted, their mantissas must be aligned so that the exponents are equal (see FIG. 5). To do this, one mantissa must be shifted according to the difference between the two exponents before the operation is performed. Conventional SIMD parallel data processing devices either cannot perform this processing or must shift each datum by using the shift register 21.

In the present invention, this is done as follows. Let the mantissas of the two numbers to be added (or subtracted) be X0 to X5 and Y0 to Y5, as shown in FIG. 6(b), where each Xi and Yi is a 4-bit digit.

The mantissa whose exponent is smaller (the value to be shifted) is stored in order at addresses '0' to '5' of the register file 2, as shown in FIG. 6(a). The other mantissa is stored in another area of the register file 2 (here, addresses '1A' to '1F').

Assume the difference between the exponents is 2. Because the radix is 16, adding the two mantissas then amounts to prefixing Y0 to Y5 with 8 bits of '0' and adding, as shown in FIG. 6(b). This is the same as shifting the whole of Y right by 8 bits. The addition starts at the digit X5 + Y3 and proceeds up to the most significant digit.
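Numerically, the alignment in FIG. 6(b) is a right shift of the smaller-exponent mantissa by 4 bits per unit of exponent difference, followed by an ordinary addition. The reference function below computes that result directly. Its name is hypothetical, Python integers stand in for the 24-bit mantissas, and digits are listed most significant first. Digits shifted past Y5's position are discarded, since the addition starts at X5 + Y3 and keeps no guard digits.

```python
def digits_to_int(digits):
    """Combine hexadecimal digits, most significant first, into an integer."""
    value = 0
    for digit in digits:
        value = (value << 4) | digit
    return value


def aligned_add_reference(x_digits, y_digits, d):
    """Add two 6-digit mantissas after shifting Y right by d digits (4*d bits).
    Digits shifted below the last position are discarded."""
    x = digits_to_int(x_digits)
    y_shifted = digits_to_int(y_digits) >> (4 * d)
    return x + y_shifted   # may exceed 24 bits if the top digit carries out


x = [0x1, 0x2, 0x3, 0x4, 0x5, 0x6]           # X0 .. X5
y = [0xA, 0xB, 0xC, 0xD, 0xE, 0xF]           # Y0 .. Y5
print(hex(aligned_add_reference(x, y, 2)))   # 0x12e023
```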

The number that needs no shift (here, the value represented by X0 to X5) only has to be added digit by digit from the least significant end. Its 4-bit digits are therefore read from the register file 2 into the RB register 3 in order from the least significant digit, using the external address ARB shown in FIG. 1, and sent to the ALU 12.

The number to be shifted (here, Y0 to Y5) is handled differently. First, the address of the digit at which the addition starts is computed from the difference between the two exponents and stored in the address register 6. This value is then used to read from the register file 2 into the RC register 4. In the example of FIG. 6, the value first stored in the address register 6 is '3', the address of Y3. The determination circuit 8 then decrements the address register 6 by 1 at each step, and each successive value is used as the address of the value read into the RC register 4.

FIG. 7 shows a time chart of this processing. Flags F1 and F2 of the flag register 9 are both initialized to '0', as shown in the figure, so that, per FIG. 3, selector 10 selects the value of the RC register 4.

As described above, at t = 1 the address read into the RB register 3 (ARB) is '1F', and the address read into the RC register 4 (the value of the address register 6) is '3'. As a result, X5 and Y3 are read into registers 3 and 4, respectively, and the ALU 12 executes X5 + Y3 at t = 2. It then executes X4 + Y2 at t = 3 and X3 + Y1 at t = 4.

At t = 4, the value of the address register 6 becomes '0'. The determination circuit 8 detects this, and flag F1 of the flag register 9 changes from '0' to '1'. Selector 10, which until then had been passing the value of the RC register 4 to the ALU 12, now selects the all-'0' value. The address register 6 keeps decrementing from '0' to '1F', '1E', ..., and the values at those addresses are still read into the RC register 4, but the ALU 12 receives all '0' bits.

The ALU 12 therefore executes X2 + Y0 at t = 5, X1 + '0' at t = 6, and X0 + '0' at t = 7. This carries out the addition of Y0 to Y5 with '0' digits prefixed, as shown in FIG. 6.
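The digit-serial procedure of FIG. 7 can be mimicked in software to check that it produces the same sum as a plain shift-and-add. This sketch makes the following assumptions:
- the register-file layout is as described above, with the smaller-exponent mantissa at '0' to '5' and the other at '1A' to '1F';
- the starting address is 5 minus the exponent difference, which gives '3' for a difference of 2;
- the ALU 12 carries between successive 4-bit steps, which the text does not describe.

Cycle-level pipelining is abstracted: each iteration reads one digit pair and adds it. Exponent differences of 6 or more are outside the sketch.

```python
ADDR_MASK = 0x1F   # 5-bit address register 6
N_DIGITS = 6       # 24-bit mantissa = six 4-bit digits


def simulate_alignment_add(x_digits, y_digits, d):
    """Digit-serial aligned addition, digits listed most significant first.
    Y is the mantissa with the smaller exponent; d is the exponent difference."""
    if not 0 <= d < N_DIGITS:
        raise ValueError("sketch handles exponent differences 0..5 only")
    regfile = [0] * 32
    for i, digit in enumerate(y_digits):      # Y0..Y5 at addresses '0'..'5'
        regfile[i] = digit
    for i, digit in enumerate(x_digits):      # X0..X5 at addresses '1A'..'1F'
        regfile[0x1A + i] = digit
    ar6 = (N_DIGITS - 1) - d                  # start address, e.g. '3' (Y3) for d = 2
    arb = 0x1F                                # external address ARB, starts at X5
    f1 = 0                                    # F1 = F2 = 0: selector passes RC
    carry = 0                                 # assumption: carry held between steps
    result = 0
    for step in range(N_DIGITS):
        rb = regfile[arb]                     # RB register 3
        rc = regfile[ar6]                     # RC register 4
        operand = rc if f1 == 0 else 0x0      # selector 10: RC, then all '0'
        total = rb + operand + carry          # 4-bit ALU 12
        result |= (total & 0xF) << (4 * step)
        carry = total >> 4
        if ar6 == 0:                          # determination circuit 8 sees '0'
            f1 = 1
        ar6 = (ar6 - 1) & ADDR_MASK           # continues '1F', '1E', ... (masked out)
        arb -= 1
    return result, carry


x = [0x1, 0x2, 0x3, 0x4, 0x5, 0x6]
y = [0xA, 0xB, 0xC, 0xD, 0xE, 0xF]
value, carry_out = simulate_alignment_add(x, y, 2)
print(hex(value), carry_out)   # 0x12e023 0
```

The digits read after the address register wraps (here X5 at '1F') never reach the ALU. This is why the text can let the address register run past '0' without corrupting the sum.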

For subtraction, flags F1 and F2 of the flag register 9 are set to '0' and '1', respectively. Per FIG. 3, selector 10 then sends the bitwise-inverted value of the RC register 4 (its 1's complement) to the ALU 12. When the address register 6 reaches '0' and F1 changes from '0' to '1', selector 10 selects all '1' bits, and the subtraction shown in FIG. 5(b) is executed.
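For subtraction, the inverted digits together with the all-'1' padding form the 1's complement of the zero-padded, shifted mantissa. The sketch below also assumes, beyond what the text states, that a carry-in of 1 is applied at the least significant digit. The result is then X minus the shifted Y, modulo 16^6, and a final carry of 1 indicates that no borrow occurred. Sign handling and recomplementation of a negative result are not modeled.

```python
ADDR_MASK = 0x1F
N_DIGITS = 6


def simulate_alignment_sub(x_digits, y_digits, d):
    """Digit-serial X - (Y shifted right by d digits), digits most significant first.
    Returns the 24-bit difference modulo 16**6 and the final carry
    (1 means no borrow)."""
    if not 0 <= d < N_DIGITS:
        raise ValueError("sketch handles exponent differences 0..5 only")
    regfile = [0] * 32
    for i, digit in enumerate(y_digits):      # shifted operand at '0'..'5'
        regfile[i] = digit
    for i, digit in enumerate(x_digits):      # other operand at '1A'..'1F'
        regfile[0x1A + i] = digit
    ar6 = (N_DIGITS - 1) - d
    arb = 0x1F
    f1 = 0                                    # F2 = 1 throughout (subtraction)
    carry = 1                                 # assumption: carry-in of 1 at the LSD
    result = 0
    for step in range(N_DIGITS):
        rc = regfile[ar6]
        # Selector 10 with F2 = 1: inverted RC, then all '1' once F1 is set.
        operand = (~rc) & 0xF if f1 == 0 else 0xF
        total = regfile[arb] + operand + carry
        result |= (total & 0xF) << (4 * step)
        carry = total >> 4
        if ar6 == 0:
            f1 = 1
        ar6 = (ar6 - 1) & ADDR_MASK
        arb -= 1
    return result, carry


value, no_borrow = simulate_alignment_sub([1, 2, 3, 4, 5, 6],
                                          [0xA, 0xB, 0xC, 0xD, 0xE, 0xF], 2)
print(hex(value), no_borrow)   # 0x118889 1
```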

(2) Normalization
Normalizing a radix-16 floating-point number means removing the leading 4-bit '0' digits of the mantissa, shifting it left, and filling the low-order digits with '0', as shown in FIG. 8. The exponent must then be adjusted by the shift amount.
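The normalization of FIG. 8 can be stated as a reference function on a list of hexadecimal digits, most significant first. Shifting the mantissa left by k digits multiplies it by 16^k, so the exponent is reduced by k to keep N unchanged. The function name is hypothetical. Leaving an all-zero mantissa untouched is an assumption, because the text does not cover that case.

```python
def normalize_reference(mantissa_digits, exponent):
    """Radix-16 normalization on digits listed most significant first."""
    k = 0
    while k < len(mantissa_digits) and mantissa_digits[k] == 0:
        k += 1                                     # count leading '0' digits
    if k == len(mantissa_digits):
        return list(mantissa_digits), exponent     # all zero: left as is (assumption)
    shifted = list(mantissa_digits[k:]) + [0] * k  # shift left, '0'-fill low digits
    return shifted, exponent - k                   # N = M * 16**(E - 64) preserved


print(normalize_reference([0x0, 0x0, 0x3, 0x1, 0x4, 0x1], 66))
# ([3, 1, 4, 1, 0, 0], 64)
```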

The shift amount of normalization also depends on the data, so it is likewise a weak point of SIMD parallel data processing devices.

In the present invention, this is done as follows. Let the mantissa of the floating-point number to be normalized have its upper 4 bits equal to '0', followed by digits Z0 to Z4, as shown in FIG. 9(b).

First, the mantissa is stored at addresses '1A' to '1F' of the register file 2, as shown in FIG. 9(a). The leading 4-bit '0' digits of the stored mantissa are searched for in advance. The address of the most significant digit whose 4 bits are not all '0' ('1B' in FIG. 9(a)) is stored in the address register 6.

The value at the address indicated by the address register 6 is then read from the register file 2 into the RC register 4. It is sent through the ALU 12 and the RA register 5 to the area of the register file 2 that holds the result. The determination circuit 8 increments the address register 6 by 1 at each step, and the ALU 12 input on the RB register 3 side is held at '0'.

FIG. 10 shows a time chart of this processing. Flags F1 and F2 of the flag register 9 are both initialized to '0', as shown in the figure, so that selector 10 selects the value of the RC register 4.

At t = 1, the value of the address register 6 is '1B'. Z0 is read from the register file 2 into the RC register 4 and output from the ALU 12 at t = 2. Z1, Z2, Z3, and Z4 are then output at t = 3, 4, 5, and 6, respectively.

At t = 6, the value of the address register 6 becomes '0'. As in the digit-alignment case, flag F1 of the flag register 9 changes from '0' to '1', and from then on every bit of the ALU 12 input is '0'. The ALU 12 output is accordingly '0', and as many 4-bit '0' digits as needed are appended below Z0 to Z4, as shown in FIG. 9(b).
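The read-out of FIG. 10 can be sketched in the same style. The text does not say how the most significant non-'0' digit is located, so this sketch uses a plain search. Unused register-file words are filled with 0xF (hypothetical unrelated data) to show that the '0' padding comes from selector 10 rather than from the memory contents. Cycle timing is abstracted. The sketch reproduces only the output sequence (Z0 to Z4 followed by '0' digits) and the shift count used to adjust the exponent.

```python
ADDR_MASK = 0x1F   # 5-bit address register 6
BASE = 0x1A        # mantissa stored at '1A'..'1F'
N_DIGITS = 6


def simulate_normalize_readout(mantissa_digits):
    """Return the normalized digit sequence written back via ALU 12 / RA register 5,
    and the number of leading '0' digits skipped (used to adjust the exponent)."""
    regfile = [0xF] * 32                  # hypothetical unrelated data elsewhere
    for i, digit in enumerate(mantissa_digits):
        regfile[BASE + i] = digit
    # Address of the most significant non-'0' digit; the search method is
    # not specified in the text, so a plain scan is used here.
    ar6 = next((BASE + i for i, digit in enumerate(mantissa_digits) if digit != 0),
               None)
    if ar6 is None:
        return list(mantissa_digits), 0   # all-zero mantissa (assumption)
    shift = ar6 - BASE
    f1 = 0                                # selector 10 passes RC while F1 = 0
    out = []
    for _ in range(N_DIGITS):
        rc = regfile[ar6]                 # RC register 4
        operand = rc if f1 == 0 else 0x0  # all '0' once F1 is set
        out.append(0 + operand)           # RB-side ALU input held at '0'
        ar6 = (ar6 + 1) & ADDR_MASK       # '1F' + 1 wraps to '0'
        if ar6 == 0:                      # determination circuit 8 sets F1
            f1 = 1
    return out, shift


print(simulate_normalize_readout([0x0, 0x1, 0x2, 0x3, 0x4, 0x5]))
# ([1, 2, 3, 4, 5, 0], 1)
```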

With the method described above, digit alignment and normalization are executed efficiently and at high speed even on a SIMD parallel data processing device.

The embodiment above performs digit alignment and normalization for the floating-point format shown in FIG. 4, but the present invention is not limited to that format. The bit width of each field of the floating-point number may be changed arbitrarily. Likewise, although a radix of 16 was used, a radix of 2 can also be handled. With a radix of 2, however, the register file 2, the ALU 12, and related parts must be changed to handle 1-bit-wide data.

[Effects of the Invention]
As described above, according to the present invention, each PE includes:
- an addressable register file;
- an address register that sets an address into the register file for each PE;
- a 2-bit flag register that represents the state within the PE;
- a determination circuit that increments or decrements the content of the address register by 1 and writes it back, and that inverts one bit of the flag register when the address register is '0';
- a selector that, according to the flag-register values, outputs the data read from the register file at the address in the address register either unchanged or bitwise inverted, or outputs all '0' bits or all '1' bits;
- an arithmetic circuit that receives the data read from the register file at an externally supplied address and the data passed through the selector.

When the exponents of the operands of a floating-point addition or subtraction in a PE differ, a value based on the difference is stored in the address register. The mantissa of the operand with the smaller exponent is then read from the register file aligned to the operand with the larger exponent, so that digit alignment is performed with read addresses that differ from PE to PE according to the data.

When an operand mantissa has leading '0' digits, a value based on their number is stored in the address register. The mantissa is then read starting from its most significant non-'0' digit, so that normalization is performed with read addresses that differ from PE to PE according to the data.

As a result, each PE of a SIMD parallel data processing device can perform the digit alignment and normalization of floating-point operations at high speed.

[Brief description of drawings]

FIG. 1 is a block diagram of the internal configuration of each PE in a SIMD parallel data processing device according to an embodiment of the present invention.
FIG. 2 shows an example organization of the register file in each PE.
FIG. 3 is a truth table of the selector states controlled by the flags of the flag register and their values in the embodiment.
FIG. 4 shows the floating-point format used in the embodiment.
FIGS. 5(a) and 5(b) illustrate digit alignment in floating-point addition and subtraction.
FIGS. 6(a) and 6(b) show the positions of data in the register file and the addition performed when digit alignment is carried out in the embodiment.
FIG. 7 is a time chart of the digit-alignment operation.
FIG. 8 illustrates the normalization of a floating-point number.
FIGS. 9(a) and 9(b) show the positions of data in the register file and the processing performed when normalization is carried out in the embodiment.
FIG. 10 is a time chart of the normalization operation.
FIG. 11 shows the overall configuration of a SIMD parallel data processing device.
FIG. 12 is a block diagram of the internal configuration of each PE in a conventional SIMD parallel data processing device.

Reference numerals:
1: PE (basic arithmetic element)
2: register file
3: RB register
4: RC register
5: RA register
6: address register
7: address-switching selector
8: determination circuit
9: flag register
10: ALU-input switching selector
11: NOT gate
12: ALU (arithmetic logic unit)
13: control unit
14: arithmetic unit

Throughout the drawings, identical reference numerals denote identical or corresponding parts.

Claims

[Claims]

1. A parallel data processing method for a parallel data processing device having a plurality of basic arithmetic elements of the same type controlled by the same instruction from the outside, characterized in that each basic arithmetic element is provided with: an addressable register file; an address register for setting an address into the register file for each basic arithmetic element; a 2-bit flag register for representing the state within each basic arithmetic element; a determination circuit having a function of incrementing or decrementing the content of the address register by 1 and returning it to the address register, and a function of determining whether the value of the address register is '0' and inverting one bit of the flag register when the value is '0'; a selector that, based on the values of the flag register, selects whether to output the data read from the register file at the address given by the address register as it is, to output it with each bit inverted, or to output a value of all '0' bits or all '1' bits; and an arithmetic circuit that receives the data read from the register file at an externally input address and the data passed through the selector;
wherein, when the exponents of the operands of a floating-point addition or subtraction in a basic arithmetic element differ, a value based on the difference is stored in the address register, and the mantissa of the operand with the smaller exponent is read from the register file aligned to the operand with the larger exponent, whereby digit alignment is executed with register-file read addresses that differ according to the data in each basic arithmetic element;
and wherein, when the mantissa of an operand in a floating-point operation has leading '0' digits, a value based on their number is stored in the address register, and the mantissa is read from the register file starting at its most significant digit that is not '0', whereby normalization is executed with register-file read addresses that differ according to the data in each basic arithmetic element.