JPH0329068A

JPH0329068A - Parallel data processing system

Info

Publication number: JPH0329068A
Application number: JP1165025A
Authority: JP
Inventors: Hiroyuki Miyata; 宮田　裕行
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1989-06-27
Filing date: 1989-06-27
Publication date: 1991-02-07
Anticipated expiration: 2010-06-28
Also published as: JPH0760430B2

Abstract

PURPOSE: To perform the digit alignment and normalization steps of floating-point arithmetic at high speed by letting each PE (basic arithmetic element) read its register file at an address that depends on that PE's own data. CONSTITUTION: Each PE 1 contains a register file 2, an RB register 3, an RC register 4, an RA register 5, an address register 6, an address switching selector 7, a decision circuit 8, a flag register 9, an ALU (arithmetic logic unit) input switching selector 10, a NOT gate 11, and an ALU 12. In floating-point addition and subtraction, the mantissa of the operand with the smaller exponent is read out aligned to the operand with the larger exponent. In floating-point arithmetic in general, leading '0' digits of the mantissa are skipped and read-out starts from the most significant non-'0' digit. As a result, digit alignment and normalization are carried out at high speed.

Description

[Detailed Description of the Invention]

[Field of Industrial Application]
This invention relates to a processing method for executing floating-point operations in an SIMD (Single-Instruction Multi-Data stream) parallel data processing device, in which a plurality of basic arithmetic elements of the same type (hereinafter called PEs) perform the same operation under the control of a single control unit.

[Prior Art]
As shown in FIG. 11, a parallel data processing device of this kind has an arithmetic unit 14 controlled by a single control unit 13, and the arithmetic unit 14 consists of a plurality of PEs 1 of the same type. All PEs 1 perform the same operation under the control of the control unit 13. The interconnection scheme used for data transfer between PEs 1 is not restricted here.

FIG. 12 is a block diagram showing the internal configuration of one PE of the conventional parallel data processing device described in K. E. Batcher, "Design of a Massively Parallel Processor", IEEE Transactions on Computers, Vol. C-29, No. 9, Sep. 1980, pp. 836-840. In the figure, 15 to 20 are 1-bit registers that each hold data: 15 is the A register, 16 the B register, 17 the C register, 18 the P register, 19 the G register, and 20 the S register. 21 is a shift register of arbitrary bit length, 22 is a 1-bit full adder, 23 is a data bus, and 24 is a memory.

The operation is as follows. The values of the A register 15, the P register 18, and the C register 17 are added by the 1-bit full adder 22; the resulting sum is stored in the B register 16 and the carry in the C register 17. Where required, the value of the B register 16 is shifted by the shift register 21 and stored in the A register 15.
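The bit-serial datapath described above can be illustrated with a short Python sketch. It is only a model under stated assumptions: the function names and the LSB-first iteration are illustrative and do not come from the cited paper or from this patent. It shows one PE combining A, P and C in a 1-bit full adder, with the sum going to B and the carry going back to C.

```python
def full_adder(a, p, c):
    """1-bit full adder: returns (sum bit, carry bit)."""
    s = a ^ p ^ c
    carry = (a & p) | (a & c) | (p & c)
    return s, carry


def bit_serial_add(x, y, width):
    """Add two unsigned integers one bit per step, least significant bit first.

    Hypothetical helper: x plays the role of the A-register stream,
    y the P-register stream, and c the C register.
    """
    c = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        p = (y >> i) & 1
        b, c = full_adder(a, p, c)   # sum -> B register, carry -> C register
        result |= b << i
    return result, c
```

Because every operand bit takes one step, any data-dependent shift in such a PE also costs time proportional to the shift amount, which is the cost the invention targets.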

As shown in FIG. 11, the PEs 1 are generally connected in a two-dimensional grid, and data transfer between PEs 1 is performed through the P register 18. The G register 19 is compared with the P register 18 for equality, and the result is sent to the data bus 23. Data is exchanged with the memory 24 over the data bus 23. Data input from and output to the outside passes through the S register 20 of each PE 1.

Because every PE 1 is controlled by control signals issued from the single control unit 13, all PEs 1 perform the same operation. To make individual PEs 1 behave differently (the data shifts needed for digit alignment and for normalization in floating-point addition and subtraction have shift amounts that depend on the data values), the shift register 21 is used to vary the shift amount for each PE 1.

[Problems to be Solved by the Invention]
Because the conventional parallel data processing device is configured as described above, whenever its operation has to vary with the data, as in digit alignment or normalization for floating-point addition and subtraction, there is no alternative to using the shift register 21, and the processing takes a very long time.

This invention was made to solve this problem, and its object is to provide a parallel data processing method that can process floating-point operations in each PE at high speed.

[Means for Solving the Problems]
The parallel data processing method according to this invention provides, in each PE: an addressable register file; an address register for setting the address into this register file individually for each PE; a 2-bit flag register representing the state within each PE; a decision circuit that determines whether the value of the address register is '0' and, when it is '0', inverts the value of one bit of the flag register; a selector that, according to the values of the flag register, selects whether to output the data read from the register file at the address given by the address register unchanged, to output it with every bit inverted, or to output all bits '0' or all bits '1'; and an arithmetic circuit that receives the data read from the register file at an externally supplied address and the data passed through the selector.

In floating-point addition and subtraction in each PE, when the exponents of the operands differ, a value based on the difference is stored in the address register, and the mantissa of the operand with the smaller exponent is read from the register file aligned to the operand with the larger exponent. This performs digit alignment in which the address read from the register file differs according to the data in each PE. In floating-point arithmetic, when the upper digits of an operand's mantissa are '0', a value based on the number of such digits is stored in the address register, and the mantissa is read from the register file starting from its most significant non-'0' digit. This performs normalization in which the address read from the register file differs according to the data in each PE.

[Operation]
By using the above means in each PE, the parallel data processing method of this invention performs digit alignment within each PE at high speed in floating-point addition and subtraction, because the mantissa of the operand with the smaller exponent is fetched already aligned to the operand with the larger exponent. It likewise performs normalization in each PE at high speed in floating-point arithmetic, because the leading '0' digits of the mantissa are skipped and the mantissa is fetched starting from its most significant non-'0' digit.
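The selector and decision circuit named above can be sketched in Python as follows. This is a hedged illustration, not the patented circuit: the mapping from the flag pair (F1, F2) to the four selector outputs is an assumption inferred from the behavior described in the text (the truth table itself is in a figure that is not reproduced here), and the names `selector`, `decide` and the `width` parameter are hypothetical.

```python
def selector(rc, f1, f2, width=4):
    """Choose one ALU input from the value read via the address register.

    Assumed encoding of (F1, F2):
      (0, 0) -> value unchanged
      (0, 1) -> every bit inverted
      (1, 0) -> all bits '0'
      (1, 1) -> all bits '1'
    """
    mask = (1 << width) - 1
    if (f1, f2) == (0, 0):
        return rc & mask
    if (f1, f2) == (0, 1):
        return ~rc & mask
    if (f1, f2) == (1, 0):
        return 0
    return mask


def decide(address, f1):
    """Decision circuit: invert flag F1 when the address register holds 0."""
    return f1 ^ 1 if address == 0 else f1
```

Under this assumed encoding, flipping F1 turns "pass the value" into "all zeros" for addition and "pass the inverse" into "all ones" for subtraction, so the digits beyond the end of the shifted operand behave like zero padding in both cases.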

[Embodiment]
An embodiment of this invention is described below with reference to the drawings. The overall configuration is the same as that shown in FIG. 11.

FIG. 1 is a block diagram showing the internal configuration of each PE 1 in the embodiment. In the figure, 2 is an addressable register file, organized for example as 4 bits × 32 words as shown in FIG. 2 (addresses in FIG. 2 are written in hexadecimal). A value in register file 2 can be read or written at an externally supplied address. On a read, two addresses can be supplied at once and the two corresponding values are read simultaneously; a write to a supplied address can take place at the same time as the reads. In FIG. 1, the write address is the externally supplied ARA. One read address is the externally supplied ARB; the other is either the externally supplied ARC or the value of address register 6 (described below), as chosen by selector 7.

3 and 4 are the RB register and the RC register, which hold the two values read simultaneously from register file 2, and 5 is the RA register, which holds the value to be written to register file 2. When register file 2 is organized as in FIG. 2, the RA register 5, RB register 3, and RC register 4 are all 4 bits wide. 6 is the address register, used when the address of the value read into RC register 4 is to be set individually for each PE; with the register file 2 of FIG. 2 it is a 5-bit register. 7 is the selector that chooses the address of the value read into RC register 4 from the external address ARC and the value of address register 6. 8 is a decision circuit with two functions: it adds +1 or -1 to the contents of address register 6 and writes the result back to address register 6, and it tests the value of address register 6 and, if that value is '0', sets flag F1 of flag register 9 (described below) to '1'. 9 is a 2-bit flag register holding flags F1 and F2, which supply the control signals for selector 10. 10 is the selector that determines one data input of ALU 12; its four candidates are the value of RC register 4, that value with every bit inverted through NOT gate 11, all bits '0', and all bits '1'. Selector 10 is controlled by flags F1 and F2 of flag register 9 according to the truth table shown in FIG. 3. 12 is the ALU (arithmetic logic unit), which performs addition, subtraction, logical operations, and so on, on its input data. With the register file 2 of FIG. 2 it is a 4-bit ALU.

The operation of each PE 1 of the parallel data processing device configured as above is described in detail below.

FIG. 4 shows the floating-point format used in this embodiment. A number is 32 bits wide and consists, from the most significant bit down, of the sign of the mantissa (S; 1 bit), the exponent (E; 7 bits), and the mantissa (M; 24 bits). The exponent is stored with a bias of 64 added to the actual value, the mantissa is an absolute value, and the radix is 16. The value N of the floating-point number shown in FIG. 4 is therefore N = M × 16^(E - 64).

The feature of this invention is that digit alignment and normalization for floating-point addition and subtraction, whose shift amounts differ from one datum to the next, are performed at high speed on an SIMD parallel data processing device. Digit alignment and normalization are described separately below.

(1) Digit alignment
When two floating-point numbers are added or subtracted, their mantissas must be aligned so that the two exponents become equal (see FIG. 5). To achieve this, the mantissa of one number has to be shifted by an amount determined by the difference between the two exponents before the operation is carried out.
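As a small illustration of the 32-bit format described above (1-bit sign S, 7-bit exponent E with a bias of 64, 24-bit mantissa M, radix 16), the following Python sketch splits a word into its fields and evaluates N = M × 16^(E - 64). The function name `decode` is hypothetical, and the sketch assumes that M is read as a 24-bit fraction with the radix point at its left end; the text does not state where the radix point lies.

```python
def decode(word):
    """Split a 32-bit word into (S, E, M) and compute its numeric value.

    Assumption (not stated in the source): M is a fraction, 0 <= M / 2**24 < 1.
    """
    s = (word >> 31) & 0x1        # sign of the mantissa
    e = (word >> 24) & 0x7F       # exponent, stored with a bias of 64
    m = word & 0xFFFFFF           # mantissa magnitude, six hexadecimal digits
    magnitude = (m / 2**24) * 16.0 ** (e - 64)
    value = -magnitude if s else magnitude
    return s, e, m, value
```

For example, `decode(0x41100000)` yields E = 65 and M = 0x100000, which under the fraction assumption evaluates to 1.0. Because the radix is 16, an exponent difference of d between two operands corresponds to shifting one mantissa by d hexadecimal digits, that is, by 4d bits.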
Conventional SIMD parallel data processing devices either could not perform this processing at all or had to shift each datum with the shift register 21. In this invention it is done as follows.

Let the mantissas of the two numbers to be added (or subtracted) be written X0 to X5 and Y0 to Y5, as shown in FIG. 6(b), where each Xi and Yi is a 4-bit digit. The mantissa whose exponent is smaller (the one to be shifted) is stored in order at addresses '0' to '5' of register file 2, as shown in FIG. 6(a). The other mantissa is stored in another area of register file 2 (here, addresses '1A' to '1F'). Assume the difference between the two exponents is 2. Because the radix is 16, adding these two mantissas means prepending 8 bits of '0' to Y0 to Y5, that is, shifting the whole value right by 8 bits, and then adding, as shown in FIG. 6(b). The addition starts at the digit position X5 + Y3 and proceeds up to the most significant digit.

The number that does not need shifting (here, X0 to X5) is simply added from the least significant digit upward, so its 4-bit digits are read in order from the lowest digit from register file 2 into RB register 3 using the external address ARB of FIG. 1, and sent to ALU 12. For the number being shifted (here, Y0 to Y5), the address of the digit at which addition starts is first computed from the difference between the two exponents and stored in address register 6; this value is then used to read from register file 2 into RC register 4. In the example of FIG. 6, the value first stored in address register 6 is '3', the address corresponding to Y3. From then on, decision circuit 8 decrements address register 6 by 1 at each step, and each successive value is used as the address of the value read into RC register 4.

FIG. 7 shows a timing chart of this processing. As shown there, flags F1 and F2 of flag register 9 are each initialized to '0', so that according to FIG. 3 selector 10 selects the value of RC register 4. As described above, at t = 1 the address read into RB register 3, ARB, is '1F', and the address read into RC register 4, the value of address register 6, is '3'. X5 and Y3 are therefore read into registers 3 and 4 respectively, and at t = 2 ALU 12 computes X5 + Y3. X4 + Y2 follows at t = 3 and X3 + Y1 at t = 4. At t = 4 the value of address register 6 becomes '0'. Decision circuit 8 detects this, and flag F1 of flag register 9 changes from '0' to '1'. Selector 10, which until now has selected the value of RC register 4 and sent it to ALU 12, now selects the all-'0' value. Consequently, even though address register 6 continues to be decremented from '0' to '1F', '1E', and so on, and the values at those addresses are read into RC register 4, ALU 12 receives all bits '0'. The addition in ALU 12 is therefore X2 + Y0 at t = 5, then X1 + '0' at t = 6 and X0 + '0' at t = 7, which is the addition of the number formed by padding Y0 to Y5 with leading '0' digits, as shown in FIG. 6.

For subtraction, selector 10 must send the inverted value of RC register 4 (its one's complement) to ALU 12, so according to FIG. 3 flags F1 and F2 of flag register 9 are set to '0' and '1' respectively. When the value of address register 6 becomes '0' and flag F1 changes from '0' to '1', selector 10 selects all bits '1', and the subtraction shown in FIG. 5(b) is performed.

(2) Normalization
For a floating-point number with radix 16, normalization means removing the leading '0' digits of the mantissa in units of 4 bits, shifting the mantissa left, and filling the vacated low-order bits with '0', as shown in FIG. 8 (the exponent must be updated by the shift amount). Because the shift amount again depends on the data, normalization is also something SIMD parallel data processing devices handle poorly.

In this invention it is done as follows. Let the mantissa of the floating-point number to be normalized have '0' in its upper 4 bits followed by Z0 to Z4, as shown in FIG. 9(a). The mantissa is first stored at addresses '1A' to '1F' of register file 2, as shown in FIG. 9(a). Beforehand, the leading 4-bit '0' digits of the mantissa stored at '1A' to '1F' are searched, and the address of the most significant digit whose 4 bits are not all '0' ('1B' in FIG. 9(a)) is stored in address register 6. The value of register file 2 at the address given by address register 6 is then read into RC register 4 and sent, through ALU 12 and RA register 5, to the area of register file 2 that holds the result. Decision circuit 8 increments address register 6 by 1 at each step, and the ALU 12 input on the RB register 3 side is held at '0'.

FIG. 10 shows a timing chart of this processing. As shown there, flags F1 and F2 of flag register 9 are each initialized to '0', so selector 10 selects the value of RC register 4. At t = 1 the value of address register 6 is '1B', Z0 is read from register file 2 into RC register 4, and it is output from ALU 12 at t = 2. Z1, Z2, Z3, and Z4 are output at t = 3, 4, 5, and 6 respectively. At t = 6 the value of address register 6 becomes '0'. As described for digit alignment, flag F1 of flag register 9 then changes from '0' to '1', and from this point on the input to ALU 12 is all bits '0'. The output of ALU 12 is therefore also '0', and as shown in FIG. 9(b) the required number of 4-bit '0' digits is appended below Z0 to Z4.

With the method described above, digit alignment and normalization are executed efficiently and at high speed even on an SIMD parallel data processing device.
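The two read-out schemes described in this embodiment can be traced with a short Python simulation of one PE. It is a sketch under stated assumptions rather than a model of the actual circuit: the names are hypothetical, the start address `5 - exp_diff` generalizes the single worked example (address '3' for a difference of 2), exponent differences of six digits or more are not handled, and the exact cycle in which flag F1 takes effect is simplified to match the digit sequences given in the text, not the timing charts.

```python
WORDS = 32     # register file: 32 words of 4 bits, as in the embodiment
X_BASE = 0x1A  # the unshifted mantissa occupies 0x1A..0x1F
DIGITS = 6     # 24-bit mantissa = six hexadecimal digits


def aligned_add(x, y, exp_diff):
    """Add mantissa x to mantissa y shifted right by exp_diff hex digits.

    x, y: lists of six 4-bit digits, most significant first;
    y belongs to the operand with the smaller exponent.
    Returns (sum digits, carry out).
    """
    assert 0 <= exp_diff < DIGITS
    rf = [0] * WORDS
    rf[0:DIGITS] = y                    # Y0..Y5 at addresses 0..5
    rf[X_BASE:X_BASE + DIGITS] = x      # X0..X5 at addresses 0x1A..0x1F
    addr = (DIGITS - 1) - exp_diff      # assumed start address in the address register
    f1, carry, out = 0, 0, []
    for arb in range(X_BASE + DIGITS - 1, X_BASE - 1, -1):
        rb = rf[arb]                    # digit read via external address ARB
        rc = rf[addr] if f1 == 0 else 0  # selector: RC value, or all '0' once F1 is set
        if addr == 0:
            f1 = 1                      # decision circuit saw address 0
        addr = (addr - 1) % WORDS       # decrement with 5-bit wrap-around
        total = rb + rc + carry
        out.append(total & 0xF)
        carry = total >> 4
    out.reverse()
    return out, carry


def normalize(z):
    """Drop leading zero digits of z; return (digits, number of digits removed)."""
    rf = [0] * WORDS
    rf[X_BASE:X_BASE + DIGITS] = z
    addr = X_BASE
    while addr < X_BASE + DIGITS and rf[addr] == 0:
        addr += 1                       # search for the first non-zero digit
    removed = addr - X_BASE
    addr %= WORDS                       # an all-zero mantissa wraps to address 0
    f1 = 1 if addr == 0 else 0
    out = []
    for _ in range(DIGITS):
        out.append(rf[addr] if f1 == 0 else 0)
        addr = (addr + 1) % WORDS       # increment with 5-bit wrap-around
        if addr == 0:
            f1 = 1                      # past 0x1F: remaining digits are '0'
    return out, removed
```

With X = 0x123456, Y = 0xABCDEF and an exponent difference of 2, `aligned_add` returns the digits of 0x12E023, which equals 0x123456 + 0x00ABCD. `normalize` also returns the number of digits removed, since the exponent has to be reduced by that amount.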

The embodiment above describes digit alignment and normalization for the floating-point format shown in FIG. 4, but this invention is not limited to that format. The bit width of each field of the floating-point number may be changed freely. The radix was taken to be 16, but a radix of 2 can be handled equally well; in that case, however, register file 2, ALU 12, and the related components must be changed to handle data 1 bit wide.

[Effects of the Invention]
As described above, according to this invention each PE is provided with: an addressable register file; an address register for setting the address into this register file individually for each PE; a 2-bit flag register representing the state within each PE; a decision circuit that determines whether the value of the address register is '0' and, when it is '0', inverts the value of one bit of the flag register; a selector that, according to the values of the flag register, selects whether to output the data read from the register file at the address given by the address register unchanged, to output it with every bit inverted, or to output all bits '0' or all bits '1'; and an arithmetic circuit that receives the data read from the register file at an externally supplied address and the data passed through the selector. In floating-point addition and subtraction in each PE, when the exponents of the operands differ, a value based on the difference is stored in the address register and the mantissa of the operand with the smaller exponent is read from the register file aligned to the operand with the larger exponent, so that digit alignment is performed with a read address that differs according to the data in each PE. In floating-point arithmetic, when the upper digits of an operand's mantissa are '0', a value based on the number of such digits is stored in the address register and the mantissa is read from the register file starting from its most significant non-'0' digit, so that normalization is performed with a read address that differs according to the data in each PE. As a result, digit alignment and normalization for floating-point operations can be performed at high speed in each PE of an SIMD parallel data processing device.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the internal configuration of each PE in an SIMD parallel data processing device according to an embodiment of this invention. FIG. 2 is a diagram showing an example configuration of the register file in each PE. FIG. 3 is a truth table showing the flags of the flag register in the embodiment and the operating state of the selector controlled by their values. FIG. 4 is a diagram showing the floating-point format used in the embodiment. FIGS. 5(a) and 5(b) are diagrams showing digit alignment in floating-point addition and subtraction. FIGS. 6(a) and 6(b) are diagrams showing the positions of the data in the register file and the processing during addition when digit alignment is performed in the embodiment. FIG. 7 is a timing chart showing the operation of the digit alignment. FIG. 8 is a diagram showing normalization of a floating-point number. FIGS. 9(a) and 9(b) are diagrams showing the positions of the data in the register file and the processing when normalization is performed in the embodiment. FIG. 10 is a timing chart showing the operation of the normalization. FIG. 11 is an overall configuration diagram of an SIMD parallel data processing device. FIG. 12 is a block diagram showing the internal configuration of each PE in a conventional SIMD parallel data processing device.

1 is a PE (basic arithmetic element), 2 is a register file, 3 is an RB register, 4 is an RC register, 5 is an RA register, 6 is an address register, 7 is an address switching selector, 8 is a decision circuit, 9 is a flag register, 10 is an ALU input switching selector, 11 is a NOT gate, 12 is an ALU (arithmetic logic unit), 13 is a control unit, and 14 is an arithmetic unit. In the figures, the same reference numerals denote the same or corresponding parts.

Claims

[Scope of Claims]
A parallel data processing method for a parallel data processing device having a plurality of basic arithmetic elements of the same type that are controlled identically, characterized in that each basic arithmetic element comprises: an addressable register file; an address register for setting the address into this register file individually for each basic arithmetic element; a 2-bit flag register for representing the state within each basic arithmetic element; a decision circuit that determines whether the value of the address register is '0' and, when it is '0', inverts the value of one bit of the flag register; a selector that, according to the values of the flag register, selects whether to output the data read from the register file at the address given by the address register unchanged, to output it with every bit inverted, or to output all bits '0' or all bits '1'; and an arithmetic circuit that receives the data read from the register file at an externally supplied address and the data passed through the selector; wherein, in floating-point addition and subtraction in each basic arithmetic element, when the exponents of the operands differ, a value based on the difference is stored in the address register and the mantissa of the operand with the smaller exponent is read from the register file aligned to the operand with the larger exponent, thereby executing digit alignment in which the address read from the register file differs according to the data in each basic arithmetic element; and wherein, in floating-point arithmetic, when the upper digits of the mantissa of an operand are '0', a value based on the number of such digits is stored in the address register and the mantissa is read from the register file starting from its most significant non-'0' digit, thereby executing normalization in which the address read from the register file differs according to the data in each basic arithmetic element.