JP2967928B2 - Parallel processor

Info

Publication number: JP2967928B2
Application number: JP62151381A
Authority: JP
Inventors: 隆木村; 友雄深沢
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1987-06-19
Filing date: 1987-06-19
Publication date: 1999-10-25
Anticipated expiration: 2014-10-25
Also published as: JPS63316254A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Application Field] The present invention relates to a parallel processor that requires little hardware, is compact, uses its transfer paths efficiently, is free from transfer path contention between arbitrary arithmetic processors, and exchanges data efficiently.

[Prior Art] Among devices for mutual data transfer between a plurality of processors, the bus structure shown in FIG. 12, in which all processors share a common transfer path, requires the least hardware and allows the most compact parallel processor. However, in a parallel processor using this bus, only two processors can use the transfer path at a time, and the many other processors must suspend processing and wait until the transfer path becomes free.
As a result, as the number of processors grows, the overall effective processing time, which combines computation and transfer, increases greatly and the processing speed falls. In FIG. 12, PE1 to PE6 are processor elements, BS is a bus, and AB is a bus arbiter.

[Problems to be Solved by the Invention] On the other hand, a parallel processor equipped with a dedicated transfer device, such as the crossbar switch CS shown in FIG. 13 or a multistage switch, to eliminate data contention on the transfer path lets every processor establish a transfer path to one other processor at the same time, which makes transfers more efficient and faster. The problem with such a high-speed transfer device is that signal lines from every processor converge independently on the transfer device, the number of connecting cables becomes enormous, and the scale becomes impractical as the number of processors grows.

The present invention has been made in view of these points. Its object is to provide a parallel processor that realizes many transfer routes simultaneously, raises the utilization of each transfer path, improves the effective processing speed of the parallel processor, and combines a compact transfer device with high speed.

[Means for Solving the Problems] To achieve this object, the present invention has the following configuration.

The parallel processor has a plurality of processor elements, a one-dimensional transfer path consisting of two sets of unidirectional transfer paths (one rightward, one leftward), and a plurality of switch circuits. The processor elements and the switch circuits are arranged alternately on the one-dimensional transfer path. So that a transfer route can be formed between any two processor elements for transferring data and exchanging information, each switch circuit has means for electrically disconnecting or connecting the one-dimensional transfer path, and each processor element has a function of controlling use of the one-dimensional transfer path. In this parallel processor:

The processor element PEi connected to the transfer path Pi sandwiched between the i-th and (i+1)-th switch circuits has means QReg for generating transfer route reservation information QD for the transfer path it needs for rightward and leftward transfers.

The switch circuit SWi consists of a transfer request processing circuit RP, a transfer route determination circuit PD, and a transfer path switching circuit SC.

For the rightward transfer path Pi, the transfer request processing circuit RP receives the transfer path use request signal RReq for the transfer path Pi from the preceding switch circuit SWi-1 and the transfer path request signal LReq for the transfer path Pi from the processor element PEi-1 connected to the transfer path Pi-1. It has means PCON for generating, when a transfer path request from the transfer path Pi-1 has occurred, a request source confirmation signal ORG indicating whether the request originated from the switch circuit SWi-1 or from the processor element PEi-1, and it outputs the transfer path use request signal RReq for the transfer path Pi to the next, (i+1)-th switch circuit SWi+1. For the leftward transfer path Pi-1, it receives from the right the transfer path use request signal RReq for Pi-1 from the (i+1)-th switch circuit SWi+1 and the transfer path request signal LReq for Pi-1 from the processor element PEi connected to the transfer path Pi. It has means PCON for generating, when a transfer path request from Pi has occurred, a request source confirmation signal ORG indicating whether the request originated from the switch circuit SWi+1 or from the processor element PEi, and it outputs the transfer path use request signal RReq for Pi-1 to the switch circuit SWi-1.

The transfer route determination circuit PD receives the transfer route reservation information QD from the processor element PEi-1 or from the preceding switch circuit SWi-1, and outputs to the next switch circuit SWi+1 transfer route reservation information QD whose value depends on the transfer path use request signal RReq and the request source confirmation signal ORG. It receives from the next switch circuit SWi+1 transfer route confirmation information AD about the transfer route at that time, and outputs to the preceding switch circuit SWi-1 transfer route confirmation information AD whose value depends on RReq, ORG, and the received QD. Using RReq, ORG, the received QD, and the received AD, it determines whether the transfer path Pi to the next switch circuit SWi+1 requested from the preceding stage has been secured, and outputs a transfer path use permission signal ACK (rightward) to the processor element PEi-1 connected to the preceding transfer path Pi-1 and to the preceding switch circuit SWi-1. It likewise outputs a transfer path use permission signal ACK (leftward) for the leftward direction.

The transfer path switching circuit SC electrically connects or disconnects the transfer path requested by the preceding stage and the free transfer path of the following stage, based on the request source confirmation signal ORG (rightward and leftward) generated in the transfer request processing circuit RP, the transfer path use permission signal ACK (rightward and leftward) generated in the transfer route determination circuit PD, and the transfer route reservation information QD and transfer route confirmation information AD input to PD.

[Operation] The parallel processor according to the present invention is based on a bus structure, which has a small hardware scale, yet switch circuits inserted at distributed points allow a single signal transfer path to be divided into any number of transfer paths of any length. The transfer path itself can perform, by a simple method, the free-path management needed to divide and use the transfer path. Transfer routes can be changed dynamically in response to requests from the processors, and many transfer paths, each between two processors, can be secured at one time.

[Embodiment] The first feature of the present invention is that, as shown in FIG. 2, although a single data transfer line is shared by a plurality of processor elements PE1 to PE3, switch circuits SW1 to SW4 are inserted at distributed points along that line, and the processor elements PE1, PE2, and PE3 are connected using the transfer paths P1, P2, and P3 sandwiched between the switch circuits as buses.

The second feature is that a single transfer path can be divided arbitrarily and the parts used as independent transfer paths, and that there is means by which each processor element can carry out the transfer reservation procedure automatically, so that the device is contention-free on the basis of a schedule table of transfer path use.

The third feature is a control means that can set a transfer route dynamically as soon as the free, usable routes on the transfer path, the occurrence of a use request from a processor element, and the transfer distance to the processor with which data must be exchanged are known. Routes are set up efficiently and automatically through a simple exchange of signals. The transfer path consists of transfer lines with independent rightward and leftward directions, and each switch circuit arranged along the one dimension determines, for each transfer direction, the starting point of a transfer route from the request signal from the preceding stage and the request signal from the processor element it manages. A transfer route is thereby formed in response to transfer requests that processor elements issue completely asynchronously, regardless of how near or far apart the processor elements are.

The fourth feature is that the transfer path consists of a plurality of signal lines, and the switch circuit includes a circuit that independently connects a signal line of the preceding transfer path to a signal line of the following transfer path, based on information about which signal lines of the following transfer path are free, so that a transfer can change signal lines from one switch circuit to the next.

As an embodiment of the claimed invention, FIG. 1 shows the configuration of the parallel processor, FIG. 3 shows a switch circuit, FIG. 4 shows the processing flow of the switch circuit, FIG. 5 shows how the operating states of the switch circuits of the parallel processor change, and FIG. 6 shows the transfer path interface of a processor element.

In FIG. 1, SW1 to SW3 are switch circuits inserted in the transfer path, and PE1 to PE3 are processor elements connected to the transfer paths sandwiched between the switch circuits. DR and DL are the data transfer paths from left to right and from right to left, respectively. QDR, ADR and QDL, ADL are the transfer route reservation information QD (Query Distance) and the corresponding transfer route confirmation information AD (Acquired Distance) for the left-to-right and right-to-left directions, respectively. LReqR and LReqL are the transfer request signals (Local Req R, L) from the processor elements for the transfer paths DR and DL, respectively. Each of the switch circuits SW1 to SW3 consists of circuits for the rightward transfer path and circuits for the leftward transfer path.

FIG. 3 shows the circuits of a switch circuit SWj (j = 1, 2, ...) for the rightward transfer path: a transfer path switching circuit SC, a transfer request processing circuit RP, and a transfer route determination circuit PD. The circuits for the leftward transfer path are the same. Here the transfer path switching circuit SC is built from a combination of three-state buffers BUF, whose output can be high impedance. Based on the switching control signal S from the transfer route determination circuit PD, it electrically connects or disconnects the input-side transfer paths DRi (i = 0, 1, ..., m) and the output-side transfer paths D'Ri (i = 0, 1, ..., m). The direction subscripts R and L are omitted below.

The operation of the circuit of FIG. 3 is as follows.

1.1) The transfer request processing circuit RP receives RReqin from the preceding switch and LReqin from the connected processor element.

1.2) RReqin is stored in flip-flop FF1 as the remote path state (RPS), and LReqin is stored in flip-flop FF2 as the local path state (LPS).
When a request arrives from the preceding switch or from the connected processor element, RPS or LPS, respectively, becomes 1.

1.3) The control logic means PCON (priority controller) monitors RPS and LPS. RPS has the higher priority. When RPS = 1, PCON sets the origin signal ORG = 0 regardless of LPS (steps 4 and 3 in FIG. 4), ignores the request from the connected processor element, and outputs RReqin on RReqout. When RPS = 0, PCON looks at LPS. When LPS = 1, the connected processor element is requesting, so ORG = 1 (steps 1 and 2 in FIG. 4). When LPS = 0, there is no request, so ORG = 0 (steps 1 and 5 in FIG. 4) and RReqout = 0 is output.

Next, the operation of the circuit of FIG. 3 is described with reference to FIG. 4, which shows the basic processing of the switch circuit.
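
The priority rule of step 1.3 can be sketched in a few lines of Python. This is an illustrative model only, not the circuit of FIG. 3: the function name `pcon` and its tuple return value are assumptions, and the value RReqout = 1 for a purely local request is inferred from the later walk-through (the next switch sees a request after an origin occurs) rather than stated in step 1.3.

```python
def pcon(rps: int, lps: int) -> tuple[int, int]:
    """Hypothetical model of the priority controller PCON.

    rps: remote path state (request latched from the preceding switch)
    lps: local path state (request latched from the connected processor element)
    Returns (org, rreq_out).
    """
    if rps:
        # A remote request always wins: the local request is ignored
        # and the incoming request is passed on to the next switch.
        return 0, 1
    if lps:
        # Only the connected processor element is requesting, so this
        # switch is the origin of the route.
        # rreq_out = 1 is an inference, not stated in step 1.3.
        return 1, 1
    # No request at all.
    return 0, 0


# Print the full truth table of the sketch.
for rps in (0, 1):
    for lps in (0, 1):
        org, rreq_out = pcon(rps, lps)
        print(f"RPS={rps} LPS={lps} -> ORG={org} RReqout={rreq_out}")
```

The table makes the asymmetry visible: ORG is 1 in exactly one of the four cases, when the local request is the only one present.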
Depending on whether a transfer path use request is present from the processor element or the preceding switch, and on whether the transfer route reservation information agrees with the received transfer route confirmation information, the switch circuit outputs the transfer route reservation information QDout and the transfer route confirmation information ADout.

2.1) The transfer route determination circuit PD receives QDin from the preceding switch and ADin from the following switch. It outputs QDout to the following switch, and ADout and ACKR (the match signal) to the preceding switch.

2.2) The control logic means SD (status detector) monitors ORG and RReqout. When RPS and LPS are both 0 (the transition from step 4 to step 5), it adds 1 to ADin from the following stage and outputs ADout = ADin + 1 to the preceding switch circuit (step 5). Because the switch has no request of its own, it adds its own section to the number of free sections reported by the following stage and passes the total to the preceding stage. INC in FIG. 3 is the incrementer that adds 1.

2.3) When either RPS or LPS is 1, processing moves to step 6 (steps 1 to 4). If ORG = 0, 1 is subtracted from the QDin received from the preceding stage (step 8) and QDout = QDin - 1 is output to the following stage (steps 6 to 10). QDout is the number of sections requested. The switch claims itself as one section, so it subtracts 1 before passing the value on. DEC is the decrementer that subtracts 1. If ORG = 1, the QDin requested by the connected processor element is output to the following stage (steps 9 and 11).

2.4) ADin + 1 is then compared with QDout. While they do not match and a transfer route request is present, the switch outputs ADout = 0 and ACKout = 0 to the preceding switch circuit regardless of ORG (steps 12 and 13). When they match, the switch is connected (turned on), and ADout = ADin + 1 and ACKout = 1 are output to the preceding switch circuit (step 14).

2.5) If QDout = 0 while either LPS or RPS is 1, the switch judges itself to be the end of the requested transfer route. It is disconnected (turned off) and outputs ADout = 0 and ACKout = 1 to the preceding stage and QDout = 0 and RReq = 0 to the following stage (step 15).

3.1) FIG. 5 shows how the QDout and ADout outputs of eight switch circuits change over time when they run the basic processing of FIG. 4. Initially ADout = FF (the maximum value) at every switch SWi, which means that all following sections are usable. QDout = 0 because no transfer path has been requested.

3.2) Suppose that at clock 1 an origin occurs at switch SW1 and 3 transfer path sections are requested; that is, the processor element PE0 connected to switch SW1 requests a transfer to the processor element PE3 connected to switch SW4. Switch SW1 then outputs QDout = 3 and ADout = 0. ADout = 0 indicates that no transfer path can be formed rightward from switch SW1. The transfer route reservation procedure begins.

3.3) At clock 2, switch SW2 receives the request from SW1 and outputs QDout = 2 and ADout = 0 (steps 9, 10, 12, 13). At clock 3, switch SW3 outputs QDout = 1 and ADout = 0 (steps 9, 10, 12, 13). At clock 4, switch SW4 receives RReq = 1 and its QDout becomes 0, so it outputs ADout = 0 and ACKout = 1. ACKout = 1 is passed to the preceding switch circuit (steps 8, 15).

3.4) At clock 5, ACKin of switch SW3 becomes 1. SW3 adds 1 to ADout = 0 of SW4, so that QDout = ADout = 1, turns its switch on, and sets ACKout = 1 (steps 6, 8, 9, 10, 12, 14). At clock 6, switch SW2 outputs QDout = ADout = 2, turns its switch on, and outputs ACKout = 1 to the preceding switch circuit (steps 6, 8, 9, 10, 12, 14). At clock 7, switch SW1 outputs QDout = ADout = 3, turns its switch on, and outputs ACKout = 1 to the preceding stage (steps 6, 8, 9, 11, 12, 14). The transfer path through switches SW1 to SW4, from processor element PE0 to PE3, is now secured.

4.1) With LReq held at 1 for the transfer from processor element PE1 to PE4, QDout, ADout, and the on/off states of switches SW1 to SW4 remain unchanged (steps 6, 16), and the data transfer takes place during this time.

5.1) When the data transfer ends, processor element PE0 sets LReq to 0; that is, at clock 101 switch SW1 has LPS = 0, ORG = 0, and RReqout = 0 (steps 4, 1, 5). The control logic means SD of FIG. 3 detects this, immediately turns the switch off, and sets QDout = 0 (steps 4, 1, 5). Release of the transfer reservation begins.

5.2) As RReq propagates through switches SW1 to SW4, at clock 102 switch SW2 receives RReqout = 0 from SW1 and goes to QDout = 0, ADout = 2, switch off. At clock 103, switch SW3 goes to QDout = 0, ADout = 1, switch off. At clock 104, switch SW4 goes to QDout = 0, ADout = 0 (steps 4, 5).

5.3) If ORG = 0 at switch SW4, then at clock 105, because RReqin = 0, its output is set to the number of transfer path sections usable at the following switches; that is, ADout = FF + 1 = FF (FF being the maximum number of usable sections) is output (steps 4, 1, 5). At clock 106, switch SW3 adds 1 to the ADout of SW4 and outputs the result as its ADout. At clock 107, switch SW2 adds 1 to the ADout of SW3 and outputs the result. At clock 108, switch SW1 outputs the ADout obtained by these successive additions, which completes the release of the transfer path (steps 4, 1, 5).

FIG. 6 shows the transfer path interface circuit of a processor element. It connects to the transfer path DR through a tri-state I/O buffer B1, to the transfer route reservation information QD through a tri-state input buffer B2, and to the transfer route confirmation signal AD through a tri-state output buffer B3.
The transfer request signal LReq and the ACK signal are connected directly to the switch circuit. The serial-parallel shift register SPR in FIG. 6 is used when the bit length must be converted. In the same figure, IDB is an internal data bus, CDB is a control data bus, and 20 is an interface control circuit.

The specific effects of this embodiment are described next, taking as an example a parallel processor consisting of nine switch circuits and eight arithmetic processors (processor elements) PE0 to PE7, as shown in FIG. 9, and comparing it with one built on a conventional bus and one built on a crossbar switch.

Parallel Monte Carlo simulation, which is considered especially hard to speed up in device simulation and similar work, has the problem that all processors send data to all other processors almost at once. The effect of this embodiment is explained for this problem. When each of the eight processors sends one data item to the other seven almost simultaneously, a bus-structured parallel processor needs 28 transfers, as shown in FIG. 7. A parallel processor with a crossbar switch, the most ideal transfer device, needs only 4 transfers, as shown in FIG. 8. In contrast, the parallel processor of this embodiment, built with one set of rightward and leftward transfer paths, needs 20 transfers, as shown in FIG. 9, and a configuration with one more set of transfer paths needs 10. The dotted lines in FIG. 9 mark routes that serve as transfer paths in another round but could be moved to the round shown.

The table gives general relational expressions for the number of transfers and the number of cables in this embodiment for n processor elements and 2^m transfer paths. In the table, 2^n is the number of processor elements, and b is the bit width of one transfer path. FIGS. 10 and 11 show the number of transfers and the number of cables as the number of processor elements varies, with the number of transfer paths of this embodiment as a parameter. The limiting number of transfer paths is 1/4 of the number a crossbar switch requires, and the transfer speed at that limit is nearly equal to that of a crossbar switch. As FIGS. 10 and 11 show, a major advantage of this embodiment is that the number of transfer paths can be chosen freely according to the required device scale and transfer speed. Several combinations of transfer count and cable count can be selected, as shown in the table. In FIGS. 10 and 11, S1 and S3 are the characteristic lines of the crossbar switch, S2 and S4 are those of an ordinary bus, and 40 and 50 indicate the range of choices available in this embodiment.

[Effects of the Invention] As described above, by placing switch circuits at distributed points along a single transfer path, the present invention can realize several transfer paths of arbitrary section length on that one transfer path at the same time, which increases the utilization of the transfer path.
Further, a processor element connected to the transfer path reads the section length information that the switch circuit outputs as transfer path availability information, sends the required transfer section length together with the transfer request signal, and, once the transfer path is secured, receives ACK and can use the transfer path. With this simple control, each processor element can dynamically change the transfer path between two processors to the route it needs in the middle of processing, achieving both high utilization and high speed.
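
The reservation sequence of steps 3.1 to 3.4 (clocks 1 to 7 of FIG. 5) can be reproduced with a small clocked model. The sketch below is a reading of steps 1.3 and 2.2 to 2.5, not the circuit itself. Several points are assumptions made for illustration: the names `switch_step` and `simulate`, the dictionary fields, the rule that every switch updates once per clock from its neighbours' outputs of the previous clock, the saturation of ADin + 1 at FF, and the boundary value FF fed to the last switch. Only the rightward direction and only the reservation phase are modelled.

```python
FF = 0xFF  # "maximum value" that FIG. 5 uses for "every following section is free"


def switch_step(rps, lps, qd_in, ad_in):
    """Evaluate one right-direction switch once (steps 1.3 and 2.2 to 2.5)."""
    free = min(ad_in + 1, FF)  # ADin + 1, saturating so that FF + 1 = FF
    if not (rps or lps):
        # Step 2.2: no request, so report the following free sections plus our own.
        return {"rreq": 0, "qd": 0, "ad": free, "ack": 0, "on": False}
    org = 0 if rps else 1  # step 1.3: a remote request has priority
    qd_out = qd_in if org else qd_in - 1  # step 2.3
    if qd_out == 0:
        # Step 2.5: end of the requested route, stay off and acknowledge.
        return {"rreq": 0, "qd": 0, "ad": 0, "ack": 1, "on": False}
    if free == qd_out:
        # Step 2.4, match: close the switch and acknowledge.
        return {"rreq": 1, "qd": qd_out, "ad": free, "ack": 1, "on": True}
    # Step 2.4, no match yet: keep the request pending.
    return {"rreq": 1, "qd": qd_out, "ad": 0, "ack": 0, "on": False}


def simulate(n_switches, origin, qd_request, clocks):
    """Clock a chain of switches; index 0 models SW1. Returns one row per clock."""
    idle = {"rreq": 0, "qd": 0, "ad": FF, "ack": 0, "on": False}
    state = [dict(idle) for _ in range(n_switches)]
    trace = []
    for _ in range(clocks):
        nxt = []
        for i in range(n_switches):
            rps = state[i - 1]["rreq"] if i > 0 else 0
            lps = 1 if i == origin else 0
            # A remote request brings QD from the left; an origin takes it
            # from its own processor element.
            if rps:
                qd_in = state[i - 1]["qd"]
            else:
                qd_in = qd_request if lps else 0
            ad_in = state[i + 1]["ad"] if i + 1 < n_switches else FF
            nxt.append(switch_step(rps, lps, qd_in, ad_in))
        state = nxt
        trace.append([(s["qd"], s["ad"], s["on"]) for s in state])
    return trace


# Eight switches, an origin at SW1 asking for 3 sections, seven clocks.
trace = simulate(n_switches=8, origin=0, qd_request=3, clocks=7)
for clock, row in enumerate(trace, start=1):
    print(clock, row[:4])  # (QDout, ADout, switch on) for SW1 to SW4
```

Under these assumptions the model follows the walk-through: QDout counts down 3, 2, 1, 0 over clocks 1 to 4, and the switches then close from SW3 back to SW1 over clocks 5 to 7, ending with SW1 to SW3 on and SW4 off. The release phase of steps 5.1 to 5.3 is left out because its clock-by-clock timing depends on latch details that this simplified update rule does not capture.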

BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a block diagram showing an embodiment of a parallel processor according to the present invention. FIG. 2 is a block diagram of a parallel processor for explaining the outline of the present invention. FIG. 3 is a block diagram showing a switch circuit. FIG. 4 is a flowchart showing the processing flow of the switch circuit. FIG. 5 is an explanatory diagram showing the time course of the QDout and ADout outputs of eight switch circuits running the basic processing of FIG. 4. FIG. 6 is a block diagram showing the transfer path interface circuit of a processor element. FIG. 7 is an explanatory diagram of the number of transfers in a bus-based parallel processor. FIG. 8 is an explanatory diagram showing the number of transfers in a parallel processor with a crossbar switch having bidirectional transfer paths. FIG. 9 is an explanatory diagram showing the number of transfers in the parallel processor according to the present invention. FIGS. 10 and 11 are graphs showing the number of transfer paths against the required device scale and transfer speed. FIG. 12 is a block diagram showing a conventional bus-based parallel processor. FIG. 13 is a block diagram showing a conventional crossbar-switch-based parallel processor.

PE1 to PE3: processor elements; SW1 to SW4: switch circuits; DR, DL: signal lines; P1 to P3: transfer paths; SC: transfer path switching circuit; PD: transfer route determination circuit; RP: transfer request processing circuit; BUF: buffer circuit; SD, PCON: control logic means; INC: incrementer; DEC: decrementer; FF1, FF2: flip-flops.

Claims

(57) [Claims] A parallel processor having a plurality of processor elements, a one-dimensional transfer path consisting of two sets of transfer paths whose data transfer directions are unidirectional to the left and to the right, and a plurality of switch circuits, in which the plurality of processor elements and the plurality of switch circuits are arranged alternately on the one-dimensional transfer path, each of the plurality of switch circuits has means for electrically disconnecting or connecting the one-dimensional transfer path so that a transfer route can be formed between any two of the plurality of processor elements to transfer data and exchange information, and each of the plurality of processor elements has a function of controlling use of the one-dimensional transfer path, characterized in that:

the processor element PEi connected to the transfer path Pi sandwiched between the i-th and (i+1)-th switch circuits has means QReg for generating transfer path reservation information QD for the transfer path required for transfer in the right and left directions; and

the switch circuit SWi comprises:

a transfer request processing circuit RP which, for the rightward transfer path Pi, receives the transfer path use request signal RReq for the transfer path Pi from the preceding switch circuit SWi-1 and the transfer path request signal LReq for the transfer path Pi from the processor element PEi-1 connected to the transfer path Pi-1, has means PCON for generating, when a transfer path request from the transfer path Pi-1 has occurred, a request source confirmation signal ORG indicating whether the source of that request is the switch circuit SWi-1 or the processor element PEi-1, and outputs the transfer path use request signal RReq for the transfer path Pi to the next-stage (i+1)-th switch circuit SWi+1, and which, for the leftward transfer path Pi-1, receives from the right the transfer path use request signal RReq for the transfer path Pi-1 from the (i+1)-th switch circuit SWi+1 and the transfer path request signal LReq for the transfer path Pi-1 from the processor element PEi connected to the transfer path Pi, has means PCON for generating, when a transfer path request from the transfer path Pi has occurred, a request source confirmation signal ORG indicating whether the source of that request is the switch circuit SWi+1 or the processor element PEi, and outputs the transfer path use request signal RReq for the transfer path Pi-1 to the switch circuit SWi-1;

a transfer path determination circuit PD which receives the transfer path reservation information QD from the processor element PEi-1 or the transfer path reservation information QD from the preceding switch circuit SWi-1, outputs to the next-stage switch circuit SWi+1 transfer path reservation information QD having a value corresponding to the transfer path use request signal RReq and the request source confirmation signal ORG, receives from the next-stage switch circuit SWi+1 the transfer path confirmation information AD relating to the transfer path at that time, outputs to the preceding switch circuit SWi-1 transfer path confirmation information AD having a value corresponding to the transfer path use request signal RReq, the request source confirmation signal ORG, and the input transfer path reservation information QD, determines, using the transfer path use request signal RReq, the request source confirmation signal ORG, the input transfer path reservation information QD, and the input transfer path confirmation information AD, whether the transfer path Pi to the next-stage switch circuit SWi+1 requested from the preceding stage has been secured, and outputs a transfer path use permission signal ACK (rightward) to the processor element PEi-1 connected to the transfer path Pi-1 and to the switch circuit SWi-1, and which likewise outputs a transfer path use permission signal ACK (leftward) for the left direction; and

a transfer path switching circuit SC which, using the request source confirmation signal ORG (rightward and leftward) generated in the transfer request processing circuit RP, the transfer path use permission signal ACK (rightward and leftward) generated in the transfer path determination circuit PD, and the transfer path reservation information QD and the transfer path confirmation information AD input to the transfer path determination circuit PD, electrically connects or disconnects the requesting transfer path of the preceding stage and the free transfer path of the following stage.