JP2679994B2

JP2679994B2 - Vector processing equipment

Info

Publication number: JP2679994B2
Application number: JP62201701A
Authority: JP
Inventors: 正守柏山; 仁阿部
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1987-08-14
Filing date: 1987-08-14
Publication date: 1997-11-19
Anticipated expiration: 2012-11-19
Also published as: DE3827500C2; JPS6446162A; US5001626A; DE3827500A1

Description

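[Detailed Description of the Invention]

[Industrial Field of Application]
The present invention relates to the high-speed operation of a vector register built from two-bank RAM, and in particular to a vector processing apparatus suited to achieving the very fast machine cycles of supercomputers and similar machines.

[Prior Art]
It is well known that the most effective ways to improve supercomputer performance are to provide a plurality of pipelined arithmetic units and a plurality of vector registers in the vector processing apparatus, to process vector data in parallel across instructions that do not depend on one another, and to transfer each stream of vector data at high speed from the vector registers to the pipelined arithmetic units and back, that is, to raise the machine-cycle rate.

FIG. 7 outlines a conventional vector processing apparatus. It comprises vector registers 1 (VR0 to VR31) built from high-speed random access memory (RAM); a selector (SEL) 3, built from switch-matrix logic, that under instruction control selects the output vector data signals 5 of the vector registers 1 and transfers them to the pipelined arithmetic units 6 (numbered 0 to 3); a selector (DIST) 2, built from switch-matrix logic, that under instruction control routes the result output paths 8 of the pipelined arithmetic units 6 to the vector registers 1; a vector load pipeline 10 that loads vector data from main storage (MS) 9 into VR0 to VR31 through DIST 2; and a vector store pipeline 11 that outputs result vector data held in VR0 to VR31 to main storage (MS) 9 through SEL 3.

A vector load instruction sends vector data from MS 9 through the vector load pipeline 10 to the vector register 1 whose number the instruction specifies. The vector data, supplied at the machine-cycle clock rate, is addressed into the RAM and written in vector-element order. An arithmetic instruction then reads the vector data from the vector register into an arithmetic pipeline 6 as an operand, again in element order. The instruction then assigns the number of the vector register 1 that is to hold the result, and the result is written into the RAM that makes up that register. Because vector computation operates repeatedly on the same vector data, high-speed RAM is used for the vector registers 1 so that operands can be read and results stored at the machine-cycle clock rate. This is why vector registers were adopted as temporary buffers: if vector operations were performed directly against MS 9, the start-up delay in reading data from the large-capacity RAM of MS 9 would account for a large share of the total vector processing time, whereas vector registers exploit the repeated use of the same vector data.

To raise the machine-cycle rate further, the apparatus of FIG. 7 shortens packaging delays by using a three-dimensional packaging structure, by partitioning the logic and RAM of DIST 2, the vector registers 1, SEL 3 and so on as far as physical constraints allow, by implementing the partitioned logic on semiconductor chips, and by mounting those chips on a ceramic substrate to shorten signal paths. Vector processing apparatus with such packaging is described in Nikkei Electronics, December 16, 1985, pp. 195-209, and in Nikkei Electronics, November 19, 1984, pp. 237-272. Japanese Examined Patent Publication No. 61-52521 discloses a vector processing apparatus that provides vector registers to execute vector operations at high speed.

In the repeated operations characteristic of vector computation, the vector register that stores a result often supplies an operand to the next instruction. Japanese Examined Patent Publication No. 61-52521 therefore also discloses a vector processing apparatus that permits chaining, in which operand data is read from, and result data is written to, the logically same-numbered vector register at the same time. The RAM making up the vector register is arranged as two independently addressable banks, one holding all even-numbered elements of the vector data and the other holding all odd-numbered elements, so that each bank can be written and read at the machine-cycle clock rate.

In the semiconductor field, ultra-fast RAM with a sub-nanosecond (1 ns or less) address access time has been realized, as discussed in IEEE Journal of Solid-State Circuits, SC-21, No. 4 (1986), pp. 501-504. It is also generally known to build such ultra-fast RAM together with random logic on one semiconductor chip and so achieve ultra-fast operation within a physically closed system. Japanese Unexamined Patent Publication No. 59-77574 discloses a speed-up technique that uses vector registers not divided into banks.

[Problems to Be Solved by the Invention]
To achieve a very fast machine cycle, the two-bank-RAM vector register described above is improved and used in the vector processing apparatus as a module forming a physically closed system. The prior art shows that, to double the capability of a vector processing apparatus, it basically suffices to realize a machine cycle twice as fast and to write to and read from the vector registers at that clock rate. However, how far the machine cycle can be raised within a physically closed system, for example inside a semiconductor chip, a module, or a package card, is limited by the electrical characteristics of the materials that form it. When the three-dimensional packaging and the ceramic-substrate modules of the prior art are used as modules of a vector processing apparatus, and sub-nanosecond RAM is to be used effectively as vector registers, the prior art runs into the following problems. Signals passing through the connectors and pins that join the packages suffer electrical disturbances from impedance mismatches, so that for a machine cycle only a few times longer than a sub-nanosecond interval, sufficient logic amplitude cannot be obtained and the circuits are prone to malfunction from noise.

The present invention provides a vector processing apparatus whose computing capability equals that of a conventional apparatus with a machine cycle twice as fast. Inside a physically closed module the vector transfer pitch is twice the machine-cycle rate, while vector data is transferred at the machine-cycle clock rate wherever signals cross between modules and impedance mismatches are likely to cause electrical disturbances.

[Means for Solving the Problems]
The above object is achieved by a vector processing apparatus comprising a main storage; an arithmetic unit module having a plurality of arithmetic units that operate on data; and a vector register module having RAM into which data is written from the main storage and the arithmetic units, and from which data to be operated on by the arithmetic units or stored in the main storage is read. The RAM of the vector register module consists of an odd-element RAM bank and an even-element RAM bank that can be written and read at a clock rate twice the basic machine cycle. The vector register module has: a first pitch conversion circuit that takes two data streams input to the vector register module, each at the basic machine cycle and offset from each other by 1/2 cycle, and outputs them as a single data stream at twice the basic machine cycle; a write control circuit that, using the basic machine cycle and the basic machine cycle shifted by 1/2 phase, controls the write addresses and timing for writing the double-rate data stream from the first pitch conversion circuit into the odd-element RAM bank or the even-element RAM bank; a read control circuit that, using the basic machine cycle and the basic machine cycle shifted by 1/2 phase, controls the read addresses and timing for reading from the odd-element RAM bank and the even-element RAM bank as a single data stream at twice the basic machine cycle; and a second pitch conversion circuit that outputs from the vector register module the double-rate data stream read from the two RAM banks as two data streams, each at the basic machine cycle and offset from each other by 1/2 cycle. The RAM banks, the write control circuit and the read control circuit are built in one semiconductor chip, and the first pitch conversion circuit and the second pitch conversion circuit are each built as one semiconductor chip.

[Operation]
The vector register consists of two independently addressable banks, and the write control signal that generates write addresses and the read control signal that generates read addresses differ in phase by 1/2 period at the machine-cycle clock rate. Write addresses and read addresses can therefore be supplied to each bank at twice the machine-cycle clock rate, and the vector register can be written and read at twice the machine-cycle clock rate. The vector data read from the vector register, which switches at twice the machine-cycle rate, is turned by the output switch-matrix logic into two outputs, one of even elements and one of odd elements, each switching at the machine-cycle clock rate and offset from the other by 1/2 period. The output of the module containing the vector register therefore keeps its electrical stability when passing through the module pins, unlike vector data that switches at twice the machine-cycle rate and is output directly. Likewise, the double-rate vector data written to the vector register is assembled by switch-matrix logic at the register's input, so electrical stability at the module's input, including the passage through its pins, is not degraded as it would be if double-rate vector data were input directly.

[Embodiment]
FIG. 1 schematically shows the overall system configuration of the invention. It consists of vector registers 101 (VR0 to VR31), switch-matrix logic (DIST) 102, switch-matrix logic (SEL) 103, pipelined arithmetic units 106, a vector load pipeline 110, a vector store pipeline 111, and MS (main storage) 109.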
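The alternation described under [Operation] can be illustrated with a small sketch. It is only a toy model, not the patented circuit: time is counted in half machine cycles ("slots", a name introduced here), and the function names are hypothetical. The polarity follows the selector description given later in the text, where the even-element bank takes the write address while the pitch signal is "1" and the odd-element bank takes it while the pitch signal is "0".

```python
# Toy model (assumption): a "slot" is half a machine cycle.
# Even slots begin on a T0 clock edge, odd slots on a T1 clock edge.

def pitch_signal(slot: int) -> int:
    """Return 1 in the half cycle after a T0 edge and 0 after a T1 edge."""
    return 1 if slot % 2 == 0 else 0


def address_sources(slot: int) -> dict:
    """Report which counter's address each bank selector picks in a slot."""
    p = pitch_signal(slot)
    return {
        "A": "write" if p == 1 else "read",  # even-element bank
        "B": "read" if p == 1 else "write",  # odd-element bank
    }


# Two machine cycles, i.e. four half-cycle slots.
schedule = [address_sources(s) for s in range(4)]
for slot, sources in enumerate(schedule):
    print(slot, sources)
```

In every slot one bank receives a write address and the other a read address, so the register as a whole accepts one write and one read per half machine cycle even though each counter only advances once per machine cycle.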
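Each vector register 101 comprises an A-bank RAM 125 that holds the even elements of the vector data and a B-bank RAM 126 that holds the odd elements; a WA counter 121 that generates write addresses for the two bank RAMs and an RA counter 122 that likewise generates read addresses; a selector 123 for the A-bank RAM 125 and a selector 124 for the B-bank RAM 126, which under a pitch control circuit 127 distribute the addresses from the two counters at a pitch of twice the machine-cycle rate; and a selector 128 that under the pitch control circuit 127 selects the data output from each bank at a pitch of twice the machine-cycle rate. Each vector register can hold 128 vector elements. A write control signal 113 from a write control circuit 112 and a read control signal 116 from a read control circuit 115 are output to each vector register 101 with a 1/2-period phase offset at the machine-cycle clock rate, and during operation the vector registers 101 can be controlled in parallel by instructions. The vector register 101 of the invention is described in detail later.

In DIST 102, a selector 118 that selects the even elements of vector data and a selector 119 that selects the odd elements are arranged to choose between vector data from the result output paths 108, 108-0, 108-1 of the pipelined arithmetic units 106 and vector data that was stored in MS 109 and arrives through the vector load pipeline. Selectors 118 and 119 operate with a 1/2-period phase offset at the machine-cycle clock rate. Although not shown in FIG. 1, one pair is provided for each vector register 101, 32 pairs in all, and the pairs can operate in parallel. During operation, the pair of selectors 118 and 119 that corresponds to the vector register 101 named by the instruction is chosen by the vector register select signal 114, which the write control circuit 112 outputs under instruction control, and by the pitch selection from the pitch control circuit 120. The pair selects the first half period of the even and odd elements of the vector data, which arrive at the machine-cycle clock rate offset from each other by 1/2 period, and the OR of the two selector outputs is output on the write data path 104 of that vector register 101.

SEL 103 receives the 32 read data paths 105 output from the vector registers 101 and driven at twice the machine-cycle rate. It has a selector 129 that selects the even elements of the read vector data and a selector 130 that selects the odd elements, the two operating with a 1/2-period phase offset at the machine-cycle clock rate. Although not shown in FIG. 1, a pair of selectors 129 and 130 is provided for each of the output paths 107, 107-0, 107-1 to the four pipelined arithmetic units 106 and for the vector store pipeline 111 used to store vector data to MS 109, and the pairs can operate in parallel. During operation, the vector register select signal 117, which the read control circuit 115 outputs under instruction control, connects the read data path 105 of the vector register 101 named by the instruction to the pair of selectors 129 and 130 that serves the output path to the pipelined arithmetic unit 106 or vector store pipeline 111 named by the instruction. Selector 129 takes the even elements of the vector data and selector 130 the odd elements, each driven at the machine-cycle clock rate. For example, even elements are output on path 107-0 and odd elements on path 107-1.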
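The two conversions performed by DIST 102 and SEL 103 can be sketched as list operations. This is a minimal illustration under stated assumptions, not the switch-matrix logic itself: each list position stands for one time slot, the function names are hypothetical, and the model ignores the select signals that choose which register or arithmetic unit is connected.

```python
def merge_to_double_rate(even, odd):
    """DIST-like step: interleave two machine-cycle-rate streams into one.

    even[k] lands in slot 2k and odd[k] in slot 2k + 1, where a slot is
    half a machine cycle (an assumption of this model).
    """
    if len(even) - len(odd) not in (0, 1):
        raise ValueError("even stream must match the odd stream or hold one more element")
    merged = []
    for k, element in enumerate(even):
        merged.append(element)
        if k < len(odd):
            merged.append(odd[k])
    return merged


def split_to_half_rate(stream):
    """SEL-like step: route alternate elements to the even and odd paths."""
    return stream[0::2], stream[1::2]


even_in = ["e0", "e2", "e4"]
odd_in = ["e1", "e3", "e5"]
double_rate = merge_to_double_rate(even_in, odd_in)
even_out, odd_out = split_to_half_rate(double_rate)
```

Only `double_rate` corresponds to a signal that switches every half machine cycle; the streams on either side of it change once per machine cycle, which is the property the text relies on for signals that cross module pins.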
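Functional modules such as the pipelined arithmetic units 106 likewise have separate paths for the even and odd elements of vector data. Pipelined arithmetic unit 3, for example, has an even-element input path 107-0, an odd-element input path 107-1, an even-element output path 108-0 and an odd-element output path 108-1. Each even-element path and odd-element path is driven at the machine-cycle clock rate, and the two differ in phase by 1/2 period of that clock.

The overall processing of the vector processing apparatus of FIG. 1 is the same as that of the conventional apparatus of FIG. 7 and of Japanese Unexamined Patent Publication No. 58-114274, so its description is omitted. The detailed configuration of the vector register 101 is shown in FIG. 2 and its operation in FIG. 5. The configuration and operation of the data system of the vector register module (VR module), which consists of DIST 102, the vector registers 101 and SEL 103 of FIG. 1, are shown in FIGS. 3 and 6 and are described later.

FIG. 4 outlines the packaging of the vector processing apparatus of FIG. 1. The VR module 201 logically consists of DIST 102, the vector registers 101 (VR0 to VR31) and SEL 103. Physically, DIST 102 and SEL 103 are random-logic semiconductor chips, and the vector registers 101 are semiconductor chips that combine ultra-fast RAM with random logic. The VR module 201 also contains the write control circuit 112 and the read control circuit 115, which are not shown in FIG. 4.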
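The four pipelined arithmetic units 106 are likewise built from several semiconductor chips inside an arithmetic unit module 202, and the arithmetic unit module 202 and the VR module 201 are mounted on a vector processing card 200 through connection pins. The vector load pipeline 110, the vector store pipeline 111 and MS 109 are assumed to be mounted on other vector processing cards and are not shown in FIG. 4. As for the vector data paths in FIG. 4, the write data path 104 and the read data path 105, which are driven at twice the machine-cycle clock rate, are physically confined within the VR module 201. The result output paths 108, 108-0, 108-1 and the vector data input paths 107, 107-0, 107-1, which are driven at the machine-cycle clock rate, pass through connection pins twice and travel over the wiring of the vector processing card 200. With the packaging of FIG. 4, vector data signals driven at twice the machine-cycle clock rate are thus confined to the inside of a module, a physically closed space of small extent. Vector data passing between modules switches as signals at the machine-cycle clock rate with a 1/2-period phase difference, so no single connection pin has to switch at twice the machine-cycle clock rate as in a conventional design, and electrical stability is obtained at the connection pins, where impedance mismatches occur.

Vector register

FIG. 2 shows in detail one vector register 101-0 of the 32 vector registers 101 (VR0 to VR31). FIG. 5 is a timing chart that explains the operation of the vector register 101-0 of FIG. 2.

(1) Clocks
Three clocks are input to the vector register 101-0. In FIG. 5, the T01 clock runs at twice the machine-cycle rate, and the T0 and T1 clocks are machine-cycle clocks that differ from each other in phase by 1/2 period.

(2) Pitch control circuit 127
The pitch control circuit 127 consists of a flip-flop PIKOE 127-0 driven by the T1 clock, a flip-flop PIKOL 127-1 driven by the T0 clock, an EOR gate 127-2 that takes the exclusive OR of the outputs of the two flip-flops, the pitch signal 127-3 that is the output of the EOR gate 127-2, and a flip-flop RDPTCH 127-4 driven by the T01 clock. A signal that enters on PIKO signal 144, is synchronized to the T1 clock and has a period twice the machine cycle is given a phase offset of 1/2 machine cycle by the flip-flop PIKOL 127-1. Taking its exclusive OR with the flip-flop PIKOE 127-0 yields the signal shown as EOR 127-3 in FIG. 5, which is synchronized to the T01 clock and becomes "1" at the T0 clock and "0" at the T1 clock.

(3) WA counter 121
The WA counter 121, which generates RAM write addresses, consists of a flip-flop WINC 121-0 driven by the T0 clock, a +1 circuit 121-1, and a 6-bit address register WAC 121-2 driven by the T0 clock. Although not shown, the WA counter 121 also has a structure for clearing the address register WAC 121-2. During operation, as the WINC 121-0 signal in FIG. 5 shows, the write control signal 113 output from the write control circuit 112 causes the address data to be counted up, set in the address register WAC 121-2, and output as WA counter address data 121-3.

(4) RA counter 122
The RA counter 122, which generates RAM read addresses, consists of a flip-flop RINC 122-0 driven by the T1 clock, a +1 circuit 122-1, and a 6-bit address register RAC 122-2 driven by the T1 clock. Although not shown, the RA counter 122 also has a structure for clearing the address register RAC 122-2. During operation, as the RINC 122-0 signal in FIG. 5 shows, the read control signal 116 output from the read control circuit 115 causes the address data to be counted up, set in the address register RAC 122-2, and output as RA counter address data 122-3.

(5) Selector 123
The selector 123 selects the address data for the A-bank RAM 125. As FIG. 5 shows, it selects the WA counter address data 121-3 when the pitch signal EOR 127-3 is "1" and the RA counter address data 122-3 when the pitch signal EOR 127-3 is "0". The output of the selector 123 is input to the 6-bit A-bank address register AAD 131, driven by the T01 clock, and is applied to the A-bank RAM 125 as the A-bank RAM address data signal 131-0.

(6) Selector 124
The selector 124 selects the address data for the B-bank RAM 126. As FIG. 5 shows, it selects the WA counter address data 121-3 when the pitch signal EOR 127-3 is "0" and the RA counter address data 122-3 when the pitch signal EOR 127-3 is "1". The output of the selector 124 is input to the 6-bit B-bank address register BAD 132, driven by the T01 clock, and is applied to the B-bank RAM 126 as the B-bank RAM address data signal 132-0.

(7) Write data
Write data enters from the write data path 104 and is input to the register WTDATA 133, driven by the T01 clock. It then passes over the DI path 133-0, the output of the register WTDATA 133, to both the A-bank RAM 125 and the B-bank RAM 126.

(8) WE control circuit
A WE control circuit is provided in each vector register 101 and is controlled by the write control circuit 112 under instruction control so that the vector registers 101 can operate in parallel. The WE control circuit consists of a flip-flop WEF 134 driven by the T0 clock; a flip-flop WES 135 driven by the T1 clock; a selector 136 and a selector 137; a write-mode flip-flop WTMDA 138 for the A-bank RAM and a write-mode flip-flop WTMDB 139 for the B-bank RAM, both driven by the T01 clock; a write pulse generator 140 that delays the rising edge of the T01 clock and combines the RAM write set-up time with the T01 pulse width to adjust the pulse width and write hold time of the RAM WE signal; and AND gates 141 and 142 that AND each write mode with the output pulse of the write pulse generator 140. During operation, as FIG. 5 shows, the selector 136 selects the output of the flip-flop WEF 134 when the pitch signal EOR 127-3 is "1", and the selector 137 selects the output of the flip-flop WES 135 when the pitch signal is "0". That is, during operation a write control signal 113-0 is output for writes to the A-bank RAM 125, which holds all even elements of the vector data, and a write control signal 113-1 is output for writes to the B-bank RAM 126, which holds the odd elements.

(9) Read data
During operation, the output signal 127-5 of the flip-flop RDPTCH 127-4 of the pitch control circuit 127 steers the selector 128 so that it selects the data output 125-0 of the A-bank RAM 125 when the A-bank address register AAD 131 holds a read address, and the data output 126-0 of the B-bank RAM 126 when the B-bank address register BAD 132 holds a read address. The output of the selector 128 passes through the data register RDDATA 143, driven by the T01 clock, and is output on the read data path 105.

(10) Register RAM
The two ultra-fast RAMs are arranged so that the same address data value designates the same vector data element. The A-bank RAM 125, which holds all even elements of the vector data, is addressed by the output 131-0 of the A-bank address register AAD 131. The B-bank RAM 126, which holds the odd elements, is addressed by the output 132-0 of the B-bank address register BAD 132.

The overall operation of the vector register 101-0 of FIG. 2 is now described with reference to FIG. 5. FIG. 5 shows chaining, in which vector data is written to and read from the vector register 101-0 at the same time. The vector data has six elements, e₀, e₁, e₂, e₃, e₄, e₅ in order. For the write, the clear signal W0 of the WA counter 121 is issued to the flip-flop WINC 121-0 of the WA counter 121 at time t₀. The selector 123 selects W0 while the pitch signal EOR 127-3 is "1", so W0 enters the A-bank address register AAD 131 with a width of t₀-t₁, and the register output applies it to the A-bank RAM 125 as address AW0 from time t₁ to time t₂. Also at time t₀, the write signal WT0 for the write to the A-bank RAM 125 is input to the flip-flop WEF 134. The selector 136 selects it while EOR 127-3 is "1", so it enters the flip-flop WTMDA 138 with a width of t₀-t₁. At the output of the flip-flop WTMDA 138, WT0 is valid from time t₁ to time t₂. The AND gate 141 ANDs it with the output pulse of the write pulse generator 140, and the result is applied as the WE signal of the A-bank RAM 125 during t₁-t₂. The write vector data e₀ is input to the register WTDATA 133 at time t₁, and the register output is valid for t₁-t₂. Thus e₀, the first even element of the vector data, is written to the A-bank RAM 125 during t₁-t₂. On the B-bank side, the selector 124 selects W0 while EOR 127-3 is "0", so W0 enters the B-bank address register BAD 132 with a width of t₁-t₂, and the register output applies it to the B-bank RAM 126 as address BW0 from time t₂ to time t₃.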
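Also at time t₁, the write signal WT1 for the write to the B-bank RAM 126 is input to the flip-flop WES 135. The selector 137 selects it while EOR 127-3 is "0", so it enters the flip-flop WTMDB 139 with a width of t₁-t₂. At the output of the flip-flop WTMDB 139, WT1 is valid from time t₂ to time t₃. The AND gate 142 ANDs it with the output pulse of the write pulse generator 140, and the result is applied as the WE signal of the B-bank RAM 126 during t₂-t₃. The write vector data e₁ is input to the register WTDATA 133 at time t₂, and the register output is valid for t₂-t₃. Thus e₁, the first odd element of the vector data, is written to the B-bank RAM 126 during t₂-t₃. In the same way, for the write vector data e₂, e₃, e₄, e₅, the count-up signals W1 and W2 of the WA counter 121 are input to the flip-flop WINC 121-0 of the WA counter 121 and become the addresses AW1 and AW2 of the A-bank RAM 125 and the addresses BW1 and BW2 of the B-bank RAM 126. As for WT2, WT3, WT4, WT5, the WE signals that write e₂, e₃, e₄, e₅: write e_n for e₂, e₃, e₄, e₅ and WTn for WT2, WT3, WT4, WT5, and let t_n be the time at which e_n is input to the register WTDATA 133. Then e₂, e₃, e₄, e₅ can be written by inputting WTn to the flip-flop WEF 134 (n = 2, 4) or to the flip-flop WES 135 (n = 3, 5) at time t_n - 1, that is, one time step before t_n.

The read of the vector data e₀, e₁, e₂, e₃, e₄, e₅ begins at time t₁, when the clear signal R0 of the RA counter 122 is issued to the flip-flop RINC 122-0 of the RA counter 122.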
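The selector 123 selects R0 while EOR 127-3 is "0", so R0 is valid during t₁-t₂ and enters the A-bank address register AAD 131, whose output applies it to the A-bank RAM 125 as address AR0 from t₂ to t₃. The selector 128 selects the data output 125-0 of the A-bank RAM 125 when the output of the flip-flop RDPTCH 127-4 is "0". The vector data e₀ at address AR0, which is applied to the A-bank RAM 125 during t₂-t₃, is therefore output, input to the register RDDATA 143, and output on the read data path 105 from time t₃ to time t₄. On the B-bank side, the selector 124 selects R0 while EOR 127-3 is "1", so R0 is valid during t₂-t₃ and enters the B-bank address register BAD 132, whose output applies it to the B-bank RAM 126 as address BR0 from time t₃ to time t₄. The selector 128 selects the data output 126-0 of the B-bank RAM 126 when the output of the flip-flop RDPTCH 127-4 is "1". The vector data e₁ at address BR0, which is applied to the B-bank RAM 126 during t₃-t₄, is therefore output, input to the register RDDATA 143, and output on the read data path 105 from time t₄ to time t₅. In the same way, to read the vector data e₂, e₃, e₄, e₅, the count-up signals R1 and R2 of the RA counter 122 are input to the flip-flop RINC 122-0 of the RA counter 122 and become the addresses AR1 and AR2 of the A-bank RAM 125 and the addresses BR1 and BR2 of the B-bank RAM 126. As FIG. 5 shows, the data passes through the data register RDDATA 143 and is output on the read data path 105. The vector register 101-0 of FIG. 2 can thus be written and read simultaneously at a pitch of twice the machine-cycle rate.

VR module

FIG. 3 is a schematic of the data system of the VR module, and FIG. 6 is the timing chart used to explain its operation. FIG. 3 shows the VR module of FIG. 4, which consists of DIST 102, the vector registers 101 and SEL 103. DIST 102 is as described above. In more detail, its pitch control circuit 120 is of the same kind as the pitch control circuit of the vector register 101-0. It consists of a flip-flop DPIKOE 120-0 driven by the T1 clock, a flip-flop DPIKOL 120-1 driven by the T0 clock, an EOR gate 120-2 that takes the exclusive OR of the outputs of the two flip-flops, and the pitch signal 120-3 that is the output of the EOR gate 120-2, and it operates in the same way as the pitch control circuit 127. Registers 145 and 146, driven by the T0 clock, are provided at the inputs of the selector 118, which selects the even elements of the vector data, and registers 147 and 148, driven by the T1 clock, are provided at the inputs of the selector 119, which selects the odd elements. The outputs of each pair of selectors 118 and 119 feed an OR gate 149.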
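Before going further into the VR module, the FIG. 5 chaining schedule described above for the vector register 101-0 can be replayed on a toy two-bank register. This is a sketch under stated assumptions rather than a model of the circuit: slot s stands for the interval from t_s to t_(s+1), element n is written in slot n + 1 and read from its bank in slot n + 2 as in the walk-through, and the function name and dictionary-based banks are hypothetical.

```python
def replay_chaining(elements):
    """Replay the chained write/read schedule on a toy two-bank register."""
    banks = {"A": {}, "B": {}}   # A holds even elements, B holds odd elements
    read_path = []               # values read out, in order
    total = len(elements)
    for slot in range(total + 2):
        w = slot - 1             # index of the element written in this slot
        r = slot - 2             # index of the element read in this slot
        busy = set()
        if 0 <= w < total:
            bank = "A" if w % 2 == 0 else "B"
            banks[bank][w // 2] = elements[w]   # one counter value serves a pair
            busy.add(bank)
        if 0 <= r < total:
            bank = "A" if r % 2 == 0 else "B"
            # The schedule never reads the bank that is being written.
            assert bank not in busy, "write and read hit the same bank"
            read_path.append(banks[bank][r // 2])
    return read_path


result = replay_chaining(["e0", "e1", "e2", "e3", "e4", "e5"])
```

Because the element being written and the element being read in any slot always differ in parity, they fall in different banks, which is why each bank only has to serve one access per half machine cycle.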
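SEL 103 likewise has a register 150, driven by the T0 clock, at the output of the selector 129, which selects the even elements of the vector data, and a register 151, driven by the T1 clock, at the output of the selector 130, which selects the odd elements. For the explanation of FIG. 6, FIG. 3 labels the DIST 102 register on the even-element result output path 108-0 of the pipelined arithmetic unit 106 as DA3F 145 and the register on the odd-element result output path 108-1 as DA3S 147. It labels the SEL 103 register on the even-element path 107-0 to pipelined arithmetic unit 3 (106) as SA3F 150 and the register on the odd-element path 107-1 as SA3S 151. The other registers and paths are not shown in detail.

FIG. 6 is a timing chart showing the vector data e₀, e₁, e₂, e₃, e₄, e₅, described in the section on the operation of the vector register 101-0, being chained between the VR module 201 of FIG. 3 and the pipelined arithmetic unit 106. In FIG. 6, e₀, the first element and an even element, arrives from the pipelined arithmetic unit 106 on path 108-0 and is input to the register DA3F 145 at time t₀. It is valid for t₀-t₂ and is input to the selector 118. The selector 118 selects the output of the register DA3F 145 when the pitch signal 120-3 is "1", so e₀ is valid for a width of t₀-t₁ and is output to the OR gate 149. Meanwhile e₁, the first odd element, arrives on path 108-1 and is input to the register DA3S 147 at time t₁. It is valid for t₁-t₃ and is input to the selector 119. The selector 119 selects the output of the register DA3S 147 when the pitch signal 120-3 is "0", so e₁ is valid for a width of t₁-t₂ and is output to the OR gate 149. The same procedure is repeated for e₂, e₃, e₄, e₅. The even and odd elements that arrive at the VR module 201 at machine-cycle pitch are thereby converted into a single sequence of vector data that switches at twice the machine-cycle pitch, and this sequence is input to the register WTDATA 133 of the vector register 101. Although not shown or described for FIG. 6, the selectors 118 and 119 in use are assumed to be the pair that selects VR0 of the vector registers 101 under instruction control. The vector data e₀, e₁, e₂, e₃, e₄, e₅ is then held in and read from the vector register 101, as detailed in the section on the operation of the vector register 101-0.

The vector data e₀, e₁, e₂, e₃, e₄, e₅ output from the register RDDATA 143 switches at twice the machine-cycle pitch. Because the registers SA3F 150 and SA3S 151 of SEL 103 are driven by the T0 clock and the T1 clock respectively, the data is output on paths 107-0 and 107-1, as FIG. 6 shows, as even-element vector data and odd-element vector data that each switch at machine-cycle pitch and differ in phase by 1/2 period of the machine-cycle clock. With the logic configuration of DIST 102 and SEL 103 shown in FIG. 3, the vector register 101-0 of FIG. 2 can thus be used within the small, physically closed space inside the VR module 201, and vector data signals that switch at twice the machine-cycle pitch can be confined to the VR module 201. Within the VR module, a small region where electrical stability can be obtained, each vector register needs only one write data path and one read data path, which reduces the amount of hardware in the VR module. Further, because signals between the VR module 201 and the vector processing card are input and output at the machine-cycle clock rate with a 1/2-period phase difference, the arrangement is also stable against the electrical noise caused by simultaneous signal switching.

[Effects of the Invention]
According to the invention, vector data signals that switch at twice the machine-cycle rate can be confined to a physically limited area. The vector register of the invention can also perform processing twice as fast as the machine cycle.

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Field of Application]
The present invention concerns a vector register composed of two banks of RAM.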
More specifically, it relates to the high-speed operation of such a vector register, and in particular to a vector processing apparatus suited to achieving the very fast machine cycles of supercomputers and similar machines.

[Prior Art]
It is well known that the most effective ways to improve supercomputer performance are to provide a plurality of pipelined arithmetic units and a plurality of vector registers in the vector processing apparatus, to process vector data in parallel across instructions that do not depend on one another, and to transfer each stream of vector data at high speed from the vector registers to the pipelined arithmetic units and back, that is, to raise the machine-cycle rate.

FIG. 7 outlines a conventional vector processing apparatus. It comprises vector registers 1 (VR0 to VR31) built from high-speed random access memory (RAM); a selector (SEL) 3, built from switch-matrix logic, that under instruction control selects the output vector data signals 5 of the vector registers 1 and transfers them to the pipelined arithmetic units 6 (numbered 0 to 3); a selector (DIST) 2, built from switch-matrix logic, that under instruction control routes the result output paths 8 of the pipelined arithmetic units 6 to the vector registers 1; a vector load pipeline 10 that loads vector data from main storage (MS) 9 into VR0 to VR31 through DIST 2; and a vector store pipeline 11 that outputs result vector data held in VR0 to VR31 to main storage (MS) 9 through SEL 3.

A vector load instruction sends vector data from MS 9 through the vector load pipeline 10 to the vector register 1 whose number the instruction specifies. The vector data, supplied at the machine-cycle clock rate, is addressed into the RAM and written in vector-element order. An arithmetic instruction then reads the vector data from the vector register into an arithmetic pipeline 6 as an operand, again in element order. The instruction then assigns the number of the vector register 1 that is to hold the result, and the result is written into the RAM that makes up that register. Because vector computation operates repeatedly on the same vector data, high-speed RAM is used for the vector registers 1 so that operands can be read and results stored at the machine-cycle clock rate. This is why vector registers were adopted as temporary buffers: if vector operations were performed directly against MS 9, the start-up delay in reading data from the large-capacity RAM of MS 9 would account for a large share of the total vector processing time, whereas vector registers exploit the repeated use of the same vector data.

To raise the machine-cycle rate further, the apparatus of FIG. 7 shortens packaging delays by using a three-dimensional packaging structure, by partitioning the logic and RAM of DIST 2, the vector registers 1, SEL 3 and so on as far as physical constraints allow, by implementing the partitioned logic on semiconductor chips, and by mounting those chips on a ceramic substrate to shorten signal paths. Vector processing apparatus with such packaging is described in Nikkei Electronics, December 16, 1985, pp. 195-209, and in Nikkei Electronics, November 19, 1984, pp. 237-272. Japanese Examined Patent Publication No. 61-52521 discloses a vector processing apparatus that provides vector registers to execute vector operations at high speed.

In the repeated operations characteristic of vector computation, the vector register that stores a result often supplies an operand to the next instruction. Japanese Examined Patent Publication No. 61-52521 therefore also discloses a vector processing apparatus that permits chaining, in which operand data is read from, and result data is written to, the logically same-numbered vector register at the same time. The RAM making up the vector register is arranged as two independently addressable banks, one holding all even-numbered elements of the vector data and the other holding all odd-numbered elements, so that each bank can be written and read at the machine-cycle clock rate.

In the semiconductor field, ultra-fast RAM with a sub-nanosecond (1 ns or less) address access time has been realized, as discussed in IEEE Journal of Solid-State Circuits, SC-21, No. 4 (1986), pp. 501-504. It is also generally known to build such ultra-fast RAM together with random logic on one semiconductor chip and so achieve ultra-fast operation within a physically closed system. Japanese Unexamined Patent Publication No. 59-77574 discloses a speed-up technique that uses vector registers not divided into banks.

[Problems to Be Solved by the Invention]
To achieve a very fast machine cycle, the two-bank-RAM vector register described above is improved and used in the vector processing apparatus as a module forming a physically closed system. The prior art shows that, to double the capability of a vector processing apparatus, it basically suffices to realize a machine cycle twice as fast and to write to and read from the vector registers at that clock rate. However, how far the machine cycle can be raised within a physically closed system, for example inside a semiconductor chip, a module, or a package card, is limited by the electrical characteristics of the materials that form it. When the three-dimensional packaging and the ceramic-substrate modules of the prior art are used as modules of a vector processing apparatus, and sub-nanosecond RAM is to be used effectively as vector registers, the prior art runs into the following problems. Signals passing through the connectors and pins that join the packages suffer electrical disturbances from impedance mismatches, so that for a machine cycle only a few times longer than a sub-nanosecond interval, sufficient logic amplitude cannot be obtained and the circuits are prone to malfunction from noise.

The present invention provides a vector processing apparatus whose computing capability equals that of a conventional apparatus with a machine cycle twice as fast. Inside a physically closed module the vector transfer pitch is twice the machine-cycle rate, while vector data is transferred at the machine-cycle clock rate wherever signals cross between modules and impedance mismatches are likely to cause electrical disturbances.

[Means for Solving the Problems]
The above object is achieved by a vector processing apparatus comprising a main storage; an arithmetic unit module having a plurality of arithmetic units that operate on data; and a vector register module having RAM into which data is written from the main storage and the arithmetic units, and from which data to be operated on by the arithmetic units or stored in the main storage is read. The RAM of the vector register module consists of an odd-element RAM bank and an even-element RAM bank that can be written and read at a clock rate twice the basic machine cycle. The vector register module has: a first pitch conversion circuit that takes two data streams input to the vector register module, each at the basic machine cycle and offset from each other by 1/2 cycle, and outputs them as a single data stream at twice the basic machine cycle; a write control circuit that, using the basic machine cycle and the basic machine cycle shifted by 1/2 phase, controls the write addresses and timing for writing the double-rate data stream from the first pitch conversion circuit into the odd-element RAM bank or the even-element RAM bank; a read control circuit that, using the basic machine cycle and the basic machine cycle shifted by 1/2 phase, controls the read addresses and timing for reading from the odd-element RAM bank and the even-element RAM bank as a single data stream at twice the basic machine cycle; and a second pitch conversion circuit that outputs from the vector register module the double-rate data stream read from the two RAM banks as two data streams, each at the basic machine cycle and offset from each other by 1/2 cycle. The RAM banks, the write control circuit and the read control circuit are built in one semiconductor chip, and the first pitch conversion circuit and the second pitch conversion circuit are each built as one semiconductor chip.

[Operation]
The vector register consists of two independently addressable banks, and the write control signal that generates write addresses and the read control signal that generates read addresses differ in phase by 1/2 period at the machine-cycle clock rate. Write addresses and read addresses can therefore be supplied to each bank at twice the machine-cycle clock rate, and the vector register can be written and read at twice the machine-cycle clock rate. The vector data read from the vector register, which switches at twice the machine-cycle rate, is turned by the output switch-matrix logic into two outputs, one of even elements and one of odd elements, each switching at the machine-cycle clock rate and offset from the other by 1/2 period. The output of the module containing the vector register therefore keeps its electrical stability when passing through the module pins, unlike vector data that switches at twice the machine-cycle rate and is output directly. Likewise, the double-rate vector data written to the vector register is assembled by switch-matrix logic at the register's input, so electrical stability at the module's input, including the passage through its pins, is not degraded as it would be if double-rate vector data were input directly.

[Embodiment]
FIG. 1 schematically shows the overall system configuration of the invention. It consists of vector registers 101 (VR0 to VR31), switch-matrix logic (DIST) 102, switch-matrix logic (SEL) 103, pipelined arithmetic units 106, a vector load pipeline 110, a vector store pipeline 111, and MS (main storage) 109.
Each vector register 101 comprises an A-bank RAM 125 that holds the even elements of the vector data and a B-bank RAM 126 that holds the odd elements; a WA counter 121 that generates write addresses for the two bank RAMs and an RA counter 122 that likewise generates read addresses; a selector 123 for the A-bank RAM 125 and a selector 124 for the B-bank RAM 126, which under a pitch control circuit 127 distribute the addresses from the two counters at a pitch of twice the machine-cycle rate; and a selector 128 that under the pitch control circuit 127 selects the data output from each bank at a pitch of twice the machine-cycle rate. Each vector register can hold 128 vector elements. A write control signal 113 from a write control circuit 112 and a read control signal 116 from a read control circuit 115 are output to each vector register 101 with a 1/2-period phase offset at the machine-cycle clock rate, and during operation the vector registers 101 can be controlled in parallel by instructions. The vector register 101 of the invention is described in detail later.

In DIST 102, a selector 118 that selects the even elements of vector data and a selector 119 that selects the odd elements are arranged to choose between vector data from the result output paths 108, 108-0, 108-1 of the pipelined arithmetic units 106 and vector data that was stored in MS 109 and arrives through the vector load pipeline. Selectors 118 and 119 operate with a 1/2-period phase offset at the machine-cycle clock rate. Although not shown in FIG. 1, one pair is provided for each vector register 101, 32 pairs in all, and the pairs can operate in parallel. During operation, the pair of selectors 118 and 119 that corresponds to the vector register 101 named by the instruction is chosen by the vector register select signal 114, which the write control circuit 112 outputs under instruction control, and by the pitch selection from the pitch control circuit 120. The pair selects the first half period of the even and odd elements of the vector data, which arrive at the machine-cycle clock rate offset from each other by 1/2 period, and the OR of the two selector outputs is output on the write data path 104 of that vector register 101.

SEL 103 receives the 32 read data paths 105 output from the vector registers 101 and driven at twice the machine-cycle rate. It has a selector 129 that selects the even elements of the read vector data and a selector 130 that selects the odd elements, the two operating with a 1/2-period phase offset at the machine-cycle clock rate. Although not shown in FIG. 1, a pair of selectors 129 and 130 is provided for each of the output paths 107, 107-0, 107-1 to the four pipelined arithmetic units 106 and for the vector store pipeline 111 used to store vector data to MS 109, and the pairs can operate in parallel. During operation, the vector register select signal 117, which the read control circuit 115 outputs under instruction control, connects the read data path 105 of the vector register 101 named by the instruction to the pair of selectors 129 and 130 that serves the output path to the pipelined arithmetic unit 106 or vector store pipeline 111 named by the instruction. Selector 129 takes the even elements of the vector data and selector 130 the odd elements, each driven at the machine-cycle clock rate. For example, even elements are output on path 107-0 and odd elements on path 107-1.
Functional modules such as the pipelined arithmetic units 106 likewise have separate paths for the even and odd elements of vector data. Pipelined arithmetic unit 3, for example, has an even-element input path 107-0, an odd-element input path 107-1, an even-element output path 108-0 and an odd-element output path 108-1. Each even-element path and odd-element path is driven at the machine-cycle clock rate, and the two differ in phase by 1/2 period of that clock.

The overall processing of the vector processing apparatus of FIG. 1 is the same as that of the conventional apparatus of FIG. 7 and of Japanese Unexamined Patent Publication No. 58-114274, so its description is omitted. The detailed configuration of the vector register 101 is shown in FIG. 2 and its operation in FIG. 5. The configuration and operation of the data system of the vector register module (VR module), which consists of DIST 102, the vector registers 101 and SEL 103 of FIG. 1, are shown in FIGS. 3 and 6 and are described later.

FIG. 4 outlines the packaging of the vector processing apparatus of FIG. 1. The VR module 201 logically consists of DIST 102, the vector registers 101 (VR0 to VR31) and SEL 103. Physically, DIST 102 and SEL 103 are random-logic semiconductor chips, and the vector registers 101 are semiconductor chips that combine ultra-fast RAM with random logic. The VR module 201 also contains the write control circuit 112 and the read control circuit 115, which are not shown in FIG. 4.
Furthermore, the four pipeline arithmetic units 106 are likewise built from multiple semiconductor chips inside the arithmetic unit module 202, and the arithmetic unit module 202 and the VR module 201 are mounted on the vector processing card 200 through connection pins. The vector load pipeline 110, the vector store pipeline 111, and the MS 109 are mounted on other vector processing cards and are not shown in FIG. 4. Of the vector data paths in FIG. 4, the write data path 104 and the read data path 105, which are driven at twice the machine cycle clock speed, are physically closed within the VR module 201. By contrast, the operation result output paths 108, 108-0, and 108-1 and the vector data input paths 107, 107-0, and 107-1, which are driven at the machine cycle clock speed, pass through connection pins twice and travel over the wiring on the vector processing card 200. Mounting the vector processing device of FIG. 1 as shown in FIG. 4 therefore confines the vector data signals driven at twice the machine cycle speed to the small, physically closed space inside the VR module. In addition, the vector data exchanged between modules is carried by signals that switch at the machine cycle clock speed with a 1/2-cycle phase difference. Because no single connection pin has to switch at twice the clock speed of the conventional machine cycle, electrical stability is obtained at the connection pins, where impedance mismatch arises.

Vector register

FIG. 2 shows the details of one vector register, 101-0, of the 32 vector registers 101 (VR0 to VR31). FIG. 5 is a timing chart for explaining the operation of the vector register 101-0 of FIG. 2.

(1) Clock

Of the clocks input to the vector register 101-0, the T01 clock in FIG. 2 runs at twice the machine cycle speed, and the T0 and T1 clocks are machine cycle clocks that differ in phase from each other by 1/2 cycle.

(2) Pitch control circuit 127

The pitch control circuit 127 consists of a flip-flop PIKOE 127-0 driven by the T1 clock, a flip-flop PIKOL 127-1 driven by the T0 clock, an EOR gate 127-2 that takes the exclusive OR of the outputs of the two flip-flops, and a flip-flop RDPTCH 127-4 that receives the pitch signal 127-3 output from the EOR gate 127-2 and is driven by the T01 clock. The PIKO signal 144 supplies a clock-synchronized signal whose period is twice the machine cycle. This signal is held in the flip-flop PIKOL 127-1 and, with a phase difference of 1/2 machine cycle, in the flip-flop PIKOE 127-0. Taking the exclusive OR of the two flip-flop outputs yields the pitch signal 127-3 shown in FIG. 5, which is synchronized with the T01 clock and becomes "1" at the T0 clock and "0" at the T1 clock.

(3) WA counter 121

The WA counter 121, which generates the RAM write address, consists of a flip-flop WINC 121-0 driven by the T0 clock, a +1 circuit 121-1, and a 6-bit address register WAC 121-2 driven by the T0 clock. Although not shown, the WA counter 121 also has a mechanism for clearing the address register WAC 121-2. In operation, as the WINC 121-0 signal in FIG. 5 shows, the address data is counted up by the write control signal 113 output from the write control circuit 112, and the address data set in the address register WAC 121-2 is output as the WA counter address data 121-3.

(4) RA counter 122

The RA counter 122, which generates the RAM read address, consists of a flip-flop RINC 122-0 driven by the T1 clock, a +1 circuit 122-1, and a 6-bit address register RAC 122-2 driven by the T1 clock. Although not shown, the RA counter 122 also has a mechanism for clearing the address register RAC 122-2. In operation, as the RINC 122-0 signal in FIG. 5 shows, the address data is counted up by the read control signal 116 output from the read control circuit 115, and the address data set in the address register RAC 122-2 is output as the RA counter address data 122-3.

(5) Selector 123

The selector 123 selects the address data for the A bank RAM 125. As shown in FIG. 5, it selects the WA counter address data 121-3 when the pitch signal EOR 127-3 is "1" and the RA counter address data 122-3 when the pitch signal EOR 127-3 is "0". The output of the selector 123 is input to the 6-bit A bank address register AAD 131, which is driven by the T01 clock, and the resulting A bank RAM address data signal 131-0 is input to the A bank RAM 125.

(6) Selector 124

The selector 124 selects the address data for the B bank RAM 126. As shown in FIG. 5, it selects the WA counter address data 121-3 when the pitch signal EOR 127-3 is "0" and the RA counter address data 122-3 when the pitch signal EOR 127-3 is "1". The output of the selector 124 is input to the 6-bit B bank address register BAD 132, which is driven by the T01 clock, and the resulting B bank RAM address data signal 132-0 is input to the B bank RAM 126.
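The complementary selection rule of the selectors 123 and 124 can be sketched in a few lines of Python. This is only an illustrative model: the function name `select_bank_addresses` and its tuple return value are assumptions made for the example and do not appear in the patent.

```python
def select_bank_addresses(pitch, wa_addr, ra_addr):
    """Model selectors 123 (A bank) and 124 (B bank).

    pitch   : value of the pitch signal EOR 127-3 (0 or 1)
    wa_addr : WA counter address data 121-3
    ra_addr : RA counter address data 122-3
    Returns (a_bank_address, b_bank_address), i.e. the values that
    would be presented to AAD 131 and BAD 132 (hypothetical model).
    """
    if pitch not in (0, 1):
        raise ValueError("pitch must be 0 or 1")
    a_addr = wa_addr if pitch == 1 else ra_addr  # selector 123
    b_addr = ra_addr if pitch == 1 else wa_addr  # selector 124
    return a_addr, b_addr
```

Under this model, whenever one bank is offered the write address the other is offered the read address, which is what lets the two banks serve a write and a read in the same half cycle.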
(7) Write data

The write data is input from the write data path 104 to the register WTDATA 133, which is driven by the T01 clock. The output of the register WTDATA 133 is then supplied through the DI path 133-0 to both the A bank RAM 125 and the B bank RAM 126.

(8) WE control circuit

A WE control circuit is provided in each vector register 101 and is controlled by the write control circuit 112 so that the vector registers 101 can operate in parallel. The WE control circuit consists of a flip-flop WEF 134 driven by the T0 clock; a flip-flop WES 135 driven by the T1 clock; selectors 136 and 137; the write-mode flip-flop WTMDA 138 for the A bank RAM and the write-mode flip-flop WTMDB 139 for the B bank RAM, driven by the T01 clock; a write pulse generator 140, which delays the rising edge of the T01 clock to adjust the setup time, pulse width, and hold time of the RAM WE signal; and AND gates 141 and 142, which AND each write mode with the output pulse of the write pulse generator 140. In operation, as shown in FIG. 5, the selector 136 selects the output of the flip-flop WEF 134 when the pitch signal EOR 127-3 is "1", and the selector 137 selects the output of the flip-flop WES 135 when the pitch signal is "0". In other words, the write control signal 113-0 is issued for writes to the A bank RAM 125, which holds the even elements of all vector data, and the write control signal 113-1 is issued for writes to the B bank RAM 126, which holds the odd elements.

(9) Read data

In operation, the selector 128 selects the data output 125-0 of the A bank RAM 125 when the A bank address register AAD 131 holds read address data, and the data output 126-0 of the B bank RAM 126 when the B bank address register BAD 132 holds read address data. This steering is performed by the output signal 127-5 of the flip-flop RDPTCH 127-4 of the pitch control circuit 127. The output of the selector 128 passes through the data register RDDATA 143, which is driven by the T01 clock, and is output to the read data path 105.

(10) Register RAM

Two ultra-high-speed RAMs with the same address and data dimensions are arranged to hold the vector data elements. The A bank RAM 125, which holds the even elements of all vector data, is addressed by the output 131-0 of the A bank address register AAD 131. The B bank RAM 126, which holds the odd elements of the vector data, is addressed by the output 132-0 of the B bank address register BAD 132.

Next, an outline of the overall operation of the vector register 101-0 shown in FIG. 2 is described with reference to FIG. 5. FIG. 5 represents chaining, in which vector data is written to and read from the vector register 101-0 at the same time. The number of vector data elements is 6, and in order they are denoted e₀, e₁, e₂, e₃, e₄, e₅.

Writing begins at time t₀, when the clear signal W0 of the WA counter 121 is issued to the flip-flop WINC 121-0 of the WA counter 121. Because the selector 123 selects W0 while the pitch signal EOR 127-3 is "1", W0 has a time width of t₀ to t₁ when it is input to the A bank address register AAD 131, and the register output is applied to the A bank RAM 125 as the address AW0 from time t₁ to time t₂. Also at time t₀, the write signal WT0 for writing to the A bank RAM 125 is input to the flip-flop WEF 134. Because the selector 136 selects it while EOR 127-3 is "1", it is input to the flip-flop WTMDA 138 with a time width of t₀ to t₁. At the output of the flip-flop WTMDA 138, WT0 is valid from time t₁ to time t₂; the AND gate 141 ANDs it with the output pulse of the write pulse generator 140, and the result is applied as the WE of the A bank RAM 125 during t₁ to t₂. The write vector data e₀ is input to the register WTDATA 133 at time t₁, and the register output is valid during t₁ to t₂. That is, e₀, the first of the even elements of the vector data, is written to the A bank RAM 125 during t₁ to t₂.

On the B bank side, the selector 124 selects W0 while EOR 127-3 is "0", so W0 has a time width of t₁ to t₂ when it is input to the B bank address register BAD 132, and the register output is applied to the B bank RAM 126 as the address BW0 from time t₂ to time t₃. At time t₁, the write signal WT1 for writing to the B bank RAM 126 is input to the flip-flop WES 135. Because the selector 137 selects it while EOR 127-3 is "0", it is input to the flip-flop WTMDB 139 with a time width of t₁ to t₂.
At the output of the flip-flop WTMDB 139, WT1 is valid from time t₂ to time t₃; the AND gate 142 ANDs it with the output pulse of the write pulse generator 140, and the result is applied as the WE of the B bank RAM 126 during t₂ to t₃. The write vector data e₁ is input to the register WTDATA 133 at time t₂, and the register output is valid during t₂ to t₃. Thus e₁, the first of the odd elements of the vector data, is written to the B bank RAM 126 during t₂ to t₃.

Likewise, for the vector data e₂, e₃, e₄, e₅, the count-up signals W1 and W2 of the WA counter 121 are input to the flip-flop WINC 121-0 of the WA counter 121 and become the addresses AW1 and AW2 of the A bank RAM 125 and the addresses BW1 and BW2 of the B bank RAM 126. The WE signals for writing e₂, e₃, e₄, e₅ come from the write signals WT2, WT3, WT4, WT5. Writing the element as eₙ and its write signal as WTn, WTn is input to the flip-flop WEF 134 (n = 2, 4) or to the flip-flop WES 135 (n = 3, 5) one time step before eₙ is input to the register WTDATA 133, and e₂, e₃, e₄, e₅ are written in the same way.

Reading of the vector data e₀, e₁, e₂, e₃, e₄, e₅ begins at time t₁, when the clear signal R0 of the RA counter 122 is issued to the flip-flop RINC 122-0 of the RA counter 122. Because the selector 123 selects R0 while EOR 127-3 is "0", R0 is valid during t₁ to t₂ when it is input to the A bank address register AAD 131, and the register output is applied to the A bank RAM 125 as the address AR0 from time t₂ to time t₃. When the output of the flip-flop RDPTCH 127-4 is "0", the selector 128 selects the data output 125-0 of the A bank RAM 125. The vector data e₀ corresponding to the address AR0, which is applied to the A bank RAM 125 during t₂ to t₃, is therefore output, input to the register RDDATA 143, and output to the read data path 105 from time t₃ to time t₄.

On the B bank side, the selector 124 selects R0 while EOR 127-3 is "1", so R0 is valid during t₂ to t₃ when it is input to the B bank address register BAD 132, and the register output is applied to the B bank RAM 126 as the address BR0 from time t₃ to time t₄. When the output of the flip-flop RDPTCH 127-4 is "1", the selector 128 selects the data output 126-0 of the B bank RAM 126. The vector data e₁ corresponding to the address BR0, which is applied to the B bank RAM 126 during t₃ to t₄, is therefore output, input to the register RDDATA 143, and output to the read data path 105 from time t₄ to time t₅.

Similarly, to read the vector data e₂, e₃, e₄, e₅, the count-up signals R1 and R2 of the RA counter 122 are input to the flip-flop RINC 122-0 of the RA counter 122 and become the addresses AR1 and AR2 of the A bank RAM 125 and the addresses BR1 and BR2 of the B bank RAM 126. As shown in FIG. 5, the data is output to the read data path 105 through the data register RDDATA 143. The vector register 101-0 can therefore write and read vector data simultaneously at a pitch twice the machine cycle speed.
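The FIG. 5 schedule described above can be replayed with a small behavioral model. The sketch below is an illustration under stated assumptions, not the patent's circuit: slot k stands for the half machine cycle from t_k to t_(k+1), the function name `simulate_chaining` and the dictionary-based banks are hypothetical, and the slot offsets (write of element n in slot n+1, RAM read in slot n+2, appearance on the read data path in slot n+3) are taken from the times quoted in the text for e₀ and e₁.

```python
_EMPTY = object()  # sentinel meaning "RDDATA holds nothing"

def simulate_chaining(elements):
    """Replay the described write/read schedule on a two-bank model.

    Returns a list of (slot, element) pairs seen on the read data path.
    """
    banks = {"A": {}, "B": {}}   # A bank: even elements, B bank: odd elements
    n = len(elements)
    rddata = _EMPTY              # modeled contents of register RDDATA 143
    read_path = []
    for slot in range(n + 3):
        if rddata is not _EMPTY:
            # Value latched at the end of the previous slot is now visible.
            read_path.append((slot, rddata))
        w = slot - 1             # index of the element written in this slot
        r = slot - 2             # index of the element read in this slot
        write_bank = read_bank = None
        if 0 <= w < n:
            write_bank = "A" if w % 2 == 0 else "B"
            banks[write_bank][w // 2] = elements[w]
        rddata = _EMPTY
        if 0 <= r < n:
            read_bank = "A" if r % 2 == 0 else "B"
            rddata = banks[read_bank][r // 2]
        if write_bank is not None and write_bank == read_bank:
            raise RuntimeError("both accesses hit the same bank")
    return read_path
```

In this model the write and the read of any slot always fall on different banks, so the conflict check never fires; that is the property that allows each bank to be accessed at the double-speed pitch while alternating between write and read.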
VR module

FIG. 3 is a schematic diagram of the data-system configuration of the VR module, and FIG. 6 is a timing chart for explaining the operation of FIG. 3. FIG. 3 shows the VR module of FIG. 4, which consists of the DIST 102, the vector register 101, and the SEL 103. The DIST 102 is as described above. In detail, its pitch control circuit 120 is similar to the pitch control circuit of the vector register 101-0: it consists of a flip-flop DPIKOE 120-0 driven by the T1 clock, a flip-flop DPIKOL 120-1 driven by the T0 clock, and an EOR gate 120-2 that takes the exclusive OR of the outputs of the two flip-flops and outputs the pitch signal 120-3, and its operation is the same as that of the pitch control circuit 127.
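The way two flip-flops clocked half a cycle apart produce a pitch signal through an exclusive OR can be illustrated with a short simulation. This is a minimal sketch under assumptions that the text does not state: both flip-flops start at 0, the input toggles once per machine cycle starting at 1, and each flip-flop samples that input directly. The function name `pitch_signal` is hypothetical.

```python
def pitch_signal(num_cycles):
    """Return the modeled pitch signal for each half machine cycle.

    Even list indices are half cycles that start at a T0 edge,
    odd indices are half cycles that start at a T1 edge.
    """
    late = 0    # flip-flop clocked by T0 (PIKOL / DPIKOL), assumed reset to 0
    early = 0   # flip-flop clocked by T1 (PIKOE / DPIKOE), assumed reset to 0
    trace = []
    for cycle in range(num_cycles):
        piko = (cycle + 1) % 2       # input with a period of two machine cycles
        late = piko                  # T0 edge: first flip-flop captures the input
        trace.append(late ^ early)   # EOR gate output during the T0 half
        early = piko                 # T1 edge: second flip-flop catches up
        trace.append(late ^ early)   # EOR gate output during the T1 half
    return trace
```

Under these assumptions the two flip-flops disagree only in the half cycle after each T0 edge, so the exclusive OR is "1" in the T0 half and "0" in the T1 half of every machine cycle.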
In detail, registers 145 and 146 driven by the T0 clock are provided at the inputs of the selector 118, which selects the even elements of the vector data, and registers 147 and 148 driven by the T1 clock are provided at the inputs of the selector 119, which selects the odd elements of the vector data. The outputs of the pair of selectors 118 and 119 are combined by the OR gate 149.
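The selector pair and OR gate of the DIST can be modeled as a merge of two machine-cycle-rate streams into one double-rate stream. The following sketch is illustrative only. It assumes that an unselected selector drives all zeros so that the OR gate passes the selected word unchanged (the text does not state this), that words are non-negative integers, and that the pitch signal is "1" in the T0 half and "0" in the T1 half of each machine cycle. The function name `dist_merge` is hypothetical.

```python
def dist_merge(even_stream, odd_stream):
    """Merge even-element and odd-element streams into one double-rate stream.

    even_stream[c] models the word held in a T0-driven register in cycle c,
    odd_stream[c] the word held in a T1-driven register half a cycle later.
    """
    if len(even_stream) != len(odd_stream):
        raise ValueError("streams must have equal length")
    merged = []
    for even_word, odd_word in zip(even_stream, odd_stream):
        for pitch in (1, 0):                           # T0 half, then T1 half
            sel_even = even_word if pitch == 1 else 0  # selector 118 (assumed zero when idle)
            sel_odd = odd_word if pitch == 0 else 0    # selector 119 (assumed zero when idle)
            merged.append(sel_even | sel_odd)          # OR gate 149
    return merged
```

For example, `dist_merge([0xA0, 0xA2], [0xB1, 0xB3])` interleaves the two inputs element by element, which corresponds to the even and odd elements alternating on a single path at twice the machine cycle rate.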
In the SEL 103, likewise in detail, a register 150 driven by the T0 clock is provided at the output of the selector 129, which selects the even elements of the vector data, and a register 151 driven by the T1 clock is provided at the output of the selector 130, which selects the odd elements of the vector data. For the purpose of explaining FIG. 6, FIG. 3 names the following registers. In the DIST 102, the register on the even-element output path 108-0 for the operation results of the pipeline arithmetic unit 106 is DA3F 145, and the register on the odd-element output path 108-1 is DA3S 147. In the SEL 103, the register on the even-element path 107-0 to the pipeline arithmetic unit 3 (106) is SA3F 150, and the register on the odd-element path 107-1 is SA3S 151. The other registers and paths are not shown in detail.

FIG. 6 is a timing chart showing how the VR module 201 configured as in FIG. 3 performs chaining with the pipeline arithmetic unit 106 for the vector data e₀, e₁, e₂, e₃, e₄, e₅ described in the section on the operation of the vector register 101-0. In FIG. 6, e₀, the first of the even elements of the vector data, is input from the pipeline arithmetic unit 106 over the path 108-0 to the register DA3F 145 at time t₀; e₀ is valid during t₀ to t₂ and is input to the selector 118. Because the selector 118 selects the output of the register DA3F 145 when the pitch signal 120-3 is "1", e₀ is valid during t₀ to t₁ and is output to the OR gate 149. Meanwhile e₁, the first of the odd elements of the vector data, is input over the path 108-1 to the register DA3S 147 at time t₁; e₁ is valid during t₁ to t₃ and is input to the selector 119. Because the selector 119 selects the output of the register DA3S 147 when the pitch signal 120-3 is "0", e₁ is valid during t₁ to t₂ and is output to the OR gate 149. The same procedure is repeated for e₂, e₃, e₄, e₅. In this way, the even and odd elements of the vector data, which are sent to the VR module 201 at the machine cycle pitch, are converted into a single vector data sequence that switches at twice the machine cycle speed and are input to the register WTDATA 133 of the vector register 101. Although this is not shown or described in FIG. 3, the selectors 118 and 119 are assumed to be those for which the instruction has selected VR0 of the vector registers 101. The vector data e₀, e₁, e₂, e₃, e₄, e₅ is then held in and read from the vector register 101, as detailed in the section on the operation of the vector register 101-0.

The vector data e₀, e₁, e₂, e₃, e₄, e₅ output from the register RDDATA 143 switches at a pitch twice the machine cycle speed. However, because the registers SA3F 150 and SA3S 151 of the SEL 103 are driven by the T0 clock and the T1 clock respectively, the data is output, as shown in FIG. 6, as even-element vector data and odd-element vector data that switch at the machine cycle pitch, from the paths 107-0 and 107-1, with a phase difference of 1/2 cycle at the machine cycle clock speed. With the logical configuration of the DIST 102 and the SEL 103 shown in FIG. 3, the vector register 101-0 shown in FIG. 2 can therefore be used within the small, physically closed space of the VR module 201, and the vector data signals that switch at twice the machine cycle pitch can be confined to the VR module 201. Electrical stability inside the VR module is also obtained. Within that small space, each vector register needs only one write data path and one read data path, which also reduces the amount of hardware in the VR module. Furthermore, because the signals input and output between the VR module 201 and the vector processing card also have a 1/2-cycle phase difference at the machine cycle clock speed, there is the advantage of stability against the electrical noise caused by simultaneous switching of signals.

〔Effect of the invention〕

According to the present invention, vector data signals that switch at twice the machine cycle speed can be confined to a physically restricted location. The vector register of the present invention can also process data at twice the machine cycle speed.
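The conversion performed on the SEL side, from one double-rate stream back to two machine-cycle-rate streams offset by half a cycle, can be sketched as follows. This is an illustrative model, not the patent's logic. It assumes each output register captures a word at the clock edge that ends the half-cycle slot in which the word is on the read data path, and it takes the default starting slot of 3 from the description of e₀ appearing on the read data path from t₃. The function name `sel_split` and the `(edge, word)` tuples are hypothetical.

```python
def sel_split(read_path, first_slot=3):
    """Split a double-rate stream into even and odd machine-cycle streams.

    read_path  : words in the order they appear on the read data path
    first_slot : half-cycle slot index in which the first word appears
    Returns (even_path, odd_path), each a list of (edge, word) pairs,
    where edge is the index of the clock edge that captures the word.
    """
    even_path, odd_path = [], []
    for i, word in enumerate(read_path):
        edge = first_slot + i + 1            # edge at the end of the word's slot
        if i % 2 == 0:
            even_path.append((edge, word))   # T0-driven register (SA3F 150)
        else:
            odd_path.append((edge, word))    # T1-driven register (SA3S 151)
    return even_path, odd_path
```

With the default starting slot, successive captures on each output path are two half cycles apart, that is, one machine cycle, and the two paths are offset from each other by one half cycle, matching the 1/2-cycle phase difference described for the paths 107-0 and 107-1.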

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing the overall system configuration of the present invention. FIG. 2 is a diagram showing the detailed configuration of the vector register of FIG. 1. FIG. 3 is a diagram showing the data-system configuration of the VR module. FIG. 4 is a diagram showing the mounting configuration of the vector processing device. FIG. 5 is a timing chart for explaining the operation of the vector register. FIG. 6 is a timing chart for explaining the data-system operation of the VR module. FIG. 7 is a diagram showing a conventional example.

101 to 101-0: vector register; 102: DIST; 103: SEL; 106: pipeline arithmetic unit; 109: MS; 110: vector load pipeline; 111: vector store pipeline; 112: write control circuit; 115: read control circuit; 118 to 119: selector; 120: pitch control circuit; 121 to 121-3: WA counter; 122 to 122-3: RA counter; 123 to 124: selector; 125: A bank RAM; 126: B bank RAM; 127 to 127-4: pitch control circuit; 128: selector; 131: A bank address register AAD; 132: B bank address register BAD; 133: data register WTDATA; 134 to 142: WE control circuit; 143: data register RDDATA; 200: vector processing card; 201: VR module; 202: arithmetic unit module.

Claims

(57) [Claims] A vector processing device comprising a main memory, an arithmetic unit module having a plurality of arithmetic units that operate on data, and a vector register module having a RAM into which data from the main memory and from the arithmetic units is written and from which data to be stored in the main memory or operated on by the arithmetic units is read, wherein the RAM of the vector register module is composed of an odd-element RAM bank and an even-element RAM bank that can be written and read at a clock speed twice that of a basic machine cycle, and the vector register module comprises: a first pitch conversion circuit that outputs two data streams, which are input to the vector register module at the basic machine cycle and are shifted in phase from each other by 1/2 cycle, as one data stream at twice the speed of the basic machine cycle; a write control circuit that, using the basic machine cycle and the basic machine cycle shifted in phase by 1/2, controls the write address and timing for writing the one double-speed data stream from the first pitch conversion circuit into the odd-element RAM bank or the even-element RAM bank; a read control circuit that, using the basic machine cycle and the basic machine cycle shifted in phase by 1/2, controls the read address and timing for reading from the odd-element RAM bank and the even-element RAM bank as one data stream at twice the speed of the basic machine cycle; and a second pitch conversion circuit that outputs the one double-speed data stream read from the odd-element RAM bank and the even-element RAM bank from the vector register module as two data streams at the basic machine cycle that are shifted in phase from each other by 1/2 cycle, and wherein the RAM banks, the write control circuit, and the read control circuit are formed in one semiconductor chip, and the first pitch conversion circuit and the second pitch conversion circuit are each formed in one semiconductor chip.