JPS6073786A

JPS6073786A - Vector data processing device

Info

Publication number: JPS6073786A
Application number: JP18116283A
Authority: JP
Inventors: Shoji Nakatani; 中谷　彰二; Yuji Oinaga; 勇次追永
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-09-29
Filing date: 1983-09-29
Publication date: 1985-04-25

Abstract

PURPOSE: To execute load instructions without increasing the amount of hardware, by having one of the access pipelines access the main memory device when the access is discontinuous, and by letting the access pipelines operate independently when the accessed data is continuous. CONSTITUTION: Suppose the sequence of vector load instructions (VLD) is (1) VLD (data continuous), (2) VLD (ditto), (3) VLD (data discontinuous), (4) VLD (data continuous) and (5) VLD (ditto). The addresses needed to execute instruction (1) are generated by address pipeline 17-A, and at the same time the addresses needed to execute instruction (2) are generated by address pipeline 17-B. After both have finished, the addresses needed to execute instruction (3) are generated by address pipelines 17-A to 17-D. After that, the addresses needed to execute instruction (4) are generated by pipeline 17-A and the addresses needed to execute instruction (5) are generated by pipeline 17-B.

Description

[Detailed description of the invention]

[Technical Field] The present invention relates to a vector data processing device in which one address generation control unit controls one address generation unit when block access to the main memory is performed, and one address generation control unit controls a plurality of address generation units when any access other than block access to the main memory is performed.

[Prior Art and Problems] A vector data processing device has vector registers, vector arithmetic units, two access pipelines, an instruction processing unit, and so on. The vector arithmetic pipeline performs double-precision, single-precision, and fixed-point operations in the same way as a scalar arithmetic unit. Double-precision, single-precision, and fixed-point data are defined on the vector registers in the formats shown in FIG. 1. Transfers from the main memory to a vector register, and writes from a vector register to the main memory, are therefore performed in units of 4B (bytes) or 8B. In addition, the load/store transfer rate must match the rate of computation. That is, as shown in FIG. 2, vector-loaded data is chained to the vector arithmetic pipeline while it is being written into the vector register, and, on the premise that the arithmetic unit operates at 1τ per four operations, the throughput of the access pipeline is also set to 1τ per four operations. Because accesses through the main memory control unit handle 4B or 8B instructions, block access to the main memory is possible when the distance is 4B or 8B for 4B data, and when the distance is 8B for 8B data. In these cases,

throughput with the main memory ≥ throughput with the vector registers

holds, and a throughput commensurate with the computation speed is obtained. When the distance is large, satisfying the above throughput would require four address pipelines for each of the two access pipelines, eight address pipelines in total. Providing eight address pipelines, however, increases the amount of hardware.

[Object of the Invention] The present invention is based on the above considerations. Its object is to provide a vector data processing device that can execute vector load instructions, vector store instructions, and the like for which block access is not possible, without increasing the amount of hardware and without appreciably lowering throughput.

[Constitution of the Invention] To that end, the vector data processing device of the present invention has a plurality of independently operable access pipelines, each of which transfers data between the main memory and the vector registers by means of one or more data buses, an address generation unit, and an address bus. It is characterized in that, when data on the main memory is referenced, for accesses in which the referenced data is continuous each access pipeline independently manages its address generation unit and address bus and issues accesses to the main memory, and for accesses in which the data is discontinuous any one of the plurality of access pipelines manages the address generation units and address buses and accesses the main memory.

[Embodiments of the Invention] The present invention is described below with reference to the drawings. FIG. 3 shows an overview of a vector data processing device to which the present invention is applied, FIG. 4 shows the configuration of one embodiment of an access pipeline, FIG. 5 shows the configuration of one embodiment of the address section of an access pipeline, FIG. 6 shows the configuration of one embodiment of the address generation unit, FIG. 7 shows the configuration of one embodiment of the main part of the address generation control unit, and FIG. 8 is a diagram explaining the present invention in detail.

In FIG. 3, 1 is the main memory, 2 is the main memory control unit, 3 is the instruction processing unit, 4-A and 4-B are access pipelines, and 5 is a vector register. The main memory 1 is of a [...] configuration and is capable of block access. The main memory control unit 2 mediates data transfers between the main memory 1 and the devices that request access. The instruction processing unit 3 issues instructions to the access pipelines and to the vector arithmetic units (not shown).
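
The block-access condition and the pipeline count given in the prior-art discussion can be sketched in Python as follows. This is only an illustrative reading of the text: the function names and the `elements_per_cycle` parameter are hypothetical and do not appear in the patent.

```python
def block_access_possible(element_bytes: int, distance_bytes: int) -> bool:
    """Block-access condition as stated in the prior-art discussion.

    4B data: block access when the distance is 4B or 8B.
    8B data: block access when the distance is 8B.
    """
    if element_bytes == 4:
        return distance_bytes in (4, 8)
    if element_bytes == 8:
        return distance_bytes == 8
    raise ValueError("the text only discusses 4B and 8B data")


def address_pipelines_needed(block_access: bool, elements_per_cycle: int = 4) -> int:
    """Address pipelines one access pipeline needs to keep up with the
    arithmetic unit (assumed here to consume four elements per tau)."""
    # One block request covers a whole group of elements; otherwise each
    # element of the group needs its own address.
    return 1 if block_access else elements_per_cycle


# Conventional design: both access pipelines must cope with large distances.
conventional_total = 2 * address_pipelines_needed(block_access=False)
print(conventional_total)  # 8 address pipelines, the hardware cost the invention avoids
```

Under these assumptions the conventional design needs eight address pipelines, whereas the embodiment described below manages with four by lending all of them to a single access pipeline when the data is discontinuous.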

When issuing an instruction to access pipeline 4-A or 4-B, the instruction processing unit 3 sends a start signal, the vector length, the base address, the distance, and the opcode to the access pipeline. Access pipeline 4-A stores vector data into the main memory and loads vector data from the main memory. It reads data from the vector registers, writes data into the vector registers, performs alignment, generates addresses, and sends requests, request addresses, and opcodes to the main memory control unit. Access pipeline 4-B has the same functions as 4-A. The vector register 5 stores vector data consisting of a plurality of elements. Only one vector register is shown in the figure, but in practice there are several. Elements can be read from a vector register four at a time, and written into a vector register four at a time.

FIG. 4 shows the configuration of one embodiment of the access pipeline. In FIG. 4, 6 is a priority control circuit, 7 is an address section, 8 is an align control section, 9 is an align circuit, 10 is an ECC circuit, and 11 and 12 are control signal lines. The main memory control unit 2 contains the priority control circuit 6, which selects one main memory access request according to priority when access requests conflict. Four data buses DB run between the main memory control unit 2 and access pipeline 4-A, and likewise four data buses DB run between the main memory control unit 2 and access pipeline 4-B. Each data bus is 8B wide. Access pipeline 4-A has the address section 7, the align control section 8, and the align circuit 9. To access the main memory 1, the address section 7 sends a request, an opcode, and a request address to the main memory control unit 2 over control signal line 11. The align control section 8 controls the align circuit 9 according to control signals from the main memory control unit 2. Four data buses (each 8B wide) run between the align circuit 9 and the vector register 5.
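
As a rough illustration, the parameters that the instruction processing unit 3 hands to an access pipeline, and the width of the data path described for FIG. 4, might be modelled as below. The class name, field names, and constants are assumptions made for this sketch; the patent only names the signals (start signal, vector length, base address, distance, opcode) and the bus counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessCommand:
    """Hypothetical record of what unit 3 sends to access pipeline 4-A or 4-B."""
    start: bool          # start signal (STA or STB)
    vector_length: int   # VL, number of elements
    base_address: int    # BA
    distance: int        # D, in bytes between consecutive elements
    opcode: str          # OPC, e.g. "VLD" for a vector load


DATA_BUSES_PER_PIPELINE = 4  # four data buses DB per access pipeline
BUS_WIDTH_BYTES = 8          # each data bus is 8B wide


def bytes_per_transfer() -> int:
    """Bytes one access pipeline can move per transfer over its data buses."""
    return DATA_BUSES_PER_PIPELINE * BUS_WIDTH_BYTES


cmd = AccessCommand(start=True, vector_length=64, base_address=0x1000,
                    distance=8, opcode="VLD")
print(cmd.opcode, bytes_per_transfer())  # VLD 32
```

The 32-byte figure corresponds to four 8B elements per transfer, which is consistent with the statement that elements are read and written four at a time.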

As stated above, access pipeline 4-B has the same configuration as 4-A.

FIG. 5 shows the configuration of one embodiment of the address section. In FIG. 5, 13-A and 13-B are address generation control units, 14 and 14' are address generation units, BA is the base address, D is the distance, VL is the vector length, OPC is the opcode, STA is the start signal for address generation control unit 13-A, and STB is the start signal for address generation control unit 13-B. When address generation control unit 13-A receives the base address BA, distance D, vector length VL, opcode OPC, and start signal STA from the instruction processing unit 3, it performs control for address generation. It examines the distance D. If block access is possible, it controls address pipeline 17-A to perform block access, as described later. If block access is not possible, it controls address pipelines 17-A, 17-B, 17-C, and 17-D to generate addresses, as described later. Address generation control unit 13-B, on receiving the base address BA, distance D, vector length VL, opcode OPC, and a start signal from the instruction processing unit 3, operates only when block access is possible, and controls address pipeline 17-B to generate block addresses, as described later.

FIG. 6 shows the configuration of one embodiment of the address generation unit. In FIG. 6, 15-A and 15-B are adders, 16-A and 16-B are address translation circuits, BARA and BARB are base address registers, DRA and DRB are distance registers, LARA and LARB are logical address registers, RQA and RQB are request address registers, RQQA and RQQB are request queue buffers, PA and PB are ports, G is an input gate, and 17-A and 17-B are address pipelines. Adder 15-A generates a logical address from the contents of base address register BARA and distance register DRA. This logical address is set in logical address register LARA, the contents of LARA are translated into a real address by address translation circuit 16-A, and the real address is set in request address register RQA. The request address output from RQA is input to request queue buffer RQQA. The request address output from RQQA is sent through port PA to the priority control circuit 6. Address pipeline 17-A consists of the parts described above. Address pipelines 17-B, 17-C, and 17-D have the same configuration as 17-A.

When block access is performed, address generation control unit 13-A controls address pipeline 17-A. In this case adder 15-A first outputs the base address BA, which is set in base address register BARA. Thereafter, the contents of BARA plus 4D are output, and this value is set in BARA again.

When block access is not possible, address generation control unit 13-A controls address pipelines 17-A, 17-B, 17-C, and 17-D. Adder 15-A first outputs the base address BA, which is set in base address register BARA; thereafter the contents of BARA plus 4D are output and set in BARA again. Adder 15-B first outputs the base address BA plus 1D, which is set in base address register BARB. Thereafter, the contents of BARB plus 4D are output, and this value is set in BARB again. The adder of address pipeline 17-C first outputs the base address BA plus 2D, and the adder of address pipeline 17-D first outputs the base address BA plus 3D. Their subsequent processing is the same as that of address pipelines 17-A and 17-B.

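
The two address-generation modes just described can be sketched in Python as follows. This is a minimal model under stated assumptions: it produces logical addresses only (the translation by circuits 16-A and 16-B is omitted), it assumes that one block request covers four elements, and the function names are hypothetical.

```python
PIPELINES = ("17-A", "17-B", "17-C", "17-D")


def block_mode_addresses(ba: int, d: int, vl: int) -> list[int]:
    """Block access: pipeline 17-A alone emits BA, BA+4D, BA+8D, ...

    Assumes each request fetches a block holding four elements, so
    ceil(VL / 4) requests are needed.
    """
    requests = (vl + 3) // 4
    return [ba + 4 * d * j for j in range(requests)]


def interleaved_addresses(ba: int, d: int, vl: int) -> dict[str, list[int]]:
    """Non-block access: pipeline k starts at BA + k*D and then adds 4D
    on every step, so the four pipelines cover elements k, k+4, k+8, ..."""
    n = len(PIPELINES)
    return {
        name: [ba + i * d for i in range(k, vl, n)]
        for k, name in enumerate(PIPELINES)
    }


# Usage: a 10-element vector with a 24-byte distance, starting at 0x1000.
per_pipeline = interleaved_addresses(0x1000, 24, 10)
for name, addrs in per_pipeline.items():
    print(name, [hex(a) for a in addrs])
```

Merging the four per-pipeline lists in order yields BA, BA+D, BA+2D, and so on, one address per element, which is why four pipelines working together can sustain four element addresses per step even when the data is discontinuous.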
FIGS. 7(a) and 7(b) show the configuration of one embodiment of the main part of the address generation control unit. In FIGS. 7(a) and 7(b), 18-A to 24-A are registers, 18-B to 24-B are also registers, 25-A and 25-B are subtractors, 26-A and 26-B are decoders, 27 is a data discontinuity detection circuit, 28-A and 28-B are end detection circuits, 29 to 34 are AND circuits, 35, 36, and 40 are OR circuits, and 37 to 39 are NOT circuits.

FIG. 7(a) is described first. Register 18-A holds the start signal STA. The contents of register 18-A are sent to the main memory control unit 2 as request RQ through registers 19-A and 20-A. Register 23-A holds the remaining vector length. Subtractor 25-A subtracts a constant from the contents of register 23-A each time an address is generated. When the remaining vector length reaches zero, end detection circuit 28-A sets the PIPE BUSY A-PIPE signal to logic "0". Register 24-A holds the opcode OPC. The opcode held in register 24-A is decoded by decoder 26-A, and the decoded result is sent to the main memory control unit 2 through registers 21-A and 22-A. The B-side circuits are the same as the A-side circuits.

The data discontinuity detection circuit 27 detects, from the contents of distance register DRA, whether or not the data is discontinuous. It outputs logic "1" when it detects discontinuous data (block access not possible) and logic "0" when the data is continuous (block access possible). When end detection circuit 28-A outputs logic "1" and the data discontinuity detection circuit 27 outputs logic "1", the PIPE BUSY B-PIPE signal becomes logic "1". Likewise, when end detection circuit 28-B outputs logic "1" and the data discontinuity detection circuit 27 outputs logic "0", the PIPE BUSY B-PIPE signal becomes logic "1". When the data discontinuity detection circuit 27 outputs logic "1", the decoded result of decoder 26-A is set in register 21-B, and when it outputs logic "0", the decoded result of decoder 26-B is set in register 21-B. The contents of register 21-B are sent to the main memory control unit 2 through register 22-B.

The issuing of instructions from the instruction processing unit 3 to access pipeline 4-A is controlled by the PIPE BUSY A-PIPE signal, and the issuing of instructions to access pipeline 4-B is likewise controlled by the PIPE BUSY B-PIPE signal. When the PIPE BUSY A-PIPE signal is logic "1", address generation control unit 13-A is in the BUSY state, and when the PIPE BUSY B-PIPE signal is logic "1", address generation control unit 13-B is in the BUSY state. Consequently, for accesses to continuous data the PIPE BUSY A-PIPE and PIPE BUSY B-PIPE signals are generated independently by the corresponding access pipelines, whereas for discontinuous data both signals become logic "1" and the issuing of instructions from the instruction processing unit 3 to access pipeline 4-B is inhibited. The start signal STA is held in register 18-A until the end is detected, at which point register 18-A is cleared. At the same time, the subtraction by subtractor 25-A and the holding of the opcode in register 24-A are stopped so that the next instruction can be issued. The B-side circuits operate in the same way.

FIG. 7(b) shows the part that controls the input gate, base address register, distance register, and so on inside each of address pipelines 17-A, 17-B, 17-C, and 17-D. When the signal marked ■ in FIG. 7(b) is logic "0", address pipeline 17-B is controlled by address generation control unit 13-B and address pipeline 17-A is controlled by address generation control unit 13-A. When the signal ■ is logic "1", address pipelines 17-A, 17-B, 17-C, and 17-D are all controlled by address generation control unit 13-A. The circuit that generates the control signals is not shown in FIGS. 7(a) and 7(b).

FIG. 8 is a diagram explaining the present invention in detail. Suppose there is a sequence of vector instructions such as the following, where VLD denotes a vector load instruction:

(1) VLD (data continuous)
(2) VLD (ditto)
(3) VLD (data discontinuous)
(4) VLD (data continuous)
(5) VLD (ditto)

The addresses needed to execute instruction (1) are generated by address pipeline 17-A, and at the same time the addresses needed to execute instruction (2) are generated by address pipeline 17-B. After address generation for both instruction (1) and instruction (2) has finished, the addresses needed to execute instruction (3) are generated by address pipelines 17-A, 17-B, 17-C, and 17-D. After address generation for instruction (3) has finished, the addresses needed to execute instruction (4) are generated by address pipeline 17-A, and the addresses needed to execute instruction (5) are generated by address pipeline 17-B.

[Effects of the Invention] As is clear from the above description, according to the present invention, vector load instructions, vector store instructions, and the like can be executed, even for accesses to discontinuous data, without increasing the amount of hardware and without appreciably lowering throughput.
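
The FIG. 8 walkthrough can be reproduced with a small scheduling sketch in Python. It rests on assumptions that the patent does not state: instructions are issued in program order, every instruction occupies its pipelines for a caller-supplied number of cycles (the value 4 below is arbitrary), and a continuous load prefers the A side when both sides are free. The function name and tuple layout are hypothetical.

```python
ALL_PIPES = ("17-A", "17-B", "17-C", "17-D")


def schedule(instrs):
    """instrs: list of (label, is_continuous, cycles).

    Returns a list of (label, address_pipelines, start, end).
    """
    free_at = {"A": 0, "B": 0}  # when control units 13-A / 13-B become free
    now = 0                     # earliest issue time (in-order issue assumed)
    plan = []
    for label, is_continuous, cycles in instrs:
        if is_continuous:
            # Each side works independently; ties go to the A side.
            side = min(("A", "B"), key=lambda s: free_at[s])
            start = max(now, free_at[side])
            free_at[side] = start + cycles
            pipes = ("17-A",) if side == "A" else ("17-B",)
        else:
            # Discontinuous data: 13-A drives all four address pipelines,
            # so both sides must be idle and both are busy until the end.
            start = max(now, free_at["A"], free_at["B"])
            free_at["A"] = free_at["B"] = start + cycles
            pipes = ALL_PIPES
        now = start
        plan.append((label, pipes, start, start + cycles))
    return plan


FIG8_SEQUENCE = [
    ("(1) VLD", True, 4),
    ("(2) VLD", True, 4),
    ("(3) VLD", False, 4),
    ("(4) VLD", True, 4),
    ("(5) VLD", True, 4),
]

for label, pipes, start, end in schedule(FIG8_SEQUENCE):
    print(f"{label}: cycles {start}-{end} on {', '.join(pipes)}")
```

With these inputs, (1) and (2) run concurrently on 17-A and 17-B, (3) waits for both and then uses all four address pipelines, and (4) and (5) again run concurrently on 17-A and 17-B, matching the order described for FIG. 8.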

[Brief explanation of drawings]

FIG. 1 shows the formats in which double-precision, single-precision, and fixed-point data are stored in the vector registers. FIG. 2 shows a desirable way of loading vector data in order to improve the execution performance of a vector data processing device. FIG. 3 shows an overview of a vector data processing device to which the present invention is applied. FIG. 4 shows the configuration of one embodiment of the access pipeline. FIG. 5 shows the configuration of one embodiment of the address section of the access pipeline. FIG. 6 shows the configuration of one embodiment of the address generation unit. FIG. 7 shows the configuration of one embodiment of the main part of the address generation control unit. FIG. 8 is a diagram explaining the present invention in detail.

1: main memory; 2: main memory control unit; 3: instruction processing unit; 4-A and 4-B: access pipelines; 5: vector register; 6: priority control circuit; 7: address section; 8: align control section; 9: align circuit; 10: ECC circuit; 11 and 12: control signal lines; 13-A and 13-B: address generation control units; 14 and 14': address generation units; BA: base address; D: distance; VL: vector length; OPC: opcode; STA: start signal for address generation control unit 13-A; STB: start signal for address generation control unit 13-B; 15-A and 15-B: adders; 16-A and 16-B: address translation circuits; BARA and BARB: base address registers; DRA and DRB: distance registers; LARA and LARB: logical address registers; RQA and RQB: request address registers; RQQA and RQQB: request queue buffers; PA and PB: ports; G: input gate; 17-A and 17-B: address pipelines.

Patent applicant: Fujitsu Ltd. Agent: Patent Attorney Kyotani (京　谷　四　部)

Claims

[Claims]

A vector data processing device having a plurality of independently operable access pipelines, each of which transfers data between a main memory and vector registers by means of one or more data buses, an address generation unit, and an address bus, characterized in that, when data on the main memory is referenced, for accesses in which the referenced data is continuous each access pipeline independently manages its address generation unit and address bus and issues accesses to the main memory, and for accesses in which the data is discontinuous any one of the plurality of access pipelines manages the address generation units and address buses and accesses the main memory.