JPS6042985B2

JPS6042985B2 - Parallel execution control method for linked instructions

Info

Publication number: JPS6042985B2
Application number: JP16663779A
Authority: JP
Inventors: 哲郎岡本; 正徳茂木
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1979-12-21
Filing date: 1979-12-21
Publication date: 1985-09-26
Also published as: JPS5688562A

Description

[Detailed description of the invention]

The present invention relates to a parallel execution control method for linked instructions, in particular for a processing system that performs vector operations. Vector registers for storing element data are provided such that each bank holds a part of each of several vector registers, and each bank has its own input/output section, separate from those of the other banks. The vector registers are built from memory and interleaved so that consecutive addresses within one vector register lie in different banks. The bank access timing for sequentially storing the element data x1, x2, x3, ... produced by one linked instruction and the bank access timing for sequentially fetching the same element data x1, x2, x3, ... for the next instruction are shifted relative to each other.

In vector arithmetic processing, element data b1, b2, b3, ... belonging to vector B and/or element data c1, c2, c3, ... belonging to vector C are generally operated on to obtain element data a1, a2, a3, ..., which are then stored as element data belonging to vector A. Such processing is executed as pipeline processing so that the elements a1, a2, a3, ... are obtained in succession. However, consider the case shown in FIG. 1, in which several instructions are linked: a first load instruction VL loads #1 vector register #1VR, and the second instruction immediately after it, an add instruction VA, uses the contents of #1VR. Normally, as shown in FIG. 2, execution of VA starts only after the load by VL has finished, and processing efficiency drops.

To address this, one approach that has been considered, shown in FIG. 3, is to branch the element data x0, x1, x2, ... as they are loaded by instruction VL and stored in order in #1 vector register, so that they are also taken in for the second instruction VA, and to execute VA in parallel starting partway through the execution of VL. The method of FIG. 3 has the considerable advantage that the operation for the next instruction VA using element data x0 can begin as soon as the load of x0 has finished. However, it requires a bus for branching and taking in the data. The object of the present invention is to allow processing for the next instruction VA to start without providing such a bus, by making #1 vector register available at an early point after the load of element data x0 has finished, as shown in FIG. 4.

To that end, the parallel execution control method for linked instructions of the present invention applies to a vector arithmetic processing system that has m vector registers and a function of fetching from and/or storing into those m vector registers a plurality of element data b1, b2, b3, ... belonging to a vector B. The m (m > 2) vector registers are implemented as a storage device interleaved across k (k > 2) memory banks so that consecutive addresses within a vector register lie in different banks. The input/output section of the storage device is separated bank by bank: data from the several vector data sources enter each bank individually through an input multiplexer provided for that bank, and data for the several vector data output destinations leave through an output multiplexer provided for each destination, so that accesses to different banks can start at the same point in time. Finally, the bank access timing for storing, in order, the element data x1, x2, x3, ... that result from one of the mutually linked instructions and the bank access timing for fetching, in order, those element data x1, x2, x3, ... for the next instruction, which operates on them, are shifted relative to each other.

The invention is described below with reference to the drawings. FIG. 5 shows the configuration of one embodiment of the present invention.

In the figure, reference numeral 1 denotes the storage device that constitutes the vector registers; #0VR, #1VR, ..., #15VR are the vector registers; 2-0, 2-1, 2-2 and 2-3 are the banks; 3-0 to 3-2 and 4-0 to 4-4 are buffer registers provided as needed; 5 is an adder; 6 is a multiplier; 7 is a buffer for loading element data; 8 is a buffer for storing element data; and 9 to 25 are multiplexers. In the illustrated case, each vector register #0VR, #1VR, ... has, for example, 256 addresses, and a part of each of several vector registers is placed on any one bank (2-1, for instance). Addresses "0", "4", "8", ... of each vector register #0VR, #1VR, ... are located in bank 2-0, addresses "1", "5", ... in bank 2-1, addresses "2", "6", ... in bank 2-2, and addresses "3", "7", ... in bank 2-3, which is the arrangement generally called interleaving.
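The address-to-bank assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the constants follow the embodiment (4 banks, 16 registers, 256 addresses per register), while the function names `bank_of` and `location`, and in particular the row layout inside a bank, are hypothetical and are not specified in the patent text.

```python
NUM_BANKS = 4          # banks 2-0 .. 2-3 in the embodiment
NUM_REGISTERS = 16     # vector registers #0VR .. #15VR
REGISTER_LENGTH = 256  # addresses per vector register (the "for example" value)


def bank_of(address):
    """Bank that holds a vector-register address.

    Consecutive addresses rotate through the banks, so addresses
    0, 4, 8, ... fall in bank 2-0, addresses 1, 5, ... in bank 2-1, etc.
    """
    return address % NUM_BANKS


def location(register, address):
    """Hypothetical (bank, row) placement of one element.

    ASSUMPTION: each register occupies a contiguous block of rows in
    every bank. The patent only fixes the bank, not the row layout.
    """
    rows_per_register = REGISTER_LENGTH // NUM_BANKS
    row = register * rows_per_register + address // NUM_BANKS
    return bank_of(address), row


if __name__ == "__main__":
    for bank in range(NUM_BANKS):
        addresses = [a for a in range(12) if bank_of(a) == bank]
        print(f"bank 2-{bank}: addresses {addresses}")
```

Because the bank depends only on the address and not on the register number, every bank holds a quarter of every register, which is what allows element i of one register and element i-1 of another to be accessed in the same cycle.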

Interleaving in ordinary storage devices, however, was devised so that a series of data can be fetched without interruption. In the present invention, by contrast, the interleaved configuration is adopted in order to execute linked instructions such as those of FIG. 1 efficiently, and it differs greatly in that each bank has an access time of one cycle and several banks may be accessed in parallel at the same timing. The buffer registers 3-0 to 3-2 and 4-0 to 4-4 shown in FIG. 5 can be regarded as being used, where necessary, to provide a wait state of, for example, one cycle.

When the instructions VL and VA shown in FIG. 1 are executed, processing proceeds as follows.

(1) Element data x0 and x1 are stored in order in #1 vector register #1VR at timings T0 and T1 respectively, and bank 2-2 is then accessed to store element data x2. Let this timing be T2.

(2) At timing T2, bank 2-1 is accessed and element data y1 is read from #2 vector register #2VR. Also at T2, element data y0 of #2VR, which was read by accessing bank 2-0 at timing T1, is held in, for example, register 4-0 of the figure. In addition, at T2 bank 2-0 is accessed and the previously stored element data x0 is read from #1 vector register #1VR.

(3) At timing T2, adder 5 starts the addition (y0 + x0) of element data x0 read from bank 2-0 and the contents of register 4-0 (element data y0).

(4) Next, at timing T3, the loaded element data x3 is stored in #1 vector register #1VR by accessing bank 2-3, element data y2 is read from bank 2-2, element data y1 is held in register 4-0, and the previously stored element data x1 is read from bank 2-1. That is, the addition (y1 + x1) starts.

(5) If the addition in adder 5 takes one cycle, the result of (y0 + x0) is output at timing T3 and is stored at address "0" of #3 vector register #3VR by accessing bank 2-0.

The case above shows element data x0, loaded by instruction VL, being read two cycles after it is stored in the vector register and then used in the operation of instruction VA.
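The walkthrough above can be reproduced with a small Python simulation that lists which bank each access touches in every cycle and checks that no bank is accessed twice in the same cycle. It is a sketch under stated assumptions: the two-cycle fetch delay and one-cycle adder latency are the values used in the walkthrough, the y fetch is taken to run one cycle ahead of the x fetch (matching the use of register 4-0), and extending the pattern beyond T3 is an extrapolation. The names `accesses_at` and `conflict_free` are hypothetical.

```python
NUM_BANKS = 4     # banks 2-0 .. 2-3
FETCH_DELAY = 2   # x_i is fetched two cycles after it is stored
ADD_LATENCY = 1   # adder latency, assumed to be one cycle as in step (5)


def accesses_at(t, n):
    """Return (bank, action) pairs for cycle T_t when n elements are processed."""
    accesses = []
    if 0 <= t < n:                       # VL stores x_t into #1VR
        accesses.append((t % NUM_BANKS, f"store x{t} -> #1VR"))
    i = t - (FETCH_DELAY - 1)            # VA reads y_i one cycle ahead of x_i
    if 0 <= i < n:
        accesses.append((i % NUM_BANKS, f"fetch y{i} <- #2VR"))
    i = t - FETCH_DELAY                  # VA reads x_i from #1VR
    if 0 <= i < n:
        accesses.append((i % NUM_BANKS, f"fetch x{i} <- #1VR"))
    i = t - FETCH_DELAY - ADD_LATENCY    # the sum y_i + x_i is stored into #3VR
    if 0 <= i < n:
        accesses.append((i % NUM_BANKS, f"store y{i}+x{i} -> #3VR"))
    return accesses


def conflict_free(n):
    """True if no bank is accessed more than once in any cycle."""
    for t in range(n + FETCH_DELAY + ADD_LATENCY):
        banks = [bank for bank, _ in accesses_at(t, n)]
        if len(banks) != len(set(banks)):
            return False
    return True


if __name__ == "__main__":
    for t in range(5):
        for bank, action in accesses_at(t, 8):
            print(f"T{t}: bank 2-{bank}: {action}")
    print("conflict-free:", conflict_free(8))
```

With these settings the four access streams are offset by 0, 1, 2 and 3 cycles, so at any timing they address four different banks. Other delays remain conflict-free in this model only as long as the offsets stay distinct modulo the number of banks.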

It goes without saying that the number of cycles after which the data is read and processed may vary with the processing time of the adder and the number of banks. Even so, processing speed is greatly improved compared with the processing shown in FIG. 2. As described above, the present invention makes it possible to execute linked instructions in parallel with a wait time of at most a few cycles.
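As a rough illustration of the gain over the serial execution of FIG. 2, the following Python sketch counts cycles under a simplified model that is not part of the patent text. The assumptions are: one element is stored or fetched per cycle, the adder has a fixed latency, and a run ends when the last sum has been stored. The function names and default values are hypothetical.

```python
def serial_cycles(n, add_latency=1):
    """FIG. 2 style: VA starts only after VL has stored all n elements.

    ASSUMPTION: n cycles of stores, then n cycles of fetches, then the
    adder latency before the last sum is stored.
    """
    return n + n + add_latency


def linked_cycles(n, fetch_delay=2, add_latency=1):
    """Staggered access: x_i is fetched fetch_delay cycles after its store.

    ASSUMPTION: the last element is stored in cycle n - 1, fetched
    fetch_delay cycles later, and its sum stored add_latency cycles
    after that.
    """
    return n + fetch_delay + add_latency


if __name__ == "__main__":
    n = 256  # the example register length of the embodiment
    print("serial:", serial_cycles(n))
    print("linked:", linked_cycles(n))
```

For a 256-element register this model gives 513 cycles for serial execution against 259 cycles for the staggered scheme, so the second instruction costs only a few extra cycles rather than a second full pass.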

[Brief explanation of drawings]

FIGS. 1 to 3 are explanatory diagrams illustrating the problem underlying the present invention, FIG. 4 is an explanatory diagram illustrating the present invention, and FIG. 5 shows the configuration of one embodiment of the present invention.

Claims

[Claims]

1. A parallel execution control method for linked instructions in a vector arithmetic processing system that has m vector registers and a function of fetching from and/or storing into the m vector registers a plurality of element data b_1, b_2, b_3, ... belonging to a vector B, characterized in that: the m (m>2) vector registers are implemented as a storage device interleaved across k (k>2) memory banks so that consecutive addresses within a vector register are located in different banks; each bank of the storage device has its input/output section separated bank by bank, data from a plurality of vector data sources being input to each bank individually through an input multiplexer provided for each bank, and data for a plurality of vector data output destinations being output through an output multiplexer provided for each output destination, so that accesses to different banks can be started at the same point in time; and the bank access timing for sequentially storing element data x_1, x_2, x_3, ... resulting from processing by mutually linked instructions and the bank access timing for sequentially fetching those element data x_1, x_2, x_3, ... for the next instruction, which operates on them, are staggered relative to each other.