JP2580371B2

JP2580371B2 - Vector data processing device

Info

Publication number: JP2580371B2
Application number: JP2190335A
Authority: JP
Inventors: 正守柏山; 幸巳松本; 誠古賀
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1990-07-18
Filing date: 1990-07-18
Publication date: 1997-02-12
Anticipated expiration: 2012-02-12
Also published as: JPH0476772A

Description

[Detailed Description of the Invention]

[Field of Industrial Application] The present invention relates to a vector data processing apparatus. More particularly, it relates to a scheme for building a vector operation unit from a plurality of arithmetic units and to a scheme for executing operations on that unit.

[Prior Art] A vector data processing apparatus supports so-called chaining, in which the register that receives the result of a preceding operation instruction is designated as an operand register of a subsequent operation instruction. Chaining lets a plurality of arithmetic units be used with temporal overlap, which gives the apparatus high data processing capability. When large-scale scientific and engineering calculations are run on a vector data processing apparatus, programs are generally coded to exploit the parallel arithmetic units. In particular, increasing the number of operation terms on the right-hand side of an assignment statement is an effective way to improve performance. In complex-number arithmetic and vector functions, vector operation instructions also appear more frequently than other vector instructions. Processing operation instructions quickly therefore contributes greatly to overall performance.
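The following Python sketch makes the benefit of chaining concrete. It rests on an idealized timing model that is an assumption, not something taken from the patent. Each pipeline accepts one element per cycle. A unit of depth `d` that starts an instruction at cycle `s` emits element `i` at cycle `s + d + i`. A chained successor may start as soon as the first result element appears. The function name `last_result_cycle` and the depths used are hypothetical.

```python
def last_result_cycle(n, depths, chained):
    """Return the cycle at which the final element of a dependent chain is produced.

    n       -- vector length (elements per instruction)
    depths  -- assumed pipeline depth of each instruction in the chain, in order
    chained -- if True, each instruction starts when the first element of its
               predecessor's result appears; otherwise it waits for the last one
    """
    start = 0
    last = 0
    for d in depths:
        last = start + d + n - 1              # cycle of this instruction's final element
        start = start + d if chained else last + 1
    return last


# Two dependent 128-element instructions with assumed depths 4 and 6.
print(last_result_cycle(128, [4, 6], chained=True))   # 137
print(last_result_cycle(128, [4, 6], chained=False))  # 265
```

With chaining, the second instruction overlaps the first almost completely. The total time is therefore close to the vector length plus the summed pipeline depths, rather than roughly twice the vector length.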

The configuration of a conventional vector data processing apparatus designed to improve performance in response to this trend is described below.

FIG. 6 shows the schematic configuration of an example of a vector data processing apparatus having a plurality of arithmetic units. In the figure, the vector data processing apparatus comprises the following components:

- 32 vector registers VR0 to VR31 (111a to 111h), built from high-speed random access memory (hereinafter RAM). Each register is independently readable and writable and holds vector data of 128 elements.
- Scalar data buffers 112a and 112b, which store scalar data for the two operands.
- A selector of switch-matrix logic (hereinafter SEL) 114. As directed by the instruction, it routes the output data of the vector registers 111a to 111h and the scalar data buffers 112a and 112b to each resource.
- A No. 0 arithmetic unit 102, consisting of a pipeline adder 101.
- A No. 1 arithmetic unit 104, consisting of a pipeline multiplier 103.
- Operand paths 105 and 106, which carry the operand data supplied through SEL 114 into the No. 0 arithmetic unit 102.
- Operand paths 108 and 109, which likewise carry operand data into the No. 1 arithmetic unit 104.
- A result path 107, which outputs the result of the No. 0 arithmetic unit 102.
- A result path 110, which outputs the result of the No. 1 arithmetic unit 104.
- A selector of switch-matrix logic (hereinafter DIST) 113. As directed by the instruction, it selects which of the vector registers VR0 to VR31 (111a to 111h) receives the results from result paths 107 and 110.

Although not shown, the apparatus also has a load pipeline and a store pipeline. The load pipeline supplies operand data from main storage to the vector registers, and the store pipeline stores data from the vector registers to main storage. The term "resources" here refers to the No. 0 arithmetic unit 102, the No. 1 arithmetic unit 104, the load pipeline and the store pipeline. Under instruction control, these resources can operate in parallel and can be chained.

In the vector data processing apparatus of FIG. 6, the two arithmetic resources, the No. 0 arithmetic unit 102 and the No. 1 arithmetic unit 104, can operate in parallel. An addition instruction and a multiplication instruction, in either order, can therefore be processed in parallel. Combining this with chaining achieved high processing performance.

However, large-scale scientific and engineering calculations contain polynomial operations. When these are expanded into the instruction sequence of a vector data processing apparatus, there are frequently runs of consecutive addition instructions, or of consecutive multiplication instructions, that could in theory be processed simultaneously in parallel. FIGS. 4(c) and 4(e) show the processing sequence for such cases on the apparatus of FIG. 6.

In FIG. 4(c), the instruction Vadd VR0 VR1 VR2 reads the contents of vector registers VR1 and VR2, adds the vector data in the pipeline adder and writes the result to vector register VR0. This relationship is written below as

Vadd VR0 VR1 VR2 ⇒ VR1 + VR2 = VR0

The next instruction is

Vadd VR3 VR4 VR5 ⇒ VR4 + VR5 = VR3

It likewise reads vector registers VR4 and VR5, adds the vector data and writes the result to vector register VR3.

These two instructions can in theory operate simultaneously in parallel. With the arithmetic unit configuration of FIG. 6, however, the vector adder is a multi-stage pipeline. As shown in FIG. 4(c), the operand data for VR4 + VR5 cannot enter its first stage until all the operand data for VR1 + VR2 have passed through that stage. The multiplication case of FIG. 4(e) behaves in the same way.

In the case of FIG. 4(c), chaining with an instruction that precedes the Vadd instructions is also interrupted, for example chaining with a load into vector register VR4. This state is called an arithmetic resource bottleneck, and it degrades performance. Furthermore, in both FIG. 4(c) and FIG. 4(e), one arithmetic resource is busy while the other sits idle, which is inefficient.
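The following Python sketch illustrates the bottleneck with a simple greedy issue model. All instructions are assumed ready at cycle 0. A unit's first stage is assumed to be occupied for `n` cycles per instruction (one element per cycle). Chaining and result-path constraints are ignored. The function name and the unit sets are hypothetical.

```python
def start_cycles(ops, units, n):
    """Greedy issue model: each op goes to the capable unit whose first stage frees up earliest.

    ops   -- sequence of operation kinds, e.g. ["add", "add"]
    units -- list of sets; units[k] holds the kinds unit k can execute
    n     -- vector length; a unit's first stage is occupied n cycles per instruction
    """
    free_at = [0] * len(units)
    starts = []
    for op in ops:
        capable = [k for k, kinds in enumerate(units) if op in kinds]
        if not capable:
            raise ValueError(f"no unit can execute {op!r}")
        k = min(capable, key=lambda j: free_at[j])
        starts.append(free_at[k])
        free_at[k] += n                       # busy until all n elements have entered
    return starts


fig6 = [{"add"}, {"mul"}]                     # one pipeline adder, one pipeline multiplier
two_adders = [{"add"}, {"mul"}, {"add"}]      # an extra adder removes the conflict
print(start_cycles(["add", "add"], fig6, 128))        # [0, 128]
print(start_cycles(["add", "add"], two_adders, 128))  # [0, 0]
```

In the FIG. 6 case, the second addition waits a full 128 cycles even though the multiplier is idle the whole time.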

FIG. 7 shows the schematic configuration of a vector data processing apparatus with a further improved multi-unit configuration. In the figure, the following parts are configured as in FIG. 6: the vector registers 215a to 215h, the scalar data buffers 216a and 216b, SEL 218, the operand paths 209, 210, 212 and 213, the result paths 211 and 214, DIST 217 and the load/store pipelines. The No. 1 arithmetic unit 205 consists of a pipeline adder 204.

The No. 0 arithmetic unit 203, however, is a composite of a pipeline multiplier 202 and a pipeline adder 201. The result of the pipeline multiplier 202 can be supplied through the output path 206 as one operand of the pipeline adder 201. In addition, the operand path 209 can supply data to both the pipeline multiplier 202 and the pipeline adder 201.

The selector 208 chooses between the result output path 206 of the pipeline multiplier 202 and the result output path 207 of the pipeline adder 201, and sends the chosen output to the result path 211:

- When a multiplication instruction is executed, it selects output path 206.
- When an addition instruction that takes the multiplication result as one operand is executed, it selects output path 207.

The No. 0 arithmetic unit 203 can also execute a simple addition instruction on the pipeline adder 201. To do so, the pipeline multiplier 202 passes the data from the operand path 210 through unchanged.

The vector data processing apparatus of FIG. 7 processes composite operations (inner product, sum, etc.) as macro instructions. This is an automatic vectorization feature that exploits the composite arithmetic unit. For example, the Hitachi HITAC S-810 vector data processing apparatus and its successor, the HITAC S-820, use the result of the vector multiplier as operand data for the vector adder. They achieve higher speed by executing composite operations as single macro instructions, such as:

- the inner product S = S + A(I) * B(I), which is effective for solving simultaneous linear equations;
- the three-term multiply-add A(I) + C * B(I).

Because the apparatus of FIG. 7 has two adders, the case of FIG. 4(c) can be processed simultaneously in parallel, as shown in FIG. 4(d), since there is no contention for addition resources.

In addition, the No. 0 arithmetic unit 203 can execute the composite operation instruction (macro instruction) shown in FIG. 5(i):

Vmltadd VR0 VR1 VR2 SR ⇒ (VR2 × SR) + VR1 = VR0

In this instruction, the preceding multiplication forms the product of the vector data VR2 and the scalar data SR. The subsequent addition forms the sum of that product vector (VR2 × SR) and the vector data VR1.

The apparatus of FIG. 7 cannot, however, perform all-vector-operand processing, in which every operand is vector data, as in

Vmltadd VR0 VR1 VR2 VR3 ⇒ (VR2 × VR3) + VR1 = VR0

The constraint is that reading the vector data VR2 and reading the vector data VR1 both require the operand path 209. Operand path 209 could be time-shared, but that would complicate the control and reduce processing performance.
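The following Python sketch gives the element-wise meaning of Vmltadd and a deliberately simplified feasibility check. The text locates the conflict specifically on operand path 209. Here that conflict is abstracted, as an assumption, into a cap of two vector streams per cycle for unit 203. A scalar such as SR is treated as a value latched once rather than streamed, which is also an assumption. All function names are hypothetical.

```python
def vmltadd(vr1, vr2, b):
    """Element-wise model of Vmltadd VR0 VR1 VR2 B, i.e. VR0[i] = VR2[i] * B + VR1[i].

    b may be a scalar (the SR form) or a list (the all-vector form, B[i]).
    """
    if isinstance(b, list):
        return [x * y + z for x, y, z in zip(vr2, b, vr1)]
    return [x * b + z for x, z in zip(vr2, vr1)]


def vector_streams(*operands):
    """Count operands that must be streamed element by element (lists = vectors)."""
    return sum(isinstance(op, list) for op in operands)


def fits_unit(*operands, vector_paths=2):
    """Simplified check: at most `vector_paths` vector streams per cycle (assumed cap)."""
    return vector_streams(*operands) <= vector_paths


vr1, vr2 = [1.0, 2.0], [3.0, 4.0]
print(vmltadd(vr1, vr2, 10.0), fits_unit(vr1, vr2, 10.0))              # [31.0, 42.0] True
print(vmltadd(vr1, vr2, [5.0, 6.0]), fits_unit(vr1, vr2, [5.0, 6.0]))  # [16.0, 26.0] False
```

The arithmetic is the same in both forms. Only the number of simultaneous vector streams changes, and the all-vector form exceeds what this model allows.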

To chain instruction sequences and achieve high-speed vector processing on an apparatus with this arithmetic unit configuration, scheduling control is also required to assign operation instructions effectively to each arithmetic unit. Consider, for example, an addition instruction followed by a multiplication instruction. If the addition instruction is assigned to the No. 0 arithmetic unit 203, an arithmetic resource conflict occurs when the multiplication instruction is started. The multiplication must then wait, and the chain is broken. Such a non-uniform arithmetic unit configuration requires complicated operation-instruction scheduling, particularly for polynomial operations, and makes efficient allocation of the arithmetic units difficult.
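The following Python sketch shows how the tie-break rule of a simple scheduler decides whether the FIG. 7 conflict occurs. Its assumptions: each unit runs one instruction at a time, a unit's first stage is busy for `n` cycles per instruction, and every instruction is ready at cycle 0. The two policies and all names are hypothetical illustrations, not the scheduling method of the apparatus.

```python
def start_cycles(ops, units, n, prefer_specialized):
    """Greedy issue model for a non-uniform set of units.

    Each op goes to the capable unit that frees up earliest. On a tie, the
    naive policy takes the lowest-numbered unit, while the alternative prefers
    the unit with the fewest capabilities, keeping the composite unit free.
    """
    free_at = [0] * len(units)
    starts = []
    for op in ops:
        capable = [k for k, kinds in enumerate(units) if op in kinds]
        if prefer_specialized:
            k = min(capable, key=lambda j: (free_at[j], len(units[j]), j))
        else:
            k = min(capable, key=lambda j: (free_at[j], j))
        starts.append(free_at[k])
        free_at[k] += n                       # unit busy until all n elements have entered
    return starts


fig7 = [{"add", "mul"}, {"add"}]   # unit 203 (multiplier + adder), unit 205 (adder)
print(start_cycles(["add", "mul"], fig7, 128, prefer_specialized=False))  # [0, 128]
print(start_cycles(["add", "mul"], fig7, 128, prefer_specialized=True))   # [0, 0]
```

The smarter tie-break fixes this particular pair. A sequence such as add, add, mul still makes the multiplication wait, because both adders are occupied and only unit 203 can multiply. This illustrates why scheduling on a non-uniform configuration becomes complicated.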

The vector data processing apparatus shown in FIG. 8 is another example that performs polynomial operations at high speed using a plurality of arithmetic units. In the figure, the vector registers 321a to 321h, the scalar data buffers 322a and 322b and the load/store pipelines are configured as in FIG. 6. Compared with FIG. 6, the following parts have extra data paths for the added arithmetic units and modified selector logic: SEL 324, the operand paths 309, 310, 312, 313, 315, 316, 318 and 319, the result paths 311, 314, 317 and 320, and DIST 323.

The apparatus has four arithmetic resources, each able to execute operation instructions independently and in parallel:

- the No. 0 arithmetic unit 302, consisting of a pipeline adder 301;
- the No. 1 arithmetic unit 304, consisting of a pipeline multiplier 303;
- the No. 2 arithmetic unit 306, consisting of a pipeline adder 305;
- the No. 3 arithmetic unit 308, consisting of a pipeline multiplier 307.

Suppose the consecutive-addition and consecutive-multiplication cases of FIG. 4 are executed on the apparatus of FIG. 8. As shown in FIGS. 4(d) and 4(f), there is no contention between arithmetic resources of the same kind, so the processing time of the instruction sequence is shortened.

This configuration has drawbacks, however:

- Every arithmetic unit is a separate resource that executes one operation instruction. From the instruction-control side, the scheduling that allocates the operation instruction sequence to the resources therefore becomes complicated.
- The amount of hardware also increases, including data paths and control logic such as SEL 324.
- A vector data processing apparatus handles not only polynomial operations but also many instruction sequences in which load/store instructions and operation instructions are mixed one-to-one. In such cases some of the apparatus's arithmetic units go unused. In other words, the design is excessive (over-designed).

Representative examples of the conventional vector data processing apparatus shown in FIG. 7 are the Hitachi HITAC S-810 and S-820. They are described in Nikkei Electronics, April 11, 1983 (No. 314), pp. 159-184, and December 28, 1987 (No. 437), pp. 111-125. A similar vector data processing apparatus is disclosed in Japanese Patent Application Laid-Open No. 64-67678. An apparatus corresponding to the conventional example of FIG. 8 is likewise the Hitachi HITAC S-810.

[Problems to be Solved by the Invention] The prior art described above has the following problems:

- With the configuration of FIG. 6, an arithmetic resource bottleneck arises when operation instructions of the same kind follow one another. The configuration can also be inefficient, because one arithmetic resource may sit idle while the other is busy.
- The configuration of FIG. 7 connects a multiplier and an adder in series to support composite operation instructions. It cannot perform all-vector-operand processing.
- The approach of FIG. 8 increases the number of arithmetic resources. It requires twice as many data paths as before and also complicated operation-instruction scheduling, which makes efficient allocation of the arithmetic units difficult. Implementing such complicated control further increases the amount of hardware.

Accordingly, an object of the present invention is to provide efficient vector data processing in a vector data processing apparatus composed of a plurality of arithmetic units. The invention overcomes the problems above while keeping the increase in hardware to a minimum.

[Means for Solving the Problems] To achieve this object, the present invention provides a vector data processing apparatus that has a vector data buffer holding a plurality of sets of vector data, each set consisting of a plurality of vector elements, and that performs vector operations on the vector data held in that buffer. The apparatus comprises:

- a first composite arithmetic unit that includes an adder and a multiplier and has a path feeding the output of at least one of the two units back to an input of the other;
- a second composite arithmetic unit with the same configuration as the first;
- first, second and third operand paths, which can supply three of the four inputs of the first composite unit's adder and multiplier at one time from outside that unit;
- fourth, fifth and sixth operand paths, which can supply three of the four inputs of the second composite unit's adder and multiplier at one time from outside that unit.
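The following Python sketch gives a functional view of one composite arithmetic unit. Its adder and multiplier have four inputs in total. Three of them are fed by the external operand paths `p1` to `p3`, and in fused modes the fourth is fed by the internal feedback path. The mode names and functions are hypothetical, and timing is ignored here.

```python
def vadd(x, y):
    return [a + b for a, b in zip(x, y)]


def vmul(x, y):
    return [a * b for a, b in zip(x, y)]


def composite_unit(mode, p1, p2, p3=None):
    """Functional model of one composite unit (adder + multiplier, 4 inputs in total).

    mode "add" / "mul"   -- single operation on operand paths p1 and p2
    mode "add_mul"       -- (p1 + p2) * p3: adder output fed back to the multiplier
    mode "mul_add"       -- (p1 * p2) + p3: multiplier output fed back to the adder
    """
    if mode == "add":
        return vadd(p1, p2)
    if mode == "mul":
        return vmul(p1, p2)
    if p3 is None:
        raise ValueError("fused modes need the third operand path")
    if mode == "add_mul":
        return vmul(vadd(p1, p2), p3)
    if mode == "mul_add":
        return vadd(vmul(p1, p2), p3)
    raise ValueError(f"unknown mode {mode!r}")


# All-vector multiply-add: three vector streams, one per external operand path.
print(composite_unit("mul_add", [3, 4], [5, 6], [1, 2]))   # [16, 26]
```

Three external paths are enough for a fused operation because the fourth input is supplied internally. This is why an all-vector multiply-add does not need a shared or time-multiplexed operand path in this model.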

Preferably, each of the first and second composite arithmetic units has means for selectively feeding the output of its adder back to one of the four inputs, and means for selectively feeding the output of its multiplier back to one of the four inputs.

Preferably, the apparatus further comprises operation-instruction scheduling means. For two consecutive operation instructions of the same kind, whatever that kind is, this means assigns the preceding instruction to whichever composite arithmetic unit is not busy and assigns the subsequent instruction to the other composite arithmetic unit.

A subsequent multiplication (or addition) instruction may designate, as its operand register, the register that stores the result of a preceding addition (or multiplication) instruction. In that case, this scheduling means can assign both instructions to the same composite arithmetic unit.

In that case, each composite arithmetic unit preferably works as follows:

- It processes the preceding operation instruction in its adder (or multiplier) using two of its three operand paths.
- It processes the subsequent operation instruction in its multiplier (or adder) using the feedback path, which carries the preceding result, together with the remaining operand path.

An operation instruction may also be a macro operation instruction that uses the result of a preceding addition (or multiplication) as operand data for a subsequent multiplication (or addition). For such an instruction, each composite arithmetic unit preferably works as follows:

- It performs the preceding operation in its adder (or multiplier) using two of its three operand paths.
- It performs the subsequent operation in its multiplier (or adder) using the feedback path, which carries the preceding result, together with the remaining operand path.

The apparatus may also comprise delay means. This means delays either the operand on the remaining operand path or the instruction that reads it. The operand then reaches the arithmetic unit of the subsequent operation at the same time as the result of the preceding operation arrives there through the feedback path.
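The following Python sketch models the alignment requirement cycle by cycle. The adder is modelled as a FIFO of `add_depth` stages. The third operand passes through a delay line of the same depth, so element `i` of that operand reaches the multiplier in the same cycle as the fed-back sum for element `i`. The chosen depth, the omission of the multiplier's own latency, and the use of an operand delay line rather than a delayed read instruction are all assumptions. The function name is hypothetical.

```python
from collections import deque


def add_then_mul(a, b, c, add_depth=3):
    """Cycle-level sketch of (a[i] + b[i]) * c[i] inside one composite unit.

    The adder is modelled as a FIFO of `add_depth` stages. The third operand c
    passes through a delay line of the same depth, so c[i] reaches the
    multiplier in the same cycle as the fed-back sum a[i] + b[i].
    """
    n = len(a)
    adder = deque([None] * add_depth)   # stages of the pipeline adder
    delay = deque([None] * add_depth)   # delay line on the third operand path
    out = []
    for t in range(n + add_depth):
        adder.append(a[t] + b[t] if t < n else None)
        delay.append(c[t] if t < n else None)
        fed_back = adder.popleft()      # sum that entered add_depth cycles ago
        operand = delay.popleft()       # c element that entered add_depth cycles ago
        if fed_back is not None:
            out.append(fed_back * operand)
    return out


print(add_then_mul([1, 2, 3], [4, 5, 6], [2, 2, 2]))  # [10, 14, 18]
```

Because the delay line has the same depth as the adder, the result does not depend on the value of `add_depth`. A mismatched delay would pair each sum with the wrong element of `c`.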

From another aspect, the present invention provides a vector data processing apparatus that has a vector data buffer holding a plurality of sets of vector data, each set consisting of a plurality of vector elements, and that performs vector operations on the vector data held in that buffer. The apparatus comprises:

- a composite arithmetic unit that includes an adder and a multiplier and has a path feeding the output of at least one of the two units back to an input of the other;
- first, second and third operand paths, which can supply three of the four inputs of the adder and multiplier at one time from outside the composite arithmetic unit.

The composite arithmetic unit has means for selectively feeding the output of the adder back to one of the four inputs, and means for selectively feeding the output of the multiplier back to one of the four inputs.

[Operation] The vector data processing apparatus according to the present invention assigns instructions to the first and second composite arithmetic units as follows:

- Consecutive vector addition instructions: the first addition instruction goes to whichever composite arithmetic unit is not busy, and the subsequent addition instruction goes to the other one.
- Consecutive vector multiplication instructions: the first multiplication instruction goes to whichever composite arithmetic unit is not busy, and the subsequent multiplication instruction goes to the other one.
- A vector addition instruction followed by a vector multiplication instruction in a chaining relationship: the preceding addition instruction goes to whichever composite arithmetic unit is not busy, and the subsequent multiplication instruction goes to that same unit.
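The following Python sketch expresses these assignment rules for two composite units. The class `VInstr`, the function `assign_units` and the register names are hypothetical. The sketch also assumes, as a simplification, that the unit holding the previous instruction is still busy when the next one arrives, so busy state is consulted only for the first instruction.

```python
from dataclasses import dataclass


@dataclass
class VInstr:
    op: str        # "add", "mul" or "macro"
    dst: str       # destination vector register
    srcs: tuple    # source registers


def assign_units(instrs, busy=(False, False)):
    """Assign each instruction to composite unit 0 or 1.

    - The first instruction goes to a unit that is not busy.
    - An add/mul pair in a chaining relation (the later one reads the earlier
      one's destination) stays on the same unit, using its feedback path.
    - Otherwise the instruction goes to the other unit, on the simplifying
      assumption that the unit holding the previous instruction is still busy.
    """
    units = []
    prev = None
    for ins in instrs:
        if prev is None:
            u = 1 if busy[0] and not busy[1] else 0
        elif {prev.op, ins.op} == {"add", "mul"} and prev.dst in ins.srcs:
            u = units[-1]
        else:
            u = 1 - units[-1]
        units.append(u)
        prev = ins
    return units


adds = [VInstr("add", "VR0", ("VR1", "VR2")), VInstr("add", "VR3", ("VR4", "VR5"))]
chain = [VInstr("add", "VR0", ("VR1", "VR2")), VInstr("mul", "VR6", ("VR0", "VR7"))]
print(assign_units(adds))                      # [0, 1]
print(assign_units(chain))                     # [0, 0]
print(assign_units(adds, busy=(True, False)))  # [1, 0]
```

Because both units are identical, the rules never need to ask which unit can execute an operation. The scheduler only has to decide "same unit" or "other unit", which keeps it much simpler than a scheduler for the non-uniform FIG. 7 configuration.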

The operands of the preceding addition instruction are read from the vector registers or scalar data buffers specified by the instruction. They are supplied to the pipeline adder through the first and second operand paths if the addition was assigned to the first composite arithmetic unit, or through the fourth and fifth operand paths if it was assigned to the second. After the operands have passed through the operation stages of the pipeline adder, the sum is output to the result path and written to the vector register specified by the addition instruction.

The sum is also fed through the feedback path to the pipeline multiplier of the same composite arithmetic unit, as the multiplicand data of the subsequent multiplication instruction. The multiplier data of the multiplication instruction is read from the vector register or scalar data buffer specified by that instruction. The read is controlled so that this data arrives at the first stage of the pipeline multiplier at the same time as the first element of the fed-back result. The data enters the pipeline multiplier through the third operand path for the first composite arithmetic unit, or through the sixth operand path for the second. After the operands have passed through the operation stages of the pipeline multiplier, the product is output to the result path and written to the vector register specified by the multiplication instruction.

Likewise, a vector multiplication instruction may be followed by a vector addition instruction in a chaining relationship. In that case both instructions are also assigned to the same composite arithmetic unit.

In other words, when an operation instruction that uses the pipeline adder and an operation instruction that uses the pipeline multiplier are in a chaining relationship, the control of this apparatus assigns both instructions to the same composite arithmetic unit.

A macro instruction that combines vector addition and vector multiplication is assigned to whichever of the first and second composite arithmetic units is not busy.

In the macro operation, the operands are read from the vector registers or scalar data buffers specified by the instruction. They are supplied through the first, second and third operand paths if the macro instruction was assigned to the first composite arithmetic unit, or through the fourth, fifth and sixth operand paths if it was assigned to the second. The first and fourth operand paths carry the operating data, and the second and fifth operand paths carry the operated-on data.

The macro instruction specifies which operation, addition or multiplication, comes first. The operating data and operated-on data first pass through the operation stages of that preceding pipeline unit. The result is then fed through the feedback path to the subsequent pipeline unit of the same composite arithmetic unit, where it serves as the operated-on data of the subsequent operation. The operating data of the subsequent operation arrives through the third operand path for the first composite arithmetic unit, or the sixth operand path for the second. This path is delay-controlled so that the operating data reaches the first stage of the subsequent pipeline unit at the same time as the first element of the fed-back result.

After the operands have passed through the operation stages of the subsequent pipeline unit, the result is output to the result path and written to the vector register specified by the macro instruction.

When macro instructions are consecutive, the first macro instruction is assigned to whichever of the first and second composite operation units is not busy, and the subsequent macro instruction is assigned to the composite operation unit opposite the one to which the preceding macro instruction was assigned.
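The alternating assignment rule for consecutive macro instructions can be sketched as below. This is a simplified illustration, not the patented control logic: the function name and the `busy` tuple are hypothetical, and the sketch does not model a unit finishing its work or an instruction waiting for a unit to become free.

```python
def assign_consecutive_macros(count, busy=(False, False)):
    """Pick a PDI unit (0 or 1) for each of `count` consecutive macro instructions.

    busy: hypothetical busy flags of PDI units 0 and 1 when the run starts.
    The first macro goes to a unit that is not busy; every later macro goes
    to the unit opposite the one its predecessor was assigned to.
    """
    if busy[0] and busy[1]:
        raise ValueError("both composite units are busy; the first macro must wait")
    unit = 0 if not busy[0] else 1
    units = []
    for _ in range(count):
        units.append(unit)
        unit = 1 - unit  # alternate to the opposite composite unit
    return units


print(assign_consecutive_macros(4))                      # [0, 1, 0, 1]
print(assign_consecutive_macros(3, busy=(True, False)))  # [1, 0, 1]
```

Alternating in this way keeps both composite units occupied when a stream of independent macro instructions arrives.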

[Embodiments] Embodiments of the present invention are described in detail below.

First, an embodiment of the first vector data processing device according to the present invention is described. FIG. 1 shows the configuration of the vector data processing device of this embodiment, excluding its control unit.

In the figure, the vector data processing device comprises: 32 vector registers VR0 to VR31 (33a to 33h), each independently readable and writable and each able to hold 128 elements of vector data; scalar data buffers 34a to 34c that hold scalar data corresponding to three operands; SEL 36, switch-matrix logic that routes the output data of the vector registers 33a to 33h and the scalar data buffers 34a to 34c to each resource as directed by the instruction; DIST 35, switch-matrix logic that routes the write data sent from each resource to whichever of the vector registers 33a to 33h the instruction designates; and the operation resources No. 0 composite operation unit 3 and No. 1 composite operation unit 6. Although not shown, the device also has a load pipeline that supplies operands from main memory to the vector registers and a store pipeline that stores data from the vector registers to main memory; the control system manages each of these as a resource.

No. 0 composite operation unit 3 (hereinafter, PDI (Parallel Dual Instruction) operation unit 0) comprises: first, second, and third operand paths 15, 16, and 17, which read vector and scalar data from the vector registers 33a to 33h and the scalar buffers 34a to 34c; a pipeline adder 1; a pipeline multiplier 2; a path 11 that feeds back the result of the pipeline adder 1 as an operand for the pipeline adder 1 and the pipeline multiplier 2; a path 12 that feeds back the result of the pipeline multiplier 2 as an operand for the pipeline adder 1 and the pipeline multiplier 2; a selector 7 that, as directed by the instruction, selects input data from the first operand path 15, feedback path 11, or feedback path 12 as the augend of the pipeline adder 1; a selector 8 that selects input data from the second operand path 16, third operand path 17, feedback path 11, or feedback path 12 as the addend of the pipeline adder 1; a selector 9 that selects input data from the first operand path 15, feedback path 11, or feedback path 12 as the multiplicand of the pipeline multiplier 2; a selector 10 that selects input data from the second operand path 16, third operand path 17, feedback path 11, or feedback path 12 as the multiplier of the pipeline multiplier 2; an output circuit 13 of the pipeline adder 1; an output circuit 14 of the pipeline multiplier 2; a result path 18 that writes the result of the pipeline adder 1 to a vector register; and a result path 19 that writes the result of the pipeline multiplier 2 to a vector register.
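The selector wiring of PDI operation unit 0 can be captured as a lookup table, which makes it easy to check whether a given instruction's operand routing is physically possible. In this sketch the source labels ("opnd15", "fb11", and so on) and the function `check_routing` are hypothetical names built from the reference numerals above; the example routing corresponds to an addition followed by a multiplication that takes the sum from feedback path 11.

```python
# Sources each selector of PDI operation unit 0 can choose from (FIG. 1 numerals).
SELECTOR_SOURCES = {
    7:  {"opnd15", "fb11", "fb12"},            # augend of pipeline adder 1
    8:  {"opnd16", "opnd17", "fb11", "fb12"},  # addend of pipeline adder 1
    9:  {"opnd15", "fb11", "fb12"},            # multiplicand of pipeline multiplier 2
    10: {"opnd16", "opnd17", "fb11", "fb12"},  # multiplier of pipeline multiplier 2
}


def check_routing(routing):
    """Raise ValueError if a selector is asked for a source it is not wired to."""
    for selector, source in routing.items():
        if source not in SELECTOR_SOURCES[selector]:
            raise ValueError(f"selector {selector} cannot select {source}")
    return True


# VR1 + VR2 on paths 15/16 into the adder; the sum (feedback path 11)
# times VR4 (third operand path 17) in the multiplier.
print(check_routing({7: "opnd15", 8: "opnd16", 9: "fb11", 10: "opnd17"}))  # True
```

PDI operation unit 1 has the same shape with selectors 20 to 23, operand paths 28 to 30, and feedback paths 24 and 25.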

No. 1 composite operation unit 6 (hereinafter, PDI operation unit 1) comprises: fourth, fifth, and sixth operand paths 28, 29, and 30, which read vector and scalar data from the vector registers 33a to 33h and the scalar buffers 34a to 34c; a pipeline adder 4; a pipeline multiplier 5; a path 24 that feeds back the result of the pipeline adder 4 as an operand for the pipeline adder 4 and the pipeline multiplier 5; a path 25 that feeds back the result of the pipeline multiplier 5 as an operand for the pipeline adder 4 and the pipeline multiplier 5; a selector 20 that, as directed by the instruction, selects input data from the fourth operand path 28, feedback path 24, or feedback path 25 as the augend of the pipeline adder 4; a selector 21 that selects input data from the fifth operand path 29, sixth operand path 30, feedback path 24, or feedback path 25 as the addend of the pipeline adder 4; a selector 22 that selects input data from the fourth operand path 28, feedback path 24, or feedback path 25 as the multiplicand of the pipeline multiplier 5; a selector 23 that selects input data from the fifth operand path 29, sixth operand path 30, feedback path 24, or feedback path 25 as the multiplier of the pipeline multiplier 5; an output circuit 26 of the pipeline adder 4; an output circuit 27 of the pipeline multiplier 5; a result path 31 that writes the result of the pipeline adder 4 to a vector register; and a result path 32 that writes the result of the pipeline multiplier 5 to a vector register.

FIG. 2 shows the configuration of the vector data processing device and its instruction control unit according to this embodiment.

In the figure, the instruction control unit 37 of the vector data processing device comprises: an instruction buffer 38; an execution instruction queue 40 for PDI operation unit 1 (hereinafter, PDI1Xi) and its next-execution instruction queue 39 (hereinafter, PDI1Ni); an execution instruction queue 42 for PDI operation unit 0 (hereinafter, PDI0Xi) and its next-execution instruction queue 41 (hereinafter, PDI0Ni); an instruction transfer path 45 from the instruction buffer 38 to the instruction queues 39 to 42; a control path 44 that issues control directives to DIST 35, the vector registers 33, SEL 36, and PDI operation unit 0 according to the contents of PDI0Xi and PDI0Ni; and a control path 43 that issues control directives to DIST 35, the vector registers 33, SEL 36, and PDI operation unit 1 according to the contents of PDI1Xi and PDI1Ni. The two instruction queues, Xi and Ni, are provided for each resource, and each queue has a busy flag, so that the busy states of the instruction queues and of the resources can be managed. Although not shown, resource instruction queues for the load/store pipelines are provided in the same way as the instruction queues for PDI operation units 0 and 1.

FIG. 3 illustrates an example of an operation instruction sequence for which the first vector data processing device according to the present invention is most effective.

In the figure, the operation instruction sequence is:
(1) Vadd VR0 VR1 VR2 ⇒ VR1 + VR2 = VR0
(2) Vmlt VR3 VR4 VR0 ⇒ VR4 × VR0 = VR3
(3) Vmlt VR5 VR6 VR7 ⇒ VR6 × VR7 = VR5
(4) Vadd VR8 VR9 VR5 ⇒ VR9 + VR5 = VR8
These instructions are assumed to be stored sequentially in the instruction buffer 38 of the instruction control unit 37, beginning with instruction (1). The result VR0 of addition instruction (1) and the operand VR0 of instruction (2) are in a chaining relationship, as are the result VR5 of multiplication instruction (3) and the operand VR5 of instruction (4).
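The chaining relationships named above are read-after-write dependences: a later instruction reads the register that an earlier one writes. A minimal sketch of finding them is shown below; the `VInstr` tuple and the function name are hypothetical conveniences for illustration, not structures described in the patent.

```python
from typing import NamedTuple, Tuple


class VInstr(NamedTuple):
    op: str                # "Vadd" or "Vmlt"
    dst: str               # destination vector register
    srcs: Tuple[str, ...]  # source vector registers


def chaining_pairs(seq):
    """Return (i, j) pairs where instruction j reads the result of instruction i.

    Only the most recent writer of each register is tracked, so each pair
    is a true read-after-write dependence.
    """
    last_writer = {}
    pairs = []
    for j, ins in enumerate(seq):
        for reg in ins.srcs:
            if reg in last_writer and (last_writer[reg], j) not in pairs:
                pairs.append((last_writer[reg], j))
        last_writer[ins.dst] = j
    return pairs


FIG3 = [
    VInstr("Vadd", "VR0", ("VR1", "VR2")),
    VInstr("Vmlt", "VR3", ("VR4", "VR0")),
    VInstr("Vmlt", "VR5", ("VR6", "VR7")),
    VInstr("Vadd", "VR8", ("VR9", "VR5")),
]
print(chaining_pairs(FIG3))  # [(0, 1), (2, 3)], i.e. (1)->(2) and (3)->(4)
```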

When this instruction sequence is processed by the first vector data processing device shown in FIG. 1, and assuming that no preceding instruction is using the resources of PDI operation units 0 and 1, the operation is as follows. Addition instruction (1) is assigned to PDI0Xi, and the following multiplication instruction (2), which is chained to instruction (1), is assigned to PDI0Ni of the same unit. The next multiplication instruction (3) is assigned to PDI1Xi because PDI operation unit 0 is busy, and the following addition instruction (4), which is chained to instruction (3), is assigned to PDI1Ni. Thus, when two consecutive operation instructions are in a chaining relationship, both are assigned to the instruction queues of the same resource. Instructions read from the instruction buffer are queued in the instruction queues provided for each resource; for an ordinary instruction sequence, that is, when there is no chaining relationship between the preceding and succeeding instructions, the succeeding instruction is queued to the instruction queue of the operation resource opposite the one to which the preceding instruction was assigned.
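The queue assignment rule can be sketched as follows. The sketch rests on assumptions not stated in the source: both PDI units start idle, queue occupancy and instruction completion are not modeled, and a chain is paired at most two deep (a chained successor goes to Ni only when its predecessor sits in Xi), reflecting the two-instruction limit of one PDI operation unit. The function name and the tuple encoding of instructions are hypothetical.

```python
# Each instruction is (mnemonic, destination, sources), following FIG. 3.
FIG3 = [
    ("Vadd", "VR0", ("VR1", "VR2")),
    ("Vmlt", "VR3", ("VR4", "VR0")),
    ("Vmlt", "VR5", ("VR6", "VR7")),
    ("Vadd", "VR8", ("VR9", "VR5")),
]


def assign_to_queues(seq):
    """Return (pdi_unit, queue) for each instruction under the sketch's assumptions."""
    placed = []
    for i, (_, _, srcs) in enumerate(seq):
        if i == 0:
            placed.append((0, "Xi"))  # both units idle: start on PDI unit 0
            continue
        prev_dst = seq[i - 1][1]
        prev_unit, prev_queue = placed[-1]
        if prev_dst in srcs and prev_queue == "Xi":
            # chained pair: same unit, operand delivered over the feedback path
            placed.append((prev_unit, "Ni"))
        else:
            # no chaining (or chain already two deep): opposite unit
            placed.append((1 - prev_unit, "Xi"))
    return placed


print(assign_to_queues(FIG3))
# [(0, 'Xi'), (0, 'Ni'), (1, 'Xi'), (1, 'Ni')]
```

The output matches the assignment described above: PDI0Xi, PDI0Ni, PDI1Xi, PDI1Ni.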

When addition instruction (1) stored in PDI0Xi 42 is started, the vector control unit 37 directs the vector registers 33 to read VR1 and VR2 simultaneously, directs SEL 36 to route the VR1 and VR2 data to the first and second operand paths, and directs PDI operation unit 0 to add the data from the first and second operand paths. It also directs DIST 35 to route the pipeline adder result path 18 of PDI operation unit 0 to VR0, and directs the vector registers 33 to write VR0.

Multiplication instruction (2), stored in PDI0Ni 41, is started once the sum of the operation stage time of the pipeline adder 1 and the travel time of the feedback path 11 has elapsed since the start of addition instruction (1). This start control uses a counting circuit that begins counting when instruction (1) starts and increments by 1 every operation cycle; the start is triggered when the count equals a pre-registered release count value (operation stage time + feedback path travel time). When multiplication instruction (2) starts, the vector control unit 37 directs the vector registers 33 to read VR4, directs SEL 36 to route the VR4 data to the third operand path 17, and directs PDI operation unit 0 to multiply the data from the third operand path 17 by the data from the feedback path 11. It also directs DIST 35 to route the pipeline multiplier result path 19 of PDI operation unit 0 to VR3, and directs the vector registers 33 to write the multiplication result to VR3. As a result, addition instruction (1) and multiplication instruction (2) are processed with overlap in PDI operation unit 0.
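The release count control described above amounts to a cycle counter compared against a fixed value. A minimal model follows; the class and function names are hypothetical, and the example latencies (a 6-stage adder and a 1-cycle feedback path) are illustrative assumptions, since the source gives no concrete stage counts.

```python
class ReleaseCounter:
    """Counts operation cycles after the preceding instruction starts and
    signals when the chained instruction may be started."""

    def __init__(self, stage_time, travel_time):
        # pre-registered release count value = stage time + feedback travel time
        self.release_value = stage_time + travel_time
        if self.release_value < 1:
            raise ValueError("release value must be at least one cycle")
        self.count = 0

    def tick(self):
        """Advance one operation cycle; True on the cycle the count matches."""
        self.count += 1
        return self.count == self.release_value


def start_cycle_of_successor(stage_time, travel_time, preceding_start=0):
    """Cycle on which the chained successor is started."""
    counter = ReleaseCounter(stage_time, travel_time)
    cycle = preceding_start
    while True:
        cycle += 1            # one operation cycle elapses
        if counter.tick():    # count now equals the release value
            return cycle


# Hypothetical latencies: 6 adder stages, 1 cycle on feedback path 11.
print(start_cycle_of_successor(6, 1))       # 7
print(start_cycle_of_successor(6, 1, 10))   # 17
```

Because the release value is fixed per pipeline, the dependent instruction starts exactly when the first fed-back element reaches the other pipeline unit, with no handshake needed.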

Meanwhile, multiplication instruction (3), stored in PDI1Xi 40, is started as soon as it is stored, provided PDI operation unit 1 is not busy; that is, in the operation cycle following the storage of the instruction in PDI0Ni. On this start, the vector control unit 37 directs the vector registers 33 to read VR6 and VR7 simultaneously, directs SEL 36 to route the VR6 and VR7 data to the fourth and fifth operand paths, and directs PDI operation unit 1 to multiply the data from the fourth and fifth operand paths. It also directs DIST 35 to route the pipeline multiplier result path 32 of PDI operation unit 1 to VR5, and directs the vector registers 33 to write VR5.

Addition instruction (4), stored in PDI1Ni 39, is started once the sum of the operation stage time of the pipeline multiplier 5 and the travel time of the feedback path 25 has elapsed since the start of multiplication instruction (3). When addition instruction (4) starts, the vector control unit 37 directs the vector registers 33 to read VR9, directs SEL 36 to route the VR9 data to the sixth operand path 30, and directs PDI operation unit 1 to add the data from the sixth operand path 30 and the feedback path 25. It also directs DIST 35 to route the pipeline adder result path 31 of PDI operation unit 1 to VR8, and directs the vector registers 33 to write the addition result to VR8. As a result, multiplication instruction (3) and addition instruction (4) are processed with overlap in PDI operation unit 1.

Thus, when a preceding and a succeeding operation instruction are in a chaining relationship, the PDI operation unit can write the pipeline result of the preceding instruction to a vector register while also supplying it, through the feedback path, as an operand to the other pipeline unit of the pair. In other words, a single PDI operation unit can execute up to two chained operation instructions simultaneously in parallel. This processing is called the macro-chaining mode. Because the preceding result is still written to a vector register in macro-chaining mode, that result can also be chained to other resources that can operate in parallel. Consider, for example, the operation instruction sequence shown in FIG. 5:
(1) Vadd VR0 VR1 VR2 ⇒ VR1 + VR2 = VR0
(2) Vmlt VR3 VR4 VR0 ⇒ VR4 × VR0 = VR3
(3) Vmlt VR5 VR6 VR0 ⇒ VR6 × VR0 = VR5
Here both multiplications use the addition result VR0. In the conventional vector data processing device shown in FIG. 8, as shown in FIG. 5(g), both chains are processed on separate operation resources through the vector registers, so the later multiplication instruction finds its resource busy and is delayed. In the first vector data processing device according to the present invention, by contrast, as shown in FIG. 5(h), one PDI operation unit executes instructions (1) and (2) in macro-chaining mode while the other PDI operation unit overlaps the chaining of instruction (3) to the result written to VR0, which shortens the processing time. Likewise, if instruction (3) is replaced by a store instruction for VR0, the preceding operation result can be chained to a store to main memory, with the same effect. The processing timing of the FIG. 3 instruction sequence described above is shown in FIG. 3(b); for comparison, FIG. 3(a) shows the timing when the same sequence is executed on the conventional vector data processing device shown in FIG. 8. As FIG. 3(b) shows, the first vector data processing device according to the present invention shortens the operation processing time.

Next, an embodiment of the second vector data processing device according to the present invention is described.

FIG. 9 shows the configuration of the vector data processing device of this embodiment, excluding its control unit. In this second vector data processing device, macro instructions that combine addition and multiplication in any order can be processed in parallel by the respective composite operation units. Although macro operation instructions were not described for the first vector data processing device, they can be executed in the same way there as well.

This embodiment differs from the first in that each composite operation unit has a means for selecting the output of either its adder or its multiplier. Compared with the first embodiment, this halves the number of result paths that return operation results to the vector registers and simplifies the configuration of DIST. The specific configuration and operation of this embodiment are described below.

Like the vector data processing device of FIG. 1, the vector data processing device of FIG. 9 comprises vector registers 429a to 429h, scalar data buffers 430a to 430c, SEL 432, DIST 431, and the operation resources No. 0 composite operation unit 403 and No. 1 composite operation unit 406. Although not shown, the device also has a load pipeline that supplies operands from main memory to the vector registers and a store pipeline that stores data from the vector registers to main memory; the control system manages each of these as a resource.

No. 0 composite operation unit 403 (hereinafter, PDI operation unit 0) comprises: first, second, and third operand paths 414, 415, and 416, which read vector and scalar data from the vector registers 429a to 429h and the scalar buffers 430a to 430c; a pipeline adder 401; a pipeline multiplier 402; a path 411 that feeds back the result of the pipeline adder 401 as an operand for the pipeline adder 401 and the pipeline multiplier 402; a path 412 that feeds back the result of the pipeline multiplier 402 as an operand for the pipeline adder 401 and the pipeline multiplier 402; a selector 407 that, as directed by the instruction, selects input data from the first operand path 414, feedback path 411, or feedback path 412 as the augend of the pipeline adder 401; a selector 408 that selects input data from the second operand path 415, third operand path 416, feedback path 411, or feedback path 412 as the addend of the pipeline adder 401; a selector 409 that selects input data from the first operand path 414, feedback path 411, or feedback path 412 as the multiplicand of the pipeline multiplier 402; a selector 410 that selects input data from the second operand path 415, third operand path 416, feedback path 411, or feedback path 412 as the multiplier of the pipeline multiplier 402; a selector 413 that selects between the adder result feedback path 411 and the multiplier result feedback path 412; and a result path 417 that writes the result of PDI operation unit 0 to a vector register.

No. 1 composite operation unit 406 (hereinafter, PDI operation unit 1) comprises: fourth, fifth, and sixth operand paths 425, 426, and 427, which read vector and scalar data from the vector registers 429a to 429h and the scalar buffers 430a to 430c; a pipeline adder 404; a pipeline multiplier 405; a path 422 that feeds back the result of the pipeline adder 404 as an operand for the pipeline adder 404 and the pipeline multiplier 405; a path 423 that feeds back the result of the pipeline multiplier 405 as an operand for the pipeline adder 404 and the pipeline multiplier 405; a selector 418 that, as directed by the instruction, selects input data from the fourth operand path 425, feedback path 422, or feedback path 423 as the augend of the pipeline adder 404; a selector 419 that selects input data from the fifth operand path 426, sixth operand path 427, feedback path 422, or feedback path 423 as the addend of the pipeline adder 404; a selector 420 that selects input data from the fourth operand path 425, feedback path 422, or feedback path 423 as the multiplicand of the pipeline multiplier 405; a selector 421 that selects input data from the fifth operand path 426, sixth operand path 427, feedback path 422, or feedback path 423 as the multiplier of the pipeline multiplier 405; a selector 424 that selects between the adder result feedback path 422 and the multiplier result feedback path 423; and a result path 428 that writes the result of PDI operation unit 1 to a vector register.

The instruction control unit of the vector data processing device of this embodiment has the same configuration as that of the first vector data processing device, shown in FIG. 2, so its description is omitted.

FIG. 11 illustrates an example of an operation instruction in the second vector data processing device according to the present invention that executes instructions (1) and (2) of the FIG. 3 sequence as a single macro instruction.

In the figure, the operation instruction is:
Vaddmlt VR3 VR4 VR1 VR2 ⇒ (VR1 + VR2) × VR4 = VR3
Although not shown, instructions (3) and (4) of FIG. 3 can likewise be replaced by the macro instruction:
Vmltadd VR8 VR9 VR6 VR7 ⇒ (VR6 × VR7) + VR9 = VR8
Both macro instructions are assumed to be stored sequentially, beginning with Vaddmlt, in the instruction buffer 38 of the instruction control unit 37 shown in FIG. 2. When this sequence is processed by the second vector data processing device shown in FIG. 9, and assuming that no preceding instruction is using the resources of PDI operation units 0 and 1, the Vaddmlt macro instruction is assigned to PDI0Xi, and the following Vmltadd macro instruction is assigned to PDI1Xi because PDI operation unit 0 is busy. In this way, operation instructions are assigned to the instruction queues so that the operation resources are used alternately.
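The element-wise meaning of the two macro instructions can be written down as a small reference model. The operand order (destination, operand of the second operation, then the two operands of the first operation) is inferred from the two examples above and is an assumption; the function name and the dictionary standing in for the vector registers are hypothetical.

```python
def execute_macro(text, regs):
    """Execute one macro instruction on a dict of vector registers (lists).

    Assumed operand order: <op> <dst> <operand of 2nd op> <operand A> <operand B>.
    """
    op, dst, second, a_name, b_name = text.split()
    a, b, c = regs[a_name], regs[b_name], regs[second]
    if op == "Vaddmlt":      # (A + B) x C
        regs[dst] = [(x + y) * z for x, y, z in zip(a, b, c)]
    elif op == "Vmltadd":    # (A x B) + C
        regs[dst] = [(x * y) + z for x, y, z in zip(a, b, c)]
    else:
        raise ValueError(f"unknown macro instruction {op}")
    return regs[dst]


regs = {"VR1": [1, 2], "VR2": [3, 4], "VR4": [5, 6],
        "VR6": [1, 2], "VR7": [3, 4], "VR9": [5, 6]}
print(execute_macro("Vaddmlt VR3 VR4 VR1 VR2", regs))  # [20, 36]
print(execute_macro("Vmltadd VR8 VR9 VR6 VR7", regs))  # [8, 14]
```

Each macro produces the same values as the corresponding two-instruction chain from FIG. 3, but it occupies only one instruction queue entry and writes no intermediate vector register.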

When the Vaddmlt macro instruction stored in PDI0Xi 42 is started, the vector control unit 37 directs the vector registers 429a to 429h to read VR1, VR2, and VR4 simultaneously, directs SEL 432 to route the VR1, VR2, and VR4 data to the first, second, and third operand paths respectively, and directs PDI operation unit 0 to add the data from the first and second operand paths and then multiply the sum, arriving over the feedback path 411, by the data from the third operand path 416. At this time, as shown in FIG. 10, a delay circuit 501 is provided for the operand entering PDI operation unit 0 from the third operand path, and a delay control directive delays that operand by the operation stage time of the preceding operation, here the addition. This allows the first element of the addition result and the first element of the vector data from the third operand path to start their operation at the same time. Although not shown, a configuration similar to the delay circuit 501 is also provided on the sixth operand path 427. Alternatively, the same delay can be obtained by postponing the vector register read directives for the third and sixth operand paths with a release count circuit, as in the first vector data processing device. Further, the control unit 37 directs selector 413 to select the output of the pipeline multiplier 402 of PDI operation unit 0 and send it to the result path 417, directs DIST 431 to route the result path 417 of PDI operation unit 0 to VR3, and directs the vector registers 429a to 429h to write the macro operation result to VR3. As a result, the Vaddmlt macro instruction is processed by PDI operation unit 0.
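The effect of the delay circuit can be seen in a small cycle-level simulation. This is a sketch under stated assumptions: the 4-stage adder depth is hypothetical, the multiplier is modeled without latency, and the feedback path is given zero travel time, since only the element alignment matters here. Both the adder and the delay circuit are modeled as shift registers (`collections.deque`); `simulate_vaddmlt` is a hypothetical name.

```python
from collections import deque


def simulate_vaddmlt(a, b, c, add_stages=4, delay_stages=None):
    """Cycle-level sketch of (a + b) * c inside one composite operation unit.

    delay_stages is the depth of the delay circuit on the third operand path;
    it defaults to add_stages, the alignment described in the text.
    """
    if delay_stages is None:
        delay_stages = add_stages
    adder = deque([None] * add_stages)    # one slot per adder stage
    delay = deque([None] * delay_stages)  # delay circuit as a shift register
    out = []
    for cycle in range(len(a) + max(add_stages, delay_stages)):
        issuing = cycle < len(a)
        # operand paths 1 and 2 feed the adder; operand path 3 feeds the delay
        adder.append(a[cycle] + b[cycle] if issuing else None)
        delay.append(c[cycle] if issuing else None)
        fed_back = adder.popleft()  # sum leaving the last adder stage
        c_now = delay.popleft()     # third operand leaving the delay circuit
        if fed_back is not None and c_now is not None:
            out.append(fed_back * c_now)
    return out


a, b, c = [1, 2, 3, 4, 5, 6], [1] * 6, [10, 20, 30, 40, 50, 60]
print(simulate_vaddmlt(a, b, c))                  # [20, 60, 120, 200, 300, 420]
print(simulate_vaddmlt(a, b, c, delay_stages=0))  # [100, 180]: misaligned
```

With the delay equal to the adder depth, each sum meets its own third-operand element; without it, the first sum is paired with a later element of the third operand and most results are lost.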

The macro instruction stored in PDI1Xi 40, on the other hand, is activated at the same time it is stored, provided that PDI arithmetic unit 1 is not busy. That is, it is activated in the operation cycle following the storage of the macro instruction in PDI0Xi. Upon this activation, the vector control unit 37 instructs the vector registers 429a to 429h to read VR6, VR7, and VR9 simultaneously. It instructs SEL 432 to route the data of VR6, VR7, and VR9 to the fourth, fifth, and sixth operand paths, respectively. It also instructs PDI arithmetic unit 1 to multiply the data from the fourth and fifth operand paths, and then to add the product, received via the feedback path 23, to the delayed data from the sixth operand path 427. Further, the control unit 37 instructs the selector 424 to select the output of the pipeline adder 404 of PDI arithmetic unit 1 and send it to the result path 428. It instructs DIST 431 to route the result path 428 of PDI arithmetic unit 1 to VR8, and instructs the vector registers 429a to 429h to write the macro operation result to VR8. As a result, the macro instruction is processed by PDI arithmetic unit 1.

As described above, the second vector data processing apparatus according to the present invention can execute up to two macro operation instructions simultaneously in parallel, each an arbitrary combination of addition and multiplication, and can thereby shorten the operation processing time.
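As a hedged illustration of why two identical composite units make dispatch simple, the sketch below applies the rule stated in claim 2: a preceding instruction goes to a unit that is not busy and the next goes to the other unit, regardless of operation type. The occupancy figures, the one-issue-per-cycle limit, and the names `MacroOp` and `schedule` are assumptions of this sketch; the third instruction and its registers (VR10 to VR12) are hypothetical, and data dependences between instructions are not modeled.

```python
from dataclasses import dataclass


@dataclass
class MacroOp:
    name: str
    busy_cycles: int  # hypothetical: cycles the op occupies a composite unit


def schedule(ops, n_units=2):
    """Dispatch ops in program order to identical composite units.

    Every unit contains both an adder and a multiplier, so an op can go to
    whichever unit frees up first, whatever mix of add/multiply it uses.
    At most one op is issued per cycle (an assumption of this sketch).
    Returns a list of (name, unit, start_cycle).
    """
    busy_until = [0] * n_units
    next_issue = 0
    plan = []
    for op in ops:
        unit = min(range(n_units), key=lambda u: busy_until[u])
        start = max(next_issue, busy_until[unit])
        busy_until[unit] = start + op.busy_cycles
        plan.append((op.name, unit, start))
        next_issue = start + 1
    return plan


ops = [
    MacroOp("VR3 <- (VR1 + VR2) * VR4", 64),
    MacroOp("VR8 <- (VR6 * VR7) + VR9", 64),
    MacroOp("VR10 <- VR11 * VR12", 64),  # hypothetical registers
]
for name, unit, start in schedule(ops):
    print(f"PDI unit {unit} starts {name!r} at cycle {start}")
```

The first two macro instructions start one cycle apart on different units, matching the text above. The third waits for unit 0 to become free at cycle 64.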

Note that the first and second vector data processing apparatuses can connect any of the three scalar data buffers to any operand path. They therefore allow flexible operation processing in which scalar data and vector data can be assigned to any operand. Furthermore, the PDI arithmetic unit has a feedback path for each of the addition and multiplication pipelines, and each feedback path can be selected as both operands of either pipeline. Denoting vector or scalar data by A, B, and C, each of the following operations can therefore be executed by a single macro operation instruction:

(1) A ± B
(2) A × B
(3) (A ± B) × C
(4) (A × B) ± C
(5) (A ± B)²

Further, still higher processing performance can be obtained when the first and second vector data processing apparatuses according to the present invention are combined with an element-parallel scheme, which processes a plurality of vector elements in parallel in one operation cycle. In four-element parallel processing, for example, the apparatus consists of four independent sets, each comprising a plurality of vector registers and two PDI arithmetic units. One set processes elements 4i (i = 0, 1, 2, ...), one processes elements 4i + 1, one processes elements 4i + 2, and one processes elements 4i + 3. Each set can process one element per operation cycle, so the apparatus as a whole processes four elements in parallel.
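A minimal sketch of the four-way element split described above, assuming each set evaluates the same macro operation (here (A + B) × C) and ignoring pipeline fill time. The function name `element_parallel` and its cycle log are illustrative only. Set s owns the elements whose index satisfies j mod 4 = s.

```python
def element_parallel(a, b, c, ways=4):
    """Compute (a[j] + b[j]) * c[j] with `ways` independent element sets.

    Set s owns elements j with j % ways == s (the 4i, 4i+1, 4i+2, 4i+3 split
    for ways=4) and finishes one element per operation cycle, so the vector
    needs ceil(n / ways) cycles rather than n. Pipeline fill time is ignored.
    Returns the result and a per-cycle log of (set, element) pairs.
    """
    n = len(a)
    out = [None] * n
    log = []
    cycles = -(-n // ways)  # ceiling division
    for cycle in range(cycles):
        step = []
        for s in range(ways):  # in hardware, these sets run simultaneously
            j = cycle * ways + s
            if j < n:
                out[j] = (a[j] + b[j]) * c[j]
                step.append((s, j))
        log.append(step)
    return out, log


out, log = element_parallel(list(range(10)), [1] * 10, [2] * 10)
print(len(log))  # 3 cycles for 10 elements
print(log[1])    # [(0, 4), (1, 5), (2, 6), (3, 7)]
```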

Furthermore, the schemes of the first and second vector data processing apparatuses according to the present invention, in particular the macro chaining mode, are useful for a microprocessor that integrates registers such as vector registers and scalar registers together with a plurality of pipeline arithmetic units on a single chip, because they can reduce travel time.

[Effects of the Invention] As described above, the present invention provides a vector data processing apparatus with the following effects.

(1) When operation instructions of the same type occur consecutively, the operation-resource bottleneck can be relieved.

(2) Likewise, because two composite arithmetic units of identical configuration are provided, each consisting of an adder and a multiplier, scheduling control of operation instructions becomes simple.

(3) Because each composite arithmetic unit is provided with three operand paths, composite operations in which all operands are vectors become possible.

(4) In addition to (3), feedback paths return the outputs of the adder and the multiplier of each composite arithmetic unit to the inputs of that adder and multiplier. Composite operations that arbitrarily combine multiplication and addition therefore become possible without going through the vector registers.

(5) As a result, the processing time can be shortened while the increase in data-path and control-logic hardware is kept to a minimum.
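The operand routing behind effects (3) and (4) can be modeled functionally. The sketch below is a hedged, timing-free model of one composite unit performing the four selections named in claim 1. It assumes that `p`, `q`, and `r` stand for the three operand paths, that each may carry a vector or a broadcast scalar (reflecting the scalar data buffers that can be connected to any operand path), and that the adder can add or subtract. The function and parameter names are hypothetical.

```python
from operator import add, mul, sub


def _at(x, k):
    """Operand path value for element k: vectors are indexed, scalars are broadcast."""
    return x[k] if isinstance(x, (list, tuple)) else x


def composite_unit(selection, p, q, r=None, subtract=False):
    """Model one composite (adder + multiplier) unit for one macro instruction.

    selection "i":   p ± q          (adder only)
    selection "ii":  p * q          (multiplier only)
    selection "iii": (p ± q) * r    (adder output fed back to the multiplier)
    selection "iv":  (p * q) ± r    (multiplier output fed back to the adder)
    """
    addsub = sub if subtract else add
    lengths = [len(x) for x in (p, q, r) if isinstance(x, (list, tuple))]
    n = lengths[0] if lengths else 1
    out = []
    for k in range(n):
        a, b, c = _at(p, k), _at(q, k), _at(r, k)
        if selection == "i":
            out.append(addsub(a, b))
        elif selection == "ii":
            out.append(mul(a, b))
        elif selection == "iii":
            out.append(mul(addsub(a, b), c))  # first feedback path -> multiplier
        elif selection == "iv":
            out.append(addsub(mul(a, b), c))  # second feedback path -> adder
        else:
            raise ValueError(f"unknown selection {selection!r}")
    return out


print(composite_unit("iii", [1, 2, 3], [4, 5, 6], [7, 8, 9]))    # [35, 56, 81]
print(composite_unit("iv", [1, 2], 3.0, [10, 20], subtract=True))  # [-7.0, -14.0]
```

In the hardware, the intermediate sum or product in selections iii and iv travels over a feedback path instead of being written to a vector register. This model only reproduces the arithmetic that results.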

[Brief description of the drawings]

FIG. 1 is a schematic block diagram of the data system of a first vector data processing apparatus according to an embodiment of the present invention. FIG. 2 is a schematic block diagram of the instruction control unit of the vector data processing apparatus according to the present invention. FIG. 3(a) is a timing chart showing the processing concept of a conventional vector data processing apparatus. FIG. 3(b) is a timing chart showing the processing concept of the first vector data processing apparatus according to the present invention. FIGS. 4(c) and 4(e) are timing charts showing the processing concept of conventional vector data processing apparatuses. FIGS. 4(d) and 4(f) are timing charts showing the processing concepts of the first and second vector data processing apparatuses according to the present invention and of a conventional vector data processing apparatus with parallel arithmetic units. FIG. 5(g) is a timing chart showing the processing concept of a conventional vector data processing apparatus. FIG. 5(h) is a timing chart showing the macro chaining mode processing concept of the first vector data processing apparatus according to the present invention. FIG. 5(i) is a timing chart showing the processing concept of a conventional vector data processing apparatus that cannot process macro instructions with all-vector operands. FIG. 6 is a block diagram of a conventional vector data processing apparatus that is prone to operation-resource bottlenecks. FIG. 7 is a block diagram of a conventional vector data processing apparatus that cannot process macro instructions with all-vector operands. FIG. 8 is a block diagram of a conventional vector data processing apparatus with parallel arithmetic units. FIG. 9 is a block diagram of a second embodiment of the vector data processing apparatus of the present invention. FIG. 10 is a block diagram of the operand delay circuit of the vector data processing apparatus of FIG. 9. FIG. 11 is a timing chart showing an example of the processing concept of the second vector data processing apparatus according to the present invention.

1, 4: pipeline adders; 2, 5: pipeline multipliers; 3: composite arithmetic unit No. 0 (PDI arithmetic unit 0); 6: composite arithmetic unit No. 1 (PDI arithmetic unit 1); 7 to 10: selectors; 11, 12: feedback paths; 13, 14: output circuits; 15: first operand path; 16: second operand path; 17: third operand path; 18, 19: result paths; 20 to 23: selectors; 24, 25: feedback paths; 26, 27: output circuits; 28: fourth operand path; 29: fifth operand path; 30: sixth operand path; 31, 32: result paths; 33a to 33h, 33: vector registers; 34a to 34c: scalar data buffers; 35: DIST; 36: SEL; 37: instruction control unit; 38: instruction buffer; 39 to 42: instruction queues; 43, 44: control paths; 501: delay circuit.

Continuation of the front page: (56) References cited: JP-A-64-67678 (JP, A); JP-A-61-62174 (JP, A); JP-A-56-88561 (JP, A); FUJITSU Vol. 41, No. 1, pp. 3-11; IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Vol. 1983, No. 1, pp. 447-450.

Claims

(57) [Claims]

1. A vector data processing apparatus that has a vector data buffer for holding a plurality of sets of vector data, each composed of a plurality of vector elements, and that performs vector operations on the vector data held in said vector data buffer, comprising: two composite arithmetic units; and a total of six operand paths, three provided for each of the two composite arithmetic units, each supplying one input of the corresponding composite arithmetic unit; wherein each composite arithmetic unit comprises: an adder; a multiplier; a first feedback path that supplies the output of the adder as one input of the composite arithmetic unit; a second feedback path that supplies the output of the multiplier as one input of the composite arithmetic unit; and selecting means that performs, for the inputs of the adder and the inputs of the multiplier of the composite arithmetic unit, at least the following four selections: (i) a first selection, in which the two inputs supplied by two of the three operand paths provided for the composite arithmetic unit are used as the two inputs of the adder; (ii) a second selection, in which the two inputs supplied by two of the three operand paths provided for the composite arithmetic unit are used as the two inputs of the multiplier; (iii) a third selection, in which the three inputs supplied by the three operand paths provided for the composite arithmetic unit are used as the two inputs of the adder and one input of the multiplier, and the input supplied by the first feedback path is used as the remaining input of the multiplier; and (iv) a fourth selection, in which the three inputs supplied by the three operand paths provided for the composite arithmetic unit are used as the two inputs of the multiplier and one input of the adder, and the input supplied by the second feedback path is used as the remaining input of the adder.

2. The vector data processing apparatus according to claim 1, further comprising operation instruction scheduling means that, regardless of the type of operation instruction, allocates a preceding operation instruction to a composite arithmetic unit that is not busy and allocates a subsequent operation instruction to the other composite arithmetic unit.

3. The vector data processing apparatus according to claim 2, wherein, when a register that stores the result of a preceding addition (or multiplication) instruction is designated as the operand register of a subsequent multiplication (or addition) instruction, the operation instruction scheduling means assigns both instructions to the same composite arithmetic unit.

4. The vector data processing apparatus according to claim 1, wherein, when a register that stores the result of a preceding addition (or multiplication) instruction is designated as the operand register of a subsequent multiplication (or addition) instruction, the selecting means of each composite arithmetic unit performs the third (or fourth) selection, so that the preceding operation is processed by the adder (or multiplier) and the subsequent operation is processed by the multiplier (or adder).

5. The vector data processing apparatus according to claim 1, wherein, when the operation instruction is a macro operation instruction that uses the result of a preceding addition (or multiplication) as operand data of a subsequent multiplication (or addition), the selecting means performs the third (or fourth) selection, so that the preceding operation is processed by the adder (or multiplier) and the subsequent operation is processed by the multiplier (or adder).

6. The vector data processing apparatus according to claim 4 or 5, further comprising means for delaying the operand from the operand path, or the instruction that reads that operand, so that the operand reaches the unit that performs the subsequent operation at the timing at which the result of the preceding operation is input to that unit via the first (or second) feedback path.

7. The vector data processing apparatus according to claim 1, wherein each composite arithmetic unit has operation result paths through which the adder and the multiplier each output their operation results to the outside separately.

8. The vector data processing apparatus according to claim 1, wherein each composite arithmetic unit has an operation result path through which one of the operation results of the adder and the multiplier is selectively output to the outside.

9. An element-parallel vector data processing apparatus that processes a plurality of vector elements in parallel in one operation cycle, wherein the apparatus according to claim 1 is employed for at least one of the parallel element sets.

10. The vector data processing apparatus according to claim 1, wherein at least said vector data buffer and said two composite arithmetic units are built into a one-chip microprocessor.

11. A vector data processing apparatus comprising: a vector data buffer for holding a plurality of sets of vector data, each composed of a plurality of vector elements; a scalar data buffer for holding a plurality of scalar data; two composite arithmetic units, each comprising an adder and a multiplier; a total of six operand paths, three provided for each of the two composite arithmetic units, each supplying a vector data or scalar data operand to the corresponding composite arithmetic unit; means for connecting individual read paths of the vector data buffer and the scalar data buffer to each operand path in response to a program instruction; means for independently connecting each output of each adder and each multiplier of the composite arithmetic units to the vector data buffer or the scalar data buffer, and writing to it, in response to the program instruction; a first feedback path that supplies the output of the adder of each composite arithmetic unit as one operand of that composite arithmetic unit; a second feedback path that supplies the output of the multiplier of each composite arithmetic unit as one operand of that composite arithmetic unit; and selecting means that, in response to the program instruction, selects the operands of the adder and of the multiplier of each composite arithmetic unit by at least the following four selections: (i) a first selection, in which the two operands supplied by two of the three operand paths provided for the composite arithmetic unit are used as the two operands of the adder; (ii) a second selection, in which the two operands supplied by two of the three operand paths provided for the composite arithmetic unit are used as the two operands of the multiplier; (iii) a third selection, in which the three operands supplied by the three operand paths provided for the composite arithmetic unit are used as the two operands of the adder and one operand of the multiplier, and the operand supplied by the first feedback path is used as the remaining operand of the multiplier; and (iv) a fourth selection, in which the three operands supplied by the three operand paths provided for the composite arithmetic unit are used as the two operands of the multiplier and one operand of the adder, and the operand supplied by the second feedback path is used as the remaining operand of the adder.