JPH06149864A

JPH06149864A - Vector processor

Info

Publication number: JPH06149864A
Application number: JP30126692A
Authority: JP
Inventors: Makoto Koga; 誠古賀; Masamori Kashiyama; 正守柏山; Yasuhiro Maeda; 康博前田
Original assignee: Hitachi Ltd; Hitachi Computer Engineering Co Ltd
Current assignee: Hitachi Ltd; Hitachi Computer Engineering Co Ltd
Priority date: 1992-11-11
Filing date: 1992-11-11
Publication date: 1994-05-31

Abstract

PURPOSE: To obtain a vector processor that processes macro operation instructions efficiently and at high speed with a small amount of hardware. CONSTITUTION: The processor comprises two sets of arithmetic resources 6 and 7, each made up of a plurality of arithmetic sections consisting of a vector arithmetic unit, which processes vector data held in vector registers VR0 to VR31, and a scalar data buffer 5; and a second scalar data buffer 11, which takes in the data from the buffers of one of the two arithmetic resources 6 and 7 and stores the data to be input to a post-processing arithmetic unit 13. Once an arithmetic resource 6 or 7 has passed its result to the second buffer 11, it can start processing the next macro operation instruction without waiting for the post-processing arithmetic unit to finish, so macro operation instructions are executed at high speed.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a vector processing device that executes operation instructions spanning all vector elements (macro operation instructions), and more particularly to a vector processing device provided with an arithmetic resource made up of a plurality of arithmetic units.

[0002]

[Prior Art] In general, a vector processing device is provided with a plurality of vector arithmetic units that process a plurality of vector elements of vector data in parallel, in order to raise the processing speed of vector instructions. By processing a single instruction on several vector arithmetic units at once, a device of this configuration can perform operations such as sums and products of vector data at higher speed.

[0003] To process operation instructions spanning all vector elements (macro operation instructions), such as inner product, summation, and comparison, this type of vector processing device requires a final operation that combines the intermediate results obtained by the individual vector arithmetic units.

[0004] As prior art relating to a vector processing device capable of processing such instructions, the technique described in, for example, Japanese Patent Laid-Open No. 63-10263 is known.

[0005] This prior art provides a scalar data buffer that holds the intermediate results obtained by the vector arithmetic units and a post-processing arithmetic unit that takes data out of that buffer and performs the final post-processing operation. It thereby processes operation instructions spanning all vector elements, such as inner product, summation, and comparison.

[0006] Prior art of this type is described below with reference to the drawings.

[0007] FIG. 4 is a block diagram showing the configuration of prior art having an arithmetic resource and a post-processing arithmetic unit; FIG. 5 is a block diagram showing the configuration of prior art having a plurality of sets of arithmetic resources and a post-processing arithmetic unit; and FIG. 6 is a diagram explaining the processing pitch of the arithmetic resources and the post-processing arithmetic unit of FIG. 5 when macro operation instructions are processed. In FIGS. 4 and 5, 3, 12, and 17 are selectors, 4 is a vector arithmetic unit, 5 is a scalar data buffer, 6 is an arithmetic resource, 6a to 6b are arithmetic sections, 13 is a post-processing arithmetic unit, VR0 to VR31 are vector registers, and SR0 to SR31 are scalar registers.

[0008] The prior art shown in FIG. 4 is an example of a vector processing device comprising an arithmetic resource and a post-processing arithmetic unit. The arithmetic resource is made up of four vector arithmetic units, which process in parallel the 256 vector elements of the vector data held in a vector register, and four scalar data buffers, which hold the intermediate results obtained by the four vector arithmetic units 4. The post-processing arithmetic unit takes data out of the scalar data buffers and performs the final post-processing.

[0009] That is, the vector processing device shown in FIG. 4 comprises: 32 vector registers VR0 to VR31, built from high-speed random access memory (RAM), each independently readable and writable and each capable of holding 256 elements of vector data; 32 scalar registers SR0 to SR31 that store scalar data; a selector 3, built from switch matrix logic, that selects the output data of the vector registers VR0 to VR31 and the scalar registers SR0 to SR31 and outputs it to the arithmetic units as directed by an instruction; an arithmetic resource 6 made up of a plurality of arithmetic sections 6a to 6d, each consisting of a vector arithmetic unit 4 and a scalar data buffer 5 that stores its result; operand paths 8a to 8h that feed the operand data supplied through the selector 3 into the arithmetic resource 6; an operand path 10 that feeds the data of the scalar register designated by an operand into the post-processing arithmetic unit; a feedback path 20 for the post-processing result; a selector 12, built from switch matrix logic, that selects the input data of the post-processing arithmetic unit from among the data in the scalar data buffer 5 of each arithmetic section in the arithmetic resource 6, the data of the scalar registers SR0 to SR31, and the fed-back post-processing result; a post-processing arithmetic unit 13 that operates on the data of the scalar data buffers 5, the data of the scalar registers SR0 to SR31, and the fed-back post-processing result; a result path 14 that outputs the results of the vector arithmetic units 4; a result path 16 that outputs the result of the post-processing arithmetic unit 13; and a selector 17, built from switch matrix logic, that selects whether each result arriving on a result path is written to the vector registers VR0 to VR31 or to the scalar registers SR0 to SR31.

[0010] The vector arithmetic unit 4 is an adder, a multiplier, or a multiplier/adder. As an adder, it has a function that adds all the input data, {Σ[A(i) + B(i)]}, and comparison functions {Max A(i), Min A(i)}. As a multiplier, it has a function that multiplies the input data and adds up all the products, {Σ A(i) × B(i)}. As a multiplier/adder, it has a function in which the result of one arithmetic unit becomes the input data of the other.
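As an illustration only, the reduction functions listed in this paragraph can be sketched in Python. The function names are hypothetical and do not come from the patent, and the sketch models only the arithmetic, not the pipelined hardware.

```python
# Illustrative model of the reduction functions of vector arithmetic unit 4.
# All function names are hypothetical; only the arithmetic is modeled.

def adder_sum(a, b):
    """Adder: sum over i of (A(i) + B(i))."""
    return sum(x + y for x, y in zip(a, b))


def adder_max(a):
    """Adder (compare): Max A(i)."""
    return max(a)


def adder_min(a):
    """Adder (compare): Min A(i)."""
    return min(a)


def multiplier_sum(a, b):
    """Multiplier: sum over i of (A(i) * B(i))."""
    return sum(x * y for x, y in zip(a, b))
```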

[0011] Although not shown, the illustrated vector processing device also has a load pipeline that supplies operand data from main memory to the vector registers and a store pipeline that stores data from the vector registers into main memory.

[0012] Next, the operation of the vector processing device configured as shown in FIG. 4 is described for the case where the inner product operation S = S + Σ{A(i) × B(i)} (i = 0 to 255), which is useful for solving simultaneous linear equations, is processed as a macro operation instruction in a single instruction.

[0013] The arithmetic sections 6a to 6d, each containing a vector arithmetic unit (multiplier) 4 that operates on the individual elements, work as follows: section 6a computes Σ{A(4j) × B(4j)}, section 6b computes Σ{A(4j+1) × B(4j+1)}, section 6c computes Σ{A(4j+2) × B(4j+2)}, and section 6d computes Σ{A(4j+3) × B(4j+3)}, where j = 0 to 63.

[0014] The intermediate results thus obtained by the vector arithmetic units 4 are stored in the scalar data buffers 5. The selector 12 selects the input data for the post-processing arithmetic unit 13 from the paths 10, 47a to 47d, and 20 and outputs it. The post-processing arithmetic unit 13 computes the total sum of the intermediate results of the vector arithmetic units and the data of the scalar register designated by the operand, and outputs the final result to the result path 16. The selector 17 selects the one of the scalar registers SR0 to SR31 designated by the operand and writes the output to it, which completes the instruction.
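The two phases described above, the interleaved partial sums in the four arithmetic sections followed by the final summation in the post-processing arithmetic unit, can be sketched as follows. This is a minimal arithmetic model under the assumption that each section simply accumulates every fourth element; the function and constant names are hypothetical.

```python
# Sketch of the inner-product macro operation S = S + sum(A(i) * B(i)), i = 0..255.
# Names are hypothetical; the interleaving follows arithmetic sections 6a to 6d.

NUM_SECTIONS = 4  # arithmetic sections 6a to 6d


def intermediate_results(a, b):
    """Section k accumulates A(4j + k) * B(4j + k) over all valid j.

    Returns the four values that would sit in the scalar data buffers 5.
    """
    return [
        sum(x * y for x, y in zip(a[k::NUM_SECTIONS], b[k::NUM_SECTIONS]))
        for k in range(NUM_SECTIONS)
    ]


def post_process(s, buffers):
    """Post-processing unit 13: add every intermediate result to scalar S."""
    return s + sum(buffers)


def inner_product_macro(s, a, b):
    """Run the vector phase, then the post-processing phase."""
    return post_process(s, intermediate_results(a, b))
```

For 256-element inputs each section handles 64 products, and the value returned by `inner_product_macro` equals S plus the ordinary dot product of the two vectors.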

[0015] In the prior art described above, a single post-processing arithmetic unit serves an arithmetic resource made up of a set of vector arithmetic units assigned to one item of vector data, so macro instructions and successive vector instructions can be processed at high speed.

[0016] Next, another prior art technique capable of faster vector processing is described with reference to FIGS. 5 and 6.

[0017] The prior art shown in FIG. 5 is provided with a plurality of sets of arithmetic resources. It adds to the prior art of FIG. 4 one more arithmetic resource 7, made up of arithmetic sections 7a to 7d, each consisting of a vector arithmetic unit 4 and a scalar data buffer 5.

[0018] That is, the vector processing device shown in FIG. 5 comprises: 32 vector registers VR0 to VR31, built from high-speed RAM, each independently readable and writable and each capable of holding 256 elements of vector data; 32 scalar registers SR0 to SR31 that store scalar data; a selector 3, built from switch matrix logic, that selects the output data of the vector registers VR0 to VR31 and the scalar registers SR0 to SR31 and outputs it to the arithmetic units as directed by an instruction; two sets of arithmetic resources 6 and 7 made up of a plurality of arithmetic sections 6a to 6b and 7a to 7b, each consisting of a vector arithmetic unit 4 and a scalar data buffer 5 that stores its result; operand paths 8a to 8h and 9a to 9h that feed the operand data supplied through the selector 3 into the arithmetic resources 6 and 7; an operand path 10 that feeds the data of the scalar register designated by an operand into the post-processing arithmetic unit; a feedback path 20 for the post-processing result; a selector 12, built from switch matrix logic, that selects the input data of the post-processing arithmetic unit from among the data in the scalar data buffer 5 of each arithmetic section in the arithmetic resources 6 and 7, the data of the scalar registers SR0 to SR31, and the fed-back post-processing result; a post-processing arithmetic unit 13 that operates on the data of the scalar data buffers 5, the data of the scalar registers SR0 to SR31, and the fed-back post-processing result; result paths 14 and 15 that output the results of the vector arithmetic units 4; a result path 16 that outputs the result of the post-processing arithmetic unit 13; and a selector 17, built from switch matrix logic, that selects whether each result arriving on a result path is written to the vector registers VR0 to VR31 or to the scalar registers SR0 to SR31.

[0019] The vector arithmetic unit 4 is an adder, a multiplier, or a multiplier/adder. As an adder, it has a function that adds all the input data, {Σ[A(i) + B(i)]}, and comparison functions {Max A(i), Min A(i)}. As a multiplier, it has a function that multiplies the input data and adds up all the products, {Σ A(i) × B(i)}. As a multiplier/adder, it has a function in which the result of one arithmetic unit becomes the input data of the other.

[0020] Although not shown, the illustrated vector processing device also has a load pipeline that supplies operand data from main memory to the vector registers and a store pipeline that stores data from the vector registers into main memory.

[0021] When, for example, three or more macro operation instructions are given in succession, the vector processing device configured as described above issues the first two to the two arithmetic resources 6 and 7, one each, but makes the remaining macro operation instructions wait until the final result of one of the first two has been obtained.

[0022] That is, the data in a scalar data buffer 5 must remain in that buffer until the post-processing arithmetic unit 13 has operated on it and the processing has finished. During that time the arithmetic resource containing the scalar data buffer 5 cannot perform the operation of the next macro operation instruction, so the next macro operation instruction cannot be issued to that resource.

[0023] The processing pitch when this vector processing device processes macro operation instructions in succession is described with reference to FIG. 6.

[0024] Suppose that macro operation instructions are issued in succession to the vector processing device shown in FIG. 5. In the example of FIG. 6, the arithmetic resource 6 first executes processing 72 of a macro operation instruction and the arithmetic resource 7 executes processing 73 of a macro operation instruction. When processing 72 finishes in the arithmetic resource 6, the post-processing arithmetic unit 13 starts post-processing operation 69 for processing 72, but the next macro operation instruction cannot be issued to the arithmetic resource 6 until post-processing operation 69 has finished.

[0025] This macro operation instruction is issued to the arithmetic resource 6 only after post-processing operation 69 has finished, and its processing 67 is then executed. As a result, wasted time 68 arises in the arithmetic resource 6. The same holds for the arithmetic resource 7: no macro operation instruction can be issued to it until post-processing operation 70 for processing 73 has finished, so processing 71 is delayed and wasted time 74 arises.

[0026] In FIG. 6 the operation time of the post-processing arithmetic unit is drawn as about 1/2 of the operation time of the vector arithmetic units in an arithmetic resource. In general, the operation time of the post-processing arithmetic unit is shorter than that of the vector arithmetic units, and the wasted time described above depends on these operation times, so it may differ from the illustrated example.

[0027] As described above, although the vector processing device shown in FIG. 5 has its arithmetic resources increased to two, it cannot improve the processing pitch when macro operation instructions occur in succession.

[0028] The prior art shown in FIG. 4 is likewise unable to improve the processing pitch when macro operation instructions occur in succession.
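The stall described for FIG. 6 can be reproduced with a small timing model. The sketch below is not taken from the patent: it assumes that every macro operation instruction has the same vector time and post-processing time, that each instruction goes to whichever arithmetic resource becomes free first, and that the single post-processing arithmetic unit handles one instruction at a time. The function name and time units are hypothetical.

```python
# Timing sketch of the prior art of FIG. 5 (hypothetical model, arbitrary time units).
# Assumptions: every macro operation takes the same vector_time and post_time,
# instructions go to whichever resource frees up first, and the single
# post-processing unit handles one instruction at a time.

def conventional_schedule(num_instructions, vector_time, post_time, num_resources=2):
    """Return (resource, vector_start, vector_end, post_end) for each instruction."""
    resource_free = [0] * num_resources  # time at which each resource can accept work
    post_free = 0                        # time at which the post-processing unit is idle
    schedule = []
    for _ in range(num_instructions):
        r = resource_free.index(min(resource_free))
        vector_start = resource_free[r]
        vector_end = vector_start + vector_time
        post_start = max(vector_end, post_free)
        post_end = post_start + post_time
        post_free = post_end
        # The scalar data buffer stays occupied until post-processing ends,
        # so the resource sits idle from vector_end to post_end.
        resource_free[r] = post_end
        schedule.append((r, vector_start, vector_end, post_end))
    return schedule
```

In this model, with `vector_time=4` and `post_time=2`, the first resource finishes its vector phase at time 4 but cannot start its next instruction until time 6. That gap plays the role of the wasted time 68 in FIG. 6.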

[0029]

[Problems to Be Solved by the Invention] As described above, in the prior art shown in FIGS. 4 and 5, the next macro operation instruction cannot be issued to an arithmetic resource that has handed its vector operation results to the post-processing arithmetic unit until that unit has finished its processing. The prior art therefore has the problem that the processing pitch drops, particularly when macro operation instructions occur in succession.

[0030] An object of the present invention is to solve the above problems of the prior art and to provide a vector processing device that can be built with a small amount of hardware and can process successive macro operation instructions efficiently and at high speed.

[0031]

[Means for Solving the Problems] According to the present invention, the above object is achieved, in a vector processing device having arithmetic resources and a post-processing arithmetic unit, by providing second scalar data buffers between the arithmetic resources and the post-processing arithmetic unit. These second scalar data buffers are equal in number to the post-processing arithmetic units and hold the data, from the scalar data buffers of the arithmetic resources, that is to become the input data of the post-processing arithmetic unit.

[0032]

[Operation] When an arithmetic resource finishes its operation, it immediately transfers the data in its own scalar data buffers to the second buffer. Having transferred that data, the arithmetic resource is able to execute the next macro operation instruction.

[0033] The post-processing arithmetic unit processes the data transferred to the second buffer. When that processing finishes, the data in the scalar data buffers of the next arithmetic resource that requires post-processing is taken into the second buffer, and the operation is repeated.

[0034] In this way, the present invention allows vector instructions, including macro operation instructions, to be issued early to an arithmetic resource whose data has been taken into the second buffer, so successive macro operation instructions can be processed at high speed.

[0035]

[Embodiment] An embodiment of the vector processing device according to the present invention is described in detail below with reference to the drawings.

[0036] FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention; FIG. 2 is a block diagram showing the schematic configuration of the control system; and FIG. 3 is a diagram explaining the processing pitch of the arithmetic resources and the post-processing arithmetic unit of FIG. 1 when macro operation instructions are processed. In FIGS. 1 and 2, 11 is a second scalar data buffer, 18 is a scalar data buffer, 19 and 29 are selectors, 21 is an instruction buffer, 23, 25, 27, and 28 are operation instruction buffers, and 30 is a vector control unit. The other reference numerals are the same as in FIGS. 4 and 5.

[0037] The embodiment shown in FIG. 1 is configured by placing a selector 19 and a second scalar data buffer 11 between the arithmetic resources and the post-processing arithmetic unit of the prior art described with reference to FIG. 5.

[0038] That is, the vector processing device according to the embodiment shown in FIG. 1 comprises: 32 vector registers VR0 to VR31, built from high-speed RAM, each independently readable and writable and each capable of holding 256 elements of vector data; 32 scalar registers SR0 to SR31 that store scalar data; a selector 3, built from switch matrix logic, that selects the output data of the vector registers VR0 to VR31 and the scalar registers SR0 to SR31 and outputs it to the arithmetic units as directed by an instruction; two sets of arithmetic resources 6 and 7 made up of a plurality of arithmetic sections 6a to 6b and 7a to 7b, each consisting of a vector arithmetic unit 4 and a scalar data buffer 5 that stores its result; operand paths 8a to 8h and 9a to 9h that feed the operand data supplied through the selector 3 into the arithmetic resources 6 and 7; an operand path 10 that feeds the data of the scalar register designated by an operand into the post-processing arithmetic unit; a second scalar data buffer 11, made up of buffers 11a to 11d, that stores the input data of the post-processing arithmetic unit taken from the data of the scalar data buffers 5 and 18 in the arithmetic resources 6 and 7; a selector 19 that selects the data from one of the arithmetic resources 6 and 7 and supplies it to the second scalar data buffer 11; a feedback path 20 for the post-processing result; a selector 12, built from switch matrix logic, that selects the input data of the post-processing arithmetic unit from among the data in the second scalar data buffer 11, the data of the scalar registers SR0 to SR31, and the fed-back post-processing result; a post-processing arithmetic unit 13 that operates on the data of the second scalar data buffer 11, the data of the scalar registers SR0 to SR31, and the fed-back post-processing result; result paths 14 and 15 that output the results of the vector arithmetic units 4; a result path 16 that outputs the result of the post-processing arithmetic unit 13; and a selector 17, built from switch matrix logic, that selects whether each result arriving on a result path is written to the vector registers VR0 to VR31 or to the scalar registers SR0 to SR31.

[0039] The vector arithmetic unit 4 mentioned above is an adder, a multiplier, or a multiplier/adder. As an adder, it has a function that adds all the input data, {Σ[A(i) + B(i)]}, and comparison functions {Max A(i), Min A(i)}. As a multiplier, it has a function that multiplies the input data and adds up all the products, {Σ A(i) × B(i)}. As a multiplier/adder, it has a function in which the result of one arithmetic unit becomes the input data of the other.

[0040] Although not shown, the vector processing device according to this embodiment also has a load pipeline that supplies operand data from main memory to the vector registers and a store pipeline that stores data from the vector registers into main memory.

[0041] In the vector processing device according to the embodiment configured as described above, when, for example, three or more macro operation instructions are given in succession, the first two are issued to the two arithmetic resources 6 and 7, one each. The intermediate results of the operations by the arithmetic resources 6 and 7 are stored in the scalar data buffers 5 and 18.

[0042] The selector 19 selects one of the two arithmetic resources 6 and 7, takes the data of the corresponding scalar data buffers 5 or 18 as the input data for the post-processing arithmetic unit 13, and stores it in the second scalar data buffer 11.

[0043] The post-processing arithmetic unit 13 performs its operation with the data of the second scalar data buffer 11 and the data of the scalar registers SR0 to SR31 as input, and writes the final result to the register designated by the operand via the result path 16 and the selector 17.

[0044] Once the above operation has stored the data of the scalar data buffers 5 or 18 in the second scalar data buffer 11, the next macro operation instruction is issued to the corresponding arithmetic resource. An arithmetic resource whose internal scalar data buffers have been freed can therefore start executing the next macro operation instruction immediately, without waiting for the post-processing operation to finish, which improves the processing efficiency of the device as a whole.
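The hand-off from an arithmetic resource to the second scalar data buffer 11 can be sketched as a small data-flow model. The class and method names below are hypothetical, the vector phase is limited to an inner product, and the second buffer is assumed to hold the results of one macro operation instruction at a time.

```python
# Data-flow sketch of the second scalar data buffer 11 (hypothetical class names).

class ArithmeticResource:
    """Models arithmetic resource 6 or 7 with its internal scalar data buffers."""

    def __init__(self, name):
        self.name = name
        self.scalar_buffers = None  # None means the buffers are free

    def run_macro(self, a, b):
        """Vector phase of an inner product; refuses to start while buffers are held."""
        if self.scalar_buffers is not None:
            raise RuntimeError(f"{self.name}: scalar data buffers still in use")
        self.scalar_buffers = [
            sum(x * y for x, y in zip(a[k::4], b[k::4])) for k in range(4)
        ]


class PostProcessingStage:
    """Models selector 19, second scalar data buffer 11 and post-processing unit 13."""

    def __init__(self):
        self.second_buffer = None

    def transfer_from(self, resource):
        """Selector 19: move one resource's intermediate results into buffer 11."""
        if self.second_buffer is not None:
            raise RuntimeError("second scalar data buffer still in use")
        self.second_buffer = resource.scalar_buffers
        resource.scalar_buffers = None  # the resource can accept the next instruction

    def post_process(self, s):
        """Post-processing unit 13: combine buffer 11 with scalar register value s."""
        result = s + sum(self.second_buffer)
        self.second_buffer = None
        return result
```

After `transfer_from` returns, `run_macro` can be called again on the same resource before `post_process` has run, which is the early issue this paragraph describes.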

[0045] FIG. 2 shows the circuit configuration of an example of the control logic that controls the operations described above.

[0046] As shown in FIG. 2, the vector control unit 30 comprises a vector instruction buffer 21; instruction buffers 23 and 25 for the arithmetic resources 6 and 7; instruction buffers 27 and 28 for the post-processing arithmetic unit 13, which operates on the intermediate results of the arithmetic resources 6 and 7; and a selector 29, built from switch matrix logic, that selects an instruction from the instruction buffers 27 and 28 and outputs it to the post-processing arithmetic unit 13.

[0047] That is, in this embodiment of the present invention, the instruction buffers for the arithmetic resources and the instruction buffers for the post-processing arithmetic unit are each provided in the same number as the arithmetic resources served by the post-processing arithmetic unit, which allows the arithmetic resources and the post-processing arithmetic unit to operate in parallel.

[0048] The processing pitch when the vector processing device described above processes macro operation instructions consecutively is described with reference to FIG. 3.

[0049] Suppose that macro operation instructions are issued consecutively to the vector processing device shown in FIG. 1. In the example shown in FIG. 3, the arithmetic resource 6 first executes macro operation instruction processing 34 while the arithmetic resource 7 executes macro operation instruction processing 31. When processing 34 completes in the arithmetic resource 6, the post-processing arithmetic unit 13 starts post-processing operation 32 for processing 34.

[0050] At this point, the intermediate result produced by processing 34 in the arithmetic resource 6 is stored in the second scalar data buffer 11 shown in FIG. 1, so the arithmetic resource 6 can execute the next macro operation instruction processing 33 immediately after processing 34 completes.

[0051] As described above, this embodiment of the present invention eliminates wasted time and prevents the processing pitch from degrading.
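
The effect on processing pitch can be estimated with a small timing model. The sketch below is an assumption-laden illustration rather than a reproduction of FIG. 3: the function name, the policy of issuing each instruction to the earliest-free resource, the time units, and the baseline in which a resource keeps its buffer until post-processing completes are all hypothetical. It also assumes the second scalar data buffer stays occupied until the post-processing operation finishes.

```python
from typing import List


def finish_times(n_instr: int, n_resources: int, t_vec: int, t_post: int,
                 second_buffer: bool) -> List[int]:
    """Return the completion time of each macro operation instruction.

    One post-processing unit is shared by all arithmetic resources.
    With second_buffer=True a resource is released when its intermediate
    result is handed off; otherwise it is held until post-processing ends.
    """
    res_free = [0] * n_resources  # time at which each resource can start again
    post_free = 0                 # time at which the post-processing unit is idle
    done = []
    for _ in range(n_instr):
        # Issue to the resource that becomes free first (assumed policy).
        r = min(range(n_resources), key=lambda i: res_free[i])
        vec_done = res_free[r] + t_vec
        handoff = max(vec_done, post_free)  # wait for the post-processing unit
        post_done = handoff + t_post
        post_free = post_done
        res_free[r] = handoff if second_buffer else post_done
        done.append(post_done)
    return done


# Two resources, vector time 4, post-processing time 2 (about 1/2, as in FIG. 3).
with_buffer = finish_times(4, 2, 4, 2, second_buffer=True)
without_buffer = finish_times(4, 2, 4, 2, second_buffer=False)
```

Under these assumed numbers, four instructions complete at times 6, 8, 10, 12 with the second buffer and at 6, 8, 12, 14 without it; the gap comes entirely from resources idling while post-processing runs.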

[0052] Note that although FIG. 3 shows the operation time of the post-processing arithmetic unit as about 1/2 of the operation time of the vector arithmetic units of an arithmetic resource, in general the operation time of the post-processing arithmetic unit is shorter than that of the vector arithmetic units, and depending on these operation times, a small amount of idle time may arise in either the arithmetic resources or the post-processing arithmetic unit.

[0053] Although the embodiment described above has two arithmetic resources and one post-processing arithmetic unit, the present invention can also be applied to a vector processing device having a larger number of arithmetic resources and a smaller number (two or more) of post-processing arithmetic units. In that case, second scalar data buffers need only be provided in the same number as the post-processing arithmetic units, one for each.
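
To make the generalization concrete, the following sketch models several arithmetic resources sharing a smaller number of post-processing arithmetic units, each with its own second scalar data buffer. The function name, the earliest-free scheduling policy, and the timing values are hypothetical and chosen only to illustrate the idea; the source does not specify a scheduling policy.

```python
def makespan(n_instr: int, n_resources: int, n_post: int,
             t_vec: int, t_post: int) -> int:
    """Time at which the last of n_instr macro operations completes.

    Each post-processing unit owns one second scalar data buffer, assumed
    to stay occupied until that unit's post-processing operation finishes.
    """
    res_free = [0] * n_resources  # when each arithmetic resource is free
    post_free = [0] * n_post      # when each post-processing unit is free
    last = 0
    for _ in range(n_instr):
        # Assumed policy: earliest-free resource and post-processing unit.
        r = min(range(n_resources), key=lambda i: res_free[i])
        p = min(range(n_post), key=lambda i: post_free[i])
        vec_done = res_free[r] + t_vec
        handoff = max(vec_done, post_free[p])  # wait for a free second buffer
        post_free[p] = handoff + t_post
        res_free[r] = handoff                  # resource released at hand-off
        last = max(last, post_free[p])
    return last


# Four resources, vector time 4, post-processing time 2, eight instructions.
one_unit = makespan(8, 4, 1, 4, 2)
two_units = makespan(8, 4, 2, 4, 2)
```

With these assumed values, a single post-processing unit becomes the bottleneck (completion at time 20), while two units, still fewer than the four resources, bring completion down to time 12.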

[0054] Furthermore, the present invention can also be applied to the prior art described with reference to FIG. 4, that is, to a vector processing device in which one arithmetic resource and one post-processing arithmetic unit, or equal numbers of each, are provided. In this case too, the arithmetic resource can start processing the next macro operation instruction without waiting for the post-processing operation to finish.

[0055] According to the embodiment described above, the processing in which the vector arithmetic units for the individual elements obtain the intermediate result of a macro operation instruction is separated from the processing in which the post-processing arithmetic unit obtains the final result using that intermediate result as input data. Consequently, even when the number of arithmetic resources is increased, the hardware amount of the post-processing arithmetic units can be kept to a minimum, and the arithmetic resources can be used effectively even when macro operation instructions are issued consecutively, which increases the processing speed.

[0056]

[Effects of the Invention] As described above, according to the present invention, in a vector processing device having an arithmetic resource and a post-processing arithmetic unit, a second scalar data buffer is provided between the arithmetic resource and the post-processing arithmetic unit. When an intermediate result of the arithmetic resource is to be processed by the post-processing arithmetic unit, as with a macro operation instruction, the data from the arithmetic resource is fetched into the second scalar data buffer, so that the arithmetic resource whose data has been fetched can start processing the next vector instruction, including a next macro operation instruction, at an early stage. This speeds up each individual macro operation instruction with a small amount of hardware and shortens the instruction processing pitch when macro operation instructions are issued consecutively.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a block diagram showing a schematic configuration of the control system.

FIG. 3 is a diagram illustrating the processing pitch of the arithmetic resources and the post-processing arithmetic unit during macro operation instruction processing in FIG. 1.

FIG. 4 is a block diagram showing the configuration of a prior-art device having an arithmetic resource and a post-processing arithmetic unit.

FIG. 5 is a block diagram showing the configuration of a prior-art device having a plurality of sets of arithmetic resources and post-processing arithmetic units.

FIG. 6 is a diagram illustrating the processing pitch of the arithmetic resources and the post-processing arithmetic units during macro operation instruction processing in FIG. 5.

[Explanation of symbols]

3, 12, 17, 19, 29: selectors
4: vector arithmetic unit
5, 18: scalar data buffers
6, 7: arithmetic resources
11: second scalar data buffer
13: post-processing arithmetic unit
21: instruction buffer
23, 25, 27, 28: operation instruction buffers
30: vector control unit
VR0 to VR31: vector registers
SR0 to SR31: scalar registers

Continuation of the front page
(72) Inventor: Masamori Kashiyama, General-Purpose Computer Division, Hitachi, Ltd., 1 Horiyamashita, Hadano-shi, Kanagawa
(72) Inventor: Yasuhiro Maeda, Hitachi Computer Engineering Co., Ltd., 1 Horiyamashita, Hadano-shi, Kanagawa

Claims

[Claims]

1. A vector processing device having a plurality of vector registers for holding vector data composed of a plurality of vector elements; an arithmetic resource comprising vector arithmetic units, which process a plurality of vector elements of the vector data held in the vector registers in parallel, and a scalar data buffer; and a post-processing arithmetic unit that uses the data of the scalar data buffer as input data, the vector processing device being characterized by comprising a second scalar data buffer that holds the data of the scalar data buffer as input data of the post-processing arithmetic unit.

2. A vector processing device having a plurality of vector registers for holding vector data composed of a plurality of vector elements; a plurality of arithmetic resources each comprising vector arithmetic units, which process a plurality of vector elements of the vector data held in the vector registers in parallel, and a scalar data buffer; and post-processing arithmetic units that use the data of the scalar data buffers as input data and are fewer in number than the arithmetic resources, the vector processing device being characterized in that second scalar data buffers, which hold the data of the scalar data buffers as input data of the post-processing arithmetic units, are provided in the same number as the post-processing arithmetic units.

3. The vector processing device according to claim 1, wherein the arithmetic resource can start arithmetic processing for the next vector element after passing data to the second scalar data buffer.