JPS63118875A

JPS63118875A - Array processor

Info

Publication number: JPS63118875A
Application number: JP26637487A
Authority: JP
Inventors: Giichi Tanaka; 義一田中; Shunichi Torii; 鳥井　俊一
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1987-10-23
Filing date: 1987-10-23
Publication date: 1988-05-23

Abstract

PURPOSE: To obtain an array processor of simple constitution that can carry out double DO loop processing at high speed, by executing each inner loop of the double DO loop under pipeline control while executing these inner loops in parallel with one another. CONSTITUTION: A central vector processing unit VPC fetches and decodes a vector instruction or an array instruction. When the decoded instruction is a vector instruction, the VPC executes it; when it is an array instruction, the VPC starts the vector processing units PE1-PEM. Each of the units PE1-PEM, when started by the VPC (that is, for an array instruction), decodes the instruction fetched by the VPC and carries out the operation designated by that instruction on one vector of the array data that the instruction designates as its operand.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of application of the invention]

The present invention relates to a digital electronic computer suited to array processing (hereinafter called an array processor) that can execute array operations at high speed.

[Background technology]

In scientific and engineering computation, there is a demand for faster two-dimensional array calculations such as the one expressed by the Fortran program of FIG. 1. Hereinafter, for simplicity, one-dimensional and two-dimensional arrays are called vectors and arrays, respectively. Two approaches have been taken so far to speed up array calculations. The first is the array processor approach, in which many processors are arranged in parallel in two dimensions and operate simultaneously on many array elements. In principle, this approach can be expected to improve performance in proportion to the number of processors. However, processor utilization is low, for example when the number of array elements to be processed is smaller than the number of installed processors, so the approach is not economical. The second is the vector processor approach, which speeds up vector processing by using vector registers and pipeline-controlled arithmetic units.

A representative example is described in U.S. Patent No. 4,128,880. A vector processor can speed up the processing of a single DO loop, but the speedup it can achieve for double loop processing is limited.

This is because the program of FIG. 1 is normally divided into single-loop processes A1 to AM as shown in FIG. 2, and these loop processes A1 to AM can only be executed one after another in time.

[Purpose of the invention]

An object of the present invention is to provide an array processor of simple configuration that can execute double DO loop processing at high speed by executing each of the inner loops of the double DO loop under pipeline control and executing these inner loops in parallel with one another.

[Summary of the invention]

To this end, the array processor of the present invention comprises a central vector processing unit, which has a plurality of vector registers and a pipeline-controlled arithmetic unit and executes instructions that request vector processing (vector instructions), and a plurality of vector processing units, each of which has a plurality of vector registers and a pipeline-controlled arithmetic unit, for executing instructions that request array processing (array instructions). The central vector processing unit fetches and decodes a vector instruction or an array instruction; it executes the decoded instruction when it is a vector instruction, and starts the vector processing units when it is an array instruction. Each vector processing unit, when started by the central vector processing unit (that is, for an array instruction), decodes the instruction fetched by the central vector processing unit and performs the operation specified by the instruction on one vector of the array data that is the operand of that operation. When the result of the operation specified by the array instruction is vector data, each vector processing unit computes one element of the result vector, stores it in an internal scalar register, and also sends it to the central vector processing unit for storage in a vector register there. When the result of the operation specified by the array instruction is array data, each vector processing unit computes one vector of the result array and stores it in an internal vector register.

[Embodiments of the invention]

As shown in FIG. 3, the array processor according to the present invention has a scalar processing unit SP, a central vector processing unit VPC, M vector processing units PE1 to PEM, and a main memory (MS) C51 and a main memory control unit (SCU) C52 provided in common to them. The main memory C51 stores, in separate areas, a scalar instruction sequence and a vector/array instruction sequence. The latter consists of vector instructions, that is, instructions that request vector processing but not array processing, and array instructions, that is, instructions that request array processing. The main memory C51 also stores scalar, vector, and array data, all of which are accessed through the main memory control unit C52. Hereinafter, vector instructions and array instructions may both be referred to simply as vector/array instructions.

The scalar processing unit SP has a scalar instruction register (SIR) M51, a general-purpose register (GR) M52, a scalar arithmetic unit C53, and a control unit SCO that controls them, and uses them to execute the scalar instruction sequence in the main memory C51. This sequence includes general scalar instructions, for example those described in the IBM manual "System/370 Principles of Operation" (GC-22-7000). With these instructions, the scalar processing unit SP performs scalar operations on scalar data in the general-purpose register M52 or the main memory C51 and stores the results in the general-purpose register M52 or the main memory C51. In addition to these scalar instructions, the scalar processing unit SP executes scalar instructions that read from the general-purpose register M52 the data needed for the central vector processing unit VPC and the vector processing units PE1 to PEM to operate and supply it to the central vector processing unit VPC over line 37, as well as a scalar instruction that starts the central vector processing unit VPC (described in detail later).

The central vector processing unit VPC is started by the scalar processing unit SP and fetches the vector/array instruction sequence from the main memory C51. It executes a fetched instruction itself when the instruction is a vector instruction, and starts the vector processing units PE1 to PEM when the instruction is an array instruction. The central vector processing unit VPC also obtains, by vector processing, the vector data needed for array processing in the vector processing units PE1 to PEM and sets it up in those units over line 44.

The vector processing units PE1 to PEM work together to execute one array instruction. Each vector processing unit PEi performs the operation specified by an array instruction on one column vector or row vector of the array data specified by that instruction, in parallel with the other vector processing units PEj.

In this embodiment, the vector processing units PE1 to PEM execute only array instructions whose result is array data or vector data; they do not execute array instructions whose result is scalar data. Consequently, array processing whose result is scalar data is carried out by an array instruction whose result is vector data, followed by a vector instruction that derives the scalar data from that result vector. The array instruction is executed by the vector processing units PE1 to PEM and the vector instruction by the central vector processing unit VPC. So that the central vector processing unit VPC can use the result vector of array processing in the vector processing units PE1 to PEM, it is configured to receive the result vector from those units over lines 47_1 to 47_M.
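The two-step scheme for scalar results can be sketched as follows. This is a minimal illustration, not the patented hardware: the choice of summation as the reduction, and the function names `array_step_column_sums` and `vector_step_total`, are assumptions introduced here.

```python
# Hypothetical sketch: an array operation with a scalar result, split into
# an array step (run on the PEs) and a vector step (run on the VPC).
# Summation is an assumed example operation; the source does not name one.

def array_step_column_sums(array_columns):
    """Stand-in for the array instruction: PE j reduces column j to one element."""
    return [sum(column) for column in array_columns]


def vector_step_total(result_vector):
    """Stand-in for the vector instruction: the VPC reduces the vector to a scalar."""
    return sum(result_vector)


columns = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]    # M = 3 columns, one per PE
result_vector = array_step_column_sums(columns)    # would travel to the VPC
total = vector_step_total(result_vector)           # scalar result
```

In the sketch the list comprehension visits the columns one after another; in the machine described above the M column reductions would run concurrently.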

Furthermore, array processing that combines array data with scalar data is not executed directly by the vector processing units PE1 to PEM. To perform such processing, the central vector processing unit VPC first executes a vector instruction that builds, from the scalar data, vector data whose elements all equal that scalar. This vector data is transferred to the vector processing units PE1 to PEM, which then execute an array instruction that specifies an operation between the array data and the vector data. For this purpose, the central vector processing unit VPC is configured to transfer vector data obtained by vector operations to the vector processing units PE1 to PEM over line 44.
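A minimal sketch of this scalar-to-vector expansion follows. The function names and the use of multiplication as the array operation are assumptions made for illustration; only the two-step structure comes from the description.

```python
# Hypothetical sketch: combining array data with a scalar by first
# expanding the scalar into a vector on the VPC.

def broadcast_scalar(scalar, vector_length):
    """Stand-in for the VPC vector instruction: every element equals the scalar."""
    return [scalar] * vector_length


def array_op_vector(array_columns, vector):
    """Stand-in for the array instruction: PE j combines its column with element j.

    Multiplication is an assumed example operation.
    """
    return [[x * vector[j] for x in column]
            for j, column in enumerate(array_columns)]


columns = [[1.0, 2.0], [3.0, 4.0]]                   # M = 2 columns
expanded = broadcast_scalar(10.0, len(columns))       # step 1, on the VPC
scaled = array_op_vector(columns, expanded)           # step 2, on the PEs
```

Because each PE only ever needs its own element of the expanded vector, the PEs never have to read a scalar register of the VPC.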

In this embodiment, the scalar processing unit SP issues access requests to the main memory control unit C52 in order to fetch scalar instructions and scalar data or to store scalar data in the main memory C51.

Likewise, the central vector processing unit VPC issues access requests to the main memory control unit C52 in order to fetch vector/array instructions and vector data or to store vector data in the main memory C51. Each of the vector processing units PE1 to PEM, for its part, issues access requests to the main memory control unit C52 to fetch or store one row or column vector of the array data.

A vector/array instruction fetched in response to an instruction fetch request from the central vector processing unit VPC is supplied over line 63 not only to the central vector processing unit VPC but also to the vector processing units PE1 to PEM. The vector processing units PE1 to PEM take in the fetched instruction only when it is an array instruction (described in detail later).

In response to a read request for scalar data or a scalar instruction from the control unit SCO of the scalar processing unit SP, the main memory control unit C52 reads the scalar data or scalar instruction at the address supplied by the control unit SCO from the main memory C51. It sends scalar data to the general-purpose register M52 or the scalar arithmetic unit C53 of the scalar processing unit SP, and sends a scalar instruction to the scalar instruction register M51. In response to a data write request from the control unit SCO, the main memory control unit C52 stores the scalar data supplied by the general-purpose register M52 or the scalar arithmetic unit C53 at the location in the main memory C51 specified by the address supplied by the control unit SCO.

Similarly, in response to a read request for data or a vector/array instruction from the central vector processing unit VPC, the main memory control unit C52 reads from the main memory C51 the data or the vector/array instruction specified by the data address or instruction address that the VPC supplies on line 64 or line 71, respectively, and sends it to the central vector processing unit VPC over line 43 or line 63, respectively. The vector/array instruction on line 63 is also supplied to the vector processing units PE1 to PEM. In response to a data write request from the central vector processing unit VPC, the main memory control unit C52 stores the data on line 40 at the location in the main memory C51 specified by the address the VPC supplies on line 64.

Similarly, in response to a data read request from a vector processing unit PEi (i = 1 to M), the main memory control unit C52 reads the data specified on address line 130_i from the main memory C51 and sends it to the vector processing unit PEi over line 119_i. In response to a data write request from the vector processing unit PEi, the main memory control unit C52 stores the data on line 118_i at the location in the main memory C51 specified on address line 130_i.

In this way, the main memory control unit C52 according to the present invention is configured to respond separately to access requests from the scalar processing unit SP, the central vector processing unit VPC, and the M vector processing units PE1 to PEM.

As shown in FIG. 4(a), the central vector processing unit VPC has L vector registers VC1 to VCL of vector length M and Z scalar registers SC1 to SCZ. Each vector processing unit PEi has K vector registers Vi1 to ViK of vector length N and L scalar registers Si1 to SiL, the same number as the vector registers of the central vector processing unit VPC. By regrouping these registers as shown in FIGS. 4(b) to 4(d), conceptual scalar registers SR for scalar data, conceptual vector registers VR for vector data, and conceptual array registers AR for array data are defined.

Specifically, the scalar registers SC1 to SCZ and the vector registers VC1 to VCL of the central vector processing unit VPC are taken as they are to be the conceptual scalar registers SR1 to SRZ and the conceptual vector registers VR1 to VRL, respectively. In addition, as shown in FIG. 4(c), the same-numbered scalar registers of the vector processing units PE1 to PEM arranged in order, for example S11 to SM1, are also defined as conceptual vector register VR1. Conceptual vector registers VR2 to VRL are defined in the same way. Because the same conceptual vector register VRj is thus defined both by the vector register VCj of the central vector processing unit VPC and by the scalar registers S1j to SMj of the vector processing units PE1 to PEM, the contents of the vector register VCj and the scalar registers S1j to SMj are controlled so that they are always identical, as detailed later. Finally, the same-numbered vector registers of the vector processing units PE1 to PEM arranged in order as shown in FIG. 4(d), for example V11 to VM1, are defined as conceptual array register AR1. Conceptual array registers AR2 to ARK are defined in the same way.
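The register regrouping of FIG. 4 can be modelled in a few lines. This is a sketch under stated assumptions: the class name `ConceptualRegisters`, its method names, and the use of zero-based indices are all introduced here for illustration and do not appear in the patent.

```python
# Hypothetical model of the conceptual registers. Indices are zero-based
# here, whereas the description numbers registers from 1.

class ConceptualRegisters:
    def __init__(self, num_pes, column_len, num_vr, num_ar, num_sr):
        self.num_pes = num_pes                                    # M
        self.sc = [0.0] * num_sr                                  # VPC: SC1..SCZ
        self.vc = [[0.0] * num_pes for _ in range(num_vr)]        # VPC: VC1..VCL
        self.s = [[0.0] * num_vr for _ in range(num_pes)]         # PE i: Si1..SiL
        self.v = [[[0.0] * column_len for _ in range(num_ar)]     # PE i: Vi1..ViK
                  for _ in range(num_pes)]

    def write_vr(self, j, data):
        """Write conceptual VRj: keep VCj and S1j..SMj identical."""
        if len(data) != self.num_pes:
            raise ValueError("a conceptual vector register holds M elements")
        self.vc[j] = list(data)
        for i in range(self.num_pes):
            self.s[i][j] = data[i]

    def read_vr_from_pes(self, j):
        """View of VRj assembled from the PEs' scalar registers S1j..SMj."""
        return [self.s[i][j] for i in range(self.num_pes)]

    def read_ar(self, k):
        """Conceptual ARk: the k-th vector register of every PE, in PE order."""
        return [self.v[i][k] for i in range(self.num_pes)]


regs = ConceptualRegisters(num_pes=3, column_len=2, num_vr=2, num_ar=1, num_sr=1)
regs.write_vr(1, [1.0, 2.0, 3.0])
```

Routing every write through `write_vr` is one simple way to express the requirement that both physical copies of a conceptual vector register always hold the same contents.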

As a result, this embodiment has, as conceptual registers, K array registers AR of N x M organization, L vector registers VR of vector length M, and Z scalar registers.

In this embodiment, the central vector processing unit VPC and the vector processing units PE1 to PEM can access different conceptual registers. The central vector processing unit VPC can handle the L conceptual vector registers VRj (j = 1 to L) and the Z conceptual scalar registers SRi (i = 1 to Z), but not the K conceptual array registers ARk (k = 1 to K). The vector processing units PE1 to PEM can handle the conceptual array registers ARk (k = 1 to K) and the conceptual vector registers VRj (j = 1 to L), but not the scalar registers SRi (i = 1 to Z). Consequently, in this system an operation between array data and scalar data cannot be executed directly by the vector processing units PE1 to PEM.

Such an operation is instead executed by the vector processing units PE1 to PEM after the contents of the conceptual scalar register SRi have been transferred to a conceptual vector register VRj, using that conceptual vector register VRj and the conceptual array register ARk. In other words, the conceptual vector registers VR serve as the interface between the central vector processing unit VPC and the vector processing units PE1 to PEM.

Conversely, when vector data obtained as the result of an operation on array data in the vector processing units PE1 to PEM is to be used in a vector operation with other scalar data, the central vector processing unit performs the vector operation using that result vector held in the conceptual vector register VRj. For this reason, the vector register VCj and the scalar registers S1j to SMj that make up the conceptual vector register VRj are controlled so that their contents are identical. That is, when an operation on array data in the vector processing units PE1 to PEM yields vector data, that vector data is stored in the scalar registers S1j to SMj and also in the vector register VCj. Conversely, when an operation on vector data in the central vector processing unit VPC yields vector data, that vector data is stored in the vector register VCj and also in the scalar registers S1j to SMj.

In the following, for simplicity, the conceptual scalar, vector, and array registers are called simply scalar registers SR, vector registers VR, and array registers AR, and the i-th conceptual scalar register is called simply scalar register SRi; the same applies to the conceptual vector and array registers. The scalar registers SC1 to SCZ, S11 to S1L, S21 to S2L, ..., SM1 to SML may be referred to collectively as scalar registers SC, S1, S2, ..., SM, respectively. Likewise, the vector registers VC1 to VCL, V11 to V1K, V21 to V2K, ..., VM1 to VMK may be referred to collectively as vector registers VC, V1, V2, ..., VM, respectively.

FIG. 5 shows the format of the instruction word used to describe vector instructions and array instructions in the array processor of the present invention. The OP field specifies the operation code. The R1 field in principle specifies the number of the register in which the result of the operation is stored, and the R2 and R3 fields in principle specify the numbers of the registers that supply the inputs of the operation. The A1 and V1 bits identify whether the register specified by the R1 field is an array register AR, a vector register VR, or a scalar register SR. As shown in FIG. 6, a pair with the A bit set designates an array register AR, "01" designates a vector register VR, and "00" designates a scalar register SR.

Similarly, the A2, V2 bits and the A3, V3 bits indicate the kind of register specified by the R2 and R3 fields, respectively. Separating the operation code, the register number, and the register kind in this way allows operations on the various kinds of registers to be specified uniformly with a single operation code, as shown in FIG. 7.

FIGS. 7(a) to 7(d) show the instruction words according to the present invention for executing the program fragments given in the left-hand part of each figure. Here, MULT denotes the operation code for multiplication.

In the case of FIG. 7(a), the program is executed by multiplying the array data B(I, J) and C(I, J) held in array registers AR2 and AR3, respectively, and storing the product A(I, J) in an array register. This multiplication can be specified by a single instruction word containing only array register numbers.

Similarly, in the program of FIG. 7(b), the product A(I, J) of the array data B(I, J) (I = 1 to N, J = 1 to M) in array register AR2 and the vector data X(J) in vector register VRL is stored in array register AR1. As shown in the figure, the instruction word specifying this product contains array register numbers and a vector register number.

In FIG. 7(c), the program shown is executed by storing in vector register VR1 the product X(J) of the vector data Y(J) and Z(J) (J = 1 to M) held in vector registers VR2 and VR3, respectively. As shown in the figure, the instruction word specifying this product contains only vector register numbers.

In FIG. 7(d), the program shown is executed by computing the product X(J) of the vector data Y(J) in vector register VR2 and the scalar data S in scalar register SR1 and storing it in vector register VRL. The instruction word specifying this product contains a scalar register number and vector register numbers. Thus the same operation code and instruction format can be used whether the multiplication is between array registers, between an array register and a vector register, between vector registers, or between a vector register and a scalar register. The difference in the kinds of registers used is reflected only in the values of the Ai and Vi bits.

Hereinafter, an instruction that specifies an array register in any of its R1, R2, and R3 fields is called an array instruction. An instruction that specifies no array register but specifies a vector register is defined as a vector instruction.
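The classification rule just stated can be written down directly. In this sketch the `Kind` enumeration and the `classify` function are hypothetical names, the register kinds are given symbolically rather than as A/V bit values, and the "other" result for an instruction naming only scalar registers is an assumption, since the description does not name that case.

```python
# Hypothetical sketch of the array/vector instruction classification.
from enum import Enum


class Kind(Enum):
    SCALAR = "SR"
    VECTOR = "VR"
    ARRAY = "AR"


def classify(operand_kinds):
    """Classify an instruction from the register kinds of its R1, R2, R3 fields."""
    if Kind.ARRAY in operand_kinds:
        return "array"    # any array register makes it an array instruction
    if Kind.VECTOR in operand_kinds:
        return "vector"   # no array register, at least one vector register
    return "other"        # assumed label; not defined in the description


# The four MULT variants described for FIG. 7(a) to 7(d), as (R1, R2, R3):
fig7 = {
    "a": (Kind.ARRAY, Kind.ARRAY, Kind.ARRAY),
    "b": (Kind.ARRAY, Kind.ARRAY, Kind.VECTOR),
    "c": (Kind.VECTOR, Kind.VECTOR, Kind.VECTOR),
    "d": (Kind.VECTOR, Kind.VECTOR, Kind.SCALAR),
}
classes = {name: classify(kinds) for name, kinds in fig7.items()}
```

Under this rule the variants of FIG. 7(a) and 7(b) start the vector processing units, while those of FIG. 7(c) and 7(d) are executed by the central vector processing unit itself.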

An operation or processing performed by an array instruction is hereinafter called an array operation or array processing, respectively. Likewise, an operation or processing performed by a vector instruction is called a vector operation or vector processing, respectively.

FIGS. 8 and 9 show the details of the central vector processing unit VPC and a vector processing unit PEi, respectively. The operation of these units is described in detail below using the array operation program of FIG. 11(a).

Assume now that the array data A, B, and C each consist of 100 x 100 elements and that each element is 8 bytes long. Assume further that, as shown in FIG. 13, the elements of array data A are stored in the main memory C51 so that elements with the same value of the control variable J are stored contiguously in increasing order of the control variable I. The elements of array data B and C are assumed to be arranged in the same way.
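Under this layout the byte offset of an element follows from simple arithmetic. The sketch below additionally assumes that the columns themselves follow one another in increasing J with no padding (ordinary Fortran column-major order), which the text implies but FIG. 13 is not available to confirm; the function name `element_offset` is hypothetical.

```python
# Hypothetical sketch: byte offset of A(I, J) from the start of array A,
# for 100 rows of 8-byte elements stored column by column.

def element_offset(i, j, rows=100, element_size=8):
    """Offset of element (I, J), with I and J numbered from 1 as in Fortran."""
    if i < 1 or j < 1 or i > rows:
        raise ValueError("index out of range")
    return ((j - 1) * rows + (i - 1)) * element_size


first = element_offset(1, 1)        # start of column 1
next_row = element_offset(2, 1)     # next element in the same column
next_col = element_offset(1, 2)     # start of column 2
```

With these numbers, consecutive elements of one column are 8 bytes apart and consecutive columns start 800 bytes apart, so each vector processing unit can fetch its column as a unit-stride vector.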

In the program of FIG. 11(a), the control variable J is varied sequentially from 1 to M and, for each value of J, the control variable I is varied sequentially from 1 to J+1. For each combination of I and J, the sum A(I, J) of the array elements B(I, J) and C(I, J) is computed.

The statement that specifies the value of M is omitted from the figure for simplicity. For a given value of the control variable J, the data obtained by varying the control variable I, that is, the data A(I, J), B(I, J), and C(I, J) processed by the inner loop, each form one vector. Array data, for example A, is therefore a set of vectors, each corresponding to a different value of J.

In this embodiment, each vector processing unit PEi processes, in a pipelined manner, the vector data handled by the inner loop, that is, one column vector such as A(I, i) (I = 1 to i+1) for which the control variable J has the value i. The vector processing in the units PE1 to PEM is executed in parallel.

In this way, array data is processed at high speed. Although the column vectors differ in length, in this embodiment each vector processing unit PEi is configured so that it can operate on whatever vector length it requires.
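As a rough illustration of the decomposition just described, the following Python sketch splits the double loop of FIG. 11(a) into one column task per value of J. The function names `process_column` and `array_add` and the dictionary storage keyed by (I, J) are assumptions made for this sketch and do not appear in the patent. The columns are processed one after another here, whereas the embodiment assigns them to the units PE1 to PEM so that they run in parallel.

```python
def process_column(j, b, c):
    """Work of one vector processing unit: the inner loop for a fixed J.

    Returns the column vector A(I, J) for I = 1 .. J+1 (1-based indices).
    """
    return [b[(i, j)] + c[(i, j)] for i in range(1, j + 2)]


def array_add(b, c, m):
    """Outer loop over J = 1 .. M; each column is an independent task."""
    a = {}
    for j in range(1, m + 1):
        column = process_column(j, b, c)
        for i, value in enumerate(column, start=1):
            a[(i, j)] = value
    return a
```

Because each call to `process_column` touches only its own column, the calls have no data dependence on one another, which is the property that lets the columns be distributed over separate processing units.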

In this embodiment, the following setup processing is required to prepare the data needed to execute the vector or array operations that a program calls for. Of the setup steps below, those marked as scalar processing are executed by the scalar processing unit SP, and those marked as vector processing are executed by the central vector processing unit VPC.

Setup 1: Calculate the number of values that the control variables I and J of the double DO loop representing the array operation can take. The number of values that the control variable J of the outer DO loop can take is called the vector length VL. This calculation is performed by the scalar processing unit SP (scalar processing). In the present example, the range of J gives a vector length VL of M. The number of values that the control variable I of the inner DO loop can take is called the vector length AL.

The vector length AL is in general a function of the outer-loop control variable J and is therefore a vector quantity; it is written AL(J) below. AL(J) indicates the vector length to be processed by each vector processing unit PEi. It is calculated by scalar processing and vector processing. In the present example, the range of I gives AL(J) = 2, 3, ..., M+1 for J = 1, 2, ..., M, respectively.
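For the loop bounds of the example (J from 1 to M, I from 1 to J+1), the two vector lengths can be written out directly. The Python sketch below is only an illustration; the function name `vector_lengths` is an assumption and is not taken from the patent.

```python
def vector_lengths(m):
    """Return (VL, AL) for the loops DO J = 1, M and DO I = 1, J+1."""
    vl = m                                 # J takes M distinct values
    al = [j + 1 for j in range(1, m + 1)]  # I takes J+1 values for each J
    return vl, al
```

For M = 4 this gives VL = 4 and AL(J) = 2, 3, 4, 5, matching the sequence 2, 3, ..., M+1 stated in the text.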

When the program to be executed consists of a single loop, that is, when a vector operation is required, the size of the range that the control variable of this single DO loop can take is calculated as the vector length VL to be processed (scalar processing).

Setup 2: Calculate the start address VADR of the location in main memory of the vector or array data to be processed (scalar processing). In the present example, the addresses of A(1,1), B(1,1), and C(1,1) are obtained as the start addresses of array data A, B, and C, respectively.

Setup 3: For each of array data A, B, and C, calculate the address increment INCV between elements that are adjacent with respect to the outer-loop control variable J, for example between A(I, j) and A(I, j+1) (scalar processing). In the present example, INCV is 800 for all of A, B, and C, because each array is 100 × 100 and one word is 8 bytes. Also calculate, as a function of the control variable J, the address increment INCA between elements that are adjacent with respect to the inner-loop control variable I, for example between A(i, J) and A(i+1, J).

In general this increment INCA(J) is vector data, so it is obtained by scalar processing and vector processing. In the present example, INCA(J) for array data A, B, and C is 8 regardless of the value of J, because each element is 8 bytes long.

When the program to be executed is a single DO loop requiring a vector operation, the address increment between data adjacent with respect to the control variable of that DO loop is calculated and used as INCV (scalar processing).

Setup 4: From the start address VADR and the address increment INCV of each of array data A, B, and C, calculate the address of the first element (AADR_A(J) for array data A) of each of the vectors A(I, J), B(I, J), and C(I, J) (I = 1 to J+1) that are indexed by the control variable I of the inner loop of the double DO loop (vector processing).

In the present example, the start addresses AADR_A(J) (J = 1 to M) of array data A are the addresses of A(1,1), A(1,2), ..., A(1,M). The same applies to array data B and C. When the program consists of a single DO loop, the addresses AADR_B(J) and AADR_C(J) of the first elements of the data processed in that DO loop are obtained.
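The address quantities of Setups 2 to 4 follow from the storage layout assumed above (100 × 100 arrays of 8-byte elements stored column by column). A minimal Python sketch under those assumptions is shown below; the names `element_address` and `setup_addresses` and the start address used in the usage note are hypothetical and serve only as illustration.

```python
ELEMENT_BYTES = 8   # each element is 8 bytes long
ROWS = 100          # arrays are 100 x 100, stored column by column


def element_address(vadr, i, j):
    """Byte address of element (I, J), 1-based, given the start address VADR."""
    return vadr + (i - 1) * ELEMENT_BYTES + (j - 1) * ROWS * ELEMENT_BYTES


def setup_addresses(vadr, m):
    """Return (INCV, list of INCA(J), list of AADR(J)) for J = 1 .. M."""
    # INCV: distance between elements adjacent in J, e.g. (1, 1) and (1, 2).
    incv = element_address(vadr, 1, 2) - element_address(vadr, 1, 1)
    # INCA(J): distance between elements adjacent in I within column J.
    inca = [element_address(vadr, 2, j) - element_address(vadr, 1, j)
            for j in range(1, m + 1)]
    # AADR(J): address of the first element (1, J) of each column vector.
    aadr = [vadr + (j - 1) * incv for j in range(1, m + 1)]
    return incv, inca, aadr
```

With an illustrative start address of 4096 and M = 3, `setup_addresses(4096, 3)` yields INCV = 800, INCA(J) = 8 for every J, and first-element addresses 4096, 4896, and 5696.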

Setup 5: Calculate the address of the first instruction of the vector/array instruction sequence held in the main memory C51 (scalar processing).

FIG. 11(b) shows the instruction sequence for executing the program of FIG. 11(a). Before the vector/array instruction sequence V1 to V11 is started in the central vector processing unit VPC and the vector processing units PE1 to PEM, the scalar processing unit SP executes the scalar instruction sequence S1 and the scalar instruction S2.

(Scalar instruction sequence S1) The instruction sequence S1 executes those parts of the setup processing described above that are marked as scalar processing. The scalar processing unit SP is a simplified version of the scalar processing unit disclosed in Japanese Patent Application No. 56-42314. In addition to ordinary scalar instructions, the unit of that application can execute the following six instructions:
(1) Start Vector Processor (SVP) instruction
(2) Set Vector Address Register (SVAR) instruction
(3) Set Vector Address Increment Register (SVAIR) instruction
(4) Set Scalar Register (SSR) instruction
(5) Read Scalar Register (RSR) instruction
(6) Test Vector Processor (TVP) instruction

The scalar processing unit SP of the array processor of the present invention is configured to execute, of these, the SVP, SSR, and TVP instructions. That is, compared with the scalar processing unit of the above application, it omits the flip-flops in which the decode results of the SVAR, SVAIR, and RSR instructions are set, their output lines, and the data line used to execute the RSR instruction.

As setup processing, the instruction sequence S1 calculates the scalar data described above that is needed to execute the vector/array instructions and sets this data, except for the vector length VL and the first instruction address, in the scalar registers SC of the central vector processing unit VPC.

For this purpose, the required data is computed with ordinary scalar instructions and stored in the general-purpose registers GR (FIG. 3), and the scalar data in a general-purpose register GR is then set in a scalar register SC by an SSR instruction. When an SSR instruction is decoded in the scalar processing unit SP, the central vector processing unit VPC (FIG. 8) receives the data in the selected general-purpose register GR on line 37 and the number of the scalar register SC specified by the instruction on line 35. At this time the flip-flop (hereinafter FF) F5 is in the reset state, and selectors L24 and L25 respond to the "0" signal on its output line 36 by selecting the register number on line 35 and the scalar data on line 37, respectively. A write signal obtained by decoding the SSR instruction is supplied from the scalar processing unit SP to the scalar registers SC over line 70. Multiplexer L26 delivers the scalar data output from selector L25 to the one scalar register SCi that corresponds to the register number output from selector L24.

In this way the scalar data is written into the scalar register SCi specified by the SSR instruction. The flip-flop F5 is set to "0" while the scalar processing unit SP is operating and to "1" while the central vector processing unit VPC or a vector processing unit PEi is operating. By the above processing, the following scalar data can be set in the scalar registers SC.

Scalar register SC1 holds the start address VADR(A) of array data A. Likewise, SC2 holds the start address VADR(B) of array data B, and SC3 holds the start address VADR(C) of array data C. SC4 holds the address increment INCV of array data A with respect to the control variable J (in this program the INCV values of array data B and C are the same). SC5 holds the address increment INCA of array data A with respect to the control variable I (the INCA values of array data B and C are again the same). SC6 holds the scalar value "2" needed to calculate the vector length AL.

(Instruction S2) When the setup by the sequence of SSR instructions is complete, an SVP instruction is executed to instruct the central vector processing unit VPC to start executing the vector/array instructions. When the SVP instruction is executed, the vector length VL and the first instruction address of the vector/array instruction sequence are read from the general-purpose registers GR specified by the instruction and sent to the central vector processing unit VPC over lines 60 and 37, respectively. The scalar processing unit SP also outputs an SVP instruction decode signal on line 57. In response to this signal, the vector length VL on line 60 is stored in the VLR register M4.

Also in response to the SVP instruction decode signal on line 57, selector L35 selects the first instruction address of the vector/array instruction sequence carried on line 37 and stores it in the instruction address register (IAR) M5. The signal on line 57 is also input to the flip-flop F5 and sets it; the set output is input over line 36 to the vector instruction control circuit (1) C1 and instructs it to start executing vector/array processing. After starting the central vector processing unit VPC in this way, the scalar processing unit SP executes instruction S3 (FIG. 11).

(Instruction S3) This instruction is a TVP instruction.

The TVP instruction tests the operating state of the central vector processing unit VPC and the vector processing units PEi on the basis of the output signal of the flip-flop F5 and sets the condition code. The condition code is set to 1 when the central vector processing unit VPC or a vector processing unit PEi is operating and to 0 otherwise. The condition code is used by a conditional branch instruction of the kind described in the IBM manual "System/370 Principles of Operation" (GC-22-7000). While the condition code is 1, the TVP instruction and the conditional branch instruction are repeated; when the condition code becomes 0, execution proceeds to the next scalar instruction.
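The interplay of the TVP instruction and the conditional branch amounts to a polling loop on the scalar side. The Python sketch below models only that control flow; the class `VectorUnitStatus`, its `busy_polls` parameter, and the function `wait_for_vector_unit` are hypothetical names introduced for illustration, since the patent describes hardware rather than software.

```python
class VectorUnitStatus:
    """Hypothetical stand-in for flip-flop F5 (1 while the vector side is busy)."""

    def __init__(self, busy_polls):
        # Number of further polls during which the vector side stays busy.
        self.remaining = busy_polls

    def tvp(self):
        """Model of the TVP instruction: return the condition code (1 busy, 0 idle)."""
        if self.remaining > 0:
            self.remaining -= 1
            return 1
        return 0


def wait_for_vector_unit(status):
    """Repeat TVP plus conditional branch until the condition code becomes 0.

    Returns the number of TVP executions that were needed.
    """
    polls = 0
    while True:
        polls += 1
        if status.tvp() == 0:   # condition code 0: fall through to the next instruction
            return polls
```

A unit that stays busy for three polls is released on the fourth TVP execution, so `wait_for_vector_unit(VectorUnitStatus(3))` returns 4.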

Next, the operation of the central vector processing unit VPC (FIG. 8) and of a vector processing unit PEi (FIG. 9) is described.

When the vector instruction control circuit (1) C1 of the central vector processing unit VPC is started, the instruction address set in the instruction address register M5 is sent over line 71 to the main memory control unit C52. On the basis of this address, the first instruction of the vector/array instruction sequence in the main memory C51 is read out and set in the instruction register M1 via the main memory control unit C52 and line 63.

In the example of FIG. 11(a), the first vector/array instruction V1 is set in register M1.

When an instruction is set in register M1, the instruction code OP and the A1, V1, A2, V2, A3, and V3 bits held in its fields M71 to M73, M75, M76, M78, and M79, respectively, are input over line 1 to the vector instruction control circuit (1) C1. Circuit C1 controls the vector arithmetic unit C3 and other circuits according to the instruction code OP and directs reading and writing of the scalar registers SC and vector registers VC according to the A1, V1, A2, V2, A3, and V3 bits. The number of the scalar register SCi or vector register VCi to be written is supplied by selector L24 and AND gate L14, respectively, as described later.

The A1, V1, A2, and A3 bits of the instruction are also input, over lines 2, 3, 4, and 6 respectively, to a circuit consisting of OR gate L4 and AND gates L8, L9, L10, and L11. This circuit classifies the vector/array instruction into one of four types and sets the corresponding one of the flip-flops F1 to F4:
F1 is set by an array instruction that writes its result to an array register AR, that is, an instruction with A1 = 1.
F2 is set by an array instruction that writes its result to a vector register VR, that is, an instruction with A1 = 0 and A2 or A3 = 1.
F3 is set by a vector instruction that writes its result to a vector register VR, that is, an instruction with A1 = A2 = A3 = 0 and V1 = 1.
F4 is set by a vector instruction that writes its result to a scalar register SR, that is, an instruction with A1 = A2 = A3 = 0 and V1 = 0.
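The four-way classification performed by the gate network can be restated as a small decision function. The following Python sketch mirrors the bit conditions given in the text; the function name `classify_instruction` and the string return values are assumptions for illustration and do not describe the actual gate wiring.

```python
def classify_instruction(a1, v1, a2, a3):
    """Return which flip-flop (F1 to F4) the instruction bits would set.

    a1, a2, a3: 1 if operand R1, R2, R3 refers to an array register.
    v1:         1 if the result operand R1 is a vector register.
    """
    if a1 == 1:
        return "F1"   # array instruction, result written to an array register
    if a2 == 1 or a3 == 1:
        return "F2"   # array instruction, result written to a vector register
    if v1 == 1:
        return "F3"   # vector instruction, result written to a vector register
    return "F4"       # vector instruction, result written to a scalar register
```

The conditions are mutually exclusive, so exactly one flip-flop is selected for any combination of bits. An instruction with A1 = A2 = A3 = 0 and V1 = 1, such as one that writes to a vector register from vector and scalar operands, maps to F3.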

To deliver the register numbers specified by the instruction to the corresponding registers of the central vector processing unit VPC, the V1 (M73), V2 (M76), and V3 (M79) fields of the instruction and the register numbers R1 (M74), R2 (M77), and R3 (M80) are input to AND gates L13, L14, L15, L16, L17, and L18 over lines 3, 5, and 7 and lines 14, 15, and 16, respectively.

AND gates L13, L15, and L17 output register numbers R1, R2, and R3 on lines 22, 24, and 26, respectively, when R1, R2, or R3 is the number of a scalar register SC, that is, when V1, V2, or V3 is 0. AND gates L14, L16, and L18 output register numbers R1, R2, and R3 on lines 23, 25, and 27, respectively, when R1, R2, or R3 is the number of a vector register VC, that is, when V1, V2, or V3 is 1.

The vector/array instructions V1 to V11 (FIG. 11) are described below. Instruction V1 calculates the vector lengths AL(J); instructions V2 to V5 calculate the start addresses AADR_A(J) to AADR_C(J) of array data A, B, and C; and instruction V6 calculates the address increment INCA(J) common to array data A, B, and C. These instructions constitute the setup performed by vector processing. Instructions V7 and V8 read array data B and C from the main memory C51, instruction V9 adds array data B and C, and instruction V10 writes the result to the main memory C51. Instruction V11 signals that the vector/array instruction sequence has ended. As already described, the necessary scalar data has been set in the scalar registers SC1 to SC6, which implement the conceptual scalar registers SR1 to SR6 specified by instructions V1 to V6.

It is also assumed that the constant vector 0, 1, ..., M-1 has been stored in advance in registers 1 to M of the vector register VC that implements the conceptual vector register VR2 and in the scalar registers S12 to SM2 of the vector processing units PE1 to PEM.

(Instruction V1) Because instruction V1 is a vector instruction that writes to the conceptual vector register VR1, the flip-flop F3 is set, and register numbers "1", "2", and "6" are placed on lines 23, 25, and 26, respectively.

Selector L27 outputs the contents of the scalar registers specified by the register numbers input on lines 22, 24, and 26 to selectors L30, L32, and L31, respectively. In this case, the contents "2" of scalar register SC6 are output to selector L31. Selector L28 selects the vector registers specified by the register numbers input on lines 23, 25, and 27 and connects them to selectors L30, L32, and L31, respectively. In this case, vector register VC2 is selected and connected to selector L32.

Selector L31 selects the output of selector L27 or L28 according to whether bit V2 of the instruction, input over line 5, is 0 or 1. Selector L32 selects the output of selector L27 or L28 according to whether bit V3 of the instruction, input over line 7, is 0 or 1. In this case V2 = 1 and V3 = 0, so selectors L32 and L31 select selectors L27 and L28, respectively. The outputs of scalar register SC6 and vector register VC2 are therefore input to the vector arithmetic unit C3.

Instruction V1 adds the constant 2 in scalar register SR6 to the vector of constants 0 to M-1 in vector register VR2 and stores the resulting vector of constants 2 to M+1 in vector register VR1. The result vector represents the vector lengths AL(J) processed in the inner DO loop.

The vector instruction control circuit (1) C1 decodes the instruction code OP and controls its execution. For an ADD instruction, it directs the vector register VC and the scalar register SC to be read continuously, one element at a time, and starts the vector arithmetic unit C3.

The vector arithmetic unit C3 adds the constant 2 input from scalar register SC6 to each element of the constant vector input from vector register VC2 in a well-known pipeline mode and obtains the vector of constants 2 to M+1. As soon as a result of the vector operation is output on line 54, the vector instruction control circuit (1) C1 sends a store command to the scalar register SC or vector register VR specified by the instruction. When the output of the flip-flop F5 on line 36 is "1", selectors L24 and L25 select the number of the scalar register to be written as specified by the vector instruction (line 22) and the data (line 54), respectively. Selector L40 selects line 77 in response to a signal (line 55) issued by the vector instruction control circuit (1) C1 for the LOAD instruction described later, and selects line 54 otherwise. For an ADD instruction, therefore, selector L40 selects the result on line 54 and passes it to line 44.

Selector L30 is controlled by the signal on line 55 mentioned above and by the signal on line 19, which indicates whether the flip-flop F2 is set. It selects the data on line 45 when F2 is set, the data on line 43 for a LOAD instruction, and the data on line 44 otherwise. In this case, therefore, the result vector on line 44 is selected and routed by multiplexer L29 to vector register VC1, the register whose number is given on line 23, and the elements of the result vector are stored one after another.

The operation on element i of the vector data and the storing of the result for the preceding element (i-1) are performed in parallel.

An instruction that sets the flip-flop F3 writes the same result both to the vector register VC2 and to the scalar registers S12 to SM2 in the vector processing units PE1 to PEM, which implement the same conceptual vector register VR2. The result on line 44 must therefore be sent to each of the vector processing units PE1 to PEM in turn.

When F3 is set, its output is sent over line 20 to the operation result send signal circuit C4. Circuit C4 has a counter and a decoder (neither shown). The counter is reset by the output of F3 and counts the per-element operation completion signals that the vector arithmetic unit C3 outputs on line 53. Of the signal lines 52_1 to 52_M connected to the vector processing units PE1 to PEM, the circuit sends a send signal on the one corresponding to the count value. The circuit also reads the vector length VL set in the VLR register M4 over line 50, compares it with the count value to determine whether the operation has finished, and, if it has, resets F3 over line 48. Meanwhile the result is sent through selector L40 and line 44 to the vector processing units PE1 to PEM.

The output of F3 is also sent over line 20 to the vector processing units PE1 to PEM to instruct them to take in the data on line 44, and the number of the vector register to be written is sent to these units on line 23.

In each vector processing unit PEi (FIG. 9), meanwhile, the output signal "1" of F3 on line 20 causes selector L108 to select the data on line 44 and selector L107 to select the register number on line 23 from the central vector processing unit VPC (number 1 in this case). Multiplexer L109 routes the data on line 44 to the scalar register Si1 of this number, where it is stored when the operation result send signal arrives on line 52_i. This completes the processing of instruction V1.

When the operation result send circuit C4 resets F3, the output of NOR gate L21, which receives the output signals of the flip-flops F1 to F4, becomes 1, and this output is input over line 32 to the vector instruction control circuit (1) C1. Whenever a vector/array instruction is executed, one of F1 to F4 is set, and the flip-flop that was set is reset when the operation ends, so the signal on line 32 is "1" at the end of the operation.

ベクトル命令制御回路（１）Ｃ１は、このＧ号に同期し
て、＋１加算回路Ｃ６からセレクタＬ３５を介して与え
られる次の命令アドレスを命令アドレスレジスタＭ５に
セラ１−する。このセットされた次の命令アドレスが、
線７１を介して主記憶制御装置Ｃ５２に送られ、主記憶
装置Ｃ５１から次のベクトル・アレイ命令がベクトル・
アレイ命令レジスタＭ１にセラ１〜される。（命令Ｖ２
〜Ｖ６）命令ｖ２〜ｖ６は命令ｖ１と同じ種類のレジス
タを用いるもので、ＦＦ、Ｆ３がセラｌ−され命令ｖ１
と同様に処理される。ここで、命令ｖ２の命令コードＭ
　Ｕ　Ｌ　Ｔは乗算、命令ｖ６の命令コードＭＯＶＥは
転送を意味する。なおＭＯＶＥ命令にはＲ３フィールド
Ｍ８０がない。命令ｖ２はスカラレジスタＳＲ４（すな
In synchronization with this signal, vector instruction control circuit (1) C1 stores the next instruction address, supplied from the +1 adder circuit C6 through selector L35, into instruction address register M5. This next instruction address is sent to main memory controller C52 via line 71, and the next vector/array instruction is read from main memory C51 and set in instruction register M1.

(Instructions V2 to V6) Instructions v2 to v6 use the same types of registers as instruction v1; FF F3 is set, and they are processed in the same way as instruction v1. The instruction code MULT of instruction v2 denotes multiplication, and the instruction code MOVE of instruction v6 denotes transfer. The MOVE instruction has no R3 field M80. Instruction v2 computes the product of the outer-loop address increment value INCV, stored in scalar register SR4 (that is, scalar register SC4), and the constant vector 0 to M-1 stored in vector register VR2 (that is, vector register VC2 and scalar registers S12 to SM2), and stores the result in vector register VR3 (that is, vector register VC3 and scalar registers S13 to SM3). Instruction v3 then stores in vector register VR4 the sum of the start address VADR of array data A, held in scalar register SR1, and the vector data 0 to (M-1)×INCV stored in vector register VR3 by instruction v2. As a result of instruction v3, vector register VR4 holds a vector consisting of the addresses AADRA(J) of the first elements A(1, J) of the column vectors A(I, J) (I = 1 to J+1) of array data A that are processed in the inner loop for the different values of J.
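The address setup performed by instructions v2 and v3 can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above: the function name and the sample values (a start address of 1000, an increment of 80, four units) are assumptions chosen for the example and do not come from the patent.

```python
def column_start_addresses(vadr, incv, m):
    """Mimic instructions v2 and v3 for M vector processing units (hypothetical helper)."""
    constants = list(range(m))                 # VR2: constant vector 0 .. M-1
    offsets = [incv * k for k in constants]    # v2 (MULT): VR3 = INCV * VR2
    return [vadr + off for off in offsets]     # v3 (ADD):  VR4 = VADR + VR3


# Illustrative values only: VADR = 1000, INCV = 80, M = 4.
print(column_start_addresses(vadr=1000, incv=80, m=4))
```

Each entry of the returned list plays the role of one element of vector register VR4, that is, the start address handed to one vector processing unit.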

Instruction v4 uses the start address VADRB of array data B, held in scalar register SR2, and, in the same way as instruction v3, stores in vector register VR5 a vector consisting of the addresses AADRB(J) of the first elements B(1, J) of the column vectors B(I, J) (I = 1 to J+1) of array data B that are processed in the inner loop.

Similarly, instruction v5 stores in vector register VR6 a vector consisting of the start addresses AADRC(J) of array data C for the different values of J.

Instruction v6 writes the address increment value INCV, which is common to array data A, B, and C and is held in scalar register SR5, unchanged into every element register of vector register VR7. It is executed by passing the increment value INCV in scalar register SC5 straight through vector arithmetic unit C3 and writing it into vector register VC7 and scalar registers S17 to SM7. In this way, a vector whose elements all equal the address increment value INCV is stored in vector register VR7.

(Instruction V7) This instruction fetches a plurality of vector data from main memory C51, taking each element of the vector of start addresses AADRB(J) in vector register VR5 as a start address and each element of the vector of increment values INCV in vector register VR7 as an increment, and stores them in array register AR1.

When instruction v7 is set in instruction register M1, FF F1 is set. The output signal of FF F1 on line 18 is input to OR gate L22, whose output is supplied via line 30 to vector instruction control circuit (1) C1 and to vector instruction control circuit (2) C101 (FIG. 9) of each vector processing unit PEi.

OR gate L22 receives the outputs of FFs F1 and F2, and its output becomes "1" for an array instruction. When the signal "1" is input via line 30, vector instruction control circuit (1) C1 stops decoding, and vector instruction control circuit (2) C101 sets the array instruction on line 63 into array instruction register M101.

The instruction code OP and the bits A1, V1, A2, V2, A3, and V3, held in fields M171 to M173, M175, M176, M178, and M179 of the instruction set in register M101, are input to vector instruction control circuit (2) C101 via line 102.

Based on the instruction code OP, circuit C101 controls vector arithmetic unit C103 and the other circuits to execute the instruction, and also controls writing to and reading from scalar registers Sij and vector registers Vij. The number j of the scalar register Sij or vector register Vij to be written is supplied from selector L107 and AND gate L102, respectively.

To deliver the register numbers R1 to R3 specified by the instruction to the corresponding registers of vector processing unit PEi, the A1, A2, and A3 fields and the register numbers R1 (M174), R2 (M177), and R3 (M180) are input to AND gates L101, L102, L103, L104, L105, and L106 via lines 103, 105, 107 and lines 104, 106, 108, respectively.

AND gates L101, L103, and L105 output register numbers R1, R2, and R3 onto lines 109, 111, and 113, respectively, when those register numbers refer to vector registers VR, that is, when A1 = 0, A2 = 0, and A3 = 0, respectively. AND gates L102, L104, and L106 output register numbers R1, R2, and R3 onto lines 110, 112, and 114, respectively, when those register numbers refer to array registers, that is, when A1 = 1, A2 = 1, and A3 = 1, respectively. For instruction v7, register numbers "1", "5", and "7" are set on lines 110, 111, and 113, respectively.
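The routing performed by these AND gates amounts to a small lookup: each A bit decides whether its register number appears on the "vector register" line or the "array register" line. The sketch below restates that rule in Python. The function name and the dictionary representation are assumptions for illustration; the line numbers are the ones named in the text.

```python
def route_register_numbers(a_bits, reg_numbers):
    """Place R1..R3 on the lines named in the text, chosen by the A1..A3 bits.

    a_bits      -- (A1, A2, A3), each 0 (vector register) or 1 (array register)
    reg_numbers -- (R1, R2, R3)
    Returns a dict mapping line number -> register number (hypothetical model).
    """
    vector_lines = (109, 111, 113)   # outputs used when the A bit is 0
    array_lines = (110, 112, 114)    # outputs used when the A bit is 1
    routed = {}
    for a, r, v_line, a_line in zip(a_bits, reg_numbers, vector_lines, array_lines):
        routed[a_line if a == 1 else v_line] = r
    return routed


# Instruction v7 (LOAD into array register 1 using vector registers 5 and 7).
print(route_register_numbers(a_bits=(1, 0, 0), reg_numbers=(1, 5, 7)))
```

For instruction v7 this reproduces the statement above that register numbers 1, 5, and 7 appear on lines 110, 111, and 113.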

When instruction v7 is decoded, vector instruction control circuit (2) C101 supplies the i-th element AL(i) of the vector length AL, which is held in scalar register Si1 and input via line 163, to vector reference control circuit C102 and starts that circuit. The numbers 5 and 7 of scalar registers Si5 and Si7, which hold the i-th element of the array address AADRB(i) specified by this instruction and the increment value INCA, respectively, are input to selector L110 via lines 111 and 113. Selector L110 reads AADRB(i) and INCA onto lines 115 and 116, respectively, and sends them to vector reference control circuit C102.

The start address AADRB(i) and the increment INCA(i) of array data B correspond to element B(1, i) and to 8, respectively. Based on these read data, circuit C102 generates the addresses needed to read the i-th column vector B(1, i) to B(i+1, i) of array data B by successively updating the start address AADRB(i) by the increment value INCA, and inputs the addresses sequentially to the SCU C52 via line 130i.

As a result, the elements of the i-th column vector of array data B are read out sequentially onto line 119i. The data read out onto line 119i is selected by selector L113 in response to the LOAD instruction decode signal on line 120 from vector instruction control circuit (2) C101, and is stored in vector register Vi1 through selector L112, which is controlled by the write register number on line 110. When the data of the required vector length AL(i) has been read and the operation is complete, vector instruction control circuit (2) C101 reports this to the central vector processing unit VPC via line 61i. The vector processing units PEi carry out the above processing in parallel, so that all the required column vectors of array data B are stored in array register AR1.
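As a rough software analogue of the LOAD processing described above, the sketch below gathers one strided column per vector processing unit from a flat memory, using a per-unit start address, increment, and vector length. It is a sequential stand-in for what the text describes as parallel hardware; the function name, the use of a Python list as main memory, and word-sized increments (instead of the byte increment of 8 mentioned in the text) are all assumptions for illustration.

```python
def load_array_register(memory, start_addrs, increments, lengths):
    """Gather one column vector per processing unit (hypothetical model of LOAD).

    memory      -- flat list standing in for main memory, indexed by address
    start_addrs -- start address for unit i (the role of AADR(i))
    increments  -- address increment for unit i (the role of INCA)
    lengths     -- vector length for unit i (the role of AL(i))
    """
    array_register = []
    for start, inc, length in zip(start_addrs, increments, lengths):
        # Each unit generates start, start+inc, start+2*inc, ... on its own.
        column = [memory[start + k * inc] for k in range(length)]
        array_register.append(column)
    return array_register


# Toy memory whose contents equal their addresses; lengths 2, 3, 4 give a
# non-rectangular (triangular) region, as permitted by the per-unit length.
memory = list(range(100))
print(load_array_register(memory, [10, 20, 30], [1, 1, 1], [2, 3, 4]))
```

Because every unit carries its own length, the columns in the result need not have equal size.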

Meanwhile, the central vector processing unit VPC receives these completion reports via lines 611 to 61M in FFs F61 to F6M (FIG. 8). Their outputs are input to AND gate L88, and when all the vector processing units PEi have completed the operation, the signal "1" is output on line 56. When FF F1 is set, as it is for this instruction, FF F1 is reset through AND gate L23 and line 34.

FFs F61 to F6M are reset via line 62 by vector instruction control circuit (1) C1 when the next instruction is decoded.

(Instructions V8 to V10) Like instruction v7, instructions v8 to v10 set FF F1 and are executed in the vector processing units PEi.

Instruction v8 stores array data C from main memory C51 in array register AR2.

The ADD instruction v9 is processed as follows.

In vector processing unit PEi, register numbers 1 and 2 are output onto lines 112 and 114 from AND gates L104 and L106, and selector L111 connects vector registers Vi1 and Vi2 to selectors L115 and L114, respectively. Selectors L115 and L114 select selector L111 when the A3 and A2 bits output from instruction register M101 onto lines 107 and 105, respectively, are 1. Vector registers Vi1 and Vi2 are thereby connected to vector arithmetic unit C103.

In response to the instruction code of instruction v9, vector instruction control circuit (2) C101 reads the i-th column vectors B(1, i) to B(i+1, i) and C(1, i) to C(i+1, i) of array data B and C from vector registers Vi1 and Vi2 sequentially, one element at a time.

Under the control of vector instruction control circuit (2) C101, vector arithmetic unit C103 performs the operation specified by the instruction on each element of the input data in a known pipeline mode. For instruction v9 it performs addition and sequentially outputs the i-th column vector A(1, i) to A(i+1, i) of array data A. The operation result is input to selectors L113 and L108 via line 117. Because the signals on lines 20 and 120 are 0, selectors L113 and L108 both select line 117.

However, no register number is input from selector L107 to multiplexer L109, which receives the output of selector L108. Selector L107 selects AND gate L101, but because the A1 bit of instruction v9 is 1, AND gate L101 does not output a register number.

On the other hand, register number 3 is input from AND gate L102 to multiplexer L112, which receives the output of selector L113. The operation result is thus transferred to vector register Vi3 and stored there under the control of vector instruction control circuit (2) C101.

In this way, the i-th column vector A(1, i) to A(i+1, i) of array data A is stored in vector register Vi3. Because vector processing units PE1 to PEM execute this processing in parallel, array data A is thereby stored in array register AR3.
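The ADD step can be pictured as an element-wise addition carried out independently on each column, with the column of unit i having i+1 elements. The sketch below is a sequential Python model under that reading; the function name and the list-of-lists representation of an array register are assumptions made for illustration, and the outer loop stands in for the units that the text says work in parallel.

```python
def array_add(ar_b, ar_c):
    """Add two array registers column by column (hypothetical model of ADD v9).

    ar_b, ar_c -- lists of columns; column i is what unit i holds in Vi1 / Vi2.
    Returns the columns that would be written to Vi3 of each unit.
    """
    result = []
    for col_b, col_c in zip(ar_b, ar_c):           # one iteration per unit PEi
        # The inner comprehension models the pipelined element-by-element add.
        result.append([b + c for b, c in zip(col_b, col_c)])
    return result


# Triangular example: column i (0-based here) holds i + 2 elements.
b = [[1, 2], [3, 4, 5]]
c = [[10, 20], [30, 40, 50]]
print(array_add(b, c))
```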

When the STORE instruction v10 is executed, in each unit PEi, as with the LOAD instructions v7 and v8, the contents of scalar registers Si4 and Si7, which hold the start address AADRA of the i-th column vector of array data A and the increment value INCA specified by the R2 and R3 fields of this instruction, are selected by selector L110 and input to vector reference control circuit C102.

Vector instruction control circuit (2) C101 receives the i-th element AL(i) of the vector-length vector AL held in scalar register Si1, sends it to vector reference control circuit C102, and starts that circuit. From these data, circuit C102 sequentially generates the addresses in main memory C51 at which the elements of the i-th column vector A(1, i) to A(i+1, i) of array data A are to be stored, and sends them to the SCU C52 via line 130i. Vector register Vi3, which holds the i-th column vector A(1, i) to A(i+1, i) of array data A to be stored, is selected by selector L111, which responds to the R1 field of this instruction, and by selector L130, which responds to the A1 field of this instruction. The data is read out sequentially, one element at a time, onto line 118i and stored in main memory C51 at the addresses output from vector reference control circuit C102.

(Instruction V11) The EAP instruction v11 has no R1, R2, or R3 field and has only the instruction code OP (M71). When this instruction is set in vector/array instruction register M1, vector instruction control circuit (1) C1 resets FF F5 via line 58 and notifies the scalar processing unit SP that vector/array processing has ended.

This concludes the description of operation for the program of FIG. 11. Instructions that set FF F2 or FF F4, which did not appear in that program, are described below.

FIG. 12(a) shows a program example and part of the latter half of its vector/array instruction sequence.

(Instruction V101) Because instruction v101 stores its operation result in vector register VR, FF F2 is set. As in the case where FF F1 is set, the signal on line 30 from OR gate L22 causes the array instruction on line 63 to be set in instruction register M101 of vector processing unit PEi.

Vector instruction control circuit (2) C101 decodes the instruction code field OP (M171). The PROD instruction denotes an inner product.

In each vector processing unit PEi, the inner product of the i-th column vectors A(1, i) to A(N, i) and B(1, i) to B(N, i) of array data A and B, held in vector registers Vi1 and Vi2, respectively, is computed, and the result S(i) is stored in scalar register Si3.

Vector instruction control circuit (2) C101 then reports completion of the operation to the central vector processing unit VPC via line 61i. The operation result S(i) is also stored in vector register VC3 of the central vector processing unit VPC, as follows.

When FF F2 is set, its output is sent via line 19 to the operation result send request circuit C5.

Details of circuit C5 are shown in FIG. 10. Referring to FIG. 10, when FF F2 is set, FFs F2011 to F201M and FFs F2021 to F202M are first reset by its output on line 19. FF F201i is set by the operation end signal on line 61i from vector processing unit PEi. FF F202i is set by a signal output from encoder C201 via line 202i when encoder C201 issues, as described later, an operation result send request signal for vector processing unit PEi on line 51i. The output of FF F201i, the inverted output of FF F202i, and the output of FF F202(i-1) are connected via lines 204i, 203i, and 203(i-1), respectively, to AND gate L300i, whose output is input to encoder C201 via line 201i. When the signal "1" is input, encoder C201 outputs an operation result send request on line 51i.

As a result, encoder C201 can send operation result send requests sequentially to vector processing units PE1 through PEM via lines 511 to 51M, respectively. The request to processing unit PEi is not sent until the operation in processing unit PEi has finished. Encoder C201 compares the vector length VL in the VLR register M4, input via line 50, with the number of requests issued, and once all send requests have been issued it resets FF F2 via line 49.
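The gating described for circuit C5 (a request goes to unit i only when unit i has finished, has not yet been asked, and the preceding unit has already been asked) forces the results to be collected in unit order regardless of the order in which the units finish. The following Python sketch simulates that rule. It uses 0-based unit indices, treats the first unit as having no predecessor condition, and models the flip-flops as boolean lists; those choices, and the function name, are assumptions for illustration rather than details taken from the patent.

```python
def request_order(completion_order, m):
    """Return the order in which send requests are issued to m units.

    completion_order -- unit indices (0-based) in the order they finish
    """
    done = [False] * m        # role of FFs F201: unit i reported completion
    requested = [False] * m   # role of FFs F202: request already sent to unit i
    order = []
    for finished in completion_order:
        done[finished] = True
        progressed = True
        while progressed:     # keep issuing requests while any gate is enabled
            progressed = False
            for i in range(m):
                prev_ok = i == 0 or requested[i - 1]   # assumption for unit 0
                if done[i] and not requested[i] and prev_ok:
                    requested[i] = True
                    order.append(i)
                    progressed = True
    return order


# Units finish out of order, yet requests are issued as 0, 1, 2.
print(request_order(completion_order=[2, 0, 1], m=3))
```

Issuing the requests in unit order is what allows the central unit to write S(i) into consecutive elements of one vector register.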

Meanwhile, vector processing unit PEi has set the operation result S(i) in FF F101 under the control of vector instruction control circuit (2) C101, and when the send request signal arrives via line 51i, it outputs the operation result onto line 47i through AND gate L116.

The central vector processing unit VPC passes the data sent from each vector processing unit PEi over lines 471 to 47M through OR gate L33 and sends it to selector L30 via line 45.

Selector L30 selects line 45 in response to the output of FF F2, and multiplexer L29 writes the data into the i-th element of vector register VC3 in accordance with the write vector register number 3 input from AND gate L14 via line 23.

(Instructions V102 to V103) Instructions v102 and v103 are the LOAD and ADD instructions described earlier, and both set FF F3.

In the central vector processing unit VPC, the LOAD instruction v102 causes the start address VADR of vector D and the address increment value INCV to be read from scalar registers SC1 and SC2, specified by the R2 and R3 fields, onto lines 39 and 38, respectively, and input to vector reference control circuit C2. Based on these data, circuit C2 sequentially inputs to the SCU C52, via line 64, the storage addresses needed to read elements D(1) to D(M), whose number is the vector length VL specified by vector length register M4. As a result, vector elements D(1) to D(M) are read out sequentially onto line 43. Multiplexer L29 and selector L30 are controlled by the "1" signal sent onto line 55 from vector instruction control circuit (1), the "0" output of FF F2, and the register number "4" from AND gate L14, so that the elements are stored in vector register VC4.

Instruction v103 adds the data S(1) to S(M) in vector register VC3 and the data D(1) to D(M) in vector register VC4, and writes the result C(1) to C(M) into vector register VC5.
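Taken together, instructions v101 to v103 compute, for each column J, the inner product of column J of A with column J of B and then add D(J). The Python sketch below restates that data flow; it is an illustrative model only, and the function name and the list-of-columns representation are assumptions rather than anything defined in the patent.

```python
def prod_then_add(a_cols, b_cols, d):
    """Model of PROD (v101) followed by LOAD/ADD (v102, v103).

    a_cols, b_cols -- lists of columns of array data A and B (one per unit)
    d              -- vector D, one element per column
    Returns the vector C with C(J) = S(J) + D(J).
    """
    # v101: each unit forms the inner product S(i) of its own two columns;
    # the results are collected into one vector (the role of VC3).
    s = [sum(x * y for x, y in zip(col_a, col_b))
         for col_a, col_b in zip(a_cols, b_cols)]
    # v102 loads D into a vector register; v103 adds S and D element-wise.
    return [s_j + d_j for s_j, d_j in zip(s, d)]


print(prod_then_add([[1, 2], [3, 4]], [[5, 6], [7, 8]], [10, 20]))
```

The first step is an array instruction whose result is a vector, which is why the text has the per-unit results sent back to the central unit before the ordinary vector ADD runs.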

FIG. 12 shows a program example that includes an instruction setting FF F4, together with the instructions for executing it.

(Instruction V104) Instruction v104 is an example of an instruction that sets FF F4.

For simplicity, it is assumed here that vector data X and Y have already been stored in vector registers VR7 and VR8, respectively. Instruction v104 computes the inner product S of these vector data and writes it to scalar register SR7.

When instruction v104 is set in instruction register M1, vector arithmetic unit C3, under the control of vector instruction control circuit (1) C1, computes the inner product of the data X(1) to X(M) and Y(1) to Y(M) in vector registers VC7 and VC8, and the result S is written into scalar register SC7 through selector L25 and multiplexer L26.

The above embodiment has been described for an outer DO loop that operates on the same number of elements as the number M of vector processing units PE1 to PEM. When the number of elements processed by the outer DO loop is larger than M, vector processing units PE1 to PEM are simply used repeatedly. When the number of elements processed by the outer DO loop is a value M' smaller than M, all of processing units PE1 to PEM are nominally operated, and control is arranged so that the operation results of processing units PE(M'+1) to PEM are not used.
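One way to picture this handling of outer-loop counts that differ from M is as a schedule of passes over the M units, with unused units marked idle in the last pass. The sketch below builds such a schedule in Python. The function name, the 0-based indexing, and the use of None to mark a unit whose result is ignored are assumptions for illustration.

```python
def schedule_outer_loop(n_outer, m):
    """Assign outer-loop iterations to m units, pass by pass (hypothetical helper).

    Returns one list per pass; entry k is the outer-loop index handled by
    unit k in that pass, or None if that unit's result is not used.
    """
    passes = []
    for base in range(0, n_outer, m):
        passes.append([base + pe if base + pe < n_outer else None
                       for pe in range(m)])
    return passes


# Six outer iterations on four units: one full pass, then a partial pass.
print(schedule_outer_loop(n_outer=6, m=4))
```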

The array processor described in the above embodiment has the following effects.

(1) Vector/array operations are divided between a central vector processing unit, which performs vector operations, and a plurality of vector processing units, which perform array operations, and the vector registers and scalar registers of the respective units serve as the interface between them. With this configuration, not only operations between array data and array data or between vector data and vector data, but also operations between array data and vector data, between vector data and scalar data, and so on can be processed at high speed.

(2) Because vector processors having vector registers are used for array processing, intermediate values of operations can be kept in the vector registers, which reduces the load on main memory, the bottleneck of conventional parallel computers.

(3) Because each of the plurality of vector processing units for array operations is provided with a scalar register that specifies the vector length AL, array operations can be performed not only on rectangular regions but also on regions of arbitrary shape.

(4) Because the instruction specification that defines the operation of the array processor of the present invention is divided into an instruction code part, which specifies the operation, and a part that indicates the data format of the operands, a simple decode circuit for the data-format part readily determines the combination of operand data formats, such as array data with array data or array data with vector data.

In addition, the same instruction code can be used for different combinations of operand data formats.

(5) Because the processors of the vector/array operation section are organized hierarchically, with a central vector processing unit and a plurality of vector processing units, the setup processing for the plurality of vector processing units that perform array operations can be carried out by vector operations and is therefore fast. In a configuration without a central vector processing unit, the setup processing would be performed serially by the scalar processing unit and would severely limit performance.

In the above embodiment, for simplicity, the contents of vector register VCi of the central vector processing unit VPC and of scalar registers S1i to SMi of vector processing units PE1 to PEM are guaranteed to be identical. For register writes that need not be transferred to the other unit (for example, the creation of intermediate values), control that omits the transfer can easily be realized by designating register numbers for which no transfer is performed.

As described above, according to the present invention, array operations can be executed at high speed through the combined effect of faster vector processing by vector processors using vector registers and the parallelization of that processing.

[Brief explanation of the drawing]

FIG. 1 is a diagram showing an example of a FORTRAN program for an operation that is processed as an array operation.
FIG. 2 is a diagram showing the execution procedure of the FORTRAN program of FIG. 1 on a conventional vector processor.
FIG. 3 is an overall configuration diagram of an array processor according to the present invention.
FIG. 4(a) is an explanatory diagram of the registers used in the array processor of FIG. 3, and FIGS. 4(b) to 4(d) are diagrams showing the conceptual scalar, vector, and array registers, respectively.
FIG. 5 is a diagram showing the format of a vector instruction or array instruction for operating the array processor of FIG. 3.
FIG. 6 is an explanatory diagram of the types of registers designated by the instruction of FIG. 5.
FIG. 7 is a diagram showing different usage examples of instructions in the apparatus of FIG. 3.
FIG. 8 is a diagram showing the configuration of the central vector processing unit used in the apparatus of FIG. 3.
FIG. 9 is a diagram showing the configuration of a vector processing unit used in the apparatus of FIG. 3.
FIG. 10 is a diagram showing the configuration of the operation result send request circuit used in the apparatus of FIG. 3.
FIG. 11 is a diagram showing an example of a FORTRAN program requiring array processing and the corresponding vector/array instruction sequence.
FIG. 12 is a diagram showing another example of a FORTRAN program for array processing and an example of the corresponding vector/array instruction sequence.
FIG. 13 is a diagram showing the arrangement of array data in memory.

Claims

[Claims]

1. An array processor comprising: first means for executing a vector instruction that specifies an operation on one-dimensional array data, thereby obtaining one-dimensional array data or scalar data; and a plurality of second means for executing an array instruction that specifies an operation on two-dimensional array data, each of the second means executing the operation specified by the array instruction on one of the one-dimensional array data constituting the two-dimensional array data, obtaining one element of the resulting one-dimensional array data when the array instruction requests one-dimensional array data as its result, and obtaining one of the one-dimensional array data constituting the resulting two-dimensional array data when the array instruction requests two-dimensional array data as its result; wherein the first means comprises a plurality of first vector registers, a plurality of first scalar registers, a first pipeline arithmetic unit, means for fetching a vector instruction or an array instruction, means for decoding whether the fetched instruction is a vector instruction or an array instruction, means for controlling the first vector registers, the first scalar registers and the first pipeline arithmetic unit to execute the decoded instruction when it is a vector instruction, means for sending each element of the one-dimensional array data to a corresponding one of the plurality of second means when the result of the fetched vector instruction is one-dimensional array data, and means for storing the elements of one-dimensional array data calculated by the plurality of second means in one of the first vector registers; and wherein each of the second means comprises a plurality of second vector registers, a plurality of second scalar registers, a second pipeline arithmetic unit, means for controlling the second vector registers, the second scalar registers and the second pipeline arithmetic unit, when the fetched instruction is an array instruction, to execute the fetched array instruction on one of the one-dimensional array data of the two-dimensional array data specified by that instruction, means for sending the element to the first means when the execution result is one element of one-dimensional array data, and means for storing one of the elements sent by the first means in one of the second scalar registers.

2. An array processor comprising: a main memory that stores scalar data, vector data and array data, and that stores scalar instruction sequences and vector/array instruction sequences separately; scalar instruction execution means for reading a scalar instruction sequence and scalar data from the main memory, decoding and executing the read scalar instructions, and storing the scalar data obtained by the execution in the main memory; and vector/array instruction execution means for reading a vector/array instruction sequence and vector data or array data from the main memory, executing the instructions, and storing the vector or array data obtained by the execution in the main memory; wherein the scalar instruction execution means has means for generating, in response to a scalar instruction, the data necessary for executing the vector/array instruction sequence.

3. The array processor according to claim 2, wherein the vector/array instruction execution means comprises: main vector instruction execution means for reading a vector/array instruction from the main memory, distinguishing whether it is a vector instruction or an array instruction and, if it is a vector instruction, reading vector data from the main memory, executing the instruction, and storing the vector data obtained by the execution in the main memory; and array instruction execution means for, if the vector/array instruction is an array instruction, reading array data from the main memory, executing the instruction, and storing the array data obtained by the execution in the main memory.

4. The array processor according to claim 3, wherein the array instruction execution means comprises a plurality of vector instruction execution means that decode the array instruction as a vector instruction, read the array data from the main memory in parallel as a plurality of vector data, execute the instruction on them in parallel, and store the plurality of vector data obtained by the execution in the main memory in parallel as array data.

5. The array processor according to claim 4, wherein the main vector instruction execution means has means for transferring, to the plurality of vector instruction execution means, the data necessary for their execution and the vector data necessary for the array instruction.

6. The array processor according to claim 3, wherein the plurality of vector instruction execution means have means for transferring, to the main vector instruction execution means, the vector data necessary for vector instruction execution performed by the main vector instruction execution means.

7. The array processor according to claim 4, wherein each vector instruction execution means constituting the array instruction execution means has means for holding, and using for control, the following values transferred by the transfer means: the length of the vector that the vector instruction execution means is to process independently, and the start address and address increment value of the vector data in the main memory that it is to process.
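
The division of work described in the claims can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the names `VectorDescriptor`, `load_vector`, `store_vector`, `array_inner_products` and `array_add` are hypothetical, main memory is modelled as a flat Python list, and the parallel, pipelined execution units are simulated by an ordinary sequential loop. Each descriptor stands for the control values one execution unit is said to hold (start address, address increment, vector length). On this sketch's reading of the claims, an array instruction with a vector result yields one element per execution unit, and an array instruction with an array result yields one vector per execution unit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VectorDescriptor:
    """Control values held by one execution unit (hypothetical names)."""
    start: int   # start address of the vector in main memory
    stride: int  # address increment between consecutive elements
    length: int  # number of elements this unit processes


def load_vector(memory: List[int], d: VectorDescriptor) -> List[int]:
    """Read one strided vector out of the flat main memory."""
    return [memory[d.start + i * d.stride] for i in range(d.length)]


def store_vector(memory: List[int], d: VectorDescriptor, values: List[int]) -> None:
    """Write one strided vector back into the flat main memory."""
    for i, v in enumerate(values):
        memory[d.start + i * d.stride] = v


def array_inner_products(memory, xs, ys):
    """Array instruction with a vector result: unit k produces element k."""
    result = []
    for x, y in zip(xs, ys):  # each iteration stands for one execution unit
        a, b = load_vector(memory, x), load_vector(memory, y)
        result.append(sum(p * q for p, q in zip(a, b)))
    return result  # collected by the central unit as one vector


def array_add(memory, xs, ys, zs):
    """Array instruction with an array result: unit k produces one vector."""
    for x, y, z in zip(xs, ys, zs):  # each iteration stands for one unit
        a, b = load_vector(memory, x), load_vector(memory, y)
        store_vector(memory, z, [p + q for p, q in zip(a, b)])


# Two 3x2 arrays A and B stored column by column, followed by space for C.
memory = [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60] + [0] * 6
A = [VectorDescriptor(0, 1, 3), VectorDescriptor(3, 1, 3)]
B = [VectorDescriptor(6, 1, 3), VectorDescriptor(9, 1, 3)]
C = [VectorDescriptor(12, 1, 3), VectorDescriptor(15, 1, 3)]

print(array_inner_products(memory, A, B))  # [140, 770]
array_add(memory, A, B, C)
print(memory[12:])  # [11, 22, 33, 44, 55, 66]
```

Holding the start address and increment per unit means the same instruction can walk columns (increment 1 in this layout) or rows (increment 3) without changing the instruction itself; only the descriptors differ.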