JP2834298B2

JP2834298B2 - Data processing device and data processing method

Info

Publication number: JP2834298B2
Application number: JP24755790A
Authority: JP
Inventors: Masahiko Saito (雅彦齊藤); Kenichi Kurosawa (憲一黒沢); Yoshiki Kobayashi (小林芳樹)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1990-09-19
Filing date: 1990-09-19
Publication date: 1998-12-09
Anticipated expiration: 2013-12-09
Also published as: JPH04127351A

Description

[Field of Industrial Application]

The present invention relates to high-speed computer processing. More particularly, it relates to a variable-structure data processing device and data processing method that change the way a program is executed according to the parallelism inherent in the program.

[Conventional technology]

A data processing device that has a plurality of processor elements, each executing instructions in parallel, is generally organized in one of two ways. The first is a multiprocessor computer system, and the second is a superscalar computer system.

In a multiprocessor computer system, a plurality of processor elements operate while sharing a memory device or a cache storage device. Each processor element contains a program counter that indicates the program's execution position. Each processor element fetches instructions from the position indicated by its own program counter and executes them. In other words, each processor element independently executes a separate program.

In a superscalar computer system, a plurality (n) of processor elements also operate while sharing a cache storage device and registers. Here, however, the processor elements synchronously execute, in parallel, multiple instructions that belong to a single program. All registers are shared, including the program counter. n instructions are fetched starting at the position indicated by the program counter, and each is executed by one processor element. One type of superscalar computer system is the VLIW (very long instruction word) computer system. In a VLIW computer system, a single instruction specifies multiple operations and controls, and these are distributed among the processor elements for execution.

Both multiprocessor and superscalar data processing devices are speed-up techniques that have long been applied in many fields. The following three documents are cited as examples of the multiprocessor, superscalar, and VLIW computer systems, respectively.

1) Multiprocessor: S. Thakkar et al., "The Balance Multiprocessor System," IEEE Micro, 1989, 2, pp. 57-69.
2) Superscalar: K. Murakami et al., "SIMP (Single Instruction stream/Multiple instruction Pipelining): A Novel High-Speed Single-Processor Architecture," Proc. 16th International Symposium on Computer Architecture, 1989, pp. 78-85.
3) VLIW: H. Hagiwara et al., "A Dynamically Microprogrammable Computer with Low-Level Parallelism," IEEE Trans. C-29, No. 7, 1980, pp. 577-595.

[Problem to be Solved by the Invention]

The superscalar computer system described above must execute n instructions (n operations in a VLIW computer system) synchronously and in parallel. It is therefore important to minimize the communication delay between processor elements, so all processor elements must be placed on one LSI chip or one board. Recent advances in VLSI technology have made it possible to fabricate LSI chips with feature sizes of 0.5 μm, 0.3 μm, and even 0.1 μm. As a result, this communication-delay requirement is increasingly being met.

Suppose, however, that hardware is built in which n processor elements execute instructions synchronously and in parallel. If no parallelism can be found in the program being executed, only a limited part of that hardware can be used. As an extreme example, consider a program with no parallelism at all between consecutive instructions. Only one of the n processor elements actually operates, and the remaining (n − 1) processor elements are wasted. Suppose there are n such programs and the execution time of program i is $S_i$ seconds ($i = 1, \dots, n$). The total execution time of all the programs is then $\sum_{i=1}^{n} S_i$.

The same programs can sometimes be executed very efficiently on a multiprocessor computer system. This holds when the programs may be executed in any order, which is a common assumption, for example when they belong to n different users. In that case, the n processor elements can each execute one program independently. Ideally, all the programs then finish in $\max_i S_i$, the longest of the individual execution times. In practice, contention between processor elements for the cache storage device and the memory device lengthens the execution time. Even so, the multiprocessor approach remains very advantageous compared with a superscalar computer system.
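The timing argument above can be checked with a small worked example. This minimal sketch adopts the paragraph's idealizations:

- In the superscalar case, no program has internal parallelism, so the programs run back to back.
- In the multiprocessor case, each program gets its own processor element, and cache and memory contention is ignored.

The function names and the sample times are illustrative assumptions.

```python
def superscalar_total_time(times):
    """No intra-program parallelism: the programs run back to back on the
    synchronized processor elements, so the total is the sum of S_i."""
    return sum(times)


def multiprocessor_total_time(times, n_pe):
    """Ideal multiprocessor case: one program per processor element, all
    running concurrently, so the batch ends when the longest program does.
    Contention for the cache and memory is ignored."""
    if len(times) > n_pe:
        raise ValueError("this idealized model assumes at most one program per PE")
    return max(times)


# Four programs from different users, execution times S_1..S_4 in seconds, n = 4.
S = [3.0, 5.0, 2.0, 4.0]
print(superscalar_total_time(S))        # 14.0
print(multiprocessor_total_time(S, 4))  # 5.0
```

Real contention would push the multiprocessor figure above 5.0 seconds, but not anywhere near the serial total in this example.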

Conversely, there are situations in which a multiprocessor computer system is at a disadvantage. Examples include an environment where multiple programs are not executed simultaneously, and a configuration with more processor elements than programs. A multiprocessor computer system can achieve a degree of parallelism no greater than the number of programs present. If the number of programs is p and the number of processor elements n exceeds p, (n − p) processor elements are wasted. A superscalar computer system, by contrast, can often find parallelism even within a single program. It therefore frequently avoids wasting processor elements even when there are few programs.

A first object of the present invention is to compensate for the complementary problems of the superscalar and multiprocessor computer systems described above. Specifically, the object is to realize hardware whose operating mode can be set, according to the degree of parallelism in a program, to one of two operations:

- the operation of a superscalar computer system, called "parallel operation"; or
- the operation of a multiprocessor computer system, called "multiprocessor operation".
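The following sketch makes the trade-off concrete. It estimates how many processor elements each mode leaves idle and picks the mode that wastes fewer. The selection policy, the `ilp` estimate of a single program's parallelism, and all names are hypothetical. The text states only that the mode is set according to the degree of parallelism.

```python
from enum import Enum


class Mode(Enum):
    PARALLEL = "parallel"              # superscalar-style operation
    MULTIPROCESSOR = "multiprocessor"  # independent processor elements


def idle_pes(mode, n_pe, n_programs, ilp):
    """Estimate idle processor elements.

    n_programs: number of runnable programs (p in the text)
    ilp: hypothetical estimate of how many instructions one program
         can issue per cycle (its internal parallelism)
    """
    if mode is Mode.MULTIPROCESSOR:
        # Parallelism is bounded by p, so (n - p) elements idle when p < n.
        return max(n_pe - n_programs, 0)
    # Parallel operation runs one program; unused issue slots are idle.
    return n_pe - min(max(ilp, 1), n_pe)


def choose_mode(n_pe, n_programs, ilp):
    """Hypothetical policy: prefer the mode that leaves fewer elements idle
    (ties go to multiprocessor operation)."""
    mp = idle_pes(Mode.MULTIPROCESSOR, n_pe, n_programs, ilp)
    par = idle_pes(Mode.PARALLEL, n_pe, n_programs, ilp)
    return Mode.MULTIPROCESSOR if mp <= par else Mode.PARALLEL


print(choose_mode(n_pe=4, n_programs=4, ilp=1).value)  # multiprocessor
print(choose_mode(n_pe=4, n_programs=1, ilp=4).value)  # parallel
```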

A second object of the present invention is to make the VLSI design techniques used to realize parallel operation useful in multiprocessor operation as well. As noted at the beginning of this section, parallel operation (a superscalar computer system) requires that the processor elements be placed on the same LSI chip or the same board. Such placement is not particularly required for ordinary multiprocessor operation (a multiprocessor computer system). The present invention, however, considers cases in which a plurality of programs run in cooperation with one another. It aims to exploit, in multiprocessor operation as well, the fact that the processor elements are placed close to each other for parallel operation. Specifically, the second object is to provide means for fast synchronization and communication between a plurality of programs.

[Means for solving the problem]

The first object is achieved by the following three means:

1) a multiprocessor operating mechanism that operates the n processor elements independently;
2) a parallel operating mechanism that operates the n processor elements in synchronization with a basic clock; and, where necessary,
3) a parallelization flag indicating whether the data processing device of the present invention is executing in parallel operation or in multiprocessor operation.
The parallel operating mechanism is provided within each processor element.

The second object is achieved by the following three means:

a) a communication bus for transmitting and receiving data, addresses, control signals, and the like between processor elements;
b) a bus arbitration circuit for managing the communication bus; and
c) a synchronization facility for synchronizing the processor elements.
These three means are components of the multiprocessor operating mechanism.

[Action]

The multiprocessor operating mechanism allows the n processor elements of the data processing device of the present invention to operate independently. It also provides a means by which the programs running on the processor elements can synchronize and communicate at high speed.

The parallel operating mechanism synchronizes the n processor elements and, during parallel operation, shares registers among them. When an instruction references a register, the parallel operating mechanism in each processor element checks whether that register belongs to its own processor element. If it does, the mechanism sends the register's contents to the processor element that executed the instruction.
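The following sketch illustrates the register-sharing lookup just described. Every processor element checks the referenced register number, and only the owner returns the value. It assumes, purely for illustration, that each processor element owns a contiguous block of global register numbers. The text does not say how register numbers are assigned to processor elements, and the class and function names are hypothetical.

```python
class ProcessorElement:
    """Hypothetical model: each PE owns a contiguous slice of the shared
    register numbers (the partitioning scheme is an assumption)."""

    def __init__(self, pe_id, regs_per_pe):
        self.base = pe_id * regs_per_pe
        self.regs = [0] * regs_per_pe

    def owns(self, reg):
        return self.base <= reg < self.base + len(self.regs)

    def respond(self, reg):
        """Return the value if this PE owns `reg`, otherwise None."""
        return self.regs[reg - self.base] if self.owns(reg) else None


def read_shared_register(pes, reg):
    """Every PE checks the register number; only the owner answers, as if
    placing the value on the global read bus."""
    answers = [v for v in (pe.respond(reg) for pe in pes) if v is not None]
    if len(answers) != 1:
        raise IndexError(f"register r{reg} has no unique owner")
    return answers[0]


pes = [ProcessorElement(0, 8), ProcessorElement(1, 8)]
pes[1].regs[2] = 42                    # global register r10 lives in PE 1
print(read_shared_register(pes, 10))   # 42
```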

The parallelization flag specifies whether the data processing device of the present invention is executing in multiprocessor operation or in parallel operation. When the flag specifies multiprocessor operation, the parallel operating mechanism is suppressed. When it specifies parallel operation, the multiprocessor operating mechanism is suppressed.

The communication bus is used during multiprocessor operation when processor elements cooperate to execute programs. It carries data transfers and similar traffic.

The bus management device manages the right to use the communication bus. When processor elements use the communication bus for data transfer or the like, it grants the right to use the bus according to the priority order of the processor elements.
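Fixed-priority arbitration is one simple reading of "according to the priority order of the processor elements". The text does not say whether the priority is fixed or rotates, so the ordering below is an assumption and the function name is hypothetical.

```python
def arbitrate(requests, priority):
    """Grant the communication bus to one requesting processor element.

    requests: set of processor-element ids requesting the bus this cycle
    priority: processor-element ids, highest priority first (hypothetical
              fixed order)
    Returns the granted id, or None when no element is requesting.
    """
    for pe in priority:
        if pe in requests:
            return pe
    return None


print(arbitrate({1, 2}, priority=[0, 1, 2]))  # 1
```

With a fixed order, a low-priority element can starve under heavy traffic. A round-robin scheme would avoid that, but the source does not specify either scheme.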

When data transfer and synchronization take place between processor elements, the synchronization facility holds the transferred data and stops and restarts the processor elements.
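The role of the synchronization facility can be modeled in software as a mailbox. It holds the transferred word, stops a receiving processor element until data arrives, and restarts it when the data is sent. This is a minimal sketch using Python's `threading.Condition`, with threads standing in for processor elements. The one-word slot per receiver is an assumption, so a second `send` before `receive` overwrites the first. The class is hypothetical, not the patent's circuit.

```python
import threading


class SyncFacility:
    """Hypothetical model: one held word per receiving processor element."""

    def __init__(self):
        self._cond = threading.Condition()
        self._slots = {}

    def send(self, dst, value):
        with self._cond:
            self._slots[dst] = value   # hold the transferred data
            self._cond.notify_all()    # restart any stalled receiver

    def receive(self, pe_id, timeout=None):
        with self._cond:
            # Stop this processor element until data addressed to it arrives.
            if not self._cond.wait_for(lambda: pe_id in self._slots, timeout):
                raise TimeoutError(f"PE {pe_id} timed out waiting for data")
            return self._slots.pop(pe_id)


facility = SyncFacility()
received = []
consumer = threading.Thread(target=lambda: received.append(facility.receive(1, timeout=5)))
consumer.start()
facility.send(1, 99)
consumer.join()
print(received)  # [99]
```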

[Embodiment]

An embodiment of the present invention will now be described with reference to the drawings.

FIG. 1 shows the overall configuration of a data processing device according to the present invention. A2 and A3 are processor elements, which perform operations using instructions and data read from the cache storage device or the memory device. FIG. 1 shows two processor elements, but the present invention does not limit their number. This embodiment is described below assuming two processor elements. A4 is a memory device that stores programs, data, and the like. A5 is a cache storage device that temporarily holds instructions and data read from the memory device. The cache storage device is assumed to be a multiport cache. With n processor elements, it has at least n input/output ports (or n input ports and n output ports), plus n address input ports. A11 is the basic clock, used as the clock for processor elements A2 and A3 and for the multiprocessor operating mechanism A8.

The n processor elements have two modes:

- multiprocessor operation, in which each processor element operates independently; and
- parallel operation, in which the processor elements operate in synchronization with the basic clock.

Two kinds of control mechanism are provided to control these modes: the multiprocessor operating mechanism A8 and the parallel operating mechanisms A9 and A10. The parallel operating mechanisms A9 and A10 synchronize the n processor elements and share registers among them. They also check for register conflicts between the instructions executed by the processor elements. When a conflict occurs, they delay execution of the instruction in one processor element to resolve it. Hardware that neither checks for nor resolves register conflicts is also conceivable. In that case, processor A1 operates as a VLIW processor rather than a superscalar processor.

A parallelization flag A7 is also provided. It indicates whether the data processing device is executing in the multiprocessor operation mode or the parallel operation mode. When flag A7 indicates the multiprocessor operation mode, the parallel operating mechanisms A9 and A10 in the processor elements are suppressed. When it indicates the parallel operation mode, the multiprocessor operating mechanism A8 suppresses its own operation.

The parallelization flag A7 is set by a privileged instruction that only the OS can execute. Alternatively, an external input may be used in place of the parallelization flag A7.
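The flag's two properties can be captured in a small software model. First, only privileged code may change it. Second, its value decides which control mechanism is suppressed. The sketch below is hypothetical: it models the privileged instruction as a method that checks a `supervisor` argument, and the reset value of the flag is an assumption the text does not state.

```python
from enum import Enum


class Mode(Enum):
    MULTIPROCESSOR = 0
    PARALLEL = 1


class PrivilegeError(Exception):
    """Raised when non-privileged code tries to change the flag."""


class ParallelizationFlag:
    """Hypothetical software model of parallelization flag A7."""

    def __init__(self):
        self.mode = Mode.MULTIPROCESSOR  # assumed reset value, not from the text

    def set_mode(self, mode, supervisor):
        # Models the privileged instruction: only the OS may change A7.
        if not supervisor:
            raise PrivilegeError("setting the parallelization flag is privileged")
        self.mode = mode

    def multiprocessor_mechanism_enabled(self):
        # A8 suppresses its own operation in the parallel operation mode.
        return self.mode is Mode.MULTIPROCESSOR

    def parallel_mechanisms_enabled(self):
        # A9 and A10 are suppressed in the multiprocessor operation mode.
        return self.mode is Mode.PARALLEL


flag = ParallelizationFlag()
flag.set_mode(Mode.PARALLEL, supervisor=True)
print(flag.parallel_mechanisms_enabled(), flag.multiprocessor_mechanism_enabled())
# True False
```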

A6 is a cache control device. It changes the read/write data width and the number of data items of the cache storage device A5 according to the value of the parallelization flag A7.

- Multiprocessor operation mode: the n processor elements read and write instructions and data asynchronously. If the read/write bit width of one processor element is k, the cache storage device A5 is configured with n address inputs and n k-bit inputs/outputs.
- Parallel operation mode: the n processor elements read and write instructions and data synchronously. Instructions are read with one address input and one (n × k)-bit output. Data are read and written in the same way as in the multiprocessor operation mode.
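The port configurations that cache control device A6 selects can be tabulated by a small function. It shows how the per-mode widths described above follow from n and k. This is a descriptive sketch: `PortConfig` and the function names are hypothetical, and the code models only the port counts and widths, not the cache itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortConfig:
    address_inputs: int  # number of independent address ports in use
    data_ports: int      # number of read/write (or read) ports in use
    bits_per_port: int   # width of each port in bits


def instruction_port_config(parallel, n, k):
    """Instruction-side configuration chosen by cache control device A6."""
    if parallel:
        # One fetch address; n consecutive k-bit instructions on one wide output.
        return PortConfig(address_inputs=1, data_ports=1, bits_per_port=n * k)
    # Each processor element fetches asynchronously through its own port.
    return PortConfig(address_inputs=n, data_ports=n, bits_per_port=k)


def data_port_config(n, k):
    """Data accesses use n independent k-bit ports in both modes."""
    return PortConfig(address_inputs=n, data_ports=n, bits_per_port=k)


print(instruction_port_config(parallel=True, n=2, k=32))
# PortConfig(address_inputs=1, data_ports=1, bits_per_port=64)
print(instruction_port_config(parallel=False, n=2, k=32))
# PortConfig(address_inputs=2, data_ports=2, bits_per_port=32)
```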

Processor A1 consists of the following components:

- processor elements A2 and A3 (including parallel operating mechanisms A9 and A10);
- cache storage device A5;
- cache control device A6;
- parallelization flag A7;
- multiprocessor operating mechanism A8; and
- basic clock A11.

These functional parts of processor A1 could be placed on different boards. In this embodiment, however, the whole of processor A1 is assumed to be implemented on a single LSI chip or a single board. FIG. 1 shows processor A1 connected one-to-one with memory device A4. A shared-memory multiprocessor configuration is also possible, in which a plurality of processors are connected to memory device A4 via a communication bus, a network, or the like.

a1 to a9 are signal lines, including data lines, address lines, and control lines:

- a1 is the output line of the parallelization flag A7. It carries 1 bit of information indicating whether the device is in the multiprocessor operation mode or the parallel operation mode. This bit goes to the cache control device A6, the multiprocessor operating mechanism A8, and the parallel operating mechanisms A9 and A10.
- a2 and a3 are data lines used to transfer data or instructions between processor elements A2, A3 and the cache storage device A5. They have an access width of 32 to 128 bits.
- a4 and a5 are address lines that specify the addresses of data and instructions. The cache control device A6 sends the address signals a4 and a5 to the cache storage device A5 according to the operation mode indicated by the parallelization flag A7.
- a6 and a7 are signal lines connecting the multiprocessor operating mechanism A8 with processor elements A2 and A3. They are used in the multiprocessor operation mode, and each includes address, data, and control lines.
- a8 is the output signal of the basic clock A11. It supplies the clock to processor elements A2, A3 and the multiprocessor operating mechanism A8.
- a9 collectively denotes the data and control lines used for communication between the multiprocessor operating mechanisms in a multiprocessor system with a plurality of processors A1.

FIG. 2 shows the overall configuration when the cache storage device A5 of the present invention is divided into an instruction cache and a data cache. A1 to A4, A6 to A11, a1, and a4 to a9 denote the same mechanisms and signal lines as in FIG. 1. B1 is an instruction cache and B2 is a data cache; these are cache storage devices that hold instructions and data separately. B3 is a selector used during block transfers between the caches and the memory device. If the transferred item is an instruction, B3 connects instruction cache B1 to memory device A4. If it is data, B3 connects data cache B2 to memory device A4.

b1, b2, b3, and b4 are data lines. b1 and b2 carry instructions between instruction cache B1 and processor elements A2, A3. b3 and b4 carry data between data cache B2 and processor elements A2, A3. Processor elements A2 and A3 never write to instruction cache B1, so data lines b1 and b2 from instruction cache B1 operate in output mode only.

Both instruction cache B1 and data cache B2 are assumed to be multiport caches with n address input ports and n input/output ports. Instruction cache B1 has output ports only. Because the data cache is multiported, the n processor elements read and write data independently in both the multiprocessor and parallel operation modes. The n address inputs of data cache B2 therefore use the address lines from the processor elements directly.

Instruction fetch works differently. During multiprocessor operation, each processor element fetches instructions independently. During parallel operation, the processor elements fetch n consecutive instructions, and each executes one of them. The cache control therefore depends on the parallelization flag A7:

- Multiprocessor operation mode: the address lines of the n processor elements are input to instruction cache B1 independently and unchanged.
- Parallel operation mode: only the address line of the first processor element is used.

The configuration of the cache control device A6 that performs this control is described with reference to FIG. 4 and subsequent drawings.
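The address selection for instruction fetch can be modeled in a few lines. In parallel mode only the first processor element's address is used, and the cache returns n consecutive instructions. In multiprocessor mode every address goes through unchanged. This hypothetical sketch treats the instruction cache as a Python list indexed by instruction address and gives each processor element one instruction per fetch, following the n-instruction description above. In the two-wide processor elements of FIG. 3, each would receive two.

```python
def icache_fetch(icache, pe_addresses, parallel):
    """Return the instruction delivered to each processor element.

    icache: hypothetical instruction cache, indexed by instruction address
    pe_addresses: fetch address driven by each of the n processor elements
    parallel: value of parallelization flag A7 (True = parallel mode)
    """
    n = len(pe_addresses)
    if parallel:
        # Only the first processor element's address line is used; the cache
        # returns n consecutive instructions, one per processor element.
        base = pe_addresses[0]
        return icache[base:base + n]
    # Multiprocessor mode: every address line reaches the cache unchanged.
    return [icache[a] for a in pe_addresses]


icache = ["i0", "i1", "i2", "i3", "i4"]
print(icache_fetch(icache, [1, 3], parallel=True))   # ['i1', 'i2']
print(icache_fetch(icache, [1, 3], parallel=False))  # ['i1', 'i3']
```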

FIG. 3 shows the internal configuration of processor elements A2 and A3, including parallel operating mechanisms A9 and A10. Each processor element is itself assumed to be a superscalar processor that can execute two instructions in parallel. During multiprocessor operation, each processor element independently executes two instructions in parallel. During parallel operation, the two processor elements together execute four instructions in parallel, synchronized with the basic clock. In the figure, A1 to A4, A6 to A8, A11, a1, and a4 to a9 denote the same mechanisms and signal lines as in FIG. 1. B1 to B3 and b1 to b4 denote the same mechanisms and signal lines as in FIG. 2.

The components of FIG. 3 are as follows:

- D1 to D4 are instruction registers, which temporarily hold instructions read from instruction cache B1.
- D5 to D8 are decoders that decode the instructions stored in the instruction registers.
- D9 and D10 are selectors that control the decoders differently during multiprocessor operation and parallel operation.
- D11 to D14 are arithmetic units.
- D15 and D16 are register files, each shared by two arithmetic units.
- D17 is a gate for passing register values between the processor elements.

d1 and d2 are read buses for reading register values, and d3 and d4 are store buses for storing data into the registers. A global read bus d5 and a global store bus d6 let the processor elements reference each other's register values during parallel operation. d7 to d10 are signal lines for controlling decoders D5 to D8. In the figure, selectors D9 and D10, gate D17, and signal lines d5 to d10 constitute the parallel operating mechanisms A9 and A10.

Processor elements A2 and A3 each execute two instructions in parallel. During multiprocessor operation they do so independently, and during parallel operation they do so in synchronization with the basic clock. If an instruction is k bits long, signal lines b1 and b2, which connect instruction cache B1 to instruction registers D1 to D4, each need a data width of 2 × k bits (two instructions). k bits go to each instruction register.

Consider first how processor element A2 executes two instructions in parallel. The instructions held in the first and second instruction registers D1 and D2 are decoded by the first and second decoders D5 and D6, respectively. The decoded instructions control arithmetic units D11 and D12 and register file D15. The contents of the registers specified by each instruction's operands are read from register file D15 via read bus d1. The operations specified by each instruction's operation code are then executed in parallel on arithmetic units D11 and D12. The results are stored in register file D15 via store bus d3. The second processor element A3 likewise executes two instructions in parallel.
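Each 2 × k-bit transfer on b1 or b2 is split into two k-bit instructions, one per instruction register. The sketch below shows the bit slicing. It assumes the earlier instruction sits in the most significant bits, an ordering the text does not specify. The function name is hypothetical.

```python
def split_fetch_word(word, k, count=2):
    """Split a (count * k)-bit fetch word into `count` k-bit instructions.

    Assumes the earlier instruction occupies the most significant bits;
    the text does not specify the bit order.
    """
    mask = (1 << k) - 1
    return [(word >> (k * (count - 1 - i))) & mask for i in range(count)]


# k = 16: one 32-bit transfer on line b1 fills instruction registers D1 and D2.
d1, d2 = split_fetch_word(0x12345678, k=16)
print(hex(d1), hex(d2))  # 0x1234 0x5678
```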

Among the instructions fetched simultaneously by processor elements A2 and A3, however, some combinations cannot be executed in parallel. This is the case when a register conflict occurs between the first and second instructions. Suppose the destination register of the first instruction (the register that receives its result) is also a source register of the second instruction (a register holding a value the second instruction must read). The two instructions then cannot be executed in parallel. The first instruction must execute, and the second must be delayed until that result is available.

The parallel operating mechanisms A9 and A10 have two roles: resolving such register conflicts and sharing registers between the processor elements. The present invention assumes that parallel processing within processor elements A2 and A3 behaves like a superscalar processor. Register conflicts must therefore be resolved in hardware to keep compatibility with conventional processors. If processor elements A2 and A3 instead performed parallel processing as VLIW processors, this resolution would be unnecessary. In that case, the compiler must generate instruction sequences in which register conflicts cannot occur.

First, consider how the parallel operation mechanisms A9 and A10 resolve register conflicts. The first instruction (held in instruction register D1) comes first in program order, so the other instructions executed alongside it do not affect it. The second instruction (held in instruction register D2) should logically execute after the first. Suppose the destination register of the first instruction is also a source register of the second. Unless the second instruction executes after the first, it cannot read the correct source value and may compute a wrong result.

The decoder D5 therefore passes the destination register number of the first instruction to the decoder D6 over signal line d8. The decoder D6 compares the source register numbers of the second instruction with this number. On a match, D6 suppresses its own decoding, so that the second instruction executes only after the arithmetic unit D11 has performed the operation of the first instruction.
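The check made by decoder D6 can be sketched in a few lines of Python. The `Instr` record and the function names are hypothetical and do not come from the patent. As in the text, only the case "destination of the first instruction = source of the second" is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical register-to-register instruction: one destination, several sources."""
    dest: int
    srcs: tuple[int, ...]


def must_delay_second(first: Instr, second: Instr) -> bool:
    """Role of decoder D6: compare the second instruction's source register numbers
    with the first instruction's destination register number (sent over line d8)."""
    return first.dest in second.srcs


def issue_pair(first: Instr, second: Instr) -> list[list[Instr]]:
    """Return the issue groups: one group if independent, two if the second must wait."""
    if must_delay_second(first, second):
        return [[first], [second]]
    return [[first, second]]
```

Consider `r3 <- r1 + r2` followed by `r5 <- r3 - r4`. The second instruction reads r3, so `issue_pair` splits the pair into two groups. A pair that shares only source registers is issued together.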

The second processor element A3 works the same way, but its behavior differs between multiprocessor operation and parallel operation.

In multiprocessor operation, the processor elements run independently, so A3 is not affected by A2 at all. No register conflict can arise for the third instruction (held in instruction register D3). For the fourth instruction (held in instruction register D4), only a conflict between the destination register of the third instruction and the source registers of the fourth needs to be resolved.

In parallel operation, A3 executes instructions in lockstep with A2, so it is affected by the first and second instructions executing on A2. The third instruction must be checked against the destination registers of the first and second instructions. The fourth instruction must be checked against the destination registers of the first through third instructions.

- **Third instruction.** The destination register numbers of the first and second instructions are sent over signal lines d7 and d9. The decoder D7 compares them with the source register numbers of the third instruction. If there is no conflict, the third instruction executes immediately on the arithmetic unit D13. If there is a conflict, D7 suppresses its own decoding at least until the second instruction has completed on the arithmetic unit D12.
- **Fourth instruction.** The destination register numbers of the first through third instructions are sent over signal lines d7 and d10. The decoder D8 compares them with the source register numbers of the fourth instruction. If there is no conflict, the fourth instruction executes immediately on the arithmetic unit D14. Otherwise, D8 suppresses its own decoding at least until the third instruction has completed on the arithmetic unit D13.

Whenever the n-th instruction is held back by a register conflict, the (n + 1)-th instruction is held back as well.
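The sketch below extends the two-instruction check to the four decode slots (slots 0 and 1 in A2, slots 2 and 3 in A3). It assumes that in multiprocessor mode a stall propagates only within the same processor element, since the elements are independent. In parallel mode a stall propagates across all slots. The function names and the `per_pe` parameter are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: one destination register, several source registers."""
    dest: int
    srcs: tuple[int, ...]


def compare_slots(slot: int, parallel: bool, per_pe: int = 2) -> range:
    """Earlier slots whose destination registers slot `slot` must be checked against.

    Multiprocessor mode: only earlier slots of the same PE (D7 gets nothing, D8 gets
    slot 2). Parallel mode: every earlier slot (D7 gets slots 0-1, D8 gets slots 0-2).
    """
    start = 0 if parallel else (slot // per_pe) * per_pe
    return range(start, slot)


def stalled_slots(group: list[Instr], parallel: bool, per_pe: int = 2) -> list[bool]:
    """True for each slot whose decoder must suppress decoding this cycle."""
    stalled: list[bool] = []
    for slot, ins in enumerate(group):
        earlier = compare_slots(slot, parallel, per_pe)
        conflict = any(group[j].dest in ins.srcs for j in earlier)
        # A held-back instruction also holds back the next one in the same domain.
        propagated = (slot - 1) in earlier and stalled[slot - 1]
        stalled.append(conflict or propagated)
    return stalled
```

In a group where slot 2 reads a register written by slot 0, parallel mode stalls slots 2 and 3. Multiprocessor mode stalls neither, because A3 has its own registers.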

Because of this difference between multiprocessor and parallel operation, the control of decoders D7 and D8 in processor element A3 must be switched between the two modes. This is done by the selectors D9 and D10, which are controlled by signal line a1 carrying the value of the parallelization flag A7. In multiprocessor operation, the selectors send decoder D7 a 0 signal (do not suppress decoding) and send decoder D8 the destination register number of the third instruction. In parallel operation, they pass the destination register numbers of the first and second instructions to D7, and those of the first through third instructions to D8.

Next, register sharing, the other role of the parallel operation mechanisms A9 and A10, is described. Within these mechanisms, register sharing is implemented by the gate D17 and the signal lines d5 and d6.

In multiprocessor operation, each processor element runs independently, so identical register numbers refer to physically different registers. A program on processor element A2 uses the registers of register file D15, and a program on processor element A3 uses those of register file D16. Reads go through the read buses d1 and d2, and writes go through the store buses d3 and d4. In parallel operation, however, the same register number issued by different processor elements must refer to exactly the same register. The registers to be read and written must therefore change with the operating mode.

To solve this, the present invention gives register file D15 twice as many output ports and input ports as register file D16. It also provides a global read bus d5 and a global store bus d6, over which the other processor element reads and writes registers in D15. With n processor elements, register file D15 has n times as many ports as the register file of any other processor element. This organization has drawbacks as the number of processor elements grows, but it should become feasible as VLSI integration increases.

In multiprocessor operation, each processor element accesses its own register file independently. The registers needed to execute two instructions in parallel are read over the read buses d1 and d2, and the results are written back over the store buses d3 and d4. These buses are used only inside each processor element and do not affect the other processor element.

In parallel operation, processor element A3 accesses the registers of register file D15 over the global read bus d5 and the global store bus d6, and register file D16 is not used. This register sharing is controlled by gate D17 on the global read bus d5 and global store bus d6. Signal a1 from the parallelization flag A7 opens and closes the gate. Of the register contents read from D15, the data used by processor element A2 is delivered to the arithmetic unit D11 or D12 over the read bus d1. Results from D11 or D12 are written back over the store bus d6, without any communication with the other processor element.
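The routing decision made by gate D17 can be modelled as choosing which physical register file a processor element addresses. In the toy model below, file 0 stands for D15 and the boolean `parallel` stands for the parallelization flag A7. The class and its parameters (register count, ports per processor element) are hypothetical.

```python
class RegisterFiles:
    """Toy model of n per-PE register files; file 0 plays the role of D15."""

    def __init__(self, n_pes: int, n_regs: int = 32, ports_per_pe: int = 2) -> None:
        self.files = [[0] * n_regs for _ in range(n_pes)]
        self.ports_per_pe = ports_per_pe
        self.parallel = False  # stands in for parallelization flag A7 (signal a1)

    def port_count(self, pe: int) -> int:
        # File 0 must serve every PE in parallel mode, hence n times the ports.
        return self.ports_per_pe * (len(self.files) if pe == 0 else 1)

    def _file(self, pe: int) -> list[int]:
        # Gate D17 open (parallel mode): every PE uses file 0 over the global buses.
        return self.files[0] if self.parallel else self.files[pe]

    def read(self, pe: int, reg: int) -> int:
        return self._file(pe)[reg]

    def write(self, pe: int, reg: int, value: int) -> None:
        self._file(pe)[reg] = value
```

With the flag cleared, register 5 of PE 0 and register 5 of PE 1 are different storage. Once the flag is set, both names resolve to file 0. This mirrors the text's point that the same register number must then denote the same register.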

This concludes the description of processor elements A2 and A3 and of their components, the parallel operation mechanisms A9 and A10. Next, the cache control device A6 is described. It changes the number of reads and the read width of the instruction cache B1 according to whether the system is in multiprocessor or parallel operation.

In multiprocessor operation, each processor element fetches instructions independently. In parallel operation, the processor elements fetch and execute n consecutive instructions. The inputs to the instruction cache B1 therefore depend on the parallelization flag A7:

- When A7 indicates the multiprocessor operation mode, the address lines of the n processor elements are fed unchanged and independently to the instruction cache B1.
- When A7 indicates parallel operation, only the address line of the first processor element is used. The instruction cache B1 receives that address together with the address incremented by 1 through (n-1) instruction lengths.

FIG. 4 shows the simplest configuration of the cache control device A6 for this kind of cache control. The address lines a4 and a5 of the processor elements are fed directly to the data cache B2. E1 is an adder that outputs, on address line e1, the address of the first processor element A2 incremented by one instruction length. E2 is a selector. When the parallelization flag A7 indicates the multiprocessor operation mode, E2 selects address line a5 from the second processor element A3; in the parallel operation mode, it selects address line e1. Thus, in the multiprocessor operation mode each processor element supplies its instruction address independently. In parallel operation, n consecutive instructions can be specified from the address of the first processor element A2.
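For general n, the address generation described above can be sketched as follows. FIG. 4 is the n = 2 case: adder E1 produces the second address, and selector E2 chooses between it and a5. The byte-addressed 4-byte instruction length is an assumption made only for the example.

```python
def cache_addresses(pc: list[int], parallel: bool, instr_len: int = 4) -> list[int]:
    """Addresses presented to the n read ports of instruction cache B1.

    pc[i] is the address line of processor element i.
    Multiprocessor mode: each PE's own address, unchanged.
    Parallel mode: only pc[0] is used; port i gets pc[0] + i * instr_len
    (instr_len = 4 bytes is an illustrative assumption).
    """
    if not parallel:
        return list(pc)
    return [pc[0] + i * instr_len for i in range(len(pc))]
```

In parallel mode, the other processor elements' program counters are simply ignored, just as selector E2 ignores a5.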

Instead of the configuration of FIG. 4, one of the n output ports of the instruction cache B1 can be made an n × m × k-bit output port. Here k is the instruction length in bits, and m is the number of instructions each processor element fetches at once. This configuration is shown in FIG. 5 (for n = m = 2).

F1 is a selector controlled by the output signal a1 of the parallelization flag A7. f1 is the data line from the first output port, and f2 is the data line from the second output port. The address lines a4 and a5 from the processor elements are used directly as the address inputs of the instruction cache B1. Output port f1 is n times as wide as output port f2. Of its bits, m × k are sent to processor element A2 over signal line b1, and the rest are input to selector F1 over signal line f3.

- **Multiprocessor operation.** m × k bits are taken from each output port as the instructions to execute. The remaining (n-1) × m × k bits of the first output port are unused.
- **Parallel operation.** The n × m consecutive instructions (n × m × k bits) taken from the first output port are distributed among the processor elements. The other (n-1) output ports are unused.

Selector F1, controlled by the output signal a1 of the parallelization flag A7, decides whether processor element A3 executes the instructions on signal line f2 or on f3. F1 selects f2 in multiprocessor operation and f3 in parallel operation.
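The port-slicing scheme of FIG. 5 can be sketched with instructions as list elements instead of k-bit fields. This is a simplification: the list indices stand for word-granular instruction addresses, which is an assumption, and the function name is hypothetical.

```python
def fetch_fig5(memory: list[str], pc: list[int], parallel: bool, m: int = 2) -> list[list[str]]:
    """Instructions delivered to each of the n PEs in one fetch.

    memory: instruction words indexed by (assumed word-granular) address.
    Port f1 is n * m instructions wide and addressed by pc[0]; ports for
    PEs 1..n-1 are m instructions wide and addressed by their own pc.
    """
    n = len(pc)
    wide = memory[pc[0]: pc[0] + n * m]                 # port f1
    narrow = [memory[p: p + m] for p in pc[1:]]         # ports f2, ...
    if parallel:
        # Selector picks consecutive m-instruction slices of f1 for each PE.
        return [wide[i * m:(i + 1) * m] for i in range(n)]
    # PE 0 uses the first m instructions of f1; the rest of f1 goes unused.
    return [wide[:m]] + narrow
```

With n = m = 2, parallel mode gives A2 instructions 0 to 1 and A3 instructions 2 to 3 from the single wide read, while the second port's output is discarded.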

Next, an embodiment in which the data processing device of the present invention is used in the multiprocessor operation mode is described.

FIG. 6 shows the functions of the multiprocessor operation mechanism. This mechanism is the communication device between processor elements that the data processing device of the present invention needs in order to operate as a multiprocessor.

Three kinds of interrupts between processor elements are provided:

- an interrupt directed at a specified processor element number;
- an interrupt distributed to all processor elements (ALL);
- an interrupt that only one processor element, whichever it is, needs to accept (ANYONE).
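The three delivery kinds can be expressed as a target-selection function. The text does not say which processor element takes an ANYONE interrupt or what happens when none can. The "lowest-numbered element that can currently accept" rule and the "stays pending" result below are illustrative assumptions, as is the `accepting` input.

```python
from typing import Optional, Sequence


def interrupt_targets(kind: str, n_pes: int, target: Optional[int] = None,
                      accepting: Sequence[bool] = ()) -> list[int]:
    """Return the PEs that take an interrupt of the given kind.

    kind: "PE" (specified PE number), "ALL", or "ANYONE".
    accepting[i]: whether PE i can take an interrupt now (hypothetical input).
    """
    if kind == "PE":
        if target is None or not 0 <= target < n_pes:
            raise ValueError("a valid target PE number is required")
        return [target]
    if kind == "ALL":
        return list(range(n_pes))
    if kind == "ANYONE":
        # Assumed policy: the lowest-numbered PE that can accept takes it.
        for pe in range(n_pes):
            if pe < len(accepting) and accepting[pe]:
                return [pe]
        return []  # assumed: nobody can accept yet, so the interrupt stays pending
    raise ValueError(f"unknown interrupt kind: {kind}")
```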

For fine-grained parallel processing, the control instructions include:

- an instruction that transfers register values between processor elements;
- a stop instruction (WAIT) that puts the processor element into a waiting state until a start signal arrives from another processor element;
- a restart instruction (START) that releases it.

For exclusive control, there are instructions that execute Test & Set or Compare & Swap on a specified address of the memory device. The Compare & Swap instruction reads data from the memory device, tests the value read, and writes to the memory device according to the result. During this sequence, no other processor element may perform a Compare & Swap on the same address.

Each processor element therefore follows a three-step protocol. It issues a request to execute a Compare & Swap instruction to the multiprocessor operation mechanism. Once the request is granted, it performs the Compare & Swap. It then withdraws the request. When several processor elements issue requests, the multiprocessor operation mechanism grants the execution right to only one of them if the requests are for the same address, and to each of them if the requests are for different addresses.
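The request, execute, withdraw protocol can be sketched in software. A per-address "busy" set stands in for the multiprocessor operation mechanism: requests for the same address are refused while one is in flight, and requests for different addresses proceed. The class name, the busy-wait, and the rule that unset memory reads as 0 are assumptions of this sketch, not details of the patent's hardware.

```python
import threading


class CasArbiter:
    """Grants at most one in-flight Compare & Swap per address (hypothetical model)."""

    def __init__(self) -> None:
        self._busy: set[int] = set()
        self._lock = threading.Lock()  # guards the set itself

    def request(self, addr: int) -> bool:
        with self._lock:
            if addr in self._busy:
                return False
            self._busy.add(addr)
            return True

    def release(self, addr: int) -> None:
        with self._lock:
            self._busy.discard(addr)


def compare_and_swap(memory: dict, arbiter: CasArbiter, addr: int,
                     expected: int, new: int) -> bool:
    """Request, read-compare-write, withdraw the request. Returns True if it swapped."""
    while not arbiter.request(addr):  # busy-wait until the right is granted
        pass
    try:
        if memory.get(addr, 0) == expected:  # assumed: unset memory reads as 0
            memory[addr] = new
            return True
        return False
    finally:
        arbiter.release(addr)
```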

FIG. 7 shows the internal configuration of the multiprocessor operation mechanism A8 that performs this Compare & Swap control.

The signal lines in the figure are as follows:

- g1 and g2 are the address lines each processor element uses when executing a Compare & Swap instruction.
- g3 and g4 indicate the end of Compare & Swap execution.
- g5 and g6 indicate the start of Compare & Swap execution.
- g7 and g8 carry the execution right from the multiprocessor operation mechanism A8 to processor elements A2 and A3: "ON" means execution is permitted, and "OFF" means it is not.

Signal lines g1 to g8 make up the signal lines a6 and a7 in FIG. 1. The multiprocessor operation mechanism A8 also takes the output signal a1 of the parallelization flag A7 as an input.

The components of the mechanism are as follows:

- G1 is a buffer register holding the addresses of the Compare & Swap instructions currently executing.
- G2 is a counter of the number of Compare & Swap instructions currently executing.
- G3 to G5 are arithmetic units that increment and decrement the counter.
- G6 to G8 are comparators that compare Compare & Swap addresses and the value of counter G2.
- G9 and G10 are selectors that change the values of signal lines g7 and g8 according to the value of the parallelization flag.
- G11 is a selector that chooses the address written into the buffer.

Next, an example of how this hardware operates.

- **Address comparison.** Comparators G6 and G7 compare the address lines g1 and g2 from the two processor elements A2 and A3 with the contents of the buffer. They output "ON" on signal lines g9 and g10 if a matching address exists, and "OFF" otherwise.
- **Counter check.** Comparator G8 continuously compares the value of counter G2 with 0, and its negation becomes signal line g11. Thus g11 is "ON" when the counter is nonzero (buffer G1 holds addresses) and "OFF" when the counter is 0.
- **Same-address detection.** The logical AND of g5, g9, and g11 indicates that A2 has started a Compare & Swap instruction whose address is already in buffer G1. Likewise, the logical AND of g6, g10, and g11 indicates that A3 has started a Compare & Swap instruction whose address is already in G1.
- **Permission signals.** The negations of these two signals, on lines g12 and g13, indicate that the address is not in buffer G1. They are therefore permission signals allowing the Compare & Swap instruction to execute.

This check only covers Compare & Swap instructions already in progress. Conflicts with a Compare & Swap instruction started at the same time by the other processor element must still be resolved.

As an embodiment, FIG. 7 shows hardware that always gives priority to the first processor element A2. The permission signal g12 of A2 is therefore sent unchanged through selector G9 as signal g7. The permission for the second processor element A3 is sent through selector G10 as signal g8, but only if it does not conflict with A2's permission signal g12. Absence of conflict between g12 and g13 is expressed by g14, the negation of the logical AND of g12 and g13. The true permission signal g15 of A3 is therefore the logical AND of g13 and g14, and it is the input to selector G10.

When the parallelization flag A7 indicates multiprocessor operation, the selectors G9 and G10 select the permission signals g12 and g15 described above. When it indicates parallel operation, the multiprocessor operation mechanism is not needed, so the selectors are set to always return the value of the parallelization flag A7 itself.

The selector G11 chooses which address is stored in buffer G1: it selects the address line of whichever processor element receives the execution right. When an execution right is granted, the arithmetic unit G3 increments counter G2. G3 is controlled by g16, the logical OR of the permission signals g12 and g15 of the two processor elements.

When a Compare & Swap instruction finishes, end signal g3 or g4 is asserted, and the arithmetic unit G4 or G5 decrements counter G2. G4 and G5 are controlled by g17 and g18. These are the logical ANDs of end signals g3 and g4 with the outputs g9 and g10 of comparators G6 and G7, which indicate that the specified address is present in buffer G1.
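The behaviour of FIG. 7 for two processor elements can be sketched as below. This is a behavioural model, not a transcription of the gates. It interprets "conflict between the two permission signals" as both elements requesting the same address in the same cycle, with A2 winning; that reading is an assumption. A set plays buffer G1, and its size plays counter G2.

```python
from typing import Optional


class Fig7Arbiter:
    """Behavioural model of FIG. 7 for two PEs (index 0 = A2, index 1 = A3)."""

    def __init__(self) -> None:
        self.in_flight: set[int] = set()  # plays buffer G1

    @property
    def count(self) -> int:
        return len(self.in_flight)  # plays counter G2

    def cycle(self, req: list[Optional[int]], parallel: bool = False) -> list[bool]:
        """req[i]: address of the C&S that PE i starts this cycle (g5/g6 with g1/g2), or None.

        Returns the grant signals [g7, g8].
        """
        if parallel:
            return [True, True]  # G9/G10 simply return the flag value (ON)
        grants: list[bool] = []
        for i, addr in enumerate(req):
            ok = addr is not None and addr not in self.in_flight  # g12 / g13
            # Fixed priority for A2: A3 loses a same-cycle request for the same address.
            if i == 1 and ok and grants[0] and req[0] == addr:
                ok = False
            grants.append(ok)
        for i, addr in enumerate(req):
            if grants[i]:
                self.in_flight.add(addr)  # G11 selects the address, G3 increments
        return grants

    def finish(self, addr: int) -> None:
        """End signal g3/g4: drop the address if present (G4/G5 decrement)."""
        self.in_flight.discard(addr)
```

Requests for different addresses are both granted, which is the property the text singles out as the mechanism's distinguishing feature.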

Buffer G1 could in principle overflow. However, an address is removed from G1 as soon as its Compare & Swap instruction finishes, so a sufficiently large buffer avoids the problem. The distinguishing feature of this multiprocessor operation mechanism is thus that it grants execution rights to several Compare & Swap instructions at once, as long as they target different addresses.

Next, FIG. 8 shows the configuration of the multiprocessor operation mechanism A8 that implements the register transfer instruction and the stop and restart instructions listed in FIG. 6.

- H1 and H2 are synchronization facilities that perform data transfer and synchronization between the processor elements.
- H3 is a bus arbitration circuit that arbitrates the right to use the bus when the processor elements transfer data among themselves.
- h1 to h3 are signal lines comprising address, data, and control lines. h1 is generally called the communication bus.

With this multiprocessor operation mechanism A8, the processor elements can execute a program cooperatively. As shown in FIG. 6, the cooperative instructions described in this embodiment are:

- a "register transfer instruction" that transfers register contents between processor elements;
- a "stop instruction" that stops the issuing processor element at a specified address;
- a "restart instruction" that restarts a processor element stopped at a specified address.

FIG. 9 shows example formats of these instructions. Two formats are shown for the register transfer instruction.

In register transfer instruction (1), the send instruction executed by the sending processor element and the receive instruction executed by the receiving processor element have different formats. When the two are used together, the send instruction specifies three items:

- the number of the register holding the value to transfer (R1);
- the number of the receiving processor element (PE#);
- the number of the register in which to store the transferred value (R2).

The receive instruction specifies nothing.

Register transfer instruction (2) uses a single format for sending and receiving. The send instruction specifies the number of the register holding the value to transfer (R1) and the number of the receiving processor element (PE#). The receive instruction specifies the number of the register in which to store the transferred value (R1) and the number of the sending processor element (PE#).

In the register transfer formats of FIG. 9, the field specifying the processor element number is 11 bits wide. For one-to-one register transfer, up to 2¹¹ processor elements can therefore be addressed. For one-to-many register transfer, each bit must correspond to one processor element, so at most 11 processor elements can be provided.
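The arithmetic behind the two limits can be checked with a small decoder. It treats the 11-bit field either as a binary PE number (one-to-one) or as a bit mask with one bit per PE (one-to-many). The bit ordering (least significant bit = PE 0) is an assumption; FIG. 9 is not reproduced here.

```python
FIELD_BITS = 11  # width of the PE-number field in the FIG. 9 formats


def max_pes(one_to_many: bool, field_bits: int = FIELD_BITS) -> int:
    """Largest number of PEs the field can address."""
    return field_bits if one_to_many else 2 ** field_bits


def decode_targets(field: int, one_to_many: bool) -> list[int]:
    """PE numbers selected by the field (bit i <-> PE i is an assumed ordering)."""
    if not 0 <= field < 2 ** FIELD_BITS:
        raise ValueError("field does not fit in 11 bits")
    if one_to_many:
        return [pe for pe in range(FIELD_BITS) if (field >> pe) & 1]
    return [field]
```

For example, the field value 5 means PE 5 in one-to-one mode but PEs 0 and 2 in one-to-many mode.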

Two synchronization methods are possible for register transfer:

- **Synchronization method (a).** The sending processor element transmits the register contents and proceeds to its next instruction, whether or not the receiving processor element is executing a receive instruction.
- **Synchronization method (b).** The sending processor element is suspended until the receiving processor element executes a receive instruction.

Combining the two formats with the two synchronization methods gives four possible hardware variants. This embodiment describes only two of them: the hardware combining instruction format (1) with synchronization method (a), and the hardware combining instruction format (2) with synchronization method (b). The other two combinations are straightforward modifications of these.
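The two pairings described in the text can be sketched as follows.

- **Method (a) with format (1).** The receiver must buffer (destination register, value) pairs in arrival order, so a bounded FIFO is used. A full buffer stalls the sender, and an empty buffer stalls the receiver.
- **Method (b) with format (2).** The sender posts a value and stays stopped until the receiver takes it into a register it names itself.

Class and method names are hypothetical, and the capacity of 4 is arbitrary.

```python
from typing import Optional


class BufferedLink:
    """Method (a) with format (1): the sender deposits (dest_reg, value) and moves on."""

    def __init__(self, capacity: int = 4) -> None:
        self.slots: list = [None] * capacity
        self.rd = 0      # read pointer (receiver side)
        self.wr = 0      # write pointer (sender side)
        self.count = 0   # number of buffered entries

    def send(self, dest_reg: int, value: int) -> bool:
        """False means the buffer is full and the sender must stall."""
        if self.count == len(self.slots):
            return False
        self.slots[self.wr] = (dest_reg, value)
        self.wr = (self.wr + 1) % len(self.slots)
        self.count += 1
        return True

    def receive(self, regs: list[int]) -> bool:
        """False means the buffer is empty and the receiver must stall."""
        if self.count == 0:
            return False
        dest_reg, value = self.slots[self.rd]
        self.rd = (self.rd + 1) % len(self.slots)
        self.count -= 1
        regs[dest_reg] = value
        return True


class RendezvousLink:
    """Method (b) with format (2): the sender stays stopped until the receive executes."""

    def __init__(self) -> None:
        self.pending: Optional[int] = None

    def send(self, value: int) -> None:
        if self.pending is not None:
            raise RuntimeError("sender is already waiting")
        self.pending = value  # the sending PE is now stopped

    def sender_may_proceed(self) -> bool:
        return self.pending is None

    def receive(self, regs: list[int], dest_reg: int) -> bool:
        """The receive names its own destination register, as in format (2)."""
        if self.pending is None:
            return False
        regs[dest_reg] = self.pending
        self.pending = None
        return True
```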

FIG. 9 also shows the stop instruction and the restart instruction. The address specified by these instructions is computed from register R1, displacement d, and base register Rb. An ordinary computer would typically compute the address as (Rb + R1 + d). In the present invention, however, it is sufficient that these values determine the stop address uniquely.

FIGS. 10 to 14 show the hardware that implements the register transfer instructions:

- FIGS. 10 and 11: instruction format (1) with synchronization method (a).
- FIGS. 12 and 13: instruction format (2) with synchronization method (b).
- FIG. 14: a priority determining circuit common to both.

The figures are described below in numerical order.

FIG. 10 shows the internal structure of synchronization facility H1 for the combination of instruction format (1) and synchronization method (a); synchronization facility H2 is identical.

Under synchronization method (a), a processor element that executes a send instruction does not wait for the receiving processor element to execute its receive instruction. Each processor element therefore needs a buffer (register file). The buffer temporarily holds each transferred register value together with the number of the register in which it is to be stored.

- I1 is this synchronization register file. It holds register values and destination register numbers.
- I2 and I3 are address registers that point into the synchronization register file. I2 is used when the receiving processor element reads a received value; I3 is used when a transmitting processor element writes a transmitted value.
- I4 is an adder that increments each address register after the corresponding read or write of I1. Because I1 is read and written in FIFO (first-in, first-out) order, both I2 and I3 must advance after every access.
- I5 is a selector. Depending on whether a read or a write is in progress, it routes the address from I2 or I3 to I1.
- I6 and I7 are comparison circuits.

When I1 holds no data, a processor element executing a receive instruction must be stopped. When I1 is filled to capacity, a processor element executing a send instruction must be stopped. Let RA be the value of read address register I2 and WA the value of write address register I3. A signal representing "WA = RA" then indicates that I1 holds no data, and a signal representing "WA + 1 = RA" indicates that I1 is filled to capacity.
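The FIFO behaviour of I1 with its read and write address registers can be modelled as a ring buffer. This is a sketch under stated assumptions:

- **Address wrap-around:** the address registers wrap modulo the file size. The text does not state this, but a FIFO with incrementing addresses implies it.
- **Entry format:** an entry is taken to be a (destination register number, value) pair.
- **Names:** the class and method names are hypothetical.

```python
from __future__ import annotations

class SyncRegisterFile:
    """Sketch of synchronization register file I1 with pointers I2 (RA) and I3 (WA)."""

    def __init__(self, slots: int) -> None:
        self.slots = slots
        self.entries: list[tuple[int, int] | None] = [None] * slots
        self.ra = 0  # read address register I2
        self.wa = 0  # write address register I3

    def empty(self) -> bool:
        # comparator output i5: "WA = RA"
        return self.wa == self.ra

    def full(self) -> bool:
        # comparator output i6: "WA + 1 = RA" (modulo the file size)
        return (self.wa + 1) % self.slots == self.ra

    def write(self, reg_no: int, value: int) -> bool:
        """Sender side; False means the sending PE must stall."""
        if self.full():
            return False
        self.entries[self.wa] = (reg_no, value)
        self.wa = (self.wa + 1) % self.slots  # adder I4 advances WA
        return True

    def read(self) -> tuple[int, int] | None:
        """Receiver side; None means the receiving PE must stall."""
        if self.empty():
            return None
        entry = self.entries[self.ra]
        self.ra = (self.ra + 1) % self.slots  # adder I4 advances RA
        return entry
```

With "WA + 1 = RA" as the full condition, one slot always stays unused. A four-slot file therefore holds at most three pending transfers, and this is what lets "WA = RA" mean empty without ambiguity.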

i1 to i8 are control lines.

- **i1:** the clock signal line. Processor element A2, address registers I2 and I3, and the other circuits operate on this clock.
- **i2:** the read signal line. It turns "ON" when processor element A2 executes a receive instruction.
- **i3:** the write signal line. Signal i3 is asserted when another processor element executes a send instruction directed to this processor element.

When a read request and a write request to synchronization register file I1 occur simultaneously, one of them must wait. The hardware of FIG. 10 makes the read request wait. The logical product of read signal line i2 and write signal line i3 is sent to processor element A2 as read inhibit signal i4, and A2 suspends instruction execution while i4 is "ON".

i5 and i6 are the output lines of the comparison circuits. They indicate, respectively, that I1 holds no data and that I1 is filled to capacity. Signal line i5 is connected to processor element A2, which enters a suspended state if it attempts to execute a receive instruction while i5 is "ON".

Data is actually read from I1 only when all three of these hold:

- a receive instruction is being executed;
- there is no write request;
- I1 holds data.

The logical product of i2, i4′, and i5′ (where x′ denotes the negation of x) is therefore taken as signal i7, which serves as the control line of read address register I2. After data has been read from I1, signal i7 increments I2.

Writing data into I1 works the same way: a write occurs when a send instruction is being executed and I1 is not filled to capacity. The logical product of i3 and i6′ is therefore taken as signal i8, which serves as the control line of write address register I3. After data has been written into I1, signal i8 increments I3. Signal i8 also controls selector I5 and the read/write mode of I1.
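The gating equations of FIG. 10 can be written as plain Boolean functions. This is a combinational sketch only; clocking and the register updates themselves are omitted. The function and field names are hypothetical, and the signal meanings follow the text above.

```python
from typing import NamedTuple

class Fig10Controls(NamedTuple):
    i4_read_inhibit: bool  # suspends PE A2 while a write is in progress
    i7_do_read: bool       # read I1, then increment read address register I2
    i8_do_write: bool      # write I1, then increment write address register I3

def fig10_controls(i2_read: bool, i3_write: bool,
                   i5_empty: bool, i6_full: bool) -> Fig10Controls:
    i4 = i2_read and i3_write                 # i4 = i2 . i3 (the write wins)
    i7 = i2_read and not i4 and not i5_empty  # i7 = i2 . i4' . i5'
    i8 = i3_write and not i6_full             # i8 = i3 . i6'
    return Fig10Controls(i4, i7, i8)
```

In a read/write collision, for example, `fig10_controls(True, True, False, False)` inhibits the read and lets the write proceed.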

i9 and i10 are signal lines for sending and receiving the register value and the destination register number. i11 represents various control lines. These include a line for requesting the right to use communication bus h1 and a line that designates the number of the receiving processor element.

FIG. 11 shows the internal structure of bus arbitration circuit H3.

- **J1:** an oscillator that generates the clock signal. It can be regarded as the same as basic clock A11.
- **J2:** a priority determining circuit. When several processor elements compete for communication bus h1, it grants the right to use h1 to one of them according to a variable or fixed priority.
- **J3 and J4:** flip-flops that record the right to use h1. Each is set "ON" while the right to use the bus is granted to its processor element.
- **J5:** a multiplexer that sends a given signal to one of its output lines, according to the processor element number specified by the send instruction.

j1 to j14 are the signal lines making up communication bus h1.

- **j1:** the clock signal line. It carries the clock that controls the entire data processing device of the present invention, and corresponds to signal line i1 in FIG. 10.
- **j2 and j3:** bus request signal lines. Each processor element asserts its bus request signal (j2 or j3) when it needs communication bus h1 for a register transfer.
- **j4 and j5:** bus grant signal lines. When a processor element asserts its bus request signal and the right to use h1 can be given to it, the bus grant signal (j4 or j5) is sent to that processor element. When several processor elements assert bus requests simultaneously, the right to use h1 is granted via j4 or j5 according to a fixed or variable priority.
- **j6 and j7:** write inhibit signal lines. As described for FIG. 10, a send instruction must be suspended if it is executed while the synchronization register file I1 of the receiving processor element is filled to capacity. In that case a write inhibit signal is sent on j6 or j7.
- **j8 and j9:** indicate that I1 is filled to capacity.
- **j10 and j11:** write signal lines. They notify each processor element that a send instruction directed to it has been executed.
- **j12:** designates a processor element number. It is used to specify the receiving processor element when a send instruction is executed.
- **j13 and j14:** the data line and the line specifying the register number, respectively. Bus arbitration circuit H3 only manages them by means of bus request signals j2, j3 and bus grant signals j4, j5, and does not use them directly.

Correspondence with FIG. 10: signal lines j8 and j10 correspond to signal lines i6 and i3, respectively, and signal lines j2, j4, j6, and j12 correspond to signal line i11.

Next, the generation of signals j6, j7, j10, and j11 inside bus arbitration circuit H3 is described. The generation of bus grant signals j4 and j5 is described later with reference to FIG. 14.

Write request signals j10 and j11 are sent to the processor element designated when a send instruction is executed. The processor element number is given by signal line j12. The fact that some processor element is executing a send instruction is indicated by j15, the logical sum of all bus grant signals. Multiplexer J5 takes j15 as its input and processor element number j12 as its select signal, and its output can be used directly as write request signals j10 and j11.

Write inhibit signals j6 and j7 are issued when a send instruction is executed while the synchronization register file I1 of the receiving processor element is filled to capacity. A full I1 is indicated by signal lines j8 and j9. Execution of a send instruction is indicated by j4 and j5 (detected on the sending side) or by j10 and j11 (detected on the receiving side). Hence the logical product j16 of j8 and j10, or the logical product j17 of j9 and j11, represents a write inhibit condition at a receiving processor element. The logical sum j18 of all such signals (j16, j17, ...) is then formed. Its logical product with j4 and with j5 yields write inhibit signals j6 and j7 for the sending processor elements.
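The derivation of j10/j11 and j6/j7 can be generalised from two processor elements to a list of them. This is a combinational sketch under these assumptions:

- List index 0 stands for the first processor element and index 1 for the second.
- The function name is hypothetical.
- Only one bus grant is active at a time, as the arbiter guarantees.

```python
from __future__ import annotations

def fig11_signals(grants: list[bool], target_pe: int,
                  full: list[bool]) -> tuple[list[bool], list[bool]]:
    """Return (write requests j10, j11, ...; write inhibits j6, j7, ...)."""
    j15 = any(grants)  # some PE holds the bus, i.e. is executing a send
    # multiplexer J5: route j15 to the PE selected by j12 (target_pe)
    write = [j15 and pe == target_pe for pe in range(len(grants))]
    # j16, j17, ...: receiver is being written while its I1 is full (j8, j9, ...)
    j18 = any(w and f for w, f in zip(write, full))
    # j6, j7, ...: inhibit goes back to whichever PE holds the grant
    inhibit = [j18 and g for g in grants]
    return write, inhibit
```

When the first processor element sends to the second and the second's I1 is full, the write request reaches the second processor element and the inhibit returns to the first.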

Next, the hardware combining instruction format (2) and synchronization method (b) shown in FIG. 9 is described. FIG. 12 shows synchronization facility H1 for this combination. As FIG. 12 shows, H1 here consists only of signal lines, and multiprocessor operating mechanism A8 is implemented by centralized control in bus arbitration circuit H3.

- **m1:** the clock signal line, the same as signal line i1 in FIG. 10.
- **m2 and m3:** the read signal line and the write signal line. When a register transfer instruction is executed, a signal is sent on read signal line m2 for a receive instruction and on write signal line m3 for a send instruction.
- **m4:** the transfer inhibit signal. It stops processor element A2 temporarily when, in a register transfer, a send instruction and a receive instruction have not yet been matched.
- **m5:** the data line.
- **m6:** various control lines.

FIG. 13 shows the internal structure of bus arbitration circuit H3 for this combination. J1 to J4 and j1 to j5 are the same circuits and signal lines as in FIG. 11.

- **N1 and N2:** multiplexers that convey a read signal to the sending processor element when a receive instruction is executed.
- **N3 and N4:** similar multiplexers that convey a write signal to the receiving processor element when a send instruction is executed.
- **N5 and N6:** selectors used when a send instruction is executed. Taking the read signals of all processor elements as inputs, they select the read signal line of the receiving processor element whose number is specified. A read signal indicates that its processor element is executing a receive instruction, so the register contents can be transferred if the output of N5 or N6 is "ON" when the send instruction executes.
- **N7 and N8:** selectors used when a receive instruction is executed. Taking the write signals of all processor elements as inputs, they select the write signal line of the sending processor element whose number is specified. As with the send instruction, the register contents can be transferred if the output of N7 or N8 is "ON" when the receive instruction executes.

- **n1 and n2:** read signal lines. They turn "ON" when a receive instruction is executed.
- **n3 and n4:** write signal lines. They turn "ON" when a send instruction is executed.
- **n5 and n6:** transfer inhibit signal lines. A signal is sent when a processor element executes a send or receive instruction while its partner processor element is not executing the matching receive or send instruction. The transfer inhibit signal n5 or n6 then suspends execution of that processor element.
- **n7 and n8:** designate processor element numbers. When a register transfer instruction is executed, they carry the number of the partner processor element specified by the instruction.
- **n9:** the data line.

Correspondence with FIG. 12: signal lines n1 and n3 correspond to m2 and m3, n5 corresponds to m4, signal lines j2, j4, and n7 correspond to m6, and n9 corresponds to m5.

Signals n5 and n6 are generated inside bus arbitration circuit H3. They are issued when a send or receive instruction is executed and the partner processor element is not executing the corresponding receive or send instruction.

That the partner processor element is executing a receive instruction is indicated by output signals n10 and n11 of selectors N5 and N6. That it is executing a send instruction is indicated by output signals n12 and n13 of selectors N7 and N8. Accordingly:

- "A send instruction is being executed and the partner is not executing a receive instruction" is given by the logical product n14 of n3 and n10′, or the logical product n15 of n4 and n11′.
- "A receive instruction is being executed and the partner is not executing a send instruction" is given by the logical product n16 of n1 and n12′, or the logical product n17 of n2 and n13′.

The logical sum of n14 and n16 is transfer inhibit signal n5, and the logical sum of n15 and n17 is transfer inhibit signal n6.
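The rendezvous rule for synchronization method (b) can be expressed for any number of processor elements. This is a combinational sketch under these assumptions:

- Each processor element is described by whether it is executing a receive (`reading`) or a send (`writing`), and by the partner number from its instruction (`partner`, playing the role of n7/n8).
- The list form and the function name are hypothetical generalisations of the two-element circuit.

```python
from __future__ import annotations

def transfer_inhibit(reading: list[bool], writing: list[bool],
                     partner: list[int]) -> list[bool]:
    """Return the transfer inhibit signal (n5, n6, ...) for each PE."""
    inhibit = []
    for pe in range(len(reading)):
        peer_receiving = reading[partner[pe]]  # selectors N5/N6 -> n10, n11
        peer_sending = writing[partner[pe]]    # selectors N7/N8 -> n12, n13
        send_blocked = writing[pe] and not peer_receiving  # n14 / n15
        recv_blocked = reading[pe] and not peer_sending    # n16 / n17
        inhibit.append(send_blocked or recv_blocked)       # n5 / n6
    return inhibit
```

A sender stays inhibited until its partner reaches the matching receive. At that point both inhibit signals drop together, which is the blocking behaviour of synchronization method (b).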

FIG. 14 shows the internal structure of priority determining circuit J2, used in FIGS. 11 and 13. Its inputs are the bus request signals j2 and j3 and bus grant signals j4 and j5 of the processor elements, together with clock signal j1. R1 and R2 are flip-flops: R1 delays the input to flip-flop R2, and R2 records which processor element was last given the bus grant signal.

If bus request signals j2 and j3 do not conflict, they can be fed directly to flip-flops J3 and J4 to issue bus grant signals j4 and j5. If j2 and j3 conflict (that is, their logical product r1 is "ON"), a priority must be computed among the competing processor elements, and the bus grant signal is sent to the one with higher priority. A fixed priority among processor elements is one option. The priority determining circuit of FIG. 14 instead lowers the priority of the processor element that received the bus grant signal last time.

The negation r2 of the logical sum of bus grant signals j4 and j5 is used as the clock input of flip-flop R2. The contents of R2 are therefore rewritten when a bus grant signal changes from "ON" to "OFF", that is, when the right to use the communication bus is released.

Flip-flop R1 takes as input bus grant signal j5 of the second processor element A3 and delays it in synchronization with clock signal j1. Output signal line r3 of R1 is the input of R2. As a result, when a processor element releases the right to use the communication bus, R2 records whether the grant being released belonged to the second processor element A3. Let r4 denote the output signal line of R2.

When bus request signals j2 and j3 conflict:

- if r4 is "ON" (the previous grant went to the second processor element A3), bus grant signal j4 is given to the first processor element A2;
- if r4 is "OFF", bus grant signal j5 is given to the second processor element A3.

If a bus grant signal has already been given to either processor element, however, that state is maintained.

Implemented as a circuit, the logic above gives the following inputs to flip-flops J3 and J4. Here x·y denotes the logical product of x and y, x + y their logical sum, and x′ the negation of x:

$$J3_{\mathrm{in}} = r1' \cdot j2 + r1 \cdot (r4 \cdot r2 + j4)$$

$$J4_{\mathrm{in}} = r1' \cdot j3 + r1 \cdot (r4' \cdot r2 + j5)$$

The terms of each expression mean the following:

- r1′·j2 and r1′·j3 cover the case where the bus request signals do not conflict.
- r1·r4·r2 and r1·r4′·r2 cover the case where they do conflict; r2 indicates that no bus grant signal has yet been given to any processor element.
- r1·j4 and r1·j5 keep an existing bus grant signal in force.
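The two flip-flop input equations, together with the R1/R2 history bit, can be simulated clock by clock. This is a simplified single-clock sketch:

- R2's edge-triggered update is modelled as happening when the grants fall from "some ON" to "all OFF". At that moment it latches the previous j5, which R1 supplies.
- The class name and the choice of r4 = OFF at reset are assumptions.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Fig14Arbiter:
    j4: bool = False  # output of flip-flop J3: grant to first PE (A2)
    j5: bool = False  # output of flip-flop J4: grant to second PE (A3)
    r4: bool = False  # output of R2: last released grant belonged to A3

    def step(self, j2: bool, j3: bool) -> tuple[bool, bool]:
        """Advance one clock with bus requests j2, j3; return (j4, j5)."""
        r1 = j2 and j3                 # requests conflict
        r2 = not (self.j4 or self.j5)  # no grant currently held
        d_j3 = (not r1 and j2) or (r1 and ((self.r4 and r2) or self.j4))
        d_j4 = (not r1 and j3) or (r1 and ((not self.r4 and r2) or self.j5))
        if not r2 and not (d_j3 or d_j4):  # grant released: clock R2
            self.r4 = self.j5              # R1 supplies the delayed j5
        self.j4, self.j5 = d_j3, d_j4
        return self.j4, self.j5
```

When both elements keep requesting and release the bus between transfers, the grant alternates between them. This is the "lower the priority of the last winner" rule described above.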

Next, the hardware that implements the stop and restart instructions shown in FIG. 9 is described. FIG. 15 shows synchronization facility H1 for executing these instructions.

- **T1:** a stop address register. It stores the address specified when processor element A2 executes a stop instruction.
- **T2:** a comparison circuit. When another processor element executes a restart instruction, it compares the address received from communication bus h1 with the contents of stop address register T1.

- **t1:** a clock signal line, like i1 or m1.
- **t2:** the restart signal line. It is asserted together with address line t3 when a restart instruction is executed.
- **t3:** the address line that specifies the address for both the restart and stop instructions. For a restart instruction the address is output to communication bus h1; for a stop instruction it is stored in stop address register T1.
- **t6:** the stop signal line. When processor element A2 executes a stop instruction, t6 is asserted together with address line t3 and causes the specified address to be stored in T1.
- **t4 and t5:** address lines carrying, respectively, the output of T1 and the address from another processor element received via communication bus h1. These are the inputs of comparison circuit T2.
- **t8:** the output line of T2. It indicates that the two addresses match.
- **t10:** various control lines, including control signals for managing communication bus h1.

When t8 is "ON" and another processor element sends restart signal t7, signal t9 is sent to processor element A2 to reactivate it from the stopped state.
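The stop/restart handshake amounts to "latch an address, then wake on a matching broadcast". It can be sketched per processor element as follows. Assumptions:

- The address uses the conventional Rb + R1 + d computation mentioned for FIG. 9; the text notes that any unique mapping would do.
- The broadcast is modelled as a method call rather than bus signals.
- Function and class names are hypothetical.

```python
from __future__ import annotations

def stop_address(rb: int, r1: int, d: int) -> int:
    """Conventional Rb + R1 + d; any mapping that is unique would suffice."""
    return rb + r1 + d

class StopRestartFacility:
    """One PE's FIG. 15 facility: stop address register T1 plus comparator T2."""

    def __init__(self) -> None:
        self.t1: int | None = None  # stop address register T1 (empty at reset)
        self.stopped = False

    def execute_stop(self, address: int) -> None:
        # stop signal t6 with address t3: latch the address into T1
        self.t1 = address
        self.stopped = True

    def snoop(self, restart: bool, bus_address: int) -> bool:
        # comparator T2 compares t4 (T1 output) with t5 (address from h1) -> t8
        t8 = self.t1 is not None and self.t1 == bus_address
        t9 = restart and t8  # wake-up signal sent to the PE
        if t9:
            self.stopped = False
        return t9
```

Because every processor element snoops the same broadcast, a single restart instruction wakes exactly those elements that stopped on the matching address.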

In the hardware for the stop and restart instructions, bus arbitration circuit H3 only manages the right to use the address lines, control lines, and other lines of communication bus h1. Its internal structure therefore needs only part of FIG. 11 or FIG. 13: oscillator J1, priority determining circuit J2, flip-flops J3 and J4, clock signal j1, bus request signals j2 and j3, and bus grant signals j4 and j5.

In other words, hardware able to execute all three kinds of coordination instruction described so far (register transfer, stop, and restart) can be realized with two components:

- a synchronization facility H1 that combines FIG. 10 or FIG. 12 with FIG. 15;
- the bus arbitration circuit H3 of FIG. 11 or FIG. 13.

This hardware can also be combined with the hardware shown in FIG. 7 so that it controls the Compare & Swap instruction as well.

FIGS. 16 to 21 show examples of the operation of the control instructions described in this embodiment:

- FIGS. 16 to 20 show operation with instruction format (1) and synchronization method (a) of FIG. 9.
- FIG. 21 shows operation when a stop instruction and a restart instruction are executed.

With instruction format (2) and synchronization method (b), FIGS. 12 and 13 show no sequential (storage) circuits apart from priority determining circuit J2 and flip-flops J3 and J4. No operation example is therefore given for that hardware.

An operation example for the combination of instruction format (1) and synchronization method (a) is described with reference to the timing charts of FIGS. 16 to 20. FIG. 16 is a timing chart for the case in which the first processor element (PE1) executes a transmission instruction directed at the second processor element (PE2). First, the sending processor element outputs a bus request signal (BR). If no processor element is using the communication bus, a bus grant signal (BG) is returned in synchronization with the clock signal (clock). If the communication bus is in use, or if bus request signals conflict, the bus grant signal may not be returned immediately. After receiving the bus grant signal, the sending processor element outputs the number of the receiving processor element specified in the transmission instruction (PE), the value of the specified register (data), and the number of the destination register on the receiving side (REG). In FIG. 16 these data are sent out over an interval of 1.5 clock cycles, but this duration may vary with factors such as the physical arrangement of the processor elements. The processor element number specified by the transmission instruction causes a write signal (write) to be sent to the receiving processor element. If the synchronization register file of the receiving processor element is not yet full, the value of the specified register and the destination register number are written to the synchronization register file, and the write address register (WA) is incremented. When the transfer ends, the bus request signal is turned OFF, which in turn turns the bus grant signal OFF.
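The bookkeeping on the receiving side in FIG. 16 behaves like a bounded FIFO: each write stores a (data, REG) pair in the slot pointed to by WA and advances WA, and the full signal rises once every slot is occupied. The following Python sketch models this as a ring buffer. The capacity, the wrap-around of WA and RA, and all class and method names are assumptions made for illustration; the patent text does not specify them.

```python
class SyncRegisterFile:
    """Sketch of a synchronization register file (I1) with its write address
    register WA (I3) and read address register RA (I2), modeled as a ring buffer."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity          # assumed size; the source does not give one
        self.entries = [None] * capacity  # each slot holds a (data, REG) pair
        self.wa = 0                       # write address register (WA)
        self.ra = 0                       # read address register (RA)
        self.count = 0                    # number of occupied slots

    @property
    def full(self) -> bool:
        return self.count == self.capacity

    @property
    def empty(self) -> bool:
        return self.count == 0

    def write(self, data: int, reg: int) -> bool:
        """Handle a write signal from a sender; False means the write is inhibited."""
        if self.full:
            return False
        self.entries[self.wa] = (data, reg)
        self.wa = (self.wa + 1) % self.capacity  # increment WA, wrapping around
        self.count += 1
        return True

    def read(self):
        """Handle a read signal; return the oldest (data, REG) pair, or None if empty."""
        if self.empty:
            return None
        entry = self.entries[self.ra]
        self.entries[self.ra] = None
        self.ra = (self.ra + 1) % self.capacity  # increment RA, wrapping around
        self.count -= 1
        return entry


srf = SyncRegisterFile(capacity=2)
print(srf.write(0x1234, 5))  # True: stored, WA advances to 1
print(srf.write(0x5678, 6))  # True: WA wraps to 0 and the file is now full
print(srf.write(0x9ABC, 7))  # False: the file is full, so the write is inhibited
print(srf.read())            # (4660, 5), i.e. (0x1234, 5)
```

A refused write here corresponds to the case handled in FIGS. 17 and 18, where the sender must wait until a read frees a slot.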

FIG. 17 is also a timing chart for execution of a transmission instruction, but for the case in which the synchronization register file of the receiving processor element is already full when the transmission instruction begins. The signals BR, BG, PE, data, and REG on the sending processor element and the signals write and WA on the receiving processor element are the same as in FIG. 16. When the synchronization register file of the receiving processor element is full (the full signal is ON), a write inhibit signal (Inhibit), formed as the logical AND of the write and full signals, is sent out. The sending processor element then pauses while keeping the right to use the communication bus. When the receiving processor element reads the contents of the synchronization register file with a reception instruction and the full signal turns OFF, the write inhibit signal to the sending processor element also turns OFF and the pause is released. The value of the specified register and the destination register number are then stored in the receiving side's synchronization register file. In this case, the sending processor element keeps data on its PE, data, and REG signals longer by the time during which the write inhibit signal is ON. The same applies to BR and BG.

FIG. 18 shows another timing chart for execution of a transmission instruction. In this scheme, if the synchronization register file of the receiving processor element is full when the transmission instruction is executed, the sending processor element gives up the right to use the communication bus. When the write inhibit signal is sent to the sending processor element, that element pauses and changes its bus request signal from ON to OFF. The bus grant signal therefore also changes to OFF, meaning that the right to use the communication bus has been relinquished. Later, when the full signal of the receiving processor element changes from ON to OFF, an interrupt or similar event is raised for the sending processor element, and its pause is released. The scheme of FIG. 18 is intended to reduce contention on the communication bus.
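The difference between FIGS. 17 and 18 is whether other elements may use the communication bus while the sender waits. The sketch below is a simplified cycle-count model, not the patent's circuit. It assumes first-come-first-served arbitration (in place of the priority order determination circuit J2), a one-cycle probe before Inhibit is seen, and no interrupt latency; the third element PE3 is hypothetical. Under these assumptions, PE3 gains the bus during PE1's pause only with the FIG. 18 policy.

```python
def fcfs_grants(requests):
    """Grant the bus in request order (ties broken by name).

    requests: list of (request_cycle, name, hold_cycles) tuples.
    Returns {name: (grant_cycle, release_cycle)}.
    """
    schedule, bus_free_at = {}, 0
    for request_at, name, hold in sorted(requests):
        grant = max(request_at, bus_free_at)   # BG is given once the bus is free
        bus_free_at = grant + hold             # BR is dropped after `hold` cycles
        schedule[name] = (grant, bus_free_at)
    return schedule


def compare_policies(full_clears_at, other_request_at, transfer=2):
    """PE1 is granted the bus at cycle 0 but finds PE2's file full until
    `full_clears_at`; an unrelated PE3 requests the bus at `other_request_at`."""
    hold = fcfs_grants([
        (0, "PE1", full_clears_at + transfer),    # FIG. 17: bus kept through the stall
        (other_request_at, "PE3", transfer),
    ])
    release = fcfs_grants([
        (0, "PE1-probe", 1),                      # FIG. 18: BR dropped on Inhibit
        (other_request_at, "PE3", transfer),
        (full_clears_at, "PE1-retry", transfer),  # re-request after the interrupt
    ])
    return hold, release


hold, release = compare_policies(full_clears_at=10, other_request_at=2)
print(hold["PE3"])     # (12, 14): PE3 waits out PE1's stall
print(release["PE3"])  # (2, 4): PE3 uses the bus while PE1 is paused
```

In this example PE1's own transfer finishes at cycle 12 under both policies, so in the model the benefit of FIG. 18 falls entirely on the other bus users.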

FIG. 19 shows a timing chart for execution of a reception instruction. The receiving processor element (PE1) outputs a read signal (read) and reads, from the synchronization register file, the transferred register value (data) and the number of the destination register for that data (REG). At the same time, the contents of the read address register (RA) are incremented. For the reception instruction, too, the read signal is output over an interval of 1.5 clock cycles, but this duration may vary with factors such as the physical arrangement of the processor elements.

FIG. 20 shows a timing chart for the case in which the synchronization register file holds no data when a reception instruction is executed. The read, data, REG, and RA signals have the same meanings as in FIG. 19. When the receiving processor element executes a reception instruction and outputs the read signal, the synchronization register file may contain no data. In this case the empty signal is ON. From the empty signal, the processor element that executed the reception instruction determines that the synchronization register file holds no data, and it pauses. If another processor element then executes a transmission instruction directed at PE1 (write signal ON), the empty signal changes from ON to OFF. This releases the receiving processor element from its pause, and, as in FIG. 19, it reads the transferred register value and the destination register number from the synchronization register file and increments the read address register.
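Taken together, FIGS. 17 and 20 give the synchronization register file the semantics of a bounded blocking channel: a sender pauses while the file is full and a receiver pauses while it is empty. A minimal Python sketch of that behavior follows, using threads as stand-ins for processor elements and `threading.Condition` as a software stand-in for the full, empty, and Inhibit signals. This is an assumed analogue for illustration, not the hardware itself, and the class name is hypothetical.

```python
import threading
from collections import deque


class BlockingSyncRegisterFile:
    """Sketch: receive() pauses while the file is empty (FIG. 20) and
    send() pauses while it is full (FIG. 17)."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity            # assumed size
        self._entries = deque()             # FIFO of (data, REG) pairs
        self._cond = threading.Condition()

    def send(self, data: int, reg: int) -> None:
        with self._cond:
            while len(self._entries) >= self.capacity:   # Inhibit = write AND full
                self._cond.wait()
            self._entries.append((data, reg))            # write, WA += 1
            self._cond.notify_all()                      # empty goes OFF

    def receive(self):
        with self._cond:
            while not self._entries:                     # empty is ON: pause
                self._cond.wait()
            entry = self._entries.popleft()              # read, RA += 1
            self._cond.notify_all()                      # full goes OFF
            return entry


srf = BlockingSyncRegisterFile(capacity=1)
received = []
pe1 = threading.Thread(target=lambda: received.extend(srf.receive() for _ in range(3)))
pe1.start()               # PE1 pauses at once because the file is empty
for i in range(3):        # PE2 pauses whenever the single slot is still occupied
    srf.send(100 + i, i)
pe1.join()
print(received)  # [(100, 0), (101, 1), (102, 2)]
```

The `while` loops re-check the condition after every wake-up, which is the usual discipline for condition variables; the hardware evaluates the full and empty signals directly and needs no such loop.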

FIG. 21 is a timing chart for execution of a stop instruction and a restart instruction. The first processor element (PE1) executes the stop instruction, and the second processor element (PE2) executes the restart instruction. When the stop instruction is executed, the address specified by the instruction (WAR) is sent out and a stop signal (sleep) is output. The stop signal causes the specified address to be stored in the stop address register. After outputting the stop signal, PE1 pauses.

The restart instruction restarts every processor element that is paused on the specified address. To do so, the specified address is broadcast to all processor elements over the communication bus. PE2 must therefore first output a bus request signal (BR) and obtain the right to use the communication bus through the bus grant signal (BG). After obtaining the bus, it places a restart signal (resume) and the address specified by the instruction (address) on the communication bus. The address is compared with the contents of the stop address register of each of the other processor elements, and on a match a wake-up signal (wakeup) is sent to the corresponding processor element. The paused processor element then resumes execution. Both the stop signal and the restart signal are output over an interval of 1.5 clock cycles, but this duration may vary with factors such as the physical arrangement of the processor elements.
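The stop/restart mechanism of FIG. 21 amounts to an address-matched wake-up: each paused element holds an address in its stop address register T1, and a restart broadcast wakes every paused element whose register matches. A simplified Python sketch, with bus acquisition (BR/BG) omitted and all names hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcessorElement:
    name: str
    stop_address: Optional[int] = None   # stop address register (T1)
    sleeping: bool = False

    def stop(self, address: int) -> None:
        """Stop instruction: the sleep signal latches the address, then the PE pauses."""
        self.stop_address = address
        self.sleeping = True


def restart(pes: List[ProcessorElement], address: int) -> List[str]:
    """Restart instruction: broadcast (resume, address) on the communication bus.

    Every paused PE whose stop address register matches receives wakeup.
    Obtaining the bus via BR/BG is omitted from this sketch."""
    woken = []
    for pe in pes:
        if pe.sleeping and pe.stop_address == address:  # address comparison
            pe.sleeping = False                          # wakeup: resume execution
            woken.append(pe.name)
    return woken


pes = [ProcessorElement(f"PE{i}") for i in range(1, 5)]
pes[0].stop(0x100)
pes[2].stop(0x100)
pes[3].stop(0x200)
print(restart(pes, 0x100))          # ['PE1', 'PE3']
print([pe.sleeping for pe in pes])  # [False, False, False, True]
```

Because the match is on an address rather than on a processor number, one restart instruction can release any number of waiting elements at once, which is what makes the pair usable as a barrier-style primitive.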

Finally, the method of switching between the parallel operation mode and the multiprocessor operation mode in the parallel processing device of the present invention is described.

In the present invention, two special instructions are provided for switching between the two operation modes: a multiprocessor operation instruction and a parallel operation instruction. The multiprocessor operation instruction sets the parallelization flag A7 to the multiprocessor operation mode, and the parallel operation instruction sets it to the parallel operation mode. Both are privileged instructions that only the OS (operating system) can execute.
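A sketch of the two configuration control instructions as privileged operations on the parallelization flag A7. The `privileged` argument is a hypothetical stand-in for the processor's supervisor state, and raising an exception stands in for whatever trap the hardware would take; neither detail is given in the source.

```python
from enum import Enum


class Mode(Enum):
    MULTIPROCESSOR = "multiprocessor"
    PARALLEL = "parallel"


class PrivilegeError(Exception):
    """Raised when a non-OS program issues a mode-switching instruction."""


class DataProcessor:
    def __init__(self, mode: Mode) -> None:
        self.parallelization_flag = mode   # parallelization flag A7

    @staticmethod
    def _require_os(privileged: bool) -> None:
        if not privileged:
            raise PrivilegeError("mode-switching instructions are privileged")

    def multiprocessor_operation(self, privileged: bool) -> None:
        """Multiprocessor operation instruction."""
        self._require_os(privileged)
        self.parallelization_flag = Mode.MULTIPROCESSOR

    def parallel_operation(self, privileged: bool) -> None:
        """Parallel operation instruction."""
        self._require_os(privileged)
        self.parallelization_flag = Mode.PARALLEL


dp = DataProcessor(Mode.PARALLEL)
dp.multiprocessor_operation(privileged=True)   # issued by the OS: accepted
print(dp.parallelization_flag)                 # Mode.MULTIPROCESSOR
try:
    dp.parallel_operation(privileged=False)    # issued by a user program: rejected
except PrivilegeError as exc:
    print("trap:", exc)
```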

FIG. 22 shows an example of the OS control flow for changing from the parallel operation mode to the multiprocessor operation mode. When the program running in the parallel operation mode ends or pauses (X1), the OS is started (X2). Within the OS, scheduling (X3) selects the next program to run; if there is none, an idle loop waits until a new program becomes available. The OS then examines the degree of parallelism of that program (X4). If the degree of parallelism is high, parallel operation is efficient, so the OS exits while remaining in the parallel operation mode (X5). If the degree of parallelism is low, it is more efficient to run several programs in parallel in the multiprocessor operation mode, so the OS executes the multiprocessor operation instruction (X6) and lets each processor element operate independently. The program for the first processor element was already selected in scheduling X3, so that element exits the OS directly (X7). The second and subsequent processor elements each perform a new scheduling step (X8) to select a program to run.
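The decision in FIG. 22 can be sketched as a single scheduler step. The numeric threshold separating "high" from "low" parallelism, the ready-queue representation, and the program names are all hypothetical; the source states only that the OS examines the degree of parallelism and chooses a mode accordingly.

```python
from collections import deque


def os_entry_parallel_mode(ready, n_pes, threshold):
    """Sketch of FIG. 22: the OS runs after a parallel-mode program ends (X1, X2).

    ready     -- deque of (program_name, degree_of_parallelism) pairs
    n_pes     -- number of processor elements
    threshold -- hypothetical cut-off between "high" and "low" parallelism
    Returns (mode, {pe_index: program_name}); PE index 0 is the first PE.
    """
    if not ready:
        return "idle", {}                     # X3 finds nothing: the idle loop would wait
    name, degree = ready.popleft()            # scheduling X3
    if degree >= threshold:                   # X4: high parallelism
        # X5: stay in parallel mode; all PEs execute this one program in lockstep
        return "parallel", {pe: name for pe in range(n_pes)}
    # X6: multiprocessor operation instruction; PE 0 keeps the X3 choice (X7)
    assignment = {0: name}
    for pe in range(1, n_pes):                # X8: the other PEs schedule for themselves
        if ready:
            assignment[pe] = ready.popleft()[0]
    return "multiprocessor", assignment


ready = deque([("matrix_kernel", 8), ("editor", 1), ("shell", 1), ("daemon", 1)])
print(os_entry_parallel_mode(ready, n_pes=2, threshold=4))
# ('parallel', {0: 'matrix_kernel', 1: 'matrix_kernel'})
print(os_entry_parallel_mode(ready, n_pes=2, threshold=4))
# ('multiprocessor', {0: 'editor', 1: 'shell'})
```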

FIG. 23 shows an example of the OS control flow for changing from the multiprocessor operation mode to the parallel operation mode. Suppose the program running on the first of the processor elements has finished (Y1). As in FIG. 22, the OS is started (Y2) and scheduling is performed (Y3), after which the degree of parallelism of the next program is examined (Y4). If the degree of parallelism is low, the OS exits while remaining in the multiprocessor operation mode (Y5). If it is high, the parallel operation mode is more efficient, so the OS executes the parallel operation instruction (Y6) and the device moves to the parallel operation mode. Because the mode cannot be changed to parallel operation while another processor element is still executing a program, the parallel operation instruction is executed only after waiting for those programs to finish (Y7). Once the programs running on the other processor elements have ended or paused, the parallel operation instruction is executed.
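FIG. 23 differs from FIG. 22 mainly in step Y7: the OS on the first element must wait for the other elements' programs before executing the parallel operation instruction. In the sketch below, threads stand in for those programs and `Thread.join` stands in for waiting until they end (pausing is not modeled); the threshold, the callback, and all names are hypothetical.

```python
import threading
import time


def os_entry_multiprocessor_mode(degree, threshold, other_pes, execute_parallel_instruction):
    """Sketch of FIG. 23, run by the OS on the first PE after its program ends (Y1-Y3).

    other_pes -- threads standing in for programs still running on the other PEs
    threshold -- hypothetical cut-off between "high" and "low" parallelism
    """
    if degree < threshold:                    # Y4: low parallelism
        return "multiprocessor"               # Y5: leave the OS, mode unchanged
    for pe in other_pes:                      # Y7: the mode cannot change while other
        pe.join()                             #     PEs are still running programs
    execute_parallel_instruction()            # Y6: set flag A7 to parallel operation
    return "parallel"


log = []


def run_program(name, seconds):
    time.sleep(seconds)                       # stand-in for real work
    log.append(f"{name} finished")


others = [threading.Thread(target=run_program, args=(f"PE{i}", 0.01 * i)) for i in (2, 3)]
for t in others:
    t.start()
mode = os_entry_multiprocessor_mode(8, 4, others, lambda: log.append("flag A7 -> parallel"))
print(mode, log[-1])  # parallel flag A7 -> parallel
```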

The degree of parallelism of a program referred to here can be determined when the source program is translated into the object program. A compiler or similar tool can therefore embed the degree of parallelism in the object program, allowing the OS to recognize it.
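One way a compiler could carry the degree of parallelism to the OS is a field in the object program's header. The layout below (magic bytes, version, a 16-bit parallelism field) is entirely hypothetical and serves only to illustrate recording the value at translation time and reading it back at scheduling time.

```python
import struct

# Hypothetical object-program header: 4-byte magic, 2-byte format version,
# 2-byte degree of parallelism, little-endian. Invented for illustration only.
HEADER = struct.Struct("<4sHH")
MAGIC = b"OBJP"


def emit_object(code: bytes, parallelism: int, version: int = 1) -> bytes:
    """Compiler side: prepend the degree of parallelism found during translation."""
    return HEADER.pack(MAGIC, version, parallelism) + code


def read_parallelism(image: bytes) -> int:
    """OS side: recover the degree of parallelism before the X4/Y4 decision."""
    magic, _version, parallelism = HEADER.unpack_from(image)
    if magic != MAGIC:
        raise ValueError("unrecognized object program")
    return parallelism


image = emit_object(b"\x90" * 16, parallelism=8)
print(len(image), read_parallelism(image))  # 24 8
```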

[Effects of the Invention]

By using the data processing device of the present invention, a computer system configuration best suited to a program can be realized according to the degree of parallelism inherent in that program. That is, a program with a high internal degree of parallelism is executed in parallel operation, while programs with a low internal degree of parallelism that can nevertheless be run concurrently are executed in multiprocessor operation. This raises the utilization of the processor elements and, as a result, can improve performance compared with a multiprocessor computer system or a superscalar computer system.

Moreover, multiprocessor computer systems rarely provide dedicated hardware for synchronization and data transfer between processor elements, and even when such hardware is provided, speed is not a major design priority. The present invention takes advantage of the fact that the processor elements are placed close to one another for parallel operation, enabling high-speed synchronization and data communication between processor elements even during multiprocessor operation. As a result, even when several programs operate in cooperation with one another, their execution can readily be accelerated.

[Brief description of the drawings]

FIGS. 1 and 2 are overall configuration diagrams of the data processing device of the present invention; FIG. 3 is an internal configuration diagram of a processor element; FIGS. 4 and 5 are internal configuration diagrams of the cache control device; FIG. 6 is a diagram showing the functions of the multiprocessor operating mechanism; FIG. 7 is an internal configuration diagram of the multiprocessor operating mechanism for performing the exclusive control function; FIG. 8 is an internal configuration diagram of the multiprocessor operating mechanism for executing coordination instructions between processor elements; FIG. 9 shows the formats of the coordination instructions between processor elements; FIGS. 10, 12, and 15 are internal configuration diagrams of the synchronization facility; FIGS. 11 and 13 are internal configuration diagrams of the bus arbitration circuit; FIG. 14 is an internal configuration diagram of the priority order determination circuit; FIGS. 16 to 20 are timing charts for execution of register transfer instructions; FIG. 21 is a timing chart for execution of the stop and restart instructions; and FIGS. 22 and 23 are OS control flowcharts for switching between the multiprocessor operation mode and the parallel operation mode.

A1: processor; A2, A3: processor elements; A5: cache storage device; A6: cache control device; A7: parallelization flag; A8: multiprocessor operating mechanism; A9, A10: parallel operating mechanisms; B1: instruction cache; B2: data cache; D1 to D4: instruction registers; D5 to D8: decoders; D11 to D14: arithmetic units; D15, D16: register files; H1, H2: synchronization facilities; H3: bus arbitration circuit; h1: communication bus; I1: synchronization register file; I2: read address register; I3: write address register; J1: oscillator; J2: priority order determination circuit; T1: stop address register.

Continuation of the front page: (56) References cited: JP-A-1-321549 (JP, A); JP-A-59-16071 (JP, A). (58) Fields searched (Int. Cl.⁶, DB name): G06F 15/16, 15/80

Claims

(57) [Claims]

1. A data processing device comprising n processor elements, a basic clock for operating the n processor elements, a cache storage device shared by the n processor elements, and a memory device, the data processing device further comprising: a multiprocessor operating mechanism for operating the n processor elements independently; a parallel operating mechanism for operating the n processor elements simultaneously in synchronization with the basic clock; a parallelization flag indicating the operating state of the data processing device; a mechanism that causes the data processing device to perform either the multiprocessor operation or the parallel operation according to the parallelization flag; and a cache control mechanism that changes the data read/write width and the number of data reads/writes of the cache storage device according to the parallelization flag.

2. The data processing device according to claim 1, wherein the parallel operating mechanism, in accordance with the basic clock, simultaneously reads n instructions arranged sequentially in the memory device or the cache storage device, supplies the n instructions simultaneously to the n processor elements, and causes the n processor elements to execute simultaneously in synchronization, thereby executing the n instructions in parallel.

3. The data processing device according to claim 1, wherein each of the n processor elements includes m instruction registers, m instruction decoders, m arithmetic units, and a multi-port register file having at least m input ports and 2 × m output ports, and executes m instructions in parallel.

4. The data processing device according to claim 3, wherein the parallel operating mechanism, in accordance with the basic clock, simultaneously reads n × m instructions arranged sequentially in the memory device or the cache storage device, supplies the n × m instructions simultaneously to the n processor elements in units of m instructions, and causes the n processor elements to execute simultaneously in synchronization, thereby executing n × m instructions in parallel.

5. The data processing device according to claim 1, wherein each of the n processor elements has a program counter pointing to the addresses of m consecutively arranged k-bit instructions, and the cache control device, when the parallelization flag indicates the multiprocessor operation, sets the cache storage device to a mode with n address inputs and n outputs of m × k bits each, in which m consecutive k-bit instructions are read individually from the address pointed to by the program counter of each of the n processor elements, and, when the parallelization flag indicates the parallel operation, sets the cache storage device to a mode with one address input and one output of n × m × k bits, in which n × m consecutive instructions are read.

6. The data processing device according to claim 1, wherein the multiprocessor operating mechanism has a communication bus for transferring data between the n processor elements, a bus arbitration circuit for managing the communication bus, and n synchronization facilities for synchronizing the processor elements.

7. The data processing device according to claim 1, wherein the multiprocessor operating mechanism has, for each of the n processor elements, a reference control signal line and a reference address line for the memory device, and performs control such that, when the memory device is referenced from a plurality of processor elements, the right to execute the reference is given to only one processor element for reference requests to the same address, and is given simultaneously for reference requests to different addresses.

8. The data processing device according to claim 1, having two types of configuration control instructions: a parallel operation instruction that sets the parallelization flag to the parallel operation, and a multiprocessor operation instruction that sets the parallelization flag to the multiprocessor operation.