JPH052568A

JPH052568A - Inter-processor synchronizing processor

Info

Publication number: JPH052568A
Application number: JP23453591A
Authority: JP
Inventors: Masatsugu Kametani; 雅嗣亀谷
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1990-09-14
Filing date: 1991-09-13
Publication date: 1993-01-08
Anticipated expiration: 2015-02-14
Also published as: JP3008223B2

Abstract

PURPOSE: To provide an inter-processor synchronizing processor that minimizes the synchronization overhead between tasks or processors arising from parallel processing on a general-purpose multiprocessor. CONSTITUTION: The inter-processor synchronizing processor consists of three parts. The first is a synchronizing processing circuit 101, which performs the inter-processor synchronization required for parallel processing. The second is a plurality of processing units 100n (n = 0, 1, 2, ...), each of which executes its share of the tasks. The third holds and provides the shared data needed to transfer data among the processors 1n in the processing units. It is either a data sharing circuit CSYS 102 connected to a system bus 120, or local data sharing circuits CSYS 51n contained in the individual processing units, which hold that data while maintaining its coherence.

Description

Detailed Description of the Invention

[0001]

FIELD OF THE INVENTION: The present invention relates to a multiprocessor system. More particularly, it relates to an inter-processor synchronizing device suitable for synchronizing processors with one another.

[0002]

DESCRIPTION OF THE RELATED ART: Conventional synchronization in a multiprocessor is basically synchronization between tasks (ordering of processing). Tasks are started in a task-driven or data-driven manner, and processing proceeds from there.

To implement this on a general-purpose multiprocessor, one of two methods is used:

- Task end flags are held in shared memory. Each task checks whether all of the preceding tasks it requires have finished, so that processing advances in a data-flow manner.
- A token control scheme is used.

Conventional synchronization processing is described in "Multi-Microprocessor System," pp. 117-119 and pp. 119-122, Tetsugaku Shuppan, November 1984.
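The conventional software scheme can be sketched as follows. This is a minimal sketch with several assumptions: Python threads stand in for processors, and a dictionary stands in for the task end flags in shared memory. All names (`task_done`, `run_task`, `demo`) are hypothetical. The polling loop is the software check whose cost is criticized in [0003] and [0004].

```python
import threading

# Hypothetical model of the conventional scheme: task end flags live in
# "shared memory" (a dict) and are polled in software. Threads stand in
# for processors.
_flags_lock = threading.Lock()
task_done: dict[str, bool] = {}


def mark_done(task: str) -> None:
    """Set this task's end flag in shared memory."""
    with _flags_lock:
        task_done[task] = True


def predecessors_done(preds: list[str]) -> bool:
    """Software check: have all required preceding tasks finished?"""
    with _flags_lock:
        return all(task_done.get(p, False) for p in preds)


def run_task(name: str, preds: list[str], log: list[str]) -> int:
    """Spin on the predecessors' flags, then run the task body.

    Returns how many polls were spent doing no useful work.
    """
    polls = 0
    while not predecessors_done(preds):
        polls += 1          # processor is occupied by the synchronization check
    log.append(name)        # stands in for the task body
    mark_done(name)
    return polls


def demo() -> list[str]:
    task_done.clear()
    log: list[str] = []
    # Processor 1 starts task B, which depends on task A.
    p1 = threading.Thread(target=run_task, args=("B", ["A"], log))
    p1.start()
    # Processor 0 runs task A, which has no predecessors.
    run_task("A", [], log)
    p1.join()
    return log


print(demo())  # ['A', 'B']: B never starts before A's flag is set
```

While processor 1 spins, it executes instructions but makes no progress on its own work. This is the idle time that the invention aims to reduce.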

[0003]

PROBLEMS TO BE SOLVED BY THE INVENTION: In all of the above prior art for general-purpose multiprocessor systems, a large share of the processing is done in software, and there are many items to check. As a result, the synchronization overhead is large, whether for synchronization between tasks (enforcing the precedence relationships among task processing) or between processors. Tasks therefore cannot be subdivided finely enough. In addition, the order in which tasks are processed in parallel is constrained more than necessary. The parallelism of a job cannot be fully exploited, which makes highly efficient parallel processing difficult to achieve.

[0004] Furthermore, consider a multiprocessor system in which a limited number of processors handle a large number of tasks. Conventionally, when a processor finishes one task and moves on to the next, a synchronization step is required. It confirms that the other required tasks have finished and that all required results are available. During that step the processor is occupied by the synchronization check. Idle time therefore arises, during which the processor performs no useful work until synchronization completes.

[0005] An object of the present invention is to provide an inter-processor synchronizing device that minimizes the synchronization overhead between tasks or between processors that accompanies parallel processing on a general-purpose multiprocessor.

[0006]

MEANS FOR SOLVING THE PROBLEMS: The above object is achieved by providing the following:

- a plurality of processors that share a plurality of tasks and process them in parallel;
- a data sharing circuit that shares the data exchanged among the processors;
- a synchronization processing circuit that synchronizes the processors;
- a local synchronization circuit that controls competing access requests from each of the processors to the data sharing circuit.

[0007] The above object is also achieved as follows.

The synchronization processing circuit receives the task end information output by each processor that has finished a task. Once the task end information from all of the processors handling related tasks has changed to active, the circuit outputs an active value of synchronization end information to the corresponding processors. This value indicates that the processors to be synchronized have finished their task processing.

Suppose a processor that has output task end information then accesses the data sharing circuit. If the corresponding synchronization end information is not yet being output as an active value, the local synchronization circuit inhibits the access and suspends the operation of that processor.

[0008] The above object is also achieved by providing the synchronization processing circuit with synchronization means. During parallel processing, these means form a group of the processors that handle related tasks. They then synchronize the processors within the group, or synchronize the groups with one another.

[0009] The above object is also achieved when the synchronization means comprises the following:

- a synchronization register that stores information, output by one processor in the group, corresponding to the names of the processors that make up the group;
- a flip-flop triggered by an access to the synchronization register;
- a first transmission circuit that transmits the state of the flip-flop to the other processors;
- a judgment circuit that compares the transmitted flip-flop states with the contents of the synchronization register, and judges whether each processor of the group stored in the register is active;
- a second transmission circuit that transmits the judgment result to the one processor in the group via the flip-flop.

[0010] The above object is also achieved when the synchronization means comprises the following:

- a synchronization register that stores information, output by one processor in the group, corresponding to the names of the processors that make up the group;
- a flip-flop triggered by an access to the synchronization register;
- first signal transmission means that sends, from the processor to the flip-flop, a trigger signal causing the flip-flop to output task end information;
- second signal transmission means that sends, from the processor to the synchronization register, a trigger signal causing the register to store a group of processors;
- third signal transmission means that sends, from the processor to the synchronization register, an active signal. This signal sets the register so that all processors are regarded as belonging to the group.

[0011] The above object is also achieved in an inter-processor synchronization processing device that synchronizes a plurality of processors and controls parallel processing without inconsistency. The device has three means:

- means, provided for each processor, that outputs active task end information when that processor finishes a task;
- means that, when all of the task end information from the processors to be synchronized has become active, uses that information to make each processor's active task end information inactive again;
- means by which a processor checks the task end information to confirm that synchronization has been completed.

[0012] The above object is also achieved by providing means by which each processor issues two instructions. The EO instruction directs output of the active task end information. The W instruction confirms that synchronization has been completed.

[0013] The above object is also achieved by providing means that operate as follows. Consider the end of a task on one processor. Other processors finish tasks at the same level. If none of their tasks has a relation to the task this processor will execute next, the EO instruction is inserted.

When the processor later finishes a task and moves to the next one, it may not need information from the other processors executed at the same level. In that case it executes the EO instruction and proceeds unconditionally to the next task.

[0014] The above object is also achieved by providing means that operate as follows. Consider a processor that executed the EO instruction at the previous synchronization level and moved on to its current task. At the end of that task, the next task may have a relation with tasks executed by other processors that finish at the same level.

In that case the processor executes a W instruction, which is the synchronization check instruction corresponding to the EO instruction executed at the previous level. This confirms that synchronization at the previous level is complete. The processor then performs synchronization at the current level and proceeds to the next task.
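The EO/W placement rule of [0013] and [0014] can be sketched as a small decision function. This is a hedged sketch only, and it rests on three assumptions. First, a "relation" is taken to be a data dependency of the next task on results produced by other processors at the same level. Second, "performs synchronization at the current level" is taken to mean EO followed by W. Third, the names `boundary_instructions` and `schedule` are hypothetical.

```python
# Hypothetical sketch of the EO / W placement rule of [0013] and [0014].
# At each task boundary, decide which synchronization instructions one
# processor emits.

def boundary_instructions(prev_eo_unchecked: bool,
                          next_task_needs_peers: bool) -> list[str]:
    if not next_task_needs_peers:
        # [0013]: no relation to other processors' tasks at this level,
        # so announce the task end and continue unconditionally.
        return ["EO"]
    seq = []
    if prev_eo_unchecked:
        # [0014]: first confirm that the previous level has synchronized.
        seq.append("W")
    # Current-level synchronization, assumed here to be EO followed by W.
    seq += ["EO", "W"]
    return seq


def schedule(needs_peers: list[bool]) -> list[list[str]]:
    """Instructions emitted at each boundary of one processor's task list."""
    out = []
    unchecked = False
    for needs in needs_peers:
        seq = boundary_instructions(unchecked, needs)
        out.append(seq)
        # An EO not followed by W leaves that level's check pending.
        # Consecutive EO-only boundaries are not covered by the text;
        # this sketch adds no W for them.
        unchecked = seq[-1] == "EO"
    return out


print(schedule([False, True, True]))
# [['EO'], ['W', 'EO', 'W'], ['EO', 'W']]
```

The deferred W at the second boundary is what lets the processor run its intermediate task without waiting for the first level to synchronize.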

[0015]

OPERATION: The configuration of the present invention exploits one fact: on a general-purpose multiprocessor, the number of tasks processed at any one time cannot exceed the number of processors. This allows every synchronization problem in parallel processing to be reduced to the problem of synchronizing processors. Synchronization among this finite set of processors is implemented in hardware. The invention also uses information generated when a processor accesses the data sharing circuit. Together, these minimize software overhead and reduce the idle time that arises during synchronization.

[0016] To execute parallel processing consistently and efficiently while the processors are synchronized, one bit is provided for each of the finite number of processors.

At the end of a task, a processor performs task end processing. It writes a bit string (word data) into the synchronization register as task end information; in this word, the bits for the processors executing related tasks are set active. It also activates its own task end line.

This information is input to the judgment means. The judgment means checks whether every processor named by a set bit in the synchronization register has completed task end processing, by comparing those bits with each processor's task end line. If all are true (all of those tasks have finished), synchronization is regarded as achieved and synchronization end information is issued to the processor.

The synchronization processing circuit implements this synchronization in hardware, to the extent that doing so does not sacrifice generality.

[0017] In addition, the following data sharing circuit and local synchronization circuit are provided alongside the synchronization processing circuit described above. Their purpose is to reduce the idle time (slack time) of processors during synchronization periods, automatically and without overhead.

[0018] The data sharing circuit is accessible from every processor. It holds or provides the data that must be shared among processors for their task processing.

[0019] Consider a processor that has finished a task and sent task end information to the synchronization processing circuit. It moves unconditionally to its next task. When it first tries to obtain required shared data from the data sharing circuit, the synchronization end information may not yet have been issued (it is still inactive). In that case, the local synchronization circuit holds the access pending and makes the processor wait.
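The behavior of the local synchronization circuit in [0019] can be modeled in software with an event flag. This is a minimal sketch under stated assumptions. A `threading.Event` stands in for the synchronization end information (TESTn / SYNCOK), and a dictionary stands in for the data sharing circuit. The class and method names are hypothetical, and the model gates every shared read rather than tracking which access is the first.

```python
import threading


class LocalSyncGate:
    """Hypothetical model of the local synchronization circuit of [0019]."""

    def __init__(self, shared: dict):
        self._sync_ok = threading.Event()
        self._sync_ok.set()            # assume no synchronization pending at reset
        self._shared = shared

    def end_task(self) -> None:
        """Processor outputs task end information; sync end info goes inactive."""
        self._sync_ok.clear()

    def sync_done(self) -> None:
        """Synchronization processing circuit asserts sync end information."""
        self._sync_ok.set()

    def read_shared(self, key):
        """Shared-data access is held pending until sync end info is active."""
        self._sync_ok.wait()
        return self._shared[key]


shared: dict = {}
gate = LocalSyncGate(shared)
gate.end_task()   # task finished; the processor moves on without waiting
# ... the next task's local (non-shared) work would run here ...

result = []
t = threading.Thread(target=lambda: result.append(gate.read_shared("x")))
t.start()                  # first shared access: blocks
t.join(timeout=0.1)
print(t.is_alive())        # True: still waiting for synchronization
shared["x"] = 42           # a peer processor's result becomes available
gate.sync_done()
t.join()
print(result)              # [42]
```

In the hardware, the same effect comes from withholding the READY signal so that the processor's bus cycle does not end. No software polling is involved.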

[0020] Consider the case where a task has finished but synchronization has not. Conventionally, the processor is made to wait unconditionally, which produces idle time. Here, instead, the processor advances its next task as far as possible until that task needs shared data. Idle time is reduced accordingly.
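The saving described in [0020] can be written as simple arithmetic. Suppose a processor finishes its task at time t_end, synchronization completes at t_sync, and the next task can run for w time units before it first touches shared data. Conventional idle time is then max(0, t_sync - t_end), and deferred idle time is max(0, t_sync - (t_end + w)). The timings below are hypothetical.

```python
def idle_conventional(task_end: float, sync_end: float) -> float:
    """Idle time when the processor waits at the end of its task."""
    return max(0.0, sync_end - task_end)


def idle_deferred(task_end: float, sync_end: float, local_work: float) -> float:
    """Idle time when the processor first runs `local_work` units of its next
    task that need no shared data, and waits only at its first shared access."""
    return max(0.0, sync_end - (task_end + local_work))


# Hypothetical timings (arbitrary units): task ends at 10, synchronization at 16.
print(idle_conventional(10.0, 16.0))       # 6.0
print(idle_deferred(10.0, 16.0, 4.0))      # 2.0
print(idle_deferred(10.0, 16.0, 8.0))      # 0.0: the wait is fully hidden
```

The reduction equals min(conventional idle time, w). The more work a task can do before its first shared access, the more of the synchronization wait is hidden.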

[0021] Furthermore, suppose each processor's software only sets a value in the synchronization register and everything else is done in hardware. Inter-processor synchronization is then programmable (general-purpose), and its software overhead is only about one machine instruction.

[0022] The means for reducing idle time during synchronization periods also shortens the critical path of the parallel processing itself. More efficient parallel processing is therefore achieved automatically.

[0023] Two mechanisms are combined. The first is control flow, in which synchronization instructions are placed in the software. The second is data-flow-like automatic tuning by hardware, which uses the access conditions to the data sharing circuit described above. Together they minimize processor idle time further, shortening idle time during parallel processing.

[0024]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS: The overall configuration of the present invention is described below with reference to FIGS. 1 to 3.

[0025] The present invention comprises three parts:

- a synchronization processing circuit 101, which performs the inter-processor synchronization required for parallel processing;
- a plurality of processing units 100n (n = 0, 1, 2, ...), which share and execute the processing of the tasks;
- a means of providing and holding the shared data needed to exchange data among the processors 1n (n = 0, 1, 2, ...) in the processing units. This is either a data sharing circuit CSYS 102 on the system bus l20, or local data sharing circuits CSYS 51n (n = 0, 1, 2, ...) located in the individual processing units and kept coherent with one another.

The synchronization processing circuit 101 receives a synchronization request Sln (n = 0, 1, 2, ...) from the processor 1n of each processing unit, at the corresponding input SYNCnREQ. When all or some of these requests become active, the circuit regards those processors as synchronized. It then activates the synchronization check signal TESTn for each processor to be notified. The active TESTn is supplied as synchronization end information over the Test signal line Tln. It goes to the signal control circuit 50n in the processing unit 100n and to the TEST input of the processor 1n. The processor 1n checks the state of its TEST input, in hardware or software, to determine whether the processors executing the related processing have been synchronized.

The basic function of the synchronization processing circuit is as follows. When the circuit 101 accepts a synchronization request Sln from the processor 1n, it first returns the TESTn output to inactive. When the requests Sl from all of the processors that must be synchronized have become active, it sets TESTn active again. A specific embodiment of the synchronization processing circuit 101 is described in detail later with reference to FIGS. 3 and 4.

Each processor 1n (n = 0, 1, 2, ...) outputs an active Sln to the synchronization processing circuit 101 as task end information when its task processing has finished. From then until the corresponding TESTn output turns active, the data from other processors needed for the next task is regarded as not yet fully available in the data sharing circuits 102 and 51n (n = 0, 1, 2, ...). Therefore, if the following conditions are satisfied, parallel processing can proceed with the processors synchronized, without inconsistency, just as in the conventional approach.
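The basic function of circuit 101 described in [0025] can be modeled as a small state machine. This is a behavioral sketch, not the circuit of FIGS. 3 and 4. Several assumptions apply:

- bit i of `req` models SYNCiREQ (driven by Sli), and bit i of `test` models TESTi;
- a single group mask is shared by all processors;
- every TEST output is active at reset;
- requests are cleared once the group has synchronized, following [0011].

The class and method names are hypothetical.

```python
class SyncCircuit:
    """Hypothetical behavioural model of synchronization processing circuit 101."""

    def __init__(self, n_processors: int, group: int):
        self.group = group
        self.req = 0
        self.test = (1 << n_processors) - 1   # assumed: all TEST outputs active at reset

    def sync_request(self, p: int) -> None:
        """Processor p raises its synchronization request (task end information)."""
        bit = 1 << p
        self.req |= bit
        self.test &= ~bit                     # TESTp returns to inactive on acceptance
        if (self.req & self.group) == self.group:
            self.test |= self.group           # all required requests active: TEST active again
            self.req &= ~self.group           # requests consumed (cf. [0011])

    def test_active(self, p: int) -> bool:
        return bool((self.test >> p) & 1)


sc = SyncCircuit(4, group=0b0111)
sc.sync_request(0)
sc.sync_request(1)
print([sc.test_active(p) for p in range(4)])  # [False, False, True, True]
sc.sync_request(2)
print([sc.test_active(p) for p in range(4)])  # [True, True, True, True]
```

Processor 3 lies outside the group, so its TEST output is never disturbed by the group's requests.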

[0026] 1) Active synchronization end information is output to the processing units and processors that issued task end information. Here, task end information means the information input on Sln to SYNCnREQ of the synchronization processing circuit. Synchronization end information means the information output from TESTn and transmitted to the processing unit over Tln. This output occurs only after the task end information from all processors executing the related tasks has turned active, that is, after all required task processing at that level has finished. Until the synchronization end information turns active, the results of the tasks executed at that level are assumed not to be all available.

[0027] 2) A processor eventually moves on to the task processing to be executed at the next level. It then accesses the data sharing circuit 102 or 51n for the first time; this circuit holds the shared portion of the results produced by other processors. By that first access, the processor must already have confirmed that the synchronization end information is active. If the information is still inactive at that point, the access to the data sharing circuit must be held pending until it turns active.

[0028] 3) Here, a level boundary is defined as the point at which a processor issues active task end information. That is, the break between successive units of processing defined as tasks is a level boundary. The point at which active synchronization end information is issued from the synchronization processing circuit 101 is defined as a synchronization level boundary.

[0029] To run parallel processing as efficiently as possible under the above conditions, the idle time produced by synchronization must be minimized. This is the slack time of a processor that reaches the synchronization point early and must wait until synchronization completes. To this end, each processor should advance the processing assigned to it at the next level as far ahead as possible.

Concretely, during task processing the processor ignores the synchronization end information and continues its task until it first accesses the data sharing circuit. Only at that first access does it confirm that the synchronization end information is active. Active means that the other processors have finished all necessary task processing and that all required shared information is present in the data sharing circuit. In other words, each processor postpones its own synchronization point as late as possible and advances its processing as far as it can.

The tasks appear to have been produced by a fixed partitioning, but this method varies their processing times dynamically according to the synchronization conditions. This shortens the critical path of the parallel processing itself. It also yields an information-flow style of parallel processing that automatically achieves higher efficiency. The conventional method depends only on the synchronization end information. This method, by contrast, performs synchronization in close coordination with accesses to the data sharing circuit 102 or 51n.

The hardware is described in detail below, taking as examples the processing unit 100n of FIG. 1 and the signal control circuit 50n in the processing unit, shown in detail in FIG. 2.

[0030] The signal control circuit 50n in the processing unit 100n receives control signals l9n, including address signals, from the processor 1n. It decodes them to generate an access request signal l3n for accessing the shared system CSYS 51n or the data sharing circuit bus l12. It then sends this request to the arbitration circuit DC 103, which decides which processor receives the right to access the data sharing circuit and the data sharing circuit bus l20. The arbitration circuit DC 103 receives the access request signals l3n (n = 1, 2, ...) sent from the processing units and selects one of them. Among the access permission signals l4n (n = 0, 1, 2, ...), it activates the one corresponding to the selected processing unit.

When the access permission signal l4n becomes active, the signal control circuit 50n turns on the output buffer 52n. It then outputs address and control signals onto the data sharing circuit bus l20 over the bus line l5n, together with the data signals in the case of a write. In the case of a read, data on the data sharing circuit bus is either written into the local data sharing circuit CSYS 51n via the bus line l5n and the input buffer 53n, or read directly by the processor 1n via the bus line l10n. When the processor 1n reads information from the local data sharing circuit CSYS 51n, it does so via the bus line l7n.

The local data sharing circuits CSYS 51n (n = 0, 1, 2, ...) in the processing units 100n (n = 0, 1, 2, ...) must always hold identical contents. For a read, therefore, the processor 1n can access its local data sharing circuit CSYS 51n like a local memory, independently of the other processors. A write, however, must reach the local data sharing circuits CSYS 51n (n = 0, 1, 2, ...) in all processing units via the data sharing circuit bus l20. The write must therefore occupy the bus l20 under the control of the arbitration circuit 103.

Such a write can conflict with reads from the local data sharing circuit CSYS 51n, which each processing unit performs independently. The signal control circuit 50n also applies contention control to these conflicts, so that all accesses proceed without inconsistency.
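The replicated local data sharing circuits of [0030] behave like a write-broadcast memory. The sketch below models that behavior under these assumptions:

- each processing unit holds its own copy;
- a `threading.Lock` stands in for the arbitration circuit 103 together with the bus l20;
- a write updates every copy while holding the lock;
- for simplicity, reads take the same lock. This is a simplification: it over-serializes reads compared with the hardware, where local reads run in parallel and are held off only while a write is in progress (the CSBUSY condition).

The class name is hypothetical.

```python
import threading


class ReplicatedSharedMemory:
    """Hypothetical model of the local data sharing circuits CSYS 51n."""

    def __init__(self, n_units: int, size: int):
        self.copies = [[0] * size for _ in range(n_units)]
        self._bus = threading.Lock()          # stands in for arbitration + bus l20

    def write(self, addr: int, value: int) -> None:
        with self._bus:                       # bus granted to one unit at a time
            for copy in self.copies:          # broadcast keeps every copy identical
                copy[addr] = value

    def read(self, unit: int, addr: int) -> int:
        with self._bus:                       # held off while any write is in progress
            return self.copies[unit][addr]    # served from the unit's own copy


mem = ReplicatedSharedMemory(n_units=3, size=8)
writers = [threading.Thread(target=mem.write, args=(a % 8, a)) for a in range(16)]
for w in writers:
    w.start()
for w in writers:
    w.join()
print(all(copy == mem.copies[0] for copy in mem.copies))  # True
print(mem.read(2, 5) == mem.read(0, 5))                   # True
```

The final value at each address depends on the order in which the writers won arbitration. Every unit's copy, however, sees the same order, which is the coherence property the text requires.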

【００３１】[0031]

FIG. 2 shows an embodiment of the signal control circuit 100n. The decoder 200 decodes the address and control lines l9n from processor 1n and generates the control signals, enable signals, and so on that are needed when processor 1n requests access to the data sharing circuits CSYS 51n and 102. The data sharing circuit bus access request signal CSREQ (active Lo) l3n sent to the arbitration circuit 103 is generated by ORing the data sharing circuit enable signal CSEN (active Lo) 213 from decoder 200 with the synchronization end information SYNCOK (active Lo) Tln from the synchronization processing circuit. The acknowledge signal l4n from arbitration circuit 103 consists of the CSACK signal 209, which is the direct response of arbitration circuit 103 to CSREQ l3n (Lo means the access request has been accepted), and the CSBUSY signal (active Lo), which indicates that the common system CSYS 51n (n = 0, 1, 2, ...) is occupied by a write from one of the processing units. The RDYACK signal (active Hi) 206 is held inactive (Lo) while CSBUSY 210 is active, so that when processor 1n reads from its own data sharing circuit CSYS 51n, the read cannot collide with a write into CSYS 51n from another processing unit. That is, while a write to CSYS 51n is in progress, the output of NAND gate 208 goes inactive (Hi) and deasserts the read enable signal CSRD (active Lo) for CSYS 51n.

For example, suppose the decoder 200 asserts CSEN 213 to access the data sharing circuit 102. If SYNCOK Tln is inactive at that moment, OR gate 212 keeps CSREQ l3n from becoming active for as long as SYNCOK Tln stays inactive, so the CSACK signal 209 also remains inactive. The NOR gate 202, which receives CSACK 209 and CSREQ l3n, is therefore held at Lo, and decoder 200 keeps CSEN 213 at Lo until the output of NOR gate 202 turns Hi. Likewise, the BFON signal l2n, which controls the output of output buffer 52n, also follows the output of NOR gate 202 and stays inactive (Hi), so access to the data sharing circuit bus l20 waits until SYNCOK Tln becomes active. Processor 1n is suspended by holding the READY signal (active Lo) l1n inactive so that the processor's bus cycle cannot complete. AND gate 211 drives READY l1n active when either CSRD l8n or BFON l2n becomes active, and the processor then proceeds to its next operation. This is called the local synchronization function. Similarly, the signal CSRD l8n used to access the shared system CSYS 51n is gated by SYNCOK Tln, which inverter 207 converts to active-Hi and feeds into the NAND gate; while SYNCOK Tln is inactive, CSRD l8n and READY l1n are held inactive, which suspends access to the data sharing circuit CSYS 51n.
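The gating described above can be checked with a small level-by-level sketch. It is a minimal model only, assuming that active-Lo signals are booleans (False = Lo = asserted), a hypothetical arbiter that acknowledges any pending request while the bus is free (`arbiter_free`), a hypothetical active-Hi `local_read` decode for reads of CSYS 51n, and that BFON l2n is simply the inverse of NOR gate 202's output; the text does not specify these details.

```python
from dataclasses import dataclass

LO, HI = False, True  # electrical levels; most signals below are active-Lo


@dataclass
class Levels:
    csreq: bool
    csack: bool
    bfon: bool
    csrd: bool
    ready: bool


def signal_control(csen: bool, syncok: bool, arbiter_free: bool,
                   local_read: bool, csbusy: bool) -> Levels:
    # OR gate 212: CSREQ l3n is asserted (Lo) only when CSEN 213 and SYNCOK Tln are both Lo.
    csreq = csen or syncok
    # Hypothetical arbiter: acknowledge (Lo) a pending request whenever the bus is free.
    csack = LO if (csreq == LO and arbiter_free) else HI
    # NOR gate 202: output Hi only when CSACK 209 and CSREQ l3n are both Lo.
    nor202 = not (csack or csreq)
    # Assumption: BFON l2n (active Lo) is the inverse of NOR gate 202's output.
    bfon = not nor202
    # RDYACK 206 (active Hi) drops to Lo while CSBUSY 210 (active Lo) is asserted.
    rdyack = csbusy
    # NAND gate 208: CSRD l8n asserted (Lo) only for a decoded local read, with no write
    # in progress, and SYNCOK asserted (inverted to active-Hi by inverter 207).
    csrd = not (local_read and rdyack and (not syncok))
    # AND gate 211: READY l1n asserted (Lo) when either CSRD or BFON is Lo.
    ready = csrd and bfon
    return Levels(csreq, csack, bfon, csrd, ready)
```

With CSEN asserted but SYNCOK still inactive, READY stays Hi and the bus cycle is held; once SYNCOK goes Lo, CSREQ, BFON, and READY all assert.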

【００３２】[0032]

FIG. 3 shows an embodiment of the synchronization processing circuit 101. The NAND gates UA0 to UAn receive the Q outputs of the corresponding flip-flops UC0 to UCn and the Q outputs of the corresponding flip-flops UB0 to UBn; only when all of their outputs are Hi does the output of the multi-input NAND gate UD0 that receives them go Lo. When the Q output of a flip-flop UC0 to UCn is Hi, the Q output of the flip-flop UB0 to UBn feeding the corresponding NAND gate UA0 to UAn becomes valid. In this example, processor 1₀ decides which of the UB0 to UBn Q outputs are valid. That is, to make flip-flops UC0 to UCn ignore the synchronization request Sln (n = 0, 1, 2, ...) from the corresponding processor 1n (n = 0, 1, 2, ...), processor 1₀ sets the corresponding SlD0 to SlDn to Lo (0) and generates a trigger on Sl0, which sets the Q outputs of the corresponding UC0 to UCn to Lo (0). Each processor 1n (n = 0, 1, 2, ...) issues a synchronization request by generating a trigger on the corresponding line Sl0 to Sln, which sets the Q output of its UB flip-flop to Lo and the Q-bar output to Hi, temporarily deactivating the synchronization end information Tl0 to Tln. Only then does the output of the corresponding NAND gate UA0 to UAn whose UC Q output is set to Hi (1) turn Hi. Only after the Q outputs of all UB0 to UBn whose corresponding UC0 to UCn Q outputs are Hi have been set to Lo does the output of the multi-input NAND gate UD0 change from Hi to Lo. The Lo output of UD0 presets flip-flops UB0 to UBn, returning their Q outputs to Hi. At the same time, their Q-bar outputs return to Lo, sending the active synchronization end information Tl0 to Tln to the corresponding processors 1n (n = 0, 1, 2, ...).
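The barrier behaviour of FIG. 3 can be sketched as follows. This is an interpretation, not the circuit itself: it assumes each UA gate is a two-input NAND of UC's and UB's Q outputs, that Tl is taken from UB's Q-bar output, and that the UD0 preset is instantaneous; the class and method names are hypothetical.

```python
class SyncCircuit101:
    """Level sketch of FIG. 3; True = Hi, False = Lo."""

    def __init__(self, n: int):
        self.uc_q = [True] * n  # UC0..UCn Q: Hi = processor takes part in synchronization
        self.ub_q = [True] * n  # UB0..UBn Q: Hi = preset (idle) state

    def tl(self, i: int) -> bool:
        # Tl_i from UB_i's Q-bar output: Lo (active) once UB_i has been preset.
        return not self.ub_q[i]

    def set_mask(self, mask: list[bool]) -> None:
        # Processor 1_0 writes SlD0..SlDn and pulses Sl0 to load UC0..UCn.
        self.uc_q = list(mask)

    def request(self, i: int) -> None:
        # Processor 1_i pulses Sl_i: UB_i Q -> Lo, so Tl_i goes inactive (Hi).
        self.ub_q[i] = False
        self._settle()

    def _settle(self) -> None:
        ua = [not (c and b) for c, b in zip(self.uc_q, self.ub_q)]  # NAND gates UA
        ud0 = not all(ua)                                            # multi-input NAND UD0
        if not ud0:                                                  # UD0 Lo presets all UB
            self.ub_q = [True] * len(self.ub_q)
```

For example, with processor 2 masked out, processors 0 and 1 keep seeing an inactive Tl until processor 3 also issues its request.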

【００３３】[0033]

If all processors are to take part in synchronization, it suffices to set all Q outputs of UC0 to UCn to Hi. This is called the simultaneous synchronization mode. If only the simultaneous synchronization mode is used, flip-flops UC0 to UCn are unnecessary: the Q outputs of UB0 to UBn can be logically inverted by inverters and fed directly into the multi-input NAND gate UD0.
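Reading each UA gate as a two-input NAND of UC's and UB's Q outputs (the overbar distinctions are lost in the text, so this is an interpretation), the simplification follows directly:

$$
UA_i=\overline{Q_{UC_i}\cdot Q_{UB_i}},\qquad UD_0=\overline{\prod_i UA_i}
\;\xrightarrow{\;Q_{UC_i}=1\ \forall i\;}\;
UA_i=\overline{Q_{UB_i}},\qquad UD_0=\overline{\prod_i \overline{Q_{UB_i}}}
$$

so each UA gate degenerates into an inverter on the corresponding UB output.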

【００３４】[0034]

Next, another embodiment of the synchronization processing circuit of the present invention is described with reference to FIG. 4. The multiprocessor system in this embodiment consists of m processors, and the figure shows the representative processors 1n and 1n+1. Each processor has its own inter-processor synchronization circuit, 2n and 2n+1 respectively, and each of these circuits is equivalent to the portion 101A of the synchronization processing circuit shown in FIG. 3. The synchronization circuits 2n and 2n+1 exchange information over signal line 8. According to the present invention, processors executing related tasks can form arbitrary groups and proceed while synchronizing within their group. For its processor, each of the synchronization circuits 2n and 2n+1 comprises: a synchronization register 5 that stores information corresponding to a group name; a signal line that broadcasts to every processor the state of a flip-flop triggered when, or after, a value is set in synchronization register 5; a decision circuit 6 that collates the broadcast information with the contents of synchronization register 5 and checks whether the states of all processors in the registered group have become true; and a signal circuit that notifies the processor of the result of that check. Reference numeral 4 denotes the access signal line, 8 the task end signal lines, 9 the status line, and 10 the trigger signal line. Synchronization register 5 corresponds to flip-flops UC0 to UCn in circuit 101A of FIG. 3. Because each processor 1n has its own synchronization register 5 in this embodiment, each processor can independently select the processors that take part in its synchronization (in the example of FIG. 3, only processor 1₀ has this right). This embodiment distributes the synchronization circuits, one per processor, and the distributed circuits 2n (n = 0, 1, ..., m) exchange the necessary data over signal line 8. The synchronization circuits 2n (n = 0, 1, ..., m) together with signal line 8 can therefore be regarded as constituting the synchronization processing circuit 101 shown in FIG. 1.

【００３５】[0035]

Next, the synchronization sequence within a processor group is described. Suppose that, among processors 1₀ to 1m, processors 1n and 1n+1 form a group and execute related tasks. The description focuses on processor 1n. When processor 1n completes its task, it writes into synchronization register 5, over data line 3 (SlD0 ... SlDn ... SlDm), a bit string in which bits n and n+1 are 1 and all other bits are 0; this bit string designates the processor group. During the write, signal line 4, which indicates that processor n is accessing synchronization register 5, emits an active pulse that serves as the write clock for register 5. The same pulse on signal line 4 also triggers flip-flop 7, which outputs a task end signal at level 0 on its Q terminal and a status signal at level 1 on its Q-bar terminal. The task end signal from terminal Q is connected to line n of the task end signal lines 8 and is transmitted to the synchronization circuits 2₀ to 2m of all processors. The status signal from terminal Q-bar is fed over status line 9 to the TEST input of processor 1n, and the processor suspends processing until its TEST input changes to level 0. The value set in synchronization register 5 and the values on signal lines 8 are taken into decision circuit 6 with bit positions 0 to m kept in correspondence. A NAND-NAND gate network (equivalent to UA0 to UAn and UD0 in 101A) drives trigger signal 10 active (level 0) when every task end signal line 8 corresponding to a bit set to 1 in register 5 is at level 0, that is, in this example, when lines n and n+1 of the task end signal lines 8 are both at level 0. In response, flip-flop 7 is preset: the task end signal from terminal Q changes to level 1, line n of the task end signal lines 8 goes to level 1, and the trigger signal 10 of decision circuit 6 therefore also returns to level 1. At the same time, the status signal on terminal Q-bar changes to level 0, so the TEST input of processor 1n also goes to level 0 and processor 1n resumes processing. Because processor 1n+1 performs the same operation, processors 1n and 1n+1 are synchronized at the point when both have completed their tasks.
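The distributed sequence above can be sketched in software as one object per circuit 2n plus a shared list standing in for the task end signal lines 8. It is a minimal model under stated assumptions: levels are the integers 0 and 1, every decision circuit 6 samples lines 8 at the same instant (as combinational hardware would), and the class and function names are hypothetical.

```python
class SyncCircuit2:
    """Sketch of one distributed synchronization circuit 2n (FIG. 4)."""

    def __init__(self, index: int):
        self.index = index
        self.register5 = 0  # group bit string written over data line 3
        self.ff7_q = 1      # task end signal driven onto line 8[index]; 0 = task finished

    @property
    def status(self) -> int:
        # Q-bar of flip-flop 7 on status line 9 -> processor TEST input; 1 = wait.
        return 1 - self.ff7_q

    def write_group(self, mask: int) -> None:
        # The pulse on signal line 4 clocks register 5 and also triggers flip-flop 7.
        self.register5 = mask
        self.ff7_q = 0

    def trigger10(self, lines8: list[int]) -> int:
        # Decision circuit 6: active (0) when every line whose bit is set in register 5 is 0.
        members = [i for i in range(len(lines8)) if self.register5 >> i & 1]
        return 0 if members and all(lines8[i] == 0 for i in members) else 1


def settle(circuits: list[SyncCircuit2]) -> None:
    lines8 = [c.ff7_q for c in circuits]                    # broadcast task end lines
    fired = [c.trigger10(lines8) == 0 for c in circuits]    # all circuits sample at once
    for c, f in zip(circuits, fired):
        if f:
            c.ff7_q = 1  # preset flip-flop 7: status drops to 0 and TEST releases the processor
```

For instance, writing the group bit string 0b0110 from processor 1 leaves its status at 1 until processor 2 writes the same bit string.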

【００３６】[0036]

The above is the operation sequence of the synchronization processing circuit of the present invention. In the present invention, the only synchronization work done in software is writing a value to synchronization register 5, which takes roughly one machine instruction. Everything else is handled in hardware, so the synchronization overhead is minimized while the mechanism remains programmable. Furthermore, a single synchronization processing circuit 101 is enough to group related processors together, and synchronization among the processors within each group is easy to perform. By providing several synchronization processing circuits 101, synchronization between groups and similar operations can also be performed in a multiplexed manner. Processor grouping and multiplexed synchronization make parallel processing flexible and allow highly efficient parallel processing, close to data flow, to be realized on a general-purpose multiprocessor system.

FIG. 5 shows the flow of parallel-processing control by the synchronization processing circuit 101 of the present invention. The example of FIG. 5 does not use the local synchronization performed in signal control circuit 50n; instead, processor 1n performs synchronization by checking only the synchronization end information from circuit 101. In the figure, the progress of parallel processing on four processors, a to d, is controlled by circuit 101 as time advances from the top of the figure to the bottom. First, the related tasks 1 and 2 are processed by processors a and b, respectively, and likewise tasks 3 and 4 by processors c and d. Because processors executing related tasks can form a group, processors a and b form group 11 and processors c and d form group 12. Solid arrows between tasks show the flow of processing and data within the same processor; dash-dot arrows show the flow of data between tasks executed by different processors in the same group, that is, communication between processors within the group. At times t₁ and t₂, communication becomes necessary within each of the two groups; after synchronization by circuit 101, the processed data is exchanged between the processors. Processors a and b then move on to tasks 5 and 6, and processors c and d to tasks 7 and 8. In this way, each synchronization by the inter-processor synchronization processing circuit 101 marks which processors have formed a group to execute the tasks up to that point. Because grouping means no data is exchanged between groups, each group can proceed independently, and this flexible progress control makes efficient parallel processing possible. Subsequently, groups 13 and 14 are formed by the same processors as groups 11 and 12 and perform their tasks; synchronization and inter-process communication again occur at times t₃ and t₄. At time t₅, tasks 9 to 12 all become related and the groups are no longer independent, so the processors first synchronize within each group and then synchronize again through a second, separate inter-processor synchronization processing circuit 101, bringing all processors into step. In other words, the separate circuit 101 can be regarded as performing synchronization between groups, with the result that group 15 contains all processors. After that, processors a to c, which process the related tasks 13 to 15, form group 16, while processor d, which executes the independent task 16, forms the single-member group 17. The groups are thus reorganized, and parallel processing continues with synchronization at times t₆, t₇, and so on. Providing several synchronization processing circuits 101 and performing multiplexed synchronization in this way makes group reorganization easy and enables more flexible and efficient parallel processing.
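As a rough software analogy only (not the hardware of the patent), the FIG. 5 pattern of intra-group synchronization followed by a second, all-processor synchronization at t₅ can be mimicked with Python's `threading.Barrier`. The barrier objects, event labels, and function name are illustrative assumptions; one barrier per group stands in for circuit 101 and a separate four-party barrier stands in for the second circuit.

```python
import threading


def run_fig5_analogy() -> list[tuple[str, str]]:
    log: list[tuple[str, str]] = []
    lock = threading.Lock()
    group_ab = threading.Barrier(2)  # stands in for group 11 (processors a, b)
    group_cd = threading.Barrier(2)  # stands in for group 12 (processors c, d)
    all_four = threading.Barrier(4)  # the second, separate circuit used at t5
    groups = {"a": group_ab, "b": group_ab, "c": group_cd, "d": group_cd}

    def record(proc: str, event: str) -> None:
        with lock:
            log.append((proc, event))

    def processor(proc: str) -> None:
        record(proc, "task")      # tasks 1 to 4
        groups[proc].wait()       # t1/t2: synchronize within the group
        record(proc, "exchange")  # exchange results with the group partner
        groups[proc].wait()       # t5, first stage: synchronize within the group again
        all_four.wait()           # t5, second stage: synchronize across groups
        record(proc, "merged")    # group 15: every processor is in step

    threads = [threading.Thread(target=processor, args=(p,)) for p in "abcd"]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Barriers in Python reset automatically after each release, which loosely mirrors the reuse of one circuit for successive synchronization points within a group.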

【００３７】[0037]

Next, FIG. 6 shows an example that analyzes the concrete effect obtained when the inter-processor synchronization device of FIG. 1 is built with the synchronization processing circuit of FIG. 4, breaking each task down to processor instructions, the smallest unit of a task. FIG. 6a (the upper part of FIG. 6) shows the case where inter-processor synchronization relies only on the synchronization end information Tln, without the local synchronization method; this can be regarded as the conventional type. It may be viewed as an excerpt of the parallel processing flow shown in FIG. 5. The operating conditions and assumptions for FIG. 6 are as follows.

【００３８】[0038]

1) The inter-processor synchronization instruction, with which a processor sends task end information to synchronization processing circuit 101 and declares the end of its task, is denoted S.

【００３９】[0039]

2) Processors Pl, Pm, and Pn form a group and execute related tasks while synchronizing within the group.

【００４０】[0040]

3) After executing instruction S, each of the processors Pl, Pm, and Pn waits unconditionally until the synchronization end information (TEST output) from synchronization processing circuit 101 becomes active, that is, until all of Pl, Pm, and Pn have finished their tasks at that level and each has executed instruction S.

【００４１】[0041]

4) Even while the synchronization end information is inactive, each processor can execute as many instructions as possible, limited to instructions already held in its internal instruction queue or instruction cache that do not use external data. That is, the synchronization logic is assumed to be built so that, for the external bus cycle generated to access synchronization processing circuit 101 when instruction S is executed, the Ready signal is not returned to the processor until the synchronization end information becomes active, which forcibly suspends the external bus cycle (does not let it complete).

【００４２】[0042]

5) An instruction that uses no external data (for example, a register-to-register operation) is denoted I. An instruction that uses external data other than data on the shared system is denoted ID, and an instruction that uses shared data on the shared system is denoted ICD.

【００４３】[0043]

The lower part of FIG. 6, by contrast, shows the case where the local synchronization method is used and processing is advanced ahead as far as the first shared-system access (ICD) in the subsequent task. Conditions 1), 2), and 5) are the same as for the upper part of FIG. 6; conditions 3) and 4) are replaced by the following.

【００４４】[0044]

6) After executing instruction S and sending active task end information to the synchronization processing circuit, each of the processors Pl, Pm, and Pn proceeds ahead with the next level's task processing until a new shared-system access instruction (ICD) appears.

【００４５】[0045]

7) When processor Pl, Pm, or Pn runs ahead into the next level's tasks under condition 6) and first attempts a shared-system access ICD while the synchronization end information for the previous level from circuit 101 is still inactive, the Ready signal l9 for the external bus cycle generated by the ICD is not returned to the processor until the synchronization end information becomes active. That is, the external bus cycle generated by the ICD is forcibly suspended (not allowed to complete) until the synchronization end information becomes active. Instructions I in the instruction queue or instruction cache can still be processed further ahead if they are executable. As described above, synchronization through the combination of instruction ICD and the synchronization end information is called local synchronization.
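The difference between conditions 3)/4) and 6)/7) can be sketched with a toy timing model. It assumes one time unit per instruction, strictly in-order execution (so the run-ahead of I instructions from the queue or cache is ignored), and that the next S is also held until the previous level's synchronization, since it too accesses circuit 101. The instruction mix below is made up for illustration and is not the FIG. 6 data; the function names are hypothetical.

```python
Program = list[list[str]]  # one instruction list per level; each level ends with "S"


def conventional_finish(programs: list[Program]) -> int:
    """Conditions 3)/4): after S, every processor idles until SYNCk, then starts level k+1."""
    sync = 0
    for k in range(len(programs[0])):
        # All processors start level k at the previous sync point.
        sync = max(sync + len(prog[k]) for prog in programs)
    return sync


def local_finish(programs: list[Program]) -> int:
    """Conditions 6)/7): run ahead after S; stall only at ICD (or the next S) until SYNC(k-1)."""
    t = [0] * len(programs)
    prev_sync = 0
    for k in range(len(programs[0])):
        for p, prog in enumerate(programs):
            for ins in prog[k]:
                if ins in ("ICD", "S"):
                    t[p] = max(t[p], prev_sync)  # bus cycle held until the previous sync ends
                t[p] += 1
        prev_sync = max(t)  # SYNCk: the last S of level k has executed
    return prev_sync


example = [
    [["I", "I", "I", "S"], ["I", "ICD", "I", "S"]],             # Pl
    [["I", "I", "I", "I", "I", "S"], ["ICD", "I", "S"]],        # Pm
    [["I", "I", "S"], ["I", "I", "I", "ICD", "I", "I", "S"]],   # Pn
]
```

For this hypothetical mix, `conventional_finish(example)` ends at time 13 and `local_finish(example)` at time 10: Pn's short first level lets it start its second-level I instructions early and only stall at its ICD.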

【００４６】[0046]

Here, SYNCk denotes the time at which active synchronization information is returned from the synchronization processing circuit, that is, the time at which synchronization at level k is achieved. In the upper part of FIG. 6, idle time arises in processors Pl and Pm in the second half of level k-1 and in processors Pl and Pn in the second half of level k, during which those processors sit idle. In the lower part of FIG. 6, because each processor can keep processing until an ICD instruction appears, no idle time arises at level k-1 or level k, apart from a small idle period in processor Pm in the second half of level k-1. Taking the start as M and the end as N, the lower part of FIG. 6 shortens processing time relative to the upper part by tl for Pl, tm for Pm, and tn for Pn. The processor on the critical path of the parallel processing also differs: in the upper part of FIG. 6 it is Pn at level k-1, Pm at level k, and Pl at level k+1, whereas in the lower part it is Pn at all of levels k-1, k, and k+1, which shows that the critical path itself has changed. Because each processor ran ahead across level boundaries, its effective task processing time changed dynamically, and the critical path changed with it. As a result, with processing starting at M and ending at N, the lower part of FIG. 6 completes the whole parallel processing shown tc sooner than the upper part. This corresponds to a processing-capacity improvement of about 21% and indicates that the local synchronization method can substantially improve processing performance.
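As a worked reading of the 21% figure, assuming that "processing capacity" means work completed per unit time, write T_a and T_b for the elapsed time N - M in the upper and lower parts of FIG. 6:

$$
\frac{T_a}{T_b}-1\approx 0.21,\qquad t_c=T_a-T_b\approx 0.21\,T_b\approx\frac{0.21}{1.21}\,T_a\approx 0.17\,T_a
$$

so, under this assumption, the saving t_c is about one sixth of the conventional schedule length.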

【００４７】[0047]

In FIG. 6, points A, B, and C indicate the positions of the synchronization instruction S for Pl, Pm, and Pn, respectively, at level k-1. Points D, E, and F are the first instructions ICD involving shared-system access executed in the tasks at level k. Similarly, G, H, and I indicate the positions of S at level k, and J, K, and L the positions of the first ICD appearing in the tasks executed at level k+1.
The points E and F are the instruction ICD with shared system access executed for the first time in the tasks executed at the level k, respectively. Similarly, G, H, and I indicate the position of S at level k, and J, K, and L indicate the position of the first ICD appearing in the task executed at level k + 1.

【００４８】[0048]

As described above, this embodiment eliminates fixed task boundaries and lets each processor advance its task processing as far as possible until it first needs to access the shared system to exchange shared data. This reduces the idle processing time (processor stall time) spent waiting for other processors during synchronization. In effect, task processing times are changed dynamically according to each processor's synchronization conditions, which shortens the critical path of the parallel processing itself, yields an effect similar to data-driven execution, and enables more efficient parallel processing.

【００４９】[0049]

Further, with the synchronization processing circuit of this embodiment, when a fixed job on a general-purpose multiprocessor is divided into tasks and parallelized, arbitrary processors executing related tasks are gathered into groups, and synchronization is performed among processors within a group or between groups. Because this lets the inter-process synchronization mechanism be implemented in hardware as far as possible while remaining software programmable, it minimizes the software overhead required for synchronization.

【００５０】[0050]

Next, an embodiment of an instruction set for the inter-processor synchronization device is described, in which synchronization instructions are placed in the software so that the control flow reduces idle time during parallel processing. A basic software-driven control flow can be realized with the processor synchronization circuit of FIG. 4, but for higher functionality, signal line 4 is split into three lines, 4a, 4b, and 4c, each with a specific function, as shown in FIG. 7.

【００５１】[0051]

FIG. 7 is a block diagram showing the configuration of a synchronization processing circuit according to another embodiment of the present invention. Signal line 4a carries the signal that triggers the flip-flop outputting the task end information, and signal line 4b carries the trigger signal for registering a group in synchronization register 5. Signal line 4c is described later.

【００５２】[0052]

FIG. 8 shows the contents of the synchronization instructions.

【００５３】[0053]

To operate the synchronization processing circuit shown in FIG. 7, each processor n (n = 0, 1, ...) issues a total of eight instructions, SYNC0 to SYNC7, to its corresponding synchronization processing element n (n = 0, 1, ...). Each instruction has a mnemonic expressing its function. The basic functions are as follows.

【００５４】[0054]

GS (group setting): registers a group in synchronization register 5, specifying which processors form the group.
EO (end out): activates the task end information and outputs it on the task end signal line 8, indicating that the task has finished.
W (wait): puts the processor in a wait state until the tasks of all processors in the group have finished and synchronization is complete; this is the function that finally confirms whether synchronization has completed.

The following composite instructions combine these basic functions so that each executes as a single machine instruction.

[0055] GSEOW ... This instruction performs group setting (GS), end out (EO) and wait (W) in succession, so that one machine instruction carries out the complete synchronization sequence.

[0056] GSEO ... This instruction performs group setting (GS) and end out (EO) in succession.

[0057] EOW ... This instruction performs end out (EO) and wait (W) in succession. For the group previously set by the GS function, one machine instruction carries out the complete synchronization sequence.

[0058] The following composite instructions are also provided to make processing on the processor side faster and simpler.

[0059] TSEOW ... This instruction performs total setting (TS), end out (EO) and wait (W) in succession.

[0060] TSEO ... This instruction performs total setting (TS) and end out (EO) in succession.

[0061] Here, TS (total setting) is a basic function that orders synchronization processing with all processors treated as a single group. It activates signal line 4c in FIG. 7, which sets the synchronization register to a mode in which every processor is regarded as a member of the group.
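The basic functions and composite instructions above can be modelled in a few lines of Python. This is a minimal single-threaded sketch and does not reproduce the circuit itself. The class and method names (`SyncCircuit`, `gs`, `eo`, `w`, and so on) are hypothetical. The sketch makes two simplifying assumptions: W returns `True`/`False` instead of stalling the processor, and a pair of counters emulates the synchronization end signal. The usage lines replay the FIG. 10 sequence described in paragraphs [0069] to [0072], with processor m = 0 and processor n = 1.

```python
class SyncCircuit:
    """Toy model of the FIG. 7 synchronization circuit (illustrative only).

    Assumptions not taken from the patent: w() returns a bool instead of
    stalling the processor (False means "would stall; poll again"), and all
    members of a group register the same group in their sync registers.
    """

    def __init__(self, n_procs):
        self.all_procs = frozenset(range(n_procs))
        # Synchronization register 5: group membership seen by each element.
        self.group = {p: self.all_procs for p in self.all_procs}
        # Flip-flop 7: task end information of each processor.
        self.end_info = {p: False for p in self.all_procs}
        # Bookkeeping that emulates the "synchronization complete" signal.
        self.eo_issued = {p: 0 for p in self.all_procs}
        self.syncs_done = {p: 0 for p in self.all_procs}

    # Basic functions
    def gs(self, p, members):
        """GS: register the group in the sync register (line 4b)."""
        self.group[p] = frozenset(members)

    def ts(self, p):
        """TS: treat every processor as one group (line 4c)."""
        self.group[p] = self.all_procs

    def eo(self, p):
        """EO: activate p's task end information (line 4a)."""
        if self.end_info[p]:
            # Model choice: a second EO before the first one synchronized
            # would be lost, so rule (2) requires a W in between.
            raise RuntimeError(f"processor {p}: pending EO, issue W first")
        self.end_info[p] = True
        self.eo_issued[p] += 1
        members = self.group[p]
        # Decision circuit 6: have all group members finished their tasks?
        if all(self.end_info[q] for q in members):
            for q in members:
                self.end_info[q] = False  # end information is deactivated
                self.syncs_done[q] += 1

    def w(self, p):
        """W: True once the synchronization for p's latest EO has completed."""
        return self.syncs_done[p] >= self.eo_issued[p]

    # Composite instructions (one machine instruction each in the patent)
    def gseow(self, p, members):
        self.gs(p, members)
        self.eo(p)
        return self.w(p)

    def gseo(self, p, members):
        self.gs(p, members)
        self.eo(p)

    def eow(self, p):
        self.eo(p)
        return self.w(p)

    def tseow(self, p):
        self.ts(p)
        self.eo(p)
        return self.w(p)

    def tseo(self, p):
        self.ts(p)
        self.eo(p)


# FIG. 10 walk-through with processor m = 0 and processor n = 1.
sc = SyncCircuit(2)
M, N = 0, 1
print(sc.gseow(M, {M, N}))  # False: m reaches SYNC level 1 first and stalls
print(sc.gseow(N, {M, N}))  # True: n completes the group
sc.eo(N)                    # SYNC level 2: n signals and moves on to task 5
print(sc.eow(M))            # True: n's end information is already active
sc.eo(M)                    # SYNC level 3: m signals and moves on to task 6
print(sc.w(N), sc.eow(N))   # True True: level 2 checked, then level 3 done
```

Raising an error on a second EO before the first has synchronized is a choice made for this sketch. It reflects why rule (2) places a W before the next EOW, and is not a documented behaviour of the circuit.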

[0062] FIGS. 9 and 10 are explanatory diagrams illustrating how tasks are processed.

[0063] These figures are used to explain how the instructions are used and what they achieve, in comparison with the conventional control flow.

[0064] FIG. 9 shows parallel processing controlled by conventional synchronization. Here the synchronization processing circuit of FIG. 7 is used, but only with the intra-group synchronization instructions GSEOW or EOW. Processors m and n form a group. As already described, each processor, on finishing its task, declares that processors m and n both belong to the group. Processors m and n then wait for each other until all the task processing end signals that m and n output at that point have become active. Only then do they proceed to the next task, so that parallel processing advances without conflicts.

[0065] FIG. 10 shows an example in which the EO instruction (which only outputs task end information) and the W instruction (which checks that all task end information in the group has been output) are used separately. This achieves, through control flow (parallel processing controlled top-down by synchronization instructions issued by software), an effect similar to that of the automatic tuning function described in the embodiment of FIG. 1. That function is data-driven (dataflow-like) and depends on the access conditions to the shared system CSYS. Once the program has been parallelized, the compiler inserts the appropriate synchronization instructions, chosen from SYNC0 to SYNC7 shown in FIG. 8, at each point in the program where synchronization is required. The rules for choosing among SYNC0 to SYNC7 are as follows.

[0066] (1) At the end of a processor's task, check for a relation (shown by arrows in FIGS. 9 and 10) from a task run by another processor that finishes at the same level to the task this processor will run next. If no such relation exists, insert EO, the instruction that only outputs the task end information. In the examples of FIGS. 9 and 10, the end of task 3 and the end of task 4 satisfy this condition. At the level of processor operation, consider a processor that finishes one task and moves on to the next. If it needs no information from another processor that ran at the same level, it can execute the EO instruction and proceed to the next task unconditionally.

[0067] (2) Consider a processor that executed the EO instruction at the previous synchronization level and then moved on to its current task. At the end of that task, the next task may have a relation with a task run by another processor that finishes at the same level. In that case, the processor first executes the W instruction as the synchronization check corresponding to the EO it executed at the previous level. This confirms that the previous level's synchronization processing has completed. The processor then performs the synchronization processing of the current level (for example, GSEOW or EOW) and proceeds to the next task. In the examples of FIGS. 9 and 10, the end of task 5 satisfies this condition, so in the program W and EOW are simply placed in this order at the end of task 5. The synchronization levels here correspond to SYNC levels 1 to 3 in FIGS. 9 and 10. Each denotes the point at which the task processing of that level has ended and synchronization processing must be performed. The GSEOW and EOW instructions differ in one respect: GSEOW includes the function (GS) that designates the processors belonging to the group, whereas EOW does not. Therefore, when EOW is specified, the processor group configuration set by the preceding GS function is used unchanged as the default.
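Rules (1) and (2) are simple enough to express as a compiler pass. The sketch below applies them to the task relations of FIGS. 9 and 10 as described in [0069] to [0072]. The function name and data layout are hypothetical. The sketch covers only the two stated rules; for example, it does not address a processor that issues bare EO instructions at two consecutive levels.

```python
# FIG. 10 example: task order per processor and the arrows between tasks.
SCHEDULE = {"m": [0, 2, 4, 6], "n": [1, 3, 5, 7]}
DEPS = {2: {0, 1}, 3: {0, 1}, 4: {2, 3}, 5: {3}, 6: {4}, 7: {4, 5}}


def place_sync_instructions(schedule, deps):
    """Return {(processor, sync_level): [mnemonics]} per rules (1) and (2).

    Assumes every arrow comes from the immediately preceding level, and that
    the first synchronizing level sets the group (GSEOW) while later levels
    reuse it (EOW), as in the FIG. 10 walk-through.
    """
    owner = {t: p for p, tasks in schedule.items() for t in tasks}
    plan = {}
    for p, tasks in schedule.items():
        pending_eo = False  # previous level ended with a bare EO
        group_set = False
        for level, next_task in enumerate(tasks[1:], start=1):
            needs_other = any(owner[d] != p for d in deps.get(next_task, ()))
            if not needs_other:
                plan[(p, level)] = ["EO"]           # rule (1)
                pending_eo = True
                continue
            ops = ["W"] if pending_eo else []       # rule (2): check earlier EO
            ops.append("EOW" if group_set else "GSEOW")
            plan[(p, level)] = ops
            pending_eo, group_set = False, True
    return plan


for key, ops in sorted(place_sync_instructions(SCHEDULE, DEPS).items()):
    print(key, " -> ".join(ops))
```

For this example the pass produces the following placement, which matches the one described for FIG. 10:
SYNC level 1: GSEOW on both processors.
SYNC level 2: EOW on m, EO on n.
SYNC level 3: EO on m; W followed by EOW on n.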

[0068] The processing order of FIG. 10 is as follows.

[0069] (a) Processor m executes tasks 0, 2, 4 and 6 in this order, and processor n executes tasks 1, 3, 5 and 7 in this order. In this example, the tasks are related as shown by the arrows in the figure.

[0070] (b) Task 2 and task 3 each use the results of both task 0 and task 1. Processors m and n therefore must synchronize at SYNC level 1. By each executing the GSEOW instruction, they carry out the whole synchronization sequence, from group setting to the synchronization completion check, with a single instruction.

[0071] (c) At SYNC level 2, task 4 uses the results of tasks 2 and 3, while task 5 uses only the result of task 3. The group is unchanged. Processor m therefore executes the EOW instruction, which carries out the synchronization sequence from outputting the task end information to checking synchronization completion. Processor n executes only the EO instruction and proceeds directly to the next task without the synchronization check (W).

[0072] (d) At SYNC level 3, task 6 needs only the result of task 4, while task 7 needs the results of both task 4 and task 5. Processor m therefore executes the EO instruction and proceeds directly to the next task without the synchronization check (W). Processor n, on the other hand, first executes the W instruction that corresponds to the EO it issued at SYNC level 2, which checks the SYNC level 2 synchronization. It then executes the EOW instruction to carry out the SYNC level 3 synchronization sequence. Only after that, that is, after confirming that task 4 has completed, does it proceed to the next task (task 7). As a result, the idle times ta and tb that occur in FIG. 9 are almost eliminated in FIG. 10.
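The following rough timing sketch shows why the idle times shrink. It uses hypothetical task durations (the patent gives none) and ignores the cost of the synchronization instructions themselves. It compares two policies:
A full barrier at every SYNC level, as in FIG. 9.
Letting each task wait only for its own processor and for the cross-processor tasks its arrows come from. In this example, that is the constraint the FIG. 10 instruction placement enforces.
Idle time here counts only waiting before a task starts.

```python
# Hypothetical example data: the patent gives no task durations.
SCHEDULE = {"m": [0, 2, 4, 6], "n": [1, 3, 5, 7]}                    # task order
DEPS = {2: {0, 1}, 3: {0, 1}, 4: {2, 3}, 5: {3}, 6: {4}, 7: {4, 5}}  # arrows
DURATION = {0: 2, 1: 2, 2: 4, 3: 1, 4: 1, 5: 4, 6: 3, 7: 2}


def simulate(selective):
    """Return (makespan, idle time per processor); sync overhead is ignored.

    selective=False: every SYNC level is a full barrier (FIG. 9 style).
    selective=True: a task waits only for its own processor and for the
    cross-processor predecessors named by its arrows (FIG. 10 placement).
    """
    finish = {}
    idle = {p: 0 for p in SCHEDULE}
    ready = {p: 0 for p in SCHEDULE}  # when each processor finished its last task
    for level in range(len(SCHEDULE["m"])):
        barrier = max(ready.values())
        for p, tasks in SCHEDULE.items():
            task = tasks[level]
            if selective:
                start = max([ready[p]] + [finish[d] for d in DEPS.get(task, ())])
            else:
                start = barrier
            idle[p] += start - ready[p]
            finish[task] = start + DURATION[task]
        for p, tasks in SCHEDULE.items():
            ready[p] = finish[tasks[level]]
    return max(finish.values()), idle


print(simulate(selective=False))  # (13, {'m': 3, 'n': 3})
print(simulate(selective=True))   # (10, {'m': 0, 'n': 0})
```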

[0073] The features of the control flow method of this embodiment are as follows.

[0074] (1) Suppose the relations between tasks and their processing times are known exactly. Then optimal tuning of the idle time is achieved simply by inserting the appropriate synchronization instructions into the program, following the rules above, when the parallelization schedule is prepared in advance. Automatic tuning cannot determine the relations between tasks strictly, so its result is less optimal than that of this embodiment.

[0075] (2) No special hardware is required beyond the synchronization processing circuit shown in FIG. 7, so the configuration is simple.

[0076]

[Effects of the Invention] The present invention comprises four elements: a plurality of processors that share a plurality of tasks and process them in parallel; a data sharing circuit that holds the data exchanged among the processors; a synchronization processing circuit that synchronizes the processors; and a local synchronization circuit that controls competing access requests from the individual processors to the data sharing circuit. Consequently, a processor that has finished its task before the synchronization processing has completed can start its next task early. It continues that task as far as possible, up to the point where the task needs shared data. This reduces idle processing time.
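The role of the local synchronization circuit can be sketched with Python threads. In this hypothetical model, each processor has a `threading.Event` that stands for its synchronization end information. After `task_end`, shared-data accesses by that processor block until `sync_complete` is called, while work that touches no shared data continues. Several parts of the sketch are assumptions for illustration: the class name, the method names, and the explicit `sync_complete` call, which stands in for the synchronization processing circuit.

```python
import threading


class LocalSyncCircuit:
    """Hypothetical model: a per-processor gate on the data sharing circuit.

    After a processor outputs task end information, its accesses to shared
    data are held until its synchronization end information is active.
    Work that touches no shared data is not held.
    """

    def __init__(self, processors):
        self.shared = {}
        self._sync_end = {p: threading.Event() for p in processors}
        for event in self._sync_end.values():
            event.set()  # no synchronization pending initially

    def task_end(self, p):
        self._sync_end[p].clear()  # p has output its task end information

    def sync_complete(self, group):
        # Stand-in for the synchronization processing circuit's output.
        for p in group:
            self._sync_end[p].set()

    def read(self, p, key):
        self._sync_end[p].wait()  # processor p is suspended only here
        return self.shared[key]

    def write(self, p, key, value):
        self._sync_end[p].wait()
        self.shared[key] = value


def run_demo():
    circuit = LocalSyncCircuit(["m", "n"])
    log = []
    m_local_done = threading.Event()  # only forces a deterministic demo order

    def processor_m():
        circuit.task_end("m")                      # m finished its task first
        log.append("m: local part of next task")   # needs no shared data
        m_local_done.set()
        log.append(f"m: read x={circuit.read('m', 'x')}")

    def processor_n():
        m_local_done.wait()
        circuit.write("n", "x", 42)                # result m's next task needs
        log.append("n: wrote x")
        circuit.task_end("n")
        circuit.sync_complete(["m", "n"])          # whole group has finished

    threads = [threading.Thread(target=processor_m),
               threading.Thread(target=processor_n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


print(run_demo())  # ['m: local part of next task', 'n: wrote x', 'm: read x=42']
```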

[0077] Further, when a job is processed in parallel, related processors are collected into groups and synchronized within a group or between groups. This allows the synchronization mechanism to be implemented in hardware, which minimizes the software overhead required for synchronization.

[0078] Two mechanisms can also be combined: control flow, in which software synchronization instructions control parallel processing top-down, and dataflow-style automatic tuning performed by hardware. The combination reduces processor idle time further and achieves optimal parallelization efficiency.

[Brief description of drawings]

FIG. 1 is a hardware block diagram showing the overall configuration of an embodiment of the present invention.

FIG. 2 is a system diagram showing the configuration of the signal control circuit shown in FIG. 1.

FIG. 3 is a system diagram showing an embodiment of the synchronization processing circuit shown in FIG. 1.

FIG. 4 is a system diagram showing another embodiment of the synchronization processing circuit shown in FIG. 1.

FIG. 5 is an explanatory diagram showing an example of parallel processing controlled by the inter-processor synchronization of the present invention.

FIG. 6 is an explanatory diagram showing how processing time is shortened when local synchronization is also used, according to an embodiment of the present invention.

FIG. 7 is a block diagram showing the configuration of a synchronization processing circuit according to another embodiment of the present invention.

FIG. 8 shows the contents of the synchronization processing instructions according to another embodiment of the present invention.

FIG. 9 is an explanatory diagram illustrating the relationship between tasks and synchronization processing instructions according to another embodiment of the present invention.

FIG. 10 is an explanatory diagram illustrating the relationship between tasks and synchronization processing instructions according to another embodiment of the present invention.

[Explanation of symbols]

1 Processor constituting the multi-processor
2 Synchronization processing element
4 Access signal line to the synchronization processing element
4a Signal line
4b Signal line
4c Signal line
5 Synchronization register
6 Decision circuit
7 Flip-flop
8 Task processing end signal line
9 Status signal line (task end information line)
120 Shared system bus
121 Arbitration line
50n Signal control circuit
51n Local shared system
100n Processing unit
101 Synchronization processing circuit
102 Shared system
103 Arbitration circuit

Claims


1. An inter-processor synchronization processing device comprising: a plurality of processors that share a plurality of tasks and process them in parallel; a data sharing circuit that holds the data exchanged among the plurality of processors; a synchronization processing circuit that synchronizes the plurality of processors; and a local synchronization circuit that controls competing access requests from the respective processors of the plurality of processors to the data sharing circuit.

2. The inter-processor synchronization processing device according to claim 1, wherein the synchronization processing circuit receives the task end information output by each processor that has completed a task. After all the task end information output by the processors that process related tasks has become active, the synchronization processing circuit outputs an active value to each corresponding processor as synchronization end information, indicating that the task processing of the processors to be synchronized has ended. When a processor that has output task end information accesses the data sharing circuit and the synchronization end information corresponding to that access has not been output as an active value by the synchronization processing circuit, the local synchronization circuit prohibits the access and suspends the operation of that processor.

3. The inter-processor synchronization processing device according to claim 1, wherein the synchronization processing circuit comprises synchronization means that, when parallel processing is performed, forms a group of the processors that process related tasks and synchronizes the processors within the group or between groups.

4. The inter-processor synchronization processing device according to claim 3, wherein the synchronization means comprises:
a synchronization register that stores information, output by one processor in the group, corresponding to the names of the processors forming the group;
a flip-flop triggered by an access to the synchronization register;
a first transmission circuit that transmits the state of the flip-flop to the other processors;
a decision circuit that decides, from the transmitted flip-flop states and the contents of the synchronization register, whether every processor of the group stored in the synchronization register is active; and
means for transmitting the decision result output by the decision circuit to said one processor in the group via the flip-flop.

5. The inter-processor synchronization processing device according to claim 3, wherein the synchronization means comprises:
a synchronization register that stores information, output by one processor in the group, corresponding to the names of the processors forming the group;
a flip-flop triggered by an access to the synchronization register;
first signal transmission means for outputting, from the processor to the flip-flop, a trigger signal that causes the task end information to be output;
second signal transmission means for outputting, from the processor to the synchronization register, a trigger signal that causes the synchronization register to store a processor group; and
third signal transmission means for outputting, from the processor to the synchronization register, an active signal that sets the value of the synchronization register so that all processors belong to the group.

6. An inter-processor synchronization processing device that synchronizes a plurality of processors and controls parallel processing without conflict, comprising:
means for outputting active task end information when each processor finishes its corresponding task;
means for deactivating the active task end information corresponding to each processor, using the information that all the task end information of the processors to be synchronized has become active; and
means by which a processor checks the task end information to confirm that the synchronization processing has finished.

7. The inter-processor synchronization processing device, wherein each processor comprises means for issuing an EO instruction, which orders the output of the active task end information, and a W instruction, which confirms that the synchronization processing has finished.

8. The inter-processor synchronization processing device according to claim 7, further comprising means for: inserting the EO instruction at the end of a task of one processor when there is no relation from a task run by another processor that finishes execution at the same level to the task that this processor is to run next; and, when a processor finishes a task and proceeds to the next task without needing information from another processor that ran at the same level, executing the EO instruction and proceeding unconditionally to the next task.

9. The inter-processor synchronization processing device according to claim 8, further comprising means by which a processor that executed the EO instruction at the previous synchronization level and moved on to its current task, when the next task to be run at the end of the current task has a relation with a task run by another processor that finishes execution at the same level, executes the W instruction as the synchronization check corresponding to the EO instruction executed at the previous level to confirm that the previous level's synchronization processing has finished, then performs the synchronization processing of the current level and proceeds to the next task.