JPH061464B2

JPH061464B2 - Multi Microprocessor Module

Info

Publication number: JPH061464B2
Application number: JP59257533A
Authority: JP
Inventors: 雅嗣亀谷
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1984-12-07
Filing date: 1984-12-07
Publication date: 1994-01-05
Anticipated expiration: 2009-01-05
Also published as: JPS61136157A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Use of the Invention]

The present invention relates to a multi-microprocessor that has various hardware mechanisms for parallel processing. Through advanced MIMD (Multi-Instruction stream Multi-Data stream) parallel processing, and adaptive parallel processing that exploits processor idle time, it is suited to the real-time processing involved in advanced intelligent processing for intelligent robots and the like, and in motion control and adaptive control.

[Background of the Invention]

In many conventional multi-microprocessor systems the hardware configuration itself is special-purpose and unsuited to general-purpose use. Even in systems that adopt an equal-processor scheme and aim at generality, the hardware configuration has often not been studied sufficiently. Data communication, and the exchange of flags and status for inter-processor synchronization, incur large software overhead, so tasks cannot be subdivided finely enough and efficient parallel processing is not achieved.

Consider as an example the paper entitled "Broadcast-Memory-Coupled Computer" published in "System and Control" (システムと制御), Vol. 28, No. 4, supplement, pp. 73-76 (1984). In that design each processor is given a shared memory, called a broadcast memory, as the means of inter-processor communication. Reads can be performed independently by each processor. For writes, when a processor writes data into its own broadcast memory, the data is automatically transferred to the same address in the broadcast memories of all processors. The scheme is intended to reduce access contention between processors.

The design holds because it assumes that static programs such as numerical computations are processed in parallel and that reads of the shared memory far outnumber writes. For a shared memory used as a data area, however, reads become especially dominant mainly when status is placed in the shared memory and monitored in a check loop. Considering only the exchange of pure shared data, writes are thought to account for around 20% of all shared-memory reads and writes even in ordinary technical and numerical computation. In a system for control applications there are many dynamic elements, such as writing continuously streaming data from sensors and moving large amounts of data, so the proportion of writes is expected to be higher still, and both writes and reads must be fast. Because writes to the shared memory in that design use a single general-purpose system bus, the hardware overhead from bus switching and access contention is thought to be very large, and the problems described above that arise in control applications are not taken into account. Moreover, holding one copy of the same memory contents per processor is wasteful from an economic standpoint and cannot be called a good design.

Furthermore, neither conventional systems nor this design have special hardware mechanisms for parallel processing, such as inter-processor synchronization means. The additional processing needed for parallel execution is generally done entirely in software over general-purpose communication means such as shared memory. Because connecting parallel tasks then requires large software overhead, the partitioned tasks must be made large. As Table 1 of the paper shows, even for matrix multiplication, which is one of the most parallelizable workloads and for which a parallel efficiency proportional to the number of processors would be expected, parallel efficiency does not improve linearly with the number of processors used, owing to communication overhead, inter-processor synchronization overhead, and the small number of partitioned tasks.

In addition, most conventional multi-microprocessor systems, including this one, are special-purpose systems that either restrict themselves from the outset to highly parallel workloads or configure their hardware and software to match the particular characteristics of the processing. They give no consideration at all to conditional jumps across parallel processors, which involve dynamic elements, or to using surplus processor capacity such as the idle time that results from the degree of parallelism of the workload. In a general-purpose system intended for various kinds of real-time processing, however, highly parallel and poorly parallel processing are mixed. For advanced workloads, conditional jumps across processors and the management and effective use of processor idle time are considered important for maintaining generality and high cost-performance.

[Object of the Invention]

An object of the present invention is to provide a multi-microprocessor module that combines high parallel-processing efficiency with generality.

[Outline of Invention]

To achieve high parallel-processing efficiency, the present invention provides, independently of each other, hardware means for inter-processor status communication, which is considered indispensable for parallel processing, and a plurality of high-speed shared-memory communication means dedicated to data communication. This yields high communication throughput and minimizes the overhead of software operations by the processors. By minimizing both the software overhead of operations that affect task connection time, such as inter-processor command transfer and inter-processor synchronization, and the hardware overhead caused by contention between processors in data communication, tasks can be subdivided finely and distributed over many processors, which gives high parallel-processing efficiency.

In addition, the processors are divided into arbitrary groups, and a plurality of independent intra-group inter-processor synchronization means are provided to synchronize the processors within each group. With this multiple synchronization, parallel processing scheduled in a dataflow-like manner is controlled uniformly by grouping the processors, so that MIMD (Multi-Instruction stream Multi-Data stream) parallel processing can be executed mechanically and efficiently. Moreover, the synchronization means makes the unit of synchronization explicit and allows the parallel operation of the processors to be managed and controlled in a multiplexed, hierarchical manner, so that general-purpose functions such as conditional jumps between synchronization units can be realized.

Furthermore, because the synchronization-complete interrupt function of the synchronization means lets hardware monitor the inter-processor synchronization check, processor idle time that arises during scheduled parallel processing can be allocated to background operations without disturbing the parallel-processing schedule. This makes effective use of processor idle time possible.
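The group synchronization summarized above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the bit layout (bits 0 to 12 for processors 1 to 13, bit 15 as the interrupt enable) follows the embodiment described later in this document, while the names `group_word` and `GroupSync`, and the use of a snapshot to imitate comparators that all observe the common bus at the same instant, are hypothetical choices made for illustration.

```python
NUM_PROCESSORS = 13
MEMBER_MASK = (1 << NUM_PROCESSORS) - 1  # bits 0..12 select processors 1..13
IRQ_ENABLE_BIT = 15                      # bit 15 enables the sync-complete interrupt


def group_word(members, irq=False):
    """Build a 16-bit group declaration word (hypothetical helper)."""
    word = 0
    for p in members:
        if not 1 <= p <= NUM_PROCESSORS:
            raise ValueError(f"processor {p} out of range")
        word |= 1 << (p - 1)
    if irq:
        word |= 1 << IRQ_ENABLE_BIT
    return word


class GroupSync:
    """Software model of per-processor group registers and status latches."""

    def __init__(self):
        self.group = [0] * NUM_PROCESSORS       # group register of each processor
        self.done = 0                           # task-complete statuses on the common bus
        self.synced = [False] * NUM_PROCESSORS  # sync-complete status of each processor
        self.interrupts = []                    # processors that received the interrupt

    def declare(self, proc, word):
        """Processor `proc` finishes its task and declares its group."""
        idx = proc - 1
        self.synced[idx] = False   # clear the sync-complete latch first
        self.group[idx] = word     # latch the group information
        self.done |= 1 << idx      # raise the task-complete status
        self._compare()

    def _compare(self):
        # All comparators see the same bus state, so evaluate on a snapshot
        # before any task-complete status is cleared.
        snapshot = self.done
        fired = []
        for idx in range(NUM_PROCESSORS):
            members = self.group[idx] & MEMBER_MASK
            waiting = (snapshot >> idx) & 1
            if waiting and (snapshot & members) == members:
                fired.append(idx)
        for idx in fired:
            self.done &= ~(1 << idx)   # reset the task-complete latch
            self.synced[idx] = True    # set the sync-complete latch
            if (self.group[idx] >> IRQ_ENABLE_BIT) & 1:
                self.interrupts.append(idx + 1)
```

In this model a processor that sets the interrupt-enable bit is free to run background work after `declare` and is notified through `interrupts`, whereas one that leaves it clear would poll `synced`, which mirrors the two ways of learning about synchronization that the text describes.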

The above provides the hardware architecture of the intended tightly coupled multi-microprocessor module. The invention is called a module because the ultimate goal is to combine many of these multi-microprocessors into a still larger multi-microprocessor system, for which the multi-microprocessor of the invention can be regarded as the basic processor module.

[Embodiment of the Invention]

An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the hardware configuration of the multi-microprocessor module of the present invention. The base microprocessors 1 to 13 are connected by three independent shared memory buses 73, 74, and 75 and by a system bus 109 used mainly to connect a common interface 95.

Specially devised hardware for parallel processing is housed in the communication controllers 21 to 33 of the respective processors. This hardware comprises the flags and status expected to be needed for parallel processing, inter-processor command transfer means for exchanging commands between arbitrary processors, intra-group inter-processor synchronization means for synchronizing processors, and idle-processor management functions and mechanisms such as the synchronization interrupt function. The communication controllers are interconnected by a common bus 76 and a mixed dedicated-and-common bus 77, independently of the other common buses. As a result, the formalizable functions, status, and flags needed for parallel processing are implemented not by software on the shared memory but by dedicated hardware, efficiently and with very simple operations. This not only minimizes software overhead time. Most status check loops, which would occupy the shared memory for long periods if run there, can instead run on the dedicated means of the communication controllers 21 to 33 with no overhead from access contention, so the load on the shared memory is greatly reduced and overall throughput greatly increased. The inter-processor command transfer means and the intra-group inter-processor synchronization means are described in detail later.

The shared memories 14, 15, and 16 sit on the shared memory buses 73, 74, and 75 respectively. Each of the buses 73, 74, and 75 is controlled by a discrimination (arbitration) circuit 17, 18, or 19 that arbitrates the connection of processors 1 to 13 to the shared memory. Elements 34 to 46 are decoder circuits that decode the processors' local buses 110 to 112 to generate request signals for the common bus 73, to which shared memory 14 is connected; they exchange shared-memory access request and grant signals over the dedicated bus 123. The relationship and function of elements 47 to 59 and bus 124 for shared memory 15, and of elements 60 to 72 and bus 125 for shared memory 16, are the same. The system bus 109 is a general-purpose bus, not as fast as the shared memory buses, and is controlled by a bus arbiter 94. Elements 96 to 108, as in the shared-memory case, consist of bus switches and decoder circuits and exchange bus request and bus grant signals over the dedicated bus 126.

Elements 78 to 90 are the local memories and local interfaces provided for processors 1 to 13 respectively. In this system, whatever belongs to an ordinary processor, and any interface that can be assigned to an individual processor, is placed in 78 to 90 as far as possible. The local interfaces also include auxiliary processor means such as arithmetic processors; the corresponding main processor performs local parallel processing with those auxiliary processors. Element 20 is a circuit that analyzes the command matrix table provided in shared memory 14 for the inter-processor synchronization means described later, and conveys the result to 21 to 33 over the common bus 77. In this embodiment processors 1 to 13 are basically equal, but for convenience 1 to 10 are used for parallel processing and 11 to 13 for system management. Processors 11 to 13 are each provided with external communication means 91 to 93, based on dual-port RAM, for communicating with external modules.

As described above, the embodiment distributes the bus means by function: the shared memory buses 73 to 75 for high-speed data communication; the common and dedicated buses 76 and 77 linking the communication controllers 21 to 33, which handle only communication related to parallel processing; and the system bus 109 connecting the general-purpose common interface 95. This distribution achieves high throughput with little loss or waste from contention.

The shared-memory communication means, consisting of the shared memory buses, the arbitration circuits (discrimination circuits), and the shared memories, is particularly important and is now described in more detail. As FIG. 1 shows, this embodiment has three independent shared memories 14, 15, and 16. Memory 14 is defined as a shared status area used by various programs, and memories 15 and 16 as a shared data communication area. Shared memories 15 and 16 form what is called a composite address space: odd word addresses are assigned to 15 and even word addresses to 16, and the priority order of processors 1 to 13 is set differently on the shared memory buses 74 and 75. In this embodiment priority on 15 runs in the order 1, 2, ..., 13 and on 16 in the order 13, 12, ..., 1, so that the processors are treated roughly equally overall, although other priority assignments are possible. The number of independent shared memories and the allocation of their functions should be chosen to suit the performance and functions of the base processor.

The high-speed control method for the shared memory is explained with reference to FIG. 2. A processor memory cycle is assumed to consist of four clocks, T₁, T₂, T₃, and T₄, as shown in FIGS. 2 and 3; t is the clock period that governs the whole system. Processors 1 to 13 and the shared-memory arbitration circuits 17, 18, and 19 all run on the same clock, whose periods are drawn as vertical lines in FIGS. 2 and 3.

The basic access timing is explained with FIG. 2. g₁ and g₂ are the memory cycles of processors Pm and Pn, h denotes a shared-memory access request, and i a shared-memory access grant. In a read cycle, Pm and Pn latch data into the processor at timings a and b respectively. The two clocks preceding this point are defined as the shared-memory cycle. The shared memory bus is acquired and released once per shared-memory cycle, and the control scheme ensures that the grant signal i_n is always active during this interval. First, the access request signal h_n is asserted midway through T₁. When there is no contention, the arbitration circuits 17 to 19 begin controlling the shared memory bus at the next clock period, at timings c and e. Since at that point no other processor is requesting access in either the Pm or the Pn case, the arbitration circuit immediately asserts the grant signals i₁ and i₂ for Pm and Pn. At the next clock period, at m and n, the processors see that i₁ and i₂ are active and deassert the request signals h₁ and h₂ on entering T₃. At the following clock period, at d and f, the arbitration circuit sees that h₁ and h₂ are inactive and deasserts the grant signals i₁ and i₂. In FIG. 2, although the memory cycles of Pm and Pn overlap, their request signals h₁ and h₂ do not, so control proceeds without affecting processor operation at all, which is very efficient.

FIG. 3 shows the case where the request signals h₃ and h₄ of Pm and Pn overlap and access contention occurs. Pm behaves as in FIG. 2. Pn asserts h₄ midway through T₁, but when the arbitration circuits 17 to 19 begin bus control at the next clock period, Pm's request h₃ and grant i₃ are both active, that is, Pm is already accessing the shared memory. The circuit therefore does not assert Pn's grant i₄. Instead, at timing j it generates a signal that inserts a wait state Tw, extending Pn's memory cycle by one clock, and the active state of h₄ is extended by one clock accordingly. The same operation is repeated at each subsequent clock period. At timing p, Pm's request h₃ has gone inactive. Detecting this at q, together with the active state of Pn's request h₄, the arbitration circuit immediately deasserts Pm's grant i₃ and asserts Pn's grant i₄. Operation then continues as in FIG. 2, except that the shared-memory cycle shifts to the T₃ and Tw portion; the processor's data read timing always falls at the clock period k preceding T₄.

In this way, with the minimum necessary wait and with bus release and acquisition completed in a sufficiently short time within one clock, each processor always occupies the shared memory for exactly two clocks, and processors requesting access are connected one after another without gaps, giving efficient bus control. When, as in FIG. 3, Pm's grant i₃ is already active, the processor granted first (Pm) unconditionally keeps priority. When no processor holds a grant and requests coincide, the arbitration circuits 18 to 19 favor the higher-priority processor according to the predetermined priority order. The number of clocks in a shared-memory cycle is chosen optimally in view of the shared memory access speed, bus switching time, buffer delay time, setup time, and so on.
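The interleaving and arbitration rules described above can be summarized in a short behavioral model. It is a sketch under simplifying assumptions rather than a description of the actual circuit: time is counted in whole clocks from the moment a request becomes visible to the arbiter, every access occupies the bus for a fixed two-clock shared-memory cycle, and the names `bank_for`, `PRIORITY`, and `arbitrate` are hypothetical.

```python
def bank_for(word_address):
    """Odd word addresses map to shared memory 15, even ones to 16."""
    return 15 if word_address % 2 else 16


# Priority order per shared memory: ascending on 15, descending on 16,
# so that the processors are treated roughly equally overall.
PRIORITY = {
    15: list(range(1, 14)),
    16: list(range(13, 0, -1)),
}


def arbitrate(requests, priority, cycle_clocks=2):
    """Simulate one shared memory bus.

    requests: dict mapping processor number to the clock at which its
              request becomes visible to the arbitration circuit.
    Returns a dict mapping processor number to (grant_clock, wait_states),
    where wait_states is the number of Tw clocks inserted for it.
    """
    pending = dict(requests)
    result = {}
    clock = min(pending.values()) if pending else 0
    while pending:
        ready = [p for p, t in pending.items() if t <= clock]
        if not ready:
            # Bus is idle: skip ahead to the next request.
            clock = min(pending.values())
            continue
        # Among simultaneous requesters the predetermined priority decides.
        winner = min(ready, key=priority.index)
        result[winner] = (clock, clock - pending.pop(winner))
        # The current holder is never preempted: the bus stays busy for one
        # full shared-memory cycle before the next grant.
        clock += cycle_clocks
    return result
```

For example, `arbitrate({1: 0, 2: 1}, PRIORITY[16])` reproduces the FIG. 3 situation: processor 1 is granted first because it already holds the bus when processor 2's request arrives, and processor 2 receives one wait state even though it ranks higher on shared memory 16.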

Next, the inter-processor command transfer mechanism is described in detail with reference to FIGS. 4 and 5. The mechanism places in shared memory a command matrix table that allows any processor to send a command to any other processor. When the issuing processor writes command data into the table, the target processor is interrupted automatically; the target receives the command data directly from the table and starts the corresponding command-handling routine. In this embodiment, part of the interrupt vector table of each of processors 1 to 13 is placed in shared memory 14 and shared, forming the command matrix table. The issuing processor writes the start address of the desired command-handling routine directly into the location of the table assigned to the target processor. During its interrupt response the target fetches its jump destination from the interrupt vector table, and so fetches that start address directly and jumps to the command-handling routine.

FIG. 4 shows the layout of the command matrix table in shared memory 14 in this embodiment. COPn is the area for commands issued by processor Pn. It is subdivided into Vn₀ to Vn₁₅, the areas for commands addressed to processors P₀ to P₁₅. For example, when processor Pn writes a command into Vnm, the command is delivered to processor Pm.

FIG. 5 is a hardware block diagram of the inter-processor command transfer means. The operating sequence is as follows. When a processor issues a command, the decoder circuit 127 analyzes the access to the command matrix table 148 in shared memory 14, converts it into the number n of the issuing processor and the number m of the target processor, and places these on the common buses 128 and 129 respectively. The common buses 128 and 129 are taken into the command circuits 149 to 161 in the communication controllers 21 to 33 provided for each processor. Decoder circuit 130 analyzes common bus 129 to determine whether the command is addressed to its own processor and, if so, enables the result of decoder circuit 131 via signal line 162. Decoder circuit 131 determines which processor issued the command. On receiving the enable signal from decoder circuit 130 over signal line 162, it makes the one of the binary counters 132 to 144 that corresponds to the issuing processor count, passes an interrupt request to the interrupt control circuit 145, and at the same time places the result on the dedicated bus 147 to report it as status to the issuing processor.

The interrupt control circuit 145 interrupts the processor and, once the interrupt is accepted, supplies the processor with an interrupt vector corresponding to the issuing processor. The processor that receives it references the address in the command matrix table where the command was written and jumps directly to the start of the specified command routine. When the target processor references that address, the same sequence as above automatically makes the same binary counter among 132 to 144 count again, returning it to its initial state. This clears the interrupt request signal to the interrupt control circuit and, at the same time, the status signal on the dedicated bus 147. The issuing processor takes in and monitors the status of the commands it has issued, as with the status signal line 146 for processor P0. It can thereby manage the issuance and start-up state of its commands and judge whether it may issue the next one. In this way very simple operations give fast command transfer between arbitrary processors and also allow the start-up state of commands to be managed.
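The write-then-interrupt protocol described above can be mimicked in software as follows. This is a minimal sketch with several assumptions that are not in the patent: the start address of a routine is represented by a Python callable, each binary counter is modeled as a one-bit pending flag, and `issue` refuses a new command while the previous one is still pending (in the hardware the issuing processor is expected to check the status line itself). The class name `CommandMatrix` and its methods are hypothetical.

```python
class CommandMatrix:
    """Software model of the command matrix table and its status flags."""

    def __init__(self, size=16):
        self.size = size
        # table[n][m] models V_nm: the slot where Pn writes a command for Pm.
        self.table = [[None] * size for _ in range(size)]
        # pending[m][n] models the binary counter in Pm's controller
        # that corresponds to issuing processor Pn.
        self.pending = [[False] * size for _ in range(size)]

    def busy(self, src, dst):
        """Status seen by the issuer: has its last command to dst started?"""
        return self.pending[dst][src]

    def issue(self, src, dst, handler):
        """Pn writes a handler into V_nm; returns False if one is pending."""
        if self.busy(src, dst):
            return False
        self.table[src][dst] = handler
        self.pending[dst][src] = True  # counter counts: interrupt request raised
        return True

    def service(self, dst):
        """Interrupt response of the target: fetch and run pending handlers."""
        results = []
        for src in range(self.size):
            if self.pending[dst][src]:
                handler = self.table[src][dst]  # the vector fetch reads V_nm
                self.pending[dst][src] = False  # the read returns the counter
                results.append(handler(src))    # jump to the command routine
        return results
```

A typical exchange would be `cm.issue(2, 5, routine)` by processor 2, followed by `cm.service(5)` on processor 5, after which `cm.busy(2, 5)` is false again and processor 2 may issue its next command.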

Finally, the intra-group inter-processor synchronization means and an example of processor control using it are described in detail with reference to FIGS. 6, 7, and 8.
The intra-group inter-processor synchronization means allows processors handling related tasks to form a group freely and to carry out parallel processing mechanically while synchronizing with the other processors in the group. FIG. 6 is a hardware block diagram of the intra-group inter-processor synchronization means 177 to 189, and its operation sequence is described below with reference to this figure.

When a processor finishes its task processing, it declares to the group register 163 which group of processors executed that task processing. This is called the group declaration GC. In this embodiment it is expressed as 16-bit word information in which bit numbers 0 to 12 correspond to processors 1 to 13; a bit value of 1 means the processor belongs to the group, and 0 means it does not. When a processor writes a group declaration that includes itself to the group register 163, the latch circuit 166, which outputs the synchronization completion status 175, is first cleared through the signal line 169. Next, the group information is latched in the group register 163, the task processing is regarded as complete, and the latch circuit 165 is set through the signal line 168. The set signal is output to the common bus 76 through the signal line 170 as the task processing completion status.

Once the group declaration has been made, the comparison circuit 164 monitors the task processing completion statuses that the synchronization means 177 to 189 of the individual processors output on the common bus 76, compares them with the bit information latched in the group register 163, and checks whether every processor belonging to the group has completed its task processing. When all have completed, the processors are regarded as synchronized: the comparison analysis circuit 173 in the comparison circuit 164 sets the latch circuit 166 through the signal line 171 and resets the latch circuit 165 through the signal line 174. This clears the task completion status, and the latch circuit 166 outputs the latched synchronization completion status 175 to the processor. The processor learns that synchronization has been achieved by monitoring this signal in a status check loop. If the synchronization completion interrupt is enabled, the comparison analysis circuit 173 also activates the synchronization completion interrupt generation circuit 167 through the signal line 172, so that the synchronization completion interrupt 176 is raised to the processor together with the synchronization completion status 175.

With the synchronization completion interrupt, the software status check is no longer needed, so any idle time a processor has before synchronization is achieved can be put to use, for example by running a background operation. As soon as synchronization between the processors completes, the interrupt pulls the processor back to the main parallel processing, and during the processor's next idle period the background operation resumes from the point at which it was interrupted. The scheduled flow of parallel processing is therefore not disturbed, and a long series of consecutive background jobs can be executed without regard to the main parallel processing, which is a major feature of this embodiment.
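The declaration, comparison, and latch sequence described above can be sketched in software as follows. This is a simplified model under stated assumptions: the names `SyncUnit`, `group_word`, `bus_snapshot`, and `step` are hypothetical, and the hardware's continuous comparison is approximated by discrete steps in which every unit samples the same bus state before any latch is updated.

```python
N_PROCESSORS = 13


def group_word(members):
    """Encode processors 1..13 into bits 0..12 of the group declaration word."""
    word = 0
    for p in members:
        if not 1 <= p <= N_PROCESSORS:
            raise ValueError(f"processor number out of range: {p}")
        word |= 1 << (p - 1)
    return word


class SyncUnit:
    """One unit per processor (hypothetical model of the per-processor circuit)."""

    def __init__(self, pid):
        self.pid = pid
        self.group = 0           # models the group register
        self.task_done = False   # models the task completion latch on the bus
        self.sync_done = False   # models the synchronization completion latch

    def declare(self, members):
        """Group declaration: clear old sync status, latch group, flag task done."""
        self.sync_done = False
        self.group = group_word(members)
        self.task_done = True


def bus_snapshot(units):
    """Task completion statuses of all units, packed like the declaration word."""
    word = 0
    for u in units:
        if u.task_done:
            word |= 1 << (u.pid - 1)
    return word


def step(units):
    """One comparison step: sample the bus once, then update all ready units."""
    bus = bus_snapshot(units)
    ready = [u for u in units if u.task_done and (bus & u.group) == u.group]
    for u in ready:
        u.sync_done = True    # synchronization completion status is set
        u.task_done = False   # task completion status is cleared
    return [u.pid for u in ready]


units = [SyncUnit(p) for p in range(1, N_PROCESSORS + 1)]
group = [1, 2, 3]
units[0].declare(group)
units[1].declare(group)
first = step(units)       # processor 3 has not finished yet
units[2].declare(group)
second = step(units)      # all three members have declared
```

Sampling the bus once per step matters in this model: if each unit cleared its completion flag before the others had compared, later units would never see the full group.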
In this embodiment the same idea is applied elsewhere as well. For example, when a main processor requests processing from an auxiliary mechanism such as an auxiliary processor installed locally at each main processor, the idle time of the main processor until that processing ends is managed by a processing end interrupt function based on the same idea as the synchronization completion interrupt, so that this idle time can also be used effectively. The synchronization completion interrupt is enabled or disabled by bit number 15 of the group declaration, on the basis of which the latch signal 169 enables or disables the operation of the synchronization completion interrupt generation circuit 167.

FIGS. 7 and 8 show examples in which the processing flow of the processors is controlled using the intra-group inter-processor synchronization means. In FIG. 7, processors P0, P1, and P2; P3, P4, P5, P6, and P7; and P8 and P9 each form a group and execute CELL1 from top to bottom. The processors in each group are synchronized by the intra-group inter-processor synchronization means at the horizontal lines represented by A, and each processes related tasks. The groups run asynchronously with respect to one another, but at point B information must be exchanged between groups. There a further intra-group inter-processor synchronization means treats all of the processors P0 to P9 as one group, synchronizes the groups, and then reorganizes the groups before processing proceeds to CELL2. The minimum unit of synchronization is called a level, and the processing within a level is called a task. Thus this embodiment provides a plurality of intra-group inter-processor synchronization means to enable multiple synchronization processing, and by dividing the processors into groups and controlling them uniformly, scheduled MIMD parallel processing can be executed mechanically.

FIG. 8 shows the task processing state of the processors, represented by C, the idle time of the processors, represented by D, and conditional jumps between synchronization units, indicated by E, F, and G. The idle time shown in the figure is used for background operations and the like by means of the synchronization completion interrupt. As for the conditional jumps, E, F, and H are jumps between levels and G is a jump between CELLs. Jump processing is thus performed between synchronization units. Specifically, after the processors in a group synchronize at a synchronization point, they evaluate a condition and jump to the target synchronization point to perform a branch, a loop, or the like. If there is no condition to evaluate, they simply jump unconditionally to the target synchronization point after synchronizing. In other words, synchronizing and advancing through the synchronization processing units, whether levels or CELLs, corresponds to updating the program counter in a von Neumann single processor.
This function realizes conditional jump processing with dynamic elements, which is indispensable for a general-purpose processor, on the parallel processing schedule relatively simply and without much additional software support. As described above, the intra-group inter-processor synchronization means minimizes the software overhead of synchronization processing, which allows tasks to be subdivided and distributed to many processors and raises parallel processing efficiency. Furthermore, because processors can be divided into groups and managed and controlled in a multiplexed, hierarchical, and unified manner between synchronization units, processor idle time can be managed and used effectively and conditional jumps can be made between synchronization units, so that advanced, general-purpose MIMD parallel processing can be realized easily and efficiently.
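The analogy between advancing through synchronization units and updating a program counter can be illustrated with a short sketch. Everything in it is an assumption for illustration: the function `run_schedule`, the level names, and the representation of a level as a list of task functions plus a function that selects the next level after the synchronization point. The tasks of a level are run one after another here, standing in for processors that would run them in parallel.

```python
def run_schedule(levels, start, state, max_steps=100):
    """Execute levels in turn; the current level name plays the program counter."""
    pc = start
    trace = []
    for _ in range(max_steps):
        if pc is None:
            return trace
        tasks, choose_next = levels[pc]
        # All tasks of the level finish before the synchronization point.
        for task in tasks:
            task(state)
        trace.append(pc)
        # After synchronizing, the condition is evaluated and the target chosen.
        pc = choose_next(state)
    raise RuntimeError("schedule did not terminate within max_steps")


def add_one(state):
    state["x"] += 1


def double(state):
    state["y"] *= 2


levels = {
    # Conditional jump: loop back to L1 until x reaches 3, then go on to L2.
    "L1": ([add_one, double], lambda s: "L1" if s["x"] < 3 else "L2"),
    # Unconditional end of the schedule.
    "L2": ([add_one], lambda s: None),
}

state = {"x": 0, "y": 1}
trace = run_schedule(levels, "L1", state)
```

A loop is expressed here simply as a jump whose target is the level just executed, which is the kind of branch the text describes as taking place between synchronization units.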

Note that the hardware configuration and the various means for parallel processing described here can also be used only in part to build a small-scale multiprocessor. In addition, through the means for communicating with external modules shown at 91, 92, and 93 in FIG. 1, the module can be coupled with other multi-microprocessor modules and extended into a still larger multiprocessor system.

According to the embodiment of the present invention described above, the communication overhead caused by contention between processors, which is thought to affect parallel processing efficiency greatly and has been a problem in conventional multiprocessor systems, can be made sufficiently small by providing a plurality of independent high-speed shared memory communication means. In addition, providing the inter-processor synchronization means in hardware minimizes the software operation overhead required for synchronization processing. By thus minimizing the hardware and software overhead newly introduced by parallel processing, the connection time between parallel processing tasks is reduced and tasks can be subdivided and distributed to many processors, so that high parallel processing efficiency can be obtained even in a multi-microprocessor based on inexpensive general-purpose microprocessors.

Furthermore, multiple synchronization processing using a plurality of the intra-group inter-processor synchronization means of the present invention allows processors to be gathered into groups and controlled uniformly. Not only can MIMD (Multi-Instruction stream Multi-Data stream) parallel processing scheduled in a dataflow-like manner be executed mechanically and efficiently, but conditional jump processing with dynamic elements, which has been difficult in the past, can also be performed easily between processor synchronization units, which increases generality with respect to programs. Moreover, by using the synchronization completion interrupt, background operations can be executed during the idle time of a processor until synchronization between processors is achieved, so that the surplus processing capacity of the system can be used effectively. As a result, an inexpensive multi-microprocessor module that is effective for advanced control applications centered on real-time processing can be provided.
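The use of idle time for a background operation that resumes from its interruption point can be sketched with a Python generator. This is an analogy only, under stated assumptions: `background_job` and `wait_for_sync` are hypothetical names, and the arrival of the synchronization completion interrupt is modeled by a fixed number of idle slots rather than by a real interrupt.

```python
def background_job(log):
    """A long-running job; each yield marks a point where it can be suspended."""
    for i in range(1000):
        log.append(i)
        yield


def wait_for_sync(job, idle_slots):
    """Run background steps while idle, then return to the main processing.

    idle_slots stands in for the time until the synchronization completion
    interrupt arrives; it is an assumption of this sketch.
    """
    for _ in range(idle_slots):
        next(job, None)


log = []
job = background_job(log)
wait_for_sync(job, 3)   # first idle period: three background steps run
wait_for_sync(job, 2)   # next idle period: the job resumes where it stopped
```

Because the generator keeps its own position, the main processing needs no knowledge of how far the background job has progressed, which corresponds to the point made above that background jobs run without regard to the main parallel processing.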

[Effect of the Invention]

According to the present invention, a multi-microprocessor module having high parallel processing efficiency and versatility can be provided.

[Brief description of drawings]

FIG. 1 is a diagram showing the hardware configuration of the multi-microprocessor module of the present invention. FIGS. 2 and 3 are access timing diagrams of the shared memory constituting the present invention. FIG. 4 is a configuration diagram of the command instruction matrix table constituting the present invention. FIG. 5 is a hardware block diagram of the inter-processor command transmission means constituting the present invention. FIG. 6 is a hardware block diagram of the intra-group inter-processor synchronization means constituting the present invention. FIGS. 7 and 8 are diagrams showing examples of processor control using a plurality of the intra-group inter-processor synchronization means constituting the present invention.

1 to 13: processors; 21 to 33: communication controllers; 73, 74, 75: shared memory buses; 14, 15, 16: shared memories; 17, 18, 19: discrimination circuits for shared memory bus control; 20: command instruction matrix table analysis circuit; 76, 77: common bus and dedicated bus between communication controllers; 149 to 161: command instruction circuits; 177 to 189: intra-group inter-processor synchronization means.

Claims

[Claims]

1. A multi-microprocessor module in which a plurality of general-purpose microprocessors are connected on an equal footing to perform parallel processing, characterized by comprising, as hardware means for parallel processing: high-speed shared memory communication means (14 to 16) accessible from any processor via shared memory buses (73 to 75); command transmission means (20, 149 to 161) having functions of exchanging commands between arbitrary processors and monitoring the activation state of command processing; intra-group inter-processor synchronization means (177 to 189) by which processors that process related tasks freely form a group, the processors in the group are synchronized, and scheduled parallel processing is executed mechanically; dedicated bus means (76) connected to the inter-processor synchronization means (177 to 189); and common bus means (77) connected to the command transmission means (20, 149 to 161).

2. The multi-microprocessor module according to claim 1, wherein a communication controller (21 to 33) that controls the command transmission means (20, 149 to 161) and the intra-group inter-processor synchronization means (177 to 189), which are the various communication means necessary for parallel processing, is provided for each processor unit, and wherein the bus means connects the processors with a plurality of independent high-speed shared memory buses (73 to 75) and, alongside the shared memory buses (73 to 75) and other system buses (109), connects the communication controllers (21 to 33) with buses (76, 77), thereby forming a function-distributed multi-bus configuration.

3. The multi-microprocessor module according to claim 1, wherein the shared memory communication means comprises shared memory buses (73 to 75) connecting the shared memories (14 to 16) to the processors (1 to 13), and discrimination circuits (17 to 19) that perform arbitration control for connecting the processors (1 to 13) to the shared memories (14 to 16), and wherein each discrimination circuit (17 to 19) comprises means which, when access requests to the shared memory (14 to 16) do not conflict, activates the access permission signal to the shared memory for the requesting processor, and which, when access requests to the shared memory (14 to 16) conflict, shifts the shared memory cycle of the processor having the lower predefined priority and, during this cycle, activates the access permission signal to the shared memory (14 to 16) for the higher-priority processor that is outputting an access request to the shared memory.

4. The multi-microprocessor module according to claim 3, wherein, when shared memory cycles overlap owing to contention between access requests, the discrimination circuit shifts the shared memory cycle of the processor that accessed later by the number of overlapping clocks.

5. The multi-microprocessor module according to claim 1, wherein the shared memory communication means comprises a plurality of shared memory buses (73 to 75) connecting a plurality of shared memories (14 to 16) to the processors (1 to 13), and a plurality of discrimination circuits (17 to 19) that perform arbitration control for connecting the processors (1 to 13) to the shared memories (14 to 16), and wherein a different processor priority order is set in each of the discrimination circuits (17 to 19) so as to alleviate contention between access requests to the shared memories (14 to 16) and to make the conditions of access to the shared memories approximately equal for the processors as a whole.

6. The multi-microprocessor module according to claim 1, wherein the inter-processor command transmission means (20, 149 to 161) includes, on the shared memory (14), a command instruction matrix table (148) that enables an interrupt from any processor to any processor, and wherein the operation of a processor writing command instruction data to the command instruction matrix table (148) automatically interrupts the processor to which the command is directed and conveys the command written on the shared memory (14 to 16).

7. The multi-microprocessor module according to claim 6, wherein the command instruction matrix table (148) of the inter-processor command transmission means (20, 149 to 161) places part of the interrupt vector table of each processor in common on the shared memory (14); the processor issuing a command writes the start address of the command processing routine to be executed by the target processor to a predetermined location in the command instruction matrix table (148); the command transmission means (20, 149 to 161) analyzes the accessed address and interrupts the commanded processor; and, as soon as the interrupt is accepted, the attribute of the issuing processor is conveyed to the commanded processor as interrupt vector information, whereby the commanded processor directly reads the start address of the command routine written in the command instruction matrix table, loads it into its program counter, and jumps directly to the beginning of the command routine to start command processing.

8. The multi-microprocessor module according to claim 1, wherein the inter-processor command transmission means (20, 149 to 161) regards the moment at which the issuing processor writes the start address of a command routine to the command instruction matrix table (148) as the activation of the command, whereupon the command transmission means (20, 149 to 161) analyzes the accessed address and sets a flag indicating activation of the corresponding command; regards the moment at which the commanded processor reads the start address of the command routine from the command instruction matrix table (148) as completion of the activation of the command, whereupon the command transmission means (20, 149 to 161) likewise resets the flag indicating activation of the command; and conveys the set or reset state of the flag as status information to the issuing processor, thereby notifying it of the activation state of the command.

9. The multi-microprocessor module according to claim 1, wherein, merely by a processor writing, when it finishes task processing, the group attribute indicating the group of processors that executed the task processing to the group register (163) in the synchronization means that manages that processor, the intra-group inter-processor synchronization means (177 to 189) checks, together with information from the synchronization means that manage the other processors, whether the task processing of the processors belonging to the group has completed, and notifies the processor of the result as synchronization information.

10. The multi-microprocessor module according to claim 9, wherein the intra-group inter-processor synchronization means (177 to 189) comprises means operating in the following sequence: as soon as a processor writes the attribute data of a group to the group register (163), the synchronization means managing that processor resets the flag indicating the previous synchronization information, sets a flag indicating that the task processing of that processor has completed, and outputs that flag to the common bus; it then obtains task processing completion information from the synchronization means of the other processors through the common bus, and the comparison circuit (164) checks whether all the processors belonging to the group registered in the group register (163) have completed their task processing; when the task processing of all processors in the group has completed, the processors in the group are regarded as synchronized, and a flag indicating the synchronization information is set and conveyed to the processor; and the processor, in accordance with this flag information, withholds the start of the next task processing until synchronization within the group is complete.

11. The multi-microprocessor module according to claim 10, wherein the intra-group inter-processor synchronization means (177 to 189) has processor idle-time management means in which, at the moment the flag indicating the synchronization information within the group is set, and if interrupts are enabled for the synchronization means, a synchronization completion interrupt indicating that synchronization has been achieved is output to the processor; the processor executes background operations in parallel during the idle time from completion of its task processing until synchronization is achieved; as soon as the processors are synchronized, the synchronization completion interrupt automatically and forcibly returns the processor to the main parallel processing; and in the next idle period of the processor the background operation is resumed from the point of interruption, whereby the idle time of processors executing parallel processing is allotted to the processing of a series of consecutive background operation programs.

12. The multi-microprocessor module according to claim 10 or 11, wherein a plurality of the intra-group inter-processor synchronization means (177 to 189) are provided so that the processors within each group are synchronized independently, and synchronization between groups is managed by a further means having the same function.

13. The multi-microprocessor module according to claim 1, wherein the shared memory communication means comprises a plurality of shared memories (14 to 16), the shared memories (14 to 16) including memories accessible at even address values indicated by the processor and memories accessible at odd address values, and the shared memories (14 to 16) are connected to a plurality of independent shared memory buses (73 to 75).
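The shared memory arrangement recited in claims 3, 5, and 13 can be pictured with a small sketch. It rests on several assumptions that are not in the claims: two banks selected by the lowest address bit, priority orders obtained by simple rotation, and the hypothetical function names `bank_for`, `rotated_priorities`, `arbitrate`, and `rank_sums`. The example uses three processors and three arbiters so that the ranks balance exactly; how the actual discrimination circuits assign priorities is not specified here.

```python
def bank_for(address):
    """Assumed mapping: even addresses select bank 0, odd addresses bank 1."""
    return address & 1


def rotated_priorities(n_processors, n_arbiters):
    """One priority order per arbiter, each rotated by one position."""
    return [
        [(i + k) % n_processors for i in range(n_processors)]
        for k in range(n_arbiters)
    ]


def arbitrate(requests, priority_order):
    """Grant access to the requesting processor that comes first in the order."""
    for p in priority_order:
        if p in requests:
            return p
    return None  # no processor is requesting this memory


def rank_sums(orders):
    """Sum of each processor's rank over all arbiters (lower means favored)."""
    n = len(orders[0])
    return [sum(order.index(p) for order in orders) for p in range(n)]


orders = rotated_priorities(3, 3)
winner_bank0 = arbitrate({0, 2}, orders[0])  # processors 0 and 2 conflict
winner_bank1 = arbitrate({0, 2}, orders[1])  # same conflict, other arbiter
fairness = rank_sums(orders)
```

With a different order in each arbiter, the processor that loses a conflict on one memory wins the same conflict on another, which is the equalizing effect claim 5 aims at.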