JPS62208158A

JPS62208158A - Multiprocessor system

Info

Publication number: JPS62208158A
Application number: JP61051236A
Authority: JP
Inventors: Hirotada Ueda; 博唯上田; Kanji Kato; 加藤　寛次; Hitoshi Matsushima; 整松島
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1986-03-08
Filing date: 1986-03-08
Publication date: 1987-09-12
Anticipated expiration: 2010-05-01
Also published as: EP0236762A1; EP0236762B1; DE3779150D1; JPH0740252B2; US4979096A

Abstract

PURPOSE: To increase processing speed and give the system versatility by inserting bus control circuits between the processor elements and the local memories of the processor units, and connecting those circuits like a shift register to form a ring bus. CONSTITUTION: A bus control circuit 4 is provided between each processor element 3 and its local memory 2, and the bus control circuits are connected like a shift register. This shift register forms the ring bus 1 and is controlled by a ring bus control circuit 8. The whole system is controlled by a microcomputer 5 and is connected to a host computer through a system bus 10 and an input/output interface section 9. The ring bus 1 can also be used to input and output images to and from external equipment at high speed, and it makes many operation modes possible.

Description

[Detailed Description of the Invention] <Industrial Application Field> The present invention relates to a multiprocessor system for performing information processing at high speed. In particular, it relates to a multiprocessor system suited to high-speed parallel processing of two-dimensional data such as images.

<Prior Art> Image processing demands more computing power than a single processor can deliver, so a variety of multiprocessor image processors have been proposed. Conventional multiprocessors concentrated on the preprocessing level, such as filtering, which is simple but requires extremely high-speed computation.

A common configuration is exemplified by "An LSI Adaptive Array Processor", presented at the 1982 International Solid-State Circuits Conference. It provided as many processors as there are pixels (the so-called fully parallel type). Each processor could perform only 1-bit operations, but high speed was obtained by arranging a large number of them.

Higher-level processing, such as image feature extraction and structural analysis, was handled in one of two ways. Either speed was sacrificed and the work was left to programs on general-purpose microcomputers (which can use advanced arithmetic instructions), or special-purpose hardware was designed individually for limited applications.

However, as the fields of application of image processors expand, the required processing is also becoming more sophisticated. Speeding up preprocessing alone is no longer enough; processors that can also handle higher-level processing are now required.

In response, the earlier application (Japanese Patent Publication No. 59-140456) provided a method that enables high-speed, complex processing by connecting multiple processors with advanced functions in a ring.

<Problems to Be Solved by the Invention> When computing power is increased with a multiprocessor architecture, communication between processors, or the sharing and exchange of data between them, becomes overhead. As a result, the expected performance cannot be obtained.

In addition to this general problem, image processing requires high-speed data transfer between two-dimensionally adjacent processors. For example, in the fully parallel multiprocessor system mentioned above, every processor is connected so that it can transfer data to the processors in the eight adjacent directions. This makes the circuit large. The arithmetic function is therefore limited to 1 bit, and data can be transferred only one bit at a time. Preprocessing can be performed at high speed, but such a system was unsuitable for advanced image processing.

On the other hand, the method of the earlier application mentioned above, which connects the processors in a ring, was devised to prevent an increase in circuit scale. Its embodiment, however, synchronized the data flowing on the ring bus with the processing in the processors as follows. The control section judged the state of each one every time, and then sent control commands one after another to the processors and to the ring bus control section. Because of this, sufficiently high speed could not be achieved.

An object of the present invention is to realize an economical and practical ultra-high-performance image processor by improving the way the processors are connected when a plurality of them are used.

<Means for Solving the Problems> The overall configuration of the present invention is shown in FIG. 1. Each processor unit (PU) 20, indicated by a broken line in the figure, consists of a processor element (PE) 3 and its local memory (Local Memory) 2. A plurality of these units are arranged side by side. To keep FIG. 1 legible, only one unit, PU 20(1), is outlined with a broken line. For the other PUs, the broken lines and the labels 20(2), 20(64), etc. are omitted.

A bus control circuit (Path Ctl.) 4 is inserted between each PE 3 and its local memory 2, and the 64 bus control circuits are connected like a shift register. This shift register forms a ring bus (Ring Bus) 1, which is controlled by a ring bus control circuit (Ring Bus Ctl.) 8.

The entire system is controlled by a microcomputer (68000) 5. It is connected to a host computer via a system bus 10 and an input/output interface section (DMA I/F and PIO I/F) 9.

The ring bus 1 is also used to input and output images to and from external devices at high speed. Furthermore, according to the present invention, the number of PUs can be chosen arbitrarily and can be increased or decreased freely to match the required processing capacity.

<Operation> In the present invention, the PUs are connected in hardware by a one-dimensional ring bus, a topology that might be thought disadvantageous for image processing. This choice was made deliberately for two reasons. First, the simple structure keeps the hardware small and allows a wide transfer data width of one word (16 bits). Second, although the physical connection is one-dimensional, the PUs can be treated logically as an arbitrary two-dimensional array (8x8, 4x16, 2x32, and so on). This actually gives more freedom than a fixed 8-direction connection.
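The sketch below illustrates the claim that 64 PUs on a one-dimensional ring can be treated as any two-dimensional array. It lists every rows x columns arrangement of a given PU count and maps a ring position to logical (row, column) coordinates. The function names and the row-by-row (raster-order) mapping are assumptions made for illustration; they are not taken from the patent.

```python
def logical_shapes(n_pu=64):
    """Every (rows, cols) arrangement whose product is n_pu."""
    return [(rows, n_pu // rows) for rows in range(1, n_pu + 1) if n_pu % rows == 0]


def ring_to_2d(k, cols):
    """Map ring position k (0-based) to (row, col), filling rows in ring order."""
    return divmod(k, cols)


def ring_from_2d(row, col, cols):
    """Inverse mapping: logical (row, col) back to the ring position."""
    return row * cols + col


print(logical_shapes(64))   # includes (8, 8), (4, 16) and (2, 32)
print(ring_to_2d(9, 8))     # ring position 9 in an 8x8 array -> (1, 1)
```

The same physical ring serves every shape; only the arithmetic that interprets a ring position changes.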

Furthermore, the ring bus 1 is placed at the connection point between each processor element 3 and its local memory 2. This makes it possible to realize the various operation modes described in detail in the embodiment. In addition, the ring bus serves only to transfer data. The addresses and read/write signals for the local memory are supplied by the PE (under program control). This eliminates hardware such as address counters and reduces the circuit scale. It also automates the synchronization between the data flow on the ring bus and the processing in the processors, so that processing can be executed at extremely high speed.

<Embodiment> An embodiment of the present invention is described below.

In this embodiment, 64 PUs 20 with identical circuit configurations are used. When it matters which unit is meant, its index is given in parentheses, e.g. 20(1). When referring to any one of them generically, the parentheses are omitted, e.g. 20.

[Four Operation Modes] In the present invention, as shown in FIG. 1, the image data to be processed is sent via the ring bus 1 to the local memories 2(1) to 2(64) of the PUs 20(1) to 20(64).

In addition, each of the PEs 3(1) to 3(64) has its own independent program memory. Its contents (programs) are transferred as needed from the system memory (System Memory) 6 by the DMA controller (DMAC) 7, and the same program can be transferred to all PEs simultaneously. The microprocessor (68000) 5 controls the PEs 3, for example starting them from an arbitrary address or stopping them. Each PE 3 can interrupt the microprocessor 5 when necessary.

Since the present invention processes images with a plurality of PUs 20, the image data and processing programs must be distributed among the PUs in a way that keeps parallel processing efficient. This embodiment therefore provides four operation modes (forms of use), so that the image processor is versatile enough to run a wide variety of image processing algorithms without difficulty. These are summarized in Table 1, and each mode is described below.

(a) Multiple-data mode
Use this mode when there are many images to be processed and the same processing is applied to each in turn, so that whole images can be distributed one per PU. No data needs to be exchanged between PUs, so there is almost no overhead.

Care is needed, however, when several PUs finish processing at the same time. Contention then arises on the ring bus that supplies image data to the local memories, and PUs must wait. In practice the PU finishing times are reasonably random, so this does not cause large overhead.
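The contention effect can be illustrated with a small queueing sketch. It makes three hypothetical assumptions:
- each PU requests its next image over the ring bus the moment it finishes;
- one load occupies the bus for a fixed time;
- requests are served one at a time in arrival order.

`total_wait` is an illustrative name, not part of the patent.

```python
def total_wait(finish_times, load_time):
    """Total time PUs spend queued for the ring bus under the assumptions above."""
    bus_free_at = 0
    waited = 0
    for t in sorted(finish_times):
        start = max(t, bus_free_at)   # wait if another load is still on the bus
        waited += start - t
        bus_free_at = start + load_time
    return waited


# Four PUs finishing together queue up; evenly staggered finishes do not.
print(total_wait([0, 0, 0, 0], load_time=10))     # 0 + 10 + 20 + 30 = 60
print(total_wait([0, 10, 20, 30], load_time=10))  # 0
```

Randomly spread finish times fall between these two extremes, which is the situation the text describes.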

(b) Region division mode
This mode is used when a single image is to be processed at high speed. The image is divided into subregions, which are distributed to the PUs for processing (see FIG. 4). When a series of processing steps is complete, the results may therefore have to be stitched together along the boundaries between subregions.

This extra processing is specific to the region division mode, and if it is carried out carelessly it can cause large overhead. However, as detailed later, the configuration of the present invention keeps the time it requires small enough that it does not become overhead.

(c) Function division mode
This mode suits algorithms whose processing functions can be shared among several PUs, for example computing many feature values for a single target image, or matching it against many dictionaries. The data produced by the processing is far smaller than the original image data. Collecting the results distributed among the PUs therefore does not amount to significant overhead.

(d) Pipeline mode
This mode is suitable under two conditions. First, the pixel data arrives slowly enough to be processed in real time at its own rate, as with an image signal from a scanner. Second, the processing can be divided into a number of sequential functions.

[Configuration of the Ring Bus Section] FIG. 2 shows in detail one PU's portion of the bus control circuit (Path Ctl.) 4 on the ring bus (Ring Bus) 1 of FIG. 1. The bus control circuit contains two kinds of latch, a data latch 21 and a flag latch 22. Each is connected like a shift register to the corresponding latch of the adjacent PU. Together with the ring bus control section 8, these latches make up the ring bus as a whole.

That is, the data latch 21 and the flag latch 22 update their contents each time they receive the shift clock 14, and so operate as a shift register. The 64 sets as a whole transfer data around the ring.
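A minimal model of the data-latch chain follows. On every shift clock 14, each data latch 21 takes the word from the stage before it. The word from the last stage then either loops back to stage 0, or, while an image is being input, a new word takes its place at stage 0.

Only the 16-bit data latches are modeled. The flag latches 22 are left out because they follow the flag control circuit's mode rules rather than plain shifting. The class and method names are hypothetical.

```python
from collections import deque


class RingBusData:
    """Toy model of the data latches 21 on the ring bus (one 16-bit word per PU)."""

    def __init__(self, n_pu=64):
        self.latches = deque([0] * n_pu)   # index 0 = first stage after the selector

    def shift(self, inject=None):
        """Apply one shift clock and return the word that left the last stage.

        inject=None: the last stage's word loops back to stage 0 (normal ring).
        inject=word: a word from the input side enters stage 0 instead.
        """
        leaving = self.latches[-1]
        self.latches.rotate(1)             # every word moves one stage downstream
        if inject is not None:
            self.latches[0] = inject & 0xFFFF
        return leaving


bus = RingBusData(4)
bus.latches = deque([1, 2, 3, 4])
bus.shift()              # ring: latches become [4, 1, 2, 3]
bus.shift(inject=9)      # input: latches become [9, 4, 1, 2]; 3 leaves the ring
print(list(bus.latches))
```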

The flow of data is described first, with reference to FIG. 2.

The 16-bit data sent from the data latch of the preceding stage enters the data bus control section (Data Bus Ctl.) 23. Connected to this section are the data bus 101 of the PE 3 and the data buses 102 and 103 of the local memory 2, each 16 bits wide. Beyond it is the data latch 21 belonging to that stage, whose output goes on to the next PU.

As illustrated, the data bus control section 23 consists of seven tri-state control gates 231 to 237. A select signal (Select Signal) 104 determines which gates are opened and which are closed. The select signal is derived from a combination of three signals:
- the read signal (RD) 105 output by the PE 3;
- the write signal (WT) 106 output by the PE 3;
- the address bus (omitted from the figure).

These signals select one of the seven connection states shown in FIG. 3. All the image data transfers needed for the four operation modes of this embodiment can be realized by using these seven connection states as appropriate.

As noted above, the ring bus serves only to transfer data. The addresses and read/write signals for the local memory 2 are supplied by the PE 3 (under program control), which eliminates hardware such as address counters and reduces the circuit scale.

Next, the function of the flag latch is described.

As shown in FIG. 2, the OR gate 24 combines the PE 3's read and write signals (RD 105, WT 106) into the signal 107. The AND gate 25 combines this signal with the flag latch output 108, and the result drives the ready pin (RDY) 109 of the PE 3. Consequently, a read or write operation of the PE 3 is executed only when the flag latch 22 holds 1. Until the flag latch 22 becomes 1, the PE 3's read or write operation is stretched (held in a wait state).

In other words, the flag latch 22 is set at the moment the data 11 flowing on the ring bus 1 reaches the data bus control section 23 of the intended PU. This synchronizes the data with the read/write operations performed by the PE 3's program. The state of the flag latch is determined by the flag control circuit (Flag Ctl.) 26 from three inputs:
- the mode signal (Mode) 12 sent from the ring bus control circuit 8;
- the state 108(i-1) of the preceding stage's flag latch;
- its own state 108(i) in the previous shift cycle.

In this way, the present invention automates the synchronization between the data flow on the ring bus and the processing in the processors, which was the bottleneck to higher speed in the method of the earlier application. This synchronization can now be carried out in an extremely short time.
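The ready-line gating reduces to a single Boolean expression. The sketch below evaluates it and counts how many clocks a PE access is stretched for a given sequence of flag-latch values. It assumes, as a simplification, that the PE keeps RD asserted until RDY goes high. The function names are illustrative.

```python
def rdy(flag, rd, wt):
    """RDY 109 = flag latch output 108 AND (RD 105 OR WT 106)."""
    return bool(flag) and (bool(rd) or bool(wt))


def stretched_clocks(flag_trace, rd=True, wt=False):
    """Clocks a PE access waits: the access completes on the first clock whose
    flag value makes RDY true, or None if that never happens in the trace."""
    for clock, flag in enumerate(flag_trace):
        if rdy(flag, rd, wt):
            return clock
    return None


# The flag latch is set on clock 3, when the wanted word reaches this PU.
print(stretched_clocks([0, 0, 0, 1, 1]))   # 3
```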

[Considerations on the Region Division Mode] The region division mode gives rise to problems specific to it, which are explained in detail here.

In this mode, as shown in FIG. 4, an image of X×Y pixels is divided into partial images S(i,j) of x×y pixels each. These m×n partial images S(i,j) are then assigned to m×n PUs for processing. When the processing is done pixel by pixel, as in density (gray-level) conversion, this gives m×n times the speed of processing with a single PU.

The situation becomes somewhat more complicated when, as in filtering, the result for a pixel depends on several pixels inside a mask area centered on that pixel. As shown in FIG. 5, for pixels near the edge of an image the mask area extends beyond the image, so those pixels cannot be processed correctly. The part that cannot be processed correctly in this way is called the abnormal part.

This problem arises in image processing generally, but it is especially serious in the region division mode. There the partial images must later be joined back together to restore the original size. An abnormal part at the periphery (that is, the boundary) of a partial image is therefore fatal.

To avoid this, in the region division mode the partial images must be made to overlap in advance, as shown in FIG. 6(a). In addition, repeated filtering makes the abnormal part at the edge of a partial image grow additively. Each pass adds a band whose width is (mask width minus 1) divided by 2. This growth must be prevented. That is, every time filtering is executed (or every few repetitions), the abnormal part must be replaced with correctly processed image data.

This replacement can be realized as follows. Consider the abnormal parts produced by filtering the partial images S(i,j), and compare adjacent partial images. As shown in FIG. 7(a), each partial image's abnormal part lies within the correctly processed area of its neighbor. The replacement can therefore be performed correctly using the correct image data held by the PUs adjacent in the eight directions.
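The additive growth of the abnormal part can be checked with a one-dimensional sketch. The model is illustrative, and the names are hypothetical. It makes three assumptions:
- a tile of `n` pixels with no data beyond its edges;
- a centered mask of odd width `w`;
- an output pixel is correct only if every input under the mask exists and is itself correct.

```python
def filter_pass(valid, w):
    """One centered-mask pass over a 1-D tile; returns the new validity flags."""
    r = (w - 1) // 2
    n = len(valid)
    return [
        i - r >= 0 and i + r < n and all(valid[i - r:i + r + 1])
        for i in range(n)
    ]


def abnormal_width(n, w, passes):
    """Width of the incorrect band at the left edge after `passes` filterings."""
    valid = [True] * n
    for _ in range(passes):
        valid = filter_pass(valid, w)
    return valid.index(True) if any(valid) else n


for k in (1, 2, 3):
    print(k, abnormal_width(32, 5, k))   # widths 2, 4, 6: +(5 - 1) / 2 per pass
```

Each pass widens the band by (w - 1)/2 at every edge, which is why the replacement has to be repeated as filtering continues.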

The abnormal part can, however, be replaced using transfers in only three directions instead of eight. The explanation so far wrote each result pixel at the center of the mask area, because that is intuitive and keeps the filtered image in the same position as before processing.

If the resulting image is allowed to shift as a whole relative to its original position, each result pixel may instead be written at the upper-left corner of the mask area. The abnormal part then occurs only on the right and bottom sides, as shown in FIG. 7(b). The data transfers needed for the replacement are therefore required in only three directions. In this case the overlap areas are arranged as shown in FIG. 6(b). The width of the abnormal part becomes the mask width minus 1, but its total number of pixels is the same as in the center-written case.

Hereafter, the center-written method is called the 8-direction transfer type and the corner-written method the 3-direction transfer type.
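Brute force confirms that both conventions leave the same number of abnormal pixels. The sketch below takes an x × y tile and an odd mask width w, and marks every output pixel whose mask would extend past the tile. It does this once with the result written at the mask center and once with it written at the upper-left corner. The function name and the `anchor` values are illustrative.

```python
def abnormal_pixels(x, y, w, anchor):
    """Set of (col, row) outputs in an x-by-y tile whose w-by-w mask leaves the tile."""
    r = (w - 1) // 2
    bad = set()
    for row in range(y):
        for col in range(x):
            if anchor == "center":          # 8-direction transfer type
                c0, r0 = col - r, row - r
            else:                           # "upper_left": 3-direction transfer type
                c0, r0 = col, row
            if c0 < 0 or r0 < 0 or c0 + w > x or r0 + w > y:
                bad.add((col, row))
    return bad


center = abnormal_pixels(16, 12, 5, "center")
corner = abnormal_pixels(16, 12, 5, "upper_left")
print(len(center), len(corner))   # 96 96
# With the upper-left convention every abnormal pixel is in the right or bottom band.
print(all(c >= 16 - 4 or r >= 12 - 4 for c, r in corner))   # True
```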

[Control of the Ring Bus in the Region Division Mode] As described above, in the present invention the PUs 20 are physically connected in one dimension by the ring bus 1. To use them in the two-dimensional region division mode of FIG. 4, each PU is mapped to a position in a logical two-dimensional arrangement (m×n). The connections among the PUs are then as shown in FIG. 8, which in fact matches the raster scan of the image well.

Furthermore, logical two-dimensional distances between PUs map simply onto physical one-dimensional distances. For example, moving data one step to the right in the logical two-dimensional space takes one shift of the ring bus. Moving it down takes m shifts, and moving it to the upper left takes (mn - n - 1) shifts. This relationship holds for every pair of PUs. Two examples of ring bus control methods that exploit these features are given below.
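The correspondence between logical moves and shift counts can be written as one modular expression. This sketch assumes that each logical row holds m PUs in ring order, which is what a downward move of m shifts implies. Under that assumption, a move to the upper left takes mn - m - 1 shifts. This agrees with the (mn - n - 1) given above when the array is square (m = n). The function name is hypothetical.

```python
def shifts_for_move(dx, dy, m, n):
    """Ring-bus shifts that carry data dx columns right and dy rows down in an
    m-wide, n-high logical array whose rows of m PUs follow ring order
    (each shift moves data from PU k to PU k + 1)."""
    return (dy * m + dx) % (m * n)


print(shifts_for_move(1, 0, 8, 8))     # right: 1
print(shifts_for_move(0, 1, 8, 8))     # down: m = 8
print(shifts_for_move(-1, -1, 8, 8))   # upper left: 64 - 8 - 1 = 55
```

Because the arithmetic is modular, the same shift count serves every PU at once. As a consequence, a PU at the right end of a row receives from the first PU of the next row, so data that wraps around an image edge has to be ignored.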
-1) times, and this relationship holds true for all PU combinations. Two examples of ring bus control methods that take advantage of these features are shown below.

(a) Loading partial images with overlap in the region division mode
The image data is assumed to arrive from outside in raster-scan order, via the ring bus control circuit 8 of FIG. 1.

(1) Reset the flag latches 22 of all PUs 20.
(2) Load into every PE 3, and start, a program that repeatedly writes the data arriving on the ring bus 1 into the local memory 2. The program generates addresses that increment from the start address at which the image is to be written.
(3) Set only the flag latch 22(1) of the first PU 20(1), and feed image data from the ring bus control circuit 8 onto the ring bus 1.
(4) The image data shifts from one data latch 21 to the next and is written only into the local memory 2(1) of the first PU 20(1). During this time the mode signal 12 is kept in hold mode, and none of the flag latches 22 change.
(5) When the first pixel that the second PU 20(2) is to receive reaches PU 20(2), switch the mode signal 12 to copy-1 mode and then back to hold mode. In copy-1 mode, a flag latch 22(i) is set to 1 only if the flag latch to its left, 22(i-1), is 1. As a result, the second flag latch 22(2) becomes 1, while the first, 22(1), remains 1. As the image data continues to shift, it is written into the first and second local memories 2(1) and 2(2) in parallel. That is, the overlapping portion is written to both at the same time.
(6) When the last pixel of the current raster line destined for the first local memory 2(1) passes PU 20(1), switch the mode signal 12 to copy-0 mode. In copy-0 mode, a flag latch 22(i) is set to 0 only if the flag latch to its left, 22(i-1), is 0. As a result, the first flag latch 22(1) becomes 0 while the second, 22(2), stays 1. As the image data continues to shift, it is written only into the second local memory 2(2).
(7) From here on, read "first" in steps (5) and (6) as "(i-1)-th" and "second" as "i-th". Advance i and repeat steps (5) and (6) in the same way, manipulating the flag latches 22(i) and 22(i-1). When one raster line has been written up to the m-th PU 20(m), return to the initial state of step (3). The image data of successive raster lines is written in this way, one line after another.

For one-dimensional, strip-shaped region division, this procedure distributes all the partial images. For two-dimensional region division, mode-signal control similar to the horizontal control above is also applied in the vertical direction.
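The flag-latch sequence of steps (1) to (7) can be checked with a clock-by-clock simulation of one raster line in the one-dimensional (strip) case. The timing model, the names, and the exact clock on which each mode change is issued are illustrative assumptions:
- Pixel p enters the first stage on clock p, so it is in PU i's data latch on clock p + i.
- On each clock, the mode signal is first applied to every flag latch at once. Then every PU whose flag is 1 writes the word in its latch.
- The overlap is at least two pixels, so that the copy-1 and copy-0 changes for neighbouring PUs fall on different clocks.

```python
def apply_mode(flags, mode):
    """Broadcast one mode-signal value to every flag control circuit at once."""
    old = flags[:]                       # all flag latches update on the same edge
    for i in range(len(flags)):
        left = old[i - 1]                # i = 0 wraps to the last PU (ring)
        if mode == "copy1" and left == 1:
            flags[i] = 1
        elif mode == "copy0" and left == 0:
            flags[i] = 0
    # "hold": nothing changes


def load_raster(m, x, ov):
    """Return the pixel columns each of m PUs captures when x-wide strips
    overlapping by ov pixels are loaded from one raster line."""
    stride = x - ov
    width = (m - 1) * stride + x
    start = [i * stride for i in range(m)]
    end = [s + x - 1 for s in start]
    schedule = {}
    for i in range(1, m):
        # step (5): first pixel for PU i reaches PU i -> copy-1
        # step (6): last pixel for PU i-1 has passed PU i-1 -> copy-0
        for clock, mode in ((start[i] + i, "copy1"), (end[i - 1] + i, "copy0")):
            assert clock not in schedule, "mode changes collide; widen the overlap"
            schedule[clock] = mode
    flags = [1] + [0] * (m - 1)          # steps (1) and (3): only PU 0 enabled
    captured = [[] for _ in range(m)]
    for clock in range(width + m):
        apply_mode(flags, schedule.get(clock, "hold"))
        for i in range(m):
            pixel = clock - i            # word currently in PU i's data latch
            if flags[i] and 0 <= pixel < width:
                captured[i].append(pixel)
    return captured


for i, cols in enumerate(load_raster(m=4, x=8, ov=2)):
    print(i, cols[0], cols[-1])   # one line each: 0 0 7, 1 6 13, 2 12 19, 3 18 25
```

Each PU ends up with exactly its x-pixel strip, including the columns it shares with its neighbour, after the line has passed along the ring once.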

(b) Replacement of the abnormal parts in the region division mode
(1) Load into every PE 3, and start, a program that repeatedly reads from the local memory 2 while updating the address, and then writes.
(2) Using the mode signal 12, set all the flag latches 22 to 1 once, so that every PE 3 performs a read from its local memory 2.
(3) Using the mode signal 12 again, return all the flag latches 22 to 0, which holds off the write operations of the PEs 3. Then shift the ring bus 1 a predetermined number of times. This number is determined by the logical two-dimensional arrangement of the PUs and the logical two-dimensional transfer direction.
(4) Set all the flag latches 22 to 1 again. Each PE 3 then writes the transferred data into its local memory 2, which completes the replacement of one pixel. Steps (2) to (4) are then repeated a predetermined number of times. That number is the pixel count given by the overlap width multiplied by the length of one side of the partial image (see FIG. 7).

This completes the replacement of pixel data for one direction.

(5) Change the ring bus shift count used in step (3), which is set in the ring bus control circuit 8.
(6) Execute steps (2) to (4) once more.
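Each read, shift, write cycle is a parallel rotation of one word per PU, and the steps above repeat that cycle once per pixel to be replaced. In the sketch, `send[k]` is a hypothetical list of the correctly processed pixels that PU k must hand over. The function name is illustrative. Every PU takes part in every cycle, so the number of cycles depends on the number of pixels to replace, not on the number of PUs.

```python
def replace_one_direction(send, s):
    """send[k]: pixels PU k hands over. Returns what each PU receives when, for
    each pixel, all PEs read, the ring shifts s times, and all PEs write."""
    n_pu = len(send)
    count = len(send[0])       # overlap width x tile side, the same for every PU
    received = [[] for _ in range(n_pu)]
    for p in range(count):
        latch = [send[k][p] for k in range(n_pu)]             # step (2): flags 1, read
        latch = [latch[(k - s) % n_pu] for k in range(n_pu)]  # step (3): flags 0, s shifts
        for k in range(n_pu):                                 # step (4): flags 1, write
            received[k].append(latch[k])
    return received


send = [[10 * k + p for p in range(3)] for k in range(4)]   # PU k sends 10k .. 10k+2
print(replace_one_direction(send, 1)[1])   # [0, 1, 2]: PU 1 gets PU 0's pixels
```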

This completes the replacement for two directions. At this point, all the transfers the 3-direction transfer type actually requires are complete, and a third transfer is unnecessary. Assume the transfer to the left is done first, followed by the transfer upward. During the first replacement, the correct data of S(i+1,j+1) shown in FIG. 7 has already been transferred to S(i,j+1). The second transfer can then carry it to S(i,j) together with the correct data of S(i,j+1).
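Tracking where each tile's data comes from shows why two passes suffice for the 3-direction transfer type. The sketch assumes a logical array m tiles wide and n tiles high that wraps at its edges, as the ring does. In each pass it forwards everything a tile holds. The actual scheme sends only the strips that are needed, so this is an illustrative model, and the names are hypothetical.

```python
def three_direction_sources(m, n):
    """Which tiles' data each tile holds after a left pass followed by an up pass."""
    held = {(c, r): {(c, r)} for r in range(n) for c in range(m)}
    # Pass 1, transfer to the left: tile (c, r) receives from (c + 1, r).
    held = {(c, r): held[(c, r)] | held[((c + 1) % m, r)] for (c, r) in held}
    # Pass 2, transfer upward: tile (c, r) receives from (c, r + 1), including
    # what that tile itself received in pass 1 (the diagonal neighbour's data).
    held = {(c, r): held[(c, r)] | held[(c, (r + 1) % n)] for (c, r) in held}
    return held


print(sorted(three_direction_sources(4, 4)[(0, 0)]))
# [(0, 0), (0, 1), (1, 0), (1, 1)]: right, lower and lower-right neighbours
```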

As the above explanation shows, the ring bus 1 lets the abnormal parts in the region division mode be replaced in parallel for all the PUs 20. The replacement therefore finishes in a short time regardless of the number of PUs.

With the same control method, the same image can also be loaded into a plurality of PUs 3 simultaneously and extremely quickly, because the image data reaches every PU 3 simply by being placed on ring bus 1 once.
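The broadcast load can be sketched as a pipeline: each pixel enters the first stage once, moves one stage per shift clock, and every PU copies each word as it passes. This Python model is a hypothetical illustration; `broadcast` and the `(address, value)` word format are assumptions, and the word leaving the last stage is simply dropped, as it would be if the selector takes new input instead of feeding the ring back.

```python
from typing import List, Optional, Tuple

Word = Optional[Tuple[int, int]]  # (pixel address, pixel value); None = empty stage


def broadcast(image: List[int], num_units: int) -> List[List[int]]:
    """Stream `image` once through `num_units` latches and let every PU
    store each word that passes through its own stage."""
    latches: List[Word] = [None] * num_units
    memories = [[0] * len(image) for _ in range(num_units)]
    clocks = len(image) + num_units - 1  # until the last pixel reaches the last stage
    for t in range(clocks):
        # One shift clock: stage 0 takes the next input pixel, every other
        # stage takes its predecessor's word, and the last word falls off.
        incoming: Word = (t, image[t]) if t < len(image) else None
        latches = [incoming] + latches[:-1]
        # Every PU writes the word currently in its latch to local memory.
        for s, word in enumerate(latches):
            if word is not None:
                addr, value = word
                memories[s][addr] = value
    return memories


image = [7, 1, 4, 9, 2]
mems = broadcast(image, num_units=4)
print(all(mem == image for mem in mems))  # True
```

In this model the load takes `len(image) + num_units - 1` shift clocks, so adding PUs adds only a few clocks rather than a full extra copy of the image per PU.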

As shown in FIG. 9, ring bus control circuit 8 consists of a shift control unit 81, a selector 82, an input buffer 83, and an output buffer 84. Normally, selector 82 is connected so that ring bus data 11(64) is fed back as ring bus data 11(0). When image data is input, selector 82 is switched by a signal from system bus 10, and data arriving through input buffer 83 is sent onto the ring bus as ring bus data 11(0). To output image data to a printer or similar device, ring bus data 11(64) is taken out through output buffer 84.
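The role of selector 82 and the two buffers can be illustrated with a behavioral model. This Python sketch is hypothetical: the class `RingBusControl` and its attributes are invented for illustration, the ring is shortened to 4 stages instead of 64, and the buffers are plain Python containers rather than hardware FIFOs.

```python
from collections import deque
from typing import Deque, List, Optional


class RingBusControl:
    """Hypothetical behavioral model of ring bus control circuit 8.

    `stages` holds the data latches of the bus control circuits; index 0
    is ring bus data 11(0) and the last index is the word at 11(64).
    """

    def __init__(self, stages: int = 64) -> None:
        self.stages: List[Optional[int]] = [None] * stages
        self.input_buffer: Deque[int] = deque()  # stands in for buffer 83
        self.output_buffer: List[int] = []       # stands in for buffer 84
        self.input_mode = False      # selector 82 setting (via system bus 10)
        self.output_enabled = False  # whether buffer 84 captures 11(64)

    def shift(self) -> None:
        """Advance the ring by one shift clock."""
        tail = self.stages[-1]  # word leaving the last stage
        if self.output_enabled and tail is not None:
            self.output_buffer.append(tail)
        if self.input_mode:
            # Selector switched: stage 0 takes the next input word.
            head = self.input_buffer.popleft() if self.input_buffer else None
        else:
            # Normal setting: the last stage is fed back into stage 0.
            head = tail
        self.stages = [head] + self.stages[:-1]


# Load four words into a 4-stage ring, then read them back out.
ring = RingBusControl(stages=4)
ring.input_buffer.extend([10, 11, 12, 13])
ring.input_mode = True
for _ in range(4):
    ring.shift()
ring.input_mode = False
ring.output_enabled = True
for _ in range(4):
    ring.shift()
print(ring.output_buffer)  # [10, 11, 12, 13]
```

Because the output tap in this model sits on the feedback path, reading the image out in normal mode leaves the same words circulating in the ring afterwards.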

The shift control unit 81 contains registers that store the numbers m and n of PUs in the vertical and horizontal directions of their logical two-dimensional arrangement, the overlap width of the divided images, and similar parameters; these values are set by microcomputer 5 before transfer on ring bus 1 begins. In addition to these registers, the shift control unit 81 comprises a shift clock generator, a counter that counts shift clock 14, and ROM or PAL logic that changes mode signal 12 step by step based on the count values and the register settings. Once the control scheme described above is understood, the unit can easily be implemented in various ways, so the details are omitted.
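The following is one possible sketch of how a counter-driven sequencer could step through the mode-signal phases of the two-direction replacement. Everything here is hypothetical: the register layout, the phase names READ/HOLD/SHIFT/WRITE, and the shift counts 1 and `cols` (which assume row-major numbering of the PUs along the ring); the patent leaves the ROM/PAL design open.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterator


@dataclass
class ShiftControlRegisters:
    """Hypothetical register set loaded by microcomputer 5 before a transfer."""
    rows: int     # m: PUs vertically (stored, but not needed by this sketch)
    cols: int     # n: PUs horizontally
    overlap: int  # overlap width (radius) of the divided images
    side: int     # side length of one partial image, in pixels


def replacement_sequence(regs: ShiftControlRegisters) -> Iterator[str]:
    """Yield mode-signal phases for a two-direction overlap replacement."""
    pixels = regs.overlap * regs.side  # repetitions per direction
    for shifts in (1, regs.cols):      # first pass, then second pass
        for _ in range(pixels):
            yield "READ"               # flag latches 1: PEs read
            yield "HOLD"               # flag latches 0: PE writes wait
            for _ in range(shifts):
                yield "SHIFT"          # one pulse of shift clock 14
            yield "WRITE"              # flag latches 1: PEs write


regs = ShiftControlRegisters(rows=2, cols=4, overlap=2, side=8)
counts = Counter(replacement_sequence(regs))
print(counts["SHIFT"], sum(counts.values()))  # 80 176
```

In hardware, the loop counters would correspond to the counter that counts shift clock 14, and the phase sequence would be stored in the ROM or PAL.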

<Effects of the Invention> According to the present invention, data exchange between processors can be made faster, so the processing speed of a multiprocessor can be improved significantly. Furthermore, because the processors are connected one-dimensionally, the device is small and practical. Placing the ring bus that links the processors between each processor and its local memory makes many operating modes possible, and also makes methods such as synchronizing the data flow with processor processing and generating memory addresses simpler and faster.

[Brief Description of the Drawings] FIG. 1 shows an embodiment of the present invention; FIGS. 2 and 3 show details of the bus control circuit; FIG. 4 explains the division of an image; FIGS. 5 to 8 explain the operating principle; and FIG. 9 shows details of the ring bus control circuit.

Explanation of symbols: 1 ... ring bus, 2 ... local memory, 3 ... processor element (PE), 4 ... bus control circuit, 5 ... microcomputer, 8 ... ring bus control circuit.

Claims


(1) A multiprocessor system characterized in that a plurality of processor units, each consisting of a processor element and its local memory, are arranged; a path control circuit, consisting of a data latch and a circuit that switches at least the data transfer direction, is provided between the processor element and the local memory of each processor unit; and the data latches of the path control circuits are connected to each other like a shift register.

(2) A multiprocessor system characterized in that a plurality of processor units, each consisting of a processor element and its local memory, are arranged; a path control circuit, consisting of a circuit that switches at least the data transfer direction, a data latch, and a flag latch, is provided between the processor element and the local memory of each processor unit; the data latches of the path control circuits are connected to each other like a shift register; and the flag latch is connected to the memory access control unit of the processor element.