JPS63263577A

JPS63263577A - Parallel processing device for image

Info

Publication number: JPS63263577A
Application number: JP9761387A
Authority: JP
Inventors: Naoto Kawamura; 尚登河村
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1987-04-22
Filing date: 1987-04-22
Publication date: 1988-10-31

Abstract

PURPOSE: To execute parallel image processing effectively and at high speed by sending one unit of image processing, composed of plural picture elements, to an idle processor as input pairs of a picture element address and image data, and by returning the address and image data after processing as output pairs. CONSTITUTION: The device is a parallel image processing device in which individual processors execute, in parallel, image processing that treats plural picture elements as one processing unit. It is equipped with a means 11 that manages which of the processors 13 are idle and when each processor has completed its processing, and a supplying means 20 that, for each picture element of one processing unit, supplies an input pair composed of the image data of the picture element and its address in the input image space to an unspecified idle processor. A processor 13 that has been supplied with the input pairs of one processing unit processes them and returns an output pair in which the address in the output image space is attached to the processed image data. Thus, parallel image processing is executed effectively by plural processing units, and high-speed processing is achieved.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a parallel image processing apparatus, and more particularly to a parallel image processing apparatus that uses a plurality of processors.

[Prior Art] Pipeline processing has conventionally been used as a means of performing image processing at high speed. In this method, one step of image processing is subdivided into n sequential processing stages that do not interfere with one another, and these n stages are integrated into a single processor. An image data stream is then fed to this processor in raster order, so that in effect n pieces of image data are processed at once, which raises the overall speed.

[Problems to be Solved by the Invention] However, this method requires special hardware, and that hardware is dedicated to one specific process; it has no flexibility for other processes or for general-purpose processing.

On the other hand, when image processing is performed with a general-purpose MPU (Micro Processor Unit), versatility is high because the various processes are carried out in software, but the processing speed is slow.

These problems stem basically from the special nature of the image that is the object of processing. In ordinary data processing other than image processing, the relationships and correlations between data items matter little. In an image, by contrast, a set of individual pixels makes up one image, so if the pixels are processed in isolation the image itself no longer holds together. This is because a pixel essentially consists of image data and the position information (address) of the pixel.

The present invention was proposed to eliminate the conventional drawbacks described above, after returning to this essential nature of images and considering their inherent suitability for parallel processing. Its object is to provide a parallel image processing apparatus that performs parallel image processing effectively with a plurality of processing units and is capable of high-speed processing.

[Means for Solving the Problems] To solve the above problems, the parallel processing apparatus according to the present invention is a parallel image processing apparatus in which individual processors perform, in parallel, image processing that takes a plurality of pixels as one processing unit. It comprises means for managing the availability of each processor and the completion of processing in each processor, and supply means for supplying, for each pixel of the one processing unit, an input pair consisting of the image data of the pixel and the address of the pixel in the input image space to an unspecified free processor.

[Operation] With the above configuration, a processor supplied with the input pairs of one processing unit processes those input pairs and returns an output pair in which the address in the output image space is attached to the processed image data.

[Embodiments] Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

(Outline of the Embodiment Apparatus) FIG. 1 shows an embodiment of a parallel processing apparatus to which the present invention is applied.

In the figure, 10 is an image memory that stores the image data to be processed; 13-1 to 13-N are MPUs serving as the processing units for image processing; 20 is a master CPU that manages the MPUs 13-1 to 13-N; 11 is an R/W controller that controls the reading and writing (hereinafter abbreviated R/W) of data between the image memory 10 and the MPUs (13-1 to 13-N); and 16 is a display device that displays the contents of the image memory 10. As described later, in this embodiment the master CPU 20 is involved only in the initial setup and the final step of image processing, in order to achieve high-speed processing.

The image memory 10 consists of IC memory that holds, for example, 1024 x 1024 pixels. Because each pixel carries color data, one pixel consists of 24 bits (R = 8 bits, G = 8 bits, B = 8 bits). N MPUs, numbered 1 to N, are used (13-1 to 13-N). The contents of the image memory 10 are displayed on the display device 16 via a display controller 15. Because the image memory 10 is accessed by both the R/W controller 11 and the display controller 15, it is built from dual-port RAM.

Each MPU has its own RAM (14-1 to 14-N). This RAM is used in the usual ways, for example as a storage area for the program executed by the MPU and as a work memory that temporarily holds image data being processed. Each MPU, together with its associated RAM and the queue buffer described later, thus forms a processing unit (PE). The master CPU 20 has its own local memory, RAM 19. As described later, RAM 19 stores the control program of the master CPU and the image processing programs that are sent to the MPUs (13-1 to 13-N) as needed.

The image data to be processed by each processing unit, and the image data it has processed, are held temporarily in the queue buffers 12-1 to 12-N, one per processing unit. Each queue buffer has an input queue and an output queue: the input queue mainly holds multiple items of image data awaiting processing, and the output queue holds multiple items of processed image data. In other words, each queue buffer forms a queue (waiting line) for its processing unit.

<Storage of the Processing Program in RAM> The processing program stored in the RAM (14-1 to 14-N) of each processing unit is described next. The parallel processing scheme in this embodiment is the one called SIMD (Single Instruction Stream / Multi Data Stream), in which all MPUs execute the same processing program. There are two possible ways in which the program for a given image process can be stored in the RAM of each MPU and executed by that MPU.

In the first method, for each image process, the master CPU 20 transfers the corresponding processing program (held in RAM 19) to the RAM (14-1 to 14-N) of each MPU via bus 18. In the second method, each MPU holds several processing programs, and the master CPU 20 sends each MPU only a command indicating which process to perform. The first method needs only a small RAM 14-1 to 14-N in each MPU, but loading the program takes time. The second method requires every MPU to hold the processing programs, so the capacity of RAM 14-1 to 14-N grows. The second method is preferable when the processing programs are small, while the first is preferable when any kind of process must be supported in a general-purpose way. The two are therefore chosen according to the purpose.

In the embodiments described below, all image processing programs are stored in the local RAM 14 of each MPU.

<Image Memory Addressing> As shown in FIG. 2, the image memory is addressed by an X address and a Y address. The X address takes values from 0 to X_max and the Y address takes values from 0 to Y_max, and the value of (X, Y) selects the image data of any single point in the image memory. If the image memory 10 holds 1024 x 1024 items of image data, then X_max, Y_max = 1024 = 2^10, which gives a 10-bit address space in each direction.

<Bus Data Format> FIGS. 3A to 3C show the formats of the data that flow on bus 17, which connects each queue buffer to the R/W controller 11.

The leading field of each format is the SYNC field. It allows the hardware, at the physical layer, to recognize that data is present on the bus.

The first field following SYNC in each of the formats of FIGS. 3A to 3C holds the MPU number. If there are four MPUs, for example, it is 2 bits long (values 0, 1, 2, 3).

The format of FIG. 3A is used by the R/W controller 11 in the selecting sequence. Through selecting, the control bits, the (x, y) address in the image memory, the image data from the image memory, and so on are sent to the queue buffer 12 of the MPU whose number is given in the MPU number field. The 2-bit control field has the contents shown in FIG. 4; its detailed meaning and use will become clear from the embodiments described below.
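As a rough illustration of the selecting format of FIG. 3A, the sketch below packs the fields named in the text into one integer and unpacks them again. The field order, the widths (2-bit MPU number, 2-bit control field, 10-bit X and Y addresses, 24-bit RGB data) and the names `pack_select` and `unpack_select` are assumptions made for this example; the SYNC field is left out.

```python
# Hypothetical bit layout for a selecting frame; order and widths are assumptions.
FIELDS = (("mpu", 2), ("ctrl", 2), ("x", 10), ("y", 10), ("rgb", 24))

def pack_select(**values):
    """Pack the named fields, most significant first, into one integer."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word

def unpack_select(word):
    """Recover the fields from a packed frame."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out

frame = pack_select(mpu=3, ctrl=0b01, x=1023, y=256, rgb=0x80FF00)
fields = unpack_select(frame)
```

Packing and unpacking are inverse operations here, so `fields` holds the same values that were passed to `pack_select`.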

FIG. 3B shows the poll sequence from the R/W controller 11. After the R/W controller 11 has transmitted the frame up to POLL, the corresponding PE, if it has data to send, follows by placing its data on bus 17.

FIG. 3C shows the reply data from a PE. ACK indicates that data was received normally in response to a selecting sequence, NAK indicates that the input queue was full, and EMPTY indicates, in response to polling, that the input queue was empty.

<Outline of the Principle of the Embodiments> The embodiments described below come in two configurations (a first embodiment and a second embodiment). Both share the basic hardware configuration shown in FIG. 1 and are realized by suitably changing the form of data transfer between each MPU and the image memory and the form of image processing in each MPU.

FIGS. 5A and 5B show the transfer sources and destinations of image data in the first and second embodiments, respectively.

The image processing of the first embodiment (FIG. 5A) is typified by masking, gamma conversion, binarization, and the like. That is, the pixel being processed is a single pixel, and the processing requires information about that one pixel only. The second embodiment (FIG. 5B), by contrast, covers image processing such as Laplacian processing, enhancement, edge detection, and smoothing, which is characterized by a spatial extent (one-dimensional or two-dimensional); that is, the processing requires information about a plurality of pixels.

Here, the information about a pixel means its address in the image memory and its image data, such as density, luminance, or color.

According to the first embodiment of FIG. 5A, three consecutive pixels (n, n+1, n+2) in the image memory 10 may be sent to any PE. To raise the efficiency of parallel processing, the information of each pixel is sent as an address and image data pair from the image memory 10 to whichever PE is free, is processed at high speed in that PE, and is then returned to the R/W controller 11 as a pair of processed image data and processed address. The R/W controller 11 stores the returned image data at the returned address.
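The flow of the first embodiment can be sketched in a few lines. The example below is only a model under stated assumptions: the per-pixel operation (a square-root gamma curve), the random choice standing in for "whichever PE is free", and the names `gamma_pe` and `run_first_embodiment` are all hypothetical. It shows that, because every result carries its own address, the order in which pairs come back does not matter.

```python
import random

def gamma_pe(pair):
    """One PE: process a single input pair and return an output pair."""
    (x, y), value = pair
    out = round(255 * (value / 255) ** 0.5)   # example per-pixel operation (assumed)
    return (x, y), out                        # output address equals input address here

def run_first_embodiment(image, n_pe=4, seed=0):
    """Send every pixel to an arbitrary PE and write results back by address."""
    rng = random.Random(seed)
    pairs = [((x, y), v) for y, row in enumerate(image) for x, v in enumerate(row)]
    per_pe = [[] for _ in range(n_pe)]
    for pair in pairs:
        per_pe[rng.randrange(n_pe)].append(pair)      # stand-in for "any free PE"
    returned = [gamma_pe(p) for batch in per_pe for p in batch]
    rng.shuffle(returned)                             # return order is not guaranteed
    result = [[None] * len(image[0]) for _ in image]
    for (x, y), v in returned:                        # R/W controller stores by address
        result[y][x] = v
    return result

image = [[0, 64, 128], [255, 16, 32]]
out = run_first_embodiment(image)
```

Running the function with a different `seed` changes which PE handles which pixel and the return order, but not the resulting image.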

Thus, when image processing typified by masking, gamma conversion, and binarization, in which a single pixel is processed using information about that pixel alone, is carried out with the first embodiment of FIG. 5A, the overall processing efficiency depends entirely on the number of PEs. Moreover, because image data and memory address are transferred as a pair, the integrity of the processed image as a whole is not lost.

According to the second embodiment of FIG. 5B, the unit of image processing performed by one PE contains a plurality of pixels, so the information of all the pixels in one unit is transferred to a PE together.

As long as the pixels of one unit are transferred to the same PE, each processing unit can be transferred to any PE, as shown in FIG. 5B. This increases the parallelism of the overall image processing and improves efficiency.

Besides the image processing described above, there are, for example, affine transformations such as translation and rotation, and binarization such as dithering. Affine transformations and the like can basically be handled by the first embodiment of FIG. 5A, because in them the image processing amounts to a computation on memory addresses. Binarization can also be handled by the first embodiment: even if the binarization matrix is two-dimensional, processing one pixel requires no information about the surrounding pixels.

As is clear from FIGS. 5A and 5B, the queue buffer 12 is not essential to the present invention. In practice, however, the time each MPU needs for image processing is longer than the time needed to exchange data on bus 17. With only the configuration shown in FIGS. 5A and 5B, the growing idle time of the R/W controller 11 would therefore become a problem. The queue buffer 12 shown in FIG. 1 and elsewhere is provided to minimize that idle time.

<R/W Controller> The R/W controller 11 is described next. The R/W controller 11 controls the input and output of data between the image memory 10 and each MPU 13. Its functions are (1) to allocate the image data in the image memory 10 to the MPUs, (2) to return the image data processed by each MPU to the image memory, and (3) to control communication with the queue buffers for those purposes. FIG. 6 shows the configuration of the R/W controller 11 and its connections to the image memory 10 and the queue buffers. Inside the R/W controller 11, bus 17 is connected to an MPU 51 through a bus interface 54, which performs communication control. A ROM 52 stores the program of FIG. 13, described later. A RAM 53 stores the intermediate information, flags, and so on shown in FIG. 14. The MPU 51 is connected to the image memory 10 via a bus 50.

<Interface between MPU and R/W Controller> FIG. 7 gives an overall view of how the queue buffer 12, which comprises a queue 12a containing two queues (input and output) and an interface control unit 12b, interfaces with the MPU 13 and bus 17.

The data exchanged between the MPU 13 and bus 17 are the memory address / image data pairs shown in FIGS. 3A and 3B. The interface signals shown in FIG. 7 are used to exchange these pairs.

When the interface control units 12b receive the poll sequence shown in FIG. 3B, only the interface control unit 12b whose MPU number matches the one in the poll format passes the signal POLL to its queue 12a. On receiving POLL, the queue 12a hands the memory address / image data pairs held in the output queue to the control unit 12b, which sends them out on bus 17 in the format of FIG. 3B.

When the control unit 12b detects a selecting sequence on bus 17, its operation depends on the control bits in that format. In FIG. 4, for the flow from image memory to MPU, control bits 00 or 01 mean that the memory address / image data pair in the selecting sequence is stored in the input queue. Control bits 11 mean that the MPU is interrupted with the signal INT (interrupt signal) and the memory address / image data pair is handed to the MPU immediately, without queuing. Why no queuing is done when the control bits are 11 will become clear in the description of the second embodiment.
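The routing decision just described can be modelled as below. This is a toy sketch: the class name, the method name and the return strings are hypothetical, and the handling of control value 10 is left open because this passage does not describe it.

```python
from collections import deque

class InterfaceControl:
    """Toy model of interface control unit 12b for one PE (names are hypothetical)."""

    def __init__(self, mpu_number):
        self.mpu_number = mpu_number
        self.input_queue = deque()
        self.interrupts = []          # pairs handed straight to the MPU via INT

    def on_select(self, mpu, ctrl, pair):
        """React to a selecting frame seen on the bus."""
        if mpu != self.mpu_number:
            return "ignored"          # frame addressed to another PE
        if ctrl in (0b00, 0b01):
            self.input_queue.append(pair)
            return "queued"
        if ctrl == 0b11:
            self.interrupts.append(pair)
            return "interrupt"
        return "unspecified"          # 0b10 is not described in this passage

ic = InterfaceControl(mpu_number=2)
first = ic.on_select(2, 0b00, ((1, 1), 5))
second = ic.on_select(2, 0b11, ((2, 1), 6))
third = ic.on_select(1, 0b01, ((3, 1), 7))
```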

The MPU receives data from the input queue by sending the device address of the input queue and a read (R) signal to the queue 12a. The MPU sends data to the output queue by sending a write (W) signal to the queue 12a.

<Queues> FIGS. 8A and 8B show the configurations of the input queue and the output queue, respectively. A queue is a first-in, first-out (FIFO) memory built from a shift register and related parts. To provide first-in, first-out behavior, the input queue consists, as shown in FIG. 8A, of an up/down counter 32, a decoder 33 that decodes the count value of this counter, a shift register 35 that takes memory address / image data pairs as its shift input, and a selector 34 that, according to the output of the decoder 33, selects the output of one stage of the shift register 35 and outputs it to the MPU. The output queue is configured as shown in FIG. 8B; it is largely the same as FIG. 8A, so its description is omitted.

In the input queue of FIG. 8A, when selecting is received from the R/W controller 11, the output of gate 31 increments the counter 32 by one and, at the same time, the shift register 35 shifts in the memory address / image data pair. Conversely, an R signal from the MPU decrements the counter 32 by one. The counter 32 therefore points to the position in the shift register 35 of the oldest stored data that the MPU has not yet taken out. In this way the first-in, first-out function of the queue is realized.
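The counter-and-selector mechanism can be checked with a small software model. The sketch below is an interpretation, not the circuit of FIG. 8A: it assumes new data enters at stage 0 and that the counter value minus one is the index of the oldest unread stage. The class and method names are hypothetical.

```python
class ShiftRegisterFifo:
    """FIFO built from a shift register, an up/down counter and a selector."""

    def __init__(self, depth):
        self.stages = [None] * depth   # stage 0 receives the shift input
        self.count = 0                 # models up/down counter 32

    @property
    def full(self):
        return self.count == len(self.stages)

    @property
    def empty(self):
        return self.count == 0

    def shift_in(self, pair):
        """Selecting: shift every stage by one and increment the counter."""
        if self.full:
            raise OverflowError("queue full")
        self.stages = [pair] + self.stages[:-1]
        self.count += 1

    def read(self):
        """R signal: select the oldest unread stage and decrement the counter."""
        if self.empty:
            raise IndexError("queue empty")
        pair = self.stages[self.count - 1]   # models decoder 33 and selector 34
        self.count -= 1
        return pair

q = ShiftRegisterFifo(depth=3)
q.shift_in("a")
q.shift_in("b")
first = q.read()
q.shift_in("c")
rest = [q.read(), q.read()]
```

Interleaving writes and reads, as in the usage lines, still returns the items in arrival order, because every shift moves the oldest unread item one stage further and the counter follows it.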

When the shift register 35 of the input queue is full, that is, when the value of the counter 32 is at its maximum, the signal FULL is returned to the interface control unit 12b. If selecting arrives from the R/W controller 11 in this state, the NAK of FIG. 3C is returned. Likewise, when the output queue of FIG. 8B is empty and a poll sequence is received, the control unit 12b, having received the signal EMPTY from the queue 12a, returns EMPTY in the format of FIG. 3C.

The sizes of the input and output queues are chosen as follows. If the image data handled is 8 bits for each of the three RGB colors, the image memory is 1024 x 1024, there are four MPUs, and there are four kinds of control bits, then 20 bits are needed for the memory address, 24 bits for the image data, and 8 bits for the control bits, 52 bits in all. The length of a queue entry is therefore set to these 52 bits, and the width (capacity) to 1024 entries, corresponding to the number of pixels in one line of the image memory 10. That is, one queue is 52 bits x 1024. As can be seen from FIG. 8A and from the flowcharts described later, this embodiment includes handling for a full queue, so the queues can also be made smaller, for example 256 or 128 entries, and even then care is taken to keep the loss of processing efficiency as small as possible.
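The sizing arithmetic above can be restated as a short calculation. The constant and function names are hypothetical; the bit counts are the ones given in the text, and the smaller queue sizes are the alternatives it mentions.

```python
ADDRESS_BITS = 10 + 10        # X and Y addresses for a 1024 x 1024 memory
DATA_BITS = 3 * 8             # R, G, B at 8 bits each
CONTROL_BITS = 8              # figure stated in the text
ENTRY_BITS = ADDRESS_BITS + DATA_BITS + CONTROL_BITS

def queue_bits(entries):
    """Total storage, in bits, of one queue with the given number of entries."""
    return ENTRY_BITS * entries

sizes = {n: queue_bits(n) for n in (1024, 256, 128)}
```

With these figures one entry is 52 bits, so a 1024-entry queue needs 53,248 bits, while the 256-entry and 128-entry variants need 13,312 and 6,656 bits.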

<Pixel Allocation by the R/W Controller> As FIG. 1 shows, this embodiment is premised on having a plurality of PEs. As the gist of the invention and the embodiments of FIGS. 5A and 5B also show, a major feature of the invention is that there is no restriction requiring a given unprocessed pixel to be sent to a particular PE. This feature is what speeds up the parallel processing.

However, in the second embodiment, which performs spatial filtering such as a Laplacian, filtering two consecutive pixels, for example, produces an overlapping portion (the hatched portion in FIG. 9) between the blocks that the filter covers. FIG. 9 shows the case of 3 x 3 Laplacian processing around a center pixel. If the pixel allocation algorithm of the R/W controller 11 is to remain general-purpose and independent of the type of image processing, the R/W controller 11 should not be made to detect this overlap.
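The size of the overlap, and what a PE does with one 3 x 3 unit, can be illustrated as follows. The 4-neighbour Laplacian kernel and the names `window` and `laplacian_pe` are assumptions for this sketch; the text does not give the filter coefficients.

```python
def window(cx, cy):
    """Addresses of the 3x3 block centred on (cx, cy)."""
    return {(cx + dx, cy + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)}

def laplacian_pe(unit, center):
    """Process one unit of nine input pairs and return a single output pair.

    Uses the 4-neighbour Laplacian kernel, which is an assumption here.
    """
    data = dict(unit)
    cx, cy = center
    value = (data[(cx - 1, cy)] + data[(cx + 1, cy)]
             + data[(cx, cy - 1)] + data[(cx, cy + 1)]
             - 4 * data[(cx, cy)])
    return center, value

shared = window(5, 5) & window(6, 5)      # overlap of two consecutive windows
image = {(x, y): x * x for x in range(4, 8) for y in range(4, 7)}
unit = [(addr, image[addr]) for addr in window(5, 5)]
result = laplacian_pe(unit, (5, 5))
```

Two horizontally adjacent 3 x 3 windows share six of their nine pixels, which is the redundancy that the allocation scheme of the second embodiment tries to avoid retransmitting.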

If the overlap is not detected and the R/W controller 11 sends pixel information to free PEs in no particular order, as in FIGS. 5A and 5B, the overlapping pixels described above have to be retransmitted even though they are duplicates.

This, however, somewhat diminishes the feature of the invention, which raises the speed of parallel processing by removing any dependency between pixels and PEs.

Therefore, in the second embodiment, the procedure by which the R/W controller 11 accesses the image memory is specially arranged so that consecutive pixels are sent to the same PE as far as possible. This removes the need to retransmit overlapping pixel information, reduces the number of data transfers between the PEs and the R/W controller 11, and aims at even faster parallel processing than the configuration of FIG. 5B.

As shown in FIG. 10, the memory space of the image memory 10 is divided equally into as many areas as there are PEs. In the example of FIG. 10 there are four PEs, so the memory is divided into four. After this division, the pixel information in the image memory is allocated to the PEs as follows. The leading addresses of each area are expressed as (n, 256 x k), where k is the area number, k = 0, 1, 2, 3, and n is the address in the X direction of the image memory, 0 ≤ n ≤ 1023. Each pixel position within an area is expressed as (n, m + 256 x k), where m is the address in the Y direction within the area, 0 ≤ m ≤ 255. With pixel addresses expressed in this way, the areas and the PEs are put in one-to-one correspondence. The R/W controller 11 then raster-scans each area from its beginning, takes out one pixel at a time, and sends the memory address / image data pair of that pixel to the corresponding PE. In this way pixels that are consecutive in the horizontal (X) direction are sent to the same PE, the retransmission of overlapping pixel information described above is eliminated, the utilization of bus 17 falls so that collisions decrease, and parallel processing becomes more efficient.
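The area addressing can be written out directly. The sketch below assumes the 1024 x 1024 memory and four PEs of the example; the names `area_of` and `raster_scan` are hypothetical.

```python
WIDTH, HEIGHT, N_PE = 1024, 1024, 4
ROWS_PER_AREA = HEIGHT // N_PE            # 256 when four PEs share 1024 rows

def area_of(y):
    """Area number k that contains row y."""
    return y // ROWS_PER_AREA

def raster_scan(k):
    """Yield the addresses (n, m + 256*k) of area k in raster order."""
    for m in range(ROWS_PER_AREA):
        for n in range(WIDTH):
            yield n, m + ROWS_PER_AREA * k

# First three addresses the controller would take from area 2.
first_three = [addr for addr, _ in zip(raster_scan(2), range(3))]
```

Because each area is scanned line by line, horizontally adjacent pixels always come from the same generator and therefore go to the same PE.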

In FIG. 10 the address is written as (n[k], m[k] + 256 x k) because, once the correspondence between PEs and areas is fixed and the transfers proceed independently and in parallel, the current transfer address may differ from area to area.

Now consider what this correspondence should be when spatial filter processing such as that shown in FIG. 9 is performed. With the correspondence between PEs and areas shown in FIG. 10, a center pixel and its eight surrounding pixels are sent to the same PE, and the next center pixel is also sent to the same PE.

That is, a route between each PE and each area is established as shown in FIG. 11A. Each area holds 1024×256 pixels, whereas each input queue holds 1024×1 pixels. Because loading a queue is faster than image processing in a PE, the input queue of some PE may become full.

However, rather than waiting for the full input queue of one PE to drain, it is more efficient to look for another PE whose queue has room, since one may well exist, and to send the pixel information that could not be delivered to that queue instead. If the queue of PE1 is full in FIG. 11A and the queue of PE2 has room, the correspondence between PEs and areas is changed as shown in FIG. 11B. When the correspondence is changed, continuity between the data already in the input queue and the data that follows is lost. The new correspondence, however, is kept until some queue becomes full again and forces another change, so continuity is preserved from then on and processing remains efficient.
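One way to picture the change of correspondence is the sketch below. The text does not say exactly how FIG. 11B rearranges the routes, so the swap used here is an assumption, as are the function and parameter names.

```python
def remap_on_full(area_to_pe, area, queue_free):
    """Return a new area->PE mapping after the PE serving `area` reports a full queue.

    area_to_pe : list where area_to_pe[k] is the PE currently serving area k
    queue_free : list where queue_free[p] is the number of free slots in PE p's input queue
    """
    blocked = area_to_pe[area]
    if queue_free[blocked] > 0:
        return list(area_to_pe)                # the queue has room; nothing to change
    # Look for another PE with room, starting from the one after the blocked PE.
    for step in range(1, len(queue_free)):
        candidate = (blocked + step) % len(queue_free)
        if queue_free[candidate] > 0:
            new_map = list(area_to_pe)
            other = new_map.index(candidate)   # area that the candidate PE was serving
            # Assumed policy: the two areas exchange PEs.
            new_map[area], new_map[other] = candidate, blocked
            return new_map
    return list(area_to_pe)                    # every queue is full; keep the mapping
```

The mapping is left untouched as long as no queue fills up, which matches the point that continuity is kept until the next forced change.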

<Outline of the data transfer procedure>

FIG. 12A outlines the transfer of memory address/image data pairs in the first embodiment, and FIG. 12B outlines it for the second embodiment. Data transfer in these embodiments uses, as one example, a polling/selecting method in which the R/W controller 11 is the master station and each PE is a slave station.

In the first embodiment, the R/W controller 11 periodically reads an unprocessed memory address/image data pair from the image memory 10, attaches control bit = 00, and sends it to the input queue of a PE with a selecting sequence. The PE returns ACK if its input queue has room and NAK otherwise. The MPU processes each memory address/image data pair in the input queue: for γ conversion and the like it operates on the image data, and for affine transformation and the like it computes the pixel address. It then attaches control bit = 10 to the resulting memory address/image data pair and places it in the output queue. The R/W controller 11 periodically polls the output queue, takes out the memory address/image data pairs whose control bit = 10, and stores the image data in the image memory 10 at the location given by the memory address. In the configuration of FIG. 1, the input queue and the output queue each hold 1024 pixels, so the R/W controller 11 performs the transfer of memory address/image data pairs to the PEs and the transfer of processed pairs from the PEs asynchronously. The queues and this asynchronous operation remove the bottleneck on the bus 17 and achieve high-speed processing.
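The exchange just described can be modelled roughly as follows. This is a simplified sketch rather than the patented circuit: the class, method names, and packet layout `(control, address, data)` are assumptions made for illustration.

```python
from collections import deque

CB_UNPROCESSED = 0b00   # pixel sent to any free PE
CB_RESULT = 0b10        # processed pixel returned toward the image memory
QUEUE_CAPACITY = 1024   # both queues hold 1024 pixels in the FIG. 1 configuration

class PE:
    def __init__(self, process):
        self.input_queue = deque()
        self.output_queue = deque()
        self.process = process          # function applied to the image data

    def select(self, packet):
        """Selecting sequence: accept the packet and ACK, or NAK if the input queue is full."""
        if len(self.input_queue) >= QUEUE_CAPACITY:
            return "NAK"
        self.input_queue.append(packet)
        return "ACK"

    def run_once(self):
        """Process one queued pixel and place the result in the output queue."""
        if self.input_queue and len(self.output_queue) < QUEUE_CAPACITY:
            _, address, data = self.input_queue.popleft()
            self.output_queue.append((CB_RESULT, address, self.process(data)))

    def poll(self):
        """Polling sequence: hand over one output packet, or None if the queue is empty."""
        return self.output_queue.popleft() if self.output_queue else None
```

Since `select` and `poll` touch different queues, the controller can keep feeding pixels without waiting for earlier results, which is the asynchrony the paragraph above relies on.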

In the second embodiment as well, the R/W controller 11 periodically reads an unprocessed memory address/image data pair from the image memory 10, attaches control bit = 00, and sends it to the input queue of a PE with a selecting sequence. The PE returns ACK if its input queue has room and NAK otherwise. Up to this point the procedure is the same as in the first embodiment. The pixel placed in the input queue is, for example, the center pixel of the 3×3 block of FIG. 17.

The MPU calculates the addresses of the pixels surrounding this center pixel and, in order to request their image data from the R/W controller 11, places each peripheral pixel address with control bit = 11 in the output queue. The R/W controller 11 polls the output queue periodically, takes out the request, reads the image data of the pixel at that address from the image memory 10, and sends the image data, its memory address, and control bit = 11 back to the PE with another selecting sequence.

When the PE receives a selecting sequence with control bit = 11, the interface control unit 12b of the PE does not place it in the input queue but interrupts the MPU. The interrupted MPU reads the data on the bus 17 with a read instruction. In the example of FIG. 17 there are eight peripheral pixels, so eight requests with control bit = 11 are stored in the output queue and, as a result, eight such interrupts occur. When the eight peripheral pixels have been collected in the PE, image processing such as filtering is performed, control bit = 10 is attached to the resulting memory address/image data pair, and the pair is stored in the output queue. The R/W controller 11 obtains the processed pair from the output queue by polling and stores it in the image memory 10. Image processing such as spatial filtering is carried out in this way.
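The request step on the PE side might look like the sketch below. The names are hypothetical, and clipping at the image border is an assumption, since the text does not say how edge pixels are treated.

```python
CB_REQUEST = 0b11   # request for (or reply carrying) a peripheral pixel

def peripheral_addresses(x, y, width=1024, height=1024):
    """Return the addresses of the pixels surrounding (x, y), at most eight."""
    neighbours = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue                      # skip the center pixel itself
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:   # assumed border handling
                neighbours.append((nx, ny))
    return neighbours

def build_requests(center):
    """Return the request packets (control bit 11, address only) for one center pixel."""
    return [(CB_REQUEST, address, None) for address in peripheral_addresses(*center)]
```

For an interior pixel this yields eight requests, matching the eight interrupts mentioned above.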

Interrupts are used in the second embodiment because most of the input queue is occupied by memory address/image data pairs of center pixels, and most of the output queue by peripheral pixel requests and processed memory address/image data pairs. It would also be possible for the R/W controller 11 to allocate the peripheral pixels and place the memory address/image data pairs of the center pixel and its eight peripheral pixels in the input queue. However, handling pixels that straddle areas would be laborious, and the utilization of the input and output queues would become unbalanced. It is therefore preferable for the PE to allocate the peripheral pixels and obtain them by I/O reads triggered by MPU interrupts.

Furthermore, because the R/W controller 11 is not given a transfer algorithm that depends on the image processing, it can send unprocessed pixels to the PEs continuously, one after another, and the same transfer algorithm can serve the image processing of both the first and second embodiments.

<Transfer control in the R/W controller>

The control procedure of the R/W controller 11 is described with reference to FIGS. 13A, 13B, and 14. FIG. 14 shows the intermediate data, flags, and so on used by the R/W controller 11 for control when N PEs are connected to the system.

As described above, n[k] and m[k] give the relative address of the pixel about to be sent to a PE. The send-complete flag [k] indicates that all pixel information of the k-th area has been sent to the PE side. In addition, a receive buffer for data from the PE and a send buffer for data to the PE are provided for each PE.

In the flowcharts of FIGS. 13A and 13B, the notation PE[l] identifies the l-th PE. The suffix R marks receive-side items and the suffix S marks send-side items.

The X and Y addresses of the image memory are written XYS (for sending) and XYR (for receiving).

Storage into the receive buffer and the actual transfer of data from the send buffer are carried out by the bus interface 54.

The flowchart of the R/W controller 11 is now described with reference to FIG. 13.

First, the case is described in which memory address/image data pairs with control bit = 00 are transferred to the PEs in order from the beginning of each area.

In step S2, the subscript k indicating the area, the subscript l indicating the PE, and so on are initialized. In step S4, a poll sequence (FIG. 3B) is sent to the l-th PE. When transfer of data from the image memory 10 to the PEs has just begun, the PE answers the poll with EMPTY (FIG. 3C), so the process advances to step S40 and examines the send-complete flag [k]. Sending is not yet complete, so the process advances to step S42, where control bit S of the send buffer is set to 00. With this value, the PE that receives the data can tell that it was sent to it as an arbitrary PE rather than as a specific PE (see FIG. 4).

In step S44, the pixel at address (n[k], m[k] + 256×k) is accessed. The first time step S44 is reached, n[k] = m[k] = 0. In step S46, the image data of the pixel at address (n[k], m[k] + 256×k) is stored in image data S of the send buffer. In step S48, the value (n[k], m[k] + 256×k) is stored in XYS of the send buffer. The send buffer is now complete. In step S50, the contents of the l-th send buffer are sent to the l-th PE. In step S52, the ACK from the interface control unit 12b of the PE is confirmed; nothing other than ACK is returned until the input queue becomes full. After ACK is confirmed, step S54 or step S60 advances to the next pixel of the k-th area. That is, when the turn of the k-th area comes round again, the address points to (n[k] + 1, m[k] + 256×k) or to (n[k], m[k] + 1 + 256×k).

From step S56 or step S62 the process advances to step S66, where the pointer l is incremented by one to point to the next PE. In step S72, the pointer k is incremented by one to point to the next area. The process then returns from step S74 to step S4, completing the operation of sending one pixel of one area to one PE.

In the cycle beginning at the next step S4, the PE pointer indicates PE[l+1], the area pointer indicates k+1, and the outgoing pixel address of area k+1 is (n[k+1], m[k+1] + 256×(k+1)) (steps S44 and S48). The information of one pixel of the (k+1)-th area is thus sent to PE[l+1].

Steps S56 to S60 handle the change of line. Steps S68 and S70, and steps S72 and S74, make the PE index and the area index wrap around for the case of four PEs.

Steps S62 and S64 set the flag indicating that all pixels of one area have been sent to the PEs. Once the send-complete flag [k] is set, the process goes from step S40 to step S66 and onward, and no further sending is performed for the area whose flag is set.

When the input queue is full and NAK is received from the PE in step S52, l is incremented in step S80 and the data that drew the NAK is sent to the next PE in step S82, as described in connection with FIGS. 11A and 11B. The loop of step S52 → step S84 → step S52 sends the same data to a PE that has room.

PEs are not cycled through until an ACK is returned, in order to prevent a deadlock caused by the input and output queues both becoming full.
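The send side of the flowchart can be condensed into the following sketch. It leaves out the polling of output queues that the real controller interleaves with sending, and the `try_send` callback is a hypothetical stand-in for the selecting sequence and its ACK/NAK reply.

```python
def dispatch(areas, try_send, num_pes=4):
    """Send pixels from each area in turn, one per PE visit, retrying on NAK.

    areas    : list of lists; areas[k] holds the (address, data) pairs of area k
    try_send : callable(pe_index, packet) -> True for ACK, False for NAK
    Returns the (pe_index, packet) pairs in the order they were accepted.
    Note: loops forever if every PE always answers NAK.
    """
    cursor = [0] * len(areas)              # next pixel to send per area, like (n[k], m[k])
    done = [len(a) == 0 for a in areas]    # send-complete flag per area
    log = []
    k = l = 0                              # area index and PE index
    while not all(done):
        if not done[k]:
            packet = (0b00,) + tuple(areas[k][cursor[k]])
            # On NAK, offer the same packet to the next PE until one accepts it.
            while not try_send(l, packet):
                l = (l + 1) % num_pes
            log.append((l, packet))
            cursor[k] += 1
            done[k] = cursor[k] == len(areas[k])
        l = (l + 1) % num_pes              # wrap-around of the PE index
        k = (k + 1) % len(areas)           # wrap-around of the area index
    return log
```

Passing a `try_send` that refuses one PE shows the rerouting: the refused packet simply lands on the next PE that accepts it.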

Control bit = 10

Data with control bit = 10 carries the result of image processing in both the first and second embodiments. When a poll is sent to PE[l] in step S4 and data is returned from that PE, the process advances from step S6 to step S10 and examines the control bit.
When data is returned from [1], the process advances from step S6 to step S10 to check the control bit.

If control bit = 10, the image memory is accessed in step S12 at the address given by XYR of the receive buffer (FIG. 14), and in step S14 data R is stored in the image memory 10.

The process advances from step S14 to step S66 so as not to disturb the relationship between PEs and areas described in connection with FIGS. 12A and 12B.

Control bit = 11

Data with control bit = 11 is request data asking for the data of a peripheral pixel to be sent.

If step S10 finds control bit = 11, the process advances to step S20 and accesses the image memory at address XYR. In step S22 the image data at that address is placed in data S of the send buffer (FIG. 14), and control bit S of the send buffer is likewise set to 11. In step S28 the buffer is returned with a selecting sequence to the PE that issued the control bit = 11 request. As described in connection with FIG. 12B, this reply is delivered to the PE by interrupt, so the interface control unit 12b always returns ACK (step S30).

As described above, when one center pixel is sent to a PE with control bit = 00, eight requests with control bit = 11 arrive from that PE one after another, and the R/W controller 11 returns the peripheral pixel information for each in turn.
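The two branches taken after a poll (control bit = 10 and control bit = 11) can be summarized as below. The dictionary standing in for the image memory and the packet layout `(control, address, data)` are illustrative assumptions.

```python
CB_RESULT = 0b10    # processed pixel to be stored
CB_REQUEST = 0b11   # request for a peripheral pixel

def handle_polled_packet(packet, image_memory):
    """Act on one packet polled from a PE's output queue.

    Returns the reply packet to send back to the PE, or None if no reply is needed.
    """
    control, address, data = packet
    if control == CB_RESULT:
        image_memory[address] = data          # store the processed pixel at its address
        return None
    if control == CB_REQUEST:
        # Read the requested peripheral pixel and return it, again with control bit 11.
        return (CB_REQUEST, address, image_memory[address])
    raise ValueError("unexpected control bits: %r" % (control,))
```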

<Processing on the PE side in the first embodiment>

FIG. 15 is a flowchart of the MPU control procedure in the first embodiment. As described above, the master CPU 20 loads the program for this procedure into each MPU for each different image process. A new program is therefore loaded when the image processing of the second embodiment is to be performed.

In the image processing of the first embodiment, one PE handles the processing of one pixel. In step S80 the MPU waits for data with control bit = 00 to arrive in the input queue. As soon as one item is present, it is taken out in step S82, and the counter 32 (FIG. 8A) is decremented in step S84. In step S86 the image processing assigned to the MPU is applied to the memory address/image data pair just taken out. The image processing is, for example, color processing of a color image, or γ conversion. This γ conversion applies a matrix operation to the three components R, G, B (8 bits each) of the pixel and converts them into R′, G′, B′.

[Equation: matrix operation mapping (R, G, B) to (R′, G′, B′)]

If the image processing is an affine transformation, the address rather than the RGB image data is the object of the matrix operation. When image processing finishes, step S88 checks whether the output queue is full. If there is room, the control bit is set to 10 in step S90, the pair is loaded into the output queue in step S92, and the counter 42 (FIG. 8B) is incremented in step S94. The memory address/image data pair loaded into the output queue is in due course stored in the image memory 10 by the R/W controller 11.

<Effects of the first embodiment>

FIG. 16A shows the sequence of pixel-by-pixel image processing by a single PE in the conventional example. Image data is first read from the image memory 10 (a), processed by the MPU (b), and the result is returned to the image memory (c). Processing proceeds by repeating this cycle of (a), (b), and (c).

In the first embodiment with four MPUs, by contrast, parallelism is raised up to the limit of the processing capacity of the R/W controller, as shown in FIG. 16B, and performance improves in proportion. That is:

(1) Conventionally, when several MPUs share one image memory, the image memory has only one input/output port, so the MPUs must use it by time sharing. They therefore have to use the same bus, and collisions occur. The R/W controller 11 arbitrates this traffic, and waiting times are short.

(2) The time an MPU takes to process image data is normally much longer than the time taken for input/output with the image memory. When N PEs are used, therefore, an N-fold speedup can be obtained provided there is no waiting for input/output.

(3) As a result, the R/W controller 11 performs input/output with the image memory 10 and with each PE frequently, and its utilization becomes extremely high.

<Processing on the PE side in the second embodiment>

FIG. 17 and the following figures are used to describe the image processing procedure of the second embodiment on the PE side. The image processing in the second embodiment is Laplacian processing with the matrix of FIG. 17. To perform it, two calculation blocks of the same size as the Laplacian filter, as shown in FIG. 18A, are reserved in the RAM 14. As shown in FIG. 18B, these are calculation block #1 and calculation block #2. One calculation block holds the information of one center pixel (control bit = 00) and of eight peripheral pixels (control bit = 11). The RAM 14 also provides two send buffers (SB) that hold the calculation results.

Six flags, shown in FIG. 18B, are used to manage the two calculation blocks and the send buffers. They include the #1 in-use flag, indicating that calculation block #1 is in use; the #1B full flag, indicating that the one center pixel and all eight peripheral pixels of calculation block #1 are present; and the #1S full flag, indicating that image processing of calculation block #1 has finished and the result has been stored in #1SB. The same flags exist for calculation block #2.

First, the sequence is described from loading the memory address/image data pair of a center pixel with control bit = 00 from the input queue into calculation block #1 up to requesting the information of the eight surrounding pixels from the R/W controller 11. In step S100 of FIG. 19A it is confirmed that calculation block #1 is not in use, and the process advances to step S102; if the input queue is not empty, the input queue → calculation block subroutine is executed. In step S200 of this subroutine (FIG. 19B), the center pixel information with control bit = 00 is moved from the input queue into calculation block #1. Step S202 decrements the counter 32, step S204 sets the #1 in-use flag, and step S206 computes the peripheral pixel addresses and stores them in the XY area of calculation block #1.

In step S208, the XY entries of calculation block #2 are examined for overlapping pixels (see FIG. 9).

At the very beginning there are of course no pixels in common with calculation block #2, but when successive Laplacian operations are carried out alternately in calculation blocks #1 and #2, overlapping pixels always occur. If an overlapping pixel is found, its image data is copied in step S210. This saves requesting the overlapping pixels from the R/W controller 11 and makes processing more efficient.
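The reuse of overlapping pixels amounts to a lookup in the other calculation block before any request is issued. A minimal sketch, with the block modelled as a dictionary from address to image data (an assumption made here for brevity):

```python
def fill_from_other_block(needed, other_block):
    """Split needed addresses into those copied from the other block and those to request.

    needed      : list of peripheral addresses for the new center pixel
    other_block : dict mapping address -> image data already held by the other block
    Returns (copied, to_request).
    """
    copied = {a: other_block[a] for a in needed if a in other_block}
    to_request = [a for a in needed if a not in other_block]
    return copied, to_request
```

For two horizontally adjacent center pixels, five of the eight peripheral pixels are already present in the other block, so only three requests need to go to the R/W controller.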

At this point calculation block #1 holds the memory address/image data pair of the center pixel (control bit = 00), the memory address/image data pairs copied from calculation block #2 because they overlap (control bit = 00 or 11), and the addresses of the pixels whose image data must be requested from the R/W controller 11. In step S212, control bit = 11 is set for the entries of calculation block #1 that have only an XY address. In step S216 the request-present flag is set, and the subroutine returns.

On return to step S106, it is confirmed that the output queue is not full, and the subroutine of step S108 loads request data with control bit = 11 into the output queue. That is, step S220 of the request → output queue subroutine in FIG. 19C loads one request into the output queue, step S222 increments the counter 42 (FIG. 8B), and step S224 checks whether all requests have been loaded. If they have, step S226 resets the request-present flag. The subroutine then returns to step S100.

In step S100 of the next cycle, the in-use flag is set, so the process advances to step S120. The #1B full flag is not yet set, so the process advances to step S122 and examines the request-present flag. This flag was set in step S216, so the request → output queue load subroutine is executed in step S124 and the process returns to step S100.

Through the loop of step S100 → step S120 → step S122 → step S124 → step S100, all request data with control bit = 11 is eventually loaded into the output queue, and the request-present flag is reset in step S226.

If the request-present flag has been reset before all the data of calculation block #1 has arrived, the process advances from step S122 to step S130 and examines the usage state of calculation block #2. The procedure for storing data in calculation block #2 is essentially the same as for calculation block #1, so its description is omitted.

The R/W controller 11 sends a poll to the interface control unit and receives a request with control bit = 11 from the output queue. Following the sequence described above, the R/W controller 11 returns the image data of the peripheral pixel to the PE with control bit = 11 in a selecting sequence. On receiving this data, the interface control unit 12b interrupts the MPU. The interrupt routine is shown in FIG. 19D.

On receiving the interrupt, the MPU takes in the memory address/image data pair on the bus 17 (step S240) and stores it in the vacant position of the calculation block indicated by its memory address (step S242). Step S244 checks whether all peripheral pixel information has arrived; if so, step S246 sets the #1B full flag.

Interrupts occur as many times as there are requests stored in the output queue, after which the #1B full flag is set.

While control bit = 11 information is being stored in calculation block #1 by interrupt, the requests (control bit = 11) for calculation block #2 are placed in the output queue and sent to the R/W controller 11 through the path step S100 → step S120 → step S122 → step S130 → step S132 → step S134 → step S136. Calculation block #2 is then filled with the necessary data by interrupt in the same way as calculation block #1.

When the #1B full flag or the #2B full flag (or both) is set, the image processing subroutine is executed in step S152 (step S140). The image processing subroutine is shown in FIG. 19E.

This image processing subroutine performs the Laplacian processing of FIG. 17. Let a_{ij} be the filter elements, u_{ij} the image data of each pixel stored in the calculation block, and v_{n,m} the image data of the center pixel after processing. Then

$$v_{n,m} = \sum_{i,j} a_{ij}\, u_{n-1+i,\; m-1+j}$$

Steps S170 to S182 carry out this matrix calculation sequentially. When the calculation is finished, step S184 stores the memory address of the center pixel in the XY field of the transmit buffer and the computed v in the image data field of the transmit buffer. Step S186 then sets the #1S full flag, which indicates that the image processing has finished and the result is ready to be sent to the R/W controller 11 via the output queue.
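As a rough illustration of the sum of products computed per calculation block, the following Python sketch applies a 3×3 kernel to one block and builds the output pair. The kernel shown is one common Laplacian and is an assumption, since the coefficients of FIG. 17 are not reproduced in the text; `filter_center` and `process_block` are hypothetical names. The block is indexed so that `u[i][j]` corresponds to u_{n-1+i, m-1+j}.

```python
# Assumed 3x3 Laplacian kernel; the actual coefficients are those of FIG. 17.
LAPLACIAN = [
    [0, 1, 0],
    [1, -4, 1],
    [0, 1, 0],
]


def filter_center(a, u):
    """Return the sum over i, j of a[i][j] * u[i][j] for one 3x3 block."""
    v = 0
    for i in range(3):
        for j in range(3):
            v += a[i][j] * u[i][j]
    return v


def process_block(center, u, a=LAPLACIAN):
    """Build the output pair: center-pixel address plus the filtered value."""
    return (center, filter_center(a, u))


flat = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(process_block((3, 4), flat))
```

A flat block yields zero with this kernel, which is the expected behaviour of a Laplacian on a constant region.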

On returning to step S154 (step S142), the MPU checks whether the output queue is full before outputting the calculation result to it. If the queue is not full, step S156 (step S146) sets the control bits of the transmit buffer to 10 and executes the result output queue load subroutine. This subroutine is shown in FIG. 19F. If the output queue is full at step S154 (step S142), the MPU waits for the output queue to free up in the loop step S100 → step S120 → step S150 → step S154 → step S100, or in the loop step S100 → step S120 → step S122 → step S130 → step S132 → step S138 → step S142 → step S100.
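The wait-for-space behaviour can be sketched in Python as a non-blocking attempt that the main loop repeats on each pass. This is a simplified model under assumptions: `OutputQueue`, `try_send`, and the flag names `s_full` and `in_use` are hypothetical stand-ins for the output queue, the result output queue load subroutine, and the flags named in the text.

```python
from collections import deque


class OutputQueue:
    """Hypothetical bounded FIFO standing in for the PE's output queue."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def is_full(self):
        return len(self.items) >= self.capacity


def try_send(queue, send_buffer, flags):
    """One pass of the main loop: queue the result only if there is room."""
    if not flags["s_full"]:
        return False               # no finished result is waiting
    if queue.is_full():
        return False               # leave the flag set and retry on a later pass
    send_buffer["control"] = 0b10  # control bits = 10, set before the result is queued
    queue.items.append(dict(send_buffer))
    flags["s_full"] = False        # reset the flags once the result is loaded
    flags["in_use"] = False
    return True
```

Because a failed attempt leaves the flag set, the caller can keep servicing other work and simply call `try_send` again on the next pass.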

The result output queue load subroutine (FIG. 19F) loads the result into the output queue and resets the in-use flag, the S full flag, and so on.

<Effects of the second embodiment>
In addition to the effects of the first embodiment described above, the second embodiment has the following effects.

(1): A processing unit made up of a plurality of pixels is allocated, together with the address of its center pixel, to a single PE, so the allocation can go to any PE that is free. The R/W controller 11 is relieved of the allocation work and can operate at its maximum processing speed.

(2): The image memory is divided into as many areas as there are PEs, and each area is loosely coupled to a PE along the transfer path. Even when the PE that was to receive the data is not free, the coupling is changed dynamically as shown in FIG. 11B, so the loose coupling is maintained. Dividing the memory into areas frees the vertical coupling between the image memory and the PEs.

(3): Because this loose coupling is maintained, consecutive pixels are sent to the same PE as far as possible, which eliminates the waste of transferring overlapping image data over the bus again and again.
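One way to picture the loose coupling between memory areas and PEs is the Python sketch below. It is an illustration under assumptions, not the procedure of FIG. 11B: the function `choose_pe`, the `coupling` dictionary, and the rule of re-coupling an area to the lowest-numbered free PE are all hypothetical.

```python
def choose_pe(area, coupling, free):
    """Return the PE that should receive the next block of `area`, or None.

    coupling: dict mapping area index -> currently coupled PE index
    free:     set of PE indices that are currently free
    """
    preferred = coupling[area]
    if preferred in free:
        return preferred         # keep consecutive pixels on the same PE
    if not free:
        return None              # no PE is free; the caller has to wait
    substitute = min(free)       # any free PE would do; min() keeps the sketch deterministic
    coupling[area] = substitute  # change the coupling dynamically
    return substitute


coupling = {0: 0, 1: 1, 2: 2}
print(choose_pe(1, coupling, free={2}))
```

Keeping the preferred PE whenever it is free is what lets neighbouring blocks, which share pixels, land on the same PE.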

The invention has been described using two embodiments, but various modifications are possible. In particular, changes such as which task the PE processes with the highest priority, replacing hardware with software, or the reverse, can be made without departing from the spirit of the present invention.

[Effects of the Invention] As described above, according to the image parallel processing device of the present invention, one unit of image processing consisting of a plurality of pixels is sent to a free processor as input pairs of pixel address and image data, the image processing is performed on the basis of these addresses and image data, and the processed image data is sent back together with its address as an output pair. Image processing as a whole is therefore carried out reliably no matter which processor receives the input pairs. Moreover, because the sender of the input pairs is relieved of the burden of having to send them to a specific processor, parallel image processing is performed effectively and high-speed processing becomes possible.
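The reason the choice of processor does not matter can be shown with a small Python sketch: since every result travels with its own address, the results can be written back in any order. The function names and the toy doubling operation are assumptions for illustration only, and the shuffle merely stands in for "whichever processor happens to be free".

```python
import random


def scatter_gather(units, process, seed):
    """Process units in an arbitrary order and write results back by address."""
    rng = random.Random(seed)
    pending = list(units)
    rng.shuffle(pending)             # arbitrary dispatch and completion order
    output = {}
    for address, data in pending:
        out_address, value = process(address, data)  # the output pair
        output[out_address] = value  # the address alone decides where the value goes
    return output


def double(address, data):
    """Toy per-unit operation; the output address equals the input address."""
    return address, 2 * data


units = [((x, 0), x) for x in range(4)]
result_a = scatter_gather(units, double, seed=1)
result_b = scatter_gather(units, double, seed=2)
print(result_a == result_b)
```

Both runs produce the same output image even though the units are handled in different orders.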

Furthermore, according to one embodiment of the present invention, when the memory storing the image data is divided into as many areas as there are processors, the sender remains free of the burden of having to send the input pairs to a specific processor, while the continuity of pixels within the same divided area is preserved within a processor, which further increases processing efficiency.

[Brief explanation of drawings]

FIG. 1 is an overall block diagram of an embodiment to which the present invention is applied.
FIG. 2 is a diagram showing the configuration of the image memory.
FIGS. 3A to 3C are diagrams showing the formats of the data flowing on the bus.
FIG. 4 is a table explaining how the control bits are used.
FIGS. 5A and 5B are diagrams showing the transfer of memory address/image data pairs in the first and second embodiments, respectively.
FIG. 6 is a diagram showing the configuration of the R/W controller 11.
FIG. 7 is a diagram showing the connection between an MPU and a queue buffer.
FIGS. 8A and 8B are diagrams showing the configurations of the input queue and the output queue, respectively.
FIG. 9 is a diagram explaining the overlap of pixels within blocks in the second embodiment.
FIG. 10 is a diagram explaining the area division of the image memory.
FIGS. 11A and 11B are diagrams explaining how the correspondence between PEs and areas is changed in the second embodiment.
FIGS. 12A and 12B are diagrams explaining the data flow in the first and second embodiments, respectively.
FIGS. 13A and 13B are flowcharts showing the control procedure of the R/W controller 11.
FIG. 14 is a diagram showing how the RAM of the R/W controller 11 is used.
FIG. 15 is a flowchart showing the control procedure in the MPU.
FIGS. 16A and 16B are timing charts comparing the conventional approach and the present embodiment, respectively.
FIG. 17 is a matrix diagram of the spatial filter used in the second embodiment.
FIGS. 18A and 18B are diagrams explaining how the RAM of the MPU is used in the second embodiment.
FIGS. 19A to 19F are flowcharts showing the control procedure of the MPU in the second embodiment.

In the figures: 10, image memory; 11, R/W controller; 12-1 to 12-N, queue buffers; 13-1 to 13-N, MPUs; 14-1 to 14-N and 19, RAM; 15, CRT controller; 16, display device; 17, 18, and 50, buses; 20, CPU; 51, MPU; 52, ROM; 53, RAM; 54, bus interface; 30, 31, 36, and 40, AND gates; 32 and 42, counters; 33 and 43, decoders; 34 and 44, selectors; 35 and 45, shift registers.

Claims

[Claims]

(1) An image parallel processing device in which individual processors perform, in parallel, image processing that takes a plurality of pixels as one processing unit, comprising: means for managing the availability of each processor and the completion of processing in each processor; and supply means for supplying, for each pixel of one processing unit, an input pair consisting of the image data of that pixel and the address of that pixel in the input image space to an unspecified free processor, characterized in that the processor supplied with the input pairs of the one processing unit processes those input pairs and returns an output pair in which the address of the image data in the output image space is added to the processed image data.

(2) The parallel image processing apparatus according to claim 1, wherein the image processing is spatial filtering processing.

(3) The image parallel processing device according to claim 1, characterized in that the supply means, when sending the input pairs of one processing unit to a free processor, sends them in a plurality of transfers, sending the first input pair to a free processor and the second and subsequent input pairs to that same processor.

(4) The image parallel processing device according to claim 1, characterized in that the supply means comprises: dividing means for dividing the memory that stores the image data into areas equal in number to the processors; coupling means for coupling the processing units of each area to a processor; and changing means for changing this coupling relationship.