JPS63197282A

JPS63197282A - Parallel image processor

Info

Publication number: JPS63197282A
Application number: JP2833287A
Authority: JP
Inventors: Toyokazu Uda; 豊和宇田; Susumu Sugiura; 進杉浦; Akiyoshi Fukumoto; 福本　晶美; Makoto Takaoka; 真琴高岡; Kentaro Matsumoto; 健太郎松本
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1987-02-12
Filing date: 1987-02-12
Publication date: 1988-08-16

Abstract

PURPOSE: To prevent communication between a host processor and processor units from degrading processing efficiency, by transferring the image data processed in parallel by plural image processor units over plural data transfer paths. CONSTITUTION: The processor consists of an image memory 5 for storing the image data to be processed; a plurality of image processor units 2-1 to 2-6; a host processor 1 which is connected to the image memory 5 and to the plural image processor units 2-1 to 2-6 through plural data transfer paths 3-1 to 3-6; and interface means 306 to 309 for transferring, for each unit of image processing, the image data before and after processing between the host processor 1 and the individual image processor units 2-1 to 2-6 through the plural data transfer paths 3-1 to 3-6. Communication between the host processor and each processor unit is thus multiplexed, which prevents processing efficiency from degrading even when plural image processor units are used.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a parallel image processing device having a plurality of image processing units.

[Prior Art] In the field of image processing, because of the enormous amount of data to be processed, the data is sometimes processed in parallel by a plurality of image processing units (hereinafter abbreviated as processing units, PUs).

FIG. 2 is a block diagram of a conventional parallel image processing device. 1 is a host processor, and 2-1, 2-2, etc. are PUs that perform parallel processing (hereinafter collectively denoted by 2).

3-1, 3-2, etc. are communication paths for inter-processor communication (hereinafter collectively denoted by 3), 4 is the host processor's own memory, 5 is an image memory that holds images to be processed and processed images, 6 is a display controller for displaying the data in the image memory 5, 7 is a CRT, 8 is a system bus, and 9 is a display bus.

FIG. 3 is a block diagram of the inside of a PU 2, in which 201 is a CPU with an inter-processor communication function, 202 is a local memory, and 203 is an internal communication path within the processor.

FIG. 4 is a diagram illustrating an image to be processed, in which 10 is the image and 10-1 to 10-6 are the divided partial images.

Conventional parallel image processing is described with reference to FIGS. 2, 3, and 4. First, the image 10 held in the image memory 5 is read out and divided by the host processor 1, and the divided partial images 10-1 to 10-4 are transmitted via the communication paths 3 to the local memories 202 of the respective PUs 2. Next, each PU 2 performs image processing, in parallel with the others, on the partial image held in its local memory 202. The results are transferred via the communication paths 3 and the host processor 1 to the image memory 5 and displayed on the CRT 7. This is basic parallel image processing.
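The divide, distribute, process, and collect flow described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the row-wise split, and the pixel-inversion stand-in for the per-PU processing are all assumptions, and Python threads stand in for the PUs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(image, n_parts):
    """Divide an image (a list of pixel rows) into n_parts contiguous strips."""
    rows = len(image)
    bounds = [rows * i // n_parts for i in range(n_parts + 1)]
    return [image[bounds[i]:bounds[i + 1]] for i in range(n_parts)]


def invert_strip(strip):
    """Hypothetical stand-in for per-PU processing: invert 8-bit pixels."""
    return [[255 - p for p in row] for row in strip]


def process_in_parallel(image, n_units, work=invert_strip):
    """Host side: split the image, let each unit process its strip, reassemble."""
    strips = split_rows(image, n_units)
    with ThreadPoolExecutor(max_workers=n_units) as pool:
        # map() returns results in the same order as the input strips
        results = list(pool.map(work, strips))
    return [row for strip in results for row in strip]
```

The sketch hides the cost the following sections are concerned with: in the conventional device every strip travels to and from its PU over a serial link.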

[Problems to Be Solved by the Invention] A system of this kind is considered to have the following two problems.

First, inter-processor communication is required when the images to be processed are transferred to the PUs and when the processed images from the PUs are transferred to the image memory 5. Because communication paths 3 of this kind normally use serial transfer, the communication speed is low, and there is the serious drawback that transferring image data takes a long time.

Second, image processing usually involves many neighborhood operations on the pixel being processed. Image processing can therefore be completed within each PU, except in the boundary regions of the partial images. The boundary regions can also be processed by inter-PU communication over the communication paths 3, and the time this communication takes is small compared with the overall processing time. However, when an image is to be rotated, for example, the partial images held by the individual PUs must be transferred by inter-processor communication so that the pixels can be rearranged. In particular, as the number of PUs is increased to raise parallelism and speed, the amount of communication between PUs increases accordingly, communication takes longer, and processing efficiency falls. Likewise, in other cases where an image is divided for processing and the amount of inter-processor communication is large, there is the serious problem that communication takes a long time.

In summary, if the units of image processing assigned to the PUs are substantially independent of one another, little boundary processing is needed and inter-PU communication is not much of a problem; instead, the communication between the host processor and the PUs, which occurs in large volume all at once, becomes the bottleneck in processing efficiency. Conversely, when a unit of image processing spans multiple PUs, as in image rotation, the communication among the PUs becomes the bottleneck.

The present invention has been proposed to solve these problems of the prior art. Its object is to provide a parallel image processing device in which communication between the host processor and the processing units does not become a factor that lowers processing efficiency, even when a plurality of image processing units are used.

[Means for Solving the Problems] To achieve the above object, the present invention is characterized by comprising: an image memory that stores image data to be image-processed; a plurality of image processing units that perform image processing; a host processor connected to the image memory and connected to the plurality of image processing units by a plurality of data transfer paths; and interface means for transferring, for each unit of image processing, the image data before processing and the image data after processing between the host processor and the individual image processing units via the plurality of data transfer paths.

[Operation] With the above configuration, the image data resulting from parallel processing by the plurality of image processing units is transferred to the host processor via the plurality of data transfer paths.

[Embodiment] An embodiment of the present invention is described in detail below with reference to the accompanying drawings.

<Overall configuration> FIG. 1 is an overall block diagram of the image processing device of the embodiment. 1 is a host processor, 2-1 to 2-6 are processing units (PUs) that perform parallel image processing, 3 denotes communication paths for inter-processor communication, 4 is a memory that stores the programs and the like of the host processor 1, 5 is an image memory that holds image data, 6 is a display controller for displaying the contents of the image memory, 7 is a CRT, 8 is a system bus, and 9 is a display bus.

The system of FIG. 1 has two major features.

The first is that the host processor 1 has a plurality of communication paths; in the example of FIG. 1, the host processor 1 has three communication paths, 3-1, 3-3, and 3-5. The second is that parallel processing and pipeline processing can be used selectively.

FIG. 5 shows the internal configuration of the microprocessor LSI (large-scale integrated circuit) used for these PUs and the host processor 1. This microprocessor is the T800 manufactured by INMOS of the United Kingdom.

In the figure, 301 is a 32-bit CPU, 306 to 309 are link (communication path) interfaces, 304 is a 4-Kbyte local memory, 305 is an interface for memory that extends the built-in memory 304, 302 is a block that manages system services such as interrupts, and 300 is a floating-point arithmetic unit. The four link interfaces built into the LSI are used to connect the PUs to one another as shown in FIG. 1. When this LSI is used as the host processor 1, three of its link interfaces are connected to the PUs 2-1, 2-3, and 2-5.

Image processing is performed by the individual PUs. Before data processing begins, the PUs 2-1 to 2-6 receive from the host processor, via the communication paths 3, a processing program describing the image processing procedure, and store it in the local memory 304. As examples of such programs, this embodiment describes the binarization processing and the image rotation processing shown in FIG. 7 and FIG. 9, respectively. The program is stored in the local memory 304 in the layout shown in FIG. 6; that is, the processing program of FIG. 7 or FIG. 9 corresponds to the processing program entry in FIG. 6. To perform parallel processing or pipeline processing efficiently, the parallel processing device of this embodiment also stores various data in the local memory 304 alongside the processing program, as FIG. 6 shows. The source processor number in FIG. 6 is the number of the processor from which data is received. The destination processor number is the number of the processor to which the image processing result is to be sent. In the system of FIG. 1, processor number "0" is assigned to the host processor 1, "1" to PU 2-1, "2" to PU 2-2, and so on.

In simple parallel processing, the destination processor number stored in each PU is "0", the processor number of the host processor 1. In pipeline processing, for example, the stored processor numbers follow the algorithm being executed. The adjacent processor number is the processor number of a PU in charge of processing a divided image adjacent to the divided image this PU is in charge of.
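The per-PU record described for FIG. 6 can be pictured as a small data structure. The sketch below is only illustrative: the class and field names are hypothetical, the figure itself is not reproduced in this text, and setting the source to the host in the simple parallel case is an assumption (the text states only the destination).

```python
from dataclasses import dataclass, field
from typing import Callable, List

HOST = 0  # the text assigns processor number "0" to host processor 1


@dataclass
class LocalMemoryLayout:
    """Hypothetical model of what a PU keeps in local memory 304."""
    program: Callable            # processing program (FIG. 7 or FIG. 9)
    source: int                  # processor this PU receives data from
    destination: int             # processor this PU sends results to
    adjacent: List[int] = field(default_factory=list)  # PUs holding neighboring sub-images


def simple_parallel_layout(program, adjacent):
    """Simple parallel processing: results go straight back to the host.

    Using the host as the source as well is an assumption of this sketch.
    """
    return LocalMemoryLayout(program=program, source=HOST,
                             destination=HOST, adjacent=list(adjacent))
```

Keeping the routing in data rather than in the program is what lets the same hardware switch between parallel and pipeline operation.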

<Parallel processing> As an example of parallel processing, consider binarizing the image of FIG. 4 with a threshold matrix. FIG. 7 shows the procedure. In this binarization, each pixel of the image data is compared with the corresponding element of, for example, a 4×4 matrix (step S1), and according to which is larger (step S2) the pixel is binarized to white or black (steps S3 and S4). In step S5, the binarization result is stored in the local memory 304.

This operation is executed for all pixels in the same block (step S7). Step S8 checks whether the block currently being processed is the last block assigned to this PU. If it is the last block, step S12 transfers the processed image data to the unit indicated by the destination processor number. In this case the destination processor number is "0", that of the host processor 1, as described above.

If step S8 finds that blocks remain to be processed, the procedure advances to step S9 and extracts the next block from the local memory 304. Step S10 checks whether the extracted block lies in a boundary region, which is indicated by the extracted block being smaller than 4×4. If it does not lie in a boundary region, the procedure returns to step S1 and repeats step S1 and the subsequent steps. If it does, step S11 looks up the adjacent processor number, obtains the necessary pixel data from that PU, and returns to step S1.
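The core comparison of steps S1 to S5 can be sketched as below. The threshold values, the function name, and the convention that 1 means white when the pixel is at least the threshold are assumptions; the text does not give the matrix or the polarity. The sketch indexes the matrix by pixel coordinate and leaves out the block extraction and boundary exchange of steps S9 to S11.

```python
# Hypothetical 4x4 threshold matrix (a Bayer-style pattern scaled to 0..255).
THRESHOLDS = [
    [8, 136, 40, 168],
    [200, 72, 232, 104],
    [56, 184, 24, 152],
    [248, 120, 216, 88],
]


def binarize(image, thresholds=THRESHOLDS):
    """Binarize a grayscale image (list of rows) against a tiled threshold matrix."""
    n = len(thresholds)
    result = []
    for y, row in enumerate(image):
        out_row = []
        for x, pixel in enumerate(row):
            # S1/S2: compare the pixel with the matching matrix element
            # S3/S4: emit white (1) or black (0); the polarity is an assumption
            out_row.append(1 if pixel >= thresholds[y % n][x % n] else 0)
        result.append(out_row)  # S5: keep the result for later transfer
    return result
```

For example, a uniform mid-gray 4×4 block comes out as a checkerboard with this particular matrix, which is the usual behavior of ordered dithering.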

The above steps are common to all PUs, so the PUs that have finished processing transfer their processed data to the host processor 1 all at once via the link interfaces (communication paths 3). The host processor 1 stores the data arriving on its communication paths 3 in its own local memory 304 and returns it to the image memory 5 via the system bus 8. The memory 4 is connected to the host processor 1 via the memory interface 305 and serves as external memory.

<Effects of the parallel processing method> This completes the binarization image processing. In image processing such as binarization, the processing of each divided image (unit of image processing), that is, of each PU, is independent of the other PUs. As the flowchart of FIG. 7 shows, very little communication therefore occurs between PUs, and the only potential communication bottleneck is the communication between the host processor and the PUs. Because a plurality of interface communication paths exist between the host processor 1 and the PUs, however, processing speed does not fall. That is, the host processor 1 is a processor that has multiple communication paths and can transfer data over them in parallel, and by transferring data to and from multiple PUs in parallel it can transfer image data at high speed. Commands, status, and other such data can likewise be exchanged between the host processor 1 and the PUs in parallel, which further increases speed.
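The benefit of several host links can be illustrated with a deliberately idealized model. The function name, the even split across links, and the figures in the usage note are assumptions for illustration only; the source gives no transfer rates, and real links add protocol and routing overhead.

```python
import math


def transfer_time(total_bytes, link_bytes_per_sec, n_links):
    """Idealized time to move total_bytes split evenly over n_links parallel links."""
    per_link = math.ceil(total_bytes / n_links)  # bytes carried by the busiest link
    return per_link / link_bytes_per_sec
```

Under this model, moving 3,000,000 bytes over links of 1,000,000 bytes per second takes 3.0 s on a single link and 1.0 s when three links are used in parallel.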

This embodiment has been described for the case of three communication paths between the host processor 1 and the PUs, but any host processor that has two or more communication paths 3 and can transfer data over them in parallel may be used.

In the embodiment shown in FIG. 1, each PU can also perform data transfer and processing simultaneously, which makes it possible to realize multiple pipeline processes as well.

<Pipeline processing method> In the binarization described above, the processing in each PU is substantially independent of the others, so it is well suited to parallel processing in the pure sense. In image rotation, however, the PU in charge of a pixel position before processing may differ from the PU in charge of that pixel position after processing. Inter-PU communication then increases, and merely adding communication paths between the host processor 1 and the PUs is not enough. Where pipeline processing is possible, as in image rotation, the PUs are therefore logically connected to form a pipeline. The system configuration remains that of FIG. 1; what changes is that the processing program, source processor number, destination processor number, and so on stored in the local memory 304 become unique to each PU.

Image rotation is expressed by the following equations. With the read coordinates written as $(X_R, Y_R)$ and the write coordinates as $(X_W, Y_W)$:

$$X_R = \cos\theta \cdot X_W - \sin\theta \cdot Y_W$$

$$Y_R = \sin\theta \cdot X_W + \cos\theta \cdot Y_W$$

FIG. 8 shows a system that realizes this image rotation by pipeline processing using four PUs. The host processor 1 generates the post-rotation coordinates, that is, the write coordinates (Xw, Yw), one after another and transfers them to PU 2-1, and receives the pre-rotation coordinates, that is, the read coordinates (XR, YR), from PU 2-3. Based on the read coordinates (XR, YR), it reads the contents of the image memory 5 and writes the bit value at the write coordinates (Xw, Yw).

Specifically, the host processor 1 sends the coordinates (Xw, Yw) to PU 2-1. PU 2-1 computes a = cosθ·Xw and sends a and the coordinates (Xw, Yw) to PU 2-2. PU 2-2 computes XR = a − sinθ·Yw and sends the result XR and (Xw, Yw) to PU 2-4. PU 2-4 computes b = sinθ·Xw and sends XR, (Xw, Yw), and b to PU 2-3. PU 2-3 computes YR = b + cosθ·Yw. XR and YR are then sent to the host processor 1.

The source and destination processor numbers for this configuration are summarized as follows.
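The summary table itself does not survive in this text. The routing implied by the description of FIG. 8 can be sketched as below, assuming the numbering continues as PU 2-3 = 3 and PU 2-4 = 4 (the text states only 0, 1, and 2 explicitly). The dictionary layout and function names are hypothetical.

```python
import math

HOST = 0  # processor number of host processor 1

# Derived from the text: host -> PU 2-1 -> PU 2-2 -> PU 2-4 -> PU 2-3 -> host.
# Numbers 3 and 4 for PU 2-3 and PU 2-4 are an assumption of this sketch.
ROUTING = {
    1: {"source": 0, "destination": 2},  # PU 2-1
    2: {"source": 1, "destination": 4},  # PU 2-2
    4: {"source": 2, "destination": 3},  # PU 2-4
    3: {"source": 4, "destination": 0},  # PU 2-3
}

# Each stage adds one partial result to the message it forwards.
STAGES = {
    1: lambda m, t: {**m, "a": math.cos(t) * m["xw"]},           # a = cos(theta) * Xw
    2: lambda m, t: {**m, "xr": m["a"] - math.sin(t) * m["yw"]},  # XR = a - sin(theta) * Yw
    4: lambda m, t: {**m, "b": math.sin(t) * m["xw"]},           # b = sin(theta) * Xw
    3: lambda m, t: {**m, "yr": m["b"] + math.cos(t) * m["yw"]},  # YR = b + cos(theta) * Yw
}


def read_coordinates(xw, yw, theta):
    """Follow the routing table from the host through all four stages."""
    message = {"xw": xw, "yw": yw}
    current = next(n for n, r in ROUTING.items() if r["source"] == HOST)
    while current != HOST:
        message = STAGES[current](message, theta)
        current = ROUTING[current]["destination"]
    return message["xr"], message["yr"]
```

The sketch runs the stages one after another for a single coordinate; in the device the four PUs work on successive coordinates at the same time.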

FIG. 9 shows the image processing procedure of each PU. Before this procedure is executed, each PU is assumed to have received its program, processor numbers, and so on from the host processor 1 and stored them in its internal local memory 304. For convenience of explanation, the procedure of FIG. 9 is shown in a form general to all PUs. Step S20 reads the source processor number from the local memory 304. Step S21 waits for data from that source PU (or the host processor 1). When data is received, step S22 executes the part of the image processing assigned to this PU (for PU 2-1 in FIG. 8, for example, a = cosθ·Xw). Step S23 reads the destination processor number. Step S24 checks whether the destination is ready to receive data, and if so, step S25 transmits the processing result.
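The generic receive, process, and send loop of FIG. 9 can be approximated with threads and bounded queues. This is a sketch under assumptions: the queues stand in for the link interfaces, a full one-slot queue models a destination that is not yet ready (step S24), and the `None` shutdown marker is a hypothetical addition that the source does not describe.

```python
import queue
import threading


def pu_worker(inbox, outbox, work):
    """One PU: wait for data (S21), process it (S22), send the result (S24/S25)."""
    while True:
        item = inbox.get()           # S21: block until the source sends data
        if item is None:             # hypothetical shutdown marker
            outbox.put(None)
            return
        outbox.put(work(item))       # put() blocks while the receiver is not ready


def run_chain(stages, inputs):
    """Connect the stage functions into a pipeline and collect the final outputs."""
    links = [queue.Queue(maxsize=1) for _ in range(len(stages) + 1)]
    workers = [threading.Thread(target=pu_worker, args=(links[i], links[i + 1], f))
               for i, f in enumerate(stages)]

    def feed():
        for value in inputs:
            links[0].put(value)
        links[0].put(None)

    feeder = threading.Thread(target=feed)
    for t in workers + [feeder]:
        t.start()
    results = []
    while True:
        value = links[-1].get()
        if value is None:
            break
        results.append(value)
    for t in workers + [feeder]:
        t.join()
    return results
```

Because every stage runs in its own thread, a stage can start on the next item as soon as it has handed the previous one downstream, which is the overlap the pipeline configuration relies on.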

Thus, in the system of FIG. 8, the four PUs form a four-stage pipeline that as a whole realizes the rotation of one pixel.

Moreover, because the processing sequence is defined by the source and destination processor numbers in the local memory, each PU can process continuously, and high-speed image rotation is achieved.

<Modification> As described above, the choice between parallel processing by screen division and parallel processing by pipelining is made by the host processor 1 sending a processing procedure and processor numbers to each PU. Alternatively, these processing procedures may be stored in each PU in advance and selected by a command issued by the host processor 1.

The above embodiment has been described for four or six PUs, but it can readily be realized with any number of PUs 2 of two or more. It is also unnecessary to use all the PUs for a given process; only as many as are needed may be logically connected.

Although the PUs are connected in a grid here, the invention is readily applicable to other topologies, such as star or ring connections.

[Effects of the Invention] As described above, according to the parallel image processing device of the present invention, even when a plurality of image processing units are used, communication between the host processor and the processing units can be multiplexed, so processing efficiency does not fall.

[Brief explanation of the drawing]

FIG. 1 is an overall diagram of the parallel image processing device according to the embodiment; FIGS. 2 and 3 are diagrams explaining the prior art; FIG. 4 is a diagram showing an example of the image division used in the prior art and in this embodiment; FIG. 5 is a configuration diagram of the image processing LSI used in this embodiment; FIG. 6 is a diagram of the layout of the local memory; FIGS. 7 and 9 are flowcharts showing the control procedures of the embodiment; and FIG. 8 is a diagram explaining the PU configuration for image rotation.

In the figures: 1, host processor; 2-1 to 2-6, processing units (PUs); 3-1 to 3-6, communication paths; 4, memory; 5, image memory; 301, CPU; 304, local memory; 306 to 309, link interfaces (communication interfaces).

Patent applicant: Canon Inc.

Claims

[Claims]

(1) A parallel image processing device comprising: an image memory that stores image data to be image-processed; a plurality of image processing units that perform image processing; a host processor connected to the image memory and connected to the plurality of image processing units by a plurality of data transfer paths; and interface means for transferring, for each unit of image processing, the image data before processing and the image data after processing between the host processor and the individual image processing units via the plurality of data transfer paths.

(2) The parallel image processing device according to claim 1, wherein the unit of image processing is a divided image.