JPS63140379A

JPS63140379A - Parallel-picture processor

Info

Publication number: JPS63140379A
Application number: JP26640987A
Authority: JP
Inventors: Yoshiki Kobayashi; 芳樹小林; Tadashi Fukushima; 忠福島; Yoshiyuki Okuyama; 奥山　良幸
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1987-10-23
Filing date: 1987-10-23
Publication date: 1988-06-11

Abstract

PURPOSE: To obtain a parallel-picture processing processor that has an architecture suitable for LSI implementation and can attain high-speed processing, by dividing a local parallel-picture processor into modules that have a small number of input and output ports and a regular arrangement. CONSTITUTION: The picture information in the picture memory 3 of a picture processing system is processed by the parallel-picture processing processor 2, and the processed result is stored in the memory 3 or applied to a management processor 1 that controls the entire system. The processor 2 is constituted by combining four basic modules 10A-10D of the picture processing processor, each having processor elements 12. Picture data to be processed is fetched through the picture data input port 24 of the modules 10A-10D and processed by an input picture shift register 11, the elements 12, a partial sum arithmetic circuit 13 and a partial sum accumulation arithmetic circuit 14. The result processed in the respective modules 10A-10D is outputted from the picture data output port 25, so the processor 2 is easily implemented as an LSI.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a parallel image processing processor that performs local neighborhood image processing such as spatial product-sum operations, and particularly to a parallel image processing processor having an architecture suitable for LSI implementation.

Many image processing processors attempt to achieve higher speed by processing image data in parallel, as in those developed in the Ministry of International Trade and Industry's large-scale project "Pattern Information Processing System" (a collection of papers presenting its research and development results was published in October 1980).

Since image data has a two-dimensional extent, it is difficult to process all of the image data in parallel. However, many operations act on neighboring image data, such as the spatial product-sum operations that implement noise removal and contour extraction, so there are many examples in which local data of, for example, m rows by n columns of an image is processed in parallel.

Such local parallel image processing is comprehensively described in the above-mentioned literature and in Masatsugu Kidode, "Trends in Image Processing Hardware," Information Processing Computer Vision Study Group Material 8-6 (September 1980), but apart from the CCD analog processing type, none has been implemented as an LSI. Converting a processor of conventional architecture into an LSI as it is presents difficulties in terms of (1) the degree of integration and (2) the number of pins.

An object of the present invention is to provide a parallel image processing processor that has an architecture suitable for LSI implementation and is capable of high-speed processing.

A feature of the present invention lies in a parallel image processing processor that takes in image data from an image data source and performs local parallel image data processing, in which a plurality of image processing processor basic modules are arranged in parallel, each basic module comprising: an image data input port; a plurality of processor elements that perform image processing operations based on the input image; a plurality of first arithmetic circuits, each of which adds the operation result of its processor element to the operation result of the preceding processor element; an operation result data input port that receives the operation result data of the preceding basic module; a second arithmetic circuit that adds that operation result data to the operation result of the final-stage first arithmetic circuit; and an operation result data output port that outputs the operation result data of the second arithmetic circuit.

The present invention will be described below with reference to the illustrated embodiments. Note that FIGS. 1 to 8 and FIGS. 11 and 12 are explanatory diagrams of recently considered parallel image processing techniques, and FIGS. 9 and 10 show an embodiment of the present invention.

FIG. 1 shows the configuration of a typical image processing system, which includes an industrial television camera 5 as an image input device, an image memory 3 as an image storage device, and a CRT monitor 4 that displays the contents of the memory. The image information in the image memory 3 is processed by the image processing processor 2, and the result is either stored back in the image memory 3 or passed to the management processor 1, which controls the entire system.

A typical image processing function is the spatial product-sum operation. As shown in FIG. 2, local image data f11 to f44 of, for example, 4×4 pixels is multiplied by predetermined weights W11 to W44, and the products are summed.

This enables image processing such as noise removal and contour enhancement.
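As an illustration only, the spatial product-sum operation described above can be written as g = Σ fij * Wij over a 4×4 window. The following minimal Python sketch computes it directly; the function name and the sample data are hypothetical and do not come from the patent.

```python
def spatial_product_sum(window, weights):
    """Multiply each pixel f_ij by its weight W_ij and sum all products."""
    return sum(
        f * w
        for f_row, w_row in zip(window, weights)
        for f, w in zip(f_row, w_row)
    )


# Hypothetical 4x4 local image data f11..f44
window = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
# Hypothetical weights W11..W44 (all ones, i.e. a plain sum of the window)
weights = [[1] * 4 for _ in range(4)]

g = spatial_product_sum(window, weights)
print(g)
```

Choosing other weight values gives smoothing or edge-emphasizing behaviour; the computation itself stays the same.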

As an image processing processor that handles such local image data of, for example, 4×4 pixels, FIG. 3 shows a parallel image processing processor 2-I (called Type I) that combines four image processing processor basic modules 10, each having four processor elements (PE #1 to #4) 12. One column of local image data (f14 to f44 in FIG. 3) is supplied in parallel from the image memory 3, and the operation result (g in FIG. 3) is stored in the image memory 3.

The basic module 10 has an image data input port 24 that takes in the image data of the row to be processed and an operation result data output port 35 that outputs the internal processing result. When image data f14 is input, the neighboring pixels f13, f12 and f11, each offset by one pixel, are also supplied through the shift register 11 to the corresponding PEs (#4 to #1). Pixel f11 is output from the image data output port 25 so that the size of the spatial product-sum operation can be extended beyond 4×4. Each PE 12 receives the image data f to be processed from the shift register 11 and the weight data W from the weight storage memory 15, and performs a multiplication. The arithmetic circuit 13 adds the results of the four PEs 12 to form a partial sum. The partial sum input from the operation result input port 30 is accumulated in turn by the arithmetic circuit 14 and output from the operation result output port 35 to the basic module 10 of the next stage.

In this way, by cascading four stages of the basic module 10, the result g is output from the final basic module 10D.

The time chart is shown in FIG. 4. The operations described above are executed within the basic clock time Δt1 and the result g is output; in the next Δt1, the result g for the 4×4-pixel input image shifted by one pixel is output. The spatial product-sum results for all 4×4-pixel windows of the successively input image data are therefore output one after another.
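The Type I data flow described above (shift register 11, four PEs 12, partial sum circuit 13, accumulation circuit 14, four cascaded modules) can be sketched in software as follows. This is a behavioural model under simplifying assumptions, not the circuit itself: the class and function names are hypothetical, and everything inside one basic clock is modelled as a single Python call.

```python
from collections import deque


class BasicModule:
    """Behavioural model of one Type I basic module (one image row)."""

    def __init__(self, row_weights):
        self.weights = list(row_weights)  # weights for PE #1..#4
        n = len(self.weights)
        self.shift = deque([0] * n, maxlen=n)  # models shift register 11

    def step(self, pixel, partial_in):
        self.shift.append(pixel)  # newest pixel reaches PE #4, oldest PE #1
        products = [f * w for f, w in zip(self.shift, self.weights)]  # PEs 12
        row_sum = sum(products)  # partial sum (arithmetic circuit 13)
        return partial_in + row_sum  # accumulation (arithmetic circuit 14)


def run_type1(image, weights):
    """Feed the image one column per clock through the cascaded modules."""
    modules = [BasicModule(w) for w in weights]
    window_width = len(weights[0])
    results = []
    for col in range(len(image[0])):
        acc = 0  # operation result input of the first module
        for module, row in zip(modules, image):
            acc = module.step(row[col], acc)
        if col >= window_width - 1:  # window is fully loaded
            results.append(acc)
    return results


# Hypothetical 4-row strip of an image, 6 pixels wide
image = [[(r * 7 + c * 3) % 11 for c in range(6)] for r in range(4)]
weights = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(run_type1(image, weights))
```

One result appears per clock once the first window is loaded, which mirrors the one-pixel shift per Δt1 described for FIG. 4.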

The embodiment shown in FIG. 5 is a configuration in which the basic clock time Δt1 of the Type I image processing processor 2-I described above is shortened by pipeline processing. It is called the pipeline version of Type I, the parallel image processing processor 2-IP. In Type I, the basic clock time Δt1 had to be at least the sum of all of the following processing times: (1) input of the image data f to the shift register 11; (2) multiplication of the product-sum weight W and the image data f by the processor elements 12; (3) partial sum processing by the arithmetic circuit 13; and (4) partial sum accumulation by the arithmetic circuit 14.

By contrast, as in the example of FIG. 5, interposing pipeline registers 16 between (1) and (2), between (2) and (3), and between (3) and (4) makes it possible to reduce the basic clock time Δt2 to the largest of the processing times of (1) to (4), rather than their sum. The time chart is shown in FIG. 6.
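The difference between the two clock constraints can be checked with a short calculation. The stage delays below are hypothetical values chosen only for illustration; the patent gives no figures.

```python
# Hypothetical processing times of the four steps, in nanoseconds
stage_delay = {
    "shift_register_input": 20,  # step (1)
    "multiplication": 60,        # step (2)
    "partial_sum": 40,           # step (3)
    "accumulation": 50,          # step (4)
}

# Type I: every step must finish within one basic clock
dt1 = sum(stage_delay.values())

# Type IP: pipeline registers separate the steps, so the slowest one sets the clock
dt2 = max(stage_delay.values())

print(dt1, dt2)
```

With these assumed numbers the pipelined clock is set by the multiplication alone, so throughput improves by the ratio dt1 / dt2 while the latency of a single result stays about the same.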

Process (1) is executed at time 1, (2) at time 2, (3) at time 3, and (4) at time 4. At time 2, process (1) is executed for the next input image, followed by (2) at time 3, (3) at time 4, and (4) at time 5. Operating the components in pipeline fashion one after another in this way improves the processing speed.
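The overlap of successive inputs in the pipeline can be tabulated with a small sketch. The four steps are the shift register input, the multiplication, the partial sum and the accumulation; the function name and the stage labels are hypothetical.

```python
STAGES = ["(1) shift-in", "(2) multiply", "(3) partial sum", "(4) accumulate"]


def pipeline_schedule(num_inputs):
    """Map each clock time to the (input index, stage) pairs active at that time."""
    schedule = {}
    for k in range(num_inputs):  # input k enters the pipeline at time k + 1
        for s, name in enumerate(STAGES):
            schedule.setdefault(k + 1 + s, []).append((k, name))
    return schedule


for time, active in sorted(pipeline_schedule(2).items()):
    print(time, active)
```

At time 2 the first input is in step (2) while the second input is already in step (1), which is the overlap shown in the FIG. 6 time chart.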

The embodiment of FIG. 7 shows a configuration that can further shorten the basic clock Δt2 of the parallel image processing processor 2-IP described above; it is called the pipeline-skew version of Type I, the parallel image processing processor 2-IPS. The basic clock time Δt2 of the IP type in FIG. 5 is very likely to be limited by the partial sum accumulation time of process (4). This is because, when the basic module 10 is cascaded in n stages, Δt2 must be n times the sum of the processing time in the arithmetic circuit 14 and the input/output time of the operation results at ports 30 and 35. The input/output delay cannot be ignored, particularly when the basic module 10 is implemented as an LSI. For this reason, pipeline registers 16 are additionally inserted in the partial sum accumulation path of the type IP of FIG. 5, so that the operations between the basic modules 10A-D are also pipelined, which relaxes the above time constraint on Δt2 to 1/n. In the IPS type of FIG. 7, as shown in the time chart of FIG. 8, the partial sums of the basic modules 10A-D are all computed at the same time 3, so the timing in the accumulation part no longer matches. In the IPS of FIG. 7, a variable-stage skew correction shift register 17 for this timing alignment is placed immediately after the image data input port 24. Because the accumulation path of each basic module 10A-D has one pipeline stage, the number of stages of the variable-stage skew correction shift register 17 is set to 0 for basic module 10A, 1 for 10B, 2 for 10C, and 3 for 10D. In this way the mismatch in the time chart of FIG. 8 (the dotted part) is corrected, and continuous pipeline operation at intervals of Δt3 becomes possible.
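The need for the skew correction can be reproduced with a small behavioural model. It assumes, as a simplification, one pipeline register per module in the accumulation path and represents each module's partial sums as a precomputed list; the function name and the sample values are hypothetical.

```python
def accumulate_with_pipeline(row_sums, skew):
    """Model the pipelined accumulation path across cascaded modules.

    row_sums[k][c] is the partial sum of module k for window c.
    skew[k] is the number of delay stages placed in front of module k.
    Returns the value at the last module's output for each clock.
    """
    n = len(row_sums)
    length = len(row_sums[0])
    regs = [0] * n  # pipeline register after each module's accumulator
    out = []
    for t in range(length + max(skew)):
        new_regs = []
        for k in range(n):
            c = t - skew[k]  # window whose partial sum module k has at clock t
            own = row_sums[k][c] if 0 <= c < length else 0
            incoming = regs[k - 1] if k > 0 else 0  # previous module, one clock old
            new_regs.append(incoming + own)
        regs = new_regs
        out.append(regs[-1])
    return out


# Hypothetical partial sums of modules A..D for three consecutive windows
row_sums = [[1, 2, 3], [10, 20, 30], [100, 200, 300], [1000, 2000, 3000]]

aligned = accumulate_with_pipeline(row_sums, skew=[0, 1, 2, 3])
unaligned = accumulate_with_pipeline(row_sums, skew=[0, 0, 0, 0])
print(aligned[3:])  # outputs after the pipeline has filled
print(unaligned)
```

With delays of 0, 1, 2 and 3 stages each output combines partial sums of the same window, while without them every output mixes partial sums from different windows.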

As is readily seen, the timing mismatch is resolved in the same way if the skew register 17 is placed immediately after the arithmetic circuit 13 that computes the partial sum, or immediately before or after each PE 12.

FIG. 9 shows an embodiment of a parallel image processing processor according to the present invention. In the Type I configurations described so far, the input image data was distributed through the shift register 11 so that adjacent pixels were supplied to PEs 12 #1 to #4. In this embodiment, by contrast, the input image data is supplied in common to PEs 12 #1 to #4, and the multiplication results are accumulated through the arithmetic circuits 18 and registers 19 to output the partial sum Σ1. This operation is explained with reference to the time chart of FIG. 10.

At time 1, image data f11 is input from the image data input port 24, PE 12 #1 forms the product f11*W11 with the weight W11 read from the weight storage memory 15, and this product is set in register 19 #2.

At time 2, image data f12 is input and PE 12 #2 forms the product f12*W12 with the weight W12. The arithmetic circuit 18 adds this to the value f11*W11 held in register 19 #2, and the sum f11*W11 + f12*W12 is set in register 19 #3.

At time 3, image data f13 is input and PE 12 #3 forms the product f13*W13 with the weight W13. The arithmetic circuit 18 adds this to the value f11*W11 + f12*W12 held in register 19 #3, and the sum f11*W11 + f12*W12 + f13*W13 is set in register 19 #4.

At time 4, image data f14 is input and PE 12 #4 forms the product f14*W14 with the weight W14. The arithmetic circuit 18 adds this to the value f11*W11 + f12*W12 + f13*W13 held in register 19 #4, giving the partial sum Σ1 = f11*W11 + ... + f14*W14. This partial sum Σ1 is accumulated by the arithmetic circuits 14 of the basic modules 10A-D, and the result g is output from the final stage.
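The per-row behaviour of this configuration, in which every PE receives the same pixel and the running sum moves from register to register, can be sketched as follows. It is a simplified model of a single basic module; the function name and the sample data are hypothetical.

```python
def type2_row_sums(pixels, weights):
    """Partial sums of one image row with the input broadcast to every PE."""
    n = len(weights)
    regs = [0] * n  # regs[j] models the register 19 feeding PE #(j+1); regs[0] stays 0
    sums = []
    for t, f in enumerate(pixels):
        # Each PE multiplies the common pixel by its own weight (PE 12) and
        # adds the value handed over by the previous stage (arithmetic circuit 18).
        stage_out = [regs[j] + f * weights[j] for j in range(n)]
        if t >= n - 1:
            sums.append(stage_out[-1])  # the last PE completes one window per clock
        regs = [0] + stage_out[:-1]  # results move on to the next register 19
    return sums


# Hypothetical row f11..f15 and weights W11..W14
print(type2_row_sums([1, 2, 3, 4, 5], [1, 10, 100, 1000]))
```

Compared with Type I, the shift register on the pixel side disappears and the delay is carried by the partial sums instead, yet each window still yields the same product-sum.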

Thereafter, a spatial product-sum result g is output at every basic clock time Δt4.

For this Type II parallel image processing processor 2-II, Types IIP and IIPS can also be considered, as with Type I, making it possible to reduce the basic clock time Δt4. These can easily be inferred by analogy and are omitted here.

FIG. 11 shows another embodiment with yet another form of processing.

In contrast to the schemes described so far, in which a product-sum weight (memory) 15 was provided independently for each PE 12, the configuration of FIG. 11 provides a product-sum weight (memory) 15 common to all PEs 12. It is called the Type III parallel image processing processor 2-III. Its operation is explained with reference to the time chart of FIG. 12.

First, assume that at time 1 image data f14 has already been input from the image data input port 24. At this point f11, f12, f13 and f14 are supplied through the shift register 11 to PEs 12 #1 to #4, respectively. The weight W11 is then read from the weight storage memory 15 and multiplied with each of these input image data. In the arithmetic circuits 20, the value held at the start of time 1 is cleared to "0", and the products of f11 to f14 with W11 are held respectively.

At time 2, image data f15 is input, f12 to f15 are supplied to PEs 12 #1 to #4, respectively, and the products with the next weight W12 are formed. The arithmetic circuits 20 then accumulate these with the previous values. For example, #1 holds f11*W11 + f12*W12 and #2 holds f12*W11 + f13*W12 as the result.

The same processing is executed at times 3 and 4, and the arithmetic circuits 20 #1 to #4 obtain their respective first partial sums:
#1: Σ^1_1 = f11*W11 + f12*W12 + f13*W13 + f14*W14
#2: Σ^1_2 = f12*W11 + f13*W12 + f14*W13 + f15*W14
#3: Σ^1_3 = f13*W11 + f14*W12 + f15*W13 + f16*W14
#4: Σ^1_4 = f14*W11 + f15*W12 + f16*W13 + f17*W14
These are set in the shift register 21 at the end of time 4.

At times 5 to 8, the partial sums Σ^1_1 to Σ^4_1, Σ^1_2 to Σ^4_2, Σ^1_3 to Σ^4_3, and Σ^1_4 to Σ^4_4 from the shift registers 21 of the basic modules 10A-D are accumulated in turn by the arithmetic circuits 14, and the results g11 to g14 are output.

At the same time, the same processing as at times 1 to 4 is executed on image data f15 to f18 in PE #1, f16 to f19 in PE #2, f17 to f1,10 in PE #3, and f18 to f1,11 in PE #4, yielding the partial sums Σ^1_5, Σ^1_6, Σ^1_7 and Σ^1_8. These are accumulated at times 9 to 12 to give the results g15 to g18. In this way, spatial product-sum results are output continuously.
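The Type III scheme, in which one weight per clock is shared by all PEs and each PE accumulates the sum for its own window position, can be modelled per row as follows. This is a simplified sketch of a single basic module; the function name and the sample data are hypothetical, and the overlap between accumulation and output is not modelled.

```python
def type3_row_sums(pixels, weights):
    """Partial sums of one image row with a single weight shared by all PEs."""
    n = len(weights)
    num_windows = len(pixels) - n + 1
    sums = []
    for base in range(0, num_windows, n):  # one group of n windows per n clocks
        active = min(n, num_windows - base)  # PEs that have a complete window
        acc = [0] * active  # accumulators (arithmetic circuits 20), cleared to 0
        for j, w in enumerate(weights):  # the common weight changes every clock
            for p in range(active):  # PE #(p+1) sees the pixel at base + p + j
                acc[p] += pixels[base + p + j] * w
        sums.extend(acc)  # handed over to the partial sum output shift register 21
    return sums


# Hypothetical row f11..f18 and weights W11..W14
print(type3_row_sums([1, 2, 3, 4, 5, 6, 7, 8], [1, 10, 100, 1000]))
```

Each group of four windows takes four clocks to accumulate, so the average rate is still one window per clock, as in the other types.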

For this Type III parallel image processing processor 2-III as well, Types IIIP and IIIPS can be considered, as with Type I, making it possible to reduce the basic clock time Δt5.

In the Type I to III embodiments described above, the operations between the basic modules 10 are performed by connecting the partial sum arithmetic circuits 14 in series, and this circuit 14 is also included in the basic module. However, if the number of pins becomes a problem for LSI implementation, it is also possible, for example, to make only the dotted-line portion of FIG. 3 the basic module and to perform the inter-module operations externally in parallel.

According to the present invention, a local parallel image processor can be divided into modules that have few input/output ports and a regular arrangement, so an architecture suitable for LSI implementation can be achieved.

[Brief explanation of the drawing]

FIG. 1 is a diagram showing the configuration of an image processing system, FIG. 2 is a diagram explaining an example of local parallel processing, FIGS. 3, 5, 7, 9 and 11 are block diagrams showing the configuration of the parallel image processing processor of the present invention, and FIGS. 4, 6, 8, 10 and 12 are time charts of the parallel image processing processor.
2: parallel image processing processor
3: image memory
10: image processing processor basic module
11: input image shift register
12: processor element
13: partial sum arithmetic circuit
14: partial sum accumulation arithmetic circuit
15: weight storage memory
16: pipeline register
17: (variable-stage) skew correction shift register
18: propagation/accumulation arithmetic circuit
19: propagation register
20: accumulation arithmetic circuit
21: partial sum output shift register
24: image data input port
25: image data output port
30: operation result data input port
35: operation result data output port

Claims

[Claims]

1. A parallel image processing processor that takes in image data from an image data source and performs local parallel image data processing, characterized in that a plurality of image processing processor basic modules are arranged in parallel, each basic module comprising: an image data input port; a plurality of processor elements that perform image processing operations based on the input image; a plurality of first arithmetic circuits, each of which adds the operation result of its processor element to the operation result of the preceding processor element; an operation result data input port that receives the operation result data of the preceding basic module; a second arithmetic circuit that adds that operation result data to the operation result of the final-stage first arithmetic circuit; and an operation result data output port that outputs the operation result data of the second arithmetic circuit.