JP2812292B2

JP2812292B2 - Image processing device

Info

Publication number: JP2812292B2
Application number: JP8060049A
Authority: JP
Inventors: 信行山下
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1996-02-22
Filing date: 1996-02-22
Publication date: 1998-10-22
Anticipated expiration: 2016-02-22
Also published as: JPH09231347A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an image processing apparatus, and more particularly to a parallel image processing apparatus that performs binary image processing at high speed.

[0002]

[Description of the Related Art] FIG. 8 shows the configuration of a conventional parallel image processing apparatus. Referring to FIG. 8, in the conventional apparatus, a processor array 100 is formed by connecting, in a one-dimensional arrangement, a plurality of processors 200, each consisting of a local memory 3 that stores image data and processing results, an arithmetic unit 4, and a register file 5. An instruction supply circuit 7 consists of a program memory 8 and a sequence controller 9.

[0003] In each processor 200, the register file 5 and the local memory 3 are connected to each other by an address line 11 and a data line 12. Each processor 200 has an indirect addressing function: it accesses the local memory 3 using, as the address, the contents of a register selected by the instruction from among the plurality of registers in the register file 5.

[0004] Each processor 200 accesses the local memory 3 within that processor using the contents of a register in the register file 5 as the address, processes the data so accessed, and writes the result back to the local memory 3. This sequence of memory access and processing can be executed simultaneously in all processors.
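The indirect addressing described above can be sketched in a few lines of Python. This is only an illustrative model, not the patented hardware: the function name `simd_indirect_update` and the list-based memories are assumptions, and the loop stands in for what all processors would do in the same cycle.

```python
def simd_indirect_update(local_mems, addr_regs, op):
    """Model of one SIMD step with indirect addressing.

    local_mems[p] is the local memory of processor p and addr_regs[p] is
    the address held in that processor's selected register. Every
    processor reads its own memory at its own address, applies `op`, and
    writes the result back to the same location.
    """
    for mem, addr in zip(local_mems, addr_regs):  # conceptually simultaneous
        mem[addr] = op(mem[addr])
    return local_mems


mems = [[10, 20, 30], [40, 50, 60]]
result = simd_indirect_update(mems, [2, 0], lambda v: v + 1)
```

Each processor uses a different address (2 and 0 here) while executing the same operation, which is the point of per-processor indirect addressing in a SIMD array.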

[0005] In image processing, data is not always processed as multi-valued data; binarized data is frequently processed partway through. For example, in operations such as dilation, erosion, isolated point removal, and thinning, binary data is the processing target, and the post-processing value of a target pixel is determined by operating on the values of its neighboring pixels, for example its 3×3 neighborhood.

[0006] When the conventional parallel image processing apparatus shown in FIG. 8 executes binary image processing in which the value of a pixel is determined by the pixel values of its neighborhood, the processing proceeds as follows.

[0007] First, a given processor loads the pixel data required for processing from its memory (local memory). Next, neighboring data is transferred from the neighboring processors, so that the data of the neighboring pixels is gathered on the processor holding the target pixel. Various logical operations are then performed on each of the gathered 1-bit values to determine the post-processing value of the target pixel, and finally that value is written to memory.

[0008] In this processing, the gathered 1-bit data may first be bit-packed in each processor into multi-bit data, with logical operations then applied to the bit-packed data. This allows faster processing than performing a logical operation on each bit individually.

[0009] Alternatively, instead of performing logical operations on the bit-packed data, faster processing is sometimes possible with table reference processing, in which a memory (that is, a table) is accessed using the bit-packed data as the address and the data stored in advance at that address (the result of the logical operation) is read out. To perform this table reference processing, local memory in each processor is allocated as memory for storing the table data, and the allocated memory is accessed using the indirect addressing described above.

[0010]

[Problems to Be Solved by the Invention] However, the conventional parallel image processing apparatus described above has the problem that bit-packing binary image data takes a long time, which prevents high-speed processing. This is because bit packing requires many operations: neighboring pixel values are transferred between processors one bit at a time, and each gathered pixel value must then be moved to the desired bit position by a shift operation or the like before a logical operation combines it into the packed word.
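To make the cost of conventional bit packing concrete, the following Python sketch packs eight 1-bit values with one shift and one OR per value. It is a rough illustration only: the function name and the operation count are assumptions, and the count ignores the inter-processor transfers that the text also identifies as a cost.

```python
def pack_by_shift_or(neighbours):
    """Pack eight 1-bit values (MSB first) one at a time.

    Returns the packed word and an illustrative count of ALU operations,
    charging one shift and one OR for every neighbour bit.
    """
    word, ops = 0, 0
    for position, bit in zip(range(7, -1, -1), neighbours):
        word |= bit << position  # shift into place, then OR into the word
        ops += 2
    return word, ops


word, ops = pack_by_shift_or([1, 0, 0, 0, 0, 0, 0, 1])
```

Under this simple model, sixteen ALU operations are spent per pixel on packing alone, before any useful logic is applied to the packed word.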

[0011] In addition, in the conventional parallel image processing apparatus described above, when every processor uses the same table data, each processor holds its own copy of that table data, which wastes local memory.

[0012] The present invention has been made to solve the above problems, and its object is to provide a parallel image processing apparatus capable of performing binary pixel processing faster than conventional apparatuses.

[0013]

[Means for Solving the Problems] To achieve this object, the present invention provides a parallel image processing apparatus in which a plurality of processors are arranged in an array, wherein each processor can directly reference data in predetermined registers of its neighboring processors and has means for bit-packing the data in the predetermined registers of the neighboring processors together with the data in predetermined registers within the processor itself.

[0014] The present invention is further characterized in that the processor has means for, during the bit-pack operation, shifting data among the registers that store the data used for bit packing and reading, from memory into a register, the data to be used in the next bit-pack operation.

[0015] The present invention is further characterized by comprising, as a table reference memory, a memory having one write port and a number of read ports equal either to the total number of processors or to the total number of processors divided by a predetermined positive number.

[0016] The principle and operation of the present invention are as follows. The invention concerns a program-executable parallel image processing apparatus in which a processor array composed of a plurality of processors can read or write one image row of data from or to the image memory at a time. The apparatus has means for bit-packing binary image data in block units and, further, means for loading from memory the data to be bit-packed next at the same time as the bit-pack operation, preferably within the same instruction cycle. This speeds up the bit-pack operation while limiting the increase in circuit scale, so that high-speed image processing can be performed on binary image data. Furthermore, a memory with a one-write/multi-read port configuration is used for table reference processing with the bit-packed data, so that all processors can read it. Compared with using the local memory of each processor as table memory, this greatly reduces memory usage.

[0017]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 shows the configuration of one embodiment of the image processing apparatus of the present invention.

[0018] Referring to FIG. 1, a processor array 1 is formed by connecting, in a one-dimensional arrangement, a plurality of processors 2, each consisting of a local memory 3, an arithmetic unit 4, a register file 5, and a bit pack block 6 that performs bit-pack processing. An instruction supply circuit 7 consists of a sequence controller 8 and a program memory 9; it reads an instruction from the program memory 9 at the program address generated by the sequence controller 8 and supplies the contents read out to the processor array 1 as an instruction. A host processor 10 performs tasks such as loading data into the program memory 9 and the local memory 3.

[0019] The image processing apparatus of this embodiment is program-executable and is a SIMD (single instruction, multiple data stream) parallel image processing apparatus, which operates all processors 2 (also called processor elements, or "PEs") with the same instruction stream. Its instruction set includes arithmetic and logic instructions, shift instructions, flag manipulation instructions, inter-PE data transfer instructions, and load/store instructions.

[0020] FIG. 2(A) shows the 3×3 neighborhood centered on P(i, j), where P(i, j) denotes the pixel in row j and column i of the image. Assume that the local memory 3 of each processor 2 stores one column of the image, one pixel per location. That is, the i-th processor 2 proceeds with processing while holding the data of image column i in its local memory 3. In the following, a processor 2 is denoted PE (processor element), the local memory is denoted M, the total number of processors is n, and the index of a processor element is i; the i-th PE is written PE(i), and row j of the local memory of the i-th PE is written M(i, j).

[0021] If the data of image row j is stored in row j of the local memory, then local memory M(i, j) holds pixel P(i, j).
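The column-per-PE layout described above can be illustrated with a short Python sketch. The function name `distribute_columns` and the nested-list image representation are assumptions made for illustration.

```python
def distribute_columns(image):
    """Split an image into per-PE local memories.

    image[j][i] is pixel P(i, j) (row j, column i). The returned list M
    satisfies M[i][j] == P(i, j): PE(i) holds column i, and row j of its
    local memory holds the pixel from image row j.
    """
    rows, cols = len(image), len(image[0])
    return [[image[j][i] for j in range(rows)] for i in range(cols)]


image = [[1, 0, 1],
         [0, 1, 0]]
local_memories = distribute_columns(image)
```

With this layout, reading row j from every local memory at once yields one full image row across the processor array.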

[0022] Let P(i, j) be the pixel processed by the i-th processor element PE(i). The 3×3 neighborhood centered on P(i, j) then consists of P(i-1, j-1), P(i, j-1), P(i+1, j-1), P(i-1, j), P(i, j), P(i+1, j), P(i-1, j+1), P(i, j+1), and P(i+1, j+1). In FIG. 2, the center pixel P(i, j) is shown as P(k, m), that is, with i = k and j = m.

[0023] The following describes the case in which the eight pixels that remain when the target (center) pixel P(i, j) is excluded from these nine pixels are bit-packed into 8-bit data, as shown in FIG. 2(B).

[0024] FIG. 3 shows a configuration for bit-packing a 3×3 area. It shows the registers of the processor element PE(i) that performs the bit packing and their connections to the PEs on either side, PE(i-1) and PE(i+1). In FIG. 3, the index i is shown as k.

[0025] To bit-pack a 3×3 area, PE(i) needs three bits of data, P(i-1, j-1), P(i-1, j), and P(i-1, j+1), from the left processor PE(i-1), and three bits of data, P(i+1, j-1), P(i+1, j), and P(i+1, j+1), from the right processor PE(i+1). Likewise, the left PE(i-1) needs three bits each from PE(i-2) and PE(i), and the right PE(i+1) needs three bits each from PE(i) and PE(i+2).

[0026] That is, PE(i) is provided with 3-bit data input lines from, and 3-bit data output lines to, each of its left and right neighbors PE(i-1) and PE(i+1).

[0027] Because this configuration is common to all PEs, each PE is provided, for bit packing, with a 6-bit data transfer line 20 at its connection to each adjacent PE, consisting of a 3-bit data input line and a 3-bit data output line.

[0028] Each PE has three 1-bit registers 21, namely r1, r2, and r3, and these three registers are configured so that they can be referenced directly from the adjacent PEs.

[0029] Consequently, each PE can use the three 1-bit registers r1, r2, and r3 of the PEs on either side as sources without any data being transferred between adjacent PEs by a data transfer instruction. That is, during bit-pack processing, the three 1-bit registers r1, r2, and r3 of both neighboring PEs can also be specified as source operands, with a register, memory location, or the like in the PE itself as the destination.

[0030] Furthermore, as shown in FIG. 3, the 1-bit registers r1 to r3 within each PE are arranged so that a selector 22 allows a 1-bit logical left shift and a 1-bit logical right shift among them. In the following, the 1-bit registers r1, r2, and r3 of PE(i) are written r1(i), r2(i), and r3(i), respectively.

[0031] First, a load instruction for row (j-1) of the memory transfers the data of row (j-1) from the memory to the processor array, into r1 of each PE.

[0032] Next, in the same way, load instructions transfer the data of rows j and (j+1) of the memory into r2 and r3 of each PE.

[0033] At this point, the 1-bit register r1(i) of PE(i) holds M(i, j-1), r2(i) holds M(i, j), and r3(i) holds M(i, j+1).

[0034] When a bit-pack instruction is now executed with the destination register 23 of PE(i) designated rd(i), the contents of r1(i-1), r1(i), r1(i+1), r2(i-1), r2(i+1), r3(i-1), r3(i), and r3(i+1) are stored in rd(i), in that order from the MSB (bit 7) to the LSB (bit 0).

[0035] As a result, of the binary data in the 3×3 pixel area centered on P(i, j), the data of the eight pixels other than the center pixel P(i, j) is bit-packed and stored in rd(i).
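The bit ordering of the pack operation described above can be modelled in software as follows. This is a minimal sketch, not the hardware itself: the function name `bit_pack` and the use of Python lists for the per-PE 1-bit registers are assumptions, and the end PEs are simply skipped here.

```python
def bit_pack(r1, r2, r3):
    """Pack the 8 neighbours of each interior PE into one byte.

    r1, r2, r3 are lists of 0/1 values, one entry per PE, holding image
    rows j-1, j and j+1. From bit 7 (MSB) down to bit 0 (LSB) the result
    holds r1(i-1), r1(i), r1(i+1), r2(i-1), r2(i+1), r3(i-1), r3(i),
    r3(i+1). End PEs are left as None in this sketch.
    """
    n = len(r1)
    rd = [None] * n
    for i in range(1, n - 1):  # in hardware, all interior PEs act at once
        bits = [r1[i - 1], r1[i], r1[i + 1],
                r2[i - 1], r2[i + 1],
                r3[i - 1], r3[i], r3[i + 1]]
        word = 0
        for b in bits:
            word = (word << 1) | b
        rd[i] = word
    return rd


r1 = [1, 0, 0, 0]
r2 = [0, 1, 0, 1]
r3 = [0, 0, 1, 0]
packed = bit_pack(r1, r2, r3)  # [None, 129, 26, None]
```

For PE(1) the set bits are the upper-left and lower-right neighbours, giving 0b10000001 = 129; the center value r2(1) does not appear in the packed word.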

[0036] Of the n PEs {PE(i) : 0 ≤ i ≤ n-1}, the above bit-pack processing is executed simultaneously in the PEs PE(i) in the range 1 ≤ i ≤ n-2.

[0037] The PEs at both ends of the one-dimensional processor array 1 are handled as exceptions. FIG. 4 illustrates the bit-pack operation in these end PEs.

[0038] For PE(0) and PE(n-1) at the two ends of a processor array consisting of a one-dimensional array of n processor elements, the neighbors PE(-1) and PE(n), respectively, do not exist. Depending on the application, three cases are typical: (1) PE(0) and PE(n-1) are regarded as adjacent to each other and each requires the other's data; (2) "0" must be input as data from the direction in which no adjacent PE exists; and (3) "1" must be input as data from the direction in which no adjacent PE exists.

[0039] In this embodiment, therefore, one of these three modes is set in advance in a mode register 25 by an instruction. For PE(0), a selector 24 that takes the 2-bit value so set as its selection signal can select the values of the 1-bit registers r1, r2, and r3 of PE(n-1), or "0", or "1". Similarly, for PE(n-1), a selector 24 that takes the output of the mode register (m1) 25 as its selection signal can select the values of the 1-bit registers r1, r2, and r3 of PE(0), or "0", or "1".
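The three boundary modes can be sketched as a function that returns, for one 1-bit register across all PEs, the value each PE sees from its left and right neighbor. The mode constants `WRAP`, `ZERO`, and `ONE` and their numeric encodings are hypothetical; the source only states that the mode register holds a 2-bit value.

```python
WRAP, ZERO, ONE = 0, 1, 2  # hypothetical encodings of the 2-bit mode value


def neighbour_bits(reg, mode):
    """Return (left, right) neighbour values for one 1-bit register.

    reg[i] is the register value in PE(i). left[i] is what PE(i) sees
    from PE(i-1) and right[i] what it sees from PE(i+1). At the two ends
    the missing neighbour is replaced according to `mode`.
    """
    n = len(reg)
    if mode == WRAP:      # PE(0) and PE(n-1) treated as adjacent
        left_edge, right_edge = reg[n - 1], reg[0]
    elif mode == ZERO:    # constant 0 from the missing side
        left_edge, right_edge = 0, 0
    else:                 # constant 1 from the missing side
        left_edge, right_edge = 1, 1
    left = [left_edge] + reg[:-1]
    right = reg[1:] + [right_edge]
    return left, right


reg = [1, 0, 0]
wrapped = neighbour_bits(reg, WRAP)
zeroed = neighbour_bits(reg, ZERO)
oned = neighbour_bits(reg, ONE)
```

Only the first entry of `left` and the last entry of `right` depend on the mode, which mirrors the fact that the selector 24 is needed only at the two end PEs.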

[0040] Once bit packing of the 3×3 pixel areas centered on the pixels of row j is complete and processing moves on to the next row, row (j+1), the data of the rows before and after it, namely rows j and (j+2), is required.

[0041] The data of rows j and (j+1) is already held in the 1-bit registers r2 and r3, respectively, and can be reused, while the data of row (j-1) stored in the 1-bit register r1 is no longer needed. Suppose a load instruction now transfers the data of row (j+2) into r1. In the first bit-pack operation the center row was held in r2, the row above it (center row - 1) in r1, and the row below it (center row + 1) in r3; in the next bit-pack operation the center row is held in r3, the row above it in r2, and the row below it in r1.

[0042] To perform bit packing with r1, r2, and r3 in this state, the contents of r2(i-1), r2(i), r2(i+1), r3(i-1), r3(i+1), r1(i-1), r1(i), and r1(i+1) must be stored in the destination register rd(i), in that order from the MSB (most significant bit) to the LSB (least significant bit). Each bit position of rd(i) therefore receives a different register from the one it received in the first bit-pack operation.
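The rotation of register roles that arises when each new row simply overwrites the oldest register can be tabulated with a small Python sketch. The function name `register_roles` is an assumption introduced for illustration.

```python
def register_roles(step):
    """Return (above, centre, below) register names after `step` row advances.

    Assumes each newly loaded row overwrites the register holding the
    oldest row, so the roles rotate through r1, r2, r3 with period 3.
    """
    regs = ["r1", "r2", "r3"]
    k = step % 3
    return tuple(regs[k:] + regs[:k])


roles = [register_roles(s) for s in range(4)]
# step 0: ('r1', 'r2', 'r3')   first pack: centre row in r2
# step 1: ('r2', 'r3', 'r1')   next pack:  centre row in r3
# step 2: ('r3', 'r1', 'r2')
# step 3: same as step 0
```

Three distinct assignments occur before the pattern repeats, which is why this approach needs a different pack ordering for each of three consecutive rows.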

[0043] Likewise, bit packing of the 3×3 pixel areas centered on the following row, row (j+2), requires processing that differs from that for the areas centered on rows j and (j+1).

[0044] A circuit that realizes these three kinds of bit-pack processing needs, to produce the result at each bit position of the packed data, a 3-input, 1-output selector circuit that selects one of the 1-bit registers r1, r2, and r3. Eight such selectors are required, one for each bit of the packed word (8 bits).

[0045] The 3×3 pixel case has been described here. In general, when bit packing of an m×m pixel area excluding its center pixel is performed by the above method, m kinds of bit-pack instructions and (m²-1) m-input, 1-output selector circuits are required (this is referred to as "Method A"), so both the number of instructions and the circuit scale become large.

[0046] To solve this problem, in this embodiment the contents of the 1-bit registers r1, r2, and r3 of each PE are transferred among those registers within the same instruction cycle as the bit-pack operation, and the data needed for the next operation is read from memory. More specifically, three bit-pack instructions are provided. The first is the "pack & shift up & load instruction", which, simultaneously with the bit-pack operation described above, copies r1←r2 and r2←r3 and transfers data from memory into r3. The second is the "pack & shift down & load instruction", which, simultaneously with the bit-pack operation, copies r1→r2 and r2→r3 and transfers data from memory into r1. The third is the "pack instruction", which performs only the bit-pack operation. These are used selectively according to the purpose of the processing.
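The three instructions can be sketched as Python functions acting on a list `regs = [r1, r2, r3]`, where each entry is a list of 1-bit values across the PEs. This is an illustrative software model under stated assumptions: the function names are invented, end PEs are omitted from the packed result, and what the hardware does in one instruction cycle is modelled as sequential steps.

```python
def pack(r1, r2, r3):
    """Packed 8-bit neighbour word for every interior PE (centre excluded)."""
    out = []
    for i in range(1, len(r1) - 1):
        bits = (r1[i - 1], r1[i], r1[i + 1], r2[i - 1], r2[i + 1],
                r3[i - 1], r3[i], r3[i + 1])
        out.append(int("".join(map(str, bits)), 2))
    return out


def pack_only(regs):
    """'Pack instruction': bit-pack without touching the registers."""
    return pack(*regs)


def pack_shift_up_load(regs, loaded_row):
    """Pack, then r1 <- r2, r2 <- r3, and r3 <- row loaded from memory."""
    rd = pack(*regs)
    regs[0], regs[1], regs[2] = regs[1], regs[2], loaded_row
    return rd


def pack_shift_down_load(regs, loaded_row):
    """Pack, then r3 <- r2, r2 <- r1, and r1 <- row loaded from memory."""
    rd = pack(*regs)
    regs[2], regs[1], regs[0] = regs[1], regs[0], loaded_row
    return rd


regs = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]
first = pack_shift_up_load(regs, [1, 1, 1])
after_up = [list(r) for r in regs]
second = pack_shift_down_load(regs, [1, 0, 0])
```

Note that the pack always reads the registers in the same fixed positions; only the register contents move, which is what removes the need for per-bit selection among r1, r2, and r3.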

[0047] FIG. 5(A) shows which rows are stored in the registers after execution of the "pack & shift up & load instruction" when processing advances in the direction of increasing image row number.

[0048] With the "pack & shift up & load instruction", when processing advances in the direction of increasing row number, the data of the next row is loaded into the 1-bit register r3 at the same time as the bit packing, so that register r1 always holds the data of row (target pixel row - 1), register r2 the data of the target pixel row, and register r3 the data of row (target pixel row + 1). The next "pack & shift up & load instruction" can therefore be executed immediately.

[0049] FIG. 5(B) shows which rows are stored in the registers after execution of the "pack & shift down & load instruction" when processing advances in the direction of decreasing image row number.

[0050] With the "pack & shift down & load instruction", when processing advances in the direction of decreasing row number, the data of the next row is loaded into register r1 at the same time as the bit packing, so that register r1 always holds the data of row (target pixel row - 1), r2 the data of the target pixel row, and r3 the data of row (target pixel row + 1). The next "pack & shift down & load instruction" can therefore be executed immediately. Moreover, because the relationship between the rows held in r1, r2, and r3 and the target pixel row is the same as with the "pack & shift up & load instruction", the same table data can be used for table reference processing.
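The claim that both scan directions keep the same row-to-register relationship, and so can share one table, can be checked with a small simulation. The sketch below is an assumption-laden model: `scan_increasing` and `scan_decreasing` are invented names, rows are Python lists, and the first and last image rows are not processed as center rows.

```python
def pack(r1, r2, r3):
    """Packed 8-bit neighbour word for every interior PE (centre excluded)."""
    out = []
    for i in range(1, len(r1) - 1):
        bits = (r1[i - 1], r1[i], r1[i + 1], r2[i - 1], r2[i + 1],
                r3[i - 1], r3[i], r3[i + 1])
        out.append(int("".join(map(str, bits)), 2))
    return out


def scan_increasing(rows):
    """Centre rows 1..H-2 in increasing order (pack & shift up & load)."""
    r1, r2, r3 = rows[0], rows[1], rows[2]
    out = {}
    for j in range(1, len(rows) - 1):
        out[j] = pack(r1, r2, r3)
        nxt = rows[j + 2] if j + 2 < len(rows) else None
        r1, r2, r3 = r2, r3, nxt  # r1 <- r2, r2 <- r3, r3 <- memory
    return out


def scan_decreasing(rows):
    """Centre rows H-2..1 in decreasing order (pack & shift down & load)."""
    h = len(rows)
    r1, r2, r3 = rows[h - 3], rows[h - 2], rows[h - 1]
    out = {}
    for j in range(h - 2, 0, -1):
        out[j] = pack(r1, r2, r3)
        nxt = rows[j - 2] if j - 2 >= 0 else None
        r3, r2, r1 = r2, r1, nxt  # r3 <- r2, r2 <- r1, r1 <- memory
    return out


rows = [[1, 0, 0, 1],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [1, 1, 0, 0]]
down = scan_increasing(rows)
up = scan_decreasing(rows)
```

Because r1, r2, and r3 always hold the rows above, at, and below the target row in either direction, the packed word for a given row is identical regardless of scan direction.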

[0051] In this embodiment, bit packing of a 3×3 pixel area uses three kinds of instructions, the same number as Method A, but the circuit needs only three 3-input, 1-output selectors, which reduces the circuit scale.

[0052] In a circuit that bit-packs an m×m pixel area, the "pack & shift up & load instruction" need only, simultaneously with the bit packing, copy r1←r2, r2←r3, …, r(m-1)←rm and load data from memory into rm, and the "pack & shift down & load instruction" need only copy r1→r2, r2→r3, …, r(m-1)→rm and load data from memory into r1.
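The two register movements for an m-register chain can be sketched generically. To stay neutral, the hypothetical function names below describe the direction of data movement rather than the instruction names; `regs[0]` stands for r1 and `regs[m-1]` for rm.

```python
def shift_toward_r1_load_rm(regs, loaded_row):
    """r1 <- r2, r2 <- r3, ..., r(m-1) <- rm, then rm <- row from memory."""
    return regs[1:] + [loaded_row]


def shift_toward_rm_load_r1(regs, loaded_row):
    """r1 -> r2, r2 -> r3, ..., r(m-1) -> rm, then r1 <- row from memory."""
    return [loaded_row] + regs[:-1]


regs = ["row0", "row1", "row2", "row3", "row4"]  # m = 5
moved_up = shift_toward_r1_load_rm(regs, "row5")
moved_down = shift_toward_rm_load_r1(regs, "row_prev")
```

In both cases each register takes its new value from one of three sources (its own value, one neighboring register, or memory), independent of m.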

[0053] Three bit-pack instructions therefore suffice, the same as for the 3×3 pixel area. As for the circuit, whereas Method A requires (m²-1) m-input, 1-output selector circuits for bit packing of an m×m pixel area, this embodiment requires only m 3-input, 1-output selectors, so both the number of instructions and the circuit scale are greatly reduced.

[0054] FIG. 6 shows the configuration of a second embodiment of the present invention. Compared with the embodiment shown in FIG. 1, this embodiment adds a table memory 13 that can be read simultaneously by all processors and written by the host processor 10 or by one processor 2 of the processor array 1.

[0055] When dilation is performed on a binary image, a nonzero bit-packed value means that at least one of the eight surrounding neighbor pixels is "1", so the target pixel can simply be set to "1".
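The dilation rule above reduces to a single comparison on the packed word. In the sketch below, the function name `dilate_pixel` is hypothetical, and keeping the original center value when the packed word is zero is an assumption (it corresponds to ordinary dilation with a full 3×3 structuring element); the text itself only specifies the nonzero case.

```python
def dilate_pixel(centre, packed):
    """Dilated value of one pixel from its packed 8-neighbour word."""
    if packed != 0:   # at least one of the eight neighbours is 1
        return 1
    return centre     # assumption: otherwise the pixel keeps its own value


results = [dilate_pixel(0, 0b00010000),
           dilate_pixel(0, 0),
           dilate_pixel(1, 0)]
```

One comparison replaces eight separate 1-bit OR operations, which is the benefit of operating on packed data.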

[0056] This is one of the simplest logical operations on bit-packed data; when such a simple logical operation suffices, no table is needed.

[0057] When the processing cannot be done with a simple logical operation, the table reference method is effective: table data holding precomputed results for each value of the bit-packed data is prepared in advance. To use the table reference method, the processing result for each bit-packed value is written into the memory serving as the table, at the address corresponding to that value, either when the program is downloaded or at some point before the table reference method is executed.

[0058] In table reference processing, the processing result is read from memory by accessing the table memory with indirect addressing, using the bit-packed data as the address. The bit-packed data for a 3×3 pixel area is 8 bits wide and can take 256 values, from "0" to "255", so in this case 256 words of table reference memory suffice.
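Table reference processing can be sketched as precomputing a 256-entry list and then indexing it with the packed word. The rule used here, "output 1 when at least five of the eight neighbors are 1", is a hypothetical example chosen only because it is awkward to express as a single logical operation; it does not come from the source.

```python
def build_table(rule):
    """Precompute the result for every possible 8-bit packed value."""
    return [rule(value) for value in range(256)]


def at_least_five_neighbours(packed):
    """Hypothetical rule: 1 if five or more of the eight neighbours are set."""
    return 1 if bin(packed).count("1") >= 5 else 0


table = build_table(at_least_five_neighbours)

# Each PE uses its packed word as the address into the table.
packed_words = [0b11111000, 0b00000111, 0b11111111]
results = [table[w] for w in packed_words]
```

Changing the image operation then only requires rewriting the 256 table entries; the per-pixel work remains a single indexed read.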

[0059] Executing table reference operations in multiple PEs would require table reference memory for each PE, but in many cases the table contents can be shared, so giving every PE its own table uses memory inefficiently.

【００６０】In this embodiment, therefore, as shown in FIG. 6, a single table memory 13 is provided for all PEs, and all PEs access the table simultaneously.

【００６１】In this case, as shown in FIG. 7, the table memory 13 need only be a single memory that has, for one memory cell 30, one write port 31 and as many read ports 32 as there are PEs (c = n), that is, n read ports 32.
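A behavioral model of such a memory, with one write port and n read ports over a single set of cells, could be sketched as follows. This models only the access pattern, not the circuit; the class name `SharedTableMemory` and its methods are hypothetical.

```python
class SharedTableMemory:
    """Behavioral model: one cell array, one write port, n read ports."""

    def __init__(self, words: int, read_ports: int):
        self.cells = [0] * words
        self.read_ports = read_ports

    def write(self, address: int, data: int) -> None:
        # Single write port: one word is written per call.
        self.cells[address] = data

    def read_all(self, addresses):
        # Each PE drives its own read port with its own address;
        # all reads observe the same shared cells.
        if len(addresses) != self.read_ports:
            raise ValueError("one address per read port is required")
        return [self.cells[a] for a in addresses]
```

For example, after writing 1 to every nonzero address of a 256-word memory with four read ports, `read_all([0, 1, 128, 255])` returns one result per PE from the same stored table.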

【００６２】The write port 31 is connected via a write address line and a write data line to the host processor 10, the controller 8, or a specific PE, any of which can write data to the memory cell 30.

【００６３】Each of the n read ports 32 is individually connected to one of the n PEs through its own read address line and read data line. With this configuration all PEs share the memory cells of the table memory, so the increase in circuit area (chip area) is kept to a minimum even at high degrees of parallelism. Conversely, when chip area is available, a larger table memory or multiple table memories can be integrated. As a result, the word size of the table memory can be increased or several sets of table data can be held at once, which broadens the range of uses for table lookup processing.

【００６４】When circuit or chip-layout constraints limit the number of read ports a memory can have, one table memory can be assigned to each group of several PEs rather than to all PEs. For example, with 64 PEs in total and one table memory per 8 PEs, each table memory needs one write port and eight read ports.
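The 64-PE example can be worked through with a small sketch. It assumes that consecutive PEs are grouped together, which the text does not specify; the name `table_assignment` is hypothetical.

```python
TOTAL_PES = 64
PES_PER_TABLE = 8

# 64 PEs / 8 PEs per table = 8 table memories,
# each with one write port and eight read ports.
NUM_TABLES = TOTAL_PES // PES_PER_TABLE


def table_assignment(pe_index: int, pes_per_table: int = PES_PER_TABLE):
    """Map a PE to (table memory index, read port index).

    Assumption: consecutive PEs share a table memory.
    """
    return divmod(pe_index, pes_per_table)
```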

【００６５】

[Effects of the Invention] As described above, according to the present invention, in a parallel image processing apparatus containing many processors, each processor is provided with m 1-bit registers that can be referenced by adjacent processors. Each processor takes m bits from each processor located within (m−1)/2 PEs of it, together with (m−1) bits from its own registers, for a total of (m²−1) bits, assigns each bit to a position determined by the register it comes from, and stores the result bit-packed in an (m²−1)-bit register. At the same time, data is shifted among the m 1-bit registers, and the next data can be loaded from memory into the end processor that receives no data during the shift. With this configuration, binary image data can be processed at high speed.
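The bit count above can be checked with a small simulation for m = 3. This is a sketch under stated assumptions rather than the patented circuit: PEs beyond the array ends are assumed to contribute 0, the PE's own middle register is assumed to be the omitted bit, the bit order is arbitrary, and `shift_and_load` reflects only one possible reading of the shift step (each PE shifts its own registers and loads one new bit). The names `bit_pack` and `shift_and_load` are hypothetical.

```python
M = 3                  # number of 1-bit registers per PE (m)
REACH = (M - 1) // 2   # neighbor distance, (m-1)/2 PEs


def bit_pack(registers, pe):
    """Pack m bits per neighbor PE plus (m-1) own bits: m*m - 1 bits.

    registers[i] is the list of m 1-bit register values of PE i.
    """
    packed = 0
    for offset in range(-REACH, REACH + 1):
        src = pe + offset
        # Assumption: PEs outside the array contribute zeros.
        bits = registers[src] if 0 <= src < len(registers) else [0] * M
        for r, bit in enumerate(bits):
            if offset == 0 and r == M // 2:
                continue  # assumed: skip the pixel being processed
            packed = (packed << 1) | bit
    return packed


def shift_and_load(registers, next_bits):
    """Shift each PE's registers by one and load one new bit per PE.

    Assumed interpretation of the shift step; next_bits stands in
    for data read from each PE's local memory.
    """
    return [regs[1:] + [new] for regs, new in zip(registers, next_bits)]
```

With m = 3, the two neighbors supply 3 bits each and the PE itself supplies 2, giving 8 = m² − 1 bits, so the packed value always fits in one byte.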

【００６６】Further, according to the present invention, a memory with a one-write-port, multiple-read-port configuration is used for table lookup with bit-packed data, so that it can be read by all processors. Compared with using each processor's local memory as a table memory, this greatly reduces memory usage.
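The saving can be illustrated with simple arithmetic, using the 256-word table for 3×3 bit-packed data and the 64-PE figure from the earlier example. The function name `table_words_needed` is hypothetical, and the count covers stored words only; the extra area of the additional read ports is not modeled.

```python
TABLE_WORDS = 256  # 2**8 entries for 3x3 bit-packed data


def table_words_needed(num_pes: int, shared: bool) -> int:
    """Total table words: one table overall if shared, else one per PE."""
    return TABLE_WORDS if shared else TABLE_WORDS * num_pes
```

For 64 PEs this gives 16384 words when every PE keeps its own copy, against 256 words for a single shared table.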

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an image processing apparatus according to an embodiment of the present invention.

FIG. 2 is a diagram explaining bit-pack processing of a 3×3 neighborhood of a binary image.

FIG. 3 is a block diagram showing the configuration of a bit pack block in the image processing apparatus according to the embodiment of the present invention.

FIG. 4 is a block diagram showing the configuration of the bit pack block in the processors at both ends of the image processing apparatus according to the embodiment of the present invention.

FIG. 5 is a diagram explaining the bit pack instruction used in the image processing apparatus according to the embodiment of the present invention.

FIG. 6 is a block diagram showing the configuration of an image processing apparatus according to another embodiment of the present invention.

FIG. 7 is a block diagram showing the configuration of a table memory in the image processing apparatus according to the other embodiment of the present invention.

FIG. 8 is a block diagram showing the configuration of a conventional image processing apparatus.

[Explanation of symbols]

1 processor array
2 processor
3 local memory
4 arithmetic unit
5 register
6 bit pack block
7 instruction supply circuit
8 sequence controller
9 program memory
10 host processor
11 address line
12 data line
13 table memory
20 data transfer line
21 1-bit register
22 selector
23 destination register
24 selector
30 memory cell
31 write port
32 read port
100 one-dimensional processor array
200 processor

Claims

(57) [Claims]

1. An image processing apparatus comprising a plurality of processors arranged in an array, wherein each processor can directly refer to data in a predetermined register of a neighboring processor, and each of the processors comprises means for bit-packing the data in the predetermined register of the neighboring processor together with data in a predetermined register of the processor itself.

2. The parallel image processing apparatus according to claim 1, wherein each processor comprises means for, in the bit-pack operation, shifting data between the registers that store the data used for the bit pack and reading the data used for the next bit-pack process from the memory into the register.

3. The parallel image processing apparatus according to claim 1 or 2, comprising, as a table reference memory, a memory having one write port and a number of read ports equal to the total number of processors or to the total number of processors divided by a predetermined positive number.

4. A parallel image processing apparatus in which a plurality of processors, each including a local memory, an arithmetic unit, a register, and bit pack means for performing bit-pack processing, are connected in an array, wherein each of the processors can directly refer to data in a predetermined register, holding pixel information, of a neighboring processor, and each of the processors comprises means for performing a bit pack with the predetermined register of the neighboring processor and the register used for the bit pack in the processor itself as transfer sources, shifting data between the registers that store the data used for the bit pack, and loading the data used in the next bit-pack process from the local memory into the register.