JP2002207707A

JP2002207707A - SIMD type microprocessor having function for selecting constant

Info

Publication number: JP2002207707A
Application number: JP2001003602A
Authority: JP
Inventors: Shinichi Yamaura; 慎一山浦
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2001-01-11
Filing date: 2001-01-11
Publication date: 2002-07-26
Anticipated expiration: 2021-01-11
Also published as: JP3837293B2

Abstract

PROBLEM TO BE SOLVED: To realize the loading of the threshold values of a dither matrix in the dither method in fewer steps, and to realize the processing up to the loading of converted data in still fewer steps. SOLUTION: This SIMD type microprocessor includes one global processor and a plurality of processor elements. A plurality of data buses are provided from the global processor to each processor element; each processor element generates a selection signal that specifies which of the plurality of data buses is to be selected; and the signal transmitted from the global processor via the data bus selected by the selection signal is stored in a predetermined register in each processor element.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD The present invention relates to a SIMD (Single Instruction-stream, Multiple Data-stream) type microprocessor.

[0002]

DESCRIPTION OF THE RELATED ART A SIMD type microprocessor can execute the same arithmetic operation on a plurality of data items simultaneously with a single instruction. Because of this structure, it is frequently used in applications in which the operation is the same but the amount of data is very large (for example, image processing).

[0003] In normal arithmetic processing in a SIMD type microprocessor, a plurality of arithmetic units (Processor Elements [PE]) are arranged side by side, and the same operation is executed on a plurality of data items simultaneously.

[0004] A SIMD type microprocessor can fully exhibit its performance in processing in which all PEs operate simultaneously. However, it cannot exhibit that performance in processing in which the operation parameters differ from PE to PE. An example of such "processing in which the operation parameters differ from PE to PE" is binarization by the dither method using a dither matrix.

[0005] In binarization by the dither method, which is often used in image processing, the threshold value that serves as the criterion for binarization differs from pixel to pixel. FIG. 4 shows an example of a dither matrix for the dither method. This matrix is a 4 × 4 dither matrix. In binarization using this dither matrix, four threshold values are used for one row (line), and those four values repeat in units of four pixels. More specifically, one line of pixel data, stored in order from the end of the many PEs arranged side by side into (a predetermined register of) each PE, is compared with the threshold values of one row of FIG. 4 (the first line with the thresholds of the first row, the second line with those of the second row, the third line with those of the third row, the fourth line with those of the fourth row, the fifth line with those of the first row, and so on). Within one line, the pixels are compared with four kinds of values in units of four pixels (the first pixel with the threshold of the first column, the second pixel with that of the second column, the third pixel with that of the third column, the fourth pixel with that of the fourth column, the fifth pixel with that of the first column, and so on).
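For illustration only, the row and column indexing described above can be sketched in Python as follows. The matrix values are placeholders (the values of FIG. 4 are not reproduced in this text), the function names are hypothetical, and the comparison direction (a pixel at or above the threshold becomes 0xFF) is an assumption.

```python
# Placeholder 4 x 4 dither matrix. These values are assumptions for
# illustration; they are not the values of FIG. 4.
DITHER = [
    [0, 128, 32, 160],
    [192, 64, 224, 96],
    [48, 176, 16, 144],
    [240, 112, 208, 80],
]


def threshold_for(line, pixel):
    """Return the threshold for a 0-based line index and pixel index."""
    # The row repeats every four lines, the column every four pixels.
    return DITHER[line % 4][pixel % 4]


def binarize_line(line, pixels):
    """Binarize one line of 8-bit pixels against the row selected by `line`.

    The comparison direction (pixel >= threshold gives 0xFF) is an assumption.
    """
    return [0xFF if p >= threshold_for(line, x) else 0x00
            for x, p in enumerate(pixels)]
```

With this indexing, the fifth line and the fifth pixel (index 4 in each case) wrap around to the first row and the first column, respectively.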

[0006] When binarization by the dither method is performed on a SIMD type microprocessor, the threshold value stored in the predetermined register of a PE differs from PE to PE. If there were only one threshold value, processing in all PEs could be completed with a single compare instruction; with four threshold values as above, four compare instructions are required to complete processing in all PEs. Naturally, as the dither matrix grows larger, the number of compare instructions increases accordingly.

[0007] In the prior art, one method sometimes adopted to address this problem is to hold the (plural) threshold values in advance in the registers or local memory of each PE. When a 4 × 4 dither matrix is used, each individual PE uses four threshold values repeatedly, cycling every four lines. These four threshold values are therefore held in (four) registers and used by the compare instruction. In preparation for the comparison, the initialization process issues four (or more) data transfer instructions per line to store the thresholds, which repeat every four pixels, in the (PE) registers, and then repeats that store processing four times (once per line). With this approach, each PE needs four registers to store the threshold values. In other words, a considerable amount of hardware resources is consumed.

[0008] Alternatively, a method is sometimes used in which the threshold values are input from outside at the same time as the image data is input from outside the microprocessor. With this method, no instructions for storing the threshold values in registers need to be issued, so the processing time for those instructions is eliminated. However, the problem remains that each PE needs a register for storing a threshold value. In addition, an extra input port for inputting the threshold values is required.

[0009] The SIMD type microprocessors disclosed in JP-A-5-67203 and JP-A-6-83787 have a function for inputting data from outside, and by using that function it is also possible to input the image data and the threshold values (data) simultaneously as described above.

[0010] In the (SIMD type) processors of JP-A-6-176176 and JP-A-6-259581, a local memory address is associated with each PE. The local memory address differs from PE to PE. The data stored in that local memory is used in the processing in the PE. With such a configuration, it is not necessary to transfer a different value to each PE in order to load the threshold values into the PEs; it suffices that a different local memory address is indicated for each PE at the time of comparison with the threshold. However, the initialization process must load the entire threshold matrix, and a memory that holds all of that data is also required.

[0011]

PROBLEM TO BE SOLVED BY THE INVENTION An object of the present invention is to realize the loading of the threshold values of a dither matrix in the dither method in a small number of (processing) steps. A further object is to realize the processing up to the loading of converted data in fewer steps as well.

[0012]

MEANS FOR SOLVING THE PROBLEM The present invention has been made to achieve the above objects. The SIMD type microprocessor according to claim 1 of the present invention is a SIMD type microprocessor including one global processor and a plurality of processor elements. In this SIMD type microprocessor, a plurality of data buses are provided from the global processor to each processor element; each processor element generates a selection signal that specifies which of the plurality of data buses is to be selected; and the signal transferred from the global processor via the data bus selected by the selection signal is stored in a predetermined register in each processor element.

[0013] The SIMD type microprocessor according to claim 2 of the present invention is the SIMD type microprocessor according to claim 1, wherein consecutive serial numbers are assigned to the processor elements in order, and each processor element replaces a predetermined number of upper bits of its own serial number, expressed in binary, with "0" and uses the resulting signal as the selection signal.

[0014] The SIMD type microprocessor according to claim 3 of the present invention is the SIMD type microprocessor according to claim 1, wherein operation result data in each processor element, or data derived from that operation result, is stored in a predetermined register in each processor element, and a signal drawn from that register is used as the selection signal.

[0015] The SIMD type microprocessor according to claim 4 of the present invention is the SIMD type microprocessor according to claims 1 to 3, which is operated by instruction codes containing two or more immediate values, and in which the plurality of immediate values are transmitted on the plurality of data buses.

[0016]

DESCRIPTION OF THE PREFERRED EMBODIMENTS Preferred embodiments of the present invention will be described below with reference to the drawings.

[0017] FIG. 1 is a block diagram showing the schematic configuration of a SIMD type microprocessor 2 according to the present invention. The configuration of FIG. 1 is the basic configuration underlying the SIMD type microprocessors 2 of the first, second, and third embodiments described later. That is, the SIMD type microprocessors 2 of the first, second, and third embodiments are formed by adding the necessary components to the configuration of FIG. 1.

[0018] The SIMD type microprocessor 2 of FIG. 1 is broadly composed of a global processor 4, a register file 6, and an operation array 8.

[0019] (1) Global processor 4: The global processor 4 itself is a so-called SISD type processor. It incorporates a program RAM 10 and a data RAM 12 (see FIG. 2), decodes the program, and generates various control signals. These control signals are supplied not only to the various built-in blocks but also to the register file 6 and the operation array 8. When a GP (global processor) instruction is executed, the global processor performs various arithmetic operations and program control processing using its built-in general-purpose registers, ALU (arithmetic logic unit), and so on.

[0020] (2) Register file 6: This holds the data processed by PE (processor element) instructions. As is well known, a PE (processor element) 3 is the structural unit that executes an individual operation in a SIMD (Single Instruction-stream, Multiple Data-stream) type processor. As the register file 6 and the operation array 8 in FIG. 2 show, the SIMD type microprocessor 2 of FIG. 2 contains 256 PEs 3. The PE instructions are SIMD type instructions and perform the same processing simultaneously on the plurality of data items held in the register file 6. Reading data from and writing data to the register file 6 is controlled by control signals from the global processor 4. The data read out is sent to the operation array 8 and, after being processed in the operation array 8, is written to the register file 6.

[0021] The register file 6 can also be accessed from outside the processor 2; specific registers are read and written from outside independently of the control of the global processor 4.

[0022] (3) Operation array 8: The arithmetic processing of PE instructions is performed here. All control of the processing is performed by the global processor 4.

[0023] FIG. 2 is a block diagram showing a more detailed configuration of the SIMD type microprocessor 2 according to the present invention.

[0024] The global processor 4 incorporates a program RAM 10 for storing the program of the processor 2 and a data RAM 12 for storing operation data. It also incorporates a program counter (PC) 14 that holds the program address; the G0, G1, G2, and G3 registers (16, 18, 20, 22), which are general-purpose registers for storing data for arithmetic processing; a stack pointer (SP) 24 that holds the address of the save-destination data RAM when registers are saved and restored; a link register (LS) 26 that holds the caller's address on a subroutine call; an LI register 28 and an LN register 30 that likewise hold the branch source address on an IRQ (Interrupt ReQuest) and on an NMI (Non-Maskable Interrupt request), respectively; and a processor status register (P) 32 that holds the state of the processor.

[0025] GP instructions are executed using these registers together with an instruction decoder, an ALU, an SCU (sequential unit), a memory control circuit, an interrupt control circuit, an external I/O control circuit, and a GP operation control circuit (none of which are shown).

[0026] When a PE instruction is executed, the instruction decoder, a register file control circuit 56, and a PE operation control circuit 58 are used to control the register file 6 and the operation array 8. In addition, the configuration allows data to be transferred from the data RAM 12 to the register files 6 of a plurality of PEs.

[0027] In the register file 6, thirty-two 8-bit registers 34 are provided per PE, and the sets (of 32 registers) for the 256 PEs form an array. For each PE, the registers 34 are called R0, R1, R2, ..., R31. Each register 34 has one read port and one write port to the operation array 8 and is accessed from the operation array 8 over an 8-bit bus used for both reading and writing. Of the 32 registers, 24 (R0 to R23) can be accessed from outside the processor; by supplying a clock (CLK), an address (Address), and read/write control (RWB) from outside, any of these registers 34 can be read or written. The remaining eight registers 34 (R24 to R31) are used for temporarily storing operation data of PE operations.

[0028] The operation array 8 incorporates a 16-bit ALU 36, a 16-bit A register 38, and an F register 40. An operation by a PE instruction takes the data read from the register file 6 or the data supplied from the global processor 4 as one input of the ALU 36 and the contents of the A register 38 as the other input. The result of the operation is stored in the A register 38. Consequently, an operation is performed between the data supplied from the R0 to R31 registers 34 or from the global processor 4 and the data stored in the A register 38.

[0029] A 7-to-1 multiplexer 42 is placed at the connection between the register file 6 and the operation array 8. As shown in FIG. 2, each multiplexer 42 is arranged so that it can select, as the operand, the data of the R0 to R31 registers 34 in the three PEs 3 to its left, the data of the R0 to R31 registers 34 in the three PEs 3 to its right, or the data of the R0 to R31 registers 34 in the PE 3 to which it belongs. In addition, the 8-bit data of the register file 6 is shifted left by an arbitrary number of bits by a shift/extension circuit 44 and then input to the ALU 36.
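As a rough illustration of the seven choices offered by the multiplexer 42, the following Python sketch models one register (for example R0) across all PEs as a list indexed by PE number. The function name, the convention that a negative offset means "left" (lower PE number), and the behavior at the ends of the array are all assumptions; the text does not describe them.

```python
def mux_select(column, pe, offset):
    """Value PE `pe` would read through its 7-to-1 multiplexer.

    column: one register (for example R0) across all PEs, indexed by PE number.
    offset: -3..+3, where negative is assumed to mean "left" (lower PE number).
    Returns None past either end of the array; the real edge behavior is not
    described in the text, so this is an assumption.
    """
    if not -3 <= offset <= 3:
        raise ValueError("offset must be within -3..+3 (seven choices)")
    source = pe + offset
    if 0 <= source < len(column):
        return column[source]
    return None
```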

[0030] Each PE 3 is assigned a serial number called a PE number. Since the SIMD type microprocessor 2 has 256 PEs, an 8-bit bit string (that is, one of the 256 values from 00000000b to 11111111b; in this specification, a trailing "b" as above denotes binary notation) is given to each PE 3 as PE number data. The PE numbers may be assigned to the PEs 3 regardless of their positions, but in this specification they are assumed to be assigned in order from one end.

[0031] Using this PE number, a specific PE 3 can be selected and a predetermined value can be set in the 8-bit condition register 54 (see FIG. 3) included in the operation array 8 of that PE 3. The condition register 54 makes it possible to control, for each PE 3 individually, whether an operation is executed or not. In other words, it is possible to select only specific PEs 3 to perform an operation.

[0032] The PE number data is created by providing each PE 3 with an 8-bit input terminal and varying the combination in which its pins are tied to VCC or GND.

[0033] The PE number generation circuit 60 of FIG. 3 is a circuit that can produce these PE numbers. Furthermore, the PE number generation circuit 60 is configured so that, under control from the GP 4, it can produce numbers (a sequence) that form a predetermined repeating pattern following the order of the PE numbers. That is, for example, in PEs 3 whose PE numbers are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ..., the PE number generation circuits 60 of those PEs 3 can produce numbers that form the repeating pattern 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, .... The repeating pattern is, of course, not limited to this one.

[0034] <Binarization by the Dither Method Using the Basic Configuration> First, binarization by the dither method using the SIMD type microprocessor 2 shown in the block diagrams of FIGS. 1 and 2 will be described with reference to the block diagram of FIG. 3 according to the present invention. The dither method described below uses a 4 × 4 dither matrix.

[0035] So that each PE 3 is repeatedly associated, in units of four PEs 3, with the four values belonging to one row of the dither matrix, the plural (256) PEs 3 must be classified into four types in order from the end of the array.

[0036] First, each PE loads the PE number data from the PE number generation circuit 60 into the A register 38. Next, an instruction to the processor 2 divides the data stored in the A register 38 by 4 and obtains the remainder. This value is calculated by replacing all bits of the PE number data other than the lower two bits with 0 (that is, by ANDing the PE number data with 0x3). The result is stored in the A register 38. The result values are then 0, 1, 2, 3, 0, 1, 2, 3, 0, ... in order from the PE 3 with the smallest PE number, so that the contents of the A registers 38 repeat through four kinds of values in order from the end of the PEs 3 (from the PE 3 with the smallest PE number).
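The equivalence between the remainder of division by 4 and the AND with 0x3 can be checked with a few lines of Python. This is only a software illustration of the arithmetic; the variable names are hypothetical.

```python
NUM_PES = 256  # number of PEs in the microprocessor described here

pe_numbers = list(range(NUM_PES))
remainders = [n % 4 for n in pe_numbers]    # remainder of division by 4
masked = [n & 0x3 for n in pe_numbers]      # keep only the lower two bits

# The two computations agree for every 8-bit PE number.
assert remainders == masked
```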

[0037] As described earlier, the PE number generation circuit 60 may instead be made, under control from the GP 4, to produce numbers (a sequence) forming a predetermined repeating pattern following the order of the PE numbers. That is, for example, in PEs 3 whose PE numbers are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ..., the PE number generation circuits 60 of those PEs 3 may be made to produce numbers forming the repeating pattern 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, ... and store them in the A registers 38.

[0038] Next, an instruction to the processor 2 causes the A registers 38 of all PEs 3 to be compared with "1". In each PE 3 where the values match, T1 = 1 is set. Here, the 8-bit condition register 54 consists of the bits T0, T1, T2, T3, T4, T5, T6, and T7, in order from the least significant bit.

[0039] The following instructions compare with "2" and set T2 = 1 in each PE 3 where the values match, and then compare with "3" and set T3 = 1 in each PE 3 where the values match.

[0040] Each value of the dither matrix is stored in the program RAM 10 or the data RAM 12 in the GP 4. First, an instruction to the processor 2 loads the threshold of the first column of the dither matrix into the A registers 38 of all PEs 3, for example via the immediate data bus 53. Next, an instruction to the processor 2 loads the threshold of the second column of the dither matrix into the A registers 38 of the PEs 3 in which T1 = 1. The threshold of the third column is then loaded into the A registers 38 of the PEs 3 in which T2 = 1, and the threshold of the fourth column into the A registers 38 of the PEs 3 in which T3 = 1. Through these load operations, the desired threshold is stored in each A register 38. Each PE 3 compares this threshold with the data in the register holding the image pixel data (for example, the R0 register). Depending on the result of the comparison (that is, on the magnitude relationship between the threshold and the pixel data), each operation result is set to 0xff or 0x00. This completes binarization by the dither method for one line.
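The load sequence of the basic configuration can be simulated in software as follows. This is a sketch under stated assumptions, not the instruction set of the processor: the function name is hypothetical, the condition bits are assumed to start cleared, and the step counter is only an illustrative tally of one step per operation described in the text, not a figure taken from the source.

```python
NUM_PES = 256


def load_row_baseline(row):
    """Load one dither-matrix row (four thresholds) into the A registers.

    Returns (a_registers, steps). `steps` counts one step per operation
    described in the text and is illustrative only.
    """
    steps = 0
    a = list(range(NUM_PES))              # load the PE number into A
    steps += 1
    a = [v & 0x3 for v in a]              # A = PE number mod 4
    steps += 1
    # Condition register bits T0..T7 per PE, assumed to start cleared.
    t = [[0] * 8 for _ in range(NUM_PES)]
    for k in (1, 2, 3):                   # compare A with k, set Tk on a match
        for pe in range(NUM_PES):
            if a[pe] == k:
                t[pe][k] = 1
        steps += 1
    a = [row[0]] * NUM_PES                # unconditional load of column 1
    steps += 1
    for k in (1, 2, 3):                   # conditional loads of columns 2 to 4
        for pe in range(NUM_PES):
            if t[pe][k]:
                a[pe] = row[k]
        steps += 1
    return a, steps
```

Under this tally, four of the steps are threshold loads that must be repeated whenever the row changes, which is the overhead the embodiments aim to reduce.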

[0041] In arithmetic processing using the SIMD type microprocessor 2, the number of pixels in one line of image data may exceed the number of PEs in the processor 2. In that case, one line is divided into segments of the PE count, and the same processing is repeated once per segment.
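A minimal Python sketch of this segmentation is shown below. The function names are hypothetical, and the handling of a final short segment is an assumption. The second function illustrates that, because 256 is a multiple of 4, the dither-matrix column associated with a given PE is the same in every segment.

```python
NUM_PES = 256


def split_line(pixels, num_pes=NUM_PES):
    """Split one line of pixels into consecutive segments of at most num_pes."""
    return [pixels[i:i + num_pes] for i in range(0, len(pixels), num_pes)]


def column_index(segment, pe):
    """Dither-matrix column (0..3) for PE `pe` while processing `segment`."""
    # The pixel position within the line is segment * NUM_PES + pe.
    return (segment * NUM_PES + pe) % 4
```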

[0042] In binarization by the dither method, the comparison between the image pixel data and the thresholds itself finishes in one instruction, yet separate instruction steps are needed to load the thresholds as described above. Moreover, those instruction steps must be executed every time the divided processing is repeated. The number of instruction steps can be reduced by performing the threshold load only in the first segment of a line, storing the thresholds used there in a predetermined register of each PE (for example, R1), and using the thresholds stored in that register (R1) in the subsequent segments. Even so, the problem remains that some registers of each PE 3 are occupied in order to store the thresholds.

[0043] <First Embodiment> FIG. 5 shows the configuration of the SIMD type microprocessor 2 according to the first embodiment of the present invention. Several components are added to the SIMD type microprocessor 2 of FIGS. 1 and 2.

[0044] Four sets of 8-bit data are supplied from the global processor (GP) 4 to each PE 3 via four parameter buses (a first parameter bus 62-0, a second parameter bus 62-1, a third parameter bus 62-2, and a fourth parameter bus 62-3). Each PE 3 is provided with four buffer circuits (66-0, 66-1, 66-2, 66-3) to receive these four sets of 8-bit data. The buffer circuits connect the four parameter buses to the internal bus 70 of each PE 3.

[0045] In addition, four leads (a first lead 64-0, a second lead 64-1, a third lead 64-2, and a fourth lead 64-3) run to the four buffer circuits (66-0, 66-1, 66-2, 66-3) of each PE 3. As described later, these leads (64-0, 64-1, 64-2, 64-3) supply a 1-bit signal that controls the timing at which the 8-bit data is output to the internal bus 70 of each PE 3.

[0046] The data in the data RAM 12 in the GP 4 is transferred on the four parameter buses. In this embodiment, the threshold data of the dither matrix is transferred in this way.

[0047] In each PE 3, a 2-bit selection signal for selecting one of the four parameter buses (62-0, 62-1, 62-2, 62-3), that is, one of the four buffer circuits (66-0, 66-1, 66-2, 66-3), is input from the PE number generation circuit 60 to the four buffer circuits (66-0, 66-1, 66-2, 66-3). Each buffer circuit (collectively denoted by reference numeral 66) decodes this selection signal to determine whether it is the one being selected.
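The effect of the per-PE selection signal can be sketched in Python as follows. This is a behavioral illustration only: the function name is hypothetical, and using the lower bits of the PE number as the selection signal is one possibility consistent with the repeating pattern 0, 1, 2, 3 described earlier.

```python
def load_via_parameter_buses(bus_values, num_pes=256):
    """Return the value each PE takes from its selected parameter bus.

    bus_values: the values the GP drives on buses 62-0, 62-1, ... in order.
    The bus count must be a power of two so that the lower bits of the PE
    number can serve as the selection signal (an assumption of this sketch).
    """
    count = len(bus_values)
    if count == 0 or count & (count - 1):
        raise ValueError("bus count must be a power of two")
    mask = count - 1                      # 0x3 for four buses
    latched = []
    for pe_number in range(num_pes):
        select = pe_number & mask         # 2-bit selection signal for four buses
        latched.append(bus_values[select])
    return latched
```

In this model all PEs receive their own threshold in a single transfer, in contrast to the one unconditional and three conditional loads of the basic configuration.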

[0048] Note that the same function can also be realized by a configuration in which the selection signal is decoded in the PE number generation circuit 60 (for example, the lower 2 bits are decoded into a 4-bit selection signal) and no decoding is performed in the buffer circuits 66 (although the number of bits of the selection signal increases).
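The pre-decoded variant described above can be sketched in Python as follows. This is an illustrative model only, not the disclosed circuit; the function names `one_hot_select` and `buffer_enabled` and the bit ordering (bit i enables buffer circuit 66-i) are assumptions made for the example.

```python
def one_hot_select(pe_number, num_buses=4):
    """Decode the low bits of the PE number into a one-hot select word.

    For num_buses = 4 this turns the lower 2 bits into a 4-bit signal.
    """
    return 1 << (pe_number % num_buses)


def buffer_enabled(index, one_hot):
    """Buffer `index` only tests its own bit, so it needs no decoder."""
    return bool((one_hot >> index) & 1)
```

The trade-off is the one stated in the text: the buffers become simpler, but the selection signal widens from 2 bits to 4.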

[0049] The SIMD type microprocessor 2 of FIG. 5 has four parameter buses (collectively denoted by reference numeral 62), but it may of course have more; for example, it may have eight parameter buses 62. With respect to the dither method, a larger number of parameter buses 62 makes it possible to handle a larger dither matrix. When there are eight parameter buses 62, each PE 3 must also be provided with eight buffer circuits 66. In that case, for example, the PE number generation circuit 60 of each PE 3 outputs the lower 3 bits of the PE number as the selection signal, and each buffer circuit 66 decodes the 3-bit selection signal to determine whether it is the one being selected.
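The relationship between the number of parameter buses and the width of the selection signal can be illustrated with a short Python sketch. The helper names `select_width` and `select_signal` are hypothetical, and the sketch assumes the number of buses is a power of two, as in the four-bus and eight-bus examples above.

```python
def select_width(num_buses):
    """Number of selection-signal bits needed for `num_buses` buses."""
    if num_buses < 2 or num_buses & (num_buses - 1):
        raise ValueError("this sketch assumes a power-of-two bus count")
    return num_buses.bit_length() - 1


def select_signal(pe_number, num_buses):
    """Lower bits of the PE number, used as the selection signal."""
    return pe_number & (num_buses - 1)
```

With four buses the selection signal is 2 bits wide, and with eight buses it is 3 bits wide, matching the description.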

[0050] As for the leads (collectively denoted by reference numeral 64), the four leads operate equivalently, so they may be replaced by a single lead. Also, as described above, the PE number generation circuit 60 can, under control from the GP 4, generate numbers (a sequence) that form a predetermined repeating pattern when taken in PE-number order.

[0051] FIG. 6 shows an example configuration of the four buffer circuits 66. FIG. 6(1) shows the first buffer circuit 66-0, which corresponds to the first parameter bus 62-0 and the first lead 64-0. The bus denoted by reference numeral 68 is the selection signal bus, which is 2 bits wide. The lower bit is input to "CT0" and the upper bit to "CT1". The lead (first lead 64-0) is shown at the bottom of the figure. With the circuit configuration of FIG. 6(1), when "00b" is input as the selection signal and a "1b" signal is input on the first lead 64-0, the signal (data) on the first parameter bus 62-0 is output to the internal bus 70.

[0052] Similarly, FIG. 6(2) shows the second buffer circuit 66-1, which corresponds to the second parameter bus 62-1 and the second lead 64-1. With this circuit configuration, when "01b" is input as the selection signal and a "1b" signal is input on the second lead 64-1, the signal (data) on the second parameter bus 62-1 is output to the internal bus 70.

[0053] FIG. 6(3) shows the third buffer circuit 66-2, which corresponds to the third parameter bus 62-2 and the third lead 64-2. With this circuit configuration, when "10b" is input as the selection signal and a "1b" signal is input on the third lead 64-2, the signal (data) on the third parameter bus 62-2 is output to the internal bus 70.

[0054] Further, FIG. 6(4) shows the fourth buffer circuit 66-3, which corresponds to the fourth parameter bus 62-3 and the fourth lead 64-3. With this circuit configuration, when "11b" is input as the selection signal and a "1b" signal is input on the fourth lead 64-3, the signal (data) on the fourth parameter bus 62-3 is output to the internal bus 70.
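The gating behaviour of the four buffer circuits of FIG. 6 can be summarized by the following Python sketch. It is a behavioural model under stated assumptions, not a description of the actual gates: the names `buffer_output` and `internal_bus` are hypothetical, and `None` is used here to represent a bus that no buffer is driving.

```python
def buffer_output(index, select, lead, bus_value):
    """Model buffer circuit 66-<index>.

    The buffer passes its parameter-bus value only when the 2-bit
    selection signal equals its own index and its lead carries 1.
    """
    if select == index and lead == 1:
        return bus_value
    return None  # buffer is not driving the internal bus


def internal_bus(select, leads, bus_values):
    """Value on internal bus 70; at most one buffer can match `select`."""
    for i, value in enumerate(bus_values):
        out = buffer_output(i, select, leads[i], value)
        if out is not None:
            return out
    return None
```

Because each buffer compares the selection signal against a distinct code ("00b" to "11b"), at most one buffer can drive the internal bus at a time.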

[0055] A procedure for performing dither-method binarization using the SIMD type microprocessor 2 of the first embodiment will now be described.

[0056] In response to an instruction to the processor 2, the PE number generation circuit 60 of each PE 3 generates a number such that the numbers, viewed in PE-number order, form a predetermined repeating pattern, and supplies that value to the buffer circuits 66 as the selection signal.

[0057] The number generated by the PE number generation circuit 60 is, for example, the lower 2 bits of the PE number (expressed in binary). That is, for the PEs 3 numbered 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ..., the PE number generation circuits 60 of those PEs 3 are made to generate numbers forming the repeating pattern 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, ..., and these numbers are supplied to the buffer circuits 66 as the selection signals.
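As a minimal sketch of the pattern described above, the following Python function masks each PE number down to its lower bits. The function name `selection_pattern` and the `bits` parameter are introduced here for illustration only.

```python
def selection_pattern(num_pes, bits=2):
    """Selection signal of each PE: the lower `bits` bits of its PE number."""
    mask = (1 << bits) - 1
    return [pe & mask for pe in range(num_pes)]
```

For eleven PEs and 2 bits, this yields the sequence 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2.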

[0058] By the same instruction, four thresholds held in the data RAM 12 are transferred simultaneously to each PE 3 via the four parameter buses 62. In each PE 3, the buffer circuits 66 decode the selection signal supplied from the PE number generation circuit 60, so that the data on one of the four parameter buses 62 is selected. The data on the selected parameter bus 62 (that is, the threshold) is output to the PE internal bus 70.

[0059] The data output to the PE internal bus 70 is input to the ALU 36 via the multiplexer 42 and the shifter 44, and is stored in the A register 38.

[0060] In this way, the appropriate threshold is selected from the four thresholds and stored in the A register 38, and this is accomplished by a single instruction. Subsequent processing is the same as described above.

[0061] Through the above processing, the threshold loading that previously required as many steps as there are kinds of threshold values can be accomplished in a single step. Processing time is therefore reduced.
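The reduction in load steps can be illustrated with a small Python comparison. This is a sketch under assumptions: the conventional scheme is modelled as one broadcast per threshold kind that only the matching PEs accept, the PE-to-threshold assignment is taken to be the PE number modulo the number of thresholds, and the function names are hypothetical.

```python
def load_conventional(thresholds, num_pes):
    """One broadcast step per threshold kind; matching PEs latch the value."""
    a_regs = [None] * num_pes
    steps = 0
    for kind, value in enumerate(thresholds):
        for pe in range(num_pes):
            if pe % len(thresholds) == kind:
                a_regs[pe] = value
        steps += 1
    return a_regs, steps


def load_with_parameter_buses(thresholds, num_pes):
    """All thresholds are on the buses at once; each PE picks its own."""
    a_regs = [thresholds[pe % len(thresholds)] for pe in range(num_pes)]
    return a_regs, 1
```

Both functions leave the same thresholds in the A registers; only the step count differs (four steps versus one for four thresholds).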

[0062] <Second Embodiment> FIG. 7 shows the configuration of a SIMD type microprocessor 2 according to a second embodiment of the present invention. It is broadly similar to the configuration of the SIMD type microprocessor 2 of the first embodiment (FIG. 5).

[0063] In the SIMD type microprocessor 2 of the first embodiment, the selection signal for the four buffer circuits 66 is the output of the PE number generation circuit 60. In the SIMD type microprocessor 2 of the second embodiment, by contrast, two bits of the condition register 54, for example (T1, T2), provide the selection signal. The source that supplies (generates) the selection signal may be any register that can hold an arbitrary value for each PE 3, and is not limited to the condition register 54. For example, it may be the A register 38 or the F register 40.

[0064] A procedure for performing dither-method binarization using the SIMD type microprocessor 2 of the second embodiment will now be described.

[0065] In each PE 3, the PE number generation circuit 60 generates the PE number, which is loaded into the A register 38. Next, the data loaded into the A register 38 is ANDed with "0x3"; that is, all bits of the data in the A register 38 other than the lower 2 bits are replaced with "0b". The result is then transferred to an arbitrary PE register (for example, the R2 register).

[0066] The data transferred to the R2 register is shifted left (toward the upper bits) by 1 bit and loaded into the condition register 54. As a result, in each PE 3, the two bits (T1, T2) immediately above the least significant bit of the condition register 54 are set to one of four values (repeating in order from the end of the array of PEs 3).
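The register sequence of paragraphs [0065] and [0066] can be traced with the following Python sketch. It assumes that (T1, T2) together occupy bits 1 and 2 of the condition register and are read as one 2-bit field; the function names are hypothetical, and the ordering of T1 and T2 within that field is not modelled.

```python
def condition_register_value(pe_number):
    """Follow the steps: load PE number, AND with 0x3, move to R2, shift left."""
    a_reg = pe_number      # PE number loaded into the A register
    a_reg &= 0x3           # keep only the lower 2 bits
    r2 = a_reg             # transfer to the R2 register
    return r2 << 1         # 1-bit left shift, then load into condition register


def select_from_condition(cond):
    """Read the 2-bit field just above the least significant bit."""
    return (cond >> 1) & 0x3
```

Applied to PEs 0 through 5, the extracted selection values are 0, 1, 2, 3, 0, 1, the same pattern that the first embodiment obtains directly from the PE number generation circuit.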

[0067] Next, by an instruction to the processor 2, four thresholds are transferred simultaneously to each PE 3 via the four parameter buses 62. In each PE 3, the buffer circuits 66 decode the selection signal supplied from the condition register 54, so that the data on one of the four parameter buses 62 is selected. The data on the selected parameter bus 62 (that is, the threshold) is output to the PE internal bus 70.

[0068] The data output to the PE internal bus 70 is input to the ALU 36 via the multiplexer 42 and the shifter 44, and is stored in the A register 38.

[0069] In this way, the appropriate threshold is selected from the four thresholds and stored in the A register 38. Subsequent processing is the same as described above.

[0070] The example procedure given above for the second embodiment is substantially the same as the example procedure described for the first embodiment. However, with the SIMD type microprocessor 2 of the second embodiment, the repeating pattern used to classify the PEs 3 can be generated more freely. In the configuration of the SIMD type microprocessor 2 of the first embodiment, the PE number generation circuit 60 is expected to offer few variations of repeating pattern. That is, because the PE number generation circuit 60 has the simple circuit configuration described earlier, the repeating patterns it can generate are limited to those whose period is a power of two, such as:
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, ...
0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, ...
0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, ...
By contrast, the configuration of the SIMD type microprocessor 2 according to the second embodiment can also handle dither-method binarization using a 3×3 dither matrix or a 6×6 dither matrix. Furthermore, a pattern such as
0, 1, 2, 3, 3, 2, 1, 0, 0, 1, 2, 3, 3, 2, 1, ...
can also be generated.
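The kinds of pattern that become available when the selection value is computed per PE, rather than taken from the low bits of the PE number, can be sketched in Python as follows. The function names `period_pattern` and `mirror_pattern` are assumptions for this example; on the actual processor the values would be produced by PE arithmetic and written to a register such as the condition register 54.

```python
def period_pattern(num_pes, period):
    """Repeating pattern with an arbitrary period, e.g. 3 for a 3x3 matrix."""
    return [pe % period for pe in range(num_pes)]


def mirror_pattern(num_pes, width=4):
    """Pattern that counts up to width-1 and then back down."""
    out = []
    for pe in range(num_pes):
        phase = pe % (2 * width)
        out.append(phase if phase < width else 2 * width - 1 - phase)
    return out
```

A period of 3 is not a power of two, so it cannot be obtained by masking the PE number, whereas `mirror_pattern(15)` reproduces the up-and-down sequence given in the text.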

[0071] <Third Embodiment> As described in <Binarization by the Dither Method Using the Basic Configuration>, after the threshold is loaded in each PE 3, the image data is compared with the threshold. In PEs 3 where the image data is greater than or equal to the threshold, the operation result data (image data) is converted to "0xff"; in PEs 3 where it is less than the threshold, it is converted to "0x00".

[0072] This is described in more detail. For example, first, by an instruction to the processor 2, the image data is compared with the threshold. PEs 3 in which the image data is greater than or equal to the threshold set "T1 = 1" (in the T1 bit of the condition register 54), and PEs 3 in which the image data is less than the threshold set "T1 = 0". Next, by an instruction to the processor 2, the data "0xff" is loaded in the PEs 3 with "T1 = 1", and then, by a further instruction to the processor 2, the data "0x00" is loaded in the PEs 3 with "T1 = 0".
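The two-step sequence described above can be modelled in Python as follows. This is an illustrative sketch in which each list element stands for one PE; the function name `binarize_two_step` and the step counter are introduced for the example and are not part of the disclosure.

```python
def binarize_two_step(pixels, thresholds):
    """Compare, then perform two conditional loads (0xff, then 0x00)."""
    # Compare step: T1 = 1 where the image data is >= the threshold.
    t1 = [1 if p >= t else 0 for p, t in zip(pixels, thresholds)]
    a_regs = list(pixels)
    load_steps = 0
    # Each conditional load touches only the PEs whose T1 matches `flag`.
    for flag, value in ((1, 0xFF), (0, 0x00)):
        a_regs = [value if t == flag else a for a, t in zip(a_regs, t1)]
        load_steps += 1
    return a_regs, load_steps
```

The counter makes explicit that two load instructions are issued regardless of how many PEs fall on each side of the threshold.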

[0073] As described above, loading the data "0xff" and then loading the data "0x00" requires two instruction steps. The SIMD type microprocessor 2 according to the third embodiment of the present invention performs the same processing with a single instruction step.

[0074] The configuration of the SIMD type microprocessor 2 according to the third embodiment is substantially the same as that of the SIMD type microprocessor 2 according to the second embodiment. The SIMD type microprocessor 2 according to the third embodiment is configured so that it can be operated by the load instruction code 84 whose mapping is shown in FIG. 8.

[0075] The load instruction code 84 (an example) of FIG. 8 carries two load values (immediate values) to be loaded into, for example, the A register 38. An ordinary load instruction code in the prior art carries only one immediate value. For example, the immediate value is loaded into the A register 38 only in those PEs 3 for which the value stored in a predetermined bit of the condition register 54 satisfies a predetermined condition, so only one immediate value is needed.

[0076] In the SIMD type microprocessor 2 according to the present embodiment, the load instruction code 84 of FIG. 8 causes two immediate values to be output to the first parameter bus 62-0 and the second parameter bus 62-1 of FIG. 7, respectively. That is, "immediate 0" 80 (FIG. 8) is output to the first parameter bus 62-0, and "immediate 1" 82 (FIG. 8) is output to the second parameter bus 62-1. The data output to the parameter buses 62 (62-0, 62-1) is selected by a multiplexer (not shown) in the register file control circuit 56. Depending on the type of load instruction, this multiplexer selects either a value stored in the data RAM 12 or an immediate value as described above.

[0077] In each PE 3, the result of the comparison between the image data and the threshold is stored in "T1" of the condition register 54, as described above. The data stored in "T1" is supplied to the buffer circuits 66 as the selection signal. This selection signal selects, for each PE 3, either the first parameter bus 62-0 or the second parameter bus 62-1. The data on the selected parameter bus 62 is output to the PE internal bus 70. The data output to the PE internal bus 70 is input to the ALU 36 via the multiplexer 42 and the shifter 44, and is stored in the A register 38.

[0078] Consequently, when "T1 = 0" the first parameter bus 62-0 is selected, and when "T1 = 1" the second parameter bus 62-1 is selected. Therefore, if the data "0x00" is specified as the data output to the first parameter bus 62-0 (immediate 0) and the data "0xff" is specified as the data output to the second parameter bus 62-1 (immediate 1), binarization can be performed with a single instruction step.
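The single-step load with two immediate values can be sketched in Python as follows. This is a behavioural illustration only: the function name `binarize_one_step` and the parameter names `imm0` and `imm1` are assumptions, and each list element stands for one PE whose T1 bit chooses between the two parameter buses.

```python
def binarize_one_step(pixels, thresholds, imm0=0x00, imm1=0xFF):
    """Each PE loads imm0 or imm1 according to its T1 bit, in one step."""
    # T1 = 1 where the image data is >= the threshold (set by the compare).
    t1 = [1 if p >= t else 0 for p, t in zip(pixels, thresholds)]
    # Immediate 0 is on the first parameter bus, immediate 1 on the second.
    parameter_buses = (imm0, imm1)
    return [parameter_buses[bit] for bit in t1]
```

Because both immediates are present on the buses at the same time, every PE obtains its result from the same instruction, and other immediate values can be substituted without changing the procedure.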

[0079] In the load instruction code shown in FIG. 8, values other than "0x00" and "0xff" can of course be specified as the immediate values (immediate 0, immediate 1).

[0080]

[Effects of the Invention] The following effects can be obtained by using the SIMD type microprocessor 2 according to the present invention.

[0081] By using the SIMD type microprocessor 2 according to the first embodiment, the loading of dither matrix thresholds in the dither method, which conventionally required as many steps as there are kinds of threshold values, can be accomplished in a single step, and processing time is shortened.

[0082] By using the SIMD type microprocessor 2 according to the second embodiment, as in the first embodiment, the threshold loading that conventionally required as many steps as there are kinds of threshold values can be accomplished in a single step, and processing time is shortened. In addition, more complex repeating patterns of thresholds can be handled.

[0083] By using the SIMD type microprocessor 2 according to the third embodiment, the processing in which each PE selects one of two immediate values (described in a single instruction) and loads it can be accomplished in a single step.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a schematic configuration of a SIMD type microprocessor according to the present invention.

FIG. 2 is a block diagram showing a more detailed configuration of the SIMD type microprocessor according to the present invention.

FIG. 3 is a block diagram showing the basic configuration underlying the SIMD type microprocessor according to the present invention.

FIG. 4 is an example of a dither matrix for the dither method.

FIG. 5 is a block diagram showing a detailed configuration of the SIMD type microprocessor according to the first embodiment of the present invention.

FIG. 6 is an example of the configuration of the buffer circuits.

FIG. 7 is a block diagram showing a detailed configuration of the SIMD type microprocessor according to the second embodiment of the present invention.

FIG. 8 is a mapping diagram of the load instruction code according to the third embodiment of the present invention.

[Explanation of symbols]

2 ... SIMD type microprocessor, 3 ... processor element, 4 ... global processor, 6 ... register file, 8 ... operation array, 36 ... 16-bit ALU, 38 ... A register, 50 ... dither matrix, 53 ... immediate data bus, 54 ... condition register, 56 ... register file control circuit, 58 ... PE operation unit control circuit, 60 ... PE number generation circuit, 62-0 ... first parameter bus, 62-1 ... second parameter bus, 62-2 ... third parameter bus, 62-3 ... fourth parameter bus, 64-0 ... first lead, 64-1 ... second lead, 64-2 ... third lead, 64-3 ... fourth lead, 66-0 ... first buffer circuit, 66-1 ... second buffer circuit, 66-2 ... third buffer circuit, 66-3 ... fourth buffer circuit, 68 ... selection signal bus, 70 ... internal bus.

Continued from the front page: (51) Int.Cl.⁷ H04N 1/405; FI H04N 1/40 C

Claims

[Claims]

1. A SIMD type microprocessor including one global processor and a plurality of processor elements, wherein a plurality of data buses are provided from the global processor to each processor element; each processor element generates a selection signal specifying which data bus is to be selected from the plurality of data buses; and a signal transferred from the global processor via the data bus selected by the selection signal is stored in a predetermined register of each processor element.

2. The SIMD type microprocessor according to claim 1, wherein serial numbers are sequentially assigned to the processor elements, and in each processor element, a signal formed by setting a predetermined number of upper bits of its own serial number, expressed in binary, to "0" is used as the selection signal.

3. The SIMD type microprocessor according to claim 1, wherein operation result data in each processor element, or data derived from the operation result, is stored in a predetermined register in each processor element, and a signal extracted from said register is used as the selection signal.

4. The SIMD type microprocessor according to claim 3, operated by an instruction code including two or more immediate values, wherein the plurality of immediate values are transmitted to the plurality of data buses.
4. An SIMD microprocessor operated by an instruction code including two or more immediate values, wherein the plurality of immediate values are transmitted to the plurality of data buses. 4. The SIMD type microprocessor according to 3.