JP2013161325A

JP2013161325A - SIMD (single instruction-stream multiple data-stream) type microprocessor, processor system, and data processing method for SIMD type microprocessor

Info

Publication number: JP2013161325A
Application number: JP2012023805A
Authority: JP
Inventors: Kazuhiko Iwanaga; 和彦岩永
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2012-02-07
Filing date: 2012-02-07
Publication date: 2013-08-19

Abstract

PROBLEM TO BE SOLVED: To provide a SIMD type microprocessor, a processor system, and a data processing method for a SIMD type microprocessor that enable processing in which each PE (processor element) refers to data at a different address, such as rotation processing, to be performed at high speed with only a small amount of additional circuitry. SOLUTION: A SIMD type microprocessor 2 includes a PE-RAM 20 that inputs and outputs data only to and from a register file 14 of a processor element group 6, and a DMA device 22 that controls data transfer between the PE-RAM 20 and the register file 14. The DMA device 22 sequentially reads, in order of PE number, the independent address values stored in the R31 register of each PE 44, and sequentially writes the data stored in the PE-RAM 20 back to the R31 registers.

Description

The present invention relates to a SIMD (single instruction-stream multiple data-stream) type microprocessor that processes a plurality of data items, such as image data, in parallel with a single arithmetic instruction, to a processor system having the SIMD type microprocessor, and to a data processing method for the SIMD type microprocessor.

A SIMD type microprocessor can execute the same arithmetic operation on a plurality of data items simultaneously with a single instruction. Because of this structure, it is used in applications, such as image processing, that apply the same operation to a large amount of data.

In a SIMD type microprocessor, ordinary arithmetic processing is realized by arranging a plurality of arithmetic units side by side and executing the same operation on a plurality of data items at the same time. Non-linear processing that cannot be expressed as a formula, however, cannot be executed simultaneously in this way, because the arithmetic expression changes depending on the data being operated on.

One example of this kind of non-linear image processing is rotation. When a document is placed askew on the contact glass of a digital copier or the like, rotation obtains a proper image by rotating the scanned image by the angle by which it is skewed relative to the horizontal.

Patent Document 1 describes realizing rotation processing by calculating, for the original data before rotation, the amount of displacement in the sub-scanning direction according to the rotation angle, sampling the rotated data accordingly, and performing bilinear or bicubic interpolation.
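As a rough illustration of the sampling step summarized above, the following Python sketch computes a per-column displacement in the sub-scanning direction and blends the two neighbouring source rows linearly. It is a simplified assumption and not the method of Patent Document 1: the names sub_scan_offset and sample_column, the x * tan(angle) displacement model, and the clamping at the image edge are all hypothetical.

```python
import math


def sub_scan_offset(x, angle_deg):
    """Displacement in the sub-scanning direction for column x (assumed model)."""
    return x * math.tan(math.radians(angle_deg))


def sample_column(image, x, y, angle_deg):
    """Blend the two source rows that straddle the displaced position."""
    pos = y + sub_scan_offset(x, angle_deg)
    row = math.floor(pos)
    frac = pos - row
    last = len(image) - 1
    r0 = min(max(row, 0), last)      # clamp at the image edge (assumption)
    r1 = min(max(row + 1, 0), last)
    return (1.0 - frac) * image[r0][x] + frac * image[r1][x]
```

Because the displacement depends on the column, each column reads a different source row. On a SIMD machine with one PE per column, this means each PE needs its own memory address.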

In Patent Documents 2 and 3, each processor element (PE) of the SIMD type microprocessor has its own SRAM (Static Random Access Memory), so each PE can refer to data by using the sub-scanning displacement it has calculated as an address.

Patent Document 4 describes a configuration that performs non-linear processing, typified by a LUT (look-up table), at high speed without giving each PE an independent SRAM.

To carry out the invention described in Patent Document 1 on a SIMD type microprocessor, the only option is to read the data stored in a memory accessible from the PEs sequentially, in order of PE number, while incrementing the address. Executing this with instructions takes an enormous number of cycles.

In the inventions described in Patent Documents 2 and 3, the SRAM built into every PE has its own address decoder, so in a SIMD type microprocessor with many PEs the circuit scale, and therefore the cost, increases.

Patent Document 4 can be applied only to processing in which every PE uses the same reference memory, such as gamma correction.

An object of the present invention is to solve these problems.

That is, an object of the present invention is to provide a SIMD type microprocessor, a processor system, and a data processing method for a SIMD type microprocessor that can perform, at high speed and with only a small amount of additional circuitry, processing in which each PE refers to data at a different address, such as rotation processing.

To solve the problems described above, the invention according to claim 1 is a SIMD type microprocessor having a plurality of processor elements, each including a plurality of registers that can be read or written by a predetermined address, the SIMD type microprocessor comprising: a memory that can input and output data only to and from the processor elements and has a data bit width that allows at least two of the processor elements to access it simultaneously; data transfer means that sequentially reads, for each processor element, the value stored in one of the plurality of registers and outputs the read value as an address of the memory; and write control means that performs control to store the data read from the memory in the register of at least the processor element from which the address corresponding to that data was read.

According to the invention of claim 1, the value stored in one of the plurality of registers is read for each processor element; the read value is used as the address of a memory that can input and output data only to and from the processor elements and has a data bit width that allows at least two of the processor elements to access it simultaneously; and the data read from the memory is stored in the register of at least the processor element from which the corresponding address was read. Processing in which each processor element refers to data at a different address, such as rotation processing, can therefore be performed at high speed with only a small amount of additional circuitry, without giving each processor element a memory with its own address decoder.

FIG. 1 is a configuration diagram of a processor system according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing a more detailed configuration of the SIMD type microprocessor, in particular the processor element group.
FIG. 3 is a configuration diagram of the main part of the SIMD type microprocessor shown in FIG. 1.
FIG. 4 is an explanatory diagram showing address setting values in the R31 registers shown in FIG. 3.
FIG. 5 is a timing chart showing a data read operation from the PE-RAM of the SIMD type microprocessor shown in FIG. 3.
FIG. 6 is a flowchart showing a data read operation from the PE-RAM of the SIMD type microprocessor shown in FIG. 3.
FIG. 7 is a configuration diagram of the main part of a SIMD type microprocessor according to a second embodiment of the present invention.
FIG. 8 is a configuration diagram showing the configuration of the decoder shown in FIG. 7.
FIG. 9 is an explanatory diagram showing the decoding conditions of the write decoder shown in FIG. 8.
FIG. 10 is an explanatory diagram showing an example of address setting values stored in the R30 registers of the SIMD type microprocessor shown in FIG. 7 and the data obtained by applying data shaping to that example.
FIG. 11 is a timing chart showing a data read operation from the PE-RAM of the SIMD type microprocessor shown in FIG. 7.
FIG. 12 is a configuration diagram of the main part of a SIMD type microprocessor according to a third embodiment of the present invention.

(First Embodiment)
A first embodiment of the present invention will be described below with reference to FIGS. 1 to 6. FIG. 1 is a configuration diagram of a processor system according to the first embodiment of the present invention. FIG. 2 is a block diagram showing a more detailed configuration of the SIMD type microprocessor, in particular the processor element group. FIG. 3 is a configuration diagram of the main part of the SIMD type microprocessor shown in FIG. 1. FIG. 4 is an explanatory diagram showing address setting values in the R31 registers shown in FIG. 3. FIG. 5 is a timing chart showing a data read operation from the PE-RAM of the SIMD type microprocessor shown in FIG. 3. FIG. 6 is a flowchart showing a data read operation from the PE-RAM of the SIMD type microprocessor shown in FIG. 3.

FIG. 1 shows a processor system 1 according to the first embodiment of the present invention. The processor system 1 includes a SIMD type microprocessor 2 according to the first embodiment of the present invention, a memory controller 10, and a memory 12.

The SIMD type microprocessor 2 includes a global processor 4, a processor element group 6, an external interface 8, and a DMA device 22.

The global processor (GP) 4 is a so-called SISD (Single Instruction Stream, Single Data Stream) type processor. It incorporates a program RAM and a data RAM, decodes the program, and generates various control signals. Besides controlling the built-in blocks of the GP 4, these control signals are supplied to the register file 14, the arithmetic array 16, the memory controller 10, and the DMA device 22, which are described later. When executing a GP instruction, that is, an instruction executed by the GP 4, the GP 4 performs various arithmetic processing and program control processing using its built-in general-purpose registers, ALU (arithmetic logic unit), and so on.

The processor element group 6 includes a register file 14, an arithmetic array 16, and a PE-RAM 20.

The register file 14 is a group (file) of registers that holds the data processed by processor element (PE) instructions. In other words, it is the collection of the register portions of the PEs (see FIG. 2). A PE instruction is a SIMD (Single Instruction Stream, Multiple Data Stream) type instruction that performs the same processing simultaneously on a plurality of data items held in the register file 14. Reading data from and writing data to the register file 14 is controlled by the GP 4. Data that has been read is sent to the arithmetic array 16, described later, and is written to the register file 14 after being processed there.

The register file 14 can also be accessed from the memory controller 10 outside the SIMD type microprocessor 2 via the external interface 8, so that specific registers are read and written from outside independently of control by the GP 4. In addition, data read from the PE-RAM 20 is written to some registers of the register file 14 under the control of the DMA device 22, described later.

The arithmetic array 16 is the block in which the arithmetic processing of PE instructions is performed. In other words, it is the collection of the arithmetic units (including the ALUs) of the PEs (see FIG. 2). All of its processing is controlled by the GP 4.

The external interface 8 serves as the input/output interface for data, control signals, and the like between the register file 14 and the memory controller 10.

The memory controller 10 outputs a clock, an address, and a read/write control signal to the external port 18, and outputs a clock, an address, and a read/write control signal to the single-port memory 12, whereby data is transferred between the register of any PE and the single-port memory 12. All of this processing is controlled by the GP 4.

FIG. 2 is a block diagram showing a more detailed configuration of the SIMD type microprocessor 2, in particular the processor element group 6.

The GP 4 includes a program RAM 23 that stores the program of the GP 4 and a data RAM 24 that stores operation data. It further includes a program counter (PC) 25 that holds the program address; G0 to G3 registers (26, 28, 30, 32), which are general-purpose registers for storing data for arithmetic processing; a stack pointer (SP) 34 that holds the address in the data RAM to which registers are saved during register save and restore; a link register (LS) 36 that holds the caller's address on a subroutine call; an LI register 38 and an LN register 40 that likewise hold the branch-source address on an interrupt (IRQ) and on a non-maskable interrupt (NMI), respectively; and a processor status register (P) 42 that holds the state of the processor.

GP instructions are executed using these registers together with an instruction decoder, an ALU, a memory control circuit, an interrupt control circuit, an external interface control circuit, and a GP arithmetic control circuit, none of which are shown. When a PE instruction is executed, the register file 14 and the arithmetic array 16 are controlled using the instruction decoder, a register file control circuit (not shown), and a PE arithmetic control circuit.

Next, the register file 14 contains thirty-two 8-bit registers 50 per PE, and the sets for 256 PEs form an array. (One PE is the portion enclosed by the dash-dot frame indicated by reference numeral 44, and consists of registers 46 and an arithmetic unit 48.) Each PE thus has 32 registers 50, namely registers R0, R1, R2, ..., R31. Each register 50 has one read port and one write port to the arithmetic unit 48 and is accessed from the arithmetic unit 48 over an 8-bit bus used for both reading and writing. Of the 32 registers 50, 24 can be accessed from outside the processor via the external interface 8; a clock, an address, and a read/write control signal are input from outside, and the designated register 50 is read or written. That is, the PE 44 includes a plurality of registers that can be read or written by a predetermined address.

For external access to the registers 50, one external port can access one register 50 of each PE, and the PE number (0 to 255) is designated by the address input from outside. (The PEs are numbered 0 to 255 in order from the left of FIG. 2; the PE number is also called the PE address.) Accordingly, a total of 24 external ports for register access are provided.

The arithmetic unit 48 includes a 7-to-1 multiplexer 58, a shift expansion circuit (shifter) 60, a 2-to-1 multiplexer (MPX) 61, a 16-bit ALU 52, an A register 54, and an F register 56.

The 7-to-1 multiplexer 58 is provided at the connection between the registers 50 and the arithmetic unit 48. It is arranged so that the operand can be selected from the data of the registers 50 located one, two, or three PEs toward the front (smaller PE numbers) in the PE array direction, the data of the registers 50 located one, two, or three PEs toward the rear (larger PE numbers), and the data of the register 50 at the center (that is, in the same PE). The 8-bit data of the register 50 can also be shifted left by an arbitrary number of bits by the shift expansion circuit 60 before being input to the ALU 52.
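The operand selection described above can be sketched in Python as follows. This is only a behavioural illustration under stated assumptions: the function names mux7_select and shift_expand are hypothetical, the value returned for a neighbour beyond either end of the PE array is assumed to be 0 because the text does not specify it, and the shifted result is assumed to be truncated to the 16-bit ALU width.

```python
def mux7_select(column, pe, offset):
    """Value presented to PE `pe` when the multiplexer selects `offset` (-3..3)."""
    if not -3 <= offset <= 3:
        raise ValueError("only seven inputs exist: offsets -3 to 3")
    src = pe + offset
    if not 0 <= src < len(column):
        return 0  # assumption: the text does not say what the end PEs see
    return column[src]


def shift_expand(value, shift):
    """Left-shift 8-bit register data into the 16-bit ALU input width."""
    return (value << shift) & 0xFFFF
```

Here `column` stands for the same register (for example R5) taken across all PEs, so mux7_select(column, 10, -3) returns the value held by PE 7.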

In an operation by a PE instruction, the 16-bit ALU 52 takes as one input the data read from the register 50, the data supplied from the GP 4 (the output of the 2-to-1 multiplexer 61), or the data read from the F register 56, takes the contents of the A register 54 as the other input, and stores the result in the A register 54. An operation is therefore performed between the A register 54 and the data from the R0 to R31 registers or from the GP 4. The F register 56, also called a temporary register, is used for purposes such as saving the operation result held in the A register 54, and its contents can also be used in operations by the 16-bit ALU 52. In addition, an 8-bit condition register (also called a T register, not shown) makes it possible to enable or disable execution of an operation for each PE 44, so that only specific PEs 44 can be selected as operation targets.
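A minimal Python sketch of this accumulate-into-A behaviour with per-PE enabling is given below. It is an assumption-laden illustration: the function name pe_add is hypothetical, addition stands in for whatever operation the PE instruction selects, results are assumed to wrap at 16 bits, and the 8-bit T register is reduced to a single enable flag per PE.

```python
def pe_add(a_regs, operands, enabled):
    """A <- (A + operand) mod 2**16 in enabled PEs; other PEs keep A unchanged."""
    return [
        (a + x) & 0xFFFF if en else a
        for a, x, en in zip(a_regs, operands, enabled)
    ]
```

For example, pe_add([1, 2, 0xFFFF], [10, 20, 1], [True, False, True]) leaves the second PE untouched and wraps the third.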

The PE-RAM 20 is a work memory for storing data such as intermediate results of arithmetic processing. As shown in FIG. 3, it can read or write, in a single access, the data held in the register file 14 (registers 46). That is, the PE-RAM 20 has a data bit width covering all PEs (8 bits × 256). Its address and its read and write operations are controlled by the DMA device 22. Control signals such as the address from the DMA device 22 are input to the address decoder 22a, and the data at the designated address is read or written. That is, the PE-RAM 20 can input and output data only to and from the PEs 44 and has a data bit width that allows at least two PEs 44 to access it simultaneously, so that data for a plurality of PEs 44 (all of them in this embodiment) is read with a single address.
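The wide organization of the PE-RAM can be pictured with the following Python sketch, in which every address holds one 8-bit value per PE. The class name PERam and its methods are hypothetical, and the depth parameter is an assumption, since the text gives the width (8 bits × 256) but not the number of addresses.

```python
class PERam:
    """Work memory whose every address holds one 8-bit value per PE."""

    def __init__(self, depth, num_pe=256):
        self.num_pe = num_pe
        self.rows = [[0] * num_pe for _ in range(depth)]

    def write(self, addr, lanes):
        """Store one value from every PE at `addr` in a single access."""
        if len(lanes) != self.num_pe:
            raise ValueError("one value per PE is required")
        self.rows[addr] = [v & 0xFF for v in lanes]

    def read(self, addr):
        """Return the values for all PEs stored at `addr`."""
        return list(self.rows[addr])
```

A single shared address decoder is enough for this structure, because one address always selects a whole row.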

The DMA device 22 includes a counter 22a and an FF 22b. The DMA device 22 accesses the PE-RAM 20 in response to a DMAStart signal from the GP 4. Multiplexers (MPX) 22c, 22d, and 22e are provided between the PE-RAM 20 on one side and the GP 4 and the DMA device 22 on the other. The multiplexer 22c selects whether the address (A) for the PE-RAM 20 is the one output from the GP 4 or the one output from the DMA device 22. The multiplexer 22d makes the same selection for the read/write control signal (R/W) for the PE-RAM 20, and the multiplexer 22e makes it for the chip select (CS) for the PE-RAM 20.

The multiplexers (22c, 22d, 22e) are switched by the DMA signal (see FIG. 5) output from the DMA device 22. The DMA signal goes to the Hi level when the DMAStart signal is input and to the Low level when the data transfer ends. While it is at the Hi level, each multiplexer supplies the signal output by the DMA device 22 to the PE-RAM 20.

The counter 22a, whose operation is described in detail later, holds the PE address and is incremented sequentially. That is, its initial value is set to "0" and it counts up to "255". The FF 22b is a register (flip-flop) that stores the data output from the R31 register of the register file 14 and input via the common data bus 51; after being stored in the FF 22b, this data is output to the multiplexer 22c as the address of the PE-RAM 20.

As shown in FIG. 3, the R31 register of the register file 46 of each PE 44 can be written with data read from the PE-RAM 20. For this purpose, a decoder 57 that controls reading and writing of the R31 register is provided for the R31 register of each PE 44. Each decoder 57 receives the PE designation address line 53, which carries the output of the counter 22a, and the R/W control signal line 55, which carries the R/W signal generated in the DMA device 22.

FIG. 4 shows, for the configuration of FIG. 3, that A0 is stored in the R31 register of the PE 44 assigned PE number 0 (referred to as PE(0)), A1 in the R31 register of PE(1), ..., and A255 in the R31 register of PE(255). In this embodiment, a different value is set in each R31 register in advance in this way. When the DMAStart signal is then input from the GP 4, as shown in the timing chart of FIG. 5, the counter 22a outputs its initial value "0" to the PE designation address line 53 and a value indicating a read is output to the R/W control signal line 55. The value (A0) of the R31 register of PE(0) is then read onto the common data bus 51.

In the next cycle (two clocks later in FIG. 5), the counter 22a is incremented and the value (A1) of the R31 register of PE(1) is read onto the common data bus 51. At the same time, the value (A0) of the R31 register of PE(0) read in the previous cycle is stored in the FF 22b and supplied to the PE-RAM 20 as its address, and the PE-RAM 20 outputs the data stored at address A0. That is, the DMA device 22 functions as data transfer means that sequentially reads the value stored in the R31 register of each PE 44 and outputs the read value as the address of the PE-RAM 20.

The data at address A0 output from the PE-RAM 20 is written to the R31 register of PE(0) at the end of that cycle. The decoder 57 contains a pipeline register; it delays the read address input from the PE designation address line 53 by the time taken to read data from the PE-RAM 20, generates a write enable signal from it, and supplies the signal to the R31 register. In this way, both read and write control can be performed with essentially only the PE designation address line 53. Instead of delaying the write enable signal in the decoder 57 of each PE 44, it is also possible to delay it only for PE(0) and, from PE(1) onward, to shift it from PE to PE cycle by cycle. That is, the decoder 57 functions as write control means that performs control to store the data read from the PE-RAM 20 in the R31 register of at least the PE 44 from which the address of that data was read.
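The overlap between reading the next PE's address and writing back the previous PE's data can be illustrated with a small cycle-by-cycle model in Python. It is a sketch under assumptions, not the circuit itself: the function name dma_gather_pipelined is hypothetical, one loop iteration stands for one "cycle" of the description (FIG. 5 is said to use two clocks per cycle), pe_ram is modelled as a list of rows with one value per PE, and each PE is assumed to latch only its own lane of the row that was read.

```python
def dma_gather_pipelined(r31, pe_ram):
    """Cycle-by-cycle model: returns the updated R31 values and the cycle count."""
    r31 = list(r31)
    ff = None          # address held in FF 22b
    delayed_pe = None  # PE number held in the decoder's pipeline register
    cycles = 0
    for counter in range(len(r31) + 1):
        cycles += 1
        # PE-RAM stage: the address latched in the previous cycle selects a row.
        pending = None
        if ff is not None:
            pending = (delayed_pe, pe_ram[ff][delayed_pe])
        # Read stage: the counter selects a PE and its R31 drives the common bus.
        if counter < len(r31):
            next_ff, next_pe = r31[counter], counter
        else:
            next_ff, next_pe = None, None  # drain cycle for the last PE
        # End of cycle: write back, then update FF and the pipeline register.
        if pending is not None:
            r31[pending[0]] = pending[1]
        ff, delayed_pe = next_ff, next_pe
    return r31, cycles
```

In this model the address of a PE is always read one cycle before its R31 register is overwritten, so using the same register for the address and the result causes no conflict, and N PEs are served in N + 1 cycles.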

By repeating the cycle described above, the address values A0 to A255, which are stored in the R31 registers and are independent for each PE 44, are read sequentially starting from PE(0), and the data stored in the PE-RAM 20 is written back sequentially to the R31 registers, in fewer cycles than before. Conventionally, doing this required GP 4 instructions to designate a PE 44, set the address of the designated PE 44, and read from the PE-RAM 20, which took many cycles.

The operation described above is summarized in the flowchart of FIG. 6. First, an address is set in the R31 register of each PE 44 using a PE instruction or the like (step S1). When the DMAStart signal is input from the GP 4, the DMA device 22 starts operating (step S2). The address value is read from the R31 register of the PE 44 designated by the PE designation address line 53 (step S3) and output to the PE-RAM 20, and the data at that address is read and stored in the R31 register of the corresponding PE 44 (step S4).

The value of the counter 22a is then checked (step S5). If it indicates the last PE number, the operation ends; otherwise, the counter 22a is incremented (step S6) and steps S3 and S4 are repeated for the next PE 44. Whether the value is the last PE number can be determined by, for example, a control circuit or the like (not shown) in the DMA device 22, by setting the number of PEs in the DMA device 22 in advance and comparing the counter with that setting.
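Ignoring the pipelining, the loop of steps S3 to S6 can be written as the following Python sketch. The function name run_dma and the num_pe parameter (standing for the PE count preset in the DMA device) are hypothetical, pe_ram is modelled as a list of rows with one value per PE, and r31 is assumed to have been filled with addresses beforehand (step S1).

```python
def run_dma(r31, pe_ram, num_pe):
    """Follow steps S3 to S6 for PEs 0..num_pe-1 and return the new R31 values."""
    r31 = list(r31)
    counter = 0                               # counter 22a starts at 0 (S2)
    while True:
        addr = r31[counter]                   # S3: read the address from R31
        r31[counter] = pe_ram[addr][counter]  # S4: store the data in the same PE
        if counter == num_pe - 1:             # S5: last PE number reached?
            return r31
        counter += 1                          # S6: move on to the next PE
```

Setting num_pe below the number of physical PEs leaves the remaining R31 registers untouched, which is one way to read the remark about presetting the PE count.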

According to this embodiment, the SIMD type microprocessor 2 is provided with the PE-RAM 20, which inputs and outputs data only to and from the register file 14 of the processor element group 6, and the DMA device 22, which controls data transfer between the PE-RAM 20 and the register file 14. The DMA device 22 sequentially reads, in order of PE number, the independent address values stored in the R31 register of each PE 44 and sequentially writes the data stored in the PE-RAM 20 back to the R31 registers. Processing in which each PE 44 refers to data at a different address, such as rotation processing, can therefore be performed at high speed with only a small amount of additional circuitry.

In addition, while the DMA device 22 is transferring data, the GP 4 and each PE 44 can perform other processing that does not involve the PE-RAM 20, so the processing efficiency of the SIMD type microprocessor 2 can be improved.

(Second Embodiment)
Next, a second embodiment of the present invention will be described with reference to FIGS. 7 to 11. Parts identical to those of the first embodiment described above are denoted by the same reference numerals, and their description is omitted. FIG. 7 is a configuration diagram of the main part of a SIMD type microprocessor according to the second embodiment of the present invention. FIG. 8 is a configuration diagram of the decoder shown in FIG. 7. FIG. 9 is an explanatory diagram showing the decoding conditions of the write decoder shown in FIG. 8. FIG. 10 is an explanatory diagram showing an example of address setting values stored in the R30 registers of the SIMD type microprocessor shown in FIG. 7, together with the data obtained by applying data shaping to that example. FIG. 11 is a timing chart showing the operation of reading data from the PE-RAM of the SIMD type microprocessor shown in FIG. 7.

As shown in FIG. 7, this embodiment differs from the first embodiment in the configuration of the DMA device 22 and the configuration of the decoder 57. In addition, in this embodiment, the address value is set in the R30 register of the register file 14, and the data read from the PE-RAM 20 is stored in the R31 register.

The DMA device 22 is provided with an adder 22g in place of the counter 22a. The adder 22g is a circuit that has an addition function in addition to the increment function of the counter 22a. The DMA device 22 is also provided with a decoder 22f. The decoder 22f outputs an operation control signal (increment or addition, and the value to be added) to the adder 22g according to the value that was read from the R30 register and stored in the FF 22b.

As shown in FIG. 8, the decoder 57 (57') of this embodiment includes a read decoder (Read-DEC) 57a, a write decoder (Write-DEC) 57b, an FF 57c, and an AND circuit 57d.

As in the first embodiment, when the address value input from the PE designation address line 53 (the PEA terminal of the DMA device 22) indicates its own PE number, the read decoder 57a asserts the Read decode signal (read enable signal) and outputs it to the R30 register.

The output of the write decoder 57b is true when its own PE number is greater than or equal to the value input from the PE designation address line 53, and false otherwise. FIG. 9 shows the condition for each PE 44. In FIG. 9, "==" means that the left side and the right side are equal, and "<=" means that the right side is greater than or equal to the left side. In other words, the output becomes active (Hi) when the condition that the PE's own number is greater than or equal to the value input from the PE designation address line 53 is satisfied.

The FF 57c is a pipeline register for delaying the output of the write decoder 57b. Although FIG. 8 shows a single stage, the number of stages may be adjusted as appropriate to match the read timing of the PE-RAM 20.

The AND circuit 57d computes the logical product of the output of the FF 57c, the R/W control signal line 55, and the most significant bit (MSB) of the FF 22b. The R/W control signal line 55 and the MSB of the FF 22b are each inverted before the operation. That is, when the FF 57c is at Hi level, the R/W control signal line 55 is at Low level, and the MSB of the FF 22b is at Low level, the Write decode signal (write enable signal) is asserted and output to the R31 register.
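The decode conditions of the write decoder 57b and the AND circuit 57d can be written as a small truth function. This is a sketch under stated assumptions: the function name `write_enable` and its parameter names are hypothetical, signal levels are modelled as 0 (Low) and 1 (Hi), and the delay of the FF 57c is represented simply by passing in the already delayed PE designation value.

```python
def write_enable(own_pe, delayed_pea, rw_line, ff22b_msb):
    """Return True when the Write decode signal to R31 would be asserted.

    own_pe      : PE number of this processor element
    delayed_pea : PE designation value as seen after the FF 57c delay
    rw_line     : level of R/W control signal line 55 (0 = Low, 1 = Hi)
    ff22b_msb   : most significant bit of FF 22b (0 or 1)
    """
    ff57c = own_pe >= delayed_pea   # write decoder 57b condition, delayed by FF 57c
    # AND circuit 57d: the R/W line and the MSB are inverted before the AND
    return bool(ff57c and not rw_line and not ff22b_msb)


if __name__ == "__main__":
    for pe in range(4):
        print(pe, write_enable(pe, delayed_pea=2, rw_line=0, ff22b_msb=0))
```

With a designation value of 2, only PEs numbered 2 and above are enabled, which is what allows one PE-RAM read to be written to a run of PEs at once.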

In this embodiment, when the same address occurs consecutively, in read order, among the addresses stored in the R30 registers, the data is written with a single access to the R31 registers of all the PEs 44 holding that address. That is, when the same address continues among the addresses stored in the R30 registers, control is performed so that, when the address is first output as the PE-RAM address, the data is written to the R31 registers of the PEs 44 indicating that address, and subsequent writes to the PEs 44 corresponding to the same value are suppressed. For example, when address values such as those shown in FIG. 10(a) are stored in the R30 registers of the PEs 44, the following ten instructions are first executed on the processor element group 6.

(1) If the value of the R30 register is the same as that of the PE 44 whose PE number is one lower, set the temporary register (F register 56) to "81h".
(2) If the condition of (1) is true and the R30 register of the PE whose PE number is one higher holds the same value, add "1" to the temporary register.
(3) If the condition of (1) is true and the R30 register of the PE whose PE number is two higher holds the same value, add "1" to the temporary register.
(4) If the condition of (1) is true and the R30 register of the PE whose PE number is three higher holds the same value, add "1" to the temporary register.
(5) If the condition of (1) is true and the R30 register of the PE whose PE number is four higher holds the same value, add "1" to the temporary register.
(6) If the condition of (1) is true and the R30 register of the PE whose PE number is five higher holds the same value, add "1" to the temporary register.
(7) If the condition of (1) is true and the R30 register of the PE whose PE number is six higher holds the same value, add "1" to the temporary register.
(8) If the condition of (1) is true and the R30 register of the PE whose PE number is seven higher holds the same value, add "1" to the temporary register.
(9) If the condition of (1) is false, set the temporary register to the original data value.
(10) Transfer the value of the temporary register to the R30 register.
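The ten instructions above can be sketched as a sequential Python function that produces the shaped R30 values. The function name `shape_addresses`, the `lookahead` parameter, and the treatment of missing neighbours at the ends of the PE array as non-matching are assumptions for illustration. The sample addresses are the ones that can be inferred from the description of FIG. 10, with A0, A2, ... written as the integers 0, 2, and so on.

```python
def shape_addresses(r30, lookahead=7):
    """Return the shaped R30 values: a 7-bit address (MSB 0) for the first PE
    of a run, or 0x80 plus a continuation count (MSB 1) for the other PEs."""
    n = len(r30)
    shaped = []
    for pe in range(n):
        if pe > 0 and r30[pe] == r30[pe - 1]:        # instruction (1)
            temp = 0x81
            for k in range(1, lookahead + 1):        # instructions (2) to (8)
                if pe + k < n and r30[pe + k] == r30[pe]:
                    temp += 1
        else:                                        # instruction (9)
            temp = r30[pe]
        shaped.append(temp)                          # instruction (10)
    return shaped


if __name__ == "__main__":
    addresses = [0, 0, 2, 3, 3, 3, 6, 6, 6, 6, 10]
    print([hex(v) for v in shape_addresses(addresses)])
```

In the real processor all PEs execute each instruction in parallel; the loop over `pe` here only serializes that for readability.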

By executing the above ten instructions in sequence as PE instructions in each PE 44, the data of FIG. 10(a) is shaped into the data shown in FIG. 10(b). The data shaping column of FIG. 10(b) lists, in order, the MSB and then either the original data value or the value accumulated in the temporary register by the instructions. As a result of this operation, in this embodiment the MSB of the 8-bit register serves as a code that identifies whether the content is an address or a continuation count, and the remaining 7 bits serve as the field in which the address or the continuation count is stored.

For example, for PE(0), the conditions of instructions (1) to (8) are false and those of (9) and (10) are true, so the R30 register is set to the original data value A0 (the MSB is "0" and the remaining 7 bits are "A0"). For PE(1), the conditions of instructions (2) to (9) are false and those of (1) and (10) are true, so the R30 register is set to "81h" (the MSB is "1" and the remaining 7 bits are "1"). In other words, PE(0) stores an address and PE(1) stores a continuation count. Although ten instructions are used above, the range that is referenced must be one that a PE 44 can reference with a single instruction, so the number of instructions may vary depending on the configuration of the processor element group 6.

FIG. 11 shows a timing chart of the operation of this embodiment. The R30 registers hold the values shaped by the above ten instructions. When the DMAStart signal is subsequently input from the GP 4, the adder 22g outputs the initial value "0" to the PE designation address line 53, and a value indicating read is output on the R/W control signal line 55. The value (0, A0) of the R30 register of PE(0) is then read onto the common data bus 51 (cycle T1).

In the next cycle T2 (two clocks later in FIG. 11), the adder 22g is incremented and the value (1, 1) of the R30 register of PE(1) is read onto the common data bus 51. At the same time, the value (0, A0) of the R30 register of PE(0) read in the previous cycle is stored in the FF 22b and supplied to the PE-RAM 20 as an address, and the PE-RAM 20 outputs the data stored at address A0. In this cycle, the R/W control signal line 55 is switched to a value indicating write.

The data at address A0 output from the PE-RAM 20 is written to the R31 registers of PE(0) to PE(255) at the end of that cycle. As described above, the output of the write decoder 57b is true when its own PE number is greater than or equal to the value input from the PE designation address line 53, so when the PE designation address line 53 is "0", the condition is satisfied in the write decoders 57b of PE(0) to PE(255). The write enable signal is therefore asserted for the R31 registers of PE(0) to PE(255), and the write is performed.

In the next cycle T3, the adder 22g is incremented, the value (0, A2) of the R30 register of PE(2) is read onto the common data bus 51, and the value (1, 1) of the R30 register of PE(1) read in the previous cycle is stored in the FF 22b. Although this value (1, 1) is supplied to the PE-RAM 20 as an address, the configuration of the decoder 57' shown in FIG. 8 is such that the AND circuit 57d does not assert the write enable signal when the MSB of the FF 22b is "1". The data read from the PE-RAM 20 is therefore not written to the R31 register of any PE 44.

In the next cycle T4, the adder 22g is incremented and the value (0, A3) of the R30 register of PE(3) is read onto the common data bus 51. At the same time, the value (0, A2) of the R30 register of PE(2) read in the previous cycle is stored in the FF 22b and supplied to the PE-RAM 20 as an address, and the PE-RAM 20 outputs the data at address A2. As when the data at address A0 was written, the data read from the PE-RAM 20 is written to the R31 registers of PE(2) to PE(255) at the end of the cycle.

In the next cycle T5, the adder 22g is incremented and the value (1, 2) of the R30 register of PE(4) is read onto the common data bus 51. At the same time, the value (0, A3) of the R30 register of PE(3) read in the previous cycle is stored in the FF 22b and supplied to the PE-RAM 20 as an address, and the PE-RAM 20 outputs the data at address A3. As when the data at addresses A0 and A2 was written, the data read from the PE-RAM 20 is written to the R31 registers of PE(3) to PE(255) at the end of the cycle.

In the next cycle T6, the adder 22g is incremented, the value (1, 1) of the R30 register of PE(5) is read onto the common data bus 51, and the value (1, 2) of the R30 register of PE(4) read in the previous cycle is stored in the FF 22b. As in cycle T3, this value (1, 2) is supplied to the PE-RAM 20 as an address, but the decoder 57' shown in FIG. 8 does not assert the write enable signal, so the data read from the PE-RAM 20 is not written to the R31 register of any PE 44. The same applies to the next cycle T7.

In the next cycle T8, the adder 22g is incremented and the value (1, 3) of the R30 register of PE(7) is read onto the common data bus 51. At the same time, the value (0, A6) of the R30 register of PE(6) read in the previous cycle is stored in the FF 22b and supplied to the PE-RAM 20 as an address, and the PE-RAM 20 outputs the data at address A6. As when the data at addresses A0, A2, and A3 was written, the data read from the PE-RAM 20 is written to the R31 registers of PE(6) to PE(255) at the end of the cycle.

In the next cycle T9, the adder 22g is incremented, the value (1, 2) of the R30 register of PE(8) is read onto the common data bus 51, and the value (1, 3) of the R30 register of PE(7) read in the previous cycle is stored in the FF 22b. As in cycles T3, T6, and T7, this value (1, 3) is supplied to the PE-RAM 20 as an address, but the decoder 57' shown in FIG. 8 does not assert the write enable signal, so the data read from the PE-RAM 20 is not written to the R31 register of any PE 44.

Also in cycle T9, because the MSB of the FF 22b is "1" and the value of the lower 7 bits of the FF 22b is 3 or more, the value obtained by subtracting "1" from the lower 7 bits is added in the adder 22g. That is, the decoder 22f controls the adder 22g so that "2", obtained by subtracting "1" from the lower 7-bit value "3", is added. The adder 22g performs the increment operation in all cases other than when the MSB of the FF 22b is "1" and the value of the lower 7 bits of the FF 22b is 3 or more.
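The control that the decoder 22f applies to the adder 22g can be expressed as a short function. This is an illustrative sketch: the function name `next_pe_number` and the representation of the FF 22b contents as an 8-bit integer (MSB as the flag, lower 7 bits as the count or address) are assumptions consistent with the register layout described for this embodiment.

```python
def next_pe_number(current_pe, ff22b):
    """Return the next value of adder 22g given the 8-bit content of FF 22b."""
    msb = (ff22b >> 7) & 1      # 1: continuation count, 0: address
    low7 = ff22b & 0x7F         # lower 7 bits
    if msb == 1 and low7 >= 3:
        return current_pe + (low7 - 1)   # addition: skip PE numbers
    return current_pe + 1                # ordinary increment


if __name__ == "__main__":
    # Cycle T9 of the walk-through: adder at 8, FF 22b holds (1, 3)
    print(next_pe_number(8, 0x83))
```

For the cycle T9 case the function returns 10, matching the skip of PE(9) described for cycle T10.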

In the next cycle T10, the adder 22g holds the value computed in the previous cycle, so the value (0, A10) of PE(10) is read onto the common data bus 51. That is, one PE number is skipped in the transition from cycle T9 to cycle T10. As shown in FIG. 10(a), the skipped PE(9) has the address "A6", and the value read from address A6 was already written to the R31 register of PE(9) in cycle T8, so skipping it causes no problem. In short, the PE numbers from which the address stored in the R30 register is read are skipped according to the number of times the same address continues among the addresses stored in the R30 registers.

As a result of the operation described above, the data at address A0 is written to the R31 registers of PE(0) and PE(1), the data at address A2 is written to the R31 register of PE(2), the data at address A3 is written to the R31 registers of PE(3) to PE(5), and the data at address A6 is written to the R31 registers of PE(6) to PE(9). The data at the addresses shown in FIG. 10(a) is thus written to each PE 44.
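The cycle-by-cycle walk-through above can be replayed with a small simulation. It is a sketch under several assumptions: the function name `dma_transfer` is hypothetical, one loop iteration stands for one cycle T, the FF 22b is modelled as a (source PE, value) pair so that the delayed write decoder condition can be evaluated, and the PE-RAM 20 is a plain list. It applies the rules literally as stated and has been checked only against the example data of the walk-through; how the one-cycle lag between the bus and the FF 22b interacts with longer runs is not modelled beyond what the text describes.

```python
def dma_transfer(shaped_r30, pe_ram):
    """Replay the second-embodiment transfer; return (R31 contents, PEs read)."""
    n = len(shaped_r30)
    r31 = [None] * n
    visited = []      # PE numbers driven onto PE designation address line 53
    adder = 0         # adder 22g, initial value 0
    ff22b = None      # (source PE, value) latched from the previous cycle
    while adder < n or ff22b is not None:
        bus = None
        if adder < n:                          # read R30 of the designated PE
            visited.append(adder)
            bus = (adder, shaped_r30[adder])
        step = 1                               # default: increment
        if ff22b is not None:
            src_pe, value = ff22b
            msb, low7 = value >> 7, value & 0x7F
            if msb == 0:                       # address: read PE-RAM, write R31
                data = pe_ram[low7]
                for pe in range(src_pe, n):    # PEs with number >= source PE
                    r31[pe] = data
            elif low7 >= 3:                    # long run: add (count - 1)
                step = low7 - 1
        ff22b = bus
        adder += step
    return r31, visited


if __name__ == "__main__":
    shaped = [0, 0x81, 2, 3, 0x82, 0x81, 6, 0x83, 0x82, 0x81, 10]
    ram = [100 + a for a in range(11)]         # hypothetical PE-RAM contents
    print(dma_transfer(shaped, ram))
```

On the example data the simulation reads PE(0) to PE(8) and PE(10), skipping PE(9), and every PE ends up with the data of its original address.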

According to the present embodiment, data shaping is first applied to the address values stored in advance in the R30 registers so that each register holds either the address value itself or data indicating that the same address continues. In the case of an address value, the data read from the PE-RAM 20 is written to the R31 register only in those PEs 44 for which the condition that the PE's own number is greater than or equal to the PE number input from the DMA device 22 is satisfied. In the case of data indicating that the same address continues, no write to the R31 register is performed. Consequently, when the same address continues, the data can be written in a single operation.

In addition, the data indicating that the same address continues includes a continuation count, and when that count is 3 or more, the value obtained by subtracting 1 from the count is added in the adder that designates the PE number. When the same address continues over a long run, PE numbers can therefore be skipped, and the number of cycles required for data transfer can be reduced. In particular, in image processing such as rotation by a small angle, the number of PEs that access the same address is relatively large, so the configuration of this embodiment is effective in reducing the number of cycles of such image processing.

(Third Embodiment)
Next, a third embodiment of the present invention will be described with reference to FIG. 12. Parts identical to those of the first and second embodiments described above are denoted by the same reference numerals, and their description is omitted. FIG. 12 is a configuration diagram of the main part of a SIMD type microprocessor according to the third embodiment of the present invention.

This embodiment differs from the second embodiment in that a scale setting register 22h, an offset setting register 22i, a multiplier 22j, an adder 22k, and an FF 22m are added to the DMA device 22.

The scale setting register 22h is a register for setting a scale value (multiplier) applied to the address value read from the R30 register. The offset setting register 22i is a register for setting an offset value (addend) applied to the address value read from the R30 register. The values of the scale setting register 22h and the offset setting register 22i can be set by an instruction executed by the GP 4, via a data bus (not shown), or by similar means.

The multiplier 22j multiplies the value stored in the FF 22b by the scale value set in the scale setting register 22h. The adder 22k adds the offset value set in the offset setting register 22i to the output of the multiplier 22j.

The FF 22m is a register (flip-flop) that stores the output of the adder 22k, and the output of the FF 22m is output from the DMA device 22 as the address of the PE-RAM 20. Needless to say, the bit width of the data output via the multiplier 22j and the adder 22k is the same as the bit width of the address supplied to the PE-RAM 20.

According to the present embodiment, a scale can be set in the scale setting register 22h and an offset value can be set in the offset setting register 22i. Therefore, even if the bit width of the address set in the R30 register is, for example, 7 bits, addresses in a region beyond that range can be designated. In addition, a variety of ways of reading data from the PE-RAM 20 (ways of storing the data) become possible, for example using twice the value read from the R30 register as the address. In other words, even with a register of small bit width, the bit width of the effective address supplied to the PE-RAM 20 can be increased.
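The address calculation performed by the multiplier 22j and the adder 22k amounts to scaling the register value and adding an offset. The following minimal sketch shows the arithmetic; the function name `effective_address` and the sample scale and offset values are hypothetical, and no bit-width truncation is modelled.

```python
def effective_address(r30_value, scale=1, offset=0):
    """PE-RAM address = (value from R30 via FF 22b) * scale + offset."""
    product = r30_value * scale    # multiplier 22j with scale setting register 22h
    return product + offset        # adder 22k with offset setting register 22i


if __name__ == "__main__":
    # A 7-bit register value reaching an address beyond the 7-bit range
    print(hex(effective_address(0x7F, scale=2, offset=0x100)))
```

With a scale of 2 and an offset of 0x100, the largest 7-bit value 0x7F maps to address 0x1FE, illustrating how a narrow register can address a wider range.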

In the third embodiment described above, only one of the register 22h for setting the scale and the register 22i for setting the offset may be provided. In that case, of the multiplier 22j and the adder 22k, only the corresponding circuit is provided.

The present invention is not limited to the embodiments described above. That is, various modifications can be made without departing from the gist of the present invention.

1 Processor system
2 SIMD type microprocessor
8 External interface
12 Memory (external memory)
20 PE-RAM (memory)
22 DMA device (data transfer means)
22h Scale setting register (register for setting a multiplier)
22i Offset setting register (register for setting an offset value)
R30, R31 Registers (a plurality of registers that can be read or written by a predetermined address)

Japanese Patent No. 4504861
Japanese Patent Laid-Open No. 03-211688
JP 2005-269502 A
Japanese Patent No. 4294190

Claims

1. A SIMD type microprocessor having a plurality of processor elements each having a plurality of registers that can be read or written by a predetermined address, the SIMD type microprocessor comprising:
a memory that inputs and outputs data only to and from the processor elements and has a data bit width accessible by at least two processor elements simultaneously;
data transfer means for sequentially reading, for each of the processor elements, a value stored in one of the plurality of registers and outputting the read value as an address of the memory; and
write control means for performing control so that data read from the memory is stored in the register of at least the processor element from which the address corresponding to the data was read.

2. The SIMD type microprocessor according to claim 1, wherein, when the same value continues among the values stored in the one register, the write control means performs control so that the data is written, with a single read, to the registers of the processor elements in which the same value is stored, and so that subsequent writes to the registers of the processor elements corresponding to the same value are suppressed.

3. The SIMD type microprocessor according to claim 2, wherein the data transfer means skips processor elements from which the value stored in the one register is read, according to the number of times the same value continues among the values stored in the one register.

4. The SIMD type microprocessor according to any one of claims 1 to 3, wherein the data transfer means includes at least one of a register for setting an offset value and a register for setting a multiplier for the address read from the register, and an arithmetic unit that performs an operation on the value read from the register and at least one of the register for setting the offset value and the register for setting the multiplier.

5. A processor system comprising a SIMD type microprocessor having an external interface that controls data transfer to and from the outside, and an external memory to and from which data can be written or read via the external interface,
wherein the SIMD type microprocessor is the SIMD type microprocessor according to any one of claims 1 to 4.

6. A data processing method for a SIMD type microprocessor having a plurality of processor elements each having a plurality of registers that can be read or written by a predetermined address, the method comprising:
reading, for each of the processor elements, a value stored in one of the plurality of registers;
using the read value as an address of a memory that inputs and outputs data only to and from the processor elements and has a data bit width accessible by at least two processor elements simultaneously; and
storing the data read from the memory in the register of at least the processor element from which the address corresponding to the data was read.