JP4413905B2

JP4413905B2 - SIMD type processor

Info

Publication number: JP4413905B2
Application number: JP2006259487A
Authority: JP
Inventors: 慎一山浦; 和彦原; 貴雄片山; 和彦岩永; 浩資高藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2006-09-25
Filing date: 2006-09-25
Publication date: 2010-02-10
Anticipated expiration: 2019-09-10
Also published as: JP2007035063A

Description

The present invention relates to a SIMD (Single Instruction stream, Multiple Data stream) type processor that processes a plurality of image data items and the like in parallel with a single arithmetic instruction.

In recent years, digital copying machines, facsimile machines, and similar devices have improved image quality by increasing the number of pixels or by supporting color, and the amount of data to be processed has grown accordingly. Data processing in a copying machine or the like often applies the same arithmetic operation to every pixel, so SIMD type processors, which perform the same operation on a plurality of data items simultaneously with one instruction, have come into use. The arithmetic processing itself can be parallelized by arranging a plurality of arithmetic units, but the data to be operated on must be fetched from memory or the like at a rate that matches the arithmetic speed. If data access cannot keep up, processor performance is limited by the data access speed. In an ordinary SISD (Single Instruction stream, Single Data stream) processor, operand data is accessed sequentially from memory by the processor's program, and the data access speed is determined by the bit width of the memory and the transfer time. If a SIMD type processor used the same method, its computation would be parallel but its data access would be sequential, and its processing capability would drop to roughly that of a SISD processor.

For this reason, a SIMD type processor does not use processor instructions to access the operation target data. Instead, an external memory data transfer device directly accesses input/output registers inside the processor. While the processor executes an operation, this external device transfers the data to be processed next into the input register and moves already-processed data from the output register to memory. Overlapping transfer with computation in this way speeds up data processing.

The processing flow between the processor and the external memory data transfer device is as follows.
(1) The external memory data transfer device transfers the operation target data to the input register.
(2) The input register now holds the data transferred from outside. The processor moves this data into an operation register and starts the operation.
(3) The processor executes a predetermined operation. During this time, the external memory data transfer device transfers the next operation target data to the input register. If processed data (result data) is present in the output register, the external memory data transfer device also transfers it from the output register to the memory.
(4) The processor finishes the operation and transfers the result data to the output register.

As described above, speed is increased because the external memory data transfer device transfers operation data at the same time as the processor executes its operations.
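The benefit of overlapping transfer with computation in steps (1) to (4) can be estimated with a simple cycle-count model. This is a sketch under stated assumptions and is not part of the patent. The cycle counts are hypothetical, and steps (2) and (4) are treated as free. The external transfer device is assumed to perform its input and output transfers one after the other on a single channel.

```python
def sequential_cycles(batches: int, t_in: int, t_comp: int, t_out: int) -> int:
    """Transfer in, compute, and transfer out one batch at a time with no overlap."""
    return batches * (t_in + t_comp + t_out)


def overlapped_cycles(batches: int, t_in: int, t_comp: int, t_out: int) -> int:
    """Flow (1)-(4): the transfer device works while the processor computes."""
    total = t_in  # step (1): the first batch must be loaded before anything can run
    for k in range(batches):
        # Step (3): while batch k is computed, the device loads batch k+1 (if any)
        # and drains the result of batch k-1 (if any), one transfer after the other.
        device_busy = (t_in if k + 1 < batches else 0) + (t_out if k >= 1 else 0)
        total += max(t_comp, device_busy)
    return total + t_out  # drain the result of the last batch


if __name__ == "__main__":
    print(sequential_cycles(3, 10, 50, 10))  # 210
    print(overlapped_cycles(3, 10, 50, 10))  # 170
```

When the computation is long enough to hide both transfers (t_comp >= t_in + t_out), each additional batch costs only t_comp cycles. When transfers dominate, the transfer device becomes the bottleneck, which is the situation the preceding paragraphs warn about.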

Shift register methods and serial access memory methods have been adopted for this data transfer. In the shift register method, described for example in Patent Document 1 (Japanese Patent Laid-Open No. 5-67203), the data held in the registers is shifted sequentially, one position per clock input. Consider a SIMD type processor with 256 processor elements. The first datum transferred is held in the input register of the 0th processor element. At the next clock input it is shifted one position and held in the input register of the 1st processor element. A total of 256 clock inputs are therefore required before the first datum reaches the input register of the 255th processor element.
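A minimal Python model of the shift register method shows why the fill time grows with the number of processor elements. It is an illustrative sketch only. The list stands in for the chain of PE input registers, and one loop iteration stands in for one clock input.

```python
def shift_register_load(data: list[int]) -> tuple[list, int]:
    """Clock each value into PE 0 while every held value moves one PE further along."""
    regs: list = [None] * len(data)  # input registers of PE 0 .. PE n-1
    clocks = 0
    for value in data:
        regs = [value] + regs[:-1]  # one clock: shift everything by one PE, new value enters PE 0
        clocks += 1
    return regs, clocks


regs, clocks = shift_register_load(list(range(256)))
print(clocks, regs[255], regs[0])  # 256 0 255
```

The first value entered ends up in PE 255, so a host that wants value i in PE i would have to feed the data in reverse order. Either way, every PE in the chain is clocked.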

The serial access memory method is described, for example, in Patent Document 2 (Japanese Patent Laid-Open No. 6-4690). An input pointer generates an input pointer signal in which exactly one processor element is set to logic “H”. Input data is written into the input SAM (serial access memory) of the processor element designated by that “H”, and the input pointer signal shifts one position at a time in synchronization with the clock input.

Consider a SIMD type processor with 256 processor elements. In the first data transfer, the input pointer signal designates the 0th processor element, and the data is held in the input SAM of the 0th processor element. In the second data transfer, the input pointer signal has shifted one position in synchronization with the clock input and designates the 1st processor element, so the data is held in its input SAM. Continuing in this way, a total of 256 clock inputs are required before data is held in the input SAM of the 255th processor element.

Patent Document 1: Japanese Patent Laid-Open No. 5-67203 (JP-A-5-67203)
Patent Document 2: Japanese Patent Laid-Open No. 6-4690 (JP-A-6-4690)
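The serial access memory method can be modeled in the same way, with a one-hot list standing in for the input pointer signal. This sketch is illustrative only. Reaching PE k costs k + 1 clocks no matter which PEs actually need data.

```python
def sam_load(data: list[int], num_pes: int) -> tuple[list, int]:
    """Write data[i] into the input SAM of PE i using a one-hot input pointer."""
    if len(data) > num_pes:
        raise ValueError("more data than processor elements")
    pointer = [1] + [0] * (num_pes - 1)  # logic "H" marks PE 0 first
    sam: list = [None] * num_pes
    clocks = 0
    for value in data:
        sam[pointer.index(1)] = value  # write into the SAM of the PE whose pointer bit is "H"
        pointer = [0] + pointer[:-1]   # on the clock, the "H" bit moves to the next PE
        clocks += 1
    return sam, clocks


sam, clocks = sam_load(list(range(256)), 256)
print(clocks, sam[255])  # 256 255
```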

These methods have a shared problem. Even when data needs to go only to the even-numbered processor elements, it must also be transferred to the odd-numbered ones. Likewise, even when data needs to go only to the latter half of the processor elements (the 128th to 255th), it must be transferred to all of them. In other words, data cannot be transferred directly to one specific processor element alone. As a result, transferring the required data takes longer than necessary, and data processing slows down.
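The cost of this limitation can be counted directly. The sketch below compares the number of transfers (one per clock) needed to serve a set of target PEs under two schemes:

* A SAM-style sequential pointer, which must step through every PE up to the last target. The shift register method would need the full 256 clocks in both cases.
* Per-PE addressing, which needs one transfer per target.

The function names are hypothetical, and address decoding overhead is ignored.

```python
def sequential_transfers(targets: list[int]) -> int:
    """SAM style: the pointer must pass every PE up to the last target."""
    return max(targets) + 1


def addressed_transfers(targets: list[int]) -> int:
    """Addressed style: each target PE is selected directly, once."""
    return len(set(targets))


even_pes = list(range(0, 256, 2))
latter_half = list(range(128, 256))
print(sequential_transfers(even_pes), addressed_transfers(even_pes))        # 255 128
print(sequential_transfers(latter_half), addressed_transfers(latter_half))  # 256 128
```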

Furthermore, the register bit widths that data processing needs all depend on the application being executed. This applies to the input register that holds input data, the output register that holds output data, and the registers that hold data temporarily. In conventional SIMD type processors, the bit width that these registers can hold was fixed, so data exceeding that bit width could not be processed.

In addition, in the prior art the input/output registers and the input/output ports have the same bit width. Transferring the data of all processor elements (PEs) therefore requires as many accesses as there are PEs, which increases the transfer time.

Furthermore, some applications require many line buffers, and the registers built into the processor elements are used for this purpose. However, because the number of registers is fixed, applications that need more line buffers than there are registers cannot be supported.

The present invention was made in view of these conventional problems. Its object is to make it possible to transfer data directly to any processor element, which speeds up data transfer and, in turn, data processing. Another object is to make the use of registers flexible, so that data processing can adapt to the number of bits of data.

A SIMD type processor according to the present invention comprises the following:
(a) a plurality of processor elements, each having arithmetic means for processing data and data holding means for holding both data to be processed by the arithmetic means and data already processed by it;
(b) a data transfer bus connected to each of the processor elements;
(c) designation means for designating a predetermined processor element;
(d) an address bus connected to each of the processor elements;
(e) signal generating means for giving the data holding means either an acquisition signal or an output signal. The acquisition signal causes data to be processed to be acquired from the data transfer bus and held in the data holding means. The output signal causes processed data held in the data holding means to be output to the data transfer bus.
The data holding means has a first register group and a second register group, each with a plurality of registers. The designation means addresses a predetermined processor element, and the signal generating means gives a signal to the predetermined data holding means of that element. A predetermined number of registers are then selected from the first and second register groups, and data is acquired from, or output to, the data transfer bus through the selected registers.
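The claimed structure can be sketched as a small Python data model. Everything here is an illustrative assumption rather than the patent's circuit: the class and function names, the dictionary of two register groups, and the group size of four 8-bit registers are hypothetical. Only the roles follow the claim: an address-selected PE, an acquisition or output signal, and registers picked from both groups.

```python
from dataclasses import dataclass, field
from enum import Enum


class Signal(Enum):
    ACQUIRE = "acquire"  # take data from the data transfer bus into the selected registers
    OUTPUT = "output"    # drive data from the selected registers onto the data transfer bus


@dataclass
class ProcessorElement:
    address: int
    # Hypothetical sizes: four 8-bit registers in each of the two register groups.
    groups: dict = field(default_factory=lambda: {"first": [0] * 4, "second": [0] * 4})


def bus_transfer(pes, address, signal, selection, bus_data=None):
    """Apply an acquisition or output signal to selected registers of the addressed PE.

    `selection` lists (group name, register index) pairs chosen from both groups.
    """
    pe = next(p for p in pes if p.address == address)  # designation means picks one PE
    if signal is Signal.OUTPUT:
        return [pe.groups[group][index] for group, index in selection]
    if bus_data is None or len(bus_data) != len(selection):
        raise ValueError("need one bus value per selected register")
    for (group, index), value in zip(selection, bus_data):
        pe.groups[group][index] = value & 0xFF  # 8-bit registers
    return None


pes = [ProcessorElement(a) for a in range(256)]
picks = [("first", 0), ("second", 2)]
bus_transfer(pes, 5, Signal.ACQUIRE, picks, [0x11, 0x22])
print(bus_transfer(pes, 5, Signal.OUTPUT, picks))  # [17, 34]
```

Only PE 5 changes. Every other PE ignores the transfer because its address does not match.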

With this arrangement, the data holding means plays whichever role the signal generating means selects. On an acquisition signal, it acquires and holds the data to be processed. On an output signal, it outputs the processed data it holds. Making the use of the registers flexible in this way enables data processing that adapts to the number of bits of input and output data.

As described in detail above, in the present invention data to be processed is held in the data holding means of the addressed processor element, so data can be transferred directly to any processor element. Likewise, when data processed by the arithmetic means is output, it comes from the data holding means of the addressed processor element. Data transfer, and in turn data processing, can therefore be made faster.

The data holding means functions both as an input register and as an output register. Making its use flexible in this way enables data processing that adapts to the number of bits of data.

(First embodiment)
An embodiment of a SIMD type processor 1 according to the present invention will be described below with reference to FIGS. 1 to 4.

As shown in FIG. 1, the SIMD type processor 1 consists of three parts:
(a) a global processor 2;
(b) a processor element block 3, made up of processor elements 3a described later (256 sets in this embodiment);
(c) an external interface 4 connected to a memory controller 5.
Based on instructions from the global processor 2, the memory controller 5 transfers operation target data from the memory 6 by directly accessing the input/output register file 31 inside the processor.

First, the memory controller 5 will be described. As shown in FIG. 1, the memory controller 5 connects to the register file 31 of the SIMD type processor 1 through the data transfer ports of the external interface 4. It transfers data in both directions between the register file 31 and the memory 6. The registers controlled by the memory controller 5 are mapped into the I/O space. Following instructions from the global processor 2, they can be read and written by outputting an address, a clock, and read/write control.

The global processor 2 supplies I/O addresses, data, and control signals to the memory controller 5 over a bus. It first sets commands, such as the operating mode, in several operation setting registers (not shown) of the memory controller 5. Finally, it writes a start code to a start register (not shown), after which the memory controller 5 operates automatically according to the settings. With this configuration, data in the register file 31 is input and output concurrently with the operations performed under the processor's instruction control.
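The configure-then-start sequence can be sketched as follows. The register addresses, the start code, and the three settings (first PE, last PE, direction) are hypothetical placeholders. The patent only says that "several operation setting registers" and a start register exist.

```python
class MemoryControllerModel:
    """Toy model of the setup sequence: configure via I/O writes, then write a start code."""

    # Hypothetical I/O register map and start code; the patent does not give these values.
    PE_FIRST, PE_LAST, DIRECTION, START = 0x00, 0x01, 0x02, 0x0F
    START_CODE = 0x01

    def __init__(self) -> None:
        self.settings: dict[int, int] = {}
        self.transfers: list[tuple[str, int]] = []

    def io_write(self, reg: int, value: int) -> None:
        if reg == self.START:
            if value == self.START_CODE:
                self._run()  # from here the controller works on its own
        else:
            self.settings[reg] = value  # operation setting registers only store values

    def _run(self) -> None:
        direction = "memory->PE" if self.settings[self.DIRECTION] == 0 else "PE->memory"
        for pe in range(self.settings[self.PE_FIRST], self.settings[self.PE_LAST] + 1):
            self.transfers.append((direction, pe))  # one transfer per PE address


mc = MemoryControllerModel()
mc.io_write(MemoryControllerModel.PE_FIRST, 0)
mc.io_write(MemoryControllerModel.PE_LAST, 3)
mc.io_write(MemoryControllerModel.DIRECTION, 0)
print(len(mc.transfers))  # 0: nothing happens until the start code is written
mc.io_write(MemoryControllerModel.START, MemoryControllerModel.START_CODE)
print(len(mc.transfers))  # 4
```

Separating configuration from the start write lets the global processor set every parameter first and then hand off the whole transfer in a single I/O write.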

FIG. 2 shows the configuration of the memory controller 5 used in the present invention. The memory controller 5 consists of five units:
(a) a write buffer unit 54 that writes data to the memory 6;
(b) a read buffer unit 55 that reads data from the memory 6;
(c) a PE control unit 52 that controls the PE register files;
(d) a RAM control unit 53 that controls the memory 6;
(e) a sequence unit (SCU) 51.

The output port of the external interface 4 of the SIMD type processor 1 is connected to the write buffer unit 54, and the input port of the external interface 4 is connected to the read buffer unit 55.

As shown in FIG. 3, the global processor 2 includes a program RAM 21 and a sequence unit 22. The program RAM 21 stores a program for controlling the global processor 2, the processor element block 3, the external interface 4, and the memory controller 5. The sequence unit 22 controls these same components according to that program. Within the global processor 2, for example, it controls an arithmetic logic unit 23 (hereinafter "ALU 23"), described later.

The sequence unit 22 also controls two parts of the processor element block 3, both described later: the register file 31 and the arithmetic array 36. The arithmetic array 36 includes a multiplexer 32, a shift/extension circuit 33, an arithmetic logic unit 34 (hereinafter "ALU 34"), and a register 35. The global processor 2 itself is a so-called SISD type processor, which performs one arithmetic operation per arithmetic instruction.

Further, the sequence unit 22 sends the memory controller 5 operation setting data and commands for data transfer. Based on these, the memory controller 5 supplies three control signals to the external interface 4:
(a) an address control signal for addressing a processor element 3a;
(b) a read/write control signal for instructing data reads and writes to a register 31b (described later) of the processor element 3a;
(c) a clock control signal for supplying a clock signal.

The read/write control signal takes one of two forms. The write control signal causes data to be processed to be taken from the data bus 41d (described later) and held in the register 31b of the processor element 3a. The read control signal instructs the register 31b of the processor element 3a to place the processed data it holds onto the data bus 41d.

On receiving a command from the global processor 2, the memory controller 5 generates a signal that designates the address of a processor element 3a in the processor element block 3 (hereinafter, "address designation signal"). It sends this signal from the external interface 4 over the address bus 41a to the register controller 31a (described later) of that processor element 3a. It also sends the register controller 31a a signal instructing data reads and writes to the register 31b of the processor element 3a (hereinafter, "read/write instruction signal") over the read/write signal 41b. Finally, it supplies a clock signal from the external interface 4 to the register controller 31a over the clock signal 41c.

As described above, the memory controller 5 places data stored in the memory 6, which is outside the SIMD type processor 1, onto the data bus 41d as parallel data. The data is 8 bits wide in this embodiment, and this width may be changed as appropriate for the data. The data bus 41d is also used to send processed data held in the register 31b to the memory 6.

The memory 6 stores both data to be processed and processed data, and it may also be provided inside the SIMD type processor 1. Data transfer between the memory controller 5 and the memory 6 is likewise treated as 8-bit parallel data in this embodiment, but this may be changed as appropriate for the data. Other operations performed by the memory controller 5 are described later.

The global processor 2 also includes an ALU 23 that performs arithmetic and logic operations according to instructions from the sequence unit 22, and a data RAM 24 that stores operation data. It further includes a register group 25 for holding data to be processed and the like.

The register group 25 contains the following registers:
(a) a program counter (PC) that holds the program address;
(b) general-purpose registers G0 to G3 for storing data in arithmetic processing;
(c) a stack pointer (SP) that holds the data RAM address where registers are saved and from which they are restored;
(d) a link register (LS) that holds the caller's address on a subroutine call;
(e) LI and LN registers that hold the branch source addresses on an IRQ and an NMI, respectively;
(f) a processor status register (P) that holds the processor state.

The register group 25 is also connected to the register 35 of the processor element block 3, described later. Data is exchanged between the two under the control of the sequence unit 22.

As shown in FIGS. 1 and 3, the processor element block 3 comprises a plurality of processor elements 3a. Each one consists of a register file 31, a multiplexer 32, a shift/extension circuit 33, an arithmetic logic unit 34 (hereinafter "ALU 34"), and a register 35. The register file 31 contains thirty-two 8-bit registers per processor element 3a, called R0, R1, R2, ..., R31. In this embodiment, 256 such processor-element sets form an array. Each register file 31 has one read port and one write port to the arithmetic array 36, which accesses it over an 8-bit bus used for both reading and writing. Of the 32 registers, 24 are accessible from outside the processor. Any of these can be read or written by supplying a clock, an address, and read/write control from outside.

Each external port gives access to one particular register in every processor element (PE), and an externally supplied address specifies the PE number (0 to 255). A total of 24 external ports for register access are therefore provided. External access is 16 bits wide and covers a pair made of one even-numbered and one odd-numbered PE, so a single access reaches two registers simultaneously.
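Because each 16-bit port access covers an even-numbered PE and the odd-numbered PE after it, reading one register from all 256 PEs takes 128 accesses instead of 256. The sketch below assumes the even PE occupies the low byte of the 16-bit word. The patent does not state the byte order, and the function names are hypothetical.

```python
def pack_pair(even_value: int, odd_value: int) -> int:
    """Combine the same register of an even PE and the next odd PE into one 16-bit word.

    Assumption: the even-numbered PE occupies the low byte.
    """
    return ((odd_value & 0xFF) << 8) | (even_value & 0xFF)


def unpack_pair(word: int) -> tuple[int, int]:
    """Split a 16-bit port word back into (even PE value, odd PE value)."""
    return word & 0xFF, (word >> 8) & 0xFF


def read_register_through_port(values: list[int]) -> list[int]:
    """Read one register (e.g. Rn) of every PE through its external port, two PEs per access."""
    return [pack_pair(values[pe], values[pe + 1]) for pe in range(0, len(values), 2)]


rn = list(range(256))  # made-up contents: PE k holds value k
words = read_register_through_port(rn)
print(len(words), hex(words[0]), unpack_pair(words[1]))  # 128 0x100 (2, 3)
```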

In this embodiment there are 256 processor elements 3a, but the number is not limited to this and may be changed as appropriate. The sequence unit 22 of the global processor 2 assigns addresses 0 to 255 to the processor elements 3a, starting from the one closest to the external interface 4.

The register file 31 of a processor element 3a includes register controllers 31a and two types of registers, 31b and 31c. As shown in FIGS. 3 and 4, each processor element 3a in this embodiment has 24 pairs of a register controller 31a and a register 31b, plus eight registers 31c. FIG. 4 shows part of the register files 31 of two processor elements 3a, and "1PE" in FIGS. 3 and 4 denotes one processor element 3a. The registers 31b and 31c are 8 bits wide in this embodiment, but this is not limiting and may be changed as appropriate.

As shown in FIG. 4, the register controller 31a is connected to the external interface 4 through the address bus 41a, the read/write signal 41b, and the clock signal 41c described above. The memory controller 5 sends an address designation signal through the external interface 4 over the address bus 41a, and the register controller 31a decodes it. If the decoded address matches the address assigned to its own processor element 3a, the register controller 31a captures the read/write instruction signal arriving on the read/write signal 41b. It does so in synchronization with the clock signal that the memory controller 5 supplies on the clock signal 41c. The register controller 31a then applies this read/write instruction signal to the register 31b.

The register 31b functions as either an input register or an output register. It can hold data input from outside that the ALU 34 (described later) is about to operate on, or data processed by the ALU 34 that is waiting to be output. It can also serve as a register 31c, described later, temporarily holding data to be processed or processed data. In this embodiment the register 31b holds 8-bit data, but this may be changed as appropriate for the data.

On a write instruction signal from the register controller 31a, the register 31b takes the data to be processed from the data bus 41d and holds it. On a read instruction signal, it places the processed data it holds onto the data bus 41d. This data passes from the external interface 4 to the write buffer unit 54 of the memory controller 5, which stores it in the memory 6.
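The address decode and read/write behavior of a register controller 31a and its register 31b can be sketched in Python. The model assumes one register per PE (one of the 24 externally accessible ones). It uses the memory controller's point of view for "write" and "read", matching the definitions of the read and write control signals given earlier. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegisterSlot:
    """One register controller 31a with its register 31b (illustrative model)."""
    own_address: int   # PE number assigned to this processor element
    register: int = 0  # contents of the 8-bit register 31b

    def on_clock(self, address: int, rw: str, data: Optional[int]) -> Optional[int]:
        if address != self.own_address:
            return None                      # decoded address is for another PE: ignore
        if rw == "write":
            self.register = data & 0xFF      # latch the value on data bus 41d
            return None
        return self.register                 # read: drive the held value onto data bus 41d


def bus_cycle(slots: list[RegisterSlot], address: int, rw: str,
              data: Optional[int] = None) -> Optional[int]:
    """Broadcast one clock's address and read/write signal to every PE; return what is read."""
    driven = [v for v in (s.on_clock(address, rw, data) for s in slots) if v is not None]
    return driven[0] if driven else None


slots = [RegisterSlot(pe) for pe in range(256)]
bus_cycle(slots, 7, "write", 0xA5)
print(bus_cycle(slots, 7, "read"), bus_cycle(slots, 8, "read"))  # 165 0
```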

In this embodiment the register 31b is connected to the multiplexer 32 by a data bus 36 that transfers 8-bit data in parallel. Data to be processed by the ALU 34, and data already processed by it, travel between the register 31b and the ALU over this data bus 36. The sequence unit 22 of the global processor 2 directs these transfers over a read signal 26a and a write signal 26b connected to the global processor 2:
(a) On a read instruction signal over the read signal 26a, the register 31b holds the data processed by the ALU 34 that arrives over the data bus 36.
(b) On a write instruction signal over the write signal 26b, the register 31b places the data to be processed onto the data bus 36, and the data is sent to the ALU 34 for processing.

The register 31c temporarily holds data to be processed that has been supplied from the register 31b, or holds processed data before it is supplied to the register 31b. Unlike the register 31b, the register 31c does not exchange data with the memory 6 via the memory controller 5.

The arithmetic array 36 includes the multiplexer 32, a shift/extension circuit 33, a 16-bit ALU 34, and a 16-bit register 35. The register 35 contains a 16-bit A register and an F register.

An operation executed by an instruction in a processor element (PE) 3a basically takes data read from the register file 31 as one input of the ALU 34 and the contents of the A register in the register 35 as the other input, and stores the result in the A register. Operations are therefore performed between the A register and the registers R0 to R31 of the register file 31. A 7-to-1 multiplexer 32 sits between the register file 31 and the arithmetic array 36; it selects as the operand either the PE's own (center) data or the data of a PE one, two, or three positions away to the left or right along the PE array. The 8-bit data from the register file 31 is left-shifted by an arbitrary number of bits in the shift/extension circuit 33 before being input to the ALU 34. Further, an 8-bit condition register (T), not shown, enables or disables execution on a per-processor-element basis, so that only specific processor elements 3a can be selected to perform an operation.

As described above, the multiplexer 32 is connected to the data bus 36 of its own processor element 3a and also to the data buses 36 of the three processor elements 3a on each side. The multiplexer 32 selects one of these seven processor elements 3a and sends the data held in the registers 31b and 31c of the selected element to the ALU 34, or sends data processed by the ALU 34 to the registers 31b and 31c of the selected element. This makes it possible to operate on data held in the registers 31b and 31c of neighboring processor elements 3a, increasing the processing capability of the SIMD type processor 1.
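The operand path described above (7-to-1 neighbor selection, left shift, per-PE enable via the condition register T, 16-bit accumulation in the A register) can be modeled in software as follows. This is a minimal behavioral sketch, not the patented circuit: addition is used as an example ALU operation, and PEs beyond either edge of the array are assumed to supply 0; the source specifies neither.

```python
from typing import List, Sequence

MAX_OFFSET = 3  # the multiplexer reaches up to 3 PEs to the left and right


def select_operand(regs: Sequence[int], pe: int, offset: int) -> int:
    """Model of the 7-to-1 multiplexer: return the 8-bit register value of the
    PE `offset` positions away (negative = left, positive = right).
    Assumption: positions beyond the array edge read as 0."""
    if not -MAX_OFFSET <= offset <= MAX_OFFSET:
        raise ValueError("offset must be in -3..3")
    src = pe + offset
    if 0 <= src < len(regs):
        return regs[src] & 0xFF
    return 0


def simd_accumulate(a_regs: Sequence[int], regs: Sequence[int], offset: int,
                    shift: int, t_mask: Sequence[bool]) -> List[int]:
    """One SIMD instruction on every PE: A <- A + (operand << shift).
    PEs whose condition flag is cleared keep their A value unchanged.
    Results are truncated to the 16-bit width of the A register."""
    out = []
    for pe, a in enumerate(a_regs):
        if t_mask[pe]:
            operand = select_operand(regs, pe, offset) << shift
            out.append((a + operand) & 0xFFFF)
        else:
            out.append(a)
    return out
```

For example, `simd_accumulate([0] * 4, [10, 20, 30, 40], -1, 1, [True] * 4)` makes every PE add twice its left neighbor's value, giving `[0, 20, 40, 60]`.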

The shift/extension circuit 33 shifts data sent from the multiplexer 32 by a predetermined number of bits and sends it to the ALU 34, or shifts processed data sent from the ALU 34 by a predetermined number of bits and sends it to the multiplexer 32.

The ALU 34 performs arithmetic and logic operations on the data sent from the shift/extension circuit 33 and the data held in the register 35. In this embodiment, the ALU 34 is treated as handling 16-bit data, but this width may be changed as appropriate to suit the data. The processed data is held in the register 35 and is then transferred either to the shift/extension circuit 33 or to the general-purpose register 25 of the global processor 2.

Next, external access to the register file 31 of the processor element 3a will be described with reference to FIG. 4. In FIG. 4, the external port of the external interface 4 consists of an 8-bit address, a read/write select signal (high level for a read, low level for a write), a clock indicating the transfer timing, and 8-bit transfer data. These signals are connected to the external interface 4 of the processor, where they are retimed and buffered and converted into internal address, read/write, clock, and data signals.

These signals are supplied to every register of the register file 31, but each processor element 3a decodes the address, and only the processor element 3a whose own address matches performs the read or write. For this purpose, each processor element 3a has a register controller 31a that performs address decoding and read/write control. The input/output register 31b exchanges data with the data bus 41d connected to the external interface 4, based on the read/write instruction signals (write signal W1, read signal R1) given via the read/write signal 41b. Because the input/output register 31b also exchanges data with the arithmetic array 36, it has a second input/output port: using the write (W2) and read (R2) control signals generated by the global processor 2 in response to instructions and supplied via the read signal 26a and the write signal 26b, it transfers data over the data bus 37 (D2) connected to the arithmetic array 36.

FIG. 4 shows only the configuration for two processor elements 3a; to match the 256 processor elements 3a of FIG. 3, 256 sets of register controllers 31a and registers 31b are required. Selecting one of 256 sets requires an 8-bit address (2^8 = 256), so the address width changes as the number of processor elements 3a increases or decreases. The data width is also 8 bits here, but it varies with the amount of data transferred at one time.
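The per-PE address decoding above can be illustrated with a small behavioral model: every register controller sees the same bus cycle, and only the one whose address matches reads or writes. This is a hypothetical sketch; the class and method names are invented for illustration, and the `reg` argument that picks one of the 24 registers inside a PE is an assumption, since the text above describes only the PE-select address.

```python
from typing import Optional


def address_width(num_pes: int) -> int:
    """Bits needed to give every PE a unique address (256 PEs -> 8 bits)."""
    return max(1, (num_pes - 1).bit_length())


class RegisterController:
    """Hypothetical model of one register controller 31a and its registers 31b."""

    def __init__(self, pe_address: int, num_regs: int = 24) -> None:
        self.pe_address = pe_address
        self.regs = [0] * num_regs

    def cycle(self, address: int, read: bool, reg: int, data: int) -> Optional[int]:
        # All controllers receive the same signals; only a matching one acts.
        if address != self.pe_address:
            return None
        if read:
            return self.regs[reg]
        self.regs[reg] = data & 0xFF  # registers 31b are 8 bits wide
        return None


class ProcessorElementBlock:
    def __init__(self, num_pes: int = 256) -> None:
        self.width = address_width(num_pes)
        self.controllers = [RegisterController(i) for i in range(num_pes)]

    def bus_cycle(self, address: int, read: bool,
                  reg: int = 0, data: int = 0) -> Optional[int]:
        """One clock on the external port, broadcast to every PE."""
        result = None
        for ctrl in self.controllers:
            out = ctrl.cycle(address, read, reg, data)
            if out is not None:
                result = out
        return result
```

With 256 PEs, `address_width` returns 8, matching the 8-bit address in FIG. 4; doubling the PE count to 512 would need 9 bits.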

Because the SIMD type processor 1 of this embodiment, configured as described above, operates as follows, it obtains the following advantages.

When the memory controller 5 sends data stored in the memory 6 to a processor element 3a, it specifies the address assigned to that processor element 3a, so the data reaches the designated element with a single clock signal input. For example, to transfer data only to the even-numbered processor elements 3a, only the even-numbered processor elements 3a need to be addressed. Because no data has to be transferred to the odd-numbered processor elements 3a, data transfer, and therefore data processing, becomes faster.

Conversely, when processed data held in the register 31b is transferred to the memory 6, the memory controller 5 specifies the address assigned to the processor element 3a, and the data held in the register 31b of the designated element is transferred to the memory 6 with a single clock signal input. In this case too, only the necessary data is transferred, so data transfer, and therefore data processing, becomes faster.

As described above, each processor element 3a has 24 registers 31b, which hold data to be processed or processed data and can thus function either as input registers or as output registers. Consider, for example, an application in which the data sent from the memory controller 5 to the processor element 3a (the input data) is 56 bits, the data sent from the processor element 3a to the memory controller 5 (the output data) is 32 bits, and the data to be held temporarily is 80 bits. In this case, seven registers 31b can hold the 56-bit input data (8 bits × 7 = 56 bits) and four registers 31b can hold the 32-bit output data (8 bits × 4 = 32 bits). Thus, however the bits are split between input and output data, the application can be executed as long as the combined number of input and output bits does not exceed 8 bits × 24 = 192 bits.

In this embodiment, each processor element 3a also has eight registers 31c for temporarily holding data, giving 8 bits × 8 = 64 bits of temporary storage. When the data to be held temporarily is 80 bits, as in this example, the registers 31c alone cannot hold the remaining 16 bits (80 - 64 = 16). Even so, because the register 31b in this embodiment can also hold data temporarily, two of the 11 unused registers 31b (24 - 7 - 4 = 11) can be used for temporary storage (8 bits × 2 = 16 bits).
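The register budgeting in this example can be expressed as a small allocation check. This is a sketch under a simple assumed policy: temporary data fills the eight operation registers 31c first, and only the overflow is placed in spare input/output registers 31b. The function and class names are hypothetical.

```python
from dataclasses import dataclass

REG_BITS = 8      # every register is 8 bits wide
NUM_IO_REGS = 24  # registers 31b, reachable from outside and from the ALU
NUM_OP_REGS = 8   # registers 31c, reachable only from the arithmetic array


@dataclass
class Allocation:
    input_regs: int
    output_regs: int
    temp_op_regs: int  # temporary data placed in registers 31c
    temp_io_regs: int  # temporary data overflowing into spare registers 31b


def regs_needed(bits: int) -> int:
    """Number of 8-bit registers needed to hold `bits` bits (rounded up)."""
    return -(-bits // REG_BITS)


def allocate(input_bits: int, output_bits: int, temp_bits: int) -> Allocation:
    n_in = regs_needed(input_bits)
    n_out = regs_needed(output_bits)
    n_tmp = regs_needed(temp_bits)
    if n_in + n_out > NUM_IO_REGS:
        raise ValueError("input + output exceed 24 registers 31b (192 bits)")
    temp_op = min(n_tmp, NUM_OP_REGS)  # use the operation registers first
    temp_io = n_tmp - temp_op          # the rest goes to unused registers 31b
    if n_in + n_out + temp_io > NUM_IO_REGS:
        raise ValueError("total demand exceeds 32 registers (256 bits)")
    return Allocation(n_in, n_out, temp_op, temp_io)
```

For the example above, `allocate(56, 32, 80)` yields 7 input registers, 4 output registers, all 8 operation registers, and 2 borrowed input/output registers, the same split as in the text.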

Because the registers 31b can be used this flexibly, data processing adapts readily to the number of data bits. This widens the range of applications the SIMD type processor 1 can execute and broadens its uses.

In the above embodiment, the external port of the external interface 4 was described as a set of external terminals. However, even when, as in the embodiment of FIG. 5, the destination memory 6 and a memory transfer block 7 are mounted on the same chip and no external port is brought out as external terminals, the per-processor-element address decoding and read/write control of FIG. 3 allow the memory transfer block 7 or other blocks on the same chip to access any register of each processor element 3a.

Next, a modification of the above embodiment will be described with reference to FIG. 6. The configuration of FIG. 6 includes two instances of the basic configuration of FIG. 4. In the embodiment of FIG. 3, there are 24 input/output registers 31b in total plus 8 operation registers 31c, which are accessible only from the arithmetic array 36 and hold temporary data during operations. Because there are 32 registers of these two types in total, an application requiring, for example, 56 bits of input data, 32 bits of output data, and 80 bits of temporary storage can be realized by assigning 7 input/output registers 31b as external input registers, 4 input/output registers 31b as external output registers, and a total of 10 registers (the 8 operation registers 31c plus 2 input/output registers 31b) to temporary storage. In other words, any application whose combined input and output width is at most 192 bits, and whose total width including temporary storage is at most 256 bits, can be realized with a freely chosen register allocation. In conventional processors, by contrast, the input, output, and operation registers each had a fixed bit width, so an application exceeding any one of those widths could not be realized.

(Second Embodiment)
A second embodiment of the SIMD type processor 1 according to the present invention will be described below with reference to FIG. 7. Only the differences from the first embodiment are described here; description of common points is omitted, and components identical to those of the first embodiment are given the same reference numerals.

The SIMD type processor 1 of the second embodiment is characterized in that two adjacent processor elements 3a are assigned an even number and an odd number, respectively, to form a pair, and the same address is assigned to both elements of the pair. It is further characterized in that each pair is provided with an even data bus 46a for the even-numbered processor element 3a and an odd data bus 46b for the odd-numbered processor element 3a. It is also characterized in that data is transferred between the memory controller 4 and the memories 5 and 6 provided outside the SIMD type processor 1 as 16 bits in parallel, rather than 8 bits as in the first embodiment. This 16-bit data consists of 8 bits destined for the even-numbered processor element 3a and 8 bits destined for the odd-numbered processor element 3a. This embodiment is described in detail below.

First, I/O address, data, and control signals are supplied from the global processor 2 to the memory controller 5 via a bus. The global processor 2 sets commands, such as the operating mode, in several operation setting registers (not shown) of the memory controller 5. Finally, the global processor 2 writes a start code into a start register (not shown) of the memory controller 5, whereupon the memory controller 5 automatically operates according to those settings.

When the external interface 4 receives an address control signal from the memory controller 5, it sends an address designation signal to the processor element block 3 via the address bus 41a. One pair of processor elements 3a, that is, two processor elements 3a, is thereby addressed simultaneously. The register controller 31a decodes the received address designation signal, and if the decoded address matches its own assigned address, it obtains the read/write instruction signal sent from the memory controller 4 via the read/write signal 45a or 45b, in synchronization with the clock signal sent from the memory controller 5 via the clock signal 41c. Specifically, the register controller 31a assigned an even number obtains the read/write instruction signal via the even read/write signal 45a, and the register controller 31a assigned an odd number obtains it via the odd read/write signal 45b. The read/write instruction signals sent to the register controllers 31a of the two processor elements 3a in a pair may differ: for example, the even-numbered controller may receive a read instruction while the odd-numbered controller receives a write instruction. The read/write instruction signal is then given to the register 31b.

When a write instruction signal is sent from the register controller 31a to both processor elements 3a, the register 31b of the even-numbered processor element 3a acquires 8 bits of data to be processed from the even data bus 46a and holds it, and the register 31b of the odd-numbered processor element 3a acquires 8 bits of data to be processed from the odd data bus 46b and holds it. Conversely, when a read instruction signal is sent to both processor elements 3a, the register 31b of the even-numbered element sends its 8 bits of processed data to the even data bus 46a, and the register 31b of the odd-numbered element sends its 8 bits of processed data to the odd data bus 46b.
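The paired transfer can be sketched as one shared address with two independent 8-bit lanes, where each PE of the pair may receive a different read/write command. This is a hypothetical behavioral model; in particular, placing the low byte on the even bus 46a and the high byte on the odd bus 46b is an assumption, as the text does not state the byte order of the 16-bit word.

```python
from typing import Optional, Tuple

Lanes = Tuple[Optional[int], Optional[int]]


def split_word(word16: int) -> Tuple[int, int]:
    """Split a 16-bit transfer into (even_byte, odd_byte).
    Assumption: low byte -> even bus 46a, high byte -> odd bus 46b."""
    return word16 & 0xFF, (word16 >> 8) & 0xFF


def join_bytes(even_byte: int, odd_byte: int) -> int:
    return ((odd_byte & 0xFF) << 8) | (even_byte & 0xFF)


class PePair:
    """Hypothetical model of an even/odd PE pair sharing one address."""

    def __init__(self, address: int) -> None:
        self.address = address
        self.even_reg = 0
        self.odd_reg = 0

    def cycle(self, address: int, even_write: bool, odd_write: bool,
              even_bus: int = 0, odd_bus: int = 0) -> Lanes:
        """One clock. Each PE has its own read/write command (45a / 45b).
        Returns the bytes driven onto (even bus, odd bus); None marks a lane
        that is being written or a pair that is not addressed."""
        if address != self.address:
            return None, None
        even_out: Optional[int] = None
        odd_out: Optional[int] = None
        if even_write:
            self.even_reg = even_bus & 0xFF
        else:
            even_out = self.even_reg
        if odd_write:
            self.odd_reg = odd_bus & 0xFF
        else:
            odd_out = self.odd_reg
        return even_out, odd_out
```

A single addressed cycle thus moves 16 bits into or out of two PEs, which is the halving of transfer count the following paragraph describes.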

Thus, a single address designation transfers data both to the even-numbered processor element 3a and to the odd-numbered processor element 3a. This reduces the number of transfers and speeds up data transfer, and therefore data processing. Because this embodiment also addresses the processor elements 3a as the first embodiment does, it obtains the same advantages as the first embodiment.

Next, a modification of the above embodiment will be described with reference to FIG. 8. The configuration of FIG. 8 includes two instances of the basic configuration of FIG. 7. As in the embodiment of FIG. 3, there are 24 input/output registers 31b in total plus 8 operation registers 31c, which are accessible only from the arithmetic array 36 and hold temporary data during operations. Because there are 32 registers of these two types in total, an application requiring, for example, 56 bits of input data, 32 bits of output data, and 80 bits of temporary storage can be realized by assigning 7 input/output registers 31b as external input registers, 4 input/output registers 31b as external output registers, and a total of 10 registers (the 8 operation registers 31c plus 2 input/output registers 31b) to temporary storage. In other words, any application whose combined input and output width is at most 192 bits, and whose total width including temporary storage is at most 256 bits, can be realized with a freely chosen register allocation.

(Third Embodiment)
A third embodiment of the SIMD type processor 1 according to the present invention will be described below with reference to FIG. 9. In the second embodiment, the processor elements 3a are designated by address. This embodiment instead designates the processor elements 3a by pointer; that is, it applies the invention to a serial access memory scheme. Only the differences from the second embodiment are described here; description of common points is omitted, and components identical to those of the second embodiment are given the same reference numerals.

First, I/O address, data, and control signals are supplied from the global processor 2 to the memory controller 5 via a bus. The global processor 2 sets commands, such as the operating mode, in several operation setting registers (not shown) of the memory controller 5. Finally, the global processor 2 writes a start code into a start register (not shown) of the memory controller 5, whereupon the memory controller 5 automatically operates according to those settings. Based on a command from the global processor 2, the memory controller 5 generates a reset signal and sends it from the external interface 4 to the processor element block 3 via the reset signal 47, which resets the register controllers 31a. The memory controller 5 then sends a clock signal, through the external interface 4 and the clock signal 41c, to the register controller 31a closest to the external interface 4. In synchronization with this clock signal, that register controller 31a obtains the read/write instruction signal sent from the memory controller 5 via the read/write signal 45a or 45b. This read/write instruction signal is given both to the register 31b of the even-numbered processor element 3a and to the register 31b of the odd-numbered processor element 3a. As in the second embodiment, the read/write instruction signals sent to the register controllers 31a of the two processor elements 3a in a pair may differ.
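In the pointer (serial access) scheme, no address is decoded; the pair that responds is the one the internal pointer currently selects. The sketch below models only the write direction and assumes, beyond what the text states, that reset points at the pair nearest the external interface (index 0) and that each clock advances the pointer by one pair. All names are hypothetical.

```python
from typing import List, Tuple


class SerialAccessPairs:
    """Hypothetical model of pointer-based transfer to even/odd PE pairs.
    Assumptions: reset selects pair 0 (nearest the external interface) and
    every write clock advances the pointer to the next pair."""

    def __init__(self, num_pairs: int) -> None:
        self.even = [0] * num_pairs
        self.odd = [0] * num_pairs
        self.pointer = 0

    def reset(self) -> None:
        self.pointer = 0

    def write_clock(self, even_byte: int, odd_byte: int) -> None:
        if self.pointer >= len(self.even):
            raise IndexError("pointer ran past the last pair; reset first")
        self.even[self.pointer] = even_byte & 0xFF
        self.odd[self.pointer] = odd_byte & 0xFF
        self.pointer += 1

    def load(self, words: List[Tuple[int, int]]) -> None:
        """Stream (even, odd) byte pairs into consecutive PE pairs."""
        self.reset()
        for even_byte, odd_byte in words:
            self.write_clock(even_byte, odd_byte)
```

Compared with the addressed model, the bus carries no address at all, which suits strictly sequential transfers such as loading one image line.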

As a result, as in the second embodiment, a single pointer designation transfers data both to the even-numbered processor element 3a and to the odd-numbered processor element 3a. This reduces the number of transfers and speeds up data transfer, and therefore data processing.

(Fourth Embodiment)
A fourth embodiment of the SIMD type processor 1 according to the present invention will be described below with reference to FIGS. 10 and 11. Only the differences from the first embodiment are described here; description of common points is omitted, and components identical to those of the first embodiment are given the same reference numerals.

This embodiment is characterized in that, as shown in FIG. 10, a line buffer 61 is provided separately, outside the processor elements 3a. Although FIG. 10 shows two line buffers 61, their number may be changed as appropriate. The line buffer 61 holds data whose processing has finished but which is still needed to reference the pixels above and below the pixel of interest, or, when a line contains many pixels, holds the pixels in excess of the number of processor elements 3a. In FIG. 10, the line buffer 61 is connected to the input/output register file 31: part of the data held in the input/output register file 31 is sent to and held in the line buffer 61, and the data held in the line buffer 61 is returned to the input/output register file 31 as needed and used as operand data. Here, each block of the input/output register file 31 denotes the 256 register controllers 31a and registers 31b arranged in a horizontal row in FIG. 2.

In a processor with 256 processor elements 3a, as in the above embodiment, data for up to 256 pixels can be placed in the internal register file 31. For lines with more pixels, the same line must be split and held across several registers. With the external line buffer 61, data can be fetched from the line buffer 61 256 pixels at a time, and by repeating the same processing, lines of more than 256 pixels can be handled, so the number of pixels per line is no longer limited by the number of processor elements. The upper limit on the number of pixels is, however, set by the capacity of the line buffer 61. Providing the external line buffer 61 therefore makes it easy to process lines with many pixels.
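The chunked processing described above can be sketched as a loop that pulls 256 pixels at a time from the line buffer and runs the same kernel on each chunk. This is a simplified sketch that assumes a purely per-pixel kernel; kernels that read neighboring pixels through the multiplexer would additionally need overlapping (halo) pixels at chunk boundaries, which this sketch does not handle. The function name is hypothetical.

```python
from typing import Callable, List, Sequence

NUM_PES = 256  # one pixel per processor element per pass


def process_long_line(line: Sequence[int],
                      kernel: Callable[[List[int]], List[int]],
                      num_pes: int = NUM_PES) -> List[int]:
    """Process a line longer than the PE array by fetching it from an
    external line buffer in `num_pes`-pixel chunks and applying the same
    SIMD kernel to each chunk.
    Assumption: `kernel` is purely per-pixel (no neighbor access)."""
    out: List[int] = []
    for start in range(0, len(line), num_pes):
        chunk = list(line[start:start + num_pes])  # load into the PE registers
        out.extend(kernel(chunk))                   # same instruction stream
    return out
```

For example, inverting a 600-pixel line with `lambda px: [255 - p for p in px]` takes three passes (256 + 256 + 88 pixels), and the output length is again 600.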

In addition, by moving data held in the input/output register file 31 into the line buffer 61, the freed part of the input/output register file 31 can be used for other operations, making processing more efficient. In other words, data exceeding the capacity of the registers 31b of the processor elements 3a can be processed.

The line buffer 61 can be provided outside the processor elements 3a regardless of the type of register file. For example, as shown in FIG. 11, it may be connected to an input register file that only acquires and holds data to be processed and to an output register file that only outputs processed data to the data bus 41d. In this case, part of the data held in the output register file is sent to and held in the line buffer 61, and the data held in the line buffer 61 is sent to the input register file as needed and used as operand data.

FIG. 1 is a block diagram showing a SIMD type processor according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the memory controller 5 used in the present invention.
FIG. 3 is a diagram showing the internal configuration of the SIMD type processor in the first embodiment of the present invention.
FIG. 4 is a diagram showing the internal configuration of the processor elements in the first embodiment.
FIG. 5 is a block diagram showing an embodiment in which the destination memory and the memory transfer block are mounted on the same chip.
FIG. 6 is a diagram showing the internal configuration of the processor elements in the first embodiment.
FIG. 7 is a diagram showing the internal configuration of the processor elements in the second embodiment.
FIG. 8 is a diagram showing the internal configuration of the processor elements in the second embodiment.
FIG. 9 is a diagram showing the internal configuration of the processor elements in the third embodiment.
FIG. 10 is a block diagram illustrating the connection of the line buffer in the fourth embodiment.
FIG. 11 is a block diagram illustrating the connection of the line buffer in the fourth embodiment.

1 SIMD type processor
2 Global processor
4 External interface
5 Memory controller
26a Read signal
26b Write signal
31a Register controller
31b Register
34 ALU
41a Address bus
41b Read/write signal
41d Clock signal
45a Even read/write signal
45b Odd read/write signal
46a Even data bus
46b Odd data bus
47 Reset signal

Claims

A SIMD type processor comprising: a plurality of processor elements, each having computing means for computing data and data holding means for holding data to be computed by the computing means and data computed by the computing means; a data transfer bus connected to each of the processor elements; designation means for designating a predetermined processor element; an address bus connected to each of the processor elements; and signal generating means for giving the data holding means either an acquisition signal, which causes data to be processed to be acquired from the data transfer bus and held in the data holding means, or an output signal, which causes processed data held in the data holding means to be output to the data transfer bus,
wherein the data holding means has a first register group having a plurality of registers and a second register group having a plurality of registers, and when the designation means addresses a predetermined processor element and the signal generating means gives a signal to the data holding means of that processor element, a predetermined number of registers are selected from the first register group and the second register group in the data holding means, and data is acquired from, or output to, the data transfer bus through the selected registers.