JP2006155637A - Apparatus for processing signal

Info

Publication number: JP2006155637A
Application number: JP2006001390A
Authority: JP
Inventors: Kazuhiko Iwanaga; 和彦岩永; Shinichi Yamaura; 慎一山浦; Kazuhiko Hara; 和彦原; Takao Katayama; 貴雄片山; Kosuke Takato; 浩資高藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2006-01-06
Filing date: 2006-01-06
Publication date: 2006-06-15
Anticipated expiration: 2019-11-05
Also published as: JP4408113B2

Abstract

PROBLEM TO BE SOLVED: To provide a signal processing apparatus that restrains growth in circuit scale by using a single-port memory and that has a simply structured SIMD (Single Instruction Multiple Data) processor into which a variable-magnification function can also be built.

SOLUTION: The signal processing apparatus includes a memory controller connected to an external interface through which the register files built into the PEs (Processing Elements) of the SIMD processor are accessed from outside the processor. The memory controller reads data stored in an image memory and writes it to the PEs, and reads data from the PEs and writes it to the memory, so that the data is divided and calculated in parts. The memory controller reads the calculated data from the PEs except for the invalid portions at the leading and trailing ends of the SIMD processor and writes it to the memory; it then moves the address back by the amount of data corresponding to the invalid portions of the preceding pass, reads the data from the memory, and writes it to the PEs.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a signal processing apparatus using a SIMD (Single Instruction stream, Multiple Data stream) processor that processes a plurality of image data items and the like in parallel with a single operation instruction, and in particular to a signal processing apparatus suitable for image processing such as digital copying.

In recent years, image quality in digital copying machines, facsimile machines, and the like has been improved by increasing the number of pixels or by supporting color. As image quality improves, the amount of data to be processed grows. Data processing in a copying machine or the like often applies the same arithmetic operation to every pixel. For this reason, SIMD processors, which apply the same operation to multiple data items simultaneously with one instruction, have come into use.

Normally, when image processing is performed with a SIMD processor, the processor elements (PEs) are laid out along the main scanning direction. Image processing such as filtering therefore needs reference pixels above and below the pixel of interest, and one conceivable approach is to delay the pixel data of preceding lines line by line and store it in a line memory.

FIG. 16 is a schematic block diagram of an image processing apparatus using the SIMD method. As shown in the figure, the apparatus includes an external image memory (RAM) 101 in which image data is stored, a processor element block 103 consisting of, for example, 1024 processor elements, and a global processor 104.

Each processor element (PE) has n general-purpose registers (R0 to Rn-1) and an arithmetic array; the general-purpose registers are normally held as a register file outside the arithmetic array. Each register works as a shift register and can be accessed from outside through a serial port. In the figure, the hatched area is a unit processor element.

The global processor 104 has address generation means for a program memory (PRAM) 105, reads from the PRAM 105 the instruction codes to be given to the processor elements of the processor element block 103, and controls the arithmetic array and the register file.

The device connected to the serial ports of the general-purpose registers of each PE depends on the system configuration. In a system that needs only two lines of line delay, for example, R0 holds the pixel data from two lines earlier, R1 holds the pixel data from one line earlier, and R2 holds the current-line pixel data; a serial-parallel converter 102 is placed on the serial ports of R0 and R1 and connected to the external RAM 101.

A known example of an image processing apparatus with the above configuration is the SVP (Serial Video Processor).

Since most conventional SIMD processors have at least as many processor elements as there are pixels in the main scanning direction, controlling the external memory (RAM) 101 requires no complicated processing. A scaling function such as enlargement or reduction in image processing such as digital copying can be realized either by attaching a separate ASIC or by providing a scaling-control flag inside the SIMD processor, as described in Japanese Patent Laid-Open No. 08-123683 (IPC: G06F 9/38) (Patent Document 1).
Patent Document 1: Japanese Patent Laid-Open No. 08-123683

However, the precision of image processing has risen steadily in recent years, and the number of pixels in the main scanning direction tends to increase. Moreover, conventional SIMD processors with many processor elements usually keep circuit scale down by using small circuits, for example a 1-bit arithmetic array.

When a SIMD processor is applied to digital copying or the like, handling an A4-size image at a resolution of, for example, 600 DPI (dots per inch) requires 7000 or more processor elements, and simply increasing the number of processor elements is not realistic. One conceivable solution is to divide the main scanning direction into segments and process them separately, but this requires a dual-port memory (RAM) as the line memory, which increases circuit scale.
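The figure of 7000 or more can be checked with a little arithmetic. The sketch below assumes that the main scanning direction runs along the 297 mm long side of an A4 sheet (the text does not say which side is scanned) and uses the 1024-element array size mentioned for FIG. 16 as an example; the function names are illustrative only.

```python
import math

MM_PER_INCH = 25.4

def pixels_along(length_mm: float, dpi: int) -> int:
    """Number of pixels needed to cover length_mm at the given resolution."""
    return math.ceil(length_mm / MM_PER_INCH * dpi)

def passes_needed(line_pixels: int, num_pe: int) -> int:
    """Minimum number of segments if each pass handles at most num_pe pixels.

    Overlap between segments (needed for filtering) is ignored here.
    """
    return math.ceil(line_pixels / num_pe)

# Assumption: the line runs along the A4 long side (297 mm).
long_side = pixels_along(297.0, 600)
print(long_side, passes_needed(long_side, 1024))
```

Under these assumptions one line is about 7016 pixels, so a 1024-element array would need at least seven passes per line.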

Meanwhile, Japanese Patent Laid-Open No. 10-326258 (IPC: G06F 15/16) proposes a data operation system that shortens processing time by dividing the pixels in the main scanning direction into two, dividing the processor elements into two halves, and performing pipeline processing. With this method, however, a number of pixels exceeding the size of the data memory accessible to the processor elements cannot be processed.

Furthermore, in a configuration in which the processor elements carry a scaling-control flag, processing becomes complicated as the number of divisions increases, so it is preferable to implement scaling outside the SIMD processor. If the scaling function is implemented with a separate ASIC, however, the versatility of the processor is reduced.

An object of the present invention is therefore to provide a signal processing apparatus using a SIMD processor of simple configuration that suppresses growth in circuit scale by using a single-port memory (RAM) while also allowing a scaling function to be built in.

The present invention comprises: processor elements of a SIMD processor, each including arithmetic means for processing data and data holding means for holding both the data to be processed by the arithmetic means and the data processed by it; a data transfer bus connected to each of the plurality of processor elements; designation means for designating a given processor element on the basis of addresses assigned to the plurality of processor elements; an address bus that supplies an address to the designation means; a data transfer interface for accessing, from outside the processor, the data holding means built into the plurality of processor elements; and a memory controller that is connected to the data transfer interface, generates the address supplied to the address bus to designate the given processor element, reads data stored in a memory and writes it to the processor elements, and reads data from the processor elements and writes it to the memory. The invention is characterized in that, when transferring data from the data holding means of the processor elements to the memory through the data transfer interface, the memory controller specifies the address of the processor element at which the transfer starts and the address of the processor element at which it ends, and reads the processed data from the processor elements while excluding a predetermined number of data items.

With this configuration, the SIMD processor can easily perform the computation even when the number of processor elements is smaller than the number of pixels in the main scanning direction, and only the valid pixels are transferred when a weighted operation such as filtering is performed.

The memory controller preferably performs write transfers to the memory and read transfers from it in a time-division manner.

This allows the single-port memory to be used as a FIFO or LIFO memory, so that a line memory can be realized with a single-port memory.
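The idea of sharing one memory port between reads and writes by time division can be illustrated with a small behavioural model. This is only a sketch under assumptions not stated in the text: the class and function names are hypothetical, reads and writes simply alternate cycle by cycle, and the memory is initialised to zero.

```python
class SinglePortRam:
    """RAM model that permits only one access (read or write) per cycle."""

    def __init__(self, size):
        self.cells = [0] * size
        self.busy_cycle = None  # cycle number of the most recent access

    def access(self, cycle, addr, write=False, data=None):
        if self.busy_cycle == cycle:
            raise RuntimeError("second access in the same cycle")
        self.busy_cycle = cycle
        if write:
            self.cells[addr] = data
            return None
        return self.cells[addr]


def line_delay(lines, width):
    """Delay every line by one line period using a single-port RAM.

    For each pixel position, one cycle reads the pixel stored during the
    previous line and the next cycle overwrites it with the current pixel.
    """
    ram = SinglePortRam(width)
    cycle = 0
    delayed_lines = []
    for line in lines:
        delayed = []
        for x, pixel in enumerate(line):
            delayed.append(ram.access(cycle, x))          # read slot
            cycle += 1
            ram.access(cycle, x, write=True, data=pixel)  # write slot
            cycle += 1
        delayed_lines.append(delayed)
    return delayed_lines


print(line_delay([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 3))
```

Because no cycle ever carries both a read and a write, a dual-port memory is not needed in this model; the cost is that each pixel occupies two memory cycles.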

The memory controller may also include registers that specify the address of the processor element at which a transfer starts and the address of the processor element at which it ends, for use when data is transferred from the data holding means of the processor elements to the memory through the data transfer interface. For example, an initial-value load function may be added to the processor element address counter so that an offset relative to the processor element address can be set.

Further, the memory controller may include registers that specify, for transfers from the memory to the data holding means of the processor elements through the data transfer interface, the address of the processor element at which the transfer ends and the address of the processor element used to move the memory read pointer back after the transfer ends. For example, a comparator between the processor element address and a set value may be provided, and the read pointer may be reloaded with the memory location that held the data transferred to the register of the processor element whose address equals the set value.

With this configuration, when a SIMD processor with fewer processor elements than there are pixels in the main scanning direction is used, only the valid pixels are transferred when a weighted operation such as filtering is performed.
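The combination of start/end addresses on the write side and a read pointer that is moved back on the read side can be sketched as a segmented filter over one line. The code below is an illustration under assumptions of its own, not the circuit of the embodiment: it uses a 3-tap filter with weights 1, 2, 1 (so one invalid result at each end of the array), models the processor elements as a Python list, and all names are hypothetical.

```python
RADIUS = 1  # a 3-tap filter needs one neighbour on each side


def filter3(window):
    """Illustrative 3-tap weighted sum with weights 1, 2, 1."""
    return (window[0] + 2 * window[1] + window[2]) // 4


def process_line(line, num_pe):
    """Filter a line that is longer than the PE array, segment by segment."""
    if num_pe <= 2 * RADIUS:
        raise ValueError("PE array too small for the filter")
    out = []
    read_ptr = 0
    while read_ptr + 2 * RADIUS < len(line):
        # Load up to num_pe pixels into the PE registers.
        segment = line[read_ptr:read_ptr + num_pe]
        # PEs at either end lack a neighbour, so their results are invalid.
        start_pe = RADIUS
        end_pe = len(segment) - RADIUS - 1
        results = [
            filter3(segment[pe - RADIUS:pe + RADIUS + 1])
            for pe in range(start_pe, end_pe + 1)
        ]
        # Write back only the PEs between the start and end addresses.
        out.extend(results)
        # Advance, then step back by the invalid portion so the next
        # segment re-reads the pixels whose results were discarded.
        read_ptr += len(segment) - 2 * RADIUS
    return out


line = [3, 9, 4, 7, 1, 8, 2, 6, 5, 0, 11, 13, 12, 10, 15, 14, 17, 16, 19, 18]
reference = [filter3(line[i - 1:i + 2]) for i in range(1, len(line) - 1)]
print(process_line(line, 8) == reference)
```

The segmented result matches a single pass over the whole line, which is the point of discarding the edge results and re-reading the corresponding input pixels.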

The memory controller may also be configured so that the lower and upper limits of an arbitrary address region of the memory are set in registers and the region is used as a ring. For example, lower-limit and upper-limit registers for the memory pointer may be provided, each with its own comparator, so that the value of the corresponding register is output to the address bus when its condition is met.

With this configuration, in image processing that consists of multiple steps, data can be stored separately for each processing step.
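A memory pointer bounded by lower-limit and upper-limit registers might behave as in the following sketch. It assumes the pointer counts upward by one and wraps to the lower limit after the upper limit is used; the class name and that wrap rule are illustrative assumptions.

```python
class RingPointer:
    """Memory pointer confined to the address region [lower, upper]."""

    def __init__(self, lower, upper):
        if lower > upper:
            raise ValueError("lower limit must not exceed upper limit")
        self.lower = lower   # lower-limit register
        self.upper = upper   # upper-limit register
        self.value = lower

    def advance(self):
        """Return the current address, then step to the next one.

        When the pointer equals the upper limit (the comparator fires),
        the lower-limit value is loaded instead of incrementing.
        """
        addr = self.value
        self.value = self.lower if self.value == self.upper else self.value + 1
        return addr


ptr = RingPointer(4, 6)
print([ptr.advance() for _ in range(5)])
```

Giving each processing step its own pair of limits would keep the steps in separate regions of the same memory.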

The data transfer interface is preferably configured so that the data holding means of a processor element with an even address and of a processor element with an odd address can be accessed simultaneously. For example, the data transfer port of the SIMD processor may have two independent ports, one for accessing the data holding means of even-addressed processor elements and one for odd-addressed processor elements, so that data for two processor elements is handled in one cycle.

Further, the bit width of the input/output bus to the memory is preferably wider than the data transfer bit width to the processor elements. For example, providing input/output buffers for four processor elements allows the memory bit width to be that of four processor elements.

With this configuration, the time needed to transfer data to the data holding means of all processor elements can be halved, and the memory access time is given some margin.
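The effect of the two widths can be made concrete with a small calculation. The sketch assumes 8-bit data per processor element and places the first processor element in the low byte of the memory word; the byte order and the helper names are assumptions, since the text does not specify them.

```python
def pack_words(pe_bytes, pes_per_word=4):
    """Pack 8-bit PE values into wider memory words.

    Assumed byte order: the first PE of each group goes in the low byte.
    """
    words = []
    for i in range(0, len(pe_bytes), pes_per_word):
        word = 0
        for k, value in enumerate(pe_bytes[i:i + pes_per_word]):
            word |= (value & 0xFF) << (8 * k)
        words.append(word)
    return words


def transfer_cycles(num_pe, pes_per_cycle):
    """Cycles needed to visit every PE when pes_per_cycle are moved at once."""
    return -(-num_pe // pes_per_cycle)  # ceiling division


print(hex(pack_words([0x11, 0x22, 0x33, 0x44])[0]))
print(transfer_cycles(1024, 1), transfer_cycles(1024, 2))
```

With two processor elements per cycle, 1024 elements take 512 cycles instead of 1024, and a memory word that covers four elements only has to be accessed once every two of those cycles.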

The invention may also be configured so that, when data is transferred from the data holding means of the processor elements to the memory through the data transfer interface, the data is written when an external write control signal synchronized with the transfer is "0" and writing is inhibited when the signal is "1". For example, a synchronization signal may be input from outside to the sequence unit to control when the write transfer starts, so that the write control signal and the write transfer are synchronized; the control of the write buffer section, which shapes the data read from the data transfer port of the processor elements and writes it to the RAM, is then changed according to the value of the write control signal so that the transfer follows that signal.

Further, when data is transferred from the memory to the data holding means of the processor elements through the data transfer interface, the invention may be configured so that, when an external read control signal synchronized with the transfer is "0", the value read from the memory is written to the data transfer interface, and when the signal is "1", the value written to the data transfer interface is the same as the data transferred the last time the external signal was "0". For example, a synchronization signal may be input from outside to the sequence unit to control when the read transfer starts, so that the read control signal and the read transfer are synchronized; the control of the read buffer section, which shapes the data read from the RAM and writes it to the data transfer port of the processor elements, is then changed according to the value of the read control signal so that the transfer follows that signal.

With this configuration, scaling in digital image processing is realized by the memory controller, so the versatility and circuit scale of the SIMD processor itself are preserved.
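The two control signals can be read as a simple way to thin out or repeat pixels. The following sketch models each signal as a list of 0/1 values, one per transferred pixel; the function names are hypothetical, and how the control pattern is generated for a given magnification is outside what the text describes.

```python
def write_with_control(pe_data, write_ctrl):
    """Reduction: store a PE value only where the write control bit is 0."""
    return [value for value, ctrl in zip(pe_data, write_ctrl) if ctrl == 0]


def read_with_control(memory, read_ctrl):
    """Enlargement: a 0 bit fetches the next memory value, a 1 bit repeats
    the value delivered the last time the bit was 0.

    Assumption: if the pattern starts with 1, there is no earlier value
    and None is delivered.
    """
    delivered = []
    read_ptr = 0
    last = None
    for ctrl in read_ctrl:
        if ctrl == 0:
            last = memory[read_ptr]
            read_ptr += 1
        delivered.append(last)
    return delivered


half = write_with_control([10, 20, 30, 40], [0, 1, 0, 1])
print(half, read_with_control(half, [0, 1, 0, 1]))
```

In this example the write pattern halves the line to two pixels and the read pattern doubles it again by repetition, all without any scaling logic in the processor elements.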

The invention may also be configured so that image data is stored in the memory, the SIMD processor has fewer processor elements than there are pixels in the main scanning direction, and the memory controller controls the writing and reading of data so that all the pixels in the main scanning direction are divided into two or more parts for processing.

With this configuration, an image processing apparatus that handles a very large number of pixels in the main scanning direction, as in digital copying, can be built without changing the architecture of the SIMD processor itself, for example by increasing or decreasing the number of processor elements.

As described above, according to the present invention, the SIMD processor can easily perform the computation even when the number of processor elements is smaller than the number of pixels in the main scanning direction, and only the valid pixels are transferred when a weighted operation such as filtering is performed.

An embodiment of the SIMD processor 1 according to the present invention is described below with reference to the drawings.

First, the overall configuration of the SIMD processor according to the present invention is described with reference to FIG. 1. As shown in FIG. 1, the SIMD processor 1 consists of a global processor 2, a processor element block 3 made up of (in this embodiment) 1024 sets of processor elements 3a described later, and an external interface 4 connected to a memory controller 5. On instructions from the global processor 2, the memory controller 5 supplies the image data to be processed from an external image memory 6, which is a single-port memory (RAM), to the input/output register file 31 inside the processor, and transfers the processed data from the register file 31 to the image memory (RAM) 6.

As shown in FIG. 2, the global processor 2 includes a program RAM 21 that stores a program for controlling the processor element block 3, the external interface 4, and the memory controller 5, and a sequence unit 22 that controls the global processor 2, the processor element block 3, the external interface 4, and the memory controller 5 on the basis of the program RAM 21. Specifically, the sequence unit 22 controls, among other things, the arithmetic logic unit 23 (hereinafter "ALU 23") of the global processor 2, described later.

The sequence unit 22 also controls the register file 31 and the arithmetic array 36 that make up the processor element block 3. The arithmetic array 36 includes a multiplexer 32, a shift/extension circuit 33, an arithmetic logic unit 34 (hereinafter "ALU 34"), and a register 35. The global processor 2 is of the so-called SISD type and performs one operation for each operation instruction.

The sequence unit 22 further sends operation-setting data, commands, and the like for data transfer to the memory controller 5. On the basis of this operation-setting data and these commands, the memory controller 5 supplies to the external interface 4 an address control signal for addressing the processor elements 3a, a read/write control signal for instructing the registers 31b of the processor elements 3a to read or write data, and a clock control signal for supplying a clock signal.

Of the read/write control signals, the write control signal is the signal that causes the data to be processed to be taken from the data bus 46a (46b) and held in the register 31b of the processor element 3a. The read control signal is the signal that instructs the register 31b of the processor element 3a to place the processed data it holds onto the data bus 46a (46b).

In the processor element block 3 of this embodiment, two adjacent processor elements 3a are given an even number and an odd number to form one set, and the same address is assigned to both processor elements 3a of the set.

On receiving a command from the global processor 2, the memory controller 5 generates a signal that designates the address of a processor element 3a in the processor element block 3 (hereinafter the "address designation signal") and sends it from the external interface 4 through the address bus 41a to the register controllers 31a of the processor elements 3a. As described later, the memory controller 5 also supplies a signal that instructs the registers 31b of the processor elements 3a to read or write data (hereinafter the "read/write instruction signal") to the register controllers 31a, described later, through the read/write signal lines 45a (45b). The even read/write signal line 45a carries the read/write signal to the even-numbered processor elements 3a, and the odd read/write signal line 45b carries it to the odd-numbered processor elements 3a.

The memory controller 5 also supplies a clock signal from the external interface 4 through the clock signal line 41c to the register controllers 31a of the processor elements 3a, described later.

As described above, the memory controller 5 also supplies the data stored in the image memory 6, which is provided outside the SIMD processor 1, to the external interface 4 as parallel data, 16 bits wide in this embodiment. The 16 bits consist of 8 bits for the even-numbered processor element 3a and 8 bits for the odd-numbered processor element 3a, supplied on the even data bus 46a and the odd data bus 46b respectively. The 8-bit width may be changed as appropriate for the data. The data buses 46a and 46b are also used when the processed data held in the registers 31b is sent to the image memory 6 outside the SIMD processor 1.

The image memory 6 stores both the data to be processed and the processed data, and it may instead be provided inside the SIMD processor 1. In this embodiment, data transfer between the memory controller 5 and the image memory 6 is treated as 32-bit parallel transfer, as described later, but the width may be changed as appropriate for the data. Other operations of the memory controller 5 are described later.

The global processor 2 also includes the ALU 23, which performs arithmetic and logic operations under instructions from the sequence unit 22, and a data RAM 24 that stores operation data. It further includes a register group 25 for holding the data to be processed and the like.

The register group 25 contains a program counter (PC) that holds the program address; registers G0 to G3, which are general-purpose registers for storing data for arithmetic processing; a stack pointer (SP) that holds the address in the data RAM to which registers are saved and from which they are restored; a link register (LS) that holds the caller's address at a subroutine call; LI and LN registers that likewise hold the branch-source address for an IRQ and for an NMI, respectively; and a processor status register (P) that holds the state of the processor.

The register group 25 is connected to the register 35 (described later) of the processor element block 3 and exchanges data with the register 35 under the control of the sequence unit 22.

As shown in FIGS. 1 and 2, the processor element block 3 includes a plurality of processor elements 3a, each consisting of a register file 31, a multiplexer 32, a shift/extension circuit 33, an arithmetic logic unit 34 (hereinafter "ALU 34"), and a register 35. The register file 31 contains thirty-two 8-bit registers per processor element 3a, and in this embodiment the sets for 1024 processor elements form an array. For each processor element (PE) 3a, the register file 31 contains registers called R0, R1, R2, ..., R31. Each register file 31 has one read port and one write port toward the arithmetic array 36 and is accessed from the arithmetic array 36 over an 8-bit bus shared by reads and writes. Of the 32 registers, 24 can be accessed from outside the processor; any of them can be read or written by supplying a clock, an address, and read/write control from outside.

For access from outside, one external port gives access to one register in each processor element 3a, and the processor element number (0 to 1023) is specified by an externally supplied address. There are therefore 24 external ports for register access in total. As described above, externally accessed data is 16 bits wide, covering one set of an even-numbered and an odd-numbered processor element 3a, so that two registers are accessed simultaneously in a single access.

In this embodiment, the number of processor elements 3a is described as 1024, but the number is not limited to this and may be changed as appropriate. The sequence unit 22 of the global processor 2 assigns addresses 0 to 1023 to the processor elements 3a in order of proximity to the external interface 4.

The register file 31 of each processor element 3a includes register controllers 31a and two types of registers, 31b and 31c. In this embodiment, as shown in FIG. 3, each processor element 3a has 24 sets of a register controller 31a and a register 31b, plus eight registers 31c. FIG. 3 shows part of the register file 31 for two sets of processor elements 3a, and "1 processor element" in FIG. 3 denotes a single processor element 3a. The registers 31b and 31c are treated as 8-bit registers in this embodiment, but they are not limited to this width and may be changed as appropriate.

As shown in FIG. 3, the register controller 31a is connected to the external interface 4 via the address bus 41a, the even read/write signal line 45a, the odd read/write signal line 45b, and the clock signal line 41c described above.

On receiving an address control signal from the memory controller 5, the external interface 4 sends an address designation signal to the processor element block 3 via the address bus 41a. This addresses one pair of processor elements 3a, that is, two processor elements 3a, at the same time. The register controller 31a decodes the address designation signal and, if the decoded address matches the address assigned to itself, obtains the read/write instruction signal sent from the memory controller 5 over the read/write signal line 45a or 45b, in synchronization with the clock signal sent from the memory controller 5 over the clock signal line 41c. Specifically, a register controller 31a assigned an even number obtains the read/write instruction signal sent from the memory controller 5 over the even read/write signal line 45a, while a register controller 31a assigned an odd number obtains the read/write instruction signal sent over the odd read/write signal line 45b. The read/write instruction signals sent to the register controllers 31a of the two processor elements 3a forming a pair may differ from each other. That is, when the instruction signal sent to the even-numbered register controller 31a is a read instruction, the instruction signal sent to the odd-numbered register controller 31a may be a write instruction. The read/write instruction signal is then passed to the register 31b.

When the register controllers 31a send a write instruction signal to both processor elements 3a, the register 31b of the even-numbered processor element 3a acquires the data to be processed (8 bits) from the even data bus 46a and holds it, and the register 31b of the odd-numbered processor element 3a acquires the data to be processed (8 bits) from the odd data bus 46b and holds it. When the register controllers 31a send a read instruction signal to both processor elements 3a, the register 31b of the even-numbered processor element 3a sends the processed data (8 bits) to the even data bus 46a, and the register 31b of the odd-numbered processor element 3a sends the processed data (8 bits) to the odd data bus 46b.

In this way, a single address designation allows data to be transferred to the even-numbered processor element 3a and to the odd-numbered processor element 3a at the same time. This reduces the number of data transfers and speeds up data transfer.
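The paired access described above can be sketched in Python as follows. This is a minimal model of one external port, not the circuit itself: the class name `PEPairPort`, the mapping of pair address k to processor elements 2k and 2k+1, and the convention that passing `None` means "read" are all assumptions made for illustration.

```python
class PEPairPort:
    """Model of one external port: one 8-bit register per PE, accessed in even/odd pairs."""

    def __init__(self, num_pe=1024):
        self.regs = [0] * num_pe  # one externally visible register per PE

    def access(self, pair_addr, even_write=None, odd_write=None):
        """Address one even/odd pair in a single access.

        Each lane has its own read/write control: a lane given a value is
        written, a lane given None is read. Returns the values read.
        """
        even_pe = 2 * pair_addr      # assumed address mapping
        odd_pe = 2 * pair_addr + 1
        result = {}
        for lane, pe, value in (("even", even_pe, even_write),
                                ("odd", odd_pe, odd_write)):
            if value is None:
                result[lane] = self.regs[pe]   # read instruction on this lane
            else:
                self.regs[pe] = value & 0xFF   # write instruction, 8-bit data
        return result
```

A single call such as `port.access(1, even_write=0x56)` writes the even lane and reads the odd lane in the same access, which mirrors the case where the two read/write instruction signals differ.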

The register 31b holds externally supplied data that is about to be processed by the ALU 34, described later, or holds data processed by the ALU 34 for output to the outside; it thus functions both as an input register and as an output register. It can also serve the function of the register 31c, described later, by temporarily holding data to be processed or data that has been processed. In this embodiment the register 31b is treated as holding 8-bit data, but the width may be changed as appropriate for the data. When a write instruction signal is supplied from the register controller 31a, the register 31b acquires the data to be processed from the data bus 46a or the data bus 46b and holds it. When a read instruction signal is sent from the register controller 31a, the register 31b places the processed data it holds on the data bus 46a or the data bus 46b. This data is passed from the external interface 4 to the write buffer unit 54 of the memory controller 5 and stored from the write buffer unit 54 into the image memory 6.

In this embodiment, the register 31b is also connected to the multiplexer 32 via a data bus 37 that transfers 8-bit data in parallel. Data to be processed by the ALU 34, or data already processed by the ALU 34, is transferred to and from the register 31b over this data bus 37. The transfer is carried out over a read signal line 26a and a write signal line 26b connected to the global processor 2, under instruction from the sequence unit 22 of the global processor 2. Specifically, when the sequence unit 22 of the global processor 2 sends a read instruction signal over the read signal line 26a, the register 31b places the data to be processed that it holds on the data bus 37. This data is sent to the ALU 34 and processed. When the sequence unit 22 sends a write instruction signal over the write signal line 26b, the register 31b holds the data processed by the ALU 34 that arrives over the data bus 37.

The register 31c temporarily holds data to be processed that has been supplied from the register 31b, or processed data before it is supplied to the register 31b. Unlike the register 31b described above, the register 31c does not transfer data to or from the image memory 6 via the memory controller 5.

The arithmetic array 36 includes the multiplexer 32, the shift/extension circuit 33, the 16-bit ALU 34, and the 16-bit register 35. The register 35 contains a 16-bit A register and an F register.

In an operation executed by an instruction on a processor element 3a, data read from the register file 31 is basically supplied to one input of the ALU 34, the contents of the A register of the register 35 are supplied to the other input, and the result is stored in the A register. Operations are thus performed between the A register and the registers R0 to R31 of the register file 31. A 7-to-1 multiplexer 32 sits between the register file 31 and the arithmetic array 36 and selects, as the operand, the data located 1, 2, or 3 positions to the left in the processor element direction, the data located 1, 2, or 3 positions to the right, or the center data. The 8-bit data from the register file 31 is shifted left by an arbitrary number of bits by the shift/extension circuit 33 before being input to the ALU 34. In addition, an 8-bit condition register (T), not shown, enables or disables execution of the operation for each processor element 3a, so that only specific processor elements 3a can be selected as operation targets.

As described above, the multiplexer 32 is connected to the data bus 37 of its own processor element 3a and also to the data buses 37 of the three neighboring processor elements 3a on each side. The multiplexer 32 selects one of these seven processor elements 3a and sends the data held in the registers 31b and 31c of the selected processor element 3a to the ALU 34, or sends data processed by the ALU 34 to the registers 31b and 31c of the selected processor element 3a. This allows operations that use data held in the registers 31b and 31c of neighboring processor elements 3a, which increases the processing capability of the SIMD type processor 1.
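The neighbor selection and per-element enable described above can be illustrated with a small Python model. It is only a sketch under stated assumptions: the function name `simd_step` is hypothetical, negative offsets are taken to mean "left", elements whose neighbor falls outside the array read 0 (the text does not say what the hardware does at the array edges), and the condition register is reduced to one boolean per processor element.

```python
def simd_step(a_regs, reg_row, offset, op, enabled=None):
    """Apply one instruction to every PE: A <- op(A, R[pe + offset]).

    a_regs  : current A register value of each PE (16-bit)
    reg_row : one register (e.g. R0) across all PEs (8-bit values)
    offset  : neighbor selected by the 7-to-1 multiplexer, -3..+3
    op      : two-argument function standing in for the ALU operation
    enabled : optional per-PE flags standing in for the condition register
    """
    if not -3 <= offset <= 3:
        raise ValueError("the multiplexer reaches at most 3 PEs to either side")
    n = len(a_regs)
    out = list(a_regs)
    for pe in range(n):
        if enabled is not None and not enabled[pe]:
            continue  # execution disabled for this PE, A is left unchanged
        src = pe + offset
        operand = reg_row[src] if 0 <= src < n else 0  # assumed edge behavior
        out[pe] = op(a_regs[pe], operand) & 0xFFFF     # 16-bit A register
    return out
```

Summing the left and right neighbors, for example, takes two steps with offsets -1 and +1 and addition as `op`; the elements at the ends of the array then hold results that lack a real neighbor, which is the source of the invalid data discussed later for filter processing.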

The shift/extension circuit 33 shifts the data sent from the multiplexer 32 by a predetermined number of bits and sends it to the ALU 34, or shifts the processed data sent from the ALU 34 by a predetermined number of bits and sends it to the multiplexer 32.

The ALU 34 performs arithmetic and logic operations on the data sent from the shift/extension circuit 33 and the data held in the register 35. In this embodiment the ALU 34 is treated as handling 16-bit data, but the width may be changed as appropriate for the data. The processed data is held in the register 35 and transferred to the shift/extension circuit 33 or to the general-purpose register 25 of the global processor 2.

I/O addresses, data, and control signals are supplied from the global processor 2 to the memory controller 5 over a bus. The global processor 2 sets commands such as the operation method in several operation setting registers (not shown) of the memory controller 5. Finally, the global processor 2 writes a start code to a start register (not shown) of the memory controller 5, whereupon the memory controller 5 automatically operates according to the settings. With this configuration, data in the register file 31 can be input and output concurrently with operations executed under the instruction control of the processor.

FIG. 4 shows the configuration of the memory controller 5 used in the present invention. The memory controller 5 consists of a write buffer unit 54 that writes data to the image memory 6, a read buffer unit 55 that reads data from the image memory 6, a PE control unit 52 that controls the register files 31 of the processor elements, a RAM control unit 53 that controls the image memory 6, and a sequence unit (SCU) 51.

The memory controller 5 is connected to the register file 31 of the SIMD type processor 1 through a data transfer port in the external interface 4, and transfers data from the register file 31 to the image memory 6 and from the image memory 6 to the register file 31. The data transfer port has an output port and an input port. As described above, the registers controlled by the memory controller 5 in this embodiment are mapped into the I/O space and can be read and written by outputting an address, a clock, and read/write control in accordance with instructions from the global processor 2.

The output port of the external interface 4 of the SIMD type processor 1 is connected to the write buffer unit 54, and the input port of the external interface 4 is connected to the read buffer unit 55. The data transfer port has independent input and output ports for even-numbered and for odd-numbered processor elements, so that the data for one even/odd pair of processor elements can be accessed in one cycle. The data buses between the image memory 6 and the write buffer unit 54 and read buffer unit 55 are each as wide as the data for four processor elements, so that the data for four processor elements can be accessed in one cycle. Since the data for one processor element is 8 bits in this embodiment, the data bus between the external interface 4 and the write buffer unit 54 and read buffer unit 55 is 16 bits wide, and the bus between the memory controller 5 and the image memory 6 is 32 bits wide.

As a result, one transfer between the image memory 6 and the memory controller 5 suffices for every two transfers between the data transfer port of the external interface 4 and the memory controller 5.

The write buffer unit 54 of the memory controller 5 takes in the pixel data output from the external interface 4 of the SIMD type processor 1 twice, assembles it into data for four processor elements, and transfers it to the image memory 6. The read buffer unit 55 splits the data for four processor elements read from the image memory 6 into two parts and transfers them to the external interface 4 of the SIMD type processor 1 in two transfers.
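The width conversion performed by the two buffers can be sketched as simple bit packing. The function names are hypothetical, and the ordering used here (the first 16-bit transfer occupies the low half of the 32-bit word) is an assumption; the text does not state how the four bytes are arranged in the image memory.

```python
def pack_write_buffer(first16, second16):
    """Combine two 16-bit transfers from the external interface into one 32-bit word.

    Each 16-bit transfer carries the data of one even/odd PE pair, so the
    result carries the data of four PEs.
    """
    return (first16 & 0xFFFF) | ((second16 & 0xFFFF) << 16)


def unpack_read_buffer(word32):
    """Split one 32-bit word from the image memory into two 16-bit transfers."""
    return word32 & 0xFFFF, (word32 >> 16) & 0xFFFF
```

With this arrangement the image memory is accessed once for every two accesses on the processor side, which is the 2:1 ratio stated above.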

FIG. 5 is a schematic block diagram of an embodiment of the processor element (PE) control unit 52.

As shown in FIG. 5, the PE control unit 52 outputs the address of a processor element 3a, a clock, and a read/write control signal to the external interface 4 of the SIMD processor 1, which makes it possible to read and write the data in the register file 31 of each processor element 3a in the SIMD processor 1.

The PE control unit 52 consists of a processor element (PE) address counter 521, transfer number counters 522 and 523, and a valid data number counter 524. The PE address counter 521 is an up counter that generates the address of the processor element whose data is transferred. For a write transfer (from the register file 31 of a processor element 3a to the image memory 6), it is initially loaded at the start of the transfer with the value stored in the transfer start PE address register 525; for a read transfer (from the image memory 6 to the register file 31 of a processor element 3a), it is initially loaded with "0" at the start of the transfer. The transfer number counters 522 and 523 are down counters that can be initially loaded with the number of data items to transfer and are used to manage the transfer count. The transfer number counter 522 is loaded with the number of data items to write, and the transfer number counter 523 with the number of data items to read.

The valid data number counter 524 manages the number of valid data items already stored in the image memory 6. When data is transferred from the image memory 6 to the register file 31 of a processor element 3a, the transfer is carried out only when the number of valid data items exceeds the transfer count. This prevents data from being lost.

FIG. 6 is a schematic block diagram of a first embodiment of the memory (RAM) control unit 53.

As shown in FIG. 6, the RAM control unit 53 is controlled by control lines from the write buffer unit 54 and the read buffer unit 55, and outputs a clock, an address, read/write control, and byte select to the image memory (RAM) 6, which is a single-port memory.

The RAM control unit 53 consists of a RAM address adder/subtractor 531, a write pointer register 532, a read pointer register 533, and a multiplexer 534. The blocks are connected via an address setting bus (hereinafter "AB").

The write pointer register 532 stores the pointer to the location in the image memory 6 to be written next, and the read pointer register 533 likewise stores the pointer to the location in the image memory 6 to be read next.

After the image memory 6 is accessed at the location indicated by the pointer, the write pointer register 532 or the read pointer register 533 receives from the AB 535, and stores, the pointer updated by the RAM address adder/subtractor 531. The RAM address adder/subtractor 531 takes the write pointer value on a write access, or the read pointer value on a read access, adds a number corresponding to the RAM access size in the FIFO operation mode or subtracts it in the LIFO operation mode, and outputs the result to the AB 535.
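The pointer update rule can be modeled in a few lines of Python. This is a sketch only: the class and method names are hypothetical, pointers are assumed to count bytes so that a 32-bit access advances them by 4, and no wrap-around or bounds checking is modeled because the text does not describe any.

```python
from enum import Enum


class Mode(Enum):
    FIFO = "fifo"  # pointers move upward after each access
    LIFO = "lifo"  # pointers move downward after each access


class RamPointers:
    """Write and read pointers updated by a shared adder/subtractor."""

    def __init__(self, write_ptr=0, read_ptr=0, mode=Mode.FIFO, access_size=4):
        self.write_ptr = write_ptr
        self.read_ptr = read_ptr
        self.mode = mode
        self.access_size = access_size  # assumed: bytes per RAM access

    def _update(self, pointer):
        # Adder/subtractor: add in FIFO mode, subtract in LIFO mode.
        if self.mode is Mode.FIFO:
            return pointer + self.access_size
        return pointer - self.access_size

    def write_access(self):
        """Return the address to write, then store the updated write pointer."""
        addr = self.write_ptr
        self.write_ptr = self._update(addr)
        return addr

    def read_access(self):
        """Return the address to read, then store the updated read pointer."""
        addr = self.read_ptr
        self.read_ptr = self._update(addr)
        return addr
```

Keeping the two pointers separate while sharing one update path reflects the single-port memory: only one of the two accesses happens at a time, so one adder/subtractor is enough.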

The sequence unit 51 consists of an address decoder and control registers. By accessing the control registers, which are mapped into the I/O space, the global processor 2 can control the memory controller 5 as a whole and monitor its internal state. Time division between write transfers and read transfers is also handled in the sequence unit 51. In addition, the sequence unit 51 generates two clocks, one for the write buffer unit 54 and one for the read buffer unit 55, which allows write transfers and read transfers to proceed at the same time.

Next, the method of transferring pixel data when the SIMD type processor has fewer processor elements 3a than there are pixels in the main scanning direction is described. FIG. 7 is an explanatory diagram of the pixel data transfer method in that case.

When there are fewer processor elements 3a than pixels in the main scanning direction, the pixel data in the main scanning direction is divided, and the memory controller 5 supplies the pixel data from the image memory 6 to the register files 31 of the processor elements 3a of the SIMD processor 1 one division at a time, repeating this process. In other words, the pixel data in the main scanning direction is divided and the SIMD processor 1 executes the arithmetic processing repeatedly.

In image processing with a SIMD type processor, processing that refers to the data of pixels before and after the pixel of interest, such as filter processing, normally leaves invalid data at both ends of the SIMD array. Therefore, when data is stored from the register files of the processor elements 3a into the image memory 6, the data at both ends must be excluded. Conversely, when data is transferred from the image memory 6 to the register files 31 of the processor elements 3a, the reference pixel data before and after the pixels of interest must be transferred as well. Pixel data in weighting processing such as filter processing is therefore transferred in the following order, where a denotes the number of reference pixels.

1. The unprocessed pixel data is transferred from the image memory 6 to the corresponding register files 31 of the processor elements 3a of the SIMD processor 1.

2. The SIMD processor 1 performs the image processing. The a data items at the front and the a data items at the rear of the processor elements 3a are treated as invalid because their reference pixels are absent. The hatched portion in FIG. 7 therefore corresponds to the valid pixels.

3. The processed pixel data is transferred from the SIMD processor 1 to the image memory 6. Only the valid pixels, excluding the a pixels at each end, are transferred to the image memory 6.

The transfer of pixel data between the image memory 6 and the SIMD processor 1 is described with reference to FIG. 7. The SIMD processor 1 has n processor elements (PE) 3a, PE0 to PEn−1. Pixel data is sent from the image memory 6 to these processor elements 3a, image processing is performed, and the pixel data is then transferred from the processor elements 3a back to the image memory 6.

In the first transfer, labeled the 1st SIMD in the figure, a reference pixels at each of the front and rear ends are transferred to the processor elements 3a of the SIMD processor 1 together with the pixels of interest. In the example shown in FIG. 7, the RAM control unit 53 first stores 0 in the read pointer 533 and reads from the image memory 6 an amount of data corresponding to the RAM access size, 32 bits in this embodiment, into the read buffer unit 55. The address adder/subtractor 531 then takes the read pointer value, adds a number corresponding to the RAM access size in the FIFO operation mode or subtracts it in the LIFO operation mode, and outputs the result to the AB 535; that value is stored in the read pointer 533. The PE address counter 521 of the PE control unit 52 generates an address, and the data is written from the external interface 4 to the corresponding processor elements 3a based on that address. Each time image data is written, the transfer number counter 523 is decremented by a number corresponding to the RAM access size, 4 in this embodiment. This processing is repeated until the value of the transfer number counter 523 reaches 0, at which point the pixel data from the image memory 6 has been transferred to PE0 through PEn−1 of the SIMD processor 1.

When the processed pixel data is transferred to the image memory 6, the a pixels at each of the front and rear ends are invalid, so of the n pixels from PE0 to PEn−1, the pixels transferred are the (n−2×a) pixels from a to (n−a−1). The PE address counter 521 of the PE control unit 52 is therefore loaded with the value of the transfer start PE address register 525 at the start of the transfer; data is read from the corresponding processor elements 3a based on that address, and the image data is written from the external interface 4 to the write buffer unit 54. Each time data is read, the transfer number counter 522 is decremented by a number corresponding to the RAM access size, 4 in this embodiment. When 32 bits of data have accumulated in the write buffer unit 54, the image data is transferred to the image memory 6. The RAM control unit 53 first stores a in the write pointer 532 and transfers the image data held in the write buffer unit 54, 32 bits in this embodiment, to the image memory 6. The address adder/subtractor 531 takes the write pointer value, adds a number corresponding to the RAM access size in the FIFO operation mode or subtracts it in the LIFO operation mode, and outputs the result to the AB 535; that value is stored in the write pointer 532. In this way, the image data from PEa to PE(n−a−1) of the SIMD processor 1 is transferred to the image memory 6.

The second transfer is then performed in the same way. In the transfer labeled the 2nd SIMD in the figure, pixels (n−a) to (2n−3×a−1) are processed as pixels of interest, so the pixel data of (n−2×a) to (n−a−1) and of (2n−3×a) to (2n−2×a−1) is sent along as reference pixels. After the data processing, the (n−2×a) pixels from (n−a) to (2n−3×a−1) are stored in the image memory 6. When the number of SIMD divisions in the main scanning direction is larger, the same processing is simply repeated.
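The index arithmetic for successive passes can be checked with a short Python sketch. It assumes that every pass loads n consecutive pixels, that the a results at each end are discarded, and that consecutive passes therefore overlap by 2×a pixels; pass numbers start at 0. The function names, the separate input and output lists, and the decision to leave a trailing partial pass unprocessed are illustrative choices, not details taken from the text.

```python
def tile_ranges(n, a, k):
    """Return ((load_first, load_last), (valid_first, valid_last)) for pass k.

    n : number of processor elements
    a : number of reference pixels needed on each side
    k : pass number, 0 for the 1st SIMD, 1 for the 2nd SIMD, ...
    """
    step = n - 2 * a              # valid pixels produced per pass
    start = k * step              # first pixel loaded into PE0
    load = (start, start + n - 1)
    valid = (start + a, start + n - a - 1)
    return load, valid


def filter_line_tiled(pixels, n, a, kernel):
    """Filter one scan line pass by pass, keeping only the valid results.

    kernel must have 2*a + 1 taps. Pixels that no pass produces keep their
    input values (an assumption of this sketch).
    """
    out = list(pixels)
    k = 0
    while True:
        (lo, hi), _ = tile_ranges(n, a, k)
        if hi >= len(pixels):
            break                 # a partial last pass is out of scope here
        tile = pixels[lo:hi + 1]  # what the n PEs hold after the read transfer
        for pe in range(a, n - a):            # PEs with all reference pixels present
            acc = 0
            for j in range(2 * a + 1):
                acc += kernel[j] * tile[pe - a + j]
            out[lo + pe] = acc                # write transfer of valid data only
        k += 1
    return out
```

For n = 8 and a = 1, pass 0 loads pixels 0 to 7 and keeps 1 to 6, and pass 1 loads pixels 6 to 13 and keeps 7 to 12, so the valid ranges join without gaps or overlap.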

To realize the above processing for a data write to the image memory 6, in the embodiment of the processor element (PE) control unit 52 of the present invention it suffices to set "a" in the transfer start PE address register 525 and "(n−2×a)" in the transfer number counter 522.

Because all of these settings are based on PE addresses, they do not need to be changed for each SIMD pass. For a data read from the image memory 6, on the other hand, the read pointer 533 must be moved back after the transfer ends; in the above example it is moved back to (n - 2×a). In the first embodiment of the RAM control unit 53, the value of the read pointer 533 must be set from the global processor 2 after the transfer has ended.

For a data write to the image memory 6, the write pointer 532 does not need to be manipulated.
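The overlapped division of a line into SIMD passes can be sketched as follows. This is only an illustration, not the circuit itself: the function name `simd_window` and its return keys are hypothetical, ranges are treated as inclusive, addresses are counted in pixels (the RAM access size of 4 is ignored), and it is assumed that every pass yields (n - 2×a) valid pixels.

```python
def simd_window(k, n, a):
    """Pixel ranges for the k-th SIMD pass (k = 0, 1, ...).

    n: number of processor elements, a: invalid pixels at each end.
    All names here are illustrative and do not appear in the patent.
    """
    if n <= 2 * a:
        raise ValueError("n must be larger than 2*a")
    load_start = k * (n - 2 * a)      # where the read pointer starts for this pass
    return {
        # n pixels are loaded into PE0..PEn-1, including reference pixels
        "loaded": (load_start, load_start + n - 1),
        # only PEa..PE(n-a-1) hold valid results after processing
        "valid": (load_start + a, load_start + n - a - 1),
        # the read pointer is moved back so the next pass reloads 2*a pixels
        "next_read_pointer": load_start + n - 2 * a,
    }
```

With n = 8 and a = 1, the first pass loads pixels 0 to 7 and outputs 1 to 6, and the read pointer is moved back to 6 so that the second pass loads pixels 6 to 13.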

FIG. 8 is a schematic block diagram showing a second embodiment of the RAM control unit 53 according to the present invention.

The RAM control unit 53 shown in FIG. 8 adds a read offset generator 536 to the RAM control unit 53 shown in FIG. 6. With the read offset generator 536, this RAM control unit 53 is modified so that, when data is transferred from the memory 6 to the register files of the processor elements 3a of the SIMD processor 1, the value of the read pointer 533 can be moved back after the transfer to the processor elements 3a ends.

The read offset generator 536 monitors the PE address input from the PE control unit 52. When the PE address becomes equal to the set value, it holds the value of the read pointer 533 at that time, and the transfer continues until the set number of transfers has been sent. When the transfer ends, the held value is reloaded into the read pointer 533. In the example shown in FIG. 7, (n - 2×a - 1) is set. Because all settings are based on PE addresses, they do not need to be changed for each SIMD pass as they do in the first embodiment of the RAM control unit 53.
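The behavior of the read offset generator can be modeled roughly as below. The sketch makes several assumptions that the text does not state: one pixel is read per PE in FIFO mode, addresses are counted in pixels, and the pointer is latched after it has advanced past the matching PE, which is what makes a setting of (n - 2×a - 1) bring the pointer back to (n - 2×a). The function and parameter names are hypothetical.

```python
def transfer_with_read_offset(read_pointer, count, reload_pe, start_pe=0):
    """Simulate one memory-to-PE transfer and return the final read pointer.

    read_pointer: pointer value at the start of the transfer
    count:        number of PEs to fill
    reload_pe:    PE address at which the pointer value is latched
    """
    held = None
    pe = start_pe
    for _ in range(count):
        read_pointer += 1            # FIFO mode, one pixel per PE (simplification)
        if pe == reload_pe:
            held = read_pointer      # latch the pointer for the later reload
        pe += 1
    # at the end of the transfer the held value is reloaded, if one was latched
    return read_pointer if held is None else held
```

For n = 8 and a = 1, calling `transfer_with_read_offset(0, 8, 5)` leaves the pointer at 6, so the next pass starts reading at pixel (n - 2×a) without intervention from the global processor.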

FIG. 9 is a schematic block diagram showing a third embodiment of the RAM control unit 53 according to the present invention.

In addition to the above processing, the RAM control unit 53 shown in FIG. 9 makes it possible to use only a specific area of the image memory 6 as a ring.

This RAM control unit 53 adds two registers 537 and 538 and a comparator 539 to the RAM control unit shown in FIG. 8. The register (LADDR) 537 holds the lower limit of the address of the image memory 6, and the register (UADDR) 538 holds the upper limit. The comparator 539 compares the current pointer 532 (533) with the values set in LADDR 537 and UADDR 538.

In FIFO mode, when the pointer 532 (533) equals UADDR 538 and the value output from the address adder/subtractor 531 exceeds UADDR 538, the output from the address adder/subtractor 531 to AB 535 is negated and the output from LADDR 537 to AB 535 is asserted.

In LIFO mode, conversely, when the pointer 532 (533) equals LADDR 537 and the value output from the address adder/subtractor 531 falls below LADDR 537, the output from the address adder/subtractor 531 to AB 535 is negated and the output from UADDR 538 to AB 535 is asserted. With this configuration, only a specific region of the memory space of the image memory 6 is used.
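The wrap-around rule for the ring area can be expressed compactly in software. The following is a minimal sketch of the selection between the adder/subtractor output and the LADDR/UADDR registers; the function name `next_pointer` and the string mode values are assumptions made for illustration.

```python
def next_pointer(pointer, step, mode, laddr, uaddr):
    """Return the value driven onto the address bus for the next access.

    pointer: current read or write pointer
    step:    amount corresponding to the RAM access size
    mode:    "FIFO" (add) or "LIFO" (subtract)
    """
    if mode not in ("FIFO", "LIFO"):
        raise ValueError("mode must be 'FIFO' or 'LIFO'")
    candidate = pointer + step if mode == "FIFO" else pointer - step
    if mode == "FIFO" and pointer == uaddr and candidate > uaddr:
        return laddr        # adder output negated, LADDR asserted
    if mode == "LIFO" and pointer == laddr and candidate < laddr:
        return uaddr        # subtractor output negated, UADDR asserted
    return candidate
```

With LADDR = 0, UADDR = 12 and a step of 4, a FIFO pointer visits 0, 4, 8, 12 and then returns to 0, while a LIFO pointer at 0 jumps to 12.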

FIG. 10 is a schematic block diagram of the write buffer unit 54 and the read buffer unit 55 according to the embodiment of the present invention.

The write buffer unit 54 and the read buffer unit 55 in this embodiment have dedicated ports for the even-numbered processor elements 3a and the odd-numbered processor elements 3a, so that data for two processor elements can be accessed in one cycle and a transfer completes in half as many cycles as there are processor elements.

The write buffer unit 54 is configured to perform a write access to the image memory 6 once data for four processor elements has been stored in the buffer.

Data for two processor elements supplied from the transfer port of the external interface 4 to the write buffer unit 54 is first stored in the flip-flops 541 and 542 and then moved to the next-stage flip-flops 543 and 544. The data for the next two processor elements is then stored in the flip-flops 541 and 542. The data for four processor elements held in the flip-flops 541 to 544 is then stored in the latches 545 to 548, respectively. Once the data for four processor elements has been stored in the latches 545 to 548, the write buffer unit 54 performs a write access to the image memory 6.
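The two-stage buffering described above can be illustrated with a small model. This is a sketch under assumptions: the class name `WriteBuffer`, the method `push_pair`, and the ordering of the four values within a memory word (older pair first) are not taken from the text.

```python
class WriteBuffer:
    """Toy model of the write buffer: two PEs arrive per cycle, four are written."""

    def __init__(self):
        self.first_stage = None      # models flip-flops 541 and 542
        self.second_stage = None     # models flip-flops 543 and 544
        self.memory_writes = []      # words latched (545 to 548) and written out

    def push_pair(self, even_data, odd_data):
        """Accept the data of one even and one odd processor element."""
        if self.first_stage is not None:
            self.second_stage = self.first_stage    # shift to the next stage
        self.first_stage = (even_data, odd_data)
        if self.second_stage is not None:
            # four PEs are available: latch them and issue one memory write
            self.memory_writes.append(self.second_stage + self.first_stage)
            self.first_stage = None
            self.second_stage = None
```

Two calls to `push_pair` produce one four-element memory write, which matches the ratio of one RAM access per two transfer cycles.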

The read buffer unit 55 stores the data for four processor elements read from the image memory 6 in the flip-flops 551 to 554, which serve as a buffer. The multiplexer 555 selects the data for two processor elements from the data for four, and the selected data is stored in the latches 556 and 557. The image data held in the latches 556 and 557 is transferred via the external interface 4 to the register file 31 of the SIMD processor 1.

When the data for four processor elements has been transferred, the image memory 6 is read again. Because the image memory 6 can access data for four processor elements at a time, one access every two cycles is enough to keep up with the SIMD processor 1, which relaxes the access time requirement on the image memory 6.

FIG. 11 illustrates the thinning write operation that implements reduction, one of the scaling operations commonly performed in digital copiers, facsimile machines, and the like.

In FIG. 11, the pixel data of processor element (0), processor element (2), and so on is stored in the image memory 6 as is, while the pixel data of processor element (1), processor element (3), and so on is thinned out and not stored in the image memory 6.

The memory controller 5 has two external write control signals that determine whether the data of the even-numbered processor elements 3a and of the odd-numbered processor elements 3a, respectively, is thinned out. It also has an external terminal that notifies the memory controller 5 of the timing at which to start a transfer, so that the memory controller 5 and the write control signals can be synchronized. The external synchronization signal that starts the transfer is input to the sequence unit 51; when the synchronization signal is asserted, the sequence unit 51 starts the transfer.

The sequence unit 51 detects the value of the write control signal. When the external write control signal is "0", the data is written into the buffer in the write buffer unit 54; when it is "1", the data is not stored in the buffer in the write buffer unit 54.

Except at the start and end of a transfer, the write buffer unit 54 does not issue a transfer request for the image memory 6 to the RAM control unit 53 until data for four processor elements has been stored. The address pointer is therefore not updated, and the clock, read/write control, and byte select are not output to the image memory 6, until data for four processor elements has been stored in the write buffer unit 54. Data reads from the data transfer port of the SIMD processor 1 continue regardless of the write control signals.
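A thinning write can be sketched as a filter in front of the four-element write buffer. The function name `thinning_write` and the list-based representation of the per-cycle control signals are hypothetical, and flushing a partly filled buffer at the end is an assumption about the end-of-transfer exception, which the text does not detail.

```python
def thinning_write(pe_data, even_ctrl, odd_ctrl, word_size=4):
    """Return the words written to memory for one transfer.

    pe_data:   values ordered by PE address (even length)
    even_ctrl: write control bit for the even PE in each cycle
    odd_ctrl:  write control bit for the odd PE in each cycle
    A control bit of 0 stores the data; 1 thins it out.
    """
    words, buffer = [], []
    for cycle in range(len(pe_data) // 2):
        pairs = ((pe_data[2 * cycle], even_ctrl[cycle]),
                 (pe_data[2 * cycle + 1], odd_ctrl[cycle]))
        for data, ctrl in pairs:
            if ctrl == 0:
                buffer.append(data)          # kept pixel goes into the buffer
            if len(buffer) == word_size:
                words.append(tuple(buffer))  # buffer full: one memory write
                buffer = []
    if buffer:
        words.append(tuple(buffer))          # assumed flush at end of transfer
    return words
```

Setting every odd control bit to 1 halves the line: eight PEs holding 0 to 7 produce the single word (0, 2, 4, 6), as in the pattern of FIG. 11.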

FIG. 12 illustrates the duplicate read operation that implements enlargement, another of the scaling operations. In FIG. 12, the registers of processor element (0), processor element (2), and so on are written with data from the image memory 6 in order, starting at the position of the read pointer, while the registers of processor element (1), processor element (3), and so on are written with a duplicate of the value written to the register of the preceding processor element.

The memory controller 5 has two external read control signals that determine, for the even-numbered processor elements 3a and the odd-numbered processor elements 3a respectively, whether the same data as before is written again.

The memory controller 5 also has an external terminal that notifies it of the timing at which to start a transfer, so that the memory controller 5 and the duplication control signals can be synchronized. The external synchronization signal that starts the transfer is input to the sequence unit 51; when the synchronization signal is asserted, the sequence unit 51 starts the transfer.

The sequence unit 51 detects the value of the read control signal. When it is "1", the sequence unit 51 controls the multiplexer of the read buffer unit 55 so that the data destined for the register of the processor element with the preceding PE address is output again. Except at the start and end of a transfer, the read buffer unit 55 does not issue a read data transfer request for the image memory 6 to the RAM control unit 53 until all of the data for four processor elements read from the image memory 6 is no longer needed. (For example, if the read control signals are always "0", this takes two accesses to the data transfer port; while a read control signal is "1", no data becomes unneeded.) The address pointer is therefore not updated, and the clock, read/write control, and byte select are not output to the image memory 6, until all of the data for four processor elements is no longer needed. Data writes to the external interface 4 of the SIMD processor 1 continue regardless of the read control signals.
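The duplicate read can be modeled in the same style. In this sketch the function name `duplicate_read` is hypothetical, memory is read one pixel at a time rather than four, and a control bit of 1 on the very first PE is assumed to fall back to a fresh read because there is no earlier value to repeat.

```python
def duplicate_read(memory, read_pointer, even_ctrl, odd_ctrl):
    """Return (values written to the PEs in address order, final read pointer).

    even_ctrl / odd_ctrl: read control bit per cycle for the even and odd PE.
    A control bit of 1 repeats the value given to the preceding PE.
    """
    pe_values = []
    previous = None
    for even_bit, odd_bit in zip(even_ctrl, odd_ctrl):
        for ctrl in (even_bit, odd_bit):
            if ctrl == 1 and previous is not None:
                value = previous                 # multiplexer re-selects old data
            else:
                value = memory[read_pointer]     # consume the next memory value
                read_pointer += 1
            pe_values.append(value)
            previous = value
    return pe_values, read_pointer
```

Setting every odd control bit to 1 doubles the line: four memory values fill eight PEs, and the read pointer advances by only four.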

In the embodiment described above, a single address designation allows data to be transferred both to the even-numbered processor element 3a and to the odd-numbered processor element 3a of the SIMD processor 1, but the transfer of image data to the SIMD processor 1 is not limited to this scheme. For example, as shown in FIG. 13, the present invention is also applicable to a configuration in which data is transferred to the processor elements 3a of the SIMD processor 1 one after another by address designation, without distinguishing odd from even. That is, as shown in FIG. 13, the register controller 31a is connected to the external interface 4 via the address bus 41a, the read/write signal 45c, and the clock signal 41c. When an address designation signal is supplied from the memory controller 5 to the external interface 4 and arrives over the address bus 41a, the register controller 31a decodes it. If the decoded address matches the address assigned to its own processor element 3a, the register controller 31a, in synchronization with the clock signal supplied from the memory controller 5 to the external interface 4 and carried on the clock signal 41c, obtains the read/write instruction signal sent from the memory controller 5 via the read/write signal 41b. This read/write instruction signal is applied to the register 31b.

The data stored in the image memory 6 provided outside the SIMD processor 1 is placed on the data bus 46c, as 8-bit parallel data in this embodiment. The data bus 46c is also used when the processed data held in the register 31b is sent to the image memory 6 provided outside the SIMD processor 1.

The address, read/write, clock, and data signals supplied from the external interface 4 are delivered to each register of the register file 31. The address is decoded for each processor element 3a, and only the processor element 3a whose own address matches performs the read/write operation.

In the SIMD processor 1 configured in this way, when the memory controller 5 sends data stored in the image memory 6 to a processor element 3a, it designates the address assigned to that processor element 3a, and the data is delivered to the designated processor element 3a with a single clock input. In this example, data is not sent to the even and odd processor elements 3a at the same time, so the data transfer takes longer than in the first embodiment, but the circuit configuration is simpler.
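The address-decoded access of this variant can be pictured with a small software model. The class `ProcessorElement`, the method `on_clock`, and the helper `write_line` are hypothetical names; the model broadcasts every bus cycle to all PEs and lets only the PE with the matching address react, one PE per clock.

```python
class ProcessorElement:
    """Toy model of one PE with a register controller that decodes the address."""

    def __init__(self, address):
        self.address = address
        self.register = None

    def on_clock(self, bus_address, write, bus_data=None):
        """React to one clock: only the addressed PE reads or writes."""
        if bus_address != self.address:
            return None                  # address does not match: ignore
        if write:
            self.register = bus_data     # take the value from the data bus
            return None
        return self.register             # drive the register onto the data bus


def write_line(pes, data):
    """Send data[i] to the PE with address i, one PE per clock."""
    for address, value in enumerate(data):
        for pe in pes:                   # all PEs see the same bus signals
            pe.on_clock(address, True, value)
```

Filling n processor elements this way takes n clocks, compared with n/2 cycles for the even/odd dual-port scheme, which is the trade-off against the simpler circuit.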

In the embodiments described above, the processor elements 3a are designated by address, but the present invention is also applicable when they are designated by pointer instead, that is, in a serial access memory scheme. This example is described with reference to FIG. 14. Only the differences from the first embodiment described above are explained here; the points in common are omitted. Components that are the same as in the first embodiment are given the same reference numerals.

First, I/O address, data, and control signals are supplied from the global processor 2 to the memory controller 5 via a bus. The global processor 2 sets commands such as the operation method in several operation setting registers (not shown) of the memory controller 5. Finally, the global processor 2 writes a start code to a start register (not shown) of the memory controller 5, whereupon the memory controller 5 automatically operates according to the settings. Based on the command from the global processor 2, the memory controller 5 generates a reset signal and sends it from the external interface 4 to the processor element block 3 via the reset signal 47. This resets the register controllers 31a. A clock signal is then sent from the memory controller 5, via the external interface 4 and the clock signal 41c, to the register controller 31a closest to the external interface 4. In synchronization with this clock signal, the register controller 31a' obtains the read/write instruction signal sent from the memory controller 5 via the read/write signal 45a or 45b. This read/write instruction signal is applied to the register 31b of the even-numbered processor element 3a and to the register 31b of the odd-numbered processor element 3a, respectively. As in the first embodiment, the read/write instruction signals sent to the register controllers 31a' of the processor elements 3a forming a pair may differ from each other.

As in the first embodiment described above, a single pointer designation thus allows data to be transferred both to the even-numbered processor element 3a and to the odd-numbered processor element 3a.

FIG. 15 is a block diagram showing the configuration of another embodiment of an image processing apparatus that includes the memory controller 5 of the present invention described above. In FIG. 15, the memory controller 5 is placed between the data transfer ports of two independent register files 3. The present invention is applicable to this configuration as well.

FIG. 1 is a block diagram showing a SIMD processor in an embodiment of the present invention.
FIG. 2 is a block diagram showing the internal configuration of the SIMD processor in the first embodiment of the present invention.
FIG. 3 is a block diagram showing the internal configuration of a processor element in the first embodiment of the present invention.
FIG. 4 is a block diagram showing the configuration of the memory controller 5 used in the present invention.
FIG. 5 is a schematic block diagram of an embodiment of the processor element (PE) control unit 52 of the memory controller 5 used in the present invention.
FIG. 6 is a schematic block diagram of the first embodiment of the memory (RAM) control unit 53 of the memory controller 5 used in the present invention.
FIG. 7 is an explanatory diagram of the pixel data transfer method when a SIMD processor with fewer processor elements 3a than pixels in the main scanning direction is used.
FIG. 8 is a schematic block diagram showing the second embodiment of the RAM control unit 53 of the memory controller 5 used in the present invention.
FIG. 9 is a schematic block diagram showing the third embodiment of the RAM control unit 53 of the memory controller 5 used in the present invention.
FIG. 10 is a schematic block diagram of the write buffer unit 54 and the read buffer unit 55 of the memory controller 5 used in the present invention.
FIG. 11 is an explanatory diagram of the thinning write operation that implements reduction in the scaling process.
FIG. 12 is an explanatory diagram of the duplicate read operation in the scaling process.
FIG. 13 is a block diagram showing the internal configuration of a SIMD processor in another embodiment of the present invention.
FIG. 14 is a block diagram showing the internal configuration of a SIMD processor in yet another embodiment of the present invention.
FIG. 15 is a schematic block diagram showing another embodiment of the present invention.
FIG. 16 is a schematic block diagram of a conventional image processing apparatus using the SIMD scheme.

Explanation of symbols

1 SIMD processor
2 Global processor
4 External interface
5 Memory controller
6 Image memory
51 Sequence unit
52 PE control unit
53 RAM control unit
54 Write buffer unit
55 Read buffer unit

Claims

A signal processing device comprising: a plurality of processor elements of a SIMD type processor, each having computing means for computing data and data holding means for holding data computed by the computing means; a data transfer bus connected to each of the plurality of processor elements; designation means for designating a predetermined processor element based on addresses assigned to the plurality of processor elements; an address bus for supplying an address to the designation means; a data transfer interface for accessing, from outside the processor, the data holding means incorporated in the plurality of processor elements; and a memory controller that is connected to the data transfer interface, supplies to the address bus an address specifying the predetermined processor element, reads data stored in a memory and writes the data to a processor element, and reads data from the processor element and writes the data to the memory, wherein, when transferring data from the data holding means of each processor element to the memory via the data transfer interface, the memory controller designates the address of the processor element at which the transfer starts and the address of the processor element at which the transfer ends, and reads out the computed data from the processor elements except for a predetermined number of data.