JP5369669B2

JP5369669B2 - SIMD type microprocessor

Info

Publication number: JP5369669B2
Application number: JP2008327035A
Authority: JP
Inventors: 俊輝山中
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2008-12-24
Filing date: 2008-12-24
Publication date: 2013-12-18
Anticipated expiration: 2028-12-24
Also published as: JP2010152450A

Abstract

PROBLEM TO BE SOLVED: To provide a SIMD type microprocessor capable of simultaneously processing more image data at high speed by improving processing performance during data transfer without increasing the circuit scale or the layout scale.
SOLUTION: In the SIMD (Single Instruction-stream, Multiple Data-stream) type microprocessor 1, each first word line 6 is connected to a differently numbered register in each processor element within every group of 16 of the processor elements PE0-PEn, and each second word line 8 is connected to registers having the same number across the processor elements PE0-PEn.
COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a SIMD (Single Instruction-stream, Multiple Data-stream) type microprocessor that processes a plurality of data items in parallel with a single arithmetic instruction.

In recent years, image quality in image processing for digital copying machines, facsimile machines, and the like has been improved by increasing the number of pixels and diversifying the image processing. As image quality improves, the amount of data to be processed increases.

In such image processing, the same arithmetic processing is often applied to all pixels. For this reason, SIMD (Single Instruction-stream, Multiple Data-stream) type microprocessors, which perform the same arithmetic processing on a plurality of data items simultaneously with a single instruction, have come into wide use.

A SIMD type microprocessor has a plurality of units called processor elements, each including an arithmetic unit and registers. A global processor is provided as a control unit that controls these processor elements simultaneously. Under the control of the global processor, a plurality of data items can be processed simultaneously with a single instruction. Each processor element is usually in charge of the image processing of one pixel.

As described above, a SIMD type microprocessor can be realized easily by arranging a plurality of arithmetic units in parallel and controlling them simultaneously with a global processor. Furthermore, because the processor elements perform their arithmetic processing at the same time, the required processing is completed faster.

However, no matter how much the arithmetic processing is sped up, the benefit of the SIMD type microprocessor diminishes if the target data is transferred inefficiently. Memory and the like must therefore be accessed at a speed that matches the computation speed; if the access cannot keep up, the data access speed determines the performance of the processor.

In an ordinary SISD (Single Instruction-stream, Single Data-stream) type microprocessor, operand data is accessed sequentially from memory by the processor's program, and the data access speed is determined by the bit width and transfer time of the memory. If this method is also used in a SIMD type microprocessor, computation is parallel but data access is sequential, so the processing capability falls to about that of an SISD type microprocessor.

To improve the efficiency of transfers to and from external data, a SIMD type microprocessor is configured so that operand data is not accessed by processor instructions; instead, the input/output registers inside the processor are accessed directly from an external memory transfer port. That is, while the processor executes an operation, the data to be processed next is transferred from the external transfer port to the input registers, and processed data is transferred from the output registers to the external memory via the external transfer port, thereby speeding up data processing.

The data transfer flow between the SIMD type microprocessor and the external memory is as follows.
(1) The operand data is transferred from the external memory to the input registers via the external transfer port.
(2) The processor starts operating on the registers to which operand data has already been transferred from outside.
(3) The processor executes the prescribed operation. Meanwhile, the next operand data is transferred from the external memory to the input registers. In addition, if processed data (result data) is present in the output registers, it is transferred from the output registers to the external memory via the external transfer port.
(4) The processor finishes the operation and transfers the result data to the output registers.
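The overlap in steps (1) to (4) can be pictured as a three-stage pipeline. The following is only a minimal sketch, assuming that the data is split into equally sized blocks and that loading, computing, and storing each take one time step; the function name `pipeline_schedule` and the block numbering are illustrative and do not appear in the patent.

```python
def pipeline_schedule(num_blocks):
    """Return, for each time step, the block being loaded, computed, and stored.

    None means the corresponding stage is idle in that step.
    """
    steps = []
    # Two extra steps drain the pipeline after the last block has been loaded.
    for t in range(num_blocks + 2):
        # External memory -> input registers (steps (1) and (3)).
        load = t if t < num_blocks else None
        # Processor operates on the block loaded in the previous step (steps (2), (3)).
        compute = t - 1 if 0 <= t - 1 < num_blocks else None
        # Output registers -> external memory (step (3)), one step after computing.
        store = t - 2 if 0 <= t - 2 < num_blocks else None
        steps.append((load, compute, store))
    return steps


for t, (load, compute, store) in enumerate(pipeline_schedule(3)):
    print(f"step {t}: load={load} compute={compute} store={store}")
```

In the middle step all three stages are busy at once, which is the situation described in step (3).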

As described above, speed is increased by having an external memory data transfer device transfer operand data while the processor executes its operations. However, even though arithmetic processing and data transfer can proceed at the same time, a problem remains: the arithmetic processing takes the same time regardless of the number of processor elements, whereas the data transfer time grows in proportion to the number of processor elements. When data is transferred from the external memory to the registers of the processor elements, one cycle is normally spent per processor element, so a SIMD type microprocessor with 512 processor elements, for example, needs 512 cycles for the transfer. Consequently, even if all 512 processor elements perform their arithmetic processing simultaneously, the SIMD type microprocessor cannot deliver its benefit unless data can be transferred at a matching rate.

To solve this problem, Patent Document 1 describes arranging two sets of data lines from the PE interface, one for even-numbered and one for odd-numbered processor elements, and accessing an even-numbered and an odd-numbered processor element simultaneously, thereby halving the transfer time. Increasing the bandwidth of the transferred data in this way improves the transfer efficiency accordingly.
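The cycle counts discussed above can be reproduced with a short calculation. This is a sketch under the stated assumption that each set of data lines serves one processor element per cycle; the name `transfer_cycles` and the parameter `lanes` are illustrative, not terms from the patent.

```python
import math


def transfer_cycles(num_pe, lanes=1):
    """Cycles needed to reach every processor element once.

    Assumes each lane (one set of data lines) serves one processor element per cycle.
    """
    return math.ceil(num_pe / lanes)


print(transfer_cycles(512))           # a single set of data lines
print(transfer_cycles(512, lanes=2))  # even/odd sets as in Patent Document 1
```

The arithmetic itself takes the same time in both cases, so the transfer count is what limits throughput as the number of processor elements grows.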

Patent Document 2 addresses the problem of reduced processing capability associated with data array conversion by making the data transfer path from the registers to the ALU multi-ported, thereby providing a plurality of data lines with routes that carry the data to different processor elements. Patent Document 3 addresses the same problem with a configuration in which a plurality of divided word lines allow different registers to be selected when data is transferred from the registers to the ALU.
Patent Documents 1 to 3 (listed in order): Japanese Patent No. 3971535; Japanese Patent No. 4020804; JP 2006-164183 A

However, although the method described in Patent Document 1 improves transfer efficiency by increasing the bandwidth of the transferred data, adding data lines requires correspondingly more layout area, which sacrifices cost. Moreover, considering the processing speed from the registers to the arithmetic circuit, the register layout should be kept as small as possible, so an increase in layout area also degrades the computation speed.

Furthermore, the methods described in Patent Documents 2 and 3 also increase the circuit scale and the layout area.

An object of the present invention is to solve these problems.

That is, an object of the present invention is to provide a SIMD type microprocessor that improves processing capability during data transfer without increasing the circuit scale or the layout scale, and that can simultaneously process more image data at high speed.

The invention described in claim 1 is a SIMD type microprocessor provided with: a plurality of processor elements, each including an arithmetic unit and n registers (n being a natural number of 2 or more) to which unique numbers are assigned; a control unit that controls the operation of the plurality of processor elements; an external interface that controls communication from the n registers to the outside; a first data line to which each of the n registers is connected; a first word line with which the control unit selects the register that communicates with the arithmetic unit via the first data line; a second data line to which the external interface and the registers having the same number across the plurality of processor elements are connected; and a second word line with which the external interface selects the processor element with which it communicates via the second data line, characterized in that the external interface is provided for each of the register numbers, the first word line is connected to registers across the plurality of processor elements such that, among n processor elements, the same first word line is connected to registers having mutually different numbers, and the second word line is connected to registers having the same number across the plurality of processor elements.

The invention described in claim 2 is a SIMD type microprocessor provided with: a plurality of processor elements, each including an arithmetic unit and n registers (n being a natural number of 2 or more) to which unique numbers are assigned; a control unit that controls the operation of the plurality of processor elements; an external interface that controls communication from the n registers to the outside; a first data line to which each of the n registers is connected; a first word line with which the control unit selects the register that communicates with the arithmetic unit via the first data line; a second data line to which the external interface and the registers having the same number across the plurality of processor elements are connected; and a second word line with which the external interface selects the processor element with which it communicates via the second data line, characterized in that the first word line and the second word line are connected to registers across the plurality of processor elements such that, among n processor elements, the same first word line and the same second word line are connected to registers having mutually different numbers.

The invention described in claim 3 is the invention described in claim 1 or 2, characterized in that another type of first word line is further provided for the control unit to select the register that communicates with the arithmetic unit via the first data line, and the same first word line of this other type is connected to the registers having the same number in each processor element.

The invention described in claim 4 is the invention described in any one of claims 1 to 3, characterized in that switching means is further provided that has a function of changing at least the connection relationship of the first word line and that selects either a first mode or a second mode. In the first mode, the switching means maintains, at least for the first word line, the arrangement in which the first word line is connected to registers across the plurality of processor elements such that, among n processor elements, the same first word line is connected to registers having mutually different numbers. In the second mode, the n registers in each processor element are divided into a plurality of register groups, and the switching means maintains, at least for the first word line in each register group, the arrangement in which the first word line is connected to registers across the plurality of processor elements such that, among as many processor elements as there are registers in that register group within one processor element, the same first word line is connected to registers having mutually different numbers.

According to the invention described in claim 1, the first word line is connected to registers across the plurality of processor elements such that, among n processor elements, the same first word line is connected to registers having mutually different numbers, and the second word line is connected to registers having the same number across the plurality of processor elements, so a differently numbered register can be selected simultaneously in each processor element. Because an external interface is provided for each register number and all the external interfaces can transfer data at the same time, external data can be transferred simultaneously from the external interfaces to differently numbered registers in the respective processor elements. In addition, data after arithmetic processing can be read out simultaneously without array conversion, which improves the data transfer efficiency. Furthermore, no new word lines for selecting the registers used in arithmetic processing are added, so this can be realized without increasing the circuit scale or the layout scale.

According to the invention described in claim 2, the first word line and the second word line are connected to registers across the plurality of processor elements such that, among n processor elements, the same first word line and the same second word line are connected to registers having mutually different numbers. Therefore, even when the external interfaces are not provided independently for each register and are instead controlled simultaneously in synchronization, data can be transferred directly to the registers used for arithmetic processing. Registers with improved data transfer efficiency can thus be configured without increasing the number of address signals.

According to the invention described in claim 3, another type of first word line is further provided for the control unit to select the register that communicates with the arithmetic unit via the first data line, and the same first word line of this other type is connected to the registers having the same number in each processor element. Depending on the application, it is therefore possible to switch between a word line that enables operation at an increased transfer rate and a word line that enables different data to be input and output in parallel at each external data transfer port. In that case only one word line is added per register, so the layout scale hardly increases.

According to the invention described in claim 4, in the first mode the switching means maintains, at least for the first word line, the arrangement in which, among n processor elements, the same first word line is connected to registers having mutually different numbers. In the second mode, the n registers in each processor element are divided into a plurality of register groups, and the switching means maintains, at least for the first word line in each register group, the arrangement in which, among as many processor elements as there are registers in that register group within one processor element, the same first word line is connected to registers having mutually different numbers. The registers can therefore be divided into regions, and the data transfer efficiency can be improved for each region.
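One possible reading of the two modes can be sketched as a mapping from a first word line and a processor element to the register that the word line selects. This is an interpretation, not the claimed circuit: it assumes that a word line is identified by the register it reaches in processor element 0, that the register number advances by one per processor element, and that all register groups have the same size. The function name `selected_register` and its parameters are hypothetical.

```python
def selected_register(word_line, pe, group_size, num_regs=16):
    """Register selected in processor element `pe` by first word line `word_line`.

    group_size == num_regs models the first mode (one group spanning all registers).
    A smaller divisor of num_regs models the second mode (equal-sized register groups).
    """
    if num_regs % group_size != 0:
        raise ValueError("group_size must divide num_regs in this sketch")
    base = (word_line // group_size) * group_size  # first register of the group
    offset = word_line - base                      # position within the group
    return base + (offset + pe) % group_size


# First mode: the pattern repeats every 16 processor elements.
print([selected_register(0, pe, 16) for pe in range(18)])
# Second mode with two groups of 8: word line 8 stays inside registers 8 to 15.
print([selected_register(8, pe, 8) for pe in range(10)])
```

With a group of 8 registers, the pattern repeats every 8 processor elements, matching the claim's statement that the number of registers in the group sets the number of processor elements over which the register numbers differ.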

[First Embodiment]
A first embodiment of the present invention is described below with reference to FIGS. 1 and 2. FIG. 1 is a block diagram of a SIMD type microprocessor according to the first embodiment of the present invention. FIG. 2 is an explanatory diagram of the register transfer operation in the SIMD type microprocessor shown in FIG. 1.

The SIMD type microprocessor 1 shown in FIG. 1 includes a global processor 2, a plurality of processor elements PE0 to PEn, and a PE interface 4.

The global processor 2, serving as the control unit, includes a program RAM for storing programs, a data RAM for storing operand data, a program counter that holds the program address, general-purpose registers for storing data for arithmetic processing, an ALU, a stack pointer that holds the address of the save-destination data RAM when registers are saved and restored, a link register that holds the caller's address on a subroutine call, registers that hold the branch-source address on an interrupt and on an NMI (non-maskable interrupt), a processor status register that holds the state of the global processor 2, a sequence unit that decodes instructions and generates various control signals, and so on. Using these, it executes instructions and supplies control signals, including the first word lines 6 described later, to the processor elements PE0 to PEn.

As shown in FIG. 1, n processor elements PE0 to PEn (n being a natural number of 2 or more) are arranged in one row. Each processor element includes 16 registers, R0 to R15, and an ALU 3a.

The registers R0 to R15 store data to be operated on by the ALU 3a and data produced by the ALU 3a. A unique number is assigned to each register: register R0 is number 0, register R1 is number 1, and so on.

The ALU 3a, serving as the arithmetic unit, is an arithmetic logic unit. It obtains the data stored in the registers R0 to R15 via the first data line 5 described later, performs the operation specified by the control signals from the global processor 2, and outputs the result to the registers R0 to R15, again via the first data line 5.

The 16 registers R0 to R15 and the ALU 3a are connected by the first data line 5, a data line shared within the processor element. Which register the ALU 3a accesses is selected by the first word lines 6 output from the global processor 2.

The first word lines 6 are word lines with which the global processor 2 selects a register in each processor element, and each one is connected to a differently numbered register in each processor element. For example, the first word line 6 connected to register R0 of processor element PE0 is connected to register R1 in processor element PE1 and to register R2 in processor element PE2; the register number shifts by one for each successive processor element. In this embodiment, if n > 15, the first word line 6 connected to register R0 of processor element PE0 runs up to register R15 of processor element PE15 and then selects register R0 again from processor element PE16 onward. In other words, within every group of 16 processor elements (16 being the number of registers per processor element), a given first word line is connected to a differently numbered register in each processor element.
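The staggered wiring described above amounts to modular arithmetic on the register number. The sketch below assumes that each first word line is labelled by the register it reaches in PE0 and that every word line follows the same shift-by-one pattern as the one in the example; `register_on_word_line` is an illustrative name.

```python
NUM_REGS = 16  # registers R0..R15 per processor element in this embodiment


def register_on_word_line(word_line, pe, num_regs=NUM_REGS):
    """Number of the register that first word line `word_line` reaches in PE `pe`.

    A word line is labelled by the register it reaches in PE0; the register
    number then advances by one per processor element and wraps around.
    """
    return (word_line + pe) % num_regs


# The word line attached to R0 of PE0: R1 in PE1, ..., R15 in PE15, R0 again in PE16.
print([register_on_word_line(0, pe) for pe in range(18)])
```

Within any 16 consecutive processor elements, one word line therefore touches each register number exactly once.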

The PE interface 4, serving as the external interface, is an external transfer port that transfers data between the registers R0 to R15 in the processor elements PE0 to PEn and the outside of the SIMD type microprocessor 1. The PE interface 4 consists of 16 units, PEIF(R0) to PEIF(R15), one for each register number in the processor elements, and PEIF(R0) to PEIF(R15) are connected to the registers R0 to R15 of each processor element by second data lines 7. That is, the second data line 7 output from PEIF(R0) is connected to register R0 of each processor element, the second data line 7 output from PEIF(R1) is connected to register R1 of each processor element, and so on. In short, one PEIF is provided for each of the registers.

Second word lines 8, which control the selection of the processor element that inputs or outputs data over the second data lines 7, are output from PEIF(R0) to PEIF(R15) and connected to the registers R0 to R15 of each processor element. Like the second data lines 7, the second word lines 8 output from PEIF(R0) are connected to register R0 of each processor element, the second word lines 8 output from PEIF(R1) are connected to register R1 of each processor element, and so on. That is, they are connected to registers having the same number across the plurality of processor elements.

Each register of each processor element configured as described above normally stores data corresponding to one pixel, and one pixel consists of 8 or 16 bits of data. Each register is therefore 8 or 16 bits wide. The n processor elements are configured to correspond to one line of pixel data, and the desired image processing is achieved by processing these data simultaneously.

Next, the method of transferring data to the registers R0 to R15 of the processor elements PE0 to PEn in the SIMD type microprocessor 1 configured as above is described. Conventionally, a single PEIF is used because one line of image data is transferred to the registers having the same number in each processor element. In this embodiment, by contrast, one line of image data is distributed across all the PEIFs for transfer. For example, the first pixel is transferred from PEIF(R0), the second from PEIF(R1), the third from PEIF(R2), and so on, so that each group of 16 pixels is spread over the different PEIFs. In this way 16 pixels of data can be transferred simultaneously, and the time required to transfer data to the processor elements PE0 to PEn is reduced to 1/16.

That is, each of PEIF(R0) to PEIF(R15) uses its second data line 7 and second word lines 8 to transfer data to a differently numbered register. The first pixel is thus transferred to register R0 of processor element PE0, the second pixel to register R1 of processor element PE1, the third pixel to register R2 of processor element PE2, and the 16th pixel to register R15 of processor element PE15.
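The placement of pixels can be written down as a small mapping. This is a sketch only: it assumes zero-based pixel indices, that pixel k belongs to processor element PEk, and that each batch of 16 pixels is transferred in one cycle; `Placement` and `place_pixel` are hypothetical names.

```python
from collections import namedtuple

# cycle: transfer cycle; peif: index j of PEIF(Rj); pe: target processor element;
# reg: number of the target register inside that processor element.
Placement = namedtuple("Placement", "cycle peif pe reg")


def place_pixel(k, num_regs=16):
    """Where pixel k (zero-based) of one image line is written."""
    return Placement(cycle=k // num_regs, peif=k % num_regs, pe=k, reg=k % num_regs)


for k in (0, 1, 2, 15, 16):
    print(k, place_pixel(k))

# Number of cycles for a line of 512 pixels under these assumptions.
print(place_pixel(511).cycle + 1)
```

Because PEIF(Rj) only ever drives register Rj, the PEIF index and the register number are always equal; only the selected processor element changes from cycle to cycle.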

For the image data transferred to the registers of the processor elements PE0 to PEn in this way, the global processor 2 activates one of the first word lines 6, which selects the registers connected to that first word line 6. The selected data is transferred to the ALU 3a via the first data line 5. Because the data stored in the selected registers is exactly the image data of one line in its original order, the image data transferred from the PE interface 4 can be used as operand data for the ALU 3a without any array conversion on the processor elements.
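That no array conversion is needed can be checked with a small simulation. The sketch assumes the staggered layout described in the text (pixel k stored in register k mod 16 of PEk) and a first word line labelled by the register it reaches in PE0; `load_row` and `read_via_first_word_line` are illustrative names.

```python
NUM_REGS = 16


def load_row(pixels, num_regs=NUM_REGS):
    """Store pixel k in register (k mod num_regs) of processor element k."""
    regs = [[None] * num_regs for _ in pixels]  # regs[pe][register number]
    for k, value in enumerate(pixels):
        regs[k][k % num_regs] = value           # written through PEIF(R(k mod num_regs))
    return regs


def read_via_first_word_line(regs, word_line=0, num_regs=NUM_REGS):
    """Values each ALU sees when one staggered first word line is activated."""
    return [pe_regs[(word_line + pe) % num_regs] for pe, pe_regs in enumerate(regs)]


row = list(range(100, 140))  # 40 pixels of one image line
regs = load_row(row)
print(read_via_first_word_line(regs) == row)  # the line comes back in original order
```

Activating a single first word line returns the line in pixel order, one value per ALU, even though the values sit in different register numbers in neighbouring processor elements.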

Data after arithmetic processing is transferred by the reverse procedure: it is stored in differently numbered registers across the processor elements, and then 16 data items at a time are read out simultaneously through the PE interface 4. No processing time is lost to array conversion, so the transfer time, reduced to 1/16, translates directly into shorter processing time.

The SIMD type microprocessor 1, which performs arithmetic processing on image data and the like, raises processing speed by processing image data in parallel. However, as the number of processor elements increases, data transfer comes to dominate the processing time, so improving the transfer efficiency improves the performance of the SIMD type microprocessor 1.

FIG. 2 shows the data transfer method described above in simplified form. For simplicity, it takes as an example a case with 8 processor elements and 4 registers. In the first data transfer with the PE interface 4, data is transferred over the second data lines 7 and the second word lines 8 to the registers outlined in bold in the four processor elements PE0, PE1, PE2, and PE3. In the second data transfer, data is transferred to the registers outlined in bold in the four processor elements PE4, PE5, PE6, and PE7. After that, the data in the bold-outlined registers of all processor elements is transferred together and simultaneously to the ALU 3a side over the first data lines 5 and the first word lines 6, and the arithmetic processing is executed. The transfer of data after the arithmetic processing is the same except that the order is reversed.
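The FIG. 2 example (8 processor elements, 4 registers) can be written out as a transfer schedule. The sketch below is a hedged illustration: `transfer_schedule` is a hypothetical helper, and the assumption that the second group reuses register numbers 0 to 3 (PE4/R0, PE5/R1, ...) is inferred from the wrap-around wiring rather than stated in the text.

```python
import math


def transfer_schedule(num_pes, num_regs):
    """List, per transfer, the (PE, register) pairs written in parallel.

    Assumption: PE k receives its pixel in register k % num_regs.
    """
    return [
        [(pe, pe % num_regs) for pe in range(base, min(base + num_regs, num_pes))]
        for base in range(0, num_pes, num_regs)
    ]


def transfer_count(num_pes, parallel_peifs):
    """Number of transfers needed when `parallel_peifs` PEs are filled at once."""
    return math.ceil(num_pes / parallel_peifs)


schedule = transfer_schedule(num_pes=8, num_regs=4)
proposed = transfer_count(8, 4)      # four PEIFs used simultaneously
conventional = transfer_count(8, 1)  # one PE filled per transfer
```

Under these assumptions the schedule has two steps, PE0 to PE3 and then PE4 to PE7, against eight steps when one processor element is filled at a time.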

In this embodiment, distributing data from an external memory or the like to a plurality of PEIFs requires bus connections and selection circuits such as multiplexers. However, the processor elements occupy most of the SIMD type microprocessor 1, so the effect of adding circuits at the periphery is small compared with the case where multiple PEIFs are not used.

According to this embodiment, the first word lines 6 are connected, among the processor elements PE0 to PEn, to registers whose numbers differ within every group of 16 processor elements, and the second word lines 8 are connected to registers of the same number among the processor elements PE0 to PEn. Data from the PE interface 4 can therefore be stored in registers of different numbers for every 16 processor elements and then transferred all at once to the ALU 3a of each processor element. This requires no additional word lines from the global processor 2 and does not increase the layout size. The circuit configuration on the global processor 2 side can also be identical to that of the conventional example. In short, the configuration of the present invention readily improves the transfer speed without enlarging the layout.

[Second Embodiment]
Next, a second embodiment of the present invention will be described with reference to FIG. 3. Parts identical to those of the first embodiment are given the same reference numerals, and their description is omitted. FIG. 3 is a block diagram of the SIMD type microprocessor 1 according to the second embodiment of the present invention.

In this embodiment, a first word line 6' and an OR gate 3b are added to the configuration of the first embodiment.

That is, two word lines per register are routed from the global processor 2 to select registers, and they are input to each register through the OR gate 3b. One of them is the first word line 6, which, as in the first embodiment, is connected to a register of a different number in each processor element. The other is the first word line 6', which is connected to the register of the same number in each processor element. In other words, two types of first word line are provided: one connected, among the processor elements, to registers whose numbers differ within every group of n, and one connected to registers of the same number.

The first word line 6' is used for conventional data transfer, in which data is transferred sequentially from a single PEIF to each processor element. This embodiment thus lets the conventional transfer method coexist with the transfer method of the first embodiment.

According to this embodiment, both first word lines 6 and 6' are routed to each register, so the processor can switch, depending on the application, between a mode with higher transfer speed and a mode in which different data is input and output in parallel through each PEIF. Only one word line is added per register, so the layout grows very little.
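The effect of feeding two word lines into each register through the OR gate 3b can be modeled as below. This is a minimal sketch with hypothetical names; the index arithmetic for the skewed line 6 (`(w + pe) % 16`) is an assumption carried over from the first embodiment's example, while line 6' is modeled as reaching the same register number in every processor element, as the text states.

```python
N_REGS = 16  # registers R0..R15 in each processor element


def is_selected(pe, reg, active_6, active_6_prime, n_regs=N_REGS):
    """OR gate 3b: a register is selected if either of its two word lines is active.

    active_6       -- indices of active first word lines 6 (skewed across PEs)
    active_6_prime -- indices of active first word lines 6' (same register in every PE)
    Assumption: line 6 with index w reaches register (w + pe) % n_regs in PE `pe`.
    """
    skewed_line = (reg - pe) % n_regs  # the line 6 wired to this register
    straight_line = reg                # the line 6' wired to this register
    return skewed_line in active_6 or straight_line in active_6_prime


def selected_registers(num_pes, active_6=(), active_6_prime=(), n_regs=N_REGS):
    """For each PE, list the register numbers whose OR gate output is high."""
    return [
        [r for r in range(n_regs) if is_selected(pe, r, active_6, active_6_prime, n_regs)]
        for pe in range(num_pes)
    ]


fast_mode = selected_registers(4, active_6={0})                # diagonal selection
conventional_mode = selected_registers(4, active_6_prime={3})  # same register everywhere
```

Driving a line 6 picks a different register in each processor element, while driving a line 6' picks the same register number in all of them, which is how the two transfer methods coexist on one register file.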

[Third Embodiment]
Next, a third embodiment of the present invention will be described with reference to FIGS. 4 to 6. Parts identical to those of the first and second embodiments are given the same reference numerals, and their description is omitted. FIG. 4 is a block diagram of a SIMD type microprocessor according to the third embodiment of the present invention. FIG. 5 is a configuration diagram of the WL-Selector of the SIMD type microprocessor shown in FIG. 4. FIG. 6 is an explanatory diagram of the register transfer operation in the SIMD type microprocessor shown in FIG. 4.

This embodiment differs from the first embodiment in that a WL-Selector 3c is provided between the registers R0 to R15.

The WL-Selector 3c, which serves as switching means, has one circuit per PE consisting of four transfer gates 3d1, 3d2, 3d3, and 3d4, two input terminals WLLI and WLRI, and two output terminals WLLO and WLRO. The four transfer gates are switched ON and OFF by the SEL and SELN signals supplied from the global processor 2. The first word lines 6 are connected to the input terminals WLLI and WLRI and to the output terminals WLLO and WLRO.

The WL-Selector 3c operates as follows. The signals SEL and SELN from the global processor 2 are complementary: when SEL is at the "Hi" level SELN is at the "Low" level, and when SEL is at the "Low" level SELN is at the "Hi" level. First, when SEL is "Hi" and SELN is "Low", transfer gate 3d1 is OFF, transfer gate 3d2 is ON, transfer gate 3d3 is ON, and transfer gate 3d4 is OFF. A signal entering input terminal WLLI therefore passes through transfer gate 3d2 and leaves from output terminal WLRO, and a signal entering input terminal WLRI passes through transfer gate 3d3 and leaves from output terminal WLLO. This selects the first mode, in which the n registers are not divided into register groups and a register of a different number is selected within every group of n processor elements.

Next, when SEL is "Low" and SELN is "Hi", transfer gate 3d1 is ON, transfer gate 3d2 is OFF, transfer gate 3d3 is OFF, and transfer gate 3d4 is ON. A signal entering input terminal WLLI therefore passes through transfer gate 3d1 and leaves from output terminal WLLO, and a signal entering input terminal WLRI passes through transfer gate 3d4 and leaves from output terminal WLRO. This selects the second mode, in which the n registers are divided into a plurality of register groups and, within each group, a register of a different number is selected for every set of as many processor elements as the group has registers.

Accordingly, when SEL is "Hi" and SELN is "Low", the first word lines 6 are connected so that they run across the two blocks. When SEL is "Low" and SELN is "Hi", the first word lines 6 are connected so that different registers are selected within each of the two blocks.
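The routing behaviour of one WL-Selector 3c cell can be captured as a small truth-table model. The sketch below treats word-line levels as booleans and the transfer gates as ideal switches, which is a simplification; the function name and the boolean encoding are illustrative assumptions, while the gate states and signal paths follow the description above.

```python
def wl_selector(sel, wlli, wlri):
    """Model one WL-Selector cell and return the pair (WLLO, WLRO).

    sel  -- level of SEL (True = "Hi"); SELN is always its complement
    wlli -- level on input terminal WLLI
    wlri -- level on input terminal WLRI
    """
    seln = not sel
    # Gate states from the description: 3d2 and 3d3 conduct when SEL is Hi,
    # 3d1 and 3d4 conduct when SELN is Hi.
    gate_3d1, gate_3d2, gate_3d3, gate_3d4 = seln, sel, sel, seln
    wllo = (wlli and gate_3d1) or (wlri and gate_3d3)
    wlro = (wlli and gate_3d2) or (wlri and gate_3d4)
    return wllo, wlro


# First mode (SEL = Hi): the inputs cross over, so a word line continues
# across the block boundary.
crossed = wl_selector(sel=True, wlli=True, wlri=False)

# Second mode (SEL = Low): the inputs pass straight through, so each block
# keeps its own diagonal.
straight = wl_selector(sel=False, wlli=True, wlri=False)
```

Because SEL and SELN are complementary, exactly one gate drives each output at any time, so the two inputs never contend for the same output terminal in this model.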

In this embodiment, the WL-Selector 3c divides the registers R0 to R15 into a plurality of blocks so that the data transfer format can be changed block by block. In the example of FIG. 4, the 16 registers are split into two blocks of 2 and 14 registers. The number of registers in a block may be chosen freely, and the registers may of course be divided into three or more blocks.

FIG. 6 shows how the selection made by the first word lines 6 is switched in this embodiment. For simplicity, FIG. 6 takes a case in which four registers are divided into blocks of 2 + 2 and shows, as examples, the first word lines 6 that start at registers R0 and R2 of processor element PE0. The registers at the positions an arrow passes through in the figure are selected at the same time.

Although this embodiment divides the registers of the first embodiment, the same division may be applied to the second embodiment. In that case, for example, the block of two registers can transfer data at the bandwidth obtained by using a plurality of PEIFs simultaneously, while the other block transfers data to each register from the same PEIF.

According to this embodiment, the WL-Selector 3c for dividing the registers R0 to R15 makes it possible to divide the registers and improve data transfer efficiency region by region. Improved data transfer efficiency and parallelism of data transfer can therefore be achieved in good balance.

[Fourth Embodiment]
Next, a fourth embodiment of the present invention will be described with reference to FIG. 7. Parts identical to those of the first to third embodiments are given the same reference numerals, and their description is omitted. FIG. 7 is a block diagram of the SIMD type microprocessor 1 according to the fourth embodiment of the present invention.

In the first to third embodiments, the PE interface 4 existed independently for each register, but in this embodiment the PE interfaces 4 are controlled simultaneously and in synchronization. To this end, the second word lines 8 output from the PEIF-Dec 4a, the block of the PE interface 4 that generates and outputs the second word lines 8, are connected to a register of a different number in each processor element, in the same way as the first word lines. For example, the second word line 8 connected to register R0 of processor element PE0 is connected to register R1 in processor element PE1 and to register R2 in processor element PE2, shifting by one register each time. That is, both the first word lines 6 and the second word lines 8 are connected, among the processor elements, to registers whose numbers differ within every group of n.

This achieves transfer to registers of different numbers in different processor elements, just as when the PE interface 4 is independent for each register.

In this embodiment as well, the conventional word line connection described in the second embodiment and the register block division described in the third embodiment may be applied.

According to this embodiment, both the first word lines and the second word lines are connected to a register of a different number in each processor element. When data is transferred to the registers simultaneously from the PE interface 4 under a common address signal, it can therefore be transferred directly to the registers used for arithmetic processing. A register configuration with improved data transfer efficiency is thus possible without adding address signals.

The present invention is not limited to the embodiments described above; it can be practiced with various modifications without departing from the gist of the invention.

FIG. 1 is a block diagram of a SIMD type microprocessor according to the first embodiment of the present invention.
FIG. 2 is an explanatory diagram of the register transfer operation in the SIMD type microprocessor shown in FIG. 1.
FIG. 3 is a block diagram of the SIMD type microprocessor 1 according to the second embodiment of the present invention.
FIG. 4 is a block diagram of a SIMD type microprocessor according to the third embodiment of the present invention.
FIG. 5 is a configuration diagram of the WL-Selector of the SIMD type microprocessor shown in FIG. 4.
FIG. 6 is an explanatory diagram of the register transfer operation in the SIMD type microprocessor shown in FIG. 4.
FIG. 7 is a block diagram of the SIMD type microprocessor 1 according to the fourth embodiment of the present invention.

Explanation of symbols

1 SIMD type microprocessor
2 Global processor (control unit)
3a ALU (arithmetic unit)
3c WL-Selector (switching means)
4 PE interface (external interface)
5 First data line
6 First word line
6' First word line
7 Second data line
8 Second word line
PE0 to PEn Processor elements
R0 to R15 Registers

Claims

A SIMD type microprocessor comprising: a plurality of processor elements, each having an arithmetic unit and n registers (n being a natural number of 2 or more) to which unique numbers are assigned; a control unit that controls the operation of the plurality of processor elements; an external interface that controls communication between the n registers and the outside; a first data line to which each of the n registers is connected; a first word line with which the control unit selects a register that communicates with the arithmetic unit via the first data line; a second data line that connects the registers having the same number among the plurality of processor elements to the external interface; and a second word line with which the external interface selects a processor element that communicates via the second data line, wherein
the external interface is provided in correspondence with each of the register numbers,
among n processor elements, the first word line is connected to the registers across the plurality of processor elements such that the same first word line is connected to registers of different numbers, and
the second word line is connected to registers of the same number among the plurality of processor elements.

A SIMD type microprocessor comprising: a plurality of processor elements, each having an arithmetic unit and n registers (n being a natural number of 2 or more) to which unique numbers are assigned; a control unit that controls the operation of the plurality of processor elements; an external interface that controls communication between the n registers and the outside; a first data line to which each of the n registers is connected; a first word line with which the control unit selects a register that communicates with the arithmetic unit via the first data line; a second data line that connects the registers having the same number among the plurality of processor elements to the external interface; and a second word line with which the external interface selects a processor element that communicates via the second data line, wherein
among n processor elements, the first word line and the second word line are connected to the registers across the plurality of processor elements such that the same first word line and the same second word line are each connected to registers of different numbers.

The SIMD type microprocessor according to claim 1, further comprising another type of first word line with which the control unit selects a register that communicates with the arithmetic unit via the first data line, wherein
the same first word line of the other type is connected to the register having the same number in each processor element.

The SIMD type microprocessor according to any one of claims 1 to 3, further comprising switching means that selects at least one of a first mode and a second mode and has a function of changing the connection relation of at least the first word line, wherein
in the first mode, the switching means maintains, for at least the first word line, the state in which, among n processor elements, the first word line is connected to the registers across the plurality of processor elements such that the same first word line is connected to registers of different numbers, and
in the second mode, the n registers in each processor element are divided into a plurality of register groups and, for at least the first word line in each register group, the switching means maintains the state in which, among as many processor elements as there are registers in that register group of one processor element, the first word line is connected to the registers across the plurality of processor elements such that the same first word line is connected to registers of different numbers.