JP2005267615A - Parallel arithmetic processor

Info

Publication number: JP2005267615A
Application number: JP2005025558A
Authority: JP
Inventors: Takeshi Tanaka; 健田中; Hideshi Nishida; 英志西田; Masashi Hoshino; 将史星野; Takashi Furuta; 岳志古田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2004-02-16
Filing date: 2005-02-01
Publication date: 2005-09-29
Anticipated expiration: 2025-02-01
Also published as: JP4698242B2

Abstract

PROBLEM TO BE SOLVED: To attain high performance while reducing code size when programming processing for a SIMD processor.

SOLUTION: A processor with a plurality of processing elements comprises a decoder for decoding instructions. Each processing element comprises a transfer pattern storage part storing a value that indicates a transfer pattern, that is, from which processing element data is transferred to that processing element; a transfer part for transferring data from/to the processing element determined by the transfer pattern; and an update part for updating the value in the transfer pattern storage part according to the result of decoding a preceding instruction by the decoder.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a parallel arithmetic processor suited to processing large volumes of data such as image data, audio data, and communication data.

In recent years, SIMD (Single Instruction stream, Multiple Data streams) processors have been widely used in the fields of image processing, audio processing, and communication processing. A SIMD processor has a plurality of processing elements and causes them to perform operations in parallel. For such a processor to handle image processing and other applications efficiently, data must be moved efficiently between processing elements. Moreover, to move data between arbitrary processing elements, the method of setting the "network pattern value" is important. The "network pattern value" is the processing-element-to-processing-element transfer pattern that specifies, for each processing element, which processing element its data is transferred from. Some conventional SIMD processors do not move data between arbitrary processing elements; by restricting data movement to, for example, adjacent processing elements only, they avoid setting a network pattern value altogether (see, for example, Patent Document 1).

Patent documents: JP 2001-84229 A; JP 2003-337696 A

However, if data cannot be moved between arbitrary processing elements, data movement is constrained; this narrows the range of applications of the SIMD processor and prevents it from delivering sufficient performance across diverse uses. Furthermore, even when data can be moved between arbitrary processing elements, moving data freely every cycle requires the network pattern value to be specified as an immediate value in the instruction operand. Suppose the SIMD processor has 16 processing elements and the network pattern value is expressed using the 4-bit processing element number assigned to each one. An immediate of 8 bytes (= 4 bits × 16) must then be specified in the instruction operand. Because a transfer instruction carrying an 8-byte immediate must be issued to the SIMD processor every time the transfer pattern changes, the code size of programs for conventional SIMD processors was inevitably large. As a result, a larger memory was needed to store the program.
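The 8-byte figure follows from simple arithmetic. A minimal sketch, in which the helper name `pattern_immediate_bytes` is hypothetical and chosen only for illustration:

```python
def pattern_immediate_bytes(num_elements: int) -> int:
    """Bytes needed to name one source element for each of num_elements elements."""
    # Bits needed to encode an element number in 0..num_elements-1 (4 bits for 16).
    bits_per_entry = max(1, (num_elements - 1).bit_length())
    total_bits = bits_per_entry * num_elements
    # Round up to whole bytes.
    return (total_bits + 7) // 8


if __name__ == "__main__":
    # 16 entries x 4 bits = 64 bits = 8 bytes
    print(pattern_immediate_bytes(16))
```

Every change of transfer pattern would cost this many operand bytes if the pattern were always given as an immediate.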

An object of the present invention is to provide a processor that achieves high performance while reducing the code size of programs written for a SIMD processor.

To solve the above problem, the present invention is a processor having a plurality of processing elements and a decoder that decodes instructions, wherein each processing element includes: transfer pattern storage means storing a value that indicates a transfer pattern, that is, from which processing element data is to be transferred to that processing element; transfer means for executing data transfer with the processing element determined based on the transfer pattern; and update means for updating the value in the transfer pattern storage means according to the result of decoding the immediately preceding instruction by the decoder.

With the above configuration, the present invention updates the value representing the transfer pattern in accordance with an instruction. There is therefore no need to supply a value indicating the transfer pattern as an immediate in the instruction operand every time data is moved, which reduces the code size of programs for the SIMD processor. In addition, because transfers between processing elements are executed based on the transfer pattern, data can be moved between arbitrary processing elements.

Here, each processing element may further include a register set consisting of a plurality of registers, and the data may be the value stored in each register of the register set.

Here, the register set may output the data stored in one of its registers based on a predetermined offset signal, the update by the update means may include an arithmetic operation on the value indicating the transfer pattern, and the offset signal may change based on a carry or borrow produced by that operation.

Thus, for example, when the register set consists of 16 registers r0 to r15 and horizontally adjacent pixel data are stored in r0 and r1, pixels that are stored wrapped across registers can be transferred efficiently.

Here, the update means may include arithmetic operation means and saturation operation means. The saturation operation means determines whether the result of the arithmetic operation by the arithmetic operation means falls outside a predetermined range and, if so, performs a saturation operation on the value indicating the transfer pattern. The value indicating the transfer pattern as updated by the update means is then either the result of the arithmetic operation or the result of the saturation operation.

Here, the update means may include output means that, during the saturation operation, outputs a first value as the saturation value when the value indicating the transfer pattern is above the predetermined range, and a second value as the saturation value when it is below the predetermined range.

Here, the predetermined range may be the range of processing element numbers, the first value may be the maximum processing element number, and the second value may be the minimum processing element number.
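The saturating update can be sketched as follows. This is a minimal illustration, assuming 16 source positions numbered 0 to 15 and an identity starting pattern; the function names are hypothetical, and the actual circuit is not specified at this point in the text.

```python
def saturate(value: int, minimum: int, maximum: int) -> int:
    """Clamp value into [minimum, maximum].

    maximum plays the role of the "first value" and minimum the "second value".
    """
    if value > maximum:
        return maximum
    if value < minimum:
        return minimum
    return value


def update_pattern_saturating(pattern, delta, minimum=0, maximum=15):
    """Add delta to every transfer pattern value, saturating out-of-range results."""
    return [saturate(p + delta, minimum, maximum) for p in pattern]


if __name__ == "__main__":
    identity = list(range(16))  # assumed start: element i reads from element i
    # A shift of -2 clamps the two leftmost entries to element 0, so the edge
    # sample is repeated instead of reading from a nonexistent element.
    print(update_pattern_saturating(identity, -2))
```

Repeating the edge sample in this way is one common treatment of picture borders in FIR filtering.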

This allows FIR (Finite Impulse Response) filter operations to be performed appropriately at the edges of the screen.

Here, the saturation operation means may further accept a predetermined maximum value and minimum value as inputs for the saturation operation; the predetermined range is then the range defined by the input maximum and minimum, the first value is the predetermined maximum value, and the second value is the predetermined minimum value.

This allows FIR filter processing and the like to be applied to images of various sizes.

Here, the saturation operation means may further accept a predetermined maximum value and minimum value as inputs for the saturation operation; the predetermined range is then the range defined by the input maximum and minimum, the first value is a value obtained from a first equation, and the second value is a value obtained from a second equation.

This allows data to be folded back at the edge of the screen, enabling still higher image quality in FIR filter operations and the like.

Here, each register of the register set may store two byte-sized data items. The saturation operation means determines, for the upper byte and the lower byte separately, whether the result of the arithmetic operation by the arithmetic operation means falls outside the predetermined range and, if so, performs the saturation operation on the upper byte data and on the lower byte data simultaneously, and the output means outputs two mutually different saturation values.

Thus, when the upper byte data and the lower byte data exceed the maximum value, they are rounded to mutually different values. As a result, even when, for example, the red color difference and the blue color difference are stored in the upper and lower halves of a register in one processing element, separate values can be distributed to the respective processing elements.

Here, the update means may include arithmetic operation means and modulo operation means for performing modulo arithmetic. The modulo operation means determines whether the result of the arithmetic operation by the arithmetic operation means falls outside the range of values indicating the transfer pattern and, if so, performs a modulo operation on the value indicating the transfer pattern. The value indicating the transfer pattern as updated by the update means is then either the result of the arithmetic operation or the result of the modulo operation.
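The modulo variant of the update can be sketched in the same style. This is an illustration only, assuming a valid range of 0 to 15 and hypothetical function names; wrapping an in-range value leaves it unchanged, so applying the wrap unconditionally matches the "only when out of range" behavior described above.

```python
def wrap(value: int, minimum: int, maximum: int) -> int:
    """Wrap value into [minimum, maximum] using modulo arithmetic."""
    span = maximum - minimum + 1
    # Python's % yields a non-negative remainder for a positive span,
    # so values below the minimum wrap around to the top of the range.
    return (value - minimum) % span + minimum


def update_pattern_modulo(pattern, delta, minimum=0, maximum=15):
    """Add delta to every transfer pattern value, wrapping out-of-range results."""
    return [wrap(p + delta, minimum, maximum) for p in pattern]


if __name__ == "__main__":
    identity = list(range(16))  # assumed start: element i reads from element i
    # A shift of +2 turns the identity pattern into a rotation by two positions.
    print(update_pattern_modulo(identity, 2))
```

With an identity starting pattern, repeated modulo updates rotate the data around the processing elements without any new immediate being issued.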

Here, each processing element may further include a first changing unit that changes a read offset value by performing modulo arithmetic on it, and a second changing unit that changes a write offset value by performing modulo arithmetic on it. The register set reads the data stored in one of its registers based on the read offset value and writes data to one of its registers based on the write offset value.

This allows matrix transposition to be realized without immediate operands.

Embodiments of the parallel arithmetic processor according to the present invention are described below.
(First embodiment)
FIG. 1 is an overall configuration diagram of the parallel arithmetic processor according to the present invention. As shown in FIG. 1, the parallel arithmetic processor includes an instruction memory 10, an instruction decoder 11, an overall control unit 12, a data memory 13, and a processing element group 14.

FIG. 2 shows the internal configuration of processing element PE0, one of the processing elements in the processing element group 14. Because the other processing elements have the same configuration, only PE0 is described. As shown in FIG. 2, processing element PE0 includes a register number conversion unit 20, a register file 15, an OR circuit 16, networks 30a and 30b, each of which is an interconnection network connecting the processing elements, and an arithmetic unit 40.

The parallel arithmetic processor of FIG. 1 has a SIMD configuration: it reads a program from the instruction memory 10, decodes instructions in the instruction decoder 11, and generates from the decoded result the control signals that control all processing elements. Operations that affect all processing elements in common, such as branches, are handled by the overall control unit 12. The network 30, which connects the processing elements and moves data between them, is provided as two systems, network 30a and network 30b, allowing data to be moved among the processing elements of the processing element group 14.

Processing element PE0 reads from or writes to the register file 15 using either the register number given in the operand or the register number after conversion by the register number conversion unit 20. Thus, although the parallel arithmetic processor of the present invention has a SIMD configuration, each processing element can access a different register.

Each component of processing element PE0 is described in detail below.

The register number conversion unit 20 converts register numbers as necessary.

The register file 15 outputs the data read from the designated register. It consists of 16 registers, r0 to r15, each 16 bits wide.

The OR circuit 16 receives the signals sent from the network 30, performs a logical OR on them, and outputs a register offset selection signal to the register number conversion unit 20.

The network 30 performs an operation on the data sent from the register file 15 and sends the result to the register file 15.

The arithmetic unit 40 operates on data from the register file 15 or the network 30 and sends the result to the register file 15 or the network 30. FIG. 3 shows an example of the configuration of the arithmetic unit 40. As shown in FIG. 3, the arithmetic unit 40 includes an arithmetic logic unit (ALU) 41a, an arithmetic logic unit 41b, a barrel shifter 42, and a multiplier 43.

The arithmetic logic units 41a and 41b perform addition, subtraction, AND/OR operations, and the like.

The barrel shifter 42 performs shifts.

The multiplier 43 performs multiplication, division, and the like.

The arithmetic unit 40 may be configured so that its component units can exchange data with one another. Its configuration can be chosen freely to suit the application.

Next, the configuration of the register number conversion unit 20 is described. FIG. 4 shows its configuration. As shown in FIG. 4, the register number conversion unit 20 includes a register offset value holding unit 21, an adder/subtractor 22, a modulo arithmetic unit 23, an adder/subtractor 24, and a selector 25. Here, the register offset value is the difference between the register number written in the program and input from the instruction decoder 11 and the register number that each processing element actually reads or writes.

The register offset value holding unit 21 is a register that holds the register offset value and can be rewritten by the program.

When required, the adder/subtractor 22 adds or subtracts the register offset change amount sent from the instruction decoder 11 to or from the value held in the register offset value holding unit 21 and sends the result to the modulo arithmetic unit 23.

The modulo arithmetic unit 23 performs a modulo operation on the result sent from the adder/subtractor 22 using the register offset modulo value sent from the instruction decoder 11, and sends the result to the register offset value holding unit 21. This makes it possible to update the register offset value. When the register offset value is not being updated, the adder/subtractor 22 and the modulo arithmetic unit 23 do not operate.
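The offset update path (add or subtract a change amount, then reduce by the modulo value) can be sketched as below. The function name is hypothetical, and the non-negative remainder returned for negative intermediate results is an assumption of this sketch; the text does not say how the hardware treats them.

```python
def update_register_offset(offset: int, delta: int, modulo_value: int) -> int:
    """Model the adder/subtractor stage followed by the modulo stage.

    delta is positive for an addition and negative for a subtraction.
    """
    if modulo_value <= 0:
        raise ValueError("modulo_value must be positive")
    # Assumption: the stored offset is always the non-negative remainder.
    return (offset + delta) % modulo_value


if __name__ == "__main__":
    offset = 0
    for _ in range(5):
        # With a modulo value of 4 the offset cycles through 1, 2, 3, 0, 1.
        offset = update_register_offset(offset, 1, 4)
        print(offset)
```

Because the modulo value comes from the instruction, the same mechanism can cycle through register groups of different sizes.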

The adder/subtractor 24 adds or subtracts the register number sent from the instruction decoder 11 and the value held in the register offset value holding unit 21. If the result falls outside the range 0 to 15, only the lower 4 bits are output, so a valid register number is always produced.

The selector 25 outputs either the register number written in the program or the output of the adder/subtractor 24, according to the register offset selection signal: "0" selects the register number written in the program, and "1" selects the output of the adder/subtractor 24. In this embodiment, the register offset selection signal is used only for the register number to be read; for the register number to be written, the register offset selection signal is always "1".
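The conversion performed by the adder/subtractor 24 and the selector 25 can be sketched as follows. The function and parameter names are hypothetical; a negative offset stands in for the subtract case.

```python
def convert_register_number(program_reg: int, offset: int, offset_select: int) -> int:
    """Return the register number actually accessed by one processing element."""
    if offset_select == 0:
        # Selector 25 passes the register number written in the program.
        return program_reg
    # Adder/subtractor 24: keep only the lower 4 bits so the result is
    # always a valid register number in r0..r15.
    return (program_reg + offset) & 0xF


if __name__ == "__main__":
    # The same instruction operand (r14) can resolve to different registers
    # in different processing elements, depending on each element's state.
    print(convert_register_number(14, 3, 0))
    print(convert_register_number(14, 3, 1))
```

This is what lets a SIMD instruction with a single register operand touch a different register in each processing element.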

One register number conversion unit 20 can be provided for each register that is read or written, but the number may be limited as necessary. In this embodiment, because registers are accessed in byte units and used for the inputs and outputs of the two network systems, each processing element has at least eight register number conversion units 20.

Next, the configuration of the network 30 is described. Because networks 30a and 30b have the same configuration, only network 30a is described. FIG. 5 shows the configuration of network 30a, which connects the processing elements. As shown in FIG. 5, network 30a includes input terminals 31, input terminals 32, and select processing units 50. For explanation, the components corresponding to the upper (MSB) 8 bits of a processing element are denoted with the suffix H, as in PE0H, and those corresponding to the lower (LSB) 8 bits with the suffix L, as in PE0L. The input terminals are numbered in hexadecimal, starting from PE0H: 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f.

The input terminal 31 takes in an input signal: it brings the upper (MSB) 8 bits of the data read from the register file 15 into network 30a. The data taken in is sent to the select processing units of all processing elements.

The input terminal 32 takes in an input signal: it brings the lower (LSB) 8 bits of the data read from the register file 15 into network 30a. The data taken in is sent to the select processing units of all processing elements.

Although FIG. 5 omits some connections, the data input from the input terminal 31 of PE0H is sent to all select processing units, from PE0H and PE0L through PE7H and PE7L.

The select processing unit 50 selects and outputs one of the 16 input data items. The outputs of the select processing units of PE0H and PE0L are then combined and output as a 16-bit value.

FIG. 6 shows part of network 30a with the internal configuration of the select processing units 50 of FIG. 5 drawn in detail. As shown, network 30a consists of a plurality of identically configured select processing units, one of which is shown in FIG. 7.

FIG. 7 shows the configuration of a select processing unit 50 of network 30a. As shown in FIG. 7, the select processing unit 50 includes a 16-to-1 selector 51, a select signal conversion unit 60, a selector 52, a network pattern register 53, and an adder/subtractor 54.

The 16-to-1 selector 51 selects one of the input data items according to the select signal sent from the select signal conversion unit 60 and outputs it. The select signal takes one of the values 0x00 to 0x0f assigned to the input terminals of network 30a, and the 16-to-1 selector 51 outputs the input data whose number equals the value of this control signal.
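The byte-level routing described so far (sixteen 8-bit inputs, one 16-to-1 selection per output byte, H and L outputs recombined per processing element) can be modelled as below. This is a behavioral sketch only; the function name `route` and the list-based interface are assumptions, not part of the described hardware.

```python
def route(words, selects):
    """Route bytes between 8 processing elements.

    words:   eight 16-bit values, one per processing element PE0..PE7
    selects: sixteen select signals (0x00..0x0f), ordered PE0H, PE0L, ..., PE7L
    Returns the eight 16-bit values seen at the network outputs.
    """
    if len(words) != 8 or len(selects) != 16:
        raise ValueError("expected 8 words and 16 select signals")
    # Input terminals in numbering order: PE0H=0x00, PE0L=0x01, ..., PE7L=0x0f.
    lanes = []
    for w in words:
        lanes.append((w >> 8) & 0xFF)  # upper (MSB) 8 bits
        lanes.append(w & 0xFF)         # lower (LSB) 8 bits
    # Each select processing unit picks one of the 16 input bytes.
    picked = [lanes[s & 0xF] for s in selects]
    # Recombine the H and L outputs of each processing element into 16 bits.
    return [(picked[2 * k] << 8) | picked[2 * k + 1] for k in range(8)]


if __name__ == "__main__":
    words = [0x1011, 0x2021, 0x3031, 0x4041, 0x5051, 0x6061, 0x7071, 0x8081]
    shift_by_one_pe = [(i + 2) % 16 for i in range(16)]
    # Every element receives the word held by its right-hand neighbor.
    print([hex(w) for w in route(words, shift_by_one_pe)])
```

Because every output byte has its own selector, any byte can reach any position, which is the "arbitrary processing element" movement the background section calls for.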

The select signal conversion unit 60 generates, from the input network pattern value, the control signal for the 16-to-1 selector 51 and the register offset selection signal. It sends the control signal to the 16-to-1 selector 51 and the register offset selection signal to the register number conversion unit 20.

The selector 52 selects the network pattern value to be input to the select signal conversion unit 60. According to the instruction, it selects one of: an immediate value written in the program, the register with the designated register number, or the value held in the network pattern register 53.

The network pattern register 53 holds the network pattern value used to control the 16-to-1 selector 51. Here, the network pattern value represents the value of the control signal for the 16-to-1 selector 51.

When required, the adder/subtractor 54 adds or subtracts the externally supplied network pattern value change amount to or from the value held in the network pattern register 53 and stores the result back in the network pattern register 53. This updates the network pattern value. When the network pattern value is not being updated, the adder/subtractor 54 does not operate.
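The update performed by the adder/subtractor 54 can be sketched as a single change amount applied to every network pattern register. The sketch assumes one register per select processing unit (16 in total) and models register contents as unbounded Python integers, since the register width is not stated here; the function name is hypothetical.

```python
def apply_change_amount(pattern_registers, change_amount):
    """Return the register contents after one update by change_amount.

    A positive change_amount models an addition, a negative one a subtraction.
    """
    return [value + change_amount for value in pattern_registers]


if __name__ == "__main__":
    registers = list(range(0x00, 0x10))  # assumed initial values 0x00..0x0f
    # The instruction carries only the small change amount, not 16 new values.
    registers = apply_change_amount(registers, 2)
    print([hex(v) for v in registers])
```

The saving in code size comes from the operand: one small change amount replaces a full set of per-element source numbers.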

As shown in FIGS. 5, 6, and 7, each select processing unit has its own network pattern register, so there are 16 network pattern registers in total. When a network pattern value change amount is specified in an instruction operand, the network pattern values held in all network pattern registers are updated simultaneously.

Next, the configuration of the select signal conversion unit 60 is described. FIG. 8 shows its configuration. As shown in FIG. 8, the select signal conversion unit 60 includes a register offset selection calculation unit 61, a modulo calculation unit 62, a saturation calculation unit 63, a selector A 64, a selector B 65, and a demultiplexer 66.

Here, the PE mode signal in the figure is sent from the instruction decoder 11 and indicates either 8PE mode or 4PE mode; the register offset selection calculation unit 61, the modulo calculation unit 62, and the saturation calculation unit 63 each change their behavior according to this mode. The operation mode signal is also sent from the instruction decoder 11 and indicates one of the register offset selection, modulo, or saturation modes; according to it, selector A 64 and selector B 65 select the inputs to the 16-to-1 selector 51 and the demultiplexer 66.

On receiving a network pattern value, the register offset selection calculation unit 61 performs the register offset selection calculation and outputs a select signal and a register offset selection signal. In 8PE mode, the lower 4 bits of the network pattern value become the select signal and the fifth bit becomes the register offset selection signal. In 4PE mode, the lower 3 bits of the network pattern value become the lower 3 bits of the select signal, and the fourth bit of the select signal is set to "0" in PE0 to PE3 and to "1" in PE4 to PE7. In PE4 to PE7 the select signal is therefore the lower 3 bits of the network pattern value plus 8. The fourth bit of the network pattern value becomes the register offset selection signal. In other words, in 4PE mode data moves only within PE0 to PE3 and within PE4 to PE7.
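The bit slicing described for the two PE modes can be written out as below. The function name, the string mode labels, and the `pe_index` parameter are assumptions made for this sketch.

```python
def register_offset_select(pattern: int, pe_index: int, mode: str):
    """Return (select_signal, register_offset_selection_signal).

    pattern:  network pattern value
    pe_index: number of the processing element (0..7) owning the select unit
    mode:     "8PE" or "4PE"
    """
    if mode == "8PE":
        select = pattern & 0xF              # lower 4 bits
        offset_select = (pattern >> 4) & 1  # fifth bit
    elif mode == "4PE":
        group_bit = 0 if pe_index < 4 else 1          # PE0-3 -> 0, PE4-7 -> 1
        select = (pattern & 0x7) | (group_bit << 3)   # lower 3 bits, plus 8 in PE4-7
        offset_select = (pattern >> 3) & 1            # fourth bit
    else:
        raise ValueError("mode must be '8PE' or '4PE'")
    return select, offset_select


if __name__ == "__main__":
    print(register_offset_select(0x10, 0, "8PE"))
    print(register_offset_select(0x05, 5, "4PE"))
```

In 4PE mode the group bit is fixed by the element's own position, so a select signal can never point outside the element's group of four.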

Next, the specific operation of the register offset selection calculation unit 61 is described. FIGS. 9(a) to 9(c) outline the operation when the PE mode signal indicates 8PE mode. In the figures, PE0 to PE7 are the processing element numbers, and H and L stand for High and Low, indicating that each processing element stores pixel values in half-word units. That is, each label identifies which processing element, and which of its H and L halves, a value belongs to. In each of FIGS. 9(a), 9(b), and 9(c), the first row shows the network pattern values, the second row the select signal output, and the third row the register offset selection signal output. The upper hexadecimal digit of a network pattern value indicates whether the register offset value is used: if it is 0, the register offset value is not used. The register offset selection signal output is 0 when the register offset value is not used and 1 when it is used. The lower hexadecimal digit specifies which processing element, and which of its H and L halves, the data is taken from. In 8PE mode, the select signal output takes a value from 0x00 to 0x0f.

For example, when the network pattern value is 0x00, the select signal output is 0x00 and data is taken from PE0H; the register offset is not used, so the register offset selection signal output is 0. Similarly, when the network pattern value is 0x01, the select signal output is 0x01 and data is taken from PE0L; the register offset is again not used, so the register offset selection signal output is 0. When the network pattern value is 0x10, the select signal output is 0x00 and data is taken from PE0H, and the register offset selection signal output is 1. When the network pattern value is 0xff, the select signal output is 0x0f and data is taken from PE7L, and the register offset selection signal output is 1.

In FIG. 9(a), all network pattern values are in the range 0x00 to 0x0f, so when the register offset selection calculation unit 61 takes the lower 4 bits as the select signal, the select signal equals the network pattern value and every register offset selection signal is “0”. In FIG. 9(b), 2 has been added to each network pattern value of FIG. 9(a). Values in the range 0x00 to 0x0f are handled as in FIG. 9(a); for values larger than 0x0f, the register offset selection calculation unit 61 takes the lower 4 bits as the select signal and the register offset selection signal becomes “1”. In FIG. 9(c), 2 has been subtracted from each network pattern value of FIG. 9(a). Values in the range 0x00 to 0x0f are handled as in FIG. 9(a); for values smaller than 0x00, the register offset selection calculation unit 61 takes the lower 4 bits as the select signal and the register offset selection signal becomes “1”.
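The 8PE-mode rule of the register offset selection calculation can be sketched in a few lines of Python. This is only an illustrative model, not the circuit of the embodiment: the function names `offset_select_8pe` and `half_name` are hypothetical, and the sketch assumes the network pattern value is an 8-bit two's-complement quantity so that values "smaller than 0x00" wrap to 0xfe, 0xff, and so on.

```python
def offset_select_8pe(pattern):
    """Return (select, offset_select) for one network pattern value in 8PE mode."""
    pattern &= 0xFF                      # assumption: 8-bit two's-complement value
    select = pattern & 0x0F              # lower 4 bits choose one of PE0H..PE7L
    offset_select = (pattern >> 4) & 1   # fifth bit: use the register offset or not
    return select, offset_select


def half_name(select):
    """Map a 4-bit select value to the half-word it designates (0 -> PE0H, 1 -> PE0L, ...)."""
    return f"PE{select >> 1}{'HL'[select & 1]}"


for value in (0x00, 0x01, 0x10, 0xFF, -2):
    sel, ofs = offset_select_8pe(value)
    print(hex(value & 0xFF), half_name(sel), ofs)
```

Under these assumptions a value of 0x0f + 2 = 0x11 selects PE0L with the register offset enabled, which is the wrap-around behaviour described for FIG. 9(b).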

FIGS. 10(a) to 10(c) show an outline of the operation when the PE mode signal indicates the 4PE mode. In the figures, PE0 to PE7 are the processing element numbers, and H and L indicate that each processing element stores pixel values in half-word units. In each of FIGS. 10(a), 10(b), and 10(c), the first row shows the network pattern value, the second row shows the select signal output, and the third row shows the register offset selection signal output. Unlike the 8PE mode, when the network pattern value is in the range 0x00 to 0x07, the register offset value is not used and the register offset selection signal output is 0; when the network pattern value is smaller than 0x00 or larger than 0x07, the register offset value is used and the register offset selection signal output is 1. The select signal output takes a value from 0x00 to 0x07 in PE0 to PE3 and a value from 0x08 to 0x0f in PE4 to PE7. For example, in PE0 to PE3, when the network pattern value is 0x08, the select signal output is 0x00 and data is fetched from PE0H, and the register offset selection signal output is 1.

In FIG. 10(a), all network pattern values are in the range 0x00 to 0x07, so when the register offset selection calculation unit 61 takes the lower 3 bits as the select signal, in PE0 to PE3 the select signal equals the network pattern value and every register offset selection signal is “0”. In PE4 to PE7 the select signal is the lower 3 bits of the network pattern value plus 8, and every register offset selection signal is “0”. In FIG. 10(b), 1 has been added to each network pattern value of FIG. 10(a). Values in the range 0x00 to 0x07 are handled as in FIG. 10(a). In PE0 to PE3, when the network pattern value is larger than 0x07, the register offset selection calculation unit 61 takes the lower 3 bits as the select signal and sets the fourth bit to “0”; the register offset selection signal becomes “1”. In PE4 to PE7, when the network pattern value is larger than 0x07, the unit takes the lower 3 bits as the select signal and sets the fourth bit to “1”; the register offset selection signal becomes “1”. In FIG. 10(c), 2 has been subtracted from each network pattern value of FIG. 10(a). Values in the range 0x00 to 0x07 are handled as in FIG. 10(a). In PE0 to PE3, when the network pattern value is smaller than 0x00, the unit takes the lower 3 bits as the select signal and sets the fourth bit to “0”; the register offset selection signal becomes “1”. In PE4 to PE7, when the network pattern value is smaller than 0x00, the unit takes the lower 3 bits as the select signal and sets the fourth bit to “1”; the register offset selection signal becomes “1”.
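The 4PE-mode rule differs from the 8PE-mode rule only in where the bits are taken from. The following Python sketch illustrates it under the same caveats as any software model of a circuit: the name `offset_select_4pe` and the integer `pe` argument are hypothetical, and the network pattern value is assumed to be an 8-bit two's-complement quantity.

```python
def offset_select_4pe(pattern, pe):
    """Return (select, offset_select) in 4PE mode for processing element number pe (0..7)."""
    pattern &= 0xFF                        # assumption: 8-bit two's-complement value
    group_bit = 0x8 if pe >= 4 else 0x0    # fourth select bit: "0" in PE0-3, "1" in PE4-7
    select = group_bit | (pattern & 0x07)  # lower 3 bits come from the pattern value
    offset_select = (pattern >> 3) & 1     # fourth bit of the pattern value
    return select, offset_select


# Pattern 0x08 wraps to the first half-word of the group, with the offset enabled.
print(offset_select_4pe(0x08, pe=0))   # group PE0-3
print(offset_select_4pe(0x08, pe=5))   # group PE4-7
```

Because the group bit is fixed by the processing element number rather than by the pattern value, no pattern value can make data cross between the PE0 to PE3 group and the PE4 to PE7 group.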

Upon receiving a network pattern value, the modulo calculation unit 62 performs a modulo calculation and outputs a select signal and a register offset selection signal. When the PE mode signal indicates the 8PE mode, the lower 4 bits are always used as the select signal, regardless of the network pattern value, and the register offset selection signal is always “1”. When the PE mode signal indicates the 4PE mode, the lower 3 bits are always used as the lower 3 bits of the select signal, regardless of the network pattern value, and the fourth bit of the select signal is set to “0” in PE0 to PE3 and to “1” in PE4 to PE7. The register offset selection signal is always “1”.

Next, a specific operation of the modulo calculation unit 62 will be described. FIGS. 11(a) to 11(c) show an outline of the operation when the PE mode signal indicates the 8PE mode. In the figures, PE0 to PE7 are the processing element numbers, and H and L indicate that each processing element stores pixel values in half-word units. In each of FIGS. 11(a), 11(b), and 11(c), the first row shows the network pattern value, the second row shows the select signal output, and the third row shows the register offset selection signal output. In all three figures, the lower 4 bits always become the select signal regardless of the network pattern value, and the register offset selection signal is always “1”.

FIGS. 12(a) to 12(c) show an outline of the operation when the PE mode signal indicates the 4PE mode. In the figures, PE0 to PE7 are the processing element numbers, and H and L indicate that each processing element stores pixel values in half-word units. In each of FIGS. 12(a), 12(b), and 12(c), the first row shows the network pattern value, the second row shows the select signal output, and the third row shows the register offset selection signal output. In all three figures, the lower 3 bits always become the lower 3 bits of the select signal regardless of the network pattern value, and the fourth bit of the select signal is set to “0” in PE0 to PE3 and to “1” in PE4 to PE7. The register offset selection signal is always “1”.
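As a rough illustration of the modulo calculation in both PE modes, the Python sketch below wraps the pattern value into the valid select range and always asserts the register offset selection signal. The function name `modulo_select` and its arguments are hypothetical, and the pattern value is assumed to be an 8-bit two's-complement quantity.

```python
def modulo_select(pattern, pe, four_pe_mode):
    """Return (select, offset_select) produced by a modulo-style calculation."""
    pattern &= 0xFF                                 # assumption: 8-bit value
    if four_pe_mode:
        group_bit = 0x8 if pe >= 4 else 0x0         # stay inside PE0-3 or PE4-7
        select = group_bit | (pattern & 0x07)       # wrap within 8 half-words
    else:
        select = pattern & 0x0F                     # wrap within 16 half-words
    return select, 1                                # offset selection is always "1"


print(modulo_select(0x11, pe=0, four_pe_mode=False))
print(modulo_select(0x09, pe=6, four_pe_mode=True))
```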

Upon receiving a network pattern value, the saturation calculation unit 63 performs a saturation calculation and outputs a select signal and a register offset selection signal. When the PE mode signal indicates the 8PE mode, the select signal is 0 when the network pattern value is smaller than 0, 0x0f when it is larger than 0x0f, and the lower 4 bits of the network pattern value otherwise. The register offset selection signal is the fifth bit of the value after this conversion. In this embodiment the value saturates at 0x00 and 0x0f, so the register offset selection signal is always “0”.

When the PE mode signal indicates the 4PE mode, in PE0 to PE3 the select signal is 0x00 when the network pattern value is smaller than 0x00, 0x07 when it is larger than 0x07, and the lower 3 bits otherwise. The register offset selection signal is the fourth bit of the converted network pattern value. In this embodiment the value saturates at 0x00 and 0x07, so the register offset selection signal is always “0”. The fourth bit of the select signal is always set to “0”. In PE4 to PE7, the select signal is 0x08 when the network pattern value is smaller than 0x00, 0x0f when it is larger than 0x07, and otherwise the lower 3 bits are used as the select signal. The register offset selection signal is the fourth bit of the converted network pattern value. In this embodiment the value saturates at 0x08 and 0x0f, so the register offset selection signal is always “0”. The fourth bit of the select signal is always set to “1”.

Next, a specific operation of the saturation calculation unit 63 will be described. FIGS. 13(a) to 13(c) show an outline of the operation when the PE mode signal indicates the 8PE mode. In the figures, PE0 to PE7 are the processing element numbers, and H and L indicate that each processing element stores pixel values in half-word units. In each of FIGS. 13(a), 13(b), and 13(c), the first row shows the network pattern value, the second row shows the select signal output, and the third row shows the register offset selection signal output. In FIG. 13(a), all network pattern values are in the range 0x00 to 0x0f, so the select signal equals the network pattern value, and every register offset selection signal is “0”. In FIG. 13(b), values in the range 0x00 to 0x0f are handled as in FIG. 13(a). In PE7H and PE7L the network pattern value is larger than 0x0f, so the select signal is 0x0f. Every register offset selection signal is “0”. In FIG. 13(c), values in the range 0x00 to 0x0f are handled as in FIG. 13(a). In PE0H and PE0L the network pattern value is smaller than 0x00, so the select signal is 0x00. Every register offset selection signal is “0”.

FIGS. 14(a) to 14(c) show an outline of the operation when the PE mode signal indicates the 4PE mode. In the figures, PE0 to PE7 are the processing element numbers, and H and L indicate that each processing element stores pixel values in half-word units. In each of FIGS. 14(a), 14(b), and 14(c), the first row shows the network pattern value, the second row shows the select signal output, and the third row shows the register offset selection signal output. In FIG. 14(a), all network pattern values are in the range 0x00 to 0x07, so in PE0 to PE3 the select signal equals the network pattern value and every register offset selection signal is “0”. In PE4 to PE7 the select signal is the lower 3 bits of the network pattern value plus 8, and every register offset selection signal is “0”. In FIG. 14(b), values in the range 0x00 to 0x07 are handled as in FIG. 14(a). In PE0 to PE3, the network pattern value of PE3L is larger than 0x07, so its select signal is 0x07. In PE4 to PE7, the network pattern value of PE7L is larger than 0x07, so its select signal is 0x0f. Every register offset selection signal is “0”. In FIG. 14(c), values in the range 0x00 to 0x07 are handled as in FIG. 14(a). In PE0 to PE3, the network pattern values of PE0H and PE0L are smaller than 0x00, so their select signal is 0x00. In PE4 to PE7, the network pattern values of PE4H and PE4L are smaller than 0x00, so their select signal is 0x08. Every register offset selection signal is “0”.
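The saturation calculation amounts to clamping the pattern value to the valid select range of the current PE group. A minimal Python sketch follows; the name `saturate_select` is hypothetical, and the pattern value is passed as an ordinary signed Python integer (an assumption made for readability, since the text speaks of values smaller than 0x00).

```python
def saturate_select(pattern, pe, four_pe_mode):
    """Return (select, offset_select) produced by a saturating calculation."""
    if four_pe_mode:
        base = 0x8 if pe >= 4 else 0x0           # PE4-7 saturate at 0x08 and 0x0f
        clamped = min(max(pattern, 0x00), 0x07)  # clamp to 0x00..0x07 within the group
        return base | clamped, 0
    clamped = min(max(pattern, 0x00), 0x0F)      # 8PE mode: clamp to 0x00..0x0f
    return clamped, 0                            # offset selection is always "0"


print(saturate_select(0x11, pe=7, four_pe_mode=False))  # above range
print(saturate_select(-1, pe=4, four_pe_mode=True))     # below range, upper group
```

Clamping rather than wrapping is what makes edge pixels repeat at the boundary of the processing element array instead of pulling data from the offset register.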

The selector A 64 selects the control signal to be output to the 16-to-1 selector 51.
The selector B 65 selects the value to be output to the demultiplexer 66.
The demultiplexer 66 connects the output of the selector B 65 only to the register offset selection signal that goes to the processing element whose number matches the select signal, and sets all other register offset selection signals to “0”.

FIG. 15 is a circuit diagram of the calculation units and the selectors. In the figure, bit[x:y] denotes the bits from the y-th bit through the x-th bit, counted from the least significant bit, and bit[z] denotes the z-th bit from the least significant bit. First, an 8-bit network pattern value is sent from the network pattern register. On receiving the network pattern value, the modulo calculation unit changes the upper 4 bits to 0001 and outputs the lower 4 bits unchanged. On receiving the network pattern value, the saturation calculation unit outputs 0x0f if the upper 4 bits are 0001 and 0x00 if the upper 4 bits are 1110; otherwise it outputs the value unchanged. The selector A selects only the lower 4 bits. The selector B selects only the fifth bit.
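The bit manipulations described for FIG. 15 can be transcribed literally into Python, as in the sketch below. It follows the wording of the paragraph and nothing more: the function names are hypothetical, the bit patterns 0001 and 1110 are taken as written, and no claim is made about how the actual circuit treats other upper-bit patterns.

```python
def modulo_unit(pattern):
    """Force the upper 4 bits to 0001 and pass the lower 4 bits through."""
    return 0x10 | (pattern & 0x0F)


def saturation_unit(pattern):
    """Saturate according to the upper 4 bits, as worded in the description."""
    upper = (pattern >> 4) & 0x0F
    if upper == 0b0001:
        return 0x0F          # overflow case
    if upper == 0b1110:
        return 0x00          # underflow case, bit pattern as stated in the text
    return pattern & 0xFF    # otherwise unchanged


def selector_a(value):
    """Lower 4 bits: the select signal for the 16-to-1 selector."""
    return value & 0x0F


def selector_b(value):
    """Fifth bit: the register offset selection signal."""
    return (value >> 4) & 1


out = modulo_unit(0x23)
print(hex(out), selector_a(out), selector_b(out))
```

Note how the forced 0001 in the modulo path is what makes the selector B output always “1” in modulo mode, while the saturated outputs 0x0f and 0x00 always yield “0”.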

FIG. 16 is a block diagram showing the connection between the select processing unit 50 and the register number conversion unit 20. As shown in FIG. 16, a 16-input, 1-output OR circuit 16 is provided between the select processing unit 50 and the register number conversion unit 20.
The register offset selection signals output from the select processing units 50 are input to the OR circuit provided in each of PE0H to PE7L, where they are ORed. If any of the register offset selection signals indicates register offset selection, “1” is output to the register number conversion unit 20; otherwise “0” is output.
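The interplay of the demultiplexer 66 and the OR circuit 16 can be pictured with a small Python model. It is a behavioural sketch only: `demultiplex` and `or_reduce` are hypothetical names, and each of the 16 half-word units is represented by an index from 0 (PE0H) to 15 (PE7L).

```python
NUM_HALVES = 16  # PE0H .. PE7L


def demultiplex(select, offset_select):
    """Route the offset selection bit only to the source half named by select."""
    outputs = [0] * NUM_HALVES
    outputs[select] = offset_select
    return outputs


def or_reduce(requests):
    """OR the 16 incoming signals at each source half (one 16-input OR per half)."""
    return [int(any(req[src] for req in requests)) for src in range(NUM_HALVES)]


# Half 0 asks half 1 for offset data; every other half reads its own data plainly.
requests = [demultiplex(1, 1)] + [demultiplex(i, 0) for i in range(1, NUM_HALVES)]
print(or_reduce(requests))
```

In this model a source half switches to the offset register as soon as at least one requester asks for it, which mirrors the "any of them" condition in the description.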

Next, the operation of the network select instruction, one of the network operation instructions used in the parallel arithmetic processor configured as described above, will be described with reference to FIG. 17. FIG. 17 is a configuration diagram of the instruction decoder 11 and the processing element PE0H.
The instruction decoder 11 includes a register offset change amount register 80a, a register offset change amount register 80b, a register offset modulo value register 81, a PE mode register 82, and an operation mode register 83.

The register offset change amount register 80a stores the register offset change amount for reads.
The register offset change amount register 80b stores the register offset change amount for writes.
The register offset modulo value register 81 stores the register offset modulo value, which is shared by reads and writes.

The PE mode register 82 indicates either the 8PE mode or the 4PE mode.
The operation mode register 83 indicates one of the register offset selection, modulo, and saturation modes.
The processing element PE0H includes a register number conversion unit 20a for reads, a register number conversion unit 20b for writes, a register file 15, an OR circuit 16, a select processing unit 50, and an arithmetic unit 40.

The register number conversion unit 20a determines, according to the register offset selection signal from the OR circuit 16, whether to use the register number as it is or to use the register whose number is the register number plus the register offset value, and outputs the resulting register number to the register file 15.
The register number conversion unit 20b converts the register number written in the operand, and the data output as a result of data selection by the network is stored in the register with the converted number.
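The read-side decision made by the register number conversion unit 20a reduces to a conditional addition. The following one-function Python sketch illustrates it; the name `convert_register_number` and its parameters are hypothetical and simply stand for the operand register number, the register offset value, and the signal from the OR circuit 16.

```python
def convert_register_number(reg_num, reg_offset, offset_select):
    """Return reg_num unchanged, or reg_num + reg_offset when offset_select is 1."""
    return reg_num + reg_offset if offset_select else reg_num


# With a read register offset of 1, operand r0 resolves to r1 only when selected.
print(convert_register_number(0, 1, offset_select=0))
print(convert_register_number(0, 1, offset_select=1))
```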

Next, the operation will be described. The format of the network select instruction is shown below.
nsel.a <dst>, <src>, <network pattern register>
The suffix “.a” of nsel.a indicates that, of the two networks, the network 30a is used. <dst> specifies the destination register, <src> specifies the source register, and <network pattern register> specifies the network pattern register that holds the network pattern value to be used. An example of an actual nsel.a instruction is shown below, and its operation is described.

nsel.a r2, r0, NPA
In this nsel.a instruction, the instruction decoder 11 inputs the operand r0 to the register number conversion unit 20. Since the nsel.a instruction does not update the register offset, the register offset change amount register 80 and the register offset modulo value register 81 are not used. The register number conversion unit 20a determines, according to the register offset selection signal from the OR circuit 16, whether to use r0 as it is or to use the register obtained by adding the register offset value to r0, and outputs the register number to the register file 15. The register file 15 outputs the data read from the designated register. NPA is an operand that designates the network pattern register 53. It indicates that the data selection operation by the network and the register offset selection signal operation are performed according to the network pattern register 53 of the select processing unit. Since the nsel.a instruction does not update the network pattern value, the network pattern value change amount is not used. The data output as a result of data selection by the network is stored in the register whose number is obtained by converting the operand r2 in the register number conversion unit 20b.

Next, an example of the values stored in NPA and the resulting operation are shown in FIGS. 18 and 19. In FIG. 18, the first row shows the processing elements and the second row shows that pixel values are stored in half-word units; the details of the first and second rows are the same as described for FIG. 9. The third row shows the read register offset value, the fourth row shows the write register offset value, the fifth row shows the network pattern values stored in NPA, the sixth row shows the input data r0, and the seventh row shows the output data r2. In FIG. 19, the first to fifth rows are the same as in FIG. 18; the sixth row shows the input data r1, the seventh row shows the input data r0, and the eighth row shows the output data r2. In FIGS. 18 and 19, the PE mode register is set to the 8PE mode and the operation mode register is set to register offset selection; the read register offset value is 0x01 in all processing elements, and the write register offset value is 0x00 in all processing elements.

FIG. 18 shows the input and output when 01, 00, 03, 02, ..., 0f, 0e are used as the network pattern values, where x00, x01, ..., x15 denote data. As described for FIG. 9, the upper hexadecimal digit of the network pattern value indicates whether the register offset value is used; if it is 0, the register offset value is not used. The lower hexadecimal digit specifies the processing element and the half (H or L) from which data is fetched. For example, a network pattern value of 01 indicates that the register offset is not used and that data is fetched from PE0L of r0. Similarly, a network pattern value of 00 indicates that the register offset is not used and that data is fetched from PE0H of r0.

Therefore, with this pattern, the network operates so that the MSB 8 bits and the LSB 8 bits of each processing element are swapped, as shown in FIG. 18.
FIG. 19 shows the input and output when 11, 00, 13, 02, ..., 1f, 0e are used as the network pattern values, where x00, x01, ..., x15 denote data. Here too the network operates so that the MSB byte and the LSB byte of each processing element are swapped. Unlike FIG. 18, however, when the network pattern value is 11, for example, PE0H receives the PE0L data of register r1, that is, of r0 with the register offset added, and this data is output to r2. When the network pattern value is 00, the register offset is not used and the data of r0 is output to r2, as in FIG. 18.
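The two pattern examples can be replayed with a small behavioural model of nsel.a in the 8PE mode with register offset selection. The Python sketch below is an illustration under stated assumptions, not the embodiment itself: the function `nsel` and the dictionary `regs` are hypothetical, r0 is assumed to hold x00 to x15 and r1 to hold x16 to x31, and the read register offset is 1 as in the figures.

```python
def nsel(regs, src, read_offset, patterns):
    """Model nsel.a: each destination half fetches from the half its pattern selects."""
    selects = [p & 0x0F for p in patterns]        # lower 4 bits: source half
    offsets = [(p >> 4) & 1 for p in patterns]    # fifth bit: offset request
    use_offset = [0] * len(patterns)
    for sel, ofs in zip(selects, offsets):        # OR of requests at each source half
        use_offset[sel] |= ofs
    return [regs[src + read_offset * use_offset[sel]][sel] for sel in selects]


# Assumed register contents: r0 = x00..x15, r1 = x16..x31.
regs = {0: [f"x{i:02d}" for i in range(16)],
        1: [f"x{i:02d}" for i in range(16, 32)]}

fig18 = [i ^ 1 for i in range(16)]                                   # 01, 00, 03, 02, ...
fig19 = [(i ^ 1) | (0x10 if i % 2 == 0 else 0) for i in range(16)]   # 11, 00, 13, 02, ...

print(nsel(regs, 0, 1, fig18))
print(nsel(regs, 0, 1, fig19))
```

With the first pattern every pair of halves is swapped within r0; with the second, the halves whose pattern has the upper digit set take their data from r1 instead, while the others still read r0.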

Next, another format of the nsel.a instruction is shown.
nsel.a <dst>, <src>, <immediate value>
The suffix “.a” of the nsel.a instruction indicates that, of the two networks, the network 30a is used. <dst> specifies the destination register, <src> specifies the source register, and <immediate value> specifies a numerical value. An example of an actual nsel.a instruction is shown below, and its operation is described.

nsel.a r2, r0, 0x4
In this nsel.a instruction, the instruction decoder 11 inputs the operand r0 to the register number conversion unit 20. Since the nsel.a instruction does not update the register offset, the register offset change amount register 80 and the register offset modulo value register 81 are not used. The register number conversion unit 20a determines, according to the register offset selection signal from the OR circuit 16, whether to use r0 as it is or to use the register obtained by adding the register offset value to r0, and outputs the register number to the register file 15. The register file 15 outputs the data read from the designated register. The operand 0x4 indicates that the data selection operation by the network and the register offset selection signal operation are performed with the network pattern value of every processing element taken to be 0x4. Since the nsel.a instruction does not update the network pattern value, the network pattern value change amount is not used. With the value 0x04 in the 8PE mode, the r0 data of PE2H is output to all processing elements and stored in the register whose number is obtained by converting the operand r2 in the register number conversion unit 20b; since the write register offset value is 0, the data is stored in r2, the same register as the operand.
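The immediate form behaves as a broadcast: every destination uses the same pattern value, so every destination reads the same source half. A minimal Python sketch follows; the name `nsel_immediate` is hypothetical, r0 is assumed to hold x00 to x15, and the sketch covers only immediates whose fifth bit is 0, so the register offset is never involved.

```python
def nsel_immediate(src_data, immediate):
    """Broadcast the half selected by the immediate to all 16 destination halves."""
    select = immediate & 0x0F          # 8PE mode: lower 4 bits pick the source half
    return [src_data[select]] * len(src_data)


r0 = [f"x{i:02d}" for i in range(16)]  # assumed contents of r0, PE0H first
r2 = nsel_immediate(r0, 0x4)           # select 4 corresponds to PE2H
print(r2)
```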

Next, the network shift instruction will be described. The format of the network shift instruction is shown below.
nsfti.a <dst>, <src>, <network pattern register>, <network pattern value change amount>
The suffix “.a” of the nsfti.a instruction indicates that, of the two networks, the network 30a is used. <dst> specifies the destination register, <src> specifies the source register, <network pattern register> specifies the network pattern register used for the data movement, and <network pattern value change amount> specifies the value to be added to the network pattern value after the data movement. An example of an actual nsfti.a instruction is shown below, and its operation is described.

nsfti.a r2, r0, NPA, 0x1
In this embodiment, the register offset change amount register 80 and the register offset modulo value register 81 are not used. The PE mode register 82 is assumed to be set to the 8PE mode, and the operation mode register 83 to the register offset value selection mode.
In this nsfti.a instruction, the instruction decoder 11 inputs the operand r0 to the register number conversion unit 20. The nsfti.a instruction does not update the register offset. The register number conversion unit 20a determines, according to the register offset selection signal from the OR circuit 16, whether to use r0 as it is or to use the register obtained by adding the register offset value to r0, and outputs the register number to the register file 15. The register file 15 outputs the data read from the designated register. The operand NPA indicates that the data selection operation by the network and the selection of the register offset selection signal are performed according to the network pattern register 53 of the select processing unit 50. The operand 0x1 is the network pattern value change amount: at the same time as the data is selected, 1 is added to the NPA value in every PE. The data output as a result of data selection by the network is stored in the register whose number is obtained by conversion in the register number conversion unit 20b.

FIG. 47 shows the instruction sequence. FIGS. 47(a) and 47(b) show that an nsfti.a instruction is issued. FIG. 20 shows the operation of the network 30a in accordance with this instruction sequence, that is, the movement of data when the nsfti.a instruction is issued. The first row of FIG. 20(a) shows the processing elements. The second row indicates that pixel values are stored in units of half words. The details of the first and second rows are the same as described for FIG. 9. The third row shows the read register offset value. The fourth row shows the write register offset value. The fifth row shows the network pattern values stored in NPA. The sixth row shows the input data r1. The seventh row shows the input data r0. The eighth row shows the output data r2. The first row of FIG. 20(b) shows the network pattern values stored in NPA. The second row shows the input data r1. The third row shows the input data r0, and the fourth row shows the output data r2. x00 to x31 are pixel data arranged in the horizontal direction.
As the initial value of NPA, the network pattern values 01, 00, 03, 02, ..., 0f, 0e are set, and the value 0x01 is set as the register offset. As in FIG. 18, in FIG. 20(a) the data x01, x00, x03, ..., x14 are output, and at the same time the value of NPA is updated by +1. Next, the second nsfti.a instruction, in FIG. 47(b), outputs the data x02, x01, x04, ..., x15 as shown in FIG. 20(b), and the value of NPA is again updated by +1. The data output as a result of the data selection by the network has its register number converted by the register number conversion unit 20b; since the write register offset value is 0, it is stored in r2, the same register as the operand.
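The two steps of FIG. 20 can be reproduced with a small software model. The following Python sketch is only an illustration under stated assumptions, not the hardware described here: the function name `nsfti`, the dictionary representation of the register file, and the assignment of x00 to x15 to r0 and x16 to x31 to r1 are all hypothetical, and the register offset value selection mode is modelled simply as "a pattern value above 0x0f reads the register at src + offset".

```python
LANES = 16  # 8PE mode: byte lanes PE0H, PE0L, ..., PE7H, PE7L


def nsfti(regs, src, pattern, offset, delta):
    """Model one nsfti-style shift: select per lane, then update the pattern.

    Assumption: a pattern value above 0x0f selects the register src + offset,
    and the low four bits of the value pick the source lane.
    """
    out = []
    for value in pattern:
        reg = src + offset if value > 0x0F else src
        out.append(regs[reg][value % LANES])
    new_pattern = [value + delta for value in pattern]
    return out, new_pattern


# Assumed layout: r0 holds x00..x15 and r1 holds x16..x31.
regs = {
    0: [f"x{i:02d}" for i in range(0, 16)],
    1: [f"x{i:02d}" for i in range(16, 32)],
}
npa = [i ^ 1 for i in range(LANES)]  # 01, 00, 03, 02, ..., 0f, 0e

r2_first, npa = nsfti(regs, src=0, pattern=npa, offset=1, delta=1)
r2_second, npa = nsfti(regs, src=0, pattern=npa, offset=1, delta=1)

print(r2_first)
print(r2_second)
```

In this model the second result ends with a value taken from r1, because the pattern value of one lane has grown past 0x0f; the pattern register itself is never rewritten by the program between the two instructions.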

As described above, according to this embodiment, each processing element executes a data transfer according to the network pattern value and, at the same time, performs an operation specified by the instruction on the network pattern value to update it. Data can therefore be transferred between arbitrary processing elements, and there is no need to supply the network pattern value as an immediate in the instruction operand for every data transfer, so the code size of programs for the SIMD processor can be reduced.

(Second embodiment)
This embodiment relates to an improvement for applying the parallel arithmetic processor to FIR filter processing. An FIR filter is often used for purposes such as noise removal and is realized by applying product-sum operations to adjacent data. This embodiment describes the case of an FIR filter with symmetric coefficients.

FIG. 21 shows the calculation method of the FIR filter. Consider the data x16 in FIG. 21. In FIG. 21(a), the FIR filter operation on x16 is performed by adding together three results: x16 multiplied by the coefficient c0; the sum of the neighboring data x15 and x17, multiplied by the coefficient c1; and the sum of the neighboring data x14 and x18, multiplied by the coefficient c2.
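Written as a formula, the operation on x16 described above is the following, where the output symbol y is introduced here only for illustration and does not appear in the original text. The second line generalizes the same pattern to an arbitrary position n.

```latex
y_{16} = c_0\,x_{16} + c_1\,(x_{15} + x_{17}) + c_2\,(x_{14} + x_{18})
\\[4pt]
y_{n} = c_0\,x_{n} + c_1\,(x_{n-1} + x_{n+1}) + c_2\,(x_{n-2} + x_{n+2})
```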

Further, as shown in FIGS. 21(b) and 21(c), the FIR filter operation for x17 and x18 is likewise performed using adjacent data. Comparing the FIR operation for x16 with that for x17, all the data used for x17 are shifted one data position to the right relative to those used for x16. The data used for x18 are shifted one further position to the right.
Therefore, the data x16, x17, x18, ... are first supplied to the arithmetic units of the respective processing elements and multiplied by c0. Next, the data x17, x18, x19, ... and x15, x16, x17, ... are supplied to the arithmetic units of the respective processing elements, added, and multiplied by c1. Then the data x18, x19, x20, ... and x14, x15, x16, ... are supplied to the arithmetic units of the respective processing elements, added, and multiplied by c2. By adding these multiplication results, as many FIR operations as there are processing elements can be processed in parallel.
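The three supply steps above amount to feeding every lane a window of the input that is shifted by zero, one, and two positions. A minimal Python sketch of that idea follows; it is a plain software model with hypothetical names (`fir_parallel`, `start`, `lanes`) and uses list slices in place of the network, so it says nothing about how the hardware delivers the data.

```python
def fir_parallel(x, start, lanes, c0, c1, c2):
    """Compute a symmetric 5-tap FIR for `lanes` consecutive outputs at once.

    Each list below plays the role of the data supplied to all lanes in one
    step: the centre samples, then the +/-1 neighbours, then the +/-2 ones.
    """
    center = x[start:start + lanes]
    right1 = x[start + 1:start + 1 + lanes]
    left1 = x[start - 1:start - 1 + lanes]
    right2 = x[start + 2:start + 2 + lanes]
    left2 = x[start - 2:start - 2 + lanes]

    acc = [c0 * v for v in center]                                    # step 1
    acc = [a + c1 * (l + r) for a, l, r in zip(acc, left1, right1)]   # step 2
    acc = [a + c2 * (l + r) for a, l, r in zip(acc, left2, right2)]   # step 3
    return acc


# Illustrative data: x[i] = i, outputs for x16..x31, arbitrary coefficients.
samples = list(range(48))
outputs = fir_parallel(samples, start=16, lanes=16, c0=4, c1=2, c2=1)
print(outputs)
```

With x[i] = i the symmetric neighbours always sum to twice the centre sample, so each output is simply (c0 + 2·c1 + 2·c2) times its position, which makes the result easy to check by hand.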

How the configuration shown in the first embodiment makes such data supply possible is described with reference to FIG. 17. The FIR operation uses the network shift instruction. An example of an actual network shift instruction is shown below, and its operation is described.
nsfti.a r3, r1, NPA, 0x1
In this embodiment, neither the network 30a nor the network 30b uses the register offset change amount register 80 or the register offset modulo value register 81. It is assumed that the PE mode register 82 is set to the 8PE mode and the operation mode register 83 is set to the register offset value selection mode.

In this nsfti.a instruction, the operand r1 is supplied to the register number conversion unit 20 by the instruction decoder 11. The nsfti.a instruction does not update the register offset. According to the register offset selection signal from the OR circuit 16, the register number conversion unit 20a determines whether to use r1 as it is or to use the register obtained by adding the register offset value to r1, and outputs the register number to the register file 15. The register file 15 outputs the data read from the designated register. NPA is the operand that designates the network pattern register. It indicates that the data selection by the network and the selection of the register offset selection signal are performed according to the network pattern register 53 of the select processing unit 50. The operand 0x1 is the network pattern value change amount: at the same time as the data is selected, the value of NPA is incremented by 1 in all PEs. The data output as a result of the data selection by the network is stored in the register whose number has been converted by the register number conversion unit 20b.

Next, another form of the network shift instruction is shown.
nsftd.b <dst>, <src>, <network pattern register>, <network pattern value change amount>
The ".b" of the nsftd.b instruction indicates that, of the two networks, the network 30b is used. <dst> is the output destination register, <src> is the input source register, <network pattern register> is the network pattern register used for the data movement, and <network pattern value change amount> specifies the value to be subtracted from the network pattern value after the data movement. An example of an actual nsftd.b instruction is shown below, and its operation is described.

nsftd.b r4, r1, NPB, 0x1
In this nsftd.b instruction, the operand r1 is supplied to the register number conversion unit 20 by the instruction decoder 11. The nsftd.b instruction does not update the register offset. According to the register offset selection signal from the OR circuit 16, the register number conversion unit 20a determines whether to use r1 as it is or to use the register obtained by adding the register offset value to r1, and outputs the register number to the register file 15. The register file 15 outputs the data read from the designated register. NPB is the operand that designates the network pattern register. It indicates that the data selection by the network and the selection of the register offset selection signal are performed according to the network pattern register 53b of the select processing unit 50b. The operand 0x1 is the network pattern value change amount: at the same time as the data is selected, the value of NPB is decremented by 1 in all processing elements. The data output as a result of the data selection by the network is stored in the register whose number has been converted by the register number conversion unit 20b. Here, the select processing unit 50b and the network pattern register 53b refer to the select processing unit and the network pattern register included in the network 30b of the processing element PE0.

The following shows that the FIR filter operation can be performed by issuing the nsfti.a instruction and the nsftd.b instruction simultaneously. FIG. 48 shows the instruction sequence for the FIR filter operation. FIGS. 48(a) to 48(c) show that the nsfti.a and nsftd.b instructions are issued simultaneously. The operations of the networks 30a and 30b in accordance with this instruction sequence are shown in FIGS. 22 and 23, respectively.
FIG. 22 shows the movement of data when the nsfti.a instruction is issued. The first row of FIG. 22(a) shows the processing elements. The second row indicates that pixel values are stored in units of half words. The details of the first and second rows are the same as described for FIG. 9. The third row shows the read register offset value. The fourth row shows the write register offset value. The fifth row shows the network pattern values stored in NPA. The sixth row shows the input data r2. The seventh row shows the input data r1. The eighth row shows the output data r3. The first row of FIGS. 22(b) and 22(c) shows the network pattern values stored in NPA. The second row shows the input data r2. The third row shows the input data r1. The fourth row shows the output data. x16 to x47 are pixel data arranged in the horizontal direction.
As the initial value of NPA, the network pattern values 00, 01, 02, 03, ..., 0f are set, and the value 0x01 is set as the register offset. The first nsfti.a instruction, in FIG. 48(a), outputs the data x16, x17, x18, ..., x31 as shown in FIG. 22(a), and at the same time the value of NPA is updated by +1. In FIG. 22(b), the network pattern values have therefore been updated by +1 by the first nsfti.a instruction and are 01, 02, 03, ..., 0f, 10. Since the network pattern value of PE7L is 10, PE7L receives the PE0H data of register r2, the register obtained by adding the offset to r1, and this data is output to r3. The second nsfti.a instruction, in FIG. 48(b), thus outputs the data x17, x18, x19, ..., x32 as shown in FIG. 22(b), and the value of NPA is updated by +1. The third nsfti.a instruction, in FIG. 48(c), then outputs the data x18, x19, x20, ..., x33 as shown in FIG. 22(c), and the value of NPA is updated by +1. In this way, each time the nsfti.a instruction is issued, the data one position further to the right is output to each processing element. The data output as a result of the data selection by the network has its register number converted by the register number conversion unit 20b; since the write register offset value is 0, it is stored in r3, the same register as the operand.

At this time, the register accessed by each processing element is either r1 or r2, as indicated by the hatched portions in FIG. 22, and viewed in byte units only one register is read.
FIG. 23 shows the movement of data when the nsftd.b instruction is issued. The first row of FIG. 23(a) shows the processing elements. The second row indicates that pixel values are stored in units of half words. The details of the first and second rows are the same as described for FIG. 9. The third row shows the read register offset value. The fourth row shows the write register offset value. The fifth row shows the network pattern values stored in NPB. The sixth row shows the input data r0. The seventh row shows the input data r1. The eighth row shows the output data r4. The first row of FIGS. 23(b) and 23(c) shows the network pattern values stored in NPB. The second row shows the input data r0. The third row shows the input data r1. The fourth row shows the output data r4. x00 to x31 are pixel data arranged in the horizontal direction.
As the initial value of NPB, the network pattern values 00, 01, 02, 03, ..., 0f are set, and the value 0x0f is set as the register offset. The first nsftd.b instruction, in FIG. 48(a), outputs the data x16, x17, x18, ..., x31 as shown in FIG. 23(a), and at the same time the value of NPB is updated by -1. The second nsftd.b instruction, in FIG. 48(b), outputs the data x15, x16, x17, ..., x30 as shown in FIG. 23(b), and the value of NPB is updated by -1. The third nsftd.b instruction, in FIG. 48(c), outputs the data x14, x15, x16, ..., x29 as shown in FIG. 23(c), and the value of NPB is updated by -1. In this way, each time the nsftd.b instruction is issued, the data one position further to the left is output to each processing element. The data output as a result of the data selection by the network has its register number converted by the register number conversion unit 20b; since the write register offset value is 0, it is stored in r4, the same register as the operand.

At this time, the register accessed by each processing element is either r1 or r0, as indicated by the hatched portions in FIG. 23, and viewed per byte-unit processing element only one register is read.
In each of FIGS. 48(a), (b), and (c), the values that the network 30a and the network 30b output to r3 and r4, respectively, are input to the arithmetic unit 40. By repeating the addition and the product-sum operation for each of FIGS. 48(a), (b), and (c), each processing element can execute the FIR filter operation in parallel.
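The interplay of the two networks and the accumulation can be sketched in Python as follows. This is a simplified model built on several assumptions that the text does not spell out: register numbers are taken to wrap modulo 16 (so that the offset 0x0f applied to r1 reaches r0, matching the r0/r1 inputs of FIG. 23), a pattern value outside 0x00 to 0x0f is taken to select the offset register, and in the first step, where both networks deliver the same sample, only r3 is used for the c0 term. The names `shift` and `fir_two_networks` are hypothetical.

```python
LANES = 16      # 8PE mode: byte lanes PE0H, PE0L, ..., PE7H, PE7L
NUM_REGS = 16   # assumption: register numbers wrap modulo 16


def shift(regs, src, pattern, offset, delta):
    """Select one value per lane, then add `delta` to every pattern value."""
    out = []
    for value in pattern:
        in_range = 0x00 <= value <= 0x0F
        reg = src if in_range else (src + offset) % NUM_REGS
        out.append(regs[reg][value % LANES])
    return out, [value + delta for value in pattern]


def fir_two_networks(regs, c0, c1, c2):
    """Accumulate a symmetric 5-tap FIR from three paired network shifts."""
    npa = list(range(LANES))  # NPA initial values 00..0f
    npb = list(range(LANES))  # NPB initial values 00..0f
    acc = [0] * LANES
    for step, coeff in enumerate((c0, c1, c2)):
        r3, npa = shift(regs, 1, npa, 0x01, +1)  # models nsfti.a r3, r1, NPA, 0x1
        r4, npb = shift(regs, 1, npb, 0x0F, -1)  # models nsftd.b r4, r1, NPB, 0x1
        if step == 0:
            # Both networks deliver the centre sample; use it once (assumption).
            acc = [a + coeff * p for a, p in zip(acc, r3)]
        else:
            acc = [a + coeff * (p + q) for a, p, q in zip(acc, r3, r4)]
    return acc


# Illustrative data: r0 = x00..x15, r1 = x16..x31, r2 = x32..x47.
x = [i * i for i in range(48)]
regs = {0: x[0:16], 1: x[16:32], 2: x[32:48]}
result = fir_two_networks(regs, c0=4, c1=2, c2=1)
print(result)
```

Each call to `shift` reads from at most two registers overall (the source and the offset register), and each lane reads exactly one of them, which is the property the text relies on to keep the register file small.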

As described above, according to this embodiment, FIR filter operations can be performed in parallel by selecting the register to be read according to the network pattern value. In addition, the FIR filter operation can be performed with only one register read per network for data supply, two in total, which allows the hardware scale of the register file in the parallel processor to be reduced.

In FIG. 48, the instructions controlling the network 30a and the network 30b are issued simultaneously, but one instruction may instead be issued per cycle. In this case, the data supply rate for the FIR operation is halved and performance decreases.
(Third embodiment)
In the second embodiment, it was shown that the FIR filter operation can be performed by the parallel arithmetic processor of the first embodiment. The FIR filter, however, requires adjacent data, and at the edge of the screen no adjacent data exists. This embodiment therefore performs the filter processing on the assumption that data repeating the value at the edge of the screen exists.

FIGS. 24(a) to 24(c) show how the FIR filter operation is calculated at the edge of the screen. In the figure, x31 is the data at the edge of the screen. In FIG. 24(b), for the FIR filter operation at x30, the data used for the third product-sum operation falls outside the screen. In such a case, the FIR operation is performed as if the data x31 continued beyond the edge of the screen. The same applies to the FIR filter operation at x31 in FIG. 24(c).

How the configuration shown in the first embodiment makes such data supply possible is described with reference to FIG. 17. As shown in the second embodiment, the FIR operation uses the network shift instruction. An example of an actual network shift instruction is shown below, and its operation is described.
nsfti.a r3, r1, NPA, 0x1
In this embodiment, the network 30a does not use the register offset change amount register 80 or the register offset modulo value register 81. It is assumed that the PE mode register 82 is set to the 8PE mode and the operation mode register 83 is set to the saturation mode.

In this nsfti.a instruction, the operand r1 is supplied to the register number conversion unit 20a by the instruction decoder 11. The nsfti.a instruction does not update the register offset. According to the register offset selection signal from the OR circuit 16, the register number conversion unit 20a determines whether to use r1 as it is or to use the register obtained by adding the register offset value to r1, and outputs the register number to the register file 15. The register file 15 outputs the data read from the designated register. NPA is the operand that designates the network pattern register. It indicates that the data selection by the network and the selection of the register offset selection signal are performed according to the network pattern register 53 of the select processing unit 50. The operand 0x1 is the network pattern value change amount: at the same time as the data is selected, the value of NPA is incremented by 1 in all processing elements. The data output as a result of the data selection by the network is stored in the register whose number has been converted by the register number conversion unit 20b.

As in the second embodiment, FIG. 48 shows the instruction sequence for the FIR filter operation. FIGS. 48(a) to 48(c) show that the nsfti.a and nsftd.b instructions are issued simultaneously. FIG. 25 shows the operation of the network 30a in accordance with this instruction sequence.
FIG. 25 shows the movement of data when the nsfti.a instruction is issued. The first row of FIG. 25(a) shows the processing elements. The second row indicates that pixel values are stored in units of half words. The details of the first and second rows are the same as described for FIG. 9. The third row shows the read register offset value. The fourth row shows the write register offset value. The fifth row shows the network pattern values stored in NPA. The sixth row shows the input data r2. The seventh row shows the input data r1. The eighth row shows the output data r3. The first row of FIGS. 25(b) and 25(c) shows the network pattern values stored in NPA. The second row shows the input data r2. The third row shows the input data r1, and the fourth row shows the output data r3. x16 to x31 are pixel data arranged in the horizontal direction.
As the initial value of NPA, the network pattern values 00, 01, 02, 03, ..., 0f are set, and the value 0x01 is set as the read register offset. The operation mode register is set to the saturation mode. The first nsfti.a instruction, in FIG. 48(a), outputs the data x16, x17, x18, ..., x31 as shown in FIG. 25(a), and at the same time the value of NPA is updated by +1. Next, the second nsfti.a instruction, in FIG. 48(b), selects data. Here the network pattern value of PE7L is 0x10, which exceeds 0x0f but is treated as 0x0f in the saturation mode. As a result, the data x17, x18, x19, ..., x31, x31 are output as shown in FIG. 25(b), and the value of NPA is updated by +1. The third nsfti.a instruction, in FIG. 48(c), then selects data. Here the network pattern values of PE7H and PE7L are 0x10 and 0x11, respectively, which exceed 0x0f but are treated as 0x0f in the saturation mode. As a result, the data x18, x19, ..., x31, x31, x31 are output as shown in FIG. 25(c), and the value of NPA is updated by +1. In this way, each time the nsfti.a instruction is issued, the data one position further to the right is output to each PE, and at the right edge, where no data exists, the value of x31 is output repeatedly. The data output as a result of the data selection by the network has its register number converted by the register number conversion unit 20b; since the write register offset value is 0, it is stored in r3, the same register as the operand.

As in the second embodiment, the nsftd.b instruction operates on the network 30b, the data selection is performed, and the FIR operation can be carried out.
At the left edge of the screen, the operation mode register 83 of the network 30b is set to the saturation mode, and saturation is applied when the network pattern value held in NPB is smaller than 0. The FIR filter processing at the left edge of the screen can thereby be performed in the same way.
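The saturation behaviour at both screen edges can be illustrated with a short Python sketch. It is a simplified model under assumptions: the helper names `saturate` and `shift_saturating` are hypothetical, only a single source register is modelled, and the stored pattern value is allowed to keep growing while the clamp is applied only when selecting the source lane, which follows the values 0x10 and 0x11 quoted for FIG. 25(c). The left-edge run reuses the same sixteen sample names purely for illustration.

```python
def saturate(value, lo=0x00, hi=0x0F):
    """Clamp a network pattern value into the valid lane range."""
    return max(lo, min(hi, value))


def shift_saturating(lanes, pattern, delta):
    """Select with saturation, then update the stored (unsaturated) pattern."""
    out = [lanes[saturate(value)] for value in pattern]
    return out, [value + delta for value in pattern]


r1 = [f"x{i}" for i in range(16, 32)]  # assumed contents of the source register

# Right edge: pattern incremented by 1 per instruction (nsfti-style).
npa = list(range(16))
right = []
for _ in range(3):
    out, npa = shift_saturating(r1, npa, +1)
    right.append(out)

# Left edge: pattern decremented by 1 per instruction (nsftd-style).
npb = list(range(16))
left = []
for _ in range(3):
    out, npb = shift_saturating(r1, npb, -1)
    left.append(out)

print(right[2])
print(left[2])
```

The program never copies the edge sample explicitly; the repeated values at either end fall out of the clamp alone.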

As described above, according to this embodiment, a saturation operation is applied to network pattern values outside the range 0x00 to 0x0f to determine the processing element that is the data transfer source. Therefore, even when no adjacent data exists at the edge of the screen, processing can proceed as if data repeating the value at the screen edge existed. That is, even for the FIR filter operation at the edge of the screen, the operation can be performed simply by changing the operation mode to the saturation mode, without the program explicitly describing an operation that copies the screen-edge data. In addition, updating the network pattern value reduces the code size and improves the operating efficiency of the program.

(Fourth embodiment)
In the third embodiment, it was shown that processing at the edge of the screen can be performed by operating the nsfti.a instruction in the saturation mode. However, in the third embodiment, when each data item is 16 bits wide, for example, only the LSB 8 bits or the MSB 8 bits are repeated, and processing cannot be performed correctly when 16-bit data lies at the edge of the screen.

This embodiment allows processing at the edge of the screen to be performed for 16-bit data and the like as well. FIG. 26 shows the configuration of the select processing unit 50a in this embodiment. The select processing unit 50a includes a 16to1 selector 51a, a selector 52a, a network pattern register 53a, an adder/subtractor 54a, and a select signal conversion unit 60a.
From the input network pattern value, the select signal conversion unit 60a generates the register offset selection signal and the select signals to be sent to the 16to1 selector 51 and the 16to1 selector 51a. The select signals are generated in the same way as in FIG. 6, except that the least significant bit of the select signal sent to the 16to1 selector 51a is always "1" and the least significant bit of the select signal sent to the 16to1 selector 51 is always "0"; the remaining bits have the same value in both. The register offset selection signal is output to the processing element whose number matches the respective select signal.
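The least-significant-bit rule can be illustrated with a brief Python sketch. It is a model under assumptions rather than a description of the circuit: the names `pair_selects` and `select_pairs` are hypothetical, saturation mode is assumed and applied before the bit is forced, one pattern value per H/L lane pair is used, and the pattern is stepped by 2 as in the instruction example given for this embodiment.

```python
def pair_selects(pattern_value):
    """Return (select for the H lane, select for the L lane) of one pair.

    Assumption: saturate to 0x00..0x0f first, then force the least
    significant bit to 0 for the H lane and to 1 for the L lane.
    """
    sat = max(0x00, min(0x0F, pattern_value))
    return sat & ~1, sat | 1


def select_pairs(lanes, pair_patterns):
    """Gather one H/L pair of source lanes per pattern value."""
    out = []
    for value in pair_patterns:
        high_sel, low_sel = pair_selects(value)
        out += [lanes[high_sel], lanes[low_sel]]
    return out


r1 = [f"x{i}" for i in range(16, 32)]      # assumed source register contents
npa = [0x01 + 2 * pe for pe in range(8)]   # 01, 03, ..., 0f (one per pair)

step_a = select_pairs(r1, npa)
npa = [value + 2 for value in npa]         # pattern stepped by 2
step_b = select_pairs(r1, npa)

print(step_a)
print(step_b)
```

Because both selects of a pair differ only in the forced bit, the two bytes of a 16-bit value always come from the same source pair, and at the edge the whole pair is repeated instead of a single byte.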

The other components are the same as those in FIG. 6.
With this configuration, the MSB 8 bits and the LSB 8 bits of 16-bit data are always handled as one pair.
Next, the operation when a network pattern value is used is described. An example of an actual description is shown below, and its operation is explained. It is the same as in the third embodiment except that the change amount of the network pattern value in the operand is changed to 0x2.

nsfti.a r3, r1, NPA, 0x2
FIG. 27 shows the movement of data when the nsfti.a instruction is issued. In FIG. 27(a), the first row shows the processing elements. The second row shows that pixel values are stored in halfword units. The details of the first and second rows are the same as described for FIG. 9. The third row shows the read register offset value, the fourth row shows the write register offset value, and the fifth row shows the network pattern values stored in NPA. The sixth row shows the select values for the 16to1 selectors, the seventh row shows the input data r2, the eighth row shows the input data r1, and the ninth row shows the output data r3. In FIG. 27(b) and FIG. 27(c), the first row shows the network pattern values stored in NPA, the second row shows the select values for the selectors, the third row shows the input data r2, the fourth row shows the input data r1, and the fifth row shows the output data r3. x16 to x31 are data arranged in the horizontal direction.
The network pattern values -, 01, -, 03, ..., -, 0f are set as the initial values of NPA, and 0x01 is set as the register offset. Here, "-" indicates that no setting is required, and any value may be used. Saturation mode is set in the operation mode register. The first nsfti.a instruction, in FIG. 27(a), outputs the data x16, x17, x18, ..., x31 and at the same time updates the value of NPA by +2. The network pattern values of PE0H, PE1H, ..., PE7H are not used. Next, in the second nsfti.a instruction, in FIG. 27(b), the network pattern value of PE7L is 11, but because saturation mode is set, the select value becomes 0f. The select value of PE7H is the select value of PE7L with its least significant bit cleared to 0, so it is 0e. As a result, the data x18, x19, x20, ..., x30, x31 are output, and the value of NPA is updated by +2. Then the third nsfti.a instruction, in FIG. 27(c), outputs the data x20, x21, x22, ..., x31, x30, x31, and the value of NPA is again updated by +2. In this way, each time the nsfti.a instruction is issued, the adjacent data to the right is output to each processing element in 16-bit units, and at the right end, where there is no data, the values x30 and x31 are output repeatedly.
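The three nsfti.a issues described above can be traced with a small software model. This is only a sketch under stated assumptions, not the hardware itself: the names `pair_selects` and `nsfti_a` are hypothetical, the sixteen halfword slots are indexed 0 to 15 in the order PE0H, PE0L, ..., PE7L, and only the network pattern value of each L halfword is modeled, since the text says the H values are not used.

```python
def pair_selects(np_low, lo=0x00, hi=0x0F):
    """Return (select_H, select_L) for one processing element.

    np_low is the network pattern value held for the L halfword.
    Saturation mode clamps it to lo..hi; the L selector then forces the
    least significant bit to 1 and the H selector forces it to 0, so the
    two halves of a 16-bit value always travel together.
    """
    saturated = max(lo, min(hi, np_low))
    select_l = saturated | 1
    select_h = saturated & ~1
    return select_h, select_l


def nsfti_a(src, npa, step=2):
    """Model one nsfti.a issue: select data, then add `step` to every NPA entry."""
    out = []
    for np_low in npa:
        sel_h, sel_l = pair_selects(np_low)
        out += [src[sel_h], src[sel_l]]
    return out, [value + step for value in npa]


# Slot i holds x(16 + i); NPA starts at 01, 03, ..., 0f for PE0L..PE7L.
src = [f"x{16 + i}" for i in range(16)]
npa = [2 * pe + 1 for pe in range(8)]

first, npa = nsfti_a(src, npa)
second, npa = nsfti_a(src, npa)
third, npa = nsfti_a(src, npa)
print(second[-4:])  # right edge after the second issue
print(third[-6:])   # right edge after the third issue
```

In this model the pair x30, x31 is repeated as a unit at the right edge, rather than a single 8-bit half being repeated.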

Next, the operation when an immediate value is used is described. An example of an actual description is shown below, and its operation is explained.
nsel.a r2, r0, 0x4
This instruction reads register r0 from the register file, moves the data with the network pattern value of every processing element set to the immediate value 0x04, and writes the result to r2. FIG. 28 shows the movement of the data. The first row of FIG. 28 shows the processing elements. The second row shows that pixel values are stored in halfword units. The details of the first and second rows are the same as described for FIG. 9. The third row shows the read register offset value. The fourth row shows the write register offset value. The fifth row shows the network pattern values stored in NPA. The sixth row shows the select values for the 16to1 selectors. The seventh row shows the input data r0. The eighth row shows the output data. When the network pattern value is the immediate value 0x04, the select signal 52a to PE0L is 0x05 because its least significant bit is set to “1”, and the select signal 52b to PE0H is 0x04 because its least significant bit is set to “0”. In this way, the r0 data of PE2H is output to PE0H, PE1H, ..., PE7H, and the r0 data of PE2L is output to PE0L, PE1L, ..., PE7L, so that 16-bit data can be handled.
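As a rough illustration of the immediate case, the sketch below broadcasts one 16-bit pair to every processing element. The function name `nsel_a_imm16` and the slot labels are hypothetical; the only behavior taken from the text is that the L select has its least significant bit forced to 1 and the H select has it forced to 0.

```python
def nsel_a_imm16(src, imm):
    """Broadcast the halfword pair addressed by `imm` to all processing elements.

    src lists the halfword slots in the order PE0H, PE0L, ..., PE7H, PE7L.
    """
    select_h = imm & ~1   # H selector: least significant bit forced to 0
    select_l = imm | 1    # L selector: least significant bit forced to 1
    pe_count = len(src) // 2
    return [src[select_h], src[select_l]] * pe_count


slots = [f"PE{pe}{half}" for pe in range(8) for half in "HL"]
result = nsel_a_imm16(slots, 0x04)
print(result[:4])
```

With the immediate 0x04 the selects become 0x04 and 0x05, so every H slot receives the PE2H data and every L slot receives the PE2L data.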

As described above, according to this embodiment, data at the edge of the screen can be repeated even for 16-bit data, so 16-bit data can be handled. Even when the network pattern is specified as an immediate value in the operand, 16-bit data can be transferred to all processing elements. In addition, processing efficiency can be improved and code size can be reduced.

Note that switching between the configuration shown in this embodiment and the configuration shown in the first embodiment may be performed by a control register or by an instruction. When only 16-bit data is handled, the hardware scale can be reduced by adopting only the configuration of this embodiment.
(Fifth embodiment)
This embodiment relates to an improvement of the parallel arithmetic processor for realizing matrix transposition. In media processing, a matrix sometimes has to be transposed, and there are many cases in which the arrangement of matrix elements needs to be exchanged. This embodiment therefore realizes matrix transposition. Because an explanation based on a 16 × 16 example would be complicated, the following description is simplified to the case of transposing a matrix of 4 × 4 elements. It shows that data stored in the register file can be transposed by the parallel arithmetic processor of the first embodiment. In an IDCT (Inverse Discrete Cosine Transform) operation, data transposition is used when a horizontal IDCT operation is performed after a vertical IDCT operation.

Next, FIG. 29 shows that data can be transposed by using the two networks and the register offset function. In this embodiment, the operation mode is set to modulo mode in order to perform transposition. FIG. 29(a) shows the initial state of the data: data x0 to x15 are stored in r0 to r3 of PE0 to PE3. To transpose, data at positions symmetric about the diagonal must be exchanged. That is, across the diagonal formed by x0, x5, x10, and x15, x4, x9, and x14 must be exchanged with x1, x6, and x11; x8 and x13 with x2 and x7; and x12 with x3. The hatched data indicate the registers read by the networks 30a and 30b. FIG. 29(b) shows the movement between processing elements of the data read by the network 30a and the network 30b. FIG. 29(c) shows the data written as a result of the movement in FIG. 29(b). FIG. 29(d) shows the movement between processing elements of the data read by the network 30a. FIG. 29(e) shows the data written by the network 30a. FIG. 29(f) shows the register offset values and network pattern values for moving the data from FIG. 29(a) to FIG. 29(c). FIG. 29(g) shows the register offset values and network pattern values for moving the data from FIG. 29(c) to FIG. 29(e).

In FIG. 29(a), the first step of the transposition exchanges x4, x9, x14, x3 with x1, x6, x11, x12. To do so, in accordance with the read register offset values shown in FIG. 29(f), the network 30a reads r1, r2, r3, r0 in order from PE0, and the network 30b reads r3, r0, r1, r2 in order from PE0. In FIG. 29(b), in accordance with the network pattern values shown in FIG. 29(f), the networks 30a and 30b each perform a different movement. Specifically, on the network 30a, x4 moves from PE0 to PE1 because it is to be exchanged with x1. Similarly, x9 moves to PE2, x14 to PE3, and x3 to PE0. The network 30b works in the same way: x1 moves from PE1 to PE0 because it is to be exchanged with x4, and similarly x6 moves to PE1, x11 to PE2, and x12 to PE3. In FIG. 29(c), in accordance with the write register offset values shown in FIG. 29(f), the network 30a writes to r3, r0, r1, r2 in order from PE0, and the network 30b writes to r1, r2, r3, r0 in order from PE0. As a result, x4, x9, x14, x3 and x1, x6, x11, x12 are exchanged.
To exchange the remaining data, x8, x13 and x2, x7, the data that the network 30a reads in the next transposition step is shown hatched. In accordance with the read register offset values shown in FIG. 29(g), the network 30a reads r2, r3, r0, r1 in order from PE0. In FIG. 29(d), the data is moved in accordance with the network pattern values shown in FIG. 29(g): x8 moves to PE2, x13 to PE3, x2 to PE0, and x7 to PE1. The network 30b does not operate at this time. In FIG. 29(e), the data is written in accordance with the write register offset values shown in FIG. 29(g). As a result of this write, the rows and columns of the data in FIG. 29(a) are transposed.
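The two-step exchange can be checked with a small model of the register file. This is a sketch under assumptions rather than a description of the hardware: `network_step`, `apply_writes`, and `transpose4` are hypothetical names, each network is modeled at whole-PE granularity, and the write offsets of the second step (r2, r3, r0, r1) are inferred from a write offset change of -1 modulo 4, because the text refers to FIG. 29(g) for them without listing the values.

```python
def network_step(regs, read_off, write_off, src_pe):
    """Model one network pass and return the pending writes.

    regs[pe][r] is register r of processing element pe.
    read_off[pe]  : register that PE pe reads.
    src_pe[pe]    : PE whose read data is delivered to PE pe.
    write_off[pe] : register that PE pe writes the delivered data to.
    """
    n = len(regs)
    read = [regs[pe][read_off[pe]] for pe in range(n)]
    return [(pe, write_off[pe], read[src_pe[pe]]) for pe in range(n)]


def apply_writes(regs, writes):
    for pe, reg, value in writes:
        regs[pe][reg] = value


def transpose4(regs):
    n = 4
    # Step 1, FIG. 29(a) to (c): both networks read before either writes.
    a = network_step(regs, [1, 2, 3, 0], [3, 0, 1, 2],
                     [(pe - 1) % n for pe in range(n)])   # x4: PE0 -> PE1, ...
    b = network_step(regs, [3, 0, 1, 2], [1, 2, 3, 0],
                     [(pe + 1) % n for pe in range(n)])   # x1: PE1 -> PE0, ...
    apply_writes(regs, a + b)
    # Step 2, FIG. 29(c) to (e): network 30a only.
    a = network_step(regs, [2, 3, 0, 1], [2, 3, 0, 1],    # write offsets assumed
                     [(pe - 2) % n for pe in range(n)])   # x8: PE0 -> PE2, ...
    apply_writes(regs, a)
    return regs


# Initial layout: register r of PE pe holds x(4*r + pe), stored here as an int.
regs = [[4 * r + pe for r in range(4)] for pe in range(4)]
transpose4(regs)
print(regs)
```

After both steps, register r of PE pe holds x(4*pe + r), which is the transposed layout; the diagonal elements are never moved.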

Next, a method of performing this transposition with the parallel arithmetic processor of the first embodiment is described with reference to FIG. 17. The format of the network transposition instruction is as follows.
ntrpd.a <dst>, <src>, <network pattern register>, <network pattern value change amount>
In the ntrpd.a instruction, “.a” indicates that, of the two networks, the network 30a is used; <dst> is the destination register; <src> is the source register; <network pattern register> is the network pattern register used for the data movement; and <network pattern value change amount> specifies the value to be subtracted from the network pattern value after the data movement. An example of an actual description of the ntrpd.a instruction is shown below, and its operation is explained.

ntrpd.a r0, r0, NPA, 0x2
In this embodiment, as shown in FIG. 29(f), the network 30a is assumed to have the values 01, 01, 02, 02, 03, 03, 00, 00 set in order from PE0H as the read register offset pattern, and the values 03, 03, 00, 00, 01, 01, 02, 02 set in order from PE0H as the write register offset pattern. The values 06, 07, 00, 01, 02, 03, 04, 05 are assumed to be set in order from PE0H as the network pattern values. Furthermore, “+1” is assumed to be set in the read register offset change amount register 80a, “-1” in the write register offset change amount register 80b, and “4” in the register offset modulo value register 81. The PE mode register 82 is assumed to be set to 4PE mode and the operation mode register 83 to modulo mode.

In this ntrpd.a instruction, the operand r0 is passed by the instruction decoder 11 to the register number conversion unit 20 as the register number to be read. In the register number conversion unit 20a, because the select processing unit 50 is in modulo mode, the register offset selection signal from the OR circuit 16 is always “1”, and the register number obtained by adding the register offset value to r0 is output to the register file 15. Because the read register offset values are 01, 01, 02, 02, 03, 03, 00, 00 in order from PE0H, as shown in FIG. 29(f), the registers read are r01, r01, r02, r02, r03, r03, r00, r00. The register file 15 outputs the data read from the specified registers.

At the same time, the value of the register offset change amount register 80a is added to the read register offset value, a modulo operation is performed in accordance with the register offset modulo value register 81, and the result is stored in the register offset value holding unit 28. The resulting change in the read register offset values is shown in FIG. 29(f) and FIG. 29(g). NPA is the operand that designates the network pattern register. The operand NPA indicates that the data selection by the network and the selection of the register offset selection signal are performed using the network pattern register of the select processing unit. The operand 0x2 is the network pattern value change amount: at the same time as the data is selected, 2 is subtracted from the value of NPA in every processing element. The resulting change in the network pattern values is shown in FIG. 29(f) and FIG. 29(g).

The data output as a result of the data selection by the network is stored in the register obtained by converting r0 in the register number conversion unit 20b. Because the write register offset values are 03, 03, 00, 00, 01, 01, 02, 02 in order from PE0H, as shown in FIG. 29(f), the registers written are r03, r03, r00, r00, r01, r01, r02, r02. The register file 15 writes the data to the specified registers.

At the same time, the value of the register offset change amount register 80b is subtracted from the write register offset value, a modulo operation is performed in accordance with the register offset modulo value register 81, and the result is stored in the register offset value holding unit 28. The resulting change in the write register offset values is shown in FIG. 29(f) and FIG. 29(g).
The network 30b does not need to update the register offset or the network pattern value, so the nsel.b instruction is issued for it. An example of an actual description of the nsel.b instruction is shown below.

nsel.b r0, r0, NPB
The nsel.b instruction operates in the same way as the nsel.a instruction described in the first embodiment.
FIG. 49 shows the instruction sequence for performing transposition. As shown in FIG. 49(a), issuing the ntrpd.a instruction on the network 30a and the nsel.b instruction on the network 30b performs the transposition step that changes FIG. 29(a) into FIG. 29(c).

Next, for the transposition step that changes FIG. 29(c) into FIG. 29(e), the register offset values and network pattern values have already been updated by the preceding ntrpd.a instruction, so the step can be performed by issuing the nsel.a instruction as shown in FIG. 49(b). Because the network 30b is not used at this point, no instruction needs to be issued for it. An example of an actual description of the nsel.a instruction is shown below.

nsel.a r0, r0, NPA
In this way, using the instruction sequences of FIGS. 49(a) and 49(b), the data can be transposed from the state of FIG. 29(a) to that of FIG. 29(e).
When a further transposition is to be performed, carrying out the above procedure in reverse allows transpositions to be performed in succession without rewriting the register offset values or the network pattern values.

In this case, the ntrpi.a instruction is issued on network A (the network 30a). The ntrpi.a instruction operates exactly like the ntrpd.a instruction except that it uses addition instead of subtraction when updating the read register offset, the write register offset, and the network pattern value.
As shown in FIG. 49(c), issuing the ntrpi.a instruction on the network 30a performs the transposition step from FIG. 29(e) to FIG. 29(c). Next, as shown in FIG. 49(d), issuing the nsel.a instruction on the network 30a and the nsel.b instruction on the network 30b performs the step from FIG. 29(c) to FIG. 29(a), which completes the transposition. When transpositions are performed in succession, they can be continued by issuing the instructions that perform the transposition in reverse, without rewriting the network pattern values or the register offset values.

As described above, according to this embodiment, transposition can be performed in the parallel processor, and processing that requires transposition, such as IDCT, can be performed efficiently.
As for the registers used, it is also possible to transpose among r4 to r7, or among r0, r2, r4, and r6.

In this embodiment, for simplicity of explanation, the procedure for a 4 × 4 transposition of 16-bit data has been described, but 8-bit data and 8 × 8 transposition can be handled in the same way. FIGS. 30 to 33 illustrate an 8 × 8 data transposition. FIG. 30(a) shows the initial state. Using the values shown in FIG. 30(d), data is read as shown in FIG. 30(b) and written as shown in FIG. 30(c). Next, using the values shown in FIG. 31(g), data is read as shown in FIG. 31(e) and written as shown in FIG. 31(f). Then, using the values shown in FIG. 32(j), data is read as shown in FIG. 32(h) and written as shown in FIG. 32(i). Finally, using the values shown in FIG. 33(m), data is read as shown in FIG. 33(k) and written as shown in FIG. 33(l). As a result of this write, the rows and columns of the data in FIG. 30(a) are transposed. FIG. 50 shows the instruction sequence for performing the 8 × 8 transposition.

(Sixth embodiment)
In the third and fourth embodiments, the network pattern value saturated to 0x00 when it was smaller than 0x00 and to 0x0f when it was larger than 0x0f. In this embodiment, saturation is determined by a preset maximum value and minimum value.
FIG. 34 is a configuration diagram of the select signal conversion unit 60 according to the sixth embodiment of the present invention. The maximum and minimum values are assumed to be held in control registers. FIG. 34 differs from FIG. 8 in that the maximum and minimum values are input to the saturation calculation unit 63.

When the PE mode signal indicates 8PE mode, the saturation calculation unit 63 uses as the select signal the lower 4 bits of the minimum value if the network pattern value is smaller than the input minimum value, the lower 4 bits of the maximum value if it is larger than the input maximum value, and the lower 4 bits of the network pattern value otherwise. The register offset selection signal is the fifth bit of the converted value. When the maximum and minimum values are within the range 0x00 to 0x0f, the register offset selection signal is “0”.

When the PE mode signal indicates 4PE mode, for PE0 to PE3 the select signal is the lower 3 bits of the minimum value if the network pattern value is smaller than the input minimum value, the lower 3 bits of the maximum value if it is larger than the input maximum value, and the lower 3 bits of the network pattern value otherwise. The register offset selection signal is the fourth bit of the converted network pattern value. When the maximum and minimum values are within the range 0x00 to 0x07, the register offset selection signal is “0”. The fourth bit of the select signal is always set to “0”. For PE4 to PE7, the select signal and the register offset selection signal are derived in the same way, except that the fourth bit of the select signal is always set to “1”.
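The rules for the saturation calculation unit 63 can be summarized in a short model. This is a sketch, not the circuit: the function names are hypothetical, and it assumes that network pattern values and the minimum and maximum are 8-bit two's-complement numbers, so that a minimum of 0xff is compared as -1. That interpretation is an assumption made here; it is not stated in the text.

```python
def to_signed8(value):
    """Interpret an 8-bit pattern as a two's-complement number (assumption)."""
    return value - 0x100 if value & 0x80 else value


def saturate(np_value, minimum, maximum, pe_mode=8, upper_half=False):
    """Return (select_signal, register_offset_selection_signal).

    pe_mode    : 8 for 8PE mode, 4 for 4PE mode.
    upper_half : in 4PE mode, True for PE4 to PE7 (fourth select bit = 1).
    """
    if to_signed8(np_value) < to_signed8(minimum):
        converted = minimum
    elif to_signed8(np_value) > to_signed8(maximum):
        converted = maximum
    else:
        converted = np_value
    converted &= 0xFF
    if pe_mode == 8:
        select = converted & 0x0F            # lower 4 bits
        reg_offset_sel = (converted >> 4) & 1  # fifth bit
    else:
        select = (converted & 0x07) | (0x08 if upper_half else 0x00)
        reg_offset_sel = (converted >> 3) & 1  # fourth bit
    return select, reg_offset_sel


print(saturate(0x0D, 0x00, 0x0C))              # 8PE mode, above the maximum
print(saturate(0x07, 0xFF, 0x06, pe_mode=4))   # 4PE mode, PE0 to PE3
```

Keeping the comparison and the bit extraction separate mirrors the text, which first converts the value and then takes the select and register offset selection bits from the converted result.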

Next, FIG. 35 shows the operation of the saturation calculation unit 63. FIGS. 35(a) to 35(c) outline the operation when the PE mode signal indicates 8PE mode. PE0 to PE7 in the figure are the processing element numbers, and H and L indicate that each processing element stores pixel values in halfword units. In FIGS. 35(a), 35(b), and 35(c), the first row shows the network pattern values, the second row shows the select signal output, and the third row shows the register offset selection signal output. The maximum value is 0x0c and the minimum value is 0x00.
In FIG. 35(a), where the network pattern value is in the range 0x00 to 0x0c, the select signal equals the network pattern value. In PE6L, PE7H, and PE7L the network pattern value is larger than 0x0c, so the select signal is 0x0c. All register offset selection signals are “0”.
In FIG. 35(b), where the network pattern value is in the range 0x00 to 0x0c, the select signal equals the network pattern value. In PE5L, PE6H, PE6L, PE7H, and PE7L the network pattern value is larger than 0x0c, so the select signal is 0x0c. All register offset selection signals are “0”.
In FIG. 35(c), where the network pattern value is in the range 0x00 to 0x0c, the behavior is the same as in FIG. 35(a). In PE0H and PE0L the network pattern value is smaller than 0x00, so the select signal is 0x00. In PE7L the network pattern value is larger than 0x0c, so the select signal is 0x0c. All register offset selection signals are “0”.

Next, FIGS. 36(a) to 36(c) outline the operation when the PE mode signal indicates 4PE mode. PE0 to PE7 in the figure are the processing element numbers, and H and L indicate that each processing element stores pixel values in halfword units. In FIGS. 36(a), 36(b), and 36(c), the first row shows the network pattern values, the second row shows the select signal output, and the third row shows the register offset selection signal output. The maximum value is 0x06 and the minimum value is 0xff.
In FIG. 36(a), where the network pattern value is in the range 0x00 to 0x06, the select signal for PE0 to PE3 equals the network pattern value. In PE3L the network pattern value is larger than 0x06, so the select signal is 0x06. All register offset selection signals are “0”. For PE4 to PE7, the select signal is the lower 3 bits of the network pattern value plus 8. Accordingly, in PE7L, where the network pattern value is larger than 0x06, the select signal is 0x0e. All register offset selection signals are “0”.
In FIG. 36(b), where the network pattern value is in the range 0x00 to 0x06, the select signal for PE0 to PE3 equals the network pattern value. In PE3H and PE3L the network pattern value is larger than 0x06, so the select signal is 0x06. All register offset selection signals are “0”. For PE4 to PE7, the network pattern value is larger than 0x06 in PE7H and PE7L, so the select signal there is 0x0e. All register offset selection signals are “0”.
In FIG. 36(c), where the network pattern value is in the range 0x00 to 0x06, the behavior is the same as in FIG. 36(a). For PE0 to PE3, the network pattern value in PE0H is smaller than 0xff, so the select signal is 0x07. The register offset selection signal is “1” in PE0H and PE0L. For PE4 to PE7, the network pattern value in PE4H is smaller than 0xff and “1” is placed in the fourth bit of the select signal, so the select signal is 0x0e. The register offset selection signal is “1” in PE4H and PE4L.

As described above, according to this embodiment, the data at the edge of the screen can be repeated even when the edge of the screen is not at PE0H or PE7L. In addition, FIR filter processing and the like can be applied to images of various sizes, which widens the range of application.
(Seventh embodiment)
This embodiment is an embodiment in which data is folded back like a mirror at the edge of the screen.

FIG. 34 is a block diagram of the select signal converter 60 according to the seventh embodiment of the present invention. In this embodiment, the operation of the saturation calculation unit 63 differs from that in the sixth embodiment.
When the PE mode signal indicates the 8PE mode and the network pattern value is smaller than the input minimum value, the saturation calculation unit 63 uses the value “minimum value − (minimum value − network pattern value) + 1” as the select signal. When the network pattern value is larger than the input maximum value, it uses the value “maximum value − (network pattern value − maximum value) + 1” as the select signal.

FIG. 37 shows the arrangement of data at the edge of the screen in this case, and FIG. 38 shows the operation of the saturation calculation unit 63. In the figures, PE0 to PE7 are processing element numbers, and H and L indicate that each processing element stores pixel values in half-word units. In FIG. 38, the first row shows the network pattern values, the second row shows the select value output, and the third row shows the register offset selection signal output. The maximum value is 0x0f and the minimum value is 0xf0. When the network pattern value is larger than 0x0f, the select signal is the value given by “maximum value − (network pattern value − maximum value) + 1”. Accordingly, the select value output in PE6H is 0x0f. The same applies to PE6L and PE7.
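The mirror folding in the 8PE mode can be sketched as below. The function names and the two's complement reading of the 8-bit values are assumptions. For the upper edge the sketch uses the expression given in the text. Taken literally, the lower-edge expression in the text simplifies to “network pattern value + 1”, which equals the minimum value only for the first out-of-range value; the sketch instead assumes the symmetric form “minimum value + (minimum value − network pattern value) − 1”, which agrees with the text for that first value. The mapping of the result onto the 16to1 selector and the register offset selection signal is not modeled.

```python
def to_signed8(value):
    """Interpret an 8-bit value as two's complement (assumption of this sketch)."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def mirror_select_8pe(pattern, max_value=0x0F, min_value=0xF0):
    """Fold an out-of-range network pattern value back into [min, max].

    Returns the folded value as an 8-bit encoding.
    """
    lo, hi = to_signed8(min_value), to_signed8(max_value)
    v = to_signed8(pattern)
    if v > hi:
        v = hi - (v - hi) + 1    # upper-edge expression from the text
    elif v < lo:
        v = lo + (lo - v) - 1    # assumed symmetric counterpart for the lower edge
    return v & 0xFF
```

With maximum 0x0f and minimum 0xf0, a pattern value of 0x10 folds to 0x0f, matching the select value stated for PE6H, and 0x11 folds to 0x0e.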

The same applies when the PE mode signal indicates the 4PE mode. That is, in PE0 to PE3, when the network pattern value is larger than 0x0f, the select signal is “maximum value − (network pattern value − maximum value) + 1”. In PE4 to PE7, the select signal is the lower 3 bits of the network pattern value plus 8. When the network pattern value is smaller than 0xf0, the select signal for PE0 to PE3 is “minimum value − (minimum value − network pattern value) + 1”. In PE4 to PE7, when the network pattern value is smaller than 0xf0, the select signal is “minimum value − (minimum value − network pattern value) + 1” plus 8.

As described above, according to this embodiment, data can be folded back like a mirror at the edge of the screen, which allows further improvement in image quality when performing FIR filter operations and the like.
Note that this embodiment uses even-symmetric folding, in which the edge data is repeated when folding back, but odd-symmetric folding, in which the edge data is not repeated, may be used instead.
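The difference between even-symmetric and odd-symmetric folding can be illustrated on plain array indices. This is a generic sketch rather than the circuit of this embodiment: `fold_index`, its parameters, and the single fold per edge are assumptions chosen for the illustration.

```python
def fold_index(index, lo, hi, repeat_edge=True):
    """Fold an index back into [lo, hi], mirroring once at either edge.

    repeat_edge=True  : even symmetry, the edge sample is repeated
    repeat_edge=False : odd symmetry, the edge sample is not repeated
    """
    extra = 1 if repeat_edge else 0
    if index > hi:
        return hi - (index - hi) + extra
    if index < lo:
        return lo + (lo - index) - extra
    return index


# Indices -2..5 folded onto a 4-sample row (valid indices 0..3).
even = [fold_index(i, 0, 3) for i in range(-2, 6)]
odd = [fold_index(i, 0, 3, repeat_edge=False) for i in range(-2, 6)]
```

Here `even` is `[1, 0, 0, 1, 2, 3, 3, 2]` and `odd` is `[2, 1, 0, 1, 2, 3, 2, 1]`: the even form reads the edge sample twice, while the odd form pivots on it.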
(Eighth embodiment)
FIG. 39 is a block diagram of the select processing unit 50 according to the eighth embodiment of the present invention. The difference between FIG. 39 and FIG. 6 is the select signal converter 90.

The select signal conversion unit 90 generates only the select signal of the 16to1 selector 51 from the network pattern value.
FIG. 40 is a block diagram of the select signal converter 90 according to the eighth embodiment of the present invention.
The modulo arithmetic unit 91 and the saturation arithmetic unit 92 generate a select signal from the network pattern value according to the PE mode signal. They are the same as in the first embodiment except that they do not output the register offset selection signal.

The selector A 93 selects the output of either the modulo arithmetic unit 91 or the saturation arithmetic unit 92 according to the operation mode signal, and outputs it as the select signal.
FIG. 41 is a diagram showing the connection between the register number conversion unit 20 and the register offset pattern register according to the eighth embodiment of the present invention. In the present embodiment, the parallel arithmetic processor further includes a register offset pattern register 102 and a rotation arithmetic unit 103.
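As a rough illustration of how the selector A 93 chooses between the two units, the sketch below models the 8PE mode only. The function names, the string mode labels, the reduction to the lower 4 bits in the modulo mode, and the clamp range 0 to 15 (the sixteen half-word slots PE0H to PE7L) in the saturation mode are assumptions of the sketch, not details taken from FIG. 40.

```python
def to_signed8(value):
    """Interpret an 8-bit value as two's complement (assumption of this sketch)."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def select_signal_8pe(pattern, operation_mode):
    """Model selector A: pick the modulo or the saturation result.

    operation_mode: "modulo" or "saturation" (hypothetical labels)
    Only the select signal is produced; no register offset
    selection signal is generated here.
    """
    v = to_signed8(pattern)
    modulo_result = v & 0x0F                  # wrap around the 16 slots
    saturation_result = max(0, min(15, v))    # clamp to the slot range
    if operation_mode == "modulo":
        return modulo_result
    if operation_mode == "saturation":
        return saturation_result
    raise ValueError(f"unknown operation mode: {operation_mode}")
```

Under these assumptions a pattern value of 0x12 selects slot 2 in the modulo mode and slot 15 in the saturation mode.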

The register offset pattern register 102 is a register that stores a value indicating whether or not the register number conversion unit 20 of each processing element uses the register offset. That is, each bit of the register offset pattern register 102 serves as the register offset selection signal for the register number conversion unit 20 of the corresponding processing element. The register offset pattern register 102 can be written by a program, and the result calculated by the rotation calculation unit 103 is written to it as necessary.

The rotation calculation unit 103 receives the change amount, the PE mode signal, and the rotate mode signal, and changes its operation according to these inputs.
The rotate mode signal designates rotate or inverted rotate. In the rotate mode, when the PE mode signal indicates the 8PE mode, a rotate operation is performed from the MSB to the LSB or from the LSB to the MSB by the number of bits specified by the change amount, as shown in FIG. 42(a), and the result is written to the register offset pattern register 102. When the PE mode signal indicates the 4PE mode, a rotate operation is performed between bit 15 and bit 8 and between bit 7 and bit 0 by the number of bits specified by the change amount, as shown in FIG. 42(b), and the result is written to the register offset pattern register 102.

In the inverted rotate mode, when the PE mode signal indicates the 8PE mode, a rotate operation is performed from the MSB to the LSB or from the LSB to the MSB by the number of bits specified by the change amount, as shown in FIG. 42(c); the bits that are rotated around are inverted, and the result is written to the register offset pattern register 102. When the PE mode signal indicates the 4PE mode, a rotate operation is performed between bit 15 and bit 8 and between bit 7 and bit 0 by the number of bits specified by the change amount, as shown in FIG. 42(d); the bits that are rotated around are inverted, and the result is written to the register offset pattern register 102.
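The rotate and inverted rotate operations on the 16-bit register offset pattern can be sketched as follows. The function name and parameters are assumptions, and the sketch covers only rotation toward the LSB (the bit leaving bit 0 re-enters at the top of its group) with a non-negative change amount; the opposite direction is not modeled. This direction was chosen because it reproduces the values 0x8000 and 0xc000 that this embodiment gives for successive inverted rotates of 0x0000 by one bit.

```python
def rotate_pattern(value, amount, pe_mode="8PE", invert=False):
    """Rotate a 16-bit register offset pattern toward the LSB.

    pe_mode "8PE": one 16-bit group (bits 15..0)
    pe_mode "4PE": two 8-bit groups (bits 15..8 and bits 7..0)
    invert=True  : bits that wrap around are inverted (inverted rotate)
    """
    width = 16 if pe_mode == "8PE" else 8
    mask = (1 << width) - 1
    if width == 16:
        groups = [value & 0xFFFF]
    else:
        groups = [(value >> 8) & 0xFF, value & 0xFF]
    result = 0
    for group in groups:
        for _ in range(amount):
            wrapped = group & 1                  # bit leaving at the LSB end
            if invert:
                wrapped ^= 1                     # inverted rotate flips it
            group = (group >> 1) | (wrapped << (width - 1))
        result = (result << width) | (group & mask)
    return result
```

For example, `rotate_pattern(0x0000, 1, invert=True)` gives 0x8000, applying it again gives 0xc000, and rotating 0xffff by 2 in the 4PE mode leaves it unchanged.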

Here, the change amount is the same as the amount by which the network pattern value is moved by a network shift instruction or a network transposition instruction.
Next, it is shown that the nsel.a instruction described in the first embodiment can also be realized in this embodiment. FIG. 43 shows the same example as FIG. 18 of the first embodiment. The PE operation mode is the 8PE mode, the operation mode is the modulo mode, and the rotate mode is rotate. In FIG. 43, setting the register offset pattern register 102 to 0x0000 gives the same operation. In FIG. 44, setting the register offset pattern register 102 to 0x5555 causes only the processing elements whose bit is “1” to read from register r1 using the register offset, which gives the same operation.
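To make the bit-per-slot meaning of the register offset pattern register 102 concrete, the sketch below lists the half-word slots whose bit is set. The names `SLOTS` and `slots_using_offset` are hypothetical, and the bit order (bit 15 for PE0H down to bit 0 for PE7L) is an assumption inferred from the nsfti.a example in this embodiment, where the value 0x8000 corresponds to PE0H and 0xc000 to PE0H and PE0L.

```python
# Half-word slots in order PE0H, PE0L, PE1H, ..., PE7L.
SLOTS = [f"PE{n}{half}" for n in range(8) for half in "HL"]


def slots_using_offset(pattern):
    """Return the slots whose register offset selection bit is 1.

    Assumes bit 15 maps to PE0H and bit 0 maps to PE7L.
    """
    return [name for i, name in enumerate(SLOTS) if (pattern >> (15 - i)) & 1]
```

Under this assumed bit order, 0x0000 selects no slot, 0x5555 selects the L half of every processing element, and 0xffff selects all sixteen slots.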

Next, it is shown that the nsfti.a instruction described in the second embodiment can also be realized in this embodiment. FIG. 45 shows the same example as FIG. 22 of the second embodiment. The PE operation mode is the 8PE mode, the operation mode is the modulo mode, and the rotate mode is the inverted rotate mode. In FIG. 45(a), setting the register offset pattern register 102 to 0x0000 gives the same operation. In FIG. 45(b), the nsfti.a instruction of FIG. 45(a) has inverted-rotated the register offset pattern register 102 by the shift amount 0x01, so that its value is 0x8000 and PE0H reads data from register r2 using the register offset; the operation is therefore the same as in FIG. 22(b). In FIG. 45(c), the nsfti.a instruction of FIG. 45(b) has inverted-rotated the register offset pattern register 102 by the shift amount 0x01 again, so that its value is 0xc000 and PE0H and PE0L read data from register r2 using the register offset; the operation is therefore the same as in FIG. 22(c).

Next, it is shown that the ntrpd.a instruction described in the fifth embodiment can also be realized in this embodiment. FIG. 46 shows the same example as FIG. 29 of the fifth embodiment. The PE operation mode is the 4PE mode, the operation mode is the modulo mode, and the rotate mode is the rotate mode. In FIG. 46(f), setting the register offset pattern register 102 to 0xffff makes all processing elements use the register offset, which gives the same operation. In FIG. 46(g), the ntrpd.a instruction of FIG. 46(f) has rotated the register offset pattern register 102 by the shift amount 0x02; its value does not change and remains 0xffff, so all processing elements use the register offset, which gives the same operation.

As described above, according to this embodiment, the number of signal lines for the register offset selection signal that were required in the first embodiment is greatly reduced, which reduces the circuit scale. In addition, because fewer signal lines connect the processing elements, laying out the functional blocks during LSI design becomes easier.
(Other variations)
The parallel arithmetic processor according to the present invention has been described based on the embodiment, but the present invention is not limited to the above embodiment.

(1) In the above embodiments, the number of processing elements is 8, the number of registers is 16, the register bit width is 16 bits, and the network moves data in units of 8 bits. However, the invention is not limited to these numbers and bit widths, and they may be chosen freely according to the application.
(2) In the above embodiment, a network pattern register is used as the network pattern value. However, a plurality of registers corresponding to the network pattern register may be provided, or a general-purpose register may be used as the network pattern register.

(3) In the above embodiments, the PE mode and the operation mode are set by the PE mode register 82 and the operation mode register 83, respectively, which are control registers, but these modes may instead be switched by an instruction.
(4) In the above embodiment, the PE mode is switched between the 8PE mode and the 4PE mode, but a 2PE mode or a 1PE mode may be provided.

(5) In the above embodiments, each processing element has its own independent arithmetic unit. However, when SIMD operations are performed by dividing an arithmetic unit with a long bit length, each of the divided arithmetic units may be treated as a processing element.
(6) In the above embodiment, the change amount of the network pattern value is specified in the operand, but it may be specified by a general-purpose register or a dedicated register.

(7) Although the above embodiments use a parallel arithmetic processor, the present invention may be a method for realizing the above invention.
(8) Although the above embodiments use a parallel arithmetic processor, the present invention may be a signal processing device equipped with the parallel arithmetic processor. Examples of the signal processing device are an image device, a communication device, and an audio device.

The internal configuration of the parallel arithmetic processor according to the present invention is disclosed in the above embodiments, and the processor can be mass-produced based on this internal configuration, so it is by its nature industrially usable. The parallel arithmetic processor according to the present invention therefore has industrial applicability. It is useful as a processor for image processing, and it can also be applied to other uses that process large amounts of data, such as communication processing.

FIG. 1: Overall configuration diagram of the parallel arithmetic processor according to the present invention
FIG. 2: Internal configuration diagram of the processing element PE0
FIG. 3: Diagram showing an example of the configuration of the arithmetic unit 40
FIG. 4: Diagram showing the configuration of the register number conversion unit 20
FIG. 5: Configuration diagram of the network 30a connecting the processing elements
FIG. 6: Configuration diagram showing part of the network 30a, with the configuration of the select processing unit 50 of FIG. 5 shown in detail
FIG. 7: Configuration diagram of the select processing unit 50 of the network 30a
FIG. 8: Configuration diagram of the select signal converter 60
FIG. 9: Operation outline of the register offset selection calculation unit when the PE mode signal indicates the 8PE mode
FIG. 10: Operation outline of the register offset selection calculation unit when the PE mode signal indicates the 4PE mode
FIG. 11: Operation outline of the modulo calculation unit when the PE mode signal indicates the 8PE mode
FIG. 12: Operation outline of the modulo calculation unit when the PE mode signal indicates the 4PE mode
FIG. 13: Operation outline of the saturation calculation unit when the PE mode signal indicates the 8PE mode
FIG. 14: Operation outline of the saturation calculation unit when the PE mode signal indicates the 4PE mode
FIG. 15: Circuit diagram of FIG. 8
FIG. 16: Configuration diagram showing the connection between the select processing unit 50 and the register number conversion unit 20
FIG. 17: Configuration diagram of the instruction decoder 11 and the processing element PE0H
FIG. 18: Operation explanatory diagram of the network select instruction
FIG. 19: Operation explanatory diagram of the network select instruction
FIG. 20: Operation explanatory diagram of the network shift instruction (nsfti.a instruction)
FIG. 21: Explanatory diagram of the FIR filter calculation method
FIG. 22: Operation explanatory diagram of the network shift instruction (nsfti.a instruction)
FIG. 23: Operation explanatory diagram of the network shift instruction (nsftd.b instruction)
FIG. 24: Explanatory diagram of the FIR filter calculation method at the edge of the screen
FIG. 25: Operation explanatory diagram of the network shift instruction (nsfti.a instruction)
FIG. 26: Configuration diagram of the select processing unit 50a in the 16-bit operation mode
FIG. 27: Operation explanatory diagram of the saturation mode in the 16-bit operation mode
FIG. 28: Operation explanatory diagram of the 16-bit operation mode when an immediate value is used as the network pattern value
FIG. 29: Explanatory diagram of 4x4 data transposition
FIG. 30: Explanatory diagram of 8x8 data transposition
FIG. 31: Explanatory diagram of 8x8 data transposition
FIG. 32: Explanatory diagram of 8x8 data transposition
FIG. 33: Explanatory diagram of 8x8 data transposition
FIG. 34: Configuration diagram of the select signal converter
FIG. 35: Operation explanatory diagram of the saturation calculation unit when the PE mode signal indicates the 8PE mode
FIG. 36: Operation explanatory diagram of the saturation calculation unit when the PE mode signal indicates the 4PE mode
FIG. 37: Diagram showing the arrangement of data at the edge of the screen
FIG. 38: Operation explanatory diagram of the saturation calculation unit 63 when the PE mode signal indicates the 8PE mode
FIG. 39: Configuration diagram of the select processing unit
FIG. 40: Configuration diagram of the select signal conversion unit
FIG. 41: Connection diagram of the register number conversion unit and the register offset pattern register
FIG. 42: Operation explanatory diagram of the rotation calculation unit
FIG. 43: Operation explanatory diagram of the network select instruction
FIG. 44: Operation explanatory diagram of the network select instruction
FIG. 45: Operation explanatory diagram of the network shift instruction
FIG. 46: Operation explanatory diagram of data transposition
FIG. 47: Instruction sequence diagram
FIG. 48: Instruction sequence diagram for supplying data for the FIR operation
FIG. 49: Instruction sequence diagram for performing a 4x4 transposition
FIG. 50: Instruction sequence diagram for performing an 8x8 transposition

Explanation of symbols

10 Instruction memory
11 Instruction decoder
12 Overall control unit
13 Data memory
14 Processing element group
15 Register file
16 OR circuit
20 Register number conversion unit
20a Register number conversion unit for reading
20b Register number conversion unit for writing
21 Register offset value holding unit
22 Adder/subtractor
23 Modulo operation unit
24 Adder/subtractor
25 Selector
30 Network
40 Arithmetic unit
41a Arithmetic logic unit (ALU)
41b Arithmetic logic unit (ALU)
42 Barrel shifter
43 Multiplier
50 Select processing unit
50a Select processing unit
51 16to1 selector
51a 16to1 selector
52 Selector
52a Selector
53 Network pattern register
53a Network pattern register
54 Adder/subtractor
54a Adder/subtractor
60 Select signal converter
60a Select signal converter
61 Register offset selection calculation unit
62 Modulo calculation unit
63 Saturation calculation unit
64 Selector A
65 Selector B
66 Demultiplexer
80 Register offset change amount register
81 Register offset modulo value register
82 PE mode register
83 Operation mode register
90 Select signal conversion unit
91 Modulo arithmetic unit
92 Saturation arithmetic unit
93 Selector A
102 Register offset pattern register
103 Rotation calculation unit

Claims

A processor comprising a plurality of processing elements,
the processor having a decoder that decodes instructions,
wherein each processing element comprises:
transfer pattern storage means for storing a value indicating a transfer pattern, the transfer pattern indicating from which processing element data is transferred to the processing element;
transfer means for executing data transfer with a processing element determined based on the transfer pattern; and
updating means for updating the value of the transfer pattern storage means in accordance with the result of decoding the immediately preceding instruction by the decoder.

The processor according to claim 1, wherein the value indicating the transfer pattern is a number representing a processing element serving as a transfer source.

The processor according to claim 2, wherein
each processing element further comprises a register set consisting of a plurality of registers, and
the data is a value stored in a register of the register set.

The processor according to claim 3, wherein
the register set outputs the data stored in one of the registers based on a predetermined offset signal,
the update by the updating means includes an arithmetic operation on the value indicating the transfer pattern, and
the offset signal changes based on a carry or a borrow accompanying the operation on the value indicating the transfer pattern.

The processor according to claim 2, wherein
the updating means includes arithmetic operation means and saturation calculation means,
the saturation calculation means determines whether or not the result of the arithmetic operation by the arithmetic operation means is out of a predetermined range and, when it is out of the predetermined range, performs a saturation operation on the value indicating the transfer pattern, and
the value indicating the transfer pattern updated by the updating means is one of the result of the arithmetic operation and the result of the saturation operation.

The processor according to claim 5, wherein
the updating means includes output means, and
in the saturation operation, a first value is output as the saturated value if the value indicating the transfer pattern is larger than the predetermined range, and a second value is output as the saturated value if it is smaller than the predetermined range.

The processor according to claim 6, wherein the predetermined range is the range of processing element numbers, the first value is the maximum processing element number, and the second value is the minimum processing element number.

The processor according to claim 6, wherein
the saturation calculation means further accepts input of a predetermined maximum value and a predetermined minimum value in the saturation operation, the predetermined range is the range indicated by the input maximum and minimum values, the first value is the predetermined maximum value, and the second value is the predetermined minimum value.

The saturation calculation means further includes
In the saturation calculation, input of a predetermined maximum value and minimum value is accepted, and the predetermined range is a range indicated by the input predetermined maximum value and minimum value, and the first value and 7 is a value obtained from the first equation, and the second value is a value obtained from the second equation.

The processor according to claim 6, wherein
each register of the register set stores two items of byte-sized data,
the saturation calculation means determines, for each of the upper byte and the lower byte, whether or not the result of the arithmetic operation by the arithmetic operation means is out of the predetermined range, and executes the saturation operation on the upper byte data and the lower byte data at the same time, and
the output means outputs two different saturation values.

The processor according to claim 3, wherein
the updating means includes arithmetic operation means and modulo operation means for performing a modulo operation,
the modulo operation means determines whether or not the result of the arithmetic operation by the arithmetic operation means is out of the range of the value indicating the transfer pattern and, when it is out of the range, performs a modulo operation on the value indicating the transfer pattern, and
the value indicating the transfer pattern updated by the updating means is one of the result of the arithmetic operation and the result of the modulo operation.

The processor according to claim 11, wherein
each processing element further comprises:
a first changing unit that changes a read offset value by performing a modulo operation on the read offset value; and
a second changing unit that changes a write offset value by performing a modulo operation on the write offset value, and
the register set reads the data stored in one of the registers based on the read offset value, and writes data to one of the registers based on the write offset value.

The processor according to claim 2, wherein the decoding process and the update process of the transfer pattern storage means are performed within the one cycle in which the decoding process is performed.

The processor according to claim 2, wherein
the instruction includes an operand that specifies a transfer pattern change amount, and
the processor further comprises:
register offset storage means for storing a value indicating whether or not to use the register offset value; and
rotate calculation means for performing a rotate operation, or a rotate operation with bit inversion, by the same number of bits as the transfer pattern change amount specified in the instruction, and storing the result in the register offset storage means.

A method used in a processor comprising a plurality of processing elements, wherein
the processor has a decoder that decodes instructions,
each processing element has transfer pattern storage means for storing a value indicating a transfer pattern, the transfer pattern indicating from which processing element data is transferred to the processing element, and
the method comprises:
a transfer step of performing data transfer with a processing element determined based on the transfer pattern; and
an update step of updating the value of the transfer pattern storage means according to the result of decoding the immediately preceding instruction by the decoder.

A signal processing apparatus equipped with the processor according to claim 1.