JP5519951B2

JP5519951B2 - Array processor

Info

Publication number: JP5519951B2
Application number: JP2009094620A
Authority: JP
Inventors: セドゥーキンスタニスラフ; 敏明宮崎; 研一黒田
Original assignee: University of Aizu
Current assignee: University of Aizu
Priority date: 2008-05-01
Filing date: 2009-04-09
Publication date: 2014-06-11
Anticipated expiration: 2029-04-09
Also published as: JP2009289256A

Description

The present invention relates to an array processor. More specifically, it relates to an array processor in which processing elements that perform product-sum operations are arranged along each axial direction to form a conceptual three-dimensional arrangement, and in which the input terminals of each processing element are connected to the output terminals of the adjacent processing elements in a torus shape along the respective axial directions.

Two-dimensional orthogonal transformation has long been known as a method for converting an image from spatial coordinates to frequency coordinates. It is very widely used in today's digital technology, for example in image compression techniques such as JPEG and in video compression techniques. In addition, three-dimensional orthogonal transformation, which takes the time axis into account and converts three-dimensional data into frequency coordinates, has long been considered for application to video compression.

Formula (1) below is the expression used for general three-dimensional discrete orthogonal transform processing.

Here, n_1, n_2, and n_3 are integers from 0 to n−1 (that is, 0 ≤ n_1 ≤ n−1, 0 ≤ n_2 ≤ n−1, 0 ≤ n_3 ≤ n−1), and k_1, k_2, and k_3 are integers from 0 to n−1 (that is, 0 ≤ k_1 ≤ n−1, 0 ≤ k_2 ≤ n−1, 0 ≤ k_3 ≤ n−1). C(n_1, k_1), C(n_2, k_2), and C(n_3, k_3) denote two-dimensional coefficient matrices of size n × n, X(n_1, n_2, n_3) denotes the three-dimensional input data matrix of size n × n × n, and Y(k_1, k_2, k_3) is the matrix of size n × n × n representing the data after the three-dimensional orthogonal transformation.
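With the symbols defined above, a separable three-dimensional transform of this kind is conventionally written as a triple sum. The expression below is a reconstruction from those definitions, offered as an assumption about the form of formula (1) rather than a copy of the published formula.

```latex
Y(k_1,k_2,k_3) \;=\; \sum_{n_1=0}^{n-1}\,\sum_{n_2=0}^{n-1}\,\sum_{n_3=0}^{n-1}
  X(n_1,n_2,n_3)\; C(n_1,k_1)\; C(n_2,k_2)\; C(n_3,k_3)
```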

By changing the specific values stored in C(n_1, k_1), C(n_2, k_2), and C(n_3, k_3) in formula (1), various orthogonal transform schemes can be executed: in addition to the discrete cosine transform (DCT) adopted in JPEG, for example, the Walsh-Hadamard transform (WHT), the discrete Fourier transform (DFT), and the discrete sine transform (DST).

When such three-dimensional discrete orthogonal transform processing is computed on a computer, the coefficients C and the input data X stored in memory must be accessed repeatedly in a scattered, piecemeal fashion, and the resulting enormous number of data accesses makes it difficult to speed up the processing. To avoid this problem, a method has been proposed in which three dedicated one-dimensional discrete orthogonal transform circuits are connected together to realize three-dimensional discrete orthogonal transform processing (see, for example, Patent Document 1).

U.S. Pat. No. 5,126,962

The method described in Patent Document 1 corresponds to advancing the calculation in sequence while varying only one of the variables n_1, n_2, n_3 of formula (1) at a time. With this method, in order to perform the discrete orthogonal transform along the next axis after the transform along one axis has finished, the order in which intermediate data and coefficient data are fed into the circuit must be adjusted before the data are multiplied together. The method of Patent Document 1 therefore requires a dedicated circuit for adjusting that input order (see, for example, FIG. 5 of Patent Document 1). Providing such a dedicated data-adjustment circuit makes it possible to improve the processing speed of the discrete orthogonal transform, but it also complicates the circuit configuration of the array processor.
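The idea of varying one index at a time can be illustrated with a small Python sketch. It assumes the triple-sum form Y(k_1,k_2,k_3) = ΣΣΣ X(n_1,n_2,n_3) C(n_1,k_1) C(n_2,k_2) C(n_3,k_3) and checks that three successive one-dimensional passes give the same result as the direct sum. The function names and the example matrix are hypothetical and are not taken from Patent Document 1.

```python
def transform_direct(X, C):
    """Direct evaluation of the assumed triple sum; C is indexed as C[n_idx][k_idx]."""
    n = len(X)
    rng = range(n)
    return [[[sum(X[a][b][c] * C[a][k1] * C[b][k2] * C[c][k3]
                  for a in rng for b in rng for c in rng)
              for k3 in rng] for k2 in rng] for k1 in rng]


def transform_by_axis(X, C):
    """Same transform computed as three 1D passes, one index at a time."""
    n = len(X)
    rng = range(n)
    # Pass 1 sums over n_1, giving T1[k1][n2][n3].
    T1 = [[[sum(X[a][b][c] * C[a][k1] for a in rng) for c in rng]
           for b in rng] for k1 in rng]
    # Pass 2 sums over n_2, giving T2[k1][k2][n3].
    T2 = [[[sum(T1[k1][b][c] * C[b][k2] for b in rng) for c in rng]
           for k2 in rng] for k1 in rng]
    # Pass 3 sums over n_3, giving Y[k1][k2][k3].
    return [[[sum(T2[k1][k2][c] * C[c][k3] for c in rng) for k3 in rng]
             for k2 in rng] for k1 in rng]


# Illustrative 2 x 2 x 2 input and an unnormalised 2 x 2 Hadamard-style matrix.
X = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
C = [[1, 1], [1, -1]]
Y_direct = transform_direct(X, C)
Y_passes = transform_by_axis(X, C)
```

Between passes the intermediate array is read along a different index than the one it was written along, which is the reordering that a hardware pipeline of one-dimensional circuits has to handle.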

A method of executing the three-dimensional discrete orthogonal transform with an array processor composed of a plurality of processing elements has also been considered. With such a method, however, the data and coefficient-matrix elements held inside the processing elements during the computation of each dimension must be exchanged many times over a complicated wiring structure that interconnects the processing elements.

Moreover, because the exchange of matrix elements is not limited to adjacent processing elements, when the wiring between processing elements is constrained the data must be exchanged by way of several intermediate processing elements, which increases the processing load of the computation.

Furthermore, the same problems arise in the inverse of the three-dimensional discrete orthogonal transform (the three-dimensional inverse discrete orthogonal transform) shown in formula (2).

The present invention has been made in view of the above problems. Its object is to provide an array processor that can execute three-dimensional orthogonal transformation and three-dimensional inverse orthogonal transformation quickly, without exchanging data in the middle of the computation and without installing a dedicated circuit for such an exchange.

To solve the above problems, the array processor according to the present invention forms a conceptual three-dimensional arrangement by arranging n processing elements, each having a product-sum operation function, along each of three axial directions. Each processing element is provided with three pairs of input and output terminals, one pair for each axial direction. The input terminal for a given axial direction of one processing element is connected to the output terminal for that axial direction of the other processing element adjacent to it in that direction, so that the three pairs of input and output terminals of every processing element are connected in a torus shape along the respective axial directions.

Each processing element outputs the result of a product-sum operation performed by its product-sum operation function from the output terminal corresponding to one axial direction to the processing element adjacent in that direction, and outputs the operation data used in that product-sum operation from the output terminal corresponding to another axial direction to the processing element adjacent in that other direction. A processing element that has received an operation result and operation data from processing elements adjacent in different axial directions performs a product-sum operation using them, and outputs the resulting operation result and the operation data, each from the output terminal corresponding to the input terminal at which it was received, to the processing elements adjacent in the respective axial directions. In this way, all the processing elements connected in a torus along one axial direction execute the n product-sum operations of the first cycle in synchronization with one another.

After the n product-sum operations of the first cycle, each processing element changes the axial direction of the output terminal from which the operation result is output and, correspondingly, the axial direction of the output terminal from which the operation data is output, and the n product-sum operations of the second cycle are executed in synchronization. After the n product-sum operations of the second cycle, each processing element changes the axial direction of the output terminal for the operation result to one different from those of the first and second cycles, correspondingly changes the axial direction of the output terminal for the operation data to one different from those of the first and second cycles, and the n product-sum operations of the third cycle are executed in synchronization. The three-dimensional orthogonal transformation is thereby carried out.

Thus, in the array processor according to the present invention, the input and output terminals of the processing elements are connected in a torus shape along the axial directions. Each processing element can therefore perform its own computation while the operation results of the product-sum processing and the operation data used in it are passed successively (relayed) to the adjacent processing elements in different axial directions. Accordingly, by performing n operations in the n processing elements arranged along an axial direction while passing the operation results along, n product-sum operations can be carried out along that axial direction.

Orthogonal transform processing requires the input order of operation results, operation data, and the like to be adjusted so that the right data are multiplied together. Here, the data input order can be changed easily by changing the axial direction of transfer after the first cycle of operations has been performed. Unlike conventional array processors, it is therefore unnecessary to provide a dedicated circuit for adjusting the data input order, and the orthogonal transform can be carried out without directly exchanging data.

In particular, in the array processor according to the present invention, a conceptual three-dimensional arrangement is formed by n processing elements arranged along each of three axial directions, and each processing element has three pairs of input/output terminals associated with the axial directions, so n product-sum operations can be performed in each of three cycles. The three-dimensional orthogonal transformation can therefore be performed quickly, reliably, and simply, without directly performing the data exchange described above.
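The relay of partial results around one torus-connected row of n processing elements can be sketched in Python as follows. This is a one-dimensional simplification under assumptions not stated in the source: the names `ring_product_sum`, `acc`, and `tag` are hypothetical, and each simulated element looks up its coefficient by a tag, whereas in the array described here the operation data arrives from a neighbor on a different axis.

```python
def ring_product_sum(x, C):
    """Compute y[k] = sum over m of x[m] * C[m][k] on a simulated ring of n elements.

    Element p permanently holds operand x[p]. Partial sums circulate around
    the ring, and every element performs one multiply-add per step.
    """
    n = len(x)
    acc = [0.0] * n          # partial sum currently held by each element
    tag = list(range(n))     # output index k that each partial sum belongs to
    for _ in range(n):
        # All elements perform one product-sum step in lockstep.
        acc = [acc[p] + x[p] * C[p][tag[p]] for p in range(n)]
        # Each element passes its partial sum to its ring neighbor (p - 1) mod n,
        # so element p now holds what element (p + 1) mod n held before.
        acc = [acc[(p + 1) % n] for p in range(n)]
        tag = [tag[(p + 1) % n] for p in range(n)]
    # After n steps every partial sum has visited every element exactly once.
    y = [0.0] * n
    for p in range(n):
        y[tag[p]] = acc[p]
    return y


result = ring_product_sum([1, 2, 3], [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

After n steps each partial sum has passed through all n elements, so it has accumulated all n products without any element ever talking to a non-adjacent one.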

In the array processor according to the present invention, the processing elements need only form a conceptual three-dimensional arrangement; a physically three-dimensional structure is not required. For example, by defining three axes on a plane, the conceptual three-dimensional arrangement may be realized in what is physically a planar layout.

In the above array processor, each processing element may have: one operand-value storage means for storing an operand value used in the product-sum operation; three input-information storage means for storing the operation result or the operation data input via the input terminals; three constant-value storage means for storing constant values determined according to the calculation method of the product-sum operation function; arithmetic processing means for performing the product-sum operation using any of the operation result, the operation data, the operand value, and the constant values; input switch means for routing information input from any of the three input terminals to either the input-information storage means or the operand-value storage means; output switch means for outputting the operation data and the operation result of the product-sum operation performed by the arithmetic processing means, each from one of the three output terminals; selector means for reading three data items from among the operand-value storage means, the input-information storage means, and the constant-value storage means and routing them to the arithmetic processing means; and control means for controlling the input switch means, the output switch means, and the selector means.

When an operation result is input via an input terminal, the control means controls the input switch means to route that operation result to one of the input-information storage means, and controls the selector means to route the operation result read from that storage means to the arithmetic processing means. If n operations have not yet been performed in the current cycle, the control means controls the output switch means so that the result of the product-sum operation performed by the arithmetic processing means is output from the output terminal of the axial direction corresponding to the input terminal at which the operation result was input. When the n-th operation of the cycle has been performed, the control means controls the output switch means so that the result is output from an output terminal of an axial direction different from that of the input terminal at which the operation result was input.

By providing each processing element with input-information storage means for recording data acquired via the input terminals and operand-value storage means for storing the operand value, these storage means can be used, respectively, as storage for data whose contents change during the n product-sum operations (the input-information storage means) and as storage for data that does not change within a cycle but changes from one cycle to the next. The input switch means and the selector means can thus be controlled by the control means so that the appropriate data is stored (saved) in the storage means according to the stage of processing and is used for the product-sum operation at the appropriate time. As a result, the data can in effect be exchanged without a dedicated circuit for adjusting the data input order, and the orthogonal transform can be executed quickly and easily.

Furthermore, since constant-value storage means are provided for storing constant values determined according to the calculation method, various kinds of three-dimensional orthogonal transforms can be executed by changing these constant values to suit the method. By changing the constant values appropriately it is possible to perform, for example, the Walsh-Hadamard transform, the discrete Fourier transform, or the discrete sine transform, in addition to the discrete cosine transform described in the embodiments below.

Moreover, because the control means controls the input switch means, the selector means, and the output switch means as appropriate for the product-sum processing, the n product-sum operations in each of the three cycles can be performed properly. The three-dimensional orthogonal transform can therefore be performed quickly, reliably, and simply, without directly adjusting the data input order as in the conventional art.

According to the array processor of the present invention, the input and output terminals of the processing elements are connected in a torus shape along the axial directions, so each processing element can perform its own computation while the operation results of the product-sum processing and the operation data used in it are passed successively (relayed) to the adjacent processing elements in different axial directions. Accordingly, by performing n operations in the n processing elements arranged along an axial direction while passing the operation results along, n product-sum operations can be carried out along that axial direction.

Orthogonal transform processing requires the input order of operation results, operation data, and the like to be adjusted so that the right data are multiplied together. Here, by changing the axial direction of transfer after the first cycle of operations has been performed, the data input order can be changed easily in the second and third cycles. Unlike conventional array processors, it is therefore unnecessary to provide a dedicated circuit for adjusting the data input order, and the orthogonal transform can be carried out without directly exchanging data.

FIG. 1 is a diagram schematically showing the general configuration of the array processor according to Embodiments 1 and 2.
FIG. 2 is a block diagram showing the general configuration of the processing element according to Embodiment 1.
FIG. 3 is a table showing the initial values of registers R0 to R6 and the contents recorded at each processing step when three-dimensional discrete cosine transform processing is performed using the array processor according to Embodiment 1.
FIG. 4 shows, with arrows keyed to the processing steps, how the contents of each register change in the table of FIG. 3.
FIG. 5 is a table showing the initial values of registers R0 to R6 and the contents recorded at each processing step when three-dimensional inverse discrete cosine transform processing is performed using the array processor according to Embodiment 1.
FIG. 6 shows, with arrows keyed to the processing steps, how the contents of each register change in the table of FIG. 5.
FIG. 7 is a block diagram showing the general configuration of the processing element according to Embodiment 2.
FIG. 8 is a table showing the initial values of registers R0 to R6 and the contents recorded at each processing step when three-dimensional discrete cosine transform processing is performed using the array processor according to Embodiment 2.
FIG. 9 shows, with arrows keyed to the processing steps, how the contents of each register change in the table of FIG. 8.
FIG. 10 is a table showing the initial values of registers R0 to R6 and the contents recorded at each processing step when three-dimensional inverse discrete cosine transform processing is performed using the array processor according to Embodiment 2.
FIG. 11 shows, with arrows keyed to the processing steps, how the contents of each register change in the table of FIG. 10.

The array processor according to the present invention is described in detail below with reference to the drawings. Embodiments 1 and 2 describe array processors that compute the three-dimensional discrete cosine transform (3D-DCT) and the three-dimensional inverse discrete cosine transform (3D-IDCT). The array processor according to the present invention is not limited to these computations, however: by changing C in the general formula (1) above as appropriate, it can execute three-dimensional orthogonal transforms based on other transform schemes.

[Embodiment 1]
FIG. 1 is a diagram showing the array processor according to Embodiment 1. The array processor 1 has eight processing elements PE. A processing element PE is the computational unit of the array processor that performs product-sum operations.

Two processing elements PE are arranged in each of the vertical, horizontal, and height directions (the three axial directions), with one processing element PE placed at each vertex of the rectangular parallelepiped formed by the array processor 1.

In Embodiment 1, the vertical direction is taken as the i-axis direction, the horizontal direction as the j-axis direction, and the height direction as the k-axis direction. To distinguish it from the others, each processing element PE is identified as PE(i, j, k) by its coordinates in the ijk space defined by the i, j, and k axes.

As shown in FIGS. 1 and 2, each processing element PE has three input terminals (an i input terminal, a j input terminal, and a k input terminal) provided in the −i-axis, −j-axis, and k-axis directions, and three output terminals (an i output terminal, a j output terminal, and a k output terminal) that are paired with the respective input terminals and provided in the i-axis, j-axis, and −k-axis directions, respectively.

Each output terminal is connected to the facing input terminal (the one provided on the same axis) of the processing element PE adjacent in the direction in which that output terminal is installed. For example, the i output terminal of PE(1, 0, 1), which faces the −i-axis direction, is connected to the i input terminal of PE(0, 0, 1), which faces the i-axis direction.

Where no other processing element PE exists in the facing direction, the input and output terminals of the processing elements PE located at the two ends of the row along that axis are connected to each other, so that the input and output terminals provided on the same axis form a torus.

Accordingly, a processing element PE in a row along the i axis can output data from its i output terminal to the i input terminal of the processing element PE adjacent to it in the −i-axis direction. A processing element whose i coordinate is 0 has its i output terminal connected, torus-fashion, to the i input terminal of the processing element at the other end of the i axis, and can output data to it. The same structure applies to the processing elements PE aligned along the j axis and along the k axis.
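The wrap-around wiring can be expressed as a small coordinate rule. The Python sketch below assumes, following the PE(1, 0, 1) to PE(0, 0, 1) example, that data leaving an output terminal goes to the neighbor one step in the negative direction of that axis and wraps from coordinate 0 to coordinate n−1. The function names and the axis numbering (0 = i, 1 = j, 2 = k) are hypothetical.

```python
def output_neighbor(pe, axis, n):
    """Return the coordinates of the element that receives data sent by `pe` on `axis`.

    pe   -- (i, j, k) coordinates of the sending element
    axis -- 0 for the i axis, 1 for the j axis, 2 for the k axis
    n    -- number of elements along each axis
    """
    coords = list(pe)
    # One step in the negative direction; coordinate 0 wraps to n - 1 (torus link).
    coords[axis] = (coords[axis] - 1) % n
    return tuple(coords)


def ring_members(pe, axis, n):
    """List the elements visited by data relayed n times along one axis, starting at `pe`."""
    members = [pe]
    for _ in range(n - 1):
        members.append(output_neighbor(members[-1], axis, n))
    return members


example = output_neighbor((1, 0, 1), 0, 2)
wrapped = output_neighbor((0, 0, 1), 0, 2)
```

For n = 2, as in Embodiment 1, the neighbor in the positive and negative directions is the same element, so the direction convention only matters for larger arrays.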

In this way, the processing elements PE arranged in the three-dimensional space can successively output data to the adjacent processing elements PE aligned along the corresponding axes. The array processor 1 as a whole can therefore carry out continuous product-sum processing by relaying the results computed in one processing element PE, together with the data needed for the computation, to the adjacent processing elements PE in turn.

FIG. 2 is a block diagram showing the internal configuration of each processing element PE. The processing element PE has seven registers R0 to R6, an input switch unit 4, a selector unit 5, an arithmetic circuit unit 6, an output switch unit 7, and a control circuit unit 8.

Registers R0 to R6 can hold the data used for the three-dimensional orthogonal transform processing. Registers R4, R5, and R6 hold the coefficient data C shown in equation (1). C is an initial value determined by the type of transform to be computed when the array processor 1 according to the first embodiment performs the three-dimensional orthogonal transform processing. As already explained, changing the initial value of C makes it possible to compute, for example, the discrete cosine transform (DCT), the Walsh-Hadamard transform (WHT), the discrete Fourier transform (DFT), or the discrete sine transform (DST).

In the array processor 1 according to the first embodiment, initial values suited to the three-dimensional discrete cosine transform (3D-DCT), namely fixed DCT coefficients, are set. Specifically, in each processing element, register R4 records the value of C(i, k) described below as its initial value, register R5 records the value of C(k, j) described below as its initial value, and register R6 records the value of C(i, j) described below as its initial value.

More specifically, with (i, j, k) denoting the coordinate position at which a processing element PE is placed, register R4 of each processing element PE(i, j, k) for which i = 0 and 0 ≤ k ≤ n−1 records the following value:

[Equation (3)]

Here, n denotes the number of processing elements PE arranged along each axis. In the first embodiment, two processing elements are arranged along each of the i-axis, j-axis, and k-axis directions, so n = 2 hereinafter.

Next, register R4 of each processing element PE(i, j, k) for which 1 ≤ i ≤ n−1 and 0 ≤ k ≤ n−1 (that is, 0 ≤ k ≤ 1 for n = 2) records the following value:

[Equation (4)]

Likewise, register R5 of each processing element PE(i, j, k) for which k = 0 and 0 ≤ j ≤ n−1 records the following value:

[Equation (5)]

Register R5 of each processing element PE(i, j, k) for which 1 ≤ k ≤ n−1 and 0 ≤ j ≤ n−1 records the following value:

[Equation (6)]

Furthermore, register R6 of each processing element PE(i, j, k) for which i = 0 and 0 ≤ j ≤ n−1 records the following value:

[Equation (7)]

Register R6 of each processing element PE(i, j, k) for which 1 ≤ i ≤ n−1 and 0 ≤ j ≤ n−1 records the following value:

[Equation (8)]

The set values (constants) of equations (3) to (8) recorded in registers R4 to R6 remain unchanged until the calculation process is complete.
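The coefficient equations themselves are not reproduced in this text. As a rough guide only, the sketch below computes the textbook orthonormal DCT-II coefficients, whose special case for a first index of 0 matches the index ranges stated above. Whether the patent's equations (3) to (8) use exactly this normalization is an assumption, and the names `dct_coefficient` and `coefficient_matrix` are hypothetical.

```python
import math


def dct_coefficient(u: int, x: int, n: int) -> float:
    """Textbook orthonormal DCT-II coefficient (assumed, not taken from the patent).

    u is the first (frequency) index and x the second (sample) index,
    both in the range 0..n-1.
    """
    if u == 0:
        return math.sqrt(1.0 / n)
    return math.sqrt(2.0 / n) * math.cos((2 * x + 1) * u * math.pi / (2 * n))


def coefficient_matrix(n: int) -> list:
    """n x n coefficient matrix, rows indexed by u and columns by x."""
    return [[dct_coefficient(u, x, n) for x in range(n)] for u in range(n)]


# For n = 2 (the first embodiment) the rows are orthonormal.
matrix = coefficient_matrix(2)
row_dot = sum(a * b for a, b in zip(matrix[0], matrix[1]))
print(matrix, row_dot)
```

Under this assumption, C(i, k), C(k, j), and C(i, j) would all be entries of the same matrix, read with different index pairs.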

Next, registers R1, R2, and R3 hold 0 as their initial value. Register R0 holds the three-dimensional input data to be subjected to the three-dimensional discrete cosine transform, specifically the value of X(i, j, k) in equation (1) above. As shown in FIG. 2, data may be routed to registers R0 to R3 through the input switch unit 4, in which case the new data overwrites the existing register contents. The contents of registers R0 to R3, whether calculation results or values used in the calculation, can therefore change step by step as the calculation proceeds. As described later, the result of the three-dimensional orthogonal transform computed by the array processor 1 is finally recorded in register R3.
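The initial register contents of one processing element can be summarized in a few lines. This is an illustrative sketch: the function name `initial_registers` and the list layout are assumptions, and the coefficient values are passed in by the caller rather than computed.

```python
def initial_registers(x_value, c_ik, c_kj, c_ij):
    """Return the initial contents of registers R0..R6 for one PE.

    Index m of the returned list stands for register Rm:
      R0      : input sample X(i, j, k)
      R1..R3  : accumulators, initially 0
      R4..R6  : fixed coefficients C(i, k), C(k, j), C(i, j)
    """
    return [x_value, 0.0, 0.0, 0.0, c_ik, c_kj, c_ij]


regs = initial_registers(x_value=5.0, c_ik=0.1, c_kj=0.2, c_ij=0.3)
print(regs)
```

Only the first four entries change during the calculation; the last three stay fixed, as stated for registers R4 to R6.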

The input switch unit 4 switches the information received at the three input terminals (i input terminal, j input terminal, and k input terminal) according to instructions from the control circuit unit 8 and routes it to one of registers R0 to R3 for recording. In practice, data arrives through two of the three input terminals. The control circuit unit 8 identifies the incoming information by the input terminal on which it arrives and controls the input switch unit 4 so that each item of data is recorded in the appropriate register among R0 to R3.

The selector unit 5 reads, from any three of registers R0 to R6, the data used by the arithmetic circuit unit 6 and outputs it to the arithmetic circuit unit 6. The selector unit 5 outputs the three items of data to the three input terminals (a input terminal, b input terminal, and c input terminal) that the arithmetic circuit unit 6 provides for its operation. The data output to the b input terminal is input to the arithmetic circuit unit 6 through the b input terminal and is also passed unchanged to the output switch unit 7. Which three of registers R0 to R6 the selector unit 5 reads, and which of the a, b, and c input terminals each item is sent to, are determined by instructions from the control circuit unit 8.

The arithmetic circuit unit 6 performs a product-sum operation on data a received at the a input terminal, data b received at the b input terminal, and data c received at the c input terminal. With d denoting the result, the product-sum operation is d = a × b + c. The result d is output to the output switch unit 7 through the output terminal 10 of the arithmetic circuit unit 6.
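The selector and the product-sum operation together amount to the following small sketch. The function name `select_and_compute` and the representation of the registers as a list indexed 0 to 6 are assumptions made for illustration.

```python
def select_and_compute(regs, a_reg, b_reg, c_reg):
    """Model one selector + arithmetic step of a PE.

    regs is a list of seven values standing for R0..R6. The selector picks
    three registers for the a, b and c inputs; the arithmetic unit returns
    d = a * b + c. The value b is also returned unchanged, because the text
    says it is forwarded to the output switch as-is.
    """
    a, b, c = regs[a_reg], regs[b_reg], regs[c_reg]
    d = a * b + c
    return d, b


# First-cycle selection described later in the text: a = R4, b = R0, c = R1.
regs = [3.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0]
print(select_and_compute(regs, a_reg=4, b_reg=0, c_reg=1))
```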

The output switch unit 7 receives the data b that the selector unit 5 output to the b input terminal of the arithmetic circuit unit 6 and the result d output through the output terminal 10 of the arithmetic circuit unit 6. According to instructions from the control circuit unit 8, it outputs data b and result d from the three output terminals (i output terminal, j output terminal, and k output terminal). In practice, data is output through only two of the three output terminals. The control circuit unit 8 examines the data and controls the output switch unit 7 so that data b and result d are each output from the appropriate output terminal.

The control circuit unit 8 controls the operation of the input switch unit 4, the selector unit 5, and the output switch unit 7 described above. It evaluates the data entering the input switch unit 4, the selector unit 5, and the output switch unit 7 according to its content and the input terminal on which it arrived, and controls the three units according to the processing being performed by the array processor 1.

Although not shown in FIG. 2, each processing element PE is also provided with data input means for loading the input data for the three-dimensional discrete cosine transform into register R0, the initial value 0 into registers R1 to R3, and the initial values C(i, k), C(k, j), and C(i, j) into registers R4 to R6, as well as data acquisition means for reading the final result from register R3.

Next, the process by which the control circuit unit 8 of each processing element PE performs the three-dimensional discrete cosine transform (3D-DCT) by controlling the input switch unit 4, the selector unit 5, and the output switch unit 7 is described.

FIG. 3 is a table showing, for each processing element PE(i, j, k), the initial values of registers R0 to R6 and the registers whose contents are used or changed in each processing step. FIG. 4 shows with arrows, step by step, how the register contents listed in FIG. 3 change. The broken-line arrows in FIG. 4 indicate the output of the data corresponding to data b described later, and the solid-line arrows indicate the output of the data corresponding to result d described later.

After the initial values of registers R0 to R6 have been set as described above, the control circuit unit 8 of each processing element PE controls the selector unit 5 so that the data in register R4 is output to the a input terminal of the arithmetic circuit unit 6, the data in register R0 to the b input terminal, and the data in register R1 to the c input terminal. As shown in FIGS. 3 and 4, at this point register R4 holds the initial value C(i, k), register R0 holds the input data X(i, j, k) for the three-dimensional discrete cosine transform, and register R1 holds the initial value 0.

The arithmetic circuit unit 6 of each processing element PE performs the product-sum operation (d = a × b + c) using the data of register R4 as data a, the data of register R0 as data b, and the data of register R1 as data c.

The control circuit unit 8 of each processing element PE then controls the output switch unit 7, which receives result d and data b. Result d is output to the k input terminal of the processing element PE adjacent in the k-axis direction, and data b (that is, the data that was held in register R0) is output to the i input terminal of the processing element PE adjacent in the −i axis direction.

The control circuit unit 8 of each processing element PE also controls the input switch unit 4 so that result d is received from the adjacent processing element PE through the k input terminal and data b (the data of register R0) is received through the i input terminal. Result d is recorded in register R1, and data b is recorded in register R0.

In this way, the control circuit unit 8 of each processing element PE repeats the output of result d and data b to the adjacent processing elements PE a number of times equal to the number of processing elements PE arranged along each axis, that is, n times (twice in the first embodiment). The calculation results and the data thereby make one full circuit along the torus path formed by connecting the i input and i output terminals and along the torus path formed by connecting the k input and k output terminals. This one circuit corresponds to the n product-sum operations of the first cycle.

Through these n repetitions, register R1 of each processing element PE comes to hold the result of the product-sum operations carried out across all processing elements PE arranged in the k-axis direction. Because n = 2 in the first embodiment, the above processing is performed only twice. Processing steps "0" and "1" in FIGS. 3 and 4 correspond to these two iterations.
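The register transfers of the first cycle can be traced with a short simulation. This is a sketch under stated assumptions, not the patent's implementation: it models only the R0, R1, and R4 transfers described in the text, it assumes that "adjacent in the k-axis direction" means the +k neighbor, and the function name `run_first_cycle` is hypothetical. It makes no claim about what FIGS. 3 and 4 show beyond these transfers.

```python
import itertools


def run_first_cycle(x, c_ik, n):
    """Simulate the n steps of the first cycle on an n x n x n torus.

    x[i][j][k] : value initially held in R0 of PE(i, j, k)
    c_ik[i][k] : coefficient held in R4 of PE(i, j, k)
    Returns (r0, r1): dictionaries keyed by (i, j, k) with the final
    contents of registers R0 and R1.
    """
    coords = list(itertools.product(range(n), repeat=3))
    r0 = {(i, j, k): x[i][j][k] for i, j, k in coords}
    r1 = {p: 0.0 for p in coords}
    for _ in range(n):
        new_r0, new_r1 = {}, {}
        for i, j, k in coords:
            b = r0[(i, j, k)]
            d = c_ik[i][k] * b + r1[(i, j, k)]  # d = a * b + c
            new_r1[(i, j, (k + 1) % n)] = d     # d to the k neighbor (assumed +k)
            new_r0[((i - 1) % n, j, k)] = b     # b to the -i neighbor
        r0, r1 = new_r0, new_r1
    return r0, r1


n = 2
x = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
ones = [[1.0, 1.0], [1.0, 1.0]]  # placeholder coefficients, not DCT values
r0, r1 = run_first_cycle(x, ones, n)
print(r0[(0, 0, 0)], r1[(0, 0, 0)])
```

After n steps every R0 value has travelled once around its ring and is back where it started, which is the "one full circuit" the text describes.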

Subsequently, the control circuit unit 8 of each processing element PE controls the selector unit 5 so that the data in register R5 is output to the a input terminal of the arithmetic circuit unit 6, the data in register R1 to the b input terminal, and the data in register R2 to the c input terminal. At this point register R5 holds the initial value C(k, j), register R1 holds the result of the product-sum operations carried out across all processing elements PE arranged in the k-axis direction, and register R2 holds the initial value 0.

The arithmetic circuit unit 6 of each processing element PE performs the product-sum operation (d = a × b + c) using the data of register R5 as data a, the data of register R1 as data b, and the data of register R2 as data c.

The control circuit unit 8 of each processing element PE then controls the output switch unit 7, which receives result d and data b. Result d is output to the j input terminal of the processing element PE adjacent in the −j axis direction, and data b (that is, the data that was held in register R1) is output to the k input terminal of the processing element PE adjacent in the k-axis direction.

The control circuit unit 8 of each processing element PE also controls the input switch unit 4 so that result d is received from the adjacent processing element PE through the j input terminal and data b (the data of register R1) is received through the k input terminal. Result d is recorded in register R2, and data b is recorded in register R1.

In this way, the control circuit unit 8 of each processing element PE repeats the output of result d and data b to the adjacent processing elements PE a number of times equal to the number of processing elements PE arranged along each axis, that is, n times (twice in the first embodiment). The calculation results and the data thereby make one full circuit along the torus path formed by connecting the j input and j output terminals and along the torus path formed by connecting the k input and k output terminals. This one circuit corresponds to the n product-sum operations of the second cycle.

Through these n repetitions, register R2 of each processing element PE comes to hold the result of the product-sum operations carried out across all processing elements PE arranged in the j-axis direction (the result of the n product-sum operations of the second cycle). This result is computed from the contents of register R1, which is the result of the product-sum operations carried out across all processing elements PE arranged in the k-axis direction (the result of the n product-sum operations of the first cycle). Register R1 continues to hold that first-cycle result. Because n = 2 in the first embodiment, the above processing is performed only twice. Processing steps "2" and "3" in FIGS. 3 and 4 correspond to these two iterations.

Subsequently, the control circuit unit 8 of each processing element PE controls the selector unit 5 so that the data in register R6 is output to the a input terminal of the arithmetic circuit unit 6, the data in register R2 to the b input terminal, and the data in register R3 to the c input terminal. At this point register R6 holds the initial value C(i, j), register R2 holds the second-cycle result (the product-sum result across all processing elements PE arranged in the j-axis direction, computed from the first-cycle result across all processing elements PE arranged in the k-axis direction), and register R3 holds the initial value 0.

The arithmetic circuit unit 6 of each processing element PE performs the product-sum operation (d = a × b + c) using the data of register R6 as data a, the data of register R2 as data b, and the data of register R3 as data c.

The control circuit unit 8 of each processing element PE then controls the output switch unit 7, which receives result d and data b. Result d is output to the i input terminal of the processing element PE adjacent in the −i axis direction, and data b (that is, the data that was held in register R2) is output to the j input terminal of the processing element PE adjacent in the −j axis direction.

The control circuit unit 8 of each processing element PE also controls the input switch unit 4 so that result d is received from the adjacent processing element PE through the i input terminal and data b (the data of register R2) is received through the j input terminal. Result d is recorded in register R3, and data b is recorded in register R2.

In this way, the control circuit unit 8 of each processing element PE repeats the output of result d and data b to the adjacent processing elements PE a number of times equal to the number of processing elements PE arranged along each axis, that is, n times (twice in the first embodiment). The calculation results and the data thereby make one full circuit along the torus path formed by connecting the i input and i output terminals and along the torus path formed by connecting the j input and j output terminals. This one circuit corresponds to the n product-sum operations of the third cycle.

Through these n repetitions, register R3 of each processing element PE comes to hold the result of the operations carried out across all processing elements PE arranged in the i-axis direction (the result of the n product-sum operations of the third cycle). This result is computed from the second-cycle result across all processing elements PE arranged in the j-axis direction, which in turn is computed from the first-cycle result across all processing elements PE arranged in the k-axis direction. Register R2 continues to hold the second-cycle result. Because n = 2 in the first embodiment, the above processing is performed only twice. Processing steps "4" and "5" in FIGS. 3 and 4 correspond to these two iterations.
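The three cycles differ only in which registers are selected and along which axes the two outputs are sent, so the whole 3D-DCT schedule can be written as a small table. This is a summary sketch of the routing stated in the text; the names `CycleRouting`, `DCT_SCHEDULE`, and `routing_for_step` are hypothetical, and the axis labels are plain strings chosen for illustration.

```python
from typing import NamedTuple


class CycleRouting(NamedTuple):
    a_reg: int   # coefficient register fed to the a input
    b_reg: int   # data register fed to the b input; received b is stored back here
    c_reg: int   # accumulator fed to the c input; received d is stored here
    d_axis: str  # direction in which result d is sent
    b_axis: str  # direction in which data b is sent


# One entry per cycle, in the order described in the text.
DCT_SCHEDULE = [
    CycleRouting(a_reg=4, b_reg=0, c_reg=1, d_axis="k", b_axis="-i"),
    CycleRouting(a_reg=5, b_reg=1, c_reg=2, d_axis="-j", b_axis="k"),
    CycleRouting(a_reg=6, b_reg=2, c_reg=3, d_axis="-i", b_axis="-j"),
]


def routing_for_step(step: int, n: int) -> CycleRouting:
    """Map a processing step (0 .. 3n-1) to the routing of its cycle."""
    if not 0 <= step < 3 * n:
        raise ValueError("step out of range")
    return DCT_SCHEDULE[step // n]


for step in range(6):  # n = 2 gives processing steps 0 to 5
    print(step, routing_for_step(step, 2))
```

With n = 2 this yields six steps, matching processing steps "0" to "5" referred to for FIGS. 3 and 4.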

In this way, in the array processor 1 according to the first embodiment, calculation results and the data used in the product-sum operation are sequentially output to the adjacent processing elements PE, and each processing element PE performs the product-sum operation. By changing the output direction of the calculation results in turn to the k-axis direction, the j-axis direction, and the i-axis direction as the processing proceeds, the product-sum result of each arithmetic circuit unit 6 can be recorded in register R3. Unlike a conventional array processor, no dedicated circuit is needed to adjust the input order of data and coefficient data in the middle of the calculation, so the circuit configuration can be simplified. Moreover, because the three-dimensional discrete cosine transform (three-dimensional orthogonal transform) can be performed simply by sequentially outputting calculation results to the adjacent processing elements PE, the data and coefficient matrix elements held inside the processing elements PE need not be exchanged repeatedly over a complicated wiring structure during the calculation for each dimension, as was conventionally required. The processing can therefore be made faster and simpler.

As described above, using the array processor 1 according to the first embodiment eliminates the exchange of data in the middle of the calculation that was indispensable in the conventional configuration. In addition, by connecting a plurality of processing elements PE and performing the calculations in sequence, the three-dimensional discrete cosine transform (three-dimensional discrete orthogonal transform) can be executed quickly while keeping the circuit configuration from becoming more complex.

Next, a method of computing the three-dimensional inverse discrete cosine transform (3D-IDCT) with the array processor 1 described above is explained. FIG. 5 is a table showing, for each processing element PE(i, j, k), the initial values of registers R0 to R6 and the registers recorded in each processing step. FIG. 6 shows with arrows, step by step, how the register contents listed in FIG. 5 change. The broken-line arrows in FIG. 6 indicate the output of the data corresponding to data b used in the calculation, and the solid-line arrows indicate the output of the data corresponding to result d.

When the three-dimensional inverse discrete cosine transform is performed, the following C(k, j), C(i, k), and C(i, j) are used in equation (2) as the fixed values for the inverse transform (fixed IDCT coefficients), and they are recorded as initial values in registers R4, R5, and R6, respectively.

More specifically, with (i, j, k) denoting the coordinate position at which a processing element PE is placed, register R4 of each processing element PE(i, j, k) for which k = 0 and 0 ≤ j ≤ n−1 records the following value:

[Equation]

Here, n denotes the number of processing elements PE arranged along each axis; in the first embodiment, n = 2 hereinafter.

Next, register R4 of each processing element PE(i, j, k) for which 1 ≤ k ≤ n−1 and 0 ≤ j ≤ n−1 records the following value:

[Equation]

Likewise, register R5 of each processing element PE(i, j, k) for which i = 0 and 0 ≤ k ≤ n−1 records the following value:

[Equation]

Register R5 of each processing element PE(i, j, k) for which 1 ≤ i ≤ n−1 and 0 ≤ k ≤ n−1 records the following value:

[Equation]

Registers R1, R2, and R3 hold 0 as their initial value, and register R0 holds the three-dimensional input data to be subjected to the three-dimensional inverse discrete cosine transform, specifically the value of Y(i, j, k) in equation (2) above.

After the initial values of registers R0 to R6 have been set, the control circuit unit 8 of each processing element PE controls the selector unit 5 so that the data in register R4 is output to the a input terminal of the arithmetic circuit unit 6, the data in register R0 to the b input terminal, and the data in register R1 to the c input terminal. At this point register R4 holds the initial value C(k, j), register R0 holds the input data Y(i, j, k) for the three-dimensional inverse discrete cosine transform, and register R1 holds the initial value 0.

The arithmetic circuit unit 6 of each processing element PE performs a product-sum operation (d = a × b + c) on the register R4 data input as data a, the register R0 data input as data b, and the register R1 data input as data c.

Next, the control circuit unit 8 of each processing element PE controls the output switch unit 7 to take the operation result d and the data b. It outputs d to the k input terminal of the processing element PE adjacent in the k-axis direction, and outputs b (specifically, the data that was held in register R0) to the j input terminal of the processing element PE adjacent in the −j-axis direction.

The control circuit unit 8 of each processing element PE then controls the input switch unit 4 to receive an operation result d from an adjacent processing element PE via the k input terminal and data b (register R0 data) via the j input terminal. The received d is recorded in register R1 and the received b is recorded in register R0.

In this way, the control circuit unit 8 of each processing element PE repeats the output of the operation result d and the data b to the adjacent processing elements PE a number of times equal to the number of processing elements PE arranged along each axis, that is, n times (twice in Embodiment 1). The operation results and the data thereby make one full circuit along the torus-shaped path formed by connecting the k input and k output terminals and along the torus-shaped path formed by connecting the j input and j output terminals. This one circuit corresponds to the n product-sum operations of the first cycle.

As a result of these n repetitions, register R1 of each processing element PE holds the result of the product-sum operations performed across all the processing elements PE arranged in the k-axis direction (the result of the n product-sum operations of the first cycle). Because n = 2 in Embodiment 1, the above processing is performed only twice. Processing steps "0" and "1" in FIGS. 5 and 6 show the processing corresponding to these two iterations.
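The register transfers of this first cycle can be sketched in Python as follows. This is only an illustrative model, not the patented circuit: the function name `first_cycle_embodiment1`, the dictionary representation of the registers, and the reading of "k-axis direction" as index k+1 and "−j-axis direction" as index j−1 (both with wraparound) are assumptions made for the sketch. The coefficient values passed in as `r4` are placeholders, since the expressions for register R4 are not reproduced in this text.

```python
from itertools import product


def first_cycle_embodiment1(n, r0, r4):
    """Simulate the n steps of the first cycle on an n x n x n torus.

    r0 and r4 map each PE coordinate (i, j, k) to its R0 / R4 value.
    Returns the (R0, R1) register contents after the n-th step.
    """
    r0 = dict(r0)
    r1 = {pe: 0 for pe in r0}  # R1 starts at 0
    for _ in range(n):
        new_r0, new_r1 = {}, {}
        for (i, j, k) in product(range(n), repeat=3):
            a, b, c = r4[(i, j, k)], r0[(i, j, k)], r1[(i, j, k)]
            d = a * b + c  # product-sum operation
            # d goes to the k input of the neighbour in the k direction (assumed k+1)
            new_r1[(i, j, (k + 1) % n)] = d
            # b goes to the j input of the neighbour in the -j direction (assumed j-1)
            new_r0[(i, (j - 1) % n, k)] = b
        r0, r1 = new_r0, new_r1
    return r0, r1
```

Because each ring has n elements, n shifts bring every R0 value back to the processing element it started in, which is the "one full circuit" described above, and each R1 value has accumulated exactly n products on the way.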

Next, the control circuit unit 8 of each processing element PE controls the selector unit 5 so that the data in register R5 is output to the a input terminal of the arithmetic circuit unit 6, the data in register R1 to the b input terminal, and the data in register R2 to the c input terminal. At this point, register R5 holds the initial value C(i, k), register R1 holds the result of the product-sum operations performed across all the processing elements PE arranged in the k-axis direction, and register R2 holds its initial value 0.

Next, the control circuit unit 8 of each processing element PE controls the output switch unit 7 to take the operation result d and the data b. It outputs d to the i input terminal of the processing element PE adjacent in the −i-axis direction, and outputs b (specifically, the data that was held in register R1) to the k input terminal of the processing element PE adjacent in the k-axis direction.

The control circuit unit 8 of each processing element PE then controls the input switch unit 4 to receive an operation result d from an adjacent processing element PE via the i input terminal and data b (register R1 data) via the k input terminal. The received d is recorded in register R2 and the received b is recorded in register R1.

In this way, the control circuit unit 8 of each processing element PE repeats the output of the operation result d and the data b to the adjacent processing elements PE a number of times equal to the number of processing elements PE arranged along each axis, that is, n times (twice in Embodiment 1). The operation results and the data thereby make one full circuit along the torus-shaped path formed by connecting the i input and i output terminals and along the torus-shaped path formed by connecting the k input and k output terminals. This one circuit corresponds to the n product-sum operations of the second cycle.

As a result of these n repetitions, register R2 of each processing element PE holds the result of the product-sum operations performed across all the processing elements PE arranged in the i-axis direction (the result of the n product-sum operations of the second cycle), computed from the register R1 values, which are the results of the product-sum operations performed across all the processing elements PE arranged in the k-axis direction (the result of the n product-sum operations of the first cycle). Register R1 holds the result of the product-sum operations performed across all the processing elements PE arranged in the k-axis direction. Processing steps "2" and "3" in FIGS. 5 and 6 show the processing corresponding to these two iterations.

Next, the control circuit unit 8 of each processing element PE controls the selector unit 5 so that the data in register R6 is output to the a input terminal of the arithmetic circuit unit 6, the data in register R2 to the b input terminal, and the data in register R3 to the c input terminal. At this point, register R6 holds the initial value C(i, j); register R2 holds the result of the product-sum operations performed across all the processing elements PE arranged in the i-axis direction (the result of the second cycle), which was computed from the result of the product-sum operations performed across all the processing elements PE arranged in the k-axis direction (the result of the first cycle); and register R3 holds its initial value 0.

Next, the control circuit unit 8 of each processing element PE controls the output switch unit 7 to take the operation result d and the data b. It outputs d to the j input terminal of the processing element PE adjacent in the −j-axis direction, and outputs b (specifically, the data that was held in register R2) to the i input terminal of the processing element PE adjacent in the −i-axis direction.

The control circuit unit 8 of each processing element PE then controls the input switch unit 4 to receive an operation result d from an adjacent processing element PE via the j input terminal and data b (register R2 data) via the i input terminal. The received d is recorded in register R3 and the received b is recorded in register R2.

In this way, the control circuit unit 8 of each processing element PE repeats the output of the operation result d and the data b to the adjacent processing elements PE a number of times equal to the number of processing elements PE arranged along each axis, that is, n times (twice in Embodiment 1). The operation results and the data thereby make one full circuit along the torus-shaped path formed by connecting the j input and j output terminals and along the torus-shaped path formed by connecting the i input and i output terminals. This one circuit corresponds to the n product-sum operations of the third cycle.

As a result of these n repetitions, register R3 of each processing element PE holds the result of the operations performed across all the processing elements PE arranged in the j-axis direction (the result of the n product-sum operations of the third cycle). This result is computed from the product-sum results across all the processing elements PE arranged in the i-axis direction (the second cycle), which were in turn computed from the product-sum results across all the processing elements PE arranged in the k-axis direction (the first cycle). Register R2 holds the second-cycle result, that is, the product-sum result across all the processing elements PE arranged in the i-axis direction, computed from the first-cycle result. Because n = 2 in Embodiment 1, the above processing is performed only twice. Processing steps "4" and "5" in FIGS. 5 and 6 show the processing corresponding to these two iterations.

In this way, in the array processor 1 according to Embodiment 1, each processing element PE performs a product-sum operation and sequentially outputs the operation result and the data used in the operation to adjacent processing elements PE. The output direction of the operation result changes in turn to the k-axis, i-axis, and j-axis directions, so that the final product-sum result of each arithmetic circuit unit 6 is recorded in register R3. Unlike conventional array processors, there is therefore no need for a dedicated circuit that adjusts the input order of data and coefficient data partway through the computation, and the circuit configuration can be simplified. Moreover, because the three-dimensional inverse discrete cosine transform (a three-dimensional inverse orthogonal transform) is carried out by sequentially outputting operation results to adjacent processing elements PE, the data and coefficient matrix elements held inside the processing elements PE do not have to be exchanged repeatedly over complex wiring during the computation of each dimension, as in the conventional approach. This speeds up and simplifies the processing.
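The three cycles of Embodiment 1 described above can be summarized as a small schedule table. The Python sketch below only restates the register selections and transfer directions given in the text; the names `Stage`, `EMBODIMENT1_STAGES`, `stage_for_step`, and `total_steps`, and the "+k" / "-i" / "-j" direction labels, are hypothetical notation introduced for illustration.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    a_reg: str   # register routed to the a input
    b_reg: str   # register routed to the b input
    c_reg: str   # register routed to the c input
    d_dir: str   # direction in which the result d is sent
    b_dir: str   # direction in which the data b is sent
    d_dest: str  # register that stores a received d
    b_dest: str  # register that stores a received b


# One entry per cycle of n product-sum steps, as described in the text.
EMBODIMENT1_STAGES = [
    Stage("R4", "R0", "R1", "+k", "-j", "R1", "R0"),  # first cycle
    Stage("R5", "R1", "R2", "-i", "+k", "R2", "R1"),  # second cycle
    Stage("R6", "R2", "R3", "-j", "-i", "R3", "R2"),  # third cycle
]


def stage_for_step(step: int, n: int) -> Stage:
    """Return the stage that processing step `step` (0-based) belongs to."""
    return EMBODIMENT1_STAGES[step // n]


def total_steps(n: int) -> int:
    """Three cycles of n steps each."""
    return 3 * n
```

With n = 2 this gives six steps, matching processing steps "0" to "5" of FIGS. 5 and 6: steps 0 and 1 use the first entry, steps 2 and 3 the second, and steps 4 and 5 the third.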

Thus, using the array processor 1 according to Embodiment 1 eliminates the exchange of data during computation that was indispensable in the conventional configuration. In addition, by connecting a plurality of processing elements and performing the operations in sequence, the three-dimensional inverse discrete cosine transform (a three-dimensional inverse discrete orthogonal transform) can be executed quickly while keeping the circuit configuration from becoming more complex.

[Embodiment 2]
Next, the array processor according to Embodiment 2 is described. The array processor 1 according to Embodiment 2 has the same arrangement of processing elements PE (the same connection and placement of each processing element PE) as the array processor 1 according to Embodiment 1 shown in FIG. 1, but the internal structure of the processing elements PE that make up the array processor is different.

FIG. 7 is a diagram schematically showing the configuration of the processing element PE according to Embodiment 2. It differs from the processing element PE according to Embodiment 1 (see FIG. 2) in that the data output from the selector unit 15 toward the a input terminal of the arithmetic circuit unit 16 can be output not only to that a input terminal but also to the output switch unit 17. Accordingly, the processing element PE according to Embodiment 2 has one more output terminal leading to the output switch unit 17 than the processing element PE according to Embodiment 1.

In the array processor 1 according to Embodiment 2, a control circuit unit 18 is provided to control the output switch unit 17 and the other elements in accordance with the input/output signal from the added a input terminal. The elements other than the output switch unit 17 and the control circuit unit 18, namely the input switch unit 14, the selector unit 15, the arithmetic circuit unit 16, and the registers R0 to R6, have the same configuration as in the processing element PE according to Embodiment 1.

A method of performing three-dimensional discrete cosine transform processing (3D-DCT processing) using the array processor of this configuration is described below.

First, the registers R4 to R6 in each processing element PE are initialized with the values obtained from Equations 3 to 8. These values are constants and are not changed until all the operations are complete. The registers R1 to R3 are initialized to 0. The register R0 holds the three-dimensional input data for the three-dimensional discrete cosine transform processing (3D-DCT processing), specifically the value X(m, j, k) of equation (1) above. Here, m = (−i − j + k) mod n, that is, the remainder of (−i − j + k) divided by n, where n is the size of the n × n × n three-dimensional input data, which is also the size of the array processor 1. In Embodiment 2, n = 2 as shown in FIG. 1.

Further, as shown in FIG. 7, data may be routed to the registers R0 to R3 via the input switch unit 14, in which case the new data overwrites the contents of those registers. Consequently, as described later, the contents of the registers R0 to R3 can change as the computation proceeds: they are successively updated with operation results or with values used in the operations. Ultimately, the result of the three-dimensional discrete cosine transform processing (3D-DCT processing) computed by the array processor 1 is recorded in register R3.

The input switch unit 14 switches among the three input terminals (i input terminal, j input terminal, k input terminal) as instructed by the control circuit unit 18, and thereby routes data received from adjacent processing elements PE via those terminals to one of the registers R0 to R3 for recording. In practice, data arrives via two of the three input terminals. The control circuit unit 18 identifies the incoming data by the type of input terminal on which it arrives and controls the input switch unit 14 so that each item is routed to and recorded in the corresponding register (one of R0 to R3).

The selector unit 15 reads the data used by the arithmetic circuit unit 16 from three of the registers R0 to R6 and outputs it to the arithmetic circuit unit 16. The three items are output to the three input terminals (a input terminal, b input terminal, and c input terminal) provided according to the operation performed by the arithmetic circuit unit 16. The data output to the a and b input terminals is input to the arithmetic circuit unit 16 via those terminals and is also passed unchanged to the output switch unit 17. Which three of the registers R0 to R6 the selector unit 15 reads, and which of the a, b, and c input terminals each item is output to, are determined by instructions from the control circuit unit 18.

The arithmetic circuit unit 16 performs a product-sum operation on the data a received at the a input terminal, the data b received at the b input terminal, and the data c received at the c input terminal. Denoting the operation result by d, the product-sum operation is d = a × b + c. The operation result d is output to the output switch unit 17 via the output terminal 20 of the arithmetic circuit unit 16.

The output switch unit 17 receives three items: the data a that the selector unit 15 outputs toward the a input terminal of the arithmetic circuit unit 16 and that also reaches the output switch unit 17 via the output terminal 21; the data b that is output toward the b input terminal of the arithmetic circuit unit 16 and that also reaches the output switch unit 17 via the output terminal 22; and the operation result d that reaches the output switch unit 17 via the output terminal 20 of the arithmetic circuit unit 16. As instructed by the control circuit unit 18, it outputs either data a or data b, together with the operation result d, from two of the three output terminals (i output terminal, j output terminal, k output terminal). Consequently, data is actually output from only two of the three output terminals. The control circuit unit 18 examines the input data and controls the output switch unit 17 so that either data a or data b and the operation result d are each output to the corresponding output terminal (one of the i, j, and k output terminals).

The control circuit unit 18 controls the operation of the input switch unit 14, the selector unit 15, and the output switch unit 17 described above. It examines the data input to these three units according to its content and the type of input terminal on which it arrived, and controls the input switch unit 14, the selector unit 15, and the output switch unit 17 according to the processing being performed by the array processor 1.
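The data path of one Embodiment 2 processing element (selector, product-sum unit, output switch, input switch) can be modelled roughly as below. This is a behavioural sketch under stated assumptions, not the circuit of FIG. 7: the class `ProcessingElement`, its methods `step` and `receive`, and the use of the strings "i", "j", "k" to name terminals are all hypothetical, and the decisions that the text assigns to the control circuit unit 18 are simply passed in as arguments.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingElement:
    # regs[0] .. regs[6] stand for registers R0 .. R6
    regs: list = field(default_factory=lambda: [0] * 7)

    def step(self, a_reg, b_reg, c_reg, forward, d_axis, fwd_axis):
        """Select three registers, compute d = a*b + c, and build the outputs.

        forward  : "a" or "b", the operand passed on next to d
        d_axis   : output terminal ("i", "j" or "k") carrying d
        fwd_axis : output terminal carrying the forwarded operand
        Returns a dict {terminal: value} with exactly two entries.
        """
        if forward not in ("a", "b"):
            raise ValueError("only data a or data b can be forwarded")
        if d_axis == fwd_axis:
            raise ValueError("d and the forwarded data use different terminals")
        a, b, c = self.regs[a_reg], self.regs[b_reg], self.regs[c_reg]
        d = a * b + c  # product-sum operation
        return {d_axis: d, fwd_axis: a if forward == "a" else b}

    def receive(self, inputs, dest):
        """Store values arriving on input terminals.

        inputs : {terminal: value} from the neighbouring elements
        dest   : {terminal: register index in 0..3} chosen by the controller
        """
        for axis, value in inputs.items():
            self.regs[dest[axis]] = value
```

For example, a first-cycle step of Embodiment 2 would correspond to `pe.step(4, 0, 1, "b", "k", "i")` followed by `pe.receive(incoming, {"k": 1, "i": 0})`, where `incoming` holds the two values sent by the neighbouring elements.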

Although not shown in FIG. 7, each processing element PE is also provided with data input means for loading the input data for the three-dimensional discrete cosine transform processing into register R0, the initial value 0 into registers R1 to R3, and the initial values C(i, k), C(k, j), and C(i, j) into registers R4 to R6, as well as data acquisition means for reading the final operation result from register R3.

Next, the process by which the control circuit unit 18 of each processing element PE controls the input switch unit 14, the selector unit 15, and the output switch unit 17 as appropriate to perform three-dimensional discrete cosine transform processing (3D-DCT processing), which is one kind of three-dimensional orthogonal transform, is described.

FIG. 8 is a table showing the initial values of the registers R0 to R6 in each processing element PE(i, j, k) and, for each processing step, the registers whose contents are used or changed. Register R0 is set to X(m, j, k), where m = (−i − j + k) mod 2. Accordingly, register R0 of PE(0,0,0) is set to X(0,0,0), and register R0 of PE(0,1,0) is set to X(1,1,0). Register R0 of PE(1,0,0) is set to X(1,0,0), and register R0 of PE(1,1,0) is set to X(0,1,0). Register R0 of PE(0,0,1) is set to X(1,0,1), and register R0 of PE(0,1,1) is set to X(0,1,1). Register R0 of PE(1,0,1) is set to X(0,0,1), and register R0 of PE(1,1,1) is set to X(1,1,1).
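The skewed placement of the input data listed above follows directly from m = (−i − j + k) mod n, with the remainder taken as a non-negative value. A short Python sketch reproduces the table; the function names `skewed_index` and `initial_r0_layout` are hypothetical, and the sketch relies on Python's `%` operator returning a non-negative result for a positive modulus.

```python
from itertools import product


def skewed_index(i: int, j: int, k: int, n: int) -> int:
    """First index m of the input element loaded into R0 of PE(i, j, k)."""
    return (-i - j + k) % n  # non-negative in Python when n > 0


def initial_r0_layout(n: int) -> dict:
    """Map each PE coordinate (i, j, k) to the index (m, j, k) of X it holds."""
    return {
        (i, j, k): (skewed_index(i, j, k, n), j, k)
        for i, j, k in product(range(n), repeat=3)
    }


layout = initial_r0_layout(2)
for pe in sorted(layout):
    print(f"PE{pe} holds X{layout[pe]}")
```

For n = 2 the mapping matches the eight assignments given for FIG. 8, and for any n each input element X(m, j, k) appears in exactly one processing element.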

The initial values of the other registers are the same as those described for Embodiment 1. FIG. 9 uses arrows to show, for each processing step, how the contents of the registers R0 to R6 shown in FIG. 8 change. The broken-line arrows in FIG. 9 indicate the output of whichever of data a or data b (described later) is used in the subsequent processing, and the solid-line arrows indicate the output of the data corresponding to the operation result d (described later).

After the initial values of the registers R0 to R6 have been set as described above, the control circuit unit 18 of each processing element PE controls the selector unit 15 so that the data in register R4 is output to the a input terminal of the arithmetic circuit unit 16, the data in register R0 to the b input terminal, and the data in register R1 to the c input terminal. As shown in FIGS. 8 and 9, at this point register R4 holds the initial value C(i, k), register R0 holds the input data X(m, j, k) for the three-dimensional discrete cosine transform processing, where m = (−i − j + k) mod 2, and register R1 holds its initial value 0.

The arithmetic circuit unit 16 of each processing element PE performs a product-sum operation (d = a × b + c) on the register R4 data input as data a, the register R0 data input as data b, and the register R1 data input as data c.

Next, the control circuit unit 18 of each processing element PE controls the output switch unit 17 to take the operation result d, the data a, and the data b. It outputs d to the k input terminal of the processing element PE adjacent in the k-axis direction, and outputs b (specifically, the data that was held in register R0) to the i input terminal of the processing element PE adjacent in the −i-axis direction. The data a taken here is not output to any adjacent processing element PE.

The control circuit unit 18 of each processing element PE then controls the input switch unit 14 to receive an operation result d from an adjacent processing element PE via the k input terminal and data b (register R0 data) via the i input terminal. The received d is recorded in register R1 and the received b is recorded in register R0.

In this way, the control circuit unit 18 of each processing element PE repeats the process of outputting the operation result d and data b to the adjacent processing elements PE a number of times equal to the number of processing elements PE arranged along each axis, that is, n times (twice in Embodiment 2). The operation results and the predetermined data thereby make one full circuit along the torus-shaped path formed by connecting the i input and i output terminals and the torus-shaped path formed by connecting the k input and k output terminals. This circuit corresponds to the n product-sum operations of the first period. During the n product-sum operations of the first period, data a, which is input to the output switch unit 17 via the output terminal 21, is never output to an adjacent processing element PE.

Through these n repetitions, register R1 of each processing element PE comes to hold the result of the product-sum operations performed across all the processing elements PE arranged in the k-axis direction. Since n = 2 in Embodiment 2, the above process is performed only twice. Processing steps "0" and "1" in FIGS. 8 and 9 show the processing corresponding to these two iterations.
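The data movement of this first period can be sketched in a few lines of Python. This is only an illustrative model, not the patented circuit: the function name `run_period`, the dictionary-based registers, and the demo values are assumptions, and "adjacent in the k-axis direction" is taken to mean the neighbour at index k + 1 (with wrap-around), while "−i-axis direction" is taken to mean index i − 1.

```python
from itertools import product


def run_period(n, coef, data, acc):
    """Simulate the n steps of one period on an n x n x n torus.

    coef, data and acc map (i, j, k) -> number and play the roles of
    register R4 (stationary), R0 (circulating) and R1 (partial sum).
    """
    for _ in range(n):
        # Every PE evaluates d = a * b + c in the same step.
        d = {p: coef[p] * data[p] + acc[p] for p in data}
        new_data, new_acc = {}, {}
        for (i, j, k) in data:
            # Assumed wiring: d is forwarded to the +k neighbour ...
            new_acc[(i, j, (k + 1) % n)] = d[(i, j, k)]
            # ... and b (old R0) is forwarded to the -i neighbour.
            new_data[((i - 1) % n, j, k)] = data[(i, j, k)]
        data, acc = new_data, new_acc
    return data, acc


n = 2
cells = list(product(range(n), repeat=3))
r4 = {p: 1 + p[0] + 2 * p[2] for p in cells}               # arbitrary demo coefficients
r0 = {p: 10 * p[0] + 3 * p[1] + p[2] + 1 for p in cells}   # arbitrary demo data
r1 = {p: 0 for p in cells}
r0_after, r1_after = run_period(n, r4, r0, r1)
```

Under these assumptions, after n steps the circulating data is back where it started, and each R1 holds a sum of n products collected along the k ring, which matches the round trip described above.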

Subsequently, the control circuit unit 18 of each processing element PE controls the selector unit 15 so that the data held in register R1 is output to the a input terminal of the arithmetic circuit unit 16, the data held in register R5 is output to the b input terminal, and the data held in register R2 is output to the c input terminal. Here, the data held in register R5 is the initial value of C(k, j); the data held in register R1 is the result of the product-sum operations performed across all the processing elements PE arranged in the k-axis direction; and the data held in register R2 is the initial value 0.

The arithmetic circuit unit 16 of each processing element PE performs a product-sum operation (d = a × b + c) using the data of register R1 input as data a, the data of register R5 input as data b, and the data of register R2 input as data c.

Next, the control circuit unit 18 of each processing element PE controls the output switch unit 17 to acquire the operation result d, data a, and data b. It outputs the operation result d to the j input terminal of the processing element PE adjacent in the −j-axis direction, and outputs data a (specifically, the data that was held in register R1) to the k input terminal of the processing element PE adjacent in the k-axis direction. The acquired data b is not output to any adjacent processing element PE.

The control circuit unit 18 of each processing element PE then controls the input switch unit 14 to acquire the operation result d from the adjacent processing element PE via the j input terminal and data a (the data of register R1) via the k input terminal. It records the operation result d in register R2 and records data a in register R1.

In this way, the control circuit unit 18 of each processing element PE repeats the process of outputting the operation result d and data a to the adjacent processing elements PE a number of times equal to the number of processing elements PE arranged along each axis, that is, n times (twice in Embodiment 2). The operation results and the predetermined data thereby make one full circuit along the torus-shaped path formed by connecting the j input and j output terminals and the torus-shaped path formed by connecting the k input and k output terminals. This circuit corresponds to the n product-sum operations of the second period. During the n product-sum operations of the second period, data b, which is input to the output switch unit 17 via the output terminal 22, is never output to an adjacent processing element PE.

Through these n repetitions, register R2 of each processing element PE comes to hold the result of the product-sum operations performed across all the processing elements PE arranged in the j-axis direction (the result of the n product-sum operations of the second period). That result is computed from the contents of register R1, namely the result of the product-sum operations performed across all the processing elements PE arranged in the k-axis direction (the result of the n product-sum operations of the first period). Register R1 holds the result of the product-sum operations performed across all the processing elements arranged in the k-axis direction. Since n = 2 in Embodiment 2, the above process is performed only twice. Processing steps "2" and "3" in FIGS. 8 and 9 show the processing corresponding to these two iterations.

Subsequently, the control circuit unit 18 of each processing element PE controls the selector unit 15 so that the data held in register R2 is output to the a input terminal of the arithmetic circuit unit 16, the data held in register R6 is output to the b input terminal, and the data held in register R3 is output to the c input terminal. Here, the data held in register R6 is the initial value of C(i, j). The data held in register R2 is the result of the product-sum operations performed across all the processing elements PE arranged in the j-axis direction (the result of the n product-sum operations of the second period), which is based on the result of the product-sum operations performed across all the processing elements PE arranged in the k-axis direction (the result of the n product-sum operations of the first period). The data held in register R3 is the initial value 0.

The arithmetic circuit unit 16 of each processing element PE performs a product-sum operation (d = a × b + c) using the data of register R2 input as data a, the data of register R6 input as data b, and the data of register R3 input as data c.

Next, the control circuit unit 18 of each processing element PE controls the output switch unit 17 to acquire the operation result d, data a, and data b. It outputs the operation result d to the i input terminal of the processing element PE adjacent in the −i-axis direction, and outputs data a (specifically, the data that was held in register R2) to the j input terminal of the processing element PE adjacent in the −j-axis direction. The acquired data b is not output to any adjacent processing element PE.

The control circuit unit 18 of each processing element PE then controls the input switch unit 14 to acquire the operation result d from the adjacent processing element PE via the i input terminal and data a (the data of register R2) via the j input terminal. It records the operation result d in register R3 and records data a in register R2.

In this way, the control circuit unit 18 of each processing element PE repeats the process of outputting the operation result d and data a to the adjacent processing elements PE a number of times equal to the number of processing elements PE arranged along each axis, that is, n times (twice in Embodiment 2). The operation results and the predetermined data thereby make one full circuit along the torus-shaped path formed by connecting the i input and i output terminals and the torus-shaped path formed by connecting the j input and j output terminals. This circuit corresponds to the n product-sum operations of the third period. During the n product-sum operations of the third period, data b, which is input to the output switch unit 17 via the output terminal 22, is never output to an adjacent processing element PE.

Through these n repetitions, register R3 of each processing element PE comes to hold the result of the operations performed across all the processing elements PE arranged in the i-axis direction (the result of the n product-sum operations of the third period). That result is computed from the product-sum result across all the processing elements PE arranged in the j-axis direction (the result of the n product-sum operations of the second period), which in turn is based on the product-sum result across all the processing elements PE arranged in the k-axis direction (the result of the n product-sum operations of the first period). In other words, register R3 holds the result of the three-dimensional discrete cosine transform. Register R2 holds the result of the product-sum operations performed across all the processing elements PE arranged in the j-axis direction (the result of the n product-sum operations of the second period), based on the product-sum result across all the processing elements PE arranged in the k-axis direction (the result of the n product-sum operations of the first period). This is the result of a two-dimensional discrete cosine transform over the i-axis and j-axis directions. Further, register R1 holds the result of a one-dimensional discrete cosine transform in the i-axis direction. Since n = 2 in Embodiment 2, the above process is performed only twice. Processing steps "4" and "5" in FIGS. 8 and 9 show the processing corresponding to these two iterations.
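The three periods of the forward transform described above can be summarized as a small lookup table. The sketch below is a reading aid only: the `Period` class, the field names, and the `period_of_step` helper are hypothetical, and "+k" is used as shorthand for what the text calls the k-axis direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    a_reg: str      # register fed to the a input
    b_reg: str      # register fed to the b input
    c_reg: str      # register fed to the c input; it also receives d
    d_dir: str      # direction in which the result d is forwarded
    moved: str      # operand forwarded to a neighbour ("a" or "b")
    moved_dir: str  # direction in which that operand is forwarded


# Register roles and directions as stated in the text for the 3D DCT.
DCT_SCHEDULE = [
    Period("R4", "R0", "R1", "+k", "b", "-i"),  # first period
    Period("R1", "R5", "R2", "-j", "a", "+k"),  # second period
    Period("R2", "R6", "R3", "-i", "a", "-j"),  # third period
]


def period_of_step(step: int, n: int) -> Period:
    """Map a processing step (0 .. 3n-1) to its period, n steps per period."""
    return DCT_SCHEDULE[step // n]
```

With n = 2, steps 0 and 1 fall in the first period, steps 2 and 3 in the second, and steps 4 and 5 in the third, matching the step labels cited from FIGS. 8 and 9.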

In this way, in the array processor 1 according to Embodiment 2, the operation result and the predetermined data used for the product-sum operation are output sequentially to the adjacent processing elements PE, and each processing element PE performs the product-sum operation. By switching the direction in which the operation result is output to the adjacent processing element PE in turn to the k-axis, j-axis, and i-axis directions, the product-sum result of each arithmetic circuit unit 16 can be recorded in register R3. Unlike a conventional array processor, there is therefore no need for a dedicated circuit that adjusts the input order of data and coefficient data partway through the computation, so the circuit configuration can be simplified. Furthermore, because the three-dimensional discrete cosine transform (a three-dimensional orthogonal transform) can be carried out by outputting operation results sequentially to adjacent processing elements PE, the data and coefficient matrix elements stored inside the processing elements PE need not be exchanged repeatedly over a complicated wiring structure during the computation of each dimension, as was conventionally required. Processing can thus be made faster and simpler.

Thus, using the array processor 1 according to Embodiment 2 eliminates the exchange of data during computation that was indispensable in conventional configurations. In addition, by connecting a plurality of processing elements PE and performing the operations sequentially, the three-dimensional discrete cosine transform (three-dimensional orthogonal transform) can be executed quickly while keeping the circuit configuration from becoming more complex.

Next, a method of computing the three-dimensional inverse discrete cosine transform (3D-IDCT) with the array processor 1 described above is explained. FIG. 10 is a table showing the initial values of registers R0 to R6 in each processing element PE(i, j, k) and the registers written in each processing step. FIG. 11 uses arrows to show, for each processing step, how the register contents shown in FIG. 10 change. The broken-line arrows in FIG. 11 indicate the output of whichever of data a or data b is used in the processing described below, and the solid-line arrows indicate the output of the data corresponding to the operation result d.

First, the initial values of registers R0 to R6 in each processing element PE are set. Registers R4, R5, and R6 are initialized with the values obtained from Formulas (9) to (14). These values are constants and are not changed until the computation ends.

Registers R1 to R3 are initialized to 0, and register R0 holds the three-dimensional input data for the three-dimensional inverse discrete cosine transform (3D-IDCT), specifically the value Y(i, m, k) of Formula (2) above. Here m = (−i − j + k) mod n, where mod n denotes the modulo-n operation; that is, m is the remainder when (−i − j + k) is divided by n. n denotes the size of the three-dimensional n × n × n input data, that is, the size of the array processor 1. For the processing elements PE according to Embodiment 2, n = 2, as shown in FIG. 1.

Accordingly, Y(0,0,0) is set in register R0 of processing element PE(0,0,0), and Y(0,1,0) is set in register R0 of processing element PE(0,1,0). Y(1,1,0) is set in register R0 of processing element PE(1,0,0), and Y(1,0,0) is set in register R0 of processing element PE(1,1,0). Further, Y(0,1,1) is set in register R0 of processing element PE(0,0,1), and Y(0,0,1) is set in register R0 of processing element PE(0,1,1). Y(1,0,1) is set in register R0 of processing element PE(1,0,1), and Y(1,1,1) is set in register R0 of processing element PE(1,1,1).
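The placement listed above follows directly from m = (−i − j + k) mod n and can be reproduced with a short Python check. The function name `initial_r0_index` and the `placement` dictionary are hypothetical names used only for this sketch; note that Python's `%` operator returns a non-negative remainder for a positive modulus, which is the convention the text relies on.

```python
def initial_r0_index(i, j, k, n):
    """Return the index (i, m, k) of the input Y loaded into R0 of PE(i, j, k)."""
    m = (-i - j + k) % n  # non-negative remainder for n > 0
    return (i, m, k)


n = 2  # array size in Embodiment 2
placement = {
    (i, j, k): initial_r0_index(i, j, k, n)
    for i in range(n)
    for j in range(n)
    for k in range(n)
}

for pe, y in sorted(placement.items()):
    print(f"PE{pe} <- Y{y}")
```

For each fixed i and k, the two processing elements along the j axis receive the two distinct values of the middle index, so every input sample is loaded exactly once.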

After the initial values of registers R0 to R6 have been set, the control circuit unit 18 of each processing element PE controls the selector unit 15 so that the data held in register R0 is output to the a input terminal of the arithmetic circuit unit 16, the data held in register R4 is output to the b input terminal, and the data held in register R1 is output to the c input terminal. Here, the data held in register R4 is the initial value of C(k, j); the data held in register R0 is the input data Y(i, m, k) for the three-dimensional inverse discrete cosine transform (3D-IDCT), where m = (−i − j + k) mod n; and the data held in register R1 is the initial value 0.

The arithmetic circuit unit 16 of each processing element PE performs a product-sum operation (d = a × b + c) using the data of register R0 input as data a, the data of register R4 input as data b, and the data of register R1 input as data c.

Next, the control circuit unit 18 of each processing element PE controls the output switch unit 17 to acquire the operation result d, data a, and data b. It outputs the operation result d to the k input terminal of the processing element PE adjacent in the k-axis direction, and outputs data a (specifically, the data that was held in register R0) to the j input terminal of the processing element PE adjacent in the −j-axis direction. The acquired data b is not output to any adjacent processing element PE.

The control circuit unit 18 of each processing element PE then controls the input switch unit 14 to acquire the operation result d from the adjacent processing element PE via the k input terminal and data a (the data of register R0) via the j input terminal. It records the operation result d in register R1 and records data a in register R0.

In this way, the control circuit unit 18 of each processing element PE repeats the process of outputting the operation result d and data a to the adjacent processing elements PE a number of times equal to the number of processing elements PE arranged along each axis, that is, n times (twice in Embodiment 2). The operation results and the predetermined data thereby make one full circuit along the torus-shaped path formed by connecting the k input and k output terminals and the torus-shaped path formed by connecting the j input and j output terminals. This circuit corresponds to the n product-sum operations of the first period. During the n product-sum operations of the first period, data b, which is input to the output switch unit 17 via the output terminal 22, is never output to an adjacent processing element PE.

Through these n repetitions, register R1 of each processing element PE comes to hold the result of the product-sum operations performed across all the processing elements PE arranged in the k-axis direction (the result of the n product-sum operations of the first period). Since n = 2 in Embodiment 2, the above process is performed only twice. Processing steps "0" and "1" in FIGS. 10 and 11 show the processing corresponding to these two iterations.

Subsequently, the control circuit unit 18 of each processing element PE controls the selector unit 15 so that the data held in register R5 is output to the a input terminal of the arithmetic circuit unit 16, the data held in register R1 is output to the b input terminal, and the data held in register R2 is output to the c input terminal. Here, the data held in register R5 is the initial value of C(i, k); the data held in register R1 is the result of the product-sum operations performed across all the processing elements PE arranged in the k-axis direction; and the data held in register R2 is the initial value 0.

The arithmetic circuit unit 16 of each processing element PE performs a product-sum operation (d = a × b + c) using the data of register R5 input as data a, the data of register R1 input as data b, and the data of register R2 input as data c.

Next, the control circuit unit 18 of each processing element PE controls the output switch unit 17 to acquire the operation result d, data a, and data b. It outputs the operation result d to the i input terminal of the processing element PE adjacent in the −i-axis direction, and outputs data b (specifically, the data that was held in register R1) to the k input terminal of the processing element PE adjacent in the k-axis direction. The acquired data a is not output to any adjacent processing element PE.

The control circuit unit 18 of each processing element PE then controls the input switch unit 14 to acquire the operation result d from the adjacent processing element PE via the i input terminal and data b (the data of register R1) via the k input terminal. It records the operation result d in register R2 and records data b in register R1.

In this way, the control circuit unit 18 of each processing element PE repeats the process of outputting the operation result d and data b to the adjacent processing elements PE a number of times equal to the number of processing elements PE arranged along each axis, that is, n times (twice in Embodiment 2). The operation results and the predetermined data thereby make one full circuit along the torus-shaped path formed by connecting the i input and i output terminals and the torus-shaped path formed by connecting the k input and k output terminals. This circuit corresponds to the n product-sum operations of the second period. During the n product-sum operations of the second period, data a, which is input to the output switch unit 17 via the output terminal 21, is never output to an adjacent processing element PE.

Through these n repetitions, register R2 of each processing element PE comes to hold the result of the product-sum operations performed across all the processing elements PE arranged in the i-axis direction (the result of the n product-sum operations of the second period). That result is computed from the contents of register R1, namely the result of the product-sum operations performed across all the processing elements PE arranged in the k-axis direction (the result of the n product-sum operations of the first period). Register R1 holds the result of the product-sum operations performed across all the processing elements PE arranged in the k-axis direction. Processing steps "2" and "3" in FIGS. 10 and 11 show the processing corresponding to these two iterations.

Subsequently, the control circuit unit 18 of each processing element PE controls the selector unit 15 so that the data held in register R6 is output to the a input terminal of the arithmetic circuit unit 16, the data held in register R2 is output to the b input terminal, and the data held in register R3 is output to the c input terminal. Here, the data held in register R6 is the initial value of C(i, j). The data held in register R2 is the result of the product-sum operations performed across all the processing elements PE arranged in the i-axis direction (the result of the n product-sum operations of the second period), which is based on the result of the product-sum operations performed across all the processing elements PE arranged in the k-axis direction (the result of the n product-sum operations of the first period). The data held in register R3 is the initial value 0.

The arithmetic circuit unit 16 of each processing element PE performs a product-sum operation (d = a × b + c) using the data of register R6 input as data a, the data of register R2 input as data b, and the data of register R3 input as data c.

Next, the control circuit unit 18 of each processing element PE controls the output switch unit 17 to acquire the operation result d, data a, and data b. It outputs the operation result d to the j input terminal of the processing element PE adjacent in the −j-axis direction, and outputs data b (specifically, the data that was held in register R2) to the i input terminal of the processing element PE adjacent in the −i-axis direction. The acquired data a is not output to any adjacent processing element PE.

Then, the control circuit unit 18 of each processing element PE controls the input switch unit 14 to receive the operation result d from an adjacent processing element PE via the j input terminal and data b (the register R2 data) via the i input terminal. The operation result d is stored in register R3, and data b is stored in register R2.

In this way, the control circuit unit 18 of each processing element PE repeats the process of outputting the operation result d and data b to the adjacent processing elements PE as many times as there are processing elements PE along each axis, that is, n times (twice in Embodiment 2). The operation results and the data thereby make one full circuit along the torus-shaped path formed by connecting the j input and j output terminals and along the torus-shaped path formed by connecting the i input and i output terminals. This one circuit corresponds to the n product-sum operations of the third period. During the n product-sum operations of the third period, the data a input to the output switch unit 17 via the output terminal 21 is never output to an adjacent processing element PE.
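The circulation described above can be sketched in Python for a single n × n slice of the array (fixed k). This is only a model of the data movement stated in the text: a stays in place, d moves to the neighbour in the −j direction, and b moves to the neighbour in the −i direction. The function name `third_period` and the sample values are assumptions, and the initial placement of the data in the actual device is not reproduced here.

```python
def third_period(a, r2, n):
    """Simulate n synchronous steps on one n-by-n (i, j) torus slice.

    a[i][j]  : stationary coefficient (plays the role of register R6)
    r2[i][j] : data b, passed to the neighbour in the -i direction each step
    Returns (r2, r3) after n steps; r3 holds the circulated partial sums.
    """
    r2 = [row[:] for row in r2]
    r3 = [[0] * n for _ in range(n)]  # register R3 starts at 0
    for _ in range(n):
        # Every PE computes d = a * b + c at the same time.
        d = [[a[i][j] * r2[i][j] + r3[i][j] for j in range(n)] for i in range(n)]
        # PE(i, j) receives d from PE(i, j+1) and b from PE(i+1, j), with wrap-around.
        r3 = [[d[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        r2 = [[r2[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return r2, r3


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
r2_final, r3_final = third_period(a, b, 2)
```

After n steps, data b is back in the PE where it started, and each R3 holds a sum of n products, one collected at each PE on its j ring.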

Through these n repetitions, register R3 of each processing element PE comes to hold the result computed by all the processing elements PE arranged along the j-axis (the result of the n product-sum operations of the third period), using the product-sum results of all the processing elements PE arranged along the i-axis (the result of the n product-sum operations of the second period), which were in turn based on the product-sum results of all the processing elements PE arranged along the k-axis (the result of the n product-sum operations of the first period). This is the result of the three-dimensional inverse discrete cosine transform (3D-IDCT). Register R2 holds the result of the product-sum operations performed by all the processing elements PE arranged along the i-axis (the result of the n product-sum operations of the second period), based on the result of the product-sum operations performed by all the processing elements PE arranged along the k-axis (the result of the n product-sum operations of the first period). This is the result of the two-dimensional inverse discrete cosine transform over the i-axis and k-axis directions. Register R1 holds the result of the one-dimensional inverse discrete cosine transform in the i-axis direction. Because n = 2 in Embodiment 2, the above processing is performed only twice. Processing steps "4" and "5" shown in FIGS. 10 and 11 correspond to these two iterations.
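The way the three periods build up a three-dimensional result from one-dimensional sums can be illustrated with a short Python sketch. It covers only the arithmetic (one pass per axis, in the order k, i, j) and not the movement of data between processing elements. The orthonormal DCT-II matrix, the function names, and the sample input are assumptions chosen for illustration; the intermediate names `r1`, `r2`, and `r3` only loosely echo the registers.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix C[u][x]; its transpose is the inverse transform."""
    return [[math.sqrt((1 if u == 0 else 2) / n)
             * math.cos((2 * x + 1) * u * math.pi / (2 * n))
             for x in range(n)] for u in range(n)]


def idct3_by_passes(X, C):
    """Apply the inverse transform along k, then i, then j; return all three stages."""
    n = len(C)
    rng = range(n)
    # Pass 1: sum over the k-axis frequency index w.
    r1 = [[[sum(C[w][k] * X[u][v][w] for w in rng) for k in rng] for v in rng] for u in rng]
    # Pass 2: sum over the i-axis frequency index u.
    r2 = [[[sum(C[u][i] * r1[u][v][k] for u in rng) for k in rng] for v in rng] for i in rng]
    # Pass 3: sum over the j-axis frequency index v.
    r3 = [[[sum(C[v][j] * r2[i][v][k] for v in rng) for k in rng] for j in rng] for i in rng]
    return r1, r2, r3


def idct3_direct(X, C):
    """Reference: the full triple sum evaluated directly at every output point."""
    n = len(C)
    rng = range(n)
    return [[[sum(C[u][i] * C[v][j] * C[w][k] * X[u][v][w]
                  for u in rng for v in rng for w in rng)
              for k in rng] for j in rng] for i in rng]


n = 2
C = dct_matrix(n)
X = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
_, _, by_passes = idct3_by_passes(X, C)
direct = idct3_direct(X, C)
max_diff = max(abs(by_passes[i][j][k] - direct[i][j][k])
               for i in range(n) for j in range(n) for k in range(n))
```

The three-pass result agrees with the direct triple sum up to floating-point rounding, which is why each period needs only n product-sum steps per element rather than n³.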

Thus, in the array processor 1 according to Embodiment 2, each processing element PE performs product-sum operations while operation results and the data used in those operations are passed successively to adjacent processing elements PE, and the direction in which the results are passed is switched in turn to the k-axis, i-axis, and j-axis directions. In this way, the product-sum result of each arithmetic circuit unit 16 can be stored in register R3. Consequently, unlike conventional array processors, there is no need for a dedicated circuit that reorders the input of data and coefficient data partway through the computation, so the circuit configuration can be simplified. Moreover, because the three-dimensional inverse discrete cosine transform (a three-dimensional inverse orthogonal transform) is carried out by passing operation results to adjacent processing elements PE in sequence, the data and coefficient matrix elements held inside the processing elements PE do not have to be exchanged repeatedly over complex wiring between the computations for each dimension, as was conventionally required. Processing can therefore be made faster and simpler.

As described above, using the array processor 1 according to Embodiment 2 eliminates the exchange of data partway through the computation that was indispensable in conventional configurations. In addition, by connecting a plurality of processing elements and processing the operation results in sequence, the three-dimensional inverse discrete cosine transform (three-dimensional inverse orthogonal transform) can be executed quickly while keeping the circuit configuration from becoming more complex.

The array processor according to the present invention has been described above in detail with reference to the drawings. However, the array processor according to the present invention is not limited to the configurations described above. It is clear that a person skilled in the art could conceive of various changes and modifications within the scope of the claims, and these are naturally understood to fall within the technical scope of the present invention.

For example, Embodiments 1 and 2 were described using a configuration in which two processing elements PE are arranged along each of the i-axis, j-axis, and k-axis directions, for a total of eight. However, the array processor according to the present invention is not limited to two processing elements per axis; any number of two or more (for example, n) may be arranged along each axis. Even with this larger number of processing elements, the operations for the three-dimensional orthogonal transform and the three-dimensional inverse orthogonal transform can be performed by having each processing element PE output operation results and data to its adjacent processing elements PE and circulating them once around, following the method shown in Embodiments 1 and 2.

In the array processor 1 according to Embodiments 1 and 2, each processing element was, for convenience of explanation, placed at a coordinate position associated with the i-axis, j-axis, and k-axis. When the processing elements PE are actually laid out, however, a physical three-dimensional (solid) arrangement is not required. Any layout may be used provided that the three input terminals and three output terminals of each processing element PE are connected in the corresponding manner described above. For example, the processing elements may be laid out in a plane by defining the three axes on the same plane.
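To illustrate why the physical layout is free, the sketch below places the processing elements in a flat, row-major order and derives the torus wiring purely from modular arithmetic on the logical coordinates. The row-major layout, the function names `flat_index` and `neighbour`, and the `wiring` table are hypothetical choices for this example only.

```python
def flat_index(i, j, k, n):
    """Position of PE(i, j, k) in a hypothetical flat (row-major) layout."""
    return (i * n + j) * n + k


def neighbour(i, j, k, axis, n, step=-1):
    """Logical torus neighbour along 'i', 'j' or 'k'; step=-1 is the negative direction."""
    pos = {"i": i, "j": j, "k": k}
    pos[axis] = (pos[axis] + step) % n  # wrap-around closes the ring
    return pos["i"], pos["j"], pos["k"]


n = 2
# For every PE, the flat position its output terminal on each axis is wired to.
wiring = {
    flat_index(i, j, k, n): {ax: flat_index(*neighbour(i, j, k, ax, n), n) for ax in "ijk"}
    for i in range(n) for j in range(n) for k in range(n)
}
```

Each axis of the table is a permutation of the flat positions, so every element has exactly one sender and one receiver per axis regardless of where it physically sits.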

Embodiments 1 and 2 described an array processor that computes the three-dimensional discrete cosine transform and the three-dimensional inverse discrete cosine transform as examples of three-dimensional orthogonal transform processing. However, the array processor according to the present invention is not limited to these transforms. By replacing the initial values (DCT coefficients) denoted C(i, k), C(k, j), and C(i, j) with appropriate coefficients, it can also be used for other three-dimensional orthogonal transforms such as the Walsh-Hadamard transform (WHT) and the discrete Fourier transform (DFT).
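As a rough illustration of swapping the coefficients, the following Python builds normalised DCT, WHT, and DFT coefficient matrices and checks that each one is orthogonal (unitary in the DFT case). The function names, the chosen normalisations, and the natural (Sylvester) ordering of the WHT are assumptions of this sketch; the patent does not specify them in this section.

```python
import cmath
import math


def dct_coeffs(n):
    """Orthonormal DCT-II coefficients."""
    return [[math.sqrt((1 if u == 0 else 2) / n)
             * math.cos((2 * x + 1) * u * math.pi / (2 * n))
             for x in range(n)] for u in range(n)]


def wht_coeffs(n):
    """Normalised Walsh-Hadamard coefficients in natural order; n must be a power of two."""
    return [[(-1) ** bin(u & x).count("1") / math.sqrt(n)
             for x in range(n)] for u in range(n)]


def dft_coeffs(n):
    """Normalised discrete Fourier transform coefficients (complex)."""
    return [[cmath.exp(-2j * math.pi * u * x / n) / math.sqrt(n)
             for x in range(n)] for u in range(n)]


def is_unitary(C, tol=1e-9):
    """True if the rows of C are orthonormal, i.e. C times its conjugate transpose is I."""
    n = len(C)
    return all(
        abs(sum(C[u][x] * complex(C[v][x]).conjugate() for x in range(n)) - (u == v)) < tol
        for u in range(n) for v in range(n)
    )


checks = {name: is_unitary(make(4))
          for name, make in (("DCT", dct_coeffs), ("WHT", wht_coeffs), ("DFT", dft_coeffs))}
```

Because all three matrices share this property, the same product-sum schedule applies to each; only the stored coefficient values differ. The DFT case would additionally require complex arithmetic in the product-sum unit.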

Furthermore, the array processor 1 shown in Embodiments 1 and 2 was described for the case where the arithmetic circuit units 6 and 16 are implemented as dedicated physical arithmetic circuits. Instead of using such physical circuits, the arithmetic capability of the control circuit units 8 and 18 may be used to perform the operations corresponding to those of the arithmetic circuit units 6 and 16, so that no separate circuit for the operations is physically provided.

When the array processor 1 shown in Embodiments 1 and 2 is used, the data transfer directions along the i-axis, j-axis, and k-axis and the processing order, that is, the axis along which data is output first, can be set and changed freely. Likewise, the choice of which of registers R0 to R6 in each processing element PE(i, j, k) stores the data and initial values (DCT coefficients) input to that processing element can be set and changed freely. Even when the data transfer direction, the processing order, or the storage assignment is changed, the same results as those shown in Embodiments 1 and 2 can be obtained, and there are many ways to obtain the same results besides the methods shown in Embodiments 1 and 2. It is clear that a person skilled in the art could readily conceive of such changes within the scope of the claims, and they are naturally understood to fall within the technical scope of the present invention.
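The freedom to reorder the axes follows from the separable structure of the transform, which the short Python check below illustrates. It applies a one-dimensional transform along the three axes in every possible order and compares the results. The helper `transform_axis`, the unnormalised 2-point WHT matrix, and the integer test data are assumptions chosen so that the comparison is exact.

```python
from itertools import permutations


def transform_axis(X, C, axis):
    """Apply Y[..p..] = sum_m C[p][m] * X[..m..] along one axis of an n*n*n cube."""
    n = len(C)
    rng = range(n)

    def at(pos, m):
        idx = list(pos)
        idx[axis] = m  # replace the coordinate on the transformed axis
        return X[idx[0]][idx[1]][idx[2]]

    return [[[sum(C[(i, j, k)[axis]][m] * at((i, j, k), m) for m in rng)
              for k in rng] for j in rng] for i in rng]


C = [[1, 1], [1, -1]]                      # unnormalised 2-point WHT, integers only
X = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

results = []
for order in permutations((0, 1, 2)):      # axis 0 = i, 1 = j, 2 = k
    Y = X
    for axis in order:
        Y = transform_axis(Y, C, axis)
    results.append(Y)
all_equal = all(Y == results[0] for Y in results)
```

All six axis orders give the same cube, so the order k, i, j used in Embodiment 2 is one choice among several equivalent ones.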

Description of symbols
1 ... array processor
4, 14 ... input switch unit
5, 15 ... selector unit
6, 16 ... arithmetic circuit unit
7, 17 ... output switch unit
8, 18 ... control circuit unit
10, 20 ... output terminal (of the arithmetic circuit unit)
21, 22 ... output terminal (from the selector unit to the output switch unit)
PE ... processing element
R0 to R6 ... registers

Claims

An array processor that performs three-dimensional orthogonal transform processing, in which:
a conceptual three-dimensional arrangement is formed by arranging n processing elements, each having a product-sum operation function, along each of three axial directions;
each processing element is provided with three sets of an input terminal and an output terminal, one set for each axial direction, and the output terminal of one processing element is connected to the input terminal, for the same axial direction, of the processing element adjacent to it in that axial direction, so that the three input terminals and three output terminals of each processing element are connected in a torus shape along the respective axial directions;
each processing element outputs an operation result, obtained by performing a product-sum operation with the product-sum operation function, from the output terminal for one axial direction to the processing element adjacent in that axial direction, and outputs operation data used in the product-sum operation from the output terminal for another axial direction to the processing element adjacent in that other axial direction;
a processing element that has acquired the operation result and the operation data from processing elements adjacent in the respective different axial directions performs a product-sum operation using the acquired operation result and operation data, and outputs the operation result of that product-sum operation and the operation data to the processing elements adjacent in the respective axial directions from the output terminals corresponding to the input terminals through which they were acquired, whereby all the processing elements connected in a torus shape in one axial direction execute n product-sum operations of a first period in synchronization with one another;
after the n product-sum operations of the first period, each processing element changes the axial direction of the output terminal from which the operation result is output and correspondingly changes the axial direction of the output terminal from which the operation data is output, and n product-sum operations of a second period are executed in synchronization with one another; and
after the n product-sum operations of the second period, each processing element changes the axial direction of the output terminal from which the operation result is output to an axial direction different from those of the first period and the second period, correspondingly changes the axial direction of the output terminal from which the operation data is output to an axial direction different from those of the first period and the second period, and n product-sum operations of a third period are executed in synchronization with one another;
wherein each processing element has:
one operand value storage means for storing an operand value used in the product-sum operation;
three input information storage means for storing the operation result or the operation data input via the input terminals;
three constant value storage means for storing constant values determined according to the calculation method of the product-sum operation function;
arithmetic processing means for performing a product-sum operation using any of the operation result, the operation data, the operand value, and the constant value;
input switch means for guiding information input from any of the three input terminals to either the input information storage means or the operand value storage means;
output switch means for outputting the operation data and the operation result of the product-sum operation performed by the arithmetic processing means from any of the three output terminals;
selector means for reading three data items from among the operand value storage means, the input information storage means, and the constant value storage means and guiding them to the arithmetic processing means; and
control means for controlling the input switch means, the output switch means, and the selector means;
and wherein, when the operation result is input via an input terminal, the control means controls the input switch means to guide the operation result to one of the input information storage means, controls the selector means to guide the operation result read from that input information storage means to the arithmetic processing means, and then controls the output switch means so that, when the operation is not the n-th operation of a period, the operation result of the product-sum operation by the arithmetic processing means is output from the output terminal for the axial direction corresponding to the input terminal through which the operation result was input, and, when the operation is the n-th operation of a period, the operation result of the product-sum operation by the arithmetic processing means is output from an output terminal for an axial direction different from that of the input terminal through which the operation result was input.