JP5601817B2

JP5601817B2 - Parallel processing unit

Info

Publication number: JP5601817B2
Application number: JP2009247807A
Authority: JP
Inventors: 勇一郎村地; 伸一服部
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2009-10-28
Filing date: 2009-10-28
Publication date: 2014-10-08
Anticipated expiration: 2029-10-28
Also published as: JP2011095908A

Description

The present invention relates to a SIMD (Single Instruction Multiple Data-path) parallel arithmetic processing device in which a plurality of interconnected arithmetic elements perform parallel processing in response to a single instruction.

SIMD processors are used today in a variety of fields. A SIMD processor processes a plurality of data items simultaneously with a single instruction. One conventional SIMD processor comprises a first shift register that performs serial-to-parallel conversion of input serial data, a plurality of processors that process the parallel data output from the first shift register in parallel under the same program, and a second shift register that performs parallel-to-serial conversion of the parallel data output from these processors and outputs the result as output serial data (see, for example, Patent Document 1).

Patent Document 1: JP-A-5-20448

In the conventional parallel arithmetic processing device, however, even though a plurality of arithmetic elements are provided, the interconnection mechanism through which an arithmetic element refers to data held by another arithmetic element a desired distance away operates on all elements simultaneously and in the same manner. Consequently, when a SIMD processor with M arithmetic elements (M ≥ 2) arranged in a row performs the same arithmetic processing on O data strings (O ≥ 2), each containing N data items (N < M), and each data string must be processed independently (that is, no cross-reference between data strings is wanted), the O data strings have to be processed sequentially, one string at a time. In that case only N of the M arithmetic elements can be used in parallel, the full processing capability of the SIMD processor cannot be exploited, and processing time is lost.

Further, in the conventional parallel arithmetic processing device, when a single SIMD processor is required to process both sequential input and bus input, a SIMD processor with a raster input interface needs block-to-raster conversion (parallel-to-serial conversion) in order to process block input, and a SIMD processor with a block input interface needs raster-to-block conversion (serial-to-parallel conversion) in order to process raster input. Either conversion costs processing time.

The present invention has been made to solve the problems described above, and its object is to provide a parallel arithmetic processing device capable of faster processing.

The parallel arithmetic processing device according to the present invention comprises a plurality of arithmetic unit groups, each having a plurality of arithmetic elements arranged one-dimensionally and connected to one another. The arithmetic unit groups are connected by an interconnection mechanism into a one-dimensional torus or ring, and the arithmetic elements operate as a SIMD machine. The device includes a controller which, by controlling the interconnection mechanism, switches between two modes: a mode in which an arithmetic element takes as input the output of any one other arithmetic element locally within its own arithmetic unit group, and a mode in which all the arithmetic unit groups are regarded as a single one-dimensional torus or continuous ring and an arithmetic element takes as input any one of the outputs of the arithmetic elements within a predetermined distance, with wraparound taken into account.

In the parallel arithmetic processing device of the present invention, the controller controls the interconnection mechanism to switch between a mode in which the output of any one other arithmetic element within the same arithmetic unit group is taken as input and a mode in which all the arithmetic unit groups are regarded as a one-dimensional torus or continuous ring and any one of the outputs of the arithmetic elements within a predetermined distance, with wraparound taken into account, is taken as input. Processing can therefore be made faster.

FIG. 1 is a block diagram showing a parallel arithmetic processing device according to Embodiment 1 of the present invention.
FIG. 2 is an explanatory diagram showing the raster input format and the block input format, which are common input formats for image data.
FIG. 3 is a block diagram showing the input interface of the parallel arithmetic processing device according to Embodiment 1.
FIG. 4 is an explanatory diagram of the input interface of Embodiment 1 operated as a raster input interface.
FIG. 5 is an explanatory diagram of the input interface of Embodiment 1 operated as a bus input interface.
FIG. 6 is a block diagram showing the output interface of the parallel arithmetic processing device according to Embodiment 1.
FIG. 7 is an explanatory diagram of the output interface of Embodiment 1 operated as a raster output interface.
FIG. 8 is an explanatory diagram of the output interface of Embodiment 1 operated as a bus output interface.
FIG. 9 is a block diagram showing the interconnection mechanism of the parallel arithmetic processing device according to Embodiment 1.
FIG. 10 is an explanatory diagram showing the operation of the interconnection mechanism of Embodiment 1.
FIG. 11 is a block diagram showing the connection selection mechanism of the parallel arithmetic processing device according to Embodiment 1.
FIG. 12 is a block diagram of the interconnection mechanism, the connection selection mechanism, and the arithmetic elements of Embodiment 1 for N = 16 and M = 8.
FIG. 13 is a block diagram of a parallel arithmetic processing device according to Embodiment 2 of the present invention.
FIG. 14 is a block diagram of a parallel arithmetic processing device according to Embodiment 3 of the present invention.
FIG. 15 is a block diagram of a parallel arithmetic processing device according to Embodiment 4 of the present invention.
FIG. 16 is a block diagram of a parallel arithmetic processing device according to Embodiment 5 of the present invention.
FIG. 17 is a block diagram showing the vertical interconnection mechanism of the parallel arithmetic processing device according to Embodiment 5.
FIG. 18 is a block diagram showing the vertical connection selection mechanism of the parallel arithmetic processing device according to Embodiment 5.

Embodiment 1.
FIG. 1 is a block diagram showing a parallel arithmetic processing device according to Embodiment 1 of the present invention.
The parallel arithmetic processing device shown in FIG. 1 is a SIMD processor comprising arithmetic elements 1, an input interface 2, an output interface 3, a controller 4, interconnection mechanisms 5, and connection selection mechanisms 6. A plurality of arithmetic elements 1 are arranged one-dimensionally; each performs arithmetic operations and records intermediate results, and includes an ALU (Arithmetic Logic Unit) 11 and internal memories 12a and 12b. The ALU 11 is an arithmetic unit that performs the desired operation in accordance with an arithmetic instruction from the controller 4. The internal memories 12a and 12b are data recording means that record the results output from the ALU 11 and transfer them to the ALU 11 and the output interface 3.

The input interface 2 receives data, for example image data, that is input from outside in raster format or block format and transfers it into the parallel arithmetic processing device. It has input buffers 21 and selectors 22. The input buffers 21 are provided in the same number as the arithmetic elements 1, one per element; they record the input data received from outside and transfer it in parallel to the respective arithmetic elements 1. The selectors 22 are input format selection means for selecting the input data format of the input interface 2.

FIG. 2 is an explanatory diagram showing the raster input format and the block (bus) input format, which are common input formats for image data: (a) shows the raster input format and (b) shows the block (bus) input format. The circles in the figure represent pixel data. The block input format is illustrated with a bus input of 4×4 data, and the bus width is taken to be 4 data items as an example. The numbers in the figure indicate the usual input order in each format.

Returning to FIG. 1, the output interface 3 receives the output data from the arithmetic elements 1 and transfers it to the outside in raster format or block format. It has output buffers 31 and selectors 32. The output buffers 31 are provided in the same number as the arithmetic elements 1, one per element; they record the result data output from each arithmetic element 1 and transfer it to the outside. The selectors 32 are output format selection means for selecting the output data format of the output interface 3.

The controller 4 is a control unit that sends a single arithmetic instruction to the arithmetic elements 1, the input interface 2, the output interface 3, the interconnection mechanisms 5, and the connection selection mechanisms 6, and thereby controls the arithmetic processing. The interconnection mechanism 5 connects the arithmetic elements 1 to one another and, in accordance with an arithmetic instruction from the controller 4, transfers data from an arithmetic element 1 at an arbitrary distance; it serves as the means for selecting the reference data distance. The connection selection mechanisms 6 are placed at fixed intervals between the interconnection mechanisms 5 and select the connection configuration of the interconnection mechanisms 5; they serve as the means for selecting the reference data group.

FIG. 3 shows the input interface 2 in detail.
Each selector 22 selects the input format in accordance with a control signal from the controller 4. Each selector 22 receives raster input data and bus input data and outputs the selected one to its input buffer 21. Each input buffer 21 receives the data from its selector 22 under a control signal from the controller 4 and outputs it to the corresponding arithmetic element 1. With this configuration the input interface 2 provides two functions: a raster input interface and a bus input interface. The bus input is given a width of 4 data items here as an example, but the bus width is not limited to 4.

FIG. 4 is an explanatory diagram of the input interface 2 operating as a raster input interface; the data flow is indicated by broken lines.
When data is input in raster form, the controller 4 supplies selector control signals such that the leftmost selector 22 selects the external raster input data and every other selector 22 selects the data of the input buffer 21 to its left; in other words, every selector 22 selects its left-hand input signal.
The controller 4 also supplies input buffer control signals such that all the input buffers 21 receive data. The leftmost input buffer 21 thus receives the external raster input data while each of the other input buffers 21 receives the data of the input buffer 21 to its left, which gives operation equivalent to a raster input interface.
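The raster-input behavior described above amounts to a shift register. The following is a minimal Python sketch of that behavior only; the function names `raster_shift_in` and `load_raster` and the use of `None` for an empty buffer are illustrative assumptions and do not appear in the patent.

```python
def raster_shift_in(buffers, sample):
    """One clock of raster-input mode (assumed model).

    The leftmost buffer takes the external sample; every other buffer
    takes the value its left neighbour held before the clock.
    """
    return [sample] + buffers[:-1]


def load_raster(samples, num_buffers):
    """Feed a raster stream, one sample per clock, into empty buffers."""
    buffers = [None] * num_buffers  # None marks a buffer that holds no data yet
    for sample in samples:
        buffers = raster_shift_in(buffers, sample)
    return buffers


loaded = load_raster([10, 11, 12, 13], num_buffers=4)
print(loaded)
```

In this model the first sample of the stream ends up in the rightmost buffer once as many samples as buffers have been shifted in.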

FIG. 5 is an explanatory diagram of the input interface 2 operating as a bus input interface; the data flow is indicated by broken lines.
When data is input over the bus, the controller 4 supplies selector control signals such that each selector 22 selects its upper-right input signal.
The controller 4 supplies input buffer control signals such that only the buffers to be written receive data. Any run of consecutive input buffers 21 can thus receive the bus input data, which gives operation equivalent to a bus input interface.
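Bus-input mode can be pictured as a masked parallel write. The sketch below assumes the example bus width of 4 and assumes that bus lane k is written to buffer `start + k`; the lane-to-buffer wiring, the name `bus_write`, and the parameter `start` are illustrative assumptions, not details given in the text.

```python
BUS_WIDTH = 4  # example width from the text; other widths are possible


def bus_write(buffers, start, bus_data):
    """Write one bus word into BUS_WIDTH consecutive buffers (assumed model).

    Only the buffers enabled by the buffer control signal, modelled here as
    the range start .. start + BUS_WIDTH - 1, take new data. All other
    buffers keep their previous contents.
    """
    if len(bus_data) != BUS_WIDTH:
        raise ValueError("bus word must contain exactly BUS_WIDTH items")
    if start < 0 or start + BUS_WIDTH > len(buffers):
        raise IndexError("write window falls outside the buffer row")
    updated = list(buffers)
    updated[start:start + BUS_WIDTH] = bus_data
    return updated


row = bus_write([0] * 8, start=4, bus_data=[1, 2, 3, 4])
print(row)
```

Filling a row of 8 buffers takes two such writes, against eight clocks in raster mode, which is why no separate serial-to-parallel step is needed for block data.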

FIG. 6 shows the output interface 3 in detail.
Each selector 32 selects, in accordance with a control signal from the controller 4 (the upper selector control signal), between the data from the arithmetic element 1 and the shift data used for raster output. For the shift data, each selector 32 other than the leftmost one in the drawing receives the output of the output buffer 31 immediately to its left. Each selector 33 selects one data item at an arbitrary position in accordance with a control signal from the controller 4 (the lower selector control signal). Each output buffer 31 receives the data from its selector 32 under a control signal from the controller 4.
With this configuration the output interface 3 provides two functions: a raster output interface and a bus output interface. The bus output is given a width of 4 data items here as an example, but the bus width is not limited to 4.

The operation of taking in data from the arithmetic elements 1 is common to raster output and bus output. The upper selector control signal from the controller 4 selects the right-hand input of each selector 32, and at the same time the controller 4 supplies the output buffer control signal. The data transferred from the arithmetic elements 1 is thereby received by the output buffers 31 in parallel.

FIG. 7 is an explanatory diagram of the output interface 3 operating as a raster output interface; the data flow is indicated by broken lines.
When data is output in raster form, the upper selector control signal from the controller 4 selects the left-hand input of each selector 32, and at the same time the controller 4 supplies the output buffer control signal. All the output buffers 31 thereby simultaneously transfer their data to the adjacent output buffer 31 (the one to the right in the figure) and receive data from the adjacent output buffer 31 (the one to the left in the figure). The rightmost output buffer 31 performs the transfer to the outside.
In this way the output interface 3 can operate as a raster output interface.

FIG. 8 is an explanatory diagram of the output interface 3 operating as a bus output interface; the data flow is indicated by broken lines.
When data is output over the bus, the lower selector control signal from the controller 4 controls the four selectors 33 so that they select four consecutive output buffers at the position to be output. In this way the output interface 3 can operate as a bus output interface.
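The two output modes can be sketched side by side. This is a simplified Python model under stated assumptions: the function names are hypothetical, the leftmost buffer is filled with `None` during raster shifting because the text does not say what it receives, and the bus window is modelled as a slice of width 4.

```python
def capture(results):
    """Common first step: all output buffers latch the element results in parallel."""
    return list(results)


def raster_shift_out(buffers, fill=None):
    """One clock of raster output: emit the rightmost buffer, shift the rest right."""
    emitted = buffers[-1]
    shifted = [fill] + buffers[:-1]  # 'fill' for the leftmost buffer is an assumption
    return emitted, shifted


def drain_raster(buffers):
    """Emit every buffered value as a serial stream, rightmost value first."""
    stream = []
    for _ in range(len(buffers)):
        emitted, buffers = raster_shift_out(buffers)
        stream.append(emitted)
    return stream


def bus_read(buffers, start, width=4):
    """Bus output: the lower selectors pick 'width' consecutive buffers."""
    if start < 0 or start + width > len(buffers):
        raise IndexError("read window falls outside the buffer row")
    return buffers[start:start + width]


held = capture([1, 2, 3, 4, 5, 6, 7, 8])
print(drain_raster(held))
print(bus_read(held, start=2))
```

Both modes start from the same captured buffer contents, so switching output format does not require re-running the arithmetic elements in this model.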

With the input interface 2 and the output interface 3 described above, raster input and bus input can both be handled by a single configuration. Compared with the conventional approach, when both raster input and bus input are required of a single SIMD processor, the arithmetic processing needed for serial-to-parallel or parallel-to-serial conversion can be reduced.

FIG. 9 shows the interconnection mechanism 5 in detail.
The selector 51 in the figure is a selection means that selects, in accordance with a control signal from the controller 4, the data transferred from an arithmetic element 1 at an arbitrary distance. Only the part of the interconnection mechanism 5 that refers to the data of arithmetic elements 1 on the left is shown. The part that refers to the data of arithmetic elements 1 on the right has the same (mirror-symmetric) configuration and operation, so its description is omitted.
The bracketed values [ ] in the figure denote arithmetic element positions. Arithmetic element [x] is the element directly below the interconnection mechanism 5; for example, arithmetic element [x−8] is the element eight positions to the left and arithmetic element [x+4] is the element four positions to the right.
In accordance with an arithmetic instruction from the controller 4, all the interconnection mechanisms 5 simultaneously transfer data from the internal memory 12a or 12b of the arithmetic element 1 located the desired distance away. For example, when a transfer of distance 1 to the left is commanded, each interconnection mechanism 5 transfers the internal-memory data from the arithmetic element 1 one position to its right, as shown by the broken lines in FIG. 10. The connection selection mechanisms 6 are omitted from FIG. 10.
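The simultaneous transfer can be sketched as a uniform neighbour read. The Python below uses the bracket notation of FIG. 9, so element [x] receives the value held by element [x + offset]. The function name `neighbour_read` is hypothetical, and the ring wraparound at the two ends of the row is an assumption made so that every element has a source.

```python
def neighbour_read(values, offset):
    """Every element x simultaneously reads the value held by element [x + offset].

    A negative offset refers to an element on the left, a positive offset to
    an element on the right. Wraparound at the ends is an assumption here.
    """
    n = len(values)
    return [values[(x + offset) % n] for x in range(n)]


held = [0, 1, 2, 3]
print(neighbour_read(held, 1))   # each element takes its right neighbour's value
print(neighbour_read(held, -1))  # each element takes its left neighbour's value
```

Under this convention the "distance 1 to the left" transfer in the text, in which each element takes data from the element one position to its right, corresponds to `offset = 1`: the data as a whole moves one place to the left.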

This function of the interconnection mechanism 5 makes it possible to refer to data held by nearby arithmetic elements 1 without operating the input interface 2. The example shown here is an interconnection mechanism 5 that can refer to elements up to a distance of 8 to the left, but the number of elements and the maximum distance are not limited to 8. The interconnection mechanism 5 may also refer to arithmetic elements that are not consecutive.

FIG. 11 shows the connection selection mechanism 6 in detail. As with the interconnection mechanism 5, only the part that refers to the data of arithmetic elements 1 on the left is shown. The part that refers to the data of arithmetic elements 1 on the right has the same (mirror-symmetric) configuration and operation, so its description is omitted.
The connection selection mechanisms 6 are placed every M interconnection mechanisms 5 (equivalently, every M arithmetic elements 1). Here M is a number satisfying M ≤ N/2, where N is the total number of interconnection mechanisms 5. Each selector 61 is a selection means that, in accordance with the selector control signal from the controller 4, selects either the data from arithmetic element [x'−O] or the data from arithmetic element [x'+M−O] (1 ≤ O ≤ 8 in the figure) and outputs it toward arithmetic element [x'+8−O].
The connection selection mechanism 6 thus selects either the 8 data items from arithmetic elements [x'−1] to [x'−8] or the 8 data items from arithmetic elements [x'+M−1] to [x'+M−8], and can output them to arithmetic elements [x'] to [x'+7]. Here x' is the value for which the arithmetic element immediately to the right of the connection selection mechanism is arithmetic element [x'].

FIG. 12 shows, as an example, part of the interconnection mechanisms 5, the connection selection mechanisms 6, and the arithmetic elements 1 in detail for N = 16 and M = 8.
When the controller 4 supplies the connection selection mechanisms 6 with a control signal that selects the data of arithmetic element [x'−O], the 16 interconnection mechanisms for arithmetic elements [0] to [15] operate as one connected whole.
When the controller 4 instead supplies a control signal that selects arithmetic element [x'+M−O], the 8 interconnection mechanisms for arithmetic elements [0] to [7] form one group and the 8 interconnection mechanisms for arithmetic elements [8] to [15] form another, so that the interconnection mechanisms 5 can be used as two independent sets of 8 parallel elements.
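The effect of the two selector settings can be sketched for left references with N = 16 and M = 8. In this hypothetical Python model, `local=False` corresponds to selecting element [x'−O] (one ring of N elements) and `local=True` to selecting element [x'+M−O] (independent rings of M elements). The function name, the flag, and the simplification that N is a multiple of M are assumptions for illustration; the text notes that N need not be a multiple of M.

```python
def read_left(values, distance, group_size, local):
    """Each element x reads the element 'distance' positions to its left.

    local=False: all elements form one ring, so the source is (x - distance) mod N.
    local=True:  each block of 'group_size' elements is its own ring, so a
                 reference that would cross the block boundary wraps back
                 into the same block.
    Assumes len(values) is a multiple of group_size (a simplification).
    """
    n = len(values)
    result = []
    for x in range(n):
        if local:
            base = (x // group_size) * group_size  # first element of x's block
            source = base + (x - base - distance) % group_size
        else:
            source = (x - distance) % n
        result.append(values[source])
    return result


held = list(range(16))  # element x holds the value x
whole_ring = read_left(held, distance=1, group_size=8, local=False)
two_rings = read_left(held, distance=1, group_size=8, local=True)
print(whole_ring)
print(two_rings)
```

In the whole-ring setting element [8] reads element [7], which belongs to the other data string. In the two-ring setting it reads element [15] instead, so the two 8-element strings never see each other's data.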

Conventionally, when each data string required independent cross-referencing and a single SIMD processor had to process both large data strings of P data items and small data strings of Q data items (Q ≤ P/2), the small data strings had to be processed sequentially, one at a time.
With the connection selection mechanism 6 of Embodiment 1, however, configuring the SIMD processor with N = P and M = Q allows P/Q small data strings (fractions rounded down) to be processed simultaneously. The processing speed for small data strings is thus improved by a factor of P/Q over the conventional approach.
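The P/Q figure can be checked with a short calculation. The Python below is an illustrative sketch; the function names and the idea of counting "passes" (one pass meaning one run of the SIMD program over the element row) are assumptions introduced here, not terms from the patent.

```python
def strings_per_pass(p, q):
    """Number of small data strings (Q items each) that fit in P elements, rounded down."""
    if q <= 0 or q > p:
        raise ValueError("require 0 < Q <= P")
    return p // q


def passes_needed(num_strings, p, q):
    """Passes needed to process 'num_strings' small strings, P // Q at a time."""
    per_pass = strings_per_pass(p, q)
    return -(-num_strings // per_pass)  # ceiling division


print(strings_per_pass(16, 8))
print(passes_needed(6, 16, 8))
```

For P = 16 and Q = 8, six small strings take three passes in this model instead of the six passes needed when the strings are processed one at a time.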

As described above, Embodiment 1 can handle both raster input and bus input in less arithmetic processing time than the conventional device. It can also process two kinds of data strings with different numbers of data items in less arithmetic processing time than the conventional device.

In Embodiment 1, one data item is 8 bits as an example, but any width may be used. Likewise N = 16 and M = 8 are used as an example, but any numbers may be used: N need not be a multiple of M, and N and M need not be even or powers of 2. Further, the bus input and output of the input interface and the output interface have a width of 4 data items in the example above, but any width may be used, and the bus widths of the input interface and the output interface may differ.

In the example above, the arithmetic elements that the interconnection mechanism 5 can refer to are 8 consecutive elements on each of the left and right sides, but any number may be used. A configuration in which non-consecutive arithmetic elements 1 can be selected is also acceptable, and the number of arithmetic elements that can be referred to may differ between the left and the right.

As described above, the parallel arithmetic processing device of Embodiment 1 comprises a plurality of arithmetic unit groups, each having a plurality of arithmetic elements arranged one-dimensionally and connected to one another, with the arithmetic unit groups connected by an interconnection mechanism into a one-dimensional torus or ring and the arithmetic elements operating as a SIMD machine. The device includes a controller which, by controlling the interconnection mechanism, switches between a mode in which the output of any one other arithmetic element within the same arithmetic unit group is taken as input and a mode in which all the arithmetic unit groups are regarded as a one-dimensional torus or continuous ring and any one of the outputs of the arithmetic elements within a predetermined distance, with wraparound taken into account, is taken as input. The loss of processing time can therefore be eliminated and processing can be made faster.

Further, in the parallel arithmetic processing device of Embodiment 1, the input interface 2, through which the plurality of arithmetic elements 1 receive input data from outside, has input format selection means for selecting either raster input or block input as the input format. When both raster input and bus input are required of a single SIMD processor, the arithmetic processing needed for serial-to-parallel or parallel-to-serial conversion can therefore be reduced.

Further, in the parallel arithmetic processing device of Embodiment 1, the output interface 3, through which the plurality of arithmetic elements 1 transfer output data to the outside, has output format selection means for selecting either raster output or block output as the output format. Therefore, when both raster input and bus input are required of a single SIMD type processor, the arithmetic processing needed for serial-to-parallel or parallel-to-serial conversion can be reduced.

Embodiment 2.
FIG. 13 is a configuration diagram of the parallel arithmetic processing device of Embodiment 2.
Embodiment 2 differs from the parallel arithmetic processing device of FIG. 1 described in Embodiment 1 in that the control signal lines from the controller 4 to the connection selection mechanisms 6 are separate for each connection selection mechanism 6 (see difference 100 in the figure). The rest of the configuration is the same as in Embodiment 1, so the following description focuses on the differences.
In Embodiment 2, because the control signal lines from the controller 4 to the connection selection mechanisms 6 are independent, each connection selection mechanism 6 can decide independently whether to select the data of arithmetic element [x′−O] or the data of arithmetic element [x′+M−O].
With this configuration, the reference source of a connection selection mechanism 6 can be switched at any position, so that small data strings of several different lengths can be processed simultaneously.
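The per-mechanism selection described above can be illustrated with a minimal sketch. It is not taken from the patent: the function name `reference_source`, the list `cut`, and the convention that `cut[b]` being true means "the mechanism in front of block b selects the [x′+M−O] side" are assumptions made for illustration, as are the requirements that N is a multiple of M and that the reference distance O does not exceed M.

```python
def reference_source(x: int, offset: int, n: int, m: int, cut: list[bool]) -> int:
    """Return the index of the element whose data element x receives.

    x      -- index of the receiving arithmetic element (0 .. n-1)
    offset -- reference distance O toward lower indices (1 .. m), assumed
    n, m   -- total element count N and block size M (n % m == 0 assumed)
    cut    -- hypothetical per-mechanism control flags, one per block of m
              elements; True selects [x'+M-O], False selects [x'-O]
    """
    if not 1 <= offset <= m:
        raise ValueError("offset must be in 1..m for this sketch")
    block = x // m
    start = block * m          # x' : first element after the mechanism
    src = x - offset
    if src >= start:
        return src             # stays inside the block, no mechanism crossed
    if cut[block]:
        return src + m         # local wrap: element [x'+M-O] side
    return src % n             # global ring: element [x'-O] side, with wraparound


# Four blocks of 16 elements; blocks 0 and 2 start a new data string.
cut = [True, False, True, False]
print(reference_source(16, 3, 64, 16, cut))  # crosses an uncut boundary
print(reference_source(32, 3, 64, 16, cut))  # crosses a cut boundary
```

Because each entry of `cut` is set separately, the point at which referencing stops crossing into the neighbouring block can differ from one mechanism to the next, which is the property Embodiment 2 relies on.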

As an example, consider N = 256 and M = 16. Because the connection selection mechanisms 6 can be switched at any position, it is possible, for example, to treat 32 interconnection mechanisms 5 as one group and handle eight data strings, or to treat 64 interconnection mechanisms 5 as one group and handle four data strings. In general, treating 16 × i interconnection mechanisms 5 as one group (i is a positive integer) allows 16/i data strings (fractions discarded) to be processed simultaneously and independently.
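The arithmetic in this example can be checked with a short sketch. The function name `uniform_grouping` is hypothetical, and the sketch assumes that "16/i with fractions discarded" means integer division of the number of M-element blocks, N/M, by i.

```python
def uniform_grouping(n: int, m: int, i: int) -> tuple[int, int]:
    """Return (elements per group, number of groups) for equal-sized groups.

    n, m -- total element count N and block size M
    i    -- number of M-element blocks merged into one group (positive)
    """
    if n <= 0 or m <= 0 or i <= 0:
        raise ValueError("n, m and i must be positive")
    blocks = n // m               # number of M-element blocks (16 when N=256, M=16)
    return m * i, blocks // i     # fractions are discarded, as in the text


for i in (2, 3, 4):
    size, count = uniform_grouping(256, 16, i)
    print(f"i={i}: {count} data strings of {size} elements")
```

With i = 3 the group size is 48 and one 16-element block is left over, which is the case the "fractions discarded" remark covers.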

The interconnection mechanisms 5 can also be divided at unequal intervals, with each division handled as one group for data string processing. As an example, the controller 4 supplies control signals that divide the interconnection mechanisms 5 into groups of 128, 64, 32, and 32. This allows one data string of 128 items, one data string of 64 items, and the data strings of 32 items (two in this division) to each be processed independently.
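One way to picture the control signals for such an unequal division is the sketch below. It is an illustration under stated assumptions, not the patent's circuit: the name `selector_controls` and the boolean encoding (True means a new data string starts at that connection selection mechanism) are hypothetical, and the sketch assumes that N is a multiple of M and that every segment length is a multiple of M.

```python
def selector_controls(n: int, m: int, segment_sizes: list[int]) -> list[bool]:
    """Return one flag per connection selection mechanism (one every m elements).

    True  -- a data string starts here, so referencing must not cross this point
    False -- this point lies inside a data string
    """
    if n % m != 0:
        raise ValueError("this sketch assumes n is a multiple of m")
    if sum(segment_sizes) != n:
        raise ValueError("segment sizes must add up to n")
    if any(s <= 0 or s % m != 0 for s in segment_sizes):
        raise ValueError("each segment must be a positive multiple of m")

    starts = set()
    position = 0
    for size in segment_sizes:
        starts.add(position)      # first element of this data string
        position += size
    return [block * m in starts for block in range(n // m)]


controls = selector_controls(256, 16, [128, 64, 32, 32])
# Data strings start at elements 0, 128, 192 and 224, i.e. blocks 0, 8, 12 and 14.
print([b for b, flag in enumerate(controls) if flag])
```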

As described above, Embodiment 2 makes it possible to process three or more kinds of data strings with different numbers of data items in a shorter processing time than Embodiment 1.

In the above example, N = 256 and M = 16 were used, but any values may be used. N need not be a multiple of M, and N and M need not be even numbers or powers of two.
Likewise, 32, 64, and 128 were given as examples of the number of interconnection mechanisms 5 treated as one group, but any number may be used as long as it is M × i (i is a positive integer). These numbers need not be even or powers of two.

As described above, in the parallel arithmetic processing device of Embodiment 2, the controller 4 controls the plurality of connection selection mechanisms 6 individually, so that small data strings of several different lengths can be processed simultaneously.

Further, in the parallel arithmetic processing device of Embodiment 2, the connection selection mechanisms 6 are arranged at unequal intervals, so that arithmetic processing can be performed on three or more kinds of data strings with different numbers of data items.

Embodiment 3.
FIG. 14 is a configuration diagram of the parallel arithmetic processing device of Embodiment 3.
Embodiment 3 differs from the parallel arithmetic processing device of Embodiment 1 shown in FIG. 1 in two respects: the control signal lines from the controller 4 to the interconnection mechanisms 5 are separate for every M interconnection mechanisms 5 (see difference 101 in the figure), and the control signal lines from the controller 4 to the arithmetic elements 1 are separate for every M arithmetic elements 1 (see difference 102 in the figure). The rest of the configuration is the same as in Embodiment 1, so the following description focuses on the differences. In the following, M arithmetic elements 1 together with their interconnection mechanisms 5 are referred to as an arithmetic element group.

By supplying control signal lines from the controller 4 independently to each arithmetic element group, each arithmetic element group can be used as if it were an independent SIMD type processor. In other words, the device as a whole can be used as a Multiple Instruction Multiple Data (MIMD) type processor having N/M arithmetic element groups (fractions discarded), each of which is a SIMD type processor with a parallelism of M.

When the device is used as a MIMD type processor, the controller 4 first supplies a control signal to the connection selection mechanisms 6 so that they select the data of arithmetic element [x′+M−O]. This severs the interconnection between arithmetic elements of different groups. At the same time, the controller 4 supplies control signals to the arithmetic element groups so that each group performs its own arithmetic processing and its own cross-referencing.
With these control signals, each arithmetic element group can execute separate arithmetic instructions on an independent data string, which realizes the function of a MIMD type processor.

When the device is used as a conventional SIMD type processor, the controller 4 supplies a control signal to the connection selection mechanisms 6 so that they select the data of arithmetic element [x′−O], and at the same time supplies the same control signal to every arithmetic element group.
With these control signals, the entire device can be used as a single SIMD type processor, as before.
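The difference between the two ways of driving the arithmetic element groups can be sketched in software. This is a behavioural illustration only: the function `run_groups`, the representation of a control signal as a Python callable, and the assumption that the data length is a multiple of M are not from the patent, and cross-referencing between elements is left out.

```python
from collections.abc import Callable, Sequence


def run_groups(data: Sequence[int], m: int,
               ops: Sequence[Callable[[int], int]]) -> list[int]:
    """Apply one operation per arithmetic element group of m elements.

    ops holds either one callable per group (MIMD-like use) or a single
    callable that is broadcast to every group (conventional SIMD use).
    """
    if len(data) % m != 0:
        raise ValueError("this sketch assumes len(data) is a multiple of m")
    n_groups = len(data) // m
    if len(ops) == 1:
        ops = list(ops) * n_groups          # same control signal for all groups
    if len(ops) != n_groups:
        raise ValueError("need one operation per group, or exactly one")

    out: list[int] = []
    for g in range(n_groups):
        block = data[g * m:(g + 1) * m]     # the data string held by group g
        out.extend(ops[g](v) for v in block)
    return out


# MIMD-like use: each group of two elements runs its own operation.
print(run_groups([1, 2, 3, 4], 2, [lambda v: v + 1, lambda v: v * 10]))
# SIMD use: one operation is applied across all groups.
print(run_groups([1, 2, 3, 4], 2, [lambda v: -v]))
```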

As described above, according to Embodiment 3, the SIMD type processor can also be used as a MIMD type processor.

In the above example, a single controller 4 controls the whole device, but the controller 4 may be divided into a plurality of controllers. In that case, each of the divided controllers may control any number of arithmetic element groups.

As described above, in the parallel arithmetic processing device of Embodiment 3, the controller 4 controls the plurality of arithmetic elements 1 individually, so that the function of a MIMD type processor can be realized.

Further, in the parallel arithmetic processing device of Embodiment 3, the controller 4 controls the plurality of interconnection mechanisms 5 individually, so that the function of a MIMD type processor can be realized.

Embodiment 4.
FIG. 15 is a configuration diagram of the parallel arithmetic processing device of Embodiment 4.
Embodiment 4 differs from the parallel arithmetic processing device of Embodiment 1 shown in FIG. 1 in that conventional interfaces are used for the input interface 2a and the output interface 3a. That is, Embodiment 4 adds only the connection selection mechanism 6 to the conventional configuration. The following description focuses on the differences from Embodiment 1.
When there is a single input format and a single output format but data strings with different numbers of data items exist, it suffices to add only the connection selection mechanism 6 to the conventional configuration and to use conventional interfaces for the input interface 2a and the output interface 3a. That is, the connection selection mechanism 6 need not be combined with the input interface 2 shown in FIGS. 3 to 5 or the output interface 3 shown in FIGS. 6 to 8. Added on its own, and with the SIMD type processor configured with N = P (large data string) and M = Q (small data string) as described in Embodiment 1, it allows P/Q small data strings to be processed simultaneously.

As described above, Embodiment 4 makes it possible to process two kinds of data strings with different numbers of data items in a shorter processing time than a conventional device.

In Embodiment 4, the input interface 2a may be a raster input interface or a bus input interface. Likewise, the output interface 3a may be a raster output interface or a bus output interface. A raster input interface may be combined with a bus output interface, or a bus input interface with a raster output interface. Input/output interfaces of formats other than raster and bus may of course also be used.

Embodiment 5.
Embodiment 5 differs from Embodiment 1 in that the arithmetic elements 1 are arranged two-dimensionally and cross-references in the vertical direction exist.
FIG. 16 shows the connections among the arithmetic elements 1, the horizontal interconnection mechanisms 5a, the vertical interconnection mechanisms 5b, the horizontal connection selection mechanisms 6a, and the vertical connection selection mechanisms 6b in Embodiment 5. As shown, Embodiment 5 provides horizontal interconnection mechanisms 5a and horizontal connection selection mechanisms 6a for the horizontal direction, and vertical interconnection mechanisms 5b and vertical connection selection mechanisms 6b for the vertical direction; each functions in the same way as the interconnection mechanism 5 and the connection selection mechanism 6 of Embodiment 1. The selector 7 selects, according to a control signal from the controller 4, the data of either the horizontal interconnection mechanism 5a or the vertical interconnection mechanism 5b. The rest of the configuration of the parallel arithmetic processing device is the same as in Embodiment 1, so the following description focuses on the differences.

FIG. 17 shows the detailed configuration of the vertical interconnection mechanism 5b.
The configuration and operating principle of the vertical interconnection mechanism 5b are exactly the same as those of the interconnection mechanism 5 in Embodiment 1, except that it references data on the upper side in FIG. 16 rather than on the left side. The selector 51b is selection means that, according to a control signal from the controller 4, selects the data transferred from the arithmetic element 1 at an arbitrary distance above. Only the part of the vertical interconnection mechanism 5b that references the data of arithmetic elements 1 above is shown here; the part that references the data of arithmetic elements 1 below has the same (vertically symmetric) configuration and operation, so its description is omitted. In the figure, arithmetic element [y] denotes the position of an arithmetic element: arithmetic element [y] is the one nearest the vertical interconnection mechanism 5b, arithmetic element [y−8] is the one eight positions above, and arithmetic element [y+4] is the one four positions below.

The vertical interconnection mechanisms 5b all operate at once according to an arithmetic instruction from the controller 4, each transferring the data in the internal memory 12a or 12b of the arithmetic element 1 located the desired distance away. With this function, data held by a nearby arithmetic element 1 can be referenced without operating the input interface 2. FIG. 17 shows, as an example, an interconnection mechanism that can reference elements up to eight positions above.

FIG. 18 shows the details of the vertical connection selection mechanism 6b.
The configuration and operating principle of the vertical connection selection mechanism 6b are exactly the same as those of the connection selection mechanism 6 in Embodiment 1, except that it references data on the upper side in FIG. 16 rather than on the left side.
As with the vertical interconnection mechanism 5b shown in FIG. 17, FIG. 18 shows only the part for referencing the data of arithmetic elements above. The part for referencing the data of arithmetic elements below has the same (vertically symmetric) configuration and operation, so its description is omitted.
The vertical connection selection mechanisms 6b are arranged one for every R vertical interconnection mechanisms 5b (that is, every R arithmetic elements 1). R is a number satisfying R ≤ S/2, where S is the total number of vertical interconnection mechanisms 5b. The selector 61b is selection means that, according to a selector control signal from the controller 4, selects either the data from arithmetic element [y′−T] or the data from arithmetic element [y′+R−T] (1 ≤ T ≤ 8 in the figure) and outputs it to arithmetic element [y′+8−T].
With such a vertical connection selection mechanism 6b, either the eight data items of arithmetic elements [y′−01] through [y′−08] or the eight data items of arithmetic elements [y′+R−01] through [y′+R−08] can be selected and output to arithmetic elements [y′] through [y′+07]. Here, y′ is the value for which the arithmetic element located immediately below the connection selection mechanism is arithmetic element [y′].

With the vertical interconnection mechanism 5b and vertical connection selection mechanism 6b described above, data can be referenced from an arbitrary position in the vertical direction, in the same way as in the horizontal direction.
Furthermore, the selector 7 selects, according to a control signal from the controller 4, either the data from an arbitrary horizontal position output by the horizontal interconnection mechanism 5a or the data from an arbitrary vertical position output by the vertical interconnection mechanism 5b, and transfers it to the arithmetic element 1.
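The combined effect of the horizontal and vertical mechanisms and the selector 7 can be sketched as an index calculation on a two-dimensional grid. Everything named here is hypothetical (`ring_source`, `source_2d`, the `vertical` and `local` flags), and the sketch assumes that the grid dimensions are multiples of the group sizes, that a single `local` flag applies to every selection mechanism, and that a positive offset means "to the left" or "above".

```python
def ring_source(pos: int, offset: int, total: int, group: int, local: bool) -> int:
    """Index referenced along one axis.

    local=True  -- wrap inside the group containing pos
    local=False -- treat the whole axis as one ring
    """
    if local:
        base = (pos // group) * group
        return base + (pos - base - offset) % group
    return (pos - offset) % total


def source_2d(row: int, col: int, offset: int, vertical: bool,
              rows: int, cols: int, group_h: int, group_v: int,
              local: bool) -> tuple[int, int]:
    """Return (row, col) of the element whose data is delivered to (row, col).

    vertical plays the role of the selector 7 control signal in this sketch:
    it picks the vertical path (reference upward) or the horizontal path
    (reference leftward).
    """
    if vertical:
        return ring_source(row, offset, rows, group_v, local), col
    return row, ring_source(col, offset, cols, group_h, local)


# 8 x 8 grid, groups of 4 in both directions, reference distance 1.
print(source_2d(0, 4, 1, False, 8, 8, 4, 4, local=True))   # horizontal, wraps in group
print(source_2d(0, 4, 1, False, 8, 8, 4, 4, local=False))  # horizontal, whole ring
print(source_2d(0, 2, 1, True, 8, 8, 4, 4, local=False))   # vertical, whole ring
```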

As described above, according to Embodiment 5, in a SIMD type processor whose arithmetic elements are arranged two-dimensionally, the internal data of any arithmetic element 1 in the left-right and up-down directions can be referenced.

In the above example, the arithmetic elements 1 that can be referenced through the horizontal interconnection mechanism 5a and the vertical interconnection mechanism 5b were eight consecutive arithmetic elements above and below, but any number may be used. A configuration in which non-consecutive arithmetic elements 1 can be selected is also possible. Furthermore, the number of arithmetic elements that can be referenced above may differ from the number that can be referenced below.
In the configuration of FIG. 16, the horizontal connection selection mechanisms 6a and the vertical connection selection mechanisms 6b were both placed at intervals of two arithmetic elements as an example, but any interval may be used, and the horizontal and vertical intervals may differ.

As described above, the parallel arithmetic processing device of Embodiment 5 comprises a plurality of arithmetic unit groups, each having a plurality of arithmetic elements arranged one-dimensionally and connected to each other; circuits in which the arithmetic unit groups are connected by an interconnection mechanism into a one-dimensional torus or ring are arranged two-dimensionally, and the arithmetic elements operate as a SIMD type. The device includes a controller that controls the interconnection mechanism so as to switch between two modes: a mode in which an arithmetic element takes as input the output of any one other arithmetic element locally within its own arithmetic unit group, and a mode in which all arithmetic unit groups arranged in one dimension are regarded as a single one-dimensional torus or continuous ring and an arithmetic element takes as input the output of any one arithmetic element within a predetermined distance, with wraparound taken into account. In addition to the effects of Embodiment 1, this makes it possible to reference the internal data of any arithmetic element 1 in the left-right and up-down directions of the two-dimensional arrangement.

1: arithmetic element; 2, 2a: input interface; 3, 3a: output interface; 4: controller; 5: interconnection mechanism; 5a: horizontal interconnection mechanism; 5b: vertical interconnection mechanism; 6: connection selection mechanism; 6a: horizontal connection selection mechanism; 6b: vertical connection selection mechanism; 7, 22, 32, 51, 51b, 61, 61b: selector; 11: ALU; 12a, 12b: internal memory; 21: input buffer; 31: output buffer.

Claims

A parallel arithmetic processing device comprising a plurality of arithmetic unit groups, each having a plurality of arithmetic elements arranged one-dimensionally and connected to each other, the arithmetic unit groups being connected by an interconnection mechanism in a one-dimensional torus or ring shape, and the arithmetic elements operating as a SIMD type,
the device further comprising a controller, wherein the controller controls the interconnection mechanism
so as to switch between a mode in which an arithmetic element takes as input the output of any one other arithmetic element locally within its arithmetic unit group, and a mode in which all the arithmetic unit groups are regarded as a one-dimensional torus or a continuous ring and an arithmetic element takes as input any one of the outputs of the arithmetic elements within a predetermined distance, with wraparound taken into account.

A parallel arithmetic processing device comprising a plurality of arithmetic unit groups, each having a plurality of arithmetic elements arranged one-dimensionally and connected to each other, in which circuits having the arithmetic unit groups connected by an interconnection mechanism in a one-dimensional torus or ring shape are arranged two-dimensionally, and the arithmetic elements operate as a SIMD type,
the device further comprising a controller, wherein the controller controls the interconnection mechanism
so as to switch between a mode in which an arithmetic element takes as input the output of any one other arithmetic element locally within its arithmetic unit group, and a mode in which all the arithmetic unit groups arranged in one dimension are regarded as a one-dimensional torus or a continuous ring and an arithmetic element takes as input any one of the outputs of the arithmetic elements within a predetermined distance, with wraparound taken into account.

The parallel arithmetic processing device according to claim 1 or 2, wherein an input interface having input format selection means for selecting either raster input or block input as the input format is provided as the interface through which the plurality of arithmetic elements receive input data from outside.

The parallel arithmetic processing device according to claim 3, wherein an output interface having output format selection means for selecting either raster output or block output as the output format is provided as the interface through which the plurality of arithmetic elements transfer output data to the outside.

The parallel arithmetic processing device according to claim 1, wherein the connection selection mechanisms are arranged at unequal intervals.

The parallel arithmetic processing device according to claim 1, wherein the controller individually controls the plurality of connection selection mechanisms.

The parallel arithmetic processing device according to claim 1, wherein the controller individually controls a plurality of arithmetic elements.

The parallel arithmetic processing device according to claim 1, wherein the controller individually controls a plurality of interconnection mechanisms.