JPH04217026A

JPH04217026A - Parallel processor

Info

Publication number: JPH04217026A
Application number: JP40322990A
Authority: JP
Inventors: Yutaka Iizuka; 裕飯塚
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1990-12-18
Filing date: 1990-12-18
Publication date: 1992-08-07

Abstract

PURPOSE: To make the whole unit smaller and more compact and to reduce the number of peripheral circuits by using a bus with a small bit width. CONSTITUTION: The parallel processor is provided with a plurality of arithmetic pipelines arranged in parallel, decoders that decode processing instructions and output them to the respective arithmetic pipelines, and a general register having register sections into which the processing instructions to be output to the decoders are written. The processing instructions are written successively into the register sections of the general register; a parallel instruction means then simultaneously designates the register sections holding those instructions and outputs the instructions simultaneously to the decoders, so that the arithmetic pipelines execute simultaneously. This prevents a large amount of information from concentrating on one bus at the same time during parallel processing and reduces the need for a bus with a large bit width.

Description

[Detailed description of the invention]

[0001]

[Industrial Field of Application] The present invention relates to a parallel processing apparatus, such as an electronic computer, that processes multiple pieces of information in parallel in order to speed up information processing.

[0002]

[Prior Art] Many techniques have been devised to speed up information processing apparatuses such as electronic computers, and an instruction that once took several clocks to execute can now be executed in roughly one clock. In other words, the CPI (cycles per instruction) value, which used to be 2 to 5, has approached 1.

[0003] To speed up information processing apparatuses further, that is, to bring the CPI value to 1 or less, parallel processing apparatuses that execute multiple instructions simultaneously have been devised.

[0004] A known parallel processing apparatus of this kind uses the VLIW (very large instruction word) scheme ("Parallel Computer Configuration Theory", Shinji Tomita, Shokodo Co., Ltd., November 1986). A VLIW parallel computer as described in "Parallel Computer Configuration Theory" is outlined below with reference to FIG. 2.

[0005] Basic instructions have a fixed length of 32 bits, and four basic instructions are stored in one word, that is, in 128 bits. At execution time one word is read at once, and the four basic instructions are executed in parallel and simultaneously by four arithmetic pipelines. As a result, the CPI value mentioned above is ideally 0.25.
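The packing arithmetic in this paragraph can be illustrated with a minimal Python sketch. The function names and the choice of placing slot 0 in the least significant bits are assumptions made for illustration only; the word layout used in the cited reference is not given here.

```python
# Illustrative sketch: four 32-bit basic instructions packed into one 128-bit word.
# SLOT_BITS and SLOTS come from the text; the slot ordering is an assumption.
SLOT_BITS = 32
SLOTS = 4


def pack_word(instructions):
    """Pack exactly four 32-bit basic instructions into one 128-bit integer."""
    if len(instructions) != SLOTS:
        raise ValueError("a VLIW word holds exactly four basic instructions")
    word = 0
    for slot, instruction in enumerate(instructions):
        if not 0 <= instruction < (1 << SLOT_BITS):
            raise ValueError("each basic instruction must fit in 32 bits")
        word |= instruction << (slot * SLOT_BITS)
    return word


def unpack_word(word):
    """Split a 128-bit word back into its four 32-bit basic instructions."""
    mask = (1 << SLOT_BITS) - 1
    return [(word >> (slot * SLOT_BITS)) & mask for slot in range(SLOTS)]


def ideal_cpi(instructions_per_clock):
    """Ideal cycles per instruction when this many instructions finish each clock."""
    return 1 / instructions_per_clock
```

With four instructions completing per clock, `ideal_cpi(4)` gives the 0.25 figure quoted above; any stall or empty slot would raise the real value.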

[0006] Reference numeral 201 denotes four internal buses, each 32 bits wide. 202 is a data unit connected to the internal bus 201 by four 32-bit-wide buses; the data unit 202 includes a data cache. 203 is an instruction unit that includes an instruction cache. 204 is a bus interface, which is connected to the data unit 202 by a 128-bit-wide internal data bus and to the instruction unit 203 by a 128-bit-wide instruction bus. The bus interface 204 is connected to the outside by a 32-bit address bus, a 128-bit data bus, and a control bus.

[0007] 205 is an instruction decoder and 206 is an instruction register. The instruction decoder 205 receives a 128-bit-wide instruction from the instruction unit 203, decodes it, and stores it in the instruction register 206 as microinstructions. The instruction register 206 holds microinstructions for four instructions and controls the first to fourth arithmetic pipelines 208 to 211 by outputting these microinstructions.

[0008] 207 is a multiport register. The multiport register 207 is connected to the internal bus 201 by four 32-bit-wide buses, receives the data to be processed from the internal bus 201, and outputs it to the arithmetic pipelines 208 to 211 over four 32-bit-wide buses. Under control of the microinstructions, each arithmetic pipeline 208 to 211 performs data processing such as fixed-point, logical, or floating-point operations over several clocks. Taken together, the four arithmetic pipelines 208 to 211 effectively perform four operations every clock. The output of each arithmetic pipeline 208 to 211 is connected to the internal bus 201 by a 32-bit-wide bus.

[0009] Next, the operation of the VLIW parallel computer configured as above will be described.

[0010] The instruction unit 203 reads a 128-bit-wide instruction from external memory (not shown) via the bus interface 204. The instruction decoder 205 then decodes the instruction and writes it into the instruction register 206 as microinstructions. The microinstructions written into the instruction register 206 are output to the arithmetic pipelines 208 to 211 and control them. Each arithmetic pipeline 208 to 211 reads data from the multiport register 207 as needed and writes the processed data back to the multiport register 207 via the internal bus 201. The arithmetic pipelines 208 to 211 then read this data again, so that multiple operations are applied to it. Alternatively, each arithmetic pipeline 208 to 211 first writes the processed data to the data unit 202 via the internal bus 201, and the data unit 202 writes the data to the multiport register 207 via the internal bus 201, so that multiple operations are applied.

[0011] Furthermore, the data unit 202 exchanges data with the outside via the bus interface 204. Instruction decoding, reading of microinstructions from the instruction register 206, and processing in the arithmetic pipelines 208 to 211 are all pipelined, so four instructions can be executed per clock.

[0012]

[Problems to be Solved by the Invention] However, because the VLIW parallel computer described above processes four basic instructions as one word, it requires a data bus as wide as 128 bits, whereas the data bus of an ordinary computer is about 16 or 32 bits wide. Consequently, when the whole unit is packaged, the number of pins extending to the outside increases and the package becomes more complex, and the amount of peripheral circuitry also increases.

[0013] The present invention has been made in view of the above points. Its object is to provide a parallel processing apparatus that maintains high-speed processing capability while using an ordinary 32-bit-wide bus, thereby reducing the number of pins and simplifying the peripheral circuitry.

[0014]

[Means for Solving the Problems] The present invention has been made to solve this problem. It comprises a plurality of arithmetic pipelines provided in parallel to perform a plurality of operations in parallel; a plurality of decoders, one provided for each arithmetic pipeline, each of which decodes a processing instruction and outputs it to its arithmetic pipeline; and a general register that has a plurality of register sections, into any of which the processing instructions to be output to the decoders are written. The invention is characterized by parallel instruction means that simultaneously designates the register sections of the general register into which processing instructions have been written and outputs those instructions simultaneously to the decoders, causing the arithmetic pipelines to execute simultaneously.

[0015]

[Operation] With the above configuration, processing instructions are written sequentially into the register sections of the general register. The parallel instruction means then simultaneously designates the register sections into which the processing instructions have been written and outputs the processing instructions simultaneously to the decoders, causing the arithmetic pipelines to execute simultaneously. This prevents a large amount of information from concentrating on one bus at the same time and reduces the need for a bus with a large bit width.

[0016]

[Embodiment] An embodiment of the present invention will be described below with reference to FIGS. 1 and 3.

[0017] FIG. 1 is a block diagram showing a parallel computer as the parallel processing apparatus of this embodiment, and FIG. 3 is an explanatory diagram showing the parallel execute instruction.

[0018] In FIG. 1, 101 is an internal bus, 102 is a data unit including a data cache, 103 is an instruction unit including an instruction cache, 104 is a bus interface, and 112 to 115 are first to fourth arithmetic pipelines. These have the same configuration as in the conventional parallel processing apparatus described above.

[0019] The internal bus 101 and the data unit 102 are connected by a 32-bit-wide internal data bus. The data unit 102 and the bus interface 104 are connected by a 32-bit-wide internal data bus. The instruction unit 103 and the bus interface 104 are connected by a 32-bit instruction bus. An address bus, a data bus, and a control bus, all 32 bits wide, are connected to the bus interface 104. The instruction unit 103 is connected to the internal bus 101.

[0020] 105 is a decoder for single-operation processing, which receives an instruction from the instruction unit 103, decodes it, and outputs it as a microinstruction. 106 is a multiplexer, which selectively outputs to the first arithmetic pipeline 112 either the microinstruction from the single-operation decoder 105 or the microinstruction from the parallel first decoder 107 described below.

[0021] 107 is a parallel first decoder for parallel-operation processing, which decodes a processing instruction from the general register 111 (described below) into a microinstruction and outputs it to the first arithmetic pipeline 112 via the multiplexer 106. 108 is a parallel second decoder, which decodes a processing instruction from the general register 111 into a microinstruction and outputs it to the second arithmetic pipeline 113. 109 is a parallel third decoder and 110 is a parallel fourth decoder; like the parallel second decoder 108, they decode processing instructions from the general register 111 into microinstructions and output them to the third and fourth arithmetic pipelines 114 and 115, respectively.

[0022] 111 is a general register comprising a plurality of register sections. Specifically, the general register 111 comprises, for example, 64 register sections, each 32 bits wide and able to hold one processing instruction. The general register 111 also has a multiport configuration: it is connected to the internal bus 101 by four 32-bit-wide bidirectional buses for data input and output, and it outputs data to each of the arithmetic pipelines 112 to 115 over 32-bit-wide buses. Processing instructions that control the arithmetic pipelines 112 to 115 are written sequentially into the register sections of the general register 111 by load instructions.

[0023] The parallel execute instruction, which serves as the parallel instruction means, has the format shown in FIG. 3. It consists of an operation code meaning "parallel execute" and four fields, each of which designates a particular register section of the general register 111. As an example of the bit widths, the operation code is 8 bits and each field is 6 bits. The parallel execute instruction 121 designates all of these register sections of the general register 111 simultaneously and causes the arithmetic pipelines 112 to 115 to execute simultaneously via the decoders 107 to 110.
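The example bit widths given here (an 8-bit operation code plus four 6-bit fields) add up to exactly 32 bits, and a 6-bit field can address each of the 64 register sections mentioned in [0022]. The following Python sketch shows one way such an instruction could be encoded and decoded. The opcode value `0xE0`, the field ordering (opcode in the top bits, first field next), and the function names are hypothetical; FIG. 3 is not reproduced here, so the actual layout may differ.

```python
# Hypothetical encoding of a parallel execute instruction.
# Widths follow the example in the text; everything else is an assumption.
OPCODE_BITS = 8
FIELD_BITS = 6
NUM_FIELDS = 4
PEXEC_OPCODE = 0xE0  # hypothetical opcode value, not taken from the patent


def encode_pexec(fields, opcode=PEXEC_OPCODE):
    """Build a 32-bit word: opcode in the top 8 bits, then four 6-bit fields."""
    if len(fields) != NUM_FIELDS:
        raise ValueError("exactly four register-section fields are required")
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode must fit in 8 bits")
    word = opcode
    for register_section in fields:
        if not 0 <= register_section < (1 << FIELD_BITS):
            raise ValueError("register-section index must fit in 6 bits")
        word = (word << FIELD_BITS) | register_section
    return word


def decode_pexec(word):
    """Return (opcode, [field0, field1, field2, field3]) from a 32-bit word."""
    mask = (1 << FIELD_BITS) - 1
    fields = []
    for _ in range(NUM_FIELDS):
        fields.append(word & mask)  # the last field sits in the lowest bits
        word >>= FIELD_BITS
    fields.reverse()
    return word, fields
```

Because the whole instruction fits in one 32-bit word, it can travel over the same 32-bit instruction bus as an ordinary instruction.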

[0024] Next, the processing operation of the parallel computer configured as above will be described. Ordinary single-operation processing proceeds as follows. The instruction unit 103 reads a processing instruction from external memory via the bus interface 104. The processing instruction is then written to the decoder 105 and decoded into a microinstruction, which controls the first arithmetic pipeline 112 via the multiplexer 106. The arithmetic pipeline 112 reads data from the general register 111 as needed and writes the processed data to the general register 111 via the internal bus 101. The arithmetic pipeline 112 can also write the processed data to the data unit 102 via the internal bus 101, and the data unit 102 writes that data to the general register 111 via the internal bus 101. The data unit 102 also exchanges data with the outside through the bus interface 104.

[0025] Parallel-operation processing proceeds as follows.

[0026] First, before entering the loop, load instructions are used to store the processing instructions to be executed in parallel sequentially into particular register sections of the general register 111. These processing instructions have a fixed length of 32 bits and are written in advance into a data section or the like when the program is compiled. Most processing instructions are contained in loops and are in many cases executed repeatedly. Since 95% of execution time is generally spent in 5% of the source code, the time needed to store the processing instructions in the register sections of the general register 111 is almost negligible compared with the loop processing.
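The claim that the set-up time is negligible can be checked with a simple cost model. The sketch below assumes, purely for illustration, that each load instruction costs one clock, that each parallel execute instruction issues in one clock with no stalls, and that only the processing instructions dispatched to the pipelines count as useful work. None of these timing figures come from the patent.

```python
# Simplified cost model (assumed timings, not from the patent):
#   - one clock per load of a processing instruction into the general register
#   - one clock per parallel execute instruction, no stalls
def effective_cpi(iterations, pexec_per_iteration, lanes=4, setup_loads=None):
    """Clocks per useful instruction, including the one-time register set-up."""
    if iterations <= 0 or pexec_per_iteration <= 0:
        raise ValueError("iterations and pexec_per_iteration must be positive")
    if setup_loads is None:
        # By default every lane of every parallel execute needs one load.
        setup_loads = pexec_per_iteration * lanes
    cycles = setup_loads + iterations * pexec_per_iteration
    useful_instructions = iterations * pexec_per_iteration * lanes
    return cycles / useful_instructions
```

Under these assumptions a loop body of one parallel execute instruction run once gives `effective_cpi(1, 1) == 1.25`, while running it 1000 times brings the value to about 0.251, close to the ideal 0.25.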

[0027] When the parallel execute instruction 121 is written to the decoder 105, the processing instructions in the general register 111 designated by the fields of the instruction 121 are output to the parallel first to fourth decoders 107 to 110. Each decoder 107 to 110 decodes the processing instruction it receives into a microinstruction and outputs it to the corresponding arithmetic pipeline 112 to 115. At this time the input of the multiplexer 106 is switched to the parallel first decoder 107, and the arithmetic pipelines 112 to 115 perform pipelined operations according to their microinstructions. Decoding of the parallel execute instruction 121, reading of the processing instructions from the general register 111, and decoding of the processing instructions in the parallel first to fourth decoders 107 to 110 are all pipelined, so as long as parallel execute instructions 121 continue within the loop, the equivalent of four instructions per clock continues to be executed. In terms of execution speed, executing one parallel execute instruction 121 therefore amounts to executing four ordinary instructions. It is desirable to keep ordinary instructions inside the loop to a minimum and to use parallel execute instructions 121 wherever possible.
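The load-then-dispatch sequence of [0026] and [0027] can be modelled at a very coarse level in Python. The class and attribute names below are hypothetical, decoding and pipeline timing are not modelled, and the counter `bus_words` simply tallies 32-bit words assumed to cross the external bus (one per load, one per parallel execute instruction).

```python
# Coarse behavioural sketch; all names are hypothetical and timing is ignored.
class ParallelMachineSketch:
    def __init__(self, num_register_sections=64, lanes=4):
        self.registers = [0] * num_register_sections  # 32-bit register sections
        self.lanes = lanes
        self.bus_words = 0              # 32-bit words fetched over the bus (assumed)
        self.instructions_executed = 0  # processing instructions dispatched

    def load(self, register_section, processing_instruction):
        """Write one 32-bit processing instruction into a register section."""
        self.registers[register_section] = processing_instruction & 0xFFFFFFFF
        self.bus_words += 1

    def parallel_execute(self, fields):
        """Dispatch the designated register sections to the decoders at once."""
        if len(fields) != self.lanes:
            raise ValueError("one field per arithmetic pipeline is required")
        self.bus_words += 1  # only the 32-bit parallel execute word is fetched
        dispatched = [self.registers[f] for f in fields]
        self.instructions_executed += len(dispatched)
        return dispatched  # element i goes to parallel decoder i
```

In this model, four loads followed by 100 parallel execute instructions move 104 words over the 32-bit bus while dispatching 400 processing instructions, which is the effect the embodiment relies on to avoid a 128-bit bus.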

[0028] As described above, whereas the conventional VLIW parallel computer requires a 128-bit-wide instruction bus and data bus in order to execute four basic instructions as one word every clock, the parallel computer of this embodiment can use a 32-bit-wide instruction bus and data bus while maintaining the high-speed processing capability of parallel operation. The whole unit can thus be made smaller and more compact, and the amount of peripheral circuitry can be reduced.

[0029] This embodiment has been described for the case in which one parallel execute instruction 121 executes four ordinary instructions in parallel. However, the same operation and effects can be obtained when one parallel execute instruction executes two, three, or five or more ordinary instructions in parallel.

[0030] In addition, separate operation codes may be assigned to the fields of a parallel execute instruction that executes two, three, four, or five or more ordinary instructions in parallel, so that each arithmetic pipeline performs different processing.

[0031] Needless to say, when the parallel execute instruction covers five or more instructions, a corresponding number of five or more decoders and arithmetic pipelines is provided.

[0032]

[Effects of the Invention] As described in detail above, the present invention comprises a plurality of arithmetic pipelines provided in parallel, decoders that decode processing instructions and output them to the respective arithmetic pipelines, and a general register into whose register sections the processing instructions to be output to the decoders are written. Processing instructions are written sequentially into the register sections of the general register; the parallel instruction means simultaneously designates the register sections into which the processing instructions have been written and outputs the processing instructions simultaneously to the decoders, so that the arithmetic pipelines execute simultaneously. This prevents a large amount of information from concentrating on one bus at the same time and reduces the need for a bus with a large bit width. As a result, a bus with fewer bits than before can be used, the whole unit can be made smaller and more compact, and the amount of peripheral circuitry can be reduced.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing a parallel computer as the parallel processing apparatus of this embodiment.

FIG. 2 is a block diagram showing a conventional parallel computer.

FIG. 3 is an explanatory diagram showing the parallel execute instruction.

[Explanation of symbols]

101 Internal bus
102 Data unit
103 Instruction unit
104 Bus interface
105 Decoder for single-operation processing
106 Multiplexer
107 Parallel first decoder
108 Parallel second decoder
109 Parallel third decoder
110 Parallel fourth decoder
111 General register

Claims

[Claims]

Claim 1: A parallel processing apparatus comprising: a plurality of arithmetic pipelines provided in parallel to perform a plurality of operations in parallel; a plurality of decoders, provided in correspondence with the respective arithmetic pipelines, each of which decodes a processing instruction and outputs it to its arithmetic pipeline; and a general register having a plurality of register sections, into any of which the processing instructions to be output to the decoders are written; the apparatus further comprising parallel instruction means that simultaneously designates the register sections of the general register into which processing instructions have been written and outputs those instructions simultaneously to the decoders so that the arithmetic pipelines execute simultaneously.