JPH06139071A - Parallel computers

Info

Publication number: JPH06139071A
Application number: JP4292693A
Authority: JP
Inventors: Chikako Nakanishi; 知嘉子中西
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1992-10-30
Filing date: 1992-10-30
Publication date: 1994-05-20

Abstract

PURPOSE: To realize a data bypass circuit for memory access instructions in a parallel computer. CONSTITUTION: A superscalar processor 20 comprises an instruction fetch stage 4, an instruction decode stage 5, a plurality of functional units 6 to 9 each having a pipeline structure, a register file 3 that temporarily holds data used in executing instructions, a memory data bypass 10, and a data bypass 12, and can access an external data memory 2 through a data bus 11. Each functional unit consists of an execution stage 61, a memory access stage 62, and a write-back stage 63. With this configuration, data can be transferred between pipelines without waiting for it to be written to memory, and data read from memory can be transferred directly to the pipeline that requests it, so that pipeline need not wait for the read instruction to finish.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a data bypass by which, in a parallel computer, data used by a plurality of instructions processed in parallel can be transferred directly between pipelines, and more particularly to a data bypass for memory access instructions.

[0002]

[Prior Art] "Superscalar" is known as one architecture for increasing the processing speed of parallel computers. In a superscalar processor, instructions that can be processed simultaneously are detected among a plurality of given instructions, and the detected instructions are processed simultaneously using a plurality of pipelines.

[0003] FIG. 14 is a block diagram of a superscalar processor showing the background of the present invention. Its configuration is described below. The superscalar processor 20 includes an instruction fetch stage 4 that fetches a plurality of instructions stored in an instruction memory 1, an instruction decode stage 5 that decodes the instructions fetched in the instruction fetch stage 4, a plurality of functional units 6 to 9 each having a pipeline structure, and a register file 3 that temporarily holds data used in processing instructions.

[0004] The functional units 6 to 9 can access an external data memory 2 via a data bus 11. The register file 3 is implemented as a RAM and is accessed by the functional units 6 to 9.

[0005] The instruction fetch stage 4 has a program counter and supplies the address signal generated by the program counter to the instruction memory 1. The plurality of instructions designated by the supplied address signal are fetched from the instruction memory 1 and sent to the instruction decode stage 5.

[0006] The instruction decode stage 5 decodes the plurality of instructions received from the instruction memory 1. By decoding them, it detects which of the given instructions can be processed simultaneously. In addition, the instruction decode stage 5 reads from the register file 3 the data that the functional units 6 to 9 use to process the given instructions, and supplies the read data to the functional units 6 to 9.

[0007] The functional units 6, 7, 8, and 9 each have a pipeline structure. Each of the four functional units includes an execution (EXC) stage 61, a memory access (MEM) stage 62, and a write-back (WB) stage 63. In general, the execution stage 61 performs arithmetic operations and address calculations for memory accesses. The memory access stage 62 reads from or writes to the data memory 2. The write-back stage 63 writes operation results and data read from the data memory 2 to the register file 3.

[0008] The superscalar processor 20 operates in response to an externally supplied two-phase non-overlapping clock signal. FIG. 15 is a timing diagram showing an example of a two-phase non-overlapping clock signal.

[0009] The superscalar processor 20 operates as follows. First, the instruction decode stage 5 detects which of the given instructions can be processed simultaneously. The detected instructions are supplied to the functional units 6 to 9 (or, in some cases, to only some of them). Because the functional units 6 to 9 have a pipeline structure, they can process the supplied instructions simultaneously.

[0010] FIG. 16 is a timing diagram showing an example of how pipeline processing proceeds in this case. The functional units 6, 7, and 8 can be regarded as forming part of pipelines PL1, PL2, and PL3, respectively. The instruction fetch stage 4 and the instruction decode stage 5 are components of every pipeline.

[0011] Assume that instructions 11, 12, and 13 can all be processed simultaneously and are processed in pipelines PL1, PL2, and PL3, respectively.

[0012] In pipelines PL1, PL2, and PL3, the processing IF of the instruction fetch stage 4 is performed in period T₁, and the processing ID of the instruction decode stage 5 is performed in period T₂. In the execution stage 61, the memory access stage 62, and the write-back stage 63, execution (EXC), memory access (MEM), and write-back (WB) are performed in periods T₃, T₄, and T₅, respectively.
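The stall-free timing described for FIG. 16 can be tabulated with a short sketch. The Python below is illustrative only: the names `STAGES` and `ideal_schedule` are assumptions introduced here, not part of the patent, and the model simply assigns one stage per period to each pipeline.

```python
# Illustrative sketch (names are assumptions): stage occupancy per period
# for three independent instructions issued together, with no stalls.
STAGES = ["IF", "ID", "EXC", "MEM", "WB"]

def ideal_schedule(pipelines):
    """Map each pipeline to {period: stage}, with IF starting in period T1."""
    return {
        pl: {period: stage for period, stage in enumerate(STAGES, start=1)}
        for pl in pipelines
    }

schedule = ideal_schedule(["PL1", "PL2", "PL3"])
for pl, slots in schedule.items():
    # One row per pipeline, e.g. "PL1 T1:IF T2:ID ..."
    print(pl, " ".join(f"T{t}:{s}" for t, s in slots.items()))
```

In this model every pipeline finishes its write-back in period T₅, which is the baseline against which the stalled case is later compared.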

[0013]

[Problems to Be Solved by the Invention] In the following case, however, a problem arises in terms of the time required for processing.

[0014] Suppose that the three instructions 11, 12, and 13 are a store instruction, a load instruction, and an arithmetic instruction, respectively. Assume that they are issued as simultaneously processable and are processed in pipelines PL1, PL2, and PL3, respectively. Assume further that instruction 11 stores data in the data memory 2, that instruction 12 loads that data and writes it to the register file 3, and that instruction 13 then accesses the register file 3 and operates on that data.

[0015] In such a case, with the configuration shown in FIG. 14, the execution of instruction 12 cannot complete until the execution of instruction 11 has completed.

[0016] FIG. 17 is a timing diagram showing how instructions 11, 12, and 13 are processed. First, instruction 11 is executed to completion. That is, the processing IF of the instruction fetch stage 4 is performed in period T₁, and the processing ID of the instruction decode stage 5 is performed in period T₂. Then, in periods T₃, T₄, and T₅, the processing EXC, MEM, and WB of the execution stage 61, the memory access stage 62, and the write-back stage 63 is performed, respectively.

[0017] Meanwhile, in period T₁, the processing IF of the instruction fetch stage 4 is performed for instruction 12 in pipeline PL2 and for instruction 13 in pipeline PL3. Likewise, in period T₂, the processing ID of the instruction decode stage 5 is performed for instruction 12 in pipeline PL2 and for instruction 13 in pipeline PL3.

[0018] However, the execution of instruction 12 is stalled in periods T₄ and T₅, and the execution of instruction 13 is stalled in periods T₃ to T₈. As described above, instruction 12 uses the data that instruction 11 stores in the data memory 2, so instruction 12 cannot execute until instruction 11 has finished; and instruction 13 uses the data that instruction 12 reads from the data memory 2, so instruction 13 cannot execute until instruction 12 has finished.

[0019] That is, pipeline PL2 enters a wait state (pipeline interlock) in periods T₄ and T₅, and execution (EXC) resumes in period T₆. The processing of the execution stage 61, the memory access stage 62, and the write-back stage 63 is therefore performed in periods T₆, T₇, and T₈, respectively. Pipeline PL3 is in a wait state from period T₃ through period T₈, and the processing ID of the instruction decode stage 5 resumes in period T₉.
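The interlocked timeline of FIG. 17 can be written out as data to make the cost of the stalls visible. This is a sketch under stated assumptions: the names `timeline` and `stalled_periods` are invented for illustration, "--" marks a stalled period, the EXC entry for PL2 in T₃ is inferred from the statement that EXC is "resumed" in T₆, and nothing is assumed about PL3 after T₉ because the text does not say.

```python
# Illustrative sketch of the FIG. 17 timeline as described in the text.
# "--" marks a stalled (interlocked) period. PL2's EXC in T3 is inferred;
# PL3's behaviour after T9 is not stated in the text and is left out.
timeline = {
    "PL1": {1: "IF", 2: "ID", 3: "EXC", 4: "MEM", 5: "WB"},
    "PL2": {1: "IF", 2: "ID", 3: "EXC", 4: "--", 5: "--",
            6: "EXC", 7: "MEM", 8: "WB"},
    "PL3": {1: "IF", 2: "ID", 3: "--", 4: "--", 5: "--",
            6: "--", 7: "--", 8: "--", 9: "ID"},
}

def stalled_periods(slots):
    """Return the sorted list of periods in which a pipeline is stalled."""
    return [t for t, stage in sorted(slots.items()) if stage == "--"]

for pl, slots in timeline.items():
    print(pl, "stalls in", [f"T{t}" for t in stalled_periods(slots)])
```

Read this way, the store in PL1 completes in T₅, the dependent load in PL2 only in T₈, and the arithmetic instruction in PL3 has not yet re-entered decode until T₉.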

[0020] In other words, as shown in FIG. 17, the data to be read by instruction 12 is already available at period T₃, but because it has not yet been written to the data memory 2, it cannot be read until the execution of instruction 11 has finished. Similarly, the data used by instruction 13 is already available from the EXC processing of the execution stage 61 in period T₆, provided the execution of instruction 12 is not stalled, but because it has not been written to the register file 3, instruction 13 must wait for the execution of instruction 12 to finish. As a result, completing the instructions takes a long time, which lowers the processing capability of the computer.

[0021]

[Means for Solving the Problems] A parallel computer according to the present invention comprises: a plurality of pipeline processing execution means; instruction supply means that fetches a plurality of given instructions, finds among the fetched instructions predetermined instructions that can be executed simultaneously, and issues those predetermined instructions to the pipeline processing execution means; temporary data storage means that holds data for a relatively short time while the predetermined instructions are processed; an external memory that holds data for a relatively long time; and a data bypass bus over which data handled by the pipeline processing execution means is transmitted. Each of the pipeline processing execution means has bypass control means that determines whether to use the data transmitted over the data bypass bus.

[0022] The bypass control means in one pipeline processing execution means includes external memory address match detection means that detects a match between the memory address of the external memory obtained in one predetermined instruction corresponding to that one pipeline processing execution means and the memory address of the external memory obtained in another predetermined instruction corresponding to another pipeline processing execution means. It further includes first data supply means that, according to the output of the external memory address match detection means, selects whether to use the data transmitted over the data bypass bus or data separately loaded from the external memory.

[0023] The external memory address match detection means outputs a match signal indicating whether data has been bypassed. The bypass control means further includes instruction changing means that, when the match signal indicates a match, changes the one predetermined instruction into an instruction that stores the bypassed data in the temporary data storage means.
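The roles of the first data supply means and the instruction changing means can be sketched in a few lines of Python. Everything named here (`Instr`, `supply_data`, `change_instruction`, the opcode strings, and the `"write_reg"` opcode) is a hypothetical modelling choice, not something the patent specifies; the match signal is taken as an input.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional

@dataclass(frozen=True)
class Instr:
    op: str                         # assumed opcode encoding, e.g. "load"
    mem_addr: Optional[int]         # external memory address, if any
    dest_reg: Optional[int] = None  # register file address for the result

def supply_data(match: bool, bypass_data: int,
                load_from_memory: Callable[[], int]) -> int:
    """First data supply: bypassed data on a match, else a normal load."""
    return bypass_data if match else load_from_memory()

def change_instruction(instr: Instr, match: bool) -> Instr:
    """On a match, replace the memory access by a register-file write."""
    if match:
        return replace(instr, op="write_reg", mem_addr=None)
    return instr

load = Instr("load", 0x40, dest_reg=5)
value = supply_data(True, bypass_data=123, load_from_memory=lambda: 0)
changed = change_instruction(load, True)
```

With a match, the load never touches the external memory: the bypassed value 123 is used and the instruction becomes a write of that value to register 5. Without a match, both functions leave the normal load path in place.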

[0024] The bypass control means in one pipeline processing execution means further includes temporary data storage means address match detection means that detects a match between the address of the temporary data storage means obtained in one predetermined instruction corresponding to that one pipeline processing execution means and the address of the temporary data storage means obtained in another predetermined instruction corresponding to another pipeline processing execution means. It further includes second data supply means that, according to the output of the external memory address match detection means of the other pipeline processing execution means and the output of the temporary data storage means address match detection means, selects whether to use the data transmitted over the data bypass bus.

[0025]

[Operation] In the parallel computer of the present invention, the first data supply means, in response to the external memory address match detection means, supplies the data that an external memory access instruction in another pipeline processing execution means is to write to the external memory directly to the pipeline processing execution means that is executing an access instruction to the same external memory address. That is, the required data can be obtained without waiting for it to be written to the external memory.

[0026] In addition, means is provided that, when the external memory address match detection means detects an address match, cancels the access instruction to the external memory and changes it into a write instruction to the temporary data storage means. Needless access instructions are therefore not performed.

[0027] Furthermore, it is determined that the required data has become available, and the data is supplied directly from another pipeline to the pipeline processing execution means that uses the data being read. That is, the required data can be obtained without waiting for it to be read from the external memory.

[0028] Through the above operation, data being processed in another pipeline processing execution means is received directly, so an instruction can be completed in a short time without waiting for that execution to finish.

[0029]

[Embodiment] FIG. 1 is a block diagram of a superscalar processor showing one embodiment of the present invention. The superscalar processor 20 includes an instruction fetch stage 4 that fetches a plurality of instructions stored in an instruction memory 1, an instruction decode stage 5 that decodes the instructions fetched in the instruction fetch stage 4, a plurality of functional units 6 to 9 each having a pipeline structure, and a register file 3 that temporarily holds data used in executing instructions. The functional units 6 to 9 can access an external data memory 2 via a data bus 11. The register file 3 is implemented as a RAM and is accessed by the functional units 6 to 9.

[0030] The superscalar processor 20 operates in response to an externally supplied two-phase non-overlapping clock signal. Its basic operation is the same as that of the conventional superscalar processor 20 shown in FIG. 14, so its description is omitted.

[0031] A memory data bypass 10 and a data bypass 12 are provided between the functional units 6 to 9.

[0032] Data obtained by the data supply means in a functional unit is transmitted via the memory data bypass 10. In addition, the memory address obtained in a functional unit that is executing a memory access instruction is transmitted between the functional units 6 to 9 via the data bypass 12. A signal indicating that data has been bypassed, and the storage address (an address in the register file 3) for the data obtained as the result of processing the instruction in each functional unit, are also transmitted between the functional units 6 to 9.

[0033] FIG. 2 shows the configuration of the functional unit 6 together with the instruction decode stage 5. The other functional units 7 to 9 are configured in the same way. An instruction supplied from the decode stage 5 contains an instruction code OP₆, a storage address D₆, and two source addresses S₁₁ and S₁₂, all three of which are addresses in the register file 3. The source addresses S₁₁ and S₁₂ indicate the register file 3 addresses at which the data used to execute the instruction is stored. The storage address D₆ indicates the register file 3 address at which the data produced by the functional unit 6 is to be stored.

[0034] The functional unit 6 consists of an execution stage 61 that performs the operation of an instruction and calculates memory addresses (addresses in the memory 2), a memory access stage 62 for accessing the data memory 2, and a write-back stage 63 for writing the obtained data to the register file 3.

[0035] The execution stage 61 includes a memory address comparator 90, a memory data selector 91, an instruction changer 92, a storage register 94, and a selection circuit 97.

[0036] The memory address comparator 90 receives the instructions OP₇ to OP₉ being processed in the other functional units 7 to 9 shown in FIG. 1 and the memory addresses M₇ to M₉ calculated by the operation executors 84 of the functional units 7 to 9, as well as the instruction OP₆ being processed in the functional unit 6 and the read address M₆ calculated by its operation executor 84.

[0037] If the instructions OP₇ to OP₉ are write instructions (store instructions), the memory addresses M₇ to M₉ are write memory addresses. If the instruction OP₆ is a read instruction (load instruction), the address M₆ is a read address. The memory address comparator 90 compares these and outputs signals BY₇, BY₈, and BY₉ indicating the comparison results (match or mismatch). Details are described later.
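As a rough model of the memory address comparator 90, the Python sketch below produces one BY bit per other functional unit. The function name, the opcode strings `"load"` and `"store"`, and the rule that a match only counts for a load in unit 6 against a store elsewhere are assumptions drawn from the surrounding description, not a statement of the actual circuit.

```python
def memory_address_compare(op6, m6, others):
    """Model of comparator 90 (illustrative).

    op6, m6: opcode and memory address of the instruction in unit 6.
    others:  {unit number: (opcode, memory address)} for units 7 to 9.
    Returns {unit number: BY bit}, 1 meaning the addresses match.
    """
    return {
        unit: int(op6 == "load" and op == "store" and addr == m6)
        for unit, (op, addr) in others.items()
    }

by = memory_address_compare(
    "load", 0x100,
    {7: ("store", 0x100), 8: ("store", 0x104), 9: ("add", 0x100)},
)
print(by)  # {7: 1, 8: 0, 9: 0}
```

Only unit 7 raises its bit here: unit 8 stores to a different address, and unit 9 is not executing a store, so its equal address value is ignored.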

[0038] The memory data selector 91 receives, via the memory data bypass 10, the memory data R₇, R₈, and R₉ to be stored by the other functional units 7 to 9. It operates under the control of the signals BY₇, BY₈, and BY₉. Details are described later.

[0039] The instruction changer 92 is controlled by the signals BY₇ to BY₉ output by the memory address comparator 90 and changes the instruction on the basis of the instruction code held in the register 94. Details are described later.

[0040] The storage register 94 holds the instruction code OP₆ supplied from the decode stage 5. The held instruction code OP₆ is supplied to the instruction changer 92, the selection circuit 97, and a register 95 in the memory access stage 62.

[0041] The selection circuit 97 selects the data D₃ from the register file 3 when the instruction code OP₆ is a store instruction and selects the storage address D₆ when the instruction code OP₆ is any other instruction, and outputs the selection as the signal R₆.
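The selection circuit 97 is effectively a two-way multiplexer, which a one-line Python function can illustrate. The function name and the `"store"` opcode string are assumptions made for this sketch.

```python
def selection_circuit(op6: str, d3: int, d6: int) -> int:
    """Model of circuit 97: R6 is store data D3 for a store, else address D6."""
    return d3 if op6 == "store" else d6

r6_store = selection_circuit("store", d3=0xAB, d6=5)  # data to be stored
r6_other = selection_circuit("add", d3=0xAB, d6=5)    # register file address
```

The point of the sketch is that R₆ carries two different kinds of value depending on the opcode: the data to be written to memory for a store, and a register file address otherwise.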

[0042] The execution stage 61 further includes a register file address comparator 80, a storage register 81, a register file data selector 83, and an operation executor 84.

[0043] The register file address comparator 80 receives the storage addresses D₇, D₈, and D₉ obtained from the other functional units 7 to 9 and the source addresses S₁₁ and S₁₂ contained in the instruction of the functional unit 6. It compares the source addresses S₁₁ and S₁₂ with the storage addresses D₇, D₈, and D₉ and outputs selection signals S₁₇ to S₁₉ and S₂₇ to S₂₉ indicating the comparison results (match or mismatch). Details are described later.

[0044] The storage register 81 holds the storage address D₆ supplied from the decode stage 5. The held storage address D₆ is supplied to the address comparators 80 of the other functional units and to a register 82 in the memory access stage 62.

[0045] The register file data selector 83 receives two data items D₁ and D₂ supplied from the register file 3 via the decode stage 5. It also receives, via the data bypass 12, the memory data MD₇ to MD₉ obtained in the other functional units. The register file data selector 83 operates in response to the selection signals S₁₇ to S₁₉ and S₂₇ to S₂₉ supplied from the address comparator 80 and to the three signals BY₆ obtained in the other functional units 7 to 9, respectively. Details are described later.

[0046] The operation executor 84 is connected to the register file data selector 83 via data buses 31 and 32 and performs a predetermined operation on the data selected by the register file data selector 83. The result of the operation is supplied to a register 85 and to the memory address comparator 90. This result is an address for accessing the data memory 2.

[0047] The memory access stage 62 includes a register 82 for holding the storage address R₆ and a data register 86 that holds the operation result data from the register 85. The write-back stage 63 writes the execution result data to the register file 3 according to the supplied storage address.

[0048] FIG. 3 shows an example of the register file address comparator 80 shown in FIG. 2. The register file address comparator 80 includes match detectors 811 to 813 that detect a match between the source address S₁₁ and each of the storage addresses D₇, D₈, and D₉ of the instructions handled in the other functional units 7 to 9 shown in FIG. 1. It also includes match detectors 821 to 823 that detect a match between the source address S₁₂ and each of the storage addresses D₇, D₈, and D₉ of the instructions handled in the other functional units 7 to 9.

[0049] The match detectors 811 to 813 generate match signals S₁₇ to S₁₉, respectively, and the match detectors 821 to 823 generate match signals S₂₇ to S₂₉, respectively. For example, the detector 811 detects a match between the storage address D₇ and the source address S₁₁ and generates the match signal S₁₇. The match signals S₁₇ to S₁₉ and S₂₇ to S₂₉ are supplied to the register file data selector 83 as signals for selecting the data to be operated on by the operation executor 84.
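The six match detectors of the register file address comparator 80 amount to two rows of equality tests, which the following Python sketch models. The function name and the dictionary representation are assumptions for illustration; signals are modelled as 0 or 1.

```python
def register_address_compare(s11, s12, dest):
    """Model of comparator 80 (illustrative).

    dest maps unit number (7, 8, 9) to that unit's storage address D7..D9.
    Returns (s1, s2): s1[u] models S17..S19 (match with S11, detectors
    811-813) and s2[u] models S27..S29 (match with S12, detectors 821-823).
    """
    s1 = {u: int(d == s11) for u, d in dest.items()}
    s2 = {u: int(d == s12) for u, d in dest.items()}
    return s1, s2

# Unit 6 reads registers 3 and 4; unit 7 will write register 3,
# unit 8 register 9, and unit 9 register 4.
s1, s2 = register_address_compare(3, 4, {7: 3, 8: 9, 9: 4})
print(s1, s2)  # {7: 1, 8: 0, 9: 0} {7: 0, 8: 0, 9: 1}
```

In this example the first operand of unit 6 depends on unit 7 (S₁₇ is raised) and the second on unit 9 (S₂₉ is raised), so both operands are candidates for bypassing.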

[0050] FIG. 4 shows an example of the register file data selector 83. The register file data selector 83 consists of tri-state buffers 910 to 913, whose outputs are connected to the data bus 31, and tri-state buffers 920 to 923, whose outputs are connected to the data bus 32.

[0051] The tri-state buffers 910 and 920 receive the data D₁ and D₂, respectively, supplied from the register file 3. The tri-state buffers 911 to 913 receive the memory data MD₇ to MD₉, respectively, obtained in the other functional units via the data bypass 12 shown in FIG. 1. The tri-state buffers 921 to 923 likewise receive the memory data MD₇ to MD₉, respectively, obtained in the other functional units via the data bypass 12.

[0052] The tri-state buffers 910 to 913 are controlled according to the selection signals S₁₇ to S₁₉ supplied from the register file address comparator 80 and the signals BY₆, which indicate that memory data has been obtained in the functional units 7 to 9, respectively. For example, the tri-state buffer 911 places the data MD₇ on the data bypass 12 onto the data bus 31 when the selection signal S₁₇ is high and the signal BY₆ from the functional unit 7 is high. The tri-state buffer 910, on the other hand, places the data D₁ supplied from the register file 3 onto the data bus 31 when all of the selection signals S₁₇ to S₁₉ are low, or when all of the signals BY₆ obtained from the functional units 7 to 9 are low. The tri-state buffers 920 to 923 operate in the same way.
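
The enable conditions of the tri-state buffers 910 to 913 can be illustrated with a small behavioral model. The following Python sketch is only an illustration under stated assumptions: the function name `drive_bus31`, the argument layout, and the use of `None` for an undriven bus are hypothetical and do not appear in the specification.

```python
from typing import List, Optional, Sequence


def drive_bus31(d1: int,
                md: Sequence[int],
                sel: Sequence[bool],
                by: Sequence[bool]) -> Optional[int]:
    """Return the value driven onto data bus 31, or None if no buffer is enabled.

    d1  : data D1 from the register file 3 (input of buffer 910)
    md  : bypass data MD7..MD9 (inputs of buffers 911..913)
    sel : selection signals S17..S19
    by  : "memory data obtained" signals from functional units 7..9
    """
    # Buffer 910 is enabled when all selection signals are low
    # or all "memory data obtained" signals are low.
    enables: List[bool] = [not any(sel) or not any(by)]
    # Buffers 911..913 are each enabled when their own pair of signals is high.
    enables += [s and b for s, b in zip(sel, by)]

    driven = [value for value, en in zip([d1, *md], enables) if en]
    if len(driven) > 1:
        # Hypothetical guard: two enabled buffers would fight over the bus.
        raise ValueError("more than one tri-state buffer enabled")
    return driven[0] if driven else None
```

In this literal model, a high selection signal paired with a high valid signal forwards the bypass data, and the register file data D₁ is used when either group of signals is entirely low. A selection signal that is high for one unit while only a different unit reports valid data leaves the modeled bus undriven; how the actual circuit treats that case is not stated in this passage.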

[0053] FIG. 5 shows an example of the arithmetic execution unit 84. The arithmetic execution unit 84 comprises registers 841 and 842, which hold the data obtained via the data buses 31 and 32, respectively, and an arithmetic unit 843, which performs an operation using the data held in the registers 841 and 842.

[0054] FIG. 6 shows an example of the memory address comparator. The memory address comparator 90 comprises coincidence detectors 901 to 903. The coincidence detector 901 receives the instruction OP₇ being processed in the other functional unit 7 and the write memory address M₇ calculated by the arithmetic execution unit 84 of the functional unit 7, together with the instruction OP₆ being processed in this functional unit 6 and the read address M₆ calculated by the arithmetic execution unit 84 of this functional unit 6. The same applies to the coincidence detectors 902 and 903: both receive the instruction OP₆ being processed in this functional unit 6 and the read address M₆ calculated by the arithmetic execution unit 84 of this functional unit 6. In addition, the coincidence detector 902 receives the instruction OP₈ being processed in the other functional unit 8 and the write memory address M₈ calculated by the arithmetic execution unit 84 of the functional unit 8, and the coincidence detector 903 receives the instruction OP₉ being processed in the other functional unit 9 and the write memory address M₉ calculated by the arithmetic execution unit 84 of the functional unit 9.

[0055] For example, when the instruction code OP₇ is a memory write instruction and OP₆ is a memory read instruction, the detector 910 detects a match between the write memory address M₇ and the read address M₆ and generates the coincidence signal BY₇. This signal indicates that the memory data has been obtained. The signals BY₈ and BY₉ are obtained in the same way. The signals BY₇ to BY₉ are supplied to the memory data selector 91 and to the instruction changer 92 as signals for selecting the memory data.
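
The condition under which a coincidence signal is generated can be written out as a short behavioral sketch. The Python below is a minimal illustration, not the circuit itself; the names `Op`, `MemOp`, `match_detector`, and `memory_address_comparator` are hypothetical, and instruction codes are reduced to three assumed categories.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence


class Op(Enum):
    LOAD = "memory read"
    STORE = "memory write"
    OTHER = "other"


@dataclass(frozen=True)
class MemOp:
    op: Op        # instruction code (OP6, OP7, ...)
    address: int  # memory address computed in the execution stage (M6, M7, ...)


def match_detector(own: MemOp, other: MemOp) -> bool:
    """One coincidence detector: high only for store-to-load on the same address."""
    return (other.op is Op.STORE
            and own.op is Op.LOAD
            and other.address == own.address)


def memory_address_comparator(own: MemOp, others: Sequence[MemOp]) -> List[bool]:
    """One detector per other functional unit, e.g. BY7..BY9 for functional unit 6."""
    return [match_detector(own, other) for other in others]
```

For a read at address 0x100 in this unit, only another unit that is writing to 0x100 raises its signal; a write to a different address, or another read of the same address, does not.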

[0056] FIG. 7 shows an example of the memory data selector 91. The memory data selector 91 comprises tri-state buffers 910 to 913. The tri-state buffer 910 receives, via the memory data bypass 10, each of the memory data R₇ to R₉ (the output data of the arithmetic execution unit 84) to be stored by the other functional units 7 to 9. The tri-state buffers 910 to 913 are controlled according to the signals BY₇ to BY₉ supplied from the memory address comparator 90.

[0057] For example, when the signal BY₇ is high, the tri-state buffer 911 supplies the data on the data bypass 10 to the data bypass 12 and the register 86 as the signal MD₆.
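
The selection performed by the memory data selector 91 can be sketched in the same behavioral style. This Python fragment is an assumption-laden illustration: `memory_data_selector` is a hypothetical name, the store data R₇ to R₉ are passed as a plain list, and `None` stands for the case in which no signal is high and nothing is forwarded.

```python
from typing import Optional, Sequence


def memory_data_selector(store_data: Sequence[int],
                         by: Sequence[bool]) -> Optional[int]:
    """Forward the store data of the matching unit as MD6, or None if no match.

    store_data : R7..R9, data the other units are about to write to memory
    by         : BY7..BY9 from the memory address comparator 90
    """
    selected = [r for r, hit in zip(store_data, by) if hit]
    if len(selected) > 1:
        # Hypothetical guard: the passage does not say how several
        # simultaneous matches would be resolved.
        raise ValueError("more than one matching store")
    return selected[0] if selected else None
```

With BY₇ high, the value R₇ is returned, which corresponds to the data being passed on to the data bypass 12 and the register 86 as MD₆.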

[0058] FIG. 8 shows an example of the instruction changer. The instruction changer 92 comprises a register 921, which holds a register file write instruction serving as the replacement instruction, and a selection circuit 922 connected to the register 95. The selection circuit 922 receives the signals BY₇ to BY₉, which are supplied from the memory address comparator 90 and indicate that memory data has been obtained, and the instruction code supplied via its own register 94. The selection circuit 922 is controlled by the signals BY₇ to BY₉. That is, if even one of the signals BY₇ to BY₉ is high, the instruction code held in the register 921 is selected and written to the register 96.
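
The behavior of the selection circuit 922 amounts to a two-way choice controlled by the OR of the coincidence signals. The sketch below is illustrative only; the string encodings and the names `REGISTER_FILE_WRITE` and `instruction_changer` are hypothetical stand-ins for the instruction codes held in the registers.

```python
from typing import Sequence

# Hypothetical encoding of the replacement instruction held in register 921.
REGISTER_FILE_WRITE = "register_file_write"


def instruction_changer(original: str, by: Sequence[bool]) -> str:
    """Return the instruction code that would be written to register 96.

    original : the unit's own instruction code (for example a memory read)
    by       : coincidence signals from the memory address comparator 90
    """
    # If any coincidence signal is high, the data is already available through
    # the bypass, so the instruction is replaced by a register file write.
    return REGISTER_FILE_WRITE if any(by) else original
```

A memory read whose address matches a pending write therefore proceeds as a register file write, while an instruction with no match passes through unchanged.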

[0059] The operation will now be described with reference to FIG. 9. In the following description, it is assumed that the superscalar processor 20 shown in FIG. 1 executes the instructions 11 to 13 described with reference to FIG. 17, and that the pipeline PL1 corresponds to the functional unit 6, the pipeline PL2 to the functional unit 7, and the pipeline PL3 to the functional unit 8.

[0060] In the pipeline PL1, the processing IF of the instruction 11 in the instruction fetch stage 4 is performed in the period T₁, and the processing ID in the instruction decode stage 5 is performed in the period T₂. The processing EXC, MEM, and WB in the execution stage 61, the memory access stage 62, and the write-back stage 63 is performed in the periods T₃, T₄, and T₅, respectively.

[0061] In the pipelines PL2 and PL3, meanwhile, the processing IF of the instructions 11 and 12 in the instruction fetch stage 4 is performed in the period T₁, and the processing EXC in the instruction decode stage 5 is performed in the period T₂. In the pipeline PL2, the memory address is calculated in the period T₃ and compared with the memory addresses of the other functional units 6, 8, and 9. If it matches the memory address calculated in the pipeline PL1, the data of the pipeline PL1 is transferred to the pipeline PL2 via the memory data bypass 10.

[0062] The pipeline PL3 waits in the period T₃, but in the period T₄ the execution data of the pipeline PL2 has already been obtained, so the data is transferred via the data bus 12 and the execution stage 62 is processed in the period T₄.

[0063] FIG. 10 is a configuration diagram of the functional unit 7, drawn in the same manner as FIG. 2. The memory address comparator 90 of the functional unit 7 compares the memory addresses M₆, M₈, and M₉ of the other functional units 6, 8, and 9 with its own memory address M₇. It also refers to the instruction codes OP₆, OP₈, and OP₉ of the other functional units 6, 8, and 9. When one of these instruction codes OP₆, OP₈, and OP₉ is a write instruction and an address match is detected, the corresponding signal BY₆, BY₈, or BY₉ is generated.

[0064] Further, in response to the signals BY₆, BY₈, and BY₉, the memory data selector 91 of the functional unit 7 selectively supplies the data R₆, R₈, or R₉ obtained via the memory data bus 10 to the register 86 as the signal MD₇. That is, when the memory address matches that of another functional unit whose instruction code is a write instruction, the data obtained from the register 81 of that functional unit is transferred to the register 86 of the functional unit 7.

[0065] That is, the functional unit 7 can obtain the data without reading it from the data memory 2, so the processing EXC of the instruction 12 no longer needs to wait for the processing WB of the instruction 11 to finish.

[0066] FIG. 11 shows the configuration of the instruction changer 92 of the functional unit 7. The instruction changer 92 of the functional unit 7 selects the register 95 or the register 921 in response to the signals BY₆, BY₈, and BY₉. If even one of the signals BY₆, BY₈, and BY₉ is high, the instruction code held in the register 921 is selected and written to the register 96, which changes the instruction. This eliminates the need to execute a useless memory access instruction.

[0067] FIG. 12 shows the configuration of the register file address comparator 80 of the functional unit 8. The register file address comparator 80 of the functional unit 8 compares the storage addresses D₆, D₇, and D₉ of the other functional units 6, 7, and 9 with the source addresses S₁₁ and S₁₂ contained in its own instruction. When an address match is detected, it generates the corresponding selection signal among S₁₆, S₁₇, S₁₉ and S₂₆, S₂₇, S₂₉.
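
The comparison performed by the register file address comparator 80 of the functional unit 8 can be sketched as follows. This is a minimal illustration under assumptions: the function name, the dictionary keyed by functional unit number, and the signal names built as strings such as `"S17"` are hypothetical conveniences, not part of the specification.

```python
from typing import Dict


def register_file_address_comparator(dest_addrs: Dict[int, int],
                                     src1: int,
                                     src2: int) -> Dict[str, bool]:
    """Compare each other unit's storage address with this unit's two sources.

    dest_addrs : storage (destination) register address per other unit,
                 e.g. {6: D6, 7: D7, 9: D9} for functional unit 8
    src1, src2 : source addresses S11 and S12 of this unit's own instruction
    """
    signals: Dict[str, bool] = {}
    for unit, dest in dest_addrs.items():
        signals[f"S1{unit}"] = dest == src1  # selects data for data bus 31
        signals[f"S2{unit}"] = dest == src2  # selects data for data bus 32
    return signals
```

For example, if the functional unit 7 is about to produce register 5 and this unit's first source is register 5, the signal S₁₇ goes high and the first operand can be taken from the bypass instead of the register file.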

[0068] FIG. 13 shows the configuration of the register file data selector 83 of the functional unit 8. In response to the selection signals S₁₆, S₁₇, S₁₉ and S₂₆, S₂₇, S₂₉, and to the signals BY₈ from the other functional units 6, 7, and 9 indicating that memory data has been obtained, the register file data selector 83 of the functional unit 8 places the memory data MD₆, MD₇, and MD₉, obtained in the other functional units 6, 7, and 9 and present on the data bypass 12, onto the data bus 31.

[0069] That is, when the data selection signals S₁₆, S₁₇, S₁₉ and S₂₆, S₂₇, S₂₉ are high and the signals BY₆, BY₇, and BY₉ are high, the data MD₆, MD₇, and MD₉ on the data bypass 12 is placed on the data buses 31 and 32 according to those signals.

[0070] Accordingly, the data of the other functional units 6, 7, and 9 is transferred to the functional unit 8. That is, the functional unit 8 can obtain the data without waiting for it to be written to the register file 3, so the processing ID of the instruction 13 no longer needs to wait for the processing WB of the instruction 12 to finish.

[0071] In this way, the waiting time of the instructions 12 and 13 can be shortened. As shown in FIG. 9, the data of the pipeline PL1 is supplied to the pipeline PL2 via the memory data bus 10 at the end of the period T₃, so the instruction 12 can continue to be processed without stalling in the period T₄. In addition, the memory read instruction no longer needs to be executed.

[0072] Further, since the data supplied to the pipeline PL2 is supplied to the pipeline PL3 via the data bus 12, the execution stage (EXC) can be processed in the period T₄.

[0073]

[Effects of the Invention] As described above, according to the present invention, an address match is detected and the memory write data of one pipeline is supplied directly to another pipeline before it is written to the memory, so the time required to execute instructions can be shortened.

[0074] Further, by cancelling the execution of memory read instructions that have become unnecessary, the number of memory accesses can be reduced.

[0075] Furthermore, the data that the memory read instruction would have obtained, which is instead obtained by the above means, is supplied directly to other pipelines, so the time required to execute instructions can be shortened.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention.

FIG. 2 is a configuration diagram of the functional unit 6.

FIG. 3 is a configuration diagram of the register file address comparator of the functional unit 6.

FIG. 4 is a configuration diagram of the register file data selector of the functional unit 6.

FIG. 5 is a configuration diagram of the arithmetic execution unit.

FIG. 6 is a configuration diagram of the memory address comparator.

FIG. 7 is a configuration diagram of the memory data selector.

FIG. 8 is a configuration diagram of the instruction changer.

FIG. 9 is a timing diagram showing the operation of an embodiment of the present invention.

FIG. 10 is a configuration diagram of the functional unit 7.

FIG. 11 is a configuration diagram of the instruction changer of the functional unit 7.

FIG. 12 is a configuration diagram of the register file address comparator of the functional unit 8.

FIG. 13 is a configuration diagram of the register file data selector of the functional unit 8.

FIG. 14 is a block diagram showing the conventional technique.

FIG. 15 is a timing diagram showing the conventional technique.

FIG. 16 is a timing diagram showing the conventional technique.

FIG. 17 is a timing diagram showing the conventional technique.

[Explanation of symbols]

5: instruction decode stage
10: memory data bypass
12: data bypass
80: register file address comparator
83: register file data selector
90: memory address comparator
91: memory data selector
92: instruction changer

Claims

[Claims]

1. A parallel computer comprising: a plurality of pipeline processing execution means; instruction giving means for fetching a plurality of given instructions, finding among the fetched instructions predetermined instructions that can be executed simultaneously, and inputting the predetermined instructions to the pipeline processing execution means; temporary data storage means for holding data for a relatively short period during processing of the predetermined instructions; an external memory for holding data for a relatively long period; and a data bypass bus for transmitting data handled by the pipeline processing execution means, wherein each of the pipeline processing execution means has bypass control means for determining whether to use the data transmitted by the data bypass bus.

2. The parallel computer according to claim 1, wherein the bypass control means in one of the pipeline processing execution means comprises: external memory address coincidence detecting means for detecting a coincidence between the memory address of the external memory obtained in one of the predetermined instructions corresponding to the one pipeline processing execution means and the memory address of the external memory obtained in another of the predetermined instructions corresponding to another of the pipeline processing execution means; and first data providing means for selecting, according to the output of the external memory address coincidence detecting means, whether to use the data transmitted by the data bypass bus or data separately loaded from the external memory.

3. The parallel computer according to claim 2, wherein the external memory address coincidence detecting means outputs a coincidence signal indicating whether data is to be bypassed, the parallel computer further comprising instruction changing means for changing, when the coincidence signal indicates a coincidence, the one predetermined instruction into an instruction for storing the bypassed data in the temporary data storage means.

4. The described parallel computer, wherein the bypass control means in the one pipeline processing execution means further comprises: temporary data storage means address coincidence detecting means for detecting a coincidence between the address of the temporary data storage means obtained in the one predetermined instruction corresponding to the one pipeline processing execution means and the address of the temporary data storage means obtained in the other predetermined instruction corresponding to the other pipeline processing execution means; and second data providing means for selecting whether to use the data transmitted by the data bypass bus according to the output of the external memory address coincidence detecting means of the other pipeline processing execution means and the output of the temporary data storage means address coincidence detecting means.