JPH056274A

JPH056274A - Data processor

Info

Publication number: JPH056274A
Application number: JP15684491A
Authority: JP
Inventors: Masaki Hashizume; 雅樹橋詰
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1991-06-27
Filing date: 1991-06-27
Publication date: 1993-01-14

Abstract

PURPOSE: To execute processing at high speed without unit conflicts even when store instructions are mixed with other load-type instructions. CONSTITUTION: A calculation data supplying means 31 reads the contents of a register 83b in a register file 8 and supplies them to a computing element 7, and a store data supplying means 32 reads the contents of a different register 83a in the register file 8, independently of the calculation data supplying means 31, and supplies them to an operand access unit 5 as store data. The operation processing of the computing element 7 and the write operation of the operand access unit 5 to a main storage device are executed in parallel, which realizes advance processing (prestore) of a store instruction. Thus, a following store instruction can be processed while the preceding load-type instruction is still executing.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to speeding up store instructions in a pipelined data processing device.

[0002]

[Prior Art] The instructions handled by a data processing device are broadly divided into load-type instructions and store-type instructions. In a pipelined data processing device, store-type instructions could not be processed as fast as desired because of various constraints. For example, Japanese Patent Laid-Open No. 63-136138 concerns speeding up a store-type instruction that immediately follows a load-type instruction, and Japanese Patent Laid-Open No. 62-1032, among others, concerns speeding up a load-type instruction that immediately follows a store-type instruction. None of them, however, considers the case where a store-type instruction is sandwiched between load-type instructions.

[0003] FIG. 15 is a block diagram showing an example of a conventional data processing device. In the figure, 1 is a basic processing unit (BPU), 2 is a main memory unit (MMU), 3 is an instruction read unit (IPU), 4 is an instruction decode unit (DEC), 5 is an operand access unit (OPU), and 6 is an operation/execution unit (EXU, also called the arithmetic unit). FIG. 16 is a more detailed block diagram of the EXU 6, in which 7 is an arithmetic unit (ALU) and 8 is a register file (REG). A load-type instruction performs, in the ALU 7, an operation whose inputs are the operand data read from the MMU 2 and the contents of the REG 8, and stores the result in the REG 8. A store-type instruction, by contrast, reads the contents of the REG 8 and writes them to the MMU 2.

[0004] Next, the operation will be described. For a load-type instruction, the IPU 3 reads the instruction code from the MMU 2. The DEC 4 decodes the instruction code read by the IPU 3 and generates the control signals that control the OPU 5 and the EXU 6. Under the control signals from the DEC 4, the OPU 5 reads the required operand data from the MMU 2. As directed by the DEC 4, the EXU 6 processes, in the ALU 7, the data read by the OPU 5 and the contents of the REG 8 designated by the DEC 4, and stores the result in the REG 8.

[0005] FIG. 17 is a pipeline configuration diagram showing the flow of the above series of processing. Stage 1 is the instruction read stage and corresponds to the processing of the IPU 3. Stage 2 is the instruction decode stage and corresponds to the processing of the DEC 4. Stage 3 is the operand access stage and corresponds to the processing of the OPU 5. Stage 4 is the operation/execution stage and corresponds to the processing of the EXU 6.

[0006] FIG. 18 is a pipeline transition diagram showing the flow of the above series of processing. The horizontal axis is time and the vertical axis is the pipeline stage. L is a load instruction, A is an add instruction, and N is a logical AND instruction; all of them are load-type instructions. When such instructions are executed in succession, the pipeline flows smoothly without disturbance.
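The undisturbed flow described for FIG. 18 can be sketched as follows. This is a minimal illustration, assuming that each instruction enters stage 1 one cycle after its predecessor and never stalls. The function name `ideal_pipeline` and the use of unit names as stage labels are illustrative choices, not taken from the source.

```python
# Units of FIG. 17 in pipeline order (stage 1 .. stage 4).
STAGES = ["IPU", "DEC", "OPU", "EXU"]


def ideal_pipeline(program):
    """Return {cycle: {unit: instruction}} for a stall-free pipeline.

    Assumption: instruction i (0-based) enters stage 1 in cycle i + 1
    and advances exactly one stage per cycle.
    """
    table = {}
    for i, instr in enumerate(program):
        for s, unit in enumerate(STAGES):
            table.setdefault(i + 1 + s, {})[unit] = instr
    return table


if __name__ == "__main__":
    table = ideal_pipeline(["L", "A", "N"])
    for cycle in sorted(table):
        row = "  ".join(f"{u}:{table[cycle].get(u, '-')}" for u in STAGES)
        print(f"cycle {cycle}: {row}")
```

Under this assumption, every unit holds at most one instruction per cycle, which is the sense in which the text calls the flow smooth.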

[0007] FIG. 19 is a pipeline transition diagram showing the processing of a store instruction. ST denotes a store instruction. The store instruction is a store-type instruction that writes the contents of the REG 8 to the MMU 2 unchanged. The store instruction read by the IPU 3 in cycle 2 is decoded by the DEC 4 in cycle 3. For a load-type instruction, cycle 4 would then be stage 3, in which the OPU 5 performs operand access. A store instruction, however, does not need to read operand data, so it is preferable to skip stage 3 and move directly to stage 4. In cycle 4, however, the L instruction is in stage 4, and it conflicts with the store instruction, which outputs its store data to the OPU 5 by way of the ALU 7. The store instruction is therefore held in the DEC 4 for one extra cycle, through cycle 4. In cycle 5 the store instruction moves to stage 4 and the store data is taken out of the REG 8. Control of the store instruction then passes to the OPU 5, and the data taken out of the REG 8 is written to the MMU 2 via the OPU 5. Because the store instruction was held in stage 2 for an extra cycle in cycle 4, the add instruction that follows it is held in stage 1 for two cycles. Furthermore, to avoid a conflict between the cycle in which the store instruction uses the OPU 5 and the cycle in which the add instruction uses the OPU 5, the add instruction is also held in stage 2 for one extra cycle, in cycle 6.
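The stalls described above can be reproduced with a small scheduling sketch. It assumes a greedy in-order model that is not stated in the source: each unit holds one instruction per cycle, an instruction that cannot advance keeps occupying its current unit, and earlier instructions have priority. The sequence L, ST, A and the names `CONVENTIONAL` and `schedule` are illustrative assumptions.

```python
# Order in which each instruction type uses the units in the conventional
# device, as read from the text: a store passes through the EXU (ALU path)
# first and only then uses the OPU for the memory write.
CONVENTIONAL = {
    "L": ["IPU", "DEC", "OPU", "EXU"],
    "A": ["IPU", "DEC", "OPU", "EXU"],
    "ST": ["IPU", "DEC", "EXU", "OPU"],
}


def schedule(program, plan):
    """Greedy in-order schedule; returns [(instr, {unit: entry_cycle})]."""
    busy = {}  # unit -> set of cycles in which the unit is occupied
    result = []
    cycle = 1
    for instr in program:
        units = plan[instr]
        entered = {}
        for i, unit in enumerate(units):
            occupied = busy.setdefault(unit, set())
            while cycle in occupied:
                if i > 0:
                    # Stalled: keep holding the previous unit this cycle.
                    busy[units[i - 1]].add(cycle)
                cycle += 1
            occupied.add(cycle)
            entered[unit] = cycle
            cycle += 1
        result.append((instr, entered))
        # The next instruction may be fetched one cycle after this fetch.
        cycle = entered[units[0]] + 1
    return result


if __name__ == "__main__":
    for instr, entered in schedule(["L", "ST", "A"], CONVENTIONAL):
        print(instr, entered)
```

With these assumptions the store enters the EXU in cycle 5 and the OPU in cycle 6, and the add does not reach the EXU until cycle 8, which agrees with the cycle numbers given in the paragraph.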

[0008]

[Problems to Be Solved by the Invention] Because the conventional data processing device is configured as described above, a store instruction and other load-type instructions contend for the operand access unit. Extra cycles are needed to avoid this contention, so data cannot be processed at high speed.

[0009] The present invention has been made to solve the above problem, and its object is to obtain a data processing device that can process at high speed, without unit conflicts, even when store instructions are mixed with other load-type instructions.

[0010]

[Means for Solving the Problems] The data processing device according to the first invention comprises a multiplexer at the store data output port of the operation/execution unit, and a register file having two read ports (an arithmetic data supply means and a store data supply means) that independently read the contents of two different registers in the register file. One output port of the register file serves as the data input of the arithmetic unit, the other is connected to one input of the multiplexer, and the output of the arithmetic unit is connected to the remaining input of the multiplexer.

[0011] In the second invention, to achieve the above object, a store buffer is provided between the basic processing unit and the main memory unit, a flag indicates whether a store is being processed in advance, and this flag controls writing from the store buffer to the main memory unit.

[0012] In the third invention, to achieve the above object, an invalid address register that temporarily holds the store address is provided, so that advance processing can be carried out with the contents of the cache able to be invalidated by means of the invalid address register.

[0013]

[Operation] According to the first invention, the supply of the contents of a register in the register file to the arithmetic unit by the arithmetic data supply means, and the supply of the contents of a different register to the operand access unit as store data by the store data supply means, can be performed in parallel.

[0014] According to the second invention, the store address and store data produced by a store instruction are held temporarily in the store buffer, and when the flag indicates advance processing, the main memory unit is rewritten only after the instruction preceding the store instruction has ended normally.

[0015] According to the third invention, even if the store must be discarded because the instruction preceding the store instruction ends abnormally, the contents of the cache memory corresponding to the store address held in the invalid address register can be invalidated.

[0016]

[Embodiments]

Embodiment 1. An embodiment of the first invention will now be described in detail with reference to FIG. 1. In the figure, 8 is a register file (REG) having two output ports (81 and 82), and 9 is a multiplexer (MPX). 81 is a port dedicated to store instructions, 82 is a port dedicated to arithmetic data, 83 is the register group, and 71 is the arithmetic unit output port. 31 is the store data supply means and 32 is the arithmetic data supply means. The other parts are the same as in the conventional example.

[0017] Next, the operation will be described. Load-type instructions operate as in the conventional data processing device. A store instruction, after being read by the IPU 3 and decoded by the DEC 4, goes straight on to use the OPU 5 and begins writing its data to the MMU 2. The write data is supplied to the OPU 5 from the store-instruction-dedicated port 81 of the REG 8.
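The datapath of this embodiment can be sketched behaviorally as follows. This is only an illustration under assumptions: the register count, the method names, and the way the MPX 9 is selected (a single boolean) are hypothetical, since the source does not specify the selection logic.

```python
class RegisterFile:
    """Register file with two independent read ports (cf. ports 81 and 82)."""

    def __init__(self, size=8):
        self.regs = [0] * size

    def write(self, index, value):
        self.regs[index] = value

    def read_store_port(self, index):
        # Port 81 in the text: feeds the MPX and bypasses the ALU.
        return self.regs[index]

    def read_alu_port(self, index):
        # Port 82 in the text: feeds the data input of the ALU.
        return self.regs[index]


def mpx(select_store_port, store_port_value, alu_output):
    """MPX 9: choose what is driven to the OPU as store data (assumed select)."""
    return store_port_value if select_store_port else alu_output


def one_cycle(reg, alu_src, operand, store_src):
    """One cycle in which an add executes while a store's data is read out."""
    alu_out = reg.read_alu_port(alu_src) + operand                    # EXU side
    store_data = mpx(True, reg.read_store_port(store_src), alu_out)   # OPU side
    return alu_out, store_data


if __name__ == "__main__":
    rf = RegisterFile()
    rf.write(1, 10)   # source register of the add (83b in the text)
    rf.write(2, 99)   # register to be stored (83a in the text)
    print(one_cycle(rf, alu_src=1, operand=5, store_src=2))
```

The point of the sketch is that the two reads touch different registers through different ports, so neither has to wait for the other within the cycle.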

[0018] FIG. 2 is a pipeline transition diagram of the data processing device according to the present invention. After stages 1 and 2, the store instruction, unlike in the conventional device, is not made to wait a cycle and can enter the write operation to the MMU 2 in stage 3, using the OPU 5. Conventionally the data path from the REG 8 to the OPU 5 ran through the ALU 7 and therefore conflicted with stage 4 of load-type instructions; the present invention removes this conflict by providing the data path 81 dedicated to store instructions. That is, the arithmetic data supply means 32 reads the contents of the register 83b in the REG 8 and supplies them to the ALU 7, so the EXU 6 can execute the L instruction in cycle 4. At the same time, the store data supply means 31 reads the contents of another register 83a in the REG 8 and supplies them as store data to the OPU 5 via the MPX 9, so the OPU 5 can execute the ST instruction in parallel in cycle 4. The store instruction has thus completed all of its processing by cycle 4 and ends without moving to stage 4. Meanwhile, the operand access for the following add instruction does not conflict with the store instruction's use of the OPU 5, so the add instruction moves to stage 3 in cycle 5 and proceeds directly to stage 4.
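As a cross-check of the cycle numbers above, the following sketch transcribes the unit occupancy of each instruction and verifies that no unit is claimed twice in one cycle. The tables are read from the descriptions of FIG. 2 and FIG. 19 in the text, not from the drawings themselves; cycles that the text does not state explicitly (for example, the add instruction's stages 1 and 2 under FIG. 2) are assumptions, as are the names `FIG2`, `FIG19`, `conflicts`, and `last_cycle`.

```python
from collections import Counter

# cycle -> unit occupied, per instruction (sequence L, ST, A assumed).
FIG19 = {  # conventional device
    "L": {1: "IPU", 2: "DEC", 3: "OPU", 4: "EXU"},
    "ST": {2: "IPU", 3: "DEC", 4: "DEC", 5: "EXU", 6: "OPU"},
    "A": {3: "IPU", 4: "IPU", 5: "DEC", 6: "DEC", 7: "OPU", 8: "EXU"},
}
FIG2 = {  # device with the dedicated store port 81
    "L": {1: "IPU", 2: "DEC", 3: "OPU", 4: "EXU"},
    "ST": {2: "IPU", 3: "DEC", 4: "OPU"},
    "A": {3: "IPU", 4: "DEC", 5: "OPU", 6: "EXU"},
}


def conflicts(timeline):
    """Return the (cycle, unit) pairs claimed by more than one instruction."""
    use = Counter(
        (cycle, unit)
        for stages in timeline.values()
        for cycle, unit in stages.items()
    )
    return sorted(key for key, count in use.items() if count > 1)


def last_cycle(timeline):
    """Cycle in which the last instruction of the sequence finishes."""
    return max(max(stages) for stages in timeline.values())


if __name__ == "__main__":
    for name, timeline in (("FIG. 19", FIG19), ("FIG. 2", FIG2)):
        print(name, "conflicts:", conflicts(timeline),
              "last cycle:", last_cycle(timeline))
```

Both timelines are conflict-free, but under these assumptions the three-instruction sequence finishes two cycles earlier with the dedicated store port.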

[0019] As described above, according to the first invention, the contents of the REG 8 can be supplied to the OPU 5 without passing through the ALU 7. The write to the MMU 2 by a store instruction can therefore be performed in stage 3 of the store instruction, at the same time as stage 4 of the preceding load-type instruction, and the conflict with the use of the OPU 5 in stage 3 of another load-type instruction following the store instruction is also avoided. As a result, even when load-type instructions and store instructions are mixed, processing proceeds at high speed without disturbing the pipeline flow. Performing the operand access of a store instruction in parallel with the execution stage of the preceding instruction in this way will be called prestore processing.

[0020] In the above embodiment, the REG 8 is given two output ports so as to obtain one port that feeds the ALU 7 in stage 4 and one port that feeds the OPU 5 in stage 3. Alternatively, as shown in FIG. 3, a REG1 (8a) dedicated to the ALU 7 input and a REG2 (8b) dedicated to the OPU 5 input may be provided, with the output of the ALU 7 written to both simultaneously as the same data. This gives the same operation and effect as the above embodiment.

[0021] Embodiment 2. FIG. 4 is a pipeline transition diagram for the case where the data processing device of Embodiment 1 is added to the conventional configuration; M is a multiply instruction and ST is a store instruction. If the multiply instruction is implemented by repeated additions in the ALU 7, stage 4 of the M instruction does not finish in one cycle. Moreover, in some cases an addition may detect an overflow, for example in cycle 7, so that the subsequent instruction sequence is changed and exception processing for the overflow is executed. The ST instruction, however, reached stage 3 in cycle 4 and has already written to the MMU 2. After the M instruction ends, exception processing should be performed without executing the ST instruction, yet the MMU 2 has already been rewritten by the ST instruction.

[0022] To avoid this problem, stage 3 of the ST instruction can be delayed until the final cycle of the M instruction. That is, as shown in FIG. 5, the ST instruction is moved from stage 2 to stage 3 in cycle 7, the cycle one step before the final step of the M instruction. This method, however, requires being able to determine whether the stage 4 cycle currently executing is the step immediately before the final step, and that determination is not always possible. When it is not possible, the ST instruction is moved from stage 2 to stage 3 at the final step of the M instruction, as shown in FIG. 6. An empty cycle then appears in stage 4 in cycle 9, and the processing speed cannot be improved sufficiently.

[0023] FIG. 7 is a block diagram showing an embodiment of the data processing device according to the second invention. In the figure, 10 is a store buffer (STB); the other parts are the same as in the conventional example.

[0024] FIG. 8 is a pipeline transition diagram according to the present invention. In cycle 4 the store instruction is in stage 3; the store data is written to the STB 10 and is not yet written to the MMU 2. Then, in cycle 8, once the final step of the M instruction has established that the M instruction ends normally, the contents of the STB 10 are written to the MMU 2. The following L instruction uses the OPU 5 to read data from the MMU 2 in cycle 5, so it does not conflict with the write to the MMU 2 in cycle 8.
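The deferred write described above can be sketched behaviorally as follows. This is a simplified model under assumptions: main memory is represented by a dictionary, the buffer holds a single entry, and the class and method names (`StoreBuffer`, `store`, `preceding_done`) are hypothetical rather than taken from the source.

```python
class StoreBuffer:
    """Single-entry store buffer placed between the BPU and the MMU."""

    def __init__(self, memory):
        self.memory = memory   # dict standing in for the MMU
        self.pending = None    # (address, data) held for a prestore

    def store(self, address, data, prestore):
        if prestore:
            # Prestore: hold the write; main memory is not touched yet.
            self.pending = (address, data)
        else:
            # Ordinary store: write through to main memory.
            self.memory[address] = data

    def preceding_done(self, normal_end):
        """Called when the instruction preceding the store finishes."""
        if self.pending is None:
            return
        if normal_end:
            address, data = self.pending
            self.memory[address] = data   # commit the held write
        # On an abnormal end the held write is simply dropped.
        self.pending = None


if __name__ == "__main__":
    mem = {0x10: 1}
    stb = StoreBuffer(mem)
    stb.store(0x10, 7, prestore=True)
    print(mem[0x10])                      # memory still holds the old value
    stb.preceding_done(normal_end=True)
    print(mem[0x10])                      # held write has been committed
```

Keeping the write out of main memory until the preceding instruction's outcome is known is what allows the store to enter stage 3 early without risking a premature update.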

[0025] FIG. 9 is a more detailed block diagram of the STB 10. In the figure, 11 is a store data register (SDR), 12a and 12b are J/K-type flip-flops, 13 is an input bus (DIN), 14 is an output bus (DOUT), 15 is a store command signal (WR) output by the BPU 1, 16 is a prestore signal (PRE) indicating that the write is one performed in stage 3, 17 is a reset signal (RST), 18 is a prestore execution command signal (EXE), and 19 is a memory write command signal (MW) that directs writing to the MMU 2.

[0026] FIG. 10 is a timing chart showing a normal write operation. In cycle 5 the write operation is started by WR 15; DIN 13 is latched in the SDR 11 and is output as DOUT 14 from cycle 6 onward. WR 15 is latched in JK 12a, MW 19 is asserted in cycle 6, and the write to memory is performed.

[0027] FIG. 11 is a timing chart corresponding to FIG. 8. Up to the point where the write operation is started by WR 15 in cycle 5 and DIN 13 is latched in the SDR 11 and output as DOUT 14 from cycle 6 onward, it is the same as FIG. 10. However, because PRE 16 is asserted together with WR 15 in cycle 5, JK 12a is not set and JK 12b is set instead. In cycle 8, that is, at the final step of the M instruction, EXE 18 is asserted, which asserts MW 19 and clears JK 12b. In this way the write performed by the ST instruction in stage 3 is held in the STB 10 and is written to the MMU 2 in cycle 8, after the normal end of the M instruction has been confirmed.

[0028] FIG. 12 is a timing chart corresponding to FIG. 4. Because the M instruction detects an error in cycle 8, EXE 18 is masked, and in cycle 9, during error processing, RST 17 is asserted to clear JK 12b.
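The three timing charts can be approximated with a cycle-by-cycle sketch of the control signals. The gate-level logic of FIG. 9 is not given in the text, so the equations below are assumptions chosen to match the described behavior: JK 12a is set by WR without PRE and drives MW for one cycle, JK 12b is set by WR with PRE and is cleared by EXE or RST, and MW is asserted when JK 12a is set or when JK 12b is set while EXE is asserted. The function name `run_stb` is hypothetical.

```python
def run_stb(inputs, first, last):
    """Simulate the STB control flags.

    inputs: {cycle: set of asserted signals among "WR", "PRE", "EXE", "RST"}
    Returns ({cycle: MW asserted?}, (JK12a, JK12b) after the last cycle).
    """
    jk_a = jk_b = False
    mw = {}
    for cycle in range(first, last + 1):
        sig = inputs.get(cycle, set())
        wr, pre = "WR" in sig, "PRE" in sig
        exe, rst = "EXE" in sig, "RST" in sig
        # Assumed output logic: normal store, or prestore released by EXE.
        mw[cycle] = jk_a or (jk_b and exe)
        # Assumed next-state logic of the two J/K flip-flops.
        next_a = wr and not pre          # set by plain WR, cleared otherwise
        next_b = jk_b
        if exe or rst:
            next_b = False               # committed or discarded
        if wr and pre:
            next_b = True                # prestore is being held
        jk_a, jk_b = next_a, next_b
    return mw, (jk_a, jk_b)


if __name__ == "__main__":
    charts = {
        "normal write (FIG. 10)": {5: {"WR"}},
        "prestore committed (FIG. 11)": {5: {"WR", "PRE"}, 8: {"EXE"}},
        "prestore discarded (FIG. 12)": {5: {"WR", "PRE"}, 9: {"RST"}},
    }
    for name, inputs in charts.items():
        mw, _ = run_stb(inputs, 5, 9)
        print(name, "-> MW asserted in cycles", [c for c in mw if mw[c]])
```

Under these assumed equations, MW is asserted in cycle 6 for the normal write, in cycle 8 for the committed prestore, and never for the discarded prestore, matching the three charts as described.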

[0029] As described above, according to the second invention, a write produced by prestore processing is held temporarily in the STB 10, and the write to the main memory unit is carried out only after the preceding instruction has ended normally. This avoids the problem of the main memory unit being modified before the preceding instruction has ended.

[0030] Although the above embodiment shows a store buffer with a single data register, a store buffer made up of multiple sets of a data register, JK 12a, and JK 12b provides the same operation and effect.

[0031] Embodiment 3. FIG. 13 shows a configuration example in which a cache memory is added. In the figure, 20 is a cache memory (CAM); the other parts are the same as in the conventional example. When a prestore operation is performed in this configuration, the CAM 20 lies between the STB 10 and the BPU 1, so the CAM 20 may be rewritten before the preceding instruction has ended.

[0032] FIG. 14 shows a configuration devised to solve this problem. In the figure, 21 is an invalid address register (INV); the other parts are the same as in the conventional example.

[0033] For every write performed by the BPU 1, the address and data are held in the STB 10. If the CAM 20 holds a copy of the contents of the MMU 2 corresponding to that address, the copy in the CAM 20 is written, and at the same time the write address is also held in the INV 21. When the write is to be discarded, for example because of exception processing of an instruction preceding the store instruction, the RST 17 signal is asserted to clear JK 12b, the write address held in the INV 21 is supplied to the CAM 20, and the CAM 20 invalidates the copy of the contents of the MMU 2 corresponding to the address supplied from the INV 21.
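The interplay of the STB 10, CAM 20, and INV 21 can be sketched as follows. This is a behavioral illustration under assumptions: the cache and main memory are plain dictionaries keyed by address, only one prestore is outstanding at a time, and the names `PrestoreUnit`, `prestore`, `commit`, and `discard` are hypothetical.

```python
class PrestoreUnit:
    """Store buffer plus invalid address register in front of a cache."""

    def __init__(self, memory, cache):
        self.memory = memory   # dict standing in for the MMU
        self.cache = cache     # dict of valid cached copies (CAM)
        self.stb = None        # (address, data) held in the store buffer
        self.inv = None        # address held in the invalid address register

    def prestore(self, address, data):
        self.stb = (address, data)
        if address in self.cache:
            # The cache holds a copy: update it now and remember where.
            self.cache[address] = data
            self.inv = address

    def commit(self):
        """Preceding instruction ended normally: write main memory."""
        address, data = self.stb
        self.memory[address] = data
        self.stb = self.inv = None

    def discard(self):
        """Preceding instruction raised an exception: undo the prestore."""
        if self.inv is not None:
            # Invalidate only the cache entry the prestore touched.
            self.cache.pop(self.inv, None)
        self.stb = self.inv = None


if __name__ == "__main__":
    mem = {1: 10, 2: 20}
    cache = {1: 10, 2: 20}
    unit = PrestoreUnit(mem, cache)
    unit.prestore(1, 99)
    unit.discard()
    print(cache, mem)   # entry 1 is gone from the cache, memory is unchanged
```

After the discard, the next read of the invalidated address would miss in the cache and fetch the unmodified value from main memory, while every other cached entry remains usable.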

[0034] As described above, according to the third invention, even if a store instruction must be discarded after the contents of the CAM 20 have been rewritten by its prestore processing, only the address in question among the contents of the CAM 20 is invalidated, by using the address held in the INV 21. The contents of the CAM 20 are therefore preserved as far as possible, and prestore processing can be performed with almost no reduction in the hit rate of the CAM 20.

[0035]

[Effects of the Invention] As described above, according to the first invention, an arithmetic data supply means that supplies data for operations and a store data supply means that supplies store data for the prestore of a store instruction are provided. The arithmetic processing of the preceding instruction and the prestore processing of the store instruction can therefore be performed simultaneously, and a faster data processing device is obtained.

[0036] According to the second invention, a store buffer is provided between the basic processing unit and the storage device. The store data and store address are held there temporarily, and the storage device is rewritten with the held store data and store address only after the end of the instruction preceding the store instruction has been confirmed. The prestore operation can therefore be started without first confirming the normal end of the preceding instruction.

[0037] According to the third invention, a data processing device equipped with a cache memory is provided with an invalid address register that holds the store address, so that when prestore processing is discarded, only the data written to the cache memory in advance is discarded, according to the contents of the invalid address register.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing a configuration example of the operation/execution unit of the data processing device according to the first invention.

FIG. 2 is a pipeline transition diagram of an embodiment of the data processing device according to the first invention.

FIG. 3 is a block diagram showing a second configuration example of the operation/execution unit of the data processing device according to the first invention.

FIG. 4 is a pipeline transition diagram of an embodiment of the data processing device according to the first invention.

FIG. 5 is a pipeline transition diagram of an embodiment of the data processing device according to the first invention.

FIG. 6 is a pipeline transition diagram of an embodiment of the data processing device according to the first invention.

FIG. 7 is a block diagram showing an embodiment of the data processing device according to the second invention.

FIG. 8 is a pipeline transition diagram of an embodiment of the data processing device according to the second invention.

FIG. 9 is a block diagram showing a more detailed configuration of the store buffer of the data processing device according to the second invention.

FIG. 10 is a timing chart of the data processing device according to the second invention.

FIG. 11 is a timing chart of the data processing device according to the second invention.

FIG. 12 is a timing chart of the data processing device according to the second invention.

FIG. 13 is a block diagram showing a configuration in which a cache memory is added to the data processing device according to the second invention.

FIG. 14 is a block diagram showing an embodiment of the data processing device according to the third invention.

FIG. 15 is a block diagram showing a conventional data processing device.

FIG. 16 is a block diagram showing the operation/execution unit of a conventional data processing device.

FIG. 17 is a pipeline configuration diagram showing the pipeline configuration of a conventional data processing device.

FIG. 18 is a pipeline transition diagram of a conventional data processing device.

FIG. 19 is a pipeline transition diagram of a conventional data processing device.

[Explanation of symbols]

1 Basic processing unit
2 Main memory unit
3 Instruction read unit
4 Instruction decode unit
5 Operand access unit
6 Operation/execution unit
7 Arithmetic unit
8 Register file
9 Multiplexer
10 Store buffer
11 Store data register
12 JK flip-flop
13 Input bus
14 Output bus
15 Store command signal
16 Prestore command signal
17 Reset signal
18 Prestore command signal
19 Memory write command signal
20 Cache memory
21 Invalid address register

Claims

[Claims]

1. A data processing device comprising: an arithmetic data supply means for reading the contents of a register in a register file and supplying them to an arithmetic unit; and a store data supply means for reading, independently of the arithmetic data supply means, the contents of a different register in the register file and supplying the read data as store data to a unit other than the arithmetic unit, wherein the arithmetic processing of the arithmetic unit and the supply of store data to the unit other than the arithmetic unit are performed in parallel.

2. A data processing device comprising: a store buffer that holds store data and a store address generated by a store instruction; and a flag indicating whether the store data and store address held in the store buffer result from advance processing of the store instruction, executed before the end of another instruction preceding the store instruction, wherein, when the flag indicates advance processing of the store instruction, writing from the store buffer to a storage device is performed in response to a signal indicating the normal end of the instruction preceding the store instruction, and when the flag does not indicate advance processing, writing from the store buffer to the storage device is performed regardless of the state of the signal indicating the normal end of the preceding instruction.

3. A data processing device comprising: a cache memory for holding data; an invalid address register that holds the store address of the data when the contents of the cache memory are rewritten by advance processing of a store instruction, executed before the end of another instruction preceding the store instruction; and invalidation means for discarding, after the contents of the cache memory have been rewritten, the contents of the cache memory at the address held in the invalid address register.