JP2000267849A

JP2000267849A - Pipeline processor

Info

Publication number: JP2000267849A
Application number: JP11071603A
Authority: JP
Inventors: Kazutaka Nogami; 一孝野上
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-03-17
Filing date: 1999-03-17
Publication date: 2000-09-29

Abstract

PROBLEM TO BE SOLVED: To obtain a pipeline processor that can shorten the total execution time and operate at high speed without shortening the execution time of the instruction having the worst cycle time. SOLUTION: The number of clocks required for each processing stage of an instruction is set in each instruction. On the basis of the numbers of clocks supplied from an instruction decoder 15, a clock generation circuit 17 generates a clock signal CLK in accordance with the stage requiring the longest time among the series of instructions being pipeline-processed. Because the cycle of the signal CLK varies with the instructions being executed simultaneously, the cycle time is not fixed by the worst cycle time, and the processing time can be reduced.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a pipeline processor capable of processing a plurality of instructions in parallel by, for example, overlapping the individual steps of instruction execution.

[0002]

2. Description of the Related Art

FIG. 9 shows the operation of a conventional pipeline processor. A pipeline processor of this type has stages consisting of, for example, instruction fetch (F), decode (D), execution of an operation (E), memory access (M), and writing to a register (W). The instruction fetch (F) reads an instruction from an instruction memory according to a program counter. The decode (D) decodes the read instruction and calculates an operand address. The execution (E) performs an operation in an arithmetic and logic unit (ALU) or a floating-point unit according to the decoding result. After this operation, a data cache, for example, is accessed in the memory access (M), and the operation result is written to a register in the write stage (W).
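The overlap of the five stages can be illustrated with a minimal Python sketch. It assumes the idealized case in which each instruction enters fetch one cycle after its predecessor and never stalls; the function names are illustrative and do not appear in the patent.

```python
STAGES = ["F", "D", "E", "M", "W"]  # fetch, decode, execute, memory access, write


def stage_in_cycle(instr_index, cycle):
    """Return the stage that instruction `instr_index` occupies in `cycle`, or None."""
    offset = cycle - instr_index  # instruction i is assumed to enter F in cycle i
    return STAGES[offset] if 0 <= offset < len(STAGES) else None


def occupancy(num_instrs):
    """One string per instruction; columns are cycles, '.' marks an idle slot."""
    num_cycles = num_instrs + len(STAGES) - 1
    return [
        "".join(stage_in_cycle(i, c) or "." for c in range(num_cycles))
        for i in range(num_instrs)
    ]


for row in occupancy(4):
    print(row)
```

Each printed row is one instruction, and reading a column top to bottom gives the stages that are active at the same time in that cycle.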

[0003]

Problems to Be Solved by the Invention

In the conventional pipeline processor described above, the cycle time of every stage is determined by the stage that requires the longest processing time among the pipeline stages of all instructions. In the example shown in FIG. 9, if the execution (E) stage of instruction 0 takes the longest time among the pipeline stages of all instructions, the cycle times of the other stages are also matched to the cycle time of this stage.

[0004] Thus, in a conventional pipeline processor, if a certain pipeline stage of one instruction takes, for example, 1.5 times as long as the pipeline stages of the other instructions, the cycle time of every stage of the processor must be set to 1.5 times the cycle time in which most instructions could be processed, even if that instruction is used only very rarely.

[0005] From the standpoint of processor design, the performance of the processor does not improve unless the worst cycle time, that is, the processing time of the pipeline stage with the longest cycle time among all instructions and all pipeline stages, is shortened. However, an instruction having the worst cycle time generally performs complex processing. To process such an instruction at high speed, it is necessary to simplify the complex processing or to reconsider the circuit design, the manufacturing process, and so on. This requires a great deal of labor and raises the manufacturing cost, so it is not a good approach.

[0006] The present invention has been made to solve the above problems, and its object is to provide a pipeline processor that can shorten the overall execution time and operate at high speed without shortening the execution time of the instruction having the worst cycle time.

[0007]

SUMMARY OF THE INVENTION

To solve the above problems, the pipeline processor of the present invention is a pipeline processor that has a plurality of stages and in which a plurality of instructions are executed simultaneously, each shifted by one stage. It comprises a selection circuit that selects the maximum time information from among the time information indicating the time required for processing, set for each of the different stages of the plurality of instructions that are executed simultaneously, and a clock signal generation circuit that generates a clock signal from the time information selected by the selection circuit, using a subclock signal having a frequency higher than the operating frequency of the pipeline processor.

[0008] The pipeline processor further comprises a memory that stores the time information using the command of each instruction as an address, and an adding circuit that, when an instruction is read, reads the time information stored in the memory using the command of the read instruction as an address and adds it to the read instruction.

[0009] Alternatively, the pipeline processor further comprises a memory that stores the time information using the command of each instruction as an address, and an adding circuit that, when an instruction is decoded, reads the time information stored in the memory using the command of the instruction being decoded as an address and adds it to that instruction.

[0010]

DESCRIPTION OF THE EMBODIMENTS

Embodiments of the present invention will be described below with reference to the drawings.

[0011] FIG. 1 shows the pipeline processor of the present invention. This pipeline processor 11 is composed of, for example, an instruction memory 13 connected to a program counter 12, an instruction register 14 connected to the instruction memory 13, an instruction decoder 15 connected to the instruction register 14, a subclock generator 16, a clock generation circuit 17 connected to the subclock generator 16 and the instruction decoder 15, an arithmetic and logic unit (ALU) 18 connected to the clock generation circuit 17, a data cache memory 19, and a register 20.

[0012] The program counter 12 controls the order in which instructions are executed. One instruction stored in the instruction memory 13 is read out according to the output signal of the program counter 12 and loaded into the instruction register 14. The instruction held in the instruction register 14 is decoded by the instruction decoder 15.

[0013] The subclock oscillator 16 is composed of, for example, a crystal oscillator, and generates a subclock signal whose frequency is several times higher than the operating frequency of the processor. This subclock signal is supplied to the clock generation circuit 17. The clock generation circuit 17 controls the subclock signal according to the decoding result of the instruction decoder 15 and generates a clock signal.

[0014] In this embodiment, information on the processing time of each stage that processes an instruction is set in the instruction in advance, expressed as a number of clocks of the subclock signal (hereinafter simply called the number of clocks).

[0015] FIG. 2 shows an example of an instruction. In this example, an instruction 20 carries a clock number group 21 that gives the numbers of clocks required for the decode (D), execution (E), memory access (M), and register write (W) of that instruction, set to "3", "4", "1", and "2", respectively. As described later, the instruction fetch cycle is the same for every instruction, so it need not be set for each instruction.
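One way to picture an instruction that carries its own clock number group is the following Python sketch. The D, E, M, and W values 3, 4, 1, and 2 are the ones given for FIG. 2; the class names, field names, and the command string `EXAMPLE_OP` are assumptions made only for illustration, since the patent does not specify an encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockGroup:
    """Numbers of subclock counts needed by each stage of one instruction."""
    decode: int
    execute: int
    memory: int
    write: int

    def as_tuple(self):
        # Ordered as the stages are traversed: D, E, M, W.
        return (self.decode, self.execute, self.memory, self.write)


@dataclass(frozen=True)
class Instruction:
    command: str        # hypothetical opcode name, not taken from the patent
    clocks: ClockGroup  # the "clock number group 21" attached to the instruction


example = Instruction("EXAMPLE_OP", ClockGroup(decode=3, execute=4, memory=1, write=2))
print(example.clocks.as_tuple(), "longest stage needs", max(example.clocks.as_tuple()))
```

Fetch is left out of the group on purpose, mirroring the statement that the fetch cycle is the same for every instruction.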

[0016] The instruction decoder 15 decodes the instruction 20 and has a plurality of registers, for example shift registers, that hold the clock number groups 21 of a plurality of sequentially decoded instructions.

[0017] FIG. 3 shows the shift registers R0 to R3 provided in the instruction decoder 15. The shift register R0 holds the clock number group of instruction 0, which was decoded first, and the shift register R3 holds the clock number group of instruction 3, which was decoded last. The numbers of clocks held in the shift registers R0 to R3 are shifted according to, for example, the clock signal CLK supplied from the clock generation circuit 17, and are output as the pipeline stages advance. The broken line in FIG. 3 marks the signals currently being output from the shift registers R0 to R3 and thus the current processing stage state of the processor: instruction 0 is in write (W), instruction 1 in memory access (M), instruction 2 in execution (E), and instruction 3 in decode (D). The clock numbers output from the shift registers R0 to R3 are supplied to the clock generation circuit 17.
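The selection marked by the broken line in FIG. 3 can be sketched as follows. This is a behavioral model rather than a description of the shift-register hardware: it assumes instruction i enters decode in cycle i, and the clock numbers in `groups` are made-up values used only to exercise the function.

```python
STAGE_ORDER = ("D", "E", "M", "W")


def active_counts(groups, cycle):
    """Map each in-flight instruction to (stage, clock number) for `cycle`.

    groups[i] is the (D, E, M, W) clock number group of instruction i,
    which is assumed to enter the decode stage in cycle i.
    """
    selected = {}
    for i, group in enumerate(groups):
        pos = cycle - i  # how many stages instruction i has advanced past D
        if 0 <= pos < len(STAGE_ORDER):
            selected[i] = (STAGE_ORDER[pos], group[pos])
    return selected


# Hypothetical clock number groups for instructions 0 to 3.
groups = [(3, 4, 1, 2), (2, 2, 4, 1), (1, 3, 2, 2), (2, 1, 1, 1)]
print(active_counts(groups, 3))
```

In cycle 3 the model selects W of instruction 0, M of instruction 1, E of instruction 2, and D of instruction 3, which is the state the paragraph above describes.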

[0018] FIG. 4 shows an example of the clock generation circuit 17. The clock generation circuit 17 has, for example, registers 17a, 17b, 17c, 17d, and 17e. The numbers of clocks of the respective stages supplied from the instruction decoder 15 are supplied to the registers 17b, 17c, 17d, and 17e. That is, the registers 17b, 17c, 17d, and 17e receive the numbers of clocks enclosed by the broken line in FIG. 3. Since the number of cycles required for the instruction fetch (F) is the same for every instruction, as described above, a number of clocks such as "2" is set in the register 17a in advance. The registers 17a to 17e are set to "0001" when the number of clocks is "1", "0011" when it is "2", "0111" when it is "3", and "1111" when it is "4".
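The four bit patterns listed above form what is commonly called a thermometer (unary) code: a count of n is stored as n low-order ones. A minimal sketch of that mapping, assuming a 4-bit register width as in the text, could look like this; the function names are illustrative.

```python
def encode(count, width=4):
    """Return the register pattern for `count` as a bit string, e.g. 3 -> '0111'."""
    if not 1 <= count <= width:
        raise ValueError(f"count must be between 1 and {width}")
    return "0" * (width - count) + "1" * count


def decode(pattern):
    """Recover the count from a pattern produced by encode()."""
    return pattern.count("1")


for n in range(1, 5):
    print(n, encode(n))
```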

[0019] The signals of the respective bits of the registers 17a to 17e are supplied to OR circuits 17f, 17g, 17h, and 17i. The output signals of the OR circuits 17f to 17i are supplied to the input terminals of switches 17m, 17n, 17o, and 17p, respectively, which constitute a switch circuit 17k. The switch circuit 17k further has a switch 17l. The power supply Vss is supplied to the input terminal of the switch 17l, and its output terminal is connected, together with the output terminals of the switches 17m, 17n, 17o, and 17p, to the inverting input terminal of an AND circuit 17q. The subclock signal output from the subclock oscillator 16 is supplied to the other input terminal of the AND circuit 17q. The switches 17l to 17p are each composed of, for example, a MOS transistor or a transfer gate. The switches 17l, 17m, 17n, 17o, and 17p are controlled sequentially by the subclock signal output from the subclock oscillator 16. The switch 17l has the function of generating a control signal.
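Why OR circuits suffice to pick the longest stage can be shown with a short sketch. It assumes, as an interpretation of the paragraph above, that each of the four OR circuits combines the same bit position of the five registers; with thermometer-coded counts, the bitwise OR of the patterns is then the pattern of the largest count. The switch circuit and the AND circuit 17q are not modeled here.

```python
from functools import reduce
from operator import or_

WIDTH = 4  # bits per register, matching the 4-bit patterns in the text


def thermometer(count):
    """Integer whose `count` low-order bits are set, e.g. 3 -> 0b0111."""
    if not 1 <= count <= WIDTH:
        raise ValueError(f"count must be between 1 and {WIDTH}")
    return (1 << count) - 1


def select_longest(counts):
    """OR the thermometer patterns together and read back the resulting count."""
    combined = reduce(or_, (thermometer(c) for c in counts))
    return bin(combined).count("1")


# Five registers: fetch plus the D, E, M, W counts of the in-flight instructions.
print(select_longest([2, 3, 4, 1, 2]))
```

No comparator is needed in this scheme, which may be one reason for storing the counts in unary form rather than binary.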

[0020] In the above configuration, suppose the signals supplied to the input terminals of the switches 17p, 17o, 17n, and 17m are, for example, all "1". When the switches 17p, 17o, 17n, and 17m are turned on sequentially in accordance with the subclock signal, the output terminals of these switches go high. The clock signal CLK output from the AND circuit 17q therefore goes low, as shown in FIG. 5. After this, when the switch 17l is turned on and the control signal output from it goes low, the AND circuit 17q outputs a high-level clock signal CLK. Thereafter, the switches 17p, 17o, 17n, 17m, and 17l are turned on sequentially in accordance with the subclock signal.

[0021] Thus, when the signals supplied to the input terminals of the switches 17p, 17o, 17n, and 17m are all "1", the period of the clock signal CLK is set to five times the period of the subclock signal, corresponding to the memory access (M) of instruction 1, which requires the longest time in the current processing. In the same manner, for every operating cycle of the processor, the period of the clock signal CLK is set according to the stage that requires the longest time among the instructions.

[0022] The clock signal CLK supplied from the clock generation circuit 17 is supplied to the program counter 12, the instruction memory 13, the instruction register 14, the instruction decoder 15, the arithmetic and logic unit 18, the data cache 19, and the register 20, and the operation of each stage is executed on the basis of this clock signal CLK.

[0023] FIG. 6 shows the relationship between instructions 0 to 4 and the processing cycles of the processor. As is apparent from FIG. 6, the processing cycle of the processor changes according to the stages of the instructions being executed simultaneously. In this embodiment, the time of each cycle is therefore not fixed to the worst cycle as it is in the related art, and the overall processing time of the processor can be shortened.

[0024] In FIG. 6, assume for example that cycle 4, which requires the longest time, is determined by the execution stage (E) of instruction 1. In this case, the stages (M), (D), and (F) of the other instructions finish before the execution stage (E) of instruction 1 finishes. The hatched portions in FIG. 6 indicate the time spent waiting for the processing of instruction 1 to finish.

[0025] According to the above embodiment, the number of clocks required for each processing stage of an instruction is set in each instruction, and the clock generation circuit 17 generates the clock signal CLK, on the basis of the numbers of clocks supplied from the instruction decoder 15, in accordance with the stage that requires the longest time among the series of instructions being pipeline-processed. Because the period of the clock signal CLK varies with the instructions being executed simultaneously, a processing cycle made up only of stages with short processing times, for example, can be completed in a short time. The overall processing time of the processor can therefore be shortened.
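The saving can be estimated with a small Python model, offered as a rough sketch rather than a model of the actual circuit. It assumes instruction i enters fetch in cycle i, takes each cycle's length to be the largest clock number among the stages active in that cycle, and ignores any fixed per-cycle overhead such as the extra subclock period associated with the switch 17l. The fetch count of 2 is the example value given for the register 17a; the clock numbers in `groups` are invented for illustration.

```python
FETCH = 2  # example fetch clock number given for register 17a


def cycle_lengths(groups):
    """Per-cycle length when every cycle lasts as long as its slowest active stage.

    groups[i] is the (D, E, M, W) clock number group of instruction i.
    """
    full = [(FETCH,) + tuple(g) for g in groups]  # prepend the common fetch count
    depth = 5                                      # F, D, E, M, W
    lengths = []
    for cycle in range(len(full) + depth - 1):
        active = [g[cycle - i] for i, g in enumerate(full) if 0 <= cycle - i < depth]
        lengths.append(max(active))
    return lengths


def totals(groups):
    """Return (variable-clock total, fixed worst-case-clock total) in subclock counts."""
    variable = cycle_lengths(groups)
    worst = max(max(g) for g in groups + [(FETCH,)])
    return sum(variable), worst * len(variable)


# Hypothetical program in which only instruction 1 has a slow execution stage.
groups = [(1, 2, 1, 1), (1, 4, 1, 1), (1, 2, 1, 1), (1, 1, 1, 1)]
print(cycle_lengths(groups), totals(groups))
```

For these made-up counts the cycle lengths are 2, 2, 2, 4, 2, 1, 1, 1, giving 15 subclock counts in total, against 32 when all eight cycles are stretched to the worst-case count of 4. Only the one cycle containing the slow stage pays for it.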

[0026] Furthermore, when the instruction having the worst time is used only rarely, there is no need to make that instruction faster. Since a rarely used instruction need not be redesigned for higher speed, development effort and development cost can be reduced.

[0027] The above embodiment has been described for the case where the number of clocks of each stage is set in each instruction in advance, but the invention is not limited to this.

[0028] FIG. 7 shows an example in which the numbers of clocks are added in the instruction memory 13. In this example, a ROM 51, for example, stores the number of clocks for each stage of each instruction, using the command of each instruction as an address. A read circuit 54 for the ROM 51 is connected between an instruction cache memory 52, which constitutes the instruction memory, and a main storage device 53 composed of, for example, a dynamic RAM.

[0029] In the above configuration, the instruction cache memory 52 is accessed according to the operation of the program counter. If a miss occurs in the instruction cache memory 52, the main storage device 53 is accessed and the corresponding instruction is read from it. The read instruction is stored in the instruction cache memory 52.

[0030] When an instruction is read from the main storage device 53, the read circuit 54 reads the numbers of clocks of the corresponding instruction stored in the ROM 51, using the command of the read instruction as an address. The numbers of clocks thus read are supplied to the instruction cache memory 52 and added to the read instruction. The subsequent operation is the same as in the above embodiment.
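The FIG. 7 variant can be sketched as a cache that attaches clock numbers on a miss. Everything concrete in the following Python model is an assumption for illustration: the command names, the clock numbers in `CLOCK_ROM`, the addresses in `MAIN_MEMORY`, and the class itself. Only the flow (miss, read from main storage, look up the ROM by command, store the instruction together with its clock numbers) follows the text.

```python
# Hypothetical ROM contents: command -> (D, E, M, W) clock numbers.
CLOCK_ROM = {"ADD": (1, 2, 1, 1), "MUL": (1, 4, 1, 1), "LOAD": (1, 2, 3, 1)}

# Hypothetical main storage: address -> command.
MAIN_MEMORY = {0: "ADD", 4: "MUL", 8: "LOAD"}


class InstructionCache:
    """Cache that adds the clock numbers from the ROM when a line is filled."""

    def __init__(self):
        self.lines = {}
        self.misses = 0

    def fetch(self, address):
        if address not in self.lines:
            # Miss: read the instruction from main storage, then use its
            # command as the ROM address to obtain its clock numbers.
            self.misses += 1
            command = MAIN_MEMORY[address]
            self.lines[address] = (command, CLOCK_ROM[command])
        return self.lines[address]


cache = InstructionCache()
print(cache.fetch(4), cache.fetch(4), cache.misses)
```

Because the lookup happens only when a line is filled, a hit returns the instruction with its clock numbers already attached and the ROM is not consulted again.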

[0031] FIG. 8 shows the case where the numbers of clocks are added between the instruction register 14 and the instruction decoder 15. Like the ROM 51, a ROM 61, for example, stores the number of clocks for each stage of each instruction, using the command of each instruction as an address. A read circuit 62 for the ROM 61 is connected between the instruction register 14 and the instruction decoder 15.

[0032] In the above configuration, when an instruction is transferred from the instruction register 14 to the instruction decoder 15, the read circuit 62 reads the numbers of clocks of the corresponding instruction stored in the ROM 61, using the command of the transferred instruction as an address. The numbers of clocks thus read are supplied to the instruction decoder 15 and added to the transferred instruction. The subsequent operation is the same as in the above embodiment.

[0033] With the configurations shown in FIGS. 7 and 8, the numbers of clocks are added when an instruction is read or transferred, and the instruction itself carries no clock numbers, so the structure of the instruction set can be simplified. A further advantage is that optimum numbers of clocks can be set by writing them to the ROMs 51 and 61 after the speed of the actual processor has been evaluated.

[0034] In FIGS. 7 and 8, the means for storing the numbers of clocks is not limited to a ROM; a memory such as a RAM can also be used.

[0035] It goes without saying that various other modifications can be made without departing from the spirit of the present invention.

[0036]

Effects of the Invention

As described in detail above, the present invention can provide a pipeline processor that shortens the time required for overall processing and operates at high speed without shortening the worst cycle time.

[Brief description of the drawings]

FIG. 1 is a circuit diagram showing an embodiment of the present invention.

FIG. 2 is a diagram showing an example of an instruction applied to the present invention.

FIG. 3 is a diagram for explaining the operation of the instruction decoder shown in FIG. 1.

FIG. 4 is a circuit diagram showing an example of the clock generation circuit shown in FIG. 1.

FIG. 5 is a waveform chart showing the operation of FIG. 4.

FIG. 6 is a diagram for explaining the operation of FIG. 1.

FIG. 7 is a configuration diagram showing a modification of the instruction memory shown in FIG. 1.

FIG. 8 is a configuration diagram showing a modification of the instruction decoder shown in FIG. 1.

FIG. 9 is a diagram for explaining the operation of a conventional pipeline processor.

[Explanation of symbols]

11: pipeline processor, 12: program counter, 13: instruction memory, 14: instruction register, 15: instruction decoder, 16: subclock oscillator, 17: clock generation circuit, 18: ALU, 19: data cache, 20: register, 51, 61: ROM, 54, 62: read circuit.

Claims

[Claims]

1. A pipeline processor having a plurality of stages, in which a plurality of instructions are executed simultaneously while being shifted from one another by one stage, the pipeline processor comprising: a selection circuit that selects the maximum time information from among time information indicating the time required for processing, set for each of the different, simultaneously executed stages of the plurality of instructions; and a clock signal generation circuit that generates a clock signal from the time information selected by the selection circuit, using a subclock signal having a frequency higher than an operating frequency of the pipeline processor.

2. The pipeline processor according to claim 1, further comprising: a memory that stores the time information using the command of each instruction as an address; and an adding circuit that, when an instruction is read, reads the time information stored in the memory using the command of the read instruction as an address and adds it to the read instruction.

3. The pipeline processor according to claim 1, further comprising: a memory that stores the time information using the command of each instruction as an address; and an adding circuit that, when an instruction is decoded, reads the time information stored in the memory using the command of the instruction being decoded as an address and adds it to the instruction being decoded.