JP3531311B2

JP3531311B2 - Instruction reading device

Info

Publication number: JP3531311B2
Application number: JP23765695A
Authority: JP
Inventors: 淳河井
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1995-08-23
Filing date: 1995-08-23
Publication date: 2004-05-31
Anticipated expiration: 2015-08-23
Also published as: JPH0962509A

Description

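[Detailed Description of the Invention]

[0001] [Technical Field] The present invention relates to an instruction reading device for a pipelined computer.

[0002] [Prior Art] It is well known that a pipelined computer divides instruction execution into several stages and processes a different instruction in each stage. This pipelined organization makes the effective minimum instruction execution time equal to the processing time of one pipeline stage, that is, one clock cycle. To reach the maximum performance such a computer can deliver, instructions must also be read at a rate of one instruction per clock cycle. If reading an instruction took two clock cycles, the minimum instruction execution time would double to two clock cycles. For this reason, CPUs that run at high clock rates place fast SRAM or ROM, or an instruction cache memory, either inside the LSI that contains the CPU (hereinafter "CPU-LSI") or outside it.

[0003] [Problem to Be Solved by the Invention] However, whether the SRAM, ROM or instruction cache memory is placed inside or outside the CPU-LSI, the memory device must be accessible within one CPU clock cycle or less. These approaches therefore limit the CPU clock frequency they can support. In other words, the access time of the memory device determined the CPU operating clock frequency, and the clock could not be raised above a certain value.

[0004] Moreover, when the memory device is placed inside the CPU-LSI, making it fast enough increases its power consumption, so the CPU-LSI as a whole consumes much power and generates much heat. Making the memory fast also requires limiting circuit delay and wiring delay, which rules out a large storage capacity, so in many cases the required instruction memory or instruction cache capacity could not be integrated into the CPU-LSI.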
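That is, once circuit and wiring delays were taken into account, it was difficult to secure the required memory capacity.

[0005] When the memory device is placed outside the CPU-LSI, the circuit and wiring delays between the CPU-LSI and the external memory device are even larger than in the internal case, which severely limits the CPU clock that can be supported. Placing fast memory outside the CPU-LSI also increases board area, power consumption and component cost.

[0006] Furthermore, an instruction cache memory cannot (and is not meant to) hold every instruction code the program needs. Cache misses therefore cause wait cycles, which lengthen the average instruction access time.

[0007] For these reasons, an instruction reading device has been sought that consumes little power, is simple in configuration, and shortens instruction execution time while still providing the required storage capacity.

[0008] [Means for Solving the Problem] To solve the above problems, the basic configuration of the present invention is an instruction reading device for a pipelined computer in which each stage takes one clock cycle, comprising: an instruction memory that stores a plurality of instruction codes of the program to be executed and from which those instruction codes can be read out together as an instruction block; an instruction address generation unit that outputs to the instruction memory an instruction block address designating the instruction block, so that the plurality of instruction codes are stored together in an instruction latch, and that also outputs an instruction position designating the instruction code to be executed; an instruction selector that, on the basis of the instruction position output by the instruction address generation unit, selects and outputs one of the instruction codes held in the instruction latch; an instruction register that temporarily holds the instruction code selected by the instruction selector and outputs it every clock cycle to an instruction execution device that executes it; and a timing generation unit that, on the basis of the instruction position output by the instruction address generation unit and a branch hit output by the instruction execution device when it determines that an instruction code is a branch instruction, outputs to the instruction address generation unit and the instruction execution device a wait signal that stalls instruction processing for one clock cycle.

[0009] With the instruction reading device configured in this way, a plurality of instruction codes are read at once from the instruction memory into the instruction latch according to the block address. The instruction selector selects one of the instruction codes of the instruction block in the instruction latch according to the instruction position output by the instruction address generation unit. The instruction register temporarily holds the instruction code selected by the instruction selector and outputs it every clock cycle.

[0010] The timing generation unit issues the wait signal for one clock cycle. Specifically, it issues the wait signal for one clock cycle on the basis of the instruction position and the branch hit that the instruction execution device outputs when it determines that an instruction code is a branch instruction. Then, from the instruction codes read into the instruction latch, the instruction selector selects the consecutive instruction codes at the branch destination instruction address to be executed next, and the instruction register holds each of them at the start of the next instruction execution and outputs it. Note that the case in which the time to read the branch destination instruction block from the instruction memory exceeds a predetermined time refers to the case in which the branch destination instruction block cannot be read immediately after the instruction block being executed.

[0011] In the present invention, when the timing generation unit determines from the instruction position that only the last instruction code of the branch destination instruction block is to be processed, it outputs the wait signal once more to the instruction address generation unit and the instruction execution device. Because at least one wait signal is issued on every branch instruction, no high-speed instruction memory is needed; and because the number of wait signals issued depends on the position of the branch destination instruction, the increase in instruction execution time is kept small.

[0012] [Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings.

[0013] <<Embodiment 1>> [Configuration] Fig. 1 is a block diagram of Embodiment 1 of the instruction reading device of the present invention. The instruction reading device consists of an instruction address generation unit 1, an instruction memory 2, an instruction latch 3, an instruction selector 4, an instruction register 5 and a timing generation unit 6.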
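An instruction execution device 100 is provided downstream of the instruction reading device, and the instruction reading device supplies it with instruction codes and the wait signal.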
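To make the division of work among these units concrete, here is a minimal Python sketch of the addressing path. It assumes a power-of-two block size (2 in Embodiment 1, 4 in Embodiment 2), models selector 14, selector 15 and the instruction selector as plain functions, and deliberately ignores all timing (memory latency, the instruction latch signal, wait cycles). The function names are illustrative, not taken from the patent.

```python
from typing import List, Tuple


def split_address(addr: int, block_size: int) -> Tuple[int, int]:
    """Split an instruction address into (instruction block address, instruction position).

    block_size must be a power of two, so the position is just the low-order bits
    (the LSB in Embodiment 1, the lower two bits in Embodiment 2).
    """
    shift = block_size.bit_length() - 1
    return addr >> shift, addr & (block_size - 1)


def next_fetch_address(current: int, wait: bool, branch_hit: bool, target: int) -> int:
    """Selector 14: wait has priority, then branch hit, else current + 1."""
    if wait:
        return current      # hold the fetch-phase address
    if branch_hit:
        return target       # branch destination (adder 12)
    return current + 1      # sequential address (adder 11)


def next_block_address(fetch_addr: int, address_select: bool, block_size: int) -> int:
    """Selector 15: on a branch use the fetch address's block, else the next block (adder 13)."""
    block, _ = split_address(fetch_addr, block_size)
    return block if address_select else block + 1


def fetch_sequential(memory: List[str], start: int, count: int,
                     block_size: int) -> Tuple[List[str], int]:
    """Deliver one code per step from whole-block reads; return (codes, block_reads)."""
    codes: List[str] = []
    reads = 0
    latched_block, latch = None, ()
    addr = start
    for _ in range(count):
        block, pos = split_address(addr, block_size)
        if block != latched_block:          # a new block must be read into the latch
            base = block * block_size
            latch = tuple(memory[base:base + block_size])
            latched_block, reads = block, reads + 1
        codes.append(latch[pos])            # the instruction selector picks by position
        addr = next_fetch_address(addr, wait=False, branch_hit=False, target=0)
    return codes, reads


if __name__ == "__main__":
    program = [f"I{i}" for i in range(8)]
    print(fetch_sequential(program, 0, 8, block_size=2))
    # (['I0', 'I1', 'I2', 'I3', 'I4', 'I5', 'I6', 'I7'], 4)
```

Eight sequential instructions need only four memory reads with two-code blocks (two with four-code blocks), which is why one read every two (or four) cycles can sustain one instruction per cycle.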
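The instruction reading device and the instruction execution device 100 together constitute a CPU (central processing unit), and the combination is referred to below as the CPU.

[0014] The instruction address generation unit 1 receives the clock shared by the instruction reading device and the instruction execution device 100, the branch hit and branch offset signals from the instruction execution device 100, and the address select signal from the timing generation unit 6. From these it simultaneously generates the address of the instruction to be executed next; the instruction block address of the block that follows the block containing the instruction currently being executed; and, when the instruction being executed is a branch instruction, the address of the instruction block containing the branch destination instruction code (hereinafter the branch destination instruction block). It outputs the instruction block address to be read next and the instruction position.

[0015] In this embodiment one instruction block consists of two consecutive instruction codes: the code at an even instruction address, followed by the code at the next (odd) address. The instruction position is the least significant bit of the next instruction address to be executed, generated inside the instruction address generation unit 1, and designates one of the two instruction codes of the block.

[0016] The instruction memory 2 stores the instruction codes that make up the program and reads out simultaneously the two instruction codes of the block designated by the instruction block address from the instruction address generation unit 1. The instruction latch 3 latches the two instruction codes of the block read from the instruction memory 2. Here, the latch operation means the following: while the instruction latch signal input to the instruction latch 3 is "1", the two input instruction codes pass through unchanged to the output; when the signal falls from "1" to "0", the latch starts holding the input present at that moment; and it keeps holding that data while the signal remains "0". The instruction latch signal is supplied by the timing generation unit 6.

[0017] The instruction selector 4 receives the two instruction codes of the block output by the instruction latch 3 and, using the instruction position from the instruction address generation unit 1 as its select signal, selects and outputs the instruction code located at the next instruction address to be executed. The instruction register 5 holds this next instruction code. The clock is applied to the instruction register 5 as its set signal, so a new instruction code is captured at the rising clock edge of every clock cycle.

[0018] From the clock, the branch hit from the instruction execution device 100 and the instruction position from the instruction address generation unit 1, the timing generation unit 6 generates an address select signal, output to the instruction address generation unit 1; an instruction latch signal, output to the instruction latch 3; and a wait signal, output to the instruction address generation unit 1 and the instruction execution device 100. The address select signal makes the instruction address generation unit 1 select the sequential instruction block address (following the block that contains the instruction being executed) when a non-branch instruction is executed, and the branch destination instruction block address when a branch instruction is executed. The wait signal makes the instruction address generation unit 1 defer updating the next instruction address and makes the instruction execution device 100 defer instruction execution. It is generated in whole clock cycles whenever the time needed to read the branch destination instruction block from the instruction memory 2 delays the supply of the instruction code at the branch destination address.

[0019] Fig. 2 is a block diagram of the instruction address generation unit 1 of Embodiment 1. As shown, it consists of three adders 11, 12 and 13, two selection circuits (selectors 14 and 15) and two registers (an instruction fetch address register 16 and an instruction decode address register 17).

[0020] The instruction fetch address register 16 holds, for at least one clock cycle, the instruction address of the instruction that is in the instruction fetch phase of the pipeline. The instruction decode address register 17 likewise holds, for at least one clock cycle, the instruction address of the instruction that is in the instruction decode phase.

[0021] The adder 11 computes the address of the instruction following the one in the fetch phase by adding 1 to the contents of the instruction fetch address register 16. When the instruction in the decode phase is a branch instruction, the adder 12 computes its branch destination instruction address by adding the contents of the instruction decode address register 17 to the branch offset given by that instruction code.

[0022] The selector 14 selects the next instruction address to be executed, using the branch hit from the instruction execution device 100 and the wait signal from the timing generation unit 6 as its select signals. When the wait signal is "1", it selects the current instruction address from the instruction fetch address register 16, whatever the level of the branch hit. When the wait signal is "0" and the branch hit is "0", it selects the next instruction address from the adder 11; when the wait signal is "0" and the branch hit is "1", it selects the branch destination instruction address from the adder 12. The selected address is written to the instruction fetch address register 16.

[0023] The adder 13 computes the address of the instruction block that follows the block containing the fetch-phase instruction held in the instruction fetch address register 16. It adds 1 to the output of the instruction fetch address register 16 with its least significant bit removed, that is, to the instruction block address (an address counted in units of two instruction codes).

[0024] The selector 15 selects the address of the next instruction block to be read from the instruction memory 2, using the address select signal from the timing generation unit 6 as its select signal. When the next block to be read follows the block currently being executed, it selects the next block address from the adder 13; when the next block to be read is the one containing the instruction code at the branch destination address, it selects the instruction block address from the instruction fetch address register 16. The selected address is output to the instruction memory 2.

[0025] Fig. 3 is a block diagram of the timing generation unit 6, which consists of five flip-flops (FF) 21 to 25, two AND gates 26 and 27, and three OR gates 28 to 30.

[0026] The FF 25 delays the branch hit from the instruction execution device 100 by one clock cycle; its output, the branch select signal, is input to the AND gate 26, the OR gate 28 and the OR gate 30. The FF 21, AND gate 26, FF 22 and OR gate 28 generate the instruction latch signal supplied to the instruction latch 3 from the branch select signal and the clock.

[0027] The instruction latch signal latches the instruction block read from the instruction memory 2 into the instruction latch 3. Because an instruction block consists of two instruction codes, while non-branch instructions are executed the signal goes to logic level "1" (hereinafter "is asserted") once every two clock cycles; the loop formed by FF 21 and FF 22 produces this. When a branch instruction is executed, however, the time at which the branch destination block address is determined, and the time at which the block read with it becomes valid, differ from the non-branch case, so the assertion timing of the instruction latch signal must change. The AND gate 26 and OR gate 28 perform this logic: for a branch instruction, the instruction latch signal is asserted for one clock cycle in the memory-access phase of that branch instruction, that is, two clock cycles after its instruction decode phase.

[0028] The AND gate 27, FF 23, FF 24 and OR gate 29 generate the wait signal supplied to the instruction execution device 100. When a branch instruction is executed, the timing of the wait signal's assertion depends on whether the branch destination instruction address is even or odd.

[0029] The wait signal extends the retention of the fetch-phase instruction address held in the instruction fetch address register 16 and delays the instruction fetch phase of the instruction being executed by whole clock cycles; it is output to the instruction address generation unit 1 and the instruction execution device 100. Delaying the instruction fetch phase for as long as the wait signal is asserted keeps instruction supply and instruction execution in step.

[0030] Furthermore, when the branch destination address is odd, only the odd-address code of the two codes in the branch destination block is executed, so a new instruction block must already have been read by the clock cycle after that instruction's fetch phase. In Embodiment 1, however, the access time of the instruction memory 2 is longer than one clock cycle but shorter than two, so the block following the branch destination block cannot be read one clock cycle later. In this case the wait signal is asserted for one more clock cycle, informing the instruction execution device 100 of the delays caused by reading both the branch destination block and the block that follows it.

[0031] The OR gate 29 has two inputs. The branch hit from the instruction execution device 100 requests a wait cycle for the delay in obtaining the instruction code while the branch destination block is read. The signal produced through the AND gate 27, FF 23 and FF 24 requests a wait cycle, when the branch destination address is odd, for the delay caused by reading the block that follows the branch destination block. The wait signal is the logical OR of these two inputs.

[0032] [Operation] Figs. 4 to 8 are operation timing charts of the instruction reading device of Embodiment 1 when executing, respectively, an instruction sequence with no branch instructions (Fig. 4), a sequence containing a branch instruction whose branch destination address is even (Figs. 5 and 6), and a sequence containing a branch instruction whose branch destination address is odd (Figs. 7 and 8).

[0033] The upper part of Fig. 4 shows the signals of each part of the instruction reading device of Fig. 1. The lower part shows, for each instruction, the instruction fetch phase handled by the instruction reading device and the pipeline phases of the instruction execution device 100 that executes the supplied codes: instruction decode, instruction execution, memory access and result store. The instruction memory 2 reads an instruction block of two instruction codes at a time, but its read access time is longer than one clock cycle and shorter than two (about 1.5 clock cycles in Fig. 4).

[0034] The following describes how instruction block N+1 in Fig. 4 is read from the instruction memory 2, held at the output of the instruction register 5, and supplied to the instruction execution device 100. As Fig. 4 shows, each block consists of two instruction codes, so a block needs to be read only once every two clock cycles. The instruction block address is therefore updated every two clock cycles, and the block it designates is read from the instruction memory 2 every two clock cycles.

[0035] Because the read time of the instruction memory 2 exceeds one clock cycle, the newly read block N+1 becomes valid at least one clock cycle after block address N+1 is settled. The instruction latch signal changes to "1" about one clock cycle after the block address settles, so until then the instruction latch 3 keeps holding block N. While the instruction latch signal is "1", the instruction latch 3 is transparent. At the moment the signal changes to "1", the memory read data for block address N+1 is not yet valid. Once the required read time has elapsed, instruction code N+1 settles and passes through the instruction latch 3 to the instruction selector 4.

[0036] The instruction selector 4 uses the instruction position, the least significant bit of the instruction address to be executed, to select one of the two codes of the incoming block and outputs it to the instruction register 5. Block N+1 is reached by sequential execution, not by a branch, so the instruction position takes the values "0" and then "1". The instruction selector 4 therefore outputs instruction code N+1(0) in the first clock cycle and N+1(1) in the next.

[0037] The instruction latch 3 captures its input at the falling edge of the instruction latch signal and holds it while the signal is "0", so both codes of block N+1, N+1(0) and N+1(1), reach the instruction register 5 correctly. The instruction register 5 captures N+1(0) and N+1(1), as selected by the instruction selector 4, at successive rising clock edges and supplies each to the instruction execution device 100 for one clock cycle.

[0038] Next, the execution of an instruction sequence containing a branch instruction whose branch destination address is even, shown in Figs. 5 and 6, is described. In Figs. 5 and 6 the first instruction executed in block N, instruction code N(0), is a branch instruction, and its branch destination is the even-address instruction of block M, instruction code M(0). Blocks N and M may be contiguous or non-contiguous in the instruction address space.

[0039] The branch instruction N(0) is decoded in the instruction decode phase of the instruction execution device 100 and recognized as a branch instruction. At the same time, the instruction address generation unit 1 of the instruction reading device computes the branch destination instruction address by adding the instruction address of N(0) to the branch offset given by that instruction code.

[0040] Because N(0) is a branch instruction, the following instruction N(1) would normally not be executed. However, the invention uses the delayed branch technique employed in RISC (Reduced Instruction Set Computer) processors, which allows N(1) to be executed immediately afterward. In a delayed branch, one or more instruction codes from the branch destination are relocated to the positions immediately following the branch instruction, and the branch destination address is advanced by the number of relocated instructions to form the branch offset. The relocated branch destination instructions can then be executed while the branch destination instruction code is being read from the instruction memory 2. In this description such relocated instructions are called delay instructions.

[0041] Accordingly, the delay instruction N(1) in Figs. 5 and 6 is executed while branch destination block M is read from the instruction memory 2. As the execution timing of the branch destination instruction M(0) in Figs. 5 and 6 shows, two clock cycles separate the branch instruction N(0) from M(0), so besides N(1) the instruction corresponding to N+1(0) could also be made a delay instruction. However, block N+1, which contains N+1(0), would then have to be read from the instruction memory (Embodiment 1 reads only two instruction codes at a time), which would delay the reading of block M by the same amount. The instruction corresponding to N+1(0) is therefore not made a delay instruction.

[0042] Consequently, the clock cycle between the execution of delay instruction N(1) and that of branch destination instruction M(0) is a wait cycle. During this cycle the timing generation unit 6 asserts the wait signal and outputs it to the instruction address generation unit 1 and the instruction execution device 100. In a cycle in which the wait signal is asserted, the instruction address generation unit 1 does not update the fetch-phase address in the instruction fetch address register 16, and the instruction execution device 100 delays execution, so the correct execution sequence is preserved.

[0043] Next, the execution of an instruction sequence containing a branch instruction whose branch destination address is odd, shown in Figs. 7 and 8, is described. The main difference from the even case of Figs. 5 and 6 is the number of wait cycles. In Figs. 7 and 8 the even-address instruction N(0) of block N is a branch instruction, but its destination is the odd-address instruction M(1) of branch destination block M. Execution of the branch instruction, reading of the branch destination block and its storage in the instruction latch 3 proceed as in Figs. 5 and 6.

[0044] Because the branch destination is at an odd address, the instruction position input to the instruction selector 4 is "1", so the instruction selector 4 selects M(1) and outputs it to the instruction register 5. As in Figs. 5 and 6, the instruction register 5 holds the branch destination instruction M(1) after one wait cycle and outputs it to the instruction execution device 100. During the wait cycle, the wait signal is asserted and output to the instruction address generation unit 1 and the instruction execution device 100.

[0045] When the branch destination address is odd, only the odd-address code M(1) of block M is executed. In other words, block M, which took two clock cycles to read including the wait cycle, is consumed in one clock cycle. The next block, M+1, would therefore have to be read from the instruction memory 2 within that one clock cycle; since the block read time of the instruction memory 2 exceeds one clock cycle, one more wait cycle must be generated. As Fig. 8 shows, this wait cycle is inserted before the instruction decode phase of M(1) and the instruction fetch phase of M+1(0).

[0046] [Effects] The instruction reading device of Embodiment 1 consists of the instruction memory 2, which reads an instruction block of several instructions at a time; the instruction latch 3, which temporarily holds the block read from the instruction memory 2; the instruction selector 4, which selects the next instruction code to be executed from the codes of the block held in the instruction latch 3; the instruction register 5, which temporarily holds the selected code; the instruction address generation unit, which supplies the required instruction addresses and control signals; and the timing generation unit, which generates timing signals such as the latch timing. It reads several consecutive instruction codes at once, holds them temporarily, extracts the next instruction code to be executed, sets it in the instruction register 5 and supplies it to the instruction execution device 100. This gives the following effects.

[0047] Because several instructions are read from the instruction memory 2 together and in advance, an instruction code can be supplied to the instruction execution device 100 in every execution cycle for sequences without branch instructions, even when the read time of the instruction memory 2 exceeds the instruction processing time of the instruction execution device 100. For sequences with branch instructions, wait cycles are inserted only when the branch destination block, or the block following it, is read, which minimizes the increase in execution time caused by the slower instruction memory 2.

[0048] In contrast, a conventional instruction reading device in a pipelined computer must read each instruction within the instruction execution time of the execution device; when it cannot, either a wait cycle is inserted in every instruction fetch phase or the CPU clock frequency is lowered. The read time of the instruction memory 2 of Embodiment 1 is longer than one clock cycle and shorter than two. A conventional instruction reading device built with this memory would have twice the average instruction execution time of one using a memory readable within one clock cycle. With the instruction reading device of Embodiment 1, by contrast, the average instruction execution time $T_{av}$ (in clock cycles) is

$$T_{av} = 1\times\left(1-\frac{B}{100}\right) + 2\times\frac{B}{100}\left(1-\frac{O}{100}\right) + 3\times\frac{B}{100}\cdot\frac{O}{100}$$

where $B$ is the frequency of branch instructions in the executed sequence (in %) and $O$ is the proportion of those branches whose destination address is odd (in %). For example, with $B = 10$ and $O = 50$, $T_{av} = 0.90 + 0.10 + 0.15 = 1.15$, so the average instruction execution time increases by only 15%, against 100% (a doubling) for the conventional approach.

[0049] <<Embodiment 2>> [Configuration] Fig. 9 is a block diagram of Embodiment 2 of the instruction reading device of the present invention.

[0050] In Embodiment 2 one instruction block consists of four instruction codes. In Fig. 9 the instruction memory 2a reads one block, that is, four instruction codes, at a time, so the output of the instruction memory 2a and the configurations of the instruction latch 3a and the instruction selector 4a differ from those of Embodiment 1 in Fig. 1. The instruction position output by the instruction address generation unit 1a is two bits wide, since it designates one of four instructions. Everything else is the same as in the instruction reading device of Embodiment 1 in Fig. 1.

[0051] In Embodiment 2 one instruction block consists of four consecutive instruction codes: the code at an instruction address that is a multiple of 4, followed in order by the codes whose addresses leave remainders of 1, 2 and 3 when divided by 4. The instruction position is the lower two bits of the next instruction address to be executed, generated inside the instruction address generation unit 1a, and designates one of the four codes of the block.

[0052] The instruction memory 2a stores the instruction codes that make up the program and reads out simultaneously the four codes of the block designated by the instruction block address from the instruction address generation unit 1a. The instruction latch 3a latches the four codes of the block read from the instruction memory 2a. Here, the latch operation means that while the instruction latch signal input to the instruction latch 3a is "1", the four input instruction codes pass through unchanged to the output; when the signal falls from "1" to "0", the latch starts holding the input present at that moment; and it keeps holding that data while the signal remains "0".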
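The latch operation just defined is that of a level-sensitive (transparent) latch. The Python sketch below is an illustrative model, not a circuit description: it samples the instruction latch signal once per time step, and `None` stands for memory output that is not yet valid. The class and variable names are hypothetical.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class TransparentLatch(Generic[T]):
    """Level-sensitive latch in the style of instruction latch 3 / 3a (a sketch).

    While `enable` (the instruction latch signal) is 1 the output follows the input;
    once it drops to 0 the output keeps the last value seen while it was 1, which
    in this sampled model stands in for the input at the falling edge.
    """

    def __init__(self) -> None:
        self._held: Optional[T] = None

    def step(self, enable: int, data: Optional[T]) -> Optional[T]:
        if enable:
            self._held = data   # transparent: the block passes straight through
        return self._held       # enable == 0: keep holding the captured block


if __name__ == "__main__":
    latch: TransparentLatch[tuple] = TransparentLatch()
    block_n = ("N(0)", "N(1)", "N(2)", "N(3)")
    block_n1 = ("N+1(0)", "N+1(1)", "N+1(2)", "N+1(3)")
    enable = [1, 0, 0, 1, 1, 0, 0]
    memory_out = [block_n, None, None, None, block_n1, None, None]  # None = not valid yet
    for e, d in zip(enable, memory_out):
        print(e, latch.step(e, d))
```

If the signal rises before the new block is valid (as in Embodiment 1), the undefined value briefly passes through; correctness only requires the signal to fall after block N+1 has settled.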
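The instruction latch signal is supplied by the timing generation unit 6a.

[0053] The instruction selector 4a receives the four instruction codes of the block output by the instruction latch 3a and, using the instruction position from the instruction address generation unit 1a as its select signal, selects and outputs the instruction code located at the next instruction address to be executed. The instruction register 5a holds this next instruction code. The clock is applied to the instruction register 5a as its set signal, so a new instruction code is captured at the rising clock edge of every clock cycle.

[0054] From the clock, the branch hit from the instruction execution device 100 and the instruction position from the instruction address generation unit 1a, the timing generation unit 6a generates an address select signal, output to the instruction address generation unit 1a; an instruction latch signal, output to the instruction latch 3a; and a wait signal, output to the instruction address generation unit 1a and the instruction execution device 100.

[0055] The address select signal makes the instruction address generation unit 1a select the sequential instruction block address when a non-branch instruction is executed, and the branch destination instruction block address when a branch instruction is executed. The wait signal makes the instruction address generation unit 1a defer updating the next instruction address and makes the instruction execution device 100 defer instruction execution. It is generated in whole clock cycles whenever the time needed to read the block containing the branch destination instruction code (hereinafter the branch destination instruction block) from the instruction memory 2a delays the supply of the instruction code at the branch destination address.

[0056] Fig. 10 is a block diagram of the instruction address generation unit 1a of Embodiment 2. It consists of three adders (11a, 12a and 13a), two selection circuits (selectors 14a and 15a) and two registers (an instruction fetch address register 16a and an instruction decode address register 17a). Because one instruction block in Embodiment 2 consists of four instruction codes, the instruction position taken from the instruction fetch address register 16a and the instruction decode address register 17a is two bits wide, whereas in the instruction address generation unit 1 of Embodiment 1 (Fig. 2) it is one bit. The rest is the same as in Embodiment 1.

[0057] The instruction fetch address register 16a holds, for at least one clock cycle, the instruction address of the instruction in the instruction fetch phase of the pipeline. The instruction decode address register 17a likewise holds, for at least one clock cycle, the instruction address of the instruction in the instruction decode phase.

[0058] The adder 11a computes the address of the instruction following the one in the fetch phase by adding 1 to the contents of the instruction fetch address register 16a. When the instruction in the decode phase is a branch instruction, the adder 12a computes its branch destination instruction address by adding the contents of the instruction decode address register 17a to the branch offset given by that instruction code.

[0059] The selector 14a selects the next instruction address to be executed, using the branch hit from the instruction execution device 100 and the wait signal from the timing generation unit 6a as its select signals. When the wait signal is "1", it selects the current instruction address from the instruction fetch address register 16a, whatever the level of the branch hit. When the wait signal is "0" and the branch hit is "0", it selects the next instruction address from the adder 11a; when the wait signal is "0" and the branch hit is "1", it selects the branch destination address from the adder 12a. The selected address is written to the instruction fetch address register 16a.

[0060] The adder 13a computes the address of the block that follows the block containing the fetch-phase instruction held in the instruction fetch address register 16a. It adds 1 to the output of the instruction fetch address register 16a with its lower two bits removed, that is, to the instruction block address (an address counted in units of four instruction codes).

[0061] The selector 15a selects the address of the next instruction block to be read from the instruction memory 2a, using the address select signal from the timing generation unit 6a as its select signal. When the next block to be read follows the block currently being executed, it selects the next block address from the adder 13a; when the next block to be read is the one containing the code at the branch destination address, it selects the instruction block address from the instruction fetch address register 16a. The selected address is output to the instruction memory 2a.

[0062] Fig. 11 is a block diagram of the timing generation unit 6a of Embodiment 2. Its basic function is the same as that of the timing generation unit 6 of Embodiment 1, but because one block consists of four instruction codes in Embodiment 2, the signals generated in Fig. 11 have different timing from those in Fig. 3.

[0063] The timing generation unit 6a consists of seven flip-flops (FF) 31 to 37, three selectors 38 to 40, one decoder 41, one set-reset flip-flop (RSFF) 42, two AND gates 43 and 44, and two OR gates 45 and 46. The FF 37 delays the branch hit from the instruction execution device 100 by one clock cycle; its output, the branch select signal, is input to the selectors 38 to 40, the AND gate 43, the OR gate 45 and the reset input of the RSFF 42.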
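The one-cycle delay of FF 37 also drives the RS flip-flop 42 that produces the address select signal (described in [0065]): it is set on the rising edge of the branch hit and reset on the falling edge of the branch select signal. The cycle-level Python sketch below, which treats each list element as one clock cycle and assumes a one-cycle branch-hit pulse, shows why the address select signal stays asserted for two cycles. Giving set priority over reset is an assumption of the model, and the function names are hypothetical.

```python
from typing import List, Tuple


def delay_one_cycle(signal: List[int]) -> List[int]:
    """A D flip-flop clocked every cycle (FF 37): output at cycle t is the input at t - 1."""
    return [0] + signal[:-1]


def address_select(branch_hit: List[int]) -> Tuple[List[int], List[int]]:
    """Return (branch_select, address_select), one value per clock cycle."""
    branch_select = delay_one_cycle(branch_hit)
    out: List[int] = []
    state = 0
    prev_hit = prev_sel = 0
    for hit, sel in zip(branch_hit, branch_select):
        if prev_hit == 0 and hit == 1:
            state = 1                   # set: rising edge of branch hit
        elif prev_sel == 1 and sel == 0:
            state = 0                   # reset: falling edge of branch select
        out.append(state)
        prev_hit, prev_sel = hit, sel
    return branch_select, out


if __name__ == "__main__":
    print(address_select([0, 1, 0, 0, 0, 0]))
    # ([0, 0, 1, 0, 0, 0], [0, 1, 1, 0, 0, 0])
```

The set edge comes from the raw branch hit and the reset edge from its delayed copy, so the pulse width is the one-cycle hit plus the one-cycle delay.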
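The FF 31 to 34, selectors 38 to 40, AND gate 43 and OR gate 45 form a loop that generates the instruction latch signal, so that a block of four instruction codes is read from the instruction memory 2a every four cycles and held in the instruction latch 3a.

[0064] When a branch instruction causes a new instruction block to be read from the instruction memory 2a, the timing of the instruction latch signal differs from that for a sequence without branches; so does the timing of the instruction latch signal for the block following the branch destination block, which depends on which of the four positions of the destination block the branch destination address indicates. The decoder 41 in Fig. 11 therefore decodes the instruction position into branch position 1 to 4 signals. The branch position 1 signal goes to the selector 38 and the branch position 2 signal to the selector 39, while the branch position 3 and 4 signals are ORed by the OR gate 46 and fed to the selector 40. The branch select signal from the FF 37 is applied to the selectors 38 to 40 so that these signals are selected when a branch instruction is executed.

[0065] The RSFF 42 generates the address select signal. It is set at the rising edge of the branch hit from the instruction execution device 100 and reset at the falling edge of the branch select signal from the FF 37, so the address select signal is asserted for two clock cycles. It is output to the instruction address generation unit 1a, which uses it to select the instruction block address sent to the instruction memory 2a. The AND gate 44 and FF 35 and 36 generate the wait signal: when a branch instruction is executed and its destination address is at branch position 4 (remainder 3 when divided by 4, the last instruction position of the branch destination block), the wait signal is asserted for one clock cycle and output to the instruction address generation unit 1a and the instruction execution device 100. The function of the wait signal is the same as in Embodiment 1.

[0066] [Operation] Figs. 12 and 13 are operation timing charts of the instruction reading device of Embodiment 2 when executing an instruction sequence with no branch instructions. Figs. 14 and 15, 16 and 17, 18 and 19, and 20 and 21 are operation timing charts when executing a sequence containing a branch instruction whose branch destination address leaves a remainder of 0, 1, 2 and 3, respectively, when divided by 4.

[0067] The timing charts of Figs. 12 and 13 for a sequence without branch instructions in Embodiment 2 are the same as Fig. 4 for Embodiment 1, except that blocks are read from the instruction memory 2a every four clock cycles because each block consists of four instructions.

[0068] Fig. 12 shows the signals of each part of the instruction reading device of Fig. 9. Fig. 13 shows, for each instruction, the instruction fetch phase (IF) handled by the instruction reading device of Fig. 9 and the pipeline phases of the instruction execution device 100: instruction decode (D), instruction execution (E), memory access (M) and result store (W). In Figs. 12 and 13 the instruction memory 2a reads a block of four instruction codes at a time, but its read access time is longer than one clock cycle and shorter than two (about 1.5 clock cycles in Fig. 12). The following describes how block N+1 in Figs. 12 and 13 is read from the instruction memory 2a, held at the output of the instruction register 5a, and supplied to the instruction execution device 100.

[0069] As Figs. 12 and 13 show, each block consists of four instruction codes, so a block needs to be read only once every four clock cycles. The instruction block address is therefore updated every four clock cycles, and the block it designates is read from the instruction memory 2a every four clock cycles.

[0070] Because the read time of the instruction memory 2a exceeds one clock cycle, the newly read block N+1 becomes valid at least one clock cycle after block address N+1 is settled. The instruction latch signal changes to "1" about three clock cycles after the block address settles, so until then the instruction latch 3a keeps holding block N. While the instruction latch signal is "1", the instruction latch 3a is transparent. By the time the signal changes to "1", the read data for block address N+1 is already valid, so the correct block contents pass through the instruction latch 3a to the instruction selector 4a at that moment.

[0071] The instruction selector 4a uses the instruction position, the lower two bits of the instruction address to be executed, to select one of the four codes of the incoming block and outputs it to the instruction register 5a. Block N+1 is reached by sequential execution, not by a branch, so the instruction position takes the values "0", "1", "2" and "3" in turn. The instruction selector 4a therefore outputs N+1(0), N+1(1), N+1(2) and N+1(3) in four successive clock cycles.

[0072] The instruction latch 3a captures its input at the falling edge of the instruction latch signal and holds it while the signal is "0", so the four codes N+1(0) to N+1(3) of block N+1 reach the instruction register 5a correctly. The instruction register 5a captures N+1(0), N+1(1), N+1(2) and N+1(3), as selected by the instruction selector 4a, at successive rising clock edges and supplies each to the instruction execution device 100 for one clock cycle.

[0073] Next, the execution of a sequence containing a branch instruction whose destination address leaves a remainder of 0 when divided by 4, shown in Figs. 14 and 15, is described. In Figs. 14 and 15 the first instruction executed in block N, instruction code N(0), is a branch instruction, and its destination is the first instruction of block M (the branch position 1 instruction), instruction code M(0). As in Embodiment 1, blocks N and M may be contiguous or non-contiguous in the instruction address space.

[0074] The branch instruction N(0) is decoded in the instruction decode phase (D) of the instruction execution device 100 and recognized as a branch instruction. At the same time, the instruction address generation unit 1a of the instruction reading device computes the branch destination instruction address by adding the instruction address of N(0) to the branch offset given by that instruction code.

[0075] Because N(0) is a branch instruction, the following instructions N(1) and N(2) would normally not be executed; Embodiment 2, however, also uses the RISC delayed branch technique, so they can be executed immediately afterward. The delay instructions N(1) and N(2) in Figs. 14 and 15 are therefore executed while branch destination block M is read from the instruction memory 2a. As the execution timing of the branch destination instruction M(0) in Figs. 14 and 15 shows, two clock cycles separate the branch instruction N(0) from M(0), so the two instructions N(1) and N(2) can serve as delay instructions. In this way Embodiment 2 uses the delayed branch to hide the execution delay caused by reading the branch destination block from the instruction memory 2a.

[0076] Next, the execution of a sequence containing a branch instruction whose destination address leaves a remainder of 1 when divided by 4, shown in Figs. 16 and 17, is described. In Figs. 16 and 17 the first instruction executed in block N, N(0), is a branch instruction, and its destination is the second instruction of block M (the branch position 2 instruction), M(1). Operation is the same as in Figs. 14 and 15 (remainder 0) except for two points: because the destination position differs, the instruction selector 4a of Fig. 9 selects a different instruction code; and the next block M+1 is held in the instruction latch 3a one clock cycle earlier, that is, the instruction latch signal for it is asserted one clock cycle earlier than in Figs. 14 and 15.

[0077] Next, the execution of a sequence containing a branch instruction whose destination address leaves a remainder of 2 when divided by 4, shown in Figs. 18 and 19, is described. In Figs. 18 and 19 the first instruction executed in block N, N(0), is a branch instruction, and its destination is the third instruction of block M (the branch position 3 instruction), M(2). Operation is the same as in Figs. 14 and 15 except that the instruction selector 4a of Fig. 9 selects a different instruction code, and the instruction latch signal for the next block M+1 is asserted two clock cycles earlier than in Figs. 14 and 15.

[0078] Finally, the execution of a sequence containing a branch instruction whose destination address leaves a remainder of 3 when divided by 4, shown in Figs. 20 and 21, is described. The main difference from the remainder-2 case is a one-cycle wait caused by reading the next block M+1 from the instruction memory 2a of Fig. 9.

[0079] In Figs. 20 and 21 the first instruction executed in block N, N(0), is a branch instruction, and its destination is the last instruction of branch destination block M (the branch position 4 instruction), M(3). Execution of the branch instruction, reading of the branch destination block and its storage in the instruction latch 3a proceed as in Figs. 18 and 19.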
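This case completes the wait-cycle rules of both embodiments: reading the destination block itself costs one wait cycle with two-code blocks and none with four-code blocks (the delay instructions hide it), and one extra wait cycle is needed whenever only the last code of the destination block is used. The Python sketch below (function names hypothetical) encodes those rules and reproduces the average execution times quoted in [0048] and [0083]. It assumes, as the examples there do, that branch destinations are spread evenly over the positions in a block ($O = 50$ for two-code blocks, $O = 25$ for four-code blocks); only the two block sizes described are modelled.

```python
from typing import Sequence

# Wait cycles spent reading the branch destination block itself, per embodiment:
# two-code blocks (Embodiment 1) need one; four-code blocks (Embodiment 2) need none.
TARGET_BLOCK_WAITS = {2: 1, 4: 0}


def branch_wait_cycles(block_size: int, target_position: int) -> int:
    """Wait cycles after a taken branch whose destination is at target_position (0-based)."""
    waits = TARGET_BLOCK_WAITS[block_size]
    if target_position == block_size - 1:
        # Only the last code of the destination block is used, so the following
        # block cannot be read in time: one extra wait cycle.
        waits += 1
    return waits


def average_cpi(block_size: int, branch_pct: float, target_probs: Sequence[float]) -> float:
    """Average clock cycles per instruction: 1 + (B / 100) * E[wait cycles per branch]."""
    expected_waits = sum(p * branch_wait_cycles(block_size, pos)
                         for pos, p in enumerate(target_probs))
    return 1 + branch_pct / 100 * expected_waits


if __name__ == "__main__":
    print(round(average_cpi(2, 10, [0.5, 0.5]), 3))   # 1.15  (Embodiment 1, B=10, O=50)
    print(round(average_cpi(4, 10, [0.25] * 4), 3))   # 1.025 (Embodiment 2, B=10, O=25)
```

Expressing the cost as one cycle plus expected wait cycles per branch makes it easy to try other block sizes or destination distributions, provided the wait rule for the new block size is known.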
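Because the branch destination instruction is at position 4 of its block, the instruction position input to the instruction selector 4a is "3", so the instruction selector 4a selects M(3) and outputs it to the instruction register 5a. As in Figs. 18 and 19, the instruction register 5a holds the branch destination instruction M(3) and outputs it to the instruction execution device 100. Up to this point, as in Figs. 14 to 19, no wait cycle is inserted; that is, the wait signal is not asserted.

[0080] When the branch destination is at position 4 of its block, only the last code of branch destination block M, M(3), is executed. In other words, block M, which took two clock cycles to read, is consumed in one clock cycle. The next block, M+1, would therefore have to be read from the instruction memory 2a within that one clock cycle; since the block read time of the instruction memory 2a exceeds one clock cycle, one more wait cycle must be generated. In Figs. 20 and 21 this wait cycle is inserted before the instruction decode phase (D) of M(3) and the instruction fetch phase (IF) of M+1(0).

[0081] [Effects] The main difference between the instruction reading device of Embodiment 2 and that of Embodiment 1 is that one instruction block consists of four instruction codes. Because one block is read every four clock cycles, more instructions to be executed, including the delay instructions used when a branch is executed, can be held in the instruction latch 3a in advance than in Embodiment 1. This greatly reduces the wait cycles inserted when the branch destination block and the block following it are read from the instruction memory 2a.

[0082] The instruction reading device of Embodiment 2 consists of the instruction memory 2a, which reads an instruction block of several instructions at a time; the instruction latch 3a, which temporarily holds the block read from the instruction memory 2a; the instruction selector 4a, which selects the next instruction code to be executed from the codes of the block held in the instruction latch 3a; the instruction register 5a, which temporarily holds the selected code; the instruction address generation unit 1a, which supplies the required instruction addresses and control signals; and the timing generation unit 6a. It reads several consecutive instruction codes at once, holds them temporarily, extracts the next instruction code to be executed, sets it in the instruction register 5a and supplies it to the instruction execution device 100. Because several instructions are read from the instruction memory 2a together and in advance, an instruction code can be supplied in every execution cycle for sequences without branch instructions, even when the read time of the instruction memory 2a exceeds the instruction processing time of the instruction execution device 100; for sequences with branch instructions, wait cycles are inserted only when the branch destination block, or the block following it, is read, which minimizes the increase in execution time caused by the slower instruction memory 2a.

[0083] In contrast, a conventional instruction reading device in a pipelined computer must read each instruction within the instruction execution time of the execution device; when it cannot, either a wait cycle is inserted in every instruction fetch phase or the CPU clock frequency is lowered. The read time of the instruction memory 2a of Embodiment 2 is longer than one clock cycle and shorter than two. A conventional instruction reading device built with this memory would have twice the average instruction execution time of one using a memory readable within one clock cycle. With the instruction reading device of Embodiment 2, by contrast, the average instruction execution time $T_{av}$ is

$$T_{av} = 1\times\left(1-\frac{B}{100}\cdot\frac{O}{100}\right) + 2\times\frac{B}{100}\cdot\frac{O}{100}$$

where $B$ is the frequency of branch instructions in the executed sequence (in %) and $O$ is the proportion of those branches whose destination address leaves a remainder of 3 when divided by 4, i.e. branch position 4 (in %). For example, with $B = 10$ and $O = 25$, $T_{av} = 1.025$, so the average instruction execution time increases by only 2.5%, against 100% for the conventional approach. This is one sixth of the 15% increase of Embodiment 1.

[0084] In Embodiments 1 and 2, two and four instruction codes respectively are read at a time, but the invention is not limited to these numbers; three, or five or more, instruction codes may be read at a time, chosen to suit the application and specifications.

[0085] [Effects of the Invention] As described above, the instruction reading device of the first invention reads several instructions together from the instruction memory and outputs one instruction per execution cycle. An instruction code can therefore be supplied to the instruction execution device in every execution cycle even when the read time of the instruction memory exceeds one instruction execution cycle of the execution device. As a result, no high-speed memory is needed, power consumption is low, the CPU operating clock is no longer limited by the memory access time, and instruction execution can be sped up while securing sufficient storage capacity.

[0086] With the instruction reading device of the second invention, wait cycles are inserted only when the branch destination block, or the block following it, is read, so even for instruction sequences containing branch instructions the increase in instruction execution time caused by the slower instruction memory is kept to a minimum.

DETAILED DESCRIPTION OF THE INVENTION

[0001] TECHNICAL FIELD The present invention relates to an instruction reading device in a pipelined computer.

[0002] DESCRIPTION OF THE RELATED ART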
It is well known that a pipelined computer divides instruction execution into several stages and processes a different instruction in each stage, so that the effective minimum instruction execution time equals the processing time of one pipeline stage, that is, one clock cycle. To reach the maximum performance of such a pipelined computer, instructions must also be read at one instruction per clock cycle. If reading an instruction took two clock cycles, the minimum instruction execution time would double to two clock cycles. For this reason, CPUs operating with a high-speed clock place high-speed SRAM or ROM, or an instruction cache memory, inside the same LSI as the CPU (hereinafter "CPU-LSI") or outside the LSI.

[0003] PROBLEMS TO BE SOLVED BY THE INVENTION However, whether the SRAM, ROM or instruction cache memory is installed inside or outside the CPU-LSI, the memory device must be accessible within one CPU clock cycle or less, so these methods limit the CPU operating clock frequency they can support. That is, the access time of the memory device determined the CPU operating clock frequency (the clock frequency could not be raised above a certain value).

[0004] In addition, when the memory device is installed inside the CPU-LSI, making it fast increases its power consumption, so the CPU-LSI as a whole consumes much power and generates much heat. Making the memory fast also requires limiting circuit delay and wiring delay, so its storage capacity cannot be increased, and in many cases the required instruction memory or instruction cache capacity could not be built into the CPU-LSI. That is, once circuit and wiring delays are considered, it was difficult to secure the required memory capacity.

[0005] When the memory device is provided outside the CPU-LSI, the circuit and wiring delays between the CPU-LSI and the external memory device are even larger than when it is inside, which severely restricts the CPU operating clock that can be supported; installing high-speed memory outside the CPU-LSI also increases mounting area, power consumption and component cost.

[0006] Further, an instruction cache memory cannot (and does not) store all necessary instruction codes. Cache misses therefore cause wait cycles, which lengthen the average instruction access time.

[0007] For these reasons, it has been desired to realize an instruction reading device that consumes little power, is simple in configuration, and shortens instruction execution time while securing the required storage capacity.

[0008] MEANS FOR SOLVING THE PROBLEMS To solve the above problems, the basic configuration of the present invention is an instruction reading device in a pipelined computer in which one stage is performed in one clock cycle, comprising: an instruction memory that stores a plurality of instruction codes of a program to be executed and from which the plurality of instruction codes can be read collectively as an instruction block; an instruction address generator that outputs to the instruction memory an instruction block address specifying the instruction block, so that the plurality of instruction codes are stored collectively in an instruction latch, and that also outputs an instruction position specifying the instruction code to be executed; an instruction selector that selects and outputs one of the plurality of instruction codes from the instruction latch based on the instruction position output from the instruction address generator; an instruction register that temporarily holds the instruction code selected by the instruction selector and outputs it every clock cycle to an instruction execution device that executes the instruction code; and a timing generator that, based on the instruction position output from the instruction address generator and a branch hit output by the instruction execution device when it determines that the instruction code is a branch instruction, outputs to the instruction address generator and the instruction execution device a wait signal that stops instruction processing for one clock cycle.

[0009] Because the instruction reading device of the present invention is configured in this way, a plurality of instruction codes are read at once from the instruction memory into the instruction latch based on the block address. The instruction selector selects one of the instruction codes of the instruction block in the instruction latch based on the instruction position output from the instruction address generator. The instruction register temporarily holds the instruction code selected by the instruction selector and outputs it every clock cycle.

[0010] The timing generator sends the wait signal for one clock cycle. That is, based on the instruction position and the branch hit that the instruction execution device outputs when it determines that the instruction code is a branch instruction, the timing generator sends the wait signal for one clock cycle. Then, from the plurality of instruction codes read into the instruction latch, the instruction selector selects the consecutive instruction codes at the branch destination instruction address to be executed next, and the instruction register holds each at the start of the next instruction execution and outputs it. Note that the case in which the time to read the branch destination instruction block from the instruction memory exceeds a predetermined time refers to the case in which the branch destination instruction block cannot be read immediately after the instruction block being executed.

[0011] In the present invention, when the timing generator determines from the instruction position that only the last instruction code of the branch destination instruction block is to be processed, it outputs the wait signal again to the instruction address generator and the instruction execution device. With the timing generator configured in this way, sending the wait signal at least once on a branch instruction removes the need for a high-speed instruction memory, and because the number of wait signals sent depends on the position of the branch destination instruction, the increase in instruction execution time is suppressed.

[0012] EMBODIMENTS OF THE INVENTION Embodiments of the present invention are described in detail below with reference to the drawings.

[0013] Embodiment 1 [Structure] FIG. 1 is a block diagram of a first embodiment of the instruction reading device according to the present invention. The instruction reading device comprises an instruction address generator 1, an instruction memory 2, an instruction latch 3, an instruction selector 4, an instruction register 5 and a timing generator 6.
An instruction execution device 100 is provided downstream of the instruction reading device and is supplied with instruction codes and the wait signal.
Together, the instruction reading device and the instruction execution device 100 constitute a CPU (Central Processing Unit); hereinafter, the combination is referred to as the CPU.

The instruction address generation unit 1 generates the next instruction address to be executed on the basis of the clock common to the instruction reading device and the instruction execution device 100, the branch hit input from the instruction execution device 100, and the signals from the timing generation unit 6. It also generates the instruction block address of the next instruction block following the block that contains the instruction currently being executed and, when a branch instruction is executed, the instruction block address of the instruction block containing the branch destination instruction (hereinafter, the branch destination instruction block). It outputs the address of the instruction block to be read next, together with the instruction position.

In this embodiment, one instruction block consists of two consecutive instruction codes: the instruction code at an even address, followed by the instruction code at the next (odd) address. The instruction position is the least significant bit of the next instruction address to be executed, generated inside the instruction address generation unit 1, and specifies one of the two instruction codes in the instruction block.
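The split of an instruction address into an instruction block address and an instruction position can be sketched as follows. This is an illustrative model, not the patented circuit: it assumes instruction addresses are non-negative integers counted in instruction-code units, and the function name `split_address` and its `block_bits` parameter are hypothetical. `block_bits=1` corresponds to the two-code blocks of this embodiment; `block_bits=2` would give four-code blocks.

```python
def split_address(instr_addr: int, block_bits: int = 1) -> tuple[int, int]:
    """Split an instruction address into (instruction block address, instruction position).

    Hypothetical helper: block_bits=1 models blocks of two instruction codes,
    where the instruction position is the least significant address bit.
    """
    if instr_addr < 0:
        raise ValueError("instruction addresses are assumed to be non-negative")
    position = instr_addr & ((1 << block_bits) - 1)  # selects a code within the block
    block_address = instr_addr >> block_bits         # selects the block in memory
    return block_address, position


if __name__ == "__main__":
    # Addresses 4 and 5 share block 2; 4 is the even (first) code, 5 the odd one.
    print(split_address(4), split_address(5))
```

Both codes of a block share one block address, which is why a single memory access can return the pair.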
The instruction memory 2 stores the instruction codes that make up the program and reads out, in a single access, the two instruction codes of the instruction block specified by the instruction block address input from the instruction address generation unit 1.

The instruction latch 3 latches the two instruction codes of the instruction block read from the instruction memory 2. The latch operation is as follows: while the instruction latch signal input to the instruction latch 3 is "1", the two instruction codes at the data input pass straight through to the output; when the instruction latch signal falls from "1" to "0", the data present at the input at that moment is captured and held for as long as the signal remains "0".
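The level-sensitive behavior of the instruction latch 3 can be modeled sample by sample as follows. This is a simplified behavioral sketch with a hypothetical class name `InstructionLatch`: a falling edge is represented by the last update made while the latch signal was "1", which is the value then held while the signal stays "0".

```python
class InstructionLatch:
    """Hypothetical behavioral model of a transparent (level-sensitive) latch."""

    def __init__(self, initial):
        self._held = initial  # contents before the first transparent phase

    def update(self, latch_signal: int, data_in):
        """Apply one sample of the latch signal and data input; return the output."""
        if latch_signal == 1:
            # Transparent: the output follows the input, and the most recent
            # value is what remains when the signal falls to 0.
            self._held = data_in
        # While the signal is 0 the input is ignored and the held value is output.
        return self._held


if __name__ == "__main__":
    latch = InstructionLatch(("N(0)", "N(1)"))
    print(latch.update(1, "undefined"))           # read still in progress
    print(latch.update(1, ("N+1(0)", "N+1(1)")))  # read data now valid
    print(latch.update(0, "next read starting"))  # held after the falling edge
```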
The instruction latch signal is supplied by the timing generation unit 6.

The instruction selector 4 receives the two instruction codes of the instruction block output by the instruction latch 3 and, using the instruction position input from the instruction address generation unit 1 as its selection signal, outputs the instruction code located at the next instruction address to be executed.

The instruction register 5 holds the next instruction code to be executed, as output by the instruction selector 4. The clock is applied to the instruction register 5 as its set signal, so that it captures a new instruction code at each rising edge of the clock.
The timing generation unit 6 generates, from the clock, the branch hit input from the instruction execution device 100, and the instruction position input from the instruction address generation unit 1, an address selection signal, which it outputs to the instruction address generation unit 1. It also generates the instruction latch signal, which it outputs to the instruction latch 3, and a wait signal, which it outputs to the instruction address generation unit 1 and the instruction execution device 100.
The address selection signal is input to the instruction address generation unit 1. When an instruction other than a branch is executed, it selects the address of the instruction block that follows the block containing the instruction code being executed; when a branch instruction is executed, it selects the branch destination instruction block address. The wait signal instructs the instruction address generation unit 1 to suspend updating of the next instruction address and instructs the instruction execution device 100 to suspend instruction execution. It is generated in the clock cycles in which the supply of the instruction code at the branch destination address is delayed by the time needed to read the instruction block containing the branch destination instruction code (the branch destination instruction block) from the instruction memory 2.

FIG. 2 is a configuration diagram of the instruction address generation unit 1 of Embodiment 1. As shown, the instruction address generation unit 1 consists of three adders 11, 12, and 13, two selection circuits (selector 14 and selector 15), and two registers (instruction fetch address register 16 and instruction decode address register 17).

The instruction fetch address register 16 holds the instruction address at which the instruction code to be executed is located; for an instruction processed in the pipeline, it holds the instruction address of the instruction fetch phase for at least one clock cycle. The instruction decode address register 17 likewise holds the instruction address of the instruction decode phase for at least one clock cycle.

The adder 11 computes the next instruction address following the instruction address of the instruction code in the instruction fetch phase by adding 1 to the contents of the instruction fetch address register 16. When the instruction in the instruction decode phase is a branch instruction, the adder 12 computes the branch destination instruction address indicated by the branch instruction code by adding the contents of the instruction decode address register 17 and the branch offset value indicated by the instruction code in the decode phase.

The selector 14 selects the instruction address to be executed next, using the branch hit from the instruction execution device 100 and the wait signal from the timing generation unit 6 as its selection signals. When the wait signal is "1", it selects the current instruction address from the instruction fetch address register 16, regardless of the level of the branch hit. When the wait signal is "0" and the branch hit is "0", it selects the next instruction address from the adder 11; when the wait signal is "0" and the branch hit is "1", it selects the branch destination instruction address from the adder 12. The selected address is output to the instruction fetch address register 16.
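The selection rule of the selector 14, together with the additions performed by the adders 11 and 12, can be written as a small function. This is a hypothetical sketch with integer addresses; the function and parameter names are illustrative, and signals are modeled as 0/1 integers.

```python
def next_fetch_address(wait: int, branch_hit: int,
                       fetch_addr: int, decode_addr: int, branch_offset: int) -> int:
    """Value loaded into the instruction fetch address register 16 (sketch of selector 14)."""
    if wait:
        # Wait cycle: keep the current fetch address regardless of the branch hit.
        return fetch_addr
    if branch_hit:
        # Adder 12: address of the branch in the decode phase plus its branch offset.
        return decode_addr + branch_offset
    # Adder 11: next sequential instruction address.
    return fetch_addr + 1
```

Giving the wait signal priority over the branch hit is what lets a wait cycle freeze the fetch address even in a cycle in which a branch resolves.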
The adder 13 computes the address of the next instruction block following the instruction block that contains the instruction code in the instruction fetch phase held in the instruction fetch address register 16. To do so, it adds 1 to the output of the instruction fetch address register 16 with its least significant bit removed, that is, to the instruction block address (an address in units of two instruction codes).
The selector 15 selects the instruction block address of the next instruction block to be read from the instruction memory 2, using the address selection signal from the timing generation unit 6 as its selection signal. If the next instruction block to be read follows the instruction block being executed, it selects the next instruction block address from the adder 13; if the next instruction block to be read is the one containing the instruction code at the branch destination instruction address, it selects the instruction block address from the instruction fetch address register 16. The selected address is output to the instruction memory 2.
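The adder 13 and the selector 15 can be sketched in the same way. The polarity of the address selection signal (here, 1 means "read the branch destination block") is an assumption, and the function name is hypothetical; `block_bits=1` removes the least significant bit as in this embodiment.

```python
def next_block_address(select_branch_block: int, fetch_addr: int, block_bits: int = 1) -> int:
    """Instruction block address sent to instruction memory 2 (sketch of adder 13 + selector 15)."""
    current_block = fetch_addr >> block_bits  # fetch address with the position bit(s) removed
    if select_branch_block:
        # Branch case: the fetch address register is assumed to hold the branch
        # destination, so its block is the branch destination instruction block.
        return current_block
    # Sequential case (adder 13): the block after the one being fetched.
    return current_block + 1
```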
FIG. 3 is a block diagram of the timing generation unit 6. The timing generation unit 6 consists of five flip-flops (FF) 21 to 25, two AND circuits (AND gates) 26 and 27, and three OR circuits (OR gates) 28 to 30.

The FF 25 delays the branch hit input from the instruction execution device 100 by one clock cycle; its output, the branch selection signal, is input to the AND gate 26, the OR gate 28, and the OR gate 30.

The FF 21, AND gate 26, FF 22, and OR gate 28 generate the instruction latch signal supplied to the instruction latch 3 from the branch selection signal output by the FF 25 and the clock. The instruction latch signal latches the instruction block read from the instruction memory 2 into the instruction latch 3. Because an instruction block consists of two instruction codes, while instructions other than branch instructions are executed the signal goes to logic level "1" (hereinafter, "is asserted") once every two clock cycles; the loop formed by the FF 21 and FF 22 produces this. When a branch instruction is executed, however, the time at which the branch destination instruction block address, and therefore the branch destination instruction block, becomes valid differs from that for other instructions, so the assertion timing of the instruction latch signal must be changed.
The AND gate 26 and the OR gate 28 perform the logic for branch execution: they assert the instruction latch signal for one clock cycle in the memory operation phase of the branch instruction, that is, two clock cycles after its instruction decode phase.

The AND gate 27, FF 23, FF 24, and OR gate 29 generate the wait signal supplied to the instruction execution device 100. When a branch instruction is executed, whether the wait signal is asserted a second time depends on whether the branch destination instruction address is even or odd. The wait signal extends the period for which the instruction address of the instruction fetch phase is held in the instruction fetch address register 16, that is, it delays the instruction fetch phase in units of clock cycles, and it is output to the instruction address generation unit 1 and the instruction execution device 100. By asserting the wait signal to delay the instruction fetch phase, the timing at which instructions are supplied and executed can be adjusted.

If the branch destination instruction address is odd, only the odd-address instruction code of the two in the branch destination instruction block is executed. A new instruction block must therefore be read in the clock cycle following the instruction fetch phase of this instruction. In Embodiment 1, however, the access time of the instruction memory 2 is longer than one clock cycle and shorter than two, so the instruction block following the branch destination instruction block cannot be read one clock cycle later. In this case the wait signal is asserted again for one clock cycle, delaying the reading of the instruction block that follows the branch destination instruction block, and this delay is signaled to both the instruction address generation unit 1 and the instruction execution device 100.
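The number of wait cycles per executed branch implied by this description (one for reading the branch destination instruction block, plus one more when the destination address is odd) can be summarized as follows. This sketch assumes, as in Embodiment 1, two-code instruction blocks and an instruction memory access time between one and two clock cycles; the function name is hypothetical.

```python
def wait_cycles_for_taken_branch(target_addr: int) -> int:
    """Wait cycles inserted for one executed branch in Embodiment 1 (sketch)."""
    waits = 1  # delay while the branch destination instruction block is read
    if target_addr & 1:
        # Odd destination: only one code of the block is used, so the following
        # block cannot be read in time and the wait signal is asserted again.
        waits += 1
    return waits
```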
Of the two inputs of the OR gate 29, the branch hit supplied from the instruction execution device 100 produces the wait cycle caused by the delay in determining the instruction code while the branch destination instruction block is being read, and the signal generated through the AND gate 27, FF 23, and FF 24 produces, when the branch destination instruction address is odd, the wait cycle caused by the delay in reading the instruction block that follows the branch destination instruction block. The wait signal is the logical OR of these two inputs.

[Operation]
FIG. 4 is an operation time chart of the instruction reading device of Embodiment 1 when executing an instruction sequence that contains no branch instruction; FIGS. 5 and 6 are time charts for an instruction sequence containing a branch instruction whose branch destination instruction address is even; and FIGS. 7 and 8 are operation time charts for an instruction sequence containing a branch instruction whose branch destination instruction address is odd.
In FIG. 4, the upper part shows the signals at various points in the instruction reading device. The lower part shows, for each instruction, the pipeline phases: the instruction fetch phase processed in the instruction reading device, and the instruction decode, instruction execution, memory operation, and result storage phases in which the instruction execution device 100 executes the instruction code supplied by the instruction reading device.

First, the instruction memory 2 can read an instruction block of two instruction codes in a single access. Its read access time, however, is longer than one clock cycle and shorter than two (about 1.5 clock cycles in FIG. 4).

The following describes how the instruction block N+1 shown in FIG. 4 is read from the instruction memory 2 and supplied to the instruction execution device 100 through the instruction register 5. As shown in FIG. 4, since an instruction block consists of two instruction codes, it is sufficient to read one instruction block every two clock cycles. The instruction block address is therefore updated every two clock cycles, and the instruction block it specifies is read from the instruction memory 2 every two clock cycles. Because the read time of the instruction memory 2 exceeds one clock cycle, the newly read instruction block N+1 becomes valid more than one clock cycle after the instruction block address N+1 is set.

The instruction latch signal input to the instruction latch 3 changes to "1" approximately one clock cycle after the instruction block address is set; until then, the instruction latch 3 keeps holding the contents of instruction block N. While the instruction latch signal is "1", the instruction latch 3 is transparent. At the moment the signal changes to "1", the read for instruction block address N+1 is still in progress, so the read data from the instruction memory is undefined. Once the required read time has elapsed, the value of instruction block N+1 is output through the instruction latch 3 to the instruction selector 4.

The instruction selector 4 selects one of the two instruction codes of the input instruction block according to the instruction position, the least significant bit of the next instruction address to be executed, and outputs it to the instruction register 5. Because instruction block N+1 is reached by sequential execution rather than by a branch, the instruction position is "0" in the first clock cycle and "1" in the next. The instruction selector 4 therefore outputs instruction code N+1(0) in the first clock cycle and instruction code N+1(1) in the following one.
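For the branch-free case, the sequence of instruction codes that the instruction register 5 receives can be sketched as follows. The model is deliberately simplified and hypothetical: it ignores the latch and memory signal timing and only checks the condition the text relies on, namely that one block is consumed every two clock cycles, so under this model a block read time of up to two cycles keeps up.

```python
def sequential_codes(first_block: int, n_cycles: int,
                     access_time_cycles: float = 1.5) -> list[str]:
    """Codes entering instruction register 5 on successive cycles with no branch (sketch)."""
    if access_time_cycles > 2:
        # Each two-code block is consumed in two cycles, so a slower memory
        # could not deliver the next block in time under this simple model.
        raise ValueError("block read time must not exceed two clock cycles")
    codes = []
    for cycle in range(n_cycles):
        block = first_block + cycle // 2  # block address advances every two cycles
        position = cycle % 2              # instruction position alternates 0, 1
        codes.append(f"{block}({position})")
    return codes


if __name__ == "__main__":
    print(sequential_codes(5, 4))  # ['5(0)', '5(1)', '6(0)', '6(1)']
```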
Since the instruction latch 3 captures its input data at the falling edge of the instruction latch signal and holds it while the signal is "0", both instruction codes of instruction block N+1, N+1(0) and N+1(1), reach the instruction register 5 correctly. The instruction register 5 captures instruction codes N+1(0) and N+1(1), selected by the instruction selector 4, at successive rising clock edges and supplies each to the instruction execution device 100 for one clock cycle.

Next, the execution of an instruction sequence containing a branch instruction whose branch destination instruction address is even, shown in FIGS. 5 and 6, is described. In FIGS. 5 and 6, the first instruction executed in instruction block N, instruction code N(0), is a branch instruction, and the branch destination instruction address it specifies is the even-address instruction of instruction block M, instruction code M(0). Instruction blocks N and M may be either contiguous or non-contiguous in the instruction address space.

The branch instruction, instruction code N(0), is decoded in the instruction decode phase of the instruction execution device 100 and recognized as a branch instruction. At the same time, the instruction address generation unit 1 of the instruction reading device calculates the branch destination instruction address by adding the branch offset value given by the branch instruction code to the instruction address at which the branch instruction N(0) is located.

Because instruction code N(0) is a branch instruction, the instruction that follows it, instruction code N(1), would normally not be executed. The present invention, however, uses the technique known as delayed branching, adopted in RISC (Reduced Instruction Set Computer) processors, so that N(1) can be executed immediately after the branch. In delayed branching, one or more instruction codes at the branch destination are relocated to the positions immediately following the branch instruction, and the branch offset value is set so that the branch destination instruction address is advanced by the number of relocated instructions. Then, while the instruction code at the new branch destination is being read from the instruction memory 2, the instructions relocated from the original branch destination can be executed. In this description, these relocated instructions are called delayed instructions.

Accordingly, instruction code N(1), the delayed instruction in FIGS. 5 and 6, is executed while the branch destination instruction block M is being read from the instruction memory 2. As the execution timing of instruction code M(0) in FIGS. 5 and 6 shows, the gap between the branch instruction N(0) and the branch destination instruction M(0) spans more than one clock cycle, so an instruction corresponding to instruction code N+1(0) could in principle also serve as a delayed instruction in addition to N(1). Doing so, however, would require reading the instruction block N+1 that contains N+1(0) from the instruction memory (in Embodiment 1, only two instruction codes are read at a time), which would delay the reading of instruction block M by the same amount. The instruction corresponding to instruction code N+1(0) therefore cannot be a delayed instruction.
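To illustrate the delayed-branch convention, here is a toy interpreter with one delay slot. It is a hypothetical model, not the instruction set of the patent: instructions are tuples, `("br", offset)` branches to its own address plus `offset` (mirroring the addition done by the adder 12), and the instruction immediately after a branch always executes. In the example program, instruction "A" has been relocated from the original branch destination into the delay slot, and the offset has been advanced past it.

```python
def run_with_delay_slot(program, pc=0, delay_slots=1, max_steps=100):
    """Return the names of executed instructions (toy model of delayed branching)."""
    if delay_slots < 1:
        raise ValueError("this toy model needs at least one delay slot")
    trace = []
    pending = None  # (delay slots still to execute, branch destination)
    for _ in range(max_steps):
        op = program[pc]
        if op[0] == "halt":
            break
        trace.append("br" if op[0] == "br" else op[1])
        next_pc = pc + 1
        if pending is not None:
            remaining, target = pending
            remaining -= 1
            if remaining == 0:
                next_pc, pending = target, None  # delay slots done: take the branch
            else:
                pending = (remaining, target)
        if op[0] == "br":
            pending = (delay_slots, pc + op[1])  # destination = branch address + offset
        pc = next_pc
    return trace


if __name__ == "__main__":
    program = [
        ("br", 4),          # 0: branch; offset already advanced past the relocated "A"
        ("op", "A"),        # 1: delay slot, relocated from the original destination
        ("op", "skipped"),  # 2
        ("op", "skipped"),  # 3
        ("op", "B"),        # 4: new branch destination
        ("halt",),          # 5
    ]
    print(run_with_delay_slot(program))  # ['br', 'A', 'B']
```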
Consequently, one clock cycle between the execution of the delayed instruction N(1) and the execution of instruction code M(0) becomes a wait cycle.
During this clock cycle, the timing generation unit 6 asserts the wait signal and outputs it to the instruction address generation unit 1 and the instruction execution device 100. In the clock cycle in which the wait signal is asserted, the instruction address generation unit 1 extends the holding of the instruction address of the instruction fetch phase in the instruction fetch address register 16, and the instruction execution device 100 delays instruction execution, so that the correct instruction execution sequence is maintained.

Next, the execution of an instruction sequence containing a branch instruction whose branch destination instruction address is odd, shown in FIGS. 7 and 8, is described. The main difference from the even-address case of FIGS. 5 and 6 is the number of wait cycles. In FIGS. 7 and 8, the instruction code at the even address of instruction block N is again a branch instruction, but its branch destination is instruction code M(1), the odd-address instruction of instruction block M. The operation from execution of the branch instruction through reading the branch destination instruction block and storing it in the instruction latch 3 is the same as in FIGS. 5 and 6.

Because the branch destination instruction is at an odd address, the instruction position input to the instruction selector 4 is "1", and the instruction selector 4 therefore selects instruction code M(1) and outputs it to the instruction register 5. As in FIGS. 5 and 6, the instruction register 5 captures instruction code M(1) after a one-clock-cycle wait cycle and outputs it to the instruction execution device 100.
During the wait cycle, the wait signal is asserted and output to the instruction address generation unit 1 and the instruction execution device 100. When the branch destination instruction address is odd, only instruction code M(1), at the odd address of the branch destination instruction block M, is executed. In other words, an instruction block that would normally supply instructions for two clock cycles is used up in one.
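The cycle-by-cycle contents of the instruction register 5 for the two branch cases can be summarized in a short sketch. It encodes only the sequences described in this section (the delayed instruction N(1), one wait cycle while block M is read, and, for an odd destination, a second wait cycle because block M+1 cannot be read within one clock cycle); the function name and the "wait" marker are hypothetical.

```python
def branch_sequence(target_position: int) -> list[str]:
    """Instruction register contents per cycle after branch N(0) -> M(target_position)."""
    if target_position not in (0, 1):
        raise ValueError("Embodiment 1 blocks hold two codes: position 0 or 1")
    seq = ["N(0)", "N(1)", "wait"]  # branch, delayed instruction, read of block M
    if target_position == 0:
        seq += ["M(0)", "M(1)"]
    else:
        seq += ["M(1)", "wait"]     # block M+1 is not ready one cycle later
    seq += ["M+1(0)", "M+1(1)"]
    return seq


if __name__ == "__main__":
    print(branch_sequence(0))
    print(branch_sequence(1))
```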
Instruction block M+1, which follows the branch destination instruction block M, would therefore have to be read from the instruction memory 2 within one clock cycle. Because the instruction block read time of the instruction memory 2 exceeds one clock cycle, one further wait cycle must be generated. As shown in FIG. 8, this wait cycle is inserted between the instruction decode phase of instruction code M(1) and the instruction fetch phase of instruction code M+1(0).

[Effect]
The instruction reading device of Embodiment 1 consists of: the instruction memory 2, which reads an instruction block of several instructions in a single access; the instruction latch 3, which temporarily holds the instruction block read; the instruction selector 4, which selects the instruction code to be executed next from among the instruction codes of the block held in the instruction latch 3; the instruction register 5, which temporarily holds the selected instruction code; the instruction address generation unit 1, which supplies the required instruction addresses and control signals; and the timing generation unit 6, which generates timing signals such as the latch timing. Because the device reads an instruction block in one access, holds it temporarily, extracts from it the instruction code to be executed next, sets that code in the instruction register 5, and supplies it to the instruction execution device 100, it achieves the following significant effects.

Since the instruction memory 2 reads several instructions at once, even if the interval at which instructions can be read from the instruction memory 2 exceeds the instruction processing time of the instruction execution device 100, an instruction code to be executed can be supplied to the instruction execution device 100 in every instruction execution cycle when an instruction sequence without branch instructions is executed. When an instruction sequence containing branch instructions is executed, wait cycles need to be inserted only when the branch destination instruction block, or the instruction block following it, is read. The increase in instruction execution time caused by the slower read speed of the instruction memory 2 can thus be minimized.

In contrast, an instruction reading device in a conventional pipelined computer must read each instruction within the instruction execution time of the instruction execution device. When this cannot be met, the conventional approach is either to insert a wait cycle into every instruction fetch phase or to lower the CPU operating clock frequency. The read time of the instruction memory 2 of Embodiment 1 is longer than one clock cycle and shorter than two. If a conventional instruction reading device were built with this instruction memory 2, its average instruction execution time would be twice that obtained with an instruction memory that can be read in one clock cycle or less. With the instruction reading device of Embodiment 1, however, a non-branch instruction takes one clock cycle, a branch to an even address takes two (one wait cycle), and a branch to an odd address takes three (two wait cycles), so the average instruction execution time Tav, in clock cycles, is

$$T_{av} = 1 \times \left(1 - \frac{B}{100}\right) + 2 \times \frac{B}{100}\left(1 - \frac{O}{100}\right) + 3 \times \frac{B}{100} \times \frac{O}{100}$$

where B is the frequency of branch instructions (%) and O is the percentage of branch destination instruction addresses that are odd. For example, with B = 10 and O = 50, Tav = 1.15, so the increase in average instruction execution time over the conventional method is held to 15%.
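The average-time estimate can be checked numerically. The sketch below evaluates Tav with the weighting used in this section (one clock cycle for a non-branch instruction, two for a branch to an even address, three for a branch to an odd address); the function name is hypothetical.

```python
def average_instruction_time(branch_pct: float, odd_target_pct: float) -> float:
    """Average clock cycles per instruction for Embodiment 1 (sketch of the Tav estimate)."""
    b = branch_pct / 100      # fraction of instructions that are branches
    o = odd_target_pct / 100  # fraction of branches whose destination address is odd
    return 1 * (1 - b) + 2 * b * (1 - o) + 3 * b * o


if __name__ == "__main__":
    print(round(average_instruction_time(10, 50), 4))  # 1.15
```

With no branches the result is exactly one cycle per instruction, and it approaches the conventional doubling only if every instruction is a branch.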
<< Embodiment 2 >>
[Structure]
FIG. 9 is a block diagram of Embodiment 2 of the instruction reading device according to the present invention. In Embodiment 2, one instruction block consists of four instruction codes. As FIG. 9 shows, because the instruction memory 2a reads the four instruction codes of one instruction block at a time, the output of the instruction memory 2a and the configurations of the instruction latch 3a and the instruction selector 4a differ from those of Embodiment 1 shown in FIG. 1. In addition, the instruction position output by the instruction address generation unit 1a is two bits wide so that it can indicate one of the four instructions. The rest of the configuration is the same as that of the instruction reading device of Embodiment 1 shown in FIG. 1.

In Embodiment 2, one instruction block consists of four consecutive instruction codes: the instruction code at an instruction address that is a multiple of 4, followed by the instruction codes whose addresses leave remainders of 1, 2, and 3 when divided by 4, in that order. The instruction position is the lower two bits of the next instruction address to be executed, generated inside the instruction address generation unit 1a, and specifies one of the four instruction codes in the instruction block.

The instruction memory 2a stores the instruction codes that make up the program and reads out simultaneously the four instruction codes of the instruction block specified by the instruction block address input from the instruction address generation unit 1a. The instruction latch 3a latches the four instruction codes of the instruction block read from the instruction memory 2a. The latch operation is as follows: while the instruction latch signal input to the instruction latch 3a is "1", the four instruction codes at the data input pass straight through to the output; when the signal falls from "1" to "0", the data present at the input at that moment is captured and held for as long as the signal remains "0".
The instruction latch signal is supplied by the timing generation unit 6a. The instruction selector 4a receives the four instruction codes of the instruction block output by the instruction latch 3a and, using the instruction position input from the instruction address generation unit 1a as its selection signal, outputs the instruction code located at the next instruction address to be executed.
The instruction register 5a holds the next instruction code to be executed, as output by the instruction selector 4a. The clock is applied to the instruction register 5a as its set signal, and it captures a new instruction code at the rising edge of each clock cycle.

The timing generation unit 6a generates, from the clock, the branch hit input from the instruction execution device 100, and the instruction position input from the instruction address generation unit 1a, an address selection signal, which it outputs to the instruction address generation unit 1a. It also generates the instruction latch signal, which it outputs to the instruction latch 3a, and a wait signal, which it outputs to the instruction address generation unit 1a and the instruction execution device 100.
The address selection signal is input to the instruction address generation unit 1a. When an instruction other than a branch is executed, it selects the address of the instruction block that follows the block containing the instruction code being executed; when a branch instruction is executed, it selects the branch destination instruction block address. The wait signal instructs the instruction address generation unit 1a to suspend updating of the next instruction address and instructs the instruction execution device 100 to suspend instruction execution. It is generated in the clock cycles in which the supply of the instruction code at the branch destination address is delayed by the time needed to read the instruction block containing the branch destination instruction code (the branch destination instruction block) from the instruction memory 2a.

FIG. 10 is a configuration diagram of the instruction address generation unit 1a of Embodiment 2. The instruction address generation unit 1a consists of three adders (adders 11a, 12a, and 13a), two selection circuits (selectors 14a and 15a), and two registers (instruction fetch address register 16a and instruction decode address register 17a). Because one instruction block consists of four instruction codes in Embodiment 2, the instruction position output from the instruction fetch address register 16a and the instruction decode address register 17a is two bits wide, whereas the instruction position of the instruction address generation unit 1 of Embodiment 1 is one bit. The rest is the same as in Embodiment 1.

The instruction fetch address register 16a holds the instruction address at which the instruction code to be executed is located; for an instruction processed in the pipeline, it holds the instruction address of the instruction fetch phase for at least one clock cycle. The instruction decode address register 17a likewise holds the instruction address of the instruction decode phase for at least one clock cycle.
The adder 11a computes the next instruction address following the instruction address of the instruction code in the instruction fetch phase by adding 1 to the contents of the instruction fetch address register 16a. When the instruction in the instruction decode phase is a branch instruction, the adder 12a computes the branch destination instruction address indicated by the branch instruction code by adding the contents of the instruction decode address register 17a and the branch offset value indicated by the instruction code in the decode phase.

The selector 14a selects the instruction address to be executed next, using the branch hit from the instruction execution device 100 and the wait signal from the timing generation unit 6a as its selection signals. When the wait signal is "1", it selects the current instruction address from the instruction fetch address register 16a, regardless of the level of the branch hit. When the wait signal is "0" and the branch hit is "0", it selects the next instruction address from the adder 11a; when the wait signal is "0" and the branch hit is "1", it selects the branch destination instruction address from the adder 12a. The selected address is output to the instruction fetch address register 16a.

The adder 13a computes the address of the next instruction block following the instruction block that contains the instruction code in the instruction fetch phase held in the instruction fetch address register 16a. It adds 1 to the output of the instruction fetch address register 16a with its lower two bits removed, that is, to the instruction block address (an address in units of four instruction codes).
I do. The selector 15a reads from the instruction memory 2a.
Instruction block address for locating the next instruction block to be
It is to choose a dress. This selector 15a
Address selection signal input from timing generator 6a
The instruction block to be read next is
If the instruction block is a continuation of the instruction block being executed
The next instruction block address input from the adder 13a.
Address, while the next instruction block to read is
Instruction containing the instruction code located at the branch destination instruction address
Instruction fetch address register if block
Select the instruction block address input from 16a
Output to the instruction memory 2a. FIG. 11 shows a timing generator according to the second embodiment.
It is a block diagram of 6a. Basics of this timing generator 6a
For the functional functions, the timing generator of the first embodiment described above is used.
The same as the component 6a, except that in the second embodiment, one instruction
The block is said to consist of four instruction codes
Therefore, each signal generated in FIG. 11 is different from FIG.
Are different. The timing generation section has seven flip-flops.
(FF) 31-37, three selectors 38-40,
One decoder 41, one set / reset flip
Flop (RSFF) 42, two AND circuits (AND
(Gates) 43, 44 and two OR circuits (Oage
) 45 and 46. The FF 37 is an instruction
The branch hit input from the row device 100 is
Cycle delay, and the output is used as a branch selection signal.
Lector 38-40, AND gate 43, OR gate 4
5, and the reset input of the RSFF 42.
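The address-selection logic of the instruction address generator 1a described at the start of this section can be sketched in Python as follows. This is a minimal model under stated assumptions, not the circuit itself: addresses are counted in instruction units with four codes per block, the adder 11a is assumed to add 1, the branch destination (adder 12a) is passed in already computed, and all function and parameter names are hypothetical.

```python
POSITION_BITS = 2  # lower 2 bits select one of the four codes in a block


def select_fetch_address(fetch_addr: int, branch_target: int,
                         wait: bool, branch_hit: bool) -> int:
    """Value loaded into the instruction fetch address register 16a."""
    if wait:
        # Wait signal "1": keep the current address, ignoring the branch hit.
        return fetch_addr
    if branch_hit:
        # Wait "0", branch hit "1": branch destination address (adder 12a).
        return branch_target
    # Wait "0", branch hit "0": next sequential address (adder 11a, assumed +1).
    return fetch_addr + 1


def next_block_address(fetch_addr: int) -> int:
    """Adder 13a: strip the instruction position bits, then add 1."""
    return (fetch_addr >> POSITION_BITS) + 1


def select_block_address(fetch_addr: int, address_select: bool) -> int:
    """Selector 15a, steered by the address selection signal."""
    if address_select:
        # Branch case: block containing the destination held in register 16a.
        return fetch_addr >> POSITION_BITS
    # Sequential case: the block after the one currently being executed.
    return next_block_address(fetch_addr)
```

For example, with fetch address 9 (block 2, position 1), the sequential block address is 3, while a branch to that address selects block 2.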
The FF 31 to 34, the selectors 38 to 40, the AND gate 43, and the OR gate 45 form a loop that generates the instruction latch signal. This signal loads an instruction block of four instruction codes from the instruction memory 2a into the instruction latch 3a every four cycles.
When a branch instruction is executed, a new instruction block is read from the instruction memory 2a. In that case, the timing of the instruction latch signal for the instruction block that follows the branch destination instruction block depends on which of the four instruction positions in the branch destination instruction block the branch destination instruction address indicates, and therefore differs from the timing used for an instruction sequence that contains no branch instruction. For this reason, the decoder 41 in FIG. 11 generates branch position 1 to 4 signals. The branch position 1 signal is input to the selector 38, the branch position 2 signal to the selector 39, and the logical OR of the branch position 3 and branch position 4 signals, formed by the OR gate 46, to the selector 40. When a branch instruction is executed, the selectors 38 to 40 select these signals, using the branch selection signal output from the FF 37 as their selection input.
The RSFF 42 generates the address selection signal.
It is set at the rising edge of the branch hit input from the instruction execution device 100 and reset at the falling edge of the branch selection signal output from the FF 37, so the address selection signal stays asserted over that interval and is supplied to the instruction address generator 1a. The instruction address generator 1a uses the address selection signal to select the instruction block address output to the instruction memory 2a.
The AND gate 44 and the FF 35 and 36 generate the wait signal. When a branch instruction is executed and the branch destination instruction address is at branch position 4, that is, its remainder modulo 4 is 3 and it indicates the last instruction position in the branch destination instruction block, the wait signal is asserted for one clock cycle.
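The decoding performed by the decoder 41 and the condition under which the wait signal is requested can be summarized as combinational Python logic. This sketch only models which signals are active; the cycle-by-cycle timing produced by the FF 35 and 36 and the AND gate 44 is not modeled, and the function names are hypothetical.

```python
from typing import Dict


def decode_branch_position(target_addr: int) -> Dict[int, bool]:
    """Decoder 41: one-hot branch position 1..4 from the lower 2 address bits."""
    position = target_addr & 0b11  # instruction position 0..3
    return {n: position == n - 1 for n in range(1, 5)}


def selector_inputs(target_addr: int) -> Dict[str, bool]:
    """Decoded signals routed to selectors 38-40."""
    pos = decode_branch_position(target_addr)
    return {
        "selector_38": pos[1],
        "selector_39": pos[2],
        "selector_40": pos[3] or pos[4],  # merged by OR gate 46
    }


def wait_requested(branch_taken: bool, target_addr: int) -> bool:
    """Condition for a one-cycle wait: a taken branch to branch position 4."""
    return branch_taken and decode_branch_position(target_addr)[4]
```

A branch to address 7 (remainder 3, branch position 4) requests a wait, while a branch to address 6 (branch position 3) does not.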
The wait signal is output to the instruction address generator 1a and the instruction execution device 100. Its function is the same as in Embodiment 1 of the present invention.
[Operation] FIGS. 12 and 13 are operation time charts of the instruction reading device of Embodiment 2 when an instruction sequence containing no branch instruction is executed. FIGS. 14 and 15 are operation time charts when an instruction sequence containing a branch instruction whose branch destination instruction address has a remainder of 0 modulo 4 is executed. FIGS. 16 and 17 are the corresponding time charts for a remainder of 1, FIGS. 18 and 19 for a remainder of 2, and FIGS. 20 and 21 for a remainder of 3.
The time charts of Embodiment 2 for an instruction sequence without branch instructions, shown in FIGS. 12 and 13, are the same as FIG. 4 for Embodiment 1, except that an instruction block consists of four instructions and is read from the instruction memory 2a every four clock cycles. FIG. 12 shows the signals of each part of the instruction reading device shown in FIG. 9. FIG. 13 shows, for each instruction, the pipeline phases: the instruction fetch phase (IF) processed by the instruction reading device of FIG. 9, and the instruction decode phase (D), instruction execution phase (E), memory operation phase (M), and result storage phase (W) in the instruction execution device 100.
In FIGS. 12 and 13, the instruction memory 2a can read an instruction block of four instruction codes at once. However, its read access time is longer than one clock cycle and shorter than two clock cycles (about 1.5 clock cycles in FIG. 12). The following describes how instruction block N+1 in FIGS. 12 and 13 is read from the instruction memory 2a, output from the instruction register 5a, and supplied to the instruction execution device 100.
As shown in FIGS. 12 and 13, each instruction block consists of four instruction codes, so an instruction block needs to be read only once every four clock cycles. Accordingly, the instruction block address is updated every four clock cycles.
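The branch-free timing can be illustrated with a simplified Python schedule. It assumes, as a simplification of FIGS. 12 and 13, that the address of block k is determined at cycle 4k, that the latch becomes transparent exactly three cycles later (the text says "about" three), and that the first instruction of a block is issued in the cycle its latch opens; these constants and all names are hypothetical. Under those assumptions, any read time of up to three cycles still yields one instruction per cycle.

```python
from typing import List, Tuple

BLOCK_SIZE = 4   # instruction codes per block (Embodiment 2)
LATCH_DELAY = 3  # assumed cycles from block address to latch signal "1"


def branch_free_schedule(memory: List[List[str]], read_time: float,
                         n_blocks: int) -> List[Tuple[int, str]]:
    """Return (issue_cycle, instruction) pairs for a branch-free sequence."""
    schedule = []
    for k in range(n_blocks):
        address_cycle = BLOCK_SIZE * k           # block address updated every 4 cycles
        latch_cycle = address_cycle + LATCH_DELAY
        if address_cycle + read_time > latch_cycle:
            # The latch would pass data that the memory has not yet delivered.
            raise ValueError(f"block {k} not valid when the latch opens")
        for position in range(BLOCK_SIZE):
            # Selector 4a picks one code per cycle by instruction position;
            # register 5a presents it to the execution device for one cycle.
            schedule.append((latch_cycle + position, memory[k][position]))
    return schedule
```

With a read time of 1.5 cycles, two blocks are issued in consecutive cycles 3 to 10 with no gaps.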
The instruction block specified by this instruction block address is read from the instruction memory 2a every four clock cycles. Because the read time of the instruction memory 2a exceeds one clock cycle, the new instruction block N+1 read from the instruction memory 2a becomes valid at least one clock cycle after the instruction block address N+1 is determined. The instruction latch signal input to the instruction latch 3a changes to "1" about three clock cycles after the instruction block address is determined, so until then the instruction latch 3a keeps the contents of instruction block N. While the instruction latch signal is "1", the instruction latch 3a is transparent. By the time the instruction latch signal changes to "1", the instruction memory read data for instruction block address N+1 has already been determined, so the contents of the correct instruction block pass through the instruction latch 3a and are output to the instruction selector 4a.
The instruction selector 4a selects one of the four instruction codes of the input instruction block according to the instruction position, which is the lower 2 bits of the instruction address to be executed, and outputs it to the instruction register 5a. Because instruction block N+1 is reached sequentially rather than by a branch instruction, the instruction position takes the values "0", "1", "2", and "3" in order. The instruction selector 4a therefore outputs instruction code N+1(0) in the first clock cycle, N+1(1) in the next, N+1(2) in the next, and N+1(3) in the one after that. The instruction latch 3a captures its input data at the falling edge of the instruction latch signal and holds it while the signal is "0", so the four instruction codes N+1(0) to N+1(3) of instruction block N+1 are correctly input to the instruction register 5a. The instruction register 5a captures each of the instruction codes N+1(0), N+1(1), N+1(2), and N+1(3) selected by the instruction selector 4a at the rising edge of the clock and supplies it to the instruction execution device 100 for one clock cycle.
Next, the operation when executing an instruction sequence containing a branch instruction whose branch destination instruction address has a remainder of 0 modulo 4, shown in FIGS. 14 and 15, is described. In FIGS. 14 and 15, the first instruction of instruction block N, instruction code N(0), is a branch instruction, and the branch destination instruction address it specifies is the first instruction of instruction block M (the instruction at branch position 1), instruction code M(0). As in Embodiment 1, instruction blocks N and M may or may not be contiguous in the instruction address space. The branch instruction N(0) is decoded in the instruction decode phase (D) of the instruction execution device 100 and recognized as a branch instruction. At the same time, the instruction address generator 1a of the instruction reading device calculates the branch destination instruction address by adding the branch offset specified by the branch instruction to the instruction address of the branch instruction N(0). Because N(0) is a branch instruction, the instruction codes that follow it, N(1) and N(2), would normally not be executed.
However, in Embodiment 2 as well, the technique known as delayed branching allows them to be executed in sequence. The delayed instructions N(1) and N(2) shown in FIGS. 14 and 15 are therefore executed while the branch destination instruction block M is being read from the instruction memory 2a. As the execution time chart of the branch destination instruction M(0) in FIGS. 14 and 15 shows, there is an interval of two clock cycles between the branch instruction N(0) and the branch destination instruction M(0), so the two instructions N(1) and N(2) can serve as delayed instructions. In this way, Embodiment 2 hides the delay of reading the branch destination instruction block from the instruction memory 2a behind instruction execution.
Next, the operation when executing an instruction sequence containing a branch instruction whose branch destination instruction address has a remainder of 1 modulo 4, shown in FIGS. 16 and 17, is described. In FIGS. 16 and 17, the first instruction of instruction block N, instruction code N(0), is a branch instruction, and the branch destination instruction address it specifies is the second instruction of instruction block M (the instruction at branch position 2), instruction code M(1). Compared with the remainder-0 case of FIGS. 14 and 15, the branch destination instruction position differs. As a result, the instruction selector 4a selects a different instruction code, and the timing at which instruction block M+1, the block that follows the branch destination instruction block M, is held in the instruction latch 3a (that is, the assertion timing of the instruction latch signal) is one clock cycle earlier than in FIGS. 14 and 15.
Otherwise, the operation is the same as in FIGS. 14 and 15.
Next, the operation when executing an instruction sequence containing a branch instruction whose branch destination instruction address has a remainder of 2 modulo 4, shown in FIGS. 18 and 19, is described. In FIGS. 18 and 19, the first instruction of instruction block N, instruction code N(0), is a branch instruction, and the branch destination instruction address it specifies is the third instruction of instruction block M (the instruction at branch position 3), instruction code M(2). Compared with the remainder-0 case of FIGS. 14 and 15, the branch destination instruction position differs. As a result, the instruction selector 4a selects a different instruction code, and the timing at which instruction block M+1 is held in the instruction latch 3a (the assertion timing of the instruction latch signal) is two clock cycles earlier than in FIGS. 14 and 15.
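Why the timing shifts with the branch position can be seen from a simplified Python model. It assumes that block M+1 can be read only while the remaining instructions of the branch destination block M execute, one per clock cycle, and that the memory read time is rounded up to whole cycles; the function name and this overlap rule are assumptions, not taken from the figures. The fewer instructions of M that remain, the sooner M+1 is needed, which matches the earlier latch timing for remainders 1 and 2; with a 1.5-cycle read time, the model predicts one wait cycle only for remainder 3.

```python
import math
from typing import Tuple

BLOCK_SIZE = 4  # instruction codes per block (Embodiment 2)


def branch_case(target_addr: int, read_time: float) -> Tuple[int, int]:
    """Return (instructions executed from block M, wait cycles needed)."""
    remainder = target_addr % BLOCK_SIZE
    # Only the codes from the destination position onward are executed.
    executed_from_m = BLOCK_SIZE - remainder
    # Cycles available to hide the read of M+1 equal those executed codes.
    wait_cycles = max(0, math.ceil(read_time) - executed_from_m)
    return executed_from_m, wait_cycles
```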
Otherwise, the operation is the same as in FIGS. 14 and 15.
Finally, the operation when executing an instruction sequence containing a branch instruction whose branch destination instruction address has a remainder of 3 modulo 4, shown in FIGS. 20 and 21, is described. The major difference from the remainder-2 case described above is that reading instruction block M+1, the block that follows the branch destination instruction block M, from the instruction memory 2a causes a wait cycle of one clock cycle. In FIGS. 20 and 21, the first instruction of instruction block N, instruction code N(0), is a branch instruction, and the branch destination instruction address it specifies is the last instruction of the branch destination instruction block M (the instruction at branch position 4), instruction code M(3). Execution of the branch instruction, reading of the branch destination instruction block, and holding of that block in the instruction latch 3a are the same as in the cases described above. Because the branch destination instruction is at instruction position 4 within its block, the instruction position input to the instruction selector 4a is "3", so the instruction selector 4a selects instruction code M(3) and outputs it to the instruction register 5a. The instruction register 5a then holds the branch destination instruction M(3) and outputs it to the instruction execution device 100, as in the cases described above. Up to this point no wait cycle is inserted; that is, the wait signal is not asserted.
When the branch destination instruction address is at instruction position 4 in its block, only the last instruction code M(3) of the branch destination instruction block M is executed. In other words, instruction block M, which takes two clock cycles to read, is executed in a single clock cycle. Instruction block M+1, which follows M, would therefore have to be read from the instruction memory 2a within that one clock cycle. Because the instruction block read time of the instruction memory 2a exceeds one clock cycle, a wait cycle of one clock cycle must be generated. In FIGS. 20 and 21, the wait cycle is inserted at the instruction decode phase (D) of instruction code M(3), before the instruction fetch phase (IF) of instruction code M+1(0).
[Effect] The instruction reading device of Embodiment 2 differs substantially from that of Embodiment 1 in that one instruction block consists of four instruction codes. Because one instruction block is read every four clock cycles, more instructions to be executed, including the delayed instructions used when a branch instruction is executed, can be held in the instruction latch 3a in advance than in Embodiment 1. This greatly reduces the number of wait cycles inserted when the branch destination instruction block, and the block that follows it, are read from the instruction memory 2a during branch execution.
The instruction reading device of Embodiment 2 consists of the instruction memory 2a, from which an instruction block of four instructions can be read at once; the instruction latch 3a, which temporarily holds the instruction block read from the instruction memory 2a; the instruction selector 4a, which selects the next instruction code to be executed from the instruction block held in the instruction latch 3a; the instruction register 5a, which temporarily holds the instruction code selected by the instruction selector 4a; and the instruction address generator 1a and timing generator 6a, which supply the instruction addresses and control signals these components require.
It reads several consecutive instruction codes at once, holds them temporarily, extracts the next instruction code to be executed, sets it in the instruction register 5a, and supplies it to the instruction execution device 100. Because several instructions are read from the instruction memory 2a at once, the instruction code to be executed can be supplied to the instruction execution device 100 in every instruction execution cycle for an instruction sequence without branch instructions, even if the read time of the instruction memory 2a exceeds the instruction processing time of the instruction execution device 100. For an instruction sequence with branch instructions, a wait cycle is inserted only when reading the branch destination instruction block or the instruction block that follows it, so the increase in instruction execution time caused by the slower read speed of the instruction memory 2a is minimized.
By contrast, in the instruction reading device of a conventional pipeline computer, each instruction must be read within the time the instruction execution device takes to execute an instruction. If this cannot be satisfied, the conventional device either adds a wait cycle to every instruction fetch phase or lowers the CPU operating clock frequency. The read time of the instruction memory 2a of Embodiment 2 is longer than one clock cycle and shorter than two clock cycles. If this instruction memory 2a were used with a conventional instruction reading device, the average instruction execution time would be twice that obtained when instructions can be read in one clock cycle or less. With the instruction reading device of Embodiment 2, however, the average instruction execution time $T_{av}$ (in clock cycles) is

$$T_{av} = 1 \times \left(1 - \frac{B}{100} \times \frac{O}{100}\right) + 2 \times \left(\frac{B}{100} \times \frac{O}{100}\right)$$

where $B$ is the frequency of branch instructions (%) and $O$ is the percentage of branch instructions whose branch destination instruction address has a remainder of 3 modulo 4 (instruction position 3, that is, branch position 4). For example, with $B = 10$ and $O = 25$, $T_{av} = 1.025$, so the average instruction execution time increases by only 2.5%. This increase is one sixth of that of the instruction reading device of Embodiment 1.
In Embodiments 1 and 2, instruction blocks of two and four instruction codes, respectively, are read at a time, but the block size is not limited to these values. Other block sizes, for example five or more instruction codes, may be read at once, chosen according to the application, specifications, and so on.
[0085] As described above, according to the instruction reading device of the first invention, the instruction memory reads a plurality of instruction codes at once and outputs one instruction in each instruction execution cycle. Therefore, even if the read time of the instruction memory exceeds one instruction execution cycle, that is, the instruction processing time of the instruction execution device, the instruction code to be executed in each instruction execution cycle can be supplied to the instruction execution unit. Moreover, because high-speed memory is not required, power consumption is low, and because the CPU operating clock is no longer limited by the memory access time, instruction execution can be sped up while securing sufficient storage capacity.
Further, according to the instruction reading device of the second invention, when an instruction sequence containing a branch instruction is executed, a wait cycle is inserted only when reading the branch destination instruction block or the instruction block that follows it. Even for such instruction sequences, the increase in instruction execution time caused by the slower read speed of the instruction memory can therefore be minimized.
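The average-execution-time estimate for Embodiment 2 given earlier in this section can be checked numerically. This Python sketch simply evaluates that formula; the function and parameter names are hypothetical, and it covers only the one-cycle penalty for branches to branch position 4 that the formula models.

```python
def average_execution_time(branch_pct: float, last_slot_pct: float) -> float:
    """T_av in clock cycles per instruction for Embodiment 2.

    branch_pct:    B, frequency of branch instructions (%).
    last_slot_pct: O, share of branches whose destination address has
                   remainder 3 modulo 4 (%); each costs one extra cycle.
    """
    p = (branch_pct / 100) * (last_slot_pct / 100)
    # Most instructions take 1 cycle; the penalized fraction p takes 2.
    return 1 * (1 - p) + 2 * p


# B = 10, O = 25 gives about 1.025 cycles, versus 2 cycles per instruction
# for a conventional reader using the same slow memory.
print(average_execution_time(10, 25))
```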

[Brief Description of the Drawings]
FIG. 1 is a configuration diagram showing Embodiment 1 of the instruction reading device of the present invention.
FIG. 2 is a configuration diagram of the instruction address generator of Embodiment 1 of the instruction reading device of the present invention.
FIG. 3 is a configuration diagram of the timing generator of Embodiment 1 of the instruction reading device of the present invention.
FIG. 4 is an operation time chart for executing an instruction sequence containing no branch instruction in Embodiment 1 of the instruction reading device of the present invention.
FIG. 5 is an operation time chart (part 1) for executing an instruction sequence containing a branch instruction whose branch destination instruction address is an even address in Embodiment 1 of the instruction reading device of the present invention.
FIG. 6 is an operation time chart (part 2) for executing an instruction sequence containing a branch instruction whose branch destination instruction address is an even address in Embodiment 1 of the instruction reading device of the present invention.
FIG. 7 is an operation time chart (part 1) for executing an instruction sequence containing a branch instruction whose branch instruction address is an odd address in Embodiment 1 of the instruction reading device of the present invention.
FIG. 8 is an operation time chart (part 2) for executing an instruction sequence containing a branch instruction whose branch instruction address is an odd address in Embodiment 1 of the instruction reading device of the present invention.
FIG. 9 is a configuration diagram of Embodiment 2 of the instruction reading device of the present invention.
FIG. 10 is a configuration diagram of the instruction address generator of Embodiment 2 of the instruction reading device of the present invention.
FIG. 11 is a configuration diagram of the timing generator of Embodiment 2 of the instruction reading device of the present invention.
FIG. 12 is an operation time chart (part 1) for executing an instruction sequence containing no branch instruction in Embodiment 2 of the instruction reading device of the present invention.
FIG. 13 is an operation time chart (part 2) for executing an instruction sequence containing no branch instruction in Embodiment 2 of the instruction reading device of the present invention.
FIG. 14 is an operation time chart (part 1) for executing an instruction sequence containing a branch instruction whose branch destination instruction address has a remainder of 0 modulo 4 in Embodiment 2 of the instruction reading device of the present invention.
FIG. 15 is an operation time chart (part 2) for executing an instruction sequence containing a branch instruction whose branch destination instruction address has a remainder of 0 modulo 4 in Embodiment 2 of the instruction reading device of the present invention.
FIG. 16 is an operation time chart (part 1) for executing an instruction sequence containing a branch instruction whose branch destination instruction address has a remainder of 1 modulo 4 in Embodiment 2 of the instruction reading device of the present invention.
FIG. 17 is an operation time chart (part 2) for executing an instruction sequence containing a branch instruction whose branch destination instruction address has a remainder of 1 modulo 4 in Embodiment 2 of the instruction reading device of the present invention.
FIG. 18 is an operation time chart (part 1) for executing an instruction sequence containing a branch instruction whose branch destination instruction address has a remainder of 2 modulo 4 in Embodiment 2 of the instruction reading device of the present invention.
FIG. 19 is an operation time chart (part 2) for executing an instruction sequence containing a branch instruction whose branch destination instruction address has a remainder of 2 modulo 4 in Embodiment 2 of the instruction reading device of the present invention.
FIG. 20 is an operation time chart (part 1) for executing an instruction sequence containing a branch instruction whose branch destination instruction address has a remainder of 3 modulo 4 in Embodiment 2 of the instruction reading device of the present invention.
FIG. 21 is an operation time chart (part 2) for executing an instruction sequence containing a branch instruction whose branch destination instruction address has a remainder of 3 modulo 4 in Embodiment 2 of the instruction reading device of the present invention.
[Description of Signs]
1, 1a Instruction address generator
2, 2a Instruction memory
3, 3a Instruction latch
4, 4a Instruction selector
5, 5a Instruction register
6, 6a Timing generator

Continuation of the front page: (58) Fields searched (Int.Cl.⁷, DB name): G06F 9/38

Claims

(57) [Claim 1] An instruction reading device in a pipeline computer that performs each stage in one clock cycle, comprising:
an instruction memory that stores a plurality of instruction codes of a program to be executed and from which the plurality of instruction codes can be read collectively as an instruction block;
an instruction address generation unit that outputs to the instruction memory an instruction block address designating the instruction block, so that the plurality of instruction codes are stored collectively in an instruction latch, and that outputs an instruction position designating the instruction code to be executed;
an instruction selector that selects and outputs one of the plurality of instruction codes from the instruction latch on the basis of the instruction position output from the instruction address generation unit;
an instruction register that temporarily holds the instruction code selected by the instruction selector and outputs it every clock cycle to an instruction execution unit that executes the instruction code; and
a timing generation unit that outputs, to the instruction execution unit and the instruction address generation unit, a wait signal that stops instruction processing for only one clock cycle, on the basis of the instruction position output from the instruction address generation unit and of a branch hit output when the instruction execution unit determines that the instruction code is a branch instruction;
wherein the timing generation unit outputs the wait signal when it determines from the instruction position that only the last instruction code of the branch destination instruction block is to be processed.