JP2000357090A

JP2000357090A - Microcomputer and cache control method

Info

Publication number: JP2000357090A
Application number: JP11168213A
Authority: JP
Inventors: Hirokazu Tsukamoto; 宏和塚本
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1999-06-15
Filing date: 1999-06-15
Publication date: 2000-12-26

Abstract

PROBLEM TO BE SOLVED: To shorten the main-memory access time incurred by instruction fetches when a conditional branch instruction is executed, without providing a complicated branch prediction mechanism in the CPU. SOLUTION: The microcomputer has a CPU 12 that incorporates a branch prediction mechanism for shortening the access time to a main memory 18 when a cache miss is caused by a conditional branch instruction; a cache memory 10 from which the CPU 12 fetches instructions in normal operation; a branch-taken prefetch queue 14 from which the CPU 12 fetches instructions when the branch is taken; a branch-not-taken prefetch queue 16 from which the CPU 12 fetches instructions when the branch is not taken; a predecoder 22, provided outside the CPU 12, that identifies branch instructions; and a memory controller 20 having an address generation function for memory access.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a cache control technique for a microcomputer, and more particularly to a microcomputer and a cache control method that shorten the main-memory access time incurred by instruction fetches when a conditional branch instruction is executed, without providing a complicated branch prediction mechanism in the CPU, and that at the same time improve performance while avoiding a more complicated circuit configuration.

[0002]

[Description of the Related Art] In a conventional memory control device of an information processing device having a hierarchy of an instruction cache, an instruction prefetch buffer, and a main memory, the memory block to be prefetched is the memory block (consisting of a predetermined number of words) at the address following the most recently accessed (Most Recently Used, "MRU") memory block. For example, assume that the blocks in memory are ordered as a set according to their addresses. When memory block Li is accessed, the next memory block Li+1 is prefetched from the main memory, on the basis of the locality that programs normally exhibit, if it is present in neither the instruction cache nor the instruction prefetch buffer.
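The conventional next-block rule described above can be sketched as follows. This is only an illustration under stated assumptions: blocks are identified by integer indices, the instruction cache and the prefetch buffer are modeled as sets of block indices, and the function name is hypothetical.

```python
def next_block_to_prefetch(accessed_block, icache_blocks, prefetch_buffer_blocks):
    """Return the index of the block to prefetch after block L_i is accessed.

    Hypothetical model: the candidate is always L_(i+1); it is requested from
    main memory only if neither the instruction cache nor the prefetch buffer
    already holds it. Returns None when no prefetch is needed.
    """
    candidate = accessed_block + 1  # L_(i+1), the block at the next address
    if candidate in icache_blocks or candidate in prefetch_buffer_blocks:
        return None
    return candidate


if __name__ == "__main__":
    icache = {4, 5}          # blocks currently in the instruction cache (assumed)
    prefetch_buffer = {7}    # blocks currently in the prefetch buffer (assumed)
    print(next_block_to_prefetch(5, icache, prefetch_buffer))  # block 6 is missing
    print(next_block_to_prefetch(6, icache, prefetch_buffer))  # block 7 is buffered
```

The rule ignores control flow entirely, which is why a taken branch can make the prefetched block useless.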

[0003] However, in the conventional memory control device, when the instructions executed by the CPU include a branch instruction, prefetched instructions may be wasted. That is, if a copy of the instruction at the branch destination address exists neither in the instruction cache memory nor in the instruction prefetch buffer, the memory block containing that instruction must be newly read from the main memory, and the instructions already prefetched are wasted. The processing capability of a high-speed processor with a cache memory depends on a high cache hit rate, so processing performance may drop when a branch instruction is executed. In addition, if a request for the branch destination instruction arrives from the CPU while the memory control device is prefetching instructions from the main memory, the memory read in progress must be interrupted and the memory block containing the branch destination instruction must be read from the main memory. As a result, the time from the instruction request to the end of the instruction transfer becomes longer than in a memory control device that has no prefetch function.

[0004] A prior art aimed at solving these problems is described, for example, in Japanese Unexamined Patent Application Publication No. H8-286914. That publication discloses a memory control device of an information processing device that comprises decoding means, placed in front of the cache memory, for detecting a branch instruction being transferred to the cache memory and for generating and outputting the branch destination address, and means for determining the address of the memory block to be prefetched next on the basis of the output of the decoding means. The decoding means receives the output of a selector that selects and outputs either the output of the main memory or the output of the instruction prefetch buffer, is placed in front of the instruction cache memory, and, when an instruction required by the CPU is not present in the instruction cache memory and is transferred from the main memory or the instruction prefetch buffer to the instruction cache memory, receives and decodes the instruction being transferred. When the branch instruction detection signal output from the decoding means is active, the control means reads from the main memory using the address of the memory block containing the branch destination address as the address of the memory block to be prefetched next. When the detection signal is inactive during a memory block transfer, the control means reads from the main memory using the address of the memory block following the transferred memory block as the address of the memory block to be prefetched next. The publication states two effects of this prior art. First, decoding the instruction being transferred to the instruction cache memory and, if it is a branch instruction, prefetching the branch destination instructions improves prefetch efficiency and processing performance. Second, because the memory block containing the branch destination address is already stored in the prefetch buffer by the time the CPU executes the instruction transferred to the instruction cache memory, that memory block need not be read from the main memory, which suppresses the drop in processing performance when a branch instruction is executed.
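The address selection performed by the control means in this prior art can be illustrated with a short sketch. The block size, the word-address arithmetic, and all names below are assumptions made for the example; the cited publication is not quoted here.

```python
BLOCK_WORDS = 4  # assumed number of words per memory block


def block_base(address):
    """Return the base address of the memory block containing `address`."""
    return address - (address % BLOCK_WORDS)


def next_prefetch_address(transferred_block_base, branch_detected, branch_target=None):
    """Choose the block to prefetch next, following the rule described above.

    - branch detection signal active: the block containing the branch target
    - branch detection signal inactive: the block following the one transferred
    """
    if branch_detected:
        if branch_target is None:
            raise ValueError("an active branch signal needs a target address")
        return block_base(branch_target)
    return transferred_block_base + BLOCK_WORDS


if __name__ == "__main__":
    print(next_prefetch_address(8, branch_detected=False))                  # sequential block
    print(next_prefetch_address(8, branch_detected=True, branch_target=30)) # target block
```

Note that this scheme commits to a single direction per branch, which is the limitation the present invention addresses with two prefetch queues.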

[0005]

[Problems to Be Solved by the Invention] However, the prior art has the following problems. The first problem is that it requires a logic circuit for executing branch prediction and a branch target cache for storing the execution history of branch instructions, and therefore many resources. The second problem is that, in the prefetch operation of a CPU having a conventional branch prediction mechanism, the memory controller prefetches sequentially on the basis of the address of the instruction sequence on the predicted side, and the prefetched data is written to the cache memory; consequently, if the branch prediction fails, a cache miss is likely to occur after the conditional branch instruction is executed. The third problem is that, when a cache miss occurs after the conditional branch instruction is executed, time is needed to refill the cache from the main memory with the instruction sequence on the side for which the conditional branch prediction failed, and execution speed drops. The fourth problem is that, when a cache miss occurs after the conditional branch instruction is executed and the prediction of the conditional branch has failed, the prefetch operation has fetched data that was not actually needed, and storing it in the cache has destroyed the original cache lines; the cache hit rate therefore drops and performance degrades further.

[0006] The present invention has been made in view of these problems. Its object is to provide a microcomputer and a cache control method that shorten the main-memory access time incurred by instruction fetches when a conditional branch instruction is executed, without providing a complicated branch prediction mechanism in the CPU, and that at the same time improve performance while avoiding a more complicated circuit configuration.

[0007]

[Means for Solving the Problems] The gist of the invention according to claim 1 resides in a microcomputer that shortens the main-memory access time incurred by instruction fetches when a conditional branch instruction is executed, without providing a complicated branch prediction mechanism in the CPU, and that at the same time improves performance while avoiding a more complicated circuit configuration, the microcomputer comprising: a CPU incorporating a branch prediction mechanism for shortening the access time to the main memory when a cache miss is caused by a conditional branch instruction; a cache memory from which the CPU fetches instructions in normal operation; a branch-taken prefetch queue from which the CPU fetches instructions when the branch is taken; a branch-not-taken prefetch queue from which the CPU fetches instructions when the branch is not taken; a predecoder, provided outside the CPU, that identifies branch instructions; and a memory controller having an address generation function for memory access.

The gist of the invention according to claim 2 resides in a microcomputer that shortens the main-memory access time incurred by instruction fetches when a conditional branch instruction is executed, without providing a complicated branch prediction mechanism in the CPU, and that at the same time improves performance while avoiding a more complicated circuit configuration, the microcomputer comprising: a CPU incorporating a branch prediction mechanism for shortening the access time to the main memory when a cache miss is caused by a conditional branch instruction; a cache memory from which the CPU fetches instructions in normal operation; a branch-taken prefetch queue from which the CPU fetches instructions when the branch is taken; a branch-not-taken prefetch queue from which the CPU fetches instructions when the branch is not taken; a branch-unused-side prefetch cache memory that stores, without discarding it, the data of the other prefetch queue, from which the CPU did not fetch instructions; a predecoder, provided outside the CPU, that identifies branch instructions; and a memory controller having an address generation function for memory access.

The gist of the invention according to claim 3 resides in the microcomputer according to claim 1 or 2, wherein the predecoder is configured to control data transfer from the main memory to the two prefetch queues on the branch-taken side and/or the branch-not-taken side, namely the branch-taken prefetch queue and/or the branch-not-taken prefetch queue.

The gist of the invention according to claim 4 resides in the microcomputer according to claim 1 or 2, wherein the predecoder is configured, when a cache miss occurs in an instruction fetch, to identify a branch instruction on the basis of the fetch data transferred from the main memory and held in a prefetch buffer, and to determine to which of the cache memory, the branch-taken prefetch queue, and the branch-not-taken prefetch queue the fetch data is to be transferred.

The gist of the invention according to claim 5 resides in the microcomputer according to claim 4, wherein the CPU is configured, after fetching a branch instruction from the cache memory, to determine according to the result of executing that branch instruction which of the branch-taken prefetch queue and the branch-not-taken prefetch queue to access, and to take the fetch data into the CPU.

The gist of the invention according to claim 6 resides in a cache control method that shortens the main-memory access time incurred by instruction fetches when a conditional branch instruction is executed, without providing a complicated branch prediction mechanism in the CPU, and that at the same time improves performance while avoiding a more complicated circuit configuration, the method comprising the steps of: controlling the CPU to execute branch prediction processing in order to shorten the access time to the main memory when a cache miss is caused by a conditional branch instruction; managing a cache memory from which the CPU fetches instructions in normal operation; managing a branch-taken prefetch queue from which the CPU fetches instructions when the branch is taken; managing a branch-not-taken prefetch queue from which the CPU fetches instructions when the branch is not taken; managing a predecoder, provided outside the CPU, that identifies branch instructions; and managing a memory controller having an address generation function for memory access.

The gist of the invention according to claim 7 resides in a cache control method that shortens the main-memory access time incurred by instruction fetches when a conditional branch instruction is executed, without providing a complicated branch prediction mechanism in the CPU, and that at the same time improves performance while avoiding a more complicated circuit configuration, the method comprising the steps of: controlling the CPU to execute branch prediction processing in order to shorten the access time to the main memory when a cache miss is caused by a conditional branch instruction; managing a cache memory from which the CPU fetches instructions in normal operation; managing a branch-taken prefetch queue from which the CPU fetches instructions when the branch is taken; managing a branch-not-taken prefetch queue from which the CPU fetches instructions when the branch is not taken; managing a branch-unused-side prefetch cache memory that stores, without discarding it, the data of the other prefetch queue, from which the CPU did not fetch instructions; managing a predecoder, provided outside the CPU, that identifies branch instructions; and managing a memory controller having an address generation function for memory access.

The gist of the invention according to claim 8 resides in the cache control method according to claim 6 or 7, wherein the predecoder is configured to control data transfer from the main memory to the two prefetch queues on the branch-taken side and/or the branch-not-taken side, namely the branch-taken prefetch queue and/or the branch-not-taken prefetch queue.

The gist of the invention according to claim 9 resides in the cache control method according to claim 6 or 7, wherein the step of managing the predecoder includes a step of, when a cache miss occurs in an instruction fetch, identifying a branch instruction on the basis of the fetch data transferred from the main memory and held in a prefetch buffer, and determining to which of the cache memory, the branch-taken prefetch queue, and the branch-not-taken prefetch queue the fetch data is to be transferred.

The gist of the invention according to claim 10 resides in the cache control method according to claim 9, wherein the step of controlling the CPU to execute branch prediction processing includes, after a branch instruction is fetched from the cache memory, determining according to the result of executing that branch instruction which of the branch-taken prefetch queue and the branch-not-taken prefetch queue to access, and taking the fetch data into the CPU.

The gist of the invention according to claim 11 resides in the cache control method according to claim 6, comprising the following steps. In a first step, before instruction execution, the memory controller prefetches instruction accesses from the main memory and normally stores the data in the cache memory. In a second step, the CPU fetches an instruction from the cache memory while the prefetched instruction is predecoded to distinguish branch instructions and conditional branch instructions from other instructions; when the predecode result is a branch instruction, the predecoder calculates the jump-destination address and, if the data is not present in the cache memory, stores the prefetch data of the jump-destination address from the main memory in the cache memory, and an instruction is fetched from the cache memory; when the predecode result is a conditional branch instruction, the predecoder calculates the jump-destination address and, if the data is not present in the cache memory, stores the prefetch data of the jump-destination address from the main memory in the branch-taken prefetch queue, and the sequential instruction prefetch for the branch-not-taken case is also executed and its data stored in the branch-not-taken prefetch queue. In a third step, at the branch decision when the branch instruction is executed, an instruction fetch is executed from whichever of the two prefetch queues on the branch-taken and branch-not-taken sides matches the branch decision, the fetch data is stored in the cache memory, and the data in the other prefetch queue, from which no instruction fetch was executed, is discarded.

The gist of the invention according to claim 12 resides in the cache control method according to claim 7, comprising the following steps. In a first step, before instruction execution, the memory controller prefetches instruction accesses from the main memory and stores the data in the cache memory, the CPU fetches an instruction from the cache memory, and the prefetched instruction is predecoded to distinguish branch instructions and conditional branch instructions from other instructions. In a second step, the CPU fetches an instruction from the cache memory while the prefetched instruction is predecoded to distinguish branch instructions and conditional branch instructions from other instructions; when the predecode result is a branch instruction, the predecoder calculates the jump-destination address and, on the basis of the result, stores the prefetch data of the jump-destination address from the main memory in the cache memory if the data is not present in the cache memory; when the predecode result is a conditional branch instruction, the predecoder calculates the jump-destination address and, if the data is not present in the cache memory, stores the prefetch data of the jump-destination address from the main memory in the branch-taken prefetch queue, and immediately afterwards the sequential instruction prefetch for the branch-not-taken case is also executed and its data stored in the branch-not-taken prefetch queue. In a third step, at the branch decision when the branch instruction is executed, an instruction fetch is executed from whichever of the two prefetch queues on the branch-taken and branch-not-taken sides matches the branch decision, and the fetch data is stored in the cache memory. In a fourth step, the data in the other prefetch queue, which was not used for the instruction fetch, is stored in the branch-unused-side prefetch cache memory without being discarded. In a fifth step, when the same conditional branch instruction is executed again using the cache memory and an instruction sequence not currently present in the cache memory is to be executed, the instructions stored in the branch-unused-side prefetch cache memory are executed.
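The difference between claims 11 and 12 lies in what happens to the prefetch queue that the branch decision did not select: claim 11 discards it, whereas claim 12 keeps it in a separate side cache for later re-execution of the same branch. The following sketch illustrates the claim 12 variant only. Everything in it is an assumption made for illustration: queues and caches are modeled as dictionaries from address to instruction, and the function names and the lookup order are hypothetical.

```python
def resolve_keep_unused(taken, taken_queue, not_taken_queue, cache, unused_side_cache):
    """Apply the branch decision: the matching queue fills the cache, and the
    other queue is moved to the side cache instead of being discarded."""
    if taken:
        used, unused = taken_queue, not_taken_queue
    else:
        used, unused = not_taken_queue, taken_queue
    cache.update(used)                # fetch data from the matching queue
    unused_side_cache.update(unused)  # keep the other side for later use
    taken_queue.clear()
    not_taken_queue.clear()


def fetch(address, cache, unused_side_cache):
    """Look up an instruction: cache first, then the side cache (assumed order)."""
    if address in cache:
        return cache[address], "cache"
    if address in unused_side_cache:
        return unused_side_cache[address], "unused-side cache"
    return None, "main memory"


if __name__ == "__main__":
    cache, side = {}, {}
    taken_q = {0x100: "target insn"}
    not_taken_q = {0x14: "fall-through insn"}
    resolve_keep_unused(True, taken_q, not_taken_q, cache, side)
    # If the same branch later falls through, the instruction is still on chip.
    print(fetch(0x14, cache, side))
```

The design trade-off is extra storage for the side cache in exchange for avoiding a main-memory access when the branch later goes the other way.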

[0008]

[Embodiments of the Invention] The embodiments described below have the following features. First, because a branch prediction mechanism is incorporated in the CPU to shorten the main-memory access time when a cache miss is caused by a conditional branch instruction, a high-speed branch prediction mechanism of the kind used in existing CPUs to determine the main-memory access direction after a conditional branch instruction becomes unnecessary, and the RAM for storing branch prediction information for each conditional branch instruction also becomes unnecessary, so that a substantial cost reduction can be achieved. Second, the essential components are reduced to three, namely the two prefetch queues on the branch-taken and branch-not-taken sides (the branch-taken prefetch queue and the branch-not-taken prefetch queue), a predecoder provided outside the CPU that identifies branch instructions, and a memory controller having an address generation function for memory access; this simplifies the hardware, so that the performance of the microcomputer can be improved at low cost. Third, unnecessary prefetches from the main memory to the cache memory that accompany branch instructions and conditional branch instructions, that is, prefetches that would destroy the data in the cache memory, can be prevented, so that the performance of the microcomputer, typified by the cache hit rate, can be improved. Embodiments of the present invention are described in detail below with reference to the drawings.

[0009] (First Embodiment) FIG. 1 is a functional block diagram for explaining a microcomputer 100 according to a first embodiment of the present invention. In FIG. 1, 10 denotes a cache memory, 12 a CPU, 14 a branch-taken prefetch queue, 16 a branch-not-taken prefetch queue, 18 a main memory, 20 a memory controller, 22 a predecoder, 26 a prefetch buffer, 28 a first internal data bus, 30 a second internal data bus, and 100 the microcomputer. Referring to FIG. 1, in this embodiment the ordinary cache memory 10, the branch-taken prefetch queue 14, and the branch-not-taken prefetch queue 16 are provided as the means from which the CPU 12 fetches instructions. The cache memory 10, the CPU 12, the branch-taken prefetch queue 14, and the branch-not-taken prefetch queue 16 are connected to the first internal data bus 28. The cache memory 10, the branch-taken prefetch queue 14, the branch-not-taken prefetch queue 16, and the prefetch buffer 26 are connected to the second internal data bus 30.

[0010] The predecoder 22 is connected to the memory controller 20 and the prefetch buffer 26 and has two functions. The first is to control data transfer from the main memory 18 to the two prefetch queues on the branch-taken and branch-not-taken sides (the branch-taken prefetch queue 14 and the branch-not-taken prefetch queue 16). The second is, when a cache miss occurs in an instruction fetch, to identify a branch instruction on the basis of the fetch data transferred from the main memory 18 and held in the prefetch buffer 26, and to determine to which of the cache memory 10, the branch-taken prefetch queue 14, and the branch-not-taken prefetch queue 16 the fetch data is transferred via the second internal data bus 30.
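The routing decision made by the predecoder 22 can be pictured as a small lookup. The sketch below is one possible reading, assuming that the destination depends on which kind of instruction triggered the prefetch and on whether the fetched data is the branch target or the sequential continuation; the enum and function names are hypothetical, and the enum values merely reuse the reference numerals of FIG. 1.

```python
from enum import Enum


class Kind(Enum):
    """Predecode result for the instruction that triggered the prefetch."""
    OTHER = "other"
    BRANCH = "unconditional branch"
    COND_BRANCH = "conditional branch"


class Dest(Enum):
    """Possible destinations on the second internal data bus 30."""
    CACHE = 10
    TAKEN_QUEUE = 14
    NOT_TAKEN_QUEUE = 16


def route(trigger, is_branch_target):
    """Choose where fetch data held in the prefetch buffer is sent.

    Assumed policy: only a conditional branch uses the two queues; target
    data goes to the branch-taken queue and sequential data to the
    branch-not-taken queue. Everything else goes to the ordinary cache.
    """
    if trigger is Kind.COND_BRANCH:
        return Dest.TAKEN_QUEUE if is_branch_target else Dest.NOT_TAKEN_QUEUE
    return Dest.CACHE


if __name__ == "__main__":
    for trigger in Kind:
        for is_target in (True, False):
            print(trigger.value, is_target, route(trigger, is_target).name)
```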

[0011] The CPU 12 has a function of, after fetching a branch instruction from the cache memory 10, determining according to the instruction execution result which of the branch-taken prefetch queue 14 and the branch-not-taken prefetch queue 16 to access, and thereby taking the fetch data into the CPU 12.

[0012] Next, the operation of the microcomputer (the cache control method) is described with reference to FIG. 1. First, before instruction execution, the memory controller 20 prefetches instruction accesses from the main memory 18 and normally stores the data in the cache memory 10.

[0013] Accordingly, the CPU 12 fetches an instruction from the cache memory 10, while the prefetched instruction is predecoded to distinguish branch instructions and conditional branch instructions from other instructions.

[0014] At this time, if the predecode result is a branch instruction, the predecoder 22 calculates the jump-destination address and, if the data is not present in the cache memory 10, stores the prefetch data of the jump-destination address from the main memory 18 in the cache memory 10.
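For an unconditional branch there is only one possible path, so the target can go straight into the cache. A minimal sketch of that step follows, assuming word-addressed memory modeled as a dictionary, a fixed prefetch length, and a hypothetical function name.

```python
def prefetch_branch_target(target_addr, main_memory, cache, words=2):
    """Fill the cache with the branch target if it is not already cached.

    `words` is an assumed prefetch length. Returns True if main memory
    was accessed, False if the target was already in the cache.
    """
    if target_addr in cache:
        return False
    for address in range(target_addr, target_addr + words):
        cache[address] = main_memory[address]
    return True


if __name__ == "__main__":
    main_memory = {a: f"insn{a}" for a in range(16)}  # toy memory image
    cache = {}
    print(prefetch_branch_target(8, main_memory, cache))  # miss: memory is read
    print(prefetch_branch_target(8, main_memory, cache))  # hit: nothing to do
```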

[0015] Subsequently, the CPU 12 fetches an instruction from the cache memory 10. If the predecode result is a conditional branch instruction, the predecoder 22 calculates the branch target address. If the data is not present in the cache memory 10, the prefetch data at the branch target address is read from the main memory 18 and stored in the branch taken prefetch queue 14. Immediately afterward, the sequential instruction prefetch for the case where the branch is not taken is also performed, and its data is stored in the branch not taken prefetch queue 16.
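The routing described in paragraphs [0014] and [0015] can be sketched in Python as follows. This is only an illustrative model, not the patented circuit: the names `Kind`, `LINE_WORDS`, `prefetch`, and `route_prefetch` are hypothetical, memory is modeled as a dictionary from address to instruction, and the sketch assumes that the sequential (not-taken) prefetch happens only together with the taken-side prefetch, which the text leaves open.

```python
from collections import deque
from enum import Enum, auto


class Kind(Enum):
    """Predecode result (hypothetical classification)."""
    OTHER = auto()
    BRANCH = auto()        # unconditional branch
    COND_BRANCH = auto()   # conditional branch


LINE_WORDS = 4  # assumed number of instruction words per prefetch


def prefetch(main_memory, addr):
    """Read LINE_WORDS consecutive (address, instruction) pairs from main memory."""
    return [(a, main_memory[a])
            for a in range(addr, addr + LINE_WORDS) if a in main_memory]


def route_prefetch(kind, pc, target, main_memory, cache, taken_q, not_taken_q):
    """Decide where prefetched data goes, following [0014] and [0015]."""
    if kind is Kind.OTHER or target in cache:
        return "none"                      # nothing to prefetch
    if kind is Kind.BRANCH:
        # Unconditional branch: the target line goes straight into the cache.
        cache.update(prefetch(main_memory, target))
        return "cache"
    # Conditional branch: target line to the taken queue, then the
    # sequential line (assumed to start at pc + 1) to the not-taken queue.
    taken_q.extend(prefetch(main_memory, target))
    not_taken_q.extend(prefetch(main_memory, pc + 1))
    return "queues"
```

Holding both paths in separate queues is what lets the design avoid predicting the branch direction: nothing enters the cache until the outcome is known.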

[0016] Subsequently, when the branch decision is made during execution of the branch instruction, the instruction fetch is performed from whichever of the two prefetch queues (the branch taken prefetch queue 14 or the branch not taken prefetch queue 16) matches the branch decision, and the fetch data is stored in the cache memory 10. At the same time, the data in the other prefetch queue, that is, the queue from which no instruction fetch was performed, is discarded.
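The branch resolution step of paragraph [0016] can be sketched as below. This is a minimal model under stated assumptions: `resolve_branch` is a hypothetical name, each queue is a `deque` of (address, instruction) pairs, and the cache is a plain dictionary.

```python
from collections import deque


def resolve_branch(taken, cache, taken_q, not_taken_q):
    """Commit the queue that matches the branch outcome and discard the other."""
    chosen, other = (taken_q, not_taken_q) if taken else (not_taken_q, taken_q)
    fetched = list(chosen)    # instructions the CPU fetches from the matching queue
    cache.update(fetched)     # the fetched data is also stored in the cache
    chosen.clear()
    other.clear()             # the unused path is discarded (first embodiment)
    return fetched
```

Because only the path actually taken is written to the cache, the wrong-path instructions never evict useful cache lines.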

[0017] As described above, the first embodiment provides the following effects. The first effect is that the access time associated with instruction fetches to the main memory 18 during conditional branch instruction execution can be shortened without providing a complicated branch prediction mechanism in the CPU 12, so that performance is improved without complicating the circuit configuration. That is, there is no need for a high-speed branch prediction mechanism of the kind found in existing CPUs, in which a branch predictor is built into the CPU 12 to decide the direction of accesses to the main memory 18 after a conditional branch instruction so as to shorten the access time to the main memory 18 when a cache miss is caused by a conditional branch instruction. The RAM for storing branch prediction information for each conditional branch instruction also becomes unnecessary, which yields a large cost reduction.

[0018] The second effect is that the essential components are reduced to three: the two prefetch queues on the branch taken side and the branch not taken side (the branch taken prefetch queue 14 and the branch not taken prefetch queue 16), the predecoder 22 provided outside the CPU 12 for identifying branch instructions, and the memory controller 20 having an address generation function for memory access. This simplifies the hardware, so that the performance of the microcomputer 100 can be improved at low cost.

[0019] The third effect is that superfluous prefetches from the main memory 18 to the cache memory 10 accompanying branch instructions and conditional branch instructions, which would destroy data in the cache memory 10, can be prevented. As a result, the performance of the microcomputer 100 can be improved, as typified by a higher hit rate of the cache memory 10.

[0020] (Second Embodiment) FIG. 2 is a functional block diagram for explaining the microcomputer 100 according to the second embodiment of the present invention. In FIG. 2, 10 denotes a cache memory, 12 a CPU, 14 a branch taken prefetch queue, 16 a branch not taken prefetch queue, 18 a main memory, 20 a memory controller, 22 a predecoder, 24 a branch non-use side prefetch cache memory, 26 a prefetch buffer, 28 a first internal data bus, 30 a second internal data bus, and 100 the microcomputer. Parts identical to those already described in the first embodiment are given the same reference numerals, and duplicate description is omitted. Referring to FIG. 2, this embodiment is characterized in that, in addition to the cache memory 10, the branch taken prefetch queue 14, and the branch not taken prefetch queue 16 described above, a branch non-use side prefetch cache memory 24, which stores the data of the other, unused prefetch queue instead of discarding it, is connected to the first internal data bus 28. When the same conditional branch instruction is executed again while the cache memory 10 is in use, and the instruction sequence to be executed is not currently in the cache memory 10, that is, it is the sequence on the side not used the previous time, the instructions stored in the branch non-use side prefetch cache memory 24 are executed.

[0021] In this embodiment, the operation of the predecoder 22 is the same as in the first embodiment. Before an instruction is executed, the memory controller 20 prefetches instructions from the main memory 18 and normally stores the data in the cache memory 10. Subsequently, the CPU 12 fetches an instruction from the cache memory 10, and the prefetched instruction is predecoded to distinguish branch instructions and conditional branch instructions from other instructions.

[0022] Subsequently, if the predecode result is a branch instruction, the predecoder 22 calculates the branch target address and, if the data is consequently found not to be present in the cache memory 10, stores the prefetch data at the branch target address from the main memory 18 in the cache memory 10. If the predecode result is a conditional branch instruction, the predecoder 22 calculates the branch target address and, if the data is consequently found not to be present in the cache memory 10, stores the prefetch data at the branch target address from the main memory 18 in the branch taken prefetch queue 14. Immediately afterward, the predecoder 22 also performs the sequential instruction prefetch for the case where the branch is not taken and stores its data in the branch not taken prefetch queue 16.

[0023] As for the data in the two prefetch queues on the branch taken side and the branch not taken side (the branch taken prefetch queue 14 and the branch not taken prefetch queue 16), the operation is the same as in the first embodiment up to the point where, when the branch decision is made during execution of the branch instruction, the instruction fetch is performed from the prefetch queue that matches the branch decision, that is, from either the branch taken prefetch queue 14 or the branch not taken prefetch queue 16, and the fetch data is stored in the cache memory 10.

[0024] Subsequently, the data in the other prefetch queue, that is, whichever of the branch taken prefetch queue 14 and the branch not taken prefetch queue 16 was not used for the instruction fetch, is not discarded but is stored in the branch non-use side prefetch cache memory 24.

[0025] Subsequently, when the same conditional branch instruction is executed again while the cache memory 10 is in use, and the instruction sequence to be executed is not currently in the cache memory 10, that is, it is the sequence on the side not used the previous time, the instructions stored in the branch non-use side prefetch cache memory 24 are executed.
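The difference from the first embodiment, described in paragraphs [0024] and [0025], can be sketched as follows. This is an illustrative model only: `resolve_branch_keep_unused` and `fetch_instruction` are hypothetical names, and the cache memory 10, the branch non-use side prefetch cache memory 24, and the main memory 18 are all modeled as dictionaries from address to instruction.

```python
from collections import deque


def resolve_branch_keep_unused(taken, cache, taken_q, not_taken_q, unused_cache):
    """Commit the matching queue; keep the other path instead of discarding it."""
    chosen, other = (taken_q, not_taken_q) if taken else (not_taken_q, taken_q)
    fetched = list(chosen)
    cache.update(fetched)         # used path goes to the ordinary cache
    unused_cache.update(other)    # unused path is retained (second embodiment)
    chosen.clear()
    other.clear()
    return fetched


def fetch_instruction(addr, cache, unused_cache, main_memory):
    """Look up an instruction, falling back to the retained unused path."""
    if addr in cache:
        return cache[addr], "cache"
    if addr in unused_cache:
        return unused_cache[addr], "unused-side cache"
    return main_memory[addr], "main memory"
```

When the same conditional branch later resolves the other way, the instructions on the previously unused path are served from the retained copy rather than from main memory.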

[0026] As described above, according to the second embodiment, the memory access time can be shortened even when a conditional branch instruction is encountered on a cache hit. As a result, the execution time can be reduced further than in the first embodiment, which is effective only for branch instructions present in an instruction fetch on a cache miss.

[0027] The present invention is not limited to the above embodiments, and it is clear that each embodiment can be modified as appropriate within the scope of the technical idea of the present invention. The number, position, shape, and the like of the constituent members are likewise not limited to those of the above embodiments and can be chosen to suit the implementation of the present invention. In the drawings, identical components are given identical reference numerals.

[0028]

[Effects of the Invention] Because the present invention is configured as described above, it provides the following effects. The first effect is that the access time associated with instruction fetches to the main memory during conditional branch instruction execution can be shortened without providing a complicated branch prediction mechanism in the CPU, so that performance is improved without complicating the circuit configuration. That is, there is no need for a high-speed branch prediction mechanism of the kind found in existing CPUs, in which a branch predictor is built into the CPU to decide the direction of main memory accesses after a conditional branch instruction so as to shorten the access time to the main memory when a cache miss is caused by a conditional branch instruction. The RAM for storing branch prediction information for each conditional branch instruction also becomes unnecessary, which yields a large cost reduction. The second effect is that the essential components are reduced to three: the two prefetch queues on the branch taken side and the branch not taken side, the predecoder provided outside the CPU for identifying branch instructions, and the memory controller having an address generation function for memory access. This simplifies the hardware, so that the performance of the microcomputer can be improved at low cost. The third effect is that superfluous prefetches from the main memory to the cache accompanying branch instructions and conditional branch instructions, which would destroy data in the cache memory, can be prevented. As a result, the performance of the microcomputer can be improved, as typified by a higher cache hit rate.

[Brief description of the drawings]

FIG. 1 is a functional block diagram for explaining a microcomputer according to a first embodiment of the present invention.

FIG. 2 is a functional block diagram for explaining a microcomputer according to a second embodiment of the present invention.

[Description of Reference Numerals]
10: cache memory
12: CPU
14: branch taken prefetch queue
16: branch not taken prefetch queue
18: main memory
20: memory controller
22: predecoder
24: branch non-use side prefetch cache memory
26: prefetch buffer
28: first internal data bus
30: second internal data bus
100: microcomputer

Claims

[Claims]

1. A microcomputer that shortens the access time associated with instruction fetches to a main memory during conditional branch instruction execution without providing a complicated branch prediction mechanism in a CPU, and at the same time improves performance by avoiding a complicated circuit configuration, the microcomputer comprising: a CPU having a branch prediction mechanism for shortening the access time to the main memory when a cache miss occurs due to a conditional branch instruction; a cache memory that is normally the target of instruction fetches by the CPU; a branch taken prefetch queue that is the target of instruction fetches by the CPU when the branch is taken; a branch not taken prefetch queue that is the target of instruction fetches by the CPU when the branch is not taken; a predecoder provided outside the CPU for identifying branch instructions; and a memory controller having an address generation function for memory access.

2. A microcomputer that shortens the access time associated with instruction fetches to a main memory during conditional branch instruction execution without providing a complicated branch prediction mechanism in a CPU, and at the same time improves performance by avoiding a complicated circuit configuration, the microcomputer comprising: a CPU having a branch prediction mechanism for shortening the access time to the main memory when a cache miss occurs due to a conditional branch instruction; a cache memory that is normally the target of instruction fetches by the CPU; a branch taken prefetch queue that is the target of instruction fetches by the CPU when the branch is taken; a branch not taken prefetch queue that is the target of instruction fetches by the CPU when the branch is not taken; a branch non-use side prefetch cache memory for storing, without discarding, the data of the other prefetch queue that was not used; a predecoder provided outside the CPU for identifying branch instructions; and a memory controller having an address generation function for memory access.

3. The microcomputer according to claim 1, wherein the predecoder is configured to control data transfer from the main memory to the branch taken prefetch queue and/or the branch not taken prefetch queue, which are the two prefetch queues on the branch taken side and the branch not taken side.

4. The microcomputer according to claim 1, wherein, when a cache miss occurs in an instruction fetch, the predecoder is configured to identify a branch instruction based on fetch data from the main memory held in a prefetch buffer, and to determine to which of the branch taken prefetch queue and the branch not taken prefetch queue the fetch data is transferred.

5. The microcomputer according to claim 4, wherein the CPU is configured to, after fetching a branch instruction from the cache memory, determine which of the branch taken prefetch queue and the branch not taken prefetch queue to access according to the result of executing the branch instruction, and thereby load the fetch data into the CPU.

6. A cache control method that shortens the access time associated with instruction fetches to a main memory during conditional branch instruction execution without providing a complicated branch prediction mechanism in a CPU, and at the same time improves performance by avoiding a complicated circuit configuration, the method comprising: a step of controlling the CPU to execute branch prediction processing in order to shorten the access time to the main memory when a cache miss occurs due to a conditional branch instruction; a step of managing a cache memory that is normally the target of instruction fetches by the CPU; a step of managing a branch taken prefetch queue that is the target of instruction fetches by the CPU when the branch is taken; a step of managing a branch not taken prefetch queue that is the target of instruction fetches by the CPU when the branch is not taken; a step of managing a predecoder provided outside the CPU for identifying branch instructions; and a step of managing a memory controller having an address generation function for memory access.

7. A cache control method that shortens the access time associated with instruction fetches to a main memory during conditional branch instruction execution without providing a complicated branch prediction mechanism in a CPU, and at the same time improves performance by avoiding a complicated circuit configuration, the method comprising: a step of controlling the CPU to execute branch prediction processing in order to shorten the access time to the main memory when a cache miss occurs due to a conditional branch instruction; a step of managing a cache memory that is normally the target of instruction fetches by the CPU; a step of managing a branch taken prefetch queue that is the target of instruction fetches by the CPU when the branch is taken; a step of managing a branch not taken prefetch queue that is the target of instruction fetches by the CPU when the branch is not taken; a step of managing a branch non-use side prefetch cache memory for storing, without discarding, the data of the other prefetch queue that was not used; a step of managing a predecoder provided outside the CPU for identifying branch instructions; and a step of managing a memory controller having an address generation function for memory access.

8. The cache control method according to claim 6, wherein the predecoder controls data transfer from the main memory to the branch taken prefetch queue and/or the branch not taken prefetch queue, which are the two prefetch queues on the branch taken side and the branch not taken side.

9. The cache control method according to claim 6, wherein the step of managing the predecoder includes: a step of identifying a branch instruction based on fetch data from the main memory held in a prefetch buffer when a cache miss occurs in an instruction fetch; and a step of determining to which of the branch taken prefetch queue and the branch not taken prefetch queue the fetch data is transferred.

10. The cache control method according to claim 9, wherein the step of controlling the CPU to execute the branch prediction processing includes a step of, after fetching a branch instruction from the cache memory, determining which of the branch taken prefetch queue and the branch not taken prefetch queue to access according to the result of executing the branch instruction, and thereby loading the fetch data into the CPU.

11. The cache control method according to claim 6, comprising: a step in which the memory controller prefetches instructions from the main memory before an instruction is executed and normally stores the data in the cache memory; a step in which the CPU fetches an instruction from the cache memory and the predecoder predecodes the prefetched instruction to distinguish branch instructions and conditional branch instructions from other instructions; a step in which, if the predecode result is a branch instruction, the predecoder calculates the branch target address and, if the data is not present in the cache memory, stores the prefetch data at the branch target address from the main memory in the cache memory; a step in which the CPU fetches an instruction from the cache memory and, if the predecode result is a conditional branch instruction, the predecoder calculates the branch target address, stores the prefetch data at the branch target address from the main memory in the branch taken prefetch queue if the data is not present in the cache memory, and also performs the sequential instruction prefetch for the case where the branch is not taken and stores its data in the branch not taken prefetch queue; and a step in which, when the branch decision is made during execution of the branch instruction, the instruction fetch is performed from whichever of the two prefetch queues on the branch taken side and the branch not taken side matches the branch decision, the fetch data obtained by that instruction fetch is stored in the cache memory, and the data in the other prefetch queue, from which no instruction fetch was performed, is discarded.

12. The cache control method according to claim 7, comprising: a step in which the memory controller prefetches instructions from the main memory before an instruction is executed and stores the data in the cache memory; a step in which the CPU fetches an instruction from the cache memory and the prefetched instruction is predecoded to distinguish branch instructions and conditional branch instructions from other instructions; a step in which, if the predecode result is a branch instruction, the predecoder calculates the branch target address and, if the calculation shows that the data is not present in the cache memory, stores the prefetch data at the branch target address from the main memory in the cache memory, and, if the predecode result is a conditional branch instruction, the predecoder calculates the branch target address and, if the data is not present in the cache memory, stores the prefetch data at the branch target address from the main memory in the branch taken prefetch queue, and immediately afterward performs the sequential instruction prefetch for the case where the branch is not taken and stores its data in the branch not taken prefetch queue; a step of performing the instruction fetch from the prefetch queue that matches the branch decision and storing the fetch data in the cache memory; a step of storing the data of the other prefetch queue, which was not used for the instruction fetch, in the branch non-use side prefetch cache memory without discarding it; and a step of, when the same conditional branch instruction is executed again while the cache memory is in use and an instruction sequence not present in the cache memory is to be executed, executing the instructions stored in the branch non-use side prefetch cache memory.