JP2011209902A

JP2011209902A - Instruction fetch device, instruction packet generation device, processor, and instruction fetch method

Info

Publication number: JP2011209902A
Application number: JP2010075779A
Authority: JP
Inventors: Hiroshi Kobayashi; 浩小林; Hiroaki Sakaguchi; 浩章坂口; Yosuke Morita; 陽介森田; Hitoshi Kai; 斉甲斐; Katsuhiko Metsugi; 勝彦目次; Haruhisa Yamamoto; 晴久山本; Koichi Hasegawa; 浩一長谷川; Taichi Hirao; 太一平尾
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2010-03-29
Filing date: 2010-03-29
Publication date: 2011-10-20

Abstract

PROBLEM TO BE SOLVED: To control instruction prefetch by using information about branch instructions.
SOLUTION: A program is managed as instruction packets, each consisting of a fixed-length instruction payload to which an instruction header is added. Each instruction packet includes a branch prediction flag. When this flag is "1", the next-line prefetch unit 150 is inhibited from issuing a next-line instruction prefetch to the system memory 140. The branch prediction flag is a field indicating that the corresponding instruction payload is likely to contain a branch instruction whose branch destination lies outside both that instruction payload and the next instruction payload. In addition, the placement of branch instructions can be changed by compression using an instruction dictionary table 192, so that the branch prediction flag is not set to "1" in consecutive instruction packets.

Description

The present invention relates to an instruction fetch device, and more particularly to an instruction fetch device for prefetching an instruction sequence that includes branch instructions, as well as to an instruction packet generation device, a processor, processing methods therefor, and a program that causes a computer to execute those methods.

To get the most out of a pipelined CPU (Central Processing Unit), instructions should ideally keep flowing through the pipeline without stalling. Maintaining this ideal state requires fetching the instructions to be processed next from the memory that stores them into the CPU or an instruction cache in advance. However, when a program contains a branch instruction, the address of the instruction to be executed after the branch is not determined until the branch instruction itself is executed. Instruction fetch must therefore wait, a pipeline stall occurs, and instruction-execution throughput drops. For this reason, many CPUs employ techniques that prefetch instructions despite the uncertainty introduced by branches, in order to suppress pipeline stalls.

A typical prefetch scheme that can be implemented with simple hardware is next-line prefetch (see, for example, Patent Document 1). It prefetches instructions in program order. The basic memory-access pattern of processor instruction fetch is sequential access toward increasing addresses. Hardware prefetch therefore works as follows: after the instruction at a given address has been stored in the cache, the next cache line is stored automatically as well, on the expectation that it will also be used.
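The behavior of next-line prefetch can be illustrated with a small simulation. This is a minimal sketch, assuming 32-byte cache lines, 4-byte instructions, and an unbounded cache (no evictions); none of these parameters come from the document. It counts demand misses, issued prefetches, and prefetches that were actually used.

```python
LINE_SIZE = 32  # bytes per cache line (illustrative assumption)


def run_next_line_prefetch(trace):
    """Simulate next-line prefetch over a trace of instruction addresses.

    The cache is modeled as an unbounded set of line numbers, so there are
    no evictions. Returns (demand_misses, prefetches_issued, prefetches_used).
    """
    cached = set()      # line numbers currently in the cache
    pending = set()     # lines brought in by prefetch and not yet referenced
    misses = issued = used = 0
    for addr in trace:
        line = addr // LINE_SIZE
        if line in pending:          # a prefetched line is actually referenced
            used += 1
            pending.discard(line)
        if line not in cached:       # demand miss: fetch from memory
            misses += 1
            cached.add(line)
        nxt = line + 1               # next-line prefetch: assume sequential flow
        if nxt not in cached:
            cached.add(nxt)
            pending.add(nxt)
            issued += 1
    return misses, issued, used


# 16 sequential 4-byte instructions (lines 0-1), then a taken branch to 0x100.
trace = list(range(0, 64, 4)) + [0x100, 0x104]
misses, issued, used = run_next_line_prefetch(trace)
print(misses, issued, used)
```

In this trace the prefetch of line 1 pays off, but line 2 is prefetched and never used because the branch leaves the sequential path: exactly the wasted prefetch discussed next.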

Patent Document 1: Japanese Patent No. 4327237 (FIG. 1)

Although next-line prefetch can be implemented with a simple hardware configuration, it prefetches on the assumption that no branch is taken, so in many cases the prefetch is wasted; that is, a prefetch miss occurs. When a prefetch miss occurs, the prefetched instructions are discarded and the instructions at the correct branch destination are fetched again, which costs time because the CPU must wait. In addition, reading and writing unneeded data increases memory accesses and wastes power. Furthermore, frequent or useless prefetches congest data-path traffic.

Another approach to reducing prefetch misses uses branch prediction. Whereas next-line prefetch always predicts that the branch is not taken and prefetches the next line, branch prediction predicts the branch direction from past history and prefetches the predicted address. Branch prediction is complex and requires hardware with a large circuit area, such as history tables. The performance benefit it delivers depends on the efficiency of the prediction algorithm, and many such algorithms must be implemented with relatively large storage and complex hardware. When a prediction is wrong, branch prediction incurs the same kind of penalty as a next-line prefetch miss. In most real programs the ratio of branches to each destination, as in loop processing and exception handling, is heavily skewed, so the benefits of branch prediction often outweigh its drawbacks. However, for some applications it is difficult to improve prediction performance no matter which algorithm is used; codecs in particular tend to be hard to predict except for loops. The prediction hit rate must be improved, but the mechanisms for doing so become complex and large, and they do not always deliver a performance improvement commensurate with their circuit size.
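History-based prediction can be illustrated with a table of two-bit saturating counters. The document does not specify any predictor; this is a common textbook scheme, shown only as a sketch of why a loop-closing branch predicts well while an alternating pattern (as can occur in codec code) does not. The table size and indexing are assumptions.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch address.

    Counter values 0-1 predict not-taken, 2-3 predict taken.
    """

    def __init__(self, entries: int = 16):
        self.entries = entries
        self.table = [1] * entries  # start every entry at "weakly not-taken"

    def _index(self, pc: int) -> int:
        return (pc >> 2) % self.entries  # drop byte offset of 4-byte instructions

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def accuracy(pred: TwoBitPredictor, pc: int, outcomes) -> float:
    """Fraction of correct predictions for one branch over a sequence of outcomes."""
    hits = 0
    for taken in outcomes:
        hits += pred.predict(pc) == taken
        pred.update(pc, taken)
    return hits / len(outcomes)


loop_branch = [True] * 9 + [False]   # loop taken 9 times, then exits
alternating = [True, False] * 5      # data-dependent, hard-to-predict pattern
print(accuracy(TwoBitPredictor(), 0x100, loop_branch))
print(accuracy(TwoBitPredictor(), 0x100, alternating))
```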

Unlike the methods above, which prefetch in only one direction, another approach eliminates prefetch misses by prefetching both branch directions without making any prediction. Compared with branch prediction, this can eliminate pipeline stalls with only a small amount of additional hardware. However, not only does the amount of data stored for prefetching simply double, but unneeded data is always read; the adverse effects of increased data-path congestion, the added complexity of redundant circuitry, and the power loss cannot be ignored.
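The traffic cost of dual-path prefetch can be compared with next-line prefetch in a few lines. This is a sketch under stated assumptions: 32-byte lines, and example branches placed in the last slot of their line so that the fall-through path continues into the following line. It counts lines fetched, lines wasted, and branches whose needed line was not prefetched.

```python
def prefetch_traffic(branches, line_size=32, dual_path=True):
    """Count prefetch traffic over (branch_pc, target_pc, taken) tuples.

    dual_path=True prefetches both the fall-through line and the target line;
    dual_path=False prefetches only the fall-through line (next-line style).
    Returns (lines_fetched, lines_wasted, misses).
    """
    fetched = wasted = misses = 0
    for pc, target, taken in branches:
        fall_through = pc // line_size + 1   # assumes branch sits at end of its line
        target_line = target // line_size
        candidates = {fall_through, target_line} if dual_path else {fall_through}
        needed = target_line if taken else fall_through
        fetched += len(candidates)
        wasted += len(candidates - {needed})
        misses += needed not in candidates
    return fetched, wasted, misses


branches = [(28, 256, True), (60, 0, False), (92, 96, True)]
print(prefetch_traffic(branches))                   # dual-path
print(prefetch_traffic(branches, dual_path=False))  # next-line only
```

Dual-path removes the miss on the taken branch to 256 but fetches more lines and wastes one line on every branch whose two candidate lines differ.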

As described above, each prefetching approach has drawbacks (CPU implementation cost, branch-prediction overhead) and benefits (expected throughput improvement), so there is a trade-off between cost and performance.

The present invention has been made in view of this situation, and its object is to control instruction prefetch by using information about branch instructions.

The present invention has been made to solve the above problems. A first aspect thereof is an instruction fetch device, and a corresponding instruction fetch method, comprising: an instruction packet holding unit that holds instruction packets, each consisting of an instruction payload, obtained by dividing a program's instruction sequence into units of a predetermined size, and an instruction header containing branch prediction information that indicates how likely it is that a branch instruction in that instruction payload causes a branch to an instruction contained in neither that instruction payload nor the next instruction payload; an instruction packet separation unit that separates an instruction packet held in the instruction packet holding unit into the instruction payload and the instruction header; a branch prediction information determination unit that instructs suppression of prefetch of the next instruction packet when it determines, on the basis of the branch prediction information in the instruction header, that a branch instruction in the corresponding instruction payload is likely to branch to an instruction contained in neither that instruction payload nor the next instruction payload; and an instruction prefetch unit that prefetches the next instruction packet unless prefetch suppression is instructed. This makes it possible to control, on the basis of the branch prediction information, whether the next instruction packet is prefetched.
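The units of the first aspect can be sketched as plain functions. This is a minimal model, not the claimed hardware: the header is reduced to a single `branch_prediction_flag`, payloads hold placeholder mnemonics, and the names `InstructionPacket`, `separate`, `prefetch_suppressed`, and `prefetch_requests` are hypothetical. For each executed packet it reports which packet the prefetch unit would request next, or `None` when prefetch is suppressed.

```python
from dataclasses import dataclass, field


@dataclass
class InstructionPacket:
    branch_prediction_flag: int                  # header field: 1 = likely far branch
    payload: list = field(default_factory=list)  # fixed-size instruction chunk


def separate(packet):
    """Instruction packet separation unit: split into (header, payload)."""
    header = {"branch_prediction_flag": packet.branch_prediction_flag}
    return header, packet.payload


def prefetch_suppressed(header) -> bool:
    """Branch prediction information determination unit."""
    return header["branch_prediction_flag"] == 1


def prefetch_requests(memory, executed):
    """Return the packet index prefetched after each executed packet, or None."""
    requests = []
    for i in executed:
        header, _payload = separate(memory[i])
        if prefetch_suppressed(header) or i + 1 >= len(memory):
            requests.append(None)        # suppressed, or no next packet exists
        else:
            requests.append(i + 1)       # instruction prefetch unit: next packet
    return requests


# Packet 1 closes a loop that jumps back to packet 0, so its flag is 1.
memory = [
    InstructionPacket(0, ["ld r1,[r2]", "add r1,r1,1", "st r1,[r2]", "add r2,r2,4"]),
    InstructionPacket(1, ["sub r3,r3,1", "nop", "nop", "bnz r3,packet0"]),
    InstructionPacket(0, ["ret", "nop", "nop", "nop"]),
]
print(prefetch_requests(memory, executed=[0, 1, 0, 1, 2]))
```

While the loop iterates, no useless prefetch of packet 2 is issued; the cost is that the final fall-through into packet 2 is not prefetched.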

In the first aspect, the device may further comprise an instruction execution unit that, on the basis of an instruction dictionary reference instruction contained in the instruction payload separated by the instruction packet separation unit, reads the instruction sequence corresponding to that instruction dictionary reference instruction from an instruction dictionary table and expands it. This allows an instruction sequence compressed by means of the instruction dictionary table to be expanded.
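Expansion of dictionary reference instructions can be sketched as a substitution pass over a payload. The encoding is an assumption for illustration only: a reference instruction is modeled as the tuple `("DICREF", n)` (a hypothetical name), and the dictionary table as a mapping from entry number to an instruction list.

```python
def expand(payload, dictionary):
    """Replace each ("DICREF", n) with the instruction sequence in entry n."""
    out = []
    for insn in payload:
        if isinstance(insn, tuple) and insn[0] == "DICREF":
            out.extend(dictionary[insn[1]])  # read the macro from the table
        else:
            out.append(insn)                 # ordinary instruction passes through
    return out


dictionary = {0: ["ld r1,[r2]", "add r1,r1,r3", "st r1,[r2]"]}
print(expand(["mov r2,r4", ("DICREF", 0), "b loop"], dictionary))
```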

In the first aspect, the device may further comprise an instruction expansion unit that expands an instruction sequence from the instruction payload corresponding to an instruction header, on the basis of an instruction payload compression flag contained in that header. This allows an instruction payload that has been compressed to be expanded according to the instruction payload compression flag.
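A flag-driven expansion step might look like the sketch below. The document does not name a compression method here, so Python's standard `zlib` is used purely as a stand-in codec; the header key `payload_compressed` and the 4-byte instruction width are also assumptions.

```python
import zlib


def payload_instructions(header, payload: bytes, insn_size: int = 4):
    """Return fixed-width instruction words from a payload, expanding it first
    when the header's compression flag is set (zlib is only a stand-in codec)."""
    raw = zlib.decompress(payload) if header.get("payload_compressed") else payload
    return [raw[i:i + insn_size] for i in range(0, len(raw), insn_size)]


words = bytes(range(16))  # four 4-byte instruction words (dummy encoding)
plain = payload_instructions({"payload_compressed": 0}, words)
packed = payload_instructions({"payload_compressed": 1}, zlib.compress(words))
print(plain == packed, len(plain))
```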

A second aspect of the present invention is a processor comprising: an instruction packet holding unit that holds instruction packets, each consisting of an instruction payload, obtained by dividing a program's instruction sequence into units of a predetermined size, and an instruction header containing branch prediction information that indicates how likely it is that a branch instruction in that instruction payload causes a branch to an instruction contained in neither that instruction payload nor the next instruction payload; an instruction packet separation unit that separates an instruction packet held in the instruction packet holding unit into the instruction payload and the instruction header; an instruction execution unit that executes the instruction sequence contained in the instruction payload separated by the instruction packet separation unit; a branch prediction information determination unit that instructs suppression of prefetch of the next instruction packet when it determines, on the basis of the branch prediction information in the instruction header, that a branch instruction in the corresponding instruction payload is likely to branch to an instruction contained in neither that instruction payload nor the next instruction payload; and an instruction prefetch unit that, unless prefetch suppression is instructed, prefetches the next instruction packet and supplies it to the instruction packet separation unit. This makes it possible, in a processor having an instruction execution unit, to control whether the next instruction packet is prefetched on the basis of the branch prediction information.

A third aspect of the present invention is an instruction packet generation device comprising: an instruction packet generation unit that generates instruction packets, each consisting of an instruction header and an instruction payload obtained by dividing a program's instruction sequence into units of a predetermined size; a branch prediction information setting unit that, for each instruction payload, sets in the corresponding instruction header branch prediction information indicating how likely it is that a branch instruction in that instruction payload causes a branch to an instruction contained in neither that instruction payload nor the next instruction payload; and an instruction packet holding unit that holds the instruction packets whose instruction headers contain the branch prediction information thus set. This allows the instruction header to carry branch prediction information used to decide whether to prefetch the next instruction payload.
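Setting the branch prediction information at packet-generation time can be sketched as a static pass over the program. Assumptions, not taken from the document: instructions are `(mnemonic, target_index_or_None)` pairs, and every branch whose static target lies outside the current and next payload is treated as "likely", whereas a real tool might apply a finer likelihood heuristic.

```python
def build_packets(instructions, payload_size):
    """Split (mnemonic, branch_target_index_or_None) pairs into payloads and set
    branch_prediction_flag = 1 when a branch in payload k targets an instruction
    outside payloads k and k+1."""
    packets = []
    for start in range(0, len(instructions), payload_size):
        payload = instructions[start:start + payload_size]
        window_end = start + 2 * payload_size  # exclusive end of payload k+1
        flag = int(any(
            target is not None and not (start <= target < window_end)
            for _, target in payload
        ))
        packets.append({"branch_prediction_flag": flag, "payload": payload})
    return packets


program = [("ld", None), ("add", None), ("cmp", None), ("beq", 6),   # payload 0
           ("st", None), ("add", None), ("sub", None), ("b", 0)]     # payload 1
print([p["branch_prediction_flag"] for p in build_packets(program, 4)])
```

Here `beq` targets index 6 inside the next payload, so payload 0 stays prefetch-friendly, while the backward `b 0` marks payload 1.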

In the third aspect, the device may further comprise an instruction compression unit that, when the branch prediction information of two consecutive instruction packets each indicates that a branch to an instruction contained in neither that packet's instruction payload nor the next instruction payload is likely, compresses the instructions between the two branch instructions contained in those packets so that both branch instructions fit in the same instruction payload. This reduces the number of cases in which the next instruction payload is judged not to be prefetched, thereby improving the instruction issue rate.
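The check that drives this compression can be sketched in two steps: find consecutive flagged packets, then test whether removing some instruction slots between the two branches pulls them into the same aligned payload. The function names and the slot-count model of compression are hypothetical simplifications.

```python
def consecutive_flagged(flags):
    """Indices i where packets i and i+1 both have branch_prediction_flag set."""
    return [i for i in range(len(flags) - 1) if flags[i] and flags[i + 1]]


def same_payload_after_compression(a, b, saved, payload_size):
    """a < b are instruction indices of the two branches; `saved` is how many
    instruction slots compression removes between them (payloads are aligned)."""
    if not 0 <= saved < b - a:
        raise ValueError("can only remove instructions strictly between the branches")
    return a // payload_size == (b - saved) // payload_size


print(consecutive_flagged([0, 1, 1, 0]))
print(same_payload_after_compression(5, 9, 0, 4))  # branches in payloads 1 and 2
print(same_payload_after_compression(5, 9, 2, 4))  # after saving 2 slots
```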

According to the present invention, instruction prefetch can be controlled by using information about branch instructions.

A diagram showing an example pipeline configuration of the processor in the first embodiment of the present invention.
A diagram showing an example block configuration of the processor in the first embodiment.
A diagram showing an example structure of the instruction packet 300 in the first embodiment.
A diagram showing an example field configuration of the instruction header 310 in the first embodiment.
A diagram showing an example of setting the branch prediction flag 311 used in the first embodiment.
A diagram showing an example application of the instruction-dictionary-table reference compression used in the first embodiment.
A diagram showing an example of how the branch prediction flag 311 is changed by instruction-dictionary-table reference compression in the first embodiment.
A diagram showing an example functional configuration for instruction packet generation in the first embodiment.
A diagram showing an example processing procedure for instruction packet generation in the first embodiment.
A diagram showing an example functional configuration for instruction execution in the first embodiment.
A diagram showing an example processing procedure for instruction execution in the first embodiment.
A diagram showing a modification of the field configuration of the instruction header 310 in the first embodiment.
A diagram showing an example relationship between the placement of a branch instruction and the instruction prefetch start position in the second embodiment.
A diagram showing an example configuration using a prefetch start address setting register in the second embodiment.
A diagram showing an example configuration using the instruction prefetch timing field 312 of the instruction header 310 in the second embodiment.
A diagram showing an example configuration in which execution of a predetermined number of instructions is used as the prefetch timing in the second embodiment.
A diagram showing an example in which an instruction type and an execution count are set in the instruction header 310 in the second embodiment.
A diagram showing an example functional configuration for instruction execution in the second embodiment.
A diagram showing an example processing procedure for instruction execution in the second embodiment.
A diagram showing an example functional configuration of program counter addition control processing in the third embodiment.
A diagram showing an example configuration of the addition control register 640 in the third embodiment.
A diagram showing an example of how instructions are processed with a two-way branch in the third embodiment.
A diagram showing an example of how instructions are processed with a multiway branch in the third embodiment.
A diagram showing an example instruction set for setting a value in the addition control register 640 in the third embodiment.
A diagram showing an example in which a value is set in the addition control register 640 by a conditional branch instruction in the third embodiment.
A diagram showing an example in which a value is set in the addition control register 640 by the control register change instruction PCINCMODE in the third embodiment.
A diagram showing an example processing procedure for instruction execution in the third embodiment.
A diagram showing an example pipeline configuration of the processor in the fourth embodiment.
A diagram showing an example block configuration of the processor in the fourth embodiment.
A diagram showing the relationship between branch instructions and cache lines in the fourth embodiment.
A diagram showing one way of changing the instruction placement in the fourth embodiment.
A diagram showing an example functional configuration for instruction placement in the fourth embodiment.
A diagram showing an example processing procedure for instruction placement in the fourth embodiment.
A diagram showing an example of setting the prefetch address register in the fourth embodiment.
A diagram showing an example functional configuration for instruction execution in the fourth embodiment.
A diagram showing an example processing procedure for instruction execution in the fourth embodiment.

Modes for carrying out the present invention (hereinafter referred to as embodiments) are described below in the following order:
1. First embodiment (instruction prefetch suppression control using branch prediction information)
2. Second embodiment (instruction prefetch timing control)
3. Third embodiment (averaging instruction prefetch penalties through mixed placement of instructions)
4. Fourth embodiment (avoiding cache line collisions by fixing the placement of branch-target cache lines)
5. Combinations of the embodiments

<1. First Embodiment>
[Processor Configuration]
FIG. 1 is a diagram showing an example pipeline configuration of the processor according to the first embodiment of the present invention. This example assumes a five-stage pipeline consisting of an instruction fetch stage (IF) 11, an instruction decode stage (ID) 21, a register fetch stage (RF) 31, an execution stage (EX) 41, and a memory access stage (MEM) 51. The stages are separated by latches 19, 29, 39, and 49, and pipeline processing is performed in synchronization with the clock.
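The timing of this five-stage pipeline can be visualized with a small chart generator. This is an idealized sketch that assumes no stalls, so instruction i occupies stage s during clock cycle i + s; stalls caused by branches, which the embodiments aim to reduce, are not modeled.

```python
STAGES = ("IF", "ID", "RF", "EX", "MEM")


def stage_at(i: int, cycle: int):
    """Stage occupied by instruction i in a given cycle (stall-free), or None."""
    s = cycle - i
    return STAGES[s] if 0 <= s < len(STAGES) else None


def total_cycles(n: int) -> int:
    """Cycles needed to retire n instructions with no stalls."""
    return n + len(STAGES) - 1


def chart(n: int) -> str:
    rows = []
    for i in range(n):
        cells = [(stage_at(i, c) or "").ljust(4) for c in range(total_cycles(n))]
        rows.append(f"i{i}: " + "".join(cells).rstrip())
    return "\n".join(rows)


print(chart(3))
```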

In the instruction fetch stage (IF: Instruction Fetch) 11, instruction fetch processing is performed. In this stage, the program counter (PC) 18 is incremented successively by the adder 12, and the instruction indicated by the program counter 18 is supplied to the next stage, the instruction decode stage 21. The instruction fetch stage 11 also includes an instruction cache, described later, and prefetches instructions into it. The next-line prefetch unit 13 prefetches the next line, that is, the cache line that follows the one containing the instruction currently being executed.

In the instruction decode stage (ID: Instruction Decode) 21, instructions supplied from the instruction fetch stage 11 are decoded. The decoded result is supplied to the register fetch stage (RF) 31. For a branch instruction, the branch target address is supplied to the program counter (PC) 18.

In the register fetch stage (RF: Register Fetch) 31, the operands needed to execute the instruction are fetched. In pipelined processors, operand access is often limited to the register file. The operand data obtained in the register fetch stage 31 is supplied to the execution stage (EX) 41.

In the execution stage (EX: EXecute) 41, instructions are executed using the operand data; examples include arithmetic and logical operations and branch decisions. The execution result data obtained in the execution stage 41 is stored in the register file. For a store instruction, data is written to memory in the memory access stage (MEM) 51.

In the memory access stage (MEM: Memory) 51, memory is accessed: a read access for a load instruction and a write access for a store instruction.

FIG. 2 is a diagram showing an example block configuration of the processor according to the first embodiment of the present invention. This processor includes a processor core 110, an instruction cache 120, a data cache 130, a next-line prefetch unit 150, and a packet demultiplexer 160. It further includes a prefetch queue 170, an instruction queue 180, an instruction dictionary index 191, and an instruction dictionary table 192. A system memory 140 is connected to the processor.

The processor core 110 provides the main processor mechanisms other than instruction fetch, and includes a program counter 111, an instruction register 112, an instruction decoder 113, an execution unit 114, and a register file 115. The program counter 111 successively counts the address of the instruction to be executed. The instruction register 112 holds the instruction to be executed, as indicated by the program counter 111. The instruction decoder 113 decodes the instruction held in the instruction register 112. The execution unit 114 executes the instruction decoded by the instruction decoder 113. The register file 115 is a storage area that holds the operands and other values needed for instruction execution in the execution unit 114.

The instruction cache 120 is a cache memory that holds copies of instructions stored in the system memory 140. Because the processor core 110 can access the instruction cache 120 faster than the system memory 140, it is desirable to place instructions in the instruction cache 120 ahead of time whenever possible. When a needed instruction is held in the instruction cache 120, the access is called a hit; when it is not, a miss.
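Hit and miss behavior can be sketched with a direct-mapped cache model. The document does not state the organization of the instruction cache 120; the direct-mapped geometry, line size, and line count here are assumptions chosen to show both a hit and a conflict miss.

```python
class DirectMappedICache:
    """Tag-only model of a direct-mapped instruction cache."""

    def __init__(self, num_lines: int = 64, line_size: int = 32):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # one tag per cache line slot

    def split(self, addr: int):
        """Return (index, tag) for a byte address."""
        line_addr = addr // self.line_size
        return line_addr % self.num_lines, line_addr // self.num_lines

    def access(self, addr: int) -> bool:
        """True on a hit; on a miss, fill the slot as if from system memory."""
        index, tag = self.split(addr)
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag
        return False


cache = DirectMappedICache(num_lines=4, line_size=16)
print([cache.access(a) for a in (0, 4, 64, 0)])
```

Address 4 hits because it shares a line with address 0, while address 64 maps to the same slot and evicts that line, so the final access to 0 misses again.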

The data cache 130 is a cache memory that holds copies of data stored in the system memory 140. Because the processor core 110 can access the data cache 130 faster than the system memory 140, it is desirable to place data in the data cache 130 ahead of time whenever possible. As with the instruction cache 120, the access is called a hit when the needed data is held in the data cache 130, and a miss when it is not. Unlike the instruction cache 120, the data cache 130 is also used for write accesses.

The next-line prefetch unit 150 prefetches the next line, that is, the following cache line, from the system memory 140 into the instruction cache 120 as instructions that are expected to be needed. It corresponds to the next-line prefetch unit 13 in the pipeline configuration and belongs to the instruction fetch stage (IF) 11. The next-line prefetch unit 150 monitors the state of the program counter 111 and issues prefetch requests for cache lines of the instruction cache 120 to the system memory 140 at appropriate times.

The packet demultiplexer 160 separates an instruction packet read from the system memory 140 into an instruction header and an instruction payload. The structure of the instruction packet is described later; the instruction cache lines are contained in the instruction payload.

The prefetch queue 170 is a queue that holds the instruction cache lines contained in the instruction payload. The cache lines held in the prefetch queue 170 are stored into the instruction cache 120 in order, starting from the head of the queue.

The instruction queue 180 is a queue that holds the instruction cache lines read from the instruction cache 120 according to the program counter 111.

The instruction dictionary index 191 and the instruction dictionary table 192 implement compressed instructions of the instruction-dictionary-table-reference type. When a frequently occurring sequence of instructions (an instruction macro) first appears, it is registered by an instruction dictionary registration instruction; each later occurrence is replaced by a single instruction dictionary reference instruction. The instruction dictionary table 192 holds the registered instruction macros, and the instruction dictionary index 191 serves as the index used to access the instruction dictionary table 192. How these compressed instructions are used is described later.

The system memory 140 stores the instructions to be executed and the data needed to execute them. The processor core 110 requests read or write accesses to the system memory 140, but no request actually reaches the system memory 140 as long as the access hits in the instruction cache 120 or the data cache 130. The system memory 140 is an example of the instruction packet holding unit recited in the claims.

In this block configuration example, the program counter 111, the instruction cache 120, the next-line prefetch unit 150, the packet demultiplexer 160, the prefetch queue 170, and the instruction queue 180 belong to the instruction fetch stage (IF) 11 in FIG. 1. The instruction register 112, the instruction dictionary index 191, and the instruction dictionary table 192 can also be regarded as part of the instruction fetch stage (IF) 11. Likewise, the instruction decoder belongs to the instruction decode stage (ID) 21, the register file 115 belongs to the register fetch stage (RF) 31, the execution unit 114 belongs to the execution stage (EX) 41, and the data cache 130 and the system memory 140 belong to the memory access stage (MEM) 51.

[Instruction packet structure]
FIG. 3 is a diagram illustrating an example structure of an instruction packet 300 according to the first embodiment of the present invention. The instruction packet 300 consists of an instruction header 310 and an instruction payload 320. The instruction payload 320 is an area that stores one or more instruction cache lines; in this example it stores n instruction cache lines of 128 bytes each (n is an integer of 1 or more). The instruction header 310 is a header attached to the instruction payload 320 and holds information about that payload.

FIG. 4 is a diagram illustrating an example field configuration of the instruction header 310 according to the first embodiment of the present invention. This first configuration example of the instruction header 310 has five fields: a branch prediction flag 311, an instruction prefetch timing 312, an instruction payload compression flag 313, an instruction payload length 314, and a prefetch setting 315. The instruction header 310 is assumed to be 32 bits wide. Counting from the LSB, bit 0 holds the branch prediction flag 311, bits 1 and 2 hold the instruction prefetch timing 312, bit 3 holds the instruction payload compression flag 313, bits 4 to 7 hold the instruction payload length 314, and bits 8 to 11 hold the prefetch setting 315. The remaining 20 bits, bits 12 to 31, form an unused area 316 that can be put to other uses, as described later.
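The FIG. 4 layout maps directly onto shift-and-mask operations. The following is a minimal sketch that packs and unpacks a 32-bit header word using exactly the bit positions listed above; the class and function names are hypothetical, and the code treats the header as a plain integer without committing to any byte order.

```python
from dataclasses import dataclass


@dataclass
class InstructionHeader:
    branch_prediction: int   # bit 0      (branch prediction flag 311)
    prefetch_timing: int     # bits 1-2   (instruction prefetch timing 312)
    payload_compressed: int  # bit 3      (instruction payload compression flag 313)
    payload_length: int      # bits 4-7   (instruction payload length 314, in cache lines)
    prefetch_setting: int    # bits 8-11  (prefetch setting 315)
    unused: int = 0          # bits 12-31 (unused area 316)


# (shift, width) of each field, counted from the LSB as in FIG. 4
FIELDS = {
    "branch_prediction": (0, 1),
    "prefetch_timing": (1, 2),
    "payload_compressed": (3, 1),
    "payload_length": (4, 4),
    "prefetch_setting": (8, 4),
    "unused": (12, 20),
}


def pack_header(h: InstructionHeader) -> int:
    """Combine the fields into one 32-bit header word."""
    word = 0
    for name, (shift, width) in FIELDS.items():
        value = getattr(h, name)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word


def unpack_header(word: int) -> InstructionHeader:
    """Extract every field from a 32-bit header word."""
    return InstructionHeader(**{
        name: (word >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    })
```

One consequence of this layout is that a 4-bit payload length field can describe at most 15 cache lines per payload.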

The branch prediction flag 311 is a field indicating that the instruction payload 320 contains a branch instruction and that this branch is likely to go to a destination outside both the current instruction payload 320 and the next instruction payload. In other words, the branch prediction flag 311 is set to, for example, “1” when a next-line prefetch performed as usual would probably be wasted, and to “0” otherwise. The branch prediction flag 311 is an example of the branch prediction information recited in the claims.

The instruction prefetch timing 312 is a field indicating when an instruction prefetch is to be executed. It is described in the second embodiment. The instruction prefetch timing 312 is an example of the prefetch timing information recited in the claims.

The instruction payload compression flag 313 is a field indicating whether lossless compression has been applied to the instruction payload 320. Lossless compression is reversible compression that loses no data; here it compresses the entire bit string of the instruction payload 320. Widely known lossless compression methods include Huffman coding, arithmetic coding, and LZ coding. An instruction payload 320 that has been losslessly compressed cannot be decoded as instructions until it is decompressed, so when the instruction payload compression flag 313 indicates “1”, the payload is first decompressed and then decoded. Losslessly compressing a single instruction cache line is ineffective because it does not reduce the amount of data fetched; coding efficiency improves only for reasonably long bit strings. Furthermore, when the payload contains branch instructions, the instruction packets must be split at basic-block boundaries.

The instruction payload length 314 is a field indicating the size of the instruction payload 320. The size can be expressed, for example, in units of instruction cache lines. In the example above, the instruction payload 320 stores n instruction cache lines of 128 bytes each, so the value n is set in the instruction payload length 314.
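Because the payload length is expressed in cache lines, a demultiplexer can find packet boundaries in a stream without any other framing. The sketch below assumes, purely for illustration, that packets are stored back to back, that each header is a little-endian 32-bit word, and that payloads are uncompressed (how the length field relates to a losslessly compressed payload is not specified here). The function names are hypothetical.

```python
import struct

LINE_SIZE = 128   # bytes per instruction cache line (FIG. 3 example)
HEADER_SIZE = 4   # 32-bit instruction header


def payload_lines(header_word: int) -> int:
    """Instruction payload length 314: bits 4-7 of the header, in cache lines."""
    return (header_word >> 4) & 0xF


def demultiplex(stream: bytes):
    """Split consecutive packets into (header_word, payload) pairs."""
    packets = []
    offset = 0
    while offset < len(stream):
        (header_word,) = struct.unpack_from("<I", stream, offset)  # assumed byte order
        offset += HEADER_SIZE
        size = payload_lines(header_word) * LINE_SIZE
        payload = stream[offset:offset + size]
        if len(payload) != size:
            raise ValueError("truncated instruction payload")
        packets.append((header_word, payload))
        offset += size
    return packets
```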

The prefetch setting 315 is a field for presetting the address to be prefetched. It is described in the fourth embodiment.

[Branch prediction flag]
FIG. 5 is a diagram illustrating an example setting of the branch prediction flag 311 used in the first embodiment of the present invention. In this example, the instruction payload of instruction packet #1 contains a branch instruction $1, and instruction packets #2 and #3 contain no branch instructions. The branch destination of branch instruction $1 is an instruction address in the instruction payload of instruction packet #3, and the branch is predicted to be taken with high probability. Therefore, the branch prediction flag 311 in the instruction header of instruction packet #1 is set to “1”. Because instruction packets #2 and #3 contain no branch instructions, the branch prediction flags 311 in their instruction headers are set to “0”. As described later, the branch prediction flag 311 is assumed to be set statically at compile time, based on a profile or similar information. From the viewpoint of instruction packet #1, instruction packet #2 contains the next line and instruction packet #3 contains the branch destination line.

The branch prediction flag 311 set in this manner is referred to during instruction prefetching; when it is set to “1”, prefetching of the next cache line is cancelled. This avoids instruction prefetches that are expected to be wasted.

On the other hand, if the branch prediction flag 311 is set to “1” in several consecutive packets, no instruction prefetch is performed across them, and the instruction prefetch mechanism may not be used effectively. To keep the branch prediction flag 311 from being set to “1” in consecutive packets, the instructions between branch instructions are compressed by instruction-dictionary-table-reference compression. This compression is separate from the lossless compression associated with the instruction payload compression flag 313.

[Instruction dictionary table reference compression]
FIG. 6 is a diagram illustrating an application example of the instruction-dictionary-table-reference compression used in the first embodiment of the present invention. The uncompressed code on the left of the figure contains uncompressed instruction sequences 331 to 335. Instruction sequences 331, 332, and 335 are assumed to be identical code, as are instruction sequences 333 and 334.

In the compressed code in the center of the figure, an instruction dictionary registration instruction %1 is placed immediately after instruction sequence 331. It registers the contents of instruction sequence 331 in area %1 (351) of the instruction dictionary table 192. Later, when instruction dictionary reference instruction %1 (342) is executed, area %1 (351) of the instruction dictionary table 192 is referenced, and content equivalent to instruction sequence 332 is expanded and supplied to the instruction queue 180.

Similarly, in the compressed code, an instruction dictionary registration instruction %2 is placed immediately after instruction sequence 333, registering the contents of instruction sequence 333 in area %2 (352) of the instruction dictionary table 192. Later, when instruction dictionary reference instruction %2 (344) is executed, area %2 (352) is referenced, and content equivalent to instruction sequence 334 is expanded and supplied to the instruction queue 180.

When instruction dictionary reference instruction %1 (345) is then executed, area %1 (351) of the instruction dictionary table 192 is referenced again, and content equivalent to instruction sequence 335 is expanded and supplied to the instruction queue 180.

In this way, the instruction dictionary table 192 makes it possible to compress instruction sequences. This can in turn be used to change the setting of the branch prediction flag 311, as follows.

FIG. 7 is a diagram illustrating an example of how instruction-dictionary-table-reference compression changes the branch prediction flag 311 according to the first embodiment of the present invention. When the branch prediction flag 311 is set to “1” in both instruction packets #1 and #2, as on the left of the figure, instruction prefetching is suppressed for two packets in a row. Instruction compression using the instruction dictionary table 192 described above is therefore applied in an attempt to eliminate the consecutive “1” settings of the branch prediction flag 311.

That is, as on the right of the figure, compressing the instructions between branch instructions $1 and $2 with the instruction dictionary table 192 moves branch instruction $2, previously in instruction packet #2, into instruction packet #1'. Because branch instruction $2 is no longer in the second packet, the branch prediction flag 311 of instruction packet #2' can be set to “0”.

In general, an instruction-dictionary-table-reference compressed instruction may take more cycles to decode than an ordinary instruction, so applying it to all instructions could actually degrade processing performance. When frequently occurring instruction macros exist, however, it achieves a high compression ratio and is effective.

[Instruction packet generation processing]
FIG. 8 is a diagram illustrating an example functional configuration for generating instruction packets according to the first embodiment of the present invention. This example includes a program holding unit 411, a branch profile holding unit 412, an instruction packet generation unit 420, a branch prediction flag setting unit 430, an instruction compression unit 440, and an instruction packet holding unit 413. Instruction packets are best generated at compile time or link time. When dynamic linking is performed under a relocatable OS, they can also be generated at run time.

The program holding unit 411 holds the program for which instruction packets are to be generated. The branch profile holding unit 412 holds a branch profile of the branch instructions contained in that program. The branch profile is obtained in advance by analyzing or executing the program. For an unconditional branch instruction, whether the branch is taken can often be determined by analyzing the program. For a conditional branch instruction, the branch probability can be determined statistically by executing the program.
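For the conditional-branch case, the statistical profile can be as simple as a taken ratio per branch. A minimal sketch, assuming a hypothetical trace format of `(branch_address, taken)` records collected from a profiling run:

```python
from collections import Counter


def branch_profile(trace):
    """Estimate each branch's taken probability from (branch_addr, taken) records."""
    executed = Counter()
    taken = Counter()
    for addr, was_taken in trace:
        executed[addr] += 1
        if was_taken:
            taken[addr] += 1
    return {addr: taken[addr] / executed[addr] for addr in executed}
```

A branch at 0x40 taken in three of four executions gets probability 0.75; one that is never taken gets 0.0.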

The instruction packet generation unit 420 divides the program held in the program holding unit 411 into fixed-size pieces to generate instruction payloads 320, and attaches an instruction header 310 to each to generate instruction packets 300. As described above, each instruction payload 320 can be assumed to store n instruction cache lines of 128 bytes each.

The branch prediction flag setting unit 430 sets the branch prediction flag 311 in each instruction header 310 generated by the instruction packet generation unit 420. Referring to the branch profile held in the branch profile holding unit 412, it predicts the branch destination and branch probability of each branch instruction contained in the instruction payload 320 and sets the branch prediction flag 311 accordingly. If the instruction payload 320 contains a branch instruction that is likely to go to a destination outside both that instruction payload 320 and the next instruction payload, the branch prediction flag 311 is set to “1”; otherwise it is set to “0”. The branch prediction flag setting unit 430 is an example of the branch prediction information setting unit recited in the claims.
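The setting rule reduces to a window test on the branch target. The sketch below is illustrative only: it assumes packets laid out contiguously from address 0 with equal payload sizes, and it uses a hypothetical probability threshold of 0.5 for "likely", which the document does not quantify.

```python
def branch_prediction_flag(packet_index, payload_bytes, branches, threshold=0.5):
    """Return 1 if a likely-taken branch leaves the current and next payloads, else 0.

    branches: (target_address, taken_probability) for each branch in this payload.
    Packet k is assumed to cover addresses [k * payload_bytes, (k + 1) * payload_bytes).
    threshold: hypothetical cut-off for "high possibility" of the branch being taken.
    """
    for target, prob in branches:
        target_packet = target // payload_bytes
        leaves_window = target_packet not in (packet_index, packet_index + 1)
        if leaves_window and prob >= threshold:
            return 1
    return 0
```

With 128-byte payloads, a likely branch from packet 1 to address 384 (packet 3) yields “1”, as in FIG. 5, whereas a branch to address 300 (packet 2, the next line) yields “0”.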

The instruction compression unit 440 compresses the instructions contained in the instruction payload 320. For compression using the instruction dictionary table 192, it detects frequently occurring instruction macros, registers each macro with an instruction dictionary registration instruction at its first occurrence, and replaces each later occurrence with a single instruction dictionary reference instruction. If this changes the placement of branch instructions, the branch prediction flags 311 are set again. When lossless compression is applied to an entire instruction payload 320, the instruction payload compression flag 313 in the instruction header 310 is set to “1”.

The instruction packet holding unit 413 holds the instruction packets 300 output from the instruction compression unit 440.

FIG. 9 is a diagram illustrating an example processing procedure for generating instruction packets according to the first embodiment of the present invention.

First, the instruction packet generation unit 420 divides the program held in the program holding unit 411 into fixed-size instruction payloads 320 and attaches an instruction header 310 to each, generating instruction packets 300 (step S911). Next, the branch prediction flag setting unit 430 determines whether the instruction payload 320 contains a branch instruction that is likely to go to a destination outside both that payload and the next instruction payload (step S912). If such a branch is judged likely, the branch prediction flag 311 is set to “1” (step S913); otherwise it is set to “0”.

If the branch prediction flag 311 is set to “1” in consecutive instruction packets 300 (step S914), the instruction compression unit 440 compresses the instructions in the instruction payload 320 using the instruction dictionary table 192 (step S915). Lossless compression can also be applied to the entire instruction payload 320, in which case the instruction payload compression flag 313 in the instruction header 310 is set to “1”.
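The check in step S914 only needs the sequence of flags in packet order. A minimal sketch (the function name is hypothetical) that reports each position where a packet and its successor are both flagged, i.e. the places where dictionary compression would be attempted:

```python
def consecutive_flagged_pairs(flags):
    """Indices i where packets i and i + 1 both have branch prediction flag 1 (step S914)."""
    return [i for i in range(len(flags) - 1) if flags[i] == 1 and flags[i + 1] == 1]
```

For the flags `[0, 1, 1, 0, 1, 1, 1]` this returns `[1, 4, 5]`; an empty result means no compression is needed for this purpose.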

[Instruction execution processing]
FIG. 10 is a diagram illustrating an example functional configuration for instruction execution according to the first embodiment of the present invention. This example includes the instruction packet holding unit 413, an instruction packet separation unit 450, a branch prediction flag determination unit 460, an instruction prefetch unit 470, an instruction decompression unit 480, and an instruction execution unit 490.

The instruction packet separation unit 450 separates each instruction packet 300 held in the instruction packet holding unit 413 into an instruction header 310 and an instruction payload 320.

The branch prediction flag determination unit 460 refers to the branch prediction flag 311 in the instruction header 310 and determines whether the next cache line should be prefetched into the instruction cache 120. When it determines that a prefetch should be performed, it requests an instruction prefetch from the instruction prefetch unit 470. The branch prediction flag determination unit 460 is an example of the branch prediction information determination unit recited in the claims.

When the branch prediction flag determination unit 460 requests an instruction prefetch, the instruction prefetch unit 470 issues a request for the next cache line to the system memory 140. The prefetched instructions are held in the instruction cache 120 and, if the instruction flow does not change, are supplied to the instruction execution unit 490.

When the instruction payload compression flag 313 in the instruction header 310 is set to “1”, the instruction decompression unit 480 decompresses the losslessly compressed instruction payload 320 to obtain a decodable instruction sequence. When the flag is not set to “1”, the instruction decompression unit 480 outputs the instructions in the instruction payload 320 unchanged.
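The document lists Huffman, arithmetic, and LZ coding without fixing a format. As a stand-in, the sketch below uses DEFLATE (LZ77 plus Huffman coding) via Python's standard `zlib` module; that choice is an assumption for illustration. Flag 313 is read from bit 3 of the header as laid out in FIG. 4.

```python
import zlib


def payload_compressed(header_word: int) -> bool:
    """Instruction payload compression flag 313: bit 3 of the header (FIG. 4)."""
    return bool((header_word >> 3) & 1)


def decompress_payload(header_word: int, payload: bytes) -> bytes:
    """Return a decodable instruction sequence, inflating only if flag 313 is set."""
    if payload_compressed(header_word):
        return zlib.decompress(payload)  # DEFLATE stands in for the unspecified codec
    return payload                       # uncompressed payloads pass through unchanged
```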

The instruction execution unit 490 executes the instruction sequence output by the instruction decompression unit 480. Instruction sequences compressed by instruction-dictionary-table reference are expanded instruction by instruction as the instruction dictionary registration and reference instructions are executed. Losslessly compressed payloads, by contrast, cannot be decoded as they are, so they are decompressed beforehand by the instruction decompression unit 480.

FIG. 11 is a diagram illustrating an example processing procedure for instruction execution according to the first embodiment of the present invention.

First, the instruction packet separation unit 450 separates an instruction packet 300 held in the instruction packet holding unit 413 into an instruction header 310 and an instruction payload 320 (step S921). The branch prediction flag determination unit 460 then examines the branch prediction flag 311 in the instruction header 310 (step S922). If the branch prediction flag 311 is “1”, instruction prefetching is suppressed (step S923); if it is “0”, the instruction prefetch unit 470 performs an instruction prefetch (step S924).
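Steps S922 to S924 amount to a single bit test per packet. The following sketch lists the next-line prefetch requests that would be issued for a run of packets, assuming (hypothetically) that packets are laid out contiguously from `base_addr`, each payload spans `payload_lines` lines of 128 bytes, and the flag sits in bit 0 as in FIG. 4.

```python
LINE_SIZE = 128  # bytes per cache line (assumed, as in the FIG. 3 example)


def prefetch_schedule(headers, base_addr=0, payload_lines=1):
    """For each packet i, request the following packet's address unless flag 311 is set."""
    requests = []
    payload_bytes = payload_lines * LINE_SIZE
    for i, header_word in enumerate(headers):
        if header_word & 1:      # S922/S923: branch prediction flag is "1", suppress
            continue
        requests.append(base_addr + (i + 1) * payload_bytes)  # S924: next-line prefetch
    return requests
```

With the flags of FIG. 5 (`[1, 0, 0]`), the prefetch after the first packet is skipped and only the later two requests are issued.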

If the instruction payload compression flag 313 in the instruction header 310 is set to “1” (step S925), the instruction decompression unit 480 decompresses the losslessly compressed instruction payload 320 (step S926).

The resulting instructions are then executed by the instruction execution unit 490 (step S927). For instruction sequences compressed by instruction-dictionary-table reference, each instruction is expanded as the instruction execution unit 490 executes the instruction dictionary registration and reference instructions.

Step S921 is an example of the instruction packet separation procedure recited in the claims, step S922 is an example of the branch prediction information determination procedure, and steps S923 and S924 are an example of the instruction prefetch procedure.

As described above, according to the first embodiment of the present invention, setting the branch prediction flag 311 in advance makes it possible to suppress wasteful instruction prefetches.

[Modification]
FIG. 12 is a diagram illustrating a modification of the field configuration of the instruction header 310 according to the first embodiment of the present invention. In the field configuration of FIG. 4, the 20 bits from bit 12 to bit 31 form the unused area 316; in this modification, that 20-bit area 317 holds the first instruction of the instruction payload. The first embodiment assumes an instruction set of 32-bit instructions, so the first instruction is shortened to a 20-bit form, for example by dropping unused parts of the instruction fields or reducing the number of operands, and is embedded in area 317. Because the first instruction is embedded in area 317, the instruction payload 320 shrinks by one instruction, that is, by 32 bits.
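Embedding the shortened instruction is a matter of placing 20 bits above the 12 bits of header fields. In the sketch below the shortened instruction is treated as an opaque 20-bit value, since the ISA-specific mapping from a 32-bit instruction to its 20-bit form is not described here; the function names are hypothetical.

```python
def embed_first_instruction(header_word: int, short_insn: int) -> int:
    """Place a 20-bit shortened first instruction into bits 12-31 (area 317)."""
    if not 0 <= short_insn < (1 << 20):
        raise ValueError("shortened instruction must fit in 20 bits")
    return (header_word & 0xFFF) | (short_insn << 12)  # keep fields 311-315 in bits 0-11


def extract_first_instruction(header_word: int) -> int:
    """Recover the 20-bit shortened first instruction from area 317."""
    return (header_word >> 12) & 0xFFFFF
```

The payload then carries one 32-bit instruction fewer, saving 4 bytes per packet.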

Although the first instruction is shortened to 20 bits here, the bit width of the shortened instruction is not limited to this and can be chosen as appropriate given the other fields.

<2. Second Embodiment>
The first embodiment described above assumes that the program is managed in instruction packets; the second embodiment does not necessarily require such management. Instruction prefetch control that does not rely on instruction packets is therefore described first, followed by instruction prefetch control that uses instruction packets. In the second embodiment, the pipeline configuration and the block configuration are the same as in the first embodiment, so their description is omitted.

[Placement of branch instructions and instruction prefetch start position]
FIG. 13 is a diagram illustrating an example of the relationship between the placement of branch instructions and the instruction prefetch start position in the second embodiment of the present invention. The branch destination of the branch instruction $1 in the cache line #1 is contained in the cache line #3. Therefore, if executing the branch instruction $1 results in a taken branch, prefetching the cache line #2 as the next line after the cache line #1 is wasted.

Here, if the prefetch of the cache line #2 starts at the prefetch start position A, the outcome of the branch instruction $1 is still unknown at that point, so the prefetch of the cache line #2 may turn out to be wasted. If, on the other hand, the prefetch of the cache line #2 starts at the prefetch start position B, the outcome of the branch instruction $1 is already known at that point, and a useless prefetch of the cache line #2 can be suppressed.

Thus, the prefetch start position determines whether next-line prefetch can be suppressed. As the above example shows, the later the prefetch starts, the more likely it is that the outcome of the branch instruction is already known, which favors prefetch suppression. On the other hand, if the prefetch starts too late, it may not complete in time and the instruction pipeline may stall waiting for instructions. The second embodiment of the present invention therefore provides a mechanism that performs instruction prefetch at an arbitrary, preset timing.

[Setting the timing in a prefetch start address setting register]
FIG. 14 is a diagram illustrating a configuration example that uses a prefetch start address setting register in the second embodiment of the present invention. As shown in FIG. 14(a), in this configuration example the next-line prefetch unit 150 includes a prefetch start address setting register 153 and an address comparison unit 154.

The prefetch start address setting register 153 holds the address within each cache line at which next-line prefetch is started. A relative address within the cache line is sufficient for this register. The address is assumed to be determined at compile time, based on factors such as the frequency of branch instructions in the program. The prefetch start address setting register 153 is an example of the address setting register recited in the claims.

The address comparison unit 154 compares the address set in the prefetch start address setting register 153 with the contents of the program counter 111. When a match is detected on the relative address within the cache line, the address comparison unit 154 issues a next-line prefetch request.

With this configuration, a prefetch start address at any position within the cache line can be set in the prefetch start address setting register 153, and the address comparison unit 154 detects when execution reaches it.

FIG. 14(b) shows an example of specific set addresses. Suppose that about four prefetch start positions are provided per cache line. With a 128-byte cache line divided into 32-byte segments, the positions at the top (byte 0), byte 32, byte 64 (the center), and byte 96 can be set. Assuming an instruction set of 4-byte (32-bit) instructions, the lowest 2 bits of the binary instruction address can be ignored. In this case, therefore, the address comparison unit 154 only needs to compare the 5 bits from the 3rd to the 7th least-significant bit (bits 2 to 6, counting from bit 0).
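As a minimal sketch of the comparison in FIG. 14(b), the following Python code derives the 5-bit comparison mask for a 128-byte line and 4-byte instructions and checks the PC against the register value. The function name and the representation of the PC as a byte address are assumptions made for illustration.

```python
LINE_BYTES = 128  # cache line size in the FIG. 14(b) example
INSN_BYTES = 4    # 32-bit instructions, so address bits 0-1 are ignored

# Keep only bits 2-6: the word offset within a 128-byte line.
COMPARE_MASK = (LINE_BYTES - 1) & ~(INSN_BYTES - 1)  # 0b1111100 == 0x7C

def next_line_prefetch_request(pc: int, start_offset: int) -> bool:
    """Model of the address comparison unit 154: True when the PC's relative
    address within its cache line equals the value in register 153."""
    return (pc & COMPARE_MASK) == (start_offset & COMPARE_MASK)
```

Because only bits 2 to 6 are compared, a match occurs at the same relative position in every cache line; for a start offset of 64, both 0x1040 and 0x10C0 match.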

[Use of the instruction header]
FIG. 15 is a diagram illustrating a configuration example that uses the instruction prefetch timing field 312 of the instruction header 310 in the second embodiment of the present invention. This configuration example presupposes the instruction packets described in the first embodiment and uses the instruction prefetch timing field 312 of the instruction header 310. In addition to the prefetch start address setting register 153 and the address comparison unit 154 of FIG. 14(a), the next-line prefetch unit 150 includes a setting step address register 151 and a multiplication unit 152.

The setting step address register 151 holds, as a step value, the granularity with which the prefetch start address can be set. For example, when the step value is 32 bytes and the prefetch start is set to the top (byte 0), byte 32, byte 64, or byte 96 of the cache line, as in the above example, the setting step address register 151 holds "32".

The multiplication unit 152 multiplies the value of the instruction prefetch timing field 312 by the step value held in the setting step address register 151. As described above, the instruction prefetch timing field 312 is only 2 bits wide; to compensate, the field holds a step count, which is multiplied by the step value in the setting step address register 151. Accordingly, the instruction prefetch timing field 312 of the instruction header 310 is set to "00" for the top of the cache line (byte 0), "01" for byte 32, "10" for byte 64, and "11" for byte 96. The product from the multiplication unit 152 is held in the prefetch start address setting register 153.

The rest of the configuration is the same as in FIG. 14(a): the address comparison unit 154 compares the address held in the prefetch start address setting register 153 with the contents of the program counter 111, and issues a next-line prefetch request when a match is detected on the relative address within the cache line.

To simplify the multiplication in the multiplication unit 152 and the address comparison in the address comparison unit 154, the step value is preferably a power of 2.
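The role of the multiplication unit 152, and why a power-of-2 step helps, can be sketched as follows. This is an illustrative model rather than the circuit: the function names are hypothetical, and the step value is passed as a plain integer standing in for the setting step address register 151.

```python
def prefetch_start_offset(timing_field: int, step_bytes: int) -> int:
    """Multiplication unit 152: 2-bit step count times the step value (register 151)."""
    if not 0 <= timing_field <= 0b11:
        raise ValueError("the instruction prefetch timing field is 2 bits wide")
    return timing_field * step_bytes

def prefetch_start_offset_pow2(timing_field: int, step_log2: int) -> int:
    """Same product when the step value is 2**step_log2: the multiply becomes a shift."""
    if not 0 <= timing_field <= 0b11:
        raise ValueError("the instruction prefetch timing field is 2 bits wide")
    return timing_field << step_log2
```

With a step of 32 (2 to the 5th power), the field values "00" to "11" yield offsets 0, 32, 64, and 96, matching the encoding described above, and the shift form produces the same results without a multiplier.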

With this configuration, the prefetch start address can be set in the prefetch start address setting register 153 by means of the instruction prefetch timing field 312 of the instruction header 310.

[Using a predetermined number of instruction executions as the prefetch timing]
FIG. 16 is a diagram illustrating a configuration example in the second embodiment of the present invention in which the execution of instructions a predetermined number of times is used as the prefetch timing. In the configuration examples of FIGS. 14 and 15, a fixed position within the cache line is set as the prefetch timing; in this configuration example, the prefetch timing is instead the point at which instructions of a specific instruction type have been executed a predetermined number of times. In this configuration example, the next-line prefetch unit 150 includes an instruction type setting register 155, an execution count setting register 156, an instruction type comparison unit 157, an execution count counter 158, and an execution count comparison unit 159.

The instruction type setting register 155 sets the instruction type of the instructions whose executions are counted. Possible instruction types include instructions with relatively long latency, such as division and load instructions, and branch instructions. For a long-latency instruction, delaying the subsequent instructions slightly has no effect on overall execution. For a branch instruction, as described with reference to FIG. 13, it may be better to wait for the branch instruction to execute so that the subsequent instructions are known.

The execution count setting register 156 sets the number of times instructions of the instruction type set in the instruction type setting register 155 are to be executed. When instructions of that type have been executed the number of times set in the execution count setting register 156, a next-line prefetch request is issued.

The instruction type and the execution count can be determined statically at compile time, or dynamically at run time, based on the appearance frequencies contained in profile data.

The instruction type comparison unit 157 compares the instruction type of the instruction held in the instruction register 112 with the instruction type set in the instruction type setting register 155 and detects a match. Each time the instruction type comparison unit 157 detects a match, it outputs a count trigger to the execution count counter 158.

The execution count counter 158 counts the number of executions of instructions of the instruction type set in the instruction type setting register 155. The execution count counter 158 includes an adder 1581 and a count value register 1582. The adder 1581 adds "1" to the value of the count value register 1582. The count value register 1582 holds the count value of the execution count counter 158 and latches the output of the adder 1581 each time the instruction type comparison unit 157 outputs a count trigger. The number of executions is counted in this way.

The execution count comparison unit 159 compares the value of the count value register 1582 with the value of the execution count setting register 156 and detects a match. When the execution count comparison unit 159 detects a match, a next-line prefetch request is issued.

A plurality of pairs of the instruction type setting register 155 and the execution count setting register 156 can be provided, in which case a separate execution count counter 158 must be provided for each pair. A next-line prefetch request is then issued when a match is detected for any of the pairs.
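The counting path of FIG. 16, including the variant with several register pairs, might be modeled as below. The string instruction types and the class and function names are hypothetical, and because the text does not say when the count value register 1582 is cleared, no reset is modeled.

```python
from dataclasses import dataclass

@dataclass
class CountTrigger:
    insn_type: str      # instruction type setting register 155
    target_count: int   # execution count setting register 156
    count: int = 0      # count value register 1582 (one counter 158 per pair)

def observe(triggers: list[CountTrigger], insn_type: str) -> bool:
    """Process one executed instruction; return True if a next-line prefetch request is issued."""
    request = False
    for t in triggers:
        if insn_type == t.insn_type:       # instruction type comparison unit 157
            t.count += 1                   # adder 1581 updates register 1582
            if t.count == t.target_count:  # execution count comparison unit 159
                request = True             # a match on any pair issues the request
    return request
```

For example, with the pairs ("load", 2) and ("branch", 1), the instruction stream add, load, add, load, branch issues a request at the second load and again at the branch.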

[Use of the instruction header]
FIG. 17 is a diagram illustrating an example in which the instruction type and the execution count are set in the instruction header 310 in the second embodiment of the present invention. In the configuration example of FIG. 16, the instruction type and the execution count are set in the instruction type setting register 155 and the execution count setting register 156, respectively; these values can also be set in the instruction header 310.

In this example, the instruction type is set in the 14-bit area 318 from bit 12 to bit 25 of the instruction header 310, and the execution count is set in the 6-bit area 319 from bit 26 to bit 31. Using the value of the area 318 as one input of the instruction type comparison unit 157 and the value of the area 319 as one input of the execution count comparison unit 159 therefore makes it possible to use a predetermined number of instruction executions as the prefetch timing without providing dedicated setting registers.
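Extracting the areas 318 and 319 from the header can be sketched in Python as follows, again assuming that bit 0 is the least-significant bit of the 32-bit header; the function names are hypothetical.

```python
TYPE_SHIFT, TYPE_WIDTH = 12, 14    # area 318: bits 12-25
COUNT_SHIFT, COUNT_WIDTH = 26, 6   # area 319: bits 26-31

def decode_count_fields(header: int) -> tuple[int, int]:
    """Return (instruction type, execution count) from a 32-bit instruction header."""
    insn_type = (header >> TYPE_SHIFT) & ((1 << TYPE_WIDTH) - 1)
    exec_count = (header >> COUNT_SHIFT) & ((1 << COUNT_WIDTH) - 1)
    return insn_type, exec_count

def encode_count_fields(low12: int, insn_type: int, exec_count: int) -> int:
    """Build a header from bits 0-11 (the other fields) plus areas 318 and 319."""
    if not 0 <= low12 < (1 << TYPE_SHIFT):
        raise ValueError("bits 0-11 out of range")
    if not 0 <= insn_type < (1 << TYPE_WIDTH):
        raise ValueError("instruction type must fit in 14 bits")
    if not 0 <= exec_count < (1 << COUNT_WIDTH):
        raise ValueError("execution count must fit in 6 bits")
    return low12 | (insn_type << TYPE_SHIFT) | (exec_count << COUNT_SHIFT)
```

One consequence of this layout is that the 6-bit area 319 can express an execution count of at most 63.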

[Instruction execution processing]
FIG. 18 is a diagram illustrating an example of a functional configuration for instruction execution in the second embodiment of the present invention. This example includes a program execution state generation unit 510, a detection state setting unit 520, an instruction prefetch timing detection unit 530, an instruction prefetch unit 570, and an instruction execution unit 590.

The program execution state generation unit 510 generates the execution state of the current program. As this execution state, the program execution state generation unit 510 can generate, for example, the value of the program counter 111, which holds the address of the instruction currently being executed, or the current number of executions of a predetermined instruction type held in the execution count counter 158.

The detection state setting unit 520 sets the program execution state at which the instruction prefetch timing is to be detected. As this execution state, the detection state setting unit 520 can, for example, set at least part of the instruction address at which the instruction prefetch timing is to be detected in the prefetch start address setting register 153, or set the number of executions of a predetermined instruction type in the execution count setting register 156.

The instruction prefetch timing detection unit 530 compares the current program execution state with the program execution state set in the detection state setting unit 520, and detects the instruction prefetch timing when the two match. The address comparison unit 154 or the execution count comparison unit 159 can serve as the instruction prefetch timing detection unit 530.

When the instruction prefetch timing detection unit 530 detects the instruction prefetch timing, the instruction prefetch unit 570 prefetches the instructions of the next line.

The instruction execution unit 590 executes the instructions obtained by the instruction prefetch unit 570. The results of this execution affect the current program execution state generated by the program execution state generation unit 510; that is, the value of the program counter 111 and the value of the execution count counter 158 may be updated.

FIG. 19 is a diagram illustrating an example of a processing procedure for instruction execution in the second embodiment of the present invention.

First, the program execution state at which the instruction prefetch timing is to be detected is set in the detection state setting unit 520 (step S931). For example, the instruction address at which the instruction prefetch timing is to be detected, or the number of executions of a predetermined instruction type, is set.

Next, the instruction execution unit 590 executes an instruction (step S932), and the instruction prefetch timing detection unit 530 detects the instruction prefetch timing (step S933). For example, the instruction prefetch timing is detected when the set instruction address matches the program counter 111, or when the set number of executions of the predetermined instruction type matches the value of the execution count counter 158. When the instruction prefetch timing detection unit 530 detects the instruction prefetch timing, the instruction prefetch unit 570 performs instruction prefetch (step S934).
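The procedure of FIG. 19 can be sketched as a loop over a trace of executed addresses. This sketch assumes the address-based variant (a relative start offset within a 128-byte line, as in FIG. 14(b)) and treats the prefetch itself as simply recording the next-line address; the execution-count variant would replace the comparison in step S933. All names are hypothetical.

```python
LINE_BYTES = 128               # assumed cache line size, as in FIG. 14(b)
OFFSET_MASK = LINE_BYTES - 1   # selects the relative address within a line

def run(trace: list[int], start_offset: int) -> list[int]:
    """Walk a trace of executed byte addresses and return the next-line
    addresses for which a prefetch would be requested."""
    detection_state = start_offset & OFFSET_MASK          # S931 (unit 520)
    prefetched = []
    for pc in trace:                                      # S932: execute the instruction at pc
        current_state = pc & OFFSET_MASK                  # execution state (unit 510)
        if current_state == detection_state:              # S933 (unit 530)
            next_line = (pc & ~OFFSET_MASK) + LINE_BYTES  # S934 (unit 570)
            prefetched.append(next_line)
    return prefetched
```

Running straight-line code from 0x1000 to 0x10FC with a start offset of 64 requests the lines at 0x1080 and 0x1100, one request per cache line executed.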

As described above, according to the second embodiment of the present invention, the timing of instruction prefetch can be controlled by setting it in advance.

<3. Third Embodiment>
The first and second embodiments described above concern suppression control of next-line prefetch, whereas the third and fourth embodiments below assume that both the next line and the branch destination line are prefetched. In the third embodiment of the present invention, the pipeline configuration and the block configuration are the same as in the first embodiment, and their description is omitted.

[Program counter addition control processing]
FIG. 20 is a diagram illustrating an example of a functional configuration for the addition control processing of the program counter in the third embodiment of the present invention. This configuration example includes an instruction fetch unit 610, an instruction decode unit 620, an instruction execution unit 630, an addition control register 640, an addition control unit 650, and a program counter 660.

The instruction fetch unit 610 fetches the instruction to be executed according to the value of the program counter 660, and corresponds to the instruction fetch stage 11. The instruction fetched by the instruction fetch unit 610 is supplied to the instruction decode unit 620.

The instruction decode unit 620 decodes the instruction fetched by the instruction fetch unit 610, and corresponds to the instruction decode stage 21.

The instruction execution unit 630 executes the instruction decoded by the instruction decode unit 620, and corresponds to the instruction execution stage 41. Operand access is omitted here.

The addition control register 640 holds data for controlling the addition performed on the program counter 660. A configuration example of the addition control register 640 is described later.

The addition control unit 650 controls the addition performed on the program counter 660 based on the data held in the addition control register 640.

The program counter 660 holds the address of the instruction to be executed and corresponds to the program counter (PC) 18. The program counter 660 includes a program counter value holding unit 661 and an adder 662. The program counter value holding unit 661 is a register that holds the program counter value. The adder 662 adds an increment to the value of the program counter value holding unit 661.

FIG. 21 is a diagram illustrating a configuration example of the addition control register 640 in the third embodiment of the present invention. The addition control register 640 holds an increment word count (incr) 641 and an increment count (conti) 642.

The increment word count 641 holds the number of words by which the adder 662 increments the value of the program counter value holding unit 661. Since this embodiment assumes an instruction set of 32-bit (4-byte) instructions, one word is 4 bytes. If the program counter 660 omits the lowest 2 bits of the address and holds word addresses, the conventional scheme adds an increment of "1" every time. In this embodiment, by contrast, the value of the increment word count 641 is added as the increment. Setting the increment word count 641 to "1" gives the conventional operation, whereas setting it to an integer of "2" or more allows instructions to be executed while skipping the intervening ones. Specific examples are described later. The increment word count 641 is an example of the increment value register recited in the claims.

The increment count 642 holds the number of times the adder 662 performs addition according to the increment word count 641. Normally, an increment of "1" is added as in the conventional scheme, but when the increment count 642 is set to an integer of "1" or more, addition according to the increment word count 641 is performed. The increment count 642 may be decremented by "1" by a subtractor (not shown) each time an instruction is executed, until it reaches "0"; alternatively, a separate counter may be provided and decremented by "1" until its value reaches "0". In either case, once addition according to the increment word count 641 has been performed the number of times specified in the increment count 642, the addition returns to the normal increment of "1". The increment count 642 is an example of the change instruction register recited in the claims.
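A minimal model of the program counter 660 together with the addition control register 640 follows. It assumes word addresses (the lowest 2 bits dropped, as described above) and the variant in which the increment count 642 itself is decremented once per executed instruction; the class and method names are hypothetical.

```python
class ProgramCounter:
    """Word-addressed PC with the addition control register 640."""

    def __init__(self, pc: int = 0):
        self.pc = pc      # program counter value holding unit 661
        self.incr = 1     # increment word count 641
        self.conti = 0    # increment count 642

    def set_mode(self, incr: int, conti: int) -> None:
        """Load register 640, e.g. on a conditional branch or PCINCMODE."""
        self.incr, self.conti = incr, conti

    def advance(self) -> int:
        """Adder 662: add incr while conti > 0, otherwise the conventional +1."""
        if self.conti > 0:
            self.pc += self.incr
            self.conti -= 1   # decremented once per executed instruction
        else:
            self.pc += 1
        return self.pc
```

With incr = 2 and conti = 3 starting from word address 10, the PC visits 12, 14, and 16 and then returns to the normal increment of 1 (17, 18, ...).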

[Instruction execution mode]
FIG. 22 is a diagram illustrating an example of how instructions are processed for a two-way branch in the third embodiment of the present invention. If the address of a branch instruction performing a two-way branch is "A", the instruction sequence for the case where the branch is not taken is placed at "A+4", "A+12", "A+20", "A+28", "A+36", "A+44", "A+52", "A+60", ..., while the instruction sequence for the case where the branch is taken is placed at "A+8", "A+16", "A+24", "A+32", "A+40", "A+48", "A+56", "A+64", .... That is, the not-taken and taken instruction sequences are interleaved.

For this two-way branch, when the first instruction of each instruction sequence is executed, the increment word count 641 is set to "2" and the increment count 642 is set to the number of instructions in that sequence. Only one of the two interleaved instruction sequences is then executed.

FIG. 23 is a diagram illustrating an example of how instructions are processed for a multi-way branch in the third embodiment of the present invention. A three-way branch is described here, but the same technique applies to branches with four or more ways. If the address of a branch instruction performing a three-way branch is "A", the first instruction sequence is placed at "A+4", "A+16", "A+28", "A+40", "A+52", "A+64", "A+76", ...; the second at "A+8", "A+20", "A+32", "A+44", "A+56", "A+68", "A+80", ...; and the third at "A+12", "A+24", "A+36", "A+48", "A+60", "A+72", "A+84", .... That is, the first to third instruction sequences are interleaved one instruction at a time.

For this three-way branch, when the first instruction of each instruction sequence is executed, the increment word count 641 is set to "3" and the increment count 642 is set to the number of instructions in that sequence. Only one of the three interleaved instruction sequences is then executed.
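The interleaved placement in FIGS. 22 and 23 follows a simple pattern: the j-th sequence (counting from 0) of an n-way branch at address A starts at A + 4(j + 1) and advances by 4n bytes. A small Python sketch, with a hypothetical function name, reproduces the addresses listed above.

```python
INSN_BYTES = 4  # 32-bit instructions

def path_addresses(branch_addr: int, ways: int, path: int, length: int) -> list[int]:
    """Byte addresses of the first `length` instructions of sequence `path` (0-based)
    after an n-way branch at branch_addr, with sequences interleaved one word apart."""
    if not 0 <= path < ways:
        raise ValueError("path must be in range(ways)")
    first = branch_addr + INSN_BYTES * (path + 1)
    return [first + INSN_BYTES * ways * i for i in range(length)]
```

With A = 0, the two-way not-taken sequence is 4, 12, 20, 28 and the taken sequence is 8, 16, 24, 32; taken together, the sequences of a branch tile the following addresses without gaps.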

[Setting the addition control register]
FIG. 24 is a diagram illustrating an example of an instruction set for setting values in the addition control register 640 in the third embodiment of the present invention. FIG. 24(a) shows an example of the instruction format in the third embodiment. This instruction format consists of a 6-bit opcode (OPCODE), a 5-bit first source operand (rs), a 5-bit second source operand (rt), a 5-bit destination operand (rd), and an 11-bit immediate field (imm), 32 bits in total.
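A sketch of packing and unpacking the FIG. 24(a) format is given below. The field order from the most-significant bit (OPCODE in bits 31 to 26, then rs, rt, rd, and imm in bits 10 to 0) is an assumption, since the text gives only the field widths; the function names are hypothetical, and only the PCINCMODE opcode "100111" is taken from the text. The rd and imm values in the usage are illustrative.

```python
FIELDS = (("opcode", 6), ("rs", 5), ("rt", 5), ("rd", 5), ("imm", 11))  # MSB first (assumed)
PCINCMODE_OPCODE = 0b100111

def encode(**values: int) -> int:
    """Pack named fields into a 32-bit instruction word; missing fields default to 0."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word

def decode(word: int) -> dict[str, int]:
    """Unpack a 32-bit instruction word into its named fields."""
    fields = {}
    shift = 32
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields
```

Under this layout, an 11-bit imm field limits a value carried there, such as the increment count conti of the conditional branches, to at most 2047.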

FIG. 24(b) shows an example of the opcode map in the third embodiment of the present invention. The upper 3 bits of the opcode are arranged vertically and the lower 3 bits horizontally. The following description focuses on the conditional branch instructions at the lower right of the opcode map and on the control register change instruction with the opcode "100111".

FIG. 24C shows an example of the instruction format of the conditional branch instructions. The conditional branch instructions listed here are BEQfp, BNEfp, BLEfp, BGTZfp, BLTZfp, BGEZfp, BLTZALfp, and BGEZALfp. The leading "B" stands for Branch, and the suffix that follows gives the branch condition: "EQ" means the two source operands are equal (rs = rt); "NE" means they are not equal (rs ≠ rt); "LE" means the first source operand is less than or equal to the second (rs ≤ rt); "GTZ" means the first source operand is greater than zero (rs > 0); "LTZ" means it is less than zero (rs < 0); and "GEZ" means it is greater than or equal to zero (rs ≥ 0). A following "AL" (branch And Link) means that the return address is saved when the branch is taken, and a trailing "fp" means that the source operand values are floating-point numbers. The increment word count incr, given in the destination operand field, is the number of words added to the program counter 660 at each addition. The increment count conti, given in the immediate field, is the number of times the program counter 660 performs an addition using incr. When one of these conditional branch instructions is executed, incr is set in the increment word count field 641 of the addition control register 640, and conti is set in the increment count field 642.
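The mnemonic scheme above can be sketched as a small decoder. This is a minimal illustration, assuming the mnemonics follow exactly the pattern described (B, then a condition, then an optional AL, then an optional fp); the names `decode_mnemonic` and `branch_taken` are hypothetical, and the sketch models only the condition test, not the link register or the program counter update.

```python
# Branch conditions named by the mnemonic suffixes described above.
# rt is ignored by the compare-with-zero forms (GTZ, LTZ, GEZ).
CONDITIONS = {
    "EQ": lambda rs, rt: rs == rt,
    "NE": lambda rs, rt: rs != rt,
    "LE": lambda rs, rt: rs <= rt,
    "GTZ": lambda rs, rt: rs > 0,
    "LTZ": lambda rs, rt: rs < 0,
    "GEZ": lambda rs, rt: rs >= 0,
}


def decode_mnemonic(mnemonic):
    """Split e.g. 'BLTZALfp' into ('LTZ', link=True, fp=True)."""
    if not mnemonic.startswith("B"):
        raise ValueError("not a branch mnemonic: " + mnemonic)
    body = mnemonic[1:]
    fp = body.endswith("fp")          # floating-point source operands
    if fp:
        body = body[:-2]
    link = body.endswith("AL")        # branch And Link: save return address
    if link:
        body = body[:-2]
    if body not in CONDITIONS:
        raise ValueError("unknown condition: " + body)
    return body, link, fp


def branch_taken(mnemonic, rs, rt=0.0):
    """Evaluate the branch condition for source operand values rs and rt."""
    cond, _link, _fp = decode_mnemonic(mnemonic)
    return CONDITIONS[cond](rs, rt)
```

For example, `branch_taken("BLTZALfp", -0.5)` is true, while `branch_taken("BGTZfp", 0.0)` is false.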

FIG. 24D shows an example of the instruction format of the control register change instruction PCINCMODE. PCINCMODE sets the increment mode of the program counter 660 in the addition control register 640. When PCINCMODE is executed, the increment word count incr is set in the increment word count field 641 of the addition control register 640, and the increment count conti is set in the increment count field 642. PCINCMODE is a separate instruction from the conditional branch instructions; in practice it is used together with a conditional branch instruction.

FIG. 25 shows an example in which values are set in the addition control register 640 by a conditional branch instruction in the third embodiment of the present invention. In this example, the conditional branch instruction BEQfp specifies the branch condition "rs = rt", an increment word count of "2", and an increment count of "L/2". Let m be the word address of this BEQfp instruction. If the branch condition "rs = rt" is satisfied, instructions m+2, m+4, m+6, and so on up to m+L are executed, stepping by two words. If the condition is not satisfied, instructions m+1, m+3, m+5, and so on up to m+(L−1) are executed, likewise stepping by two words.
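The instruction layout that FIG. 25 relies on can be sketched as follows: the not-taken and taken paths are interleaved word by word after the branch at m, so that stepping by two words from m+1 or from m+2 walks exactly one path. This is a minimal sketch, assuming both paths have already been padded to the same length (L/2 words each); the function names are hypothetical.

```python
def interleave_paths(not_taken, taken):
    """Interleave two equal-length post-branch paths word by word.

    Returns a dict mapping the word offset from the branch at m to an
    instruction: the not-taken path occupies m+1, m+3, ..., m+(L-1) and
    the taken path occupies m+2, m+4, ..., m+L, where L = 2 * len(path).
    """
    if len(not_taken) != len(taken):
        raise ValueError("pad the shorter path with NOPs first")
    layout = {}
    for k, (n, t) in enumerate(zip(not_taken, taken)):
        layout[2 * k + 1] = n   # reached when the branch is not taken
        layout[2 * k + 2] = t   # reached when the branch is taken
    return layout


def executed_path(layout, taken):
    """Instructions visited by stepping two words from m+2 or from m+1."""
    start = 2 if taken else 1
    return [layout[off] for off in range(start, max(layout) + 1, 2)]
```

With three instructions per path (L = 6), the taken case visits offsets 2, 4, 6 and the not-taken case offsets 1, 3, 5.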

FIG. 26 shows an example in which values are set in the addition control register 640 by the control register change instruction PCINCMODE in the third embodiment of the present invention. In this example, PCINCMODE is placed immediately after an ordinary conditional branch instruction that does not itself set the addition control register 640, and PCINCMODE specifies an increment word count of "2" and an increment count of "L/2". Let m be the word address of the PCINCMODE instruction. If the branch condition of the conditional branch instruction is satisfied, instructions m+2, m+4, m+6, and so on up to m+L are executed, stepping by two words. If the branch condition is not satisfied, instructions m+1, m+3, m+5, and so on up to m+(L−1) are executed, likewise stepping by two words.

[Instruction execution processing]
FIG. 27 shows an example of the processing procedure for instruction execution in the third embodiment of the present invention. Here it is assumed that the increment word count and the increment count have already been set in the addition control register 640, for example by the conditional branch instruction or the control register change instruction described above.

If the increment count 642 of the addition control register 640 is greater than zero (step S941), the adder 662 of the program counter 660 adds the increment word count 641 multiplied by 4 to the program counter value holding unit 661 (step S942), and the increment count 642 is decremented by 1 (step S943). If the increment count 642 is not greater than zero (step S941), the adder 662 adds the usual value "4" to the program counter value holding unit 661 (step S944). These steps are repeated. Step S942 is an example of the modified-increment addition procedure recited in the claims, and step S944 is an example of the normal-increment addition procedure recited in the claims.
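Steps S941 to S944 can be modeled as a small stateful program counter. This is a sketch, not the circuit itself: it assumes 4-byte instruction words (which is why the word count is multiplied by 4), and the class and method names are hypothetical.

```python
class ProgramCounter:
    """Program counter 660 together with the addition control register 640.

    incr models the increment word count 641 and conti the increment
    count 642. One instruction word is assumed to be 4 bytes.
    """

    def __init__(self, pc):
        self.pc = pc
        self.incr = 1
        self.conti = 0

    def set_mode(self, incr, conti):
        # Effect of a conditional branch or PCINCMODE on register 640.
        self.incr = incr
        self.conti = conti

    def step(self):
        if self.conti > 0:                  # S941: modified mode still active
            self.pc += 4 * self.incr        # S942: modified increment
            self.conti -= 1                 # S943: one addition used up
        else:
            self.pc += 4                    # S944: normal increment
        return self.pc
```

With `set_mode(2, 2)` starting from 0x100, the next three steps yield 0x108, 0x110, and then 0x114, after which the counter has returned to normal sequential stepping.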

As described above, according to the third embodiment of the present invention, the instruction sequences that follow a branch are interleaved instruction by instruction, and the additions performed by the program counter are controlled according to the branch condition, so that the instructions of the correct sequence are executed. This allows next-line and branch-destination instructions to be placed together in the same cache lines, so that the instruction prefetch penalty can be averaged.

<4. Fourth Embodiment>
[Processor configuration]
FIG. 28 shows an example of the pipeline configuration of a processor according to the fourth embodiment of the present invention. The basic pipeline is assumed to be the same five-stage pipeline as described in the first embodiment.

In the first embodiment, the next line prefetch unit 13 prefetched the next line; in the fourth embodiment, the next line/branch destination line prefetch unit 14 prefetches both the next line and the branch destination line. That is, it prefetches not only the next line, which is the cache line following the one containing the instruction currently being executed, but also the branch destination line, which is the cache line containing the branch destination instruction. The branch destination line prefetched by the prefetch unit 14 is held in the prefetch queue 17, from which it is supplied to the following instruction decode stage (ID) 21 when needed. The next line is supplied directly from the instruction cache and therefore does not need to pass through the prefetch queue 17.

FIG. 29 shows an example of the block configuration of a processor according to the fourth embodiment of the present invention. The basic block configuration is the same as that described in the first embodiment.

In the first embodiment, the next line prefetch unit 150 prefetched the next line; in the fourth embodiment, the next line/branch destination line prefetch unit 250 prefetches both the next line and the branch destination line. In addition, the prefetch queue 171 is placed in parallel with the instruction cache 120 so that the branch destination line can be supplied directly from the prefetch queue 171 to the instruction register 112. That is, when a branch occurs, the instruction from the prefetch queue 171 is supplied through a bypass in place of the instruction that was about to be supplied from the instruction cache 120. This allows instructions to continue issuing without stalling the pipeline. The next line/branch destination line prefetch unit 250 is an example of the prefetch unit recited in the claims, and the prefetch queue 171 is an example of the prefetch queue recited in the claims.
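The bypass can be pictured as a two-input selector in front of the instruction register. This is a behavioral sketch under simplifying assumptions (one word per call, and a `None` return standing in for a stall when the destination has not yet arrived); the function name and the stall signal are hypothetical.

```python
from collections import deque


def next_instruction(branch_taken, icache_word, prefetch_queue):
    """Select the word to latch into the instruction register 112.

    Without a taken branch, the sequential word from the instruction
    cache 120 is used. On a taken branch, the head of the prefetch
    queue 171 (the prefetched branch destination line) is bypassed in
    instead. If the queue is empty, None signals that the pipeline
    would have to stall.
    """
    if not branch_taken:
        return icache_word
    if prefetch_queue:
        return prefetch_queue.popleft()
    return None
```

The design point is that the taken-branch path never waits on an instruction cache lookup as long as the destination line was prefetched in time.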

In the fourth embodiment, dividing the program into instruction packets is not essential, so it is omitted from this block configuration. Likewise, compression using the instruction dictionary table is not essential in the fourth embodiment and is also omitted. These features can be combined with this embodiment as appropriate.

[Relationship between branch instructions and cache lines]
FIG. 30 shows the relationship between a branch instruction and cache lines in the fourth embodiment of the present invention.

The cache line containing the instruction currently being executed is called the current line, and the cache line that follows it is called the next line. The cache line containing the branch destination instruction of a branch instruction in the current line is called the branch destination line. In this example, the branch instruction is placed at the end of the current line. Because prefetching of the next line and the branch destination line starts when the first instruction of the current line is executed, this placement leaves as much time as possible for both prefetches to complete before the branch instruction executes. The branch instruction therefore does not have to be at the very end of the current line; if it is placed at least in the latter half of the current line, the prefetches may in some cases still complete in time.

When the branch instruction is placed at the end of the current line, the next line is needed if the branch condition is not satisfied and no branch occurs, and the branch destination line is needed if the condition is satisfied and the branch is taken. To make the prefetch succeed regardless of the branch outcome, it is therefore desirable to prefetch both the next line and the branch destination line. In the fourth embodiment of the present invention, the next line/branch destination line prefetch unit 250 prefetches both lines, so that instruction execution can continue whether or not the branch condition is satisfied. Prefetching both lines ideally calls for twice the normal throughput, but this is not strictly required.

Considering conflicts between cache lines in the instruction cache 120, it is desirable to restrict where branch destination lines are placed. For example, if the instruction cache 120 is direct-mapped, cache lines with the same line address cannot be stored at the same time and conflict with each other. In that case, if a branch destination line with the same line address is prefetched immediately after the next line, the next line is evicted from the instruction cache 120. With a 2-way set-associative cache a conflict is less likely, but depending on the cache contents other cache lines may still be affected. The fourth embodiment therefore assumes the strictest case, a direct-mapped instruction cache, and the compiler or linker adjusts the placement of branch destination lines so that the next line and the branch destination line never have the same line address.
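The placement rule can be sketched as a check a linker might run. This assumes a hypothetical direct-mapped cache of 16 lines of 256 bytes, interprets "line address" as the cache index that an address maps to, and uses hypothetical function names; it is an illustration of the constraint, not the patent's tool.

```python
LINE_BYTES = 256   # hypothetical line size
NUM_LINES = 16     # hypothetical direct-mapped cache: 16 x 256 B = 4 KB


def line_index(addr):
    """Cache line (index) that addr maps to in a direct-mapped cache."""
    return (addr // LINE_BYTES) % NUM_LINES


def conflicts_with_next_line(branch_addr, target_addr):
    """True if prefetching the branch destination line would evict the next line."""
    next_line_addr = (branch_addr // LINE_BYTES + 1) * LINE_BYTES
    return line_index(target_addr) == line_index(next_line_addr)


def relocate_target(branch_addr, target_addr):
    """Move the destination forward by whole lines until it no longer collides."""
    while conflicts_with_next_line(branch_addr, target_addr):
        target_addr += LINE_BYTES
    return target_addr
```

Only one index collides with the next line, so the loop moves the destination by at most one line; in an actual compiler or linker the move would be realized by padding or reordering code, which shifts everything placed after it.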

To change the address placement of instructions in the compiler or linker, a technique such as the following can be used. First, assume the following instruction sequence (a number prefixed with "0x" is hexadecimal):
0x0000: Instruction A
0x0004: Instruction B
0x0008: Instruction C
To shift the whole placement back by 4 bytes, a NOP (No-OPeration) instruction can be inserted as follows:
0x0000: NOP instruction
0x0004: Instruction A
0x0008: Instruction B
0x000C: Instruction C

Alternatively, if instruction A performs several operations and can be split into two instructions, AA and AB, the placement can likewise be shifted back by 4 bytes as a whole:
0x0000: Instruction AA
0x0004: Instruction AB
0x0008: Instruction B
0x000C: Instruction C
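The NOP-insertion technique from the first listing can be sketched as follows. This is a minimal illustration assuming 4-byte instructions as in the listings; `shift_by_nops` and `listing` are hypothetical helper names.

```python
WORD = 4  # bytes per instruction, as in the listings above


def shift_by_nops(instrs, shift_bytes):
    """Prepend NOPs so every instruction moves back by shift_bytes.

    Returns a dict mapping byte address to instruction.
    """
    if shift_bytes % WORD:
        raise ValueError("shift must be a multiple of the instruction size")
    padded = ["NOP"] * (shift_bytes // WORD) + list(instrs)
    return {i * WORD: ins for i, ins in enumerate(padded)}


def listing(layout):
    """Format a layout in the same style as the listings above."""
    return ["0x%04X: %s" % (addr, ins) for addr, ins in sorted(layout.items())]
```

Applied to instructions A, B, and C with a 4-byte shift, this reproduces the second listing. Splitting instruction A into AA and AB achieves the same shift without spending a slot on a NOP.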

FIG. 31 shows one way of changing the instruction placement in the fourth embodiment of the present invention. Assume a program, as in FIG. 31A, in which a branch instruction C follows instruction sequences A and B, one of instruction sequences D and E is then executed, and instruction sequence F is executed afterwards. If the result of instruction sequence B does not affect the branch condition of branch instruction C, the placement can be changed without affecting the execution result, as in FIG. 31B, by moving branch instruction C to immediately after instruction sequence A and also placing instruction sequence B at the branch destination.

[Instruction placement processing]
FIG. 32 shows an example of a functional configuration for instruction placement in the fourth embodiment of the present invention. In this example, object code is generated from a program held in the program holding unit 701 and stored in the object code holding unit 702. The configuration includes a branch instruction extraction unit 710, a branch instruction placement unit 720, a branch destination instruction placement unit 730, and an object code generation unit 740.

The branch instruction extraction unit 710 extracts branch instructions from the program held in the program holding unit 701. It determines the address of each extracted branch instruction in the program and supplies it to the branch instruction placement unit 720, and it determines the branch destination address of each extracted branch instruction and supplies it to the branch destination instruction placement unit 730.

The branch instruction placement unit 720 places each branch instruction extracted by the branch instruction extraction unit 710 in the latter half of a cache line (the current line). As described above, this leaves time for the prefetches to complete; from this point of view, placing the branch instruction at the very end of the cache line is the most desirable.
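One simple way to move a branch into the last slot of its line is NOP padding, as in the listings earlier in this embodiment. The sketch below assumes word indices counted from a line-aligned start and uses a hypothetical 8-word line for readability (a 256-byte line of 4-byte instructions would hold 64 words); the function names are hypothetical.

```python
def nops_to_line_end(branch_index, line_words):
    """NOPs needed before a branch so it lands in the last word of its line.

    branch_index is the branch's word index counted from a line-aligned start.
    """
    return line_words - 1 - (branch_index % line_words)


def place_branch_at_line_end(instrs, branch_index, line_words):
    """Insert padding immediately before the branch instruction."""
    pad = nops_to_line_end(branch_index, line_words)
    return instrs[:branch_index] + ["NOP"] * pad + instrs[branch_index:]
```

Padding wastes slots; splitting instructions or moving code as in FIG. 31 can achieve the same placement without NOPs when data dependences allow it.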

The branch destination instruction placement unit 730 places the branch destination instruction of each branch instruction extracted by the branch instruction extraction unit 710 in another cache line (the branch destination line) whose line address differs from that of the following cache line (the next line). As described above, the next line and the branch destination line are placed in cache lines with different line addresses to avoid conflicts in the instruction cache 120.

The object code generation unit 740 generates object code for the instruction sequence containing the branch instructions and branch destination instructions placed by the branch instruction placement unit 720 and the branch destination instruction placement unit 730. The generated object code is held in the object code holding unit 702. The object code generation unit 740 is an example of the instruction sequence output unit recited in the claims.

FIG. 33 shows an example of the processing procedure for instruction placement in the fourth embodiment of the present invention.

First, the branch instruction extraction unit 710 extracts branch instructions from the program held in the program holding unit 701 (step S951). The branch instruction placement unit 720 then places each extracted branch instruction in the latter half of a cache line (the current line) (step S952). Next, the branch destination instruction placement unit 730 places the branch destination instruction of each extracted branch instruction in another cache line (the branch destination line) whose line address differs from that of the following cache line (the next line) (step S953). Finally, the object code generation unit 740 generates object code for the instruction sequence containing the placed branch instructions and branch destination instructions (step S954).

Step S951 is an example of the branch instruction extraction procedure recited in the claims, step S952 of the branch instruction placement procedure, step S953 of the branch destination instruction placement procedure, and step S954 of the instruction sequence output procedure.

[Prefetch address setting]
FIG. 34 shows an example of setting the prefetch address register in the fourth embodiment of the present invention. As described above, the branch destination line is placed at a line address different from that of the next line. The branch destination line may always be prefetched at a fixed position relative to the current line, or, as described below, the branch destination address to be prefetched automatically may be set explicitly each time.

FIG. 34A shows an example configuration of the prefetch address register (PRADDR: PRefetch ADDress Register) 790. This register holds the address of the line to be prefetched into the instruction cache 120 as the branch destination line. The prefetch address is held in the lower 12 bits of the prefetch address register 790.

FIG. 34B shows the instruction format of the MTSI_PRADDR (Move To Special register Immediate - PRADDR) instruction, which sets a value in the prefetch address register 790. MTSI_PRADDR is one of the SPECIAL instructions and sets an immediate value in a specific register, here the prefetch address register 790. Bits 17 to 21 of the instruction designate the prefetch address register PRADDR, and bits 11 to 8 of the instruction are set in bits 11 to 8 of the prefetch address register 790. This sets the address of the branch destination line to be prefetched. Here the instruction cache 120 is assumed to be a 4-Kbyte 2-way set-associative cache with 8 lines per way (16 lines in total) and an entry size of 256 bytes.
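The register update described for MTSI_PRADDR amounts to a masked bit-field copy. With 256-byte entries, address bits 7 to 0 are the offset within a line, so bits 11 to 8 select one of 16 line-sized blocks in a 4-KB window (2 ways × 8 lines × 256 bytes = 4 KB). The sketch below assumes the other PRADDR bits are left unchanged, which the text does not state; the function names are hypothetical.

```python
PRADDR_FIELD_MASK = 0xF00  # bits 11..8, as described for MTSI_PRADDR
LINE_BYTES = 256           # entry size assumed above (bits 7..0 = offset)


def execute_mtsi_praddr(praddr, insn):
    """Copy instruction bits 11..8 into PRADDR bits 11..8.

    Other PRADDR bits are assumed to be left unchanged; the register
    holds 12 bits, so the result is masked to 0xFFF.
    """
    return ((praddr & ~PRADDR_FIELD_MASK) | (insn & PRADDR_FIELD_MASK)) & 0xFFF


def prefetch_line_number(praddr):
    """Which of the 16 line-sized blocks in the 4-KB window PRADDR selects."""
    return (praddr & 0xFFF) // LINE_BYTES
```

The same masked copy would apply if the 4-bit value came from the prefetch setting field of an instruction header instead of an immediate.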

As another example, when the program is divided into the instruction packets 300 described in the first embodiment, the prefetch setting field 315 of the instruction header 310 can be used. In this case, the prefetch setting field 315, which occupies bits 11 to 8 of the instruction header 310 in FIG. 4, is set in bits 11 to 8 of the prefetch address register. This allows the address of the branch destination line to be prefetched to be set without using a special instruction.

[Instruction execution processing]
FIG. 35 shows an example of a functional configuration for instruction execution in the fourth embodiment of the present invention. In this example, the state of the program counter 111 is monitored, and lines are prefetched into the instruction cache 120 and the prefetch queue 171. The configuration includes a prefetch timing detection unit 750, a next line prefetch unit 760, and a branch destination line prefetch unit 770, which together correspond to the next line/branch destination line prefetch unit 250 in the block configuration.

The prefetch timing detection unit 750 refers to the state of the program counter 111 and detects when to start instruction prefetching. Because the fourth embodiment prefetches in both directions, the next line and the branch destination line, it is desirable to start prefetching early. The prefetch timing can therefore be detected, for example, when execution of the first instruction of a cache line starts.

The next line prefetch unit 760 prefetches the next line. The next line prefetched from the system memory 140 is stored in the instruction cache 120.

The branch destination line prefetch unit 770 prefetches the branch destination line. The branch destination line may be a cache line at a fixed position relative to the current line, or it may be given by the address set in the prefetch address register 790 described above. The branch destination line prefetched from the system memory 140 is stored in both the instruction cache 120 and the prefetch queue 171.

FIG. 36 shows an example of the processing procedure for instruction execution in the fourth embodiment of the present invention.

First, when the prefetch timing detection unit 750 detects that execution of the first instruction of a cache line has started (step S961), the next line prefetch unit 760 prefetches the next line (step S962), and the branch destination line prefetch unit 770 prefetches the branch destination line (step S963). These steps are then repeated, so that the instruction sequences in both directions, the next line and the branch destination line, are prefetched.
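Steps S961 to S963 can be sketched as a scan over a program counter trace that emits a pair of prefetch requests at each line start. This assumes a hypothetical 256-byte line and a caller-supplied function for the branch destination line (a fixed offset or the PRADDR value); the names are hypothetical.

```python
LINE_BYTES = 256  # hypothetical line size


def prefetch_requests(pc_trace, target_line_of):
    """List (pc, next_line, target_line) for each line-start PC in the trace.

    target_line_of(line_addr) supplies the branch destination line, e.g. a
    fixed offset from the current line or the value held in PRADDR 790.
    """
    requests = []
    for pc in pc_trace:
        if pc % LINE_BYTES == 0:                  # S961: first instruction of a line
            next_line = pc + LINE_BYTES           # S962: into instruction cache 120
            target_line = target_line_of(pc)      # S963: into cache 120 and queue 171
            requests.append((pc, next_line, target_line))
    return requests
```

In this sketch only a PC that lands exactly on a line boundary triggers prefetching; a jump into the middle of a line would not, which a real design may need to handle separately.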

As described above, according to the fourth embodiment of the present invention, the branch destination line is placed at a line address different from that of the next line, and the instruction sequences in both directions, the next line and the branch destination line, are prefetched, which improves throughput.

<5. Combinations of the Embodiments>
The first to fourth embodiments of the present invention have so far been described separately, but they can be combined as appropriate.

[Combination of the first and second embodiments]
In the first embodiment, whether to prefetch is determined by the branch prediction flag 311 of the instruction header 310; combining it with the second embodiment is effective in avoiding the cost of a misprediction. That is, by delaying the prefetch decision as in the second embodiment, whether a branch occurs can be resolved first, so that the correct cache line is prefetched.

[Combination of the first or second embodiment with the third embodiment]
Because the third embodiment effectively prefetches both directions, it can be difficult to apply to a branch whose destination is far away or to an if statement without an else clause. For example, when the cases of a multi-way branch do not all have the same number of instructions, NOP instructions must be inserted until they do. In addition, when the instruction sequences become fairly long, instruction execution throughput and cache utilization decrease. Using the branch prediction flag 311 of the first embodiment to avoid prefetching in both directions when a branch to a distant address is likely avoids these disadvantages of the third embodiment. They can also be avoided by delaying the instruction prefetch timing as in the second embodiment, so that whether a branch occurs is resolved first and unnecessary prefetches are not performed.

[Combination of the first or second embodiment and the fourth embodiment]
In the fourth embodiment of the present invention, both the next line and the branch destination line are always prefetched. The disadvantage is that, when the current line contains no branch instruction, the prefetch of the branch destination line is wasted. If the branch prediction flag 311 of the first embodiment indicates that the next line is likely to be executed, only the next line is prefetched, which avoids this disadvantage of the fourth embodiment. Delaying the instruction prefetch timing as in the second embodiment also avoids it: the presence or absence of a branch is determined first, so no unnecessary prefetch is performed.
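A minimal Python sketch of this gating follows. It assumes consecutively numbered cache lines. The function name and the use of `None` to mean "the current line contains no branch" are hypothetical conventions, not taken from the patent. The case where the flag is set keeps the fourth embodiment's unmodified two-way prefetch.

```python
from typing import List, Optional


def lines_to_prefetch(current_line: int,
                      branch_target_line: Optional[int],
                      branch_prediction_flag: int) -> List[int]:
    """Choose prefetch targets for the fourth embodiment, gated by flag 311.

    branch_target_line is None when the current line contains no branch.
    """
    next_line = current_line + 1
    if branch_target_line is None or branch_prediction_flag == 0:
        # Sequential execution is likely: a branch-destination prefetch
        # would probably be wasted, so fetch only the next line.
        return [next_line]
    # A far branch is likely: keep the next-line plus destination-line prefetch.
    return [next_line, branch_target_line]
```

Over a run of lines that mostly contain no branch, each call returns a single line where the ungated fourth embodiment would issue two prefetches.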

[Combination of the third embodiment and the fourth embodiment]
The fourth embodiment of the present invention targets two-way prefetching of the next line and the branch destination line. Applying the third embodiment extends it to multi-way branches with three or more directions: two-way prefetching of cache lines in which multiple instruction sequences are mixed makes the scheme applicable to multi-way branches.

In this case, the disadvantages of both can be avoided by applying the third embodiment to short branches within roughly one line size, and the fourth embodiment to branches spanning a wider range. The fourth embodiment does not lower execution throughput, but it always halves the utilization efficiency of the instruction cache. The third embodiment, on the other hand, gives little benefit when applied to branches over a wide range. Combining the two lets each one cover the other's disadvantage.
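The per-branch choice described above can be sketched in Python as follows. The 64-byte line size, the byte-address model and the returned strings are all hypothetical. The patent only says "within the range of the line size". A real implementation would make this choice when generating the object code.

```python
LINE_SIZE_BYTES = 64  # hypothetical instruction-cache line size


def choose_scheme(branch_addr: int, target_addr: int,
                  line_size: int = LINE_SIZE_BYTES) -> str:
    """Pick a prefetch scheme per branch based on how far it jumps.

    Short branches (target within about one line) use the third
    embodiment's mixed-sequence layout; longer ones use the fourth
    embodiment's next-line plus branch-destination-line prefetch.
    """
    distance = abs(target_addr - branch_addr)
    if distance <= line_size:
        return "interleave"   # third embodiment
    return "next+target"      # fourth embodiment
```

A forward branch of 0x20 bytes stays with the third embodiment. A jump of 0x300 bytes in either direction falls back to the fourth.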

[Other combinations]
Combinations other than those listed here are also possible, and they can further strengthen each other's effects. For example, combining the first or second embodiment with both the third and fourth embodiments further enhances each of the effects described above.

The embodiments of the present invention are examples of how the present invention may be embodied. As explicitly stated in the embodiments, the matters in the embodiments correspond to the invention-specifying matters in the claims. Likewise, the invention-specifying matters in the claims correspond to the matters in the embodiments that bear the same names. However, the present invention is not limited to the embodiments and can be embodied with various modifications without departing from its gist.

The processing procedures described in the embodiments of the present invention may be regarded in three ways: as a method having this series of procedures, as a program for causing a computer to execute them, or as a recording medium storing that program. The recording medium may be, for example, a CD (Compact Disc), an MD (MiniDisc), a DVD (Digital Versatile Disc), a memory card, or a Blu-ray Disc (registered trademark).

DESCRIPTION OF SYMBOLS
11 Instruction fetch stage
12 Adder
13 Next-line prefetch unit
14 Next-line/branch-destination-line prefetch unit
17, 170, 171 Prefetch queue
18, 111 Program counter
21 Instruction decode stage
31 Register fetch stage
41 Instruction execution stage
110 Processor core
112 Instruction register
113 Instruction decoder
114 Execution unit
115 Register file
120 Instruction cache
130 Data cache
140 System memory
150 Next-line prefetch unit
151 Set step address register
152 Multiplier
153 Prefetch start address setting register
154 Address comparison unit
155 Instruction type setting register
156 Execution count setting register
157 Instruction type comparison unit
158 Execution count counter
159 Execution count comparison unit
160 Packet demultiplexer
180 Instruction queue
191 Instruction dictionary index
192 Instruction dictionary table
250 Next-line/branch-destination-line prefetch unit
320 Instruction payload
411 Program holding unit
412 Branch profile holding unit
413 Instruction packet holding unit
420 Instruction packet generation unit
430 Branch prediction flag setting unit
440 Instruction compression unit
450 Instruction packet separation unit
460 Branch prediction flag determination unit
470 Instruction prefetch unit
480 Instruction decompression unit
490 Instruction execution unit
510 Program execution state generation unit
520 Detection state setting unit
530 Instruction prefetch timing detection unit
570 Instruction prefetch unit
590 Instruction execution unit
610 Instruction fetch unit
620 Instruction decode unit
630 Instruction execution unit
640 Addition control register
650 Addition control unit
660 Program counter
701 Program holding unit
702 Object code holding unit
710 Branch instruction extraction unit
720 Branch instruction placement unit
730 Branch destination instruction placement unit
740 Object code generation unit
750 Prefetch timing detection unit
760 Next-line prefetch unit
770 Branch destination line prefetch unit
790 Prefetch address register

Claims

1. An instruction fetch device comprising:
an instruction packet holding unit that holds an instruction packet including an instruction payload and an instruction header, the instruction payload being obtained by dividing the instruction sequence of a program into pieces of a predetermined size, and the instruction header containing branch prediction information that indicates how likely a branch instruction in the payload is to branch to an instruction contained in neither that instruction payload nor the next instruction payload;
an instruction packet separation unit that separates the instruction packet held in the instruction packet holding unit into the instruction payload and the instruction header;
a branch prediction information determination unit that instructs suppression of prefetch of the next instruction packet when, based on the branch prediction information included in the instruction header, a branch instruction included in the corresponding instruction payload is highly likely to branch to an instruction contained in neither that instruction payload nor the next instruction payload; and
an instruction prefetch unit that executes prefetch of the next instruction packet unless the prefetch suppression is instructed.

2. The instruction fetch device according to claim 1, further comprising an instruction execution unit that, based on an instruction dictionary reference instruction included in the instruction payload separated by the instruction packet separation unit, reads the instruction sequence corresponding to that instruction dictionary reference instruction from an instruction dictionary table and decompresses it.

3. The instruction fetch device according to claim 2, further comprising an instruction decompression unit that decompresses an instruction sequence from the instruction payload corresponding to the instruction header, based on an instruction payload compression flag included in the instruction header.

4. A processor comprising:
an instruction packet holding unit that holds an instruction packet including an instruction payload and an instruction header, the instruction payload being obtained by dividing the instruction sequence of a program into pieces of a predetermined size, and the instruction header containing branch prediction information that indicates how likely a branch instruction in the payload is to branch to an instruction contained in neither that instruction payload nor the next instruction payload;
an instruction packet separation unit that separates the instruction packet held in the instruction packet holding unit into the instruction payload and the instruction header;
an instruction execution unit that executes the instruction sequence included in the instruction payload separated by the instruction packet separation unit;
a branch prediction information determination unit that instructs suppression of prefetch of the next instruction packet when, based on the branch prediction information included in the instruction header, a branch instruction included in the corresponding instruction payload is highly likely to branch to an instruction contained in neither that instruction payload nor the next instruction payload; and
an instruction prefetch unit that, unless the prefetch suppression is instructed, executes prefetch of the next instruction packet and supplies it to the instruction packet separation unit.

5. An instruction fetch method comprising:
an instruction packet separation procedure of separating an instruction packet held in an instruction packet holding unit into its instruction payload and its instruction header, the instruction payload being obtained by dividing the instruction sequence of a program into pieces of a predetermined size, and the instruction header containing branch prediction information that indicates how likely a branch instruction in the payload is to branch to an instruction contained in neither that instruction payload nor the next instruction payload;
a branch prediction information determination procedure of instructing suppression of prefetch of the next instruction packet when, based on the branch prediction information included in the instruction header, a branch instruction included in the corresponding instruction payload is highly likely to branch to an instruction contained in neither that instruction payload nor the next instruction payload; and
an instruction prefetch procedure of executing prefetch of the next instruction packet unless the prefetch suppression is instructed.

6. An instruction packet generation device comprising:
an instruction packet generation unit that generates instruction packets, each including an instruction header and an instruction payload obtained by dividing the instruction sequence of a program into pieces of a predetermined size;
a branch prediction information setting unit that, for each instruction payload, sets branch prediction information in the corresponding instruction header, the information indicating how likely a branch instruction in that payload is to branch to an instruction contained in neither that instruction payload nor the next instruction payload; and
an instruction packet holding unit that holds the instruction packets including the instruction headers in which the branch prediction information is set.

7. The instruction packet generation device according to claim 6, further comprising an instruction compression unit. The instruction compression unit acts when the branch prediction information of two consecutive instruction packets both indicate a high likelihood of branching to an instruction contained in neither that packet's instruction payload nor the next instruction payload. It then compresses the instructions between the two branch instructions included in those instruction packets, so that both branch instructions fit in the same instruction payload.
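The packet format and flag handling in the claims can be illustrated with a small Python model. It is a sketch under stated assumptions, not the claimed hardware. `far_branch_indices` is a hypothetical input standing in for whatever branch profile marks a branch as likely to leave its payload and the next one. Instructions are plain strings. The model detects the condition of claim 7 but does not perform the dictionary-based compression itself.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class InstructionPacket:
    # Instruction header: 1 means a branch out of this payload and the next is likely.
    branch_prediction_flag: int
    # Instruction payload: a fixed-size slice of the program's instruction sequence.
    payload: List[str]


def generate_packets(instructions: List[str], payload_size: int,
                     far_branch_indices: Set[int]) -> List[InstructionPacket]:
    """Split the program into payloads and set each header's flag (cf. claim 6)."""
    packets = []
    for start in range(0, len(instructions), payload_size):
        chunk = instructions[start:start + payload_size]
        flagged = any(start <= i < start + len(chunk) for i in far_branch_indices)
        packets.append(InstructionPacket(int(flagged), chunk))
    return packets


def compression_candidates(packets: List[InstructionPacket]) -> List[int]:
    """Indices i where packets i and i+1 are both flagged (cf. claim 7's condition)."""
    return [i for i in range(len(packets) - 1)
            if packets[i].branch_prediction_flag and packets[i + 1].branch_prediction_flag]


def prefetch_next(packets: List[InstructionPacket]) -> List[bool]:
    """For each packet, whether the next packet is prefetched (cf. claims 1, 4, 5)."""
    return [p.branch_prediction_flag == 0 for p in packets]
```

With ten instructions, a payload size of four, and likely far branches at indices 2 and 5, the first two packets are both flagged. Packet 0 is then reported as a compression candidate, and only the last packet would allow a next-packet prefetch.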