JP2012216009A

JP2012216009A - Method for increasing speed of program by performing overtaking control of memory access instruction

Info

Publication number: JP2012216009A
Application number: JP2011080035A
Authority: JP
Inventors: Shukuyu Kudo; 淑裕工藤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2011-03-31
Filing date: 2011-03-31
Publication date: 2012-11-08

Abstract

PROBLEM TO BE SOLVED: To provide a method for executing a program at high speed. SOLUTION: A program execution system comprises: a CPU whose instruction set includes a store instruction having an overtaking permission flag, which indicates whether overtaking by a load instruction is permitted, and an instruction that blocks overtaking by load instructions; a compiler having overtaking permission flag setting means for setting the flag and overtaking block instruction generation means for generating the blocking instruction; and an instruction reorder unit having overtaking block instruction registration means for registering the blocking instruction in an instruction reorder buffer and instruction bypass means for registering a memory access instruction in the instruction reorder buffer while referring to store instructions whose overtaking permission flag is set.

Description

The present invention relates to a technique for executing a program at high speed, in the technical field that uses a CPU having instructions for writing data to a storage device and reading data from a storage device.

Conventionally, some CPUs implement a plurality of instruction pipelines in order to raise instruction-level parallelism and execute programs at high speed.

FIG. 1 is a diagram showing the flow of an instruction pipeline in a conventional example. The instruction pipeline 11 basically consists of four stages, instruction fetch 12, instruction decode 13, execute 14, and read/write 15, and each stage can process several instructions at the same time. FIG. 1 is an example of an instruction pipeline that can process two instructions simultaneously. For a memory access instruction, the address calculation is handled in the execute stage 14 and the memory read or write in the read/write stage 15. Buffers, which are temporary storage areas, are provided before and after each stage, and the result of a stage waits in a buffer until the next stage is able to process it.

When memory access instructions are processed simultaneously in such an instruction pipeline and, for example, a load instruction and a store instruction access the same memory area, the correct result is obtained only if the order of the reads and writes performed by those instructions is preserved. One of the two instructions therefore has to wait in the buffer immediately before the read/write stage 15. Because memory access instructions generally take a long time to execute, instructions that need their results likewise wait in the buffers before and after each stage. A stall of the instruction pipeline caused in this way is called a pipeline hazard.

One method of avoiding pipeline hazards and speeding up a program is to perform memory disambiguation of memory access instructions around the read/write stage, so that load instructions are started as early as possible while the order of loads and stores to any given memory area is preserved.

FIG. 2 is a diagram showing a conventional example in which memory disambiguation of memory access instructions is performed to increase speed. The six-line instruction sequence at the upper left of the figure is an assembler image of the instructions input to the instruction pipeline. The number 21 in the leftmost column is the input order, and the next column 22 indicates whether the instruction is a load or a store. The middle field 23 of the right-hand part gives the address of the area to be loaded or stored, and the number 24 at the right end gives the size to be loaded or stored. The address calculation unit 25 corresponds to the execute stage 14 and calculates the address of the memory area accessed by a memory access instruction; it receives the result of the instruction decode stage 13. The instruction reorder unit 26 performs memory disambiguation and controls the execution order of load and store instructions. The memory access unit 28 corresponds to the read/write stage 15. The address calculation unit 25 and the memory access unit 28 are each assumed to be able to process two instructions simultaneously. The instruction reorder buffer 29 corresponds to the buffer immediately before the read/write stage 15 of the instruction pipeline.

When the instruction reorder unit 26 receives an instruction and its calculated address from the address calculation unit 25, it registers them in a free entry of the instruction reorder buffer 29. Each row of the table at the lower right of FIG. 2 is one entry. If no entry is free, the unit waits until one becomes free. An entry of the instruction reorder buffer holds the entry “number” 211, the “type” 212 (load or store), the “size” 213 to be loaded or stored, the calculated “address” 214, and the “preceding N” fields 215, which hold the numbers of the instruction reorder buffer entries of the instructions that must precede this one (N is an integer from 1 to the number of entries in the instruction reorder buffer minus 1; in FIG. 2, one of 1, 2, 3, 4, or 5). An “instruction that must precede” is an instruction whose execution must be complete before execution of the instruction in question starts, where “execution complete” means completion of the read/write stage 15 of the instruction pipeline. In the assembler image input to the instruction pipeline, the store instruction 218 whose input order 21 is 4 must not update the memory at address address1 before the load instructions of order 1 and order 3 have finished, so it has to wait for them to complete. For this reason, the “preceding 1” and “preceding 2” fields of entry 210 (number 5), which corresponds to the instruction of order 4, hold “1”, the number of the entry for the load instruction of order 1, and “3”, the number of the entry for the load instruction of order 3. Likewise, the load instruction 217 whose input order is 6 must not start before the store instruction 218 of order 4 has completed, so “5” is registered in the “preceding 1” field of entry 216 (number 4), which corresponds to the load instruction 217.

The instruction reorder unit 26 passes to the memory access unit 28 the instructions of those entries in which no “preceding N” field holds a number (shown as “-” in the instruction reorder buffer 29). When execution of an instruction completes, the memory access unit 28 sends an instruction execution completion notification 27 to the instruction reorder unit 26. On receiving the notification, the instruction reorder unit 26 clears the entry of the completed instruction, making it a free entry, and clears that entry's number (sets it to “-”) wherever it was registered in a “preceding N” field of another entry. When the instruction reorder buffer 29 has free space as a result, a new instruction is registered in it, and instructions are executed one after another in this way.
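The conventional bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit of FIG. 2: the class and method names (`Entry`, `ReorderBuffer`, `register`, `issuable`, `complete`), the byte-overlap test, the first-free-slot allocation, and the address values are all assumptions, and in-flight tracking is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    number: int                 # entry "number" (211)
    kind: str                   # "load" or "store" ("type", 212)
    size: int                   # bytes accessed ("size", 213)
    address: int                # calculated "address" (214)
    preceding: set = field(default_factory=set)  # "preceding N" fields (215)


def overlaps(a, b):
    """True if the two accesses touch at least one common byte."""
    return a.address < b.address + b.size and b.address < a.address + a.size


class ReorderBuffer:
    def __init__(self, capacity):
        self.slots = [None] * capacity

    def register(self, kind, size, address):
        """Place an instruction in the first free entry; None means 'wait'."""
        free = next((i for i, e in enumerate(self.slots) if e is None), None)
        if free is None:
            return None
        new = Entry(free + 1, kind, size, address)
        for e in self.slots:
            # Two loads never need ordering; any overlapping pair with a store does.
            if e is not None and "store" in (e.kind, new.kind) and overlaps(e, new):
                new.preceding.add(e.number)
        self.slots[free] = new
        return new.number

    def issuable(self):
        """Numbers of entries whose 'preceding N' fields are all clear."""
        return [e.number for e in self.slots if e is not None and not e.preceding]

    def complete(self, number):
        """Handle an execution completion notification for one entry."""
        self.slots[number - 1] = None
        for e in self.slots:
            if e is not None:
                e.preceding.discard(number)


rob = ReorderBuffer(6)
ADDRESS1, ADDRESS2 = 0x1000, 0x2000   # hypothetical address values
rob.register("load", 8, ADDRESS1)     # entry 1
rob.register("load", 8, ADDRESS2)     # entry 2
rob.register("load", 8, ADDRESS1)     # entry 3
rob.register("store", 8, ADDRESS1)    # entry 4: waits for entries 1 and 3
rob.register("load", 8, ADDRESS1)     # entry 5: waits for entry 4
print(rob.issuable())
```

In this model the store only becomes issuable after `rob.complete(1)` and `rob.complete(3)`, and the final load only after `rob.complete(4)`, which mirrors the waiting relationships described for the store instruction 218 and the load instruction 217.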

There are also instructions, such as SIMD (Single Instruction, Multiple Data) instructions and vector instructions, that load or store data for a plurality of memory areas with a single instruction.

FIG. 3 is a diagram showing a conventional example with vector instructions that can load from or store to four memory areas with a single instruction. The DO loop 31, written in Fortran, performs a vector gather and a vector scatter. For the DO loop 31, the compiler generates the instruction sequence 32. reg5, shown as register 37, is a register that can hold four values (called a vector register). Registers reg1, reg2, reg3, reg4, and reg6 are also vector registers. The vector load instruction 33 reads, with one instruction, four consecutive data items of 8 bytes each from the memory area of array IDX2 (38), whose start address is address1. FIG. 3 shows the second iteration of the instruction sequence 32, in which values are read from addresses address1 + 0, address1 + 8, address1 + 16, and address1 + 24 and written to vector register reg1. The four values 2, 5, 6, and 6 have been written to vector register reg1 (39). The vector gather instruction 34 calculates addresses from the four values in vector register reg1 (39) and gathers values from the four memory areas. Because reg1 holds 2, 5, 6, and 6, the vector gather instruction 34 reads values from addresses address3 + (2-1)×8, address3 + (5-1)×8, and address3 + (6-1)×8 of array Y (310) and writes them to vector register reg3 (311). Using the results obtained in this way, the vector add instruction 35 performs four additions with one instruction, and the vector scatter instruction 36 writes the result to memory.
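The address arithmetic of the gather step can be illustrated with a short Python sketch. The function names, the dictionary used as memory, the value chosen for address3, and the contents of array Y are assumptions made for illustration; only the 8-byte element size, the 1-based indexing, and the index values 2, 5, 6, 6 come from the description.

```python
ELEMENT_SIZE = 8  # bytes per element, as in the description


def gather_addresses(base, indices):
    """Addresses touched by a vector gather or scatter with 1-based indices."""
    return [base + (i - 1) * ELEMENT_SIZE for i in indices]


def vector_gather(memory, base, indices):
    """Read one element per index into a list standing in for a vector register."""
    return [memory[a] for a in gather_addresses(base, indices)]


def vector_scatter(memory, base, indices, values):
    """Write one element per index; a repeated index is overwritten in order."""
    for a, v in zip(gather_addresses(base, indices), values):
        memory[a] = v


ADDRESS3 = 0x3000                     # hypothetical start address of array Y
reg1 = [2, 5, 6, 6]                   # index values held in vector register reg1
memory = {ADDRESS3 + k * ELEMENT_SIZE: float(k + 1) for k in range(8)}  # Y(i) = i

addresses = gather_addresses(ADDRESS3, reg1)
reg3 = vector_gather(memory, ADDRESS3, reg1)
print([hex(a) for a in addresses], reg3)
```

Because the index 6 appears twice, the four lanes touch only three distinct addresses, which is why the description lists three address expressions for four values.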

Patent Document 1 describes an arithmetic processing technique that eliminates, as far as possible, the processing waits that arise during address dependency checks and thereby speeds up processing that involves memory access.

Japanese Unexamined Patent Application Publication No. 2009-026260 (JP 2009-026260 A)

However, because memory disambiguation of memory access instructions works as described above, the only way to increase the number of load instructions that can be executed early, and thus to speed up the program, is to increase the number of entries in the instruction reorder buffer 29 and the number of “preceding N” columns 215. The physical constraints of the CPU make it difficult to increase them. Moreover, the more they are increased, the greater the processing load on the instruction reorder unit 26 and the smaller the performance gain.

In addition, because SIMD and vector instructions are processed as described above, completion of the instructions that must precede has to be awaited for every memory area they access. To start load instructions as early as possible and speed up the program, the addresses of all areas accessed by those instructions would have to be registered in the instruction reorder buffer, as shown in FIG. 4. Furthermore, when the instruction reorder unit 26 registers an instruction whose address calculation has finished, it has to examine all addresses in every entry to find the instructions that must precede, and when it receives an instruction execution completion notification 27 it again has to examine all addresses in every entry and clear them. Not only does the area for registering addresses grow, but the time needed for the checks grows as well. Given the physical constraints of the CPU and the small performance benefit, a program cannot be sped up by executing SIMD or vector load instructions early.

In view of these circumstances, the present invention aims to provide a technique for executing a program at high speed by providing a CPU whose instruction set includes a store instruction having a flag that indicates whether overtaking by a load instruction is permitted and an instruction that blocks overtaking by load instructions, together with a compiler that sets the flag and generates the blocking instruction.

The present invention comprises: a CPU whose instruction set includes a store instruction having an overtaking permission flag, which indicates whether overtaking by a load instruction is permitted, and an instruction that blocks overtaking by load instructions; a compiler having overtaking permission flag setting means, which sets the flag, and overtaking block instruction generation means, which generates the blocking instruction; and an instruction reorder unit having overtaking block instruction registration means, which registers the blocking instruction in an instruction reorder buffer, and instruction bypass means, which registers a memory access instruction in the instruction reorder buffer while referring to store instructions whose overtaking permission flag is set.

In this specification, writing data to memory is called a store and an instruction that does so a store instruction; reading data from memory is called a load and an instruction that does so a load instruction. Load instructions and store instructions are collectively called memory access instructions.

According to the present invention, an instruction that follows a first instruction whose overtaking permission flag is set to ON is processed at the same time as the first instruction or ahead of it. This shortens the processing time and speeds up the program.

FIG. 1 is a diagram showing the flow of an instruction pipeline in a conventional example.
FIG. 2 is a diagram showing a conventional example in which memory disambiguation of memory access instructions is performed to increase speed.
FIG. 3 is a diagram showing a conventional example with vector instructions that can load from or store to four memory areas with a single instruction.
FIG. 4 is an example of an instruction reorder buffer.
FIG. 5 is a block diagram showing the schematic configuration of a program execution system according to one embodiment of the present invention.
FIG. 6 is a flowchart showing the operation of the overtaking permission flag setting means in this embodiment.
FIG. 7 is a flowchart showing the operation of the overtaking block instruction generation means in this embodiment.
FIG. 8 is a flowchart showing the operation of the overtaking block instruction registration means in this embodiment.
FIG. 9 is a flowchart showing the operation of the instruction bypass means in this embodiment.
FIG. 10 is a diagram showing the effect of this embodiment in the case of vector instructions.
FIG. 11(A) is a diagram showing an example of a program 111 in another embodiment, and FIG. 11(B) is a flowchart showing the operation of the overtaking permission flag setting means in that embodiment.
FIG. 12 is a block diagram showing the functional configuration of a compiler in another embodiment of the present invention.
FIG. 13 is a diagram showing another embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. Identical elements are given identical reference numerals, and duplicate descriptions are omitted.

FIG. 5 is a block diagram showing the schematic configuration of a program execution system according to one embodiment of the present invention. As shown in the figure, in this embodiment the CPU instruction set includes store instructions 54, 55, and 56, which have an overtaking permission flag, and an instruction 57 that blocks overtaking instructions (hereinafter the “overtaking block instruction”).

The compiler 51 of this embodiment has overtaking permission flag setting means 52 and overtaking block instruction generation means 53. When a directive 516 that sets the overtaking permission flag of the store instructions in a loop of the program to ON (overtaking permitted) is specified, the overtaking permission flag setting means 52 sets the overtaking permission flag of the store instructions 54, 55, and 56 in that loop to ON. When a directive 515 that specifies blocking the overtaking of load instructions from a certain point in the program onward is specified, the overtaking block instruction generation means 53 generates an overtaking block instruction 57 at that point.

The instruction reorder unit 59 performs memory disambiguation and controls the execution order of load and store instructions. The instruction reorder unit 59 of this embodiment has overtaking block instruction registration means 510, which registers the overtaking block instruction 57 in the instruction reorder buffer 517, and instruction bypass means 511, which registers a load instruction in the instruction reorder buffer 517 without treating a store instruction whose overtaking permission flag is ON as an instruction that must precede it, while treating any overtaking block instruction 57 in the instruction reorder buffer 517 as an instruction that must precede it. The instruction reorder unit 59 also includes the operations and functions described for the instruction reorder unit 26.

The address calculation unit 58 corresponds to the execute stage 14 and calculates the address of the memory area accessed by a memory access instruction; it receives the result of the instruction decode stage 13. The memory access unit 513 corresponds to the read/write stage 15. In this embodiment the address calculation unit 58 and the memory access unit 513 are each assumed to be able to process two instructions simultaneously, but the number of instructions that can be processed in parallel is not limited to two. The instruction reorder buffer 517 corresponds to the buffer immediately before the read/write stage 15 of the instruction pipeline. The instruction execution completion notification 512 is the same kind of notification as the instruction execution completion notification 27.

The program execution system of this embodiment is realized by a computer. It includes a CPU that controls the operation and processing of the system, a ROM and/or RAM that stores programs and necessary data, an input/output interface for input from and output to the outside, a communication interface for data communication with the outside, and a bus or the like connecting these. In this specification, “means” does not refer only to physical means; it also covers the case where the function of the means is realized by software, that is, by the CPU executing a predetermined program stored in memory or in an external storage device. The function of one means may be realized by two or more physical means, and the functions of two or more means may be realized by one physical means.

Next, the operation of this embodiment is described with reference to FIGS. 5 to 9.

FIG. 6 is a flowchart showing the operation of the overtaking permission flag setting means in this embodiment. The overtaking permission flag setting means 52 of the compiler 51 is invoked when the compiler processes a store instruction. As shown in FIG. 6, it first determines whether the store instruction is contained in a loop (S61). If the store instruction is not in a loop, it sets the overtaking permission flag of the store instruction to OFF (S64). If the store instruction is in a loop, it determines whether a directive 516 that sets the overtaking permission flag to ON has been specified for the loop (S62). If no such directive has been specified, it sets the overtaking permission flag of the store instruction to OFF (S64). If the directive has been specified, it sets the overtaking permission flag of the store instruction to ON (S63).
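The decision in FIG. 6 reduces to two checks, which the following Python sketch makes explicit. The function name and its two boolean parameters are hypothetical stand-ins for whatever information the compiler keeps about a store instruction and its enclosing loop.

```python
def overtaking_flag_for_store(store_in_loop: bool, loop_has_on_directive: bool) -> bool:
    """Return True for flag ON, False for flag OFF, following steps S61 to S64."""
    if not store_in_loop:            # S61: is the store inside a loop?
        return False                 # S64: flag OFF
    if not loop_has_on_directive:    # S62: was directive 516 given for the loop?
        return False                 # S64: flag OFF
    return True                      # S63: flag ON


for in_loop in (False, True):
    for directive in (False, True):
        print(in_loop, directive, overtaking_flag_for_store(in_loop, directive))
```

The flag is ON only when both conditions hold, so a store outside any loop keeps the conventional ordering even if a directive appears elsewhere in the program.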

FIG. 7 is a flowchart showing the operation of the overtaking block instruction generation means in this embodiment. The overtaking block instruction generation means 53 of the compiler 51 is invoked immediately before the compiler processes a statement. As shown in FIG. 7, it first determines whether a directive 515 that specifies blocking the overtaking of load instructions has been specified immediately before the statement (S71). If not, the instructions corresponding to the current statement are generated as in the conventional case (S73). If the directive has been specified, an overtaking block instruction is generated (S72), after which the instructions corresponding to the current statement are generated as in the conventional case (S73).
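As a rough illustration of FIG. 7, the sketch below walks a list of statements and emits a blocking instruction ahead of any statement that is preceded by the blocking directive. The representation of statements as `(text, has_block_directive)` pairs and the emitted strings `"overtake_block"` and `"code_for(...)"` are placeholders invented for this example, not the compiler's real intermediate form.

```python
def generate_instructions(statements):
    """statements: list of (text, has_block_directive) pairs, in program order."""
    emitted = []
    for text, has_block_directive in statements:
        if has_block_directive:               # S71: directive 515 before this statement?
            emitted.append("overtake_block")  # S72: emit the overtaking block instruction
        emitted.append(f"code_for({text})")   # S73: conventional code generation
    return emitted


program = [
    ("A(I) = B(I)", False),
    ("C = A(1)", True),    # hypothetical statement preceded by directive 515
]
print(generate_instructions(program))
```

The blocking instruction always lands immediately before the code of the statement that carries the directive, so loads generated for that statement and later ones cannot pass the point the programmer marked.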

FIG. 8 is a flowchart showing the operation of the overtaking block instruction registration means in this embodiment. The overtaking block instruction registration means 510 of the instruction reorder unit 59 is invoked when the instruction reorder unit 59 registers an overtaking block instruction 57 in the instruction reorder buffer 517. As shown in FIG. 8, the overtaking block instruction registration means 510 takes an entry from the instruction reorder buffer 517 (S81) and determines whether it is an empty entry (S82). If the entry is not empty, its number is registered as an instruction that must precede the overtaking block instruction (S83). If the entry is empty, nothing is registered for it. In either case, it is then determined whether the entry is the last entry of the instruction reorder buffer 517 (S84). If it is not the last entry, the next entry is taken from the instruction reorder buffer 517 (S81) and the processing is repeated. If it is the last entry, the processing ends.
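The scan in FIG. 8 amounts to collecting the numbers of all occupied entries. A minimal Python sketch follows, assuming the buffer is modelled as a list in which `None` marks an empty entry and entry numbers start at 1; the function name is hypothetical.

```python
def predecessors_of_block_instruction(entries):
    """Return the entry numbers that must precede a new overtaking block instruction."""
    preceding = []
    for number, entry in enumerate(entries, start=1):  # S81, repeated until S84 says "last"
        if entry is not None:                          # S82: skip empty entries
            preceding.append(number)                   # S83: occupied entry must precede
    return preceding


buffer_state = ["load", None, "store", None, "load", None]  # hypothetical contents
print(predecessors_of_block_instruction(buffer_state))
```

Since every instruction already in the buffer becomes a predecessor, the blocking instruction completes only after all earlier memory accesses have completed.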

FIG. 9 is a flowchart showing the operation of the instruction bypass means in this embodiment. The instruction bypass means 511 of the instruction reorder unit 59 is invoked when the instruction reorder unit 59 registers an instruction in the instruction reorder buffer 517. As shown in FIG. 9, an entry is first taken from the instruction reorder buffer 517 (S91), and it is determined whether the entry taken out is that of an overtaking block instruction (S93). If it is, the number of this entry is registered as an instruction that must precede the instruction being registered (S94). It is then determined whether the entry taken out was the last entry of the instruction reorder buffer 517 (S95). If it was not the last entry, the next entry is taken out (S91). If it was the last entry, the instruction is registered in a free entry of the instruction reorder buffer 517 as in the conventional case (S96).

If the entry is not that of an overtaking block instruction in step S93, it is determined whether the entry is that of a store instruction (S97). If it is not, that is, if it is the entry of a load instruction, it is determined whether the instruction being registered is a load instruction (S910). If it is a load instruction, no ordering is recorded for this entry: it is determined whether the entry taken out was the last entry of the instruction reorder buffer 517 (S95), and either the next entry is processed (S91) or the instruction is registered (S96) and the processing ends.

If the instruction being registered is not a load instruction in step S910, that is, if it is a store instruction, memory disambiguation of the memory access instructions is performed and, if necessary, the entry number is registered as an instruction that must precede. For a SIMD or vector instruction, the instruction registered immediately before is registered as the instruction that must precede (S911). Step S911 is the same processing as in the conventional case. It is then determined whether the entry taken out was the last entry of the instruction reorder buffer 517 (S95), and either the next entry is processed (S91) or the instruction is registered (S96) and the processing ends.

If the entry is found in step S97 to belong to a store instruction, it is determined whether the instruction being registered is a load instruction (S98). If it is not, that is, if it is a store instruction, memory disambiguation is performed for the memory access instructions and, where necessary, the entry number is recorded as an instruction that must precede it. If the instruction is a SIMD instruction or a vector instruction, the most recently registered instruction is recorded as the instruction that must precede it (S911). It is then determined whether the entry just examined was the last entry in the instruction reorder buffer 517 (S95); then either the next entry is processed (S91), or the instruction is registered (S96) and processing ends.

If the instruction being registered is found in step S98 to be a load instruction, it is determined whether the overtaking permission flag of the store instruction held in the entry is ON (S99). If the flag is OFF (overtaking not permitted), memory disambiguation is performed for the memory access instructions and, where necessary, the entry number is recorded as an instruction that must precede it. If the instruction is a SIMD instruction or a vector instruction, the most recently registered instruction is recorded as the instruction that must precede it (S911). It is then determined whether the entry just examined was the last entry in the instruction reorder buffer 517 (S95); then either the next entry is processed (S91), or the instruction is registered (S96) and processing ends.

If the overtaking permission flag is found in step S99 to be ON (overtaking permitted), no dependency is recorded. It is determined whether the entry just examined was the last entry in the instruction reorder buffer 517 (S95); then either the next entry is processed (S91), or the instruction is registered (S96) and processing ends.

In this way, when there is a store instruction whose overtaking permission flag is ON, memory disambiguation for a subsequent load instruction can be omitted, and load instructions of the SIMD or vector type can be executed ahead of the store, which speeds up program execution.
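The decision flow of FIG. 9 can be sketched in Python as follows. This is only an illustrative model, not the circuit described in the patent: the `Entry` class, the function names, and the reduction of memory disambiguation to a comparison of scalar addresses are all assumptions made for the example. For SIMD and vector instructions, "the most recently registered instruction" is read here as the last entry currently in the buffer.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Entry:
    """Hypothetical reorder-buffer entry (names are not from the patent)."""
    kind: str                      # "block", "store", or "load"
    addr: Optional[int] = None     # scalar address, used only by the toy disambiguation
    overtake_ok: bool = False      # overtaking permission flag (meaningful for stores)
    is_vector: bool = False        # SIMD or vector instruction


def conventional_check(entries: List[Entry], idx: int, new: Entry, deps: Set[int]) -> None:
    """Stand-in for step S911 (an assumed simplification)."""
    if new.is_vector:
        # Vector/SIMD: depend on the most recently registered entry.
        deps.add(len(entries) - 1)
    elif new.addr == entries[idx].addr:
        # Scalar: depend on the entry only if the addresses collide.
        deps.add(idx)


def must_precede(entries: List[Entry], new: Entry) -> Set[int]:
    """Return the entry numbers that must precede `new` (loop S91/S95)."""
    deps: Set[int] = set()
    for idx, entry in enumerate(entries):
        if entry.kind == "block":                        # S93
            deps.add(idx)                                # S94
        elif entry.kind == "load":                       # S97: not a store
            if new.kind == "store":                      # S910
                conventional_check(entries, idx, new, deps)
        elif new.kind == "load" and entry.overtake_ok:   # S98, S99: flag ON
            continue                                     # no dependency recorded
        else:                                            # store/store, or flag OFF
            conventional_check(entries, idx, new, deps)
    return deps
```

Under these assumptions, a vector load registered behind a vector store whose flag is ON receives no dependency, while the same load behind a store whose flag is OFF, or behind an overtaking block instruction, is made to wait for it.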

FIG. 10 is a diagram illustrating the effect of this embodiment in the case of vector instructions. Source code 101 is a loop from a sample C program, in which x, y, z, and idx are pointers. Inside the loop, values are vector-loaded from the locations referenced by two of the pointers, and the result of their vector addition is stored by instruction 102. A store such as instruction 102 is generally called a vector scatter. Directive 103 sets the overtaking permission flag of the store instructions in the loop to ON (overtaking permitted). For this loop, the compiler of the present invention generates vector instruction sequence 104 for each iteration of the loop in source code 101. VSC instruction 105 is a store instruction (vector scatter instruction) whose overtaking permission flag is set to ON. VLD is a vector load instruction and VADD is a vector add instruction. During program execution, instruction sequence 104 is issued repeatedly into the instruction pipeline. Instruction sequence 106 shows the order in which the instructions of sequence 104 enter the pipeline, and the leftmost column 1010 gives the issue order. VLD instruction 109 is issued first. Instruction 108 is the first VLD instruction of the next iteration, and instruction 107 is the first VLD instruction of the iteration after that.

Time series 1011 shows how instructions are processed in the read/write stage of the instruction pipeline under the prior art. Instructions flow from right to left. Conventionally, vector instructions were never allowed to overtake, so VLD instruction 1013 could not be processed at the same time as, or before, VSC instruction 1012. The same applies to VLD instruction 1022, which follows the later VSC instruction 1021.

In contrast, time series 1014 shows how instructions are processed in the read/write stage of the instruction pipeline under the present invention. Because the overtaking permission flag of VSC instruction 1016 is set to ON, the VLD instruction 1015 that follows it is processed at the same time as VSC instruction 1016. The overtaking permission flag of VSC instruction 1017 is also ON, so the VLD instructions 1018 and 1019 that follow it are processed ahead of VSC instruction 1017. Execution is thereby shortened by the interval shown as time 1020, and the program runs faster.
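The effect on the read/write stage can be illustrated with a deliberately small cycle-counting model. It is a toy under stated assumptions, not a reproduction of FIG. 10: the stage is assumed to handle two instructions per cycle, instructions are packed strictly in program order, and the only rule modeled is that a load may not share a cycle with an earlier store whose overtaking permission flag is OFF. The function name and the tuple encoding are hypothetical, and true out-of-order overtaking is not modeled.

```python
from typing import List, Tuple


def rw_stage_cycles(ops: List[Tuple[str, bool]], width: int = 2) -> int:
    """Count read/write-stage cycles for ops given as (kind, overtake_ok).

    kind is "load" or "store"; overtake_ok is the store's permission flag
    and is ignored for loads.
    """
    cycles = 0
    slots_used = width               # forces a new cycle for the first op
    blocking_store_in_cycle = False  # a flag-OFF store sits in the current cycle
    for kind, overtake_ok in ops:
        blocked = kind == "load" and blocking_store_in_cycle
        if slots_used >= width or blocked:
            cycles += 1              # open a new cycle
            slots_used = 0
            blocking_store_in_cycle = False
        slots_used += 1
        if kind == "store" and not overtake_ok:
            blocking_store_in_cycle = True
    return cycles


def loop_body(flag_on: bool) -> List[Tuple[str, bool]]:
    """One iteration as seen by the read/write stage: VLD, VLD, VSC."""
    return [("load", False), ("load", False), ("store", flag_on)]


conventional = rw_stage_cycles(loop_body(False) * 4)
with_flag_on = rw_stage_cycles(loop_body(True) * 4)
```

In this model, four iterations take 8 cycles when every store has its flag OFF and 6 cycles when every store has its flag ON, because a load from the next iteration can fill the slot next to each store.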

As another embodiment of the present invention, the programmer may specify the directive that sets the overtaking permission flag to ON for individual statements rather than for a whole loop.

FIG. 11(A) is a diagram showing an example of a program 111 in this other embodiment. FIG. 11(B) is a flowchart showing the operation of the overtaking permission flag setting means in this embodiment. Directive 112 applies to the statement immediately following it. The overtaking permission flag setting means first determines whether a directive setting the overtaking permission flag to ON has been specified for the statement (S113). If it has, the overtaking permission flag of the store instruction generated for the statement is set to ON (S114). Otherwise, the overtaking permission flag of the store instruction is set to OFF (S115).
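The per-statement decision of FIG. 11(B) amounts to a single branch, sketched below. The representation of a statement as a `(text, has_directive)` pair, the assumption that each statement yields exactly one store instruction, and the function name are illustrative choices, not part of the patent.

```python
from typing import Dict, List, Tuple, Union


def set_overtake_flags(
    statements: List[Tuple[str, bool]]
) -> List[Dict[str, Union[str, bool]]]:
    """Decide the overtaking permission flag for each statement's store."""
    stores = []
    for text, has_directive in statements:
        if has_directive:          # S113: directive specified for this statement?
            flag_on = True         # S114: flag ON
        else:
            flag_on = False        # S115: flag OFF
        stores.append({"statement": text, "overtake_ok": flag_on})
    return stores
```

Only the statement that directly follows a directive gets a flag-ON store; every other store keeps the conservative OFF setting.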

FIG. 12 is a block diagram showing the functional configuration of a compiler in another embodiment of the present invention. In this embodiment, instead of having the programmer attach to loops or statements a directive that sets the overtaking permission flag to ON, or a directive that blocks overtaking by load instructions, overtakable store instruction detection means 121 is added to the compiler as shown in FIG. 12. The compiler then analyzes the program and determines automatically where to generate overtaking block instructions and which store instructions should have their overtaking permission flag set to ON or OFF. The detection result is used by the overtaking permission flag setting means 122 and the overtaking block instruction generation means 123.

FIG. 13 is a diagram showing another embodiment of the present invention. In this embodiment, instead of providing the store instruction with an overtaking permission flag, an instruction that indicates overtaking permission (an overtaking permission instruction) is added, and store instructions in the section between that instruction and an overtaking block instruction are treated as overtaking-permitted. In the figure, OVT 135 is the overtaking permission instruction. It is generated at the position specified by directive 1317 by the overtaking permission instruction generation means 132 of compiler 131. Instruction sequence 134 is the instruction sequence that compiler 131 of this embodiment generates for loop 1316. The instructions 138, from LABEL to the final IF, correspond to one iteration of the loop. OVT 135, the overtaking permission instruction, is generated immediately before them. At the loop exit, overtaking block instruction SOVT 137, corresponding to directive 1318, is generated. The overtaking block instruction is generated at the specified position by the overtaking block instruction generation means 133 of compiler 131. All store instructions 136 executed between OVT 135 and SOVT 137 are therefore overtaking-permitted.

The address calculation unit 139, instruction execution completion notification 1314, memory access unit 1315, and instruction reorder buffer 1319 are configured in the same way as the corresponding components of the embodiment shown in FIG. 5.

The instruction reorder unit 1310 of this embodiment includes overtaking permission/block instruction registration means 1311. Both the overtaking permission instruction and the overtaking block instruction are registered in the instruction reorder buffer 1319 by the overtaking permission/block instruction registration means 1311, following the same procedure as the overtaking block instruction registration means of FIG. 8.

Instead of determining whether the overtaking permission flag is ON (S99 in FIG. 9), the instruction bypass means 1312 of this embodiment determines whether no overtaking block instruction has been executed since the overtaking permission instruction was executed. If none has been executed, overtaking is permitted, so it is determined whether the entry just examined was the last entry in the instruction reorder buffer 1319 (S95); then either the next entry is processed (S91), or the instruction is registered (S96) and processing ends. If an overtaking block instruction has been executed, memory disambiguation is performed for the memory access instructions and, where necessary, the entry number is recorded as an instruction that must precede the instruction being registered.

In that case, if the instruction is a SIMD instruction or a vector instruction, the most recently registered instruction is recorded as the instruction that must precede it (S911). It is then determined whether the entry just examined was the last entry in the instruction reorder buffer 1319 (S95); then either the next entry is processed (S91), or the instruction is registered (S96) and processing ends. In all other respects the operation is the same as that of the instruction bypass means of FIG. 9.
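The region-based rule of this embodiment can be sketched as a single pass over an instruction stream. The mnemonics `OVT` and `SOVT` come from FIG. 13; the `ST` and `LD` mnemonics, the list-of-strings encoding, and the function name are assumptions made for the example.

```python
from typing import List, Tuple


def mark_overtakable_stores(stream: List[str]) -> List[Tuple[int, bool]]:
    """Return (position, overtaking_permitted) for every store in the stream.

    A store is overtaking-permitted when an OVT has been executed and no
    SOVT has been executed since then.
    """
    permitted = False                 # nothing is permitted before the first OVT
    marks = []
    for pos, op in enumerate(stream):
        if op == "OVT":               # overtaking permission instruction
            permitted = True
        elif op == "SOVT":            # overtaking block instruction
            permitted = False
        elif op == "ST":              # hypothetical store mnemonic
            marks.append((pos, permitted))
    return marks
```

For a stream such as `["ST", "OVT", "LD", "ST", "ST", "SOVT", "ST"]`, only the two stores between `OVT` and `SOVT` are marked as overtaking-permitted; the stores before `OVT` and after `SOVT` go through the conventional check.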

As yet another embodiment of the present invention, instead of having the programmer specify the positions of the overtaking permission instruction and the overtaking block instruction with directives, overtakable store instruction detection means may be added to the compiler, as in the compiler of FIG. 12, so that the compiler analyzes the program and determines automatically where to generate overtaking permission instructions and overtaking block instructions.

The present invention is not limited to the embodiments described above and can be implemented in various other forms without departing from its gist. The above embodiments are therefore merely illustrative in every respect and are not to be interpreted restrictively. For example, the processing steps described above may be reordered or executed in parallel as long as doing so does not make the processing inconsistent.

Some or all of the above embodiments may also be described as in the following supplementary notes, but are not limited to them.

(Supplementary Note 1) A program execution system comprising: a processor having an instruction set that includes a store instruction having an overtaking permission flag indicating whether overtaking by a load instruction is permitted, and an overtaking block instruction that blocks overtaking by load instructions; a compiler having overtaking permission flag setting means for setting the overtaking permission flag and overtaking block instruction generation means for generating the overtaking block instruction; and an instruction reorder unit having overtaking block instruction registration means for registering the overtaking block instruction in an instruction reorder buffer and instruction bypass means for registering a memory access instruction in the instruction reorder buffer while referring to the store instruction for which the overtaking permission flag has been set.

(Supplementary Note 2) The program execution system according to Supplementary Note 1, wherein, for a first instruction whose overtaking permission flag is set to ON, the instruction reorder unit registers the instruction following it in the instruction reorder buffer so that the following instruction is processed simultaneously with, or ahead of, the first instruction.

(Supplementary Note 3) A method of executing a program in a program execution system including a processor having an instruction set that includes a store instruction having an overtaking permission flag indicating whether overtaking by a load instruction is permitted, and an overtaking block instruction that blocks overtaking by load instructions, the method comprising: an overtaking permission flag setting step in which a compiler sets the overtaking permission flag; an overtaking block instruction generation step in which the compiler generates the overtaking block instruction; an overtaking block instruction registration step in which an instruction reorder unit registers the overtaking block instruction in an instruction reorder buffer; and an instruction bypass step in which the instruction reorder unit registers a memory access instruction in the instruction reorder buffer while referring to the store instruction for which the overtaking permission flag has been set.

(Supplementary Note 4) The method according to Supplementary Note 3, wherein, for a first instruction whose overtaking permission flag is set to ON, the instruction reorder unit registers the instruction following it in the instruction reorder buffer so that the following instruction is processed simultaneously with, or ahead of, the first instruction.

51, 131: compiler; 52, 122: overtaking permission flag setting means; 53, 123, 133: overtaking block instruction generation means; 58, 139: address calculation unit; 59, 1310: instruction reorder unit; 510: overtaking block instruction registration means; 511, 1312: instruction bypass means; 512, 1314: instruction execution completion notification; 513, 1315: memory access unit; 517, 1319: instruction reorder buffer; 121: overtakable store instruction detection means; 132: overtaking permission instruction generation means; 1311: overtaking permission/block instruction registration means

Claims

1. A program execution system comprising:
a processor having an instruction set that includes a store instruction having an overtaking permission flag indicating whether overtaking by a load instruction is permitted, and an overtaking block instruction that blocks overtaking by load instructions;
a compiler having overtaking permission flag setting means for setting the overtaking permission flag and overtaking block instruction generation means for generating the overtaking block instruction; and
an instruction reorder unit having overtaking block instruction registration means for registering the overtaking block instruction in an instruction reorder buffer, and instruction bypass means for registering a memory access instruction in the instruction reorder buffer while referring to the store instruction for which the overtaking permission flag has been set.

2. The program execution system according to claim 1, wherein, for a first instruction whose overtaking permission flag is set to ON, the instruction reorder unit registers the instruction following it in the instruction reorder buffer so that the following instruction is processed simultaneously with, or ahead of, the first instruction.

3. A method of executing a program in a program execution system including a processor having an instruction set that includes a store instruction having an overtaking permission flag indicating whether overtaking by a load instruction is permitted, and an overtaking block instruction that blocks overtaking by load instructions, the method comprising:
an overtaking permission flag setting step in which a compiler sets the overtaking permission flag;
an overtaking block instruction generation step in which the compiler generates the overtaking block instruction;
an overtaking block instruction registration step in which an instruction reorder unit registers the overtaking block instruction in an instruction reorder buffer; and
an instruction bypass step in which the instruction reorder unit registers a memory access instruction in the instruction reorder buffer while referring to the store instruction for which the overtaking permission flag has been set.

4. The method according to claim 3, wherein, for a first instruction whose overtaking permission flag is set to ON, the instruction reorder unit registers the instruction following it in the instruction reorder buffer so that the following instruction is processed simultaneously with, or ahead of, the first instruction.