JP2015201026A

JP2015201026A - Arithmetic processing device and method for controlling arithmetic processing device

Info

Publication number: JP2015201026A
Application number: JP2014079229A
Authority: JP
Inventors: Sota Sakashita; Toshio Yoshida; Yasunobu Akizuki
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2014-04-08
Filing date: 2014-04-08
Publication date: 2015-11-12
Anticipated expiration: 2034-04-08
Also published as: JP6344022B2

Abstract

PROBLEM TO BE SOLVED: To suppress an increase in the circuit scale of a circuit for controlling a data bypass. SOLUTION: An execution control unit 2 holds the instructions decoded by a decoder unit 1 and successively outputs executable instructions. A memory control unit 4 outputs an access request to a memory 5 on the basis of a memory access instruction received from the execution control unit 2. A second register 8, allocated in correspondence with a first register 7 that holds data used for operations by an arithmetic execution unit 3, temporarily holds data read from the memory 5 before the data is transferred to the first register 7. When the memory control unit 4 outputs a plurality of access requests on the basis of the memory access instruction and the data read from the memory 5 in multiple parts is successively transferred to the second register 8, a bypass control unit 6 causes the execution control unit 2 to output an arithmetic instruction at the timing at which the data is bypassed from the second register 8 to the arithmetic execution unit 3 after the data last read from the memory 5 has been transferred to the second register 8.

Description

The present invention relates to an arithmetic processing device and a method for controlling the arithmetic processing device.

An arithmetic processing device that executes operations has, for example, a bypass function that bypasses data obtained by executing a preceding instruction to an arithmetic unit before the data is stored in a register. This type of arithmetic processing device has a circuit that detects a dependency between a register used by a preceding instruction and a register used by a subsequent instruction, and enables the bypass function when a dependency exists. In addition, a double-precision floating-point register of this type of arithmetic processing device can store either one double-precision floating-point value or two single-precision floating-point values. For example, by storing a single single-precision floating-point value in the upper bits of a double-precision floating-point register, detection of register dependencies for the bypass function can be implemented with existing circuits. This type of arithmetic processing device also has renaming registers that temporarily hold data obtained by operations in the arithmetic unit and data read from the data cache (see, for example, Patent Document 1).

JP 2009-230339 A

For example, when the two single-precision floating-point values to be stored in a double-precision floating-point register are held in different cache lines of the data cache, the two values are read from the data cache at different times. In this case, the arithmetic processing device is provided with a buffer that sequentially holds the data read from each cache line and a circuit that determines when the data can be bypassed to the arithmetic unit. However, adding these circuits increases the circuit scale of the arithmetic processing device.

An object of the arithmetic processing device and the control method disclosed herein is to suppress an increase in the circuit scale of the circuit that controls data bypass when data is read from a memory in multiple parts based on a memory access instruction.

According to one aspect, an arithmetic processing device includes: a decoder unit that decodes instructions; an execution control unit that holds the instructions decoded by the decoder unit and sequentially outputs executable instructions; an arithmetic execution unit that executes an operation based on an arithmetic instruction, received from the execution control unit, indicating execution of the operation; a memory that stores data used by the arithmetic execution unit; a plurality of first registers that hold data used in operations executed by the arithmetic execution unit; a plurality of second registers, each allocated in correspondence with a first register used in an operation, that temporarily hold data obtained by executing an arithmetic instruction or a memory access instruction before the data is transferred to the first register; a register management unit that holds the allocation relationship between the first registers and the second registers and is referenced by the execution control unit; a memory control unit that outputs an access request to the memory based on a memory access instruction, received from the execution control unit, for reading data from the memory; and a bypass control unit that, when the memory control unit outputs a plurality of access requests based on the memory access instruction and the data read from the memory in multiple parts is sequentially transferred to the second register, causes the execution control unit to output the arithmetic instruction at a timing at which the data is bypassed from the second register to the arithmetic execution unit after the data read last from the memory has been transferred to the second register.

According to another aspect, there is provided a method for controlling an arithmetic processing device that includes: a decoder unit that decodes instructions; an execution control unit that holds the instructions decoded by the decoder unit and sequentially outputs executable instructions; an arithmetic execution unit that executes an operation based on an arithmetic instruction, received from the execution control unit, indicating execution of the operation; a memory that stores data used by the arithmetic execution unit; a plurality of first registers that hold data used in operations executed by the arithmetic execution unit; a plurality of second registers, each allocated in correspondence with a first register used in an operation, that temporarily hold data obtained by executing an arithmetic instruction or a memory access instruction before the data is transferred to the first register; and a register management unit that holds the allocation relationship between the first registers and the second registers and is referenced by the execution control unit. In the method, a memory control unit of the arithmetic processing device outputs an access request to the memory based on a memory access instruction, received from the execution control unit, for reading data from the memory. When the memory control unit outputs a plurality of access requests based on the memory access instruction and the data read from the memory in multiple parts is sequentially transferred to the second register, a bypass control unit of the arithmetic processing device causes the execution control unit to output the arithmetic instruction at a timing at which the data is bypassed from the second register to the arithmetic execution unit after the data read last from the memory has been transferred to the second register.

The arithmetic processing device and the control method disclosed herein can suppress an increase in the circuit scale of the circuit that controls data bypass when data is read from a memory in multiple parts based on a memory access instruction.

FIG. 1 is a diagram showing an embodiment of an arithmetic processing device and a method for controlling the arithmetic processing device.
FIG. 2 is a diagram showing another embodiment of the arithmetic processing device and the method for controlling the arithmetic processing device.
FIG. 3 is a diagram showing an example of the floating-point register unit and the fixed-point register unit shown in FIG. 2.
FIG. 4 is a diagram showing an example of the renaming register units shown in FIG. 2.
FIG. 5 is a diagram showing an example of the register management table shown in FIG. 2.
FIG. 6 is a diagram showing an example of loading data from the data cache shown in FIG. 2.
FIG. 7 is a diagram showing an example of a set signal generation circuit provided in the bypass control unit shown in FIG. 2.
FIG. 8 is a diagram showing an example of a bypass management table and a table control circuit provided in the bypass control unit.
FIG. 9 is a diagram showing an example of the operation of the bypass control unit shown in FIGS. 7 and 8.
FIG. 10 is a diagram showing an example of the operation of the arithmetic processing device shown in FIG. 2.
FIG. 11 is a diagram showing another example of the operation of the arithmetic processing device shown in FIG. 2.
FIG. 12 is a diagram showing yet another example of the operation of the arithmetic processing device shown in FIG. 2.
FIG. 13 is a diagram showing yet another example of the operation of the arithmetic processing device shown in FIG. 2.

Embodiments are described below with reference to the drawings.

FIG. 1 shows an embodiment of an arithmetic processing device and a method for controlling the arithmetic processing device. The arithmetic processing device OPD1 shown in FIG. 1 includes a decoder unit 1, an execution control unit 2, an arithmetic execution unit 3, a memory control unit 4, a memory 5, a bypass control unit 6, a plurality of first registers 7, a plurality of second registers 8, and a register management unit 9.

The decoder unit 1 decodes an instruction received from an instruction buffer or the like and outputs the decoded instruction to the execution control unit 2. The execution control unit 2 holds the instructions decoded by the decoder unit 1 and sequentially outputs executable instructions to the arithmetic execution unit 3 or the memory control unit 4. That is, the execution control unit 2 controls out-of-order execution, which changes the execution order of instructions. When an instruction is an arithmetic instruction indicating execution of an operation, the execution control unit 2 outputs the arithmetic instruction to the arithmetic execution unit 3. When an instruction is a memory access instruction for reading data from the memory 5, the execution control unit 2 outputs the memory access instruction to the memory control unit 4.

The arithmetic execution unit 3 executes an operation using the data stored in the first register 7, based on the arithmetic instruction received from the execution control unit 2. When the data used for the operation is stored in the second register 8, the arithmetic execution unit 3 can receive the data from the second register 8 over the bypass path indicated by the thick dashed arrow in FIG. 1, without going through the first register 7, and execute the operation. The bypass control unit 6 determines whether the data can be bypassed from the second register 8. When the bypass control unit 6 determines that the bypass is possible, the execution control unit 2 issues the arithmetic instruction to the arithmetic execution unit 3 at a timing at which the bypass can be performed. The arithmetic execution unit 3 outputs the data obtained by executing the operation to the second register 8.

The memory control unit 4 outputs an access request to the memory 5 based on a memory access instruction (for example, a load instruction) received from the execution control unit 2. For example, when the data to be read by the memory access instruction is stored in an area of the memory 5 to which a single address is assigned, the memory control unit 4 generates one access request based on the memory access instruction. The data read from the memory 5 in response to the access request is then stored in the second register 8 in a single transfer.

On the other hand, when the data to be read by the memory access instruction is stored in an area of the memory 5 to which a plurality of addresses are assigned, the memory control unit 4 sequentially generates a plurality of access requests based on the memory access instruction. The data read from the memory 5 in response to the plurality of access requests is stored sequentially in the second register 8. That is, when the data to be read by the memory access instruction is stored in an area of the memory 5 to which a plurality of addresses are assigned, the parts of the load data arrive at different times.

Data that is read in multiple parts based on a memory access instruction is complete in the second register 8 when the data read from the memory 5 in response to the last access request has been stored in the second register 8. In other words, when the data to be read by a memory access instruction is read from the memory 5 by a plurality of access requests, the second register 8 can be used to wait for all of the data to arrive. As a result, even when data is read from the memory 5 in multiple parts based on a memory access instruction, the arithmetic processing device OPD1 can wait for the data without providing a new circuit, which suppresses an increase in circuit scale.
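The waiting behavior described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the class `SecondRegister`, the function `load`, and the dictionary used as a memory are hypothetical names introduced only to show that the register itself serves as the place where split load data is collected.

```python
class SecondRegister:
    """Hypothetical model of one second register that receives data in parts."""

    def __init__(self, num_parts):
        self.parts = [None] * num_parts  # None marks a part not yet loaded

    def write(self, index, value):
        self.parts[index] = value

    def is_complete(self):
        return all(part is not None for part in self.parts)


def load(memory, addresses):
    """Issue one access request per address and fill the register in order.

    Returns the register and a log of its completeness after each request.
    """
    reg = SecondRegister(len(addresses))
    log = []
    for index, address in enumerate(addresses):
        reg.write(index, memory[address])
        log.append(reg.is_complete())
    return reg, log
```

In this sketch a load that needs two access requests reports the register as complete only after the second request, while a load that needs one request is complete immediately, so no separate staging buffer is modeled.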

The memory 5 stores data used by the arithmetic execution unit 3 and outputs the stored data in response to an access request. Data read from the memory 5 is transferred to the first register 7 via the second register 8.

Each of the first registers 7 holds data used in an operation executed by the arithmetic execution unit 3. Each of the second registers 8 is allocated in correspondence with a first register 7 used in an operation. The second register 8 allocated in correspondence with a first register 7 temporarily holds data obtained by executing an arithmetic instruction or a memory access instruction before the data is transferred to the first register 7.

The register management unit 9 holds the allocation relationship between the first registers 7 and the second registers 8. By referring to the register management unit 9, the execution control unit 2 identifies, for example, the second register 8 allocated for the first register 7 specified by an operand of an instruction. For example, when the decoder unit 1 decodes an instruction, it determines the second register 8 to be allocated for the first register 7 and stores information indicating the determined allocation in a table of the register management unit 9. Because the decoder unit 1 allocates a second register 8 in correspondence with the first register 7 used in an operation, operations can be executed as if there were no dependency even when the same first register 7 is specified by the operands of multiple instructions and a dependency arises. As a result, the arithmetic execution unit 3 can execute operations more efficiently than when the second registers 8 are not used.

For example, the bypass control unit 6 determines whether data read from the memory 5 and stored in the second register 8 can be bypassed to the arithmetic execution unit 3, based on information received from the memory control unit 4 that indicates the access status of the memory 5. The bypass control unit 6 also determines whether data obtained by an operation of the arithmetic execution unit 3 and stored in the second register 8 can be bypassed to the arithmetic execution unit 3, based on information received from the execution control unit 2 that indicates the execution status of operations in the arithmetic execution unit 3.

When the memory control unit 4 reads the data from the memory 5 in a single access based on the memory access instruction, the bypass control unit 6 determines that the data can be bypassed after it has been transferred from the memory 5 to the second register 8. In this case, the data read from the memory 5 is both the first data and the last data. The bypass control unit 6 then causes the execution control unit 2 to output the arithmetic instruction at a timing at which the data is bypassed from the second register 8 to the arithmetic execution unit 3 after the last data has been transferred from the memory 5 to the second register 8.

On the other hand, when the memory control unit 4 reads the data from the memory 5 in multiple parts based on the memory access instruction, the bypass control unit 6 determines that the data can be bypassed after the last data has been transferred from the memory 5 to the second register 8. The bypass control unit 6 then causes the execution control unit 2 to output the arithmetic instruction at a timing at which the bypass of the data from the second register 8 to the arithmetic execution unit 3 is performed after the last data has been transferred from the memory 5 to the second register 8.

As described above, in the embodiment shown in FIG. 1, the bypass control unit 6 notifies the execution control unit 2 that the data can be bypassed after all of the data read from the memory 5 is present in the second register 8. Thus, even when the parts of the data loaded from the memory 5 into the second register 8 arrive at different times, the existing second register 8 can be used to wait for the data. In other words, the data can be awaited without providing a new circuit, and an increase in the circuit scale of the arithmetic processing device OPD1 can be suppressed.

In addition, regardless of whether the data is read from the memory 5 in a single access or in multiple parts based on the memory access instruction, the bypass control unit 6 determines that the bypass becomes possible after the last data has been transferred from the memory 5 to the second register 8. This keeps the control of the bypass timing by the bypass control unit 6 as simple as in conventional designs. As a result, an increase in the circuit scale of the arithmetic processing device OPD1 can be suppressed even though the bypass timing differs between the case where the data is read from the memory 5 in a single access and the case where it is read in multiple parts. Furthermore, suppressing the increase in circuit scale suppresses an increase in the power consumption of the arithmetic processing device OPD1 and allows its operating frequency to be raised.
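The single rule described above (bypass becomes possible after the last transfer, whether there is one transfer or several) can be sketched in Python as follows. This is an illustrative assumption, not the circuit of the patent: the counter-based bookkeeping and the names `BypassControl`, `start_load`, `data_transferred`, and `bypass_enabled` are hypothetical.

```python
class BypassControl:
    """Hypothetical model that tracks outstanding transfers per second register."""

    def __init__(self):
        # second-register id -> number of transfers still expected
        self.remaining = {}

    def start_load(self, reg_id, num_requests):
        """Record how many access requests the load was split into."""
        self.remaining[reg_id] = num_requests

    def data_transferred(self, reg_id):
        """Called each time one part arrives in the second register."""
        self.remaining[reg_id] -= 1

    def bypass_enabled(self, reg_id):
        """True only after the last expected part has been transferred."""
        return self.remaining.get(reg_id) == 0
```

Because the same check covers one request and several requests, the sketch needs no separate code path for split loads, which mirrors the argument in the text that the timing control stays simple.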

FIG. 2 shows another embodiment of the arithmetic processing device and the method for controlling the arithmetic processing device. The thick solid arrows in FIG. 2 indicate the transfer path of floating-point data when a load instruction is executed. The thick dashed arrows in FIG. 2 indicate the bypass path of floating-point data when a load instruction is executed. The bypass path for fixed-point data is not shown.

The arithmetic processing device OPD2 shown in FIG. 2 includes an instruction cache 10, an instruction buffer 12, a decoder unit 14, a register management table 16, a reservation station 18, an address generation unit 20, a data cache 22, and a bypass control unit 24. The arithmetic processing device OPD2 also includes a floating-point arithmetic unit 26, a fixed-point arithmetic unit 28, a load register 30, result registers 32 and 34, renaming register units 36 and 38, a floating-point register unit 40, a fixed-point register unit 42, and a commit control unit 44.

For example, the instruction cache 10 is a primary instruction cache that stores instructions transferred from a secondary instruction cache, a main memory, or the like. The instruction buffer 12 holds the instructions sequentially transferred from the instruction cache 10 and sequentially transfers the held instructions to the decoder unit 14.

The decoder unit 14 sequentially decodes the instructions transferred from the instruction buffer 12 and stores information INS contained in each decoded instruction (hereinafter referred to as instruction information INS) in the reservation station 18. For example, the instruction information INS is stored in one of the reservation stations RSFLT, RSFIX, and RSMA of the reservation station 18, according to the type of the instruction.

When a decoded instruction indicates that a floating-point register in the floating-point register unit 40 is used as the data storage destination, the decoder unit 14 allocates the floating-point register to be used to one of the renaming registers in the renaming register unit 36. For example, the decoder unit 14 outputs the register number RN of the floating-point register used as the data storage destination and the address UBA indicating the allocated renaming register, together with a write request WE, to the register management table 16 and the commit control unit 44. UBA is short for "Update Buffer Address". When the renaming register unit 36 has 32 renaming registers, the address UBA is 5 bits wide. The address UBA is also output to the reservation station 18. An example of the floating-point register unit 40 is shown in FIG. 3, and an example of the renaming register unit 36 is shown in FIG. 4.
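The 5-bit width follows from the number of distinct renaming registers that the address must select. As a small worked check in Python (the helper name `uba_width` is hypothetical and not part of the patent):

```python
def uba_width(num_renaming_registers):
    """Bits needed to address num_renaming_registers distinct registers."""
    return max(1, (num_renaming_registers - 1).bit_length())
```

For 32 renaming registers the addresses run from 0 to 31, and 31 is `11111` in binary, so five bits suffice.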

Similarly, when a decoded instruction indicates that a fixed-point register in the fixed-point register unit 42 is used as the data storage destination, the decoder unit 14 allocates the fixed-point register to be used to one of the renaming registers in the renaming register unit 38. For example, the decoder unit 14 outputs the register number RN indicating the fixed-point register used as the data storage destination and the address UBA indicating the allocated renaming register, together with a write request WE, to the register management table 16 and the commit control unit 44. The decoder unit 14 outputs the register number RN and the address UBA in a form that distinguishes between the floating-point register unit 40 and the fixed-point registers. An example of the fixed-point register unit 42 is shown in FIG. 3, and an example of the renaming register unit 38 is shown in FIG. 4.

The register management table 16 has a table RMTBL1 for the floating-point registers and a table RMTBL2 for the fixed-point registers. The register management table 16 updates the information stored in one of the tables RMTBL1 and RMTBL2 according to the register number RN, the address UBA, and the write request WE received from the decoder unit 14. In response to a read request from the reservation station 18, the register management table 16 outputs the address UBA and the bit value P stored in the tables RMTBL1 and RMTBL2 to the reservation station 18. Based on a reset request RST1 from the commit control unit 44, the register management table 16 invalidates the address UBA registered in one of the tables RMTBL1 and RMTBL2. The register management table 16 is an example of a register management unit that holds the allocation relationship between the floating-point registers FR and the renaming registers RNFR and is referenced by the reservation station RSFLT. An example of the register management table 16 is shown in FIG. 5.
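The three operations on the table (update on a write request, read by the reservation station, invalidation on a reset request) can be sketched in Python as follows. Several points are assumptions made for illustration: the class and method names are hypothetical, the bit value P is treated as a flag marking the stored UBA as valid, and the reset only clears an entry that still maps to the committing UBA. The passage above does not specify these details.

```python
class RegisterManagementTable:
    """Hypothetical model of one table (for example RMTBL1)."""

    def __init__(self, num_registers):
        # One entry per register number RN; P = 0 means no valid UBA.
        self.entries = [{"P": 0, "UBA": None} for _ in range(num_registers)]

    def write(self, rn, uba):
        """Write request WE: record that register rn is renamed to uba."""
        self.entries[rn] = {"P": 1, "UBA": uba}

    def read(self, rn):
        """Read request: return the stored (UBA, P) pair for register rn."""
        entry = self.entries[rn]
        return entry["UBA"], entry["P"]

    def reset(self, rn, uba):
        """Reset request RST1: invalidate the mapping for rn.

        Assumption: a newer mapping written after uba is left untouched.
        """
        entry = self.entries[rn]
        if entry["P"] == 1 and entry["UBA"] == uba:
            entry["P"] = 0
```

In this sketch, a second write to the same register number replaces the mapping, so a later reader sees the newest renaming register, and a reset for the older UBA leaves the newer mapping valid.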

The reservation station 18 includes three reservation stations RSFLT, RSFIX, and RSMA. The reservation station 18 is an example of an execution control unit that holds the instructions (instruction information INS) decoded by the decoder unit 14 and sequentially outputs executable instructions. The reservation station RSFLT holds the instruction information INS contained in instructions for floating-point operations. The reservation station RSFIX holds the instruction information INS contained in instructions for fixed-point operations. The reservation station RSMA holds the instruction information INS contained in memory access instructions.

Based on the instruction information INS, the reservation station 18 holds the address UBA, which indicates the renaming register to which a floating-point register (or fixed-point register) is allocated, and the bit value P, both obtained by referring to the register management table 16. The held bit value P is invalidated based on a reset request RST2 received from the commit control unit 44 when execution of the instruction completes, and the address UBA of the completed instruction is discarded as a result of invalidating the bit value P.

なお、リザベーションステーション１８は、データを格納する浮動小数点レジスタ（または固定小数点レジスタ）を割り当てたリネーミングレジスタのアドレスＵＢＡを、命令情報ＩＮＳとともにデコーダ部１４から受ける。また、リザベーションステーション１８は、データを読み出す浮動小数点レジスタ（または固定小数点レジスタ）を割り当てたリネーミングレジスタのアドレスＵＢＡを、レジスタ管理テーブル１６から読み出す。以下では、データを格納する浮動小数点レジスタ（または固定小数点レジスタ）は、ディスティネーションレジスタとも称され、データを読み出す浮動小数点レジスタ（または固定小数点レジスタ）は、ソースレジスタとも称される。 The reservation station 18 receives the address UBA of the renaming register to which the floating point register (or fixed point register) for storing data is assigned from the decoder unit 14 together with the instruction information INS. The reservation station 18 also reads from the register management table 16 the address UBA of the renaming register to which the floating-point register (or fixed-point register) from which data is read is assigned. Hereinafter, the floating-point register (or fixed-point register) that stores data is also referred to as a destination register, and the floating-point register (or fixed-point register) that reads data is also referred to as a source register.

リザベーションステーション１８は、レジスタ管理テーブル１６から読み出したアドレスＵＢＡおよびバイパス制御部２４から出力されるバイパス可能信号ＢＰＥＮ（ＢＰＥＮ０−ＢＰＥＮ３１）に基づいて、保持した命令の依存関係を判定する。そして、リザベーションステーション１８は、保持した命令の中から実行可能な演算命令を浮動小数点演算器２６または固定小数点演算器２８に順次に投入する。あるいは、リザベーションステーション１８は、保持した命令の中から実行可能なロード命令またはストア命令をアクセス要求ＭＲＥＱとしてアドレス生成部２０に順次に投入する。ロード命令を示すアクセス要求ＭＲＥＱは、データキャッシュ２２からデータを読み出すメモリアクセス命令の一例である。リザベーションステーション１８は、命令の実行順序を変更するアウトオブオーダ実行を制御する実行制御部の一例である。 The reservation station 18 determines the dependency relationship of the held instruction based on the address UBA read from the register management table 16 and the bypass enable signal BPEN (BPEN0-BPEN31) output from the bypass control unit 24. Then, the reservation station 18 sequentially inputs an executable operation instruction from the held instructions to the floating point arithmetic unit 26 or the fixed point arithmetic unit 28. Alternatively, the reservation station 18 sequentially inputs an executable load instruction or store instruction from among the held instructions to the address generation unit 20 as an access request MREQ. An access request MREQ indicating a load instruction is an example of a memory access instruction for reading data from the data cache 22. The reservation station 18 is an example of an execution control unit that controls out-of-order execution that changes the execution order of instructions.
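
The readiness decision described above can be illustrated with a minimal sketch. It assumes that a source operand is available either when its bit value P is invalid (taken here to mean that the data already sits in the floating-point or fixed-point register) or when the bypass enable signal BPEN for its renaming register is at the valid level. The names `SourceOperand` and `is_executable` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SourceOperand:
    p: bool   # bit value P: True if the address UBA is valid
    uba: int  # address UBA of the renaming register assigned to the source register


def is_executable(sources: Sequence[SourceOperand], bpen: Sequence[bool]) -> bool:
    """Return True if every source operand can be supplied.

    Assumption (not stated in the text): P == False means the value is
    already in the architectural register, so no bypass is needed.
    """
    return all((not s.p) or bpen[s.uba] for s in sources)
```

In this sketch an instruction with no source operands is always executable, and an instruction waits as long as any of its valid UBA entries points to a renaming register whose BPEN is still at the invalid level.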

リザベーションステーション１８は、浮動小数点演算器２６の演算により得られるデータがリザルトレジスタ３２またはリネーミングレジスタ部３６からバイパス可能な場合、制御信号Ｂ１ＦＰおよびアドレスＢ１ＦＰＵＢＡをバイパス制御部２４に出力する。制御信号Ｂ１ＦＰは、浮動小数点演算器２６が演算を開始するタイミングを示す開始信号である。制御信号Ｂ１ＦＰの出力タイミングは、バイパス制御部２４のバイパス管理テーブルのフラグ領域ＦＬＧ（図８）をセットするタイミングを示す。制御信号Ｂ１ＦＰおよびアドレスＢ１ＦＰＵＢＡの先頭の”Ｂ１”は、制御信号Ｂ１ＦＰおよびアドレスＢ１ＦＰＵＢＡが後述するＢ１サイクルで生成されることを示す。制御信号Ｂ１ＦＰおよびアドレスＢ１ＦＰＵＢＡを受けた場合のバイパス制御部２４の動作は、図１１で説明される。 When the data obtained by an operation of the floating-point arithmetic unit 26 can be bypassed from the result register 32 or the renaming register unit 36, the reservation station 18 outputs the control signal B1FP and the address B1FPUBA to the bypass control unit 24. The control signal B1FP is a start signal indicating the timing at which the floating-point arithmetic unit 26 starts the operation. The output timing of the control signal B1FP indicates the timing at which the flag area FLG (FIG. 8) of the bypass management table in the bypass control unit 24 is set. The leading “B1” of the control signal B1FP and the address B1FPUBA indicates that they are generated in the B1 cycle described later. The operation of the bypass control unit 24 upon receiving the control signal B1FP and the address B1FPUBA is described with reference to FIG. 11.

アドレス生成部２０は、リザベーションステーション１８から出力されるアクセス要求ＭＲＥＱに基づいて、データキャッシュ２２に出力するアクセス要求ＭＲＥＱＣを生成する。アドレス生成部２０は、データキャッシュ２２から浮動小数点データを読み出すアクセス要求ＭＲＥＱＣとともに、制御信号ＡＬＤＬ、ＡＬＤＨおよびアドレスＡＬＤＵＢＡをバイパス制御部２４に出力する。制御信号ＡＬＤＬ、ＡＬＤＨは、データキャッシュ２２からデータのロードを開始するタイミングを示す開始信号の一例であり、例えば、アクセス要求ＭＲＥＱＣに同期して生成される。制御信号ＡＬＤＬは下位４バイトのデータをロードする場合に生成され、制御信号ＡＬＤＨは上位４バイトのデータをロードする場合に生成される。下位４バイトのデータおよび上位４バイトのデータは、データ群の一例である。 The address generation unit 20 generates an access request MREQC to be output to the data cache 22 based on the access request MREQ output from the reservation station 18. The address generation unit 20 outputs the control signals ALDL and ALDH and the address ALDUBA to the bypass control unit 24 together with the access request MREQC for reading the floating point data from the data cache 22. The control signals ALDL and ALDH are an example of a start signal indicating a timing at which loading of data from the data cache 22 is started. For example, the control signals ALDL and ALDH are generated in synchronization with the access request MREQC. The control signal ALDL is generated when lower 4 bytes of data are loaded, and the control signal ALDH is generated when upper 4 bytes of data are loaded. The lower 4 bytes of data and the upper 4 bytes of data are examples of data groups.

制御信号ＡＬＤＬ、ＡＬＤＨおよびアドレスＡＬＤＵＢＡの先頭の”Ａ”は、制御信号ＡＬＤＬ、ＡＬＤＨおよびアドレスＡＬＤＵＢＡが後述するＡサイクルで生成されることを示す。アドレスＡＬＤＵＢＡの値は、バイパス可能なデータが転送されるリネーミングレジスタＲＮＦＲ（図４）を示す。アドレスＡＬＤＵＢＡのビット数は、６４個のリネーミングレジスタＲＮＦＲを識別可能な６ビットである。 “A” at the head of the control signals ALDL and ALDH and the address ALDUBA indicates that the control signals ALDL and ALDH and the address ALDUBA are generated in an A cycle described later. The value of the address ALDUBA indicates the renaming register RNFR (FIG. 4) to which bypassable data is transferred. The number of bits of the address ALDUBA is 6 bits that can identify 64 renaming registers RNFR.

アクセス要求ＭＲＥＱに基づいて、データキャッシュ２２内の１つのキャッシュラインからデータが読み出し可能な場合、アドレス生成部２０は、１つのアクセス要求ＭＲＥＱＣを生成する。一方、アクセス要求ＭＲＥＱに基づいて、データキャッシュ２２内の２つのキャッシュラインに保持される場合、あるいは、データの一部がデータキャッシュ２２に保持されていない場合（キャッシュミス）がある。この場合、アドレス生成部２０は、１つのキャッシュラインからデータを読み出すアクセス要求ＭＲＥＱＣを生成した後、時間間隔を置いて、他の１つのキャッシュラインからデータを読み出すアクセス要求ＭＲＥＱＣを生成する。すなわち、データキャッシュ２２から読み出すロードデータのタイミングずれが発生する。 When data can be read from one cache line in the data cache 22 based on the access request MREQ, the address generation unit 20 generates one access request MREQC. On the other hand, there is a case where the data is held in two cache lines in the data cache 22 based on the access request MREQ, or a case where a part of the data is not held in the data cache 22 (cache miss). In this case, the address generation unit 20 generates an access request MREQC for reading data from one cache line, and then generates an access request MREQC for reading data from another cache line at a time interval. That is, there is a timing shift in the load data read from the data cache 22.

例えば、アドレス生成部２０は、１つのアクセス要求ＭＲＥＱＣにより、リネーミングレジスタ部３６へ転送する８バイトのデータをデータキャッシュ２２から一度で読み出す場合、制御信号ＡＬＤＬ、ＡＬＤＨの両方を生成する。アドレス生成部２０は、２つのアクセス要求ＭＲＥＱＣにより、リネーミングレジスタ部３６へ転送する８バイトのデータを２つの４バイトのデータ毎にデータキャッシュ２２から読み出す場合、制御信号ＡＬＤＬ、ＡＬＤＨを互いに異なるタイミングで生成する。 For example, when a single access request MREQC reads the 8 bytes of data to be transferred to the renaming register unit 36 from the data cache 22 in one operation, the address generation unit 20 generates both control signals ALDL and ALDH. When two access requests MREQC read the 8 bytes of data to be transferred to the renaming register unit 36 from the data cache 22 as two 4-byte pieces, the address generation unit 20 generates the control signals ALDL and ALDH at mutually different timings.

例えば、アドレス生成部２０は、８バイトのデータを４バイトずつデータキャッシュ２２から読み出す場合、上位４バイトのデータを読み出した後、下位４バイトのデータを読み出す。このため、制御信号ＡＬＤＬが出力されるタイミングで、上位４バイトのデータは、データキャッシュ２２から出力されており、ロードレジスタ３０およびリネーミングレジスタ部３６に順次に転送される。制御信号ＡＬＤＬ、ＡＬＤＨおよびアドレスＡＬＤＵＢＡを受けた場合のバイパス制御部２４の動作は、図９で説明する。アドレス生成部２０は、リザベーションステーションＲＳＦＬＴから受けるアクセス要求ＭＲＥＱ（メモリアクセス命令）に基づいてデータキャッシュ２２にアクセス要求ＭＲＥＱＣを出力するメモリ制御部の一例である。 For example, when reading 8 bytes of data from the data cache 22 in units of 4 bytes, the address generation unit 20 reads the upper 4 bytes and then the lower 4 bytes. Therefore, at the timing when the control signal ALDL is output, the upper 4 bytes of data have already been output from the data cache 22 and are transferred in turn to the load register 30 and the renaming register unit 36. The operation of the bypass control unit 24 upon receiving the control signals ALDL and ALDH and the address ALDUBA is described with reference to FIG. 9. The address generation unit 20 is an example of a memory control unit that outputs an access request MREQC to the data cache 22 based on an access request MREQ (memory access instruction) received from the reservation station RSFLT.
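
As an illustration of how one 8-byte load may become one or two access requests MREQC, the following sketch splits a load at a cache line boundary and attaches the control signals ALDH and ALDL to each request. The cache line size, the 4-byte alignment, and the placement of the upper 4 bytes at the lower address are assumptions made only for this example; `LINE_SIZE`, `AccessRequest`, and `plan_load` are hypothetical names.

```python
from typing import List, NamedTuple

LINE_SIZE = 64  # hypothetical cache line size in bytes; not given in the text


class AccessRequest(NamedTuple):
    addr: int    # start address of this access request MREQC
    nbytes: int  # number of bytes read by this request
    aldh: bool   # control signal ALDH: the upper 4 bytes are loaded
    aldl: bool   # control signal ALDL: the lower 4 bytes are loaded


def plan_load(addr: int) -> List[AccessRequest]:
    """Split an 8-byte load into one or two access requests.

    Assumptions: addr is 4-byte aligned, and the upper 4 bytes sit at the
    lower address (big-endian layout).
    """
    if addr % 4 != 0:
        raise ValueError("addr must be 4-byte aligned in this sketch")
    if addr // LINE_SIZE == (addr + 7) // LINE_SIZE:
        # All 8 bytes are in one cache line: one request, both signals.
        return [AccessRequest(addr, 8, aldh=True, aldl=True)]
    # The data straddles two cache lines: upper half first, then lower half.
    return [
        AccessRequest(addr, 4, aldh=True, aldl=False),
        AccessRequest(addr + 4, 4, aldh=False, aldl=True),
    ]
```

With these assumptions, a load at address 56 stays within one line and yields a single request carrying both signals, whereas a load at address 60 yields two requests, the first with ALDH only and the second with ALDL only.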

例えば、データキャッシュ２２は、二次データキャッシュまたはメインメモリ等から転送されるデータを格納する一次データキャッシュである。データキャッシュ２２は、アドレス生成部２０からのアクセス要求ＭＲＥＱＣによりアクセスされる。データキャッシュ２２は、アクセス要求ＭＲＥＱＣに対応するデータを保持している場合（キャッシュヒット）、保持しているデータをロードレジスタ３０に出力する。データキャッシュ２２は、アクセス要求ＭＲＥＱＣに対応するデータを保持していない場合（キャッシュミス）、二次データキャッシュまたはメインメモリ等からデータを読み出し、読み出したデータをデータキャッシュ２２の記憶領域に格納する。 For example, the data cache 22 is a primary data cache that stores data transferred from a secondary data cache or a main memory. The data cache 22 is accessed by an access request MREQC from the address generation unit 20. When the data cache 22 holds data corresponding to the access request MREQC (cache hit), the data cache 22 outputs the held data to the load register 30. When the data cache 22 does not hold data corresponding to the access request MREQC (cache miss), the data cache 22 reads data from the secondary data cache, the main memory, or the like, and stores the read data in the storage area of the data cache 22.

また、データキャッシュ２２は、キャッシュヒット時に、データをロードレジスタ３０に出力するサイクルで完了信号ＣＥをバイパス制御部２４に出力する。データキャッシュ２２は、キャッシュミス時に、完了信号ＣＥの出力を禁止する。 Further, the data cache 22 outputs a completion signal CE to the bypass control unit 24 in a cycle of outputting data to the load register 30 when a cache hit occurs. The data cache 22 prohibits the output of the completion signal CE when a cache miss occurs.

バイパス制御部２４は、リネーミングレジスタ部３６が有するリネーミングレジスタ毎に、浮動小数点データのバイパスが可能か否かを示す情報を保持するバイパス管理テーブルＢＰＴＢＬを有する。バイパス制御部２４は、バイパス管理テーブルＢＰＴＢＬが保持する情報に基づいて、バイパス可能信号ＢＰＥＮ（ＢＰＥＮ０−ＢＰＥＮ３１）を出力する。また、バイパス制御部２４は、リネーミングレジスタ部３８のリネーミングレジスタ毎に、固定小数点データのバイパスが可能か否かを示す情報を保持する図示しないバイパス管理テーブルを有する。 The bypass control unit 24 includes a bypass management table BPTBL that holds information indicating whether or not the floating-point data can be bypassed for each renaming register included in the renaming register unit 36. The bypass control unit 24 outputs a bypass enable signal BPEN (BPEN0-BPEN31) based on the information held in the bypass management table BPTBL. Further, the bypass control unit 24 has a bypass management table (not shown) that holds information indicating whether or not fixed-point data can be bypassed for each renaming register of the renaming register unit 38.

バイパス制御部２４は、制御信号Ｂ１ＦＰ、ＡＬＤＬ、ＡＬＤＨに基づいて、アドレスＢ１ＦＰＵＢＡ、ＡＬＤＵＢＡに対応するバイパス可能信号ＢＰＥＮ（ＢＰＥＮ０−ＢＰＥＮ３１のいずれか）を有効レベルに設定する。バイパス制御部２４は、制御信号ＷＣＭＴに基づいて、アドレスＷＣＭＴＵＢＡに対応するバイパス可能信号ＢＰＥＮを無効レベルに設定する。バイパス制御部２４は、完了信号ＣＥに基づいて、キャッシュミスしたアクセス命令ＭＲＥＱＣに対応するアドレスＡＬＤＵＢＡが示すバイパス可能信号ＢＰＥＮを無効レベルに設定する。バイパス制御部２４の例は、図７および図８に示される。 The bypass control unit 24 sets the bypass enable signal BPEN (any one of BPEN0 to BPEN31) corresponding to the addresses B1FPUBA and ALDUBA to an effective level based on the control signals B1FP, ALDL, and ALDH. The bypass control unit 24 sets the bypass enable signal BPEN corresponding to the address WCMTUBA to an invalid level based on the control signal WCMT. Based on the completion signal CE, the bypass control unit 24 sets the bypass enable signal BPEN indicated by the address ALDUBA corresponding to the cache-missed access instruction MREQC to an invalid level. Examples of the bypass control unit 24 are shown in FIGS. 7 and 8.
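
The set and reset behaviour of the bypass enable signals can be pictured with a toy model. This is only a sketch under stated assumptions: it supposes that BPEN for a load becomes valid once both 4-byte halves have been requested, and that a cache miss withdraws the halves belonging to the missed request. The actual circuit is the one shown in FIGS. 7 and 8; the class `BypassTable` and its method names are hypothetical.

```python
class BypassTable:
    """Toy model of the bypass management table BPTBL (32 entries)."""

    def __init__(self, entries: int = 32) -> None:
        self.high_seen = [False] * entries  # upper 4 bytes requested
        self.low_seen = [False] * entries   # lower 4 bytes requested
        self.bpen = [False] * entries       # bypass enable signals BPEN

    def _update(self, uba: int) -> None:
        # Assumption: bypass is allowed only when both halves are on the way.
        self.bpen[uba] = self.high_seen[uba] and self.low_seen[uba]

    def on_b1fp(self, uba: int) -> None:
        # Control signal B1FP: an arithmetic result is a full 8 bytes.
        self.high_seen[uba] = self.low_seen[uba] = True
        self._update(uba)

    def on_load(self, uba: int, aldl: bool, aldh: bool) -> None:
        # Control signals ALDL / ALDH with address ALDUBA.
        self.high_seen[uba] = self.high_seen[uba] or aldh
        self.low_seen[uba] = self.low_seen[uba] or aldl
        self._update(uba)

    def on_cache_miss(self, uba: int, aldl: bool, aldh: bool) -> None:
        # No completion signal CE arrived for this request: withdraw its halves.
        if aldh:
            self.high_seen[uba] = False
        if aldl:
            self.low_seen[uba] = False
        self._update(uba)

    def on_commit(self, uba: int) -> None:
        # Control signal WCMT with address WCMTUBA: release the entry.
        self.high_seen[uba] = self.low_seen[uba] = False
        self._update(uba)
```

In this model a split load sets BPEN only when the second half is requested, which matches the idea that a dependent arithmetic instruction should be issued in step with the data that arrives last.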

浮動小数点演算器２６は、リザベーションステーションＲＳＦＬＴから転送される命令に基づいて演算を実行し、実行結果をリザルトレジスタ３２に格納する。浮動小数点演算器２６は、リザベーションステーションＲＳＦＬＴから受ける演算の実行を示す演算命令に基づいて演算を実行する演算実行部の一例である。浮動小数点演算器２６による演算は、浮動小数点レジスタ部４０、リネーミングレジスタ部３６、リザルトレジスタ３２およびロードレジスタ３０の少なくともいずれかに格納されたデータを用いて実行される。 The floating point arithmetic unit 26 executes an operation based on the instruction transferred from the reservation station RSFLT and stores the execution result in the result register 32. The floating point arithmetic unit 26 is an example of an operation execution unit that executes an operation based on an operation instruction indicating execution of an operation received from the reservation station RSFLT. The calculation by the floating point arithmetic unit 26 is executed using data stored in at least one of the floating point register unit 40, the renaming register unit 36, the result register 32, and the load register 30.

固定小数点演算器２８は、リザベーションステーションＲＳＦＩＸから転送される命令に基づいて演算を実行し、実行結果をリザルトレジスタ３４に格納する。固定小数点演算器２８による演算は、固定小数点レジスタ部４２、リネーミングレジスタ部３８、リザルトレジスタ３４およびロードレジスタ３０の少なくともいずれかに格納されたデータを用いて実行される。 The fixed point arithmetic unit 28 performs an operation based on an instruction transferred from the reservation station RSFIX, and stores the execution result in the result register 34. The calculation by the fixed-point calculator 28 is executed using data stored in at least one of the fixed-point register unit 42, the renaming register unit 38, the result register 34, and the load register 30.

ロードレジスタ３０は、ロード命令の実行に基づいてデータキャッシュ２２から読み出されるデータを一時的に保持し、保持したデータをリネーミングレジスタ部３６またはリネーミングレジスタ部３８にデータ線ＬＤＤを介して転送する。ロードレジスタ３０は、データキャッシュ２２とリネーミングレジスタＲＮＦＲ（図４）との間に配置され、データキャッシュ２２から読み出されるデータをリネーミングレジスタＲＮＦＲに転送する前に一時的に保持する第３レジスタの一例である。 The load register 30 temporarily holds data read from the data cache 22 by the execution of a load instruction, and transfers the held data to the renaming register unit 36 or the renaming register unit 38 via the data line LDD. The load register 30 is arranged between the data cache 22 and the renaming register RNFR (FIG. 4) and is an example of a third register that temporarily holds data read from the data cache 22 before the data is transferred to the renaming register RNFR.

リザルトレジスタ３２は、浮動小数点演算器２６による演算結果を一時的に保持し、保持したデータをリネーミングレジスタ部３６にデータ線ＥＸＤ１を介して転送する。リザルトレジスタ３４は、固定小数点演算器２８による演算結果を一時的に保持し、保持したデータをリネーミングレジスタ部３８にデータ線ＥＸＤ２を介して転送する。 The result register 32 temporarily holds the calculation result by the floating point calculator 26 and transfers the held data to the renaming register unit 36 via the data line EXD1. The result register 34 temporarily holds the calculation result by the fixed-point calculator 28 and transfers the held data to the renaming register unit 38 via the data line EXD2.

例えば、リネーミングレジスタ部３６は、浮動小数点演算器２６が実行した演算により得られた浮動小数点データまたはデータキャッシュから転送される浮動小数点データを一次的に保持する３２個のリネーミングレジスタＲＮＦＲ（図４）を有する。各リネーミングレジスタＲＮＦＲに保持されたデータは、命令の完了に基づいてコミット制御部４４が出力する読み出し要求ＲＣＭＴ１に応答して、浮動小数点レジスタ部４０に転送される。リネーミングレジスタＲＮＦＲは、浮動小数点レジスタＦＲ（図３）に対応して割り当てられ、演算命令またはメモリアクセス命令の実行により得られるデータを浮動小数点レジスタＦＲに転送する前に一時的に保持する第２レジスタの一例である。 For example, the renaming register unit 36 has 32 renaming registers RNFR (FIG. 4) that temporarily hold floating-point data obtained by an operation executed by the floating-point arithmetic unit 26 or floating-point data transferred from the data cache. The data held in each renaming register RNFR is transferred to the floating-point register unit 40 in response to a read request RCMT1 that the commit control unit 44 outputs upon completion of an instruction. The renaming register RNFR is allocated in correspondence with a floating-point register FR (FIG. 3) and is an example of a second register that temporarily holds data obtained by executing an arithmetic instruction or a memory access instruction before the data is transferred to the floating-point register FR.

例えば、リネーミングレジスタ部３８は、固定小数点演算器２８が実行した演算により得られた固定小数点データまたはデータキャッシュから転送される固定小数点データを一次的に保持する３２個のリネーミングレジスタＲＮＲ（図４）を有する。各リネーミングレジスタＲＮＲに保持されたデータは、命令の完了に基づいてコミット制御部４４が出力する読み出し要求ＲＣＭＴ２に応答して、固定小数点レジスタ部４２に転送される。リネーミングレジスタ部３６、３８の例は、図４に示す。 For example, the renaming register unit 38 has 32 renaming registers RNR (FIG. 4) that temporarily hold fixed-point data obtained by an operation executed by the fixed-point arithmetic unit 28 or fixed-point data transferred from the data cache. The data held in each renaming register RNR is transferred to the fixed-point register unit 42 in response to a read request RCMT2 that the commit control unit 44 outputs upon completion of an instruction. An example of the renaming register units 36 and 38 is shown in FIG. 4.

例えば、浮動小数点レジスタ部４０は、浮動小数点演算器２６が実行する演算により得られた浮動小数点データまたはデータキャッシュ２２から転送される浮動小数点データを保持する６４個の浮動小数点レジスタＦＲ（図３）を有する。浮動小数点レジスタＦＲは、浮動小数点演算器２６が実行する演算に使用するデータを保持する第１レジスタの一例である。浮動小数点レジスタ部４０は、命令の完了に基づいてコミット制御部４４が出力する書き込み要求ＷＣＭＴ１に応答して、リネーミングレジスタ部３６から転送されるデータを記憶する。 For example, the floating-point register unit 40 has 64 floating-point registers FR (FIG. 3) that hold floating-point data obtained by an operation executed by the floating-point arithmetic unit 26 or floating-point data transferred from the data cache 22. The floating-point register FR is an example of a first register that holds data used for operations executed by the floating-point arithmetic unit 26. The floating-point register unit 40 stores the data transferred from the renaming register unit 36 in response to a write request WCMT1 that the commit control unit 44 outputs upon completion of an instruction.

例えば、固定小数点レジスタ部４２は、固定小数点演算器２８が実行する演算により得られた固定小数点データまたはデータキャッシュ２２から転送される固定小数点データを保持する３２個の固定小数点レジスタＲ（図３）を有する。固定小数点レジスタ部４２は、命令の完了に基づいてコミット制御部４４が出力する書き込み要求ＷＣＭＴ２に応答して、リネーミングレジスタ部３８から転送されるデータを記憶する。浮動小数点レジスタ部４０および固定小数点レジスタ部４２の例は、図３に示す。 For example, the fixed-point register unit 42 has 32 fixed-point registers R (FIG. 3) that hold fixed-point data obtained by an operation executed by the fixed-point arithmetic unit 28 or fixed-point data transferred from the data cache 22. The fixed-point register unit 42 stores the data transferred from the renaming register unit 38 in response to a write request WCMT2 that the commit control unit 44 outputs upon completion of an instruction. Examples of the floating-point register unit 40 and the fixed-point register unit 42 are shown in FIG. 3.

コミット制御部４４は、命令の実行の完了に基づいて、レジスタ管理テーブル１６にリセット要求ＲＳＴ１を出力し、リザベーションステーション１８にリセット要求ＲＳＴ２を出力する。また、コミット制御部４４は、命令の実行の完了に基づいて、バイパス制御部２４に制御信号ＷＣＭＴおよびアドレスＷＣＭＴＵＢＡを出力する。コミット制御部４４は、デコーダ部１４が命令の解読に基づいて出力するレジスタ番号ＲＮおよびアドレスＵＢＡを命令毎に保持する。コミット制御部４４は、保持したアドレスＵＢＡのうち、実行が完了した命令に対応するアドレスＵＢＡが示すリネーミングレジスタＲＮＦＲからデータを読み出す読み出し要求ＲＣＭＴ１（またはＲＣＭＴ２）を出力する。コミット制御部４４は、保持したレジスタ番号ＲＮのうち、実行が完了した命令に対応するレジスタ番号ＲＮが示すディスティネーションレジスタにデータを書き込む書き込み要求ＷＣＭＴ１（またはＷＣＭＴ２）を出力する。 The commit control unit 44 outputs a reset request RST1 to the register management table 16 and outputs a reset request RST2 to the reservation station 18 based on the completion of execution of the instruction. Further, the commit control unit 44 outputs a control signal WCMT and an address WCMTUBA to the bypass control unit 24 based on completion of execution of the instruction. The commit control unit 44 holds the register number RN and the address UBA output by the decoder unit 14 based on the decoding of the instruction for each instruction. The commit control unit 44 outputs a read request RCMT1 (or RCMT2) for reading data from the renaming register RNFR indicated by the address UBA corresponding to the instruction that has been executed among the held addresses UBA. The commit control unit 44 outputs a write request WCMT1 (or WCMT2) for writing data to the destination register indicated by the register number RN corresponding to the instruction that has been executed among the held register numbers RN.

図３は、図２に示す浮動小数点レジスタ部４０および固定小数点レジスタ部４２の例を示す。浮動小数点レジスタ部４０は、倍精度（８バイト）の浮動小数点データを格納する６４個の浮動小数点レジスタＦＲ（ＦＲ０−ＦＲ６３）を有する。また、浮動小数点レジスタ部４０は、浮動小数点レジスタＦＲへのデータの書き込みを制御する書き込み制御部ＷＣＮＴ１および浮動小数点レジスタＦＲからのデータの読み出しを制御する読み出し制御部ＲＣＮＴ１を有する。 FIG. 3 shows an example of the floating-point register unit 40 and the fixed-point register unit 42 shown in FIG. 2. The floating-point register unit 40 includes 64 floating-point registers FR (FR0 to FR63) that store double-precision (8-byte) floating-point data. The floating-point register unit 40 also includes a write control unit WCNT1 that controls writing of data to the floating-point registers FR and a read control unit RCNT1 that controls reading of data from the floating-point registers FR.

各浮動小数点レジスタＦＲは、８バイトのダブルデータＤＤ（倍精度の浮動小数点データ）または４バイトの２つのシングルデータＳＤ（単精度の浮動小数点データ）を記憶する８バイトの記憶領域を有している。すなわち、各浮動小数点レジスタＦＲは、２つの単精度（４バイト）の浮動小数点データを格納可能である。図３は、浮動小数点レジスタＦＲ０にダブルデータＤＤが格納され、浮動小数点レジスタＦＲ１に２つのシングルデータＳＤが格納された状態を示す。 Each floating-point register FR has an 8-byte storage area that stores either one 8-byte double data DD (double-precision floating-point data) or two 4-byte single data SD (single-precision floating-point data). That is, each floating-point register FR can store two single-precision (4-byte) floating-point values. FIG. 3 shows a state in which double data DD is stored in the floating-point register FR0 and two single data SD are stored in the floating-point register FR1.

１つの浮動小数点レジスタＦＲに２つのシングルデータＳＤを格納することで、１つの浮動小数点レジスタＦＲの番号により２つのシングルデータＳＤを使用することができる。これにより、２つの浮動小数点レジスタＦＲのそれぞれにシングルデータＳＤを格納する場合に比べて、単精度浮動小数点データの演算性能は向上する。以降、２つのシングルデータＳＤを格納する浮動小数点レジスタＦＲは、倍幅単精度浮動小数点レジスタとも称され、倍幅単精度浮動小数点レジスタに格納されるデータは、倍幅単精度浮動小数点データとも称される。 Storing two single data SD in one floating-point register FR makes both single data SD accessible through a single floating-point register number. This improves the performance of single-precision floating-point operations compared with storing one single data SD in each of two floating-point registers FR. Hereinafter, a floating-point register FR that stores two single data SD is also referred to as a double-width single-precision floating-point register, and the data stored in it is also referred to as double-width single-precision floating-point data.
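
The two ways of filling an 8-byte floating-point register can be shown with a small sketch that encodes either one double-precision value or two single-precision values into 8 bytes. The big-endian byte order and the function names `pack_double`, `pack_two_singles`, and `unpack_two_singles` are assumptions chosen for illustration only.

```python
import struct


def pack_double(value: float) -> bytes:
    """Encode one double-precision value (double data DD) into 8 bytes."""
    return struct.pack(">d", value)


def pack_two_singles(upper: float, lower: float) -> bytes:
    """Encode two single-precision values (single data SD) into 8 bytes.

    The first value occupies the upper 4 bytes, the second the lower 4 bytes.
    """
    return struct.pack(">ff", upper, lower)


def unpack_two_singles(reg: bytes) -> tuple:
    """Decode an 8-byte register image back into its two single values."""
    return struct.unpack(">ff", reg)
```

Both encodings occupy exactly 8 bytes, which is why a single register number suffices to address two single-precision operands.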

固定小数点レジスタ部４２は、８バイトの固定小数点データを格納する３２個の固定小数点レジスタＲ（Ｒ０−Ｒ３１）を有する。また、固定小数点レジスタ部４２は、固定小数点レジスタＲへのデータの書き込みを制御する書き込み制御部ＷＣＮＴ２および固定小数点レジスタＲからのデータの読み出しを制御する読み出し制御部ＲＣＮＴ２を有する。各固定小数点レジスタＲは、８バイトのダブルデータＤＤを記憶する８バイトの記憶領域を有している。図３は、固定小数点レジスタＲ１にダブルデータＤＤが格納された状態を示す。 The fixed-point register unit 42 has 32 fixed-point registers R (R0 to R31) that store 8-byte fixed-point data. The fixed-point register unit 42 includes a write control unit WCNT2 that controls writing of data to the fixed-point register R and a read control unit RCNT2 that controls reading of data from the fixed-point register R. Each fixed-point register R has an 8-byte storage area for storing 8-byte double data DD. FIG. 3 shows a state in which double data DD is stored in the fixed-point register R1.

図４は、図２に示すリネーミングレジスタ部３６、３８の例を示す。リネーミングレジスタ部３６は、図３に示す浮動小数点レジスタＦＲのいずれかが割り当てられる３２個のリネーミングレジスタＲＮＦＲ（ＲＮＦＲ０−ＲＮＦＲ３１）、書き込み制御部ＷＣＮＴ３および読み出し制御部ＲＣＮＴ３を有する。各リネーミングレジスタＲＮＦＲは、２つの４バイトの記憶領域を含む８バイトの記憶領域を有する。書き込み制御部ＷＣＮＴ３は、演算命令の実行時にリザルトレジスタ３２から転送されるデータＥＸＤ１またはロード命令の実行時にロードレジスタ３０から転送されるデータＬＤＤのいずれかを選択するセレクタＳＥＬ３を有する。例えば、書き込み制御部ＷＣＮＴ３によりデータが書き込まれるリネーミングレジスタＲＮＦＲは、レジスタ管理テーブル１６に格納された情報に基づいてリザベーションステーション１８が指定する。 FIG. 4 shows an example of the renaming register units 36 and 38 shown in FIG. 2. The renaming register unit 36 includes 32 renaming registers RNFR (RNFR0 to RNFR31), to each of which one of the floating-point registers FR shown in FIG. 3 is assigned, a write control unit WCNT3, and a read control unit RCNT3. Each renaming register RNFR has an 8-byte storage area consisting of two 4-byte storage areas. The write control unit WCNT3 includes a selector SEL3 that selects either the data EXD1 transferred from the result register 32 when an arithmetic instruction is executed or the data LDD transferred from the load register 30 when a load instruction is executed. For example, the renaming register RNFR into which the write control unit WCNT3 writes data is specified by the reservation station 18 based on the information stored in the register management table 16.

セレクタＳＥＬ３は、各リネーミングレジスタＲＮＦＲの上位４バイトにデータを格納する場合、書き込み制御信号ＷＥ１Ｈおよび４バイト（すなわち、３２ビット）のデータＥＸＤ１ＨをリネーミングレジスタＲＮＦＲに出力する。セレクタＳＥＬ３は、リネーミングレジスタＲＮＦＲの下位４バイトにデータを格納する場合、書き込み制御信号ＷＥ１Ｌおよび４バイト（すなわち、３２ビット）のデータＥＸＤ１ＬをリネーミングレジスタＲＮＦＲに出力する。例えば、書き込み制御信号ＷＥ１Ｈ、ＷＥ１Ｌは、書き込み制御部ＷＣＮＴ３がデータＥＸＤ１、ＬＤＤ（各々６４ビット）の受信に基づいて生成する。 When the selector SEL3 stores data in the upper 4 bytes of each renaming register RNFR, the selector SEL3 outputs the write control signal WE1H and 4 bytes (that is, 32 bits) of data EXD1H to the renaming register RNFR. When the selector SEL3 stores data in the lower 4 bytes of the renaming register RNFR, the selector SEL3 outputs the write control signal WE1L and 4 bytes (that is, 32 bits) of data EXD1L to the renaming register RNFR. For example, the write control signals WE1H and WE1L are generated based on reception of data EXD1 and LDD (64 bits each) by the write control unit WCNT3.

浮動小数点演算器２６からリザルトレジスタ３２に転送されるデータおよびリザルトレジスタ３２からリネーミングレジスタ部３６に転送されるデータＥＸＤ１は、８バイトである。このため、書き込み制御部ＷＣＮＴ３は、データＥＸＤ１を受けた場合、書き込み制御信号ＷＥ１Ｈ、ＷＥ１Ｌを生成し、８バイトのデータＥＸＤ１Ｈ、ＥＸＤ１ＬをリネーミングレジスタＲＮＦＲに格納する。 Data transferred from the floating point arithmetic unit 26 to the result register 32 and data EXD1 transferred from the result register 32 to the renaming register unit 36 are 8 bytes. Therefore, when receiving the data EXD1, the write control unit WCNT3 generates the write control signals WE1H and WE1L, and stores the 8-byte data EXD1H and EXD1L in the renaming register RNFR.

ロード命令に基づいてロードレジスタ３０から転送されるデータＬＤＤが倍精度浮動小数点データ（８バイト）の場合、書き込み制御部ＷＣＮＴ３は、書き込み制御信号ＷＥ１Ｈ、ＷＥ１Ｌを生成する。これにより、８バイトのデータＥＸＤ１Ｈ、ＥＸＤ１Ｌ（すなわち、倍精度浮動小数点データ）がリネーミングレジスタＲＮＦＲに格納される。ロード命令に基づいてロードレジスタ３０から転送されるデータＬＤＤが倍幅単精度浮動小数点データ（８バイト）の場合も、書き込み制御部ＷＣＮＴ３は、書き込み制御信号ＷＥ１Ｈ、ＷＥ１Ｌを生成する。 When the data LDD transferred from the load register 30 based on the load instruction is double precision floating point data (8 bytes), the write control unit WCNT3 generates the write control signals WE1H and WE1L. As a result, 8-byte data EXD1H and EXD1L (that is, double-precision floating-point data) are stored in the renaming register RNFR. Even when the data LDD transferred from the load register 30 based on the load instruction is double-width single precision floating point data (8 bytes), the write control unit WCNT3 generates the write control signals WE1H and WE1L.

一方、ロード命令に基づいてロードレジスタ３０から転送されるデータＬＤＤが倍幅単精度浮動小数点データの上位４バイトの場合、書き込み制御部ＷＣＮＴ３は、書き込み制御信号ＷＥ１Ｈを生成する。これにより、４バイトのデータＥＸＤ１Ｈ（すなわち、倍幅単精度浮動小数点データの上位４バイト）がリネーミングレジスタＲＮＦＲの上位４バイトの領域に格納される。同様に、ロード命令に基づいてロードレジスタ３０から転送されるデータＬＤＤが倍幅単精度浮動小数点データの下位４バイトの場合、書き込み制御部ＷＣＮＴ３は、書き込み制御信号ＷＥ１Ｌを生成する。これにより、４バイトのデータＥＸＤ１Ｌ（すなわち、倍幅単精度浮動小数点データの下位４バイト）がリネーミングレジスタＲＮＦＲの下位４バイトの領域に格納される。 On the other hand, when the data LDD transferred from the load register 30 based on the load instruction is the upper 4 bytes of the double-width single precision floating point data, the write control unit WCNT3 generates the write control signal WE1H. As a result, 4-byte data EXD1H (that is, the upper 4 bytes of double-width single-precision floating-point data) is stored in the upper 4 bytes of the renaming register RNFR. Similarly, when the data LDD transferred from the load register 30 based on the load instruction is the lower 4 bytes of the double-width single precision floating point data, the write control unit WCNT3 generates the write control signal WE1L. As a result, 4-byte data EXD1L (that is, the lower 4 bytes of the double-width single-precision floating-point data) is stored in the lower 4 bytes of the renaming register RNFR.

ここで、倍幅単精度浮動小数点データは、データキャッシュ２２の１つのキャッシュラインに含まれる場合、一度で転送され、２つのキャッシュラインに跨って含まれる場合、上位４バイトと下位４バイトに分けて順次に転送される。 Here, double-width single-precision floating-point data is transferred in one operation when it is contained in a single cache line of the data cache 22, and is transferred sequentially as an upper 4 bytes and a lower 4 bytes when it straddles two cache lines.
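
The half-by-half write behaviour of the renaming register RNFR can be sketched as follows. The sketch assumes that each 4-byte half arrives in its own position of a 64-bit input word and that the write control signals WE1H and WE1L select which half is stored; the class name `RenamingRegister` and the method `write` are hypothetical.

```python
class RenamingRegister:
    """8-byte register made of two independently writable 4-byte areas."""

    HIGH_MASK = 0xFFFF_FFFF_0000_0000
    LOW_MASK = 0x0000_0000_FFFF_FFFF

    def __init__(self) -> None:
        self.value = 0  # 64-bit content, upper 4 bytes in bits 63..32

    def write(self, data: int, we_h: bool, we_l: bool) -> None:
        """Store the halves of the 64-bit input selected by WE1H / WE1L."""
        if we_h:
            # Replace only the upper 4 bytes.
            self.value = (self.value & self.LOW_MASK) | (data & self.HIGH_MASK)
        if we_l:
            # Replace only the lower 4 bytes.
            self.value = (self.value & self.HIGH_MASK) | (data & self.LOW_MASK)
```

Writing the upper half first and the lower half later leaves the complete 8-byte value in the register, so the two transfers of a split load are merged without a separate buffer.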

読み出し制御部ＲＣＮＴ３は、コミット制御部４４から出力される読み出し要求ＲＣＭＴ１に基づいて、リネーミングレジスタＲＮＦＲ０−ＲＮＦＲ３１のいずれかからデータを読み出し、読み出したデータを浮動小数点レジスタ部４０に出力する。例えば、読み出し制御部ＲＣＮＴ３によりデータが読み出されるリネーミングレジスタＲＮＦＲは、レジスタ管理テーブル１６に格納された情報に基づいてリザベーションステーション１８が指定する。 The read control unit RCNT3 reads data from one of the renaming registers RNFR0 to RNFR31 based on the read request RCMT1 output from the commit control unit 44, and outputs the read data to the floating point register unit 40. For example, the renaming register RNFR from which data is read by the read control unit RCNT3 is designated by the reservation station 18 based on information stored in the register management table 16.

リネーミングレジスタ部３８は、図３に示す固定小数点レジスタＲのいずれかが割り当てられる３２個のリネーミングレジスタＲＮＲ（ＲＮＲ０−ＲＮＲ３１）、書き込み制御部ＷＣＮＴ４および読み出し制御部ＲＣＮＴ４を有する。各リネーミングレジスタＲＮＲは、８バイトの記憶領域を有する。書き込み制御部ＷＣＮＴ４は、演算命令の実行時にリザルトレジスタ３４から出力されるデータＥＸＤ２またはロード命令の実行時にロードレジスタ３０から出力されるデータＬＤＤのいずれかを選択するセレクタＳＥＬ４を有する。例えば、書き込み制御部ＷＣＮＴ４によりデータが書き込まれるリネーミングレジスタＲＮＲは、リザベーションステーション１８により指定される。 The renaming register unit 38 includes 32 renaming registers RNR (RNR0 to RNR31) to which any of the fixed-point registers R shown in FIG. 3 is assigned, a write control unit WCNT4, and a read control unit RCNT4. Each renaming register RNR has an 8-byte storage area. The write control unit WCNT4 includes a selector SEL4 that selects either the data EXD2 output from the result register 34 when the arithmetic instruction is executed or the data LDD output from the load register 30 when the load instruction is executed. For example, the renaming register RNR into which data is written by the write control unit WCNT4 is designated by the reservation station 18.

セレクタＳＥＬ４は、リネーミングレジスタＲＮＲにデータを格納する場合、書き込み制御信号ＷＥ２および８バイト（６４ビット）のデータＥＸＤ２をリネーミングレジスタＲＮＲに出力する。例えば、書き込み制御信号ＷＥ２は、書き込み制御部ＷＣＮＴ４がデータＥＸＤ２、ＬＤＤ（各々６４ビット）の受信に基づいて生成する。読み出し制御部ＲＣＮＴ３は、コミット制御部４４から出力される読み出し要求ＲＣＭＴ２に基づいて、リネーミングレジスタＲＮＲ０−ＲＮＲ３１のいずれかからデータを読み出し、読み出したデータを固定小数点レジスタ部４２に出力する。例えば、読み出し制御部ＲＣＮＴ４によりデータが読み出されるリネーミングレジスタＲＮＲは、レジスタ管理テーブル１６に格納された情報に基づいてリザベーションステーション１８が指定する。 When the selector SEL4 stores data in the renaming register RNR, the selector SEL4 outputs the write control signal WE2 and 8-byte (64-bit) data EXD2 to the renaming register RNR. For example, the write control signal WE2 is generated by the write control unit WCNT4 based on reception of data EXD2 and LDD (each 64 bits). The read control unit RCNT3 reads data from one of the renaming registers RNR0 to RNR31 based on the read request RCMT2 output from the commit control unit 44, and outputs the read data to the fixed-point register unit 42. For example, the renaming register RNR from which data is read by the read control unit RCNT4 is designated by the reservation station 18 based on information stored in the register management table 16.

図４では、セレクタＳＥＬ３の回路規模が、セレクタＳＥＬ４の回路規模より大きく見える。しかしながら、セレクタＳＥＬ３が出力するデータＥＸＤ１Ｈ、ＥＸＤ１Ｌの幅の合計は６４ビットであり、セレクタＳＥＬ４が出力するデータＥＸＤ２の幅と同じである。このため、上位４バイトのデータと下位４バイトのデータとがリネーミングレジスタＲＮＦＲにそれぞれ格納される場合の書き込み制御部ＷＣＮＴ３の回路規模を、書き込み制御部ＷＣＮＴ４の回路規模と同等にすることができる。例えば、書き込み制御部ＷＣＮＴ４に対して追加される書き込み制御部ＷＣＮＴ３の要素は、書き込み制御信号ＷＥ１Ｈ、ＷＥ１Ｌを生成する論理回路と、書き込み制御信号ＷＥ１Ｈの配線領域である。したがって、リネーミングレジスタ部３６の回路規模は、既存のリネーミングレジスタ部およびリネーミングレジスタ部３８の回路規模と同等である。 In FIG. 4, the circuit scale of the selector SEL3 appears larger than that of the selector SEL4. However, the total width of the data EXD1H and EXD1L output by the selector SEL3 is 64 bits, the same as the width of the data EXD2 output by the selector SEL4. Therefore, the circuit scale of the write control unit WCNT3, which stores the upper 4 bytes and the lower 4 bytes of data separately in the renaming register RNFR, can be made comparable to that of the write control unit WCNT4. For example, the elements that the write control unit WCNT3 adds relative to the write control unit WCNT4 are a logic circuit that generates the write control signals WE1H and WE1L and a wiring area for the write control signal WE1H. Accordingly, the circuit scale of the renaming register unit 36 is comparable to that of an existing renaming register unit and of the renaming register unit 38.

倍幅単精度浮動小数点データが上位４バイトと下位４バイトに分けてデータキャッシュ２２から読み出される場合、下位４バイトのデータがリネーミングレジスタＲＮＦＲに格納された時点で、８バイトのデータがリネーミングレジスタＲＮＦＲ内に揃う。すなわち、上位４バイトと下位４バイトに分けて転送される倍幅単精度浮動小数点データは、リネーミングレジスタ部３６で待ち合わされ、結合される。図４に示すリネーミングレジスタ部３６を用いない場合、上位４バイトと下位４バイトに分けて転送される倍幅単精度浮動小数点データを結合する新たなバッファ回路等が、例えば、データキャッシュ２２の出力に接続される。上述したように、リネーミングレジスタ部３６の回路規模は、既存のリネーミングレジスタ部の回路規模と同等である。このため、リネーミングレジスタ部３６を有する演算処理装置ＯＰＤ２は、上位４バイトと下位４バイトとのデータ群に分けて転送される倍幅単精度浮動小数点データを結合する新たなバッファ回路を有する演算処理装置に比べて、回路規模を削減することができる。 When double-width single-precision floating-point data is read from the data cache 22 as separate upper and lower 4-byte halves, all 8 bytes are present in the renaming register RNFR once the lower 4 bytes have been stored in it. That is, double-width single-precision floating-point data transferred as separate upper and lower 4-byte halves is held and merged in the renaming register unit 36. Without the renaming register unit 36 shown in FIG. 4, a new buffer circuit or the like that merges the separately transferred halves would be connected, for example, to the output of the data cache 22. As described above, the circuit scale of the renaming register unit 36 is comparable to that of an existing renaming register unit. Therefore, the arithmetic processing unit OPD2 having the renaming register unit 36 can reduce the circuit scale compared with an arithmetic processing unit that has a new buffer circuit for merging double-width single-precision floating-point data transferred as separate data groups of upper 4 bytes and lower 4 bytes.

図５は、図２に示すレジスタ管理テーブル１６の例を示す。レジスタ管理テーブル１６のテーブルＲＭＴＢＬ１は、浮動小数点レジスタ部４０の６４個の浮動小数点レジスタＦＲ０−ＦＲ６３のそれぞれに対応してビット値ＰとアドレスＵＢＡとを格納する６４個の領域を有する。図５において、テーブルＲＭＴＢＬ１の左端に示す数字は、浮動小数点レジスタＦＲの番号（すなわち、レジスタ番号ＲＮ）を示す。 FIG. 5 shows an example of the register management table 16 shown in FIG. 2. The table RMTBL1 of the register management table 16 has 64 areas, each storing a bit value P and an address UBA, corresponding to the 64 floating-point registers FR0 to FR63 of the floating-point register unit 40. In FIG. 5, the numbers shown at the left end of the table RMTBL1 indicate the numbers of the floating-point registers FR (that is, the register numbers RN).

レジスタ管理テーブル１６は、デコーダ部１４からライト要求ＷＥとともに浮動小数点レジスタＦＲを示すレジスタ番号ＲＮおよびアドレスＵＢＡを受けた場合、テーブルＲＭＴＢＬ１におけるレジスタ番号ＲＮに対応する領域にアドレスＵＢＡを格納する。アドレスＵＢＡは、浮動小数点レジスタＦＲを割り当てるリネーミングレジスタＲＮＦＲを示す。そして、レジスタ管理テーブル１６は、アドレスＵＢＡを格納した領域のビット値Ｐをセットする。ビット値Ｐは、アドレスＵＢＡが有効か無効かを示す。これにより、デコーダ部１４が決定した浮動小数点レジスタＦＲとリネーミングレジスタＲＮＦＲとの対応付けが、テーブルＲＭＴＢＬ１に保持される。 When the register management table 16 receives the register number RN indicating the floating point register FR and the address UBA together with the write request WE from the decoder unit 14, the register management table 16 stores the address UBA in an area corresponding to the register number RN in the table RMTBL1. The address UBA indicates a renaming register RNFR to which the floating point register FR is allocated. Then, the register management table 16 sets the bit value P of the area storing the address UBA. The bit value P indicates whether the address UBA is valid or invalid. As a result, the correspondence between the floating point register FR and the renaming register RNFR determined by the decoder unit 14 is held in the table RMTBL1.

レジスタ管理テーブル１６は、デコーダ部１４からテーブルＲＭＴＢＬ１を参照する読み出し要求を受けた場合、読み出し要求に含まれるレジスタ番号ＲＮに対応する領域からビット値ＰとアドレスＵＢＡとを読み出す。そして、レジスタ管理テーブル１６は、読み出したビット値ＰとアドレスＵＢＡとをリザベーションステーション１８に出力する。 When receiving a read request referring to the table RMTBL1 from the decoder unit 14, the register management table 16 reads the bit value P and the address UBA from the area corresponding to the register number RN included in the read request. Then, the register management table 16 outputs the read bit value P and address UBA to the reservation station 18.

レジスタ管理テーブル１６のテーブルＲＭＴＢＬ２は、固定小数点レジスタ部４２の３２個の固定小数点レジスタＲ０−Ｒ３１のそれぞれに対応してビット値ＰとアドレスＵＢＡとを格納する３２個の領域を有する。図５において、テーブルＲＭＴＢＬ２の左端に示す数字は、固定小数点レジスタＲの番号（すなわち、レジスタ番号ＲＮ）を示す。 The table RMTBL2 of the register management table 16 has 32 areas for storing the bit value P and the address UBA corresponding to each of the 32 fixed-point registers R0 to R31 of the fixed-point register unit 42. In FIG. 5, the number shown at the left end of the table RMTBL2 indicates the number of the fixed-point register R (that is, the register number RN).

レジスタ管理テーブル１６は、テーブルＲＭＴＢＬ１と同様に、テーブルＲＭＴＢＬ２におけるレジスタ番号Ｒに対応する領域にアドレスＵＢＡを格納し、ビット値Ｐをセットする。アドレスＵＢＡは、固定小数点レジスタＲを割り当てるリネーミングレジスタＲＮＲを示す。これにより、デコーダ部１４が決定した固定小数点レジスタＲとリネーミングレジスタＲＮＲとの対応付けが、テーブルＲＭＴＢＬ２に保持される。また、レジスタ管理テーブル１６は、テーブルＲＭＴＢＬ１と同様に、リザベーションステーション１８からの読み出し要求に含まれるレジスタ番号Ｒに対応する領域からビット値ＰとアドレスＵＢＡとを読み出し、リザベーションステーション１８に出力する。 Similarly to the table RMTBL1, the register management table 16 stores the address UBA in the area corresponding to the register number R in the table RMTBL2, and sets the bit value P. The address UBA indicates a renaming register RNR to which the fixed point register R is assigned. Thereby, the correspondence between the fixed-point register R and the renaming register RNR determined by the decoder unit 14 is held in the table RMTBL2. Similarly to the table RMTBL 1, the register management table 16 reads the bit value P and the address UBA from the area corresponding to the register number R included in the read request from the reservation station 18, and outputs it to the reservation station 18.
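
As a rough illustration of the tables RMTBL1 and RMTBL2 described above, the following Python sketch models each entry as a bit value P plus an address UBA, indexed by register number RN. The class and method names are hypothetical, and clearing of the bit value P is not modeled because it is not described in this passage.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Entry:
    p: int = 0    # bit value P: 1 when the stored UBA is valid
    uba: int = 0  # address UBA of the assigned renaming register

class RegisterManagementTable:
    """Hypothetical model of one table (RMTBL1 or RMTBL2)."""

    def __init__(self, num_registers: int) -> None:
        self.entries: List[Entry] = [Entry() for _ in range(num_registers)]

    def write(self, rn: int, uba: int) -> None:
        # Write request: store UBA in the area for RN and set P.
        entry = self.entries[rn]
        entry.uba = uba
        entry.p = 1

    def read(self, rn: int) -> Tuple[int, int]:
        # Read request: return (P, UBA) for RN.
        entry = self.entries[rn]
        return entry.p, entry.uba

rmtbl1 = RegisterManagementTable(64)  # floating-point registers FR0 to FR63
rmtbl2 = RegisterManagementTable(32)  # fixed-point registers R0 to R31
rmtbl1.write(rn=5, uba=12)            # FR5 is mapped to renaming register 12
print(rmtbl1.read(5))
```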

図６は、図２に示すデータキャッシュ２２からデータをロードする例を示す。例えば、データキャッシュ２２は、１２８バイトの幅を有する６４個のキャッシュラインを有する。図６に示す”７０”および”８０”等は、１６進数であり、連続する２つのキャッシュラインの先頭アドレスの下位８ビットを示す。１６進数のアドレスの右側に示す括弧内の数値は、アドレスを２進数で表した値である。以下では、図６に示す連続する２つのキャッシュラインにデータが続けて格納されているとする。 FIG. 6 shows an example of loading data from the data cache 22 shown in FIG. 2. For example, the data cache 22 has 64 cache lines, each 128 bytes wide. “70”, “80”, and the like shown in FIG. 6 are hexadecimal numbers and indicate the lower 8 bits of the head addresses of two consecutive cache lines. The numerical value in parentheses to the right of each hexadecimal address is the address expressed in binary. In the following, it is assumed that data is stored contiguously across the two consecutive cache lines shown in FIG. 6.

例えば、倍精度浮動小数点データ（８バイト）をロードするロード命令では、アドレスの下位３ビットが”０００”に設定され、倍幅単精度浮動小数点データをロードするロード命令では、アドレスの下位２ビットが”００”に設定される。このため、データキャッシュ２２からロードされる倍精度浮動小数点データは、キャッシュラインの境界を跨がない（図６（ａ））。データキャッシュ２２からロードされる倍幅単精度浮動小数点データも、先頭アドレスの下位３ビットが”０００”の場合、キャッシュラインの境界を跨がない（図６（ｂ））。しかしながら、先頭アドレスの下位３ビットが”１００”の場合、データキャッシュ２２からロードされる倍幅単精度浮動小数点データは、キャッシュラインの境界を跨ぐ場合がある（図６（ｃ））。なお、図６（ｄ）は、先頭アドレスの下位３ビットが”１００”の場合で、キャッシュラインの境界を跨がない例を示す。 For example, in a load instruction that loads double-precision floating-point data (8 bytes), the lower 3 bits of the address are set to “000”, and in a load instruction that loads double-width single-precision floating-point data, the lower 2 bits of the address are set to “00”. For this reason, double-precision floating-point data loaded from the data cache 22 does not straddle a cache line boundary (FIG. 6A). Double-width single-precision floating-point data loaded from the data cache 22 also does not straddle a cache line boundary when the lower 3 bits of the head address are “000” (FIG. 6B). However, when the lower 3 bits of the head address are “100”, double-width single-precision floating-point data loaded from the data cache 22 may straddle a cache line boundary (FIG. 6C). FIG. 6D shows an example in which the lower 3 bits of the head address are “100” and the data does not straddle a cache line boundary.

倍幅単精度浮動小数点データをロードするロード命令において、キャッシュラインの境界を跨ぐ先頭アドレスが指定された場合、図２に示すアドレス生成部２０は、キャッシュライン毎にアクセス要求ＭＲＥＱＣを生成する。すなわち、倍幅単精度浮動小数点データがキャッシュラインの境界を跨いで、データキャッシュ２２に格納されている場合、アドレス生成部２０は、倍幅単精度浮動小数点データを２回に分けてデータキャッシュ２２から読み出す。この実施形態では、図１２で説明するように、ロード命令によりデータが２回に分けてデータキャッシュ２２から読み出される場合にも、回路規模の増加を抑制して、正しいタイミングでデータを浮動小数点演算器２６へバイパスさせることができる。 In a load instruction that loads double-width single-precision floating-point data, when a head address is specified at which the data straddles a cache line boundary, the address generation unit 20 shown in FIG. 2 generates an access request MREQC for each cache line. That is, when the double-width single-precision floating-point data is stored in the data cache 22 across a cache line boundary, the address generation unit 20 reads the double-width single-precision floating-point data from the data cache 22 in two separate accesses. In this embodiment, as will be described with reference to FIG. 12, even when data is read from the data cache 22 in two accesses by a load instruction, the data can be bypassed to the floating-point arithmetic unit 26 at the correct timing while suppressing an increase in circuit scale.
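
The boundary condition described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `access_requests` is hypothetical, cache lines are assumed to start at multiples of 128 bytes, and each request is represented simply as a (start address, length) pair rather than as an actual MREQC signal.

```python
from typing import List, Tuple

LINE_BYTES = 128  # cache line width stated in the description
DATA_BYTES = 8    # double-width single-precision data (two 4-byte values)

def access_requests(addr: int) -> List[Tuple[int, int]]:
    """Return one (start, length) request per cache line touched by the load."""
    if addr % 4 != 0:
        raise ValueError("lower 2 bits of the address must be 00")
    offset = addr % LINE_BYTES
    first_len = min(DATA_BYTES, LINE_BYTES - offset)
    requests = [(addr, first_len)]
    if first_len < DATA_BYTES:
        # The data straddles the line boundary: second request for the rest.
        requests.append((addr + first_len, DATA_BYTES - first_len))
    return requests

print(access_requests(0x78))  # lower 3 bits 000: a single request
print(access_requests(0x74))  # lower 3 bits 100, no boundary crossing
print(access_requests(0x7C))  # lower 3 bits 100, crosses into the next line
```

With 4-byte alignment and 128-byte lines, only an offset of 124 within a line produces two requests, which matches the observation that a head address ending in “100” may or may not straddle a boundary.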

なお、倍幅単精度浮動小数点データの上位４バイトまたは下位４バイトがデータキャッシュ２２に格納されていない場合（キャッシュミス）、データキャッシュ２２は、二次データキャッシュ２２またはメインメモリ等から１２８バイトのデータを読み出す。そして、データキャッシュ２２は、読み出した１２８バイトのデータをキャッシュラインのいずれかに格納する。そして、アドレス生成部２０は、データキャッシュ２２にデータが格納された後に、倍幅単精度浮動小数点データの上位４バイトまたは下位４バイトを読み出すアクセス要求ＭＲＥＱＣをデータキャッシュ２２に発行する。 Note that when the upper 4 bytes or the lower 4 bytes of the double-width single-precision floating-point data are not stored in the data cache 22 (cache miss), the data cache 22 reads 128 bytes of data from the secondary data cache 22, the main memory, or the like. The data cache 22 then stores the read 128 bytes of data in one of the cache lines. After the data has been stored in the data cache 22, the address generation unit 20 issues to the data cache 22 an access request MREQC for reading the upper 4 bytes or the lower 4 bytes of the double-width single-precision floating-point data.

図７は、図２に示すバイパス制御部２４に設けられるセット信号生成回路ＳＳＧＥＮの例を示す。セット信号生成回路ＳＳＧＥＮは、直列に接続された２つのラッチ回路ＦＦ１、ＦＦ２およびサイクル信号生成部ＡＳＧＥＮ、ＭＳＧＥＮを有する。ラッチ回路ＦＦ１、ＦＦ２は、クロック信号ＣＬＫに同期して動作し、アドレス生成部２０からの制御信号ＡＬＤＬ、ＡＬＤＨ、ＡＬＤＢＵＡをラッチする。 FIG. 7 shows an example of the set signal generation circuit SSGEN provided in the bypass control unit 24 shown in FIG. 2. The set signal generation circuit SSGEN includes two latch circuits FF1 and FF2 connected in series, and cycle signal generation units ASGEN and MSGEN. The latch circuits FF1 and FF2 operate in synchronization with the clock signal CLK and latch the control signals ALDL, ALDH, and ALDBUA from the address generation unit 20.

サイクル信号生成部ＡＳＧＥＮは、制御信号ＡＬＤＬ、ＡＬＤＨを受けるアンド回路ＡＮＤ１を有する。サイクル信号生成部ＡＳＧＥＮは、制御信号ＡＬＤＬ、ＡＬＤＨがともに有効レベル（例えば、ハイレベル）のときにセット信号ＡＬＤＳＥＴを有効レベルに設定する。有効レベルのセット信号ＡＬＤＳＥＴにより、図８に示すバイパステーブルＢＰＴＢＬのフラグ領域ＦＬＧのいずれかがセットされる。セットされるフラグ領域ＦＬＧは、アドレスＡＬＤＵＢＡで示される。セット信号ＡＬＤＳＥＴによるフラグ領域ＦＬＧのセットタイミングは、後述するＡサイクルである。サイクル信号生成部ＡＳＧＥＮは、制御信号ＡＬＤＬ、ＡＬＤＨの少なくともいずれかが無効レベル（例えば、ロウレベル）のときにセット信号ＡＬＤＳＥＴを無効レベルに設定する。サイクル信号生成部ＡＳＧＥＮは、複数の制御信号ＡＬＤＬ、ＡＬＤＨの共通なタイミングでの受信に基づいてセット信号ＡＬＤＳＥＴを生成する第１生成回路の一例である。 The cycle signal generator ASGEN includes an AND circuit AND1 that receives the control signals ALDL and ALDH. The cycle signal generator ASGEN sets the set signal ALDSET to an effective level when both the control signals ALDL and ALDH are at an effective level (for example, high level). One of the flag areas FLG of the bypass table BPTBL shown in FIG. 8 is set by the effective level set signal ALDSET. The flag area FLG to be set is indicated by an address ALDUBA. The set timing of the flag area FLG by the set signal ALDSET is an A cycle described later. The cycle signal generation unit ASGEN sets the set signal ALDSET to an invalid level when at least one of the control signals ALDL and ALDH is at an invalid level (for example, low level). The cycle signal generation unit ASGEN is an example of a first generation circuit that generates a set signal ALDSET based on reception of a plurality of control signals ALDL and ALDH at a common timing.

サイクル信号生成部ＭＳＧＥＮは、ラッチ回路ＦＦ２の出力Ｑ１に接続されたインバータＩＶ１と、ラッチ回路ＦＦ２の出力Ｑ０およびインバータＩＶ１の出力に接続されたアンド回路ＡＮＤ２とを有する。ラッチ回路ＦＦ２は、制御信号ＡＬＤＬを２クロックサイクル遅らせた信号を出力Ｑ０から出力し、制御信号ＡＬＤＨを２クロックサイクル遅らせた信号を出力Ｑ２から出力する。サイクル信号生成部ＭＳＧＥＮは、２クロックサイクル遅らせた制御信号ＡＬＤＬ、ＡＬＤＨのそれぞれが有効レベルと無効レベルのとき、セット信号ＭＬＤＳＥＴを有効レベルに設定する。有効レベルのセット信号ＭＬＤＳＥＴにより、図８に示すバイパステーブルＢＰＴＢＬのフラグ領域ＦＬＧのいずれかがセットされる。セットされるフラグ領域ＦＬＧは、アドレスＭＬＤＵＢＡで示される。セット信号ＭＬＤＳＥＴによるフラグ領域ＦＬＧのセットタイミングは、後述するＭサイクルである。 The cycle signal generation unit MSGEN has an inverter IV1 connected to the output Q1 of the latch circuit FF2, and an AND circuit AND2 connected to the output Q0 of the latch circuit FF2 and the output of the inverter IV1. The latch circuit FF2 outputs a signal obtained by delaying the control signal ALDL by two clock cycles from the output Q0, and outputs a signal obtained by delaying the control signal ALDH by two clock cycles from the output Q2. The cycle signal generation unit MSGEN sets the set signal MLDSET to the valid level when the control signals ALDL and ALDH delayed by two clock cycles are at the valid level and the invalid level, respectively. One of the flag areas FLG of the bypass table BPTBL shown in FIG. 8 is set by an effective level set signal MLDSET. The flag area FLG to be set is indicated by an address MLDUBA. The set timing of the flag area FLG by the set signal MLDSET is an M cycle described later.

サイクル信号生成部ＭＳＧＥＮは、２クロックサイクル遅らせた制御信号ＡＬＤＬが無効レベルのとき、または２クロックサイクル遅らせた制御信号ＡＬＤＨが有効レベルのとき、セット信号ＭＬＤＳＥＴを無効レベルに設定する。サイクル信号生成部ＭＳＧＥＮは、最後の制御信号ＡＬＤＬの受信から所定のサイクル後にセット信号ＭＬＤＳＥＴを生成する第２生成回路の一例である。 The cycle signal generation unit MSGEN sets the set signal MLDSET to the invalid level when the control signal ALDL delayed by two clock cycles is at the invalid level or when the control signal ALDH delayed by two clock cycles is at the valid level. The cycle signal generation unit MSGEN is an example of a second generation circuit that generates the set signal MLDSET after a predetermined cycle from the reception of the last control signal ALDL.

制御信号ＡＬＤＬ、ＡＬＤＨがどちらも無効レベルの場合、アドレス生成部２０はデータキャッシュ２２にアクセス要求ＭＲＥＱＣを出力していないため、セット信号ＡＬＤＳＥＴ、ＭＬＤＳＥＴはどちらも有効レベルにならない。このため、フラグ領域ＦＬＧはセットされない。 When both of the control signals ALDL and ALDH are at an invalid level, the address generation unit 20 does not output the access request MREQC to the data cache 22, so that the set signals ALDSET and MLDSET are not at the valid level. For this reason, the flag area FLG is not set.

制御信号ＡＬＤＬ、ＡＬＤＨがどちらも有効レベルの場合、アドレス生成部２０は、８バイトのデータをデータキャッシュ２２から読み出すアクセス要求ＭＲＥＱＣを出力している。この場合、セット信号ＡＬＤＳＥＴは有効レベルになり、セット信号ＭＬＤＳＥＴは無効レベルになり、フラグ領域ＦＬＧは、Ａサイクルでセットされる。 When both the control signals ALDL and ALDH are at a valid level, the address generation unit 20 outputs an access request MREQC that reads 8-byte data from the data cache 22. In this case, the set signal ALDSET becomes a valid level, the set signal MLDSET becomes an invalid level, and the flag area FLG is set in the A cycle.

制御信号ＡＬＤＬが無効レベルで、制御信号ＡＬＤＨが有効レベルの場合、アドレス生成部２０は、倍幅単精度浮動小数点データの上位４バイトのデータをデータキャッシュ２２から読み出すアクセス要求ＭＲＥＱＣを出力している。この場合、セット信号ＡＬＤＳＥＴ、ＭＬＤＳＥＴはどちらも有効レベルにならず、フラグ領域ＦＬＧはセットされない。 When the control signal ALDL is at the invalid level and the control signal ALDH is at the valid level, the address generation unit 20 outputs an access request MREQC for reading the upper 4 bytes of double-width single precision floating point data from the data cache 22. . In this case, both the set signals ALDSET and MLDSET do not become valid levels, and the flag area FLG is not set.

制御信号ＡＬＤＬが有効レベルで、制御信号ＡＬＤＨが無効レベルの場合、アドレス生成部２０は、倍幅単精度浮動小数点データの下位４バイトのデータをデータキャッシュ２２から読み出すアクセス要求ＭＲＥＱＣを出力している。この時点で、倍幅単精度浮動小数点データの上位４バイトは、データキャッシュ２２から既にロードされており、下位４バイトのデータがデータキャッシュ２２から読み出されることで、８バイトの倍幅単精度浮動小数点データが揃う。この場合、セット信号ＡＬＤＳＥＴは無効レベルになり、セット信号ＭＬＤＳＥＴは有効レベルになり、フラグ領域ＦＬＧは、Ｍサイクルでセットされる。 When the control signal ALDL is at the valid level and the control signal ALDH is at the invalid level, the address generation unit 20 is outputting an access request MREQC for reading the lower 4 bytes of the double-width single-precision floating-point data from the data cache 22. At this point, the upper 4 bytes of the double-width single-precision floating-point data have already been loaded from the data cache 22, and reading the lower 4 bytes from the data cache 22 completes the 8-byte double-width single-precision floating-point data. In this case, the set signal ALDSET goes to the invalid level, the set signal MLDSET goes to the valid level, and the flag area FLG is set in the M cycle.

この実施形態では、数ゲート規模のサイクル信号生成部ＡＳＧＥＮ、ＭＳＧＥＮにより、８バイトのデータがデータキャッシュ２２から読み出されるか否かを判定し、セット信号ＡＬＤＳＥＴまたはセット信号ＭＬＤＳＥＴを生成することができる。 In this embodiment, it is possible to determine whether 8-byte data is read from the data cache 22 by the cycle signal generation units ASGEN and MSGEN having a scale of several gates, and to generate the set signal ALDSET or the set signal MLDSET.
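
The four cases above can be checked with a small cycle-by-cycle Python sketch of the set signal generation circuit SSGEN. It assumes, as an illustration only, that valid level is 1, that the two latch circuits FF1 and FF2 form a two-stage delay of ALDL and ALDH, and that ALDSET and MLDSET follow the AND1 and IV1/AND2 logic described for ASGEN and MSGEN. The function name `simulate_ssgen` is hypothetical.

```python
from typing import List, Tuple

def simulate_ssgen(aldl: List[int], aldh: List[int]) -> List[Tuple[int, int]]:
    """Return (ALDSET, MLDSET) for each clock cycle of the input sequences."""
    ff1 = (0, 0)  # (ALDL, ALDH) delayed by one cycle
    ff2 = (0, 0)  # (ALDL, ALDH) delayed by two cycles
    outputs = []
    for low, high in zip(aldl, aldh):
        aldset = low & high              # ASGEN: AND1 of ALDL and ALDH
        d_low, d_high = ff2
        mldset = d_low & (1 - d_high)    # MSGEN: AND2 of delayed ALDL and inverted ALDH
        outputs.append((aldset, mldset))
        ff2 = ff1                        # clock edge: shift the latch chain
        ff1 = (low, high)
    return outputs

# Single 8-byte access: ALDL and ALDH are asserted in the same cycle.
print(simulate_ssgen([1, 0, 0, 0], [1, 0, 0, 0]))
# Split access: ALDH (upper 4 bytes) first, ALDL (lower 4 bytes) one cycle later.
print(simulate_ssgen([0, 1, 0, 0], [1, 0, 0, 0]))
```

In this model the single access asserts ALDSET immediately, while the split access asserts only MLDSET, two cycles after ALDL was received.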

図８は、バイパス制御部２４に設けられるバイパス管理テーブルＢＰＴＢＬおよびテーブル制御回路ＴＳＣＮＴの例を示す。 FIG. 8 shows an example of the bypass management table BPTBL and the table control circuit TSCNT provided in the bypass control unit 24.

バイパス管理テーブルＢＰＴＢＬは、１ビットのラッチ回路を有する３２個のフラグ領域ＦＬＧ（ＦＬＧ０−ＦＬＧ３１）を有する。バイパス管理テーブルＢＰＴＢＬは、クロック信号ＣＬＫに同期して入力端子の論理をラッチして各フラグ領域ＦＬＧに保持し、保持した論理をバイパス可能信号ＢＰＥＮ（ＢＰＥＮ０−ＢＰＥＮ３１）として出力する。 The bypass management table BPTBL has 32 flag areas FLG (FLG0 to FLG31) each having a 1-bit latch circuit. The bypass management table BPTBL latches the logic of the input terminal in synchronization with the clock signal CLK and holds it in each flag area FLG, and outputs the held logic as a bypass enable signal BPEN (BPEN0-BPEN31).

有効レベル（例えば、ハイレベル）のバイパス可能信号ＢＰＥＮは、図２に太い破線で示したバイパス経路を用いたデータのバイパスが可能であることを示す。すなわち、セット状態のフラグ領域ＦＬＧは、リネーミングレジスタＲＮＦＲ、リザルトレジスタ３２またはロードレジスタ３０に格納されたデータを、浮動小数点レジスタ部４０を介さずに浮動小数点演算器２６にバイパス可能であることを示す。なお、バイパス制御部２４は、固定小数点レジスタ部４２に対応するリネーミングレジスタ部３８に対応するバイパス管理テーブルを有するが、図示は省略する。 An effective level (for example, high level) bypass enable signal BPEN indicates that data can be bypassed using the bypass path indicated by a thick broken line in FIG. That is, the flag area FLG in the set state indicates that the data stored in the renaming register RNFR, the result register 32, or the load register 30 can be bypassed to the floating point arithmetic unit 26 without going through the floating point register unit 40. Show. Although the bypass control unit 24 has a bypass management table corresponding to the renaming register unit 38 corresponding to the fixed-point register unit 42, the illustration is omitted.

図２に示す演算処理装置ＯＰＤ２では、８バイトの倍幅単精度浮動小数点データが４バイトずつデータキャッシュ２２から読み出される場合、アドレス生成部２０は、上位４バイトと下位４バイトのデータを読み出すアクセス要求ＭＲＥＱＣを順次に生成する。これにより、下位４バイトのデータがデータキャッシュ２２からの出力されたことに基づいて、８バイト全ての倍幅単精度浮動小数点データがデータキャッシュ２２から読み出されたと判定することができる。したがって、バイパスが可能なことを示すフラグ領域ＦＬＧを、下位４バイトのデータの読み出しを示す制御信号ＡＬＤＬに基づいてセットさせることが可能になる。この結果、４バイトのデータが互いに異なるタイミングで２回に分けて読み出される場合にも、１ビットのフラグ領域ＦＬＧによりバイパス動作を許可させることができ、バイパス管理テーブルＢＰＴＢＬの回路規模の増加を抑制することができる。 In the arithmetic processing unit OPD2 shown in FIG. 2, when 8-byte double-width single-precision floating-point data is read from the data cache 22 4 bytes at a time, the address generation unit 20 sequentially generates access requests MREQC for reading the upper 4 bytes and the lower 4 bytes of data. Thus, based on the lower 4 bytes of data having been output from the data cache 22, it can be determined that all 8 bytes of the double-width single-precision floating-point data have been read from the data cache 22. Therefore, the flag area FLG indicating that bypass is possible can be set based on the control signal ALDL, which indicates reading of the lower 4 bytes of data. As a result, even when the data is read in two 4-byte portions at mutually different timings, the bypass operation can be permitted using a 1-bit flag area FLG, and an increase in the circuit scale of the bypass management table BPTBL can be suppressed.

テーブル制御回路ＴＳＣＮＴは、セット信号生成部ＡＳＳＥＴ、ＭＳＳＥＴ、Ｂ１ＳＥＴ、リセット信号生成部ＷＲＳＴおよびテーブル制御部ＴＢＬＣＮＴを有する。 The table control circuit TSCNT includes set signal generation units ASSET, MSSET, B1SET, a reset signal generation unit WRST, and a table control unit TBLCNT.

セット信号生成部ＡＳＳＥＴは、アドレスＡＬＤＵＢＡを受けるデコーダＡＤＥＣと、デコーダＡＤＥＣの出力に接続され、制御信号ＡＬＤＳＥＴを受ける３２個のアンド回路とを有する。デコーダＡＤＥＣは、アドレスＡＬＤＵＢＡの値を解読し、アドレスＡＬＤＵＢＡの値が示すフラグ領域ＦＬＧに対応する３２個の出力端子のいずれかを有効レベルに設定する。そして、デコーダＡＤＥＣからの有効レベルと制御信号ＡＬＤＳＥＴとを受けるアンド回路が出力するハイレベルにより、対応するフラグ領域ＦＬＧがセットされる。セット信号生成部ＡＳＳＥＴは、倍精度浮動小数点データまたは倍幅単精度浮動小数点データ（何れも８バイト）をロードする場合、フラグ領域ＦＬＧをセットする。 The set signal generation unit ASSET has a decoder ADEC that receives an address ALDUBA, and 32 AND circuits that are connected to the output of the decoder ADEC and receive a control signal ALDSET. The decoder ADEC decodes the value of the address ALDUBA, and sets any of the 32 output terminals corresponding to the flag area FLG indicated by the value of the address ALDUBA to an effective level. The corresponding flag area FLG is set by the high level output from the AND circuit that receives the effective level from the decoder ADEC and the control signal ALDSET. The set signal generation unit ASSET sets the flag area FLG when loading double precision floating point data or double width single precision floating point data (both are 8 bytes).

セット信号生成部ＭＳＳＥＴは、デコーダＡＤＥＣの代わりにデコーダＭＤＥＣを有し、アドレスＡＬＤＵＢＡと制御信号ＡＬＤＳＥＴの代わりにアドレスＭＬＤＵＢＡと制御信号ＭＬＤＳＥＴを受けることを除き、セット信号生成部ＡＳＳＥＴと同様である。セット信号生成部ＭＳＳＥＴは、デコーダＭＤＥＣからの有効レベルと制御信号ＭＬＤＳＥＴとを受けるアンド回路が出力するハイレベルにより、対応するフラグ領域ＦＬＧをセットする。セット信号生成部ＭＳＳＥＴは、倍幅単精度浮動小数点データを４バイトずつ互いに異なるタイミングでロードする場合、フラグ領域ＦＬＧをセットする。 The set signal generation unit MSSET has a decoder MDEC instead of the decoder ADEC, and is the same as the set signal generation unit ASSET except that it receives the address MLDUBA and the control signal MLDSET instead of the address ALDUBA and the control signal ALDSET. The set signal generation unit MSSET sets the corresponding flag area FLG according to the high level output from the AND circuit that receives the effective level from the decoder MDEC and the control signal MLDSET. The set signal generation unit MSSET sets the flag area FLG when loading double-width single-precision floating-point data at a timing different from each other by 4 bytes.

セット信号生成部Ｂ１ＳＥＴは、デコーダＡＤＥＣの代わりにデコーダＢＤＥＣを有し、アドレスＡＬＤＵＢＡと制御信号ＡＬＤＳＥＴの代わりにアドレスＢ１ＦＰＵＢＡと制御信号Ｂ１ＦＰを受けることを除き、セット信号生成部ＡＳＳＥＴと同様である。セット信号生成部Ｂ１ＳＥＴは、デコーダＢＤＥＣからの有効レベルと制御信号Ｂ１ＦＰとを受けるアンド回路が出力するハイレベルにより、対応するフラグ領域ＦＬＧをセットする。セット信号生成部Ｂ１ＳＥＴは、浮動小数点データの演算命令による演算の開始を示す信号を後述するＢ１サイクルまで保持した制御信号Ｂ１ＦＰに応答して、アドレスＢ１ＦＰＵＢＡが示すフラグ領域ＦＬＧをセットする。 The set signal generation unit B1SET is similar to the set signal generation unit ASSET except that it has a decoder BDEC instead of the decoder ADEC and receives the address B1FPUBA and the control signal B1FP instead of the address ALDUBA and the control signal ALDSET. The set signal generation unit B1SET sets the corresponding flag area FLG according to the high level output from the AND circuit that receives the effective level from the decoder BDEC and the control signal B1FP. The set signal generation unit B1SET sets the flag area FLG indicated by the address B1FPUBA in response to a control signal B1FP that holds a signal indicating the start of an operation based on a floating-point data operation instruction until a B1 cycle described later.

リセット信号生成部ＷＲＳＴは、アドレスＷＣＭＴＵＢＡを受けるデコーダＷＤＥＣと、デコーダＷＤＥＣの出力に接続され、制御信号ＷＣＭＴを受ける３２個のナンド回路と、ナンド回路の出力に接続された３２個のアンド回路とを有する。デコーダＷＤＥＣは、デコーダＡＤＥＣと同様に、アドレスＷＣＭＴＵＢＡの値を解読し、アドレスＷＣＭＴＵＢＡの値が示すフラグ領域ＦＬＧに対応する３２個の出力端子のいずれかを有効レベルに設定する。デコーダＷＤＥＣからの有効レベルと制御信号ＷＣＭＴとを受けるナンド回路は、ロウレベルを出力し、ナンド回路の出力に接続されたアンド回路の出力をロウレベルに設定する。これにより、対応するフラグ領域ＦＬＧがリセットされる。 The reset signal generation unit WRST includes a decoder WDEC that receives the address WCMTUBA, 32 NAND circuits that are connected to the outputs of the decoder WDEC and receive the control signal WCMT, and 32 AND circuits that are connected to the outputs of the NAND circuits. Similarly to the decoder ADEC, the decoder WDEC decodes the value of the address WCMTUBA and sets one of its 32 output terminals, the one corresponding to the flag area FLG indicated by the value of the address WCMTUBA, to the valid level. The NAND circuit that receives the valid level from the decoder WDEC and the control signal WCMT outputs a low level, which sets the output of the AND circuit connected to the output of that NAND circuit to the low level. As a result, the corresponding flag area FLG is reset.

なお、制御信号ＡＬＤＳＥＴ、ＭＬＤＳＥＴ、Ｂ１ＦＰの出力期間は、１クロックサイクルである。このため、各セット信号生成部ＡＳＳＥＴ、ＭＳＳＥＴ、Ｂ１ＳＥＴは、制御信号ＡＬＤＳＥＴ、ＭＬＤＳＥＴ、Ｂ１ＦＰを受けるクロックサイクルを除き、アンド回路からロウレベルを出力する。一方、リセット信号生成部ＷＲＳＴにおいてナンド回路の出力に接続されたアンド回路は、ハイレベルのバイパス可能信号ＢＰＥＮを受け、ナンド回路がロウレベルを出力するまでハイレベルを出力する。これにより、セット信号生成部ＡＳＳＥＴ、ＭＳＳＥＴ、Ｂ１ＳＥＴによりセットされたフラグ領域ＦＬＧのセット状態は、ナンド回路がロウレベルを出力するまで維持される。 The output period of the control signals ALDSET, MLDSET, and B1FP is one clock cycle. Therefore, each set signal generation unit ASSET, MSSET, B1SET outputs a low level from the AND circuit except for the clock cycle that receives the control signals ALDSET, MLDSET, B1FP. On the other hand, the AND circuit connected to the output of the NAND circuit in the reset signal generation unit WRST receives the high level bypass enable signal BPEN and outputs the high level until the NAND circuit outputs the low level. Thus, the set state of the flag area FLG set by the set signal generation units ASSET, MSSET, and B1SET is maintained until the NAND circuit outputs a low level.

テーブル制御部ＴＢＬＣＮＴは、セット信号生成部ＡＳＳＥＴ、ＭＳＳＥＴ、Ｂ１ＳＥＴの出力およびリセット信号生成部ＷＲＳＴの出力のオア論理を演算し、演算した論理をバイパステーブルＢＰＴＢＬの入力端子に出力する複数のオア回路を有する。 The table control unit TBLCNT has a plurality of OR circuits that compute the OR logic of the outputs of the set signal generation units ASSET, MSSET, and B1SET and the output of the reset signal generation unit WRST, and output the computed logic to the input terminals of the bypass table BPTBL.
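
The set, hold, and reset behavior of the flag areas FLG described above can be summarized as a next-state function. The Python sketch below is an assumption-laden illustration: the 32 flags are packed into one integer, each set or reset source is represented by its address (or `None` when its control signal is inactive), and the function name `next_flags` is hypothetical.

```python
from typing import Optional

NUM_FLAGS = 32  # flag areas FLG0 to FLG31

def decode(uba: int) -> int:
    """One-hot decode of an address, as done by ADEC, MDEC, BDEC and WDEC."""
    return 1 << uba

def next_flags(flags: int,
               aldset: Optional[int] = None,  # ALDUBA when ALDSET is active
               mldset: Optional[int] = None,  # MLDUBA when MLDSET is active
               b1fp: Optional[int] = None,    # B1FPUBA when B1FP is active
               wcmt: Optional[int] = None     # WCMTUBA when WCMT is active
               ) -> int:
    """Value latched into the flag areas at the next clock edge."""
    set_mask = 0
    for uba in (aldset, mldset, b1fp):
        if uba is not None:
            set_mask |= decode(uba)       # AND of decoder output and control signal
    hold_mask = flags                      # AND circuit in WRST feeds back BPEN
    if wcmt is not None:
        hold_mask &= ~decode(wcmt)        # NAND output low clears the held flag
    return (set_mask | hold_mask) & ((1 << NUM_FLAGS) - 1)  # OR circuits in TBLCNT

flags = next_flags(0, aldset=3)        # set FLG3 in the A cycle
flags = next_flags(flags)              # no control signal: FLG3 is held
print(bin(flags))
print(bin(next_flags(flags, wcmt=3)))  # commit: FLG3 is reset
```

In this model a set and a reset that target the same flag in the same cycle result in the flag being set, because the set path is ORed in after the reset gating; whether that case can occur in the actual device is not stated in the description.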

図７および図８に示す回路に基づいて、バイパス制御部２４は、アドレス生成部２０からの制御信号ＡＬＤＬ、ＡＬＤＨの論理に応じたタイミングで、制御信号ＡＬＤＢＵＡの論理が示すバイパス管理テーブルＢＰＴＢＬのフラグ領域ＦＬＧ（図８）をセットする。また、バイパス制御部２４は、リザベーションステーション１８からの制御信号Ｂ１ＦＰのタイミングで、アドレスＢ１ＦＰＵＢＡの論理が示すバイパス管理テーブルＢＰＴＢＬのフラグ領域ＦＬＧをセットする。 Based on the circuits shown in FIGS. 7 and 8, the bypass control unit 24 sets the flag area FLG (FIG. 8) of the bypass management table BPTBL indicated by the logic of the control signal ALDBUA, at a timing that depends on the logic of the control signals ALDL and ALDH from the address generation unit 20. The bypass control unit 24 also sets the flag area FLG of the bypass management table BPTBL indicated by the logic of the address B1FPUBA at the timing of the control signal B1FP from the reservation station 18.

バイパス制御部２４は、セットしたフラグ領域ＦＬＧに対応するバイパス可能信号ＢＰＥＮ（ＢＰＥＮ０−ＢＰＥＮ３１のいずれか）を有効レベルに設定する。バイパス制御部２４は、コミット制御部４４からの制御信号ＷＣＭＴのタイミングで、アドレスＷＣＭＴＵＢＡの論理が示すバイパス管理テーブルＢＰＴＢＬのフラグ領域ＦＬＧをリセットする。バイパス制御部２４は、リセットしたフラグ領域ＦＬＧに対応するバイパス可能信号ＢＰＥＮ（ＢＰＥＮ０−ＢＰＥＮ３１のいずれか）を無効レベルに設定する。テーブル制御部ＴＢＬＣＮＴおよびバイパス管理テーブルＢＰＴＢＬは、セット信号ＡＬＤＳＥＴまたはセット信号ＭＬＤＳＥＴに基づいて、バイパス可能信号ＢＰＥＮを出力する出力回路の一例である。 The bypass control unit 24 sets the bypassable signal BPEN (any one of BPEN0 to BPEN31) corresponding to the set flag area FLG to an effective level. The bypass control unit 24 resets the flag area FLG of the bypass management table BPTBL indicated by the logic of the address WCMTUBA at the timing of the control signal WCMT from the commit control unit 44. The bypass control unit 24 sets the bypassable signal BPEN (any one of BPEN0 to BPEN31) corresponding to the reset flag area FLG to an invalid level. The table control unit TBLCNT and the bypass management table BPTBL are an example of an output circuit that outputs a bypass enable signal BPEN based on the set signal ALDSET or the set signal MLDSET.

なお、バイパス制御部２４は、バイパス管理テーブルＢＰＴＢＬのフラグ領域ＦＬＧをセットした後、データキャッシュ２２から完了信号ＣＥを受けない場合、セットしたフラグ領域ＦＬＧをリセットする。データキャッシュ２２によるキャッシュミス時、データキャッシュ２２からのデータは、ロードレジスタ３０およびリネーミングレジスタ部３６に所望のサイクルで転送されない。この場合、ロードレジスタ３０およびリネーミングレジスタ部３６は、浮動小数点演算器２６へバイパスするデータを保持しないため、バイパスを禁止するために、フラグ領域ＦＬＧがリセットされる。完了信号ＣＥに基づいてフラグ領域ＦＬＧをリセットする論理は、例えば、リセット信号生成部ＷＲＳＴ内に設けられる。例えば、リセット信号生成部ＷＲＳＴは、完了信号ＣＥに対応してリセットするフラグ領域ＦＬＧを示すアドレスとアドレスＷＣＭＴＵＢＡとのオア論理をデコーダＷＤＥＣに供給するオア回路を有する。また、リセット信号生成部ＷＲＳＴは、制御信号ＷＣＭＴと完了信号ＣＥとのオア論理を各ナンド回路に供給するオア回路を有する。 If the bypass control unit 24 does not receive the completion signal CE from the data cache 22 after setting the flag area FLG of the bypass management table BPTBL, it resets the set flag area FLG. At the time of a cache miss by the data cache 22, data from the data cache 22 is not transferred to the load register 30 and the renaming register unit 36 in a desired cycle. In this case, since the load register 30 and the renaming register unit 36 do not hold the data to be bypassed to the floating point arithmetic unit 26, the flag area FLG is reset to prohibit the bypass. The logic for resetting the flag area FLG based on the completion signal CE is provided in the reset signal generation unit WRST, for example. For example, the reset signal generation unit WRST has an OR circuit that supplies an OR logic of an address indicating the flag area FLG to be reset and the address WCMTUBA corresponding to the completion signal CE to the decoder WDEC. Further, the reset signal generation unit WRST has an OR circuit that supplies an OR logic of the control signal WCMT and the completion signal CE to each NAND circuit.

図９は、図７および図８に示すバイパス制御部２４の動作の例を示す。図２に示すアドレス生成部２０は、８バイトの倍精度浮動小数点データまたは８バイトの倍幅単精度浮動小数点データを１回のアクセス要求ＭＲＥＱＣでデータキャッシュ２２から読み出す場合、制御信号ＡＬＤＬ、ＡＬＤＨを同時に生成する。また、アドレス生成部２０は、倍幅単精度浮動小数点データを２回のアクセス要求ＭＲＥＱＣでデータキャッシュ２２から４バイトずつ読み出す場合、制御信号ＡＬＤＨ、ＡＬＤＬを順次に生成する。この場合、８バイトの倍幅単精度浮動小数点データのうち、上位４バイトがデータキャッシュ２２から読み出された後、下位４バイトがデータキャッシュ２２から読み出される。 FIG. 9 shows an example of the operation of the bypass control unit 24 shown in FIGS. 7 and 8. When reading 8-byte double-precision floating-point data or 8-byte double-width single-precision floating-point data from the data cache 22 with a single access request MREQC, the address generation unit 20 shown in FIG. 2 generates the control signals ALDL and ALDH simultaneously. When reading double-width single-precision floating-point data from the data cache 22 4 bytes at a time with two access requests MREQC, the address generation unit 20 generates the control signals ALDH and ALDL sequentially. In this case, of the 8-byte double-width single-precision floating-point data, the upper 4 bytes are read from the data cache 22 first, and then the lower 4 bytes are read from the data cache 22.

例えば、動作ＯＰ１０において、バイパス制御部２４は、制御信号ＡＬＤＬの生成の有無を判定する。制御信号ＡＬＤＬが生成されない場合、８バイトの倍精度浮動小数点データまたは８バイトの倍幅単精度浮動小数点データは、データキャッシュ２２から読み出されていない。バイパス制御部２４は、動作ＯＰ１２において、リネーミングレジスタＲＮＦＲの下位４バイトに対応する制御信号ＡＬＤＬが生成されない場合、バイパステーブルＢＰＴＢＬを更新しない。 For example, in the operation OP10, the bypass control unit 24 determines whether or not the control signal ALDL is generated. When the control signal ALDL is not generated, 8-byte double-precision floating point data or 8-byte double-width single-precision floating point data is not read from the data cache 22. When the control signal ALDL corresponding to the lower 4 bytes of the renaming register RNFR is not generated in the operation OP12, the bypass control unit 24 does not update the bypass table BPTBL.

制御信号ＡＬＤＬが生成された場合、８バイトの倍精度浮動小数点データまたは８バイトの倍幅単精度浮動小数点データが一度にデータキャッシュ２２から読み出される。バイパス制御部２４は、動作ＯＰ１４において、制御信号ＡＬＤＬとともに制御信号ＡＬＤＨが生成されたか否かを判定する。 When the control signal ALDL is generated, 8-byte double-precision floating point data or 8-byte double-width single-precision floating point data is read from the data cache 22 at a time. The bypass control unit 24 determines whether or not the control signal ALDH is generated together with the control signal ALDL in the operation OP14.

制御信号ＡＬＤＬとともに制御信号ＡＬＤＨが生成された場合、８バイトの倍精度浮動小数点データまたは８バイトの倍幅単精度浮動小数点データがデータキャッシュ２２から１回のアクセス要求ＭＲＥＱＣで読み出される。すなわち、８バイトのデータの上位４バイトと下位４バイトとがデータキャッシュ２２からロードされるタイミングのずれは発生しない。この場合、バイパス制御部２４は、動作ＯＰ１６において、制御信号ＡＬＤＬ、ＡＬＤＨとともに受けるアドレスＭＡＬＤＵＢＡにより示されるバイパステーブルＢＰＴＢＬのフラグ領域ＦＬＧをセットする。 When the control signal ALDH is generated together with the control signal ALDL, 8-byte double-precision floating point data or 8-byte double-width single-precision floating point data is read from the data cache 22 with one access request MREQC. That is, there is no difference in timing when the upper 4 bytes and lower 4 bytes of the 8-byte data are loaded from the data cache 22. In this case, the bypass control unit 24 sets the flag area FLG of the bypass table BPTBL indicated by the address MALDUBA received together with the control signals ALDL and ALDH in the operation OP16.

制御信号ＡＬＤＬとともに制御信号ＡＬＤＨが生成されない場合、８バイトの倍幅単精度浮動小数点データは、２回に分けてデータキャッシュ２２から読み出される。すなわち、８バイトのデータの上位４バイトと下位４バイトとがデータキャッシュ２２からロードされるタイミングのずれが発生する。この場合、バイパス制御部２４は、動作ＯＰ１８において、制御信号ＡＬＤＬの受信から２クロックサイクル後にセット信号ＭＬＤＳＥＴとアドレスＭＬＤＵＢＡを生成する。この後、バイパス制御部２４は、動作ＯＰ１６において、セット信号ＭＬＤＳＥＴに同期してアドレスＭＬＤＵＢＡにより示されるバイパステーブルＢＰＴＢＬのフラグ領域ＦＬＧをセットする。セット信号ＭＬＤＳＥＴに同期するフラグ領域ＦＬＧのセットにより、２回に分けてデータキャッシュ２２から読み出された倍幅単精度浮動小数点データは、図１２に示すように、リネーミングレジスタＲＮＦＲを介してバイパス可能になる。図１２で説明するように、リネーミングレジスタＲＮＦＲからのデータの読み出しは、２クロックサイクル掛かる。このため、バイパス制御部２４は、セット信号ＭＬＤＳＥＴを制御信号ＡＬＤＬから２クロックサイクル後に生成することで、リネーミングレジスタＲＮＦＲからバイパスされたデータを用いた演算の開始を遅らせる。 When the control signal ALDH is not generated together with the control signal ALDL, the 8-byte double-width single-precision floating-point data is read from the data cache 22 in two separate accesses. That is, the upper 4 bytes and the lower 4 bytes of the 8-byte data are loaded from the data cache 22 at different timings. In this case, in operation OP18, the bypass control unit 24 generates the set signal MLDSET and the address MLDUBA two clock cycles after receiving the control signal ALDL. Thereafter, in operation OP16, the bypass control unit 24 sets the flag area FLG of the bypass table BPTBL indicated by the address MLDUBA in synchronization with the set signal MLDSET. Because the flag area FLG is set in synchronization with the set signal MLDSET, the double-width single-precision floating-point data read from the data cache 22 in two accesses becomes bypassable via the renaming register RNFR, as shown in FIG. 12. As will be described with reference to FIG. 12, reading data from the renaming register RNFR takes two clock cycles. For this reason, the bypass control unit 24 generates the set signal MLDSET two clock cycles after the control signal ALDL, thereby delaying the start of the operation that uses the data bypassed from the renaming register RNFR.

図１０は、図２に示す演算処理装置ＯＰＤ２の動作の例を示す。図１０では、先行する浮動小数点データのロード命令の実行後に複数の浮動小数点データの演算命令Ａ、Ｂ、Ｃ、Ｄが実行される。図１０のロード命令は、倍精度浮動小数点データまたは倍幅単精度浮動小数点データがデータキャッシュ２２の１つのキャッシュラインから読み出される例を示す。また、演算命令Ａ、Ｂ、Ｃ、Ｄは、ロード命令によりロードされたデータが浮動小数点レジスタＦＲに格納される前に、ロードレジスタ３０またはリネーミングレジスタＲＮＦＲからバイパスさせるデータを使用して実行される。 FIG. 10 shows an example of the operation of the arithmetic processing unit OPD2 shown in FIG. 2. In FIG. 10, a plurality of floating-point arithmetic instructions A, B, C, and D are executed after execution of a preceding floating-point load instruction. The load instruction in FIG. 10 is an example in which double-precision floating-point data or double-width single-precision floating-point data is read from a single cache line of the data cache 22. The arithmetic instructions A, B, C, and D are executed using data bypassed from the load register 30 or the renaming register RNFR before the data loaded by the load instruction is stored in the floating-point register FR.

In FIG. 10, the D, DT, P, PT, B1, B2, A, T, M, B, R, and RT cycles of the load instruction denote the following pipeline stages during execution of the load instruction.
(a) D (Decode): The decoder unit 14 decodes the instruction and transfers the decoded instruction information INS to the reservation station RSMA for memory access instructions.
(b) DT (Decode Transfer): The reservation station RSMA holds the instruction information INS output in the D cycle.
(c) P (Priority): The reservation station RSMA determines the instruction to be output to the address generation unit 20.
(d) PT (Priority Transfer): The reservation station RSMA outputs the instruction determined in the P cycle to the address generation unit 20.
(e) B1, B2 (Buffer): The address generation unit 20 determines the data (source registers) used to generate the address for accessing the data cache 22 and the like.
(f) A (Address): The address generation unit 20 calculates the address for accessing the data cache 22 and the like.
(g) T (Tag): The data cache 22 accesses the tag area.
(h) M (Match): The data cache 22 compares the address read from the tag area with the address received from the address generation unit 20 and determines whether a cache hit or a cache miss has occurred.
(i) B (Buffer): The data cache 22 buffers the read data internally.
(j) R (Result): The access to the data cache 22 completes, and the read data is transferred to the load register 30.
(k) RT (Result): The data transferred to the load register 30 in the R cycle is transferred to the renaming register unit 36.

In FIG. 10, the D, DT, P, PT, B1, B2, X1, X2, X3, and X4 cycles of an arithmetic instruction denote the following pipeline stages during execution of the arithmetic instruction.
(l) D (Decode): The decoder unit 14 decodes the instruction and transfers the decoded instruction information INS to the reservation station RSFLT for floating-point operations.
(m) DT (Decode Transfer): The reservation station RSFLT holds the instruction information INS output in the D cycle.
(n) P (Priority): The reservation station RSFLT determines the instruction to be issued to the floating-point arithmetic unit 26.
(o) PT (Priority Transfer): The reservation station RSFLT issues the instruction determined in the P cycle to the floating-point arithmetic unit 26.
(p) B1, B2 (Buffer): The floating-point arithmetic unit 26 determines the data (source registers) required for the operation.
(q) X1, X2, X3, X4 (Execute 1-4): The floating-point arithmetic unit 26 executes the instruction. A floating-point operation is executed in the four cycles X1 to X4. In the cycle following the X4 cycle, the data obtained by the operation is stored in the renaming register unit 36.
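As a rough illustration of the two stage lists above, the following Python sketch assigns one clock cycle to each stage of an instruction that advances without stalls. The function name `stage_cycles` and the one-stage-per-cycle, no-stall assumption are illustrative and are not taken from the specification.

```python
# Pipeline stages as listed above; B1 and B2 are treated as separate cycles.
LOAD_STAGES = ["D", "DT", "P", "PT", "B1", "B2", "A", "T", "M", "B", "R", "RT"]
ARITH_STAGES = ["D", "DT", "P", "PT", "B1", "B2", "X1", "X2", "X3", "X4"]


def stage_cycles(stages, first_cycle=1):
    """Map each stage name to a clock cycle (assumes one stage per cycle, no stalls)."""
    return {stage: first_cycle + i for i, stage in enumerate(stages)}


load = stage_cycles(LOAD_STAGES)
arith = stage_cycles(ARITH_STAGES)

# The arithmetic result is stored in the renaming register unit in the cycle after X4.
result_store_cycle = arith["X4"] + 1
```

With the load instruction starting in cycle 1, this mapping places its A cycle in cycle 7 and its RT cycle in cycle 12.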

The load instruction in FIG. 10 shows the operation on a cache hit. On a cache miss, the D, DT, P, PT, B1, B2, A, T, and M cycles are executed in clock cycles preceding the first clock cycle. The address generation unit 20 and the data cache 22 then execute the A, T, and M cycles again from the seventh clock cycle and, on a cache hit, execute the B, R, and RT cycles again following the M cycle. That is, the operation of the load instruction on a cache miss corresponds to FIG. 10 with the D, DT, P, PT, B1, and B2 cycles removed.

The arithmetic processing unit OPD2 shown in FIG. 2 performs out-of-order execution, in which executable instructions are executed without being restricted to the order in which the decoder unit 14 decoded them. Consequently, for the arithmetic instructions A, B, C, and D, the number of clock cycles between the DT cycle and the P cycle varies depending on the execution order determined by the reservation station RSFLT. For example, the D and DT cycles of the arithmetic instruction A are executed in clock cycles preceding the first clock cycle.

For the load instruction, the address generation unit 20 determines in the A cycle that the data can be read with a single access to the data cache, and issues the access request MREQC shown in FIG. 2 to the data cache 22. Together with the access request MREQC, the address generation unit 20 outputs the control signals ALDL and ALDH and the address ALDUBA to the bypass control unit 24 (FIG. 10(a)).

The bypass control unit 24 generates the set signal ALDSET in response to the control signals ALDL and ALDH and sets the flag area FLG indicated by the address ALDUBA, thereby setting the bypass enable signal BPEN to the valid level (FIG. 10(b)).

The data cache 22 completes the read in the R cycle of the load instruction and transfers the 8 bytes of read data to the load register 30 (FIG. 10(c)). The data transferred to the load register 30 is transferred to the renaming register unit 36 in the RT cycle. The write control unit WCNT3 (FIG. 4) of the renaming register unit 36 generates the write control signals WE1H and WE1L at the same timing based on the 8 bytes of data transferred from the load register 30 (FIG. 10(d)). The write control unit WCNT3 then stores the 8 bytes of data in one of the renaming registers RNFR (FIG. 10(e)). For example, the write control unit WCNT3 determines the renaming register RNFR in which to store the data based on information, received from the reservation station 18, indicating the bypass enable signal BPEN set to the valid level. Because reading data from the renaming register RNFR takes, for example, two clock cycles, data can be bypassed from the renaming register RNFR from the 14th clock cycle onward.

After the reservation station RSMA has determined the source registers in the B1 and B2 cycles, the reservation station RSFLT determines in the eighth clock cycle that a bypass is possible. The reservation station RSFLT then executes the P cycle of the arithmetic instruction A in the ninth clock cycle and issues the arithmetic instruction A to the floating-point arithmetic unit 26 (FIG. 10(f)).

In the B2 cycle of the arithmetic instruction A, the floating-point arithmetic unit 26 selects the data bypassed from the load register 30 and uses the selected data in the X1 cycle (FIG. 10(g)). In FIG. 10, the thick dashed arrows indicate that data is bypassed from the load register 30 or the renaming register RNFR to the floating-point arithmetic unit 26. In the B1 cycle of the arithmetic instruction B, the floating-point arithmetic unit 26 selects the data bypassed from the load register 30 and uses the selected data in the X1 cycle (FIG. 10(h)).

In the B2 cycle of the arithmetic instruction C, the floating-point arithmetic unit 26 selects the data bypassed from the renaming register RNFR and uses the selected data in the X1 cycle (FIG. 10(i)). Likewise, in the B2 cycle of the arithmetic instruction D, the floating-point arithmetic unit 26 selects the data bypassed from the renaming register RNFR and uses the selected data in the X1 cycle (FIG. 10(j)).

FIG. 11 shows another example of the operation of the arithmetic processing unit OPD2 shown in FIG. 2. Detailed description of operations identical or similar to those in FIG. 10 is omitted. In FIG. 11, a plurality of floating-point arithmetic instructions A, B, C, and D are executed after a preceding floating-point arithmetic instruction. The preceding arithmetic instruction in FIG. 11 uses double-precision floating-point data, double-width single-precision floating-point data, or single-precision floating-point data. The arithmetic instructions A, B, C, and D are executed using data bypassed from the result register 32 or the renaming register RNFR before the data obtained by the preceding arithmetic instruction is stored in the floating-point register FR.

In the B1 cycle of the preceding arithmetic instruction, the reservation station RSFLT determines that the data used by the arithmetic instructions A, B, C, and D can be bypassed from the result register 32 or the renaming register RNFR. The reservation station RSFLT then outputs the control signal B1FP and the address B1FPUBA (FIG. 11(a)).

In response to the control signal B1FP, the bypass control unit 24 sets the flag area FLG indicated by the address B1FPUBA, thereby setting the bypass enable signal BPEN to the valid level (FIG. 11(b)).

The floating-point arithmetic unit 26 executes the preceding arithmetic instruction in the X1 to X4 cycles and transfers the result to the result register 32 in the X4 cycle (FIG. 11(c)). The data transferred to the result register 32 is transferred to the renaming register RNFR in the next clock cycle (FIG. 11(d)).

The reservation station RSFLT executes the P cycle of the arithmetic instruction A in the seventh clock cycle and issues the arithmetic instruction A to the floating-point arithmetic unit 26 (FIG. 11(e)). In the X1 cycle of the arithmetic instruction A, the floating-point arithmetic unit 26 selects the data bypassed from the result register 32 and starts the operation from the X1 cycle using the selected data (FIG. 11(f)). In the B2 cycle of the arithmetic instruction B, the floating-point arithmetic unit 26 selects the data bypassed from the result register 32 and uses the selected data in the X1 cycle (FIG. 11(g)). In the B1 cycle of the arithmetic instruction C, the floating-point arithmetic unit 26 selects the data bypassed from the result register 32 and uses the selected data in the X1 cycle (FIG. 11(h)). In the B2 cycle of the arithmetic instruction D, the floating-point arithmetic unit 26 selects the data bypassed from the renaming register RNFR and uses the selected data in the X1 cycle (FIG. 11(i)). As described with reference to FIG. 10, reading data from the renaming register RNFR takes two clock cycles, so data can be bypassed from the renaming register RNFR from the 13th clock cycle onward.

FIG. 12 shows yet another example of the operation of the arithmetic processing unit OPD2 shown in FIG. 2. Detailed description of operations identical or similar to those in FIG. 10 is omitted. In FIG. 12, the 8 bytes of data targeted by the preceding load instruction for double-width single-precision floating-point data are held across two cache lines of the data cache 22. The arithmetic instructions A, B, C, and D are instructions that operate on double-width single-precision floating-point data, and are executed using data bypassed from the renaming register RNFR before the data obtained by the preceding load instruction is stored in the floating-point register FR.

When the double-width single-precision floating-point data read by the load instruction is read from two cache lines, the A, T, M, B, R, and RT cycles are executed twice, separated by an interval (FIG. 12(a), (b)). The address generation unit 20 shown in FIG. 2 reads the upper 4 bytes of the double-width single-precision floating-point data in the first read cycle and the lower 4 bytes in the second read cycle.

In the first A cycle, which reads the upper 4 bytes of data from the data cache 22, the address generation unit 20 outputs the control signal ALDH and the address ALDUBA to the bypass control unit 24 (FIG. 12(c)). In the first clock cycle, the set signal generation circuit SSGEN (FIG. 7) of the bypass control unit 24 receives the control signal ALDL at the invalid level and the control signal ALDH at the valid level, and keeps the set signal ASDSET at the invalid level (FIG. 12(d)). Therefore, in the bypass management table BPTBL, the flag area FLG indicated by the address ALDUBA is not set, and the bypass enable signal BPEN is not output (FIG. 12(e)).

The set signal generation circuit SSGEN outputs the address MLDUBA, which is the address ALDUBA delayed by two clock cycles (FIG. 12(f)). However, because the set signal MLDSET is kept at the invalid level, the flag area FLG indicated by the address ALDUBA is not set, and the bypass enable signal BPEN is not output (FIG. 12(g)).

The data cache 22 transfers the upper 4 bytes of data, read in the R cycle of the first access cycle, to the load register 30 (FIG. 12(h)). The write control unit WCNT3 (FIG. 4) of the renaming register unit 36 generates the write control signal WE1H based on the upper 4 bytes of data transferred from the load register 30 (FIG. 12(i)). The write control unit WCNT3 then stores the upper 4 bytes of data in one of the renaming registers RNFR (FIG. 12(j)). For example, the write control unit WCNT3 determines the renaming register RNFR in which to store the data based on information, received from the reservation station 18, indicating the bypass enable signal BPEN set to the valid level.

In the first A cycle for reading the lower 4 bytes of data from the data cache 22, the address generation unit 20 outputs the control signal ALDL and the address ALDUBA to the bypass control unit 24 (FIG. 12(k)). In the ninth clock cycle, the set signal generation circuit SSGEN of the bypass control unit 24 receives the control signal ALDL at the valid level and the control signal ALDH at the invalid level, and keeps the set signal ASDSET at the invalid level (FIG. 12(l)). Therefore, in the bypass management table BPTBL, the flag area FLG indicated by the address ALDUBA is not set, and the bypass enable signal BPEN is not output (FIG. 12(m)).

Meanwhile, in the 11th clock cycle, the cycle signal generation unit MSGEN in the set signal generation circuit SSGEN (FIG. 7) receives the signal obtained by delaying the valid-level control signal ALDL and the invalid-level control signal ALDH by two clock cycles. The cycle signal generation unit MSGEN then sets the set signal MLDSET to the valid level (FIG. 12(n)). The set signal generation circuit SSGEN also outputs the address MLDUBA, which is the address ALDUBA delayed by two clock cycles (FIG. 12(o)). As a result, in the bypass management table BPTBL, the flag area FLG indicated by the address ALDUBA is set, and the bypass enable signal BPEN is output (FIG. 12(p)).
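The set-signal behavior described for FIG. 12 can be sketched in Python as follows. This is a minimal model under stated assumptions: the class name `SetSignalGenerator`, the two-entry delay queue, and the dictionary output are hypothetical and do not reproduce the circuit of FIG. 7; only the signal names and the cycle numbers 1, 9, and 11 come from the text. The immediate set signal is named `ALDSET` here, as in the description of FIG. 10.

```python
from collections import deque


class SetSignalGenerator:
    """Toy model: immediate set when ALDL and ALDH arrive together,
    delayed set (MLDSET) two cycles after ALDL arrives alone."""

    def __init__(self, delay=2):
        # Each entry holds (ALDL-without-ALDH, ALDUBA) captured in an earlier cycle.
        self._pipe = deque([(False, None)] * delay, maxlen=delay)

    def step(self, aldl=False, aldh=False, alduba=None):
        """Advance one clock cycle and return the set signals for this cycle."""
        mldset, mlduba = self._pipe[0]  # inputs captured `delay` cycles ago
        self._pipe.append((aldl and not aldh, alduba))
        return {"ALDSET": aldl and aldh, "MLDSET": mldset, "MLDUBA": mlduba}


def run_split_load(entry=5, last_cycle=12):
    """Upper half requested in cycle 1 (ALDH only), lower half in cycle 9 (ALDL only)."""
    gen = SetSignalGenerator()
    trace = {}
    for cycle in range(1, last_cycle + 1):
        if cycle == 1:
            trace[cycle] = gen.step(aldh=True, alduba=entry)
        elif cycle == 9:
            trace[cycle] = gen.step(aldl=True, alduba=entry)
        else:
            trace[cycle] = gen.step()
    return trace


trace = run_split_load()
mldset_cycles = [c for c, out in trace.items() if out["MLDSET"]]
```

In this model the delayed address appears in cycle 3 without a set signal, and `MLDSET` becomes valid only in cycle 11, two cycles after the lower-half request.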

The data cache 22 transfers the lower 4 bytes of data, read in the R cycle of the second access cycle, to the load register 30 (FIG. 12(q)). The write control unit WCNT3 (FIG. 4) of the renaming register unit 36 generates the write control signal WE1L based on the lower 4 bytes of data transferred from the load register 30 (FIG. 12(r)). The write control unit WCNT3 then stores the lower 4 bytes of data in one of the renaming registers RNFR (FIG. 12(s)). For example, the write control unit WCNT3 determines the renaming register RNFR in which to store the data based on information, received from the reservation station 18, indicating the bypass enable signal BPEN set to the valid level.

As a result, the 8 bytes of double-width single-precision floating-point data are complete in the renaming register RNFR, and the double-width single-precision floating-point data can be bypassed from the renaming register RNFR. When the 8-byte double-width single-precision floating-point data is read from two cache lines, the load register 30 holds the upper 4 bytes and the lower 4 bytes at different times. The load register 30 is therefore not used to bypass the double-width single-precision floating-point data. In other words, the bypass control unit 24 delays the output cycle of the bypass enable signal BPEN relative to FIGS. 10 and 11 and prohibits bypassing data from the load register 30 to the floating-point arithmetic unit 26. Prohibiting the bypass from the load register 30, in which the data is incomplete, suppresses malfunction of the arithmetic processing unit OPD2.
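How the two 4-byte halves come together in one renaming register can be illustrated with a small Python sketch. The class `RenamingEntry`, its methods, the 8-byte write bus with the valid half already in position, and the example byte values are all hypothetical; only the write-enable names WE1H and WE1L and the upper/lower split come from the text.

```python
class RenamingEntry:
    """One 8-byte renaming register with independent upper/lower write enables."""

    def __init__(self):
        self.upper = None  # upper 4 bytes, written when WE1H is asserted
        self.lower = None  # lower 4 bytes, written when WE1L is asserted

    def write(self, data8, we1h=False, we1l=False):
        """Write from an 8-byte bus; only the halves whose enable is set are updated."""
        if len(data8) != 8:
            raise ValueError("expected 8 bytes")
        if we1h:
            self.upper = data8[:4]
        if we1l:
            self.lower = data8[4:]

    def complete(self):
        """True once both halves have been written."""
        return self.upper is not None and self.lower is not None

    def value(self):
        """Return the full 8 bytes; refuse to bypass a half-written entry."""
        if not self.complete():
            raise RuntimeError("both halves must be written before bypassing")
        return self.upper + self.lower
```

A single-access load would call `write` once with both enables set, whereas a load that spans two cache lines would call it twice, first with `we1h` and later with `we1l`, and the entry would only report `complete()` after the second call.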

For each of the arithmetic instructions A, B, C, and D, the floating-point arithmetic unit 26 selects the data bypassed from the renaming register RNFR in the B2 cycle and uses the selected data in the X1 cycle (FIG. 12(t), (u), (v), (w)).

As shown in FIG. 12, when the double-width single-precision floating-point data is read from two cache lines, the bypass control unit 24 waits for the lower 4 bytes of data, which are read after the upper 4 bytes, to be read. The bypass control unit 24 then sets the flag area FLG and outputs the bypass enable signal BPEN two clock cycles after the control signal ALDL, which corresponds to the start of reading the lower 4 bytes of data, is generated.

As a result, the reservation station RSFLT receives the enable signal BPEN two clock cycles later than in FIG. 10. The transfer of data from the data cache 22 to the load register 30 is executed in the same cycle as the P cycle of the arithmetic instruction A. Therefore, in the P cycle of the arithmetic instruction A, the reservation station RSFLT gives up bypassing the data from the load register 30 and decides to bypass the data from the renaming register RNFR. This prevents data from being bypassed from the load register 30 before the double-width single-precision floating-point data is complete in the renaming register RNFR. In other words, the reservation station RSFLT assigns the P cycle of the arithmetic instruction A to the earliest clock cycle at which the data can be bypassed from the renaming register RNFR. For example, because reading data from the renaming register RNFR takes two clock cycles, the P cycle of the arithmetic instruction A is set to the 13th clock cycle.
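The cycle arithmetic behind this choice can be checked with a short Python sketch. It assumes, purely for illustration, that stages advance one per clock cycle without stalls, that the operand is selected in the B2 cycle (three cycles after P), and that the renaming register becomes readable for bypass two cycles after the RT cycle; the function and constant names are hypothetical.

```python
RNFR_READ_LATENCY = 2  # clock cycles to read a renaming register (from the text)
P_TO_B2 = 3            # P -> PT -> B1 -> B2, assuming one stage per cycle
A_TO_RT = 5            # A -> T -> M -> B -> R -> RT, assuming one stage per cycle


def rt_cycle_from_a(a_cycle):
    """RT cycle of a cache access whose A cycle is `a_cycle`."""
    return a_cycle + A_TO_RT


def rnfr_bypass_from(rt_cycle):
    """First clock cycle in which data written in the RT cycle can be bypassed from RNFR."""
    return rt_cycle + RNFR_READ_LATENCY


def earliest_p_cycle(rt_cycle):
    """Earliest P cycle whose B2 cycle can select data bypassed from RNFR."""
    return rnfr_bypass_from(rt_cycle) - P_TO_B2
```

For FIG. 12, the second access has its A cycle in the ninth clock cycle, so under these assumptions `rt_cycle_from_a(9)` is 14 and `earliest_p_cycle(14)` is 13, which agrees with the 13th clock cycle stated above. For FIG. 10, an A cycle in the seventh clock cycle gives `rnfr_bypass_from(rt_cycle_from_a(7))` equal to 14, matching the 14th clock cycle given for that figure.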

FIG. 13 shows yet another example of the operation of the arithmetic processing unit OPD2 shown in FIG. 2. Detailed description of operations identical or similar to those in FIG. 10 is omitted. FIG. 13 shows the operation when execution of a preceding floating-point load instruction or a preceding floating-point arithmetic instruction completes.

In FIG. 13, the C cycle and the W cycle in the seventh and eighth clock cycles denote the following pipeline stages of the commit control unit 44 when execution of an instruction has completed.
(a) C (Commit): The commit control unit 44 determines whether execution of the instruction has completed.
(b) W (Write): The commit control unit 44 updates the renaming registers RNFR and RNR, the floating-point register FR, the fixed-point register R, and the like used by the instruction whose execution has completed. The commit control unit 44 also releases the resources used by the instruction in the register management table 16, the reservation station 18, the bypass control unit 24, and the like.

For example, the RT cycle of the preceding load instruction or the X4 cycle of the preceding arithmetic instruction completes in the third clock cycle. Because the commit control unit 44 performs instruction completion processing in the order in which the decoder unit 14 decoded the instructions (program order), the C cycle is executed only after the previously decoded instructions have completed. In this example, the C cycle is executed four clock cycles after the X4 cycle or the RT cycle. At the earliest, the C cycle is executed in the clock cycle following the RT cycle or the X4 cycle.

In the W cycle, the commit control unit 44 outputs the control signal WCMT and the address WCMTUBA to the bypass control unit 24 (FIG. 13(a)). The address WCMTUBA indicates the renaming register RNFR used by the load instruction or the arithmetic instruction.

In the W cycle, the commit control unit 44 also outputs the reset signal RST1 to the register management table 16 and the reset signal RST2 to the reservation station 18 (FIG. 13(b)). In addition, in the W cycle, the commit control unit 44 outputs the read request RCMT1 to the renaming register unit 36 (FIG. 13(c)). Two clock cycles after the read request RCMT1, the commit control unit 44 outputs the write request WCMT1 to the floating-point register unit 40 (FIG. 13(d)). As described with reference to FIG. 10, two clock cycles is the time required to read from the renaming register RNFR.

In FIG. 13, each of the reset signals RST1 and RST2 is drawn as a single signal, but each is a multi-bit signal. For example, the reset signals RST1 and RST2 contain the same information as the address WCMTUBA. Similarly, each of the read request RCMT1 and the write request WCMT1 is drawn as a single signal, but each is a multi-bit signal. For example, the read request RCMT1 and the write request WCMT1 contain information indicating the numbers of the renaming register RNFR and the floating-point register FR.

In response to the control signal WCMT, the bypass control unit 24 resets the flag area FLG indicated by the address WCMTUBA, thereby setting the bypass enable signal BPEN to the invalid level (FIG. 13(e)).

Based on the reset signal RST1, the register management table 16 resets the bit value P of the area corresponding to the renaming register RNFR indicated by the reset signal RST1 (FIG. 13(f)). Based on the reset signal RST2, the reservation station 18 resets the bit value P that it holds for the renaming register RNFR indicated by the reset signal RST2.

Based on the read request RCMT1, the renaming register unit 36 outputs data from the renaming register RNFR indicated by the read request RCMT1. Based on the write request WCMT1, the floating-point register unit 40 stores the data output from the renaming register RNFR in the floating-point register FR indicated by the write request WCMT1 (FIG. 13(g)). That is, the data is transferred from the renaming register RNFR to the floating-point register FR.

In the seventh clock cycle, before the bit value P is reset, the reservation station RSFLT determines that data can be bypassed from the renaming register RNFR. In the eighth clock cycle, the reservation station RSFLT then executes the P cycle of the arithmetic instruction A and issues the arithmetic instruction A to the floating-point arithmetic unit 26 (FIG. 13(h)).

Similarly, in the eighth clock cycle, before the bit value P is reset, the reservation station RSFLT determines that data can be bypassed from the renaming register RNFR. In the ninth clock cycle, the reservation station RSFLT then executes the P cycle of the arithmetic instruction B and issues the arithmetic instruction B to the floating-point arithmetic unit 26 (FIG. 13(i)).

In the B2 cycle of the arithmetic instruction A, the floating-point arithmetic unit 26 selects the data bypassed from the renaming register RNFR and uses the selected data in the X1 cycle (FIG. 13(j)). Likewise, in the B2 cycle of the arithmetic instruction B, the floating-point arithmetic unit 26 selects the data bypassed from the renaming register RNFR and uses the selected data in the X1 cycle (FIG. 13(k)).

In contrast, in the ninth clock cycle, after the bit value P has been reset, the reservation station RSFLT determines that the data cannot be bypassed from the renaming register RNFR. In the tenth clock cycle, the reservation station RSFLT executes the P cycle of arithmetic instruction C and issues arithmetic instruction C to the floating-point arithmetic unit 26 (FIG. 13(l)). In the B2 cycle of arithmetic instruction C, the floating-point arithmetic unit 26 reads the data from the floating-point register FR and uses the read data in the X1 cycle (FIG. 13(m)).
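The issue timing of arithmetic instructions A, B, and C described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit of the embodiment: the names `Issue`, `schedule`, and `P_RESET_CYCLE` are hypothetical, and the model assumes that the P cycle always runs one clock cycle after the decision and that the bit value P reads as reset from the ninth clock cycle onward.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    instruction: str
    p_cycle: int  # clock cycle in which the P cycle runs (instruction is issued)
    source: str   # "RNFR" = bypass from renaming register, "FR" = read from register


def schedule(instruction: str, decision_cycle: int, p_reset_cycle: int) -> Issue:
    """Pick the operand source at the decision cycle and issue one cycle later.

    Assumption: bypassing from RNFR is possible only while the bit value P is
    still set, i.e. in cycles strictly before the cycle in which P reads as reset.
    """
    source = "RNFR" if decision_cycle < p_reset_cycle else "FR"
    return Issue(instruction, decision_cycle + 1, source)


# Assumed from the description of FIG. 13: P reads as reset from cycle 9 onward.
P_RESET_CYCLE = 9

timeline = [
    schedule(name, cycle, P_RESET_CYCLE)
    for name, cycle in (("A", 7), ("B", 8), ("C", 9))
]

for issue in timeline:
    print(issue)
```

In this model, instructions A and B take their operand from the renaming register RNFR, while instruction C, whose decision falls after the reset of P, reads the floating-point register FR.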

As described above, the embodiment shown in FIGS. 2 to 13 provides the same effects as the embodiment shown in FIG. 1. That is, even when the data loaded from the data cache 22 into the renaming register RNFR arrives at different timings, the existing renaming register RNFR can be used to wait until all of the data has arrived. In addition, the bypass control unit 24 can determine, using the simple circuit described with reference to FIG. 7, whether the data is read from the data cache 22 in a single access or in multiple accesses. As a result, an increase in the circuit scale of the arithmetic processing device OPD2 can be suppressed, an increase in its power consumption can be suppressed, and its operating frequency can be improved.

For example, even when data is read from the data cache 22 in multiple accesses, the circuit scale of the renaming register unit 36 can be kept equivalent to that of a conventional design, as described with reference to FIG. 4. Moreover, because data read from the data cache 22 at different timings can be combined in the renaming register RNFR, no dedicated data-combining circuit is provided in the arithmetic processing device OPD2.
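The idea of combining split data inside the renaming register itself can be sketched as follows. The class `RenamingRegister`, its `write` method, and the two write-enable parameters are hypothetical names chosen for illustration; the sketch assumes an 8-byte register whose upper and lower 4-byte areas can be written independently, which is how the storage areas and write control signals are described here.

```python
MASK32 = 0xFFFF_FFFF


class RenamingRegister:
    """8-byte register with independently writable upper and lower 4-byte areas."""

    def __init__(self) -> None:
        self.value = 0

    def write(self, data: int, write_upper: bool, write_lower: bool) -> None:
        # Each write-enable updates only its own 4-byte area, so a later
        # partial write never disturbs the half that arrived earlier.
        if write_upper:
            self.value = (self.value & MASK32) | (data & (MASK32 << 32))
        if write_lower:
            self.value = (self.value & (MASK32 << 32)) | (data & MASK32)


# Split read: the two halves arrive in different cycles.
split = RenamingRegister()
split.write(0x11223344_00000000, write_upper=True, write_lower=False)
split.write(0x00000000_55667788, write_upper=False, write_lower=True)

# Single read: both write-enables are asserted at a common timing.
single = RenamingRegister()
single.write(0x11223344_55667788, write_upper=True, write_lower=True)

print(hex(split.value), hex(single.value))
```

Both paths leave the same 8-byte value in the register, which is why no separate merging stage is needed in this model.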

Furthermore, when data is read from the data cache 22 in multiple accesses, the bypass control unit 24 prohibits bypassing from the load register 30, in which the data is not yet complete, and thereby suppresses malfunction of the arithmetic processing device OPD2.

As an alternative, a flag area FLG capable of storing two bits, one for the upper 4 bytes and one for the lower 4 bytes of the data, could be provided in the bypass management table BPTBL, so that the data is judged complete and the two 4-byte pieces are combined when both bits are set. In that case, however, the bypass management table BPTBL becomes larger, and a control circuit for setting each bit of the flag area FLG must be added. Furthermore, if the upper 4 bytes and the lower 4 bytes are combined at the output of the data cache 22, a buffer circuit is needed to hold the upper 4 bytes, which are output first, until the lower 4 bytes are output from the data cache 22. The circuit scale therefore increases.
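For comparison, the rejected alternative can be sketched in the same style. This is a simplified illustration, not a description of any actual circuit: `FlaggedEntry` and its methods are hypothetical, and the sketch folds the 2-bit flag and the holding buffer into one object to make the extra state visible.

```python
UPPER_READY = 0b10
LOWER_READY = 0b01
MASK32 = 0xFFFF_FFFF


class FlaggedEntry:
    """Entry with a 2-bit ready flag plus buffers for each 4-byte half."""

    def __init__(self) -> None:
        self.flg = 0    # extra 2-bit flag state per entry
        self.upper = 0  # extra buffer holding the half that arrived first
        self.lower = 0

    def receive_upper(self, data: int) -> None:
        self.upper = data & MASK32
        self.flg |= UPPER_READY

    def receive_lower(self, data: int) -> None:
        self.lower = data & MASK32
        self.flg |= LOWER_READY

    def combined(self):
        # The data counts as complete only when both flag bits are set.
        if self.flg != (UPPER_READY | LOWER_READY):
            return None
        return (self.upper << 32) | self.lower


entry = FlaggedEntry()
entry.receive_upper(0x11223344)
before = entry.combined()  # lower half is still missing
entry.receive_lower(0x55667788)
after = entry.combined()
print(before, hex(after))
```

The per-entry flag bits, the logic that sets them, and the holding buffer are exactly the additions that the text identifies as increasing the circuit scale.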

Note that the embodiment shown in FIGS. 2 to 13 may also be applied to bypass control in the case where fixed-point data output from the data cache 22 in multiple parts in response to a load instruction is transferred to the fixed-point register R.

The embodiments described above may also be applied to an arithmetic processing device of the SIMD (single instruction, multiple data) type. In this case, each floating-point register FR shown in FIG. 3 has a plurality of areas that store a plurality of data groups to be processed in parallel, and each renaming register RNFR shown in FIG. 4 likewise has a plurality of areas that store a plurality of data groups to be processed in parallel. The floating-point arithmetic unit 26 has a plurality of arithmetic units that operate on the plurality of data groups in parallel. When data is read from the data cache 22 in three or more parts in response to a load instruction, the bypass control unit generates the bypassable signal BPEN based on the start signal (ALDL or the like) of the data that is read last.
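The rule that the bypassable signal BPEN follows the last start signal can be sketched for any number of split reads. The function `bpen_cycle` and its parameter `split_delay` are hypothetical; the sketch assumes that BPEN is asserted in the same cycle when all start signals arrive at a common timing, and a fixed number of cycles after the last start signal otherwise. The actual latencies depend on the circuit and are not specified in this section.

```python
from typing import Sequence


def bpen_cycle(start_cycles: Sequence[int], split_delay: int) -> int:
    """Return the cycle in which BPEN is asserted in this simplified model.

    start_cycles: arrival cycle of the start signal for each data group.
    split_delay:  assumed fixed delay applied after the last start signal
                  when the data groups arrive at different timings.
    """
    if not start_cycles:
        raise ValueError("at least one start signal is required")
    last = max(start_cycles)
    if all(cycle == last for cycle in start_cycles):
        # Single read: all start signals arrive at a common timing.
        return last
    # Split read: only the last start signal determines the timing.
    return last + split_delay


print(bpen_cycle([3, 3], split_delay=2))     # single read
print(bpen_cycle([3, 4], split_delay=2))     # read split in two
print(bpen_cycle([3, 4, 6], split_delay=2))  # read split in three
```

Because only the latest arrival matters, the same rule covers two, three, or more split reads without tracking each part individually.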

The features and advantages of the embodiments will be apparent from the foregoing detailed description. The claims are intended to cover such features and advantages of the embodiments without departing from their spirit and scope. Furthermore, a person of ordinary skill in the art should readily be able to conceive of any improvements and modifications. Accordingly, there is no intention to limit the scope of the inventive embodiments to those described above, and appropriate modifications and equivalents falling within the scope disclosed in the embodiments may be relied upon.

DESCRIPTION OF SYMBOLS: 1 ... decoder unit; 2 ... execution control unit; 3 ... arithmetic execution unit; 4 ... memory control unit; 5 ... memory; 6 ... bypass control unit; 7 ... first register; 8 ... second register; 9 ... register management unit; 10 ... instruction cache; 12 ... instruction buffer; 14 ... decoder unit; 16 ... register management table; 18 ... reservation station; 20 ... address generation unit; 22 ... data cache; 24 ... bypass control unit; 26 ... floating-point arithmetic unit; 28 ... fixed-point arithmetic unit; 30 ... load register; 32, 34 ... result register; 36, 38 ... renaming register unit; 40 ... floating-point register unit; 42 ... fixed-point register unit; 44 ... commit control unit; ALDL, ALDH ... control signal; ALDUBA ... address; ASGEN ... cycle signal generation unit; BPEN ... bypassable signal; BPTBL ... bypass management table; FLG ... flag area; FR ... floating-point register; MREQ, MREQC ... access request; MSGEN ... cycle signal generation unit; OPD1, OPD2 ... arithmetic processing device; R ... fixed-point register; RMTBL1, RMTBL2 ... table; RNFR, RNR ... renaming register; RSFLT, RSFIX, RSMA ... reservation station; SSGEN ... set signal generation circuit; TSCNT ... table

Claims

An arithmetic processing device comprising:
a decoder unit that decodes instructions;
an execution control unit that holds the instructions decoded by the decoder unit and sequentially outputs executable instructions;
an arithmetic execution unit that executes an operation based on an arithmetic instruction, received from the execution control unit, that instructs execution of the operation;
a memory that stores data used by the arithmetic execution unit;
a plurality of first registers that hold data used for the operation executed by the arithmetic execution unit;
a plurality of second registers that are allocated in correspondence with the first registers used for the operation and that temporarily hold data obtained by executing the arithmetic instruction or a memory access instruction before the data is transferred to the first registers;
a register management unit that holds the allocation relationship between the first registers and the second registers and is referred to by the execution control unit;
a memory control unit that outputs an access request to the memory based on a memory access instruction, received from the execution control unit, for reading data from the memory; and
a bypass control unit that, when the memory control unit outputs a plurality of the access requests based on the memory access instruction and the data read from the memory in multiple parts is sequentially transferred to the second register, causes the execution control unit to output the arithmetic instruction at a timing at which the data is bypassed from the second register to the arithmetic execution unit after the data read last from the memory has been transferred to the second register.

The arithmetic processing device according to claim 1, wherein
the memory control unit reads data including a plurality of data groups from the memory based on the memory access instruction, and outputs to the bypass control unit a plurality of start signals, each indicating the start of reading of one of the data groups from the memory,
the bypass control unit, when it receives the start signals corresponding to the plurality of data groups at different timings, outputs to the execution control unit, after the last start signal has been received, a bypassable signal that permits bypassing of data from the second register to the arithmetic execution unit, and, when it receives the start signals corresponding to all of the plurality of data groups at a common timing, outputs the bypassable signal based on that reception, and
the execution control unit, after receiving the bypassable signal, outputs the arithmetic instruction to the arithmetic execution unit at a timing that enables data to be bypassed from the second register to the arithmetic execution unit.

The arithmetic processing device according to claim 2, wherein the bypass control unit comprises:
a first generation circuit that generates a first set signal based on reception of the plurality of start signals at the common timing;
a second generation circuit that generates a second set signal the predetermined cycle after reception of the last start signal; and
an output circuit that outputs the bypassable signal based on the first set signal or the second set signal.

The arithmetic processing device according to claim 2, wherein
each of the second registers has a plurality of storage areas that respectively store the plurality of data groups read from the memory in multiple parts, and
the arithmetic processing device further comprises a write control unit that, when the plurality of data groups are read from the memory in multiple parts, sequentially generates, for each data group, a plurality of write control signals for storing each of the data groups in the corresponding storage area, and that, when the plurality of data groups are read from the memory at one time, generates the plurality of write control signals at a common timing.

The arithmetic processing device according to claim 1, wherein
the memory has a data cache including a plurality of cache lines that store data, and
the memory control unit sequentially outputs the access request for each cache line when the data to be read from the data cache based on the memory access instruction is held across a plurality of the cache lines.

The arithmetic processing device according to claim 5, further comprising
a third register that is disposed between the memory and the second register and temporarily holds data read from the memory before the data is transferred to the second register,
wherein the bypass control unit prohibits bypassing of data from the third register to the arithmetic execution unit when the memory control unit reads data from the memory in multiple parts.

A control method for an arithmetic processing device that includes a decoder unit that decodes instructions, an execution control unit that holds the instructions decoded by the decoder unit and sequentially outputs executable instructions, an arithmetic execution unit that executes an operation based on an arithmetic instruction, received from the execution control unit, that instructs execution of the operation, a memory that stores data used by the arithmetic execution unit, a plurality of first registers that hold data used for the operation executed by the arithmetic execution unit, a plurality of second registers that are allocated in correspondence with the first registers used for the operation and that temporarily hold data obtained by executing the arithmetic instruction or a memory access instruction before the data is transferred to the first registers, and a register management unit that holds the allocation relationship between the first registers and the second registers and is referred to by the execution control unit, the control method comprising:
outputting, by a memory control unit included in the arithmetic processing device, an access request to the memory based on a memory access instruction, received from the execution control unit, for reading data from the memory; and
causing, by a bypass control unit included in the arithmetic processing device, when the memory control unit outputs a plurality of the access requests based on the memory access instruction and the data read from the memory in multiple parts is sequentially transferred to the second register, the execution control unit to output the arithmetic instruction at a timing at which the data is bypassed from the second register to the arithmetic execution unit after the data read last from the memory has been transferred to the second register.