JP2005134987A

JP2005134987A - Pipeline arithmetic processor

Info

Publication number: JP2005134987A
Application number: JP2003367508A
Authority: JP
Inventors: Masanobu Seki; 正伸関
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2003-10-28
Filing date: 2003-10-28
Publication date: 2005-05-26

Abstract

PROBLEM TO BE SOLVED: To provide a pipeline arithmetic processor that allows forwarding, when processing a program that includes a prescribed instruction requiring a plurality of cycles for execution, even if no NOP instruction is placed in the program.
SOLUTION: A control unit 11 decides whether an instruction read from an instruction memory 1 is a load instruction requiring two cycles for execution. When it decides that the instruction is a load instruction, the control unit 11, within the two-cycle execution period of the load instruction, stops the operation of a program counter 10 in the first cycle to stop the processing of the succeeding instruction, and operates the program counter 10 in the second cycle to allow the processing of the succeeding instruction. When it decides that the instruction is not a load instruction, the control unit 11 performs no such processing.
COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a pipeline arithmetic processing apparatus that divides each instruction into fine processing units and processes those units in an overlapped manner to speed up instruction processing.

As a pipeline arithmetic processing apparatus of this kind, one is known that operates correctly even when a program developed for an instruction pipeline with a given number of stages is executed on an instruction pipeline with more stages (see Patent Document 1).
In the pipeline processing performed by a conventional pipeline arithmetic processing apparatus, when the first instruction is a branch instruction, for example, the following processing has been performed, as shown in FIG. 3.

That is, in this case the read address of the subsequent instruction is not determined until the branch instruction has been decoded and resolved in the instruction decode stage RD. For this reason, a NOP (No Operation) instruction has been placed as the second instruction.
In FIG. 3, “IF” denotes the instruction fetch stage in which an instruction is fetched, “RD” denotes the instruction decode stage in which an instruction is decoded, and “EXE” denotes the instruction execution stage in which an instruction is executed. These meanings apply throughout this specification.

Further, in the pipeline processing performed by a conventional pipeline arithmetic processing apparatus, when branch instructions are consecutive, for example when both the first and second instructions are branch instructions as shown in FIG. 4, there is the inconvenience that the program breaks down.
That is, when branch instructions are consecutive, the address in the cycle after the transition to the first branch destination address should be the first branch destination address incremented. Instead, it is overwritten with the second branch destination address, so the program at the first branch destination is not executed and the program breaks down.

For this reason, program breakdown has been prevented by placing a NOP instruction after each of the two branch instructions, that is, after the first instruction and after the second instruction.
Further, in a conventional pipeline arithmetic processing apparatus, values are read from the registers into the execution unit in the RD stage, so the operation result of the immediately preceding instruction cannot be used by the immediately following instruction. To eliminate this inconvenience, a forwarding mechanism (chaining mechanism) is provided.

This forwarding mechanism passes the write data directly to the arithmetic processing unit when a write to and a read from the same register occur simultaneously. This has the effect of reducing the apparent delay slots of each instruction by one.
FIG. 5 shows an example in which the operation result of a first instruction whose execution takes two cycles (instruction execution stages EXE1 and EXE2) becomes available to the third instruction through forwarding.
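The forwarding behaviour described above can be illustrated with a minimal Python sketch. The class `RegisterFile`, its methods, and the `pending_write` argument are hypothetical names introduced only for illustration; the patent describes hardware and does not define any such interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class RegisterFile:
    """Toy register file with a write-to-read bypass (forwarding) path."""

    regs: Dict[str, int] = field(default_factory=dict)

    def read(self, name: str,
             pending_write: Optional[Tuple[str, int]] = None) -> int:
        # If the same register is being written in this cycle, hand the
        # write data straight to the reader instead of the stale contents.
        if pending_write is not None and pending_write[0] == name:
            return pending_write[1]
        # Assumption: registers that were never written read as 0.
        return self.regs.get(name, 0)

    def commit(self, name: str, value: int) -> None:
        # Normal write-back at the end of the cycle.
        self.regs[name] = value
```

A read of SR0 that coincides with a write of 7 to SR0 returns 7, while a read of any other register in the same cycle returns the stored contents.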

Next, with reference to FIG. 6, consider pipeline processing of an add instruction ADD and a branch instruction BEQ in the following program.
ADD %SR0 %SR1 %SR2 (register SR0 = register SR1 + register SR2)
BEQ %SR0 %SR3 Label1 (branch to Label1 if register SR0 = register SR3)
In this case, if a forwarding mechanism is available, the pipeline processing of the add instruction ADD and the branch instruction BEQ proceeds as shown in FIG. 6, and no interlock is needed.
Patent Document 1: Japanese Patent Laid-Open No. 2000-29696

Conventionally, in pipeline processing, when several branch instructions are consecutive as described above, a NOP instruction is placed immediately after each branch instruction.
Because a NOP instruction must be added after every branch instruction, the amount of program code increases.
On the other hand, when an add instruction is followed by a branch instruction as described above, it is theoretically possible to use the forwarding mechanism so that the data obtained in the execution stage of the instruction immediately before the branch instruction is used directly as the branch condition of the branch instruction.

However, the execution time of the add instruction plus the address generation time of the branch instruction then falls within a single path, which lengthens the delay of the arithmetic circuit and lowers the operating frequency.
Therefore, the following pipeline processing is conceivable. It handles branch processing without changing the program even when forwarding applies, and it suppresses the drop in operating frequency caused by the forwarding delay.

An example of this pipeline processing is described with reference to FIG. 7.
FIG. 7 shows the case in which the following program, consisting of an add instruction ADD and a branch instruction BEQ, is executed and the register SR0 used in the branch condition of the branch instruction is updated by the immediately preceding add instruction.
Sample code A
ADD %SR0 %SR1 %SR2 (register SR0 = register SR1 + register SR2)
BEQ %SR0 %SR3 Label1 (branch to Label1 if register SR0 = register SR3)
In this case, in the first clock cycle “1”, the first instruction (add instruction) is fetched from the instruction memory into the instruction register (IF stage).

In the next clock cycle “2”, the first instruction (add instruction) held in the instruction register is decoded (RD stage), and at the same time the second instruction (branch instruction) is fetched from the instruction memory into the instruction register (IF stage).
In the next clock cycle “3”, the decoded first instruction (add instruction) is executed, and the result is written to the register.

Also in clock cycle “3”, the second instruction (branch instruction) is decoded, and it is determined whether the register to which the first instruction (add instruction) writes its result is the same as the register read by the second instruction (branch instruction). If the two registers are the same, the hardware substitutes a NOP instruction, and the register read for the branch instruction is not performed in this cycle. Further, in clock cycle “3”, the third instruction is fetched from the instruction memory into the instruction register.

In the next clock cycle “4”, the second instruction is processed as follows: the register to which the addition result of the first instruction (add instruction) was written is read, and the branch destination address is generated.
Also in clock cycle “4”, the third instruction is processed as follows: the operation (increment) of the program counter is stopped, and the third instruction is fetched again from the instruction memory into the instruction register.

The processing described above behaves the same as when a NOP instruction (an instruction that does not change the internal state) is inserted between the add instruction ADD and the branch instruction BEQ, as in the following program.
ADD %SR0 %SR1 %SR2 (register SR0 = register SR1 + register SR2)
NOP
BEQ %SR0 %SR3 Label1 (branch to Label1 if register SR0 = register SR3)
Thus, in the processing shown in FIG. 7, when the register to which the first instruction (add instruction) writes its result is the same as the register read by the second instruction (branch instruction), the hardware substitutes a NOP instruction, so the NOP instruction can be omitted from the program.
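The register comparison that triggers the hardware NOP in FIG. 7 can be sketched in software as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the first operand of ADD is assumed to be its destination (as the sample code comments indicate), and every `%`-prefixed operand of BEQ is assumed to be a register that the branch reads.

```python
from typing import List, Optional

# Assumption: opcodes whose first register operand is the destination.
WRITES_FIRST_OPERAND = {"ADD"}


def _registers(operands: List[str]) -> List[str]:
    # Keep only register operands such as "%SR0"; labels are skipped.
    return [o[1:] for o in operands if o.startswith("%")]


def written_register(instr: str) -> Optional[str]:
    op, *operands = instr.split()
    return _registers(operands)[0] if op in WRITES_FIRST_OPERAND else None


def read_registers(instr: str) -> List[str]:
    op, *operands = instr.split()
    regs = _registers(operands)
    return regs[1:] if op in WRITES_FIRST_OPERAND else regs


def insert_hardware_nops(program: List[str]) -> List[str]:
    """Return the instruction stream the pipeline effectively executes."""
    out: List[str] = []
    for prev, curr in zip([None] + program[:-1], program):
        if prev is not None and curr.split()[0] == "BEQ":
            dest = written_register(prev)
            # Same register written by prev and read by the branch:
            # behave as if a NOP sat between the two instructions.
            if dest is not None and dest in read_registers(curr):
                out.append("NOP")
        out.append(curr)
    return out
```

Applied to sample code A, the sketch yields the three-instruction sequence ADD, NOP, BEQ; if the ADD wrote a register the branch does not read, the stream is returned unchanged.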

However, when forwarding is performed from an instruction whose execution takes two cycles, such as a load instruction, there has been a restriction that a NOP instruction must always be placed after the load instruction in the program.
Accordingly, an object of the present invention is to provide a pipeline arithmetic processing apparatus that allows forwarding, when processing a program that includes a predetermined instruction requiring a plurality of cycles for execution, even if no NOP instruction is placed in the program.

To solve the above problem and achieve the object of the present invention, each invention is configured as follows.
That is, the first invention is a pipeline arithmetic processing apparatus in which instructions are pipeline processed, comprising: determination means for determining whether an instruction is a predetermined instruction that requires a plurality of cycles for execution; and control means for, when the determination means determines that the instruction is the predetermined instruction, stopping the processing of a subsequent instruction during the period of the plurality of cycles in which the predetermined instruction is executed, and causing the processing of the subsequent instruction to be executed after the stop is released.

In the second invention, based on the first invention, the determination means determines whether the instruction is a load instruction that requires two cycles for execution. When the determination means determines that the instruction is a load instruction, the control means temporarily stops the operation of a program counter in the first cycle of the two-cycle period in which the load instruction is executed, thereby stopping the processing of the subsequent instruction, and operates the program counter in the second cycle, thereby causing the processing of the subsequent instruction to be executed.

According to the present invention configured in this way, it is possible to provide a pipeline arithmetic processing apparatus that allows forwarding, when processing a program that includes a predetermined instruction requiring a plurality of cycles for execution, even if no NOP instruction is placed in the program.

Embodiments of the present invention are described below with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of an embodiment of the pipeline arithmetic processing apparatus of the present invention.
As shown in FIG. 1, the pipeline arithmetic processing apparatus according to this embodiment includes an instruction memory 1, an instruction register 2, a branch destination address calculation unit 3, an instruction decode unit 4, an execution unit 5, a register file 6, an adder 8, a multiplexer 9, a program counter 10, and a control unit 11.

The instruction memory 1 is a memory that stores various instructions (a program), with a given instruction stored at a given address. The instruction memory 1 reads the instruction at the address designated by the program counter 10 and outputs it to the instruction register 2.
The instruction register 2 is a register that temporarily stores the instruction output from the instruction memory 1. The instruction stored in the instruction register 2 is supplied to the instruction decode unit 4.

The branch destination address calculation unit 3 obtains the branch destination address of an instruction when, based on the instruction from the instruction decode unit 4, that address needs to be calculated. The address obtained by the branch destination address calculation unit 3 is supplied to an input of the multiplexer 9.
The instruction decode unit 4 decodes (interprets) the instruction from the instruction register 2. The instruction interpreted by the instruction decode unit 4 is supplied to the execution unit 5.

The execution unit 5 executes the instruction decoded by the instruction decode unit 4. The data processed by the execution unit 5 is stored in a given register of the register file 6.
The register file 6 consists of a plurality of registers, each of which stores data processed by the execution unit 5. The contents of each register can be supplied to the instruction decode unit 4 and to the execution unit 5.

The adder 8 performs, under direction from the program counter 10, the addition needed to read the next instruction and outputs the result. The value added depends on the instruction length: +4 for 32-bit fixed-length instructions and +2 for 16-bit fixed-length instructions. The output of the adder 8 is supplied to an input of the multiplexer 9.

The multiplexer 9 selects the output of either the branch destination address calculation unit 3 or the adder 8 under direction from the control unit 11. The output of the multiplexer 9 is supplied to the program counter 10.
The program counter 10 is a counter that obtains, based on the output of the multiplexer 9, the address from which an instruction is read from the instruction memory 1. The address obtained by the program counter 10 is supplied to the instruction memory 1. The program counter 10 is also configured so that its operation (increment) can be temporarily stopped (locked) under control of the control unit 11, as described later.
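The interplay of the adder 8, the multiplexer 9, and the program counter 10 can be summarised in a short Python sketch. The function `next_pc` and its parameter names are hypothetical, and the sketch assumes that a stopped (locked) program counter simply keeps its current value.

```python
def next_pc(pc: int, instr_bytes: int, branch_taken: bool,
            branch_target: int, hold: bool) -> int:
    """Compute the next program counter value.

    instr_bytes is 4 for 32-bit fixed-length instructions and 2 for
    16-bit fixed-length instructions (the increment applied by adder 8).
    """
    if hold:
        # Control unit 11 has locked the program counter: no increment.
        return pc
    # Multiplexer 9: choose the branch destination or the adder output.
    return branch_target if branch_taken else pc + instr_bytes
```

For example, with 32-bit instructions a counter at 0x100 advances to 0x104, stays at 0x100 while held, and jumps to the supplied target when a branch is taken.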

The control unit 11 determines whether the instruction supplied from the instruction register 2 is a predetermined instruction (a load instruction in this example) that requires a plurality of cycles (for example, two cycles) for execution.
When the control unit 11 determines that the instruction is a load instruction, it temporarily stops the operation of the program counter 10 in the first cycle of the two-cycle period in which the load instruction is executed, thereby stopping the processing of the subsequent instruction, and operates the program counter 10 in the second cycle, thereby causing the processing of the subsequent instruction to be executed.
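The decision made by the control unit 11 can be sketched as a small lookup. The table `EXEC_CYCLES` and both function names are hypothetical. The two-cycle load case follows the description above; holding the counter in every execution cycle except the last for longer instructions is an assumption of this sketch, not something the description spells out.

```python
# Assumption: execution cycle counts per opcode; anything absent takes 1.
EXEC_CYCLES = {"LW": 2}


def is_multicycle(opcode: str) -> bool:
    """True if the opcode needs more than one execution cycle."""
    return EXEC_CYCLES.get(opcode, 1) > 1


def hold_program_counter(opcode: str, exe_cycle: int) -> bool:
    """Decide whether the program counter is stopped in this cycle.

    exe_cycle is the 1-based index of the execution cycle (1 = EXE1).
    For a two-cycle load the counter is held in EXE1 and runs in EXE2.
    """
    return is_multicycle(opcode) and exe_cycle < EXEC_CYCLES[opcode]
```

With this table, `hold_program_counter("LW", 1)` is true and `hold_program_counter("LW", 2)` is false, while a single-cycle instruction such as ADD never holds the counter.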

A specific example of this control by the control unit 11 is described below with reference to the drawings.
Next, the pipeline processing of the embodiment configured as above is described with reference to FIG. 2.
In this example, the first instruction is a load instruction (LW) that requires two cycles for execution, and the second instruction is an add instruction (ADD).

That is, in this example the program consists of the following sample code.
LW %SR0 [%SR1] %SR2 (read from the address (register SR1 + register SR2))
ADD %SR3 %SR0 %SR3 (register SR3 = register SR0 + register SR3)
In this case, as shown in FIG. 2, the first instruction is fetched from the instruction memory 1 into the instruction register 2 in the first clock cycle “1” (IF stage). At this time, the control unit 11 receives the first instruction fetched into the instruction register 2 and determines whether it is a load instruction that requires two cycles for execution.

Since the first instruction is a load instruction as stated above, the control unit 11 determines that it is a load instruction.
In the next clock cycle “2”, the first instruction held in the instruction register 2 is decoded by the instruction decode unit 4 (RD stage). At the same time, the second instruction is fetched from the instruction memory 1 into the instruction register 2 (IF stage). The control unit 11 also determines whether the second instruction fetched into the instruction register 2 is a load instruction, and here determines that it is not.

In the next clock cycle “3”, the first instruction decoded by the instruction decode unit 4 is executed by the execution unit 5, and as a result the read address of a data memory (not shown) is generated (EXE1 stage).
As described above, the control unit 11 has determined that the first instruction is a load instruction that requires two cycles for execution. Therefore, in clock cycle “3”, the control unit 11 temporarily stops the operation of the program counter 10. As a result, the second instruction held in the instruction register 2 is not decoded by the instruction decode unit 4, and the third instruction is not fetched from the instruction memory 1 into the instruction register 2.

In the next clock cycle “4”, the following processing is performed for the first instruction: data is read from the data memory address generated as described above, and the data is written to a register of the register file 6 (EXE2 stage).
Also in clock cycle “4”, the following processing is performed for the second instruction: the second instruction held in the instruction register 2 is decoded by the instruction decode unit 4 (RD stage).

At this time, under direction from the control unit 11, the execution unit 5 forwards the data that the first instruction read from the data memory. That is, the execution unit 5 makes the data read from the data memory by the first instruction directly usable as input data for the operation in the next clock cycle “5”.
Further, in clock cycle “4”, the third instruction is fetched from the instruction memory 1 into the instruction register 2 (IF stage).

In the next clock cycle “5”, the following processing is performed for the second instruction: the execution unit 5 performs the addition using the data forwarded as described above (EXE stage). Also in clock cycle “5”, the third instruction held in the instruction register 2 is decoded by the instruction decode unit 4 (RD stage), and the fourth instruction is fetched from the instruction memory 1 into the instruction register 2 (IF stage).
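The data flow of the LW and ADD pair through cycles “3” to “5” can be traced with a small functional sketch. The function `run_lw_add` and the dictionary representation of registers and data memory are hypothetical; the sketch also assumes that the load writes its result to SR0, as the operand order of the sample code suggests.

```python
from typing import Dict


def run_lw_add(regs: Dict[str, int],
               data_memory: Dict[int, int]) -> Dict[str, int]:
    """Trace LW %SR0 [%SR1] %SR2 followed by ADD %SR3 %SR0 %SR3."""
    regs = dict(regs)  # work on a copy; the caller's registers are untouched

    # Cycle 3 (EXE1 of LW): generate the data memory read address.
    address = regs["SR1"] + regs["SR2"]

    # Cycle 4 (EXE2 of LW): read the data and write it to the register.
    loaded = data_memory[address]
    regs["SR0"] = loaded
    # The same value is forwarded so the ADD can use it directly.
    forwarded = loaded

    # Cycle 5 (EXE of ADD): add using the forwarded value as the SR0 input.
    regs["SR3"] = forwarded + regs["SR3"]
    return regs
```

With SR1 = 8, SR2 = 4, SR3 = 10 and the value 32 stored at address 12, the sketch leaves 32 in SR0 and 42 in SR3.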

In the next clock cycle “6”, the third instruction decoded by the instruction decode unit 4 is executed by the execution unit 5 (EXE stage), and the fourth instruction held in the instruction register 2 is decoded by the instruction decode unit 4 (RD stage).
As described above, this embodiment determines whether an instruction is a load instruction that requires two cycles for execution. When the instruction is determined to be a load instruction, the operation of the program counter is stopped in the first cycle of the two-cycle period in which the load instruction is executed, thereby stopping the processing of the subsequent instruction, and the program counter is operated in the second cycle, thereby causing the processing of the subsequent instruction to be executed.
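The cycle-by-cycle behaviour described for FIG. 2 can be reproduced with a simple timing model. The function `schedule` is a hypothetical sketch that assumes an in-order pipeline in which an instruction is fetched when its predecessor enters RD, and enters RD no earlier than the predecessor's last execution cycle. Its handling of instructions longer than two cycles is an extrapolation, not taken from the description.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[int, List[int]]]


def schedule(exec_cycles: List[int]) -> List[Row]:
    """Return the clock cycle of each stage for each instruction.

    exec_cycles[i] is the number of execution cycles of instruction i,
    for example [2, 1, 1, 1] for LW, ADD and two single-cycle instructions.
    """
    rows: List[Row] = []
    prev_rd = 0          # cycle in which the previous instruction was in RD
    prev_exe_last = 0    # last execution cycle of the previous instruction
    for n in exec_cycles:
        # Fetch when the previous instruction moves into the decode stage.
        fetch = prev_rd if rows else 1
        # Decode waits while a multi-cycle predecessor holds the counter.
        decode = max(fetch + 1, prev_exe_last)
        exe = list(range(decode + 1, decode + 1 + n))
        rows.append({"IF": fetch, "RD": decode, "EXE": exe})
        prev_rd, prev_exe_last = decode, exe[-1]
    return rows
```

Calling `schedule([2, 1, 1, 1])` places the load in IF, RD, EXE1, EXE2 at cycles 1 to 4, the add in IF at cycle 2, RD at cycle 4 and EXE at cycle 5, and the third instruction in IF, RD, EXE at cycles 4, 5 and 6, matching the walk-through above.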

Therefore, according to this embodiment, when a program contains an instruction that requires two cycles for execution, such as a load instruction, there is no need to place a NOP instruction after the load instruction in the program, and the amount of program code can be reduced.
In the above embodiment, a load instruction has been described as an example of an instruction that requires two cycles for execution.

However, instructions that require two cycles, or more than two cycles (a plurality of cycles), for execution include, in addition to the load instruction above, multiply-accumulate instructions and vector instructions.
When a program contains such a specific instruction requiring a plurality of cycles for execution, the control unit 11 determines, when an instruction is fetched from the instruction memory 1 into the instruction register 2, whether the instruction is the specific instruction. If it is, then during the plurality of cycles in which the specific instruction is executed, the control unit 11 performs the same processing as when an instruction is determined to be a load instruction.

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.
FIG. 2 is an explanatory diagram of the pipeline processing in this embodiment when an instruction requiring two cycles for execution is included.
FIG. 3 is an explanatory diagram of an example of conventional pipeline processing.
FIG. 4 is an explanatory diagram of the inconvenience that arises at a branch in conventional pipeline processing.
FIG. 5 is an explanatory diagram of forwarding in conventional pipeline processing.
FIG. 6 is an explanatory diagram of another example of forwarding in conventional pipeline processing.
FIG. 7 is an explanatory diagram of pipeline processing conceived to eliminate the inconvenience that arises in conventional pipeline processing when consecutive instructions use the same register.

Explanation of symbols

1 is an instruction memory, 2 is an instruction register, 3 is a branch destination address calculation unit, 4 is an instruction decode unit, 5 is an execution unit, 6 is a register file, 8 is an adder, 9 is a multiplexer, 10 is a program counter, and 11 is a control unit.

Claims

1. A pipeline arithmetic processing apparatus in which instructions are pipeline processed, comprising:
determination means for determining whether an instruction is a predetermined instruction that requires a plurality of cycles for execution; and
control means for, when the determination means determines that the instruction is the predetermined instruction, stopping the processing of a subsequent instruction during the period of the plurality of cycles in which the predetermined instruction is executed, and causing the processing of the subsequent instruction to be executed after the stop is released.

2. The pipeline arithmetic processing apparatus according to claim 1, wherein
the determination means determines whether the instruction is a load instruction that requires two cycles for execution, and
the control means, when the determination means determines that the instruction is a load instruction, temporarily stops the operation of a program counter in the first cycle of the two-cycle period in which the load instruction is executed, thereby stopping the processing of the subsequent instruction, and operates the program counter in the second cycle, thereby causing the processing of the subsequent instruction to be executed.