JPH04247522A

JPH04247522A - Parallel operation processor

Info

Publication number: JPH04247522A
Application number: JP3013247A
Authority: JP
Inventors: Takashi Yoshida (吉田尊)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1991-02-04
Filing date: 1991-02-04
Publication date: 1992-09-03
Anticipated expiration: 2014-03-31
Also published as: JP2877531B2

Abstract

PURPOSE: To minimize the number of subsequent instructions aborted at a branch. According to a probability calculated from the branch history of a branch instruction, the device varies how many instructions following the branch instruction, and how many instructions from the branch destination address, are allocated to the operation execution units.

CONSTITUTION: When a plurality of instructions are processed simultaneously and a branch instruction exists in the prefetched instruction group, instructions are also fetched from the branch destination address to which that branch instruction would branch. Next, the probability that the branch instruction branches is calculated from its branch history, which is read from a storage device 1. Then, according to this probability, the device decides the ratio between two counts: the instructions following the branch instruction and the instructions from the branch destination address that are allocated to the plurality of operation execution units 13a-13h. Instructions are thus distributed to the operation execution units 13a-13h with this ratio varied by the branch probability. Branch destination instructions that have no data dependence on the instructions preceding the branch instruction are issued to the operation execution units 13a-13h.

Description

[Detailed description of the invention]

[0001] [Object of the Invention]

[0002]

[Field of Industrial Application] The present invention relates to a parallel arithmetic processing device that performs a plurality of arithmetic operations simultaneously in parallel by means of pipeline processing and parallel processing.

[0003]

[Prior Art] Some recent arithmetic processing devices perform parallel processing that executes multiple operations per machine cycle. Examples include superscalar and VLIW processors. These devices contain multiple arithmetic units and processing units that execute in parallel, which lowers the CPI (clocks per instruction). The arithmetic and processing units usually also operate as pipelines, which gives still higher processing performance.

[0004] One problem in pipelined processing is the pipeline disruption caused by executing a branch instruction. Normally, when a branch occurs, the instructions that follow the branch instruction in the pipeline are aborted. The pipeline stages from fetching the branch destination instruction until it begins execution are wasted. Various methods have therefore been adopted to reduce the number of operation instructions aborted by branches.

[0005] For example, one pipeline method inserts a delayed instruction at compile time. This is an instruction placed after the branch instruction that is not aborted even if the branch is taken. It would originally execute before the branch instruction, but its result does not affect the branch. FIG. 7 shows an example of a delayed instruction.

Assume that I0 and I1 are ordinary instructions and I2 is a branch instruction. The pipeline has five stages and the branch is decided in the third stage. When I2 reaches the third stage and the branch is taken, the instructions I3 and I4 following I2 are normally aborted (FIG. 7(a)). If I3 is a delayed instruction, I4 is still aborted. I3, however, continues through the pipeline and is executed regardless of whether branch instruction I2 branches. As a result, the number of aborted stages is reduced from two to one (FIG. 7(b)).
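The FIG. 7 arithmetic can be checked with a small model. This sketch assumes the branch is resolved in stage `resolve_stage` (counted from 1). The `resolve_stage - 1` instructions fetched after the branch occupy the earlier stages at that point. The function name and parameters are hypothetical. The total pipeline depth (five stages in FIG. 7) does not affect the count, so it is not a parameter.

```python
def squashed_on_taken_branch(branch_index: int, resolve_stage: int,
                             delay_slots: int = 0) -> list[str]:
    """Return the instructions aborted when branch I<branch_index> is taken.

    When the branch sits in stage `resolve_stage` (1-based), the instructions
    fetched after it fill stages resolve_stage-1 .. 1. The first `delay_slots`
    of them are delayed instructions and complete normally; the rest are
    squashed.
    """
    in_flight = [f"I{i}" for i in range(branch_index + 1,
                                          branch_index + resolve_stage)]
    return in_flight[delay_slots:]


# FIG. 7: I2 is the branch, resolved in stage 3.
print(squashed_on_taken_branch(2, 3))                 # ['I3', 'I4']
print(squashed_on_taken_branch(2, 3, delay_slots=1))  # ['I4']
```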

[0006] Besides the delayed-instruction method above, there are branch prediction methods. When a branch instruction is encountered, these methods decide what executes next: either the instruction following the branch instruction or the branch destination instruction. The decision is based on the history of whether the same branch instruction branched in the past. Such methods can also be used in superscalar arithmetic processing devices.
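As a point of comparison with the invention, here is one of the simplest history-based schemes. It is a last-outcome predictor, chosen only for illustration and not taken from this patent. It assumes the branch goes the same way it went the last time. The class and method names are hypothetical. Addresses are assumed to count instructions, so the fall-through path is `branch_addr + 1`.

```python
class LastOutcomePredictor:
    """Predict that each branch repeats its most recent outcome (illustrative)."""

    def __init__(self, default_taken: bool = False) -> None:
        self.last: dict[int, bool] = {}  # branch address -> last outcome
        self.default_taken = default_taken

    def predict(self, branch_addr: int) -> bool:
        return self.last.get(branch_addr, self.default_taken)

    def update(self, branch_addr: int, taken: bool) -> None:
        self.last[branch_addr] = taken

    def next_fetch(self, branch_addr: int, target: int) -> int:
        # Follow the branch destination if predicted taken, else fall through.
        return target if self.predict(branch_addr) else branch_addr + 1
```

A predictor like this commits every issue slot to one path. The invention instead splits the slots between both paths.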

[0007] The delayed-instruction method above applies to devices that process one instruction per cycle. No delayed-instruction method existed for devices that process multiple instructions in parallel per cycle. Consequently, in parallel processing, suppose a group of instructions containing a branch instruction is executed simultaneously and the branch is taken. The instructions that precede the branch instruction in the sequential model continue executing, while those after it must be stopped.

[0008] Another problem arises in superscalar and similar arithmetic processing devices that execute operations simultaneously. Data dependences and similar constraints limit the number of instructions in a sequence that can execute at once to about two or three. The operation execution units are therefore often idle, and this waste halves the benefit of parallel processing.

[0009]

[Problems to be Solved by the Invention] As described above, some conventional parallel arithmetic processing devices, such as pipelined or parallel processing devices, begin executing the instructions following a branch instruction before the branch is resolved. In these devices, many subsequent instructions are always aborted by branches.

[0010] The present invention has been made in view of these circumstances. Its object is to provide a parallel arithmetic processing device that minimizes the number of instructions aborted by branch instructions, even when multiple instructions are processed in parallel in one cycle.

[0011] [Configuration of the Invention]

[0012]

[Means for Solving the Problems] To achieve the above object, the present invention comprises:
- a plurality of operation execution units that operate as pipelines and can operate simultaneously in parallel;
- history information holding means for holding history information that indicates whether a fetched conditional branch instruction branched in the past;
- subsequent instruction group holding means for holding the group of instructions that follow the conditional branch instruction;
- branch destination instruction group holding means for holding the group of instructions from the branch destination address of the conditional branch instruction onward;
- probability generating means for obtaining, by using the history information holding means, the probability that a branch occurs; and
- instruction distribution means. According to the probability obtained from the probability generating means, it determines the number of instructions to be issued from the subsequent instruction group holding means and from the branch destination instruction group holding means, and allocates each instruction to the plurality of operation execution units.

[0013]

[Operation] The means above act in an arithmetic processing device that has a plurality of operation execution units able to execute simultaneously. Suppose multiple instructions are being processed at the same time, and a branch instruction exists in the group of instructions prefetched for distribution to the arithmetic units. Before the branch instruction executes, instructions are also fetched from the branch destination address to which it would branch.

The device obtains the probability that the branch instruction branches from its branch history, read from a storage device that holds the branch histories of branch instructions. This probability determines the ratio between two counts allocated to the operation execution units: the instructions following the branch instruction, and the instructions from the branch destination address.

In summary, the device varies this ratio with the branch probability and distributes both groups of instructions to the operation execution units. From the branch destination, it issues only instructions that have no data dependence on the instructions preceding the branch instruction. This avoids instruction aborts caused by the branch when the branch is executed.

[0014]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of one embodiment of the parallel arithmetic processing device of the present invention. The example used here is a superscalar RISC processor that can execute eight instructions simultaneously.

[0015] The parallel arithmetic processing device shown in FIG. 1 comprises:
- an instruction memory 1 (hereinafter IM) that holds the program;
- an instruction latch unit 2;
- a branch target instruction latch unit 3;
- an instruction allocator 4 that keeps the instruction latch unit 2 full. It does this by placing instructions fetched from the IM into the slots of latch unit 2 vacated by instructions already issued;
- a branch target instruction allocator 5 that performs the same function as allocator 4 for the branch target instruction latch unit 3;
- a program counter 6 holding the address from which instructions are latched from IM 1 into latch unit 2;
- a branch target program counter 7 holding the address of the instructions to be latched from IM 1 into the branch target instruction latch unit 3;
- a branch target buffer 8 (hereinafter BTB). For each branch instruction designated to use the circuit of the present invention, it stores the instruction's address, its branch destination address, the group of instructions at the branch destination, and its branch history;
- a non-abort instruction detection circuit 9. It determines where in the instruction group of latch unit 2 the branch instruction lies, and counts the instructions from the head of latch unit 2 up to the branch instruction;
- a distributed instruction count calculation circuit 10. From the result of circuit 9 and the branch history sent from BTB 8, it generates the probability that the target branch instruction branches. It then determines how many instructions following the branch instruction and how many instructions from the branch destination address are distributed to the arithmetic units;
- an instruction distribution circuit 11 that, from the result of circuit 10, determines which instructions are issued from latch unit 2 and from the branch target instruction latch unit 3;
- a latch unit 12 that temporarily latches the instructions output from circuit 11;
- operation execution units 13a to 13h that execute the operations; and
- a data memory 14 (hereinafter DM) that holds the data needed for the operations.

[0016] Branch instructions are of three kinds, as designated by the user:
- those for which the instructions consecutively following the branch instruction are designated to execute next;
- those for which the instructions at the branch destination are designated to execute next; and
- those designated to have the circuit of the present invention allocate both the consecutive instructions and the branch destination instructions.

For branch instructions designated not to use the distributed instruction count calculation circuit 10, the user sets a branch probability in advance.

[0017] IM 1 and DM 14 are assumed to allow multiple simultaneous accesses, by means of multiple ports or shared memory. The instruction latch unit 2 performs ordinary instruction prefetch from IM 1.

The instruction allocator 4 receives two inputs: the group of instructions to be latched from IM 1 into latch unit 2, and the group of instructions output by latch unit 2. Using information from the distributed instruction count calculation circuit 10, it drops from the latch unit 2 output the instructions already issued to the operation execution units 13a to 13h. It shifts the not-yet-issued instructions forward to close the gap and fills the vacated portion with instructions from IM 1. The result becomes the input to latch unit 2.

[0018] The branch target instruction latch unit 3 latches one of two instruction groups:
- when a branch instruction hits (matches) in BTB 8, the instructions from the branch destination onward, sent from BTB 8 via the branch target instruction allocator 5; or
- the group of instructions sent from the branch target instruction allocator 5.

Like allocator 4, the branch target instruction allocator 5 receives two inputs: the branch destination instructions to be latched from IM 1 into latch unit 3, and the group of instructions output by latch unit 3. Using information from circuit 10, it drops the instructions already issued to the operation execution units 13a to 13h. It shifts the not-yet-issued instructions forward and fills the vacated portion with instructions from IM 1. The result becomes the input to latch unit 3.

[0019] FIG. 2 illustrates the operation of the instruction allocator 4 and the branch target instruction allocator 5. Suppose that at time t0 the instruction latch unit 2 holds the eight instructions I0 to I7, and that at time t1 instructions I0 to I2 are issued. Instructions I3 to I7 are then shifted left by three positions, and instructions read from IM 1 fill the three vacated positions.

[0020] Program counter 6 is incremented by the number of instructions issued from latch unit 2, as calculated by the distributed instruction count calculation circuit 10. That is, program counter 6 holds the address of the first instruction to be read into latch unit 2 at the next clock.

[0021] Like program counter 6, the branch target program counter 7 is loaded when BTB 8 hits. Its new value is the branch destination address output from BTB 8, plus 8, plus the number of instructions issued from the branch target instruction latch unit 3 as output by circuit 10. At each subsequent clock, it is loaded with its current value plus the number of instructions issued from latch unit 3 at that clock.
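The two counter updates in [0020] and [0021] can be sketched as plain functions. This sketch assumes addresses are counted in instructions rather than bytes, which the "+8" suggests but the text does not state. The function names are hypothetical. The `btb_window` default of 8 is the eight target instructions the BTB supplies on a hit.

```python
from typing import Optional


def next_pc(pc: int, issued_from_latch2: int) -> int:
    """Program counter 6: advance by the instructions issued from latch unit 2."""
    return pc + issued_from_latch2


def next_target_pc(target_pc: int, issued_from_latch3: int,
                   btb_target: Optional[int] = None,
                   btb_window: int = 8) -> int:
    """Branch target program counter 7.

    On a BTB hit (btb_target given), the BTB has already supplied
    `btb_window` target instructions, so the counter becomes
    target + 8 + the number issued from latch unit 3. Otherwise it advances
    by the number of instructions issued from latch unit 3 this clock.
    """
    if btb_target is not None:
        return btb_target + btb_window + issued_from_latch3
    return target_pc + issued_from_latch3


t = next_target_pc(0, 3, btb_target=200)   # BTB hit, 3 target instructions issued
print(t)                                   # 211
print(next_target_pc(t, 2))                # 213
```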

[0022] FIG. 3 shows the configuration of BTB 8. BTB 8 holds:
- the address tags of branch instructions;
- an address tag match detection circuit (not shown);
- the branch destination address of each branch instruction;
- the group of instructions at the branch destination; and
- the branch history of each branch instruction.

Each address tag holds the address at which a branch instruction was previously found. Using the address tag match detection circuit, BTB 8 searches the addresses of up to eight instructions starting from the address in program counter 6.
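A software stand-in for the BTB search over the eight-instruction fetch window might look like the sketch below. A dict keyed by branch address replaces the parallel tag-match hardware. Returning the first hit in the window is an assumption. `BTBEntry`, its fields, and `btb_search` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BTBEntry:
    target_addr: int          # branch destination address
    target_insns: list[str]   # instructions from the branch destination onward
    history: str              # branch history bits, oldest on the left


def btb_search(btb: dict[int, BTBEntry], pc: int,
               window: int = 8) -> Optional[tuple[int, BTBEntry]]:
    """Return (branch address, entry) for the first tagged address in
    [pc, pc + window), or None on a miss."""
    for addr in range(pc, pc + window):
        entry = btb.get(addr)
        if entry is not None:
            return addr, entry
    return None
```

The hardware compares all eight addresses against the tags at once. The loop here only reproduces the result of that comparison, not its timing.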

[0023] The branch history holding unit holds the branch history of the branch instruction corresponding to each address tag, in decoded or encoded form. For example, suppose each entry is 6 bits and holds the last six branch outcomes unencoded. If the branch instruction for a tag has the history "taken, not taken, taken, taken, not taken, taken", the branch history holding unit holds "010010". That is, a taken branch is recorded as 0 and a not-taken branch as 1. The leftmost of the six bits is the oldest entry, and each new outcome is appended on the right.
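The encoding in this example can be reproduced as a short sketch: '0' for taken, '1' for not taken, oldest outcome leftmost, six bits kept. The function name is hypothetical.

```python
def encode_history(outcomes: list[bool], width: int = 6) -> str:
    """Encode the most recent `width` outcomes (True = taken), oldest first,
    using '0' for a taken branch and '1' for a not-taken branch."""
    return "".join("0" if taken else "1" for taken in outcomes[-width:])


# taken, not taken, taken, taken, not taken, taken
print(encode_history([True, False, True, True, False, True]))  # 010010
```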

[0024] Suppose the instruction latch unit 2 contains a branch instruction that uses the history held in the branch history holding unit. The address of that branch instruction is compared with the address tags. On a hit, the branch history is output. On a miss, the unit outputs a branch history that corresponds to the probability the user set for the branch instruction. For example, if the branch history is 6 bits and the branch probability is set to 1/2, "101010" or "010101" is output.

[0025] Two clocks later, the result of the branch instruction is determined. A branch history update unit (not shown) then updates the entry for that branch instruction's address tag. The previous history "010010" is shifted left by one bit, and the branch result "0" enters the rightmost position, giving "100100".

Branch instructions for which the consecutive instruction group or the branch destination instruction group is designated are an exception. Their branch history is not used. The unit instead outputs a history as if every past outcome had been not taken, or every past outcome taken. In the above example, "000000" or "111111" is output, and nothing is written back to the branch history holding unit.
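The update rule and the override for user-pinned branches can be sketched as below. The mode strings are hypothetical. The text does not say which all-same pattern corresponds to which designation. This sketch maps them under the '0' = taken convention from [0023] ("111111" for the fall-through designation, "000000" for the destination designation), which is an assumption. Pinned branches are never passed to `update_history`, since their history is not written back.

```python
def update_history(history: str, taken: bool) -> str:
    """Shift the history left by one bit and append the new outcome on the
    right ('0' = taken, '1' = not taken)."""
    return history[1:] + ("0" if taken else "1")


def history_for_lookup(stored: str, mode: str) -> str:
    """History presented to circuit 10 for a branch with the given mode.

    mode is 'fall_through' or 'target' for user-pinned branches, or 'split'
    for branches that use the recorded history.
    """
    if mode == "fall_through":
        return "1" * len(stored)   # treated as never taken
    if mode == "target":
        return "0" * len(stored)   # treated as always taken
    return stored


print(update_history("010010", taken=True))  # 100100
```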

[0026] When BTB 8 hits, it outputs the eight branch destination instructions to the branch target instruction latch unit 3. It also outputs the branch destination address plus eight instructions to the branch target program counter 7.

[0027] The contents of the instruction latch unit 2 are sent to the non-abort instruction detection circuit 9. Circuit 9 recognizes the position of the branch instruction within the group held in latch unit 2, for example by flagging branch instructions during predecoding or by recognizing them through decoding. It then counts the instructions up to the branch. These are the non-abort instructions, which the branch can never abort; the count includes the branch instruction itself.

Subtracting the number of non-abort instructions from the number of operation execution units 13a to 13h gives the remaining capacity. That many instructions are allocated to units 13a to 13h, drawn from the instructions following the branch instruction and from the instructions at and after the branch destination address.

[0028] The branch history read from BTB 8 and the number of non-abort instructions calculated by circuit 9 are sent to the distributed instruction count calculation circuit 10. From the number of non-abort instructions, circuit 10 determines how many operation execution units remain available once those receiving the non-abort instructions are excluded. Based on the branch probability, it then decides two counts: how many of the instructions after the branch in latch unit 2 go to the operation execution units 13, and how many go from the branch target instruction latch unit 3.

The instruction distribution circuit 11 attaches a flag to each instruction distributed from latch units 2 and 3 to units 13a to 13h. The flag indicates one of four states:
- abort if the branch is taken;
- abort if the branch is not taken;
- execute regardless of whether the branch is taken; or
- abort execution.

The instructions are then distributed to units 13a to 13h. When the branch is resolved, each instruction is aborted or not according to its flag.

[0029] The non-abort instruction detection circuit 9 detects the position of the branch instruction among the eight instructions in latch unit 2. If the branch instruction is the third instruction, circuit 9 outputs "11100000" (FIG. 4).

[0030] The distributed instruction count calculation circuit 10 learns from circuit 9 that five operation execution units 13 are available for allocation. It learns from the branch history holding unit that the branch history is "010010". Each branch instruction also defines which path to favor, not taken or taken, when there are more or fewer available units than history bits.

If the branch instruction is defined to favor the not-taken path, the example proceeds as follows. "010010" is rearranged into "110000", with its 1s gathered on the left. The rightmost 0 is removed to give "11000". This is embedded into the 0 positions of "11100000" above, producing "11111000", which is output to the instruction distribution circuit 11. If the branch is defined to favor the taken path, the leftmost 1 of "110000" is removed instead, and "10000" is embedded.

[0031] Suppose the eight-bit output of circuit 9 contains seven or more 0s. If the branch is defined to favor the not-taken path, a 1 is added to the left end of "110000"; if it favors the taken path, a 0 is added to the right end. The result is then embedded.
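The procedure of [0030] and [0031] can be sketched as a bit-string transformation. In the result, '1' means issue from instruction latch unit 2 and '0' means issue from the branch target instruction latch unit 3. The patent describes removing or adding a single bit. The loops here generalize that to any length difference, which is an assumption. The function name and the `prefer_fall_through` flag are hypothetical.

```python
def distribution_pattern(non_abort: str, history: str,
                         prefer_fall_through: bool) -> str:
    """Build the distribution bit string from circuit 9's mask and the history.

    History bits use '0' = taken, '1' = not taken. Not-taken bits are gathered
    on the left and taken bits on the right, then the string is trimmed or
    extended to the number of free units on the side given by
    prefer_fall_through.
    """
    free = non_abort.count("0")
    ones = history.count("1")
    fill = "1" * ones + "0" * (len(history) - ones)   # e.g. "010010" -> "110000"
    while len(fill) > free:   # too many bits: drop from the disfavoured end
        fill = fill[:-1] if prefer_fall_through else fill[1:]
    while len(fill) < free:   # too few bits: extend on the favoured end
        fill = "1" + fill if prefer_fall_through else fill + "0"
    kept = non_abort.count("1")   # non-abort slots stay as leading 1s
    return non_abort[:kept] + fill


p = distribution_pattern("11100000", "010010", prefer_fall_through=True)
print(p, p.count("1"), p.count("0"))  # 11111000 5 3
```

The counts of 1s and 0s are the numbers reported to the two program counters in [0032].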

[0032] The number of instructions output from latch unit 2 is sent to program counter 6, and the number output from the branch target instruction latch unit 3 is sent to branch target program counter 7. In the above example, "5" is sent to program counter 6 and "3" to branch target program counter 7 (FIG. 5).

[0033] In the instruction distribution circuit 11, in accordance with the value "11111000" obtained from the distribution instruction number calculation circuit 10, five instructions from the instruction latch unit 2 are distributed to the five operation execution units 13a to 13e, and three instructions from the branch destination instruction latch unit 3 are distributed to the three operation execution units 13f to 13h. When the instructions are distributed, one bit is taken from each of two values: the value "11111000" obtained from the distribution instruction number calculation circuit 10, and the value "11100000" obtained from the non-abort instruction detection circuit 9. These bits are distributed to the operation execution units 13a to 13h together with the instructions.

That is, a flag of "11" is distributed to each of the three operation execution units 13a to 13c. A flag of "01" is distributed to each of the two operation execution units 13d and 13e. A flag of "00" is distributed to each of the three operation execution units 13f to 13h, which received their instructions from the branch destination instruction latch unit 3.

When the branch is resolved, the units behave as follows (FIG. 6):
- Operation execution units 13a to 13c, which hold the flag "11", continue execution regardless of the outcome of the branch instruction.
- Operation execution units 13d and 13e, which hold the flag "01", continue execution if the branch is not taken and abort their instructions if it is taken.
- Operation execution units 13f to 13h, which hold the flag "00", abort their instructions if the branch is not taken and continue execution if it is taken.
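The flag mechanism can be sketched in Python as follows. This is a minimal illustration, not the circuit of FIG. 6.
- The function names are hypothetical.
- The sketch assumes that the first bit of each flag comes from the non-abort value and the second from the distribution value. This ordering reproduces the flags "11", "01" and "00" listed above.

```python
from typing import List

def unit_flags(non_abort: str, dist: str) -> List[str]:
    """Form each unit's 2-bit flag from bit i of each value (unit 13a = leftmost bit)."""
    if len(non_abort) != len(dist):
        raise ValueError("values must have one bit per execution unit")
    # Assumed ordering: non-abort bit first, distribution bit second.
    return [n + d for n, d in zip(non_abort, dist)]

def keeps_running(flag: str, taken: bool) -> bool:
    """Continue (True) or abort (False) once the branch outcome is known."""
    if flag == "11":  # non-abort instruction: valid either way
        return True
    if flag == "01":  # from instruction latch unit 2, after the branch
        return not taken
    if flag == "00":  # from branch destination instruction latch unit 3
        return taken
    raise ValueError(f"unexpected flag {flag!r}")

flags = unit_flags("11100000", "11111000")
# flags == ["11", "11", "11", "01", "01", "00", "00", "00"] for units 13a..13h
```

A flag of "10" would mean a non-abort instruction taken from the branch destination latch. That combination cannot arise from the two values, so the sketch rejects it.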

[0034] When the branch is taken, the program counter 6 is loaded with a new address. That address is the branch destination address plus the number of branch destination instructions already issued by the instruction distribution circuit 11, which is the count sent from the distribution instruction number calculation circuit 10 to the branch destination program counter 7. The instructions remaining in the branch destination instruction latch unit 3 are moved to the instruction latch unit 2.
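Below is a minimal sketch of the fetch redirection on a taken branch. It makes these assumptions:
- The program counter is word-addressed, with one address per instruction.
- `target_pc` holds the address of the first branch destination instruction.
- The two latch units are modelled as Python lists.
- The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FetchState:
    pc: int         # program counter 6
    target_pc: int  # branch destination program counter 7 (first target address)
    inst_latch: List[str] = field(default_factory=list)    # instruction latch unit 2
    target_latch: List[str] = field(default_factory=list)  # branch destination instruction latch unit 3

def redirect_on_taken(state: FetchState, issued_from_target: int) -> None:
    """Update fetch state when the branch resolves as taken."""
    # Resume fetching just past the branch destination instructions already
    # issued (assumes one address unit per instruction).
    state.pc = state.target_pc + issued_from_target
    # Fetched but unissued branch destination instructions become the main stream.
    state.inst_latch = state.target_latch[issued_from_target:]
    state.target_latch = []
```

In the example, three branch destination instructions were issued, so `redirect_on_taken(state, 3)` resumes fetching three addresses past the branch destination.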

[0035] In general, when several instructions are to be executed simultaneously, data dependences make it unlikely that all the instructions in the instruction queue can be executed at the same time. The above example includes no block for data dependence analysis. In an actual circuit, such analysis can be attached to the distribution instruction number calculation circuit 10. The added circuit analyzes two kinds of data conflict:
- conflicts with instructions already being executed, using a scoreboard or the like;
- conflicts, found by decoding, among the instructions in the instruction latch unit 2 and the branch destination instruction latch unit 3.

It then masks the instructions to be issued. This allows instruction issue to be stopped on data dependences.
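The masking step could be modelled as below. This is a simplified stand-in for the scoreboard logic, not the patent's circuit, and it makes these assumptions:
- Instructions issue in order within each stream.
- Reading or writing a register that has an outstanding write counts as a conflict.
- Results are not forwarded between instructions issued in the same cycle.
- The `Inst` type and `issue_mask` function are hypothetical.

The number of `True` entries is how many instructions that stream can actually supply this cycle.

```python
from typing import List, NamedTuple, Set, Tuple

class Inst(NamedTuple):
    dst: str               # destination register
    srcs: Tuple[str, ...]  # source registers

def issue_mask(group: List[Inst], busy: Set[str]) -> List[bool]:
    """Mask for one stream: True where the instruction may issue this cycle."""
    pending = set(busy)  # registers with a write outstanding (scoreboard view)
    mask: List[bool] = []
    stalled = False
    for inst in group:
        conflict = inst.dst in pending or any(r in pending for r in inst.srcs)
        stalled = stalled or conflict  # in-order: a stall blocks everything behind it
        mask.append(not stalled)
        pending.add(inst.dst)  # no forwarding within the same cycle
    return mask
```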

[0036] In the above example, the branch probability is 4/6. Suppose that only two instructions can be executed simultaneously in each of the two streams: the instructions following the branch instruction, and the instructions following the branch destination address.

Under conventional simple branch prediction control, the instructions following the branch destination address are allocated after the branch instruction. Because only two of them are executable, the three operation execution units 13f to 13h are left idle. In this case, the expected number of instructions that remain valid when the branch is resolved is 2 × 4/6, or about 1.33 instructions.

Under the control according to the present invention, three instructions are selected from the branch destination address and two from the instructions following the branch instruction. The instructions actually assigned to the operation execution units 13d to 13g are two from each stream. In this case, the expected number of valid instructions when the branch is resolved is 2 × 4/6 + 2 × 2/6 = 2 instructions.
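The comparison is an expected-value calculation:
- a fall-through instruction survives with probability 1 − p;
- a branch destination instruction survives with probability p.

The sketch below reproduces both figures with exact fractions; the function name is hypothetical. The three non-abort instructions on units 13a to 13c are valid in either case, so they are left out, as in the text.

```python
from fractions import Fraction

def expected_valid(p_taken: Fraction, n_fall: int, n_target: int) -> Fraction:
    """Expected count of speculative instructions still valid after the branch resolves."""
    # Fall-through instructions survive only if the branch is not taken;
    # branch destination instructions survive only if it is taken.
    return n_fall * (1 - p_taken) + n_target * p_taken

p = Fraction(4, 6)
conventional = expected_valid(p, n_fall=0, n_target=2)  # 4/3, about 1.33
proposed = expected_valid(p, n_fall=2, n_target=2)      # 2
```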

[0037] In the above example, a branch instruction may well appear for the first time and miss in the BTB 8. The BTB 8 was used in the example because calculating the branch destination address takes considerable time, and in an actual circuit it may not finish within one clock cycle. Therefore, if the branch destination address and branch destination instruction group held in the BTB 8 are not used, the problem can be avoided by adding one instruction fetch phase to allow time for the address calculation.

Branch instructions may also occur at short intervals, so that they appear within the branch destination instruction group as well and the branch destination instructions multiply in a tree structure. This case can be handled as in Japanese Patent Application Laid-Open No. 2-157939: provide a plurality of instruction buffers and a circuit that selects the buffer inputs and buffer outputs.
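A BTB lookup with a per-branch history could look like the sketch below. It is not the layout of FIG. 3, and these details are assumptions:
- The outcome history is stored in each BTB entry.
- Six outcomes are kept, which fits the 4/6 probability in the example.
- A branch with no history defaults to a probability of 1/2.
- All names are hypothetical.

A miss returns `None`; the branch destination address must then be computed, for example in the extra fetch phase described above.

```python
from collections import deque
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Deque, Dict, List, Optional

HISTORY_LEN = 6  # assumed history depth

@dataclass
class BTBEntry:
    target: int               # branch destination address
    target_insts: List[str]   # cached branch destination instruction group
    history: Deque[bool] = field(default_factory=lambda: deque(maxlen=HISTORY_LEN))

class BranchTargetBuffer:
    def __init__(self) -> None:
        self.entries: Dict[int, BTBEntry] = {}

    def lookup(self, branch_pc: int) -> Optional[BTBEntry]:
        # None on a miss, e.g. the first time a branch instruction appears.
        return self.entries.get(branch_pc)

    def update(self, branch_pc: int, target: int, target_insts: List[str], taken: bool) -> None:
        entry = self.entries.get(branch_pc)
        if entry is None:
            entry = BTBEntry(target, list(target_insts))
            self.entries[branch_pc] = entry
        else:
            entry.target = target
            entry.target_insts = list(target_insts)
        entry.history.append(taken)  # oldest outcome drops out once full

def taken_probability(entry: BTBEntry) -> Fraction:
    """Fraction of recorded outcomes that were taken."""
    if not entry.history:
        return Fraction(1, 2)  # assumed default with no history
    return Fraction(sum(entry.history), len(entry.history))
```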

[0038] Furthermore, either stream may be unable to fill the operation execution units 13a to 13h allocated to it by probability, because of data dependences or similar problems. This applies to the instruction group following the branch instruction and to the instruction group at the branch destination alike. If units 13a to 13h would be left idle as a result, a circuit can be built that changes the allocation ratio further so that those units execute instructions from the other group.
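The probability-based split and the rebalancing of this paragraph can be combined in one small function. This is a sketch, and these details are assumptions:
- The free units are split in proportion to the branch probability, rounding to the nearest integer. With five free units and p = 4/6 this gives the 2 + 3 split of the example.
- Each share is capped by the number of instructions that stream can actually issue.
- Any units left idle are handed to the other stream.
- The function name is hypothetical.

```python
from fractions import Fraction
from typing import Tuple

def allocate(free_units: int, p_taken: Fraction,
             ready_fall: int, ready_target: int) -> Tuple[int, int]:
    """Return (fall-through, branch destination) instruction counts to issue."""
    # Probability-weighted split (rounding to nearest is an assumption).
    n_target = round(free_units * p_taken)
    n_fall = free_units - n_target
    # Cap each share by the instructions that can actually issue.
    use_fall = min(n_fall, ready_fall)
    use_target = min(n_target, ready_target)
    # Hand idle units to the other stream, as in paragraph [0038].
    spare = free_units - use_fall - use_target
    extra_target = min(spare, ready_target - use_target)
    use_target += extra_target
    spare -= extra_target
    use_fall += min(spare, ready_fall - use_fall)
    return use_fall, use_target
```

Spare units can only arise when at least one stream is capped, so only the other stream can absorb them. The order of the two hand-off steps therefore does not change the result.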

[0039]

[Effects of the Invention] As described above, the parallel arithmetic processing device of the present invention handles a branch instruction as follows. It varies two counts according to the probability obtained from that instruction's branch history: the number of instructions following the branch instruction, and the number following the branch destination address, that are allocated to the operation execution units. As a result, even when a plurality of instructions are processed in parallel in one cycle, the number of subsequent instructions aborted because of branches can be kept to a minimum. The waiting time of the operation execution units caused by data dependences and the like can also be reduced.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is an explanatory diagram of the operation of the instruction allocator and the branch destination instruction allocator shown in FIG. 1.

FIG. 3 is a configuration diagram of the branch target buffer shown in FIG. 1.

FIG. 4 is an explanatory diagram of the operation of the non-abort instruction detection circuit shown in FIG. 1.

FIG. 5 is an explanatory diagram of the operation of the distribution instruction number calculation circuit shown in FIG. 1.

FIG. 6 is a configuration diagram of the instruction distribution circuit shown in FIG. 1.

FIG. 7 is a diagram for explaining a conventional delayed instruction.

[Description of Symbols]
1 IM (instruction memory)
2 Instruction latch unit
3 Branch destination instruction latch unit
4 Instruction allocator
5 Branch destination instruction allocator
6 Program counter
7 Branch destination program counter
8 BTB (branch target buffer)
9 Non-abort instruction detection circuit
10 Distribution instruction number calculation circuit
11 Instruction distribution circuit
12 Latch unit
13a to 13h Operation execution units
14 DM (data memory)

Claims

[Claims]

1. A plurality of operation execution units that operate in a pipeline and can operate simultaneously in parallel; history information holding means for holding history information indicating whether a fetched conditional branch instruction has branched in the past; subsequent instruction group holding means for holding the group of instructions following the conditional branch instruction; branch destination instruction group holding means for holding the group of instructions following the branch destination address of the conditional branch instruction; probability generating means for obtaining, from the history information holding means, the probability that the branch will be taken; and instruction distributing means for determining, according to the probability obtained by the probability generating means, the number of instructions to be issued from the subsequent instruction group holding means and the number of instructions to be issued from the branch destination instruction group holding means, and for allocating the instructions to the plurality of operation execution units.