JP4654433B2

JP4654433B2 - Data processing apparatus, data processing program, and recording medium on which data processing program is recorded

Info

Publication number: JP4654433B2
Application number: JP2004324348A
Authority: JP
Inventors: Yasuhiko Nakashima (中島康彦)
Original assignee: Kyoto University
Current assignee: Kyoto University
Priority date: 2004-11-08
Filing date: 2004-11-08
Publication date: 2011-03-23
Anticipated expiration: 2024-11-08
Also published as: JP2006134186A

Description

The present invention relates to a data processing apparatus that reads an instruction sequence and/or values from a main storage unit, performs arithmetic processing, and writes the results back to the main storage unit.

Research and development on techniques for increasing the operating speed of microprocessors such as CPUs (Central Processing Units) has long been active. Examples of such speed-up techniques include pipelining, superscalar execution, out-of-order execution, and register renaming.

Pipelining divides instruction execution into several stages so that multiple instructions are processed simultaneously in assembly-line fashion. Superscalar execution provides two or more instruction execution units and executes multiple instructions in parallel. Out-of-order execution disregards the program order of instructions: it searches a window of consecutive instructions for those that are ready to execute and processes them ahead of time. Register renaming, used for example in CISC (Complex Instruction Set Computer) processors, raises the likelihood of parallel execution by increasing the number of general-purpose registers while keeping instruction compatibility with earlier processors.

To increase the operating speed of a microprocessor, it is therefore important to execute instructions in parallel. However, almost every program contains dependencies in which the instruction executed next depends on the result of an earlier instruction, in other words, branches. When a program contains such a branch, work performed ahead of time through parallel processing may be wasted depending on the branch outcome, which reduces the speed-up obtained.

Much research has therefore been devoted to branch prediction, that is, techniques that predict the branch target when a program contains a branch, thereby reducing the probability that work performed ahead of time is wasted and improving the effectiveness of parallel processing.

However, speculative execution based on branch prediction generally has the following problems. First, because the correctness of each prediction must always be verified, the execution time of the preceding instruction sequence itself cannot be reduced. Second, because all results of a series of operations performed on the basis of a misprediction must be invalidated, increasing the number of instructions that can be speculatively executed at once requires a corresponding hardware cost. Third, the more dependencies there are between instructions, the more deeply speculation must be nested, which makes both the verification of predictions and the invalidation of work based on mispredictions extremely complex.

On the other hand, a technique called value reuse has been proposed as a speed-up technique distinct from branch prediction. In value reuse, the input and output values of a part of a program are registered in a reuse table; when the same part is executed again and its inputs match inputs registered in the reuse table, the registered output values are output instead of executing it. Value reuse offers the following benefits. (1) If the inputs match those registered in the reuse table, the execution result need not be verified. (2) The hardware cost is determined solely by the total number of input and output values, so the length of the instruction sequence that can be skipped is not restricted. (3) The number of dependencies between instructions does not affect the complexity of the reuse mechanism. (4) Redundant load/store instructions can be eliminated, reducing power consumption accordingly.
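The basic idea of value reuse can be illustrated with a software analogy. The following is a minimal sketch, assuming that a program part can be modeled as a Python function of its inputs; the class `ReuseTable` and its method names are hypothetical and do not describe the hardware mechanism, which compares register and main memory values rather than function arguments.

```python
from typing import Callable, Dict, Hashable, Tuple


class ReuseTable:
    """Software model of a reuse table: (section id, inputs) -> outputs."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[Hashable, Tuple], Tuple] = {}
        self.hits = 0

    def run(self, section_id: Hashable, body: Callable[..., Tuple], *inputs) -> Tuple:
        key = (section_id, tuple(inputs))
        if key in self.entries:
            # Inputs match a registered entry: return the stored outputs
            # without executing the section (there is nothing to verify).
            self.hits += 1
            return self.entries[key]
        # No match: execute the section, then register its inputs and outputs.
        outputs = body(*inputs)
        self.entries[key] = outputs
        return outputs


# Usage: the second call with the same input is served from the table.
executed = []


def section(x: int) -> Tuple[int, int]:
    executed.append(x)  # records real executions only
    return (x * x, x + 1)


table = ReuseTable()
first = table.run("section", section, 3)
second = table.run("section", section, 3)
```

Because a hit depends only on the registered inputs, the cost of the table grows with the number of inputs and outputs, not with the length of the skipped code, which mirrors benefit (2).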

Non-Patent Document 1, listed below, describes a technique for reusing the values of functions in a program. This prior art exploits the fact that load modules are generally built according to an ABI (Application Binary Interface), specifically the SPARC (Scalable Processor ARChitecture) ABI, and achieves value reuse by identifying the inputs and outputs of functions under this ABI. Consequently, the compiler does not need to embed dedicated instructions for value reuse, and the technique can be applied to existing load modules.

Furthermore, by dynamically tracking the nesting structure of function calls, the technique excludes registers local to a function and local variables on the stack from the input/output values used for reuse, which improves efficiency. In particular, regardless of a function's complexity, reuse and pre-execution are possible by registering at most 6 register inputs, at most 4 register outputs, and the minimum set of main memory values, excluding local variables. This prior art is described in detail below.

First, for a single function, we clarify what constitutes its inputs and outputs and describe the mechanism required for one-level reuse. In a program, functions generally form a nested structure. FIG. 17(a) shows a structure in which function A (Function-A) calls function B (Function-B).

Global variables (Globals) can be inputs/outputs of function A (A_in/A_out) and of function B (B_in/B_out). The local variables of function A (Locals-A) are not inputs/outputs of A itself, but can become inputs/outputs of B through pointers. The arguments (Args) passed from A to B can be inputs to B, and the return value (Ret.Val.) from B to A can be an output of B. The local variables of function B (Locals-B) are not inputs/outputs of either A or B.

To reuse function B independently of its calling context, only B's own inputs and outputs (B_in/B_out) must be registered when B is executed. FIG. 17(b) shows the memory map of main memory when the program structure of FIG. 17(a) is executed. In this memory map, the only region containing no part of B_in/B_out is Locals-B. Therefore, to identify B_in/B_out, two boundaries must be determined: the boundary between Globals and Locals-B, and the boundary between Locals-B and Locals-A. The former can be determined from the boundary (LIMIT) set by the OS (Operating System), since the OS generally fixes upper limits on the data size and stack size at run time. The latter can be determined from the value of the stack pointer immediately before B is called (SP in A).

Next, we describe how to determine whether a given main memory address holds a global variable or a local variable and, if local, of which function. The load module is assumed to satisfy the following conditions specified by the SPARC ABI, where %fp denotes the frame pointer and %sp the stack pointer.
(1) Within the region at or above %sp, %sp+0 to 63 is the register save area and %sp+68 to 91 is the argument save area; neither is an input or output of the function.
(2) The implicit argument (Implicit Arg.) used when a structure is returned is stored at %sp+64 to 67.
(3) Explicit arguments (Explicit Arg.) are placed in registers %o0 to %o5 and in the region at %sp+92 and above.
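Conditions (1) to (3) amount to a fixed layout of the area above %sp. The following is a minimal sketch of that layout as a lookup, assuming 4-byte words as in the 32-bit SPARC ABI; the function names are hypothetical.

```python
def classify_sp_offset(offset: int) -> str:
    """Classify a byte offset from %sp using conditions (1)-(3)."""
    if offset < 0:
        raise ValueError("offset must be at or above %sp")
    if offset <= 63:
        return "register save area"   # %sp+0..63: not a function input/output
    if offset <= 67:
        return "implicit argument"    # %sp+64..67: address of a returned structure
    if offset <= 91:
        return "argument save area"   # %sp+68..91: not a function input/output
    return "explicit argument"        # %sp+92 and above: 7th argument word onward


def explicit_arg_word(offset: int) -> int:
    """1-based argument word number for a stack offset >= 92.

    Words 1-6 are passed in %o0-%o5, so %sp+92 holds word 7
    (assumes 4-byte words).
    """
    if offset < 92:
        raise ValueError("only offsets >= 92 hold stack-passed arguments")
    return 7 + (offset - 92) // 4
```

The argument save area (%sp+68 to 91) is exactly six words, one per register argument, which is why the first stack-passed word falls at %sp+92.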

First, to distinguish global variables from local variables, we exploit the fact that the OS generally fixes upper limits on the data size and stack size at run time, and assume the following:
(1) Global variables are placed in the region below LIMIT.
(2) %sp never falls to or below LIMIT, and the region from LIMIT up to %sp is invalid.

FIG. 18 outlines the arguments and frames in the memory map when function A calls function B under the above conditions. With reference to this figure, the method of distinguishing A's local variables from B's local variables is described below.

In the figure, (a) shows the state while A is executing. Instructions and global variables (Global Vars.) are stored in the bold-framed region below LIMIT, and valid values are stored at %sp and above. If B returns a structure, the start address of that structure is stored at %sp+64 as an implicit argument. The first six words of the explicit arguments for B are stored in registers %o0 to %o5, and the seventh and subsequent words at %sp+92 and above. If an operand %sp+92 with %sp as the base register appears, this region holds the seventh argument word, that is, a local variable of B. If the operand %sp+92 does not appear, this region is a local variable of A. Thus, in state (a), A's local variables and B's local variables can be distinguished by examining the operands.

In contrast, (b) shows the state while B is executing. Arguments can be inputs, the return value can be an output, and global variables and A's local variables can be both inputs and outputs. However, because B may accept variable-length arguments, it is generally impossible to determine whether the region at %fp+92 and above belongs to A's local variables or to B's.

To distinguish local variables, first, any function call for which a seventh or later argument word is detected at time (a) is excluded from reuse; for function calls in which no seventh or later word is detected, the value of %sp+92 is recorded immediately before the call. Because function calls that use seventh or later argument words are expected to be infrequent, the performance loss caused by excluding such functions from reuse is considered minor.

With this preparation, a main memory reference address in state (b) that is greater than or equal to the previously recorded value of %sp+92 is a local variable of A, and an address below it is a local variable of B. While B executes, global variables and A's local variables are registered in the reuse table, and B's local variables are excluded.
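Combining the LIMIT assumptions with the recorded %sp+92 value gives a three-way test for each address that B references. The following is a minimal sketch; the function names and the concrete LIMIT and stack pointer values are hypothetical, chosen only to exercise the test.

```python
def classify_address(addr: int, limit: int, recorded_sp92: int) -> str:
    """Classify an address referenced while B runs.

    limit         -- boundary set by the OS; globals lie below it
    recorded_sp92 -- value of %sp + 92 recorded in A just before calling B
    """
    if addr < limit:
        return "global"
    if addr >= recorded_sp92:
        return "local of A"
    # Between LIMIT and the recorded value. The region from LIMIT up to
    # B's %sp is assumed invalid, so valid references here belong to B.
    return "local of B"


def should_register(addr: int, limit: int, recorded_sp92: int) -> bool:
    """Globals and A's locals go into the reuse table; B's locals do not."""
    return classify_address(addr, limit, recorded_sp92) != "local of B"


# Hypothetical values for illustration only.
LIMIT = 0x00400000
SP_IN_A = 0x7FFFF000
RECORDED_SP92 = SP_IN_A + 92
```

Because the test needs only two comparisons against recorded values, it can be applied to every load and store without inspecting the code of B.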

At reuse time, because B's local variables are excluded from the inputs and outputs, the addresses of B's local variables need not match. B can therefore be reused in any context as long as its inputs match. For the global variables and A's local variables referenced by B, however, both the address and the data must exactly match the contents of the reuse table. The key issue is thus how to enumerate all main memory addresses to be compared before executing B.

The addresses of the global variables and A's local variables referenced by B are ultimately derived from address constants generated in B or from pointers originating from global variables or arguments. Therefore, by first selecting the reuse-table entries whose arguments match exactly and then referencing all associated main memory addresses for comparison, every main memory address that B would reference is covered. Only when all inputs match can the registered outputs (return value, global variables, and A's local variables) be reused.

To implement function reuse, a function management table (RF) and an input/output record table (RB) are provided as reuse tables. FIG. 19 shows the hardware configuration required to reuse one function. To make multiple functions reusable, multiple sets of this configuration are provided.

In these tables, V, held in both RF and RB, is a flag indicating whether an entry is valid, and LRU (least recently used) is a hint for entry replacement. In addition to V and LRU, RF holds the start address of the function (Start) and the main memory addresses to be referenced (Read/Write). In addition to V and LRU, RB holds %sp immediately before the function call (SP), the arguments (Args.; V: valid entry, Val.: value), main memory values (Mask: valid bytes of the Read/Write address, Value: value), and the return values (Return Values; V: valid entry, Val.: value).

Return values are stored in %i0 to %i1 (read as %o0 to %o1 for leaf functions) or in %f0 to %f1; it is assumed that the target programs contain no return values that use %f2 to %f3 (extended double-precision floating-point numbers). Because RF manages all Read addresses collectively while RB manages Mask and Value, the contents at a Read address can be compared with multiple RB entries at once using a CAM (content-addressable memory).

To reuse a single function, the input/output information on arguments, return values, global variables, and local variables of the calling (higher-level) functions is first registered in the reuse table as the function executes, with the function's own local variables excluded. An argument register whose first access is a read is registered as a function input, and a write to a return-value register is registered as a function output. Other register references need not be registered. Likewise for main memory references, an address whose first access is a read is registered as an input, and a write is registered as an output.

When the next function is called before returning from the current one, or when no disturbance occurs (such as the inputs/outputs to be registered exceeding the capacity of the reuse table, detection of a seventh argument word, or a system call or interrupt during execution), the input/output table entry being registered is validated when the return instruction is executed.

The reuse procedure, with reference to FIG. 19, is as follows. Before a function is called, (1) the function's start address is looked up, (2) the entries whose arguments match completely are selected, (3) all associated main memory addresses, that is, all Read addresses for which at least one Mask is valid, are referenced, and (4) a match comparison is performed. If all inputs match, (5) the registered outputs (return value, global variables, and A's local variables) are written back, so that execution of the function can be skipped.
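Steps (1) to (5) can be sketched in software as follows. This is a simplified model under stated assumptions: the classes `RFEntry` and `RBEntry` and the function `try_reuse` are hypothetical; each Mask/Value pair is modeled as a dictionary from byte address to byte value, only the V flag is modeled (LRU and SP are omitted), and the entries are checked one by one rather than in parallel as a CAM would do.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RBEntry:
    args: Tuple[int, ...]           # argument values (Args.)
    reads: Dict[int, int]           # byte address -> value expected (input side)
    return_values: Tuple[int, ...]  # Return Values
    writes: Dict[int, int]          # byte address -> value to write back (output side)
    valid: bool = True              # V flag


@dataclass
class RFEntry:
    start: int                                       # function start address (Start)
    rb: List[RBEntry] = field(default_factory=list)  # associated RB entries


def try_reuse(rf: Dict[int, RFEntry], start: int, args: Tuple[int, ...],
              memory: Dict[int, int]) -> Optional[Tuple[int, ...]]:
    """Return the reused return values, or None if the function must execute."""
    fn = rf.get(start)                      # (1) look up the start address
    if fn is None:
        return None
    for entry in fn.rb:
        if not entry.valid or entry.args != args:
            continue                        # (2) arguments must match completely
        # (3)-(4) read every registered address and compare with the entry.
        if all(memory.get(addr) == val for addr, val in entry.reads.items()):
            memory.update(entry.writes)     # (5) write back registered outputs
            return entry.return_values
    return None


# Usage with hypothetical addresses and values.
rf = {0x1000: RFEntry(0x1000, [RBEntry(args=(1, 2), reads={0x40: 7},
                                       return_values=(9,), writes={0x44: 1})])}
mem = {0x40: 7}
hit = try_reuse(rf, 0x1000, (1, 2), mem)
```

Keeping the argument match as the first filter reflects the reasoning above: the memory addresses worth comparing are only known once an entry with identical arguments has been selected.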

As an example, consider the case in which the instruction section shown in FIG. 20 is executed with the RF and RB configuration shown in FIG. 19. In FIG. 20, PC indicates the PC value at the start of the instruction section; that is, the instruction section begins at address 1000. FIG. 21 shows, in simplified form, the input addresses and input data and the output addresses and output data registered in RB when the instruction section of FIG. 20 is executed, and FIG. 22 shows the actual registration state of RB.

In the instruction on the first line (hereinafter simply "the first instruction", and likewise for the others), address constant A1 is set in register R0. In the second instruction, the 4-byte data (00110000) loaded from main memory at the address held in R0 is stored in register R1. In this case, address A1, mask (FFFFFFFF), and data (00110000) are registered as an input in the first column on the Input side of RB (in a mask, each byte position is FF if the byte is valid and 00 if it is invalid), and register number R1, mask (FFFFFFFF), and data (00110000) are registered as an output in the first column on the Output side of RB.

In the third instruction, address constant A2 is set in R0. In the fourth instruction, the 1-byte data (02) loaded from main memory at the address held in R0 is stored in R2. In this case, address A2, mask (FF000000), and data (02) are registered as an input in the second column on the Input side of RB. The remaining three bytes at address A2 are stored as "-", meaning don't care. Register number R2, mask (FFFFFFFF), and data (00000002) are registered as an output in the second column on the Output side of RB.

In the fifth instruction, the 1-byte data (22) loaded from address (A2+R2) is stored in R2. Because the value of R2 was (02), address (A2+02) and data (22) are additionally registered as an input in the second column on the Input side of RB. The data is entered in the byte position for address (A2+02), while the positions for addresses (A2+01) and (A2+03) remain "-" (don't care). The mask for address A2 therefore becomes (FF00FF00). Register number R2, mask (FFFFFFFF), and data (00000022) overwrite the second column on the Output side of RB.

In the sixth instruction, address constant A3 is set in R0. In the seventh instruction, the 1-byte data (33) loaded from main memory at the address held in R0 is stored in R3. In this case, address A3, mask (00FF0000), and data (33) are registered as an input in the third column on the Input side of RB. Register number R3, mask (FFFFFFFF), and data (00000033) are registered as an output in the third column on the Output side of RB.

In the eighth instruction, the 1-byte data (44) loaded from address (R1+R2) is stored in R4. Here, R1 and R2 are registers that were overwritten within the instruction section, so they are not inputs to the instruction section. The address A4 generated from (R1+R2), however, is an input to the instruction section, so address A4, mask (00FF0000), and data (44) are registered as an input in the fourth column on the Input side of RB. Register number R4, mask (FFFFFFFF), and data (00000044) are registered as an output in the fourth column on the Output side of RB.

In the ninth instruction, a value is read from R5, 1 is added to it, and the result is stored back into R5. In this case, register R5, mask (FFFFFFFF), and data (00000100) are registered as an input in the fifth column on the Input side of RB, and register number R5, mask (FFFFFFFF), and data (00000101) are registered as an output in the fifth column on the Output side of RB.

As described above, the following processing is performed when a memory location or register is read during instruction execution.
(1) The Output side of RB is searched; if the address/register number being read is already registered there, processing ends without registering it on the Input side.
(2) If it is not on the Output side, the Input side of RB is searched; if the address/register number is already registered there, processing ends without registering it again.
(3) If it is on neither side, a new entry is added to RB, and the address/register number and its value are registered.

The following processing is performed when a memory location or register is written during instruction execution.
(1) The Output side of RB is searched; if the address/register number being written is already registered, its value is updated and processing ends.
(2) If it is not on the Output side, a new entry is added, and the address/register number being written and its value are registered.
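The read and write rules above can be replayed against part of the FIG. 20 walkthrough. The following is a minimal sketch: the class `InputOutputRecord` and the function `word_mask` are hypothetical, memory is tracked byte by byte, registers as whole words, and the numeric value of A2 is a hypothetical stand-in, while the data values (02), (22), (100), and (101) are those given in the walkthrough.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable


@dataclass
class InputOutputRecord:
    """Simplified RB: keys are ("reg", name) or ("mem", byte_address)."""
    inputs: Dict[Hashable, int] = field(default_factory=dict)
    outputs: Dict[Hashable, int] = field(default_factory=dict)

    def on_read(self, key: Hashable, value: int) -> None:
        if key in self.outputs:   # read rule (1): produced inside the section
            return
        if key in self.inputs:    # read rule (2): already an input
            return
        self.inputs[key] = value  # read rule (3): new input entry

    def on_write(self, key: Hashable, value: int) -> None:
        self.outputs[key] = value  # write rules (1)/(2): update or add


def word_mask(inputs: Dict[Hashable, int], word_addr: int) -> str:
    """Mask for one 4-byte word: FF per registered byte, 00 otherwise."""
    return "".join("FF" if ("mem", word_addr + i) in inputs else "00"
                   for i in range(4))


A2 = 0x2000                           # hypothetical word-aligned address
memory = {A2: 0x02, A2 + 2: 0x22}
rb = InputOutputRecord()

rb.on_write(("reg", "R0"), A2)        # 3rd: R0 <- A2
rb.on_read(("reg", "R0"), A2)         # 4th: R2 <- byte at [R0]
rb.on_read(("mem", A2), memory[A2])
rb.on_write(("reg", "R2"), memory[A2])
r2 = rb.outputs[("reg", "R2")]        # 5th: R2 <- byte at [R0 + R2]
rb.on_read(("reg", "R0"), A2)
rb.on_read(("reg", "R2"), r2)
rb.on_read(("mem", A2 + r2), memory[A2 + r2])
rb.on_write(("reg", "R2"), memory[A2 + r2])
rb.on_read(("reg", "R5"), 0x100)      # 9th: R5 <- R5 + 1
rb.on_write(("reg", "R5"), 0x101)
```

After the replay, R0 and R2 appear only on the output side because they were written before being read, and the word at A2 carries the mask FF00FF00 described for the fifth instruction.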

Patent Document 1, listed below, discloses a configuration in which multiple processors are provided to perform parallel pre-execution within a reuse mechanism like the one described above. As the input prediction method for this parallel pre-execution, it discloses stride prediction based on the most recently appearing arguments and the difference between the two most recently appearing sets of arguments.

With such input prediction, when the input parameters keep changing monotonically, results computed in advance from the predicted inputs can be reused effectively.

Non-Patent Document 1: Yasuhiko Nakashima, Katsuya Ogata, Shingo Masanishi, Masahiro Goshima, Shin-ichiro Mori, Toshiaki Kitamura, Shinji Tomita, "A Speed-up Technique Using Function Value Reuse and Parallel Pre-execution" (関数値再利用および並列事前実行による高速化技術), IPSJ Transactions: High Performance Computing Systems, HPS5, pp. 1-12, Sep. 2002 (published September 15, 2002).
Patent Document 1: JP 2004-258905 A (published September 16, 2004).

FIG. 23 shows an example of the history registered on the Input side of RB when the instruction section of FIG. 20 is executed repeatedly. In this example, the instruction section is executed once for each value of Time from 1 to 4; on successive executions, the value at address A2 changes as (02), (03), (04), (05), and the values of the other input elements change accordingly.

The diff shown between successive history rows indicates the change in the value of the corresponding input element. The conventional input prediction described above makes its predictions from this diff. FIG. 24 shows the results of this conventional prediction.
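The conventional diff-based prediction can be sketched as follows, assuming each input element is predicted independently from the last two values in its history column; the function name `stride_predict` is hypothetical, and Patent Document 1 may differ in detail.

```python
from typing import List, Sequence


def stride_predict(history: Sequence[int], steps: int = 1) -> List[int]:
    """Predict the next `steps` values of one input element by stride.

    The stride is the diff between the two most recent values; with a
    single value the stride is taken as 0.
    """
    if not history:
        raise ValueError("history must not be empty")
    last = history[-1]
    stride = last - history[-2] if len(history) >= 2 else 0
    return [last + stride * k for k in range(1, steps + 1)]


# The values at address A2 from FIG. 23 follow a constant stride of 1.
a2_next = stride_predict([0x02, 0x03, 0x04, 0x05], steps=2)
# A hypothetical irregular column extrapolates from its last two values only.
irregular_next = stride_predict([7, 1, 9])
```

For A2 this yields 06 and 07, matching a loop control variable; for an irregular column it simply extrapolates the last diff, which is why such predictions rarely hit.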

Values at addresses that change monotonically, such as a loop control variable (address A2 in this example), can be predicted accurately. However, when the instruction section accesses array elements, the element values generally do not change monotonically even if the subscripts do. In the example of FIG. 23, the value loaded from address A2 serves as an array subscript, and because the main memory references that use this subscript as an address change their addresses, the number of input elements registered in the history itself changes. In this situation the changes within a column lose their regularity, and, as shown in the column corresponding to address A3 in FIG. 24, the prediction hit rate deteriorates severely.

When performing input prediction, predicting values at addresses whose contents never change wastes hardware resources. When a value changes without regularity, the only option is to predict by assuming a difference of 0, but forcing such a prediction can actually lower the hit rate. In the example of FIG. 24, for the address corresponding to A2+4, the change in the mask position itself would have to be predicted, which is difficult. In such a case it is better to reference the main memory value directly instead of predicting it.
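The reasoning above suggests sorting each input element's history into three cases. The following is a minimal sketch of one such policy; it is a hypothetical illustration of the preceding paragraph, not the distinction mechanism claimed by this patent, whose details are not given in this section. Here `None` marks an execution in which the element was not registered, for example because the referenced address, and hence the mask position, moved.

```python
from typing import Optional, Sequence


def classify_history(column: Sequence[Optional[int]]) -> str:
    """Hypothetical per-element policy following the reasoning above."""
    if len(column) < 2:
        return "insufficient history"
    if any(v is None for v in column):
        # The element appeared in some executions only: its position moved.
        return "irregular: read main memory directly"
    diffs = [b - a for a, b in zip(column, column[1:])]
    if all(d == 0 for d in diffs):
        return "constant: no prediction needed"
    if all(d == diffs[0] for d in diffs):
        return "constant stride: predict"
    return "irregular: read main memory directly"
```

Under this policy only the middle case consumes prediction hardware; constant elements are reused as registered, and irregular ones are fetched from main memory at pre-execution time.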

All of the above problems arise from treating every registered address in the same way.

The present invention has been made to solve the above problems. Its object is to provide a data processing apparatus, a data processing program, and a recording medium on which the data processing program is recorded. The apparatus reads instruction sequences and/or values from main storage means and writes the results of arithmetic processing back to the main storage means, and it achieves more effective pre-execution of instruction sections by improving the prediction hit rate.

To solve the above problems, a data processing apparatus according to the present invention reads instruction sections from main storage means and writes the results of arithmetic processing to the main storage means. It comprises:
- first arithmetic means that performs operations based on instruction sections read from the main storage means;
- registers used when the first arithmetic means reads from and writes to the main storage means; and
- input/output storage means that stores input patterns and output patterns as the execution results of a plurality of instruction sections.

When the first arithmetic means executes an instruction section and that section's input pattern matches an input pattern stored in the input/output storage means, the apparatus performs reuse processing: it outputs the output pattern stored with that input pattern to the registers and/or the main storage means.

The apparatus further comprises:
- discrimination processing means that, when the execution result of an instruction section by the first arithmetic means is stored in the input/output storage means, distinguishes the input elements of the input pattern that should be predicted from those that need not be, and registers this discrimination information in the input/output storage means;
- prediction processing means that, based on the discrimination information, predicts changes in the values of those stored input elements that should be predicted; and
- second arithmetic means that pre-executes the corresponding instruction section based on the input elements predicted by the prediction processing means.

The results of pre-execution by the second arithmetic means are stored in the input/output storage means.

In the above configuration, the input/output storage means stores input and output patterns as the execution results of a plurality of instruction sections. When an instruction section is executed and its input pattern matches a stored input pattern, the stored result is reused. The prediction processing means predicts future changes in the input elements stored in the input/output storage means, and the second arithmetic means pre-executes the instruction section based on this prediction.

If input elements are predicted indiscriminately, as in the prior art described above, the prediction hit rate is low and prediction-based pre-execution yields very little benefit. The above configuration addresses this in two steps:
1. The discrimination processing means first separates the input elements of an input pattern that should be predicted from those that need not be.
2. The prediction processing means then predicts only the input elements judged to require prediction.

This improves the prediction hit rate and so enables more effective pre-execution of instruction sections. When the same instruction sequence later appears again with inputs equal to the predicted input values, the values stored in the instruction sequence storage means can be reused.

In the above configuration, the data processing apparatus according to the present invention may set a constant flag as discrimination information. For each address of a register used as input, the discrimination processing means sets the flag when the register is used as a stack pointer or frame pointer, or when the instruction that writes to that address is a constant-set instruction. In all other cases it resets the flag for that address.

With this configuration, among the registers used as input, the constant flag is set on those register addresses whose address is fixed and whose value is expected to change monotonically. Predicting only the input elements derived from register addresses with the constant flag set therefore improves the prediction hit rate.
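The constant-flag rule can be sketched in a few lines of Python. This is a hypothetical software model. The dictionary representation, the function name and the `written_by_constant_set` parameter are assumptions made for illustration. It also assumes the SPARC convention that `%sp` and `%fp` are the aliases of `%o6` and `%i6`.

```python
# Hypothetical model of the Const-FLAG rule for registers used as inputs.
# %sp and %fp are the SPARC aliases of %o6 and %i6.
STACK_OR_FRAME_POINTERS = {"%sp", "%fp"}


def update_const_flag(const_flags, reg, written_by_constant_set):
    """Set the flag for a stack/frame pointer or a constant-set write; reset it otherwise."""
    const_flags[reg] = reg in STACK_OR_FRAME_POINTERS or written_by_constant_set
    return const_flags[reg]


flags = {}
update_const_flag(flags, "%sp", written_by_constant_set=False)
update_const_flag(flags, "%o1", written_by_constant_set=True)
update_const_flag(flags, "%o2", written_by_constant_set=False)
print(flags)  # {'%sp': True, '%o1': True, '%o2': False}
```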

The apparatus may also use a change flag as discrimination information. When an input element is newly stored in the input/output storage means, the discrimination processing means resets the change flag for that element's address. If a store instruction is later executed on that address, the discrimination processing means sets the change flag for it.

With this configuration, the change flag stays reset for any address that was stored in the input/output storage means but has never been written since. The contents of such an address have not changed, so there is no need to predict it. A change flag on each input element address therefore limits prediction to the addresses that actually need it, which lets the hardware resources for prediction be used effectively.
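As a rough illustration, the change-flag (C-FLAG) lifecycle can be modelled as a small table keyed by registered Read address. The class and method names are hypothetical, and the hardware would hold this state in the RB rather than in a dictionary.

```python
class ChangeFlags:
    """Hypothetical per-address C-FLAG table for registered Read addresses."""

    def __init__(self):
        self.c_flag = {}  # registered address -> change flag

    def register_input(self, addr):
        # A new input element is stored in the reuse table: reset its change flag.
        self.c_flag[addr] = False

    def on_store(self, addr):
        # A store to an already registered address sets its change flag.
        if addr in self.c_flag:
            self.c_flag[addr] = True

    def needs_prediction(self, addr):
        # Never written since registration -> contents unchanged -> no prediction needed.
        return self.c_flag.get(addr, False)


flags = ChangeFlags()
flags.register_input(0xC000)
flags.register_input(0xC004)
flags.on_store(0xC004)
print(flags.needs_prediction(0xC000), flags.needs_prediction(0xC004))  # False True
```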

The apparatus may also use a history flag as discrimination information. When an input element is newly stored in the input/output storage means, the discrimination processing means resets the history flag for that element's address. When a load instruction on that address is executed, the discrimination processing means sets the history flag if the constant flag is set on the register address from which the address was generated.

With this configuration, when a load instruction is executed on the address of an input element stored in the input/output storage means, the history flag for that address is set if the register address that generated it has the constant flag set. As described above, a register address with the constant flag set is one whose address is fixed and whose value is expected to change monotonically. Predictions for addresses generated from such register addresses are therefore expected to have a high hit rate. The history flag thus allows the addresses to be predicted to be selected appropriately.

The history flag may be implemented literally, as a flag on each address. Alternatively, for an address holding several bytes of data, it may take the form of a mask indicating which byte positions are kept in the history.
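The sketch below follows the first of these two forms, one bit per registered address. The history flag is set on a load only when the base register that produced the address carries the constant flag. The class name, the `base_reg` parameter and the register values are hypothetical.

```python
class HistoryFlags:
    """Hypothetical per-address history flag, kept as one bit per address
    (the byte-mask form would work equally well)."""

    def __init__(self, const_flags):
        self.const_flags = const_flags  # register name -> Const-FLAG
        self.h_flag = {}                # registered address -> history flag

    def register_input(self, addr):
        # Newly stored input element: reset its history flag.
        self.h_flag[addr] = False

    def on_load(self, addr, base_reg):
        # Set the flag if the load address was generated from a register
        # whose constant flag is set.
        if addr in self.h_flag and self.const_flags.get(base_reg, False):
            self.h_flag[addr] = True


hist = HistoryFlags(const_flags={"%fp": True, "%o3": False})
hist.register_input(0xA000)
hist.register_input(0xB000)
hist.on_load(0xA000, base_reg="%fp")  # address derived from %fp
hist.on_load(0xB000, base_reg="%o3")  # address derived from an ordinary register
print(hist.h_flag)  # {40960: True, 45056: False}
```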

The apparatus may also combine these flags as follows:
- When an input element is newly stored in the input/output storage means, the discrimination processing means resets the change flag for that element's address.
- If a store instruction is later executed on that address, the discrimination processing means sets the change flag.
- The prediction processing means predicts changes only for those stored input element addresses on which both the change flag and the history flag are set.

As described above, an address with the change flag set is one where prediction can be expected to pay off, and an address with the history flag set is one where a high prediction hit rate can be expected. With this configuration, prediction is therefore performed only for addresses where it is expected to be most effective, which lets the hardware resources for prediction be used effectively.
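Combining the two flags amounts to a simple filter over the registered addresses. The tuple layout below is a hypothetical representation chosen for illustration.

```python
def addresses_to_predict(entries):
    """entries: iterable of (address, change_flag, history_flag) tuples.
    Only addresses with both flags set are handed to the predictor."""
    return [addr for addr, changed, tracked in entries if changed and tracked]


table = [
    (0xC000, False, True),  # never written since registration
    (0xC004, True, False),  # written, but not derived from a Const-FLAG register
    (0xC008, True, True),   # written and derived from a Const-FLAG register
]
print([hex(a) for a in addresses_to_predict(table)])  # ['0xc008']
```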

The apparatus may also restrict prediction further. The prediction processing means predicts value changes only for those stored input elements whose value change in the history is non-zero.

With this configuration, value changes are predicted only for input elements whose value change in the history is non-zero. An input element whose recorded change is 0 is expected not to change at all, so it need not be predicted. This again limits prediction to the addresses that actually need it, which lets the hardware resources for prediction be used effectively.
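A minimal version of this test is sketched below. The text speaks only of "the change amount in the history". Using the most recent difference between two recorded values is an assumption made here, as is the choice not to predict when fewer than two values exist.

```python
def should_predict(history):
    """Skip prediction when the latest recorded change of an input element is zero.

    Assumption: the "change amount in the history" is taken to be the most
    recent difference; with fewer than two samples no prediction is made.
    """
    if len(history) < 2:
        return False
    return history[-1] - history[-2] != 0


print(should_predict([4, 4, 4]))   # False: value unchanged, read it directly
print(should_predict([4, 8, 12]))  # True: value is moving, worth predicting
```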

As described above, the data processing apparatus according to the present invention further comprises:
- discrimination processing means that, when the execution result of an instruction section by the first arithmetic means is stored in the input/output storage means, distinguishes the input elements of the input pattern that should be predicted from those that need not be, and registers this discrimination information in the input/output storage means;
- prediction processing means that, based on the discrimination information, predicts changes in the values of the stored input elements that should be predicted; and
- second arithmetic means that pre-executes the corresponding instruction section based on the input elements predicted by the prediction processing means.

The results of pre-execution by the second arithmetic means are stored in the input/output storage means.

This improves the prediction hit rate and thus enables more effective pre-execution of instruction sections. When the same instruction sequence later appears again with inputs equal to the predicted input values, the values stored in the instruction sequence storage means can be reused.

An embodiment of the present invention is described below with reference to FIGS. 1 to 16.

(Configuration of the data processing apparatus)
FIG. 2 shows the schematic configuration of the data processing apparatus according to this embodiment. The apparatus comprises:
- an MSP (Main Stream Processor) 1A;
- an SSP (Shadow Stream Processor) 1B;
- an RF/RB (instruction sequence storage means) 2, which serves as the reuse table; and
- a main memory (main storage means) 3.

The apparatus reads program data and the like stored in the main memory 3, performs various arithmetic operations, and writes the results to the main memory 3. The configuration in the figure has one SSP 1B, but two or more may be provided.

The RF/RB 2 is memory means that stores the data used to reuse functions and loops in a program. It includes an RB registration processing unit (discrimination processing means) 2A and a prediction processing unit (prediction processing means) 2B. The RF/RB 2, the RB registration processing unit 2A and the prediction processing unit 2B are described in detail later.

The main memory 3 is the work area of the MSP 1A and the SSP 1B and is implemented by, for example, RAM (Random Access Memory). Programs and data are read into the main memory 3 from external storage such as a hard disk. The MSP 1A and the SSP 1B then perform their operations on the data in the main memory 3.

The MSP 1A comprises an RW (reuse storage means) 4A, an arithmetic unit (first arithmetic means) 5A, registers 6A and a Cache 7A. Likewise, the SSP 1B comprises an RW (reuse storage means) 4B, an arithmetic unit (second arithmetic means) 5B, registers 6B and a Cache/Local 7B.

The RWs 4A and 4B are reuse windows. They hold, as a ring-structured stack, the RF and RB entries (described later) that are currently being executed and registered. In hardware, each RW is a set of control lines that activate specific entries in the RF/RB 2.

The arithmetic units 5A and 5B are ALUs (arithmetic and logical units) that operate on the data held in the registers 6A and 6B. The registers 6A and 6B are storage means that hold the data on which the arithmetic units 5A and 5B operate. In this embodiment, the arithmetic units 5A and 5B and the registers 6A and 6B conform to the SPARC architecture. The Caches 7A and 7B act as cache memories between the main memory 3 and the MSP 1A and SSP 1B. In the SSP 1B, the Cache 7B also contains Local 7B, a local memory.

(Configuration of the RF/RB)
FIG. 1 shows the reuse table implemented by the RF/RB 2 in this embodiment. The RF stores multiple entries, and each entry holds the following fields:
- V: whether the entry is valid;
- LRU: a hint for entry replacement;
- Start: the start address of the function;
- Read/Write: the main memory addresses to be referenced;
- F/L: whether the entry is a function or a loop.

The RB stores multiple entries corresponding to those in the RF. Each RB entry holds the following fields:
- V: whether the entry is valid;
- LRU: a hint for entry replacement;
- SP: the stack pointer %sp immediately before the function or loop is called;
- Args.: the arguments (V: valid entry, Val.: value);
- main memory values: C-FLAG (change flag of the Read address), P-Mask (history mask of the Read address), Mask (valid bytes of the Read/Write address) and Value (the value);
- Return Values: the return values (V: valid entry, Val.: value);
- End: the end address of the loop;
- taken/not: the branch direction at loop exit;
- Regs., CC: registers and condition codes other than the arguments and return values.

The RB also has a memory area that stores a constant flag (Const-FLAG) for each of one or more register addresses. The constant flag is described in detail later.

Each field of the RF and RB is now described in more detail. V indicates whether the entry is valid. It holds 0 for an unregistered entry, 2 for an entry being registered, and 1 for a registered entry. When an RF or RB entry is to be allocated, an unregistered entry (V=0) is used if one exists. Otherwise, the registered entry (V=1) with the smallest LRU value is selected and overwritten. An entry being registered (V=2) is in use and cannot be overwritten.
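The allocation policy just described can be sketched as follows. Each entry is represented as a dictionary with hypothetical keys `"V"` and `"LRU"`, and ties in LRU are broken by taking the lowest index, which is also an assumption.

```python
UNREGISTERED, REGISTERED, REGISTERING = 0, 1, 2  # values of the V field


def choose_entry(entries):
    """Return the index of the RF/RB entry to (re)use, or None if none may be used.

    entries: list of dicts with keys "V" and "LRU" (a hypothetical representation).
    """
    for i, entry in enumerate(entries):
        if entry["V"] == UNREGISTERED:
            return i  # a free entry is always preferred
    registered = [i for i, entry in enumerate(entries) if entry["V"] == REGISTERED]
    if not registered:
        return None  # all entries are being registered and cannot be overwritten
    return min(registered, key=lambda i: entries[i]["LRU"])


table = [{"V": 2, "LRU": 0}, {"V": 1, "LRU": 5}, {"V": 1, "LRU": 2}]
print(choose_entry(table))  # 2
```

Entry 0 has the smallest LRU value but is still being registered, so it is skipped.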

LRU is the number of 1s in a shift register that is shifted right at fixed time intervals. In the RF, a 1 is written at the left end of this register whenever the entry is registered for reuse or reuse of it is attempted. In the RB, a 1 is written when the entry is actually reused. In both cases, a frequently used entry has a large LRU value, and an entry that has gone unused for a certain period has an LRU value of 0.
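A minimal Python model of this LRU counter is shown below. The register width and the method names are assumptions. The text specifies only the left-end write, the periodic right shift and the count of 1s.

```python
class LruCounter:
    """Hypothetical LRU field: count of 1s in a periodically right-shifted register."""

    def __init__(self, width=8):  # register width is an assumption
        self.width = width
        self.bits = 0

    def touch(self):
        # RF: registration or attempted reuse; RB: actual reuse.
        self.bits |= 1 << (self.width - 1)  # write a 1 at the left end

    def tick(self):
        # Called at a fixed time interval; the rightmost bit is discarded.
        self.bits >>= 1

    @property
    def lru(self):
        return bin(self.bits).count("1")


c = LruCounter(width=4)
c.touch()
c.tick()
c.touch()
print(c.lru)  # 2
for _ in range(4):
    c.tick()
print(c.lru)  # 0
```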

Next, the Mask field of the main memory values in the RB. Addresses and data could in principle be managed one byte at a time, but in practice cache references are faster when data is managed in 4-byte units. The RF therefore stores main memory addresses as multiples of 4. With a 4-byte unit, however, the table must record which of the 4 bytes are valid, so that single-byte loads can be handled. Mask is the 4-bit field that records this. For example, if one byte is loaded from address C001 and its value is E8, then address C000 is registered in the RF, and 0100 is registered in the RB Mask and 00E80000 in Value. The change flag (C-FLAG) and history mask (P-Mask) of Read addresses are described in detail later.
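The C001/E8 example can be reproduced with a short Python sketch. The function name is hypothetical. The sketch assumes big-endian byte order, as on SPARC, so byte offset 0 is the most significant byte of the word and maps to the leftmost Mask bit.

```python
def register_byte_load(addr, byte_value):
    """Return (RF address, RB Mask, RB Value) for a 1-byte load.

    Assumes big-endian byte order (as on SPARC): byte offset 0 is the most
    significant byte of the 4-byte word and corresponds to the leftmost Mask bit.
    """
    word_addr = addr & ~0x3       # RF stores the address as a multiple of 4
    offset = addr & 0x3           # byte position within the word
    mask = 1 << (3 - offset)      # one valid-byte bit per byte, leftmost = offset 0
    value = (byte_value & 0xFF) << (8 * (3 - offset))
    return word_addr, format(mask, "04b"), format(value, "08X")


addr, mask, value = register_byte_load(0xC001, 0xE8)
print(hex(addr), mask, value)  # 0xc000 0100 00E80000
```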

Next, the registers and condition codes other than the arguments and return values (Regs., CC). This embodiment uses the following SPARC architecture registers (details later):
- the general-purpose registers %g0-7, %o0-7, %l0-7 and %i0-7;
- the floating-point registers %f0-31;
- the condition code register ICC; and
- the floating-point condition code register FCC.

A leaf function takes its inputs in %o0-5 and returns its outputs in %o0-1. A non-leaf function takes its inputs in %i0-5 and returns its outputs in %i0-1. Inputs are registered in arg[0-5] and outputs in rti[0-1]. Under the SPARC ABI no other registers serve as function inputs or outputs, so for functions the argument field (Args.) of the RB is sufficient.

For loops, however, the SPARC ABI does not restrict which registers carry inputs and outputs. Identifying a loop's inputs and outputs therefore requires registering every register type in the RB. Accordingly, %g0-7, %o0-7, %l0-7, %i0-7, %f0-31, ICC and FCC are all registered in the Regs., CC field of the RB.
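The difference between the function and loop cases can be summarised in a small Python sketch. The function names are hypothetical. The register lists follow the two paragraphs above: ABI-defined argument and return registers for functions, and every register plus ICC and FCC for loops.

```python
def abi_function_registers(is_leaf):
    """Registers recorded as arg[0-5] and rti[0-1] for a function call.

    Leaf functions use %o0-5 / %o0-1; non-leaf functions use %i0-5 / %i0-1.
    """
    prefix = "%o" if is_leaf else "%i"
    args = [f"{prefix}{n}" for n in range(6)]
    returns = [f"{prefix}{n}" for n in range(2)]
    return args, returns


def loop_registers():
    """All registers a loop entry must cover, since the ABI does not restrict them."""
    gprs = [f"%{bank}{n}" for bank in "goli" for n in range(8)]
    fprs = [f"%f{n}" for n in range(32)]
    return gprs + fprs + ["ICC", "FCC"]


print(abi_function_registers(is_leaf=False)[1])  # ['%i0', '%i1']
print(len(loop_registers()))  # 66
```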

In the RF/RB 2, then, Read addresses are managed collectively by the RF, while Mask and Value are managed by the RB. This arrangement makes it possible to compare the contents of a Read address against multiple RB entries at once using a CAM (content-addressable memory), as explained in more detail below.

A memory that, given an address, returns the value stored at that address is called RAM. A CAM, by contrast, is an associative memory. Given the content to search for, it turns on the signal of the entry that matches. A CAM is normally used together with a RAM.

The following example shows how a CAM and a RAM work together. Suppose the CAM holds the entries "5,5,5,5,5", "1,3,1,1,1", "1,3,3,5,2" and "6,6,6,6,6". Suppose also that the RAM holds "5,5", "1,1", "1,2" and "6,6", one for each CAM entry. When "1,3,3,5,2" is given to the CAM as the search key, the matching entry turns on and the RAM outputs the corresponding data, "1,2". The RB is implemented with the same structure and operation as this example.
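This CAM/RAM pairing can be modelled in a few lines of Python using the data from the example. The class is a hypothetical software analogue. A real CAM compares all entries in parallel in a single cycle, whereas this sketch scans a list.

```python
class CamWithRam:
    """Toy model of a CAM paired with a RAM: the CAM holds search keys,
    the RAM holds the data for the matching entry."""

    def __init__(self):
        self.cam = []
        self.ram = []

    def add(self, key, data):
        self.cam.append(tuple(key))
        self.ram.append(tuple(data))

    def lookup(self, key):
        key = tuple(key)
        # Hardware compares all entries at once; this list models the match lines.
        match_lines = [entry == key for entry in self.cam]
        for i, matched in enumerate(match_lines):
            if matched:
                return self.ram[i]
        return None


table = CamWithRam()
table.add((5, 5, 5, 5, 5), (5, 5))
table.add((1, 3, 1, 1, 1), (1, 1))
table.add((1, 3, 3, 5, 2), (1, 2))
table.add((6, 6, 6, 6, 6), (6, 6))
print(table.lookup((1, 3, 3, 5, 2)))  # (1, 2)
```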

(Outline of reuse processing)
An outline of reuse processing is now given, first for functions and then for loops.

First, functions. Registration of a function can be disturbed before the function returns, for example when:
- the inputs/outputs to be registered exceed the capacity of the reuse table;
- a seventh argument word is detected; or
- a system call or interrupt occurs midway.

If the next function is called before the current function returns, or if no such disturbance occurs, the input/output table entry being registered is validated when the return instruction is executed.

Referring to FIG. 1, the following steps are performed before a function is called:
1. The RF entries are searched for one whose function start address matches the start address of the function.
2. If there is a match, an RB entry for that function is selected whose arguments exactly match those of the function being called.
3. All related main memory addresses are referenced from the RF, that is, every Read address with at least one valid Mask.
4. The contents of these addresses are compared with those registered in the RB.
5. If all inputs match, the outputs registered in the RB (return value, global variables, and local variables of A) are written back to the main memory 3.

Execution of the function is thereby skipped; in other words, the function is reused.
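The five steps can be sketched as follows. This is a deliberately simplified, hypothetical model:
- the RF is reduced to a set of start addresses;
- the RB is a dictionary from start address to a list of entries;
- Mask handling is omitted;
- all outputs are modelled as main memory writes, so return values written to registers are not shown separately.

```python
from dataclasses import dataclass


@dataclass
class RBEntry:
    """Hypothetical, simplified RB entry for one function (Mask handling omitted)."""
    args: tuple    # argument register values
    reads: dict    # Read address -> value observed when registered
    outputs: dict  # address -> value to write back on reuse


def try_reuse_function(rf_starts, rb, start_addr, args, memory):
    """Steps (1)-(5) before a call; return True if execution can be skipped."""
    if start_addr not in rf_starts:                  # (1) search RF by start address
        return False
    for entry in rb.get(start_addr, []):
        if entry.args != tuple(args):                # (2) arguments must match exactly
            continue
        if all(memory.get(a) == v for a, v in entry.reads.items()):  # (3)-(4)
            memory.update(entry.outputs)             # (5) write back outputs
            return True
    return False


memory = {0x100: 5}
rb = {0x4000: [RBEntry(args=(1, 2), reads={0x100: 5}, outputs={0x200: 42})]}
print(try_reuse_function({0x4000}, rb, 0x4000, (1, 2), memory), memory[0x200])  # True 42
```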

Next, loops. Input/output registration for a loop can be aborted, for example by a return from the function before the loop completes or by one of the disturbances described above. If it is not aborted, the input/output table entry being registered is validated when the backward branch instruction corresponding to that loop is detected, and registration of the loop is complete.

If the backward branch is taken, the apparatus also determines whether the next iteration of the loop can be reused. Referring to FIG. 1, the following steps are performed before the backward branch:
1. The RF entries are searched for one whose loop start address matches the start address of the loop.
2. If there is a match, an RB entry for that loop is selected whose register input values exactly match the current register input values of the loop.
3. All related main memory addresses are referenced from the RF.
4. Their contents are compared with those registered in the RB.
5. If all inputs match, the outputs registered in the RB (register and main memory output values) are written back to the main memory 3.

Execution of the loop iteration is thereby skipped; in other words, the loop is reused.

When the loop is reused, the same processing is repeated for the following iteration, based on the branch direction registered in the RB. If the next loop cannot be reused, it is executed normally, and its registration into the RF and RB begins.
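The iteration-by-iteration behaviour can be sketched as follows, assuming a hypothetical `lookup` callable that stands in for steps (1) to (4) and returns the registered outputs together with the recorded branch direction (`taken`) of a matching RB entry, or `None` when no entry matches. `max_iters` is an arbitrary safety bound added for the sketch.

```python
def reuse_loop_iterations(lookup, state, max_iters=1000):
    """Reuse consecutive iterations while RB entries keep matching.

    Returns (iterations_reused, loop_exited). loop_exited is False when the
    next iteration has to be executed normally (and registered).
    """
    reused = 0
    while reused < max_iters:
        hit = lookup(state)                 # stands in for steps (1)-(4)
        if hit is None:
            return reused, False            # not reusable: execute and register it
        outputs, taken = hit
        state.update(outputs)               # step (5): write back registered outputs
        reused += 1
        if not taken:                       # recorded branch direction: the loop ends
            return reused, True
    return reused, False
```

For example, with RB entries recorded for three iterations of a summing loop, starting from matching inputs reuses all three and stops at the iteration whose recorded branch was not taken.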

(Processing flow during execution of an instruction section)
Next, the specific processing performed when an instruction is decoded is described for each of the following cases: the decoded instruction is a function call instruction, a function return instruction, a taken backward branch, a not-taken backward branch, or any other instruction.

(Function call instruction)
The processing performed when the decoded instruction is a function call instruction is described below with reference to the flowchart in FIG. 3. First, in step 1 (hereinafter S1), it is determined whether the seventh word of the arguments has been detected. If YES in S1, that is, the seventh argument word has been detected, all RB entries being registered that are recorded in the RW are invalidated, the process proceeds to S6, the program counter is moved to the head of the function, and the processing ends.

If NO in S1, that is, the seventh argument word has not been detected, the RF and RB are searched to determine whether this function call and its input values are registered (S2). If YES in S2, that is, the function call and its input values are registered in the RF and RB, the process proceeds to S7, described later.

If NO in S2, that is, the function call and its input values are not registered in the RF and RB, an attempt is made to secure an RF entry and an RB entry for the function by determining whether (1) an RF entry already exists, (2) an RF entry is available other than those that cannot be evicted because their registration is in progress, or (3) an RB entry is available other than those that cannot be evicted because their registration is in progress (S3).

If NO in S3, that is, no usable RF/RB entry is available, registration is not started: all RBs registered in the RW are invalidated (S5), and the RW is emptied. If YES in S3, that is, a usable RF/RB entry is available, an RF entry and an RB entry for the function are secured and registered in the RW (S4). If this registration causes the RW to overflow, the oldest RW entry is deleted and its corresponding RB is invalidated. After S4 or S5, the program counter is advanced to the head of the function (S6), and the processing ends.

If YES in S2, that is, the function call and its input values are registered in the RF and RB, the function is reusable. The output values are obtained from the RB and written to the registers and main memory 3 (S7). Next, it is determined whether any function/loop being registered is recorded in the RW (S8); if so, the required part of the contents of the reused function's RB entry is added to the entries recorded in the RW (S9). This proceeds in order from the TOP of the RW, and if an RB overflows partway through, the RBs from that point down to the BOTTOM of the RW are invalidated and deleted from the RW. The program counter is then advanced to the next instruction (S10), and the processing ends.
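A simplified sketch of the FIG. 3 flow follows. It models the RW as a bounded stack whose index 0 is the TOP, and reduces S1, S2, and S3 to boolean inputs (`seventh_arg_seen`, `reusable`, `can_allocate`); the output write-back of S7 and the merge of S9 are omitted. `Slot`, `ReuseWindow`, and the capacity `depth` are hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """Hypothetical RF/RB pair being registered (one RW entry)."""
    name: str
    invalid: bool = False


@dataclass
class ReuseWindow:
    depth: int                                   # assumed RW capacity
    stack: list = field(default_factory=list)    # stack[0] is the TOP (newest)

    def push(self, slot):
        self.stack.insert(0, slot)
        if len(self.stack) > self.depth:         # overflow: drop the oldest entry
            self.stack.pop().invalid = True      # and invalidate its RB

    def clear(self):
        for slot in self.stack:
            slot.invalid = True
        self.stack.clear()


def on_function_call(rw, name, func_addr, next_pc, seventh_arg_seen, reusable, can_allocate):
    """Return the next PC for a decoded call instruction (FIG. 3, simplified)."""
    if seventh_arg_seen:          # S1 YES: arguments spill to the stack, stop registering
        rw.clear()
        return func_addr          # S6
    if reusable:                  # S2 YES: S7-S9 would write back outputs and merge them
        return next_pc            # S10: skip the function body
    if can_allocate:              # S3 YES
        rw.push(Slot(name))       # S4
    else:
        rw.clear()                # S5
    return func_addr              # S6
```

With `depth=2`, three nested non-reusable calls leave only the two innermost slots on the RW and invalidate the outermost one, mirroring the overflow rule in S4.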

(Function return instruction)
The processing performed when the decoded instruction is a function return instruction is described below with reference to the flowchart in FIG. 4. The RW is traversed in order from its TOP (S11), and it is determined whether an RB for a loop is encountered before the RF/RB corresponding to the function is found (S12). If a loop RB is encountered (YES in S12), all such RBs are invalidated and deleted from the RW (S13).

It is also determined whether the RF/RB corresponding to the function is found during the RW search (S14). If it is found (YES in S14), that RB entry is made valid and deleted from the RW (S15).

The return instruction is then executed (S16), and the processing ends.
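The FIG. 4 flow can be sketched as below, assuming the RW is a Python list of dictionaries whose index 0 is the TOP and whose `kind` field distinguishes functions from loops. Matching the returning function by a `name` field is an assumption standing in for the RF/RB correspondence the hardware uses.

```python
def on_function_return(rw, func_name):
    """S11-S15 of FIG. 4; rw is a list of dicts and rw[0] is the TOP."""
    while rw:                                  # S11: walk from the TOP
        slot = rw[0]
        if slot["kind"] == "loop":             # S12 YES: the loop cannot be completed
            slot["state"] = "invalid"          # S13
            rw.pop(0)
        elif slot["name"] == func_name:        # S14 YES: the function's registration is done
            slot["state"] = "valid"            # S15
            rw.pop(0)
            break
        else:                                  # an outer function: leave it untouched
            break
    # S16: the return instruction itself is executed afterwards
```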

(Taken backward branch)
The processing performed when the decoded instruction is a taken backward branch is described below with reference to the flowchart in FIG. 5. First, the RW is traversed in order from its TOP, and it is determined whether an RB corresponding to a function is found (S21). If YES in S21, that is, an RB corresponding to a function is found, the process proceeds to S24, described later.

If NO in S21, that is, no RB corresponding to a function is found, it is next determined whether the address of the backward branch instruction itself matches the loop end address in an RB (S22). If NO in S22, that is, the addresses do not match, the process proceeds to S24, described later.

If YES in S22, that is, the address of the backward branch instruction itself matches the loop end address in an RB, all RBs from the TOP of the RW down to, but not including, that RB are invalidated (S23) and deleted from the RW. That RB entry itself is made valid, its taken flag is set to taken=1, and it is deleted from the RW.
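One possible reading of S21 to S23 is sketched below: the RW is scanned from the TOP, a function entry stops the search (S21 YES), and a loop entry whose registered loop end address equals the branch address is closed, with the inner entries above it invalidated. The same helper would serve S42 to S45 of the not-taken case with `taken=0`. The dictionary layout and the state strings are hypothetical.

```python
def close_loop_iteration(rw, branch_addr, taken):
    """Finish the loop iteration ending at branch_addr; rw[0] is the TOP."""
    for i, slot in enumerate(rw):
        if slot["kind"] == "func":             # S21 YES: stop at the function boundary
            return False
        if slot.get("loop_end") == branch_addr:    # S22 YES
            for inner in rw[:i]:               # S23: entries above it cannot complete
                inner["state"] = "invalid"
            slot["state"] = "valid"            # the iteration is now reusable
            slot["taken"] = taken              # recorded branch direction
            del rw[: i + 1]                    # remove it and the inner entries
            return True
    return False                               # S22 NO: nothing to close
```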

Next, in S24, it is determined whether the start address and input values of the next loop are registered in the RF and RB. If YES in S24, the process proceeds to S30, described later.

If NO in S24, that is, the start address and input values of the next loop are not registered in the RF and RB, an attempt is made to secure an RF entry and an RB entry for the next loop by determining whether (1) an RF entry already exists, (2) an RF entry is available other than those that cannot be evicted because their registration is in progress, or (3) an RB entry is available other than those that cannot be evicted because their registration is in progress (S25).

If NO in S25, that is, no usable RF/RB entry is available, registration is not started: all RBs registered in the RW are invalidated (S26), and the RW is emptied. Then, in S29, the program counter is advanced to the branch target, and the processing ends.

If YES in S25, that is, a usable RF/RB entry is available, that RF/RB entry is secured and registered in the RW (S27). The loop end address (the address of the backward branch instruction itself) is also registered in the RB. If registering in the RW causes it to overflow, the oldest RW entry is deleted (S28) and its corresponding RB is invalidated. Then, in S29, the program counter is advanced to the branch target, and the processing ends.

If YES in S24, the next loop is reusable, so the output values are obtained from the RB and written to the registers and main memory 3 (S30). It is then determined whether any function/loop being registered is recorded in the RW (S31); if so, the required part of the contents of the reused loop's RB entry is added to the entries recorded in the RW (S32). This proceeds in order from the TOP of the RW, and if an RB overflows partway through, the RBs from that point down to the BOTTOM of the RW are invalidated and deleted from the RW.

The program counter is then advanced not to the head of the next loop but according to the taken value stored in that RB: if taken=1, to the backward branch instruction itself; if taken=0, to the instruction following the loop end address stored in the RB. The processing then ends.
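The choice of the next program counter at the end of this flow can be sketched as follows. The sketch assumes fixed 4-byte instructions and ignores SPARC branch delay slots, so `branch_addr + insn_size` only approximates "the instruction following the loop end address"; the function name and parameters are hypothetical.

```python
def pc_after_taken_branch(reused, taken, branch_addr, target, insn_size=4):
    """Next PC after a taken backward branch (S24-S30, simplified)."""
    if not reused:
        return target                  # S29: run the next iteration normally
    if taken:
        return branch_addr             # taken=1: revisit the branch to try the next iteration
    return branch_addr + insn_size     # taken=0: continue after the loop end address
```

Returning to the branch itself when taken=1 is what lets the reuse check repeat iteration by iteration without executing the loop body.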

(Not-taken backward branch)
The processing performed when the decoded instruction is a not-taken backward branch is described below with reference to the flowchart in FIG. 6. First, the RW is searched in order from its TOP (S41), and it is determined whether an RB corresponding to a function is found (S42). If YES in S42, the program counter is advanced to the next instruction in S46, and the processing ends.

If NO in S42, that is, no RB corresponding to a function is found, it is determined whether the address of the backward branch instruction itself matches the loop end address in an RB (S43). If NO in S43, that is, no RF/RB corresponding to this backward branch instruction is found, the program counter is advanced to the next instruction in S46, and the processing ends.

If YES in S43, that is, the RF/RB corresponding to this backward branch instruction is found, all RBs from the TOP of the RW down to, but not including, that RB are invalidated (S44) and deleted from the RW. That RB entry itself is made valid, set to taken=0, and deleted from the RW (S45). The program counter is then advanced to the next instruction in S46, and the processing ends.

(Other instructions)
Next, the case in which the decoded instruction is any other instruction is described. For such instructions, register reads/writes (R/W) and main memory R/W are executed. If the RW is not empty at that time, the register R/W and main memory R/W are registered into the RBs recorded in the RW by the procedures below, which are described separately for (1) general-purpose register READ, (2) general-purpose register WRITE, (3) floating-point register READ, (4) floating-point register WRITE, (5) condition code register ICC-READ, (6) condition code register ICC-WRITE, (7) floating-point condition code register FCC-READ, (8) floating-point condition code register FCC-WRITE, (9) main memory READ, and (10) main memory WRITE.

(1) General-purpose register READ
First, the RW is traversed in order from TOP to BOTTOM. (1-1) If the RB is a leaf function and the register read is %o0-6, or the RB is a non-leaf function and the register read is %i0-6, and arg[0-5].V=0, then arg[0-5].V is changed to 1 and the read data is recorded in arg[0-5].Val. The traversal of the RW then continues: if the next RB encountered is a function, the processing ends. If it is not a function (that is, it is a loop) and arg[0-5].V=0, arg[0-5].V is changed to 1, the read data is recorded in arg[0-5].Val, and the processing ends.

(1-2) If the RB is a loop: (a) for %g0-7, if grr[0-7].V=0, grr[0-7].V is changed to 1, the read data is recorded in grr[0-7].Val, and the processing ends. (b) For %o0-7, if arg[0-7].V=0, arg[0-7].V is changed to 1, the read data is recorded in arg[0-7].Val, and the processing ends. (c) For %l0-7, if lrr[0-7].V=0, lrr[0-7].V is changed to 1, the read data is recorded in lrr[0-7].Val, and the processing ends. (d) For %i0-7, if irr[0-7].V=0, irr[0-7].V is changed to 1, the read data is recorded in irr[0-7].Val, and the traversal proceeds to the next RW entry.

(2) General-purpose register WRITE
First, the RW is traversed in order from TOP to BOTTOM. (2-1) If the RB is a leaf function and the register written is %o0-5, or the RB is a non-leaf function and the register written is %i0-5, and arg[0-5].V=0, then arg[0-5].V is changed to 2 to indicate that subsequent reads are not inputs. In addition, for %o0-1/%i0-1, rti[0-1].V is changed to 1 and the write data is recorded in rti[0-1].Val. The traversal of the RW then continues: if the next RB encountered is a function, the processing ends. If it is not a function (that is, it is a loop) and arg[0-1].V=0, arg[0-1].V is changed to 2 to indicate that subsequent reads are not inputs, rti[0-1].V is changed to 1, the write data is recorded in rti[0-1].Val, and the processing ends.

(2-2) If the RB is a loop: (a) for %g0-7, if grr[0-7].V=0, grr[0-7].V is changed to 2, the write data is recorded in grr[0-7].Val, and the processing ends. (b) For %o0-7, if arg[0-7].V=0, arg[0-7].V is changed to 2, the write data is recorded in arg[0-7].Val, and the processing ends. (c) For %l0-7, if lrr[0-7].V=0, lrr[0-7].V is changed to 2, the write data is recorded in lrr[0-7].Val, and the processing ends. (d) For %i0-7, if irr[0-7].V=0, irr[0-7].V is changed to 2, the write data is recorded in irr[0-7].Val, and the traversal proceeds to the next RW entry.
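The V-field updates of (1-2) and (2-2) for a loop RB amount to a small state machine per register: V=0 (not yet touched) becomes 1 if the first access is a read, so the value is an input, or 2 if the first access is a write, so later reads are not inputs. The sketch below folds grr, arg, lrr, and irr into one dictionary keyed by register name (a hypothetical simplification) and, like the text, updates a register only while its V field is 0.

```python
UNUSED, INPUT, OUTPUT = 0, 1, 2   # meanings of the V field


class LoopRegRecord:
    """Register part of one loop RB entry (grr/arg/lrr/irr folded into one dict)."""

    def __init__(self):
        self.V = {}     # register name -> V field
        self.Val = {}   # register name -> recorded value

    def on_read(self, reg, value):
        # (1-2): a read while V=0 means the value came from outside: record an input
        if self.V.get(reg, UNUSED) == UNUSED:
            self.V[reg], self.Val[reg] = INPUT, value

    def on_write(self, reg, value):
        # (2-2): a write while V=0 means later reads are not inputs
        if self.V.get(reg, UNUSED) == UNUSED:
            self.V[reg], self.Val[reg] = OUTPUT, value
```

The behaviour for a register whose V field is already 1 or 2 is not spelled out in the text, so the sketch leaves such registers unchanged.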

(3) Floating-point register READ
First, the RW is traversed in order from TOP to BOTTOM. (3-1) If the RB is a function, the processing ends without doing anything. (3-2) If the RB is a loop and frr[0-31].V=0, frr[0-31].V is changed to 1, the read data is recorded in frr[0-31].Val, and the processing ends.

(4) Floating-point register WRITE
First, the RW is traversed in order from TOP to BOTTOM. (4-1) If the RB is a function and the register written is %f0-1, rtf[0-1].V is changed to 1 and the write data is recorded in rtf[0-1].Val. The traversal of the RW then continues, and if frr[0-1].V=0, frr[0-1].V is changed to 2 to indicate that subsequent reads are not inputs, rtf[0-1].V is changed to 1, the write data is recorded in rtf[0-1].Val, and the processing ends.

(4-2) If the RB is a loop and frr[0-31].V=0, frr[0-31].V is changed to 2, frw[0-31].V is changed to 1, the write data is recorded in frw[0-31].Val, and the processing ends.

(5) Condition code register ICC-READ
First, the RW is traversed in order from TOP to BOTTOM. (5-1) If the RB is a function, the processing ends without doing anything. (5-2) If the RB is a loop and icr.V=0, icr.V is changed to 1, the read data is recorded in icr.Val, and the processing ends.

(6) Condition code register ICC-WRITE
First, the RW is traversed in order from TOP to BOTTOM. (6-1) If the RB is a function, the processing ends without doing anything. (6-2) If the RB is a loop and icr.V=0, icr.V is changed to 2 and icw.V to 1, the write data is recorded in icw.Val, and the processing ends.

(7) Floating-point condition code register FCC-READ
First, the RW is traversed in order from TOP to BOTTOM. (7-1) If the RB is a function, the processing ends without doing anything. (7-2) If the RB is a loop and fcr.V=0, fcr.V is changed to 1, the read data is recorded in fcr.Val, and the processing ends.

(8) Floating-point condition code register FCC-WRITE
First, the RW is traversed in order from TOP to BOTTOM. (8-1) If the RB is a function, the processing ends without doing anything. (8-2) If the RB is a loop and fcr.V=0, fcr.V is changed to 2 and fcw.V to 1, the write data is recorded in fcw.Val, and the processing ends.

(9) Main memory READ
First, the RW is traversed in order from TOP to BOTTOM. If the address is registered in an RB as WRITE data, that value is used. Otherwise, if it is registered in an RB as READ data, that value is used. If it is registered in neither, the value is read from main memory 3 via the cache.

The RW is then traversed again from TOP to BOTTOM. (a) If the address is sp+64, with sp as registered in the RB, this is a read of the structure pointer, so if arg0.V=0, arg0.V is changed to 1 and the read data is recorded in arg0.Val. (b) Otherwise, if the address is at least LIMIT and less than sp+92, it lies in an area that need not be registered, so nothing is done. (c) Otherwise, it is checked whether the address is already registered as WRITE data; if so, this READ follows an overwrite, so registration is unnecessary and nothing is done. (d) Otherwise, it is checked whether the address is already registered as READ data; if so, it needs no further registration and nothing is done. (e) Otherwise, registration as READ data is required, so a main memory READ address is secured in the RF and the value is registered as READ data. If no main memory address can be secured in the RF, registration is impossible, so all RB entries from that RW entry down to the BOTTOM are invalidated.
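Rules (a) to (e) form an ordered decision chain, sketched below for a single RB. Here `sp` is the stack pointer value recorded in that RB, `limit` stands for the LIMIT boundary used in the text, and the sets of already-registered WRITE and READ addresses are passed in explicitly; the function name and return labels are hypothetical.

```python
def classify_mem_read(addr, sp, limit, written, read):
    """Apply rules (a)-(e) to a main memory READ for one RB entry."""
    if addr == sp + 64:
        return "struct_ptr"      # (a) structure pointer: record arg0 as an input
    if limit <= addr < sp + 92:
        return "skip_local"      # (b) area that need not be registered
    if addr in written:
        return "skip_written"    # (c) read after an overwrite: not an input
    if addr in read:
        return "skip_known"      # (d) already registered as READ data
    return "register_read"       # (e) allocate an RF READ address and register it
```

The order matters: sp+64 also falls inside the range of rule (b), so rule (a) must be tested first.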

(10) Main memory WRITE
First, the data is written to main memory 3 via the cache. If the base register is 14 (%sp) and the offset is 92 or more, the fact that the seventh argument word has been detected is recorded.

The RW is then traversed from TOP to BOTTOM. (a) If the address is sp+64, with sp as registered in the RB, this is an access to the structure pointer, so if arg0.V=0, arg0.V is changed to 2. (b) Otherwise, if the address is at least LIMIT and less than sp+92, it lies in an area that need not be registered, so nothing is done. (c) Otherwise, it is checked whether the address is already registered as WRITE data; if so, the address is already registered, so its contents are updated with the new WRITE data. (d) Otherwise, registration as WRITE data is required, so a main memory WRITE address is secured in the RF and the value is registered as WRITE data. If no main memory address can be secured in the RF, registration is impossible, so all RB entries from that RW entry down to the BOTTOM are invalidated.
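The seventh-argument check and rules (a) to (d) can be sketched in the same way. The offsets follow the SPARC V8 stack-frame convention that the text relies on: the hidden structure-return pointer sits at %sp+64, the six register arguments have dump slots at %sp+68 to %sp+91, and the seventh argument word starts at %sp+92; integer register 14 is %sp (%o6). The function names and return labels are hypothetical.

```python
SP_REG = 14              # %sp is %o6, i.e. integer register 14 on SPARC
SEVENTH_ARG_OFFSET = 92  # first stack word beyond the six register arguments


def seventh_arg_written(base_reg, offset):
    """(10): a store at %sp+92 or above writes the seventh argument word."""
    return base_reg == SP_REG and offset >= SEVENTH_ARG_OFFSET


def classify_mem_write(addr, sp, limit, written):
    """Apply rules (a)-(d) to a main memory WRITE for one RB entry."""
    if addr == sp + 64:
        return "struct_ptr"      # (a) change arg0.V from 0 to 2
    if limit <= addr < sp + SEVENTH_ARG_OFFSET:
        return "skip_local"      # (b) area that need not be registered
    if addr in written:
        return "update_write"    # (c) address known: overwrite the WRITE data
    return "register_write"      # (d) allocate an RF WRITE address and register it
```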

(Multiple reuse including loops)
When the reuse mechanism described above is used at a single level, then in the example of FIG. 17(a), function B as a leaf function, loop C inside function B, and so on can each be reused. Multiple reuse, by contrast, is a mechanism that performs registration such that a single execution of a function makes every instruction section it contains, including nested functions and loops, reusable. In the above example, multiple reuse makes all of the nested instruction sections A, B, and C reusable after function A has been executed only once. The extensions needed to realize multiple reuse are described below.

FIG. 7 shows, as an example, the conceptual structure of functions A and D. In this example, loop B is inside function A, loop C is inside loop B, and function D is called from loop C. Loop E is inside function D, and loop F is inside loop E.

For the nested structure of functions A and D and loops B, C, E, and F in FIG. 7, FIG. 8 shows the range (arrows) over which the register inputs/outputs of an inner structure (thick-framed cells) also become register inputs/outputs of the outer structures. For example, %i0-5 referenced as inputs inside loop F are also inputs to loop E and function D, and further are inputs (read as %o0-5) to loops C and B, which called function D. For function A, however, %o0-5 correspond to local variables, so %i0-5 (%o0-5) are not register inputs to function A; that is, the influence range of %i0-5 (%o0-5) extends only up to loop B. Put another way, if %i0-5 are referenced inside function D, %o0-5 must be registered as input values of loop B even if loop B itself never references %o0-5 directly. The same applies to %i0-1 output inside loop F.

Because the floating-point registers are not part of the register window, %f0-1 output inside loop F are outputs of every level, including function A. The other register inputs and outputs, by contrast, do not propagate beyond a function boundary: for the inputs and outputs inside loop F, namely the register inputs %i6-7, %g, %l, %o, %f0-31, %icc, and %fcc, and the register outputs %i2-7, %g, %l, %o, %f2-31, %icc, and %fcc, the influence range extends only up to loop E. For inputs and outputs to main memory 3, the influence range can be determined by applying the method described earlier, comparison with %sp (SP) immediately before the function call, to every level of the nest.
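These propagation rules can be expressed as a walk outward from the section that performed the access. The sketch below assumes the nest is given innermost-first as `(name, kind)` pairs and uses three hypothetical register classes: "arg" for %i0-5 inputs and %i0-1 outputs, "fret" for %f0-1 outputs, and any other label for the remaining registers. Main memory accesses are not covered.

```python
def influence_range(chain, reg_class):
    """Return the sections whose RB must record an access made in chain[0].

    chain: (name, kind) pairs from the innermost section outward, kind in {"loop", "func"}.
    """
    levels, funcs_seen = [], 0
    for name, kind in chain:
        if reg_class == "arg":
            # the caller sees %i0-5 as %o0-5, which are locals of the caller's own function
            if kind == "func" and funcs_seen == 1:
                break
        elif reg_class != "fret":
            # other registers never propagate past the enclosing function
            if kind == "func":
                break
        levels.append(name)
        if kind == "func":
            funcs_seen += 1
    return levels
```

Applied to the FIG. 7 nest `F, E, D, C, B, A`, the "arg" class reaches loop B, the "fret" class reaches function A, and the remaining registers stop at loop E, as described above.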

Multiple reuse therefore requires a mechanism that associates the RF and RB described above with the nesting structure of functions and loops. As shown in FIG. 9, a reuse window (RW) is provided, which holds the RF and RB entries currently being executed and registered (shown as A, B, and C in the figure) as a stack. While a function or loop is executing, register and main memory references are registered, using the methods described so far, for every entry recorded in the RW.

If it is determined during this process that a given entry cannot be reused because of (1) the number of registrable items being exceeded, (2) detection of the seventh argument word, or (3) detection of a system call, the RW is used to identify the RB corresponding to that entry and the RBs of the enclosing levels, and their registration is aborted.

Although the depth of the RW is finite, when functions or loops are detected beyond the nesting depth that can be registered at once, registration is aborted starting from the outermost instruction section and the inner instruction sections are added to the registration targets, which lets the mechanism follow dynamic changes in the nesting relationship. In addition, if a reusable instruction section (for example D) is encountered during execution and registration (for example of A), its registered inputs/outputs are added as-is to the entry being registered, which makes multiple reuse of A possible even beyond the depth of the RW.
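Adding a reused section's registered inputs and outputs to the entries still being registered (S9 and S32) might look like the sketch below. The rule that an inner input counts as an outer input only if the outer section has not already written or recorded that location is an assumption modelled on main memory rules (c) and (d); register-window renaming and influence ranges are ignored, and the dictionary layout is hypothetical.

```python
def merge_reused_section(rw, inputs, outputs):
    """Add a reused inner section's I/O to every RB entry still being registered."""
    for rb in rw:                                  # every entry on the RW
        for loc, val in inputs.items():
            # assumed rule: an inner input is an outer input only if the outer
            # section has neither written nor already recorded that location
            if loc not in rb["outputs"] and loc not in rb["inputs"]:
                rb["inputs"][loc] = val
        rb["outputs"].update(outputs)              # inner results become outer outputs
```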

(Parallel pre-execution)
The multiple reuse of functions and loops described above has no effect at all when the interval at which identical parameters reappear is longer than the lifetime of an RB entry, or when the parameters keep changing monotonically. In the first case, even if a function or loop has been registered in the RB, by the time the same parameters reappear for it, its entry has already disappeared from the RB, so it cannot be reused. In the second case, even if the function or loop is registered in the RB, it cannot be reused because the parameters differ.

To address this, a plurality of SSPs 1B are provided in addition to the MSP 1A, which performs multiple reuse. The SSPs 1B pre-execute instruction sections so that usable entries are already in the RB when the MSP 1A needs them, which makes further speedup possible.

The hardware configuration for the parallel pre-execution mechanism is the one shown in FIG. 2 described above. As shown in that figure, the RWs 4A and 4B, the arithmetic units 5A and 5B, the registers 6A and 6B, and the caches 7A and 7B are provided independently for each processor. The RF/RB 2 and the main memory 3 are shared by all processors. In the figure, the broken lines indicate the paths over which the MSP 1A and the SSPs 1B register inputs and outputs in the RF/RB 2.

Realizing parallel pre-execution raises two problems: (1) how to maintain main memory consistency, and (2) how to predict inputs. Solutions to these problems are described below.

(Solution to the main memory consistency problem)
First, problem (1), maintaining main memory consistency, is described. In particular, when an instruction section is executed on predicted input parameters, the values that the MSP 1A and an SSP 1B would write to the main memory 3 differ. As shown in FIG. 2, this is solved by having each SSP 1B use two separate stores. Main memory references that are subject to RB registration go to the RF/RB 2. All other (local) references go to Local 7B, a local memory provided for each SSP 1B. As a result, the SSP 1B never needs to write to Cache 7B or to the main memory 3. When the MSP 1A writes to the main memory 3, the corresponding cache line in each SSP 1B is invalidated.

Specifically, consider the addresses subject to RB registration whose first access is a read. For each such address, the SSP 1B reads the main memory 3 and registers the address and value in the RB, in the same way as the MSP 1A. From then on it refers to the RB rather than the main memory 3, which avoids inconsistencies caused by other processors overwriting that location. For local references, a read that precedes any write corresponds to using an uninitialized variable. Its value may be arbitrary, so the main memory 3 need not be referenced.
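
The following is a minimal sketch of these access rules. It assumes that the caller already knows whether an access is local (for example, because it falls inside the current stack frame) and models the memories as dictionaries. `SpeculativeMemory` and `PreExecutionAborted` are hypothetical names, not part of the described hardware.

```python
class PreExecutionAborted(Exception):
    """Raised when pre-execution cannot continue (hypothetical signal)."""


class SpeculativeMemory:
    """Hypothetical model of how one SSP 1B routes memory accesses during pre-execution."""

    def __init__(self, main_memory: dict, local_capacity: int) -> None:
        self.main = main_memory          # shared main memory 3: read here, never written
        self.rb_inputs: dict = {}        # read-first addresses registered in RF/RB 2
        self.rb_outputs: dict = {}       # values the section writes (RB outputs)
        self.local: dict = {}            # Local 7B: frame-local data
        self.local_capacity = local_capacity

    def load(self, addr, is_local: bool = False):
        if is_local:
            # Reading a local before writing it = uninitialised variable; any value is fine.
            return self.local.get(addr, 0)
        if addr in self.rb_outputs:
            return self.rb_outputs[addr]
        if addr not in self.rb_inputs:
            # First access is a read: fetch once from main memory and register it.
            self.rb_inputs[addr] = self.main[addr]
        # From now on the RB copy is used, so MSP overwrites cannot cause inconsistency.
        return self.rb_inputs[addr]

    def store(self, addr, value, is_local: bool = False) -> None:
        if is_local:
            if addr not in self.local and len(self.local) >= self.local_capacity:
                # Frame no longer fits in Local 7B: pre-execution must be abandoned.
                raise PreExecutionAborted("Local capacity exceeded")
            self.local[addr] = value
        else:
            # Recorded as an RB output only; neither Cache 7B nor main memory 3 is written.
            self.rb_outputs[addr] = value
```

Because `store` never touches `self.main`, a later MSP write to the same address cannot be confused with a speculative value. Conversely, a value that has already been registered stays frozen for the rest of the pre-execution.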

The capacity of Local 7B is finite. Pre-execution is therefore aborted whenever execution cannot continue, for example when the size of a function frame exceeds the capacity of Local 7B. In addition, pre-execution results are not written to the main memory 3, so one pre-execution result cannot be used as the basis for a further pre-execution.

(Prediction mechanism)
Next, problem (2), how to predict inputs, is described. Pre-execution requires predicting future inputs from the usage history of the RB and passing them to the SSPs 1B. For this purpose, the RF/RB 2 is provided with a prediction processing unit 2B. The prediction processing unit 2B consists of small processors, one for each RF entry, and computes predicted input values independently of the MSP 1A and the SSPs 1B.

As described above, conventional input prediction treats all addresses registered on the input side of the RB uniformly, which lowers the prediction hit rate. To solve this, prediction should distinguish addresses whose values are likely to be predicted correctly from those likely to be mispredicted. It should also observe how values change, so that only the minimum necessary set of addresses is predicted.

Correct predictions can be expected for addresses that are fixed and whose values change monotonically. Such addresses include global variables referenced by a label, and local variables (intra-frame variables) referenced with the stack pointer or frame pointer as the base register.

To identify these addresses, a constant flag (Const-FLAG) is attached to each register that is referenced by the address calculation of a load instruction. The Const-FLAG of registers used as the stack pointer or frame pointer is set unconditionally. For any other register, the Const-FLAG is set when an instruction that sets a constant into that register is executed.

Next, among previously referenced addresses, an address that has never been written is guaranteed to hold unchanged contents, so it need not be predicted. To distinguish such addresses, a change flag (C-FLAG) indicating that a write has occurred is provided. When an address is newly recorded in the RF/RB as an input element, its C-FLAG is reset. When a store instruction is later executed for that address, its C-FLAG is set.

In addition, a history mask (P-Mask) indicates whether an address serving as an input element is a target for history recording. When an address is newly recorded in the RF/RB as an input element, its P-Mask (history flag) is reset. When a load instruction is executed and the Const-FLAG of the register that generated the address is set, the P-Mask bits corresponding to the loaded byte positions are set.

The Const-FLAG, C-FLAG, and P-Mask settings are controlled by the RB registration processing unit 2A provided in the RF/RB 2. The RB registration processing unit 2A consists of a small processor that applies the rules above to set the Const-FLAG, C-FLAG, and P-Mask.
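
The three flag rules can be sketched as a small tracker. This is a hypothetical Python model, not the logic of the RB registration processing unit 2A itself. In the model, addresses are symbolic names and a byte mask is a 32-bit integer. A load sets P-Mask bits only if every register used to form the address has its Const-FLAG set. The replay at the end follows the worked example of FIG. 20 below. The fifth instruction is omitted because its operands are not fully described, and the mask of the eighth load is only a placeholder.

```python
FULL = 0xFFFFFFFF


class FlagTracker:
    """Hypothetical sketch of the Const-FLAG / C-FLAG / P-Mask rules."""

    def __init__(self, pointer_regs=("SP", "FP")) -> None:
        self.pinned = set(pointer_regs)                # SP/FP: Const-FLAG always set
        self.const = {r: True for r in pointer_regs}
        self.c_flag: dict = {}                         # address -> written after registration?
        self.p_mask: dict = {}                         # address -> 32-bit byte mask

    def set_const(self, reg) -> None:
        # Instruction that sets a constant (e.g. an address constant) into reg.
        self.const[reg] = True

    def write_reg(self, reg) -> None:
        # Any non-constant result (load, arithmetic) clears the flag.
        if reg not in self.pinned:
            self.const[reg] = False

    def load(self, addr, byte_mask: int, addr_regs, dest_reg) -> None:
        if addr not in self.p_mask:
            # Newly recorded input element: both C-FLAG and P-Mask start reset.
            self.p_mask[addr] = 0
            self.c_flag[addr] = False
        if all(self.const.get(r, False) for r in addr_regs):
            # Address came only from constant registers: record the loaded bytes.
            self.p_mask[addr] |= byte_mask
        self.write_reg(dest_reg)

    def store(self, addr) -> None:
        # A store to a registered input address marks it as changed.
        if addr in self.c_flag:
            self.c_flag[addr] = True


t = FlagTracker()
t.set_const("R0")                          # 1: R0 <- A1
t.load("A1", FULL, ["R0"], "R1")           # 2: R1 <- [R0] (4 bytes)
t.set_const("R0")                          # 3: R0 <- A2
t.load("A2", 0xFF000000, ["R0"], "R2")     # 4: R2 <- [R0] (1 byte)
t.set_const("R0")                          # 6: R0 <- A3
t.load("A3", 0x00FF0000, ["R0"], "R3")     # 7: R3 <- [R0] (1 byte)
t.load("A4", FULL, ["R1", "R2"], "R4")     # 8: address formed from R1 and R2
t.store("A2")                              # later stores set C-FLAG
t.store("A3")
```

After the replay, `t.p_mask` holds FFFFFFFF for A1, FF000000 for A2, 00FF0000 for A3, and 0 for A4. This matches the P-Mask values stated in the walkthrough.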

(Execution example of an instruction section)
As an example, consider the instruction section shown in FIG. 20 executed with the RF and RB configuration shown in FIG. 1. In FIG. 20, PC indicates the PC value at which the instruction section starts; that is, the section begins at address 1000. FIG. 10 shows the resulting registration state of the RB after the instruction section of FIG. 20 is executed.

The first instruction sets the address constant A1 in register R0. Because this instruction sets a constant, the Const-FLAG of register R0 is set.

The second instruction loads 4-byte data (00110000) from the main memory 3, using the contents of register R0 as the address, and stores it in register R1. Address A1, mask (FFFFFFFF), and data (00110000) are registered as an input in the first column on the Input side of the RB. Register number R1, mask (FFFFFFFF), and data (00110000) are registered as an output in the first column on the Output side of the RB.

Because the Const-FLAG of register R0, which is used as the address, is set, the P-Mask for address A1 is set. The loaded data (00110000) is 4 bytes wide, so the P-Mask for address A1 is set to (FFFFFFFF). Register R1 does not receive a constant, so its Const-FLAG is reset.

The third instruction sets the address constant A2 in register R0. Because this instruction sets a constant, the Const-FLAG of register R0 is set.

The fourth instruction loads 1-byte data (02) from the main memory 3, using the contents of register R0 as the address, and stores it in register R2. Address A2, mask (FF000000), and data (02) are registered as an input in the second column on the Input side of the RB. The remaining 3 bytes of address A2 hold "-", meaning don't care. Register number R2, mask (FFFFFFFF), and data (00000002) are registered as an output in the second column on the Output side of the RB.

Because the Const-FLAG of register R0, which is used as the address, is set, the P-Mask for address A2 is set. The loaded data (02) is 1 byte wide, so the P-Mask for address A2 is set to (FF000000). Register R2 does not receive a constant, so its Const-FLAG is reset.

In the fifth instruction, the Const-FLAG of register R2, which is used in the address, is reset. The P-Mask for address (A2 + 02) is therefore not set, and the P-Mask for address A2 remains (FF000000). Register R2 again does not receive a constant, so its Const-FLAG is reset.

The sixth instruction sets the address constant A3 in register R0. Because this instruction sets a constant, the Const-FLAG of register R0 is set.

The seventh instruction loads 1-byte data (33) from the main memory 3, using the contents of register R0 as the address, and stores it in register R3. Address A3, mask (00FF0000), and data (33) are registered as an input in the third column on the Input side of the RB. Register number R3, mask (FFFFFFFF), and data (00000033) are registered as an output in the third column on the Output side of the RB.

Because the Const-FLAG of register R0, which is used as the address, is set, the P-Mask for address A3 is set. The loaded data (33) is 1 byte wide, so the P-Mask for address A3 is set to (00FF0000). Register R3 does not receive a constant, so its Const-FLAG is reset.

In the eighth instruction, the Const-FLAGs of registers R1 and R2, which are used to form the address, are reset. The P-Mask for address A4 is therefore not set and stays (00000000). Register R4 does not receive a constant, so its Const-FLAG is reset.

The ninth instruction reads the value of register R5, adds 1 to it, and stores the result back in register R5. Register R5, mask (FFFFFFFF), and data (00000100) are registered as an input in the fifth column on the Input side of the RB. Register number R5, mask (FFFFFFFF), and data (00000101) are registered as an output in the fifth column on the Output side of the RB. Because register R5 does not receive a constant, its Const-FLAG is reset.

Assume that store instructions are subsequently executed for addresses A2 and A3, so that the C-FLAGs of addresses A2 and A3 are set.

As a result, only two mask positions have both the C-FLAG and the P-Mask set: the first byte of address A2 and the second byte of address A3. Only the addresses, masks, and values for these positions are recorded in the RB entry as prediction targets. They form the history information that holds the past inputs of each instruction section. Registers registered in the RB input pattern are always recorded in the history as prediction targets.
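
Selecting the history targets amounts to intersecting the two flags. The sketch below uses hypothetical function names. It keeps a memory element only when its C-FLAG is set and its P-Mask is non-zero, and it keeps every input register with an assumed full 32-bit mask. Fed with the flag values from the example, it leaves exactly the first byte of A2, the second byte of A3, and R5.

```python
def prediction_targets(p_mask: dict, c_flag: dict, input_regs) -> dict:
    """Select the input elements whose history is kept for prediction.

    A memory address qualifies only on the bytes whose P-Mask bit is set and
    only if its C-FLAG is set; registers in the input pattern always qualify.
    """
    targets = {addr: mask for addr, mask in p_mask.items()
               if mask and c_flag.get(addr, False)}
    for reg in input_regs:
        targets[reg] = 0xFFFFFFFF  # full register width assumed
    return targets


def byte_positions(mask: int) -> list[int]:
    """1-based byte positions, counted from the most significant byte, set in a 32-bit mask."""
    return [i + 1 for i in range(4) if (mask >> (8 * (3 - i))) & 0xFF]


p_mask = {"A1": 0xFFFFFFFF, "A2": 0xFF000000, "A3": 0x00FF0000, "A4": 0x00000000}
c_flag = {"A1": False, "A2": True, "A3": True, "A4": False}
targets = prediction_targets(p_mask, c_flag, ["R5"])
for element, mask in targets.items():
    print(element, f"{mask:08X}", byte_positions(mask))
# A2 FF000000 [1]
# A3 00FF0000 [2]
# R5 FFFFFFFF [1, 2, 3, 4]
```

A1 drops out because it was never written after registration (C-FLAG reset). A4 drops out because its address did not come from constant registers (P-Mask zero).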

FIG. 11 shows an example of the history registered in the RB when the instruction section shown in FIG. 20 is executed repeatedly. As shown in the figure, the RB stores three P-Masks: (FF000000) in the column for address A2, (00FF0000) in the column for address A3, and (FFFFFFFF) in the column for register R5. As Time advances from 1 to 4, the value selected by the P-Mask in each column changes. The diff shown between successive history entries is the change (difference) in the value of the corresponding input element, and it is calculated by the prediction processing unit 2B.

In this example, for the columns of address A2 and register R5, every diff between Time 1 and 4 is 01. The values of these elements can therefore be expected to keep increasing by 01 per unit of time. For the column of address A3, by contrast, the diff is sometimes 00 and sometimes 02 over the same period, which shows that address A3 is difficult to predict.

Accordingly, the prediction processing unit 2B handles each element according to its history. For an element whose difference is constant throughout the history, it predicts on the assumption that the difference will continue. For an element whose difference is not constant, or is zero, it makes no prediction.
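
The rule in the preceding paragraph is a constant-stride predictor, and a possible Python sketch follows. `predict_next` is a hypothetical name. Each history is assumed to hold the masked values of one input element from oldest to newest. Apart from the Time-1 values (02, 33, 00000100), the history values are invented to match the pattern described for FIG. 11.

```python
from __future__ import annotations


def predict_next(history: list[int], steps: int = 1) -> int | None:
    """Stride prediction for one input element (history is oldest first).

    Returns None when the element is not predicted: too little history,
    a difference that is not constant, or a constant difference of zero.
    """
    if len(history) < 2:
        return None
    diffs = {later - earlier for earlier, later in zip(history, history[1:])}
    if len(diffs) != 1:
        return None            # e.g. diff sometimes 00, sometimes 02
    (diff,) = diffs
    if diff == 0:
        return None            # value never changes: nothing to predict
    return history[-1] + diff * steps


# Hypothetical histories for Time 1..4 (only the Time-1 values come from the example).
histories = {"A2": [0x02, 0x03, 0x04, 0x05],
             "R5": [0x100, 0x101, 0x102, 0x103],
             "A3": [0x33, 0x33, 0x35, 0x35]}
for element, hist in histories.items():
    p = predict_next(hist)
    print(element, "no prediction" if p is None else hex(p))
# A2 0x6
# R5 0x104
# A3 no prediction
```

The `steps` parameter is an added convenience. It lets several SSPs each be handed a value further ahead (steps = 1, 2, ...).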

FIG. 12 shows the input elements recorded in the RB as a prediction entry after the prediction processing unit 2B predicts the values of address A2 and register R5 as described above. In the figure, the values for address (A2 + 4) and address A3 are obtained by reading the main memory 3 directly rather than by prediction.

Once the predicted input values have been calculated in this way, an SSP 1B executes the instruction section on these predicted inputs to compute the output elements. These predicted outputs are stored in the RB as a prediction entry. Later, the MSP 1A executes the instruction section. If its inputs match the predicted inputs stored in the RB prediction entry, the corresponding predicted outputs are applied, thereby achieving reuse.
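
The division of work between the SSP 1B and the MSP 1A can be sketched as follows. This is a simplification under two assumptions. First, the RB is modelled as a dictionary keyed by the exact input values, whereas the real RB matches masked inputs associatively. Second, `section` is a hypothetical stand-in for an instruction section that returns its outputs.

```python
def input_key(inputs: dict) -> tuple:
    # Order-independent key standing in for the RB's associative match on inputs.
    return tuple(sorted(inputs.items()))


def ssp_pre_execute(section, predicted_inputs: list[dict], rb: dict) -> None:
    """SSP side: run the section on predicted inputs and store prediction entries."""
    for inputs in predicted_inputs:
        rb[input_key(inputs)] = section(**inputs)


def msp_execute(section, inputs: dict, rb: dict):
    """MSP side: reuse a stored output if the actual inputs match an entry."""
    key = input_key(inputs)
    if key in rb:
        return rb[key], True               # reused: the section body is skipped
    outputs = section(**inputs)
    rb[key] = outputs                      # ordinary registration
    return outputs, False


def section(a2: int, r5: int) -> dict:
    # Hypothetical stand-in for the instruction section of FIG. 20.
    return {"R2": a2, "R5": r5 + 1}


rb: dict = {}
ssp_pre_execute(section, [{"a2": 0x06, "r5": 0x104}, {"a2": 0x07, "r5": 0x105}], rb)
print(msp_execute(section, {"a2": 0x06, "r5": 0x104}, rb))  # ({'R2': 6, 'R5': 261}, True)
```

When the prediction is right, the MSP call returns `True` without running `section`. When it is wrong, the MSP simply executes and registers as usual.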

(Second configuration example of RF/RB)
Next, a second configuration example of the RF/RB 2 is described with reference to FIG. 13. As shown in the figure, the RF/RB 2 comprises the RB, the RF, RO1 (second output pattern storage means), and RO2 (first output pattern storage means).

The RB consists of multiple lines, each combining two fields. The Value field (value storage area) stores a value to be compared, either a register value or a main memory input value. The Key field (key storage area) stores a key number.

The RF has the following fields, provided for each line of the RB:
- an end flag E, indicating that there is no further register number or main memory address to compare;
- a comparison-required flag, indicating that the contents of the register number or main memory address to be compared next have been updated;
- R/M, indicating whether the next item to compare is a register or main memory;
- Adr. (search item designation area), giving the register number or main memory address to be compared next;
- UP (parent node storage area), giving the line number referenced immediately before;
- Alt. (comparison-required item designation area), giving a register number or main memory address to be compared in preference to the one given by Adr.;
- DN (comparison-required key designation area), giving the key needed for that preferential comparison.

RO1 and RO2 store the output values to be written to main memory and/or registers when the RB and RF search determines that reuse is possible. RO1 corresponds one-to-one with the lines of the RF and stores output values together with their destination addresses. RO2 stores the output values and destination addresses that do not fit in RO1. When output values must also be read from RO2, the corresponding RO1 line holds a pointer to where they are stored in RO2, and this pointer is used to read them. The RB is implemented with CAM and the RF with RAM.
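
The following sketch shows how outputs might be split between RO1 and RO2 and gathered again on a hit. The slot count `RO1_SLOTS`, the `(start, count)` form of the pointer, and all names are assumptions made for illustration. The text states only that RO1 holds a pointer into RO2 when the outputs do not fit.

```python
RO1_SLOTS = 2   # hypothetical number of (destination, value) pairs per RO1 line


def store_outputs(ro1: dict, ro2: list, line: int, outputs: list) -> None:
    """Keep the first pairs in RO1 and spill the rest to RO2 behind a pointer."""
    head, rest = outputs[:RO1_SLOTS], outputs[RO1_SLOTS:]
    pointer = None
    if rest:
        pointer = (len(ro2), len(rest))    # (start, count) in RO2
        ro2.extend(rest)
    ro1[line] = {"pairs": head, "ro2": pointer}


def select_output(ro1: dict, ro2: list, line: int) -> list:
    """Gather every output pair for the line chosen by 'Select Output'."""
    entry = ro1[line]
    pairs = list(entry["pairs"])
    if entry["ro2"] is not None:
        start, count = entry["ro2"]
        pairs.extend(ro2[start:start + count])
    return pairs


ro1: dict = {}
ro2: list = []
store_outputs(ro1, ro2, 5, [("R1", 0x1), ("R2", 0x2), ("A9", 0x3)])
store_outputs(ro1, ro2, 7, [("R3", 0x4)])
print(select_output(ro1, ro2, 5))  # [('R1', 1), ('R2', 2), ('A9', 3)]
```

Lines with few outputs occupy only RO1, so RO2 is consumed only by sections that write many locations.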

(Associative search operation in the second configuration example)
Next, the associative search operation in the second configuration example is described. In the configuration shown in FIG. 1, each horizontal row (entry) of the RB contains every input value item to be compared; that is, each input pattern is registered in the RB as a single row.

In the second configuration example, by contrast, the input value items to be compared are divided into short units. Each comparison unit is treated as a node, and each input pattern is registered in the RF and RB as a tree structure. During reuse, matching nodes are selected one after another until it can finally be determined whether the section is reusable. In other words, the portion common to several input patterns is merged and associated with a single line of the RF and RB.

This eliminates redundancy and improves the utilization efficiency of the memory that makes up the RF/RB 2. Moreover, because input patterns form a tree, one input pattern does not have to map to one RB row. The number of input value items to be compared can therefore be variable.
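
Registering input patterns as a tree is essentially prefix sharing, as in a trie. The hypothetical sketch below keys each node by its parent line, item, and value, and allocates a new line only for the part of a pattern that is not already present. The PC/register and memory values are invented for illustration. The sketch does not enforce the hardware constraint that all children of one line compare the same next item.

```python
def register(rb: dict, pattern: list) -> int:
    """Insert one input pattern into a tree-shaped table.

    rb maps (parent_line, item, value) -> line number. A pattern is a list
    of (item, value) pairs in reference order; prefixes already present are
    reused, so only the differing suffix consumes new lines.
    """
    parent = 0                      # 0 = root (no parent line)
    for item, value in pattern:
        node = (parent, item, value)
        if node not in rb:
            rb[node] = len(rb) + 1  # allocate the next free line
        parent = rb[node]
    return parent                   # last line: where the end flag E would go


rb: dict = {}
p1 = [("PC/Reg", (0x1000, 0x100)), ("A2", 0x00), ("A3", 0x33)]
p2 = [("PC/Reg", (0x1000, 0x100)), ("A2", 0x01), ("A3", 0x33)]
p3 = [("PC/Reg", (0x1000, 0x200))]     # a pattern with fewer items
lines_used = [register(rb, p) for p in (p1, p2, p3)]
print(lines_used, len(rb))   # [3, 5, 6] 6
```

The three patterns contain seven items in total but occupy six lines, because p1 and p2 share their first node. They also differ in length without any padding.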

In addition, because the RF and RB register input patterns as a tree, a comparison never produces multiple matches. The instruction section storage unit 2 can therefore be built from any associative memory that has a single-match mechanism. Associative memories with only a single-match mechanism are generally available commercially. Associative memories that can report multiple matches with the same performance as a single match generally are not. The second configuration example therefore allows the data processing apparatus of this embodiment to be built from commercially available associative memory, in a shorter time and at lower cost.

Next, a concrete example of the associative search operation in the RF/RB 2 is described with reference to FIG. 14. First, when execution of an instruction section is detected, the program counter (PC) and register contents (Reg.) are input to the RB. The RB then performs an associative search that compares these values with the section start addresses and register values registered in its Value column. The single line whose values match is selected as the candidate (match line). In this example, line "01" of the RB is selected as the match line.

Next, "01", the RB address of the selected match line, is passed to the RF as the encoded result, and the RF line corresponding to key 01 is referenced. In that line, the comparison-required flag is "0" and the main memory address to be compared is A1. No match comparison is therefore needed for main memory address A1.

Next, key 01 is used to search the Key column of the RB. In this example, line "03" of the RB is selected as the match line. Key 03 is passed to the RF as the encoded result, and the RF line corresponding to key 03 is referenced. In that line, the comparison-required flag is "1" and the main memory address to be compared is A2, so a match comparison is required for main memory address A2. The value at main memory address A2 is read from the main memory 3 via Cache 7A. The RB is then searched for a line whose Value equals the value read and whose Key is "03". In the example of FIG. 14, two lines, "04" and "05", have Key "03". The value read from the main memory 3 is "00", so line "05" is selected as the match line and key 05 is passed to the RF as the encoded result.

This process repeats until the RF yields an end flag E, indicating that no register number or main memory address remains to be compared. At that point, all items of the input pattern are judged to match, and the instruction section is judged reusable. The line at which the end flag E was detected outputs a "Select Output" signal. The output values stored in RO1 and RO2 for that line are then written to register 6A and the main memory 3.
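
The search walk described above can be sketched in Python. The tables mirror the parts of FIG. 14 given in the text: lines 01, 03, 04, and 05, and addresses A1 and A2. Some details are hypothetical because the real tree continues further: the Value of line 04, the PC/register value, and the placement of end flags on lines 04 and 05. `single_match` raises an error on a multi-match because tree registration is meant to rule one out.

```python
ANY = object()   # wildcard: match on Key only (comparison-required flag is 0)


def single_match(rb: dict, key, value=ANY):
    """Return the one RB line whose Key (and Value, unless ANY) matches, or None."""
    hits = [line for line, (v, k) in rb.items()
            if k == key and (value is ANY or v == value)]
    if len(hits) > 1:
        raise RuntimeError("tree registration should never yield a multi-match")
    return hits[0] if hits else None


def associative_search(rb: dict, rf: dict, pc_reg, memory: dict):
    """Walk FIG. 14-style tables; return the line holding the end flag, or None."""
    line = single_match(rb, key=None, value=pc_reg)   # first stage: PC and registers
    while line is not None:
        entry = rf[line]
        if entry["end"]:
            return line                                # all items matched: Select Output
        if entry["cmp"]:
            # Item changed since registration: compare its current value.
            line = single_match(rb, key=line, value=memory[entry["adr"]])
        else:
            # Item known to be unchanged: follow the Key link without comparing.
            line = single_match(rb, key=line)
    return None                                        # mismatch: execute normally


rb = {1: ((0x1000, 0x100), None),   # Value = PC/Reg, no Key (first stage)
      3: (None, 1),                 # Key 01, no Value to compare
      4: (0x01, 3),                 # Key 03, Value 01 (hypothetical)
      5: (0x00, 3)}                 # Key 03, Value 00
rf = {1: {"end": False, "cmp": False, "adr": "A1"},
      3: {"end": False, "cmp": True, "adr": "A2"},
      4: {"end": True, "cmp": False, "adr": None},   # hypothetical end lines
      5: {"end": True, "cmp": False, "adr": None}}
print(associative_search(rb, rf, (0x1000, 0x100), {"A1": 0x11, "A2": 0x00}))  # 5
```

Note that A1 is never read from `memory`, because its comparison-required flag is 0. Only A2 is fetched and compared, as in the walkthrough.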

The associative search operation of the second configuration example thus has the following characteristics. First, only one RB line can be a match line, so propagating the search to the next stage requires transmitting just one encoded result. The signals connecting the RB and the RF therefore need only one set of N lines carrying the encoded address. In the example of FIG. 1, by contrast, the RB allows multiple matches, so the signals connecting adjacent RB columns must be provided for every line (2^N lines). The second configuration example can thus greatly reduce the number of signal lines in the associative memory that makes up the RF/RB 2.
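
The wiring difference is the gap between a binary-encoded line number and one wire per line. The following illustration uses a hypothetical helper and takes a table of 65,536 (2^16) lines as an example size:

```python
import math


def search_result_wires(num_lines: int) -> tuple[int, int]:
    """Wires needed to pass one search stage's result to the next.

    Single match (second configuration): one encoded line number, N wires.
    Multi-match (FIG. 1): one wire per line, i.e. 2**N wires.
    """
    encoded = math.ceil(math.log2(num_lines))
    one_per_line = num_lines
    return encoded, one_per_line


print(search_result_wires(65536))   # (16, 65536)
```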

Furthermore, because only single matches are allowed during a search, items must be compared in the order in which they are referenced in the tree. Register values and memory contents must therefore be compared interleaved, in reference order.

Input patterns are registered in the RB and RF as a tree by linking each item through the Key to be referenced, and the last item of an input pattern is marked by the end flag. The number of items in an input pattern can therefore vary. It can be set flexibly according to the state of the instruction section to be registered in the reuse table. Because the number of items is not fixed, unused items no longer waste memory, which improves memory utilization.

In addition, because input patterns are registered as a tree, multiple input patterns can share a single line wherever their items overlap. This further improves memory utilization.

With this configuration, the memory that makes up the RF and RB is tall and narrow. For example, with a capacity of 2 Mbytes, it is 8 words wide and 65,536 lines tall.
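
The dimensions in this example can be checked with a little arithmetic. The calculation assumes 32-bit (4-byte) words and reads "2 Mbytes" as 2 × 1024 × 1024 bytes.

```python
WORD_BYTES = 4                     # assumption: 32-bit words
WORDS_PER_LINE = 8                 # width given in the text
CAPACITY_BYTES = 2 * 1024 * 1024   # "2 Mbytes", read as 2 MiB

line_bytes = WORD_BYTES * WORDS_PER_LINE
num_lines = CAPACITY_BYTES // line_bytes
address_bits = (num_lines - 1).bit_length()  # width of the encoded line address
print(line_bytes, num_lines, address_bits)  # 32 65536 16
```

Under these assumptions, the 16-bit result is the width N of the encoded line address passed between the RB and the RF.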

(Another example of the associative search operation)
In the example above, the UP, Alt., and DN items of the RF shown in FIG. 13 are not used; the RF would not need to provide them at all. The following describes a configuration and operation that use the UP, Alt., and DN items to speed up the associative search further.

First, FIG. 15(b) shows a state in which only the program counter (PC) and the register contents (Reg.) need to be compared: if they match, the section can be judged reusable without comparing any main memory values. In this state, PC and Reg. are registered in the Value field of RB line "01". RF line "01" has end flag "E", comparison-required flag "0", main memory address to compare "A1", and UP (the parent node number) "FF". RB line "03" has no Value and has Key "01". RF line "03" has end flag "E", comparison-required flag "0", main memory address to compare "A2", and UP "FF". Lines "05" and "07" of the RB and RF are registered in the same way, each with end flag "E" and comparison-required flag "0".

In this state, when execution of an instruction section is detected, PC and Reg. are input to the RB, and RB line "01" is selected as the match line. Its RB address, "01", is transmitted to the RF as the encoded result, and the RF line corresponding to key 01 is referenced. The end flag of that RF line is "E", so there is no further main memory address to compare. Its comparison-required flag is "0", so main memory address A1 need not be compared either.

Therefore, as shown in the tree structure of FIG. 15(a), once PC and Reg. are found to match at S1, the corresponding output values are output without comparing main memory addresses A1, A2, and A3, as at the node indicated by Tr1.

Suppose that, with the RF and RB in this state, main memory address A2 is written. When the input pattern was registered, A2 did not need to be compared. Now that its contents have changed, however, A2 must be compared. In this case the RF and RB are modified as shown in FIG. 16(b).

First, the Adr. column of the RF is searched using A2, the main memory address whose contents changed, as the key. This selects RF line "03". In that line, the comparison-required flag is set to "1" and the end flag "E" is cleared.

Next, the UP field of line "03" identifies line "01" as its parent node. In line "01", two fields are updated and the end flag "E" is cleared:

- Alt. (the main memory address to compare in preference to the next address) receives A2, the changed address.
- DN (the key needed for that preferential comparison) receives "03".
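
The update triggered by the write to A2 can be sketched as follows. This is a hypothetical Python model, not the circuit. `RFLine` and `on_store` are illustrative names, a linear scan stands in for the associative search of the Adr. column, and the UP value "FF" is represented by `None`. The example gives line "03" an UP of "01", because the step above uses UP to find line "01" as the parent of line "03".

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RFLine:
    """Hypothetical model of one RF line; field names follow FIG. 13."""
    adr: Optional[str] = None   # main memory address to compare next
    end: bool = False           # end flag "E"
    cmp: bool = False           # comparison-required flag
    up: Optional[str] = None    # parent node number (None stands in for "FF")
    alt: Optional[str] = None   # address to compare in preference to `adr`
    dn: Optional[str] = None    # key to use for that preferential comparison


def on_store(rf: Dict[str, RFLine], written: str) -> None:
    """Apply the update described for a write to a main memory address."""
    for key, line in rf.items():
        if line.adr != written:       # stands in for the CAM search of Adr.
            continue
        line.cmp = True               # this address must now be compared
        line.end = False
        if line.up is not None:       # let the parent jump straight here
            parent = rf[line.up]
            parent.alt = written
            parent.dn = key
            parent.end = False


# FIG. 15(b)-like starting state (values illustrative).
rf = {
    "01": RFLine(adr="A1", end=True),
    "03": RFLine(adr="A2", end=True, up="01"),
}
on_store(rf, "A2")
print(rf["01"].alt, rf["01"].dn, rf["03"].cmp)  # A2 03 True
```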

With the RF and RB rewritten as above, the associative search proceeds as follows. When an instruction section is detected, PC and Reg. are first input to the RB. The associative search compares these inputs with the instruction-section start addresses and register values registered in the Value column of the RB, and RB line "01" is selected as the match line.

Next, the RB address of the match line, "01", is transmitted to the RF as the encoded result, and the RF line corresponding to key 01 is referenced. In that line, the comparison-required flag is "0" and the main memory address to compare is A1. A1 therefore need not be compared.

The same line also has main memory address A2 registered in Alt. and "03" registered in DN. The value at main memory address A2 is therefore read from main memory 3 through Cache 7A. The RB is then searched for a line whose Value equals the value just read and whose Key equals "03", the key given in DN.

In the example of FIG. 16(b), two lines, "04" and "05", have Key "03". The value read from main memory 3 is "00", so line "05" is selected as the match line, and key 05 is transmitted to the RF as the encoded result. The RF line corresponding to key 05 has end flag "E", so all input items are judged to match and the instruction section is judged reusable. A "Select Output" signal is then issued from the line in which end flag E was detected. The output values stored in RO1 and RO2 for that line are output to register 6A and main memory 3.

In this associative search, the RF provides two extra fields:

- Alt.: the main memory address to compare in preference to the next address.
- DN: the key needed for that preferential comparison.

The search can therefore skip the step that uses the contents of main memory address A1 with key 01 and go directly to the step that uses the contents of A2 with key 03. This reduces the number of search steps and speeds up processing.
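
The walk through the RF and RB, including the comparison-required flag and the Alt./DN shortcut, can be modelled in a few lines of Python. The sketch is illustrative:

- The data layout, `rb_match` and `associative_search` are hypothetical.
- A list scan replaces the hardware CAM.
- The PC/Reg value and the Value of line "04" are made up.

The memory dictionary passed in has no entry for A1, so the example also confirms that the shortcut never reads A1.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RFLine:
    """Hypothetical model of the RF fields used by the search."""
    adr: Optional[str] = None   # next main memory address to compare
    end: bool = False           # end flag "E"
    cmp: bool = False           # comparison-required flag
    alt: Optional[str] = None   # address compared in preference to `adr`
    dn: Optional[str] = None    # key used together with `alt`


# RB rows: (line number, Key, Value); Key None marks an entry holding PC + Reg.
RBRow = Tuple[str, Optional[str], object]


def rb_match(rb: List[RBRow], key: Optional[str], value: object) -> Optional[str]:
    """CAM-style search of the RB on both Key and Value (single match assumed)."""
    hits = [line for line, k, v in rb if k == key and v == value]
    if len(hits) > 1:
        raise ValueError("multi-match is not allowed")
    return hits[0] if hits else None


def associative_search(rb, rf, mem, pc_reg) -> Optional[str]:
    """Return the RF line carrying "E" (reuse possible), or None (no match)."""
    line = rb_match(rb, None, pc_reg)
    while line is not None:
        entry = rf[line]
        if entry.end:
            return line                                # issue "Select Output"
        if entry.alt is not None:                      # Alt./DN shortcut
            line = rb_match(rb, entry.dn, mem[entry.alt])
        elif entry.cmp:                                # compare the next address
            line = rb_match(rb, line, mem[entry.adr])
        else:                                          # no comparison needed
            line = rb_match(rb, line, None)
    return None


# State modelled on FIG. 16(b); PC/Reg and line "04"'s Value are made up.
PC_REG = (0x1000, (1, 2))
FIG16B_RB: List[RBRow] = [
    ("01", None, PC_REG),
    ("03", "01", None),
    ("04", "03", 0x01),
    ("05", "03", 0x00),
]
FIG16B_RF: Dict[str, RFLine] = {
    "01": RFLine(adr="A1", alt="A2", dn="03"),
    "03": RFLine(adr="A2", cmp=True),
    "04": RFLine(end=True),
    "05": RFLine(end=True),
}
# mem has no "A1" entry: the search never reads it.
print(associative_search(FIG16B_RB, FIG16B_RF, {"A2": 0x00}, PC_REG))  # 05
```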

(Output value storage means)
The description above covered registering the input pattern of an instruction section in the RF and RB and performing the associative search. The following describes the means that store the output values to be output for reuse once the input pattern has been confirmed to match. As described with reference to FIG. 13, instruction section storage unit 2 includes RO1 and RO2 as output value storage means. They hold the output values to be output to main memory and/or registers when reuse is judged possible.

The output values could be obtained by consulting a storage means such as a RAM, indexed by the address output from the RF and RB. However, as with input patterns, the number of output items should preferably be variable. The way output values are stored therefore needs some care.

Input patterns are registered in the RF and RB as a tree. Reuse is judged possible at the line at the end of a tree path, that is, the line in which end flag E is registered. The output operation for reuse can therefore be performed by registering, in each such line, a pointer into an output value storage means that holds the values to be output.

However, in that scheme the storage position is determined from the pointer only after all input items have been confirmed to match. A pointer-to-position conversion step is then needed at that point, which slows processing.

This embodiment therefore provides two storage means, RO1 and RO2, as output value storage means. RO1 holds output values and their output addresses in one-to-one correspondence with the lines of the RF. When an RF line with end flag E is judged reusable, the corresponding RO1 line is selected and its output values are output.

However, when the output value storage means corresponds one-to-one with the RF lines in this way, RO1 also reserves memory for RF lines in which end flag E is not registered. Furthermore, because RO1 stores output values for every RF line with end flag E, the same contents may be stored redundantly in several places. RO1 is thus excellent for fast processing but poor in memory utilization.

To address this, the number of entries that can be registered in RO1 (that is, the number of output value and output address pairs) is kept small: two in the example of FIG. 13. Pairs that do not fit in RO1 are registered in RO2, whose storage areas are designated by pointers.

Because storage areas in RO2 are designated by pointers, almost no memory goes unused. When several output value and output address pairs are registered, they can be chained with successive pointers, so the number of pairs that can be registered is variable. Several RO1 lines can also hold pointers to the same position in RO2, which lets those lines share the information stored there. Redundancy in RO2 is therefore low.

With two output value storage means, RO1 and RO2, processing is fast when there are few output items because only RO1 is used. Many output items are handled by RO2, whose number of entries is variable. This configuration therefore achieves both fast processing and better memory utilization.
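
The division of labour between RO1 and RO2 can be sketched as a small Python model. It assumes RO1 is a table indexed by RF line number and RO2 is a pool of (pair, next-pointer) cells. `OutputStore` and its methods are hypothetical names. In the model:

- Outputs that fit in RO1's two slots are read without any pointer traversal.
- Overflow pairs are chained through RO2.
- `register_shared` shows two RO1 lines sharing one RO2 chain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]   # (output address, output value)
RO1_SLOTS = 2            # pairs per RO1 line, as in the FIG. 13 example


@dataclass
class RO1Line:
    pairs: List[Pair] = field(default_factory=list)  # read directly, no pointer
    ro2_ptr: Optional[int] = None                    # head of the overflow chain


class OutputStore:
    """Toy model: RO1 indexed by RF line number, RO2 a pointer-linked pool."""

    def __init__(self) -> None:
        self.ro1: Dict[str, RO1Line] = {}
        self.ro2: List[Tuple[Pair, Optional[int]]] = []  # (pair, next pointer)

    def register(self, rf_line: str, pairs: List[Pair]) -> None:
        head: Optional[int] = None
        for pair in reversed(pairs[RO1_SLOTS:]):   # chain overflow back to front
            self.ro2.append((pair, head))
            head = len(self.ro2) - 1
        self.ro1[rf_line] = RO1Line(list(pairs[:RO1_SLOTS]), head)

    def register_shared(self, rf_line: str, pairs: List[Pair], ro2_ptr: Optional[int]) -> None:
        """Point a new RO1 line at an RO2 chain that is already stored."""
        if len(pairs) > RO1_SLOTS:
            raise ValueError("too many pairs for one RO1 line")
        self.ro1[rf_line] = RO1Line(list(pairs), ro2_ptr)

    def outputs(self, rf_line: str) -> List[Pair]:
        entry = self.ro1[rf_line]
        result = list(entry.pairs)
        ptr = entry.ro2_ptr
        while ptr is not None:                     # follow RO2 only on overflow
            pair, ptr = self.ro2[ptr]
            result.append(pair)
        return result


store = OutputStore()
store.register("05", [(0x10, 1), (0x14, 2), (0x18, 3), (0x1C, 4)])
print(store.outputs("05"))  # [(16, 1), (20, 2), (24, 3), (28, 4)]
```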

(Registration processing for the instruction section storage unit)
The description above covered the operation when an instruction section is reused. The following describes how an instruction section that is judged not reusable has its inputs and outputs registered in the RF, RB, RO1, and RO2.

First, when execution of an instruction section is detected, the values of PC and Reg. are input to the RB. The associative search compares them with the instruction-section start addresses and register values registered in the Value column of the RB. If no entry in the Value column matches, the instruction section is judged not reusable and computing unit 5A executes it. The following values are then registered in the RB, RF, RO1 and, if necessary, RO2:

- the register input values;
- the main memory input values;
- the main memory output values;
- the register output values.

These are the values used up to the point where execution of the section completes. When registering in the RB and RF, each item is assigned to one line so as to form the tree structure described above. In the line where the last item of the input pattern is registered, the RF end flag is set to "E". This completes registration of the input pattern.

If, on the other hand, an entry matching the input PC and Reg. values is already registered in the Value column of the RB, the next item is compared in the same way as in the associative search described above. Comparison of the registered input patterns against the input pattern of the current instruction section continues in this way. When an item fails to match, new nodes are added from that point and the non-matching items are registered in the RB and RF. In the line where the last item of the input pattern is registered, the RF end flag is set to "E". This completes registration of the input pattern.

When registration of the input pattern is complete, the output values and output addresses are registered in the RO1 line corresponding to the RF line whose end flag was set to "E". Output items that do not fit in RO1 are registered in RO2 using pointers. This completes registration of the instruction section.
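
The registration procedure can be sketched as follows, assuming dictionaries in place of the associative memories and a plain list in place of RO1/RO2. `ReuseTable` and its field names are hypothetical. The walk:

1. follows existing nodes while items match;
2. allocates new lines from the first mismatch onward;
3. sets the end flag on the last line and attaches the outputs to that line.

The sketch also assumes that identical prefixes always reference the same next address, and raises an error if they do not.

```python
from typing import Dict, List, Optional, Tuple


class ReuseTable:
    """Toy RB/RF/output model for registering an executed instruction section."""

    def __init__(self) -> None:
        self.rb: Dict[Tuple[Optional[int], object], int] = {}  # (Key, Value) -> line
        self.rf: Dict[int, dict] = {}                           # line -> {"adr", "end"}
        self.out: Dict[int, List[Tuple[str, int]]] = {}         # stands in for RO1/RO2

    def _child(self, key: Optional[int], value: object) -> int:
        """Return the line for (Key, Value), adding a new node on a mismatch."""
        line = self.rb.get((key, value))
        if line is None:
            line = len(self.rb)
            self.rb[(key, value)] = line
            self.rf[line] = {"adr": None, "end": False}
        return line

    def register(self, pc_reg, mem_inputs, outputs) -> int:
        """mem_inputs: (address, value) pairs in the order they were referenced."""
        line = self._child(None, pc_reg)
        for adr, value in mem_inputs:
            node = self.rf[line]
            if node["adr"] is None:
                node["adr"] = adr                  # address this node compares next
            elif node["adr"] != adr:
                raise ValueError("reference order differs from the registered tree")
            line = self._child(line, value)
        self.rf[line]["end"] = True                # end flag "E" on the last item
        self.out[line] = list(outputs)             # outputs go with the "E" line
        return line


table = ReuseTable()
first = table.register((0x100, (1,)), [("A1", 5), ("A2", 0)], [("R1", 10)])
second = table.register((0x100, (1,)), [("A1", 5), ("A2", 7)], [("R1", 11)])
print(first, second, len(table.rb))  # 2 3 4
```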

(Prediction mechanism in the second configuration example)
In the second configuration example, the input/output pattern of an instruction section is held temporarily in RW 4A and 4B while the section executes. In the first configuration example, the input/output pattern was registered directly in the RB, so RW 4A and 4B were implemented as pointers to rows of the RB. In the second configuration example, however, the RF and RB form a tree structure, so RW 4A and 4B cannot point directly to an RB row. Instead of acting as pointers to RB rows, RW 4A and 4B therefore serve as real memory that temporarily holds the input/output pattern during execution of the instruction section.

Although not shown in FIG. 13, the second configuration example also provides an RF and RB like those shown in FIG. 1. They serve as a temporary storage area for history entries of input patterns when a given instruction section is executed repeatedly. In this case, however, the entry rows of that RB consist of only a few history storage rows.

When an instruction section is executed, its input elements are stored in RW 4A and 4B one after another. Once all input elements are present and execution has determined the output elements, the input/output pattern is stored in two places: the history storage rows, and the tree-structured input/output pattern storage mechanism described above.

When a given instruction section is executed repeatedly, its patterns are stored in the history storage rows in turn. Once a predetermined number of history entries has accumulated, prediction processing unit 2B performs prediction as described above. SSP 1B executes the section based on the prediction, and the results are stored in the tree-structured input/output pattern storage mechanism.
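
A minimal sketch of the history rows and the trigger for prediction follows, under stated assumptions. The number of rows, the class name `HistoryPredictor`, and the use of stride (constant-difference) prediction are illustrative choices. They stand in for the prediction described earlier and are not taken from this section. The input elements are named after the addresses A2 and R5 that appear in FIG. 12, and the predicted values would be handed to the SSP for advance execution.

```python
from collections import deque
from typing import Deque, Dict, List, Optional

HISTORY_ROWS = 3  # hypothetical number of history storage rows


class HistoryPredictor:
    """Toy history rows for one instruction section plus stride prediction."""

    def __init__(self, rows: int = HISTORY_ROWS) -> None:
        self.rows: Deque[Dict[str, int]] = deque(maxlen=rows)

    def record(self, inputs: Dict[str, int]) -> None:
        self.rows.append(dict(inputs))    # the oldest row drops out when full

    def predict(self, keys: List[str]) -> Optional[Dict[str, int]]:
        """Predict the next value of each key once the history rows are full."""
        if len(self.rows) < (self.rows.maxlen or 0):
            return None                   # not enough history yet
        guess: Dict[str, int] = {}
        for k in keys:
            vals = [row[k] for row in self.rows]
            strides = {b - a for a, b in zip(vals, vals[1:])}
            if len(strides) == 1:         # predict only a steady stride
                guess[k] = vals[-1] + strides.pop()
        return guess


p = HistoryPredictor()
for i in range(3):
    p.record({"A2": 4 * i, "R5": 100})
print(p.predict(["A2", "R5"]))  # {'A2': 12, 'R5': 100}
```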

(Application examples of the present invention)
Assume a program execution environment in which the global variable area and the stack area can be distinguished, for example by "LIMIT". Even then, applying the data processing apparatus of the present invention to other instruction set architectures requires a means of determining whether a variable in a stack frame is a local variable of the upper (calling) function or of the lower (called) function. In particular, when there are too few registers for the arguments and arguments are passed on the stack, the called function cannot make this distinction.

The SPARC processor used in this embodiment passes the first six words of arguments in general-purpose registers. Function/loop reuse is realized by exploiting two facts. First, functions that take more than six words of arguments are infrequent. Second, reuse simply becomes impossible once arguments overflow onto the stack. Many RISC processors that, like the SPARC processor, have 32 or more general-purpose registers can make the same judgment and thereby realize function/loop reuse as in the present invention.
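
The argument rule can be expressed as a pair of Python helpers. The six-word limit reflects the SPARC convention of passing the first six argument words in %o0 through %o5. The `limit` comparison is only a hypothetical stand-in for the "LIMIT" mechanism, whose exact definition is not given in this section.

```python
SPARC_ARG_WORDS_IN_REGS = 6  # SPARC passes the first six argument words in %o0-%o5


def reuse_still_possible(arg_words: int) -> bool:
    """Arguments that spill onto the stack end reuse for that call."""
    return arg_words <= SPARC_ARG_WORDS_IN_REGS


def area_of(address: int, limit: int) -> str:
    """Hypothetical LIMIT check: treat addresses at or above `limit` as stack."""
    return "stack" if address >= limit else "global"


print(reuse_still_possible(6), reuse_still_possible(7))  # True False
```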

As described above, the data processing apparatus of the present invention can be applied to the SPARC processor. It can likewise be applied to many RISC processors with 32 or more general-purpose registers, and to game consoles, mobile phones, information appliances and other devices equipped with such processors.

FIG. 1 is a diagram showing the reuse table implemented by the RF/RB of a data processing apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the schematic configuration of the data processing apparatus.
FIG. 3 is a flowchart showing the flow of processing when a decoded instruction is a function call instruction.
FIG. 4 is a flowchart showing the flow of processing when a decoded instruction is a function return instruction.
FIG. 5 is a flowchart showing the flow of processing when a decoded instruction is a taken backward branch.
FIG. 6 is a flowchart showing the flow of processing when a decoded instruction is a not-taken backward branch.
FIG. 7 is a diagram showing an example of nested functions and loops.
FIG. 8 is a diagram showing, for nested functions, the range over which the register inputs/outputs of an inner structure become register inputs/outputs of the outer structure.
FIG. 9 is a diagram showing the relationship between the RW and the RF/RB.
FIG. 10 is a diagram showing the actual registration state in the RB when an instruction section is executed.
FIG. 11 is a diagram showing an example of history registered in the RB when an instruction section is executed repeatedly.
FIG. 12 is a diagram showing the state of the input elements recorded in the RB as prediction entries when the prediction processing unit predicts the values at address A2 and address R5.
FIG. 13 is a diagram outlining the second configuration example of the RF/RB.
FIG. 14 is a diagram showing a specific example of the associative search operation in the RF/RB shown in FIG. 13.
FIG. 15(b) is a diagram showing another specific example of the associative search operation in the RF/RB shown in FIG. 13, and FIG. 15(a) shows the associative search operation of FIG. 15(b) as a tree structure.
FIG. 16(b) is a diagram showing yet another specific example of the associative search operation in the RF/RB shown in FIG. 13, and FIG. 16(a) shows the associative search operation of FIG. 16(b) as a tree structure.
FIG. 17(a) is a conceptual diagram of a structure in which function A calls function B, and FIG. 17(b) is a diagram showing the memory map of main memory when the program structure of FIG. 17(a) is executed.
FIG. 18 is a diagram outlining the arguments and frames in the memory map when function A calls function B.
FIG. 19 is a diagram showing a conventional reuse table for reusing a single function.
FIG. 20 is a diagram showing an example of an instruction section.
FIG. 21 is a simplified diagram showing the input addresses and input data, and the output addresses and output data, registered in the RB when the instruction section shown in FIG. 20 is executed.
FIG. 22 is a diagram showing the actual registration state in the RB.
FIG. 23 is a diagram showing an example of the history registered on the input side of the RB when the instruction section shown in FIG. 20 is executed repeatedly.
FIG. 24 is a diagram showing prediction results obtained by conventional input prediction.

Explanation of symbols

1A MSP
1B SSP
2 RF/RB (input/output storage means)
2A RB registration processing unit (discrimination processing means)
2B Prediction processing unit (prediction processing means)
3 Main memory (main storage means)
4A, 4B RW
5A, 5B Computing units (first and second calculation means)
6A, 6B Registers
7A, 7B Caches

Claims

A data processing apparatus that reads an instruction section from a main storage means and writes the result of arithmetic processing to the main storage means, the apparatus comprising:
a first calculation means that performs operations based on an instruction section read from the main storage means; a register used when the first calculation means reads from and writes to the main storage means; and an input/output storage means that stores the input patterns and output patterns of the execution results of a plurality of instruction sections,
the apparatus performing reuse processing in which, when the first calculation means executes an instruction section and the input pattern of that instruction section matches an input pattern stored in the input/output storage means, the output pattern stored in the input/output storage means in correspondence with that input pattern is output to the register and/or the main storage means,
wherein the data processing apparatus further comprises:
a discrimination processing means that, when the execution result of an instruction section by the first calculation means is stored in the input/output storage means, distinguishes, among the input elements included in the input pattern, the input elements whose values should be predicted from those that need not be predicted, and registers this distinction information in the input/output storage means;
a prediction processing means that, based on the distinction information, predicts changes in the values of the input elements to be predicted among the input elements stored in the input/output storage means; and
a second calculation means that executes the corresponding instruction section in advance based on the input elements predicted by the prediction processing means,
wherein the result of the advance execution of the instruction section by the second calculation means is stored in the input/output storage means, and
wherein, for each address of the register used for input, the discrimination processing means sets a constant flag, as distinction information indicating an input element to be predicted, both when the address is used as a stack pointer or a frame pointer and when the instruction that writes to the address is an instruction that sets a constant, and resets the constant flag for that address in all other cases.

In a data processing apparatus that reads a command section from a main storage means and performs a process of writing a result of arithmetic processing to the main storage means,
First calculation means for performing an operation based on an instruction section read from the main storage means, a register used when reading and writing to the main storage means by the first calculation means, and execution results of a plurality of instruction sections Input / output storage means for storing the input pattern and output pattern of
When the first calculation means executes an instruction section, if the input pattern of the instruction section matches the input pattern stored in the input / output storage means, the first calculation means corresponds to the input pattern. Reuse processing for outputting the output pattern stored in the entry output storage means to the register and / or the main storage means,
wherein the data processing apparatus further comprises:
distinction processing means for distinguishing, when the execution result of an instruction section by the first calculation means is stored in the input/output storage means, the input elements that should be predicted from the input elements that need not be predicted among the input elements included in the input pattern, and for registering this distinction information in the input/output storage means;
prediction processing means for predicting, on the basis of the distinction information, a change in the value of each input element to be predicted among the input elements stored in the input/output storage means; and
second calculation means for executing the corresponding instruction section in advance on the basis of the input element predicted by the prediction processing means,
wherein the pre-execution result of the instruction section by the second calculation means is stored in the input/output storage means;
wherein, when the distinction processing means newly stores an input element in the input/output storage means, it resets a change flag for the address of that input element. The change flag serves as the distinction information indicating an input element to be predicted. When a store instruction is executed for that address after the input element has been stored in the input/output storage means, the distinction processing means sets the change flag for that address.
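The reuse lookup and the change-flag bookkeeping described in the claim above can be sketched in Python. Several simplifications are assumptions of this illustration, not details taken from the claim:

- Registers and memory share a single address space, modelled as a dictionary.
- An instruction section is identified by its start address.
- The input/output storage is an unbounded list.

The names `ReuseEntry`, `ReuseTable`, `register`, `on_store`, and `lookup` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReuseEntry:
    """One input/output-storage entry for a single instruction section."""
    start_pc: int
    inputs: dict[int, int]   # address -> value read when the section ran
    outputs: dict[int, int]  # address -> value the section wrote
    # Change flag per input address: reset on registration, set on a later store.
    change_flags: dict[int, bool] = field(default_factory=dict)


class ReuseTable:
    def __init__(self) -> None:
        self.entries: list[ReuseEntry] = []

    def register(self, start_pc: int, inputs: dict[int, int],
                 outputs: dict[int, int]) -> None:
        entry = ReuseEntry(start_pc, dict(inputs), dict(outputs))
        # Newly stored input elements start with their change flag reset.
        entry.change_flags = {addr: False for addr in inputs}
        self.entries.append(entry)

    def on_store(self, addr: int) -> None:
        # A store to an already-registered input address sets its change flag.
        for entry in self.entries:
            if addr in entry.change_flags:
                entry.change_flags[addr] = True

    def lookup(self, start_pc: int,
               memory: dict[int, int]) -> Optional[dict[int, int]]:
        # Reuse succeeds only if every stored input value matches current state;
        # the matching output pattern is then returned instead of re-executing.
        for entry in self.entries:
            if entry.start_pc == start_pc and all(
                memory.get(a) == v for a, v in entry.inputs.items()
            ):
                return dict(entry.outputs)
        return None
```

A store that changes a registered input makes the next lookup miss. It also marks that address as a candidate for value prediction.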

The data processing apparatus according to claim 1, wherein, when the distinction processing means newly stores an input element in the input/output storage means, it resets a change flag for the address of that input element. The change flag serves as the distinction information indicating an input element to be predicted. When a store instruction is executed for that address after the input element has been stored in the input/output storage means, the distinction processing means sets the change flag for that address.

The data processing apparatus according to claim 1, wherein, when the distinction processing means newly stores an input element in the input/output storage means, it resets a history flag for the address of that input element. The history flag serves as the distinction information indicating an input element to be predicted. When a load instruction to that address is executed and the constant flag is set for the register address from which that address was generated, the distinction processing means sets the history flag for that address.
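The history-flag rule in the claim above can be illustrated with a minimal Python sketch. It assumes that each load names a single base register from which its address was generated. It also assumes the per-register constant flags are already available as a dictionary, for example maintained as in claim 1. The class and method names are hypothetical. The claim does not say when the history flag is cleared other than at registration, so the sketch only ever sets it on loads.

```python
class HistoryFlagTracker:
    """Hypothetical model of history flags keyed by input-element address."""

    def __init__(self, constant_flags: dict[str, bool]) -> None:
        # Per-register constant flags, assumed to be maintained elsewhere.
        self.constant_flags = constant_flags
        self.history_flags: dict[int, bool] = {}

    def register_input(self, addr: int) -> None:
        # A newly stored input element starts with its history flag reset.
        self.history_flags[addr] = False

    def on_load(self, addr: int, base_register: str) -> None:
        # Set the flag when a load hits a registered address whose
        # address-generating register carries the constant flag.
        if addr in self.history_flags and self.constant_flags.get(base_register, False):
            self.history_flags[addr] = True
```

Loads through a stack-pointer-like base register therefore mark the address. Loads through an ordinary computed register leave it unmarked.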

The data processing apparatus according to claim 4, wherein:
when the distinction processing means newly stores an input element in the input/output storage means, it resets a change flag, serving as the distinction information, for the address of that input element, and, when a store instruction is executed for that address after the input element has been stored in the input/output storage means, it sets the change flag for that address; and
the prediction processing means predicts a change in the input element only for those addresses, among the addresses of the input elements stored in the input/output storage means, for which both the change flag and the history flag are set.
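The selection rule in the claim above can be expressed as a short filter, sketched in Python under the assumption that both flag sets are stored as dictionaries keyed by address. The function name `select_prediction_targets` is hypothetical. An address qualifies only when both of its flags are set.

```python
def select_prediction_targets(change_flags: dict[int, bool],
                              history_flags: dict[int, bool]) -> list[int]:
    """Return the input addresses whose change flag and history flag are both set."""
    # Addresses missing from history_flags are treated as having the flag reset.
    return sorted(
        addr for addr, changed in change_flags.items()
        if changed and history_flags.get(addr, False)
    )
```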

The data processing apparatus according to claim 1 or 2, wherein the prediction processing means predicts a change in the value of an input element only for input elements that are stored in the input/output storage means and whose value change in the input element's history is not zero.
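The claim does not specify how the change in value is predicted. One common technique is stride prediction, which extrapolates the most recent difference between successive values. The Python sketch below uses that technique as an assumption, not as the patented predictor. It declines to predict when the last observed change is zero, which mirrors the restriction in the claim above. The function name `predict_next` is hypothetical.

```python
from typing import Optional


def predict_next(history: list[int]) -> Optional[int]:
    """Stride-prediction sketch: extrapolate the last observed change.

    Returns None when there is too little history or when the last
    change is zero, i.e. no prediction is made for unchanging values.
    """
    if len(history) < 2:
        return None
    delta = history[-1] - history[-2]
    if delta == 0:
        return None
    return history[-1] + delta
```

For a loop counter seen as `[4, 8]`, the sketch predicts `12`. The second calculation means could then pre-execute the instruction section with that value.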

A data processing program for causing a computer to execute processing performed by each means included in the data processing apparatus according to claim 1.

A computer-readable recording medium on which the data processing program according to claim 7 is recorded.