JP3872809B2

JP3872809B2 - Method and apparatus for enhancing scheduling in advanced microprocessors

Info

Publication number: JP3872809B2
Application number: JP2005334136A
Authority: JP
Inventors: Guillermo J. Rozas; Godfrey P. D'Souza; Charles R. Price; Paul S. Serris
Original assignee: Transmeta Corporation
Priority date: 1999-06-14
Filing date: 2005-11-18
Publication date: 2007-01-24
Anticipated expiration: 2020-06-12
Also published as: JP2003502754A; US9081563B2; CA2377164A1; EP1194855A2; US8209517B1; EP1194855B1; JP2006099800A; DE60042824D1; CN1202480C; EP1194855A4; KR20020022068A; US7089404B1; CN1355902A; WO2000077965A3; CA2377164C; WO2000077965A2; KR100758367B1; ATE441147T1; US20120246453A1; KR20060002031A

Abstract

Apparatus and a method for causing scheduler software to produce code which executes more rapidly by ignoring some of the normal constraints placed on its scheduling operations and simply scheduling certain instructions to run as fast as possible, raising an exception if the scheduling violates a scheduling constraint, and determining steps to be taken for correctly executing each set of instructions about which an exception is raised.

Description

The present invention relates to computer systems and, more particularly, to methods and apparatus for accelerating the reordering of instructions in an improved microprocessor.

Recently, a new microprocessor was developed which combines a simple but very fast host processor (called a "morph host") and software (called "code morphing software") to execute application programs designed for a processor different from the morph host processor, at a rate which cannot be attained by the processor for which the programs were designed (the target processor). The morph host processor executes the code morphing software to translate the application programs into morph host processor instructions which accomplish the purpose of the original target software. As the target instructions are translated, they are both executed and stored in a translation buffer, where they may be accessed without further translation. Although the initial translation and execution of a program is slow, once it has been translated, many of the steps normally required to execute a program in hardware are eliminated.

In order to be able to execute programs designed for other processors at a rapid rate, the morph host processor includes a number of hardware enhancements. One of these enhancements is a gated store buffer which resides between the host processor and the translation buffer. A second enhancement is a set of host registers which store the state of the target machine at the beginning of any sequence of target instructions being translated. Sequences of target instructions spanning known states of the target processor are translated into morph host instructions and placed in the translation buffer to await execution. If the translated instructions execute without raising an exception, the target state at the beginning of the sequence of instructions is updated to the target state at the point at which the sequence completed.

If an exception occurs during execution of a sequence of translated host instructions, processing stops, and the entire operation may be returned, or rolled back, to the beginning of the sequence of target instructions, at which point a known state of the target machine exists. This allows very rapid and accurate handling of exceptions while instructions are dynamically translated and executed, a result never accomplished by the prior art.

Further speed is achieved when the new microprocessor is run with a scheduler which is part of the code morphing software. As instructions are translated, the scheduler reorders and reschedules them from the naive order produced by the raw translation into an order which gives the same result but allows faster execution. The scheduler attempts to place certain instructions ahead of others, or to run instructions together, so that the rescheduled software takes less time to execute. The function of a scheduler is subject to a number of constraints, the most basic of which is that the rescheduled program must still produce the same final result as the original program.

As one example, some programs contain sequences of instructions which must execute without interruption in order to produce a correct result. A scheduler cannot disturb such a sequence without disturbing the result produced. Many processors provide hardware interlocks to guarantee that such sequences do in fact run without interruption. The need to protect such sequences places special constraints on processors without hardware interlocks, such as the advanced morph host processor discussed here. The software must somehow recognize such sequences and ensure that they run without interruption.

Control dependencies are another of the traditional constraints on reordering which a scheduler faces. Control dependencies relate to branch instructions. A scheduler must ensure that the program still executes correctly when it reorders instructions which appear before and after a branch.

Another dependency affects the reordering of loads with respect to stores. For example, if updated data is stored to a memory address and is then to be manipulated in a register operation, the data at that address must not already be held in a register at the time the store takes place. Otherwise, the data in the register may be stale.
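The load/store dependency described above can be illustrated with a minimal sketch. The operation encoding, the `run` helper, and the addresses and values are assumptions made only for this illustration; they do not appear in the specification.

```python
def run(ops, memory):
    """Execute ('store', addr, value) and ('load', addr, reg) operations in order."""
    regs = {}
    mem = dict(memory)  # copy so each run starts from the same memory image
    for op in ops:
        if op[0] == "store":
            _, addr, value = op
            mem[addr] = value
        else:
            _, addr, reg = op
            regs[reg] = mem[addr]
    return regs, mem


initial = {0x100: 7}

# Original order: the store updates the address, then the load reads it.
original = [("store", 0x100, 42), ("load", 0x100, "r1")]
# Reordered: the load is hoisted above the store to the same address.
hoisted = [("load", 0x100, "r1"), ("store", 0x100, 42)]

regs_original, _ = run(original, initial)
regs_hoisted, _ = run(hoisted, initial)
```

With the load hoisted, `r1` holds the old contents of the address, which is exactly the stale-register case a scheduler has to avoid or detect.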

All of these constraints make a typical scheduler function very conservatively, and the code it generates is consequently slower.

Conventional schedulers do their best to determine which instructions depend on one another in order to reorder them. A typical scheduler can determine that certain operations depend on other operations in some way and that certain operations do not depend on other operations at all, but it can determine nothing about the remaining operations. Such a scheduler treats operations which depend on other operations conservatively, by placing them in the normal naive order in which they occurred. It reorders operations which do not depend on any other operations as it sees fit. Finally, the scheduler treats all operations whose dependencies it cannot determine as though they depended on one another, so these too are handled conservatively and slowly.

It is desirable to provide circuitry and software which allow the scheduler of an advanced processor to generate code which executes at an accelerated rate.

The present invention is realized by apparatus and a method which cause scheduler software to produce code that executes more rapidly by ignoring some of the normal constraints placed on its scheduling operations and simply scheduling certain instructions to run as fast as possible, raising an exception if the scheduling violates a scheduling constraint, and determining the steps to be taken to correctly execute each set of instructions for which an exception is raised.

These and other objects and features of the invention will be better understood by reference to the detailed description which follows, taken together with the drawings, in which like elements are referred to by like designations throughout the several views.

FIG. 1 illustrates a new microprocessor 10 which combines an enhanced hardware processing portion (called a "morph host"), which is much simpler than state-of-the-art microprocessors, with an emulating software portion (called "code morphing software"). The two portions function together to carry out, in an advanced microprocessor, operations normally accomplished by hardware alone. The new microprocessor 10 is faster than prior art microprocessors, is capable of running all of the software for all of the operating systems which can be run by a large number of families of prior art microprocessors, and yet is less expensive than prior art microprocessors.

The microprocessor 10 includes a morph host processor 11 designed to execute code morphing software 12 in order to run application programs designed for a different target processor. The morph host 11 includes hardware enhancements especially adapted to allow the acceleration techniques provided by the code morphing software 12 to be used efficiently. The morph host processor includes hardware enhancements which assist in accelerating operation and which provide the state of the target computer immediately when an exception or error occurs. The code morphing software includes, among other things, software which translates the instructions of a target program into morph host instructions, schedules and optimizes the host instructions, and responds to exceptions and errors when necessary by rolling back execution to the last point at which execution was known to be correct and replacing working state with the correct target state at that point, so that correct retranslation of the target code can take place. The code morphing software also includes various processes for enhancing the speed of processing. The block diagram of FIG. 2 illustrates in detail exemplary hardware of the morph host 11 which implements the features discussed here.

As illustrated in the diagram of FIG. 3 (which shows the operation of the main loop of the code morphing software 12), the code morphing software, in combination with the enhanced morph host, translates target instructions into instructions for the morph host on the fly and caches those host instructions in a memory data structure (called a "translation buffer"). Once a target instruction has been translated, it may be recalled from the translation buffer and executed without any of the many steps a prior art hardware microprocessor requires each time it executes a target instruction: determining which primitive instructions are needed to implement the target instruction, addressing each primitive instruction, fetching each primitive instruction, optimizing the sequence of primitive instructions, allocating assets to each primitive instruction, reordering the primitive instructions, and executing each step of each sequence of primitive instructions involved.
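The translate-once, reuse-thereafter behaviour of the translation buffer can be sketched as a simple cache. The class name, the `lookup` method, and the toy translator below are hypothetical and serve only to show the caching idea; the specification does not describe the data structure at this level.

```python
class TranslationBuffer:
    """Caches host translations keyed by target address (illustrative only)."""

    def __init__(self, translate):
        self._translate = translate      # function: target address -> host ops
        self._cache = {}
        self.translations_performed = 0  # counts how often translation really ran

    def lookup(self, target_address):
        # Translate only on the first request; later requests reuse the cache.
        if target_address not in self._cache:
            self._cache[target_address] = self._translate(target_address)
            self.translations_performed += 1
        return self._cache[target_address]


def toy_translate(target_address):
    # Stand-in for the real translator: returns a list of host operation names.
    return [f"host_op_for_{target_address:#x}"]


buffer = TranslationBuffer(toy_translate)
first = buffer.lookup(0x4000)   # slow path: translation happens here
second = buffer.lookup(0x4000)  # fast path: recalled from the buffer
```

The second lookup returns the cached translation without invoking the translator again, which is the saving the paragraph above describes.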

One of the primary problems of prior art emulation techniques has been their inability to handle gracefully the exceptions generated during execution of a target program. Some exceptions generated while running the target application are directed to the target operating system, and the correct target state must be available at the time of any such exception for the exception and the instructions which follow to execute properly. Other exceptions may be generated by the emulator in order to detect particular target operations which have been replaced by some particular host function. The host processor may also generate exceptions while executing host instructions derived from target instructions. All of these exceptions can occur either while the emulator is attempting to translate target instructions into host instructions or while the host translations are being executed by the host processor. Exceptions directed to the target operating system are especially difficult because they always require knowledge of the state of the target processor.

In order to recover from these exceptions efficiently, the enhanced morph host includes a number of hardware improvements. These improvements include a gated store buffer (see FIG. 5). The gated store buffer holds working memory state changes on the "uncommitted" side of a hardware "gate" and official memory state changes on the "committed" side of the hardware gate, from which the committed stores "drain" to main memory. A "commit" operation transfers stores from the uncommitted side of the gate to the committed side. If an exception occurs, a "rollback" operation discards the uncommitted stores in the gated store buffer.
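The commit, rollback, and drain behaviour of the gated store buffer can be modelled in a few lines. This is a software sketch under the assumption that stores are simple (address, value) pairs; the class and method names are invented for the illustration and say nothing about the actual circuit of FIG. 5.

```python
class GatedStoreBuffer:
    """Toy model of a store buffer with an uncommitted and a committed side."""

    def __init__(self):
        self.uncommitted = []  # working-memory stores, not yet official
        self.committed = []    # official stores waiting to drain
        self.main_memory = {}

    def store(self, addr, value):
        # Every new store lands on the uncommitted side of the gate.
        self.uncommitted.append((addr, value))

    def commit(self):
        # Move all uncommitted stores across the gate.
        self.committed.extend(self.uncommitted)
        self.uncommitted.clear()

    def rollback(self):
        # Discard uncommitted stores; committed stores are unaffected.
        self.uncommitted.clear()

    def drain(self):
        # Committed stores flow out to main memory in order.
        for addr, value in self.committed:
            self.main_memory[addr] = value
        self.committed.clear()


gsb = GatedStoreBuffer()
gsb.store(0x10, 1)
gsb.commit()        # the store to 0x10 becomes official
gsb.store(0x20, 2)
gsb.rollback()      # an exception occurred: the store to 0x20 is discarded
gsb.drain()
```

Only the committed store reaches main memory; the store issued after the last commit disappears on rollback.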

The hardware enhancements also include a large number of additional processor registers (see FIG. 4). In addition to allowing register renaming to lessen the problem of instructions trying to use the same hardware resources, the additional registers allow a set of host or working registers to be maintained for processing the host instructions and a set of target registers to be maintained which hold the official state of the target processor for which the target application was originally created. The target registers are connected to their working register equivalents through a dedicated interface. This allows a commit operation to transfer the contents of all working registers quickly to the official target registers, and allows an operation called "rollback" to transfer the contents of all official target registers quickly back to their working register equivalents.
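The relationship between working registers and official target registers can be sketched in the same spirit. The `RegisterFile` class and the register names `r0` and `r1` are hypothetical; in hardware the copies are made through the dedicated interface rather than by copying dictionaries.

```python
class RegisterFile:
    """Toy model of paired working and official (target) registers."""

    def __init__(self, names):
        self.working = {name: 0 for name in names}
        self.official = dict(self.working)

    def commit(self):
        # Working state becomes the new official target state.
        self.official = dict(self.working)

    def rollback(self):
        # Working state is restored from the last committed official state.
        self.working = dict(self.official)


regs = RegisterFile(["r0", "r1"])
regs.working["r0"] = 5
regs.commit()              # official state now has r0 == 5
regs.working["r0"] = 99    # speculative updates after the commit
regs.working["r1"] = 1
regs.rollback()            # an exception occurred: speculative updates are lost
```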

The additional official registers and the gated store buffer allow the state of memory and the state of the target registers to be updated together once one or a group of target instructions has been translated and executed without error. The code morphing software chooses the updates so that they occur on integral target instruction boundaries. If the primitive host instructions generated by translating a series of target instructions are run by the host processor without raising an exception, the working memory stores and the working register state generated by those instructions are transferred to official memory and to the official target registers.

On the other hand, if an exception occurs while processing host instructions at a point which is not on a target instruction boundary, the original state held in the target registers at the last update (or commit) may be recalled to the working registers, and the uncommitted memory stores in the gated store buffer may be discarded. Then, if the exception is a target exception, the target instructions which caused it may be retranslated one at a time and executed in serial sequence, as they would be executed by a target microprocessor. As each target instruction executes correctly without error, the state of the target registers may be updated and the data in the store buffer gated to memory. Then, when the exception occurs again while running the host instructions, the correct state of the target processor is held by the target registers of the morph host and by memory, and the operation may be handled correctly without delay. Each new translation generated by this corrective translation may be cached for future use as it is translated, or it may be discarded if the event occurs only once or rarely, as a page fault does. These features combine to help the microprocessor created by the combination of the code morphing software and the morph host execute instructions more rapidly than the processor for which the software was originally written.
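The recovery sequence described above (run the whole translation, roll back on an exception, then step one target instruction at a time with a commit after each) can be sketched as follows. Target instructions are modelled as plain Python callables acting on a state dictionary, and `TargetException`, `execute_block`, `inc`, and `fault` are names invented for this sketch.

```python
class TargetException(Exception):
    """Stands in for an exception that must be delivered to the target OS."""


def execute_block(instructions, official_state):
    """Return (official state, index of the faulting instruction or None)."""
    # Fast path: run the whole block on a working copy, commit only at the end.
    working = dict(official_state)
    try:
        for instr in instructions:
            instr(working)
        return working, None
    except TargetException:
        pass  # rollback: the working copy is simply discarded

    # Slow path: execute one instruction at a time, committing after each.
    state = dict(official_state)
    for index, instr in enumerate(instructions):
        working = dict(state)
        try:
            instr(working)
        except TargetException:
            return state, index  # precise state just before the faulting instruction
        state = working          # commit at the target instruction boundary
    return state, None


def inc(state):
    state["x"] += 1


def fault(state):
    state["x"] = -1  # partial effect that must never become official
    raise TargetException("page fault")


final_state, faulting_index = execute_block([inc, inc, fault, inc], {"x": 0})
clean_state, clean_index = execute_block([inc, inc], {"x": 0})
```

In the faulting run, the official state reflects the two instructions that completed, and the partial effect of the faulting instruction is not visible.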

In addition to simply translating the instructions, caching the translated instructions, and executing each translation whenever that set of instructions needs to be executed, the code morphing software also reorders, optimizes, and reschedules the different translations. One optimization process links the various sequences of translated host instructions to one another as the probable branches to be taken become apparent during execution. Eventually, references to the main loop in the branch instructions of the host instructions are almost completely eliminated. When this condition is reached, the time required to fetch target instructions, decode target instructions, fetch the primitive instructions which make up the target instructions, optimize those primitive operations, reorder the primitive operations, and reschedule those primitive operations before running any host instruction is eliminated. Thus, the work required to run any set of target instructions using the improved microprocessor is drastically reduced.

As pointed out above, the reordering operation uses a scheduler which attempts to choose a better order of execution for instructions that are in a correct but naive order. One problem with schedulers is that their function is subject to many constraints. The most basic constraint is that the program, when executed, must still produce the same final result as the original sequence of instructions. All of these constraints force a typical scheduler to function very conservatively, and the code it generates consequently executes more slowly.

For example, to guarantee that correct results are produced, a typical scheduler acts deterministically, sorting instructions into those which have no dependencies, those which have dependencies, and those for which the existence of dependencies is unknown. All instructions which have dependencies, and all instructions for which the existence of dependencies is unknown, are treated as though dependencies exist and are not reordered. Only instructions known to have no dependencies are reordered. A scheduler which follows these guidelines generates code which executes slowly.

Other constraints relate to particular embodiments of the morph host processor. One embodiment of the morph host processor is designed to function rapidly by eliminating special circuitry which slows operation. This embodiment of the morph host processor is designed without any hardware locking mechanism. A hardware locking mechanism is circuitry intended to guarantee that all of the steps in a particular sequence of instructions execute without interruption. Without a locking mechanism, the scheduler must function strictly, processing all of the steps of such a sequence in their original translated order without reordering, in order to guarantee that the processor produces the correct result from the sequence.

The scheduler of the present invention is a software portion of the code morphing software. Unlike prior art hardware schedulers, the software scheduler uses speculative techniques in reordering instructions. The scheduler speculates that the fastest possible operation is the desired result for certain operations and reorders the instructions to achieve that result. The morph host is provided with hardware which raises an exception when the speculation chosen is incorrect. In most cases the speculation is correct, so the overall result is much faster operation. When the speculation is incorrect, however, the exception typically causes the software to use the gated store buffer and the target registers to roll back the operation to the beginning of the speculative sequence, where correct state is known.

In contrast to the deterministic strategy used by prior art schedulers, the scheduler of the present invention uses probabilistic guidelines in selecting categories of instructions for reordering. The improved scheduler sorts the sequences of instructions generated by translation from a set of target instructions into four categories (see FIG. 6): instruction sequences which have no dependencies, instruction sequences which have known dependencies, instruction sequences which probably have no dependencies, and instruction sequences which probably have dependencies. As in the prior art, instruction sequences known to have no dependencies may be reordered freely by the scheduler. Instruction sequences with known dependencies are processed in the sequential order provided by the translator.

Instructions which probably have no dependencies, however, are treated as though they in fact have none and are reordered to execute as fast as possible. The morph host is provided with hardware means which detect incorrect reordering and raise an exception when a dependency does in fact exist. The scheduler cooperates with this hardware means to check each reordered instruction, so that cases in which it cannot execute correctly are found and an exception is raised when the sequence of operations does not execute correctly. Such an exception allows the scheduler to discard the earlier reordering which caused the exception and to handle the sequence conservatively or in some other, more appropriate manner.

Instructions which probably have dependencies, on the other hand, may be handled either aggressively or conservatively. If handled aggressively, they are treated like instructions which probably have no dependencies: they are reordered to execute as fast as possible, and the hardware means provided in the morph host are used to detect incorrect reordering and raise an exception. If handled conservatively, they are processed in the sequential order provided by the translator. Conservative handling is usually the faster of the two, because raising a large number of exceptions slows execution significantly.
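The handling of the four categories can be summarized as a small decision function. This is only a sketch of the policy as described in the text; the `Dependence` enumeration, the `plan` function, and the `aggressive` flag are hypothetical names, and how the scheduler actually estimates the probability of a dependency is not modelled.

```python
from enum import Enum


class Dependence(Enum):
    NONE = "known to have no dependency"
    KNOWN = "known dependency"
    PROBABLY_NONE = "probably no dependency"
    PROBABLY_SOME = "probably a dependency"


def plan(category, aggressive=False):
    """Return (reorder, needs_hardware_check) for an instruction sequence."""
    if category is Dependence.NONE:
        return True, False    # reorder freely, nothing to verify
    if category is Dependence.KNOWN:
        return False, False   # keep the translator's sequential order
    if category is Dependence.PROBABLY_NONE:
        return True, True     # speculate, let the hardware trap mistakes
    # PROBABLY_SOME: speculate only when aggressive handling is selected.
    return (True, True) if aggressive else (False, False)


default_choice = plan(Dependence.PROBABLY_SOME)
aggressive_choice = plan(Dependence.PROBABLY_SOME, aggressive=True)
```

A prior art scheduler corresponds to reordering only the `NONE` category; the speculative categories are where the hardware check becomes necessary.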

In one embodiment of the invention, circuitry such as that shown in FIG. 7 is added to the host processor. This circuitry is used to store the memory addresses accessed by instructions which the scheduler has reordered, by means of special "load-and-protect" and "store-and-protect" operations. A load-and-protect or store-and-protect operation may be used whenever an instruction is reordered, and has the effect of placing the memory address accessed by the reordered instruction in one of a plurality of registers 71 of the morph host designed to serve as protection registers. In one embodiment, eight protection registers 71 are provided. The load-and-protect or store-and-protect instruction designates the particular protection register to be used for the operation.

Throughout this specification, the term "memory address" is used in describing the load-and-protect and store-and-protect instructions, but the term refers to any of a number of possible arrangements for determining the region of memory to be protected. The term "memory address" is used to mean a descriptor of the memory addresses to be protected. For example, in a system in which memory is byte addressable, one embodiment of the invention uses a starting memory address together with a number of bits equal to the number of bytes in the address region, each bit indicating the protection status of one of those bytes. Another embodiment with similar addressing uses a starting memory address and a length, while a third embodiment uses individual byte addresses and an individual comparator for each byte address.
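Two of the descriptor forms mentioned above can be compared with a short sketch of the overlap test each one implies. The function names are hypothetical, and the convention that bit i of the mask covers the byte at start + i is an assumption made for the illustration; the specification does not fix a bit order.

```python
def ranges_overlap(start_a, length_a, start_b, length_b):
    """Overlap test for descriptors made of a starting address and a length."""
    return start_a < start_b + length_b and start_b < start_a + length_a


def masked_bytes(start, mask):
    """Byte addresses protected by a start address and a per-byte bitmask.

    Assumption: bit i of the mask covers the byte at address start + i.
    """
    return {start + i for i in range(mask.bit_length()) if (mask >> i) & 1}


def masks_overlap(start_a, mask_a, start_b, mask_b):
    """Overlap test for descriptors made of a starting address and a bitmask."""
    return bool(masked_bytes(start_a, mask_a) & masked_bytes(start_b, mask_b))


# Two accesses within the same four-byte region that touch different bytes.
coarse = ranges_overlap(0x100, 4, 0x100, 4)         # length-based view
fine = masks_overlap(0x100, 0b0101, 0x100, 0b1010)  # byte-mask view
```

The length-based form reports a conflict for the whole region, while the byte-mask form can tell that the two accesses touch disjoint bytes, which is the byte-level precision attributed to the first embodiment.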

In one example of operation, a sequence of instructions includes, in order, a first store instruction STORE1, a second store instruction STORE2, and a load instruction LOAD1. The scheduler, presuming that the reordering is unlikely to cause incorrect operation, reorders these instructions so that in the reordered sequence the load instruction comes first, the second store instruction second, and the first store instruction third. To do this, the scheduler uses a load-and-protect operation, which places the load data in one of the general-purpose registers 72 and places the address of the memory location from which the load data was obtained in the protection register 71 designated by the instruction. Because the software scheduler knows which instructions must be checked to determine whether the reordering has caused an error, it places an indication (for example, a bit in a bitmask) in each subsequent instruction which may be affected by the reordering (here, the STORE1 and STORE2 instructions ahead of which the load was placed) to identify the particular protection register holding the protected memory address. The position of this indication (one of eight bits if eight protection registers are used for the trapping function) indicates that execution of the instruction depends on whether the address to which the store instruction writes overlaps the memory address held in the designated protection register 71.

Similarly, the scheduler uses a “store and protect” operation to store the data of the STORE2 instruction to memory and to place the address of the memory location to which the data is stored in the protection register 71 designated by the store and protect instruction. The scheduler also places an indication in the bitmask of each instruction that may be affected by the reordering (here, only the STORE1 instruction), identifying the particular protection register that holds this protected memory address. Finally, the scheduler uses a normal store instruction for the last instruction, STORE1.

When the instruction sequence is executed, the host hardware uses comparison circuit 73 to determine, for each of these three instructions, whether the memory address of the instruction overlaps any part of the data at a memory address stored in one of the protection registers 71, and generates an exception if it does. Thus the LOAD1 operation (now a load and protect) writes its memory address to a protection register 71 but checks no protection register, because none of its indicators is set to designate one. The STORE2 operation (now a store and protect) writes its memory location to a different protection register 71 and checks the protection register 71 used by the LOAD1 instruction for overlap between the two memory locations. Finally, the STORE1 operation (augmented with protection register indicators but still a plain store) checks the protection registers for both the LOAD1 and STORE2 instructions, looking for overlap between its memory address and the memory addresses of the LOAD1 and STORE2 instructions. In the first and third embodiments described above, the comparison allows protection to be applied precisely at the byte level.
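The STORE1, STORE2, LOAD1 example can be modeled in software as a rough sketch. Everything below is an assumption made for illustration: the `Host` class, its method names, the `(start, length)` form of a protected region, and the fixed 4-byte accesses are hypothetical, and the actual mechanism is hardware (protection registers 71 and comparison circuit 73), not Python.

```python
class AliasException(Exception):
    """Stands in for the exception raised when an access overlaps a protected region."""

class Host:
    """Toy model: eight protection registers, each holding (start, length) or None."""
    def __init__(self, num_protect=8):
        self.memory = {}
        self.protect = [None] * num_protect

    def _check(self, addr, length, check_mask):
        # Compare the access only against the registers selected by the bitmask.
        for i, region in enumerate(self.protect):
            if (check_mask >> i) & 1 and region is not None:
                start, n = region
                if addr < start + n and start < addr + length:
                    raise AliasException(f"access at {addr:#x} overlaps register {i}")

    def load_and_protect(self, addr, length, protect_reg, check_mask=0):
        self._check(addr, length, check_mask)
        self.protect[protect_reg] = (addr, length)
        return self.memory.get(addr, 0)

    def store_and_protect(self, addr, length, value, protect_reg, check_mask=0):
        self._check(addr, length, check_mask)   # check before the store takes effect
        self.memory[addr] = value
        self.protect[protect_reg] = (addr, length)

    def store(self, addr, length, value, check_mask=0):
        self._check(addr, length, check_mask)
        self.memory[addr] = value

def run_reordered(host, a_store1, a_store2, a_load1):
    """Reordered sequence: LOAD1 first, STORE2 second, STORE1 third."""
    r = host.load_and_protect(a_load1, 4, protect_reg=0)                     # checks nothing
    host.store_and_protect(a_store2, 4, 22, protect_reg=1, check_mask=0b01)  # checks LOAD1
    host.store(a_store1, 4, 11, check_mask=0b11)                             # checks both
    return r

loaded = run_reordered(Host(), a_store1=0x100, a_store2=0x200, a_load1=0x300)
try:
    run_reordered(Host(), a_store1=0x300, a_store2=0x200, a_load1=0x300)
    trapped = False
except AliasException:
    trapped = True
```

With disjoint addresses the reordered sequence runs to completion; when STORE1 targets the address that LOAD1 read, the check on STORE1 raises the exception before the store is performed.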

Any exception causes the code morphing software to determine the action to take in response. Typically, the code morphing software halts execution of the reordered instruction sequence and has the host return to the state of the target processor at the beginning of the instruction sequence, so that the sequence can be reprocessed conservatively. If the addresses are not the same (in this example, meaning that the store instructions do not access a protected memory address), execution of the reordered instruction sequence proceeds at the accelerated pace gained by the reordering.
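The recovery path described above (halt, return to the state at the start of the sequence, reprocess conservatively) might look like the following in outline. This is a hedged sketch: `run_with_rollback`, the dictionary used as target state, and the example sequences are all hypothetical, and the deep copy merely stands in for whatever checkpointing the host actually provides.

```python
import copy

class ReorderException(Exception):
    """Stands in for the hardware exception raised on a protection register hit."""

def run_with_rollback(state, reordered, conservative):
    """Try the reordered sequence; on an exception, restore state and run conservatively."""
    checkpoint = copy.deepcopy(state)   # target state at the start of the sequence
    try:
        reordered(state)
    except ReorderException:
        state.clear()
        state.update(checkpoint)        # discard all partial work
        conservative(state)
    return state

def reordered_ok(state):
    state["r1"] = state["mem"] + 1

def reordered_bad(state):
    state["r1"] = -1                    # partial work that must not survive
    raise ReorderException

def conservative(state):
    state["r1"] = state["mem"] + 1
    state["path"] = "conservative"

fast = run_with_rollback({"mem": 4}, reordered_ok, conservative)
slow = run_with_rollback({"mem": 4}, reordered_bad, conservative)
```

Both runs end with the same architectural result in `r1`; only the second one takes the slower, conservative path.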

To provide communication between the host processor and the scheduler, the load and store instructions used by the morph host are modified. In one embodiment, these instructions are replaced entirely by “load and protect” and “store and protect” instructions. Each “load and protect” and each “store and protect” instruction includes a bitmask (for example, eight bits corresponding to eight protection registers) whose bits are used as flags to indicate the particular protection registers in which to look for the memory addresses of reordered or aliased instructions. Each of these bits designates one of the available protection registers, which stores a memory address for the hardware to check. Through this bitmask, the specific protection registers designated to store memory addresses when instructions were reordered are checked before a subsequent instruction that may be affected by the reordering is executed. The “load and protect” and “store and protect” instructions can also be used in place of normal load and store instructions, respectively, because no check is made when no bit of the bitmask is set; in that case the “load and protect” and “store and protect” instructions behave identically to load and store operations. It should also be noted that a protection register can be associated with the particular general purpose register that holds the memory data, which allows efficient use of a small number of protection registers.

The host processor of the present invention also includes an additional register, called the “enable protection register” 74, which records which protection registers contain valid memory addresses relating to reordered instructions. The indication provided by a “load and protect” or “store and protect” instruction is used to set the bit that identifies the particular protection register. In one embodiment, the bits of the enable protection register are cleared whenever a commit operation occurs, indicating that a translated and reordered instruction sequence has executed without a reordering exception. Because reordering is always confined to a sequence of instructions lying between two commit points, the reordering operation can use all of the protection registers allotted to reordering for each newly translated instruction sequence.

An additional advantage of the invention is that the “store and protect” operation allows stores to be reordered with respect to one another. This is accomplished by storing data to a memory location and protecting the address of that memory location in a protection register. When a subsequent store that may be affected by the reordering is encountered, its bitmask indicates which protection registers the hardware should check against the store's memory address to determine whether an exception should be generated or whether the stores were reordered correctly.

One embodiment of the new microprocessor includes circuitry that allows memory data used frequently in executing operations to be replicated (or “aliased”) in execution unit registers, eliminating the time needed to fetch the data from memory or to store it to memory. For example, if data in memory is reused frequently during execution of one or more code sequences, the data must typically be read from memory and loaded into a register in the execution unit each time it is used. To reduce the time taken by such frequent memory accesses, the data can instead be loaded once from memory into an execution unit register at the start of the code sequence, and that register designated to function in place of the memory space for as long as the code sequence continues. Once this is done, each load operation that would normally load data into a register from the designated memory address becomes a simple register-to-register copy, which proceeds much faster; even these copy operations can often be eliminated by further optimization.

Similarly, execution of a code sequence often requires data to be written frequently to one memory address during the sequence. To reduce the time taken by such frequent stores to the same address, each time the data is to be written to the memory address it can instead be transferred to an execution unit register designated to function in place of the memory space for as long as the code sequence continues. Once an execution unit register has been designated, each change to the data requires only a simple register-to-register transfer, which proceeds much faster than a store to a memory address.
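The saving from aliasing can be illustrated by counting memory accesses in a toy model. The sketch below is an assumption-laden illustration, not the circuitry of the referenced application: `CountingMemory` and both functions are hypothetical, a local Python variable plays the role of the execution unit register, and the single write-back at the end of the sequence is an assumption of this sketch.

```python
class CountingMemory:
    """Toy memory that counts every load and store made against it."""
    def __init__(self):
        self.cells = {}
        self.accesses = 0

    def load(self, addr):
        self.accesses += 1
        return self.cells.get(addr, 0)

    def store(self, addr, value):
        self.accesses += 1
        self.cells[addr] = value

def accumulate_via_memory(mem, addr, values):
    """Every update reads and writes the memory location."""
    for v in values:
        mem.store(addr, mem.load(addr) + v)

def accumulate_aliased(mem, addr, values):
    """Load once into a 'register', update it locally, write back once (assumed)."""
    reg = mem.load(addr)
    for v in values:
        reg += v                 # register-only update, no memory traffic
    mem.store(addr, reg)

plain, aliased = CountingMemory(), CountingMemory()
accumulate_via_memory(plain, 0x40, [1, 2, 3, 4])
accumulate_aliased(aliased, 0x40, [1, 2, 3, 4])
print(plain.cells[0x40], plain.accesses)      # 10 8
print(aliased.cells[0x40], aliased.accesses)  # 10 2
```

The result is identical, but the aliased version touches memory twice instead of twice per iteration, which is only safe if no other access to that address goes unnoticed in the meantime.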

The operation of the aliasing circuitry is described in U.S. patent application Ser. No. 08/721,698 of M. Wing et al., filed September 26, 1996, entitled Method and Apparatus for Aliasing Memory Data in an Advanced Microprocessor. That application is assigned to the assignee of the present invention.

A second embodiment of the invention for accelerating reordering operations uses some additional hardware so that the same hardware can serve both reordering and the aliasing of memory addresses described in the aforementioned patent application. Note that reordered instructions typically lie between adjacent commit operations, whereas memory data aliased in an execution unit register typically persists for a much longer period. In this second embodiment, a second “persistent” register 76 is added so that long-term, or persistent, protection is available alongside the short-term protection provided for reordering by the enable protection register 74. The second persistent register 76 is used in the same way as register 74, but it records only those protection registers whose memory addresses must be maintained for longer than the interval between adjacent commit operations.

For example, when it is desired to alias a memory address and hold its data in a host register for use over a long period (for example, throughout a loop), the indication of which protection register holds the address for the long-term aliasing operation is copied from the instruction and placed in both the enable protection register 74 and the second persistent register 76. Assuming that the reordered instruction sequence executes without generating an exception, so that a first commit operation can take place, the enable protection register is cleared. In this way, the short-term flags indicating the protection registers that hold addresses of reordered instructions to be checked are erased at each commit. After the enable protection register has been cleared at the commit, the contents of the second persistent register are written to the enable protection register. Because the data in the persistent register indicating which protection registers are in use for long-term aliasing is written to the enable protection register, the indications of the protection registers used for long-term aliasing are unaffected by the commit operation. Writing the contents of the persistent register to the enable protection register at each commit effectively continues the protection into the next instruction sequence and, ultimately, until the data is no longer needed for the aliasing operation and the second register is finally cleared.

In addition to the second persistent register 76, a shadow register 78 is maintained, which also stores the information held in the persistent register. The shadow register is used during commit and rollback operations. When a commit occurs, the data in the persistent register 76 is copied to the enable protection register 74, as described above. The same data is also copied at commit to register 78, which shadows the persistent register, so that at the start of the next instruction sequence to be reordered the shadow register holds the settings of the persistent register. If an exception occurs during execution of the next instruction sequence and a rollback operation is required, the contents of the shadow register are copied to both the enable protection register and the persistent register. This places in those registers the same indications that they held before execution of the instruction sequence began, ensuring the correct state for continuing execution more conservatively.
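The interplay of the enable protection register 74, the persistent register 76, and the shadow register 78 on commit and rollback can be traced with a small state model. This is a sketch under assumptions: the `ProtectionState` class and its method names are hypothetical, each register is modeled as an integer bitmask with one bit per protection register, and `release` is an assumed way of ending long-term protection.

```python
class ProtectionState:
    """Toy model of registers 74 (enable), 76 (persistent), and 78 (shadow)."""
    def __init__(self):
        self.enable = 0
        self.persistent = 0
        self.shadow = 0

    def protect(self, reg, long_term=False):
        """Mark protection register `reg` as holding a valid address."""
        self.enable |= 1 << reg
        if long_term:                    # long-term aliasing also sets the persistent bit
            self.persistent |= 1 << reg

    def commit(self):
        self.enable = 0                  # short-term reordering flags are erased
        self.enable = self.persistent    # long-term flags are written back
        self.shadow = self.persistent    # remember the settings for a later rollback

    def rollback(self):
        self.enable = self.shadow        # restore the state held at the last commit
        self.persistent = self.shadow

    def release(self, reg):
        """Assumed helper: drop long-term protection once aliasing is finished."""
        self.persistent &= ~(1 << reg)

s = ProtectionState()
s.protect(0)                    # short-term, for reordering
s.protect(3, long_term=True)    # long-term, for aliasing
s.commit()
after_commit = (s.enable, s.persistent, s.shadow)

s.protect(1)
s.protect(2, long_term=True)
s.rollback()
after_rollback = (s.enable, s.persistent, s.shadow)
print(after_commit, after_rollback)   # (8, 8, 8) (8, 8, 8)
```

After the commit only bit 3 survives in the enable register, and the rollback discards both flags set during the failed sequence, including the long-term one.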

The arrangement of the present invention provides additional advantages. The addition of the persistent register 76 allows the same hardware to be used both to enhance reordering over short periods (between commits) and to hold aliased memory data in execution unit registers over long periods, thereby eliminating redundant memory accesses; it can also be used to eliminate other kinds of redundancy that arise between commit operations. For example, one instruction sequence may contain two loads from the same memory address. When this happens and no store to that memory address intervenes, the second load can simply be ignored and the data placed in a register by the first memory access used unchanged in its place. If a store lies between the loads, however, it must be determined whether that store is to the memory address accessed by the second load. Consequently, prior art optimization techniques could not delete the second load when a store lay between the loads.

The present invention can be used to advantage to shorten such operations. If the first load is changed to a “load and protect” operation, so that its memory address is stored in a protection register, and the store instruction receives a flag indicating the particular protection register to check, the second load can be deleted and the data obtained by the “load and protect” operation used in its place. If the store instruction attempts to access the protected memory address, the flag indicating the protection register to check causes a comparison to be made before the store access takes place. The comparison generates an exception, which causes a rollback to the last commit point at which correct target state exists. The scheduler can then supply an appropriate instruction sequence that includes the second load operation, and the sequence can be executed again.
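The load, store, load case can be sketched as follows. This is an illustrative model only: the function name, the dictionary used as memory, and the single-address comparison are assumptions, and the dictionary copy stands in for the host's commit and rollback mechanism.

```python
class AliasException(Exception):
    """Stands in for the exception raised when the store hits the protected address."""

def run_without_second_load(memory, load_addr, store_addr, store_value):
    """Sequence LOAD a; STORE b; LOAD a, with the second load speculatively deleted."""
    checkpoint = dict(memory)               # state at the last commit point
    try:
        protected = load_addr               # first load becomes load and protect
        r1 = memory[load_addr]
        if store_addr == protected:         # the store is flagged to check the register
            raise AliasException
        memory[store_addr] = store_value
        r2 = r1                             # second load replaced by the register value
    except AliasException:
        memory.clear()
        memory.update(checkpoint)           # roll back, then run the full sequence
        r1 = memory[load_addr]
        memory[store_addr] = store_value
        r2 = memory[load_addr]              # second load actually performed
    return r1, r2

no_alias = run_without_second_load({0x10: 1, 0x20: 2}, 0x10, 0x20, 9)
alias = run_without_second_load({0x10: 1}, 0x10, 0x10, 9)
print(no_alias, alias)   # (1, 1) (1, 9)
```

When the store goes elsewhere the second load is never executed; when it hits the protected address, the sequence is re-run with the second load and observes the stored value.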

Similarly, if an instruction sequence between two commit operations contains two stores to the same memory address, the first store can be deleted provided no load from that memory address lies between the stores. If data from that memory address is used by an intervening load, however, the first store cannot be deleted. Using the present invention, the first store to the memory address can be deleted by making the load instruction a “load and protect”. The second store then receives the protection register indication from the “load and protect” and checks the memory address of its access. If the load is from a different address, the second store proceeds correctly. If the load is from the same address, the attempt to access memory for the second store generates an exception and the operation is rolled back to the last commit point. From that point, the scheduler can reschedule the instructions to include both store operations and re-execute the sequence.
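The store, load, store case can be modeled in the same spirit. Again this is a sketch under assumptions: the function name, the dictionary used as memory, and the single-address comparison are hypothetical, and the dictionary copy stands in for the rollback to the last commit point.

```python
class AliasException(Exception):
    """Stands in for the exception raised when the second store hits the protected address."""

def run_without_first_store(memory, store_addr, v1, v2, load_addr):
    """Sequence STORE a,v1; LOAD b; STORE a,v2, with the first store speculatively deleted."""
    checkpoint = dict(memory)               # state at the last commit point
    try:
        protected = load_addr               # the load becomes load and protect
        r = memory.get(load_addr, 0)        # may be stale if b aliases a
        if store_addr == protected:         # second store checks the protection register
            raise AliasException
        memory[store_addr] = v2
    except AliasException:
        memory.clear()
        memory.update(checkpoint)           # roll back, then run both stores
        memory[store_addr] = v1
        r = memory.get(load_addr, 0)
        memory[store_addr] = v2
    return r

mem_a = {0x10: 0, 0x20: 5}
r_a = run_without_first_store(mem_a, 0x10, 1, 2, 0x20)   # load from a different address
mem_b = {0x10: 0}
r_b = run_without_first_store(mem_b, 0x10, 1, 2, 0x10)   # load from the same address
print(r_a, mem_a[0x10], r_b, mem_b[0x10])   # 5 2 1 2
```

In the aliasing case the stale value read speculatively is discarded by the rollback, and the re-executed sequence lets the load see the value written by the first store.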

Although the present invention has been described in terms of a preferred embodiment, it will be appreciated that those skilled in the art can make various modifications and alterations without departing from the spirit and scope of the invention. For example, although the invention has been described with respect to embodiments designed to work with a particular family of processors, it should be understood that the invention applies equally to programs designed for other processor architectures. The invention should therefore be construed in accordance with the claims that follow.

FIG. 1 is a diagram illustrating a new microprocessor in which the present invention can be used.

FIG. 2 is a block diagram of hardware that implements the new microprocessor of FIG. 1.

FIG. 3 is a flow chart showing the main processing loop of the new processor of FIG. 1.

FIG. 4 is a block diagram showing a part of the new processor.

FIG. 5 is a block diagram showing another part of the new processor.

FIG. 6 is a flow chart showing the operation of scheduler software designed in accordance with the present invention.

FIG. 7 is a block diagram illustrating one embodiment of a circuit implementing the present invention.

Claims

1. A method of scheduling and executing instructions, comprising:
a) accessing a sequence of instructions comprising a first memory operation involving a first address range, a second memory operation involving at least a portion of the first address range, and a third memory operation intervening between the first and second memory operations, wherein it is not known whether the third memory operation involves an address within the first address range, and at least one of the first through third memory operations includes a store operation;
b) removing the second memory operation from the sequence of instructions;
c) adding information to the third memory operation for determining the first address range, wherein the information comprises a mask for determining which of a plurality of registers holds a protected address;
d) executing the sequence of instructions with the second memory operation removed; and
e) determining, during the execution, whether the third memory operation affects an address within the first address range and, if so, generating an exception and re-executing the sequence of instructions including the second memory operation.

2. The method of claim 1, wherein step e) further comprises determining, during the execution, whether the third memory operation affected an address within the range of any protected address.

3. The method of claim 1, further comprising storing a memory address associated with the first address range in one of the plurality of registers prior to execution of the sequence of instructions.

4. The method of claim 1, further comprising storing a memory address associated with the first address range in a register prior to execution of the sequence of instructions.

5. The method of claim 4, wherein the sequence of instructions includes a fourth memory operation that follows the first instruction sequence, the method further comprising appending second information that allows the fourth memory operation to be executed without generating an exception even if the fourth memory operation affects the first address range.

6. The method of claim 1, wherein the first and second memory operations can be safely reduced to a single memory operation if the third memory operation does not intervene.

7. A method of scheduling and executing instructions, comprising:
a) accessing a sequence of instructions comprising a first load instruction that loads from a first address range, a second load instruction that loads from the first address range, and a store instruction intervening between the first and second load instructions, wherein it is not known whether the store instruction stores to an address within the first address range;
b) removing the second load instruction from the sequence of instructions;
c) executing the sequence of instructions without the second load instruction, including storing a memory address for the first address range in a protection register; and
d) determining, during the execution, whether the store instruction stores to an address within the first address range and, if so, generating an exception and re-executing the sequence of instructions including the second load instruction.

8. The method of claim 7, wherein step b) further comprises adding a flag to the store instruction to indicate the protection register.

9. The method of claim 7, wherein step b) further comprises changing the first load instruction to a load and protect instruction.

10. A method of scheduling and executing instructions, comprising:
a) accessing a sequence of instructions comprising a first store instruction that stores to a first address range, a second store instruction that stores to the first address range, and a load instruction intervening between the first and second store instructions, wherein it is not known whether the load instruction affects the first address range;
b) removing the first store instruction from the sequence of instructions, including storing a memory address associated with the load instruction in a protection register;
c) executing the sequence of instructions without the first store instruction; and
d) determining, during the execution, whether the load instruction has affected an address within the first address range and, if so, generating an exception and re-executing the sequence of instructions including the first store instruction.

11. The method of claim 10, wherein step b) further comprises adding a flag to the second store instruction to indicate the protection register.

12. The method of claim 10, wherein step b) further comprises changing the load instruction to a load and protect instruction.

13. A method of scheduling and executing instructions, comprising:
a) accessing a sequence of instructions comprising a first store instruction that stores to a first address range, a load instruction that loads from the first address range, and a second store instruction intervening between the first store instruction and the load instruction, wherein it is not known whether the second store instruction stores to an address within the first address range;
b) removing the load instruction from the sequence of instructions, including storing a memory address associated with the first address range in a protection register;
c) executing the sequence of instructions without the load instruction; and
d) determining, during the execution, whether the second store instruction stores to an address within the first address range and, if so, generating an exception and re-executing the sequence of instructions including the load instruction.

14. The method of claim 13, wherein step b) further comprises adding a flag to the second store instruction to indicate the protection register.

15. The method of claim 13, wherein step b) further comprises changing the first store instruction to a store and protect instruction.

16. A method of scheduling and executing instructions, comprising:
a) accessing a sequence of instructions comprising a load instruction that loads from a first address range, a first store instruction for which it is not known whether it stores to an address within the first address range, and a second store instruction that stores to the first address range, wherein the first store instruction intervenes between the load instruction and the second store instruction;
b) removing the second store instruction from the sequence of instructions, including storing a memory address associated with the first address range in a protection register;
c) executing the sequence of instructions without the second store instruction; and
d) determining, during the execution, whether the first store instruction stores to an address within the first address range and, if so, generating an exception and re-executing the sequence of instructions including the second store instruction.

17. The method of claim 16, wherein step b) further comprises adding a flag to the first store instruction to indicate the protection register.

18. The method of claim 16, wherein step b) further comprises changing the load instruction to a load and protect instruction.

19. The method of claim 16, wherein the second store instruction stores back to the first address range the same value that the load instruction loaded from the first address range.