JPH0816875B2

JPH0816875B2 - Computer system emulation method

Info

Publication number: JPH0816875B2
Application number: JP19960592A
Authority: JP
Inventors: 秀昭小松; 修郷田
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1992-07-27
Filing date: 1992-07-27
Publication date: 1996-02-21
Anticipated expiration: 2011-02-21
Also published as: JPH0695919A

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method of emulating a computer system, and in particular to optimizing the emulation of memory accesses.

[0002]

[Prior Art] The advent of new processors has led to the production of higher-performance computer systems. The biggest problem at that point is carrying over existing software. If all existing software were written in a high-level language and its source code were fully available, software for the new computer system could be produced by recompiling. In actual commercial computer use, however, ordinary users almost never have the source code, and many programs are not written in a high-level language and so cannot be converted easily.

[0003] Emulating the target computer system is therefore an effective means of carrying over existing software. In emulation, software absorbs the differences between the real computer system and the target computer system, both in processor architecture (instruction set) and in the types and control methods of hardware (I/O devices), which makes it possible to run programs written for any computer.

[0004] There are three techniques for actually performing emulation: the interpreter method, the compile method, and the run-time compile method. The interpreter method fetches the target processor's instructions one at a time and interprets and executes each, so the emulator is easy to write. If the target computer system is implemented in microcode, the speed is also adequate. When it is implemented on an ordinary processor, however, a practical speed is difficult to obtain.

[0005] The compile method compiles the target program in advance. Because time-consuming compilation techniques can be applied ahead of time, its actual execution time can be made the shortest of the three methods. However, a program that modifies programs cannot be supported, so the function of an OS loader, or a program that uses self-modification as a programming technique, cannot be emulated. Moreover, binary code cannot be completely decompiled, so programs whose entry addresses cannot all be traced statically cannot be fully supported. Finally, every code fragment must be compiled whether or not it is ever executed, so the code generated for emulation becomes large.

[0006] The run-time compile method, by contrast, incurs compilation overhead at run time, and that overhead grows further if complex optimization techniques are used to improve the quality of the generated code. Its execution speed is therefore lower than that of the compile method. However, it can emulate the entire target architecture (instructions and I/O), so it can run any program. It is the most practical of the three methods, and it is the method the present invention addresses.

[0007] Processor instructions fall broadly into the following five types:
- Memory operations (data load and store)
- Arithmetic (integer, real number, test)
- Branch (conditional, unconditional)
- System control (OS support)
- Input/output
Whichever of the three emulation methods is used, memory operations take the most extra work compared with execution on the target processor itself. This is because a memory access is not just a simple load or store of data: depending on the address being operated on (the execution address), it falls into one of the following four kinds of operation:
- Data load/store
- Memory-mapped I/O
- Write to the ROM area
- Write to the program area

[0008] When a computer system that uses memory-mapped I/O is emulated, a load or store instruction whose execution address lies in the I/O area must behave as an input/output instruction. A store to the ROM area must be ignored. In addition, with the run-time compile method, a write to the program area (an area that has already been compiled) must invalidate the compiled code, and the modified code must be recompiled if it is executed.

[0009] In principle, every memory operation must be tested to determine whether its execution address lies in ordinary data space, and this slows emulation down. The test for a memory operation is as follows:

    compute execution address
    if execution address >= start of I/O area and execution address <= end of I/O area then I/O processing
    if execution address >= start of ROM area and execution address <= end of ROM area then ROM processing
    if execution address >= start of program area and execution address <= end of program area then program processing
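
The range-based test above can be sketched in Python as follows. This is only an illustration: the area bounds and the name `classify_by_range` are hypothetical and do not come from the source.

```python
# Hypothetical area bounds, chosen only for illustration.
IO_START, IO_END = 0xA0000, 0xDFFFF
ROM_START, ROM_END = 0xE0000, 0xFFFFF
PROG_START, PROG_END = 0x01000, 0x01FFF


def classify_by_range(addr):
    """Return the kinds of special handling an access to addr would need.

    Every access pays for up to six comparisons, even though almost
    all accesses fall in ordinary data space and need none of them.
    """
    actions = []
    if IO_START <= addr <= IO_END:
        actions.append("io")
    if ROM_START <= addr <= ROM_END:
        actions.append("rom")
    if PROG_START <= addr <= PROG_END:
        actions.append("program")
    return actions
```

An empty result means the access is an ordinary data load or store.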

[0010] One way to reduce this cost is an attribute table that corresponds one-to-one with the entire memory space, with one attribute byte for each byte of memory (FIG. 1). Each bit of the attribute value (one byte) records which kind of area the corresponding memory location belongs to. With the table, the memory operation test (for the run-time compile method) becomes:

    compute execution address
    compute attribute address
    load attribute value
    if attribute value is not 0 then
        if program bit of attribute value is set then program processing
        if I/O bit of attribute value is set then I/O processing
        if ROM bit of attribute value is set then ROM processing
    endif

One attribute byte is kept per memory byte to speed up both the attribute address calculation and the individual attribute tests. The compile method does not handle program processing, so it does not need that test. Because almost all memory operations are accesses to ordinary data (attribute value 0), the individual bit tests are rarely executed, which shortens execution time. This attribute method makes the run-time compile method usable at a practical speed.
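
A minimal Python sketch of the attribute-table test follows. The bit positions, the 1 MiB table size, and the names `attr_table` and `check_access` are assumptions made for illustration; the source defines the bit layout only in a figure.

```python
MEMORY_SIZE = 1 << 20  # assumed size of the emulated address space

# Assumed bit positions within the attribute byte.
ATTR_PROGRAM = 0x01
ATTR_IO = 0x02
ATTR_ROM = 0x04

# One attribute byte per memory byte, so the attribute address is
# simply the execution address used as an index.
attr_table = bytearray(MEMORY_SIZE)


def check_access(addr):
    """Return the special handling needed for an access to addr."""
    attr = attr_table[addr]
    actions = []
    if attr != 0:  # the common case (plain data) skips all bit tests
        if attr & ATTR_PROGRAM:
            actions.append("program")
        if attr & ATTR_IO:
            actions.append("io")
        if attr & ATTR_ROM:
            actions.append("rom")
    return actions


# Example: mark one byte as compiled program and one as I/O.
attr_table[0x1234] = ATTR_PROGRAM
attr_table[0xA0000] = ATTR_IO
```

Compared with the range test, plain data accesses cost one load and one comparison with zero.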

[0011] Even so, memory operations remain the bottleneck of emulation as a whole, for the following five reasons.
(1) Most memory operation instructions are used only to load and store data; they are very rarely used for I/O processing, writes to the ROM area, or rewriting a program.
(2) Memory operation instructions are used relatively frequently, and the actual load or store is very small compared with the processing time of the test, so the relative slowdown is large.
(3) Each step of the test depends on the result of the preceding step, so the steps cannot be executed in parallel at all. Processors that can execute instructions in parallel, such as superscalar (RS/6000) or VLIW designs, therefore gain no speedup.
(4) A memory load takes longer than an ordinary instruction. In this method the attribute value must be tested immediately after it is loaded, which stalls the execution pipeline. Because current processors rely on pipelining for their execution speed, this slowdown is a problem.
(5) Execution slows down unless the attribute test code is generated inline, so the compiled code as a whole becomes large. This in turn lowers instruction cache utilization and increases page faults, reducing overall processing efficiency.

[0012] A patent document related to the present invention is Japanese Unexamined Patent Publication (Kokai) No. H2-5138. It holds the attribute values as they are, in the manner described above.

[0013]

[Problems to Be Solved by the Invention] The present invention has been made in view of the above circumstances, and its object is to speed up emulation as a whole by optimizing the emulation of memory accesses.

[0014]

[Means for Solving the Problems] The present invention builds on the conventional technique of speeding up the test with an attribute table holding an entry for each byte of memory. Noting that most memory operations are used to load and store data, it further allows most memory operation instructions to execute without testing the attribute value.

[0015] A segment prediction table is added for this purpose (FIG. 2). As FIG. 2 shows, the memory space is divided into blocks of a fixed size (called segments), and for each segment one bit of data (the segment prediction bit) is generated that represents the logical OR of the attribute values in that segment. The prediction bit of each segment is then ORed with the prediction bits of its two adjacent segments. This operation is carried out over the entire memory space to produce the segment prediction table (FIG. 2).
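
The construction of the segment prediction table can be sketched as below. This is an illustration under assumptions: the function name `build_segment_prediction` and the list-of-bits representation are hypothetical, and segments at either end of memory are treated as having only one neighbor.

```python
def build_segment_prediction(attr_table, segment_size):
    """Build one prediction bit per segment from a per-byte attribute table."""
    n_segments = (len(attr_table) + segment_size - 1) // segment_size

    # Step 1: OR together the attribute values inside each segment
    # and reduce the result to a single bit.
    own = []
    for seg in range(n_segments):
        chunk = attr_table[seg * segment_size:(seg + 1) * segment_size]
        own.append(1 if any(chunk) else 0)

    # Step 2: OR each segment's bit with the bits of its two neighbors.
    prediction = []
    for seg in range(n_segments):
        bit = own[seg]
        if seg > 0:
            bit |= own[seg - 1]
        if seg + 1 < n_segments:
            bit |= own[seg + 1]
        prediction.append(bit)
    return prediction


# Example: 8 segments of 4 bytes; only byte 13 (segment 3) is special.
attrs = bytearray(32)
attrs[13] = 0x01
table = build_segment_prediction(attrs, 4)
```

In the example, segments 2, 3, and 4 end up with a prediction bit of 1, so an access that starts in one segment and reaches slightly into a neighbor is still covered.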

[0016] A bit value of 0 in the segment prediction table means that every memory location in the segment corresponding to that prediction bit is ordinary data memory, so the individual attribute tests are unnecessary.

[0017] In practice the data in the segment prediction table is not used directly. Instead, category prediction flags are held inside the CPU and these flags are consulted. Because the span that must be checked differs for each category of memory reference, each category prediction flag is generated in a way suited to that span. When a category prediction flag is 0, memory operation instructions in that category need no attribute test.

[0018] The test algorithm for a memory operation instruction is as follows:

    compute execution address
    if category prediction flag is not 0 then
        compute attribute address
        load attribute value
        if attribute value is not 0 then
            if program bit of attribute value is set then program processing
            if I/O bit of attribute value is set then I/O processing
            if ROM bit of attribute value is set then ROM processing
        endif
    endif
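
The guarded test above might look like this in Python. The bit positions and the name `memory_test` are assumptions for illustration; in generated code the flag would live in a CPU register rather than be passed as an argument.

```python
# Assumed bit positions within the attribute byte.
ATTR_PROGRAM, ATTR_IO, ATTR_ROM = 0x01, 0x02, 0x04


def memory_test(addr, category_flag, attr_table):
    """Return the special handling needed, consulting the flag first."""
    if category_flag == 0:
        # Predicted plain data: the attribute byte is never loaded.
        return []
    attr = attr_table[addr]
    actions = []
    if attr != 0:
        if attr & ATTR_PROGRAM:
            actions.append("program")
        if attr & ATTR_IO:
            actions.append("io")
        if attr & ATTR_ROM:
            actions.append("rom")
    return actions


# Example table with one I/O byte at address 5.
demo_table = bytearray(16)
demo_table[5] = ATTR_IO
```

When the flag is 0 the function returns before touching the table, which is the saving the method relies on.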

[0019] A good category is one in which the value of the base address used in the addressing-mode calculation changes rarely and the offset is a constant rather than a register. Examples of categories usable in practice are:
- Access to stack variables (FIG. 3)
- Access to global variables (FIG. 4)
- Access to structures in the heap area (FIG. 5)

[0020] Stack variables are usually accessed with the stack pointer (SP) as the base register and a constant displacement, so these accesses can form a category. Global variables are likewise usually accessed relative to a base register with a constant displacement. Even in the heap area, each group of accesses to members of the same structure can be treated as a category.

[0021] Adding this method to the original attribute method introduces two overheads: testing the category prediction flag at run time, and updating the category prediction flag. Testing the flag at run time incurs the following two costs:
- The execution time of the instructions needed to test the prediction flag
- The pipeline delay caused by the branch in that test

[0022] In this method the category prediction flag is held inside the CPU, so it need not be loaded from memory. If it is implemented as one bit of a register, the overhead of testing it is two instructions: a test instruction and a conditional branch. Depending on the architecture of the CPU running the emulation, if the flag can be implemented as one bit of the condition code register, a single conditional branch instruction is all the test costs. The probability that a memory operation instruction is used to manipulate data, that is, the probability that the prediction is correct, is very high, and in that case the attribute test is skipped. Because the attribute test requires more instructions than the prediction flag test, this method reduces the number of instructions executed.

[0023] The second cost, the pipeline stall caused by the test's branch, arises only when the prediction is correct; it is irrelevant when the prediction is wrong. Moreover, when the prediction is correct, the pipeline delay of the branches in the attribute test that would otherwise have been needed is eliminated, so on balance there is no overhead at all.

[0024] With the earlier method, execution efficiency dropped unless the attribute test was expanded inline in the generated code. With this method, execution speed does not drop even if the attribute test is moved into a subroutine, so code size can be reduced and instruction cache utilization improves. Furthermore, the execution address calculation and the category prediction flag test do not depend on each other, so processors whose architecture can execute instructions in parallel, such as superscalar or VLIW processors, can execute them in parallel.

[0025] The other overhead, updating the category prediction flag, occurs at the following two times:
- When the attributes of the segment in question change
- When the category comes to refer to a different segment

[0026] The first case occurs when code that has not been executed before runs for the first time. In run-time compile emulation, code is compiled the first time it is executed, and this changes the attribute values of the compiled memory. The I/O and ROM spaces are fixed and do not change during execution. The run-time compile method already spends a great deal of processing time on compilation, so the overhead of updating the segment prediction table is very small by comparison. The second case occurs when the address in the category's base register changes. Its overhead comes from the following two steps:
- Mask the low-order bits of the base register address
- Look up the segment prediction bit
This overhead is about the same as that of loading an attribute value, and it is almost negligible when the memory operation instructions concerned operate on data. Furthermore, once a category prediction flag has been determined it is reused, so in the end it amounts to no overhead at all.
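
The two-step flag update performed when a base register changes can be sketched as follows. The function name `update_category_flag`, the list representation of the prediction table, and the power-of-two segment size are assumptions for illustration.

```python
SEGMENT_SIZE = 128  # assumed power-of-two segment size in bytes


def update_category_flag(base_addr, segment_prediction):
    """Recompute a category prediction flag after its base address changes."""
    # Step 1: mask the low-order bits to get the start of the segment.
    segment_base = base_addr & ~(SEGMENT_SIZE - 1)
    # Step 2: look up the prediction bit for that segment.
    return segment_prediction[segment_base // SEGMENT_SIZE]


# Example prediction table covering three segments.
prediction = [0, 1, 0]
flag_a = update_category_flag(0x07F, prediction)  # still in segment 0
flag_b = update_category_flag(0x080, prediction)  # first byte of segment 1
```

The flag stays valid for every later access in the category until the base register changes again or the segment's attributes change.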
The space of M is fixed and does not change during execution. Originally, the run-time compilation method uses a lot of processing time for compilation, so the overhead of updating the segment prediction table is much smaller than that. The latter happens when the address of the category's base register is changed. The overhead for this is caused by the following two steps. -Mask the lower bits of the address of the base register-Subtract the segment prediction bit This overhead is similar to the overhead of loading the attribute value, and if the target memory manipulation instruction is for data manipulation, Almost no overhead. Furthermore, once determined, the category prediction flag is reused, resulting in no overhead at all.

[0027] This method is intended to speed up the run-time compile method, but it can be applied more efficiently by using existing compiler optimization techniques (data flow analysis). This keeps updates of the segment prediction flag to a minimum.

[0028]

[Embodiment] An embodiment is described below in which the present invention is applied to emulating the IBM PC of International Business Machines Corporation (IBM, a trademark) of the United States on the RS/6000 (a trademark of IBM), also from IBM.

[0029] FIG. 6 shows the processing procedure for emulating the IBM PC on the RS/6000 by the run-time compile method. First, the address of the Intel 8086 processor instruction to be executed (Intel and 8086 are trademarks of Intel Corporation of the United States; the processor is referred to below simply as the 8086) is converted, through the address correspondence table, into the address of the compiled RS/6000 instruction sequence.

[0030] If the 8086 execution address has been executed before, the address correspondence table contains an entry recording the corresponding RS/6000 address, and emulation proceeds by branching to it. If the 8086 execution address has not yet been executed, it is not registered in the table, and the run-time compiler is invoked. Everything that can be determined to be reachable from that 8086 instruction is compiled. Compiling as large a range as possible reduces the number of times the compiler is called at run time. A larger range also widens the scope over which optimizations apply, so more effective optimization is possible. The present technique is used in the optimizer and code generator shown in FIG. 6.
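
The lookup-then-compile loop described above can be sketched roughly as follows. Everything here is a stand-in: a Python dict plays the role of the address correspondence table, compiled blocks are callables that return the next guest address, and `toy_compile` is a hypothetical placeholder for the run-time compiler.

```python
def run(entry, compile_block, max_steps):
    """Dispatch loop: look up the guest address, compile on a miss, execute."""
    address_map = {}  # guest address -> compiled block (a callable here)
    pc = entry
    for _ in range(max_steps):
        block = address_map.get(pc)
        if block is None:  # not executed before: invoke the compiler
            block = compile_block(pc)
            address_map[pc] = block
        pc = block()  # run the block; it returns the next guest address
    return address_map


# Toy stand-in for the run-time compiler: two blocks that jump to each other.
compile_log = []


def toy_compile(pc):
    compile_log.append(pc)
    next_pc = 0x104 if pc == 0x100 else 0x100
    return lambda: next_pc


address_map = run(0x100, toy_compile, max_steps=6)
```

Although six blocks are executed, the compiler runs only twice, once per distinct guest address; later visits hit the table.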

[0031] The run-time compiler converts the 8086 instruction sequence being compiled into an RS/6000 instruction sequence. One 8086 instruction may be converted into several RS/6000 instructions, and several 8086 instructions may also be converted together into several RS/6000 instructions. Compiling several instructions together produces better-optimized code than compiling one instruction at a time. The types of instruction compiled are:
- Memory operations (data load and store)
- Arithmetic (integer, real number, test)
- Branch (conditional, unconditional)
- System control (OS support)
- Input/output

[0032] Of these instruction types, only branch instructions break up the range that is actually compiled. Not every branch instruction does so, only those, such as indirect branches and return instructions, for which the value of the program counter after execution depends on data values. Indirect branch instructions include register-indirect branches and branches that use a segment register (far call, far jump). To compile such an instruction, the compiler generates an instruction sequence that evaluates the branch condition, computes the execution address of the branch target, and then branches to the address correspondence table lookup at the head of the loop in FIG. 6.

[0033] For any other branch instruction, if the branch target is already in the address correspondence table, compilation of that path stops there and another reachable 8086 instruction is compiled. If the target is not in the table, compilation continues along the path. One run-time compilation ends when all reachable paths have been compiled.

[0034] For the instruction types other than branches, the run-time compiler does not adjust the compilation region; it simply translates the instructions. For input/output instructions, subroutines that emulate the corresponding devices (display, keyboard, disk, and so on) are prepared in advance, and compilation consists of generating a call to the appropriate subroutine. For an 8086 instruction that accesses the I/O space through a register, the device in use cannot be determined until run time, so a subroutine that dispatches to the right device is prepared and a call to it is generated.

[0035] On the IBM PC, not all I/O resides in the I/O space; the machine also uses memory-mapped I/O, a technique that assigns I/O to the ordinary data space. As a result, not all I/O is performed by I/O instructions, and for every instruction that accesses memory, the address it affects (the execution address, or effective address) must be tested to see whether it belongs to memory space used for I/O or to space used for data and programs. The IBM PC's memory-mapped I/O occupies the 256 KB from A0000 (hexadecimal) to DFFFF (FIG. 7).

[0036] Writes to the ROM space (E000 (hexadecimal) to FFFF on the IBM PC) must also be ignored, so the execution address of every write must be tested as well (FIG. 7).

[0037] When memory operation instructions are compiled at run time, the generated code must test, as described above, whether the access is to the I/O space or the ROM space. It must also test whether a store instruction is modifying part of a program that has already been compiled into RS/6000 object code.

[0038] 8086 instructions can take memory as an operand of an arithmetic instruction, so the tests above are needed not only for memory operation instructions (loads and stores) but for any instruction with a memory operand. The ROM space and the I/O space are each a single fixed region and are easy to test for, but it is difficult to determine whether a given address holds an already compiled program or data. The attribute table is used to handle all of these cases efficiently (FIG. 7).

[0039] The attribute table has a one-byte entry for each byte of IBM PC memory. Not all bits of the byte are used, but the attribute values are accessed very frequently; packing them into a few bits each would save memory but would hurt emulation speed. The entry size is therefore one byte, the smallest unit that can be accessed at full speed.

[0040] The attribute value is used as shown in FIG. 7. The attribute table is initialized when emulation starts (system initialization in FIG. 6). As noted above, the ROM space and I/O space are fixed on the IBM PC, so in the initial state the I/O bit and ROM bit are set in the attribute table entries for those address regions and all other bits are cleared. Only the program bit changes at run time: the run-time compiler sets the program bit in the attribute value for every address of the 8086 instructions it has converted into RS/6000 object code.
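
Initialization and the run-time update of the program bit can be sketched as below. The bit positions and function names are hypothetical. The I/O range A0000 to DFFFF is the one stated in the text; the ROM range E0000 to FFFFF used here is an assumption made for the sketch, since the text prints the ROM bounds as E000 to FFFF.

```python
MEMORY_SIZE = 1 << 20  # assumed 1 MiB address space

# Assumed bit positions within the attribute byte.
ATTR_PROGRAM, ATTR_IO, ATTR_ROM = 0x01, 0x02, 0x04

IO_START, IO_END = 0xA0000, 0xDFFFF    # from the text
ROM_START, ROM_END = 0xE0000, 0xFFFFF  # assumption, see lead-in


def init_attr_table():
    """Build the initial table: fixed I/O and ROM regions, all else clear."""
    table = bytearray(MEMORY_SIZE)
    table[IO_START:IO_END + 1] = bytes([ATTR_IO]) * (IO_END - IO_START + 1)
    table[ROM_START:ROM_END + 1] = bytes([ATTR_ROM]) * (ROM_END - ROM_START + 1)
    return table


def mark_compiled(table, start, length):
    """Set the program bit for every byte of a newly compiled guest range."""
    for addr in range(start, start + length):
        table[addr] |= ATTR_PROGRAM


attr_table = init_attr_table()
mark_compiled(attr_table, 0x100, 3)  # e.g. a 3-byte guest instruction
```

The I/O region spans 0x40000 bytes, which matches the 256 KB stated in the text.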

[0041] FIG. 8 is a flowchart of the test performed on a memory read using this attribute value. For a read, the ROM space and program space tests are unnecessary. FIG. 9 is a flowchart of the test performed on a memory write. A write requires tests for the ROM space, the I/O space, and the program space. Three tests are thus needed for a write, but none of them is executed when the attribute value is 0. In real programs, modifying the program or accessing the I/O or ROM space is extremely rare compared with ordinary data access. The read test and the write test therefore differ little in execution time.
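
The difference between the read and write tests can be illustrated with the sketch below. The flowcharts themselves are not reproduced in the text, so the order of the bit tests, the bit positions, and the returned action names are all assumptions.

```python
# Assumed bit positions within the attribute byte.
ATTR_PROGRAM, ATTR_IO, ATTR_ROM = 0x01, 0x02, 0x04


def read_check(attr):
    """A read only needs the I/O test; ROM and program bytes read normally."""
    if attr & ATTR_IO:
        return "io"
    return "data"


def write_check(attr):
    """A write needs up to three tests, all skipped when attr is 0."""
    if attr == 0:
        return "data"        # common case: plain store
    if attr & ATTR_ROM:
        return "ignore"      # stores to ROM have no effect
    if attr & ATTR_IO:
        return "io"          # hand over to device emulation
    if attr & ATTR_PROGRAM:
        return "invalidate"  # compiled code for this address is stale
    return "data"
```

For the common case of an attribute value of 0, both functions do a single test, which is why the two paths cost about the same in practice.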

【００４２】As a further measure to shorten this test, the memory space is divided into segments of a fixed size, and a segment prediction table is used in which the logical OR of the attribute values of each segment is further ORed with the logical ORs of its two adjacent segments. (If a segment prediction bit is 0, none of the attribute tests is needed.) The data of this segment prediction table is assigned to the flag register of the RS/6000 as category prediction flags, one for each category of memory reference classified by usage. Naturally, if a category prediction flag is 0, memory operation instructions of that category need no attribute test.
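The construction of the segment prediction table can be sketched as below. The sketch assumes that each entry is reduced to a single bit and that segments beyond either end of memory contribute 0; `build_segment_prediction` is a name introduced here for illustration.

```python
SEGMENT_SIZE = 128  # bytes per segment, the size chosen in this description


def build_segment_prediction(attr_table):
    """Return one prediction bit per segment.

    Bit i is 1 if any attribute byte in segment i-1, i, or i+1 is nonzero.
    """
    n = (len(attr_table) + SEGMENT_SIZE - 1) // SEGMENT_SIZE
    # OR of the attribute values inside each segment, reduced to 0 or 1.
    own = [
        1 if any(attr_table[i * SEGMENT_SIZE:(i + 1) * SEGMENT_SIZE]) else 0
        for i in range(n)
    ]
    prediction = []
    for i in range(n):
        left = own[i - 1] if i > 0 else 0
        right = own[i + 1] if i + 1 < n else 0
        prediction.append(own[i] | left | right)
    return prediction
```

A single nonzero attribute byte therefore raises the prediction bit of its own segment and of both neighbours, so a 0 bit guarantees that three consecutive segments are plain data.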

【００４３】The categories used for the 8086 processor include the following.
(1) Stack category: SS:constant(BP)
(2) Global category: DS:constant
(3) Heap category: ES:constant(BX)
On the 8086, stack variables are very frequently accessed in the stack segment (SS) with the base pointer (BP) as the base register and a constant displacement. In general, BP is updated at the start of a subroutine and restored on return. The prediction flag of the stack category makes it possible to optimize the memory test code needed for accesses to stack variables within a subroutine.

【００４４】The segment size in this scheme is set to 128 bytes so that the 8086 architecture is handled efficiently (FIG. 10). When the 8086 accesses a stack frame with BP (the base pointer) as the base register and a short offset, it can reach a range from plus 127 bytes to minus 128 bytes. Because access to variables in the stack frame is the category in which this scheme is most effective when emulating the 8086, 128 is used as the segment size. A larger size is possible, but if too large a value is used, program and data become mixed within one segment, the prediction fails, and the scheme loses its benefit.

【００４５】When a short offset is used (most stack frames are accessed with this addressing mode), the area accessible through the BP register is the hatched part of the memory space in FIG. 11. Let Seg_BP be the attribute segment corresponding to the address indicated by the BP register. Together with the segments before and after it (Seg_BP-1 and Seg_BP+1), it covers a region larger than the memory region accessible through the BP register, so the corresponding segment prediction bit predicts the whole memory region accessible through the BP register. The index of the BP-th entry of the segment prediction table is the value of the BP register shifted right by 7 bits (divided by 128).
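The coverage argument can be checked with a short sketch. It treats the value held in BP as a plain linear address, which is a simplification made here for illustration; the function names are hypothetical.

```python
SEGMENT_SHIFT = 7                  # 2**7 == 128-byte segments
SEGMENT_SIZE = 1 << SEGMENT_SHIFT


def stack_prediction_index(bp_address):
    """Index of Seg_BP in the segment prediction table."""
    return bp_address >> SEGMENT_SHIFT


def covered_range(bp_address):
    """Byte range [low, high) covered by Seg_BP-1, Seg_BP and Seg_BP+1."""
    seg = stack_prediction_index(bp_address)
    return (seg - 1) * SEGMENT_SIZE, (seg + 2) * SEGMENT_SIZE


def short_offset_range(bp_address):
    """Byte range [low, high) reachable with an 8-bit signed displacement."""
    return bp_address - 128, bp_address + 128
```

For any BP value, the range returned by `short_offset_range` lies inside the range returned by `covered_range`, which is why one prediction bit is enough for all short-offset accesses relative to that BP.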

【００４６】Global variables, category (2), are accessed in the data segment (DS) with only a constant offset. Most programs do not change the value of DS during execution, so once the global category prediction flag has been set it seldom needs to be updated because of a change of DS. Because the size of DS is 64 KB, it is not enough to set a single prediction bit in the condition register of the RS/6000; the logical OR of 512 bits of segment prediction table entries (64 KB / 128 bytes = 512 segments) must be set.

【００４７】The result of a logical OR is 1 if any one of its inputs is 1, so the computation can be sped up by testing whether a whole group of bits is 0 instead of testing one bit at a time. The RS/6000 has 32-bit words, so loading one word of segment prediction flags predicts 4 KB of 8086 memory space. To predict 64 KB it is therefore enough to load 17 words of segment prediction flags. One extra word beyond 16 is needed because, when DS points to an unaligned address partway through a prediction segment, testing one additional word guarantees that the full 64 KB has been tested.
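A minimal sketch of the word-wise test follows. It assumes that the prediction bits are packed 32 per word in address order and that `ds_base` is the linear base address of the data segment; both are assumptions for illustration.

```python
BITS_PER_WORD = 32
SEGMENT_SIZE = 128
BYTES_PER_WORD = BITS_PER_WORD * SEGMENT_SIZE  # 4096 bytes predicted per word
DS_SPAN = 64 * 1024
WORDS_NEEDED = DS_SPAN // BYTES_PER_WORD + 1   # 16 + 1 = 17


def global_category_flag(prediction_words, ds_base):
    """Return 0 if the 64 KB starting at ds_base needs no attribute tests."""
    first = ds_base // BYTES_PER_WORD
    combined = 0
    for word in prediction_words[first:first + WORDS_NEEDED]:
        combined |= word   # word-wide OR instead of a bit-by-bit test
    return 1 if combined else 0
```

The test is conservative: the 17 words may include a few segments just outside the 64 KB window, which can only cause an unnecessary attribute test, never a missed one.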

【００４８】Updating the category prediction flag of the global category requires loading 17 times as much data from memory as updating that of the stack category. However, on recent processors with a pipelined data cache, such as the RS/6000, reads of consecutive addresses can be loaded without overhead. In addition, the probability of a cache miss increases only by a factor of about two or three, so overall only a few times as much processing is required. Furthermore, while the category prediction flag of the stack category is updated about as often as subroutines are called, the global category is updated far less often.

【００４９】The structure category of (3) speeds up the following two kinds of memory access.
・Access to members of the same structure in the heap area
・Access to array elements in the heap area
Heap accesses on the 8086 often use ES as the segment register and the BX register as the index register. The I/O space is accessed with the same addressing mode (ES:constant(BX)), so it cannot be determined from the addressing mode whether an access is a data access. However, accesses to members of the same structure in the heap area and accesses to array elements in the heap area can be expected to have the same property, so these are treated as a category with its own prediction flag.

【００５０】The example of FIG. 12 increments by 1 each element of a structure (PARENT) consisting of two data items (CHILD1 and CHILD2) in the heap area. Without this technique, the example requires five attribute-value tests (the five numbered 1 to 5). With this technique, all of these tests are omitted. The overhead of the technique in this example is that, at the start of the MOV instruction (2) of FIG. 12, the segment prediction flag for the heap category must be read using the effective address of ES:0(BX).
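The saving in this example can be illustrated by counting lookups. The sketch below assumes that all accesses in the run fall within the area covered by one prediction bit, and the addresses used in the usage example are made up; FIG. 12 gives the actual instruction sequence.

```python
SEGMENT_SHIFT = 7  # 128-byte segments


def count_attribute_tests(addresses, attr_table, prediction, use_flag):
    """Return (per-access attribute tests, prediction flag reads)."""
    tests = 0
    flag_reads = 0
    flag = 1  # without the technique, every access is tested
    if use_flag:
        # One read of the prediction bit at the first effective address.
        flag = prediction[addresses[0] >> SEGMENT_SHIFT]
        flag_reads = 1
    for addr in addresses:
        if flag:
            _ = attr_table[addr]  # per-access attribute test
            tests += 1
    return tests, flag_reads
```

With five accesses to plain data, for example `count_attribute_tests([0x200, 0x202, 0x204, 0x206, 0x208], bytearray(1024), [0] * 8, True)`, the five per-access tests are replaced by a single prediction flag read.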

【００５１】Next, we describe how much a single memory access test is sped up by using the segment prediction flag. The RS/6000 is a CPU with a superscalar architecture that can execute instructions of different kinds (branch, integer, floating point) simultaneously. In addition, the RS/6000 has a 32-bit condition code register that the user can use freely, so the various category prediction flags can be mapped to individual bits of this register, and testing one of them takes only one instruction. Because this test instruction can be executed in parallel with the instruction that computes the effective address, the execution time of the instruction that tests the category prediction flag can be reduced to exactly 0 on the RS/6000.
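The mapping of category prediction flags onto bits of a 32-bit register can be pictured with a small sketch. The bit positions are hypothetical, and a Python integer merely stands in for the condition register.

```python
# Hypothetical bit positions within a 32-bit condition register image.
STACK_FLAG = 1 << 0
GLOBAL_FLAG = 1 << 1
HEAP_FLAG = 1 << 2
REGISTER_MASK = 0xFFFFFFFF


def set_flag(cr, mask, value):
    """Return the register image with the selected flag set or cleared."""
    if value:
        return (cr | mask) & REGISTER_MASK
    return cr & ~mask & REGISTER_MASK


def needs_attribute_test(cr, mask):
    """A single mask test decides whether the per-access test can be skipped."""
    return (cr & mask) != 0
```

Each category keeps its own bit, so updating the stack flag on a subroutine call leaves the global and heap flags untouched.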

【００５２】The RS/6000 has an architecture in which a zero-cycle branch is possible if the branch condition of a conditional branch instruction (here, the value of the prediction flag) has been determined at least three cycles before the instruction executes; that is, the pipeline delay caused by the branch can be eliminated. In this scheme, the value of the category prediction flag is in most cases determined long before it is tested. By exploiting the zero-cycle branch of the RS/6000, the branch delay that would otherwise be incurred even when the prediction is correct can therefore be eliminated.

【００５３】The execution time of a memory operation instruction is measured on a PC simulator running on the RS/6000. The memory operation instruction measured is the following one, which writes the value of the AX register to a variable on the 8086 stack frame. Because it is an instruction that stores ordinary data, it is executed with the prediction holding.
mov ss:8(bp),ax
・Execution time with this scheme
(2 cycles) Compute the effective address
(0 cycles) if the prediction flag is not 0 then call attribute processing
(1 cycle) Store the data
Total: 3 cycles (4 instructions)
・Execution time with the attribute scheme
(2 cycles) Compute the effective address
(1 cycle) Compute the attribute address
(2 cycles) Load the attribute value
(4 cycles) if the attribute value is not 0 then call attribute processing
(1 cycle) Store the data
Total: 10 cycles (7 instructions)

【００５４】The attribute value test (for an access to the data space) takes 7 cycles when the attribute value hits in the data cache, whereas with the present invention it takes effectively 0 cycles. The execution time of the whole operation falls from 10 cycles to 3 cycles. When the attribute value misses in the cache, the difference is far larger. Even when the prediction fails (I/O, program, ROM, and so on), there is no run-time overhead at all, thanks to the zero-cycle branch capability of the RS/6000. The number of instructions executed is reduced from the 7 originally required to 4, which improves the utilization of the instruction cache. In addition, because the attribute values are no longer accessed, the utilization of the data cache also improves. These effects yield a further speedup.

【００５５】[0055]

【発明の効果】[Effects of the Invention] As described above, according to the present invention, the check of whether an access targets an actual data area, which has conventionally been performed for every memory access during emulation, is carried out in a simplified manner, so emulation can be sped up.

[Brief description of drawings]

【図１】FIG. 1 is a diagram illustrating a conventional example.

【図２】FIG. 2 is a diagram illustrating an outline of the present invention.

【図３】FIG. 3 is a diagram illustrating the categories of memory access of the present invention.

【図４】FIG. 4 is a diagram illustrating the categories of memory access of the present invention.

【図５】FIG. 5 is a diagram illustrating the categories of memory access of the present invention.

【図６】FIG. 6 is a flowchart illustrating the overall operation of an embodiment of the present invention.

【図７】FIG. 7 is a diagram illustrating the main part of the embodiment.

【図８】FIG. 8 is a flowchart illustrating the operation of the embodiment on a memory read access.

【図９】FIG. 9 is a flowchart illustrating the operation of the embodiment on a memory write access.

【図１０】FIG. 10 is a diagram illustrating the generation of the segment prediction table of the embodiment.

【図１１】FIG. 11 is a diagram illustrating the operation of the embodiment when a stack variable is accessed.

【図１２】FIG. 12 is a diagram illustrating the operation of the embodiment when a structure in the heap area is accessed.

Continuation of the front page: (56) References: JP-A-2-5138 (JP, A)

Claims

[Claims]

1. An emulation method in which a first processor having a first instruction set is emulated by a second processor having a second instruction set so that an application for the first processor is executed on the second processor, the method comprising: a step of generating the memory space of the first processor in the memory of the second processor; a step of generating a first attribute table representing the attribute value of each memory location of the memory space; a step of generating a second attribute table by performing, for each subspace of the memory space, a logical operation on the attribute values of the memory locations in the subspace, and allocating a first logical value when there is a memory location that requires special processing and a second logical value when there is no memory location that requires special processing; a step of referring to the second attribute table each time the specification of the current memory access range changes, and determining whether a memory location requiring special processing in addition to the memory access lies within the range of the memory access; a step of, when it is determined that no memory location requiring special processing exists, simply executing memory accesses without checking the first attribute table until the memory access range next changes; and a step of, when it is determined that a memory location requiring special processing exists, determining for each memory access, at least until the memory access range next changes, whether special processing is necessary by referring to the first attribute table, and executing the special processing if it is necessary.

2. An emulation method in which a first processor having a first instruction set is emulated by a second processor having a second instruction set so that an application for the first processor is executed on the second processor, the method comprising: a step of generating the memory space of the first processor in the memory of the second processor; a step of generating a first attribute table representing the attribute value of each memory location of the memory space; a step of modifying the first attribute table when an attribute value changes; a step of generating a second attribute table by performing, for each subspace of the memory space, a logical operation on the attribute values of the memory locations in the subspace, and allocating a first logical value when there is a memory location that requires special processing and a second logical value when there is no memory location that requires special processing; a step of modifying the second attribute table when the first attribute table is modified; a step of referring to the second attribute table each time the specification of the current memory access range changes, or each time the second attribute table is modified, and determining whether a memory location requiring special processing in addition to the memory access lies within the range of the memory access; a step of, when it is determined that no memory location requiring special processing exists, executing memory accesses without checking the first attribute table until the memory access range next changes or the second attribute table is modified; and a step of, when it is determined that a memory location requiring special processing exists, determining for each memory access, at least until the memory access range next changes or the second attribute table is modified, whether special processing is necessary by referring to the first attribute table, and executing the special processing if it is necessary.

3. The emulation method according to claim 1, wherein a subspace is allocated to each of a series of segments that fill the memory space, and each subspace consists of that segment and the two segments adjacent to it.

4. The emulation method wherein an internal register of the second processor holds the result of the determination as to whether a memory location requiring the special processing lies within the range of the memory access.

5. The emulation method according to claim 1, 2, 3 or 4, wherein the special processing is executed for an access to memory-mapped I/O, a write access to a ROM area, and a write access to a program area.

6. The emulation method according to claim 2, 3, 4 or 5, wherein the size of the memory access range changes according to the method of memory access.

7. The emulation method according to claim 6, wherein the memory access method includes at least a stack variable access method, a global variable access method, and a heap area structure access method.

8. An emulation computer program product for emulating a first processor having a first instruction set with a second processor having a second instruction set so that an application for the first processor is executed on the second processor, the program product causing the second processor to execute: a step of generating the memory space of the first processor in the memory of the second processor; a step of generating a first attribute table representing the attribute value of each memory location of the memory space; a step of referring to the second attribute table each time the current memory access range changes, or each time the second attribute table is modified, and determining whether a memory location requiring special processing in addition to the memory access lies within the range of the memory access; a step of referring to the second attribute table each time the specification of the current memory access range changes, and determining whether a memory location requiring special processing in addition to the memory access lies within the range of the memory access; a step of, when it is determined that no memory location requiring special processing exists, simply executing memory accesses without checking the first attribute table until the memory access range next changes; and a step of, when it is determined that a memory location requiring special processing exists, determining for each memory access, at least until the memory access range next changes, whether special processing is necessary by referring to the first attribute table, and executing the special processing if it is necessary.