JP2003510681A - Optimized bytecode interpreter for virtual machine instructions

Info

Publication number: JP2003510681A
Application number: JP2001525514A
Authority: JP
Inventors: Fabio Riccardi
Original assignee: Philips Electronics NV
Current assignee: Koninklijke Philips NV
Priority date: 1999-09-21
Filing date: 2000-09-13
Publication date: 2003-03-18
Also published as: KR20010080525A; EP1183598A2; WO2001022213A3; WO2001022213A2; CN1347525A; CN1173262C

Abstract

(57) [Abstract] [Problem] To provide a method of optimizing the execution time of interpreted programs that is well suited to embedded systems. [Solution] The invention relates to a method of optimizing an interpreted program in a virtual machine interpreter for a bytecode-based language, comprising means for dynamically reconfiguring the virtual machine with macro operation codes by replacing an original sequence of simple operation codes with a new sequence of macro operation codes. The virtual machine interpreter is coded as an indirect threaded interpreter by means of a translation table that contains the implementation addresses of the operation codes and translates bytecodes into those addresses. Applications: embedded systems using any bytecode-based programming language; set-top boxes for interactive video transmission.

Description

Detailed Description of the Invention

[0001]

TECHNICAL FIELD OF THE INVENTION

The present invention relates to optimizing the execution time of interpreted programs. More particularly, it relates to a method of optimizing an interpreted program by means of a virtual machine that dynamically reconfigures itself with new macro operation codes. The invention is applicable to any bytecode-based programming language.

[0002]

[Prior art]

Bytecode-based languages with a programmer-visible stack are popular as intermediate languages for compilers and as machine-independent executable program representations. They offer significant advantages for network computing. The paper "Optimizing Direct Threaded Code by Selective Inlining" by I. Piumarta and F. Riccardi, in the Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation (PLDI), held in Montreal (Canada) on June 17, 1998, pp. 291-300, describes a technique of the kind mentioned in the opening paragraph for optimizing interpreted programs.

A virtual machine (VM) is used to interpret programs by means of a VM interpreter. A VM is a software implementation representing the architecture of a virtual processor, on which applications compiled specifically for that architecture are executed. The instructions of the virtual processor/machine are called bytecodes. The VM interpreter is the part of the VM that embodies the bytecode execution mechanism; bytecodes are interpreted by it. This execution mechanism is commonly implemented as an infinite loop containing a switch-case block. The techniques described in the cited paper apply to direct threaded interpreters. A threaded code interpreter executes bytecodes in line: the translation of each bytecode contains a reference to the next bytecode, so the bytecodes executed by a threaded interpreter do not pass through an infinite loop. Although threaded interpreters offer a performance advantage, they are inconvenient for most embedded systems because they are too slow and consume a fairly large amount of memory.

In a direct threaded code interpreter, as explained in the cited paper, VM bytecodes are represented by the addresses of their implementations, so that each bytecode can jump directly to the implementation of the next one. So that the physical address of a bytecode's implementation is immediately accessible when the bytecode is translated, a table is initialized, before translation takes place, with the address of each bytecode of the application. This table makes it possible to switch from one bytecode to another. Direct threaded interpreters are fast, but they cause code expansion. Converting bytecode to direct threaded code increases the code size by about 150%, because the operation codes are replaced by the addresses of their implementation code: an address generally requires 4 bytes, whereas an operation code requires only 1 byte. Direct threaded interpreters therefore increase memory consumption and are poorly suited to embedded systems.
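The size argument above can be checked with a little arithmetic. The sketch below assumes that only opcodes grow (from 1 byte to the size of an address) while operand bytes are copied unchanged; the function names and the 50/50 mix of opcode and operand bytes used in the note are illustrative assumptions, not figures taken from the cited paper.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of a program after conversion to direct threaded code:
 * every 1-byte opcode becomes an address of addr_size bytes, while
 * operand bytes are copied unchanged. */
size_t direct_threaded_size(size_t opcode_count, size_t operand_bytes,
                            size_t addr_size)
{
    return opcode_count * addr_size + operand_bytes;
}

/* Growth relative to the original bytecode, in percent (integer division). */
size_t growth_percent(size_t opcode_count, size_t operand_bytes,
                      size_t addr_size)
{
    size_t before = opcode_count + operand_bytes;   /* 1 byte per opcode */
    size_t after = direct_threaded_size(opcode_count, operand_bytes, addr_size);

    assert(before > 0 && addr_size >= 1);
    return (after - before) * 100 / before;
}
```

Under these assumptions, a 100-byte program made of 50 opcodes and 50 operand bytes grows to 250 bytes with 4-byte addresses, that is, by 150%; a program consisting only of opcodes would grow by 300%.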

[0003]

[Means for Solving the Problems]

It is an object of the present invention to provide a method of optimizing the execution time of interpreted programs that is well suited to embedded systems. Such a system may be, for example, a satellite or cable transmission system incorporated in a digital video receiver (often called a set-top box). The invention applies to any product whose operating system is based on a bytecode-based programming language. The invention makes it possible to save memory and CPU resources and to improve the efficiency of the system.

[0004] According to the invention, a method is described for optimizing an interpreted program in a virtual machine interpreter for a bytecode-based language, in which the virtual machine dynamically reconfigures itself by replacing a sequence of simple bytecodes with a new sequence of macro bytecodes, and in which the virtual machine interpreter is coded as a threaded code interpreter that translates the bytecodes into their implementation code. The threaded code interpreter of the invention is coded as an indirect threaded code interpreter by means of a reference table containing the implementation addresses of the bytecodes, so that, during the translation of a bytecode, the address of the next bytecode is retrieved in order to jump to that next bytecode.

[0005] The invention, and additional features that may optionally be used to implement it, will become apparent with reference to the drawings described below.

[0006]

BEST MODE FOR CARRYING OUT THE INVENTION

The invention, which presents a novel run-time optimization method applicable to any bytecode-based language, is described in more detail below using the Java language as an example.

[0007] The approach normally used by Just-In-Time (JIT) compilers discards the Java virtual machine (VM) interpreter altogether and translates the application's bytecode into native machine code just before it is executed (hence the name Just-In-Time). The process consists of understanding the semantics of the original application and re-expressing it in a more convenient native form. This is an effective way of achieving performance, but it is costly. On the one hand it consumes a great deal of memory, because bytecode-based languages are more compact than native code. On the other hand it consumes considerable CPU (central processing unit) resources, because remapping Java bytecode onto the target machine is not an easy task.

[0008] The invention is also based on a form of dynamic code generation, but its aim is not to translate the application's Java bytecode into native machine code. Rather, it is to adapt the Java VM dynamically to the execution of the application's particular bytecode sequences. The Java bytecode of the original application is therefore preserved, and the VM is dynamically enriched with new bytecodes, or operation codes (opcodes), that improve its execution efficiency.

[0009] This approach has several advantages:
The size of the executable code does not increase: the application remains in the memory-efficient Java bytecode representation.
The execution mechanism of the VM is economical: since there is only one execution mechanism, the VM running the application does not have to handle multiple code representations, which reduces its size and improves its reliability.
The code generation technique is simple: the VM optimizer has a very simple structure, and the analysis of the application's bytecode is a one-pass, table-driven procedure that requires very few CPU resources and directly drives the synthesis of the new bytecodes.

[0010] These properties make the invention suitable for embedded applications. The optimization technique of the invention is based on a study of the cost of the interpreter's most basic mechanisms for a category of "typical" applications. The application profile matters because it determines the potential benefit obtainable from each of the optimization techniques that could be considered. Given that the target is embedded applications, what may be defined as "typical" applications include, for example, control applications, graphical user interfaces, and the like.

[0011] It is assumed that the target applications map well onto the basic instructions provided by the underlying VM (object manipulation). They would therefore not benefit greatly from radical code transformations, but rather from a general improvement of the VM's execution mechanism. Amdahl's law was used to understand how the efficiency of the VM should be improved. In the version given by Hennessy and Patterson, Amdahl's law states: "The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used", or, more generally: "Make the common case fast".

[0012] The efficiency of an interpreter depends on the representation chosen for the executable code and on the mechanism used to dispatch bytecodes. Since instruction dispatch is the critical part of an interpreter, the first approach to reducing cost was to reduce the cost of instruction dispatch. A typical interpreter (called a pure bytecode interpreter) is implemented like a processor simulation: a large switch statement inside an infinite loop dispatches instructions to their implementations. The inner loop of a pure bytecode interpreter is therefore very simple: fetch the next bytecode and dispatch to its implementation using the switch statement. Each implementation passes control to the next bytecode by breaking out of the switch, which returns control to the start of the infinite loop. The following instructions show a typical bytecode interpreter implementation:

    for (;;) {
        op = *pc++;
        switch (op) {
        case op_1:
            // op_1's implementation
            break;
        case op_2:
            // op_2's implementation
            break;
        case op_3:
            // op_3's implementation
            break;
        ...
        }
    }
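The switch-based loop described above can be made concrete with a small sketch. The three opcodes, the fixed 16-slot stack and the function name run_switch are assumptions chosen for illustration; they do not come from the patent or from the Java VM.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_PUSH1, OP_ADD, OP_HALT };   /* hypothetical opcodes */

/* Runs a tiny stack program and returns the value on top of the stack. */
int run_switch(const uint8_t *pc)
{
    int stack[16];
    int sp = 0;                       /* number of values on the stack */

    for (;;) {
        uint8_t op = *pc++;           /* fetch the next bytecode */
        switch (op) {                 /* dispatch to its implementation */
        case OP_PUSH1:
            assert(sp < 16);
            stack[sp++] = 1;
            break;                    /* back to the top of the loop */
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:                      /* unknown opcode */
            return -1;
        }
    }
}
```

Running the program { OP_PUSH1, OP_PUSH1, OP_ADD, OP_HALT } returns 2. Every bytecode pays for the fetch, the switch lookup, the jump to the case and the jump back to the top of the loop.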

[0013] Assuming that the compiler optimizes the chain of jumps from each break, through the implicit jump at the end of the loop, back to the start of the loop, the overhead associated with this approach is as follows:
increment the instruction pointer pc;
fetch the next bytecode from memory;
perform a redundant range check on the argument to switch;
fetch the address of the destination case label from a table;
jump to that address;
and, at the end of each bytecode, jump back to the start of the loop to fetch the next bytecode.

[0014] In this case, ignoring all other sources of inefficiency such as the actual implementation of the switch statement, the cost of instruction dispatch consists of:
two memory accesses: one to fetch the value of the next instruction, and one to fetch the address of that instruction's implementation; and
two branches: one to jump to the bytecode's implementation, and one to return to the start of the loop.
Jumps are among the most expensive instructions on current architectures.

[0015] Pure bytecode interpreters are easy to write and to understand. They are also highly portable, but slow, and are therefore not convenient for embedded systems. When most bytecodes perform simple operations, as in the case described above, most of the execution time is spent performing dispatch. To appreciate the real cost of the mechanism, it should be compared with the cost of executing a single bytecode. Java bytecodes have very low-level semantics, so their implementations are often trivial. The most commonly executed bytecodes are therefore, in practice, no more expensive than the dispatch mechanism itself.

[0016] The first efficiency improvement of the invention is the adoption of indirect threaded code, as shown by the following instructions:

    Op_1_lbl:
        // op_1's implementation
        goto opcode_table(*pc++);
    Op_2_lbl:
        // op_2's implementation
        goto opcode_table(*pc++);
    Op_3_lbl:
        // op_3's implementation
        goto opcode_table(*pc++);

where Op_1_lbl, Op_2_lbl and Op_3_lbl represent three different operation codes interpreted by the VM interpreter.

[0017] In this implementation, called indirect threaded code, the VM is coded as an indirect threaded code interpreter. During the translation of a bytecode, the address of the next bytecode is resolved. The reference table, denoted opcode_table, contains the implementation addresses of the bytecodes and is indexed by the value read through the pointer (*pc++). For each bytecode translated, the address of the next bytecode is retrieved in order to jump to it. In this way each bytecode implementation jumps directly to the next bytecode implementation, and one branch, the outer loop, and the unnecessary inefficiencies of the switch implementation (range checking and default-case handling) are eliminated.
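The indirect threaded dispatch described above can be sketched in C, assuming a compiler that supports the GNU "labels as values" extension (&&label and goto *ptr, available in GCC and Clang); standard C has no equivalent construct. The opcodes and the function name run_threaded are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { IT_PUSH1, IT_ADD, IT_HALT };   /* hypothetical opcodes */

/* Runs a tiny stack program and returns the value on top of the stack. */
int run_threaded(const uint8_t *pc)
{
    /* opcode_table maps each bytecode to the address of its implementation.
     * &&label and "goto *" are GNU C extensions, not standard C. */
    static void *const opcode_table[] = {
        [IT_PUSH1] = &&push1_lbl,
        [IT_ADD]   = &&add_lbl,
        [IT_HALT]  = &&halt_lbl,
    };
    int stack[16];
    int sp = 0;                       /* number of values on the stack */

    goto *opcode_table[*pc++];        /* dispatch the first bytecode */

push1_lbl:
    assert(sp < 16);
    stack[sp++] = 1;
    goto *opcode_table[*pc++];        /* jump straight to the next bytecode */
add_lbl:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    goto *opcode_table[*pc++];
halt_lbl:
    assert(sp >= 1);
    return stack[sp - 1];
}
```

The bytecode stream stays one byte per opcode, as in the switch version; only the dispatch changes. There is no outer loop and no range check, and each implementation ends with a single indirect jump.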

[0018] According to a preferred embodiment of the invention, the translation is carried out by making use of the bytecodes left unused by the VM specification of the bytecode-based language.

[0019] The block diagram of FIG. 1 summarizes the main steps of the method of the invention for translating a bytecode (for example, the bytecode bipush) into native instructions by means of an indirect threaded code interpreter:
Step K0 = BIPUSH: start of the method of translating the bytecode bipush, which consists of placing a half word, the bipush parameter (par), on the stack;
Step K1 = PAR: fetch the bipush parameter (par);
Step K2 = PUT: place the bipush parameter on the stack;
Step K3 = GOTO: go to the next bytecode (goto opcode_table(*pc)) by looking up the reference table opcode_table, which contains the address of the implementation of the next bytecode.
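Steps K0 to K3 can be sketched as a single threaded-code handler. As before, this assumes the GNU "labels as values" extension (GCC, Clang). The opcode numbering, the BC_RETURN_TOP opcode and the function name are hypothetical; the sign extension of the parameter follows the behaviour of bipush in the Java VM rather than anything stated in FIG. 1.

```c
#include <assert.h>
#include <stdint.h>

enum { BC_BIPUSH, BC_RETURN_TOP };    /* hypothetical numbering, not the JVM's */

/* Executes a program made of bipush and a "return top of stack" opcode. */
int32_t run_bipush_demo(const uint8_t *pc)
{
    static void *const opcode_table[] = {
        [BC_BIPUSH]     = &&bipush_lbl,
        [BC_RETURN_TOP] = &&return_top_lbl,
    };
    int32_t stack[16];
    int sp = 0;
    int8_t par;

    goto *opcode_table[*pc++];

bipush_lbl:                           /* K0: start of bipush */
    par = (int8_t)*pc++;              /* K1: fetch the parameter */
    assert(sp < 16);
    stack[sp++] = par;                /* K2: push it, sign-extended */
    goto *opcode_table[*pc++];        /* K3: go to the next bytecode */
return_top_lbl:
    assert(sp >= 1);
    return stack[sp - 1];
}
```

For the program { BC_BIPUSH, 7, BC_RETURN_TOP } the function returns 7.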

[0020] The adoption of threaded code can by itself double the efficiency of the VM, but, as the following description shows, it also opens up other interesting optimization opportunities. Statistical analysis of Java bytecode shows that, on average, there is a branch every 5-6 instructions. On any current CPU, branches are inherently expensive instructions, because they can cause pipeline stalls and/or trigger external bus activity. Apart from loop unrolling or method call inlining, there is not much that can actually be done about this: even if the code is recompiled into a native representation, the control statements will still be there.

[0021] According to a recent study of CPU usage by object-oriented applications on high-end workstations, the CPU spends as much as 70% of its clock cycles recovering from pipeline stalls caused by mispredicted branches and waiting for data and instructions from main memory (when they are not in the cache). Moreover, the CPUs available for embedded systems typically have very small caches, no hardware support for dynamic branch prediction, and slow and/or narrow memory interfaces with no L2 cache. These additional constraints further reduce CPU utilization and efficiency.

[0022] Java bytecodes can be divided into two categories: simple operation codes (load, store, arithmetic and control statements) and complex operation codes (memory management, synchronization, etc.).

[0023] Simple bytecodes are typically cheaper than the dispatch mechanism. Complex bytecodes, by contrast, are far more expensive, so that dispatch accounts for only a very small part of their total execution cost. Simple bytecodes are executed far more often (by orders of magnitude) than complex ones. This means that a classic Java interpreter spends most of its time dispatching bytecodes rather than doing useful work. It is therefore assumed that reducing the dispatch cost of simple bytecodes is definitely more effective than reducing that of complex bytecodes.

[0024] Translating bytecodes into indirect threaded code provides the opportunity to perform arbitrary transformations on the executable code. One such transformation detects common sequences of bytecodes and translates each of them into a single threaded "macrocode", which performs the work of the entire original sequence. According to a preferred embodiment of the invention, it is therefore proposed to replace sequences of simple bytecodes with equivalent macrocodes. For example, as disclosed in the cited paper, the bytecode sequence "push literal, push variable, add, store variable" can be translated into a single "add-literal-to-variable" macrocode in indirect threaded code. Such optimizations are effective because they avoid the overhead of the multiple dispatches that are implied by the original bytecodes but are elided within the macrocode. A single macrocode translated from a sequence of N original bytecodes avoids N-1 bytecode dispatches at run time. More details on how macrocodes should be constructed can be found in the cited paper. Such macrocodes must satisfy the following criteria:

[0025] Since there is no point in reducing the dispatch cost of complex bytecodes, macros must be built from sequences of simple bytecodes.

[0026] A macro must not contain any instruction that is a possible branch target; otherwise the VM execution mechanism would have to be changed substantially. The macro itself may be a branch target.

[0027] Since the cost of a native branch is equivalent to the cost of a dispatch operation, a macro must end at a control statement or a method call.

[0028] To simplify the implementation, the maximum length of a macro should be about 15 bytecodes. The "natural" average macro length is 4-5 bytecodes. Given these criteria, constructing such macro sequences is very simple and uses very little CPU time. A simple scan of a method's bytecode is in fact sufficient, and most of the parsing can be table driven and based on single bytecodes.
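The criteria above suggest a single table-driven pass over a method's bytecode. The following is a minimal sketch under stated assumptions, not the patent's algorithm: the OpInfo table, the is_target array (one flag per byte offset, assumed to be computed beforehand), and the choices to include a closing control statement in the macro, to exclude complex bytecodes such as method calls, and to ignore candidates shorter than two bytecodes are all illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_MACRO_LEN = 15 };          /* maximum macro length, in bytecodes */

typedef enum { KIND_SIMPLE, KIND_CONTROL, KIND_COMPLEX } OpKind;
typedef struct { OpKind kind; uint8_t length; } OpInfo;  /* length incl. operands */
typedef struct { size_t start; size_t count; } Macro;    /* offset, bytecode count */

/* Records a candidate if it would save at least one dispatch. */
static size_t emit(Macro *out, size_t out_cap, size_t found,
                   size_t start, size_t count)
{
    if (count >= 2 && found < out_cap) {
        out[found].start = start;
        out[found].count = count;
        found++;
    }
    return found;
}

/* One pass over `code`; returns the number of macro candidates written. */
size_t find_macros(const uint8_t *code, size_t code_len,
                   const OpInfo info[256], const uint8_t *is_target,
                   Macro *out, size_t out_cap)
{
    size_t found = 0, pc = 0, start = 0, count = 0;

    while (pc < code_len) {
        const OpInfo *op = &info[code[pc]];
        assert(op->length >= 1);

        /* Close the current candidate before a branch target, before a
         * complex bytecode, or when the maximum length is reached. */
        if (count > 0 && (is_target[pc] || op->kind == KIND_COMPLEX ||
                          count == MAX_MACRO_LEN)) {
            found = emit(out, out_cap, found, start, count);
            count = 0;
        }
        if (op->kind != KIND_COMPLEX) {
            if (count == 0)
                start = pc;           /* a macro may itself be a branch target */
            count++;
            if (op->kind == KIND_CONTROL) {   /* a control statement ends it */
                found = emit(out, out_cap, found, start, count);
                count = 0;
            }
        }
        pc += op->length;
    }
    return emit(out, out_cap, found, start, count);
}
```

The scan inspects one bytecode at a time and consults only the per-opcode table and the branch-target flags, which is what keeps its CPU cost low.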

[0029] Given that the number of unused bytecodes is very small (30-40 on average), according to a particular variant of the preferred embodiment a 2-byte representation can be used for the new bytecodes that represent the new macro instructions. The operands of the original sequence are grouped immediately after the new sequence. They can thus be accessed easily by incrementing the program counter of the virtual machine.
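One possible reading of the 2-byte representation is an escape byte (one of the unused bytecodes) followed by a macro identifier, with the operands of the original bytecodes copied after it in their original order. The sketch below assumes that layout; the function name, the op_len table and the escape scheme itself are illustrative assumptions rather than details given in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rewrites `count` consecutive bytecodes starting at `src` as a 2-byte macro
 * opcode followed by the operands of the original bytecodes, in order.
 * op_len[op] is the total length of opcode `op`, operands included.
 * Returns the number of bytes written to `dst`. */
size_t encode_macro(const uint8_t *src, size_t count,
                    const uint8_t op_len[256],
                    uint8_t escape, uint8_t macro_id,
                    uint8_t *dst, size_t dst_cap)
{
    size_t in = 0, out = 0;

    assert(dst_cap >= 2);
    dst[out++] = escape;              /* first byte: an unused bytecode */
    dst[out++] = macro_id;            /* second byte: which macro */

    for (size_t i = 0; i < count; i++) {
        uint8_t len = op_len[src[in]];
        assert(len >= 1);
        for (uint8_t k = 1; k < len; k++) {   /* copy operands, drop the opcode */
            assert(out < dst_cap);
            dst[out++] = src[in + k];
        }
        in += len;
    }
    return out;
}
```

With this layout the macro's implementation reads its operands with successive *pc++ accesses, just as the original bytecodes did.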

[0030] Once a method has been scanned, the macros can be constructed by simply cutting and pasting together the binary code generated by the compiler for the threaded code interpreter. The threaded dispatcher can treat macros as ordinary bytecodes.
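The saving of N-1 dispatches can be illustrated with the "add-literal-to-variable" example. In this sketch the macro's implementation is written by hand, whereas the text builds macros at run time by pasting together compiled code; the opcodes, the single variable, the dispatch counter and the function name are assumptions for illustration, and the code relies on the GNU "labels as values" extension (GCC, Clang).

```c
#include <assert.h>
#include <stdint.h>

enum { M_PUSH_LIT, M_PUSH_VAR, M_ADD, M_STORE_VAR, M_ADD_LIT_TO_VAR, M_HALT };

typedef struct {
    int32_t var;            /* final value of the single variable */
    unsigned dispatches;    /* number of bytecode dispatches performed */
} MacroDemoResult;

#define DISPATCH() do { dispatches++; goto *opcode_table[*pc++]; } while (0)

MacroDemoResult run_macro_demo(const uint8_t *pc, int32_t initial)
{
    static void *const opcode_table[] = {
        [M_PUSH_LIT]       = &&push_lit_lbl,
        [M_PUSH_VAR]       = &&push_var_lbl,
        [M_ADD]            = &&add_lbl,
        [M_STORE_VAR]      = &&store_var_lbl,
        [M_ADD_LIT_TO_VAR] = &&add_lit_to_var_lbl,
        [M_HALT]           = &&halt_lbl,
    };
    int32_t stack[8];
    int sp = 0;
    int32_t var = initial;
    unsigned dispatches = 0;
    MacroDemoResult result;

    DISPATCH();

push_lit_lbl:
    assert(sp < 8);
    stack[sp++] = (int8_t)*pc++;
    DISPATCH();
push_var_lbl:
    assert(sp < 8);
    stack[sp++] = var;
    DISPATCH();
add_lbl:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
store_var_lbl:
    assert(sp >= 1);
    var = stack[--sp];
    DISPATCH();
add_lit_to_var_lbl:         /* macro: the four steps fused, no inner dispatch */
    var += (int8_t)*pc++;
    DISPATCH();
halt_lbl:
    result.var = var;
    result.dispatches = dispatches;
    return result;
}
```

Starting from 10, both { M_PUSH_LIT, 5, M_PUSH_VAR, M_ADD, M_STORE_VAR, M_HALT } and { M_ADD_LIT_TO_VAR, 5, M_HALT } leave 15 in the variable, but the first performs 5 dispatches and the second 2: the four original bytecodes became one macro and three dispatches disappeared.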

[0031] FIG. 2 outlines a preferred embodiment of the virtual machine of the invention. The VM is run in order to load a program containing bytecodes to be interpreted by the VM interpreter. The main steps of the method are as follows:
Step K0 = INIT: initialize the procedure executed by the VM by loading a program containing bytecodes;
Step K1 = OPCODE: fetch the bytecodes to be interpreted;
Step K2 = MACRO: replace sequences of simple bytecodes with macro bytecodes;
Step K3 = TRANS: interpret the macro bytecodes using the indirect threaded interpreter method shown in FIG. 1;
Step K4 = RES: obtain the result and end the method.

[0032] Statistical analysis of execution traces of real Java applications has shown that a typical macro is 4-5 bytecodes long and that, after the code transformation, macros are executed up to 5 times more frequently than the remaining bytecodes. The remaining bytecodes are those whose implementation is too complex to be worth inlining and those left out as a result of the branch-target analysis. In this way the total cost of bytecode dispatch can be reduced by more than a factor of 4. Given that dispatch originally accounts for about 50% of the total execution cost, using the invention reduces that cost considerably.
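The figures in this paragraph can be combined using Amdahl's law in its usual form, speedup = 1 / ((1 - f) + f / s), where f is the fraction of time affected and s the factor by which it is accelerated. The sketch below applies the formula to the numbers quoted above, treating the 50% share and the factor of 4 as exact; the function name is hypothetical and the result is an illustration, not a measurement reported in the text.

```c
#include <assert.h>

/* Overall speedup when a fraction `f` of the execution time is made `s`
 * times faster and the rest is unchanged (Amdahl's law). */
double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```

With f = 0.5 and s = 4, the overall speedup is 1 / (0.5 + 0.125) = 1.6. Doubling the dispatch saving to s = 8 raises this only to about 1.78, which shows how quickly the returns diminish once dispatch is no longer the dominant cost.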

[0033] The invention brings several additional advantages. Processor branch instructions can likewise be reduced by a factor of 5. Because the executed code is linearized, the efficiency of the processor pipeline and of the memory subsystem can be greatly improved. The actual gain depends on the processor architecture, as regards the cost of pipeline stalls, and on the architecture of the memory subsystem, as regards the cost of cache line fills. For memory-constrained systems, as in most embedded applications, these costs are extremely high and are definitely worth reducing. The remaining dispatch cost depends essentially on the control statements present in the Java code. To translate the bytecode completely into binary code, as in classic dynamic recompilation, branch statements would have to be introduced into the executable code, and these would cost roughly the same as the dispatches that remain.

[0034] One advantage of macros is that they are common sequences of bytecodes, so the probability that one such sequence will be found in another method, or elsewhere in the same method, is very high. Tests were carried out on Java bytecode, and it was found that a significant proportion of the macros can be reused. Taking this reuse factor into account, the memory used by the macrocode implementations can therefore be reduced. A full translation to binary code would use at least twice as much memory and would bring only a negligible gain in efficiency. Assuming, for example, that the cost of dispatch could be reduced by a further factor of 2, the overall increase in observable speed would be very small. It would probably not be worth doubling the memory usage.

[0035] Another advantage of macros is that they have no effect on the normal bytecode dispatching mechanism. There is no need to add another execution mechanism to what already exists in the VM. There is no need to distinguish between compiled and uncompiled processes, and no need to rely on the constraints and overhead of a native code interface.
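The point that macros leave the dispatching mechanism untouched can be illustrated with a minimal sketch. It is not the patent's implementation: the opcode numbers, the `FIRST_UNUSED` slot, and the helper names `make_table`, `run`, and `macroize` are all hypothetical, and Python functions stand in for implementation addresses. The macro is simply registered in the same translation table under an unused opcode, and the same `run` loop executes both the original and the rewritten program.

```python
# Hypothetical opcodes for illustration; a real VM defines its own set.
PUSH_A, PUSH_B, ADD, STORE_C, HALT = range(5)
FIRST_UNUSED = 200  # assumed: this opcode value is unused by the VM


def push_a(vm):
    vm["stack"].append(vm["vars"]["a"])


def push_b(vm):
    vm["stack"].append(vm["vars"]["b"])


def add(vm):
    right = vm["stack"].pop()
    left = vm["stack"].pop()
    vm["stack"].append(left + right)


def store_c(vm):
    vm["vars"]["c"] = vm["stack"].pop()


def make_table():
    # Translation table: opcode -> implementation (standing in for an address).
    return {PUSH_A: push_a, PUSH_B: push_b, ADD: add, STORE_C: store_c}


def run(code, table, variables):
    # The dispatch loop is identical whether or not macros are installed.
    vm = {"stack": [], "vars": dict(variables), "dispatches": 0}
    pc = 0
    while code[pc] != HALT:
        table[code[pc]](vm)
        vm["dispatches"] += 1
        pc += 1
    return vm


def macroize(code, table, pattern, macro_opcode):
    # Build one implementation that runs the whole pattern in a row,
    # without going back through the translation table in between.
    impls = [table[op] for op in pattern]

    def macro(vm):
        for impl in impls:
            impl(vm)

    table[macro_opcode] = macro
    # Rewrite each occurrence of the pattern to the single macro opcode.
    rewritten, i = [], 0
    while i < len(code):
        if code[i:i + len(pattern)] == pattern:
            rewritten.append(macro_opcode)
            i += len(pattern)
        else:
            rewritten.append(code[i])
            i += 1
    return rewritten


program = [PUSH_A, PUSH_B, ADD, STORE_C, HALT]
table = make_table()
plain = run(program, table, {"a": 2, "b": 3})
optimized = macroize(program, table, [PUSH_A, PUSH_B, ADD, STORE_C], FIRST_UNUSED)
fast = run(optimized, table, {"a": 2, "b": 3})
```

In this sketch both runs compute the same value of `c`, but the rewritten program goes through the table once instead of four times. The sketch only chains existing implementations; it does not model the linearized native code mentioned in the description.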

[0036] Object-oriented languages such as Java are characterized by very small units of code. Because they are almost always potentially polymorphic, Java processes are also very difficult to inline. Therefore, even if a fully optimizing compiler can map the execution semantics of a process better onto the underlying processor architecture, the preamble and epilogue overhead of a binary-translated process will often cancel out any advantage.

[0037] A stack caching technique can be used to improve execution efficiency. It keeps the first three positions of the Java stack inside the processor's register file, which considerably reduces the number of memory accesses. This technique takes advantage of the fact that the target processor is the stack machine itself. The original bytecode implementations are replaced by equivalent processor instruction sequences. Using a simple translation table and a simple cost function (the number of memory references), a very fast and efficient compilation technique can be achieved. Taking Java as an example, the reduction of the memory input/output cost according to another embodiment of the invention is described below.

[0038] Java is a stack-based language: bytecodes communicate with each other through memory. Each bytecode execution involves at least one memory access, which is very expensive. For example, consider the following simple expression:

    c = a + b;

In a stack-based language, this translates to:

    Push a   -- 1 read, 1 write
    Push b   -- 1 read, 1 write
    Add      -- 2 reads, 1 write
    Store c  -- 1 read, 1 write

This amounts to 9 memory access operations. A CPU with minimal internal state can do the same with only 3 memory accesses. Given that memory references are among the most expensive operations on current processor architectures, this is an ideal area for optimization. With little additional coding effort, versions of the Java bytecodes can be made to exchange data through machine registers instead of external memory. Starting from these specialized bytecodes, called strands, macros can then be generated that reduce the number of memory accesses within the macro by a factor of more than 2.
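The access counts in this paragraph can be checked with a short sketch. It assumes a hypothetical `CountingMemory` class that counts every read and write, and it models the register variant with ordinary local variables (`r0`, `r1`). It is an illustration of the arithmetic only, not the strand mechanism itself.

```python
class CountingMemory:
    """Memory model that counts every read and write."""

    def __init__(self, values):
        self.cells = dict(values)
        self.accesses = 0

    def read(self, key):
        self.accesses += 1
        return self.cells[key]

    def write(self, key, value):
        self.accesses += 1
        self.cells[key] = value


def add_on_memory_stack(mem):
    """Compute c = a + b with the operand stack held in memory."""
    sp = 0
    # Push a: 1 read, 1 write
    mem.write(("stack", sp), mem.read("a"))
    sp += 1
    # Push b: 1 read, 1 write
    mem.write(("stack", sp), mem.read("b"))
    sp += 1
    # Add: 2 reads, 1 write
    sp -= 1
    right = mem.read(("stack", sp))
    sp -= 1
    left = mem.read(("stack", sp))
    mem.write(("stack", sp), left + right)
    sp += 1
    # Store c: 1 read, 1 write
    sp -= 1
    mem.write("c", mem.read(("stack", sp)))
    return mem.cells["c"]  # direct inspection, not counted as an access


def add_in_registers(mem):
    """Compute c = a + b with operands exchanged through registers."""
    r0 = mem.read("a")       # 1 read
    r1 = mem.read("b")       # 1 read
    mem.write("c", r0 + r1)  # 1 write
    return mem.cells["c"]    # direct inspection, not counted as an access


stack_mem = CountingMemory({"a": 2, "b": 3})
stack_result = add_on_memory_stack(stack_mem)
reg_mem = CountingMemory({"a": 2, "b": 3})
reg_result = add_in_registers(reg_mem)
```

Under these assumptions the memory-stack version performs 9 accesses and the register version performs 3, matching the figures given in the text.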

[0039] Implementing the "macroizer" and the bytecode "standifier" does not require many lines of code. The partial rewrite of the interpreter loop can be estimated at, for example, a few thousand lines of C code. The implementation of the indirect threaded code dispatcher needs only a few lines of assembly, and a few hundred lines are used for the "standifier".

[0040] Execution time tests and measurements were performed without taking into account the time spent on bytecode parsing and on generating the new macro bytecodes. Nevertheless, the execution time was measured using a native code profiler. When running a large application such as a web browser, the total time spent on "macroization" remains limited to a very small fraction of the total execution time.

[0041] A specific example of a receiver according to the invention is shown in FIG. 3. It is a set-top box receiver 20 for interactive television transmission. It has a decoder (for example, one compatible with the MPEG-2 (Moving Picture Experts Group, ISO/IEC 13818-2) recommendation) that receives a coded signal from a video transmitter 24 over a cable transmission channel 23 and decodes the received signal to extract the transmitted data to be displayed on a video display 25. The functions of the set-top box can be implemented efficiently in software using a system that executes an interpreted language such as Java in bytecode form. This system comprises a main processor CPU and a memory MEM that stores the software code portions representing the instructions with which the main processor CPU carries out the method of the invention, as described with reference to FIG. 1 or 2.

[0042] According to another embodiment of the invention, the set-top box 20 can receive a Java application containing bytecode as part of the received signal. In this case, the set-top box has a loader that loads the bytecode-based program received from a remote sender.

[Brief description of drawings]

[FIG. 1] A block diagram illustrating the functionality of the method of the present invention.

[FIG. 2] A block diagram illustrating the functionality of the method according to the preferred embodiment of the present invention.

[FIG. 3] A schematic diagram showing a specific example of the receiver of the present invention.

[Explanation of symbols]

1 semiconductor substrate; 3 surface; 4 source; 5 drain; 6 floating gate; 7 control gate; 8 floating gate dielectric; 9 intergate dielectric; 10 select gate; 11 gate dielectric; 12 additional doped region; 13 substantially flat surface portion; 14 sidewall portion; 15 substantial portion; 16 sidewall spacer; 18 metal silicide; 19 dielectric layer; 20 contact hole; T1 floating gate transistor; T2 select transistor; WLi word line; SLi select line; BLj bit line

[Procedure amendment]

[Submission date] June 4, 2001

[Procedure amendment 1]

[Name of document to be amended] Specification

[Name of item to be amended] Brief description of the drawings

[Method of amendment] Change

[Contents of amendment]

[Brief description of drawings]

[Explanation of symbols] 20 set-top box receiver; 23 cable transmission channel; 24 video transmitter; 25 video display

Continuation of front page. F-term (reference): 5B033 AA01 BB03; 5B081 CC24 DD01

Claims

[Claims]

1. A method for optimizing an interpreted program in a virtual machine interpreter for a bytecode-based language, wherein the virtual machine dynamically reconfigures itself by replacing an original sequence of simple bytecodes with a new sequence of macro bytecodes, and wherein the virtual machine interpreter is encoded as a threaded code interpreter that translates the bytecodes into their implementation code and has a reference table containing references to the addresses of the bytecode implementations, such that during the translation of the current bytecode the address of the implementation of the next bytecode is retrieved, making it possible to jump to the next bytecode.

2. The method according to claim 1, wherein the bytecodes of the original sequence are grouped after the new sequence of operation codes of the macro.

3. The method according to claim 1 or 2, wherein the virtual machine interpreter has a predetermined set of bytecodes, some of which are unused, and the new sequence of macro operation codes is implemented by the unused bytecodes.

4. The method of claim 3, wherein the unused bytecodes are encoded with a representation of at least 2 bytes.

5. A method for optimizing an interpreted program in a virtual machine for a bytecode-based language, the method comprising the steps of: initializing by loading a program containing said bytecodes; replacing sequences of bytecodes with macro bytecodes; and interpreting the macro bytecodes using an indirect threaded interpreter that translates the bytecodes into their implementation code, the interpreter having a reference table containing references to the addresses of the bytecode implementations, such that during the interpretation of the current bytecode the address of the implementation of the next bytecode is retrieved, making it possible to jump to the next bytecode.

6. A memory-loaded computer program product having a set of instructions for causing a processor to perform the method of any of claims 1-5.

7. A receiver for receiving a transmitted signal, comprising a processor (CPU) and a memory (MEM) for storing software code portions representing instructions for causing the processor to carry out the method according to claim 1.

8. A method making it possible to download to a receiver according to claim 7 a computer program having instructions for performing the method according to any one of claims 1-5.