CN1173262C - Optimized bytecode interpreter of virtual machine instructions - Google Patents

Info

Publication number
CN1173262C
Authority
CN
Grant status
Grant
Patent type
Prior art keywords
optimized
bytecode
instructions
interpreter
virtual
Prior art date
Application number
CN 00802974
Other languages
Chinese (zh)
Other versions
CN1347525A (en)
Inventor
F. Riccardi
Original Assignee
Koninklijke Philips Electronics N.V. (皇家菲利浦电子有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Grant date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/44 Arrangements for executing specific programmes
    • G06F9/455 Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
    • G06F9/45504 Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
    • G06F9/45516 Runtime code conversion or optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/45 Caching of specific data in cache memory
    • G06F2212/451 Stack data

Abstract

The invention relates to a method of optimizing interpreted programs within the virtual machine interpreter of a bytecode-based language, in which the virtual machine is dynamically reconfigured with macro operation codes by replacing an original sequence of simple operation codes with a new sequence of said macro operation codes. The virtual machine interpreter is coded as an indirect threaded interpreter by means of a translation table, which contains the implementation addresses of the operation codes and is used to translate bytecodes into those addresses. Application: embedded systems using any bytecode-based programming language.

Description

Optimized bytecode interpreter of virtual machine instructions

Field of the invention

The invention relates to optimizing the run time of interpreted programs. In particular, it relates to a method of optimizing interpreted programs by means of a virtual machine that dynamically reconfigures itself with new macro operation codes. The invention applies to any bytecode-based programming language.

Background of the invention

Bytecode-based languages with a programmer-visible stack are widely used as intermediate languages for compilers and as a machine-independent representation of executable programs. Such languages offer clear advantages for network computing. The article "Optimizing direct threaded code by selective inlining" by I. Piumarta and F. Riccardi, published in the proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation (PLDI), held in Montreal, Canada, on 17 June 1998, pages 291-300, describes in its opening paragraphs a technique for optimizing interpreted programs. Programs are interpreted on a virtual machine (VM) by means of a VM interpreter. A VM is a software implementation of a virtual processor architecture, on which applications compiled specifically for that architecture are executed. The instructions of this virtual processor are called bytecodes. The VM interpreter is the part of the VM that implements the bytecode execution mechanism, and the bytecodes are said to be interpreted by it. Today the bytecode execution mechanism is typically implemented as an infinite loop containing a switch-case block. The technique described in the above article applies to direct threaded interpreters. A threaded code interpreter executes bytecodes in line: the translation of each bytecode contains a reference to the next bytecode, so bytecode translation in a threaded interpreter involves no infinite loop.

Despite their performance advantage, threaded interpreters are too slow and require too much memory for most embedded systems. In the direct threaded code interpreter referred to in the above article, VM bytecodes are represented by their implementation addresses, so that each bytecode can jump directly to the implementation of the next. Before translation, a table is initialized with the address of each bytecode of the application, which gives fast access to the physical address of a bytecode's implementation when that bytecode is translated. This table makes it possible to switch from one bytecode to the next. Direct threaded interpreters are fairly fast, but they cause code expansion. Converting bytecode to direct threaded code increases code size by about 150%, because opcodes are replaced by the addresses of their implementations: an address generally takes 4 bytes, whereas an opcode takes only 1 byte. Direct threaded interpreters therefore increase memory consumption and are not well suited to embedded systems.

Summary of the invention

The invention aims to provide a method of optimizing the run time of interpreted programs that is well suited to embedded systems. Such a system may be, for example, a satellite or cable transmission system embedded in a digital video receiver, commonly called a set-top box. The invention is equally suitable, however, for any product whose operating system is based on a bytecode programming language. The invention also saves memory and CPU resources and thus improves system performance.

According to the invention, a method is described for optimizing interpreted programs within the virtual machine interpreter of a bytecode-based language. The virtual machine dynamically reconfigures itself with new macro bytecodes (or opcodes) that replace sequences of simple bytecodes, and the virtual machine interpreter is coded as a threaded code interpreter in order to translate bytecodes into their implementation code. According to the invention, the threaded code interpreter is coded as an indirect threaded code interpreter by means of a reference table. This table contains the implementation addresses of the bytecodes, so that during the translation of one bytecode the address of the next bytecode can be fetched and jumped to.

Brief description of the drawings

The invention, and additional features that may optionally be used to implement it, will become apparent from the following drawings.

Figure 1 is a block diagram showing the features of a method according to the invention.

Figure 2 is a block diagram showing the features of a method according to a preferred embodiment of the invention.

Figure 3 is a schematic diagram showing an example of a receiver according to the invention.

Detailed description of the invention

The invention is now explained in detail, taking the Java language as an example, to present a novel run-time optimization strategy that is suitable for any bytecode-based language.

The approach usually taken by just-in-time (JIT) compilers is to discard the Java virtual machine (VM) interpreter altogether and to translate the application's bytecode into native code just before it is executed (hence the name JIT). This approach involves understanding the semantics of the original application and re-expressing it in a more convenient native form. Although it can be an effective way to gain performance, it comes at the price of high memory consumption, because bytecode-based languages are more compact than native code. It also consumes considerable CPU (central processing unit) resources, because re-mapping Java bytecode onto the target machine is not a trivial task.

The invention is also based on a form of dynamic code generation, but its aim is not to translate the application's Java bytecode into native code. Instead, the Java VM is dynamically adapted to the execution of application-specific bytecode sequences. The original Java bytecode of the application is thus retained, while the VM is dynamically enriched with new bytecodes, or opcodes, that improve its execution efficiency.

This approach has several advantages:
    • it does not increase executable code size: the application keeps its memory-saving Java bytecode representation;
    • the VM execution mechanism is economical: there is only one execution mechanism, so it need not handle multiple code representations, which reduces its size and improves its reliability;
    • the code generation technique is fairly simple: the VM optimizer has a very simple structure, the analysis of the application's bytecode is a one-pass, table-driven process that uses few CPU resources, and it directly drives the synthesis of the new bytecodes.

These properties make the invention suitable for embedded systems. The optimization technique of the invention is based on a study of the cost of the most basic mechanisms of an interpreter with respect to a class of "typical" applications. The relevance of such application profiles lies in the potential benefit to be gained from the various optimization techniques that may be considered. Since the target is embedded applications, "typical" applications can be defined as, for example, control applications, graphical user interfaces and the like.

The target applications are assumed to map fully onto the primitives provided by the underlying VM (object management). They would therefore gain little from basic code transformations; the real benefit comes from an overall improvement of the VM execution mechanism. Amdahl's law was used to understand how to improve the efficiency of the VM. In the version stated by Hennessy and Patterson, Amdahl's law reads: the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used. More concisely: "make the common case fast".

The performance of an interpreter depends on the representation chosen for the executable code and on the mechanism used to dispatch bytecodes. The first way to reduce execution cost is to reduce the cost of instruction dispatch, since the core of an interpreter is its instruction dispatch mechanism. The classical interpreter, called a pure bytecode interpreter, has an execution mechanism that mimics a processor: a large switch statement inside a closed loop dispatches instructions to their implementations. The inner loop of a pure bytecode interpreter is therefore very simple: fetch the next bytecode and dispatch it to its implementation with a switch statement. The interpreter is an infinite loop containing a switch statement that dispatches successive bytecodes; control passes to the next bytecode by breaking out of the switch, which returns control to the start of the infinite loop. The following instructions show the implementation of a typical bytecode interpreter.

<pre listing-type="program-listing">
for (;;) {
    op = *pc++;
    switch (op) {
    case op_1:
        /* implementation of op_1 */
        break;
    case op_2:
        /* implementation of op_2 */
        break;
    case op_3:
        /* implementation of op_3 */
        break;
    …
    }
}
</pre>
Assuming that the compiler optimizes the chain of jumps from the break, through the implicit jump at the end of the loop, back to the start of the loop, the overhead associated with this approach is: increment the instruction pointer pc; fetch the next bytecode from memory; perform a redundant range check on the switch argument; fetch the address of the destination case label from a table; jump to that address; and, at the end of each bytecode, jump back to the start of the loop to fetch the next bytecode.

In this case, ignoring other sources of inefficiency such as the actual implementation of the switch statement, the cost of instruction dispatch comprises 2 memory accesses (one to fetch the value of the next instruction, one to fetch the address of that instruction's implementation) plus 2 branches (one to jump to the bytecode's implementation and one to return to the start of the loop). On modern architectures, jumps are the most expensive instructions.

Pure bytecode interpreters are easy to write and understand. They are also compact, but rather slow, which makes them ill suited to embedded systems. Where most bytecodes perform simple operations, as in the example above, most of the execution time is wasted on instruction dispatch. To appreciate the real cost of this mechanism, it can be compared with the cost of executing a single bytecode. Java bytecodes have very low-level semantics, and their implementations are usually trivial. The most frequently executed bytecodes are therefore actually cheaper than the dispatch mechanism itself.

As the following instructions show, the first efficiency improvement obtained according to the invention is the use of indirect threaded code:
<pre listing-type="program-listing">
op_1_lbl:
    /* implementation of op_1 */
    goto *opcode_table[*pc++];
op_2_lbl:
    /* implementation of op_2 */
    goto *opcode_table[*pc++];
op_3_lbl:
    /* implementation of op_3 */
    goto *opcode_table[*pc++];
</pre>
Here op_1_lbl, op_2_lbl and op_3_lbl stand for three different opcodes interpreted by the VM interpreter.

With this implementation, called indirect threaded code, the VM is coded as an indirect threaded code interpreter. The address of the next bytecode is resolved during the translation of the current bytecode. The reference table, denoted opcode_table, contains the implementation addresses of the bytecodes and is indexed by the value read through the pointer (*pc++). Each time a bytecode is translated, the address of the next bytecode is fetched so that it can be jumped to. In this way, the implementation of each bytecode jumps directly to the implementation of the next, which saves one branch, the outer loop, and the needless inefficiencies of the switch statement implementation (range check and default case handling).

According to a preferred embodiment of the invention, the translation is achieved by exploiting the bytecodes that the VM of the bytecode-based language leaves unused.

Figure 1 summarizes the main steps of the method provided by the invention, in which an indirect threaded code interpreter translates a bytecode, for example the bytecode bipush, into native instructions:
step K0 = BIPUSH: start of the method of translating the bytecode bipush, which consists of putting a byte on the stack, that byte being the bipush parameter (par);
step K1 = PAR: fetch the bipush parameter (par);
step K2 = PUT: put the bipush parameter on the stack;
step K3 = GOTO: branch to the next bytecode by looking it up in the reference table (opcode_table), which contains the implementation address of the next bytecode (goto *opcode_table[*pc]).
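
Steps K0 to K3 can be sketched in C as follows. This is a minimal sketch, not the patent's implementation. It assumes the GNU C "labels as values" extension (available in GCC and Clang) to express the computed goto. The opcode numbering, the function name `run_threaded`, and the extra `iadd` and `halt` handlers are hypothetical and exist only so that the example runs to completion.

```c
#include <stdint.h>

/* Hypothetical opcode numbering for illustration; not the real JVM values. */
enum { OP_BIPUSH = 0, OP_IADD = 1, OP_HALT = 2 };

/* Interprets `code` with indirect threaded dispatch and returns the value
 * on top of the stack when OP_HALT is reached. The sketch does not validate
 * opcodes or stack depth. */
int run_threaded(const uint8_t *code)
{
    /* opcode_table maps each opcode to the address of its implementation. */
    static void *const opcode_table[] = { &&op_bipush, &&op_iadd, &&op_halt };
    int stack[16];
    int *sp = stack;                /* points one past the top element */
    const uint8_t *pc = code;

    goto *opcode_table[*pc++];      /* dispatch the first bytecode */

op_bipush:                          /* K0: start of bipush */
    *sp++ = (int8_t)*pc++;          /* K1: fetch par; K2: put it on the stack */
    goto *opcode_table[*pc++];      /* K3: branch to the next bytecode */
op_iadd:
    sp--;
    sp[-1] += sp[0];
    goto *opcode_table[*pc++];
op_halt:
    return sp[-1];
}
```

Each handler ends with its own table lookup and jump, so there is no shared loop head and no switch range check. The bytecode stream itself stays one byte per opcode, unlike direct threaded code.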

Using threaded code by itself can double the performance of the VM, but, as will be seen below, it also opens up other interesting optimization opportunities. Statistical analysis of Java bytecode shows that, on average, there is one branch for every 5-6 instructions. On any modern CPU, branches are inherently expensive instructions, since they can cause pipeline stalls and/or trigger external bus activity. Apart from loop unrolling or method call in-lining, there is little that can be done about this. Even when the code is compiled to a native representation, the control statements remain.

Recent studies of CPU use by object-oriented applications on high-end workstations show that a CPU can spend up to 70% of its clock cycles recovering from pipeline stalls, such as those caused by mispredicted branches, and waiting for data and instructions from main memory (cache misses). Moreover, the CPUs available in embedded systems have small caches, no hardware support for dynamic branch prediction, and slow and/or narrow memory interfaces with L2 cache. These additional constraints further reduce CPU utilization and performance.

Java bytecodes can be divided into two classes: simple opcodes (load, store, arithmetic and control statements) and complex opcodes (memory management, synchronization, etc.).

Simple bytecodes are typically less expensive than the dispatch mechanism. Complex bytecodes are much more expensive, and for them the dispatch overhead represents only a small fraction of the total execution cost. Simple bytecodes are executed far more often than complex ones (by about an order of magnitude), which means that a classical Java interpreter spends most of its time dispatching bytecodes rather than doing useful work. It can therefore be concluded that reducing dispatch overhead is more effective for simple bytecodes than for complex ones.

Translating bytecode into indirect threaded code also provides the opportunity to apply arbitrary transformations to the executable code. One such transformation is to detect common sequences of bytecodes and translate them into a single threaded "macro opcode" that performs the work of the entire original sequence. According to a preferred embodiment of the invention, it is therefore proposed to replace sequences of simple bytecodes with equivalent macro opcodes. For example, as described in the article cited above, the bytecodes "push literal, push variable, add, store variable" can be translated, in simple threaded code, into a single "add-literal-to-variable" macro opcode. Such optimizations are effective because they avoid the overhead of the multiple dispatches implied by the original bytecodes, which are eliminated in the macro opcode. A single macro opcode translated from a sequence of N original bytecodes avoids N-1 bytecode dispatches at execution time. Details of how macro opcodes are generated can be found in the article cited above. Such macro opcodes must satisfy the following criteria: Macros must be generated from sequences of simple bytecodes, since there is little to be gained from reducing the dispatch overhead of complex bytecodes.

A macro must not contain instructions that are possible branch targets; otherwise major changes would have to be made to the VM execution mechanism. The macro opcode itself may be a branch target.

A macro must end with a control statement or a method call, since the cost of a native branch is equivalent to the cost of a dispatch operation.

For ease of implementation, the maximum length of a macro should be about 15 bytecodes. The "neutral" average macro length is 4-5 bytecodes. With these criteria, such macro sequences can be built easily, using only a small and bounded amount of CPU time. The bytecode of a method only needs a simple scan, and most of the analysis can be table-driven and based on single bytecodes.
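
The criteria above can be read as a single linear scan over a method's instructions. The following C sketch shows one possible reading, under stated assumptions: each instruction has already been classified (simple, control, or complex) and flagged if it is a branch target. The types, the names and the minimum length of 2 are hypothetical; method calls are folded into the `KIND_CONTROL` class, and operands are ignored for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-instruction classification, filled in by an earlier
 * table-driven pass over the method's bytecode. */
enum insn_kind { KIND_SIMPLE, KIND_CONTROL, KIND_COMPLEX };

struct insn {
    enum insn_kind kind;
    bool is_branch_target;
};

enum { MAX_MACRO_LEN = 15 };    /* upper bound suggested in the text */

/* Returns the length of the macro candidate that starts at `start`,
 * or 0 if no sequence starting there meets all the criteria. */
size_t macro_length(const struct insn *insns, size_t n, size_t start)
{
    size_t len = 0;
    for (size_t i = start; i < n && len < (size_t)MAX_MACRO_LEN; i++) {
        /* Only the first instruction of a macro may be a branch target. */
        if (i > start && insns[i].is_branch_target)
            return 0;
        /* Complex bytecodes are never merged into a macro. */
        if (insns[i].kind == KIND_COMPLEX)
            return 0;
        len++;
        /* A macro must end with a control statement or method call.
         * A one-instruction macro would save no dispatch, so require 2. */
        if (insns[i].kind == KIND_CONTROL)
            return len >= 2 ? len : 0;
    }
    return 0;   /* ran out of instructions or exceeded the maximum length */
}
```

A caller would typically try `macro_length` at each instruction index in turn and skip ahead by the returned length whenever it is non-zero.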

According to a particular variant of the preferred embodiment, given that unused bytecodes are few (30-40 on average), a 2-byte representation can be used for the new bytecodes that represent the new macro-instructions. The operands of the original sequence are grouped immediately after the new sequence, so that they can be accessed easily by incrementing the virtual machine's program counter.
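
The effect of a macro with grouped operands can be illustrated with a small C sketch. It is an illustration under assumptions, not the patent's encoding: the 2-byte representation is modeled as a prefix byte followed by a macro index, all opcode names and values are hypothetical, and plain switch dispatch is used only to keep the example portable. The function returns the number of top-level dispatches so that the plain and fused programs can be compared.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; OP_MACRO_PREFIX stands for an unused bytecode. */
enum {
    OP_PUSH_LIT, OP_PUSH_VAR, OP_ADD, OP_STORE_VAR, OP_HALT,
    OP_MACRO_PREFIX
};
enum { MACRO_ADD_LIT_TO_VAR };      /* second byte of the 2-byte form */

/* Runs the program and returns the number of top-level dispatches. */
size_t run_program(const uint8_t *pc, int *vars)
{
    int stack[8];
    int *sp = stack;
    size_t dispatches = 0;

    for (;;) {
        dispatches++;
        switch (*pc++) {
        case OP_PUSH_LIT:  *sp++ = (int8_t)*pc++; break;
        case OP_PUSH_VAR:  *sp++ = vars[*pc++];   break;
        case OP_ADD:       sp--; sp[-1] += sp[0]; break;
        case OP_STORE_VAR: vars[*pc++] = *--sp;   break;
        case OP_HALT:      return dispatches;
        case OP_MACRO_PREFIX:
            switch (*pc++) {
            case MACRO_ADD_LIT_TO_VAR: {
                /* Operands of the original four bytecodes, grouped in
                 * their original order and read by incrementing pc. */
                int lit = (int8_t)*pc++;
                int src = *pc++;
                int dst = *pc++;
                vars[dst] = vars[src] + lit;
                break;
            }
            default:
                return 0;           /* unknown macro */
            }
            break;
        default:
            return 0;               /* unknown opcode */
        }
    }
}
```

For the sequence "push literal 5, push variable 0, add, store variable 0" followed by a halt, the plain form takes 5 dispatches and the fused form takes 2, so the macro replaces N = 4 dispatches with one and saves N-1 = 3.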

Once a method has been scanned, a macro-instruction can be built simply by cutting and pasting the binary code that the compiler generated for the threaded code interpreter. The threaded dispatcher then treats the macro-instruction as just a normal bytecode.

Figure 2 summarizes the preferred embodiment of a virtual machine according to the invention. The VM is implemented to load programs containing bytecodes to be interpreted by the VM interpreter. The main steps of the method are as follows:
step K0 = INIT: start the process executed by the VM by loading a program containing bytecodes;
step K1 = OPCODE: fetch the bytecodes to be interpreted;
step K2 = MACRO: replace sequences of simple bytecodes with macro bytecodes;
step K3 = TRANS: interpret the macro bytecodes with the indirect threaded interpreter as described in Figure 1;
step K4 = RES: obtain the result; end of the method.

Trace-based statistical analysis of the execution of real Java applications shows that a typical macro-instruction is 4-5 bytecodes long; after code transformation, macros are therefore typically executed up to 5 times more often than the remaining bytecodes. The remaining bytecodes are those whose implementations are too complex to be worth in-lining, and those discarded because of branch-target analysis. Total bytecode dispatch overhead can thus be reduced by more than 4 times. If dispatch overhead initially accounts for about 50% of total execution cost, use of the invention reduces it markedly.
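
As a worked illustration of these figures, Amdahl's law gives the overall speedup when a fraction f of the run time is accelerated by a factor s: speedup = 1 / ((1 - f) + f / s). The C sketch below applies it to the values quoted in the text (f = 0.5, s = 4). The function name is hypothetical, and the result is an estimate from those two numbers only, not a measurement.

```c
/* Amdahl's law: overall speedup when a fraction `fraction` of the run time
 * is made `factor` times faster and the rest is unchanged. */
double overall_speedup(double fraction, double factor)
{
    return 1.0 / ((1.0 - fraction) + fraction / factor);
}
```

With dispatch at 50% of execution cost and a fourfold reduction, `overall_speedup(0.5, 4.0)` evaluates to 1 / (0.5 + 0.125) = 1.6. Even if dispatch were eliminated entirely, the speedup under the same assumption could not exceed 2.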

The invention also brings some additional benefits. The number of processor branch instructions can also be reduced by a factor of about 5. Because the executed code has been linearized, the performance of the processor pipeline and of the memory subsystem can improve considerably. The actual improvement depends on the processor architecture as regards the cost of pipeline stalls, and on the memory subsystem architecture as regards cache line fills. In systems with "high memory demand", like most embedded systems, these costs are rather high and well worth reducing. The remaining dispatch overhead depends essentially on the control statements in the Java code. To translate the bytecode completely into binary code, as a typical dynamic recompiler does, branch statements must be introduced into the executable code. These would cost roughly the same as the residual dispatches that are left.

One advantage of macros is that they are common bytecode sequences, so the probability of finding the same sequence elsewhere, in other methods or even within the same method, is quite high. Experiments on Java bytecode have shown that a large proportion of macro-instructions can be reused. Taking this reuse factor into account, the memory footprint of the macro-instruction implementations can be reduced. Full translation into binary code would consume at least twice as much memory and would probably yield only a marginal performance gain. Supposing, for example, that dispatch overhead could be cut by a further factor of two, the overall visible gain in speed would still be small, and therefore probably not worth a doubled memory footprint.

Another advantage of macros is that they have no effect on the normal bytecode dispatch mechanism. No additional execution mechanism needs to be added to the VM for macros. There is no need to distinguish compiled methods from non-compiled ones, nor to duplicate the native code interface and its overhead.

Object-oriented languages such as Java are characterized by very small units of code. Java methods are also hard to in-line, because they are almost always potentially polymorphic. Hence, even if a fully optimizing compiler maps the semantics of a method's implementation better onto the underlying processor architecture, the overhead of the prologue and epilogue of the binary-translated method often outweighs any advantage.

To improve execution efficiency, a stack caching technique is used, which keeps the top three locations of the Java stack in the processor register file and thus significantly reduces the number of memory accesses. The technique exploits the fact that the target processor itself acts as the stack machine. The implementation of the original bytecodes is replaced by equivalent sequences of processor instructions. A fast and efficient compilation technique can be achieved with an elaborate translation table and a simple cost function (the number of memory references). According to an alternative embodiment of the invention, the reduction of memory input/output overhead is now described, taking Java as an example.

Java is a stack-based language: bytecodes communicate with one another through memory. The execution of every single bytecode implies at least one memory access, which is very expensive. Consider, for example, the simple expression c = a + b. In a stack-based language it translates to:
push a - 1 read, 1 write
push b - 1 read, 1 write
add - 2 reads, 1 write
store c - 1 read, 1 write
This amounts to 9 memory access operations, whereas a CPU with minimal internal state can achieve the same result with only 3 memory accesses. On modern processor architectures, memory references are the most expensive operations, which makes them an ideal area for optimization. With little additional coding effort, a version of the Java bytecodes can be produced that exchanges data through machine registers rather than through external memory. Macros can then be generated starting from these specific bytecodes, called elements, and the number of memory accesses within a macro-instruction can be reduced by more than a factor of 2.
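
The access counts above can be reproduced with a small C model. This is a sketch under assumptions: the counters are incremented by hand to mirror the abstract stack machine described in the text (an optimizing C compiler would not necessarily emit these accesses), and all type and function names are hypothetical.

```c
/* Hand-maintained counters modeling memory traffic of the abstract machine. */
struct mem_stats {
    int reads;
    int writes;
};

/* c = a + b as four stack bytecodes, every stack slot living in memory. */
void add_on_stack(const int *a, const int *b, int *c, struct mem_stats *st)
{
    int stack[4];
    int sp = 0;

    stack[sp++] = *a;  st->reads++;  st->writes++;   /* push a: 1 read, 1 write */
    stack[sp++] = *b;  st->reads++;  st->writes++;   /* push b: 1 read, 1 write */

    int rhs = stack[--sp];                           /* add: 2 reads ...        */
    int lhs = stack[--sp];
    st->reads += 2;
    stack[sp++] = lhs + rhs;  st->writes++;          /* ... and 1 write         */

    *c = stack[--sp];  st->reads++;  st->writes++;   /* store c: 1 read, 1 write */
}

/* The same expression with operands exchanged through registers. */
void add_in_registers(const int *a, const int *b, int *c, struct mem_stats *st)
{
    int r0 = *a;  st->reads++;      /* load a into a register */
    int r1 = *b;  st->reads++;      /* load b into a register */
    *c = r0 + r1; st->writes++;     /* single write of the result */
}
```

The stack version records 5 reads and 4 writes (9 accesses); the register version records 2 reads and 1 write (3 accesses), matching the counts given in the text.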

The implementation of the "Macroizer" and of the bytecode "Standifier" does not require many lines of code. The partial rewrite of the interpreter loop can be estimated at, for example, a few thousand lines of C code. Implementing simple threaded code takes only a few lines of assembly language, while the "Standifier" requires a few hundred lines.

Run times have been tested; the tests did not take into account the time spent on bytecode pasting and on generating the new macro bytecodes. In any case, run time was measured with a native code profiler. When running large applications such as a web browser, the total time spent on "macroization" remains a very small percentage of the total execution time.

Figure 2 shows an example of a receiver according to the invention. It is a set-top box receiver 20 for interactive video transmission. It includes a decoder, for example one compatible with the MPEG-2 recommendation (Moving Picture Experts Group, ISO/IEC 13818-2), which receives encoded signals from a video transmitter 24 over a cable transmission channel 23 and decodes the received signals to extract the transmitted data for display on a video display 25. The functions of the set-top box may be implemented in software executed by a system that runs an interpreted language such as Java in bytecode form. The system includes a main processor CPU and a memory MEN that stores software code portions; these code portions represent instructions that cause the main processor CPU to carry out the method of the invention as described with reference to Figure 1 or Figure 2.

According to another embodiment of the invention, the set-top box 20 can receive Java applications that include bytecodes as part of the received signal. In that case, the set-top box should include a loader for loading the received bytecode-based programs from the remote transmitter.

Claims (4)

  1. A method of optimizing interpreted programs in a virtual machine interpreter for a bytecode-based language, wherein the virtual machine dynamically reconfigures itself by replacing original sequences of simple bytecodes with new sequences of macro bytecodes; wherein the virtual machine interpreter is coded as a threaded-code interpreter for translating bytecodes into their implementation code and includes a reference table containing references to the addresses of the bytecode implementations, so that, while the current bytecode is being translated, the address of the implementation of the next bytecode is retrieved in order to be able to jump to the next bytecode; and wherein the virtual machine interpreter includes a predetermined set of bytecodes, some of which are unused, and the new sequences of macro operation codes are implemented by exploiting said unused bytecodes.
  2. The method of claim 1, wherein the bytecodes of the original sequence are grouped after said new sequence of macro operation codes.
  3. The method of claim 1, wherein the unused bytecodes are encoded with a representation of at least two bytes.
  4. A method of optimizing interpreted programs in a virtual machine using a bytecode-based language, the method comprising the steps of: initializing by loading a program containing bytecodes; replacing sequences of simple bytecodes with macro codes; and interpreting the macro bytecodes with an indirect threaded interpreter in order to translate bytecodes into their implementation code, the interpreter including a reference table containing references to the addresses of the bytecode implementations, so that, while the current bytecode is being interpreted, the address of the implementation of the next bytecode is retrieved in order to be able to jump to the next bytecode; wherein the virtual machine interpreter includes a predetermined set of bytecodes, some of which are unused, and the new sequences of macro operation codes are implemented by exploiting said unused bytecodes.
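As an informal illustration of the mechanism recited in claims 1 and 4, the Python sketch below shows a reference table indexed by opcode, implementations that end by retrieving the implementation of the next bytecode, and a macro installed in an unused opcode slot. It is a sketch under stated assumptions, not the claimed implementation: the opcode numbers, the `VM` class, `define_macro` and `macroize` are all hypothetical, and because Python has no computed jump, a small loop that calls whatever implementation was returned stands in for the direct jump of real threaded code.

```python
# Illustrative sketch only; opcode numbers and all names are hypothetical.
HALT, ICONST_1, ICONST_2, ADD, DUP, MUL = range(6)
TABLE_SIZE = 256  # one-byte opcodes; slots 6..255 are unused in this toy VM


class VM:
    def __init__(self):
        # Stack effect of each simple bytecode (HALT has none).
        self.actions = {
            ICONST_1: lambda s: s.append(1),
            ICONST_2: lambda s: s.append(2),
            ADD: lambda s: s.append(s.pop() + s.pop()),
            DUP: lambda s: s.append(s[-1]),
            MUL: lambda s: s.append(s.pop() * s.pop()),
        }
        # Reference table: opcode -> implementation. None marks HALT and
        # every unused slot.
        self.table = [None] * TABLE_SIZE
        for opcode, action in self.actions.items():
            self.table[opcode] = self._threaded(action)
        self.dispatches = 0

    def _threaded(self, action):
        """Wrap an action so that it ends by fetching the next implementation."""
        def implementation(state):
            action(state["stack"])
            state["pc"] += 1
            self.dispatches += 1
            # Look up the address of the next bytecode's implementation.
            return self.table[state["code"][state["pc"]]]
        return implementation

    def define_macro(self, sequence):
        """Install a fused implementation in the first unused opcode slot."""
        opcode = next(i for i in range(HALT + 1, TABLE_SIZE)
                      if self.table[i] is None)
        parts = [self.actions[op] for op in sequence]

        def fused(stack):
            for part in parts:      # no table lookup between the parts
                part(stack)

        self.actions[opcode] = fused
        self.table[opcode] = self._threaded(fused)
        return opcode

    def run(self, code):
        state = {"code": code, "pc": 0, "stack": []}
        self.dispatches = 0
        implementation = self.table[code[0]]
        while implementation is not None:   # stand-in for the direct jump
            implementation = implementation(state)
        return state["stack"]


def macroize(code, sequence, macro_opcode):
    """Replace every occurrence of `sequence` in `code` with `macro_opcode`."""
    out, i, n = [], 0, len(sequence)
    while i < len(code):
        if code[i:i + n] == list(sequence):
            out.append(macro_opcode)
            i += n
        else:
            out.append(code[i])
            i += 1
    return out


vm = VM()
program = [ICONST_1, ICONST_2, ADD, DUP, MUL, HALT]   # (1 + 2) squared
print(vm.run(program), vm.dispatches)                 # prints "[9] 5"
square = vm.define_macro((DUP, MUL))
shorter = macroize(program, (DUP, MUL), square)
print(vm.run(shorter), vm.dispatches)                 # prints "[9] 4"
```

The rewritten program computes the same result with one fewer table lookup, because the two simple bytecodes covered by the macro run back to back inside a single implementation.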

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP99402309 1999-09-21

Publications (2)

Publication Number Publication Date
CN1347525A (en) 2002-05-01
CN1173262C (en) 2004-10-27

Family

ID=8242118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 00802974 CN1173262C (en) 1999-09-21 2000-09-13 Optimized bytecode interpreter of virtual machine instructions

Country Status (5)

Country Link
EP (1) EP1183598A2 (en)
JP (1) JP2003510681A (en)
KR (1) KR20010080525A (en)
CN (1) CN1173262C (en)
WO (1) WO2001022213A3 (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6996813B1 (en) 2000-10-31 2006-02-07 Sun Microsystems, Inc. Frameworks for loading and execution of object-based programs
US7096466B2 (en) 2001-03-26 2006-08-22 Sun Microsystems, Inc. Loading attribute for partial loading of class files into virtual machines
US6957428B2 (en) 2001-03-27 2005-10-18 Sun Microsystems, Inc. Enhanced virtual machine instructions
US7543288B2 (en) 2001-03-27 2009-06-02 Sun Microsystems, Inc. Reduced instruction set for Java virtual machines
FR2828296B1 (en) * 2001-08-03 2003-12-12 Trusted Logic Code Compression Method object interpreted by factoring tree expressions
US7228533B2 (en) 2001-08-24 2007-06-05 Sun Microsystems, Inc. Frameworks for generation of Java macro instructions for performing programming loops
US7058934B2 (en) 2001-08-24 2006-06-06 Sun Microsystems, Inc. Frameworks for generation of Java macro instructions for instantiating Java objects
US6988261B2 (en) * 2001-08-24 2006-01-17 Sun Microsystems, Inc. Frameworks for generation of Java macro instructions in Java computing environments
US7039904B2 (en) * 2001-08-24 2006-05-02 Sun Microsystems, Inc. Frameworks for generation of Java macro instructions for storing values into local variables
FR2829252B1 (en) * 2001-08-30 2004-10-01 Gemplus Card Int Compressing a program in intermediate language
EP1308838A3 (en) 2001-10-31 2007-12-19 Aplix Corporation Intermediate code preprocessing apparatus, intermediate code execution apparatus, intermediate code execution system, and computer program product for preprocessing or executing intermediate code
US7516459B2 (en) 2004-02-20 2009-04-07 Intel Corporation Methods and apparatus to optimize managed application program interfaces
CN100495329C (en) 2004-06-05 2009-06-03 图形公司 Object process graph system
KR100597413B1 (en) * 2004-09-24 2006-07-05 삼성전자주식회사 Method for translating Java bytecode and Java interpreter using the same
KR100678912B1 (en) * 2005-10-18 2007-01-30 삼성전자주식회사 Method for interpreting method bytecode and system by the same
WO2011008856A3 (en) * 2009-07-14 2011-03-24 Unisys Corporation Systems, methods, and computer programs for dynamic binary translation in an interpreter
CN102662830A (en) * 2012-03-20 2012-09-12 湖南大学 Code reuse attack detection system based on dynamic binary translation framework

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5778232A (en) * 1996-07-03 1998-07-07 Hewlett-Packard Company Automatic compiler restructuring of COBOL programs into a proc per paragraph model

Also Published As

Publication number Publication date Type
WO2001022213A2 (en) 2001-03-29 application
JP2003510681A (en) 2003-03-18 application
EP1183598A2 (en) 2002-03-06 application
CN1347525A (en) 2002-05-01 application
KR20010080525A (en) 2001-08-22 application
WO2001022213A3 (en) 2001-11-29 application

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C14 Grant of patent or utility model
C19 Lapse of patent right due to non-payment of the annual fee