WO1997027539A1 - Methods and apparatuses for stack caching - Google Patents

Methods and apparatuses for stack caching

Publication number
WO1997027539A1
WO1997027539A1 PCT/US1997/001303 US9701303W WO9727539A1 WO 1997027539 A1 WO1997027539 A1 WO 1997027539A1 US 9701303 W US9701303 W US 9701303W WO 9727539 A1 WO9727539 A1 WO 9727539A1
Authority
WO
WIPO (PCT)
Prior art keywords
stack
memory
cache
pointer
value
Prior art date
Application number
PCT/US1997/001303
Other languages
English (en)
French (fr)
Other versions
WO1997027539B1 (en
Inventor
Marc Tremblay
James Michael O'connor
Original Assignee
Sun Microsystems, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Microsystems, Inc.
Priority to JP52708497A (JP3634379B2)
Priority to DE69734399T (DE69734399D1)
Priority to EP97904010A (EP0976034B1)
Publication of WO1997027539A1
Publication of WO1997027539B1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7839Architectures of general purpose stored program computers comprising a single central processing unit with memory
    • G06F15/7842Architectures of general purpose stored program computers comprising a single central processing unit with memory on one IC chip (single chip microcontrollers)
    • G06F15/7846On-chip cache and off-chip main memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0875Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/22Microcontrol or microprogram arrangements
    • G06F9/26Address formation of the next micro-instruction ; Microprogram storage or retrieval arrangements
    • G06F9/262Arrangements for next microinstruction selection
    • G06F9/264Microinstruction selection based on results of processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30021Compare instructions, e.g. Greater-Than, Equal-To, MINMAX
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements
    • G06F9/3012Organisation of register space, e.g. banked or distributed register file
    • G06F9/30134Register stacks; shift registers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145Instruction analysis, e.g. decoding, instruction word fields
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/3016Decoding the operand specifier, e.g. specifier format
    • G06F9/30167Decoding the operand specifier, e.g. specifier format of immediate specifier, e.g. constants
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/3017Runtime instruction translation, e.g. macros
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/3017Runtime instruction translation, e.g. macros
    • G06F9/30174Runtime instruction translation, e.g. macros for non-native instruction set, e.g. Javabyte, legacy code
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181Instruction operation extension or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
    • G06F9/345Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802Instruction prefetching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/448Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4488Object-oriented
    • G06F9/449Object-oriented method invocation or resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45504Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/45Caching of specific data in cache memory
    • G06F2212/451Stack data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45504Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
    • G06F9/45516Runtime code conversion or optimisation

Definitions

  • the present invention relates generally to computer systems and, in particular, to caching of stack memory architectures.
  • In addition to the public carrier network or Internet, many corporations and other businesses are shifting their internal information systems onto an intranet as a way of more effectively sharing information within a corporate or private network.
  • the basic infrastructure for an intranet is an internal network connecting servers and desktops, which may or may not be connected to the Internet through a firewall. These intranets provide services to desktops via standard open network protocols which are well established in the industry. Intranets provide many benefits to the enterprises which employ them, such as simplified internal information management and improved internal communication using the browser paradigm. Integrating Internet technologies with a company's enterprise infrastructure and legacy systems also leverages existing technology investment for the party employing an intranet.
  • intranets and the Internet are closely related, with intranets being used for internal and secure communications within the business and the Internet being used for external transactions between the business and the outside world.
  • the term "networks" includes both the Internet and intranets. However, the distinction between the Internet and an intranet should be borne in mind where applicable.
  • JAVA is a trademark of Sun Microsystems of Mountain View, CA.
  • the JAVA programming language resulted from programming efforts which initially were intended to be coded in the C++ programming language; therefore, the JAVA programming language has many commonalities with the C++ programming language.
  • the JAVA programming language is a simple, object-oriented, distributed, interpreted yet high performance, robust yet safe, secure, dynamic, architecture neutral, portable, and multi-threaded language.
  • the JAVA programming language has emerged as the programming language of choice for the Internet as many large hardware and software companies have licensed it from Sun Microsystems.
  • the JAVA programming language and environment is designed to solve a number of problems in modern programming practice.
  • the JAVA programming language omits many rarely used, poorly understood, and confusing features of the C++ programming language. These omitted features primarily consist of operator overloading, multiple inheritance, and extensive automatic coercions.
  • the JAVA programming language includes automatic garbage collection that simplifies the task of programming because it is no longer necessary to allocate and free memory as in the C programming language.
  • the JAVA programming language restricts the use of pointers as defined in the C programming language, and instead has true arrays in which array bounds are explicitly checked, thereby eliminating vulnerability to many viruses and nasty bugs.
  • the JAVA programming language includes objective-C interfaces and specific exception handlers.
  • the JAVA programming language has an extensive library of routines for coping easily with TCP/IP protocol (Transmission Control Protocol based on
  • the JAVA programming language is intended to be used in networked/distributed environments.
  • the JAVA programming language enabled the construction of virus-free, tamper-free systems.
  • the authentication techniques are based on public-key encryption.
  • stack-based computing systems, including those implementing the JAVA virtual machine, use relatively slow memory devices to store the stack.
  • adding a cache for slow memory devices increases overall memory performance only if the vast majority of memory requests result in cache hits, i.e. the requested memory address is within the cache.
  • Conventional cache designs are designed for random access memory architectures and do not perform well with stack-based memory architectures. Therefore, a caching method and a caching apparatus targeted to improve stack-based memory architectures is desirable.
  • the present invention provides a stack management unit including a stack cache to accelerate data transfers between the stack-based computing system and the stack.
  • the stack management unit includes a stack cache, a dribble manager unit, and a stack control unit. Since the vast majority of memory accesses to the stack occur at or near the top of the stack, the dribble manager unit maintains the top portion of the stack in the stack cache. Specifically, when the stack-based computing system is pushing data onto the stack and the stack cache is almost full, the dribble manager unit transfers data from the bottom of the stack cache to the stack so that the top portion of the stack remains in the stack cache. When the stack-based computing system is popping data off of the stack and the stack cache is becoming empty, the dribble manager unit transfers data from the stack to the bottom of the stack cache to maintain the top portion of the stack in the stack cache.
  • the stack cache includes a stack cache memory circuit, one or more read ports, and one or more write ports.
  • the stack cache memory circuit is a register file configured in a circular buffer memory architecture.
  • the registers can be addressed using modulo addressing.
  • an OPTOP pointer is used to define and point to the top memory location in the stack cache memory circuit and a bottom pointer is used to define and point to the bottom memory location in the stack cache memory circuit.
  • the OPTOP pointer is incremented or decremented, respectively.
  • the bottom pointer is incremented or decremented, respectively.
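  • The circular-buffer organization with modulo addressing described above can be sketched as follows. This is a minimal software model, not the hardware design: the class name `CircularRegisterFile` and its methods are hypothetical, and the size of 64 is taken from the sixty-four-entry stack cache mentioned later in the text.

```python
class CircularRegisterFile:
    """Hypothetical model of a register file used as a circular buffer."""

    def __init__(self, size=64):
        self.size = size
        self.regs = [None] * size

    def index(self, address):
        # Modulo addressing: every stack address maps to exactly one register,
        # so addresses that differ by a multiple of `size` share a register.
        return address % self.size

    def write(self, address, value):
        self.regs[self.index(address)] = value

    def read(self, address):
        return self.regs[self.index(address)]


rf = CircularRegisterFile(size=64)
rf.write(63, "a")   # last physical register
rf.write(64, "b")   # wraps around to physical register 0
```

    Because the index wraps, moving the top or bottom pointer past the last register simply continues at register 0, which is what lets the cached window slide along the stack without copying entries.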
  • the stack management unit includes a fill control unit and a spill control unit. If the fill control unit detects a fill condition, the fill control unit transfers data from the stack to the stack cache memory circuit.
  • a fill condition occurs if the optop pointer is greater than a high water mark.
  • a fill condition occurs if the number of free memory locations in the stack cache memory circuit is greater than a low cache threshold or the number of used memory locations is less than the low cache threshold.
  • the low water mark and the low cache threshold are stored in programmable registers. The number of free memory locations can be determined with a modulo subtractor.
  • a spill condition occurs if the optop pointer is less than a low water mark.
  • a spill condition occurs if the number of free locations in the stack cache memory circuit is less than a high cache threshold or the number of used memory locations is greater than the high cache threshold.
  • the low water mark and the low cache threshold are stored in programmable registers. The number of free memory locations can be determined with a modulo subtractor.
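  • The threshold-based spill and fill conditions can be illustrated with a modulo subtraction of the two pointers. This is a sketch under stated assumptions: the function names are hypothetical, the pointers are assumed to increase on a push, and the cache is assumed never to be completely full (a full cache would be indistinguishable from an empty one with this subtraction alone).

```python
def used_entries(optop, bottom, size):
    # Modulo subtractor: distance from the bottom pointer to the top pointer,
    # wrapping around the end of the circular buffer.
    return (optop - bottom) % size


def spill_needed(optop, bottom, size, high_cache_threshold):
    # Spill when the number of used locations exceeds the high cache threshold.
    return used_entries(optop, bottom, size) > high_cache_threshold


def fill_needed(optop, bottom, size, low_cache_threshold):
    # Fill when the number of used locations drops below the low cache threshold.
    return used_entries(optop, bottom, size) < low_cache_threshold
```

    For example, with 64 locations, a bottom pointer of 60 and a top pointer of 5, the subtraction wraps and reports 9 used locations.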
  • method frames are stored in two different memory circuits. The first memory circuit stores the execution environment of each method call, and the second memory circuit stores parameters, variables or operands of the method calls.
  • the execution environment includes a return program counter, a return frame, a return constant pool, a current method vector, and a current monitor address.
  • the memory circuits are stacks; therefore, the stack management unit described herein can be used to cache the memory circuits.
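  • The two-memory organization of method frames can be sketched as a pair of stacks that are pushed and popped together. The class and field names below are hypothetical; the five fields of the execution environment follow the list given above, and plain integers stand in for addresses.

```python
from dataclasses import dataclass


@dataclass
class ExecutionEnvironment:
    return_pc: int
    return_frame: int
    return_constant_pool: int
    current_method_vector: int
    current_monitor_address: int


class SplitFrameStacks:
    """Hypothetical model: environments and frame data live on separate stacks."""

    def __init__(self):
        self.environment_stack = []  # first memory circuit
        self.data_stack = []         # second memory circuit: locals and operands

    def invoke(self, env, local_vars):
        # A method call pushes one entry on each stack.
        self.environment_stack.append(env)
        self.data_stack.append(list(local_vars))

    def method_return(self):
        # A return discards the frame data and yields the saved environment.
        self.data_stack.pop()
        return self.environment_stack.pop()
```

    Since both structures are only ever accessed at their tops, either one could sit behind its own stack cache.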
  • Figure 1 is a block diagram of one embodiment of a virtual machine hardware processor that utilizes the stack management unit of this invention.
  • Figure 2 is a process flow diagram for generation of virtual machine instructions that are used in one embodiment of this invention.
  • Figure 3 illustrates an instruction pipeline implemented in the hardware processor of Figure 1.
  • Figure 4A is an illustration of one embodiment of the logical organization of a stack structure where each method frame includes a local variable storage area, an environment storage area, and an operand stack utilized by the hardware processor of Figure 1.
  • Figure 4B is an illustration of an alternative embodiment of the logical organization of a stack structure where each method frame includes a local variable storage area and an operand stack on the stack, and an environment storage area for the method frame is included on a separate execution environment stack.
  • Figure 4C is an illustration of an alternative embodiment of the stack management unit for the stack and execution environment stack of Figure 4B.
  • Figure 4D is an illustration of one embodiment of the local variables look-aside cache in the stack management unit of Figure 1.
  • Figure 5 illustrates several possible add-ons to the hardware processor of Figure 1.
  • Figure 6 illustrates a block diagram of one embodiment of a stack cache management unit in accordance with this invention.
  • Figure 7 illustrates the memory architecture of one embodiment of a stack cache in accordance with this invention.
  • Figure 8 illustrates the contents of a register or memory location of one embodiment of a stack cache in accordance with this invention.
  • Figure 9 illustrates a block diagram of one embodiment of a dribble manager unit in accordance with this invention.
  • Figure 10A illustrates a block diagram of another embodiment of a dribble manager unit in accordance with this invention.
  • Figure 10B illustrates a block diagram of another embodiment of a dribble manager unit in accordance with this invention.
  • Figure 11 illustrates a block diagram of a portion of an embodiment of a dribble manager unit in accordance with this invention.
  • Figure 12 illustrates a pointer generation circuit for one embodiment of a stack cache in accordance with this invention.
  • Figure 1 illustrates one embodiment of a virtual machine instruction hardware processor 100, hereinafter hardware processor 100, that includes a stack management unit in accordance with the present invention, and that directly executes virtual machine instructions that are processor architecture independent.
  • The performance of hardware processor 100 in executing JAVA virtual machine instructions is much better than that of high-end CPUs, such as the Intel PENTIUM microprocessor or the Sun Microsystems ULTRASPARC processor (ULTRASPARC is a trademark of Sun Microsystems of Mountain View, CA, and PENTIUM is a trademark of Intel Corp. of Sunnyvale, CA), interpreting the same virtual machine instructions with a software JAVA interpreter or with a JAVA just-in-time compiler. Hardware processor 100 is also low cost and exhibits low power consumption.
  • hardware processor 100 is well suited for portable applications.
  • Hardware processor 100 provides similar advantages for other virtual machine stack-based architectures as well as for virtual machines utilizing features such as garbage collection, thread synchronization, etc.
  • a system based on hardware processor 100 presents attractive price for performance characteristics, if not the best overall performance, as compared with alternative virtual machine execution environments including software interpreters and just-in-time compilers.
  • the present invention is not limited to virtual machine hardware processor embodiments, and encompasses any suitable stack-based or non-stack-based machine implementations, including implementations emulating the JAVA virtual machine as a software interpreter, compiling JAVA virtual machine instructions (either in batch or just-in-time) to machine instructions native to a particular hardware processor, or providing hardware implementing the JAVA virtual machine in microcode, directly in silicon, or in some combination thereof.
  • hardware processor 100 has the advantage that the 250 Kilobytes to 500 Kilobytes (Kbytes) of memory storage, e.g., read-only memory or random access memory, typically required by a software interpreter, is eliminated.
  • a simulation of hardware processor 100 showed that hardware processor 100 executes virtual machine instructions twenty times faster than a software interpreter running on a variety of applications on a PENTIUM processor clocked at the same clock rate as hardware processor 100, and executing the same virtual machine instructions.
  • Another simulation of hardware processor 100 showed that hardware processor 100 executes virtual machine instructions five times faster than a just-in-time compiler running on a PENTIUM processor running at the same clock rate as hardware processor 100, and executing the same virtual machine instructions.
  • hardware processor 100 is advantageous. These applications include, for example, an Internet chip for network appliances, a cellular telephone processor, other telecommunications integrated circuits, or other low-power, low-cost applications such as embedded processors and portable devices.
  • the present invention includes a stack management unit 150 that utilizes a stack cache 155 to accelerate data transfers for execution unit 140.
  • Although stack management unit 150 can be an integral part of hardware processor 100 as shown in Figure 1, many embodiments of stack management unit 150 are not integrated with a hardware processor, since stack management in accordance with the present invention can be adapted for use with any stack-based computing system.
  • stack management unit 150 includes a stack cache 155, a dribble manager unit 151, and a stack control unit 152.
  • When hardware processor 100 is pushing data onto stack 400 (Figure 4A) and stack cache 155 is almost full, dribble manager unit 151 transfers data from the bottom of stack cache 155 to stack 400 through data cache unit 160, so that the top portion of stack 400 remains in stack cache 155. When hardware processor 100 is popping data off of stack 400 and stack cache 155 is almost empty, dribble manager unit 151 transfers data from stack 400 to the bottom of stack cache 155 so that the top portion of stack 400 is maintained in stack cache 155.
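  • The spill and fill behaviour of the dribble manager can be illustrated with a small software model. This is a sketch only: the class `StackCacheModel`, its parameters, and the threshold values are hypothetical, and transfers happen immediately after each push or pop, whereas hardware would be expected to move entries in the background.

```python
from collections import deque


class StackCacheModel:
    """Hypothetical model: the top of the stack stays in a small, fast cache."""

    def __init__(self, capacity=64, high=60, low=4):
        if not 0 <= low < high <= capacity:
            raise ValueError("thresholds must satisfy 0 <= low < high <= capacity")
        self.high = high
        self.low = low
        self.cache = deque()  # left end = bottom of cache, right end = top of stack
        self.memory = []      # slower backing stack holding the entries below the cache

    def push(self, value):
        self.cache.append(value)
        self._dribble()

    def pop(self):
        value = self.cache.pop()
        self._dribble()
        return value

    def _dribble(self):
        # Spill: cache almost full, so move entries from its bottom to memory.
        while len(self.cache) > self.high:
            self.memory.append(self.cache.popleft())
        # Fill: cache almost empty, so bring entries back to its bottom.
        while len(self.cache) < self.low and self.memory:
            self.cache.appendleft(self.memory.pop())
```

    Pushes and pops only ever touch the right end of `cache`, so the caller sees ordinary stack behaviour while the model keeps the most recent entries in the fast structure.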
  • a virtual machine is an abstract computing machine that, like a real computing machine, has an instruction set and uses various memory areas.
  • a virtual machine specification defines a set of processor architecture independent virtual machine instructions that are executed by a virtual machine implementation, e.g., hardware processor 100.
  • Each virtual machine instruction defines a specific operation that is to be performed.
  • the virtual computing machine need not understand the computer language that is used to generate virtual machine instructions or the underlying implementation of the virtual machine. Only a particular file format for virtual machine instructions needs to be understood.
  • the virtual machine instructions are JAVA virtual machine instructions.
  • Each JAVA virtual machine instruction includes one or more bytes that encode instruction identifying information, operands, and any other required information.
  • Appendix I, which is incorporated herein by reference in its entirety, includes an illustrative set of the JAVA virtual machine instructions.
  • the particular set of virtual machine instructions utilized is not an essential aspect of this invention.
  • those of skill in the art can modify the invention for a particular set of virtual machine instructions, or for changes to the JAVA virtual machine specification.
  • a JAVA compiler JAVAC (Fig. 2), executing on a computer platform, converts an application 201 written in the JAVA computer language to an architecture neutral object file format encoding a compiled instruction sequence 203, according to the JAVA Virtual Machine Specification, that includes a compiled instruction set.
  • a source of virtual machine instructions and related information is needed. The method or technique used to generate the source of virtual machine instructions and related information is not essential to this invention.
  • Compiled instruction sequence 203 is executable on hardware processor 100 as well as on any computer platform that implements the JAVA virtual machine using, for example, a software interpreter or just-in-time compiler.
  • hardware processor 100 provides significant performance advantages over the software implementations.
  • hardware processor 100 processes the JAVA virtual machine instructions, which include bytecodes.
  • Hardware processor 100 executes directly most of the bytecodes. However, execution of some of the bytecodes is implemented via microcode.
  • firmware means microcode stored in ROM that when executed controls the operations of hardware processor 100.
  • hardware processor 100 includes an I/O bus and memory interface unit 110, an instruction cache unit 120 including instruction cache 125, an instruction decode unit 130, a unified execution unit 140, a stack management unit 150 including stack cache 155, a data cache unit 160 including a data cache 165, and program counter and trap control logic 170. Each of these units is described more completely below.
  • each unit includes several elements.
  • the interconnections between elements within a unit are not shown in Figure 1.
  • those of skill in the art will understand the interconnections and cooperation between the elements in a unit and between the various units.
  • the pipeline stages implemented using the units illustrated in Figure 1 include fetch, decode, execute, and write-back stages. If desired, extra stages for memory access or exception resolution are provided in hardware processor 100.
  • Figure 3 is an illustration of a four stage pipeline for execution of instructions in the exemplary embodiment of processor 100.
  • In fetch stage 301, a virtual machine instruction is fetched and placed in instruction buffer 124 (Fig. 1).
  • the virtual machine instruction is fetched from one of (i) a fixed size cache line from instruction cache 125 or (ii) external memory.
  • each virtual machine instruction is between one and five bytes long. Thus, to keep things simple, at least forty bits are required to guarantee that all of a given instruction is contained in the fetch.
  • Another alternative is to always fetch a predetermined number of bytes, for example, four bytes, starting with the opcode. This is sufficient for 95% of JAVA virtual machine instructions (see Appendix I).
  • the instruction execution can be started with the first operands fetched even if the full set of operands is not yet available.
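  • The two fetch alternatives can be compared with a short worked example. An instruction of up to five bytes needs 5 × 8 = 40 bits to be fetched whole, while a fixed four-byte fetch covers every instruction of four bytes or fewer. The sketch below assumes a four-byte window; the function `fetch` and the length table are hypothetical helpers, and the table lists only a handful of JAVA virtual machine opcodes for illustration.

```python
# Total length in bytes (opcode plus operands) of a few JAVA virtual machine
# instructions. Illustrative subset only, not the full instruction set.
INSTRUCTION_LENGTH = {
    0x60: 1,  # iadd
    0x10: 2,  # bipush
    0x11: 3,  # sipush
    0xC8: 5,  # goto_w
}

FETCH_BYTES = 4  # assumed fixed fetch size, starting with the opcode


def fetch(code, pc):
    """Return (fetched instruction bytes, whether the whole instruction fit)."""
    window = code[pc:pc + FETCH_BYTES]
    length = INSTRUCTION_LENGTH[window[0]]
    return window[:length], length <= len(window)


code = bytes([0x11, 0x01, 0x02, 0x60, 0xC8, 0x00, 0x00, 0x00, 0x09])
```

    With this byte sequence, the `sipush` at offset 0 and the `iadd` at offset 3 fit in one fetch, while the five-byte `goto_w` at offset 4 does not, so its last operand byte would have to arrive in a later fetch.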
  • In decode stage 302 (Fig. 3), the virtual machine instruction at the front of instruction buffer 124 (Fig. 1) is decoded and instruction folding is performed if possible.
  • Stack cache 155 is accessed only if needed by the virtual machine instruction.
  • Register OPTOP, which contains a pointer OPTOP to the top of stack 400 (Fig. 4), is also updated in decode stage 302 (Fig. 3).
  • a register to store a pointer is illustrative only of one embodiment.
  • the pointer may be implemented using a hardware register, a hardware counter, a software counter, a software pointer, or other equivalent embodiments known to those of skill in the art.
  • the particular implementation selected is not essential to the invention, and typically is made based on a price to performance trade-off.
  • In execute stage 303, the virtual machine instruction is executed for one or more cycles.
  • an ALU in integer unit 142 (Fig. 1) is used either to do an arithmetic computation or to calculate the address of a load or store from data cache unit (DCU) 160. If necessary, traps are prioritized and taken at the end of execute stage 303 (Fig. 3).
  • the branch address is calculated in execute stage 303, as well as the condition upon which the branch is dependent.
  • Cache stage 304 is a non-pipelined stage.
  • Data cache 165 (Fig. 1) is accessed if needed during execution stage 303 (Fig. 3) .
  • The reason stage 304 is non-pipelined is that hardware processor 100 is a stack-based machine.
  • the instruction following a load is almost always dependent on the value returned by the load. Consequently, in this embodiment, the pipeline is held for one cycle for a data cache access. This reduces the pipeline stages, and the die area taken by the pipeline for the extra registers and bypasses.
  • Write-back stage 305 is the last stage in the pipeline. In stage 305, the calculated data is written back to stack cache 155.
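  • The cost of holding the pipeline for data cache accesses can be estimated with a rough cycle count. This is an idealized model and an assumption, not a figure from the source: it counts four pipelined stages (fetch, decode, execute, write-back), one instruction completing per cycle once the pipeline is full, exactly one extra cycle per data cache access, and no other stalls, traps, or multi-cycle instructions. The function name is hypothetical.

```python
PIPELINED_STAGES = 4  # fetch, decode, execute, write-back


def total_cycles(instructions, data_cache_accesses):
    """Idealized cycle count for a straight-line instruction sequence."""
    if instructions == 0:
        return 0
    # The first instruction takes one cycle per stage; each later instruction
    # completes one cycle after its predecessor.
    base = PIPELINED_STAGES + (instructions - 1)
    # Each data cache access holds the pipeline for one additional cycle.
    return base + data_cache_accesses
```

    Under these assumptions, ten instructions of which three access the data cache would take 13 + 3 = 16 cycles.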
  • Hardware processor 100, in this embodiment, directly implements a stack 400 (Fig. 4A) that supports the JAVA virtual machine stack-based architecture (see Appendix I). Sixty-four entries on stack 400 are contained on stack cache 155 in stack management unit 150. Some entries in stack 400 may be duplicated on stack cache 155. Operations on data are performed through stack cache 155.
  • Stack 400 of hardware processor 100 is primarily used as a repository of information for methods. At any point in time, hardware processor 100 is executing a single method. Each method has memory space, i.e., a method frame on stack 400, allocated for a set of local variables, an operand stack, and an execution environment structure. A new method frame, e.g., method frame two 410, is allocated by hardware processor 100 upon a method invocation in execution stage 303 (Fig. 3) and becomes the current frame, i.e., the frame of the current method.
  • Current frame 410 (Fig. 4A) may contain a part of or all of the following six entities, depending on various method invoking situations:
  • Object reference; Incoming arguments; Local variables; Invoker's method context; Operand stack; and Return value from method.
  • object reference, incoming arguments, and local variables are included in arguments and local variables area 421.
  • The invoker's method context is included in execution environment 422, sometimes called frame state, that in turn includes: a return program counter value 431 that is the address of the virtual machine instruction, e.g., JAVA opcode, next to the method invoke instruction; a return frame 432 that is the location of the calling method's frame; a return constant pool pointer 433 that is a pointer to the calling method's constant pool table; a current method vector 434 that is the base address of the current method's vector table; and a current monitor address 435 that is the address of the current method's monitor.
  • the object reference is an indirect pointer to an object-storage representing the object being targeted for the method invocation.
  • JAVA compiler JAVAC (See Fig. 2.) generates an instruction to push this pointer onto operand stack 423 prior to generating an invoke instruction.
  • This object reference is accessible as local variable zero during the execution of the method.
  • This indirect pointer is not available for a static method invocation as there is no target-object defined for a static method invocation.
  • the list of incoming arguments transfers information from the calling method to the invoked method. Like the object reference, the incoming arguments are pushed onto stack 400 by JAVA compiler generated instructions and may be accessed as local variables.
  • JAVA compiler JAVAC (See Fig. 2.) statically generates a list of arguments for current method 410 (Fig. 4A) , and hardware processor 100 determines the number of arguments from the list.
  • When the object reference is present in the frame for a non-static method invocation, the first argument is accessible as local variable one. For a static method invocation, the first argument becomes local variable zero.
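The local-variable numbering described above can be sketched as follows. This is a minimal illustration only; the function names `first_argument_local_index` and `build_locals` are hypothetical and do not appear in the description, and a missing object reference is used here to stand for a static invocation.

```python
def first_argument_local_index(is_static: bool) -> int:
    """Return the local-variable index of the first incoming argument."""
    # For a non-static invocation, local variable zero holds the object
    # reference, so the first argument is local variable one.
    return 0 if is_static else 1


def build_locals(args, object_ref=None):
    """Lay out the start of the local variables area for an invoked method.

    object_ref is None for a static invocation (assumption of this sketch).
    """
    prefix = [] if object_ref is None else [object_ref]
    return prefix + list(args)
```

For example, `build_locals([10, 20], object_ref="obj")[first_argument_local_index(False)]` yields the first argument, `10`.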
  • The upper 32 bits, i.e., the 32 most significant bits, of a 64-bit entity are placed on the upper location of stack 400, i.e., pushed on the stack last.
  • the upper 32-bit portion of the 64-bit entity is on the top of the stack, and the lower 32-bit portion of the 64-bit entity is in the storage location immediately adjacent to the top of stack 400.
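The ordering of the two halves of a 64-bit entity can be illustrated with a short sketch. It models stack 400 as a Python list whose last element is the top of the stack; the helper names `push_long` and `pop_long` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def push_long(stack, value):
    """Push a 64-bit entity as two 32-bit entries, upper half last."""
    lower = value & MASK32
    upper = (value >> 32) & MASK32
    stack.append(lower)  # ends up immediately adjacent to the top of stack
    stack.append(upper)  # pushed last, so it is on the top of the stack


def pop_long(stack):
    """Pop two 32-bit entries and reassemble the 64-bit entity."""
    upper = stack.pop()
    lower = stack.pop()
    return (upper << 32) | lower
```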
  • the local variable area on stack 400 (Fig. 4A) for current method 410 represents temporary variable storage space which is allocated and remains effective during invocation of method 410.
  • JAVA compiler JAVAC (Fig. 2) statically determines the required number of local variables and hardware processor 100 allocates temporary variable storage space accordingly.
  • When a method is executing on hardware processor 100, the local variables typically reside in stack cache 155 and are addressed as offsets from pointer VARS (Figs. 1 and 4A), which points to the position of the local variable zero. Instructions are provided to load the values of local variables onto operand stack 423 and store values from operand stack into local variables area 421.
  • the information in execution environment 422 includes the invoker's method context.
  • hardware processor 100 pushes the invoker's method context onto newly allocated frame 410, and later utilizes the information to restore the invoker's method context before returning.
  • Pointer FRAME (Figs. 1 and 4A) is a pointer to the execution environment of the current method. In the exemplary embodiment, each register in register set 144 (Fig. 1) is 32-bits wide.
  • Operand stack 423 is allocated to support the execution of the virtual machine instructions within the current method.
  • Program counter register PC (Fig. 1) contains the address of the next instruction, e.g., opcode, to be executed.
  • Locations on operand stack 423 (Fig. 4A) are used to store the operands of virtual machine instructions, providing both source and target storage locations for instruction execution.
  • the size of operand stack 423 is statically determined by JAVA compiler JAVAC (Fig. 2) and hardware processor 100 allocates space for operand stack 423 accordingly.
  • Register OPTOP (Figs. 1 and 4A) holds a pointer to a top of operand stack 423.
  • the invoked method may return its execution result onto the invoker's top of stack, so that the invoker can access the return value with operand stack references.
  • the return value is placed on the area where an object reference or an argument is pushed before a method invocation.
  • One way to speed up this process is for hardware processor 100 to load the execution environment in the background and indicate what has been loaded so far, e.g., simple one bit scoreboarding. Hardware processor 100 tries to execute the bytecodes of the called method as soon as possible, even though stack 400 is not completely loaded. If accesses are made to variables already loaded, overlapping of execution with loading of stack 400 is achieved, otherwise a hardware interlock occurs and hardware processor 100 just waits for the variable or variables in the execution environment to be loaded.
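The one-bit scoreboarding idea can be sketched in software as below. This is an assumption-laden model, not the hardware: the class `ScoreboardedFrame`, its one-entry-per-step load, and the `stall_cycles` counter are all hypothetical names and simplifications introduced for illustration.

```python
class ScoreboardedFrame:
    """Frame entries loaded in the background, one 'loaded' bit per entry."""

    def __init__(self, values):
        self._source = list(values)           # entries still to be loaded
        self.slots = [None] * len(values)     # entries as seen by execution
        self.loaded = [False] * len(values)   # one-bit scoreboard
        self._next = 0
        self.stall_cycles = 0

    def background_load_step(self):
        """Load one more entry, if any remain, and mark it as loaded."""
        if self._next < len(self.slots):
            self.slots[self._next] = self._source[self._next]
            self.loaded[self._next] = True
            self._next += 1

    def read(self, index):
        """Read an entry; wait (interlock) while its scoreboard bit is clear."""
        while not self.loaded[index]:
            self.stall_cycles += 1
            self.background_load_step()
        return self.slots[index]
```

A read of an already loaded entry proceeds without stalling, which corresponds to overlapping execution with loading; a read of an entry not yet loaded accumulates stall cycles until the background load reaches it.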
  • Figure 4B illustrates another way to accelerate method invocation.
  • the execution environment of each method frame is stored separately from the local variable area and the operand stack of the method frame.
  • stack 400B contains modified method frames, e.g. modified method frame 410B having only local variable area 421 and operand stack 423.
  • Execution environment 422 of the method frame is stored in an execution environment memory 440. Storing the execution environment in execution environment memory 440 reduces the amount of data in stack cache 155. Therefore, the size of stack cache 155 can be reduced.
  • execution environment memory 440 and stack cache 155 can be accessed simultaneously.
  • method invocation can be accelerated by loading or storing the execution environment in parallel with loading or storing data onto stack 400B.
  • The memory architecture of execution environment memory 440 is also a stack. As modified method frames are pushed onto stack 400B through stack cache 155, corresponding execution environments are pushed onto execution environment memory 440. For example, since modified method frames 0 to 2, as shown in Figure 4B, are in stack 400B, execution environments (EE) 0 to 2, respectively, are stored in execution environment memory circuit 440.
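A minimal sketch of the two parallel stacks of Figure 4B follows. The class `SplitFrameStacks`, its method names, and the dictionary layout of a frame are hypothetical; the sketch only shows that each modified method frame has a corresponding execution environment at the same depth.

```python
class SplitFrameStacks:
    """Modified method frames and execution environments kept on two stacks."""

    def __init__(self):
        self.stack = []      # modified frames: local variables and operand stack
        self.ee_memory = []  # execution environments, one per frame

    def invoke(self, local_vars, execution_env):
        """Push a modified frame and its execution environment together."""
        self.stack.append({"locals": list(local_vars), "operands": []})
        self.ee_memory.append(dict(execution_env))

    def ret(self):
        """Pop the current frame and its matching execution environment."""
        frame = self.stack.pop()
        env = self.ee_memory.pop()
        return frame, env
```

Because the two structures are separate, a hardware implementation can access them at the same time; the sequential pushes here are only a software simplification.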
  • an execution environment cache can be added to improve the speed of saving and retrieving the execution environment during method invocation.
  • The architecture described more completely below for stack cache 155, dribble manager unit 151, and stack control unit 152 for caching stack 400, can also be applied to caching execution environment memory 440.
  • Figure 4C illustrates an embodiment of stack management unit 150 modified to support both stack 400B and execution environment memory 440. Specifically, the embodiment of stack management unit 150 in Figure 4C adds an execution environment stack cache 450, an execution environment dribble manager unit 460, and an execution environment stack control unit 470. Typically, execution environment dribble manager unit 460 transfers an entire execution environment between execution environment cache 450 and execution environment memory 440 during a spill operation or a fill operation.
  • I/O bus and memory interface unit 110 (Fig. 1) , sometimes called interface unit 110, implements an interface between hardware processor 100 and a memory hierarchy which in an exemplary embodiment includes external memory and may optionally include memory storage and/or interfaces on the same die as hardware processor 100.
  • I/O controller 111 interfaces with external I/O devices and memory controller 112 interfaces with external memory.
  • external memory means memory external to hardware processor 100.
  • external memory either may be included on the same die as hardware processor 100, may be external to the die containing hardware processor 100, or may include both on- and off-die portions.
  • requests to I/O devices go through memory controller 112 which maintains an address map of the entire system including hardware processor 100.
  • hardware processor 100 is the only master and does not have to arbitrate to use the memory bus.
  • alternatives for the input/output bus that interfaces with I/O bus and memory interface unit 110 include supporting memory-mapped schemes, providing direct support for PCI, PCMCIA, or other standard busses.
  • Fast graphics with VIS or other technology may optionally be included on the die with hardware processor 100.
  • I/O bus and memory interface unit 110 generates read and write requests to external memory.
  • interface unit 110 provides an interface for instruction cache and data cache controllers 121 and 161 to the external memory.
  • Interface unit 110 includes arbitration logic for internal requests from instruction cache controller 121 and data cache controller 161 to access external memory and in response to a request initiates either a read or a write request on the memory bus to the external memory.
  • A request from data cache controller 161 is always treated as higher priority relative to a request from instruction cache controller 121.
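The fixed-priority arbitration between the two cache controllers can be sketched as follows. The function name `arbitrate` and the use of `None` to mean "no pending request" are assumptions of this sketch.

```python
def arbitrate(icache_request, dcache_request):
    """Pick the next request to place on the memory bus.

    A pending data cache request always wins over a pending instruction
    cache request. Returns (source, request), or None if nothing is pending.
    """
    if dcache_request is not None:
        return ("dcache", dcache_request)
    if icache_request is not None:
        return ("icache", icache_request)
    return None
```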
  • Interface unit 110 provides an acknowledgment signal to the requesting instruction cache controller 121 or data cache controller 161 on read cycles so that the requesting controller can latch the data. On write cycles, the acknowledgment signal from interface unit 110 is used for flow control so that the requesting instruction cache controller 121 or data cache controller 161 does not generate a new request when there is one pending. Interface unit 110 also handles errors generated on the memory bus to the external memory.
  • Instruction cache unit (ICU) 120 fetches virtual machine instructions from instruction cache 125 and provides the instructions to instruction decode unit 130.
  • Instruction cache controller 121, upon an instruction cache hit, transfers an instruction from instruction cache 125 to instruction buffer 124 where the instruction is held until integer execution unit IEU, that is described more completely below, is ready to process the instruction. This separates the rest of pipeline 300 (Fig. 3) in hardware processor 100 from fetch stage 301. If it is undesirable to incur the complexity of supporting an instruction-buffer type of arrangement, a temporary one instruction register is sufficient for most purposes. However, instruction fetching, caching, and buffering should provide sufficient instruction bandwidth to support instruction folding as described below.
  • the front end of hardware processor 100 is largely separate from the rest of hardware processor 100.
  • the instructions are aligned on an arbitrary eight-bit boundary by byte aligner circuit 122 in response to a signal from instruction decode unit 130.
  • the front end of hardware processor 100 efficiently deals with fetching from any byte position.
  • hardware processor 100 deals with the problems of instructions that span multiple cache lines of cache 125. In this case, since the opcode is the first byte, the design is able to tolerate an extra cycle of fetch latency for the operands. Thus, a very simple de-coupling between the fetching and execution of the bytecodes is possible.
  • In case of an instruction cache miss, instruction cache controller 121 generates an external memory request for the missed instruction to I/O bus and memory interface unit 110. If instruction buffer 124 is empty, or nearly empty, when there is an instruction cache miss, instruction decode unit 130 is stalled, i.e., pipeline 300 is stalled. Specifically, instruction cache controller 121 generates a stall signal upon a cache miss which is used along with an instruction buffer empty signal to determine whether to stall pipeline 300. Instruction cache 125 can be invalidated to accommodate self-modifying code, e.g., instruction cache controller 121 can invalidate a particular line in instruction cache 125.
  • Instruction cache controller 121 determines the next instruction to be fetched, i.e., which instruction in instruction cache 125 needs to be accessed, and generates address, data and control signals for data and tag RAMs in instruction cache 125. On a cache hit, four bytes of data are fetched from instruction cache 125 in a single cycle, and a maximum of four bytes can be written into instruction buffer 124.
  • Byte aligner circuit 122 aligns the data out of the instruction cache RAM and feeds the aligned data to instruction buffer 124. As explained more completely below, the first two bytes in instruction buffer 124 are decoded to determine the length of the virtual machine instruction. Instruction buffer 124 tracks the valid instructions in the queue and updates the entries, as explained more completely below.
  • Instruction cache controller 121 also provides the data path and control for handling instruction cache misses. On an instruction cache miss, instruction cache controller 121 generates a cache fill request to I/O bus and memory interface unit 110.
  • On receiving data from external memory, instruction cache controller 121 writes the data into instruction cache 125 and the data are also bypassed into instruction buffer 124. Data are bypassed to instruction buffer 124 as soon as the data are available from external memory, and before the completion of the cache fill. Instruction cache controller 121 continues fetching sequential data until instruction buffer 124 is full or a branch or trap has taken place. In one embodiment, instruction buffer 124 is considered full if there are more than eight bytes of valid entries in buffer 124. Thus, typically, eight bytes of data are written into instruction cache 125 from external memory in response to the cache fill request sent to interface unit 110 by instruction cache unit 120. If there is a branch or trap taken while processing an instruction cache miss, only after the completion of the miss processing is the trap or branch executed.
  • a fault indication is generated and stored into instruction buffer 124 along with the virtual machine instruction, i.e., a fault bit is set.
  • the line is not written into instruction cache 125.
  • the erroneous cache fill transaction acts like a non-cacheable transaction except that a fault bit is set.
  • Instruction cache controller 121 also services non-cacheable instruction reads.
  • An instruction cache enable (ICE) bit in a processor status register in register set 144, is used to define whether a load can be cached. If the instruction cache enable bit is cleared, instruction cache unit 120 treats all loads as non-cacheable loads.
  • Instruction cache controller 121 issues a non-cacheable request to interface unit 110 for non-cacheable instructions.
  • the data are available on a cache fill bus for the non-cacheable instruction, the data are bypassed into instruction buffer 124 and are not written into instruction cache 125.
  • instruction cache 125 is a direct-mapped, eight-byte line size cache. Instruction cache 125 has a single cycle latency.
  • The cache size is configurable to 0K, 1K, 2K, 4K, 8K and 16K byte sizes where K means kilo.
  • the default size is 4K bytes.
  • Each line has a cache tag entry associated with the line. Each cache tag contains a twenty bit address tag field and one valid bit for the default 4K byte size.
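The twenty-bit tag for the default 4K byte size follows from the direct-mapped organization with eight-byte lines, assuming a 32-bit address (the address width is an assumption of this sketch, chosen because it is consistent with the stated tag width). The function name `split_address` is hypothetical, and the cache size is assumed to be a non-zero power of two.

```python
LINE_BYTES = 8      # eight-byte line size
ADDRESS_BITS = 32   # assumed address width


def split_address(address, cache_bytes=4096):
    """Split an address into (tag, index, offset, tag_bits)."""
    offset_bits = LINE_BYTES.bit_length() - 1   # 3 bits select a byte in a line
    num_lines = cache_bytes // LINE_BYTES       # 512 lines for 4K bytes
    index_bits = num_lines.bit_length() - 1     # 9 bits select a line
    tag_bits = ADDRESS_BITS - index_bits - offset_bits
    offset = address & (LINE_BYTES - 1)
    index = (address >> offset_bits) & (num_lines - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset, tag_bits
```

With the defaults, 3 offset bits plus 9 index bits leave 32 - 12 = 20 tag bits. Under the same assumptions, a larger cache size would need fewer tag bits and a smaller one more.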
  • Instruction buffer 124 which, in an exemplary embodiment, is a twelve-byte deep first-in, first-out (FIFO) buffer, de-links fetch stage 301 (Fig. 3) from the rest of pipeline 300 for performance reasons.
  • Each instruction in buffer 124 (Fig. 1) has an associated valid bit and an error bit. When the valid bit is set, the instruction associated with that valid bit is a valid instruction. When the error bit is set, the fetch of the instruction associated with that error bit was an erroneous transaction.
  • Instruction buffer 124 includes an instruction buffer control circuit (not shown) that generates signals to pass data to and from instruction buffer 124 and that keeps track of the valid entries in instruction buffer 124, i.e., those with valid bits set.
  • In an exemplary embodiment, four bytes can be received into instruction buffer 124 in a given cycle. Up to five bytes, representing up to two virtual machine instructions, can be read out of instruction buffer 124 in a given cycle. Alternative embodiments, particularly those providing folding of multi-byte virtual machine instructions and/or those providing folding of more than two virtual machine instructions, provide higher input and output bandwidth. Persons of ordinary skill in the art will recognize a variety of suitable instruction buffer designs including, for example, alignment logic, circular buffer design, etc. When a branch or trap is taken, all the entries in instruction buffer 124 are nullified and the branch/trap data moves to the top of instruction buffer 124.
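The per-cycle bandwidth limits of the instruction buffer can be modeled with a simple FIFO. This sketch uses the twelve-byte depth, four-byte input limit, and five-byte output limit of the exemplary embodiment; the class and method names are hypothetical, and the per-entry valid and error bits are omitted for brevity.

```python
from collections import deque


class InstructionBuffer:
    """Byte FIFO with per-cycle input and output limits."""

    DEPTH = 12    # bytes held by the buffer
    MAX_IN = 4    # bytes accepted per cycle
    MAX_OUT = 5   # bytes read out per cycle

    def __init__(self):
        self.fifo = deque()

    def receive(self, data: bytes) -> int:
        """Accept bytes for one cycle; return how many were taken."""
        room = self.DEPTH - len(self.fifo)
        n = min(len(data), self.MAX_IN, room)
        self.fifo.extend(data[:n])
        return n

    def read(self, count: int) -> bytes:
        """Read out bytes for one cycle, oldest first."""
        n = min(count, self.MAX_OUT, len(self.fifo))
        return bytes(self.fifo.popleft() for _ in range(n))

    def flush(self):
        """Nullify all entries, as when a branch or trap is taken."""
        self.fifo.clear()
```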
  • a unified execution unit 140 is shown.
  • instruction decode unit 130, integer unit 142, and stack management unit 150 are considered a single integer execution unit.
  • floating point execution unit 143 is a separate optional unit.
  • the various elements in the execution unit may be implemented using the execution unit of another processor.
  • The various elements included in the various units of Figure 1 are exemplary only of one embodiment. Each unit could be implemented with all or some of the elements shown. Again, the decision is largely dependent upon a price vs. performance trade-off.
  • virtual machine instructions are decoded in decode stage 302 (Fig. 3) of pipeline 300.
  • two bytes, that can correspond to two virtual machine instructions, are fetched from instruction buffer 124 (Fig. 1).
  • The two bytes are decoded in parallel to determine if the two bytes correspond to two virtual machine instructions, e.g., a first load top of stack instruction and a second add top two stack entries instruction, that can be folded into a single equivalent operation. Folding refers to supplying a single equivalent operation corresponding to two or more virtual machine instructions.
  • a single-byte first instruction can be folded with a second instruction.
  • alternative embodiments provide folding of more than two virtual machine instructions, e.g., two to four virtual machine instructions, and of multi-byte virtual machine instructions, though at the cost of instruction decoder complexity and increased instruction bandwidth. See U.S. Patent Application Serial No. 08/xxx,xxx, entitled “INSTRUCTION FOLDING FOR A STACK-BASED MACHINE” naming Marc Tremblay and James Michael O'Connor as inventors, assigned to the assignee of this application, and filed on even date herewith with Attorney Docket No. SP2036, which is incorporated herein by reference in its entirety.
  • the first byte which corresponds to the first virtual machine instruction
  • the first and second instructions are not folded.
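A two-byte folding decision can be sketched as below. The opcode values are those of the JAVA virtual machine instructions `iload_0` through `iload_3` (0x1A to 0x1D) and `iadd` (0x60); the choice of this particular pair as the foldable pattern, the function name `decode_pair`, and the tuple encoding of the folded operation are assumptions made for illustration, not the decoder described in the referenced application.

```python
ILOAD_0 = 0x1A  # load local variable zero onto the operand stack
ILOAD_3 = 0x1D  # load local variable three onto the operand stack
IADD = 0x60     # add the top two operand stack entries


def decode_pair(first, second):
    """Decode two single-byte instructions, folding them when possible.

    Returns (operations, bytes_consumed).
    """
    if ILOAD_0 <= first <= ILOAD_3 and second == IADD:
        # Folded: add the local variable directly to the top of stack
        # instead of first pushing it and then adding.
        return [("add_local_to_top", first - ILOAD_0)], 2
    # Not folded: only the first instruction is issued this cycle.
    return [("single", first)], 1
```

When the pair cannot be folded, only the first byte is consumed and the second byte is decoded as a first instruction in a later cycle.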
  • An optional current object loader folder 132 exploits instruction folding, such as that described above, and in greater detail in U.S. Patent Application Serial No. 08/xxx,xxx, entitled “INSTRUCTION FOLDING FOR A STACK-BASED MACHINE” naming Marc Tremblay and James Michael O'Connor as inventors, assigned to the assignee of this application, and filed on even date herewith with Attorney Docket No. SP2036, which is incorporated herein by reference in its entirety, in virtual machine instruction sequences which simulation results have shown to be particularly frequent and therefore a desirable target for optimization.
  • method invocations typically load an object reference for the corresponding object onto the operand stack and fetch a field from the object.
  • Upon a subsequent call of that instruction, instruction decode unit 130 detects that the instruction is identified as a quick-variant and simply retrieves the information needed to initiate execution of the instruction from non-quick to quick translator cache 131.
  • Non-quick to quick translator cache is an optional feature of hardware processor 100.
  • branch predictor circuit 133 branch predictor circuit 133.
  • Implementations for branch predictor circuit 133 include branching based on opcode, branching based on offset, or branching based on a two-bit counter mechanism.
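The description names a two-bit counter mechanism without detailing it. A conventional two-bit saturating counter, which is an assumption here and not necessarily the circuit of the embodiment, can be sketched as follows; the class name and the initial state are hypothetical.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 taken."""

    def __init__(self, state=1):
        self.state = state

    def predict(self) -> bool:
        """Return True if the branch is predicted taken."""
        return self.state >= 2

    def update(self, taken: bool):
        """Move one step toward the actual outcome, saturating at 0 and 3."""
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)
```

With such a counter, a single mispredicted iteration in a strongly biased branch does not immediately flip the prediction.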
  • Operand stack 423 contains a reference to an object and some number of arguments when this instruction is executed.
  • Index bytes one and two are used to generate an index into the constant pool of the current class.
  • the item in the constant pool at that index points to a complete method signature and class. Signatures are defined in Appendix I and that description is incorporated herein by reference.
  • The method signature, a short, unique identifier for each method, is looked up in a method table of the class indicated.
  • the result of the lookup is a method block that indicates the type of method and the number of arguments for the method.
  • the object reference and arguments are popped off this method's stack and become initial values of the local variables of the new method.
  • the execution then resumes with the first instruction of the new method.
  • Instructions invokevirtual, opcode 182, and invokestatic, opcode 184, invoke processes similar to that just described. In each case, a pointer is used to lookup a method block.
  • a method argument cache 134 that also is an optional feature of hardware processor 100, is used, in a first embodiment, to store the method block of a method for use after the first call to the method, along with the pointer to the method block as a tag.
  • Instruction decode unit 130 uses index bytes one and two to generate the pointer and then uses the pointer to retrieve the method block for that pointer in cache 134. This permits building the stack frame for the newly invoked method more rapidly in the background in subsequent invocations of the method.
  • Alternative embodiments may use a program counter or method identifier as a reference into cache 134. If there is a cache miss, the instruction is executed in the normal fashion and cache 134 is updated accordingly. The particular process used to determine which cache entry is overwritten is not an essential aspect of this invention.
  • method argument cache 134 is used to store the pointer to the method block, for use after the first call to the method, along with the value of program counter PC of the method as a tag.
  • Instruction decode unit 130 uses the value of program counter PC to access cache 134. If the value of program counter PC is equal to one of the tags in cache 134, cache 134 supplies the pointer stored with that tag to instruction decode unit 130.
  • Instruction decode unit 130 uses the supplied pointer to retrieve the method block for the method.
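The program-counter-tagged use of the method argument cache can be sketched as below. The class `MethodArgumentCache`, the helper `resolve_method_block`, the capacity, and the oldest-first replacement are hypothetical; the description states that the replacement process is not essential.

```python
class MethodArgumentCache:
    """Small cache mapping a program counter tag to a method block pointer."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = {}  # tag (program counter value) -> pointer

    def lookup(self, pc):
        """Return the stored pointer on a tag match, else None."""
        return self.entries.get(pc)

    def fill(self, pc, pointer):
        """Store a pointer, evicting the oldest entry if the cache is full."""
        if pc not in self.entries and len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))
        self.entries[pc] = pointer


def resolve_method_block(cache, pc, slow_lookup, method_blocks):
    """Return (method_block, hit), using the cache before the normal lookup."""
    pointer = cache.lookup(pc)
    hit = pointer is not None
    if not hit:
        pointer = slow_lookup(pc)  # normal execution path on a cache miss
        cache.fill(pc, pointer)
    return method_blocks[pointer], hit
```

On the first call at a given program counter value the normal lookup runs and the cache is updated; later calls at the same value retrieve the pointer directly.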
  • Wide index forwarder 136 which is an optional element of hardware processor 100, is a specific embodiment of instruction folding for instruction wide. Wide index forwarder 136 handles an opcode encoding an extension of an index operand for an immediately subsequent virtual machine instruction. In this way, wide index forwarder 136 allows instruction decode unit 130 to provide indices into local variable storage 421 when the number of local variables exceeds that addressable with a single byte index without incurring a separate execution cycle for instruction wide.
  • instruction decoder 135, particularly instruction folding, non-quick to quick translator cache 131, current object loader folder 132, branch predictor 133, method argument cache 134, and wide index forwarder 136 are also useful in implementations that utilize a software interpreter or just-in-time compiler, since these elements can be used to accelerate the operation of the software interpreter or just-in-time compiler.
  • the virtual machine instructions are translated to an instruction for the processor executing the interpreter or compiler, e.g., any one of a Sun processor, a DEC processor, an Intel processor, or a Motorola processor, for example, and the operation of the elements is modified to support execution on that processor.
  • Integer execution unit IEU that includes instruction decode unit 130, integer unit 142, and stack management unit 150, is responsible for the execution of all the virtual machine instructions except the floating point related instructions.
  • the floating point related instructions are executed in floating point unit 143.
  • Integer execution unit IEU interacts at the front end with instruction cache unit 120 to fetch instructions, with floating point unit (FPU) 143 to execute floating point instructions, and finally with data cache unit (DCU) 160 to execute load and store related instructions.
  • Integer execution unit IEU also contains microcode ROM 149 which contains instructions to execute certain virtual machine instructions associated with integer operations.
  • Integer execution unit IEU includes a cached portion of stack 400, i.e., stack cache 155.
  • Stack cache 155 provides fast storage for operand stack and local variable entries associated with a current method, e.g., operand stack 423 and local variable storage 421 entries.
  • Although stack cache 155 may provide sufficient storage for all operand stack and local variable entries associated with a current method, depending on the number of operand stack and local variable entries, less than all of local variable entries or less than all of both local variable entries and operand stack entries may be represented in stack cache 155.
  • Additional entries, e.g., operand stack and/or local variable entries for a calling method, may be represented in stack cache 155 if space allows.
  • Stack cache 155 is a sixty-four entry thirty-two-bit wide array of registers that is physically implemented as a register file in one embodiment.
  • Stack cache 155 has three read ports, two of which are dedicated to integer execution unit IEU and one to dribble manager unit 151. Stack cache 155 also has two write ports, one dedicated to integer execution unit IEU and one to dribble manager unit 151.
  • Integer unit 142 maintains the various pointers which are used to access variables, such as local variables, and operand stack values, in stack cache 155. Integer unit 142 also maintains pointers to detect whether a stack cache hit has taken place. Runtime exceptions are caught and dealt with by exception handlers that are implemented using information in microcode ROM 149 and circuit 170.
  • Integer unit 142 contains a 32-bit ALU to support arithmetic operations.
  • the operations supported by the ALU include: add, subtract, shift, and, or, exclusive or, compare, greater than, less than, and bypass.
  • the ALU is also used to determine the address of conditional branches while a separate comparator determines the outcome of the branch instruction.
  • the most common set of instructions which executes cleanly through the pipeline is the group of ALU instructions.
  • the ALU instructions read the operands from the top of stack 400 in decode stage 302 and use the ALU in execution stage 303 to compute the result. The result is written back to stack 400 in write-back stage 305.
  • a shifter is also present as part of the ALU. If the operands are not available for the instruction in decode stage 302, or at a maximum at the beginning of execution stage 303, an interlock holds the pipeline stages before execution stage 303.
  • The instruction cache unit interface of integer execution unit IEU is a valid/accept interface, where instruction cache unit 120 delivers instructions to instruction decode unit 130 in fixed fields along with valid bits.
  • Instruction decoder 135 responds by signaling how much byte aligner circuit 122 needs to shift, or how many bytes instruction decode unit 130 could consume in decode stage 302.
  • the instruction cache unit interface also signals to instruction cache unit 120 the branch mis-predict condition, and the branch address in execution stage 303. Traps, when taken, are also similarly indicated to instruction cache unit 120.
  • Instruction cache unit 120 can hold integer unit 142 by not asserting any of the valid bits to instruction decode unit 130.
  • Instruction decode unit 130 can hold instruction cache unit 120 by not asserting the shift signal to byte aligner circuit 122.
  • the data cache interface of integer execution unit IEU also is a valid-accept interface, where integer unit 142 signals, in execution stage 303, a load or store operation along with its attributes, e.g., non-cached, special stores etc., to data cache controller 161 in data cache unit 160.
  • Data cache unit 160 can return the data on a load, and control integer unit 142 using a data control unit hold signal. On a data cache hit, data cache unit 160 returns the requested data, and then releases the pipeline.
  • integer unit 142 also supplies the data along with the address in execution stage 303.
  • Data cache unit 160 can hold the pipeline in cache stage 304 if data cache unit 160 is busy, e.g., doing a line fill etc.
  • Instruction decoder 135 fetches and decodes floating point unit 143 related instructions. Instruction decoder 135 sends the floating point operation operands for execution to floating point unit 143 in decode stage 302. While floating point unit 143 is busy executing the floating point operation, integer unit 142 halts the pipeline and waits until floating point unit 143 signals to integer unit 142 that the result is available.
  • a floating point ready signal from floating point unit 143 indicates that execution stage 303 of the floating point operation has concluded.
  • the result is written back into stack cache 155 by integer unit 142.
  • Floating point load and stores are entirely handled by integer execution unit IEU, since the operands for both floating point unit 143 and integer unit 142 are found in stack cache 155.
  • a stack management unit 150 stores information, and provides operands to execution unit 140. Stack management unit 150 also takes care of overflow and underflow conditions of stack cache 155.
  • stack management unit 150 includes stack cache 155 that, as described above, is a three read port, two write port register file in one embodiment; a stack control unit 152 which provides the necessary control signals for two read ports and one write port that are used to retrieve operands for execution unit 140 and for storing data back from a write-back register or data cache 165 into stack cache 155; and a dribble manager 151 which speculatively dribbles data in and out of stack cache 155 into memory whenever there is an overflow or underflow in stack cache 155.
  • memory includes data cache 165 and any memory storage interfaced by memory interface unit 110.
  • memory includes any suitable memory hierarchy including caches, addressable read/write memory storage, secondary storage, etc.
  • Dribble manager 151 also provides the necessary control signals for a single read port and a single write port of stack cache 155 which are used exclusively for background dribbling purposes.
  • stack cache 155 is managed as a circular buffer which ensures that the stack grows and shrinks in a predictable manner to avoid overflows or overwrites.
  • the saving and restoring of values to and from data cache 165 is controlled by dribble manager 151 using high- and low-water marks, in one embodiment.
  • Stack management unit 150 provides execution unit
  • Stack management unit 150 can store a single 32-bit result in a given cycle.
  • Dribble manager 151 handles spills and fills of stack cache 155 by speculatively dribbling the data in and out of stack cache 155 from and to data cache 165. Dribble manager 151 generates a pipeline stall signal to stall the pipeline when a stack overflow or underflow condition is detected. Dribble manager 151 also keeps track of requests sent to data cache unit 160. A single request to data cache unit 160 is a 32-bit consecutive load or store request.
  • The hardware organization of stack cache 155 is such that, except for long operands (long integers and double precision floating-point numbers), implicit operand fetches for opcodes do not add latency to the execution of the opcodes.
  • the number of entries in operand stack 423 (Fig. 4A) and local variable storage 422 that are maintained in stack cache 155 represents a hardware/performance tradeoff. At least a few operand stack 423 and local variable storage 422 entries are required to get good performance. In the exemplary embodiment of Figure 1, at least the top three entries of operand stack 423 and the first four local variable storage 422 entries are preferably represented in stack cache 155.
  • stack cache 155 (Fig. 1) is to emulate a register file where access to the top two registers is always possible without extra cycles.
  • a small hardware stack is sufficient if the proper intelligence is provided to load/store values from/to memory in the background, therefore preparing stack cache 155 for incoming virtual machine instructions.
  • An entry in stack 400 thus represents a value and not a number of bytes.
  • Long integer and double precision floating-point numbers require two entries.
  • the mechanism for filling and spilling the operand stack from stack cache 155 out to memory by dribble manager 151 can assume one of several alternative forms. One register at a time can be filled or spilled, or a block of several registers filled or spilled at once. A simple scoreboarded method is appropriate for stack management.
  • a single bit indicates if the register in stack cache 155 is currently valid.
  • some embodiments of stack cache 155 use a single bit to indicate whether the data content of the register is saved to stack 400, i.e., whether the register is dirty.
  • a high-water mark/low-water mark heuristic determines when entries are saved to and restored from stack 400, respectively (Fig. 4A).
  • the hardware starts loading registers from stack 400 into stack cache 155.
  • stack management unit 150 and dribble manager unit 151 are described below and in U.S. Patent Application Serial No. 08/xxx,xxx, entitled “METHOD FRAME STORAGE USING MULTIPLE MEMORY CIRCUITS” naming James Michael O'Connor and Marc Tremblay as inventors, assigned to the assignee of this application, and filed on even date herewith with Attorney Docket No. SP2038, which is incorporated herein by reference in its entirety.
  • stack management unit 150 also includes an optional local variable look-aside cache 153.
  • Cache 153 is most important in applications where both the local variables and operand stack 423 (Fig. 4A) for a method are not located on stack cache 155. In such instances when cache 153 is not included in hardware processor 100, there is a miss on stack cache 155 when a local variable is accessed, and execution unit 140 accesses data cache unit 160, which in turn slows down execution. In contrast, with cache 153, the local variable is retrieved from cache 153 and there is no delay in execution.
  • Local variables zero to M, where M is an integer, for method 0 are stored in plane 421A_0 of cache 153 and plane 421A_0 is accessed when method number 402 is zero.
  • Local variables zero to N, where N is an integer, for method 1 are stored in plane 421A_1 of cache 153 and plane 421A_1 is accessed when method number 402 is one.
  • Local variables zero to P, where P is an integer, for method 2 are stored in plane 421A_2 of cache 153 and plane 421A_2 is accessed when method number 402 is two. Notice that the various planes of cache 153 may be different sizes, but typically each plane of the cache has a fixed size that is empirically determined.
  • a new plane 421A_2 in cache 153 is loaded with the local variables for that method, and method number register 402, which in one embodiment is a counter, is changed, e.g., incremented, to point to the plane of cache 153 containing the local variables for the new method.
  • the local variables are ordered within a plane of cache 153 so that cache 153 is effectively a direct-mapped cache.
  • the variable is accessed directly from the most recent plane in cache 153, i.e., the plane identified by method number 402.
  • method number register 402 is changed, e.g., decremented, to point at previous plane 421A_1 of cache 153.
  • Cache 153 can be made as wide and as deep as necessary.
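The plane selection described above can be sketched in a few lines. This is a minimal illustrative model, not the hardware design: the class name `LocalVarLookAsideCache`, its method names, and the use of Python lists for planes are all assumptions made for the example. It models method number register 402 as a counter that is incremented on invocation and decremented on return.

```python
class LocalVarLookAsideCache:
    """Toy model of a plane-per-method local variable cache (hypothetical names)."""

    def __init__(self):
        self.planes = []          # planes[i] holds the local variables of method i
        self.method_number = -1   # models method number register 402; -1 means no method

    def invoke(self, local_vars):
        # A new method loads a new plane and the counter is incremented.
        self.method_number += 1
        del self.planes[self.method_number:]   # drop stale planes left by returned methods
        self.planes.append(list(local_vars))

    def ret(self):
        # On return the counter is decremented to point at the previous plane.
        self.method_number -= 1

    def read(self, index):
        # Direct-mapped within the plane: local variable k lives in slot k.
        return self.planes[self.method_number][index]
```

In this sketch a read always goes to the plane selected by the counter, so a return makes the caller's local variables visible again without reloading them.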
  • Data cache unit 160 manages all requests for data in data cache 165.
  • Data cache requests can come from dribble manager 151 or execution unit 140.
  • Data cache controller 161 arbitrates between these requests giving priority to the execution unit requests.
  • In response to a request, data cache controller 161 generates address, data and control signals for the data and tag RAMs in data cache 165. For a data cache hit, data cache controller 161 reorders the data RAM output to provide the right data.
  • Data cache controller 161 also generates requests to I/O bus and memory interface unit 110 in case of data cache misses, and in case of non-cacheable loads and stores.
  • Data cache controller 161 provides the data path and control logic for processing non-cacheable requests, and the data path and data path control functions for handling cache misses.
  • For data cache hits, data cache unit 160 returns data to execution unit 140 in one cycle for loads. Data cache unit 160 also takes one cycle for write hits. In case of a cache miss, data cache unit 160 stalls the pipeline until the requested data is available from the external memory. For both non-cacheable loads and stores, data cache 165 is bypassed and requests are sent to I/O bus and memory interface unit 110. Non-aligned loads and stores to data cache 165 trap in software.
  • Data cache 165 is a two-way set associative, write back, write allocate, 16-byte line cache.
  • the cache size is configurable to 0, 1, 2, 4, 8, 16 Kbyte sizes. The default size is 8 Kbytes.
  • Each line has a cache tag store entry associated with the line. On a cache miss, 16 bytes of data are written into cache 165 from external memory.
  • Each data cache tag contains a 20-bit address tag field, one valid bit, and one dirty bit. Each cache tag is also associated with a least recently used bit that is used for replacement policy. To support multiple cache sizes, the width of the tag fields also can be varied. If a cache enable bit in processor service register is not set, loads and stores are treated like non-cacheable instructions by data cache controller 161.
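As a rough check on the figures above, the following sketch splits a byte address into tag, set index, and byte offset for a two-way set associative cache with 16-byte lines. It assumes 32-bit byte addresses, which the text does not state, and the function names are hypothetical. Under that assumption the default 8 Kbyte configuration yields a 20-bit tag, and the tag width changes with the configured cache size.

```python
LINE_BYTES = 16     # line size from the text
WAYS = 2            # two-way set associative
ADDRESS_BITS = 32   # assumption: 32-bit byte addresses

def split_address(addr, cache_bytes=8 * 1024):
    """Return (tag, set_index, byte_offset) for a byte address."""
    sets = cache_bytes // (LINE_BYTES * WAYS)
    offset_bits = LINE_BYTES.bit_length() - 1   # log2 of the line size
    index_bits = sets.bit_length() - 1          # log2 of the number of sets
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

def tag_bits(cache_bytes=8 * 1024):
    """Tag width in bits for a given cache size (nonzero sizes only)."""
    sets = cache_bytes // (LINE_BYTES * WAYS)
    return ADDRESS_BITS - (sets.bit_length() - 1) - (LINE_BYTES.bit_length() - 1)
```

With these assumptions `tag_bits()` is 20 for the default size, 19 for 16 Kbytes, and 23 for 1 Kbyte. The zero-size configuration, where the cache is effectively absent, is not modeled.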
  • a single sixteen-byte write back buffer is provided for writing back dirty cache lines which need to be replaced.
  • Data cache unit 160 can provide a maximum of four bytes on a read and a maximum of four bytes of data can be written into data cache 165 in a single cycle. Diagnostic reads and writes can be done on the caches.
  • data cache unit 160 includes a memory allocation accelerator 166.
  • When a new object is created, fields for the object are fetched from external memory, stored in data cache 165, and then each field is cleared to zero. This is a time consuming process that is eliminated by memory allocation accelerator 166.
  • When a new object is created, no fields are retrieved from external memory. Rather, memory allocation accelerator 166 simply stores a line of zeros in data cache 165 and marks that line of data cache 165 as dirty.
  • Memory allocation accelerator 166 is particularly advantageous with a write-back cache. Since memory allocation accelerator 166 eliminates the external memory access each time a new object is created, the performance of hardware processor 100 is enhanced.
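The difference between the two allocation paths can be illustrated with a toy cache that counts external memory reads. Everything here is a hypothetical model for illustration: the class `ToyDataCache`, its dictionary of lines, and the counter are assumptions, not the circuit described in the text.

```python
LINE_BYTES = 16  # line size from the text

class ToyDataCache:
    def __init__(self):
        self.lines = {}           # line address -> {"data": bytearray, "dirty": bool}
        self.external_reads = 0   # number of line fetches from external memory

    def allocate_with_fetch(self, line_addr):
        # Conventional path: fetch the line from external memory, then clear it.
        self.external_reads += 1
        self.lines[line_addr] = {"data": bytearray(LINE_BYTES), "dirty": True}

    def allocate_accelerated(self, line_addr):
        # Accelerated path: store a line of zeros and mark it dirty, with no fetch.
        self.lines[line_addr] = {"data": bytearray(LINE_BYTES), "dirty": True}
```

Both paths leave the same zeroed, dirty line in the cache; only the conventional path pays for an external read. Because the line is dirty, a write-back cache eventually writes the zeros out when the line is replaced.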
  • Floating point unit (FPU) 143 includes a microcode sequencer, input/output section with input/output registers, a floating point adder, i.e., an ALU, and a floating point multiply/divide unit.
  • the microcode sequencer controls the microcode flow and microcode branches.
  • the input/output section provides the control for input/output data transactions, and provides the input data loading and output data unloading registers. These registers also provide intermediate result storage.
  • the floating point adder-ALU includes the combinatorial logic used to perform the floating point adds, floating point subtracts, and conversion operations.
  • the floating point multiply/divide unit contains the hardware for performing multiply/divide and remainder.
  • Floating point unit 143 is organized as a microcoded engine with a 32-bit data path. This data path is often reused many times during the computation of the result. Double precision operations require approximately two to four times the number of cycles as single precision operations.
  • the floating point ready signal is asserted one cycle prior to the completion of a given floating point operation. This allows integer unit 142 to read the floating point unit output registers without any wasted interface cycles. Thus, output data is available for reading one cycle after the floating point ready signal is asserted.
  • Because the JAVA Virtual Machine Specification of Appendix I is hardware independent, the virtual machine instructions are not optimized for a particular general type of processor, e.g., a complex instruction set computer (CISC) processor, or a reduced instruction set computer (RISC) processor. In fact, some virtual machine instructions have a CISC nature and others a RISC nature. This dual nature complicates the operation and optimization of hardware processor 100.
  • the JAVA virtual machine specification defines opcode 171 for an instruction lookupswitch, which is a traditional switch statement.
  • the datastream to instruction cache unit 120 includes an opcode 171, identifying the N-way switch statement, that is followed by zero to three bytes of padding. The number of bytes of padding is selected so that the first operand byte begins at an address that is a multiple of four.
  • datastream is used generically to indicate information that is provided to a particular element, block, component, or unit.
  • a first operand in the first pair is the default offset for the switch statement that is used when the argument, referred to as an integer key, or alternatively, a current match value, of the switch statement is not equal to any of the values of the matches in the switch statement.
  • the second operand in the first pair defines the number of pairs that follow in the datastream.
  • Each subsequent operand pair in the datastream has a first operand that is a match value, and a second operand that is an offset. If the integer key is equal to one of the match values, the offset in the pair is added to the address of the switch statement to define the address to which execution branches. Conversely, if the integer key is unequal to any of the match values, the default offset in the first pair is added to the address of the switch statement to define the address to which execution branches. Direct execution of this virtual machine instruction requires many cycles.
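The datastream layout and branch selection described above can be sketched as a small decoder. This is an illustrative software model, not the hardware path: the function name and the idea of holding the method's code in a `bytes` object indexed from zero are assumptions. Operands are read as big-endian signed 32-bit values, as in the JAVA virtual machine specification.

```python
import struct

LOOKUPSWITCH = 171  # opcode from the text

def lookupswitch_target(code, pc, key):
    """Return the branch target for the lookupswitch at code[pc] and integer key."""
    assert code[pc] == LOOKUPSWITCH
    operands = (pc + 4) & ~3   # skip the opcode plus 0 to 3 padding bytes
    default, npairs = struct.unpack_from(">ii", code, operands)
    for i in range(npairs):
        match, offset = struct.unpack_from(">ii", code, operands + 8 + 8 * i)
        if match == key:
            return pc + offset   # offset is relative to the switch instruction
    return pc + default          # no match value equals the key
```

The linear scan over the pairs is what makes direct execution take many cycles: each pair costs at least one read and one comparison before the branch address is known.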
  • look-up switch accelerator 145 is included in hardware processor 100.
  • Look-up switch accelerator 145 includes an associative memory which stores information associated with one or more lookup switch statements. For each lookup switch statement, i.e., each instruction lookupswitch, this information includes a lookup switch identifier value, i.e., the program counter value associated with the lookup switch statement, a plurality of match values and a corresponding plurality of jump offset values.
  • Lookup switch accelerator 145 determines whether a current instruction received by hardware processor 100 corresponds to a lookup switch statement stored in the associative memory. Lookup switch accelerator 145 further determines whether a current match value associated with the current instruction corresponds with one of the match values stored in the associative memory. Lookup switch accelerator 145 accesses a jump offset value from the associative memory when the current instruction corresponds to a lookup switch statement stored in the memory and the current match value corresponds with one of the match values stored in the memory wherein the accessed jump offset value corresponds with the current match value.
  • Lookup switch accelerator 145 further includes circuitry for retrieving match and jump offset values associated with a current lookup switch statement when the associative memory does not already contain the match and jump offset values associated with the current lookup switch statement.
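The behavior of the associative memory can be approximated in software with a dictionary keyed by the program counter of the lookup switch statement. This is a sketch under assumptions: the class name, the `load_pairs` callback standing in for the retrieval circuitry, and the `loads` counter are all hypothetical.

```python
class LookupSwitchAccelerator:
    """Toy associative memory keyed by the program counter of a lookupswitch."""

    def __init__(self, load_pairs):
        self.load_pairs = load_pairs   # hypothetical callback: pc -> {match: offset}
        self.memory = {}               # pc -> {match value: jump offset}
        self.loads = 0                 # how many times pairs had to be retrieved

    def jump_offset(self, pc, key):
        if pc not in self.memory:      # statement not stored yet: retrieve its pairs
            self.memory[pc] = dict(self.load_pairs(pc))
            self.loads += 1
        # None means no stored match value equals the key (default offset applies).
        return self.memory[pc].get(key)
```

Once a statement's pairs are stored, later executions of the same statement resolve the jump offset with a single lookup instead of a scan over the datastream.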
  • Lookup switch accelerator 145 is described in more detail in U.S. Patent Application Serial No. 08/xxx,xxx, entitled “LOOK-UP SWITCH ACCELERATOR AND METHOD OF OPERATING SAME” naming Marc Tremblay and James Michael O'Connor as inventors, assigned to the assignee of this application, and filed on even date herewith with Attorney Docket No. SP2040, which is incorporated herein by reference in its entirety.
  • execution unit 140 accesses a method vector to retrieve one of the method pointers in the method vector, i.e., one level of indirection.
  • Execution unit 140 then uses the accessed method pointer to access a corresponding method, i.e., a second level of indirection.
  • each object is provided with a dedicated copy of each of the methods to be accessed by the object.
  • Execution unit 140 then accesses the methods using a single level of indirection. That is, each method is directly accessed by a pointer which is derived from the object. This eliminates a level of indirection which was previously introduced by the method pointers.
  • the operation of execution unit 140 can be accelerated. The acceleration of execution unit 140 by reducing the levels of indirection experienced by execution unit 140 is described in more detail in U.S. Patent Application Serial No.
  • TLB translation lookaside buffer
  • JAVA virtual machine specification defines an instruction putfield, opcode 181, that upon execution sets a field in an object and an instruction getfield, opcode 180, that upon execution fetches a field from an object.
  • the opcode is followed by an index byte one and an index byte two.
  • Operand stack 423 contains a reference to an object followed by a value for instruction putfield, but only a reference to an object for instruction getfield.
  • Index bytes one and two are used to generate an index into the constant pool of the current class.
  • the item in the constant pool at that index is a field reference to a class name and a field name.
  • the item is resolved to a field block pointer which has both the field width, in bytes, and the field offset, in bytes.
  • An optional getfield-putfield accelerator 146 in execution unit 140 stores the field block pointer for instruction getfield or instruction putfield in a cache, for use after the first invocation of the instruction, along with the index used to identify the item in the constant pool that was resolved into the field block pointer as a tag. Subsequently, execution unit 140 uses index bytes one and two to generate the index and supplies the index to getfield-putfield accelerator 146.
  • If the index matches one of the indexes stored as a tag, i.e., there is a hit, the field block pointer associated with that tag is retrieved and used by execution unit 140. Conversely, if a match is not found, execution unit 140 performs the operations described above.
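The hit and miss handling described above can be sketched as a small tag store. This is an illustrative model only: the class name, the `resolve` callback standing in for constant pool resolution, and the representation of a field block pointer as a `(width, offset)` tuple are assumptions. The index is formed from the two index bytes as a 16-bit value, as in the JAVA virtual machine specification.

```python
class GetfieldPutfieldAccelerator:
    """Toy tag store: constant pool index -> field block pointer."""

    def __init__(self, resolve):
        self.resolve = resolve   # hypothetical slow path that resolves a constant pool item
        self.entries = {}        # tag (index) -> field block pointer
        self.hits = 0

    def field_block_pointer(self, index_byte1, index_byte2):
        index = (index_byte1 << 8) | index_byte2   # 16-bit constant pool index
        if index in self.entries:                  # hit: reuse the stored pointer
            self.hits += 1
            return self.entries[index]
        pointer = self.resolve(index)              # miss: resolve, then store with index as tag
        self.entries[index] = pointer
        return pointer
```

Because the resolved pointer is kept in a separate store rather than patched into the instruction stream, the code itself is never modified.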
  • Getfield-putfield accelerator 146 is implemented without using self-modifying code that was used in one embodiment of the quick instruction translation described above.
  • getfield-putfield accelerator 146 includes an associative memory that has a first section that holds the indices that function as tags, and a second section that holds the field block pointers. When an index is applied through an input section to the first section of the associative memory, and there is a match with one of the stored indices, the field block pointer associated with the stored index that matched the input index is output from the second section of the associative memory.
Bounds Check Unit
  • Bounds check unit 147 (Fig. 1) in execution unit 140 is an optional hardware circuit that checks each access to an element of an array to determine whether the access is to a location within the array. When the access is to a location outside the array, bounds check unit 147 issues an active array bound exception signal to execution unit 140. In response to the active array bound exception signal, execution unit 140 initiates execution of an exception handler stored in microcode
  • bounds check unit 147 includes an associative memory element in which is stored an array identifier for an array, e.g., a program counter value, and a maximum value and a minimum value for the array.
  • the array identifier for that array is applied to the associative memory element, and assuming the array is represented in the associative memory element,
  • the stored minimum value is a first input signal to a first comparator element, sometimes called a comparison element
  • the stored maximum value is a first input signal to a second comparator element, sometimes also called a comparison element.
  • a second input signal to the first and second comparator elements is the value associated with the access of the array's element.
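The two-comparator arrangement can be modeled as follows. This is a sketch, not the circuit: the class and method names are hypothetical, and the comparison senses (an access is in bounds when minimum <= value <= maximum) are an assumption, since the text only states which values feed each comparator.

```python
class BoundsCheckUnit:
    """Toy model: associative memory of array bounds plus two comparators."""

    def __init__(self):
        self.memory = {}   # array identifier -> (minimum, maximum)

    def store(self, array_id, minimum, maximum):
        self.memory[array_id] = (minimum, maximum)

    def out_of_bounds(self, array_id, value):
        minimum, maximum = self.memory[array_id]
        below = value < minimum   # first comparator: stored minimum vs. access value
        above = value > maximum   # second comparator: stored maximum vs. access value
        return below or above     # models an active array bound exception signal
```

In hardware the two comparisons would proceed in parallel, so the check need not add a serial step to the array access.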
  • A more detailed description of one embodiment of bounds check unit 147 is provided in U.S. Patent Application Serial No. 08/xxx,xxx, entitled “PROCESSOR WITH ACCELERATED ARRAY ACCESS BOUNDS CHECKING” naming Marc Tremblay, James Michael O'Connor, and William N. Joy as inventors, assigned to the assignee of this application, and filed on even date herewith with Attorney Docket No. SP2041 which is incorporated herein by reference in its entirety.
  • the JAVA Virtual Machine Specification defines that certain instructions can cause certain exceptions.
  • the checks for these exception conditions are implemented, and a hardware/software mechanism for dealing with them is provided in hardware processor 100 by information in microcode ROM 149 and program counter and trap control logic 170.
  • the alternatives include having a trap vector style or a single trap target and pushing the trap type on the stack so that the dedicated trap handler routine determines the appropriate action.
  • Figure 5 illustrates several possible add-ons to hardware processor 100 to create a unique system. Circuits supporting any of the eight functions shown, i.e., NTSC encoder 501, MPEG 502, Ethernet controller 503, VIS 504, ISDN 505, I/O controller 506, ATM assembly/reassembly 507, and radio link 508 can be integrated into the same chip as hardware processor 100 of this invention.
  • FIG. 6 is a block diagram of one embodiment of a stack management unit 150.
  • Stack management unit 150 serves as a high speed buffer between stack 400 and hardware processor 100.
  • Hardware processor 100 accesses stack management unit 150 as if stack management unit 150 were stack 400.
  • Stack management unit 150 automatically transfers data between stack management unit 150 and stack 400 as necessary to improve the throughput of data between stack 400 and hardware processor 100.
  • data cache unit 160 retrieves the requested data word and places the requested data word at the top of stack cache 155.
  • Stack management unit 150 contains a stack cache memory circuit 610.
  • Stack cache memory circuit 610 is typically a fast memory device such as a register file or SRAM; however, slower memory devices such as DRAM can also be used.
  • access to stack cache memory circuit 610 is controlled by stack control unit 152.
  • a write port 630 allows hardware processor 100 to write data on data lines 635 to stack cache memory circuit 610.
  • Read port 640 and read port 650 allow hardware processor 100 to read data from stack cache memory circuit 610 on data lines 645 and 655, respectively. Two read ports are provided to increase throughput since many operations of stack-based computing systems require two operands from stack 400. Other embodiments of stack cache 155 may provide more or fewer read and write ports.
  • dribble manager unit 151 controls the transfer of data between stack 400
  • Dribble manager unit 151 includes a fill control unit 694 and a spill control unit 698. In some embodiments of dribble manager unit 151, fill control unit 694 and spill control unit 698 function independently. Fill control unit 694 determines if a fill condition exists. If the fill condition exists, fill control unit 694 transfers data words from stack 400 to stack cache memory circuit 610 on data lines 675 through a write port 670. Spill control unit 698 determines if a spill condition exists.
  • spill control unit 698 transfers data words from stack cache memory circuit 610 to stack 400 through read port 680 on data lines 685.
  • Write port 670 and read port 680 allow transfers between stack 400 and stack cache memory circuit 610 to occur simultaneously with reads and writes controlled by stack control unit 152. If contention for read and write ports of stack cache memory circuit 610 is not important, dribble manager unit 151 can share read and write ports with stack control unit 152.
  • stack management unit 150 is described in the context of buffering stack 400 for hardware processor 100, stack management unit 150 can perform caching for any stack-based computing system.
  • the details of hardware processor 100 are provided only as an example of one possible stack-based computing system for use with the present invention.
  • Figure 7 shows a conceptual model of the memory architecture of stack cache memory circuit 610 for one embodiment of stack cache 155.
  • stack cache memory circuit 610 is a register file organized in a circular buffer memory architecture capable of holding 64 data words. Other embodiments may contain a different number of data words.
  • the circular memory architecture causes data words in excess of the capacity of stack cache memory circuit 610 to be written to previously used registers. If stack cache memory circuit 610 uses a different memory device, such as an SRAM, different registers would correspond to different memory locations.
  • One technique to address registers in a circular buffer is to use pointers containing modulo stack cache size (modulo-SCS) addresses to the various registers of stack cache memory circuit 610.
  • modulo-N operations have the results of the standard operation mapped to a number between 0 and
  • One embodiment of the pointer addresses of the registers of stack cache memory circuit 610 is shown in Figure 7 as numbered 0-63 along the outer edge of stack cache memory circuit 610.
  • If 70 data words (numbered 1 to 70) are written to stack cache memory circuit 610 when stack cache memory circuit 610 is empty, data words 1 to 64 are written to registers 0 to 63, respectively, and data words 65 to 70 are written subsequently to registers 0 to 5.
  • Prior to writing data words 65 to 70, dribble manager unit 151, as described below, transfers data words 1 to 6 which were in registers 0 to 5 to stack 400.
  • As data words 70 to 65 are read out of stack cache memory circuit 610, data words 1 to 6 can be retrieved from stack 400 and placed in memory locations 0 to 5.
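The 70-word example above can be reproduced with a short simulation of modulo-SCS addressing. This is an illustrative sketch: the function name and the use of a Python list to stand in for stack 400 are assumptions, and the spill is modeled as happening immediately before a register is reused rather than in the background.

```python
SCS = 64  # stack cache size in registers, as in the embodiment above

def write_words(count):
    """Write data words 1..count into an empty circular register file."""
    registers = [None] * SCS
    spilled = []                           # stands in for stack 400 in memory
    for word in range(1, count + 1):
        r = (word - 1) % SCS               # modulo-SCS register address
        if registers[r] is not None:
            spilled.append(registers[r])   # save the old word before the register is reused
        registers[r] = word
    return registers, spilled
```

Calling `write_words(70)` leaves data words 65 to 70 in registers 0 to 5 and data words 1 to 6 in the spilled list, matching the description.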
  • a pointer OPTOP contains the location of the top of stack 400, i.e. the top memory location.
  • pointer OPTOP is a programmable register in execution unit 140.
  • Other embodiments of stack management unit 150 maintain pointer OPTOP in stack control unit 152. Since pointer OPTOP is often increased by one, decreased by one, or changed by a specific amount, pointer OPTOP, in one embodiment, is a programmable up/down counter.
  • pointer OPTOP indicates the register of stack cache memory circuit 610 containing the most recently written data word in stack cache memory circuit 610, i.e. pointer OPTOP points to the register containing the most recently written data word also called the top register.
  • Some embodiments of stack management unit 150 also contain a pointer OPTOP1 (not shown) which points to the register preceding the register pointed to by pointer OPTOP. Pointer OPTOP1 can improve the performance of stack management unit 150 since many operations in hardware processor 100 require two data words from stack management unit 150.
  • Pointer OPTOP and pointer OPTOP1 are incremented whenever a new data word is written to stack cache 155.
  • Pointer OPTOP and pointer OPTOP1 are decremented whenever a stacked data word, i.e. a data word already in stack 400, is popped off of stack cache 155. Since some embodiments of hardware processor 100 may add or remove multiple data words simultaneously, pointer OPTOP and OPTOP1 are implemented, in one embodiment, as programmable registers so that new values can be written into the registers rather than requiring multiple increment or decrement cycles.
  • pointer OPTOP1 may also be implemented using a modulo SCS subtractor which modulo-SCS subtracts one from pointer OPTOP.
  • Some embodiments of stack cache 155 may also include pointer OPTOP2 or pointer OPTOP3.
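The relationship between pointer OPTOP and pointer OPTOP1 can be sketched with modulo-SCS arithmetic. The class below is a hypothetical model: it derives OPTOP1 from OPTOP with a modulo-SCS subtraction, which is only one of the implementation options mentioned above, and it lets a push or pop move the pointer by several words at once.

```python
SCS = 64  # stack cache size in registers

class TopPointers:
    def __init__(self, optop=0):
        self.optop = optop % SCS

    @property
    def optop1(self):
        # Modulo-SCS subtractor: the register preceding the top register.
        return (self.optop - 1) % SCS

    def push(self, n=1):
        # Writing n new data words advances the top pointer by n.
        self.optop = (self.optop + n) % SCS

    def pop(self, n=1):
        # Popping n data words moves the top pointer back by n.
        self.optop = (self.optop - n) % SCS
```

With OPTOP at register 0, OPTOP1 wraps to register 63, which is the case that plain subtraction would get wrong.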
  • Since data words are stored in stack cache memory circuit 610 circularly, the bottom of stack cache memory circuit 610 can fluctuate. Therefore, most embodiments of stack cache memory circuit 610 include a pointer CACHE_BOTTOM to indicate the bottom memory location of stack cache memory circuit 610.
  • Pointer CACHE_BOTTOM is typically maintained by dribble manager unit 151. The process to increment or decrement pointer CACHE_BOTTOM varies with the specific embodiment of stack management unit 150.
  • Pointer CACHE_BOTTOM is typically implemented as a programmable up/down counter.
  • Some embodiments of stack management unit 150 also include other pointers, such as pointer VARS, which points to a memory location of a data word that is often accessed.
  • pointer VARS is stored in a programmable register in execution unit 140. If stack cache 155 is organized using sequential addressing, pointer VARS1 may also be implemented using a modulo-SCS adder which modulo-SCS adds one to pointer VARS.
  • To determine which data words to transfer between stack cache memory circuit 610 and stack 400, stack management unit 150 typically tags, i.e. tracks, the valid data words and the data words which are stored in both stack cache memory circuit 610 and stack 400.
  • Figure 8 illustrates one tagging scheme used in some embodiments of stack management unit 150. Specifically, Figure 8 shows a register 810 from stack cache memory circuit 610. The actual data word is stored in data section 812. A valid bit 814 and a saved bit 816 are used to track the status of register 810. If valid bit 814 is at a valid logic state, typically logic high, data section 812 contains a valid data word. If valid bit 814 is at an invalid logic state, typically logic low, data section 812 does not contain a valid data word. If saved bit 816 is at a saved logic state, typically logic high, the data word contained in data section 812 is also stored in stack 400.
  • If saved bit 816 is at an unsaved logic state, typically logic low, the data word contained in data section 812 is not stored in stack 400.
  • valid bit 814 of each register is set to the invalid logic state and saved bit 816 of each register is set to the unsaved logic state.
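The tagging scheme of Figure 8 can be modeled with two boolean flags per register. This is a minimal sketch under assumptions: the class and function names are hypothetical, logic high is represented by `True`, and stack 400 is represented by a dictionary.

```python
from dataclasses import dataclass

@dataclass
class StackCacheRegister:
    data: int = 0
    valid: bool = False   # valid bit 814, initially in the invalid state
    saved: bool = False   # saved bit 816, initially in the unsaved state

def write_word(reg, value):
    # A newly written word is valid but not yet stored in stack 400.
    reg.data, reg.valid, reg.saved = value, True, False

def spill_word(reg, memory_stack, address):
    # Copy the word to the memory stack, then mark the register as saved.
    memory_stack[address] = reg.data
    reg.saved = True

def is_used(reg):
    # A register counts as used when it holds a valid but unsaved data word.
    return reg.valid and not reg.saved
```

A register that is invalid, or valid and saved, can be overwritten without losing data, which is why only the valid and unsaved case needs a spill.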
  • stack cache 155 is generally much smaller than the memory address space of hardware processor 100
  • the pointers used to access stack cache memory circuit 610 are generally much smaller than general memory addresses.
  • the specific technique used to map stack cache 155 into the memory space of hardware processor 100 can vary.
  • the pointers used to access stack cache memory circuit 610 are only the lower bits of general memory pointers, i.e. the least significant bits. For example, if stack cache memory circuit 610 comprises 64 registers, pointers OPTOP, VARS, and CACHE_BOTTOM need only be six bits long. If hardware processor 100 has a 12-bit address space, pointers OPTOP, VARS, and CACHE_BOTTOM could be the lower six bits of a general memory pointer.
  • stack cache memory circuit 610 is mapped to a specific segment of the address space having a unique upper six bit combination.
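The 12-bit example above can be worked through in code. This is a sketch with one invented value: the upper six bit pattern `SEGMENT` is hypothetical and chosen only for illustration, as are the function names.

```python
POINTER_BITS = 6      # 64 registers need six pointer bits
SEGMENT = 0b101010    # hypothetical upper six bits reserved for the stack cache

def to_cache_pointer(address):
    # Keep only the least significant six bits of a 12-bit address.
    return address & ((1 << POINTER_BITS) - 1)

def in_stack_cache_segment(address):
    # The upper six bits select the segment mapped to the stack cache.
    return (address >> POINTER_BITS) == SEGMENT

def to_address(pointer):
    # Rebuild a 12-bit address from the segment and a six-bit cache pointer.
    return (SEGMENT << POINTER_BITS) | pointer
```

Under this mapping, address `0b101010_000101` falls in the stack cache segment and selects register 5.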
  • Some embodiments of stack management unit 150 may be used with a purely stack-based computing system so that there is not a memory address space for the system. In this situation, the pointers for accessing stack cache 155 are only internal to stack management unit 150.
  • stack management unit 150 can improve data accesses of hardware processor 100 while only caching the top portion of stack 400.
  • When hardware processor 100 pushes more data words to stack management unit 150 than stack cache memory circuit 610 is able to store, the data words near the bottom of stack cache memory circuit 610 are transferred to stack 400.
  • When hardware processor 100 pops data words out of stack cache 155, data words from stack 400 are copied under the bottom of stack cache memory circuit 610, and pointer CACHE_BOTTOM is decremented to point to the new bottom of stack cache memory circuit 610.
  • dribble manager unit 151 should transfer data from stack cache memory circuit 610 to stack 400, i.e. a spill operation, as hardware processor 100 fills stack cache memory circuit 610. Conversely, dribble manager unit 151 should copy data from stack 400 to stack cache memory circuit 610, i.e. a fill operation, as hardware processor 100 empties stack cache memory circuit 610.
  • Figure 9 shows one embodiment of dribble manager unit 151 in which decisions on transferring data from stack cache memory circuit 610 to stack 400, i.e. spilling data, are based on the number of free registers in stack cache memory circuit 610.
  • Free registers include registers without valid data as well as registers containing data already stored in stack 400, i.e. registers with saved bit 816 set to the saved logic state.
  • Decisions on transferring data from stack 400 to stack cache memory circuit 610, i.e. filling data, are based on the number of used registers.
  • A used register contains a valid but unsaved data word in stack cache memory circuit 610.
  • dribble manager unit 151 further includes a stack cache status circuit 910 and a cache bottom register 920, which can be a programmable up/down counter.
  • Stack cache status circuit 910 receives pointer CACHE_BOTTOM from cache bottom register 920 and pointer OPTOP to determine the number of free registers FREE and the number of used registers USED.
  • USED = (OPTOP - CACHE_BOTTOM + 1) MOD SCS.
  • stack cache status circuit 910 can be implemented with a modulo SCS adder/subtractor.
  • the number of used registers USED and the number of free registers FREE can also be generated using programmable up/down counters. For example, a used register can be incremented whenever a data word is added to stack cache 155 and decremented whenever a data word is removed from stack cache 155. Specifically, if pointer OPTOP is modulo-SCS incremented by some amount, the used register is incremented by the same amount. If pointer OPTOP is modulo-SCS decremented by some amount, the used register is decremented by the same amount.
  • If pointer CACHE_BOTTOM is modulo-SCS incremented by some amount, the used register is decremented by the same amount. If pointer CACHE_BOTTOM is modulo-SCS decremented by some amount, the used register is incremented by the same amount.
  • the number of free registers FREE can be generated by subtracting the number of used registers USED from the total number of registers.
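  • The status calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the circuit itself: the function name `stack_cache_status` and the use of plain integers for pointers are assumptions, and SCS is taken to be the total number of registers in the stack cache.

```python
def stack_cache_status(optop, cache_bottom, scs):
    """Return (USED, FREE) for a circular stack cache of scs registers."""
    # Python's % yields a value in [0, scs) for positive scs, which
    # matches modulo-SCS arithmetic on pointers that may have wrapped.
    used = (optop - cache_bottom + 1) % scs
    # FREE is the total number of registers minus USED.
    free = scs - used
    return used, free


print(stack_cache_status(5, 2, 64))    # OPTOP above CACHE_BOTTOM
print(stack_cache_status(1, 62, 64))   # OPTOP has wrapped past the end
```

Both calls describe four used registers; the second shows that the modulo keeps the count correct after OPTOP wraps around.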
  • Spill control unit 694 (Figures 6 and 9) includes a cache high threshold register 930 and a comparator 940.
  • Comparator 940 compares the value in cache high threshold register 930 to the number of free registers FREE. If the number of free registers FREE is less than the value in cache high threshold register 930, comparator 940 drives a spill signal SPILL to a spill logic level, typically logic high, to indicate that the spill condition exists and one or more data words should be transferred from stack cache memory circuit 610 to stack 400, i.e. a spill operation should be performed. The spill operation is described in more detail below.
  • cache high threshold register 930 is programmable by hardware processor 100.
  • Fill control unit 698 includes a cache low threshold register 950 and a comparator 960.
  • Comparator 960 compares the value in cache low threshold register 950 to the number of used registers USED. If the number of used registers is less than the value in cache low threshold register 950, comparator 960 drives a fill signal FILL to a fill logic level, typically logic high, to indicate that the fill condition exists and one or more data words should be transferred from stack 400 to stack cache memory circuit 610, i.e. a fill operation should be performed. The fill operation is described in more detail below.
  • cache low threshold register 950 is programmable by hardware processor 100.
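  • The two comparators described above reduce to a pair of less-than tests. The following Python sketch assumes hypothetical names (`spill_fill_signals`, boolean outputs standing in for the SPILL and FILL logic levels) and is meant only to make the threshold conditions concrete.

```python
def spill_fill_signals(free, used, cache_high_threshold, cache_low_threshold):
    """Model comparators 940 and 960 as boolean conditions."""
    # Comparator 940: spill when too few free registers remain.
    spill = free < cache_high_threshold
    # Comparator 960: fill when too few used registers remain.
    fill = used < cache_low_threshold
    return spill, fill


# A nearly full 64-entry cache asks for a spill, a nearly empty one for a fill.
print(spill_fill_signals(free=2, used=62, cache_high_threshold=4, cache_low_threshold=8))
print(spill_fill_signals(free=60, used=4, cache_high_threshold=4, cache_low_threshold=8))
```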
  • a single cache threshold register can be used.
  • Fill control unit 698 can be modified to use the number of free registers FREE to drive signal FILL to the fill logic level if the number of free registers is greater than the value in cache low threshold register 950, with a proper modification of the value in cache low threshold register 950.
  • spill control unit 694 can be modified to use the number of used registers.
  • Figure 10A shows another embodiment of dribble manager unit 151, which uses a high-water mark/low-water mark heuristic to determine when a spill condition or a fill condition exists.
  • Spill control unit 694 includes a high water mark register 1010 implemented as a programmable up/down counter.
  • a comparator 1020 in spill control unit 694 compares the value in high water mark register 1010, i.e. the high water mark, with pointer OPTOP. If pointer OPTOP is greater than the high water mark, comparator 1020 drives spill signal SPILL to the spill logic level to indicate a spill operation should be performed.
  • Fill control unit 698 includes a low water mark register 1030 implemented as a programmable up/down counter.
  • a comparator 1040 in fill control unit 698 compares the value in low water mark register 1030, i.e. the low water mark, with pointer OPTOP. If pointer OPTOP is less than the low water mark, comparator 1040 drives fill signal FILL to the fill logic level to indicate a fill operation should be performed.
  • the low water mark register is modulo-SCS incremented and modulo-SCS decremented whenever pointer CACHE_BOTTOM is modulo-SCS incremented or modulo-SCS decremented, respectively.
  • Figure 10B shows an alternative circuit to generate the high water mark and low water mark.
  • Cache high threshold register 930, typically implemented as a programmable register, contains the number of free registers which should be maintained in stack cache memory circuit 610.
  • the high water mark is then calculated by modulo-SCS subtractor 1050 by modulo-SCS subtracting the value in cache high threshold register 930 from pointer CACHE_BOTTOM stored in cache bottom register 920.
  • the low water mark is calculated by doing a modulo-SCS addition.
  • cache low threshold register 950 is programmed to contain the minimum number of used data registers desired to be maintained in stack cache memory circuit 610.
  • the low water mark is then calculated by modulo-SCS adder 1060 by modulo-SCS adding the value in cache low threshold register 950 with pointer CACHE_BOTTOM stored in cache bottom register 920.
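  • A minimal sketch of the Figure 10B arithmetic follows, assuming pointers are plain integers in the range 0 to SCS-1. The function name `water_marks` is hypothetical; the subtraction and addition stand in for modulo-SCS subtractor 1050 and modulo-SCS adder 1060.

```python
def water_marks(cache_bottom, cache_high_threshold, cache_low_threshold, scs):
    """Derive (high_water_mark, low_water_mark) from CACHE_BOTTOM."""
    # Subtractor 1050: CACHE_BOTTOM minus the number of free registers to keep.
    high_water_mark = (cache_bottom - cache_high_threshold) % scs
    # Adder 1060: CACHE_BOTTOM plus the minimum number of used registers.
    low_water_mark = (cache_bottom + cache_low_threshold) % scs
    return high_water_mark, low_water_mark


print(water_marks(cache_bottom=10, cache_high_threshold=4, cache_low_threshold=8, scs=64))
print(water_marks(cache_bottom=2, cache_high_threshold=4, cache_low_threshold=8, scs=64))
```

In the second call the high water mark wraps below zero and lands near the top of the circular buffer, which is the case the modulo arithmetic exists to handle.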
  • a spill operation is the transfer of one or more data words from stack cache memory circuit 610 to stack 400.
  • the transfer occurs through data cache unit 160.
  • the specific interface between stack management unit 150 and data cache unit 160 can vary.
  • stack management unit 150, and more specifically dribble manager unit 151, sends the data word located at the bottom of stack cache 155, as indicated by pointer CACHE_BOTTOM, from read port 680 to data cache unit 160.
  • pointer CACHE_BOTTOM is also provided to data cache unit 160 so that data cache unit 160 can address the data word appropriately.
  • the saved bit of the register indicated by pointer CACHE_BOTTOM is set to the saved logic level.
  • pointer CACHE_BOTTOM is modulo-SCS incremented by one.
  • Other registers as described above may also be modulo-SCS incremented by one. For example, high water mark register 1010 (Figure 10A) and low water mark register 1030 would be modulo-SCS incremented by one.
  • Some embodiments of dribble manager unit 151 transfer multiple words for each spill operation.
  • pointer CACHE_BOTTOM is modulo-SCS incremented by the number of words transferred to stack 400.
  • If the saved bit of the data register pointed to by pointer CACHE_BOTTOM is at the saved logic level, the data word in that data register is already stored in stack 400. Therefore, the data word in that data register does not need to be copied to stack 400.
  • pointer CACHE_BOTTOM is still modulo-SCS incremented by one.
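  • The single-word spill sequence can be modeled in software as below. This is a sketch under stated assumptions, not the hardware interface: the `Register` and `StackCache` classes, the `memory` dictionary standing in for stack 400, and the `bottom_addr` field (a hypothetical absolute address that moves in step with CACHE_BOTTOM) are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Register:
    data: int = 0
    valid: bool = False
    saved: bool = False


@dataclass
class StackCache:
    scs: int
    cache_bottom: int = 0
    bottom_addr: int = 0                       # hypothetical address in stack 400
    regs: list = field(default_factory=list)
    memory: dict = field(default_factory=dict)  # stands in for stack 400

    def __post_init__(self):
        if not self.regs:
            self.regs = [Register() for _ in range(self.scs)]


def spill_one(cache):
    """Spill the bottom word; return True if it was actually copied."""
    reg = cache.regs[cache.cache_bottom]
    copied = False
    if not reg.saved:
        # Word is not yet in stack 400: copy it and mark it saved.
        cache.memory[cache.bottom_addr] = reg.data
        reg.saved = True
        copied = True
    # CACHE_BOTTOM advances even when the copy was skipped.
    cache.cache_bottom = (cache.cache_bottom + 1) % cache.scs
    cache.bottom_addr += 1
    return copied


cache = StackCache(scs=4)
cache.regs[0] = Register(data=11, valid=True, saved=False)
cache.regs[1] = Register(data=22, valid=True, saved=True)
print(spill_one(cache), spill_one(cache), cache.memory, cache.cache_bottom)
```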
  • a fill operation transfers data words from stack 400 to stack cache memory circuit 610.
  • the transfer occurs through data cache unit 160.
  • the specific interface between stack management unit 150 and data cache unit 160 can vary.
  • stack management unit 150, and more specifically dribble manager unit 151, determines whether the data register preceding the data register pointed to by pointer CACHE_BOTTOM is free, i.e. either the saved bit is in the saved logic state or the valid bit is in the invalid logic state. If the data register preceding the data register pointed to by pointer CACHE_BOTTOM is free, dribble manager unit 151 requests a data word from stack 400 by sending a request with the value of pointer CACHE_BOTTOM modulo-SCS minus one.
  • When the data word is received from data cache unit 160, pointer CACHE_BOTTOM is modulo-SCS decremented by one and the received data word is written to the data register pointed to by pointer CACHE_BOTTOM through write port 670. Other registers as described above may also be modulo-SCS decremented. The saved bit and valid bit of the register pointed to by pointer CACHE_BOTTOM are set to the saved logic state and valid logic state, respectively. Some embodiments of dribble manager unit 151 transfer multiple words for each fill operation. For these embodiments, pointer CACHE_BOTTOM is modulo-SCS decremented by the number of words transferred from stack 400.
  • pointer CACHE_BOTTOM is still modulo-SCS decremented by one.
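  • A single-word fill can be sketched in the same spirit. The representation is an assumption made for brevity: each register is a `(data, valid, saved)` tuple, `memory` is a dictionary standing in for stack 400, and `bottom_addr` is a hypothetical absolute address that tracks CACHE_BOTTOM.

```python
def fill_one(regs, cache_bottom, bottom_addr, memory):
    """Try to fill one word below CACHE_BOTTOM.

    Returns the (possibly updated) pair (cache_bottom, bottom_addr).
    """
    scs = len(regs)
    prev = (cache_bottom - 1) % scs
    _, valid, saved = regs[prev]
    # The preceding register is free if it is saved or holds no valid data.
    if valid and not saved:
        return cache_bottom, bottom_addr
    # Request the word just below the current bottom of the cached region.
    word = memory[bottom_addr - 1]
    # Write it, then mark the register valid and saved.
    regs[prev] = (word, True, True)
    return prev, bottom_addr - 1


regs = [(0, False, False)] * 4
memory = {9: 77}
print(fill_one(regs, cache_bottom=0, bottom_addr=10, memory=memory), regs[3])
```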
  • In stack cache 155, hardware processor 100 accesses stack cache memory circuit 610 (Figure 6) through write port 630, read port 640, and read port 650.
  • Stack control unit 152 generates pointers for write port 630, read port 640, and read port 650 based on the requests of hardware processor 100.
  • Figure 11 shows a circuit to generate pointers for a typical operation which reads two data words from stack cache 155 and writes one data word to stack cache 155. The most common stack manipulation for a stack-based computing system is to pop the top two data words off of the stack and to push a data word onto the top of the stack.
  • the circuit of Figure 11 is configured to be able to provide read pointers to the value of pointer OPTOP and the value of pointer OPTOP modulo-SCS minus one, and a write pointer to the current value of OPTOP modulo-SCS minus one.
  • Multiplexer (MUX) 1110 drives a read pointer RP1 for read port 640.
  • a select line RS1 controlled by hardware processor 100 determines whether multiplexer 1110 drives the same value as pointer OPTOP or a read address R_ADDR1 as provided by hardware processor 100.
  • Multiplexer 1120 provides a read pointer RP2 for read port 650.
  • Modulo adder 1140 modulo-SCS adds negative one to the value of pointer OPTOP and drives the resulting sum to multiplexer 1120.
  • a select line RS2 controlled by hardware processor 100 determines whether multiplexer 1120 drives the value from modulo adder 1140 or a read address R_ADDR2 as provided by hardware processor 100.
  • Multiplexer 1130 provides a write pointer WP for write port 630.
  • a modulo adder 1150 modulo-SCS adds one to the value of pointer OPTOP and drives the resulting sum to multiplexer 1130.
  • Select lines WS controlled by hardware processor 100 determine whether multiplexer 1130 drives the value from modulo-SCS adder 1140, the value from modulo-SCS adder 1150, or a write address W_ADDR as provided by hardware processor 100.
  • Figure 12 shows a circuit that generates a read pointer R for read port 640 or read port 650 in embodiments that allow access to stack cache memory circuit 610 using pointer VARS.
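  • The three multiplexers of Figure 11 can be modeled as simple selections. In this sketch the numeric encodings of the select lines (0 for the OPTOP-derived input, other values for the processor-supplied address) are assumptions, as is the function name `figure11_pointers`.

```python
def figure11_pointers(optop, scs, rs1, rs2, ws, r_addr1=0, r_addr2=0, w_addr=0):
    """Return (RP1, RP2, WP) as driven by multiplexers 1110, 1120 and 1130."""
    minus_one = (optop - 1) % scs   # modulo adder 1140
    plus_one = (optop + 1) % scs    # modulo adder 1150
    rp1 = optop if rs1 == 0 else r_addr1          # multiplexer 1110
    rp2 = minus_one if rs2 == 0 else r_addr2      # multiplexer 1120
    wp = {0: minus_one, 1: plus_one, 2: w_addr}[ws]  # multiplexer 1130
    return rp1, rp2, wp


# Typical two-operand instruction: read OPTOP and OPTOP-1, write to OPTOP-1.
print(figure11_pointers(optop=0, scs=64, rs1=0, rs2=0, ws=0))
```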
  • Multiplexer 1260 drives read pointer R to one of several input values received on input ports 1261-1267 as determined by selection signals RS. Selection signals RS are controlled by hardware processor 100.
  • the value of pointer OPTOP is driven to input port 1261.
  • Modulo-SCS adder 1210 drives the modulo-SCS sum of the value of pointer OPTOP with negative one to input port 1262.
  • Modulo-SCS adder 1220 drives the modulo-SCS sum of the value of pointer OPTOP with negative two to input port 1263.
  • the value of pointer VARS is driven to input port 1264.
  • Modulo-SCS adder 1230 drives the modulo-SCS sum of the value of pointer VARS with one to input port 1265.
  • Modulo-SCS adder 1240 drives the modulo-SCS sum of the value of pointer VARS with two to input port 1266.
  • Modulo-SCS adder 1250 drives the modulo-SCS sum of the value of pointer VARS with three to input port 1267.
  • Other embodiments may provide other values to the input ports of multiplexer 1260.
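  • The seven-input multiplexer of Figure 12 can be sketched as an indexed selection. The mapping of selection value 0 through 6 to input ports 1261 through 1267 is an assumption for illustration, as is the function name `figure12_read_pointer`.

```python
def figure12_read_pointer(optop, vars_ptr, scs, rs):
    """Return read pointer R chosen by selection signals RS (0..6)."""
    inputs = [
        optop,                  # port 1261
        (optop - 1) % scs,      # port 1262
        (optop - 2) % scs,      # port 1263
        vars_ptr,               # port 1264
        (vars_ptr + 1) % scs,   # port 1265
        (vars_ptr + 2) % scs,   # port 1266
        (vars_ptr + 3) % scs,   # port 1267
    ]
    return inputs[rs]


print(figure12_read_pointer(optop=1, vars_ptr=62, scs=64, rs=2))
print(figure12_read_pointer(optop=1, vars_ptr=62, scs=64, rs=6))
```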
  • a dribbling management unit can efficiently control transfers between the stack cache and the stack. Specifically, the dribbling management unit is able to transfer data out of the stack cache to make room for additional data as necessary and transfer data into the stack cache as room becomes available transparently to the stack-based computing system using the stack management unit.
  • This BETA quality release and related documentation are protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this release or related documentation may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any.
  • Portions of this product may be derived from the UNIX ® and Berkeley 4.3 BSD systems, licensed from UNIX System Laboratories, Inc. and the University of California, respectively. Third-party font software in this release is protected by copyright and licensed from Sun's Font Suppliers.
  • TRADEMARKS Sun, Sun Microsystems, Sun Microsystems Computer Corporation, the Sun logo, the Sun Microsystems Computer Corporation logo, WebRunner, JAVA, FirstPerson and the FirstPerson logo and agent are trademarks or registered trademarks of Sun Microsystems, Inc.
  • the "Duke" character is a trademark of Sun Microsystems, Inc. and Copyright (c) 1992-1995 Sun Microsystems, Inc. All Rights Reserved.
  • UNIX ® is a registered trademark in the United States and other countries, exclusively licensed through X/Open Company, Ltd.
  • OPEN LOOK is a registered trademark of Novell, Inc. All other product names mentioned herein are the trademarks of their respective owners.
  • SPARC trademarks including the SCD Compliant logo, are trademarks or registered trademarks of SPARC International, Inc.
  • SPARCstation, SPARCserver, SPARCengine, SPARCworks, and SPARCompiler are licensed exclusively to Sun Microsystems, Inc. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
  • the OPEN LOOK ® and SunTM Graphical User Interfaces were developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise comply with Sun's written license agreements.
  • X Window System is a trademark and product of the Massachusetts Institute of Technology.
  • This document describes version 1.0 of the JAVA Virtual Machine and its instruction set. We have written this document to act as a specification for both compiler writers, who wish to target the machine, and as a specification for others who may wish to implement a compliant JAVA Virtual Machine.
  • the JAVA Virtual Machine is an imaginary machine that is implemented by emulating it in software on a real machine.
  • Code for the JAVA Virtual Machine is stored in .class files, each of which contains the code for at most one public class.
  • Appendix A contains some instructions generated internally by Sun's implementation of the JAVA Virtual Machine. While not strictly part of the specification, we describe these here so that this specification can serve as a reference for our implementation. As more implementations of the JAVA Virtual Machine become available, we may remove Appendix A from future releases.
  • Sun will license the JAVA Virtual Machine trademark and logo for use with compliant implementations of this specification. If you are considering constructing your own implementation of the JAVA Virtual Machine, please contact us, at the email address below, so that we can work together to ensure 100% compatibility of your implementation.
  • the virtual machine data types include the basic data types of the JAVA language:
      byte    // 1-byte signed 2's complement integer
      short   // 2-byte signed 2's complement integer
      int     // 4-byte signed 2's complement integer
      long    // 8-byte signed 2's complement integer
      float   // 4-byte IEEE 754 single-precision float
      double  // 8-byte IEEE 754 double-precision float
      char    // 2-byte unsigned Unicode character
  • Nearly all JAVA type checking is done at compile time. Data of the primitive types shown above need not be tagged by the hardware to allow execution of JAVA.
  • the bytecodes that operate on primitive values indicate the types of the operands so that, for example, the iadd, ladd, fadd, and dadd instructions each add two numbers, whose types are int, long, float, and double, respectively.
  • the virtual machine doesn't have separate instructions for boolean types. Instead, integer instructions, including integer returns, are used to operate on boolean values; byte arrays are used for arrays of boolean.
  • the virtual machine specifies that floating point be done in IEEE 754 format, with support for gradual underflow. Older computer architectures that do not have support for IEEE format may run JAVA numeric programs very slowly.
  • JAVA arrays are treated as objects. This specification does not require any particular internal structure for objects.
  • an object reference is to a handle, which is a pair of pointers: one to a method table for the object, and the other to the data allocated for the object.
  • Other implementations may use inline caching, rather than method table dispatch; such methods are likely to be faster on hardware that is emerging between now and the year 2000.
  • Programs represented by JAVA Virtual Machine bytecodes are expected to maintain proper type discipline and an implementation may refuse to execute a bytecode program that appears to violate such type discipline.
  • the virtual machine is executing the code of a single method, and the pc register contains the address of the next bytecode to be executed.
  • Each method has memory space allocated for it to hold: a set of local variables, referenced by a vars register; an operand stack, referenced by an optop register; and an execution environment structure, referenced by a frame register. All of this space can be allocated at once, since the size of the local variables and operand stack are known at compile time, and the size of the execution environment structure is well-known to the interpreter. All of these registers are 32 bits wide.
  • Each JAVA method uses a fixed-sized set of local variables. They are addressed as word offsets from the vars register. Local variables are all 32 bits wide.
  • Instructions are provided to load the values of local variables onto the operand stack and store values from the operand stack into local variables.
  • the machine instructions all take operands from an operand stack, operate on them, and return results to the stack.
  • the operand stack is 32 bits wide. It is used to pass parameters to methods and receive method results, as well as to supply parameters for operations and save operation results.
  • execution of instruction iadd adds two integers together. It expects that the two integers are the top two words on the operand stack, and were pushed there by previous instructions. Both integers are popped from the stack, added, and their sum pushed back onto the operand stack.
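  • The behavior of iadd on the operand stack can be illustrated with a Python list standing in for the stack. The helper names are hypothetical; the wrap to 32 bits reflects the 4-byte signed 2's complement int type, and is stated here as an assumption about overflow handling rather than something this passage spells out.

```python
def to_int32(x):
    """Wrap an integer to a signed 32-bit two's complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def iadd(stack):
    """Pop the top two ints, push their 32-bit sum."""
    b = stack.pop()
    a = stack.pop()
    stack.append(to_int32(a + b))


operand_stack = [7, 2, 3]   # 3 is on top
iadd(operand_stack)
print(operand_stack)         # the two top words were replaced by their sum
```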
  • Subcomputations may be nested on the operand stack, and result in a single operand that can be used by the nesting computation.
  • Each primitive data type has specialized instructions that know how to operate on operands of that type.
  • Each operand requires a single location on the stack, except for long and double operands, which require two locations.
  • Operands must be operated on by operators appropriate to their type. It is illegal, for example, to push two integers and then treat them as a long. This restriction is enforced, in the Sun implementation, by the bytecode verifier. However, a small number of operations (the dup opcodes and swap) operate on runtime data areas as raw values of a given width without regard to type.
  • Execution Environment The information contained in the execution environment is used to do dynamic linking, normal method returns, and exception propagation.
  • Dynamic Linking The execution environment contains references to the interpreter symbol table for the current method and current class, in support of dynamic linking of the method code.
  • the class file code for a method refers to methods to be called and variables to be accessed symbolically. Dynamic linking translates these symbolic method calls into actual method calls, loading classes as necessary to resolve as-yet-undefined symbols, and translates variable accesses into appropriate offsets in storage structures associated with the runtime location of these variables.
  • the execution environment is used in this case to restore the registers of the caller, with the program counter of the caller appropriately incremented to skip the method call instruction. Execution then continues in the calling method's execution environment.
  • An exceptional condition, known in JAVA as an Error or Exception, which are subclasses of Throwable, may arise in a program because of:
    • a dynamic linkage failure, such as a failure to find a needed class file;
    • a run-time error, such as a reference through a null pointer;
    • an asynchronous event, such as is thrown by Thread.stop, from another thread; and
    • the program using a throw statement.
  • When an exception occurs:
  • Each catch clause describes the instruction range for which it is active, describes the type of exception that it is to handle, and has the address of the code to handle it.
  • An exception matches a catch clause if the instruction that caused the exception is in the appropriate instruction range, and the exception type is a subtype of the type of exception that the catch clause handles. If a matching catch clause is found, the system branches to the specified handler. If no handler is found, the process is repeated until all the nested catch clauses of the current method have been exhausted. The order of the catch clauses in the list is important. The virtual machine execution continues at the first matching catch clause. Because JAVA code is structured, it is always possible to sort all the exception handlers for one method into a single list that, for any possible program counter value, can be searched in linear order to find the proper (innermost containing applicable) exception handler for an exception occurring at that program counter value.
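  • The linear search over catch clauses can be sketched as follows. Python exception classes stand in for JAVA exception types, and the tuple layout `(start_pc, end_pc, catch_type, handler_pc)` with an inclusive start and exclusive end is an assumption chosen for the example.

```python
def find_handler(catch_clauses, pc, exc_type):
    """Return the handler address of the first matching clause, or None."""
    for start_pc, end_pc, catch_type, handler_pc in catch_clauses:
        in_range = start_pc <= pc < end_pc
        # A clause matches if the thrown type is a subtype of the caught type.
        if in_range and issubclass(exc_type, catch_type):
            return handler_pc
    return None


# Innermost clause first, so the first match is the proper handler.
clauses = [
    (0, 10, KeyError, 100),
    (0, 20, LookupError, 200),
]
print(find_handler(clauses, 5, KeyError))
print(find_handler(clauses, 15, KeyError))
print(find_handler(clauses, 25, KeyError))
```

Because the search stops at the first match, swapping the two clauses would hide the more specific handler, which is why the order of the list matters.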
  • the execution environment may be extended with additional implementation-specified information, such as debugging information.
  • the JAVA heap is the runtime data area from which class instances (objects) are allocated.
  • the JAVA language is designed to be garbage collected - it does not give the programmer the ability to deallocate objects explicitly.
  • the JAVA language does not presuppose any particular kind of garbage collection; various algorithms may be used depending on system requirements.
  • the method area is analogous to the store for compiled code in conventional languages or the text segment in a UNIX process . It stores method code (compiled JAVA code) and symbol tables. In the current JAVA implementation, method code is not part of the garbage-collected heap, although this is planned for a future release.
  • An instruction in the JAVA instruction set consists of a one-byte opcode specifying the operation to be performed, and zero or more operands supplying parameters or data that will be used by the operation. Many instructions have no operands and consist only of an opcode.
  • the inner loop of the virtual machine execution is effectively:
      do {
          fetch an opcode byte
          execute an action depending on the value of the opcode
      } while (there is more to do);
  • the number and size of the additional operands is determined by the opcode. If an additional operand is more than one byte in size, then it is stored in big-endian order - high order byte first. For example, a 16-bit parameter is stored as two bytes whose value is: first_byte * 256 + second_byte
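  • The big-endian operand rule can be checked with a short Python sketch. The function name and the sample byte values are made up for illustration; the opcode byte itself is arbitrary here.

```python
def read_u2_operand(code, pc):
    """Read the 16-bit operand that follows the opcode at index pc."""
    first_byte = code[pc + 1]
    second_byte = code[pc + 2]
    # High-order byte first, as in the formula first_byte * 256 + second_byte.
    return first_byte * 256 + second_byte


code = bytes([0x11, 0x01, 0x2C])   # one opcode byte, then a 2-byte operand
value = read_u2_operand(code, 0)
print(value, value == int.from_bytes(code[1:3], "big"))
```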
  • the bytecode instruction stream is only byte-aligned, with the exception being the tableswitch and lookupswitch instructions, which force alignment to a 4-byte boundary within their instructions.
  • the per-class constant pool has a maximum of 65535 entries. This acts as an internal limit on the total complexity of a single class.
  • the amount of code per method is limited to 65535 bytes by the sizes of the indices in the code in the exception table, the line number table, and the local variable table .
  • Each class file contains the compiled version of either a JAVA class or a JAVA interface.
  • Compliant JAVA interpreters must be capable of dealing with all class files that conform to the following specification.
  • a JAVA class file consists of a stream of 8-bit bytes. All 16-bit and 32-bit quantities are constructed by reading in two or four 8-bit bytes, respectively. The bytes are joined together in network (big-endian) order, where the high bytes come first.
  • This format is supported by the JAVA.io.DataInput and JAVA.io.DataOutput interfaces, and classes such as JAVA.io.DataInputStream and JAVA.io.DataOutputStream.
  • the class file format is described here using a structure notation. Successive fields in the structure appear in the external representation without padding or alignment. Variable size arrays, often of variable sized elements, are called tables and are commonplace in these structures.
  • the types u1, u2, and u4 mean an unsigned one-, two-, or four-byte quantity, respectively, which are read by methods such as readUnsignedByte, readUnsignedShort and readInt of the JAVA.io.DataInput interface.
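  • A small reader for the three unsigned quantities can be sketched with Python's `struct` module, where the `>` prefix selects big-endian order without padding. The class name `ClassFileReader` and the sample bytes are hypothetical and only demonstrate the byte order.

```python
import struct


class ClassFileReader:
    """Sequential reader for big-endian u1, u2 and u4 quantities."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def _take(self, fmt):
        size = struct.calcsize(fmt)
        if self.pos + size > len(self.data):
            raise EOFError("class file data ended early")
        (value,) = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += size
        return value

    def u1(self):
        return self._take(">B")   # unsigned one-byte quantity

    def u2(self):
        return self._take(">H")   # unsigned two-byte quantity

    def u4(self):
        return self._take(">I")   # unsigned four-byte quantity


reader = ClassFileReader(bytes([0x00, 0x00, 0x01, 0x00, 0x00, 0x03, 0x2D]))
print(reader.u4(), reader.u2(), reader.u1())
```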
  • 2.1 Format The following pseudo-structure gives a top-level description of the format of a class file:
  • ClassFile {
        u4 magic;
        u2 minor_version;
        u2 major_version;
        u2 constant_pool_count;
        cp_info constant_pool[constant_pool_count - 1];
        u2 access_flags;
        u2 this_class;
        u2 super_class;
        u2 interfaces_count;
        u2 interfaces[interfaces_count];
        u2 fields_count;
        field_info fields[fields_count];
        u2 methods_count;
        method_info methods[methods_count];
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }
  • magic
  • the current major version number is 45; the current minor version number is 3. constant_pool_count
  • the constant pool is a table of values. These values are the various string constants, class names, field names, and others that are referred to by the class structure or by the code.
  • constant_pool [0] is always unused by the compiler, and may be used by an implementation for any purpose.
  • Each of the constant_pool entries 1 through constant_pool_count-1 is a variable-length entry, whose format is given by the first "tag" byte, as described in section 2.3. access_flags
  • This field contains a mask of up to sixteen modifiers used with class, method, and field declarations.
  • the same encoding is used on similar fields in field_info and method_info as described below. Here is the encoding:
  • ACC_STATIC 0x0008 Variable or method is static. Used by: Method, Variable
  • interfaces_count This field gives the number of interfaces that this class implements. interfaces
  • This field gives the number of instance variables, both static and dynamic, defined by this class.
  • the fields table includes only those variables that are defined explicitly by this class. It does not include those instance variables that are accessible from this class but are inherited from superclasses. fields
  • This field indicates the number of methods, both static and dynamic, defined by this class. This table only includes those methods that are explicitly defined by this class. It does not include inherited methods. methods
  • attributes A class can have any number of optional attributes associated with it.
  • the only class attribute recognized is the "SourceFile" attribute, which indicates the name of the source file from which this class file was compiled. See section 2.6 for more information on the attribute_info structure.
  • a signature is a string representing a type of a method, field or array.
  • the field signature represents the value of an argument to a function or the value of a variable. It is a series of bytes generated by the following grammar:
  • a return-type signature represents the return value from a method. It is a series of bytes in the following grammar:
  • An argument signature represents an argument passed to a method:
  • a method signature represents the arguments that the method expects, and the value that it returns.
  • Each item in the constant pool begins with a 1-byte tag.
  • the table below lists the valid tags and their values
  • Each tag byte is then followed by one or more bytes giving more information about the specific constant .
  • CONSTANT_Class is used to represent a class or an interface.
  • CONSTANT_Class_info {
        u1 tag;
        u2 name_index;
    }
  • the tag will have the value CONSTANT_Class. name_index constant_pool[name_index] is a CONSTANT_Utf8 giving the string name of the class. Because arrays are objects, the opcodes anewarray and multianewarray can reference array "classes" via CONSTANT_Class items in the constant pool. In this case, the name of the class is its signature. For example, the class name for int[][] is
  • CONSTANT_Fieldref_info {
        u1 tag;
        u2 class_index;
        u2 name_and_type_index;
    }
  • class_index constant_pool [class_index] will be an entry of type CONSTANT_Class giving the name of the class or interface containing the field or method.
  • For CONSTANT_Fieldref and CONSTANT_Methodref, the CONSTANT_Class item must be an actual class. For CONSTANT_InterfaceMethodref, the item must be an interface which purports to implement the given method.
  • name_and_type_index constant_pool [name_and_type_index] will be an entry of type CONSTANT_NameAndType . This constant pool entry indicates the name and signature of the field or method.
  • CONSTANT_String is used to represent constant objects of the built-in type String.
  • CONSTANT_String_info {
        u1 tag;
        u2 string_index;
    }
  • the tag will have the value CONSTANT_String. string_index constant_pool[string_index] is a CONSTANT_Utf8 string giving the value to which the String object is initialized.
  • CONSTANT_Integer and CONSTANT_Float represent four-byte constants.
  • the tag will have the value CONSTANT_Integer or CONSTANT_Float.
  • For integers, the four bytes are the integer value. For floats, they are the IEEE 754 standard representation of the floating point value. These bytes are in network (high byte first) order.
  • CONSTANT_Long and CONSTANT_Double represent eight-byte constants.
  • the 64-bit value is (high_bytes
  • CONSTANT_NameAndType is used to represent a field or method, without indicating which class it belongs to.
  • CONSTANT_NameAndType_info {
        u1 tag;
        u2 name_index;
        u2 signature_index;
    }
  • the tag will have the value CONSTANT_NameAndType.
  • name_index constant_pool[name_index] is a CONSTANT_Utf8 string giving the name of the field or method.
  • signature_index constant_pool[signature_index] is a CONSTANT_Utf8 string giving the signature of the field or method.
  • CONSTANT_Utf8 and CONSTANT_Unicode are used to represent constant string values.
  • CONSTANT_Utf8 strings are "encoded" so that strings containing only non-null ASCII characters can be represented using only one byte per character, but characters of up to 16 bits can be represented:
  • null character (0x0000) and characters in the range 0x0080 to 0x07FF are represented by a pair of two bytes :
  • null byte (0x00) is encoded in two-byte format rather than one-byte, so that our strings never have embedded nulls.
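  • The encoding rule above can be sketched in Python. The bit layouts used here (110xxxxx 10xxxxxx for two bytes, 1110xxxx 10xxxxxx 10xxxxxx for three) are the standard UTF-8 patterns and are an assumption, since this passage does not list them; the function names are hypothetical and only 16-bit characters are handled.

```python
def encode_char(cp):
    """Encode one 16-bit character code so that no 0x00 byte is produced."""
    if 0x0001 <= cp <= 0x007F:
        return bytes([cp])                          # one byte
    if 0x0000 <= cp <= 0x07FF:
        # Null and 0x0080..0x07FF use the two-byte form.
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("only 16-bit characters are handled in this sketch")


def encode_utf8(text):
    return b"".join(encode_char(ord(ch)) for ch in text)


encoded = encode_utf8("a\x00b")
print(encoded, b"\x00" in encoded)
```

The null character comes out as the two bytes 0xC0 0x80, so the encoded string never contains an embedded null byte.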
  • CONSTANT_Utf8_info { u1 tag; u2 length; u1 bytes[length]; }
  • the tag will have the value CONSTANT_Utf8 or CONSTANT_Unicode. length The number of bytes in the string. These strings are not null terminated. bytes
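A minimal JAVA sketch of the encoding rules described above. The one-byte and two-byte ranges come from the text; the bit patterns for the two-byte form and the three-byte form used for the remaining 16-bit characters are assumed to follow the usual UTF-8 layout, and the class name `Utf8Sketch` is hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical encoder for the CONSTANT_Utf8 byte format.
final class Utf8Sketch {
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                // Non-null ASCII: one byte per character.
                out.write(c);
            } else if (c <= 0x07FF) {
                // Null and 0x0080..0x07FF: two bytes, so no 0x00 byte is ever emitted.
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                // Remaining 16-bit characters: three bytes (assumed layout).
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }
}
```

With this sketch, the string consisting of "A" followed by a null character encodes to the three bytes 0x41 0xC0 0x80, so the result contains no embedded zero byte.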
  • field_info { u2 access_flags; u2 name_index; u2 signature_index; u2 attributes_count; attribute_info attributes[attribute_count]; }
  • ACC_PUBLIC ACC_PRIVATE
  • ACC_PROTECTED ACC_STATIC
  • ACC_FINAL ACC_VOLATILE
  • ACC_TRANSIENT At most one of ACC_PUBLIC, ACC_PROTECTED, and ACC_PRIVATE can be set for any method.
  • name_index constant_pool [name_index] is a CONSTANT_Utf8 string which is the name of the field.
  • signature_index constant_pool [signature_index] is a CONSTANT_Utf8 string which is the signature of the field. See the section "Signatures" for more information on signatures. attributes_count
  • a field can have any number of optional attributes associated with it. Currently, the only field attribute recognized is the "ConstantValue" attribute, which indicates that this field is a static numeric constant, and indicates the constant value of that field. Any other attributes are skipped.
  • method_info { u2 access_flags; u2 name_index; u2 signature_index; u2 attributes_count; attribute_info attributes[attribute_count]; }
  • the possible fields that can be set for a method are ACC_PUBLIC, ACC_PRIVATE, ACC_PROTECTED, ACC_STATIC, ACC_FINAL, ACC_SYNCHRONIZED, ACC_NATIVE, and ACC_ABSTRACT.
  • At most one of ACC_PUBLIC, ACC_PROTECTED, and ACC_PRIVATE can be set for any method.
  • name_index constant_pool [name_index] is a CONSTANT_Utf8 string giving the name of the method.
  • signature_index constant_pool [signature_index] is a CONSTANT_Utf8 string giving the signature of the field. See the section "Signatures" for more information on signatures. attributes_count
  • a field can have any number of optional attributes associated with it. Each attribute has a name, and other additional information. Currently, the only field attributes recognized are the "Code" and
  • Attributes are used at several different places in the class format. All attributes have the following format: GenericAttribute_info { u2 attribute_name; u4 attribute_length; u1 info[attribute_length]; }
  • the attribute_name is a 16-bit index into the class's constant pool; the value of constant_pool [attribute_name] is a CONSTANT_Utf8 string giving the name of the attribute.
  • the field attribute_length indicates the length of the subsequent information in bytes. This length does not include the six bytes of the attribute_name and attribute_length.
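Because every attribute carries its own length, a reader can step over attributes it does not recognize. The following JAVA sketch shows one way to do that, assuming the attribute bytes are available in a big-endian `ByteBuffer`; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical helper that steps over one GenericAttribute_info.
final class AttributeWalker {
    // Skips the attribute at the buffer's current position and returns
    // the total number of bytes it occupied.
    static int skipAttribute(ByteBuffer buf) {
        buf.getShort();                        // u2 attribute_name (constant pool index), ignored here
        int length = buf.getInt();             // u4 attribute_length; this sketch assumes it fits in an int
        buf.position(buf.position() + length); // step over u1 info[attribute_length]
        return 6 + length;                     // the six header bytes are not counted in attribute_length
    }
}
```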
  • SourceFile_attribute { u2 attribute_name_index; u4 attribute_length; u2 sourcefile_index; }
  • attribute_name_index constant_pool [attribute_name_index] is the CONSTANT_Utf8 string "SourceFile".
  • SourceFile_attribute The length of a SourceFile_attribute must be 2.
  • sourcefile_index constant_pool [sourcefile_index] is a CONSTANT_Utf8 string giving the source file from which this class file was compiled.
  • the "ConstantValue" attribute has the following format:
  • ConstantValue_attribute { u2 attribute_name_index; u4 attribute_length; u2 constantvalue_index; }
  • attribute_name_index constant_pool [attribute_name_index] is the CONSTANT_Utf8 string "ConstantValue".
  • ConstantValue_attribute The length of a ConstantValue_attribute must be 2.
  • constantvalue_index constant_pool [constantvalue_index] gives the constant value for this field.
  • the constant pool entry must be of a type appropriate to the field, as shown by the following table:
  • the "Code" attribute has the following format: Code_attribute { u2 attribute_name_index; u4 attribute_length; u2 max_stack; u2 max_locals; u4 code_length; u1 code[code_length]; u2 exception_table_length;
  • code_length The number of bytes in the virtual machine code for this method.
  • exception_table_length The number of entries in the following exception table.
  • start_pc and end_pc indicate the ranges in the code at which the exception handler is active. The values of both fields are offsets from the start of the code. start_pc is inclusive. end_pc is exclusive. handler_pc
  • This field indicates the starting address of the exception handler.
  • the value of the field is an offset from the start of the code.
  • constant_pool [catch_type] will be the class of exceptions that this exception handler is designated to catch. This exception handler should only be called if the thrown exception is an instance of the given class.
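The exception table fields described above suggest a lookup along the following lines. This JAVA sketch is illustrative only: the class names are hypothetical, a `Class` object stands in for the resolved constant_pool [catch_type] entry, and searching the table in order and taking the first match is an assumption.

```java
// Hypothetical model of exception handler selection.
final class HandlerLookup {
    // One exception table entry; field names mirror the fields named in the text.
    static final class Entry {
        final int startPc;   // inclusive offset from the start of the code
        final int endPc;     // exclusive offset from the start of the code
        final int handlerPc; // offset of the handler's first instruction
        final Class<? extends Throwable> catchType; // stands in for constant_pool[catch_type]

        Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    // Returns the handler offset for an exception thrown at pc, or -1 if none applies.
    static int findHandler(Entry[] table, int pc, Throwable thrown) {
        for (Entry e : table) {
            boolean inRange = pc >= e.startPc && pc < e.endPc;
            if (inRange && e.catchType.isInstance(thrown)) {
                return e.handlerPc;
            }
        }
        return -1;
    }
}
```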
  • the "Code" attribute can itself have attributes. attributes
  • a “Code” attribute can have any number of optional attributes associated with it. Each attribute has a name, and other additional information. Currently, the only code attributes defined are the “LineNumberTable” and “LocalVariableTable,” both of which contain debugging information.
  • This table is used by compilers to indicate which exceptions a method is declared to throw: Exceptions_attribute { u2 attribute_name_index; u4 attribute_length; u2 number_of_exceptions; u2 exception_index_table[number_of_exceptions]; }
  • attribute_name_index constant_pool [attribute_name_index] will be the CONSTANT_Utf8 string "Exceptions".
  • This field indicates the total length of the Exceptions_attribute, excluding the initial six bytes. number_of_exceptions
  • LineNumberTable_attribute has the following format:
  • attribute_name_index constant_pool [attribute_name_index] will be the CONSTANT_Utf8 string "LineNumberTable".
  • attribute_length This field indicates the total length of the LineNumberTable_attribute, excluding the initial six bytes.
  • This field indicates the number of entries in the following line number table. line_number_table
  • Each entry in the line number table indicates that the line number in the source file changes at a given point in the code.
  • source_pc <<SHOULD THAT BE start_pc?>> is an offset from the beginning of the code. line_number
  • This attribute is used by debuggers to determine the value of a given local variable during the dynamic execution of a method.
  • the format of the LocalVariableTable_attribute is as follows: LocalVariableTable_attribute { u2 attribute_name_index; u4 attribute_length; u2 local_variable_table_length; { u2 start_pc; u2 length; u2 name_index; u2 signature_index; u2 slot; } local_variable_table[local_variable_table_length]; }
  • attribute_name_index constant_pool [attribute_name_index] will be the CONSTANT_Utf8 string "LocalVariableTable".
  • Each entry in the local variable table indicates a code range during which a local variable has a value. It also indicates where on the stack the value of that variable can be found. start_pc, length
  • the given local variable will have a value at the code between start_pc and start_pc + length.
  • the two values are both offsets from the beginning of the code. name_index, signature_index constant_pool [name_index] and constant_pool [signature_index] are CONSTANT_Utf8 strings giving the name and signature of the local variable. slot
  • the given variable will be the slot'th local variable in the method's frame.
  • JAVA Virtual Machine instructions are represented in this document by an entry of the following form. instruction name
  • Instruction iload has the short description "Load integer from local variable.” Implicitly, the integer is loaded onto the stack. Instruction iadd is described as "Integer add"; both its source and destination are the stack.
  • Instructions that do not affect the control flow of a computation may be assumed to always advance the virtual machine program counter to the opcode of the following instruction. Only instructions that do affect control flow will explicitly mention the effect they have on the program counter.
  • indexbytel and indexbyte2 are used to construct an unsigned 16-bit index into the constant pool of the current class. The item at that index is resolved and pushed onto the stack. If a String is being pushed and there isn't enough memory to allocate space for it then an OutOfMemoryError is thrown.
  • the two-word constant at that index is resolved and pushed onto the stack.
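Building the unsigned 16-bit constant pool index from indexbyte1 and indexbyte2 can be sketched in JAVA as below. Treating indexbyte1 as the high byte is an assumption, chosen to match the high-byte-first order the class file format uses elsewhere; the class name is hypothetical.

```java
// Hypothetical helper for assembling a constant pool index.
final class PoolIndex {
    // Combines two operand bytes into an unsigned 16-bit index (0..65535).
    static int index(byte indexbyte1, byte indexbyte2) {
        // Mask with 0xFF so that negative byte values are not sign-extended.
        return ((indexbyte1 & 0xFF) << 8) | (indexbyte2 & 0xFF);
    }
}
```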
  • iload_<n> Load integer from local variable Syntax: iload_<n>
  • This instruction is the same as lload with a vindex of <n>, except that the operand <n> is implicit.
  • This instruction is the same as dload with a vindex of <n>, except that the operand <n> is implicit. aload
  • Local variable vindex in the current JAVA frame is set to value.
  • istore_0 59
  • istore_1 60
  • istore_2 61
  • Local variable <n> in the current JAVA frame is set to value.
  • This instruction is the same as istore with a vindex of <n>, except that the operand <n> is implicit.
  • Stack: ..., value-word1, value-word2 => ... value must be a long integer.
  • Local variables vindex and vindex+1 in the current JAVA frame are set to value.
  • Local variable vindex in the current JAVA frame is set to value.
  • fstore_0 67
  • fstore_1 68
  • fstore_2 69
  • fstore_3 70 value must be a single-precision floating point number.
  • Local variable <n> in the current JAVA frame is set to value.
  • This instruction is the same as fstore with a vindex of <n>, except that the operand <n> is implicit. dstore
  • Local variables vindex and vindex+1 in the current JAVA frame are set to value.
  • dstore_0 71
  • dstore_1 72
  • dstore_2 73
  • Local variables <n> and <n>+1 in the current JAVA frame are set to value.
  • This instruction is the same as dstore with a vindex of <n>, except that the operand <n> is implicit. astore
  • Local variable vindex in the current JAVA frame is set to value.
  • astore_0 75
  • astore_1 76
  • astore_2 77
  • Local variable <n> in the current JAVA frame is set to value.
  • This instruction is the same as astore with a vindex of <n>, except that the operand <n> is implicit.
  • This bytecode must precede one of the following bytecodes: iload, lload, fload, dload, aload, istore, lstore, fstore, dstore, astore, iinc.
  • the vindex of the following bytecode and vindex2 from this bytecode are assembled into an unsigned 16-bit index to a local variable in the current JAVA frame. The following bytecode operates as normal except for the use of this wider index.
  • atype is an internal code that indicates the type of array to allocate. Possible values for atype are as follows:
  • size must be an integer. It represents the number of elements in the new array. indexbyte1 and indexbyte2 are used to construct an index into the constant pool of the current class. The item at that index is resolved. The resulting entry must be a class. A new array of the indicated class type and capable of holding size elements is allocated, and result is a reference to this new object. Allocation of an array large enough to contain size items of the given class type is attempted. All elements of the array are initialized to null.
  • anewarray is used to create a single dimension of an array of object references.
  • the following code is used: bipush 7 anewarray <Class "java.lang.Thread">. anewarray can also be used to create the first dimension of a multi-dimensional array.
  • new int[6][] is created with the following code: bipush 6 anewarray <Class "[I">. See CONSTANT_Class in the "Class File Format" chapter for information on array class names.
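At the source level, the two anewarray sequences above correspond to array creation expressions along these lines. This is a small JAVA sketch; that the first sequence comes from `new Thread[7]` is inferred from the operands shown, and the class name `ArrayAllocation` is hypothetical.

```java
// Hypothetical illustration of source expressions that allocate arrays of references.
final class ArrayAllocation {
    // One dimension of seven object references, each initialized to null.
    static Thread[] sevenThreads() {
        return new Thread[7];
    }

    // Only the first dimension is created; each of the six rows starts out null.
    static int[][] sixRows() {
        return new int[6][];
    }

    public static void main(String[] args) {
        System.out.println(sevenThreads()[0]); // null
        System.out.println(sixRows()[5]);      // null
    }
}
```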
  • Each size must be an integer. Each represents the number of elements in a dimension of the array. indexbyte1 and indexbyte2 are used to construct an index into the constant pool of the current class. The item at that index is resolved. The resulting entry must be an array class of one or more dimensions. dimensions has the following aspects:
  • arraylength 190
  • index must be an integer. The signed byte value at position number index in the array is retrieved, expanded to an integer, and pushed onto the top of the stack.
  • caload 52
  • the signed short integer value at position number index in the array is retrieved, expanded to an integer, and pushed onto the top of the stack. If arrayref is null, a NullPointerException is thrown. If index is not within the bounds of the array an ArrayIndexOutOfBoundsException is thrown.
  • Stack: ..., arrayref, index, value-word1, value-word2 => ... arrayref must be a reference to an array of long integers, index must be an integer, and value a long integer. The long integer value is stored at position index in the array.
  • the single float value is stored at position index in the array.
  • the object reference value is stored at position index in the array.
  • bastore 84
  • Stack: ..., arrayref, index, value => ... arrayref must be an array of shorts, index must be an integer, and value an integer. The integer value is stored at position index in the array. If value is too large to be a short, it is truncated.
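The truncation on a store into a short array can be illustrated with a JAVA narrowing cast, which keeps the low 16 bits of the integer. That the instruction truncates in exactly this way is an assumption; the class name is hypothetical.

```java
// Hypothetical illustration of storing an int into a short array.
final class ShortStore {
    // Stores value at array[index], keeping only its low 16 bits.
    static void store(short[] array, int index, int value) {
        array[index] = (short) value;
    }

    public static void main(String[] args) {
        short[] a = new short[1];
        store(a, 0, 70000);
        System.out.println(a[0]); // 4464, i.e. 70000 - 65536
    }
}
```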
  • Pop top stack word Syntax: pop 87
  • dup2 92
  • dup2_x1 93
  • dup_x2 91
  • dup2_x2 94
  • value1 is divided by value2, and both values are replaced on the stack by their long integer quotient. The result is truncated to the nearest integer that is between it and 0. An attempt to divide by zero results in a "/ by zero" ArithmeticException being thrown.
  • Double float divide Syntax: ddiv 111
  • value1 is divided by value2, and the quotient is truncated to an integer, and then multiplied by value2. The product is subtracted from value1. The result, as a double-precision floating point number, replaces both values on the stack. result = value1 - (integral_part(value1/value2) * value2), where integral_part() rounds to the nearest integer, with a tie going to the even number.
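The passage above mentions both truncating the quotient and rounding it to the nearest integer, and the two readings give different remainders. The JAVA sketch below shows both, using the language's `%` operator for the truncating form and `Math.IEEEremainder` for the round-to-nearest-even form; which one the instruction performs is not settled here, and the class name is hypothetical.

```java
// Hypothetical comparison of the two remainder definitions.
final class DoubleRemainder {
    // Quotient truncated toward zero before multiplying back.
    static double truncating(double value1, double value2) {
        return value1 % value2;
    }

    // Quotient rounded to the nearest integer, ties going to the even number.
    static double roundToEven(double value1, double value2) {
        return Math.IEEEremainder(value1, value2);
    }

    public static void main(String[] args) {
        System.out.println(truncating(5.5, 2.0));  // 1.5  (5.5 - 2 * 2.0)
        System.out.println(roundToEven(5.5, 2.0)); // -0.5 (5.5 - 3 * 2.0)
    }
}
```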
  • value1 is shifted right arithmetically (with sign extension) by the amount indicated by the low five bits of value2.
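A short JAVA sketch of this shift rule. The explicit mask is written out only to make the "low five bits" visible, and the class name is hypothetical.

```java
// Hypothetical illustration of an arithmetic right shift on integers.
final class ShiftRight {
    // Shifts value1 right with sign extension by the low five bits of value2.
    static int shr(int value1, int value2) {
        return value1 >> (value2 & 0x1F);
    }

    public static void main(String[] args) {
        System.out.println(shr(-16, 1));  // -8
        System.out.println(shr(-16, 33)); // -8, because 33 & 0x1F == 1
    }
}
```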
  • value must be a double-precision floating point number. It is converted to a single-precision floating point number. If overflow occurs, the result must be infinity with the same sign as value. The result replaces value on the stack.
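The overflow behavior can be observed with a JAVA narrowing cast from double to float, as in this small sketch; the class name is hypothetical.

```java
// Hypothetical illustration of double-to-float conversion on overflow.
final class DoubleToFloat {
    // Converts a double to the nearest float; out-of-range magnitudes become infinity.
    static float narrow(double value) {
        return (float) value;
    }

    public static void main(String[] args) {
        System.out.println(narrow(1e300));  // Infinity
        System.out.println(narrow(-1e300)); // -Infinity
    }
}
```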
  • Execution proceeds at that offset from the address of this instruction. Otherwise execution proceeds at the instruction following the ifeq.
  • Stack: ..., value => ... value must be a reference to an object. It is popped from the stack. If value is null, branchbyte1 and branchbyte2 are used to construct a signed 16-bit offset. Execution proceeds at that offset from the address of this instruction. Otherwise execution proceeds at the instruction following the ifnull.
  • Branch if integers equal Syntax: if_icmpeq 159 branchbyte1 branchbyte2
  • Branch if integer greater than Syntax: if_icmpgt 163 branchbyte1 branchbyte2
  • Stack: ..., value1, value2 => ... value1 and value2 must be integers. They are both popped from the stack. If value1 is greater than or equal to value2, branchbyte1 and branchbyte2 are used to construct a signed 16-bit offset. Execution proceeds at that offset from the address of this instruction. Otherwise execution proceeds at the instruction following instruction if_icmpge. lcmp
  • value1 and value2 must be double-precision floating point numbers. They are both popped from the stack and compared. If value1 is greater than value2, the integer value 1 is pushed onto the stack. If value1 is equal to value2, the value 0 is pushed onto the stack. If value1 is less than value2, the value -1 is pushed onto the stack.
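The three-way comparison can be sketched in JAVA as follows, assuming the less-than case yields -1. The passage above does not say what happens when either operand is NaN, so this sketch leaves that case unspecified and simply falls through to the last branch. The class name is hypothetical.

```java
// Hypothetical model of the double comparison result.
final class DoubleCompare {
    // Returns 1, 0 or -1 for greater than, equal to, or less than.
    static int compare(double value1, double value2) {
        if (value1 > value2) {
            return 1;
        }
        if (value1 == value2) {
            return 0;
        }
        return -1; // also reached for NaN operands, a case the text does not cover
    }
}
```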
  • Execution proceeds at that offset from the address of this instruction. Otherwise execution proceeds at the instruction following the if_acmpeq.
  • Execution proceeds at that offset from the address of this instruction. Otherwise execution proceeds at the instruction following instruction if_acmpne.
  • Stack: no change. branchbyte1 and branchbyte2 are used to construct a signed 16-bit offset. Execution proceeds at that offset from the address of this instruction.
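A JAVA sketch of computing a branch target from the two operand bytes. Treating branchbyte1 as the high byte is an assumption, and the class name and the `pc` parameter (the address of the branch instruction) are hypothetical.

```java
// Hypothetical helper for computing a 16-bit relative branch target.
final class BranchTarget {
    // Returns the address at which execution proceeds after the branch.
    static int target(int pc, byte branchbyte1, byte branchbyte2) {
        // The cast to short reinterprets the 16-bit pattern as a signed offset.
        short offset = (short) (((branchbyte1 & 0xFF) << 8) | (branchbyte2 & 0xFF));
        return pc + offset;
    }

    public static void main(String[] args) {
        System.out.println(target(10, (byte) 0x00, (byte) 0x05));  // 15, forward branch
        System.out.println(target(100, (byte) 0xFF, (byte) 0xFE)); // 98, backward branch
    }
}
```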
  • Stack: no change. branchbyte1, branchbyte2, branchbyte3, and branchbyte4 are used to construct a signed 32-bit offset.
  • the address of the instruction immediately following the jsr is pushed onto the stack. Execution proceeds at the offset from the address of this instruction.
  • the address of the instruction immediately following the jsr_w is pushed onto the stack. Execution proceeds at the offset from the address of this instruction.
  • vindexbyte1 and vindexbyte2 are assembled into an unsigned 16-bit index to a local variable in the current JAVA frame. That local variable must contain a return address. The contents of the local variable are written into the pc. See the ret instruction for more information.
  • index must be an integer. If index is less than low or index is greater than high, then default-offset is added to the address of this instruction. Otherwise, low is subtracted from index, and the (index - low)'th element of the jump table is extracted, and added to the address of this instruction.
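The jump table selection described above can be sketched in JAVA as below. The class name and parameter names are hypothetical, and `pc` stands for the address of the switch instruction itself.

```java
// Hypothetical model of jump-table target selection.
final class TableSwitch {
    // Returns the address at which execution proceeds.
    static int target(int pc, int index, int low, int high,
                      int defaultOffset, int[] jumpTable) {
        if (index < low || index > high) {
            return pc + defaultOffset;     // out of range: take the default offset
        }
        return pc + jumpTable[index - low]; // in range: take the (index - low)'th entry
    }
}
```

For example, with low = 1, high = 3, a jump table of {10, 20, 30} and a default offset of 99, an index of 2 selects offset 20, while an index of 0 or 4 selects the default.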
PCT/US1997/001303 1996-01-24 1997-01-23 Methods and apparatuses for stack caching WO1997027539A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP52708497A JP3634379B2 (ja) 1996-01-24 1997-01-23 スタックキャッシングのための方法及び装置
DE69734399T DE69734399D1 (de) 1996-01-24 1997-01-23 Verfahren und vorrichtung zur stapel-cachespeicherung
EP97904010A EP0976034B1 (en) 1996-01-24 1997-01-23 Method and apparatus for stack caching

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US1052796P 1996-01-24 1996-01-24
US64225396A 1996-05-02 1996-05-02
US08/642,253 1996-05-02
US64710396A 1996-05-07 1996-05-07
US60/010,527 1996-05-07
US08/647,103 1996-05-07

Publications (2)

Publication Number Publication Date
WO1997027539A1 WO1997027539A1 (en) 1997-07-31
WO1997027539B1 WO1997027539B1 (en) 1997-09-25

Family

ID=27359254

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/001303 WO1997027539A1 (en) 1996-01-24 1997-01-23 Methods and apparatuses for stack caching

Country Status (6)

Country Link
US (5) US6532531B1
EP (1) EP0976034B1
JP (1) JP3634379B2
KR (1) KR100584964B1
DE (1) DE69734399D1
WO (1) WO1997027539A1

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1019829A1 (en) * 1997-09-30 2000-07-19 Idea Corporation Method and apparatus for transferring data between a register stack and a memory resource
EP1124362A2 (en) * 2000-01-19 2001-08-16 Wiznot Corp. Apparatus for processing TCP/IP by hardware, and operating method therefor
WO2002045385A2 (en) * 2000-11-20 2002-06-06 Zucotto Wireless, Inc. Methods and devices for caching method frame segments in a low-power stack-based processor
EP1387274A2 (en) * 2002-07-31 2004-02-04 Texas Instruments Incorporated Memory management for local variables
US8516589B2 (en) 2009-04-07 2013-08-20 Samsung Electronics Co., Ltd. Apparatus and method for preventing virus code execution

Families Citing this family (131)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3634379B2 (ja) * 1996-01-24 2005-03-30 サン・マイクロシステムズ・インコーポレイテッド スタックキャッシングのための方法及び装置
EP0825506B1 (en) 1996-08-20 2013-03-06 Invensys Systems, Inc. Methods and apparatus for remote process control
AU2158900A (en) 1998-11-25 2000-06-19 Sun Microsystems, Inc. A method for enabling comprehensive profiling of garbage-collected memory systems
US6332215B1 (en) 1998-12-08 2001-12-18 Nazomi Communications, Inc. Java virtual machine hardware for RISC and CISC processors
US7225436B1 (en) 1998-12-08 2007-05-29 Nazomi Communications Inc. Java hardware accelerator using microcode engine
JP3355602B2 (ja) * 1999-01-27 2002-12-09 インターナショナル・ビジネス・マシーンズ・コーポレーション 多次元配列オブジェクトの処理方法及び装置
US8346971B2 (en) 1999-05-04 2013-01-01 At&T Intellectual Property I, Lp Data transfer, synchronising applications, and low latency networks
WO2000070531A2 (en) 1999-05-17 2000-11-23 The Foxboro Company Methods and apparatus for control configuration
US7089530B1 (en) 1999-05-17 2006-08-08 Invensys Systems, Inc. Process control configuration system with connection validation and configuration
US6788980B1 (en) 1999-06-11 2004-09-07 Invensys Systems, Inc. Methods and apparatus for control using control devices that provide a virtual machine environment and that communicate via an IP network
US6665793B1 (en) * 1999-12-28 2003-12-16 Institute For The Development Of Emerging Architectures, L.L.C. Method and apparatus for managing access to out-of-frame Registers
GB2367653B (en) * 2000-10-05 2004-10-20 Advanced Risc Mach Ltd Restarting translated instructions
GB2367654B (en) * 2000-10-05 2004-10-27 Advanced Risc Mach Ltd Storing stack operands in registers
EP1197847A3 (en) 2000-10-10 2003-05-21 Nazomi Communications Inc. Java hardware accelerator using microcode engine
US8769508B2 (en) 2001-08-24 2014-07-01 Nazomi Communications Inc. Virtual machine hardware for RISC and CISC processors
US6772292B2 (en) * 2002-06-04 2004-08-03 Isaak Garber Two area stack
US7228532B1 (en) * 2002-06-26 2007-06-05 Sun Microsystems, Inc. Method and apparatus to facilitate code verification and garbage collection in a platform-independent virtual machine
EP1387249B1 (en) * 2002-07-31 2019-03-13 Texas Instruments Incorporated RISC processor having a stack and register architecture
GB0220282D0 (en) * 2002-08-31 2002-10-09 Ibm Improved just in time compilation of java software methods
US7367022B2 (en) * 2002-09-05 2008-04-29 Intel Corporation Methods and apparatus for optimizing the operating speed and size of a computer program
US7801120B2 (en) * 2003-01-13 2010-09-21 Emulex Design & Manufacturing Corporation Method and system for efficient queue management
US7139877B2 (en) * 2003-01-16 2006-11-21 Ip-First, Llc Microprocessor and apparatus for performing speculative load operation from a stack memory cache
US7136990B2 (en) * 2003-01-16 2006-11-14 Ip-First, Llc. Fast POP operation from RAM cache using cache row value stack
US7191291B2 (en) * 2003-01-16 2007-03-13 Ip-First, Llc Microprocessor with variable latency stack cache
US7139876B2 (en) * 2003-01-16 2006-11-21 Ip-First, Llc Microprocessor and apparatus for performing fast speculative pop operation from a stack memory cache
US7694301B1 (en) * 2003-06-27 2010-04-06 Nathan Laredo Method and system for supporting input/output for a virtual machine
US20050066305A1 (en) * 2003-09-22 2005-03-24 Lisanke Robert John Method and machine for efficient simulation of digital hardware within a software development environment
US7496917B2 (en) * 2003-09-25 2009-02-24 International Business Machines Corporation Virtual devices using a pluarlity of processors
US7516456B2 (en) * 2003-09-25 2009-04-07 International Business Machines Corporation Asymmetric heterogeneous multi-threaded operating system
US7415703B2 (en) * 2003-09-25 2008-08-19 International Business Machines Corporation Loading software on a plurality of processors
US7389508B2 (en) * 2003-09-25 2008-06-17 International Business Machines Corporation System and method for grouping processors and assigning shared memory space to a group in heterogeneous computer environment
US7478390B2 (en) * 2003-09-25 2009-01-13 International Business Machines Corporation Task queue management of virtual devices using a plurality of processors
US20050071828A1 (en) * 2003-09-25 2005-03-31 International Business Machines Corporation System and method for compiling source code for multi-processor environments
US7444632B2 (en) * 2003-09-25 2008-10-28 International Business Machines Corporation Balancing computational load across a plurality of processors
US7523157B2 (en) * 2003-09-25 2009-04-21 International Business Machines Corporation Managing a plurality of processors as devices
US7549145B2 (en) * 2003-09-25 2009-06-16 International Business Machines Corporation Processor dedicated code handling in a multi-processor environment
US7257665B2 (en) * 2003-09-29 2007-08-14 Intel Corporation Branch-aware FIFO for interprocessor data sharing
US20050071606A1 (en) * 2003-09-30 2005-03-31 Roman Talyansky Device, system and method of allocating spill cells in binary instrumentation using one free register
US7707389B2 (en) * 2003-10-31 2010-04-27 Mips Technologies, Inc. Multi-ISA instruction fetch unit for a processor, and applications thereof
FR2864411B1 (fr) * 2003-12-23 2006-03-03 Cit Alcatel Terminal avec des moyens de protection contre le dysfonctionnement de certaines applications java
AT413739B (de) * 2004-02-09 2006-05-15 Ge Jenbacher Gmbh & Co Ohg Verfahren zum regeln einer brennkraftmaschine
DE102004025418A1 (de) * 2004-05-24 2005-12-22 Infineon Technologies Ag Controller mit einer Decodiereinrichtung
DE102004025419A1 (de) * 2004-05-24 2005-12-22 Infineon Technologies Ag Controller und Verfahren zum Verarbeiten von Befehlen
US7278122B2 (en) * 2004-06-24 2007-10-02 Ftl Systems, Inc. Hardware/software design tool and language specification mechanism enabling efficient technology retargeting and optimization
US20060095675A1 (en) * 2004-08-23 2006-05-04 Rongzhen Yang Three stage hybrid stack model
WO2006031551A2 (en) 2004-09-10 2006-03-23 Cavium Networks Selective replication of data structure
US7941585B2 (en) * 2004-09-10 2011-05-10 Cavium Networks, Inc. Local scratchpad and data caching system
US7594081B2 (en) 2004-09-10 2009-09-22 Cavium Networks, Inc. Direct access to low-latency memory
US7526502B2 (en) * 2004-09-10 2009-04-28 Microsoft Corporation Dynamic call site binding
US7314491B2 (en) * 2004-12-29 2008-01-01 Bull Hn Information Systems Inc. Encapsulation of large native operating system functions as enhancements of the instruction set in an emulated central processor system
US7478224B2 (en) * 2005-04-15 2009-01-13 Atmel Corporation Microprocessor access of operand stack as a register file using native instructions
KR100725393B1 (ko) * 2005-05-19 2007-06-07 삼성전자주식회사 자바 가상 머신에서 바이트 코드의 수행 시간을 줄이는시스템 및 방법
US7676796B2 (en) * 2005-09-29 2010-03-09 Intel Corporation Device, system and method for maintaining a pre-defined number of free registers within an instrumented program
US8024551B2 (en) 2005-10-26 2011-09-20 Analog Devices, Inc. Pipelined digital signal processor
US8285972B2 (en) 2005-10-26 2012-10-09 Analog Devices, Inc. Lookup table addressing system and method
US7728744B2 (en) * 2005-10-26 2010-06-01 Analog Devices, Inc. Variable length decoder system and method
US7454572B2 (en) * 2005-11-08 2008-11-18 Mediatek Inc. Stack caching systems and methods with an active swapping mechanism
JP4732874B2 (ja) * 2005-11-28 2011-07-27 株式会社エヌ・ティ・ティ・ドコモ ソフトウェア動作モデル化装置、ソフトウェア動作監視装置、ソフトウェア動作モデル化方法及びソフトウェア動作監視方法
US7366842B1 (en) * 2005-12-15 2008-04-29 Nvidia Corporation Creating permanent storage on the fly within existing buffers
WO2007076629A1 (en) * 2005-12-30 2007-07-12 Intel Corporation Type checking for object-oriented programming languages
US7502029B2 (en) * 2006-01-17 2009-03-10 Silicon Integrated Systems Corp. Instruction folding mechanism, method for performing the same and pixel processing system employing the same
DE102006041002B4 (de) * 2006-08-31 2009-01-02 Infineon Technologies Ag Verfahren, um ein Programm an einen Zwischenspeicher anzupassen, und Schaltungsanordnung
GB2442495B (en) * 2006-10-02 2009-04-01 Transitive Ltd Method and apparatus for handling dynamically linked function cells with respect to program code conversion
US8301990B2 (en) * 2007-09-27 2012-10-30 Analog Devices, Inc. Programmable compute unit with internal register and bit FIFO for executing Viterbi code
DE102007051345A1 (de) * 2007-10-26 2009-04-30 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Explosivstoffladung
WO2009155483A1 (en) 2008-06-20 2009-12-23 Invensys Systems, Inc. Systems and methods for immersive interaction with actual and/or simulated facilities for process, environmental and industrial control
US10621092B2 (en) 2008-11-24 2020-04-14 Intel Corporation Merging level cache and data cache units having indicator bits related to speculative execution
US9672019B2 (en) 2008-11-24 2017-06-06 Intel Corporation Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program to multiple parallel threads
US8453114B2 (en) * 2008-12-24 2013-05-28 Microsoft Corporation Implicit iteration of keyed array symbol
EP2400778B1 (en) * 2009-02-23 2019-02-13 Mitsubishi Electric Corporation Wireless communication system, wireless communication device, and wireless communication method
US10453130B2 (en) 2009-03-18 2019-10-22 Bgc Partners, Inc. Electronic exchange system using messages related to events and actions on an exchange
US10380689B2 (en) * 2009-03-06 2019-08-13 Bgc Partners, Inc. Method and apparatus for exchange-based condition processing
US8463964B2 (en) 2009-05-29 2013-06-11 Invensys Systems, Inc. Methods and apparatus for control configuration with enhanced change-tracking
US8127060B2 (en) 2009-05-29 2012-02-28 Invensys Systems, Inc Methods and apparatus for control configuration with control objects that are fieldbus protocol-aware
US8775153B2 (en) * 2009-12-23 2014-07-08 Intel Corporation Transitioning from source instruction set architecture (ISA) code to translated code in a partial emulation environment
US9268945B2 (en) * 2010-03-19 2016-02-23 Contrast Security, Llc Detection of vulnerabilities in computer systems
US9189271B2 (en) 2011-09-13 2015-11-17 Empire Technology Development, Llc Operation transfer from an origin virtual machine to a destination virtual machine while continue the execution of the operation on the origin virtual machine
US20130086359A1 (en) * 2011-09-29 2013-04-04 Qualcomm Incorporated Processor Hardware Pipeline Configured for Single-Instruction Address Extraction and Memory Access Operation
US9417855B2 (en) 2011-09-30 2016-08-16 Intel Corporation Instruction and logic to perform dynamic binary translation
US20130091331A1 (en) * 2011-10-11 2013-04-11 Iulian Moraru Methods, apparatus, and articles of manufacture to manage memory
US9069587B2 (en) * 2011-10-31 2015-06-30 Stec, Inc. System and method to cache hypervisor data
JP5779077B2 (ja) * 2011-11-22 2015-09-16 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation プログラムの生成を支援する装置及び方法
US8990788B2 (en) 2012-03-09 2015-03-24 Empire Technology Development Llc Compilation of code in a data center
US9471344B1 (en) * 2012-03-27 2016-10-18 Marvell International Ltd. Hardware support for processing virtual machine instructions
US9383448B2 (en) 2012-07-05 2016-07-05 Deca System Co., Ltd. Golf GPS device with automatic hole recognition and playing hole selection
US9575755B2 (en) 2012-08-03 2017-02-21 International Business Machines Corporation Vector processing in an active memory device
US9569211B2 (en) 2012-08-03 2017-02-14 International Business Machines Corporation Predication in a vector processor
US9632777B2 (en) 2012-08-03 2017-04-25 International Business Machines Corporation Gather/scatter of multiple data elements with packed loading/storing into/from a register file entry
US9003160B2 (en) 2012-08-03 2015-04-07 International Business Machines Corporation Active buffered memory
US9594724B2 (en) 2012-08-09 2017-03-14 International Business Machines Corporation Vector register file
US9298395B2 (en) 2012-10-22 2016-03-29 Globalfoundries Inc. Memory system connector
US8972782B2 (en) 2012-11-09 2015-03-03 International Business Machines Corporation Exposed-pipeline processing element with rollback
US9189399B2 (en) 2012-11-21 2015-11-17 Advanced Micro Devices, Inc. Stack cache management and coherence techniques
US20140143498A1 (en) * 2012-11-21 2014-05-22 Advanced Micro Devices, Inc. Methods and apparatus for filtering stack data within a cache memory hierarchy
US9734059B2 (en) 2012-11-21 2017-08-15 Advanced Micro Devices, Inc. Methods and apparatus for data cache way prediction based on classification as stack data
US9032157B2 (en) * 2012-12-11 2015-05-12 International Business Machines Corporation Virtual machine failover
US9069701B2 (en) * 2012-12-11 2015-06-30 International Business Machines Corporation Virtual machine failover
US9292292B2 (en) * 2013-06-20 2016-03-22 Advanced Micro Devices, Inc. Stack access tracking
US8943462B2 (en) * 2013-06-28 2015-01-27 Sap Se Type instances
US10001993B2 (en) 2013-08-08 2018-06-19 Linear Algebra Technologies Limited Variable-length instruction buffer management
US11768689B2 (en) 2013-08-08 2023-09-26 Movidius Limited Apparatus, systems, and methods for low power computational imaging
US9891936B2 (en) 2013-09-27 2018-02-13 Intel Corporation Method and apparatus for page-level monitoring
US20150186168A1 (en) * 2013-12-30 2015-07-02 Unisys Corporation Dedicating processing resources to just-in-time compilers and instruction processors in a dynamic translator
JP6371855B2 (ja) 2014-03-26 2018-08-08 Intel Corporation Processor, method, system, program, and non-transitory machine-readable storage medium
US9513805B2 (en) * 2014-04-15 2016-12-06 International Business Machines Corporation Page table including data fetch width indicator
US11755202B2 (en) 2015-01-20 2023-09-12 Ultrata, Llc Managing meta-data in an object memory fabric
WO2016118615A1 (en) 2015-01-20 2016-07-28 Ultrata Llc Object memory data flow instruction execution
US9672351B2 (en) * 2015-02-02 2017-06-06 Qualcomm Incorporated Authenticated control stacks
US11327779B2 (en) * 2015-03-25 2022-05-10 Vmware, Inc. Parallelized virtual machine configuration
US9971542B2 (en) 2015-06-09 2018-05-15 Ultrata, Llc Infinite memory fabric streams and APIs
US10698628B2 (en) 2015-06-09 2020-06-30 Ultrata, Llc Infinite memory fabric hardware implementation with memory
US9886210B2 (en) 2015-06-09 2018-02-06 Ultrata, Llc Infinite memory fabric hardware implementation with router
CN105183433B (zh) * 2015-08-24 2018-02-06 Shanghai Zhaoxin Semiconductor Co., Ltd. Instruction merging method and apparatus with multiple data channels
US9705620B2 (en) * 2015-09-18 2017-07-11 Qualcomm Incorporated Synchronization of endpoints using tunable latency
US10241676B2 (en) 2015-12-08 2019-03-26 Ultrata, Llc Memory fabric software implementation
US10235063B2 (en) 2015-12-08 2019-03-19 Ultrata, Llc Memory fabric operations and coherency using fault tolerant objects
WO2017100281A1 (en) 2015-12-08 2017-06-15 Ultrata, Llc Memory fabric software implementation
WO2017100288A1 (en) 2015-12-08 2017-06-15 Ultrata, Llc. Memory fabric operations and coherency using fault tolerant objects
US10908910B2 (en) * 2018-07-27 2021-02-02 Oracle International Corporation Lazy copying of runtime-managed stack frames
US11106463B2 (en) 2019-05-24 2021-08-31 Texas Instruments Incorporated System and method for addressing data in memory
US10942852B1 (en) 2019-09-12 2021-03-09 Advanced New Technologies Co., Ltd. Log-structured storage systems
SG11202002588RA (en) 2019-09-12 2020-04-29 Alibaba Group Holding Ltd Log-structured storage systems
SG11202002732TA (en) 2019-09-12 2020-04-29 Alibaba Group Holding Ltd Log-structured storage systems
WO2019228575A2 (en) 2019-09-12 2019-12-05 Alibaba Group Holding Limited Log-structured storage systems
WO2019228571A2 (en) 2019-09-12 2019-12-05 Alibaba Group Holding Limited Log-structured storage systems
SG11202002363QA (en) 2019-09-12 2020-04-29 Alibaba Group Holding Ltd Log-structured storage systems
EP3673376B1 (en) 2019-09-12 2022-11-30 Advanced New Technologies Co., Ltd. Log-structured storage systems
SG11202002027TA (en) * 2019-09-12 2020-04-29 Alibaba Group Holding Ltd Log-structured storage systems
WO2019228568A2 (en) 2019-09-12 2019-12-05 Alibaba Group Holding Limited Log-structured storage systems
CN113326020A (zh) * 2020-02-28 2021-08-31 Beijing Baidu Netcom Science and Technology Co., Ltd. Cache device, cache, system, data processing method, apparatus, and medium
US11809839B2 (en) 2022-01-18 2023-11-07 Robert Lyden Computer language and code for application development and electronic and optical communication

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5107457A (en) * 1989-04-03 1992-04-21 The Johns Hopkins University Stack data cache having a stack management hardware with internal and external stack pointers and buffers for handling underflow and overflow stack
US5157777A (en) * 1989-12-22 1992-10-20 Intel Corporation Synchronous communication between execution environments in a data processing system employing an object-oriented memory protection mechanism

Family Cites Families (88)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3878513A (en) 1972-02-08 1975-04-15 Burroughs Corp Data processing method and apparatus using occupancy indications to reserve storage space for a stack
US3810117A (en) 1972-10-20 1974-05-07 Ibm Stack mechanism for a data processor
GB1441816A (en) 1973-07-18 1976-07-07 Int Computers Ltd Electronic digital data processing systems
US3889243A (en) 1973-10-18 1975-06-10 Ibm Stack mechanism for a data processor
JPS5474651A (en) 1977-11-28 1979-06-14 Toshiba Corp Stack control system
US4354232A (en) 1977-12-16 1982-10-12 Honeywell Information Systems Inc. Cache memory command buffer circuit
US4325118A (en) 1980-03-03 1982-04-13 Western Digital Corporation Instruction fetch circuitry for computers
US4524416A (en) 1980-04-15 1985-06-18 Honeywell Information Systems Inc. Stack mechanism with the ability to dynamically alter the size of a stack in a data processing system
US4375678A (en) 1980-08-25 1983-03-01 Sperry Corporation Redundant memory arrangement providing simultaneous access
US4439828A (en) * 1981-07-27 1984-03-27 International Business Machines Corp. Instruction substitution mechanism in an instruction handling unit of a data processing system
US4530049A (en) 1982-02-11 1985-07-16 At&T Bell Laboratories Stack cache with fixed size stack frames
US5043870A (en) 1982-02-24 1991-08-27 At&T Bell Laboratories Computer with automatic mapping of memory contents into machine registers during program execution
US4600986A (en) 1984-04-02 1986-07-15 Sperry Corporation Pipelined split stack with high performance interleaved decode
US4674032A (en) 1984-04-02 1987-06-16 Unisys Corporation High-performance pipelined stack with over-write protection
JPS6133546A (ja) 1984-07-25 1986-02-17 Nec Corp Information processing apparatus
US4761733A (en) 1985-03-11 1988-08-02 Celerity Computing Direct-execution microprogrammable microprocessor system
DE3689595T2 (de) 1985-04-08 1994-05-19 Hitachi Ltd Data processing system
US4849880A (en) * 1985-11-18 1989-07-18 John Fluke Mfg. Co., Inc. Virtual machine programming system
JP2826309B2 (ja) 1985-11-20 1998-11-18 NEC Corporation Information processing apparatus
JPH0687221B2 (ja) 1986-04-02 1994-11-02 NEC Corporation Information processing apparatus
JP2545789B2 (ja) 1986-04-14 1996-10-23 Hitachi, Ltd. Information processing apparatus
US4811208A (en) 1986-05-16 1989-03-07 Intel Corporation Stack frame cache on a microprocessor chip
JPS63242243A (ja) 1987-03-31 1988-10-07 Toshiba Corporation Ultrasonic Doppler diagnostic apparatus
US5115500A (en) 1988-01-11 1992-05-19 International Business Machines Corporation Plural incompatible instruction format decode method and apparatus
US5210874A (en) 1988-03-22 1993-05-11 Digital Equipment Corporation Cross-domain call system in a capability based digital data processing system
US5313614A (en) * 1988-12-06 1994-05-17 At&T Bell Laboratories Method and apparatus for direct conversion of programs in object code form between different hardware architecture computer systems
US5187793A (en) * 1989-01-09 1993-02-16 Intel Corporation Processor with hierarchal memory and using meta-instructions for software control of loading, unloading and execution of machine instructions stored in the cache
US4951194A (en) 1989-01-23 1990-08-21 Tektronix, Inc. Method for reducing memory allocations and data copying operations during program calling sequences
US5359507A (en) * 1989-04-07 1994-10-25 Mitsubishi Denki Kabushiki Kaisha Sequence controller
US5142635A (en) 1989-04-07 1992-08-25 Intel Corporation Method and circuitry for performing multiple stack operations in succession in a pipelined digital computer
US5093777A (en) 1989-06-12 1992-03-03 Bull Hn Information Systems Inc. Method and apparatus for predicting address of a subsequent cache request upon analyzing address patterns stored in separate miss stack
JP2818249B2 (ja) * 1990-03-30 1998-10-30 Toshiba Corporation Electronic computer
US5471591A (en) * 1990-06-29 1995-11-28 Digital Equipment Corporation Combined write-operand queue and read-after-write dependency scoreboard
JP3027627B2 (ja) 1991-02-25 2000-04-04 Matsushita Electric Works, Ltd. Arithmetic processor for a programmable controller
US5701417A (en) 1991-03-27 1997-12-23 Microstar Laboratories Method and apparatus for providing initial instructions through a communications interface in a multiple computer system
JP3204323B2 (ja) 1991-07-05 2001-09-04 NEC Micro Systems, Ltd. Microprocessor with built-in cache memory
US5634027A (en) 1991-11-20 1997-05-27 Kabushiki Kaisha Toshiba Cache memory system for multiple processors with collectively arranged cache tag memories
US5274818A (en) * 1992-02-03 1993-12-28 Thinking Machines Corporation System and method for compiling a fine-grained array based source program onto a course-grained hardware
US5438668A (en) 1992-03-31 1995-08-01 Seiko Epson Corporation System and method for extraction, alignment and decoding of CISC instructions into a nano-instruction bucket for execution by a RISC computer
US5522051A (en) * 1992-07-29 1996-05-28 Intel Corporation Method and apparatus for stack manipulation in a pipelined processor
US5471602A (en) * 1992-07-31 1995-11-28 Hewlett-Packard Company System and method of scoreboarding individual cache line segments
US5367650A (en) * 1992-07-31 1994-11-22 Intel Corporation Method and apparauts for parallel exchange operation in a pipelined processor
AU4804493A (en) * 1992-08-07 1994-03-03 Thinking Machines Corporation Massively parallel computer including auxiliary vector processor
DE4306031C2 (de) * 1993-02-26 1995-11-02 Siemens Ag Method for the remote-controlled administration of communication systems
JPH0793216A (ja) 1993-09-27 1995-04-07 Hitachi Ltd Cache storage control device
US5499352A (en) * 1993-09-30 1996-03-12 Intel Corporation Floating point register alias table FXCH and retirement floating point register array
US5548776A (en) * 1993-09-30 1996-08-20 Intel Corporation N-wide bypass for data dependencies within register alias table
JPH07114473A (ja) * 1993-10-19 1995-05-02 Fujitsu Ltd Instruction sequence optimization method for a compiler
US5481684A (en) 1994-01-11 1996-01-02 Exponential Technology, Inc. Emulating operating system calls in an alternate instruction set using a modified code segment descriptor
EP0676691A3 (en) 1994-04-06 1996-12-11 Hewlett Packard Co Device for saving and restoring registers in a digital computer.
US5485572A (en) 1994-04-26 1996-01-16 Unisys Corporation Response stack state validation check
US5481693A (en) 1994-07-20 1996-01-02 Exponential Technology, Inc. Shared register architecture for a dual-instruction-set CPU
US5598546A (en) 1994-08-31 1997-01-28 Exponential Technology, Inc. Dual-architecture super-scalar pipeline
US5636362A (en) 1994-09-28 1997-06-03 Intel Corporation Programmable high watermark in stack frame cache using second region as a storage if first region is full and an event having a predetermined minimum priority
DE4435183C2 (de) * 1994-09-30 2000-04-20 Siemens Ag Method for operating a magnetic resonance apparatus
US6496922B1 (en) 1994-10-31 2002-12-17 Sun Microsystems, Inc. Method and apparatus for multiplatform stateless instruction set architecture (ISA) using ISA tags on-the-fly instruction translation
US5748964A (en) * 1994-12-20 1998-05-05 Sun Microsystems, Inc. Bytecode program interpreter apparatus and method with pre-verification of data type restrictions
US5630066A (en) 1994-12-20 1997-05-13 Sun Microsystems, Inc. System and method for locating object view and platform independent object
US5638525A (en) 1995-02-10 1997-06-10 Intel Corporation Processor capable of executing programs that contain RISC and CISC instructions
US5600726A (en) * 1995-04-07 1997-02-04 Gemini Systems, L.L.C. Method for creating specific purpose rule-based n-bit virtual machines
US5634118A (en) * 1995-04-10 1997-05-27 Exponential Technology, Inc. Splitting a floating-point stack-exchange instruction for merging into surrounding instructions by operand translation
US5862370A (en) * 1995-09-27 1999-01-19 Vlsi Technology, Inc. Data processor system with instruction substitution filter for deimplementing instructions
US6076155A (en) 1995-10-24 2000-06-13 S3 Incorporated Shared register architecture for a dual-instruction-set CPU to facilitate data exchange between the instruction sets
US5657486A (en) * 1995-12-07 1997-08-12 Teradyne, Inc. Automatic test equipment with pipelined sequencer
US5699537A (en) * 1995-12-22 1997-12-16 Intel Corporation Processor microarchitecture for efficient dynamic scheduling and execution of chains of dependent instructions
US5687336A (en) 1996-01-11 1997-11-11 Exponential Technology, Inc. Stack push/pop tracking and pairing in a pipelined processor
US5784553A (en) 1996-01-16 1998-07-21 Parasoft Corporation Method and system for generating a computer program test suite using dynamic symbolic execution of JAVA programs
US5761408A (en) * 1996-01-16 1998-06-02 Parasoft Corporation Method and system for generating a computer program test suite using dynamic symbolic execution
JP3634379B2 (ja) 1996-01-24 2005-03-30 Sun Microsystems, Inc. Method and apparatus for stack caching
EP0976030B1 (en) * 1996-01-24 2008-07-02 Sun Microsystems, Inc. Instruction folding for a stack-based machine
EP0976029A2 (en) 1996-01-24 2000-02-02 Sun Microsystems, Inc. A processor for executing instruction sets received from a network or from a local memory
US6151703A (en) * 1996-05-20 2000-11-21 Inprise Corporation Development system with methods for just-in-time compilation of programs
US6711667B1 (en) * 1996-06-28 2004-03-23 Legerity, Inc. Microprocessor configured to translate instructions from one instruction set to another, and to store the translated instructions
US5953741A (en) * 1996-11-27 1999-09-14 Vlsi Technology, Inc. Stack cache for stack-based processor and method thereof
US5903761A (en) * 1997-10-31 1999-05-11 Preemptive Solutions, Inc. Method of reducing the number of instructions in a program code sequence
US6205578B1 (en) * 1998-08-14 2001-03-20 Ati International Srl Interpreter for stack-based languages
US6349383B1 (en) * 1998-09-10 2002-02-19 Ip-First, L.L.C. System for combining adjacent push/pop stack program instructions into single double push/pop stack microinstuction for execution
US20050149694A1 (en) * 1998-12-08 2005-07-07 Mukesh Patel Java hardware accelerator using microcode engine
US6332215B1 (en) * 1998-12-08 2001-12-18 Nazomi Communications, Inc. Java virtual machine hardware for RISC and CISC processors
US6338160B1 (en) * 1998-12-08 2002-01-08 Nazomi Communications, Inc. Constant pool reference resolution method
US6826749B2 (en) * 1998-12-08 2004-11-30 Nazomi Communications, Inc. Java hardware accelerator using thread manager
GB2367654B (en) * 2000-10-05 2004-10-27 Advanced Risc Mach Ltd Storing stack operands in registers
GB2367653B (en) * 2000-10-05 2004-10-20 Advanced Risc Mach Ltd Restarting translated instructions
GB2369464B (en) * 2000-11-27 2005-01-05 Advanced Risc Mach Ltd A data processing apparatus and method for saving return state
US7076771B2 (en) * 2000-12-01 2006-07-11 Arm Limited Instruction interpretation within a data processing system
GB2376097B (en) * 2001-05-31 2005-04-06 Advanced Risc Mach Ltd Configuration control within data processing systems
GB2376100B (en) * 2001-05-31 2005-03-09 Advanced Risc Mach Ltd Data processing using multiple instruction sets
US6832307B2 (en) * 2001-07-19 2004-12-14 Stmicroelectronics, Inc. Instruction fetch buffer stack fold decoder for generating foldable instruction status information

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"UP POPS A 32BIT STACK MICROPROCESSOR", ELECTRONIC ENGINEERING, vol. 61, no. 750, June 1989 (1989-06-01), pages 79, XP000033120 *
ATKINSON R R ET AL: "THE DRAGON PROCESSOR", SECOND INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOS II), no. 1987, 5 October 1987 (1987-10-05), PALO ALTO, CALIFORNIA, US, pages 65 - 69, XP000042867 *
BURNLEY P: "CPU ARCHITECTURE FOR REALTIME VME SYSTEMS", MICROPROCESSORS AND MICROSYSTEMS, LONDON, GB, vol. 12, no. 3, April 1988 (1988-04-01), pages 153 - 158, XP000002633 *
LOPRIORE L: "LINE FETCH/PREFETCH IN A STACK CACHE MEMORY", MICROPROCESSORS AND MICROSYSTEMS, vol. 17, no. 9, 1 November 1993 (1993-11-01), pages 547 - 555, XP000413173 *
STANLEY ET AL.: "A performance analysis of automatically managed top of stack buffers", 14TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, 2 June 1987 (1987-06-02), PITTSBURGH, US, pages 272 - 281, XP002032257 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1019829A1 (en) * 1997-09-30 2000-07-19 Idea Corporation Method and apparatus for transferring data between a register stack and a memory resource
EP1019829A4 (en) * 1997-09-30 2003-04-16 Idea Corp METHOD AND DEVICE FOR TRACING DATA BETWEEN A REGISTER STACK AND A STORAGE SOURCE
EP1124362A2 (en) * 2000-01-19 2001-08-16 Wiznot Corp. Apparatus for processing TCP/IP by hardware, and operating method therefor
EP1124362A3 (en) * 2000-01-19 2004-01-07 Wiznot Corp. Apparatus for processing TCP/IP by hardware, and operating method therefor
WO2002045385A2 (en) * 2000-11-20 2002-06-06 Zucotto Wireless, Inc. Methods and devices for caching method frame segments in a low-power stack-based processor
WO2002045385A3 (en) * 2000-11-20 2003-09-12 Zucotto Wireless Inc Methods and devices for caching method frame segments in a low-power stack-based processor
EP1387274A2 (en) * 2002-07-31 2004-02-04 Texas Instruments Incorporated Memory management for local variables
EP1387274A3 (en) * 2002-07-31 2004-08-11 Texas Instruments Incorporated Memory management for local variables
US7203797B2 (en) 2002-07-31 2007-04-10 Texas Instruments Incorporated Memory management of local variables
US8516589B2 (en) 2009-04-07 2013-08-20 Samsung Electronics Co., Ltd. Apparatus and method for preventing virus code execution

Also Published As

Publication number Publication date
US6532531B1 (en) 2003-03-11
US6961843B2 (en) 2005-11-01
KR20050052529A (ko) 2005-06-02
US20030115238A1 (en) 2003-06-19
JP3634379B2 (ja) 2005-03-30
US20050267996A1 (en) 2005-12-01
DE69734399D1 (de) 2006-03-02
JP2000513464A (ja) 2000-10-10
US20070277021A1 (en) 2007-11-29
US20030200351A1 (en) 2003-10-23
US6950923B2 (en) 2005-09-27
EP0976034A1 (en) 2000-02-02
EP0976034B1 (en) 2005-10-19
KR100584964B1 (ko) 2006-05-29

Similar Documents

Publication Publication Date Title
EP0976034B1 (en) Method and apparatus for stack caching
EP0976030B1 (en) Instruction folding for a stack-based machine
US5925123A (en) Processor for executing instruction sets received from a network or from a local memory
US6038643A (en) Stack management unit and method for a processor having a stack
EP0976050A1 (en) Processor with accelerated array access bounds checking
US6076141A (en) Look-up switch accelerator and method of operating same
US6148391A (en) System for simultaneously accessing one or more stack elements by multiple functional units using real stack addresses
US5970242A (en) Replicating code to eliminate a level of indirection during execution of an object oriented computer program
US7080362B2 (en) Java virtual machine hardware for RISC and CISC processors
US6065108A (en) Non-quick instruction accelerator including instruction identifier and data set storage and method of implementing same
Yellin et al. The java virtual machine specification
KR100618718B1 (ko) Caching method and apparatus in a stack memory structure

Legal Events

Date Code Title Description
AK Designated states (Kind code of ref document: A1; Designated state(s): CN JP KR)
AL Designated countries for regional patents (Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE)
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase (Ref document number: 1019980705675; Country of ref document: KR)
WWE WIPO information: entry into national phase (Ref document number: 1997904010; Country of ref document: EP)
WWP WIPO information: published in national office (Ref document number: 1019980705675; Country of ref document: KR)
WWP WIPO information: published in national office (Ref document number: 1997904010; Country of ref document: EP)
WWR WIPO information: refused in national office (Ref document number: 1019980705675; Country of ref document: KR)
WWG WIPO information: grant in national office (Ref document number: 1997904010; Country of ref document: EP)