WO2001084305A1 - Processor architecture having an alu, a java stack and multiple stackpointers - Google Patents

Processor architecture having an ALU, a Java stack and multiple stackpointers

Info

Publication number
WO2001084305A1
Authority
WO
WIPO (PCT)
Prior art keywords
stack
data
operand
recited
processor
Application number
PCT/US2001/006813
Other languages
French (fr)
Inventor
Lonnie C. Goff
David R. Evoy
Satyendra S. Sethi
Original Assignee
Koninklijke Philips Electronics N.V.
Philips Semiconductors, Inc.
Application filed by Koninklijke Philips Electronics N.V., Philips Semiconductors, Inc. filed Critical Koninklijke Philips Electronics N.V.
Priority to EP01914655A priority Critical patent/EP1281120A1/en
Priority to JP2001580661A priority patent/JP2003532221A/en
Publication of WO2001084305A1 publication Critical patent/WO2001084305A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30003 Arrangements for executing specific machine instructions
    • G06F 9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30098 Register arrangements
    • G06F 9/3012 Organisation of register space, e.g. banked or distributed register file
    • G06F 9/30134 Register stacks; shift registers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline, look ahead

Definitions

  • the operand stack 110 is shown, in one embodiment, including four banks 110a-110d. Each of the banks 110a-110d is preferably 32 bits wide.
  • the stackpointers 105 are 32 bit centric and are used to select which of the banks 110a-110d of the operand stack 110 will receive data and will also define which data within the banks 110a-110d will be passed to the ALU 106.
  • the stackpointers 105c-f define data locations for the last four entries that were pushed down into the operand stack 110.
  • the Stack/ALU controller 104 controls all operations and manages all functionality within the Stack/ALU subsystem 102.
  • the Stack/ALU controller 104 contains six stackpointers 105 which allow for flexible and efficient access to the stack.
  • a stackpointer 105a (stackpointer +1) and a stackpointer 105b (stackpointer) are defined above the stackpointers 105c-f.
  • Stackpointer +1 105a is configured to address the top of the stack plus one.
  • Stackpointer 105b always addresses the top of the stack.
  • the stackpointers 105c-f address the top of the stack -1, -2, -3, and -4, as described above. During an access, each of the stackpointers 105c-f can define which ones or all of the banks 110a-d are to be gated.
  • the Stack/ALU subsystem 102 therefore does not waste time by only reading one 32 bit entity per cycle and then adjusting stackpointers to read a next 32 bit entity.
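The six-pointer arrangement described in these passages can be sketched as a small software model. The class below is a hypothetical illustration, not the patent's hardware: it only shows that all six stackpointers are fixed offsets from the top of the stack, so a single push implicitly updates every pointer at once.

```python
class StackPointers:
    """Toy model of the six stackpointers in the Stack/ALU controller 104:
    stackpointer +1, stackpointer (top of stack), and stackpointer -1
    through stackpointer -4, which track the most recently pushed entries.
    This interface is invented for illustration only."""

    def __init__(self):
        self.top = -1  # index of the top-of-stack entry; -1 when empty

    def push(self, count=1):
        # Pushing adjusts every pointer implicitly, since all six are
        # fixed offsets from the top of the stack.
        self.top += count

    def view(self):
        # Map each pointer's offset to the stack index it addresses.
        return {offset: self.top + offset for offset in (1, 0, -1, -2, -3, -4)}

sp = StackPointers()
sp.push(5)            # five 32 bit entries pushed onto the operand stack
pointers = sp.view()  # stackpointer addresses entry 4; sp-1..sp-4 trail it
print(pointers)
```

Because the pointers are plain offsets, no data moves when the stack "adjusts" them; only the addressing changes, which matches the note later in the text that the location of the data in the operand stack does not change.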
  • a bus master interface unit (BMIU) 130 is also shown.
  • the BMIU 130 is configured to manage the communication between the system bus 150 and the Stack/ALU subsystem 102. For example, the BMIU 130 can control addressing 142 and read/writes 144 to the Stack/ALU subsystem 102. Also shown is a halt 145 command, which can be communicated to the Stack/ALU subsystem 102.
  • a read/write stack (RD/WRT STK) signal 126 is also provided.
  • the RD/WRT STK 126 is, in one embodiment, defined by function code which will be discussed further with reference to Figure 2.
  • the RD/WRT STK 126 allows the host processor to read and write to and from an operand stack 110.
  • a wait 136 command and ALU CCs (condition codes) 134 are shown as signals coming from the Stack/ALU subsystem 102.
  • a bus 140 receives output from the output 116, which is routed to the BMIU 130 along path 132.
  • the following is an example of the efficient operation of the Stack/ALU subsystem 102.
  • data, which is 32 bits wide, is sent to the Stack/ALU subsystem 102 via the bus 120.
  • the stackpointers 105c-f direct the data as 32 bit portions into each of banks 110a, 110b, 110c and 110d.
  • the function code 124, which in one example may define an ADD operation, is passed to the Stack/ALU controller 104, where the function code 124 is decoded.
  • a first cycle starts where the Stack/ALU controller 104 directs the passing of data from the operand stack 110 through the output mux 112 to the ALU 106.
  • the data passed from the output mux 112 to the ALU 106 can be up to 128 bits wide (e.g., multiple 32 bit words). This is contrary to the prior art, which only transfers data in 32 bit wide increments (e.g., only one word at a time) and in which the transfer of more than one 32 bit portion is completed over a number of cycles without the use of stackpointers.
  • the Stack/ALU controller 104 transfers the function code to the ALU 106.
  • the ALU 106 can therefore perform the desired operation (e.g., which can be any arithmetic operation commonly performed by processors, Java processors, or the like) as specified by the function code 124 on the data to produce a result.
  • the ALU 106 can then signal the Stack/ALU controller 104 informing that the operation is complete, and the Stack/ALU controller 104 transfers the result from the ALU 106 back to the operand stack 110.
  • a return path 114 from the ALU 106 to the operand stack 110 is defined by two 32 bit wide paths 114a and 114b (i.e., a 64 bit wide path). This ensures that the result can be stored during the same single access operation. It should be noted that when the result is stored back to the operand stack 110, the stackpointers 105e (stackpointer -3) and 105f (stackpointer -4) define the location of the result within the operand stack 110.
  • the stackpointers 105 are then readjusted after the result is read back into the operand stack 110 in a second cycle.
  • the stackpointers are readjusted such that the stackpointers 105c (stackpointer -1) and 105d (stackpointer -2) indicate the location of the data. It should be noted that the location of the data within the operand stack 110 has not changed.
  • the instruction to readjust the stackpointers is part of the function code 124.
  • the output mux 112 pops out 32 bit wide results.
  • the instruction to pop out the result from the operand stack 110 is written into the function code 124 and is part of standard Java code.
  • the function code is sent to the stack/ALU subsystem 102 by a decode/execute subsystem. When the function code is sent to the stack/ALU subsystem 102, this causes the appropriate data inside the operand stack 110 to get gated onto the 'B Bus'.
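The ADD walk-through above can be traced with a short simulation. This is a hedged sketch: the bank contents, the layout of the two 64 bit operands, and the choice of write-back banks are illustrative assumptions rather than the patent's actual circuitry; it only demonstrates the two-access pattern (one parallel read, one write of the result).

```python
MASK32 = 0xFFFFFFFF

banks = [0x00000001, 0x00000002, 0x00000003, 0x00000004]  # banks 110a-110d

# Cycle 1: one 128 bit read gates all four banks to the ALU at once.
operand_a = (banks[0] << 32) | banks[1]   # first 64 bit entry (banks 110a/110b)
operand_b = (banks[2] << 32) | banks[3]   # second 64 bit entry (banks 110c/110d)
result = (operand_a + operand_b) & 0xFFFFFFFFFFFFFFFF  # the 64 bit ADD

# Cycle 2: the 64 bit result returns over the two 32 bit paths 114a/114b
# into the banks that stackpointer -3 and stackpointer -4 point at.
banks[2], banks[3] = (result >> 32) & MASK32, result & MASK32

accesses = 2  # one read access plus one write access in total
print(hex(result), accesses)
```

The stackpointers are then readjusted in software terms simply by changing which offsets name the result; the bank contents themselves stay put, as the surrounding text notes.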
  • Figure 2 is one embodiment of the present invention illustrating the stackpointers 105 transferring data from the operand stack 110 to ALU slots 112a-d.
  • a function code register 202 holds the operation to be performed by the stack/ALU subsystem 102. This information is derived from a Java instruction by the decode/execute subsystem.
  • the decode/execute subsystem is in communication with the stack/ALU subsystem 102, and its functionality is known to those skilled in the art.
  • the function code, in one embodiment, requests that an operation be performed on data within the operand stack 110. For example, if an ADD operation is desired, the function code register 202 is written so that an ADD operation is directed.
  • the function code register 202 is shown in communication with a function de-code table 204.
  • the function de-code table 204 contains decode signals (e.g., ADDs, long divide, etc.). These decode signals 204a therefore define, in one embodiment, the function to be performed.
  • the function de-code table 204 communicates decode signals 204a to a bank selector mux 206.
  • the stackpointers 105 communicate the bank address information of the data within the operand stack 110 to the bank selector mux 206.
  • the stackpointers 105 contain address bits 105c-1 through 105f-1 which specify the bank addresses of the data within the operand stack 110.
  • the bank selector mux 206 takes the information from the decode signals 204a and the address bits 105c-1 through 105f-1 and generates slot selection signals 206a.
  • the slot selection signals 206a direct to which ALU slots 112a-d of the output mux 112 the data will be sent for processing within the ALU 106.
  • the slot selection signals 206a also tell the input mux 108 the target location in the operand stack 110 which will contain the result of the ALU operation. It should be noted that data may be taken from any of the banks 110a-d regardless of which order the data is in the operand stack 110.
  • a Java instruction may specify that an ADD operation be performed for a set of data contained in the operand stack 110.
  • the decode of the Java instruction causes a function code requesting the ADD operation to get loaded.
  • the decode signals 204a are generated and sent to the bank selector mux 206.
  • the stackpointers 105 send information indicating the bank address of data within the operand stack 110.
  • the stackpointer 105f specifies the bank location of the set of data for the requested ADD operation with address bits 105f-1.
  • the bank selector mux 206 generates slot signals 206a and sends them to the output mux 112.
  • the slot signals 206a tell the output mux 112 the bank address of the set of data.
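The bank selector behavior sketched in Figure 2's discussion can be modeled roughly as follows. The operation names, widths, and slot naming below are invented for illustration; the patent specifies only that decode signals 204a and the stackpointer address bits combine into slot selection signals 206a.

```python
def bank_selector(decode_signal, pointer_bank_addrs):
    """Toy model of the bank selector mux 206: combine a decoded
    operation with the bank addresses supplied by the stackpointers
    and produce slot selections for ALU slots 112a-112d.
    The operation table is a hypothetical example, not the patent's."""
    width = {"ADD32": 1, "ADD64": 2, "LDIV": 4}[decode_signal]
    # Route one stackpointer-selected bank to each ALU slot the
    # operation needs; unused slots receive no selection.
    return {f"slot_{chr(ord('a') + i)}": pointer_bank_addrs[i]
            for i in range(width)}

# A 64 bit ADD consumes two banks, so two slot selections are generated.
print(bank_selector("ADD64", [3, 2, 1, 0]))
```

Widening the operation (e.g., a long divide consuming all four banks) produces more slot selections from the same pointer inputs, which is the flexibility the multiplexed design is aiming at.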
  • the present invention provides many benefits to users.
  • the present invention now allows operations to be performed through a stack processor in fewer accesses as well as more efficiently. At most, two accesses are required, one for reading and one for writing.
  • the data path from the operand stack to the ALU is 128 bits wide, therefore two 64 bit entries may be read in a single access.
  • in the prior art, the data path was either 32 bits wide or 64 bits wide. In either case, multiple accesses were required to read the data from the operand stack into the ALU for 128 bit entities, thereby increasing processing time.
  • the stackpointers avoid wasting expensive storage space within the stacks.
  • the stackpointers establish selectable banks whereby 32 bit wide storage areas are created in the 128 bit wide stack.
  • up to four separate data entries that are 32 bits wide are capable of being stored in one 128 bit wide stack.
  • one 32 bit data entry will not waste storage space in a 128 bit wide stack, thereby using the storage space of the stack more efficiently.
  • various computer-implemented operations involving data stored in computer systems to drive computer peripheral devices may be employed. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared and otherwise manipulated. Further, the manipulations performed are often referred to in terms such as ascertaining, identifying, scanning, or comparing.
  • any of the operations described herein that form part of the invention are useful machine operations. Any appropriate device or apparatus may be utilized to perform these operations.
  • the apparatus may be specially constructed for the required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer.
  • various general purpose machines may be used with computer programs written in accordance with the teachings herein, where it may be more convenient to construct a more specialized apparatus to perform the required operations.

Abstract

A stack-based processor architecture and associated method for processing data through a processor is provided. First data which is to be processed is written to an operand stack. The location of the data within the operand stack is identified using stackpointers that are contained in a stack/ALU controller of the stack processor. The stackpointers identify banks within the operand stack where data may be stored. After the location of the data is identified, a parallel transfer of data selected using the stackpointers and a function code is generated. The selected data is transferred from the operand stack to an arithmetic logic unit for processing in accordance with instructions defined by the function code. The result can then be efficiently transferred back to the operand stack and read out when desired.

Description

PROCESSOR ARCHITECTURE HAVING AN ALU, A JAVA STACK AND MULTIPLE STACKPOINTERS
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the execution unit inside a Java™ stack-based processor. More particularly, the present invention relates to processing data through stack processing hardware structures in arrangements that provide more efficient exchange of data between processing and storage components.
2. Description of the Related Art
As computer processors begin to inundate the marketplace, there is a push for processors having greater processing capabilities. Today's processors handle more complex tasks which require greater processing speed. In addition, these complex tasks require processors that utilize their internal storage capabilities more efficiently. Well known processors include stack-based machines that perform arithmetic and logical operations on data retrieved from an operand stack. One such stack-based processor is implemented by the Java Virtual Machine (JVM). The JVM, which is commonly in the form of a computer model, supports the execution of the Java language. Although the JVM works well for some less demanding applications, there has been a push to implement some of the JVM in hardware to improve performance. Namely, the stack processor has been implemented in hardware, although the implementations have been less than efficient in view of the increased processing demands. For example, common stack-based processors are constructed with either 32 bit or 64 bit wide stacks. If a 32 bit wide stack-based processor is used, and there is a desire to write a 64 bit wide entry, two accesses are required to read the data from the operand stack to an ALU, and two accesses are required to store the result back into the stack. That is, two 32 bit reads are required to read the data from the stack and an additional two 32 bit writes are required to store the result back into the stack. If a stack-based processor that is 64 bits wide contains data that spans two 64 bit entries, again a total of four accesses would be required to read the data from the stack and then store the result in the operand stack. Meaning, two 64 bit reads are required to read the data from the operand stack and two 64 bit writes are required to store the result in the operand stack. Accordingly, the multiple accesses required to process data in current stack-based processors increase the overall processing time of data and decrease the overall efficiency of an implementation using the stack-based processor.
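The access counts in the preceding paragraph follow from simple arithmetic, which the helper below tallies (the function and its name are illustrative, not from the patent):

```python
def accesses(stack_width_bits, data_bits):
    """Accesses needed to read the operands and write back the result
    when each access moves at most one stack-width word, as in the
    prior-art stacks described above."""
    words = -(-data_bits // stack_width_bits)  # ceiling division
    return words + words  # reads for the operands plus writes for the result

# A 64 bit entry on a 32 bit wide stack: two reads plus two writes.
print(accesses(32, 64))    # 4
# Two 64 bit entries (128 bits of data) on a 64 bit wide stack: again four.
print(accesses(64, 128))   # 4
# The 128 bit wide path of the disclosure: one read plus one write.
print(accesses(128, 128))  # 2
```

The last case is the claimed improvement: a data path as wide as the full operand fits the whole transfer into a single read access and a single write access.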
Also, current 64 bit wide stacks do not allow for efficient allocation of data that is less than 64 bits wide. For example, 32 bit wide data will occupy a row of stack storage area capable of holding 64 bits. That is, the remaining 32 bits of storage in the row will be wasted. Therefore, this type of implementation is highly undesirable due to the waste of expensive storage space.
In addition to the aforementioned stack-based processors, there are currently three basic techniques to enhance performance: full hardware interpretation of Java instructions, native Java and compilation. The hardware interpretation approach uses hardware to improve the interpretive process. A host's hardware translates the Java instructions into machine code. The translated machine code is compatible with the industry standard processor of the hardware that is modeling the Java machine. This presents two fundamental problems. First, regardless of whether the translation is performed by software interpretation (the typical situation) or hardware translation, this takes time. Once the result is generated, the translation must be executed. Thus, hardware translation is a two-step operation (i.e., translation followed by execution). The second problem associated with the hardware translation of Java is architectural. If the target processor is not a stack-based machine (as is the case with most machines), the translation process becomes more complex. As a result, each Java instruction will explode into multiple target processor instructions so that an artificial (i.e., virtual) stack can be emulated.
The native Java processor creates a real stack-based processor in hardware. However, the real stack-based processor is very complex and does not enable use of most software available for standard processors (e.g., ARM, MIPS, x86, etc.).
Compilation is another technique used to enhance performance. Compilation eliminates the interpreter and compiles Java directly to a specified machine code. Nonetheless, the resulting executable file is no longer portable. The resources required to perform Java compilation are expensive and unavailable at times (e.g., handheld personal digital assistants and mobile phones do not have sufficient memory to support a compiler). In addition, the time required to perform the compilation will cause excessive initial delays (i.e., real-time responses are not possible at startup).
In view of the foregoing, there is a need for a stack-based processing architecture which avoids the problems of the prior art. In addition, the disclosed architecture and associated method enable efficient use of storage space within an operand stack while avoiding the problems of the prior art.
SUMMARY OF THE INVENTION
Broadly speaking, the present invention fills these needs by providing a method and hardware for efficiently processing data through a stack-based processor. It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, a device or a method. Several inventive embodiments of the present invention are described below.
In one embodiment, a method for processing data through a stack processor is disclosed. The method comprises writing data to an operand stack and identifying locations for the data within the operand stack. The locations of the data within the operand stack are identified using one or more stackpointers. Once the stackpointers locate the data within the operand stack, a parallel transfer of data to an arithmetic logic unit is done. The transferred data is selected using the stackpointers and a function code. The transferred data is then processed in an arithmetic logic unit in accordance with instructions defined by the function code.
In another embodiment, a stack processing subsystem for processing data is disclosed. The stack processing subsystem comprises an operand stack located within the stack processing subsystem, where the operand stack has banks capable of storing the data. The stack processing subsystem also includes stackpointers that are configured to define locations of the data within the banks of the operand stack. An arithmetic logic unit interfaces with the operand stack such that parallel word transfers of the data can be executed from the operand stack to the arithmetic logic unit. In addition, the data to be transferred in the parallel word transfers is defined using the stackpointers and function code. The function code defines a particular instruction to be processed by the arithmetic logic unit.
In yet another embodiment, a method for performing operations on data using a stack processor is disclosed. The data is placed into an operand stack contained within the stack processor. The location of the data within the operand stack is tracked by stackpointers also located in the stack processor. After the data is placed into the operand stack, the data is transferred from the operand stack to an arithmetic logic unit. The data is capable of being transferred in 128 bit increments from the operand stack to the arithmetic logic unit. Once the data is transferred to the arithmetic logic unit, the data is processed according to instructions from function code.
The many advantages of the present invention should be recognized. In accordance with embodiments of the invention, the operand stack and the arithmetic logic unit are integrated into a single unit. In addition, the present invention facilitates more efficient use of a 128 bit wide operand stack by dividing the stack into banks. In the past, if data less than the data width of the stack was to be stored in the stack (i.e., 32 bit data in a 64 bit stack), the remaining space was wasted (i.e., the remaining 32 bits in the 64 bit stack were wasted). In the present invention, the stackpointers allow multiple data sets to be placed in the same row (i.e., two separate 64 bit entries on the same row) as opposed to placing the data on separate rows (i.e., placing two separate 64 bit entries on separate 128 bit rows). The present invention also allows full processing of data 128 bits wide in only two accesses for simple arithmetic operations (i.e., one read access and one write access). Thus, the processing time of Java stack-based processors is greatly reduced since fewer accesses and more efficient processing of data are made possible by the disclosed and claimed structure and methods.
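The row-packing advantage described above can be quantified with a small sketch. The layout helper below is an assumption for illustration; it only shows how bank-granular placement reduces the number of 128 bit rows consumed compared with one-entry-per-row storage.

```python
ROW_BITS, BANK_BITS = 128, 32
BANKS_PER_ROW = ROW_BITS // BANK_BITS  # four 32 bit banks per row

def rows_needed(entry_bits, count, banked):
    """Rows consumed by `count` equal-sized entries.
    banked=True packs entries into free banks on the same row, as the
    stackpointers permit; banked=False gives every entry its own row,
    the wasteful prior-art layout. Hypothetical helper, not the patent's."""
    banks_per_entry = -(-entry_bits // BANK_BITS)  # ceiling division
    if banked:
        total_banks = banks_per_entry * count
        return -(-total_banks // BANKS_PER_ROW)
    return count

# Two separate 64 bit entries: one shared 128 bit row instead of two rows.
print(rows_needed(64, 2, banked=True), rows_needed(64, 2, banked=False))
# Four 32 bit entries fit in a single 128 bit row.
print(rows_needed(32, 4, banked=True))
```

Halving (or quartering) the rows consumed is exactly the storage saving the stackpointer banks are described as providing.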
Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings. In the drawings, like reference numerals designate like structural elements.
Figure 1 shows a Java stack subsystem in accordance with one embodiment of the present invention.
Figure 2 is one embodiment of the present invention illustrating the stackpointers being implemented to direct transfers of data from the operand stack to particular ALU slots.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
An architecture for an integrated stack/ALU subsystem and an associated method for processing data through the stack subsystem is disclosed. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be understood, however, by one skilled in the art, that the present invention may be practiced without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.

Figure 1 shows an overview 100 of a Java™ stack subsystem 102 in accordance with one embodiment of the present invention. The Java stack subsystem 102 is the execution unit for the Java processor. The Java stack subsystem 102 includes a 32 bit wide bus 120 which facilitates communication with other subsystems and a system bus 150. In one embodiment, the bus 120 is enabled to provide data to be loaded into an input multiplexer (input mux) 108. The input mux 108 directs the 32 bit data from the bus 120 into an operand stack 110.
In order to load data into the operand stack 110, the Stack/ALU controller 104 decodes function code 124. The function code 124 provides information to the Stack/ALU controller 104 regarding what operations are to be performed on the data stored in the operand stack 110. For example, the function code 124 may call for an ADD operation to be performed on a given set of data. In this case, the function code 124 will direct an arithmetic logic unit (ALU) 106 to perform the ADD operation on the data to be read from the operand stack 110. The function code 124 therefore causes stackpointers 105c-f (i.e., stackpointer -1, stackpointer -2, stackpointer -3 and stackpointer -4) to define the location of the desired data in the operand stack 110. The desired data can then be retrieved by the ALU 106 by way of the output multiplexer 112 using a 128 bit wide bus. Thus, data from each portion of the operand stack 110 (e.g., each 32 bit portion) can be read by the ALU 106 during a single access.
The operand stack 110 is shown, in one embodiment, including four banks 110a-110d. Each of the banks 110a-110d is preferably 32 bits wide. The stackpointers 105 are 32 bit centric and are used to select which of the banks 110a-110d of the operand stack 110 will receive data and will also define which data within the banks 110a-110d will be passed to the ALU 106. The stackpointers 105c-f define data locations for the last four entries that were pushed down into the operand stack 110. In a preferred embodiment, the Stack/ALU controller 104 controls all operations and manages all functionality within the Stack/ALU subsystem 102. The Stack/ALU controller 104 contains six stackpointers 105 which allow for flexible and efficient access to the stack. A stackpointer 105a (stackpointer +1) and a stackpointer 105b (stackpointer) are defined above the stackpointers 105c-f. Stackpointer +1 105a is configured to address the top of the stack plus one. Stackpointer 105b always addresses the top of the stack. The stackpointers 105c-f address the top of the stack -1, -2, -3, and -4, as described above. During an access, each of the stackpointers 105c-f can define which one or all of the banks 110a-110d are to be gated. As a result, it is possible to have all four banks 110a-110d gated onto a 128 bit wide bus to the ALU 106 during a single access cycle. The Stack/ALU subsystem 102 therefore does not waste time by only reading one 32 bit entity per cycle and then adjusting stackpointers to read a next 32 bit entity.
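The banked addressing described above can be illustrated with a small behavioral model. This is a sketch only: the class and method names are assumptions made for illustration, not part of the disclosed hardware, and the 16-row depth is arbitrary.

```python
class OperandStack:
    """Behavioral model of a 128 bit wide stack built from four 32 bit banks."""

    def __init__(self, depth=16):
        # banks[b][row] holds one 32 bit word; a row spans all four banks.
        self.banks = [[0] * depth for _ in range(4)]
        self.top = -1  # flat index of the top 32 bit entry (-1 = empty)

    def push(self, word):
        """Place a 32 bit word into the next bank, packing four per row."""
        self.top += 1
        row, bank = divmod(self.top, 4)
        self.banks[bank][row] = word & 0xFFFFFFFF

    def entry(self, offset):
        """Address an entry relative to the top (0 = top; -1, -2, -3 below)."""
        row, bank = divmod(self.top + offset, 4)
        return self.banks[bank][row]

    def gate_top_four(self):
        """Gate the last four entries (the stackpointer -1..-4 region)
        onto the 128 bit wide bus in a single access."""
        return [self.entry(o) for o in (0, -1, -2, -3)]
```

Pushing four 32 bit words and gating them together models the single-cycle access to all four banks described above.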
Also connected to the bus 120 is a bus master interface unit (BMIU) 130. The BMIU 130 is configured to manage the communication between the system bus 150 and the Stack/ALU subsystem 102. For example, the BMIU 130 can control addressing 142 and read/writes 144 to the Stack/ALU subsystem 102. Also shown is a halt 145 command, which can be communicated to the Stack/ALU subsystem 102. A read/write stack (RD/WRT STK) 126 is also shown in communication with the Stack/ALU subsystem 102. The RD/WRT STK 126 is, in one embodiment, defined by function code, which will be discussed further with reference to Figure 2. The RD/WRT STK 126 allows the host processor to read and write to and from an operand stack 110. In addition, a wait 136 command and ALU CCs (condition codes) 134 are shown as signals coming from the Stack/ALU subsystem 102. A bus 140 receives output from the output 116 to the BMIU 130 along path 132.
For purposes of understanding, the following is an example of the efficient operation of the Stack/ALU subsystem 102. Initially, data, which is 32 bits wide, is sent to the Stack/ALU subsystem 102 via the bus 120. The stackpointers 105c-f direct the data as 32 bit portions into each of banks 110a, 110b, 110c and 110d. The function code 124, which in one example may define an ADD operation, is passed to the Stack/ALU controller 104, where the function code 124 is decoded. After the Stack/ALU controller 104 decodes the function code 124, a first cycle starts where the Stack/ALU controller 104 directs the passing of data from the operand stack 110 through the output mux 112 to the ALU 106. In one preferred embodiment, the data passed from the output mux 112 to the ALU 106 can be up to 128 bits wide (e.g., multiple 32 bit words). This is contrary to the prior art, which only transfers data in 32 bit wide increments (e.g., only one word at a time), and in which the transfer of more than one 32 bit portion is completed over a number of cycles without the use of stackpointers. As the data is passed to the ALU 106, the Stack/ALU controller 104 transfers the function code to the ALU 106. The ALU 106 can therefore perform the desired operation (e.g., which can be any arithmetic operation commonly performed by processors, Java processors, or the like) as specified by the function code 124 on the data to produce a result. The ALU 106 can then signal the Stack/ALU controller 104 informing that the operation is complete and the Stack/ALU controller 104 reads the result from the ALU 106 back to the operand stack 110. A return path 114 from the ALU 106 to the operand stack 110 is defined by two 32 bit wide paths 114a and 114b (i.e., a 64 bit wide path). This ensures that the result can be stored during the same single access operation.
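The read-modify-write sequence above amounts to one read access and one write access per simple arithmetic operation. A minimal sketch follows, assuming a flat list stands in for the banked stack and a plain index stands in for the stackpointer; these simplifications are not in the patent:

```python
def add_cycle(stack, sp):
    """One ADD through the Stack/ALU subsystem, modeled behaviorally.

    stack: list of 32 bit words (index 0 = bottom); sp: index of the top entry.
    Returns the new top-of-stack index."""
    # Read access: both operands gated to the ALU in one wide transfer.
    op_a, op_b = stack[sp], stack[sp - 1]
    # The ALU performs the operation named by the function code (ADD here).
    result = (op_a + op_b) & 0xFFFFFFFF
    # Write access: the result returns over the 64 bit wide path and the
    # slot below the old top becomes the new top of stack.
    stack[sp - 1] = result
    return sp - 1

stack = [7, 5, 0, 0]
top = add_cycle(stack, 1)  # two accesses total: one read, one write
```

After the cycle, `stack[top]` holds the 12 produced by adding the two operands, consistent with the two-access limit stated above.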
It should be noted that when the result is stored back to the operand stack 110, the stackpointers 105e (stackpointer -3) and 105f (stackpointer -4) define the location of the result within the operand stack 110.
The stackpointers 105 are then readjusted after the result is read back into the operand stack 110 in a second cycle. The stackpointers are readjusted such that the stackpointers 105c (stackpointer -1) and 105d (stackpointer -2) indicate the location of the data. It should be noted that the location of the data within the operand stack 110 has not changed. The instruction to readjust the stackpointers is part of the function code 124. When a user desires to read the result stored in the operand stack 110, the output mux 112 pops out 32 bit wide results. The instruction to pop out the result from the operand stack 110 is written into the function code 124 and is part of standard Java code. Information that is retrieved from the Java Stack is typically done under the control of a POP instruction. The function code is sent to the stack/ ALU subsystem 102 by a decode/execute subsystem. When the function code is sent to the stack/ALU subsystem 102, this causes the appropriate data inside the operand stack 110 to get gated onto the 'B Bus'.
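The pointer-only readjustment and the subsequent POP can be sketched as follows; the data stays in place and only the index moves. The function and variable names here are illustrative assumptions, not the patent's signal names:

```python
def pop(banks, top_index):
    """Gate the 32 bit word named by the stackpointer onto the bus.

    banks: four lists of 32 bit words; top_index: flat index of the top entry.
    Returns (word gated onto the bus, readjusted top index)."""
    row, bank = divmod(top_index, 4)
    word = banks[bank][row]       # the data itself is left in place
    return word, top_index - 1    # only the stackpointer changes

banks = [[12], [0], [0], [0]]   # a result previously written into bank 0
word, top = pop(banks, 0)
```

Note that `banks` is unchanged by the pop; retrieving the result is purely a matter of which entry the stackpointer gates out.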
Figure 2 is one embodiment of the present invention illustrating the stackpointers 105 transferring data from the operand stack 110 to ALU slots 112a-d. A function code register 202 holds the operation to be performed by the stack/ALU subsystem 102. This information is derived from a Java instruction by the decode/execute subsystem. The decode/execute subsystem is in communication with the stack/ALU subsystem 102, and its functionality is known to those skilled in the art. The function code, in one embodiment, requests that an operation be performed on data within the operand stack 110. For example, if an ADD operation is desired, the function code register 202 is written so that an ADD operation is directed. The function code register 202 is shown in communication with a function de-code table 204. The function de-code table 204, in one embodiment, contains decode signals (e.g., ADDs, long divide, etc.). These decode signals 204a therefore define, in one embodiment, the function to be performed.
The function de-code table 204 communicates decode signals 204a to a bank selector mux 206. The stackpointers 105 communicate the bank address information of the data within the operand stack 110 to the bank selector mux 206. The stackpointers 105 contain address bits 105c-1 through 105f-1 which specify the bank addresses of the data within the operand stack 110. The bank selector mux 206 takes the information from the decode signals 204a and the address bits 105c-1 through 105f-1 and generates slot selection signals 206a. The slot selection signals 206a direct to which ALU slots 112a-d of the output mux 112 the data will be sent for processing within the ALU 106. The slot selection signals 206a also tell the input mux 108 the target location in the operand bank 110 which will contain the result of the ALU operation. It should be noted that data may be taken from any of the banks 110a-d regardless of which order the data is in the operand stack 110.
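A hypothetical model of this selection step follows; the signal encodings and names below are assumptions made for illustration only:

```python
def bank_selector_mux(decode_signal, bank_addrs):
    """Combine a decode signal with stackpointer bank address bits.

    decode_signal: operation name from the function de-code table.
    bank_addrs: one bank number (0-3) per operand, from the stackpointers.
    Returns slot selection signals mapping each ALU slot to a bank."""
    slots = {f"slot_{i}": bank for i, bank in enumerate(bank_addrs)}
    return {"op": decode_signal, "slots": slots}

# Operands may sit in any banks, in any order, as noted above.
sel = bank_selector_mux("ADD", [3, 2])
```

The mapping shows why operand order in the stack does not matter: each ALU slot is simply wired to whichever bank the corresponding stackpointer names.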
For purposes of understanding, the following is an example of the operation of transferring data from the operand stack 110 to the ALU slots 112a-d as discussed with reference to Figure 2. For example, a Java instruction may specify that an ADD operation be performed for a set of data contained in the operand stack 110. The decode of the Java instruction causes a function code requesting the ADD operation to get loaded.
After the function de-code table 204 is accessed, the decode signals 204a are generated and sent to the bank selector mux 206. In addition, the stackpointers 105 send information indicating the bank address of data within the operand stack 110. In the example, the stackpointer 105f specifies the bank location of the set of data for the requested ADD operation with address bits 105f-1. The bank selector mux 206 will generate slot signals 206a and send these to the output mux 112. The slot signals 206a tell the output mux 112 the bank address of the set of data.

As may be appreciated, the present invention provides many benefits to users. The present invention now allows operations to be performed through a stack processor in fewer accesses as well as more efficiently. At most, two accesses are required, one for reading and one for writing. The data path from the operand stack to the ALU is 128 bits wide, therefore two 64 bit entries may be read in a single access. In the past, the data path was either 32 bits wide or 64 bits wide. In either case, multiple accesses were required to read the data from the operand stack into the ALU for 128 bit entities, thereby increasing processing time.
In addition, the stackpointers avoid wasting expensive storage space within the stacks. The stackpointers establish selectable banks whereby 32 bit wide storage areas are created in the 128 bit wide stack. Thus, up to four separate data entries that are 32 bits wide are capable of being stored in one 128 bit wide stack. As opposed to the prior art, one 32 bit data entry will not waste storage space in a 128 bit wide stack, thereby using the storage space of the stack more efficiently.
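The storage saving can be made concrete with a small calculation (a sketch only; the patent does not state this formula explicitly):

```python
def rows_needed(entry_bits, entry_count, banked=True):
    """Rows of a 128 bit wide stack consumed by entry_count entries."""
    row_bits = 128
    if banked:
        per_row = row_bits // entry_bits   # four 32 bit or two 64 bit entries
        return -(-entry_count // per_row)  # ceiling division
    return entry_count                     # one entry per row, rest wasted
```

For example, four 32 bit entries fit in a single banked row, where an unbanked 128 bit wide stack would spend four rows on them.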
The present invention may be implemented using an appropriate type of software driven computer-implemented operation. As such, various computer-implemented operations involving data stored in computer systems to drive computer peripheral devices (i.e., in the form of software drivers) may be employed. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared and otherwise manipulated. Further, the manipulations performed are often referred to in terms such as ascertaining, identifying, scanning, or comparing.
Any of the operations described herein that form part of the invention are useful machine operations. Any appropriate device or apparatus may be utilized to perform these operations. The apparatus may be specially constructed for the required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the teachings herein, where it may be more convenient to construct a more specialized apparatus to perform the required operations.
Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
What is claimed is:

1. A method for processing data through a stack processor, comprising: writing data to an operand stack; identifying locations for the data within the operand stack using one or more stackpointers; and generating a parallel transfer of data selected using the stackpointers and a function code from the operand stack to an arithmetic logic unit for processing in accordance with instructions defined by the function code.
2. A method for processing data through a stack processor as recited in claim
1, wherein the parallel transfer includes one of moving 64 bits in parallel, 96 bits in parallel, and 128 bits in parallel.
3. A method for processing data through a stack processor as recited in claim 1, wherein the parallel transfer provides the selected data to the arithmetic logic unit for parallel word processing.
4. A method for processing data through a stack processor as recited in claim 1, further comprising: generating a result at the arithmetic logic unit; transferring the result as one of a single word and a multiple word result back to the operand stack, and one or more of the stackpointers being configured to define a location in the operand stack for the result.
5. A method for processing data through a stack processor as recited in claim 4, wherein the parallel transfer provides the selected data to the arithmetic logic unit for parallel word processing.
6. A method for processing data through a stack processor as recited in claim
1, wherein the generating of the parallel transfer of data selected using the stackpointers and a function code from the operand stack to the arithmetic logic unit further comprises: transferring stackpointer address information and decode signals to a selector multiplexer; and communicating slot selection signals received from the selector multiplexer to an output multiplexer, the output multiplexer being configured to select particular data from the operand stack.
7. A method for processing data through a stack processor as recited in claim 1, wherein the stack processor is a Java stack processor.
8. A method for processing data through a stack processor as recited in claim 1, wherein the operand stack is divided into a plurality of banks, and each of the plurality of banks having a 32 bit word width.
9. A method for processing data through a stack processor as recited in claim 1, wherein the identifying and generating is controlled by a Stack/ ALU controller.
10. A stack processing subsystem for processing data, comprising: an operand stack having banks which are capable of storing the data, wherein the operand stack is located within the stack processing subsystem; stackpointers in the stack processing subsystem, the stackpointers being configured to define locations of the data within the banks of the operand stack; and an arithmetic logic unit interfaced with the operand stack such that parallel word transfers of the data can be executed from the operand stack to the arithmetic logic unit, wherein the data to be transferred in the parallel word transfers is defined using the stackpointers and function code, the function code defining a particular instruction to be processed by the arithmetic logic unit.
11. A stack processor for processing data as recited in claim 10, wherein the parallel transfer includes one of moving 64 bits in parallel, 96 bits in parallel, and 128 bits in parallel.
12. A stack processor for processing data as recited in claim 10, wherein the stackpointers in the stack processor define locations of the processed data in the operand stack.
13. A stack processor for processing data as recited in claim 10, further including: a bank selector multiplexer that accepts bank address information from the stack pointers and decode signals from the function code, such that the bank selector multiplexer sends slot selection signals to an output multiplexer.
14. A stack processor for processing data as recited in claim 13, wherein the output multiplexer selects particular data from the operand stack.
15. A stack processor for processing data as recited in claim 10, wherein the operand stack is divided into a plurality of banks, with each of the plurality of banks having a 32 bit word width.
16. A stack processor for processing data as recited in claim 10, wherein the operand stack is hardware logic.
17. A stack processor for processing data as recited in claim 10, wherein the stackpointers are hardware logic.
18. A stack processor for processing data as recited in claim 10, wherein the arithmetic logic unit is hardware logic.
19. A stack processor for processing data as recited in claim 10, further comprising: an input multiplexer proximately located to the operand stack.
20. A stack processor for processing data as recited in claim 14, wherein the output multiplexer is located proximately to the operand stack and the arithmetic logic unit.
21. A stack processor for processing data as recited in claim 10, wherein the stackpointers are located within a Stack/ALU controller.
22. A method for performing operations on data using a stack processor, the method comprising: placing the data into an operand stack, wherein the operand stack is contained within the stack processor; tracking the location of the data within the operand stack with stackpointers, wherein the stackpointers are located in the stack processor; transferring the data from the operand stack to an arithmetic logic unit, wherein the data is capable of being transferred in 128 bit increments; and processing the data in the arithmetic logic unit according to instructions from function code.
23. A method for performing operations on data using a stack processor as recited in claim 22, further comprising: generating a result at the arithmetic logic unit; transferring the result as one of a single word and a multiple word result back to the operand stack; and one or more of the stackpointers being configured to define a location in the operand stack for the result.
24. A method for performing operations on data using a stack processor as recited in claim 22, wherein transferring the data from the operand stack to an arithmetic logic unit further includes: transferring stackpointer address information and decode signals to a selector multiplexer; and communicating slot selection signals received from the selector multiplexer to an output multiplexer, the output multiplexer being configured to select particular data from the operand stack.
PCT/US2001/006813 2000-05-04 2001-03-01 Processor architecture having an alu, a java stack and multiple satckpointers WO2001084305A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP01914655A EP1281120A1 (en) 2000-05-04 2001-03-01 Processor architecture having an alu, a java stack and multiple stackpointers
JP2001580661A JP2003532221A (en) 2000-05-04 2001-03-01 Processor architecture with ALU, Java stack and multiple stack pointers

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US56567900A 2000-05-04 2000-05-04
US09/565,679 2000-05-04

Publications (1)

Publication Number Publication Date
WO2001084305A1 true WO2001084305A1 (en) 2001-11-08

Family

ID=24259660

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/006813 WO2001084305A1 (en) 2000-05-04 2001-03-01 Processor architecture having an alu, a java stack and multiple satckpointers

Country Status (3)

Country Link
EP (1) EP1281120A1 (en)
JP (1) JP2003532221A (en)
WO (1) WO2001084305A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4247920A (en) * 1979-04-24 1981-01-27 Tektronix, Inc. Memory access system
US4432055A (en) * 1981-09-29 1984-02-14 Honeywell Information Systems Inc. Sequential word aligned addressing apparatus
EP1050818A1 (en) * 1999-05-03 2000-11-08 STMicroelectronics SA Computer memory access

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4247920A (en) * 1979-04-24 1981-01-27 Tektronix, Inc. Memory access system
US4432055A (en) * 1981-09-29 1984-02-14 Honeywell Information Systems Inc. Sequential word aligned addressing apparatus
EP1050818A1 (en) * 1999-05-03 2000-11-08 STMicroelectronics SA Computer memory access

Also Published As

Publication number Publication date
EP1281120A1 (en) 2003-02-05
JP2003532221A (en) 2003-10-28

Similar Documents

Publication Publication Date Title
TWI407366B (en) Microprocessor with private microcode ram,method for efficiently storing data within a microprocessor ,and computer program product for use with a computing device
US6134653A (en) RISC processor architecture with high performance context switching in which one context can be loaded by a co-processor while another context is being accessed by an arithmetic logic unit
US5819063A (en) Method and data processing system for emulating a program
US5784638A (en) Computer system supporting control transfers between two architectures
US4729094A (en) Method and apparatus for coordinating execution of an instruction by a coprocessor
JP3120152B2 (en) Computer system
US4715013A (en) Coprocessor instruction format
EP0938703A4 (en) Real time program language accelerator
WO1993002414A2 (en) Data processing system with synchronization coprocessor for multiple threads
JPH0612327A (en) Data processor having cache memory
US4731736A (en) Method and apparatus for coordinating execution of an instruction by a selected coprocessor
US4750110A (en) Method and apparatus for executing an instruction contingent upon a condition present in another data processor
US5021991A (en) Coprocessor instruction format
KR20010007031A (en) Data processing apparatus
EP0525831B1 (en) Method and apparatus for enabling a processor to coordinate with a coprocessor in the execution of an instruction which is in the intruction stream of the processor.
US4821231A (en) Method and apparatus for selectively evaluating an effective address for a coprocessor
US4758950A (en) Method and apparatus for selectively delaying an interrupt of a coprocessor
US4914578A (en) Method and apparatus for interrupting a coprocessor
Alsup Motorola's 88000 family architecture
JP4465081B2 (en) Efficient sub-instruction emulation in VLIW processor
Wilsey et al. The concurrent execution of non-communicating programs on SIMD processors
Berenbaum et al. Architectural Innovations in the CRISP Microprocessor.
US6108761A (en) Method of and apparatus for saving time performing certain transfer instructions
EP1281120A1 (en) Processor architecture having an alu, a java stack and multiple stackpointers
US4758978A (en) Method and apparatus for selectively evaluating an effective address for a coprocessor

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CN DE GB JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

WWE Wipo information: entry into national phase

Ref document number: 2001914655

Country of ref document: EP

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 2001914655

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 2001914655

Country of ref document: EP