WO2001084305A1 - Processor architecture having an ALU, a Java stack and multiple stackpointers - Google Patents
- Publication number
- WO2001084305A1 WO2001084305A1 PCT/US2001/006813 US0106813W WO0184305A1 WO 2001084305 A1 WO2001084305 A1 WO 2001084305A1 US 0106813 W US0106813 W US 0106813W WO 0184305 A1 WO0184305 A1 WO 0184305A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- stack
- data
- operand
- recited
- processor
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
- G06F9/30134—Register stacks; shift registers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
Definitions
- PROCESSOR ARCHITECTURE HAVING AN ALU, A JAVA STACK AND MULTIPLE STACKPOINTERS
- The present invention relates to the execution unit inside a Java™ stack-based processor. More particularly, the present invention relates to processing data through stack processing hardware structures in arrangements that provide more efficient exchange of data between processing and storage components.
- There is a growing need for processors having greater processing capabilities.
- Today's processors handle more complex tasks which require greater processing speed.
- These complex tasks require processors that utilize their internal storage capabilities more efficiently.
- Well known processors include stack-based machines that perform arithmetic and logical operations on data retrieved from an operand stack.
- One such stack-based processor is implemented by the Java Virtual Machine (JVM).
- The JVM, which is commonly in the form of a computer model, supports the execution of the Java language.
- While the JVM works well for some less demanding applications, there has been a push to implement some of the JVM in hardware to improve performance.
- The stack processor has been implemented in hardware, although the implementations have been less than efficient in view of the increased processing demands.
- Common stack-based processors are constructed with either 32 bit or 64 bit wide stacks. If a 32 bit wide stack-based processor is used, and there is a desire to write a 64 bit wide entry, two accesses are required to read the data from the operand stack to an ALU, and two accesses are required to store the result back into the stack. That is, two 32 bit reads are required to read the data from the stack and an additional two 32 bit writes are required to store the result back into the stack.
- If a stack-based processor that is 64 bits wide contains data that spans two 64 bit entries, again a total of four accesses would be required to read the data from the stack and then store the result in the operand stack. That is, two 64 bit reads are required to read the data from the operand stack and two 64 bit writes are required to store the result in the operand stack. Accordingly, the multiple accesses required to process data in current stack-based processors increase the overall processing time and decrease the overall efficiency of an implementation using the stack-based processor.
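The access counts above can be sketched with a small back-of-the-envelope model (my own simplification, not the patented circuit): each access moves at most the stack's path width, and a result as wide as the operands must be written back.

```python
import math

def stack_accesses(operand_bits: int, path_bits: int) -> int:
    """Reads to fetch the operand data plus writes to store the
    result, when each access moves at most path_bits."""
    per_direction = math.ceil(operand_bits / path_bits)
    return 2 * per_direction  # one read pass + one write pass

# a 64 bit entry through a 32 bit wide stack: 2 reads + 2 writes
print(stack_accesses(64, 32))    # 4
# two 64 bit entries (128 bits) through a 64 bit wide stack
print(stack_accesses(128, 64))   # 4
# the same 128 bits through a 128 bit wide path
print(stack_accesses(128, 128))  # 2
```

The last line previews the benefit claimed later in the description: a 128 bit wide path collapses the four accesses of the prior art to one read and one write.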
- The native Java processor creates a real stack-based processor in hardware.
- However, the real stack-based processor is very complex and does not enable use of most software available for standard processors (e.g., ARM, MIPS, x86, etc.).
- Compilation is another technique used to enhance performance. Compilation eliminates the interpreter and compiles Java directly to a specified machine code. Nonetheless, the resulting executable file is no longer portable.
- The resources required to perform Java compilation are expensive and unavailable at times (e.g., handheld personal digital assistants and mobile phones do not have sufficient memory to support a compiler).
- The time required to perform the compilation will cause excessive initial delays (i.e., real-time responses are not possible at startup).
- The present invention fills these needs by providing a method and hardware for efficiently processing data through a stack-based processor. It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, a device or a method. Several inventive embodiments of the present invention are described below.
- A method for processing data through a stack processor comprises writing data to an operand stack and identifying locations for the data within the operand stack. The locations of the data within the operand stack are identified using one or more stackpointers. Once the stackpointers locate the data within the operand stack, a parallel transfer of data to an arithmetic logic unit is done. The transferred data is selected using the stackpointers and a function code. The transferred data is then processed in an arithmetic logic unit in accordance with instructions defined by the function code.
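The recited method can be illustrated with a behavioral sketch (my own simplification in software, not the patented hardware): data is written to the operand stack, stackpointers track its location, and a function code selects which entries are transferred to the ALU and what operation is applied.

```python
from typing import Callable, Dict, List

class StackProcessorModel:
    """Toy model of the recited method; the function-code names
    and two-operand form are illustrative assumptions."""

    FUNCTIONS: Dict[str, Callable[[int, int], int]] = {
        "ADD": lambda a, b: a + b,
        "SUB": lambda a, b: a - b,
    }

    def __init__(self) -> None:
        self.operand_stack: List[int] = []

    def push(self, value: int) -> None:
        self.operand_stack.append(value)

    def stackpointer(self, offset: int) -> int:
        # stackpointer (offset 0) addresses the top of the stack;
        # negative offsets address entries below it
        return len(self.operand_stack) - 1 + offset

    def execute(self, function_code: str) -> int:
        # the stackpointers select both operands so they can be
        # moved to the ALU in a single parallel transfer
        a = self.operand_stack[self.stackpointer(-1)]
        b = self.operand_stack[self.stackpointer(0)]
        result = self.FUNCTIONS[function_code](a, b)
        # the result replaces the operands (one write back)
        del self.operand_stack[-2:]
        self.operand_stack.append(result)
        return result

m = StackProcessorModel()
m.push(7)
m.push(5)
print(m.execute("ADD"))   # 12
print(m.operand_stack)    # [12]
```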
- A stack processing subsystem for processing data comprises an operand stack located within the stack processing subsystem, where the operand stack has banks capable of storing the data.
- The stack processing subsystem also includes stackpointers that are configured to define locations of the data within the banks of the operand stack.
- An arithmetic logic unit interfaces with the operand stack such that parallel word transfers of the data can be executed from the operand stack to the arithmetic logic unit.
- The data to be transferred in the parallel word transfers is defined using the stackpointers and function code.
- The function code defines a particular instruction to be processed by the arithmetic logic unit.
- a method for performing operations on data using a stack processor is disclosed.
- the data is placed into an operand stack contained within the stack processor.
- The location of the data within the operand stack is tracked by stackpointers also located in the stack processor.
- The data is transferred from the operand stack to an arithmetic logic unit.
- The data is capable of being transferred in 128 bit increments from the operand stack to the arithmetic logic unit. Once the data is transferred to the arithmetic logic unit, the data is processed according to instructions from function code.
- The operand stack and the arithmetic logic unit are integrated into a single unit.
- the present invention facilitates more efficient use of a 128 bit wide operand stack by dividing the stack into banks. In the past, if data less than the data width of the stack was to be stored in the stack (i.e., 32 bit data in a 64 bit stack), the remaining space was wasted (i.e., the remaining 32 bits in the 64 bit stack was wasted).
- The stackpointers allow multiple data sets to be placed in the same row (i.e., two separate 64 bit entries on the same row) as opposed to placing the data on separate rows (i.e., placing two separate 64 bit entries on separate 128 bit rows).
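The row-sharing idea above can be sketched with a simple packing count (a simplification under my own greedy-fill assumption, not the actual bank logic):

```python
def rows_needed(entry_widths, row_bits=128, packed=True):
    """Rows consumed by a sequence of stack entries (widths in bits).
    With packing (what the stackpointers enable), entries share a row
    until its 128 bits are full; without it, each entry takes a row."""
    if not packed:
        return len(entry_widths)
    rows, used = 0, row_bits  # start "full" so the first entry opens a row
    for width in entry_widths:
        if used + width > row_bits:
            rows, used = rows + 1, 0
        used += width
    return rows

# two 64 bit entries share one 128 bit row instead of two
print(rows_needed([64, 64]))                # 1
print(rows_needed([64, 64], packed=False))  # 2
# four 32 bit entries also fit in a single row
print(rows_needed([32, 32, 32, 32]))        # 1
```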
- The present invention also allows full processing of data 128 bits wide in only two accesses for simple arithmetic operations (i.e., one read access and one write access).
- FIG. 1 shows a Java stack subsystem in accordance with one embodiment of the present invention.
- Figure 2 is one embodiment of the present invention illustrating the stackpointers being implemented to direct transfers of data from the operand stack to particular ALU slots.
- FIG. 1 shows an overview 100 of a Java™ stack subsystem 102 in accordance with one embodiment of the present invention.
- The Java stack subsystem 102 is the execution unit for the Java processor.
- the Java stack subsystem 102 includes a 32 bit wide bus 120 which facilitates communication with other subsystems and a system bus 150.
- the bus 120 is enabled to provide data to be loaded into an input multiplexer (input mux) 108.
- the input mux 108 directs the 32 bit data from the bus 120 into an operand stack 110.
- The Stack/ALU controller 104 decodes the function code 124.
- The function code 124 provides information to the Stack/ALU controller 104 regarding what operations are to be performed on the data stored in the operand stack 110. For example, the function code 124 may call for an ADD operation to be performed on a given set of data. In this case, the function code 124 will direct an arithmetic logic unit (ALU) 106 to perform the ADD operation on the data to be read from the operand stack 110.
- The function code 124 therefore causes stackpointers 105c-f (i.e., stackpointer -1, stackpointer -2, stackpointer -3 and stackpointer -4) to define the location of the desired data in the operand stack 110.
- The desired data can then be retrieved by the ALU 106 by way of the output multiplexer 112 using a 128 bit wide bus.
- Data from each portion of the operand stack 110 (e.g., each 32 bit portion) can be retrieved in this way.
- The operand stack 110 is shown, in one embodiment, including four banks 110a-110d. Each of the banks 110a-110d is preferably 32 bits wide.
- The stackpointers 105 are 32 bit centric and are used to select which of the banks 110a-110d of the operand stack 110 will receive data, and will also define which data within the banks 110a-110d will be passed to the ALU 106.
- The stackpointers 105c-f define data locations for the last four entries that were pushed down into the operand stack 110.
- The Stack/ALU controller 104 controls all operations and manages all functionality within the Stack/ALU subsystem 102.
- The Stack/ALU controller 104 contains six stackpointers 105 which allow for flexible and efficient access to the stack.
- A stackpointer 105a (stackpointer +1) and a stackpointer 105b (stackpointer) are defined above the stackpointers 105c-f.
- Stackpointer +1 105a is configured to address the top of the stack plus one.
- Stackpointer 105b always addresses the top of the stack.
- The stackpointers 105c-f address the top of the stack -1, -2, -3, and -4, as described above. During an access, each of the stackpointers 105c-f can define which ones or all of the banks 110a-d are to be gated.
- The Stack/ALU subsystem 102 therefore does not waste time by only reading one 32 bit entity per cycle and then adjusting stackpointers to read a next 32 bit entity.
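The six pointers and the single-cycle gating they enable can be sketched as follows (a labelling sketch under my own indexing assumptions, not gate-level detail):

```python
def six_stackpointers(top_index: int) -> dict:
    """The six pointers named in the text, expressed as offsets
    from the top-of-stack index."""
    return {
        "stackpointer+1": top_index + 1,
        "stackpointer":   top_index,
        "stackpointer-1": top_index - 1,
        "stackpointer-2": top_index - 2,
        "stackpointer-3": top_index - 3,
        "stackpointer-4": top_index - 4,
    }

def gate_banks(stack, pointers):
    """One read cycle gates every addressed bank at once, so up to
    four 32 bit entities reach the ALU without per-entity pointer
    adjustment between reads."""
    return [stack[i] for i in pointers if 0 <= i < len(stack)]

stack = [10, 20, 30, 40, 50]          # index 4 is the top of the stack
sp = six_stackpointers(top_index=4)
last_four = [sp[f"stackpointer-{k}"] for k in (1, 2, 3, 4)]
print(gate_banks(stack, last_four))   # [40, 30, 20, 10]
```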
- A bus master interface unit (BMIU) 130 is also shown.
- The BMIU 130 is configured to manage the communication between the system bus 150 and the Stack/ALU subsystem 102. For example, the BMIU 130 can control addressing 142 and read/writes 144 to the Stack/ALU subsystem 102. Also shown is a halt 145 command, which can be communicated to the Stack/ALU subsystem 102.
- A read/write stack (RD/WRT STK) signal 126 is also shown.
- The RD/WRT STK 126 is, in one embodiment, defined by function code, which will be discussed further with reference to Figure 2.
- The RD/WRT STK 126 allows the host processor to read and write to and from the operand stack 110.
- a wait 136 command and ALU CCs (condition codes) 134 are shown as signals coming from the Stack/ALU subsystem 102.
- a bus 140 receives output from the output 116 to the BMIU 130 along path 132.
- the following is an example of the efficient operation of the Stack/ALU subsystem 102.
- Data, which is 32 bits wide, is sent to the Stack/ALU subsystem 102 via the bus 120.
- The stackpointers 105c-f direct the data as 32 bit portions into each of banks 110a, 110b, 110c and 110d.
- The function code 124, which in one example may define an ADD operation, is passed to the Stack/ALU controller 104, where the function code 124 is decoded.
- A first cycle starts where the Stack/ALU controller 104 directs the passing of data from the operand stack 110 through the output mux 112 to the ALU 106.
- The data passed from the output mux 112 to the ALU 106 can be up to 128 bits wide (e.g., multiple 32 bit words). This is contrary to the prior art, which only transfers data in 32 bit wide increments (e.g., only one word at a time), and in which the transfer of more than one 32 bit portion is completed over a number of cycles without the use of stackpointers.
- the Stack/ALU controller 104 transfers the function code to the ALU 106.
- the ALU 106 can therefore perform the desired operation (e.g., which can be any arithmetic operation commonly performed by processors, Java processors, or the like) as specified by the function code 124 on the data to produce a result.
- The ALU 106 can then signal the Stack/ALU controller 104 informing that the operation is complete, and the Stack/ALU controller 104 reads the result from the ALU 106 back to the operand stack 110.
- a return path 114 from the ALU 106 to the operand stack 110 is defined by two 32 bit wide paths 114a and 114b (i.e., a 64 bit wide path). This ensures that the result can be stored during the same single access operation. It should be noted that when the result is stored back to the operand stack 110, the stackpointers 105e (stackpointer -3) and 105f (stackpointer -4) define the location of the result within the operand stack 110.
- the stackpointers 105 are then readjusted after the result is read back into the operand stack 110 in a second cycle.
- the stackpointers are readjusted such that the stackpointers 105c (stackpointer -1) and 105d (stackpointer -2) indicate the location of the data. It should be noted that the location of the data within the operand stack 110 has not changed.
- the instruction to readjust the stackpointers is part of the function code 124.
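The writeback-and-readjust cycle described above can be sketched as follows, under my own encoding assumptions (low word first, one 64 bit add): the result lands in the banks addressed by stackpointers -3/-4, and then only the pointers move.

```python
MASK32 = (1 << 32) - 1

def add64_writeback(banks, sp):
    """Two 64 bit operands sit in the four 32 bit banks addressed by
    stackpointers -1..-4. The 64 bit return path writes the result
    into the banks addressed by stackpointers -3/-4; then the
    pointers are readjusted so -1/-2 address the result. The data
    itself does not move."""
    a = (banks[sp["-2"]] << 32) | banks[sp["-1"]]
    b = (banks[sp["-4"]] << 32) | banks[sp["-3"]]
    result = (a + b) & ((1 << 64) - 1)
    banks[sp["-3"]] = result & MASK32          # low 32 bits
    banks[sp["-4"]] = (result >> 32) & MASK32  # high 32 bits
    sp = {"-1": sp["-3"], "-2": sp["-4"]}      # readjusted pointers
    return banks, sp

# a = 1, b = 2 -> result 3 lands where stackpointers -3/-4 pointed
banks, sp = add64_writeback([1, 0, 2, 0],
                            {"-1": 0, "-2": 1, "-3": 2, "-4": 3})
print(banks)     # [1, 0, 3, 0]
print(sp["-1"])  # 2
```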
- the output mux 112 pops out 32 bit wide results.
- the instruction to pop out the result from the operand stack 110 is written into the function code 124 and is part of standard Java code.
- the function code is sent to the stack/ ALU subsystem 102 by a decode/execute subsystem. When the function code is sent to the stack/ALU subsystem 102, this causes the appropriate data inside the operand stack 110 to get gated onto the 'B Bus'.
- Figure 2 is one embodiment of the present invention illustrating the stackpointers 105 transferring data from the operand stack 110 to ALU slots 112a-d.
- A function code register 202 holds the operation to be performed by the Stack/ALU subsystem 102. This information is derived from a Java instruction by the decode/execute subsystem.
- The decode/execute subsystem is in communication with the Stack/ALU subsystem 102, and its functionality is known to those skilled in the art.
- The function code, in one embodiment, requests that an operation be performed on data within the operand stack 110. For example, if an ADD operation is desired, the function code register 202 is written so that an ADD operation is directed.
- The function code register 202 is shown in communication with a function de-code table 204.
- The function de-code table 204 contains decode signals (e.g., ADDs, long divide, etc.). These decode signals 204a therefore define, in one embodiment, the function to be performed.
- the function de-code table 204 communicates decode signals 204a to a bank selector mux 206.
- The stackpointers 105 communicate the bank address information of the data within the operand stack 110 to the bank selector mux 206.
- The stackpointers 105 contain address bits 105c-1 through 105f-1 which specify the bank addresses of the data within the operand stack 110.
- The bank selector mux 206 takes the information from the decode signals 204a and the address bits 105c-1 through 105f-1 and generates slot selection signals 206a.
- The slot selection signals 206a direct to which ALU slots 112a-d of the output mux 112 the data will be sent for processing within the ALU 106.
- The slot selection signals 206a also tell the input mux 108 the target location in the operand stack 110 which will contain the result of the ALU operation. It should be noted that data may be taken from any of the banks 110a-d regardless of the order of the data in the operand stack 110.
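The bank selector mux's combination of decode signals and address bits can be sketched like this (the decode-table entries and operand widths are my own illustrative assumptions, not the patent's tables):

```python
def bank_selector_mux(decode_signal, address_bits):
    """Combine the decoded function (how many operand words it
    consumes) with the stackpointers' bank address bits, yielding
    slot selection signals as (ALU slot, source bank) pairs."""
    OPERAND_WORDS = {"ADD32": 2, "ADD64": 4}  # illustrative decode table
    needed = OPERAND_WORDS[decode_signal]
    return list(enumerate(address_bits[:needed]))

# a 32 bit ADD gates two banks into ALU slots 0 and 1
print(bank_selector_mux("ADD32", [3, 2, 1, 0]))
# -> [(0, 3), (1, 2)]
# a 64 bit ADD needs all four banks
print(bank_selector_mux("ADD64", [3, 2, 1, 0]))
# -> [(0, 3), (1, 2), (2, 1), (3, 0)]
```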
- A Java instruction may specify that an ADD operation be performed for a set of data contained in the operand stack 110.
- the decode of the Java instruction causes a function code requesting the ADD operation to get loaded.
- the decode signals 204a are generated and sent to the bank selector mux 206.
- The stackpointers 105 send information indicating the bank address of data within the operand stack 110.
- The stackpointer 105f specifies the bank location of the set of data for the requested ADD operation with address bits 105f-1.
- The bank selector mux 206 generates slot signals 206a and sends them to the output mux 112.
- the slot signals 206a tell the output mux 112 the bank address of the set of data.
- The present invention provides many benefits to users.
- The present invention allows operations to be performed through a stack processor in fewer accesses and more efficiently. At most, two accesses are required: one for reading and one for writing.
- The data path from the operand stack to the ALU is 128 bits wide; therefore, two 64 bit entries may be read in a single access.
- In the prior art, the data path was either 32 bits wide or 64 bits wide. In either case, multiple accesses were required to read 128 bit entities from the operand stack into the ALU, thereby increasing processing time.
- The stackpointers avoid wasting expensive storage space within the stacks.
- The stackpointers establish selectable banks whereby 32 bit wide storage areas are created in the 128 bit wide stack.
- Up to four separate data entries that are 32 bits wide are capable of being stored in one 128 bit wide stack row.
- One 32 bit data entry will not waste storage space in a 128 bit wide stack, thereby using the storage space of the stack more efficiently.
- Various computer-implemented operations involving data stored in computer systems to drive computer peripheral devices may be employed. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared and otherwise manipulated. Further, the manipulations performed are often referred to in terms such as ascertaining, identifying, scanning, or comparing.
- Any of the operations described herein that form part of the invention are useful machine operations. Any appropriate device or apparatus may be utilized to perform these operations.
- The apparatus may be specially constructed for the required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer.
- Various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
Abstract
Description
Claims
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP01914655A EP1281120A1 (en) | 2000-05-04 | 2001-03-01 | Processor architecture having an alu, a java stack and multiple stackpointers |
JP2001580661A JP2003532221A (en) | 2000-05-04 | 2001-03-01 | Processor architecture with ALU, Java stack and multiple stack pointers |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US56567900A | 2000-05-04 | 2000-05-04 | |
US09/565,679 | 2000-05-04 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2001084305A1 true WO2001084305A1 (en) | 2001-11-08 |
Family
ID=24259660
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2001/006813 WO2001084305A1 (en) | 2000-05-04 | 2001-03-01 | Processor architecture having an alu, a java stack and multiple satckpointers |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1281120A1 (en) |
JP (1) | JP2003532221A (en) |
WO (1) | WO2001084305A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4247920A (en) * | 1979-04-24 | 1981-01-27 | Tektronix, Inc. | Memory access system |
US4432055A (en) * | 1981-09-29 | 1984-02-14 | Honeywell Information Systems Inc. | Sequential word aligned addressing apparatus |
EP1050818A1 (en) * | 1999-05-03 | 2000-11-08 | STMicroelectronics SA | Computer memory access |
-
2001
- 2001-03-01 EP EP01914655A patent/EP1281120A1/en not_active Withdrawn
- 2001-03-01 WO PCT/US2001/006813 patent/WO2001084305A1/en not_active Application Discontinuation
- 2001-03-01 JP JP2001580661A patent/JP2003532221A/en active Pending
Also Published As
Publication number | Publication date |
---|---|
EP1281120A1 (en) | 2003-02-05 |
JP2003532221A (en) | 2003-10-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI407366B (en) | Microprocessor with private microcode ram,method for efficiently storing data within a microprocessor ,and computer program product for use with a computing device | |
US6134653A (en) | RISC processor architecture with high performance context switching in which one context can be loaded by a co-processor while another context is being accessed by an arithmetic logic unit | |
US5819063A (en) | Method and data processing system for emulating a program | |
US5784638A (en) | Computer system supporting control transfers between two architectures | |
US4729094A (en) | Method and apparatus for coordinating execution of an instruction by a coprocessor | |
JP3120152B2 (en) | Computer system | |
US4715013A (en) | Coprocessor instruction format | |
EP0938703A4 (en) | Real time program language accelerator | |
WO1993002414A2 (en) | Data processing system with synchronization coprocessor for multiple threads | |
JPH0612327A (en) | Data processor having cache memory | |
US4731736A (en) | Method and apparatus for coordinating execution of an instruction by a selected coprocessor | |
US4750110A (en) | Method and apparatus for executing an instruction contingent upon a condition present in another data processor | |
US5021991A (en) | Coprocessor instruction format | |
KR20010007031A (en) | Data processing apparatus | |
EP0525831B1 (en) | Method and apparatus for enabling a processor to coordinate with a coprocessor in the execution of an instruction which is in the intruction stream of the processor. | |
US4821231A (en) | Method and apparatus for selectively evaluating an effective address for a coprocessor | |
US4758950A (en) | Method and apparatus for selectively delaying an interrupt of a coprocessor | |
US4914578A (en) | Method and apparatus for interrupting a coprocessor | |
Alsup | Motorola's 88000 family architecture | |
JP4465081B2 (en) | Efficient sub-instruction emulation in VLIW processor | |
Wilsey et al. | The concurrent execution of non-communicating programs on SIMD processors | |
Berenbaum et al. | Architectural Innovations in the CRISP Microprocessor. | |
US6108761A (en) | Method of and apparatus for saving time performing certain transfer instructions | |
EP1281120A1 (en) | Processor architecture having an alu, a java stack and multiple stackpointers | |
US4758978A (en) | Method and apparatus for selectively evaluating an effective address for a coprocessor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): CN DE GB JP KR |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2001914655 Country of ref document: EP |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
WWP | Wipo information: published in national office |
Ref document number: 2001914655 Country of ref document: EP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 2001914655 Country of ref document: EP |