WO2001082059A2 - Procede et appareil d'amelioration des temps de changement de contexte dans un systeme informatique - Google Patents

Procede et appareil d'amelioration des temps de changement de contexte dans un systeme informatique Download PDF

Info

Publication number
WO2001082059A2
WO2001082059A2 PCT/US2001/013530 US0113530W WO0182059A2 WO 2001082059 A2 WO2001082059 A2 WO 2001082059A2 US 0113530 W US0113530 W US 0113530W WO 0182059 A2 WO0182059 A2 WO 0182059A2
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
register
registers
source
processor
Prior art date
Application number
PCT/US2001/013530
Other languages
English (en)
Other versions
WO2001082059A3 (fr
Inventor
Ashley Saulsbury
Daniel S. Rice
David Emberson
Original Assignee
Sun Microsystems, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Microsystems, Inc. filed Critical Sun Microsystems, Inc.
Priority to AU2001257323A priority Critical patent/AU2001257323A1/en
Publication of WO2001082059A2 publication Critical patent/WO2001082059A2/fr
Publication of WO2001082059A3 publication Critical patent/WO2001082059A3/fr

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30076Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043LOAD or STORE instructions; Clear instruction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements
    • G06F9/30105Register structure
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements
    • G06F9/3012Organisation of register space, e.g. banked or distributed register file
    • G06F9/30123Organisation of register space, e.g. banked or distributed register file according to context, e.g. thread buffers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824Operand accessing
    • G06F9/383Operand prefetching
    • G06F9/3832Value prediction for operands; operand history buffers

Definitions

  • This invention relates to the field of processors in computing systems, and, more specifically, to context switching.
  • a processor In current computing systems, a processor (also referred to in various implementations as a microprocessor, controller, microcontroller, central processing unit (CPU), etc.) is used to decode program instructions associated with a process, and to execute the operations specified for each given instruction by manipulating data values and moving data within the computing system's address space.
  • Many computing systems implement a scheme for handling multiple processes concurrently in what is called "multiprocessing.” Multiprocessing includes sharing, among all runnable processes, instruction execution cycles of one or more processors.
  • a processor interrupts the execution of a first process to execute a second process. The second process is then interrupted to execute a third process, or to resume execution of the first process, and so on.
  • the determination of when each process executes within a processor (referred to as "process scheduling”) may be, for example, time-based and/or priority-based, non-preemptive or preemptive.
  • a process context includes, for example, the contents of the processor's registers, and may be further extended to include data stored in cache memory.
  • the process context includes the processor state necessary for a process to be stopped and restarted at a later time without incurring errors in execution, or requiring excessive re-execution of instructions to recapture the prior processor state.
  • main memory e.g., RAM
  • context switching This process of writing (or "storing") a first context to memory and loading a second context into the processor is called "context switching," and occurs each time the processor changes processes.
  • context switching is a source of undesired overhead in the system, as the processor must sacrifice valuable execution cycles to individually store each register value for the outgoing context into main memory, and to individually load each register value from memory into the respective processor register for the incoming context. This undesired overhead increases with the number of registers supported by the processor architecture.
  • processor architectures of the prior art use a "dirty" register scheme.
  • This scheme limits the registers included in a current process context to those registers that have been used by the current process since the last context switch.
  • the dirty register scheme is typically implemented by associating a "dirty bit" with each register. Whenever a register is used during process execution, that register's associated dirty bit is set. Those registers that have not been used by the current process, i.e., those registers whose associated dirty bits are not set, are ignored during a context switch.
  • the decrease in context size results in a reduction in the number of store operations required to transfer an outgoing context to memory. The number of load operations required to transfer an incoming context back into the processor is also reduced.
  • FIG. 1 is a flow diagram illustrating a context switch using the dirty register scheme.
  • Steps 100-105 refer to the outgoing context of the process being interrupted, and steps 106-110 refer to the incoming context of the process that is resuming execution.
  • the context switch begins in step 100, where certain special function registers are stored in memory as part of the outgoing context. These special function registers are those registers fundamental to processor operation, such as the program counter register, that are always included in the process context.
  • the rest of the processor registers commonly configured as part of a register file, are processed beginning in step 101 where the first register of the register file is identified as the current register under consideration.
  • step 102 the processor determines whether the current register is dirty, i.e., whether the associated dirty bit is set. If the current register is dirty, the contents of the register are stored in memory in step 103 as part of the outgoing context. If the current register is not dirty, step 103 is bypassed.
  • step 104 the processor determines whether the current register is the last register in the register file. If the current register is not the last register, the next register is identified for consideration in step 105, and the context switch returns to step 102. If, in step 104, the current register is the last register in the register file, the outgoing context is completed with respect to the processor registers, and the context switch continues with the mcoming context in step 106.
  • step 106 the special function registers of the processor are loaded from the incoming context in memory.
  • step 107 the first register value in memory is identified as the current register value being processed in the context switch, and in step 108, the current register value is loaded from the mcoming context in memory into the associated register in the processor's register file.
  • step 109 the processor determines whether the current register value is the last value of the incoming context. If the current register value is not the last register value, the next register value in the incoming context is identified as the current register value being processed, and the context switch returns to step 108. If, in step 109, the current register value is the last register value stored in the incoming context, the context switch is completed and the process associated with the mcoming context resumes execution.
  • a method and apparatus to improve context switch times in a computing system are described.
  • the invention provides a mechanism for registers containing unneeded data to be removed from a process context, thus reducing the amount of time required to perform a context switch.
  • Program instructions are used to explicitly specify registers that are to be abandoned.
  • the processor resets the dirty bits for those registers that are to be abandoned.
  • a subsequent context switch excludes the abandoned registers.
  • a microprocessor instruction (such as a NOP instruction or other instruction with available fields) is provided that has one or more operands for specifying registers to be abandoned.
  • the operands of the microprocessor instruction are set to specify registers that become unnecessary, i.e., that no longer contain useful information.
  • the processor determines the registers to be abandoned from the operands of the microprocessor instruction, and removes those registers from the process context (e.g., by resetting corresponding dirty bits).
  • the identification of registers for abandonment is encoded within existing instructions that specify registers as source operands.
  • An "abandon register” bit-field is provided in an instruction for each source operand that specifies a processor register.
  • a compiler is configured to determine whether a source register will be needed after a given instruction is executed, and to set an associated "abandon register” bit of the instruction based on the results of that determination.
  • the "abandon register” bit is not set for a given source operand, the "dirty bit” of the associated processor register is unchanged. If the "abandon register” bit is set for the given source operand, the processor removes the associated register from the process context, for example, by resetting the "dirty bit" of the associated processor register to indicate that the value stored in the register is not needed.
  • Figure 1 is a flow diagram of a context switch using a dirty register scheme.
  • Figure 2 is a block diagram of an example processor architecture.
  • Figure 3 is a block diagram of a register file.
  • Figure 4 is a bit diagram of sample instruction formats.
  • FIG. 5 is a bit diagram of instruction formats in accordance with embodiments of the invention.
  • Figure 6 is a flow diagram of an instruction execution process in accordance with an embodiment of the invention.
  • Figure 7 is an example of assembly code illustrating application of a NOP instruction in accordance with an embodiment of the invention.
  • Figure 8 is a flow diagram of a compiler process in accordance with an embodiment of the invention.
  • FIG. 9 is a flow diagram of an instruction execution process in accordance with an embodiment of the invention.
  • Figure 10 is an example of assembly code illustrating application of instruction encoding in accordance with an embodiment of the invention.
  • Figure 11 is a flow diagram of a compiler process in accordance with an embodiment of the invention.
  • the invention is a method and apparatus to improve context switch times in a computing system.
  • numerous specific details are set forth to provide a more thorough description of embodiments of the invention. It will be apparent, however, to one skilled in the art, that the invention may be practiced without these specific details. In other instances, well known features have not been described in detail so as not to obscure the invention.
  • registers that have been used by a process are included in the process context. This includes those registers for which the stored data has already been utilized in a desired operation, and for which no further purpose remains. Thus, registers containing unneeded data are unnecessarily transferred to main memory, and loaded from main memory, as part of a context switch. Sub-optimal context switch times are therefore achieved.
  • a processor is configured to remove unneeded registers from a process context (i.e., to abandon the register) in response to program instructions.
  • a compiler is configured to provide such program instructions for registers that no longer contain data needed by the executing process.
  • An embodiment of the invention can be implemented in the instruction set of any processor architecture.
  • a processor architecture For purposes of example, one possible embodiment of a processor architecture is shown in Figure 2.
  • FIG. 2 is a block diagram of a microprocessor (i.e., a processor implemented on an integrated circuit (IC) chip) containing a processor 200 coupled to data bus 210 and address bus 211. More complex microprocessors may also include other on-chip components, such as floating-point units, memory management units, an "on-chip” cache (sometimes referred to as “level 1” or “LI” cache), and possibly further processors.
  • Figure 2 also includes "off- chip” main memory (e.g., random access memory (RAM)) 208 coupled to the microprocessor chip via data bus 210 and address bus 211.
  • Processor 200 can access program instructions and program data in main memory 208 via data bus 210 by asserting an appropriate memory address on address bus 211.
  • processor 200 stores one or more process contexts (1-M) 209 in main memory 208. Each process context 209 represents the preserved state of a process not currently executing in processor 200.
  • Off-chip computer components such as “level 2” or “L2” cache, mass storage devices (e.g., disk drives) and input/output (I/O) devices (e.g., keyboard, mouse, display, network devices, etc.), may also be coupled to data bus 210 and address bus 211 to interface with the microprocessor. Each component occupies a portion of the computing system's address space.
  • Memory-mapped I/O devices may have data written to them, and read from them, in similar fashion to memory devices, by addressing the portion of the address space specified for each respective I/O device.
  • data bus 210 and address bus 211 are implemented as a single, time-multiplexed bus.
  • Processor 200 provides the main mechanism for executing instructions associated with application programs, operating systems, and, in some embodiments, firmware routines. Execution is carried out by a process of fetching individual instructions from memory, decoding those instructions, and exerting control over the components of the processor to implement the desired function(s) of each instruction. The manner in which a processor carries out this execution process is dependent on the individual components of the given processor and the component interaction provided by the defined instruction set for the given processor architecture. For the example architecture of processor 200, the functions of the individual components are generally described below.
  • Processor 200 comprises arithmetic logic unit (ALU) 201, register file 202, program counter register (PC REG) 203, memory address register (MEM ADDR REG) 204, instruction register (INSTR REG) 205, control unit 206 and ALU input multiplexer (MUX) 212, each coupled to an internal data bus 213.
  • Multiplexer (MUX) 207 is provided to drive address bus 211 from either program counter register 203 or memory address register 204.
  • Memory buffer register 229 is coupled between data bus 210 and internal data bus 213 to buffer data transfers between the two data buses.
  • Program counter register 203, memory address register 204, memory buffer register 229 and instruction register 205 are special function registers used by processor 200. If the processor architecture implements a stack structure in main memory 208, processor 200 may also include a stack pointer register (not shown) to store the address for the top element of the stack. Program counter register 203, instruction register 205 and the stack pointer register (if implemented) are each stored as part of the process context during a context switch.
  • Program counter register 203 is used to store the address of the next instruction for the current process, and is updated during the execution of each instruction. This updating typically consists of incrementing the current value of program counter register 203 to point to the next instruction. However, for branching instructions, program counter register 203 may be directly loaded with a new address as a jump destination, or an offset may be added to the current value. In the execution of a conditional branching instruction, condition codes generated by arithmetic logic unit 201 may be used in the determination of which update scheme is applied to program counter register 203. When it is time to fetch the next instruction, program counter register 203 drives its output signal 218 onto address bus 211 via MUX 207. Control signal 224 from control unit 206 controls the loading of program counter register 203, whereas control signal 227 controls the selection mechanism of MUX 207 to drive output signal 218 (or output signal 219 of memory address register 204) onto address bus 211.
  • Memory address register 204 is used to hold the target memory address during execution of memory access instructions, such as "load” and "store.” In variations of the load and store instructions, the memory address may be loaded into memory address register 204 from one of the registers in register file 202, from the results of an ALU operation, or from an address bit-field extracted from an instruction (e.g., via masking and shifting). When the memory access is initiated, memory address register 204 drives its output signal 219 onto address bus 211 via MUX 207. Memory buffer register 229 provides a data buffer between the internal data bus 213 of processor 200 and the typically much slower data bus 210.
  • Incoming data (including instructions) loaded from main memory 208 or other components external to processor 200 is first stored into memory buffer register 229 before being placed into one of the registers within processor 200.
  • the address asserted on address bus 211 by program counter register 203 or memory address register 204 determines the origination of the incoming data.
  • Outgoing data is loaded from one of the registers of processor 200 into memory buffer register 229 to be driven onto data bus 210.
  • the address asserted on address bus 211 determines the outgoing data's destination address, in main memory 208, for example.
  • Control unit 206 enables loading of memory buffer register 229 via control signal 228.
  • instruction register 205 When an instruction is fetched from main memory (or from an instruction cache), the instruction is loaded into instruction register 205 (enabled by control unit 206 via control signal 226). The contents of instruction register 205 are made available to control unit 206 via port 220.
  • An instruction cache (not shown) may also be implemented which stores a subset of the previously accessed instructions in a fast cache memory that is typically on-chip with the processor.
  • the instruction cache is checked first to see if the instruction is resident in the cache (i.e., a cache "hit"). If the instruction is resident, then the instruction is loaded from the cache. If the instruction is not resident in the instruction cache (i.e., a cache "miss”), the instruction is fetched from the slower main memory.
  • Arithmetic logic unit 201 provides the data processing or calculating capabilities of the processor.
  • Arithmetic logic unit 201 typically comprises double-input/single-output hardware for performing functions such as integer arithmetic (add, subtract), Boolean operations (bitwise AND, OR, NOT (complement)) and bit shifts (left, right, rotate). These functions are sufficient for processor 200 to implement most basic operations, with more complex operations being implemented in software using combinations of those basic operations (e.g., integer multiplication may be implemented by a series of integer add and shift operations).
  • arithmetic logic unit 201 may further comprise hardware for implementing more complex functions, such as integer multiplication and division, floating-point operations, specific mathematical functions (e.g., square root, sine, cosine, log, etc.) and vector processing or graphical functions, for example.
  • Control signal 222 from control unit 206 controls a multiplexer or other selection mechanism within arithmetic logic unit 201 for selecting the desired function.
  • the inputs to arithmetic logic unit 201 are signal 214 from register file 202, and signal 217 from input multiplexer 212. Based on control signal 221 from control unit 206, MUX 212 selects signal 217 from internal data bus 213 or signal 215 from register file 202. Output signal 216 of arithmetic logic unit 201 is provided to internal data bus 213 and/or register file 202. Additional output signals (not shown) may include condition codes such as "zero,” “negative,” “carry” and “overflow.” These condition codes may be used in implementing conditional branching operations during updating of program counter register 203.
  • Register file 202 comprises a set of fast, multi-port registers for holding data to be processed by arithmetic logic unit 201, or otherwise frequently used data. By mamtaining data in register file 202, the data is kept in close proximity to the processor, and lengthy memory access delays are avoided. Register access speeds typically permit register values to be operated on by the arithmetic logic unit and stored back into the register file within a single clock cycle. In contrast, access to values in main memory may take several cycles or more to transfer between the memory buffer register and main memory.
  • the registers within register file 202 are directly accessible to software through instructions that contain source and destination register address bit-fields. Most current processors have at least thirty-two integer registers capable of holding 32-bit or 64-bit data words.
  • a processor having floating point processing capabilities may also contain thirty-two or more floating point registers, each capable of storing a double-precision floating point data word.
  • the concepts described herein may be applied to register files and registers (i.e., data words) of any size.
  • An example implementation of register file 202 is illustrated in Figure 3.
  • the register file comprises N registers 300 addressable using log 2 N bits for addresses 0 through N-l. Registers 300 are used to store data words.
  • the register file also contains N "dirty" bits 301 (e.g., one-bit memory cells), with each dirty bit corresponding to a respective register 300.
  • the respective dirty bit is set by the processor (e.g., by control unit 206 asserting set inputs of appropriate dirty bits via control signal 223) when a data word is loaded into the register.
  • the set of used or dirty registers is included in the process context when a context switch occurs.
  • registers 0, 1, 4 and N-2 are shown with respective dirty bits set and are thus part of the process context. The dirty bits may be reset for a newly loaded process context after a context switch occurs.
  • control unit 206 decodes the instruction contained in instruction register 205 to determine the operation or operations called for by the given instruction.
  • Control unit 206 manipulates the control signals of the components of processor 200 to carry out those operations. For example, in a processor with thirty-two registers in register file 202 (addresses 0- 31), the assembly code "ADD $3 , $4 , $7" is implemented in an instruction comprising an "add” operation code, or "opcode,” and three five-bit register address bit-fields.
  • register 3 (OOlOlbin) and register 4 (OOlOObin) are provided to arithmetic logic unit 201 as signal 214 and 215 (control signal 221 selects signal 215 to be passed through MUX 212 as ALU input signal 217), and control signal 222 selects an add operation for arithmetic logic unit 201.
  • the result of the addition is output from arithmetic logic unit 201 as signal 216, and loaded into register 7 (OOlllbin)-
  • Control signal 223 enables the input and output ports of the specified registers in register file 202 to carry out the transmission of data between register file 202 and arithmetic logic unit 201.
  • control unit 206 may be implemented with purely combinational logic to directly decode the opcode of the instruction into appropriately asserted control signals.
  • a single instruction requires the processor to perform multiple operations, some of which may be performed in parallel.
  • the "ADD" instruction previously discussed might include the steps of processing the register values in arithmetic logic unit 201, updating program counter register 203, and fetching the next instruction.
  • control unit 206 typically contains a state machine implementation for asserting control signals during different timed states.
  • control unit's state machine is implemented as a ROM-based microprogram that contains microinstructions corresponding to each state. Based on decoding of a program instruction, the microprogram jumps to routines or microinstruction sequences for carrying out the operations of specific program instructions.
  • a separate program counter register in the control unit is used to step through the microinstructions, and each microinstruction is decoded to generate the appropriate control signals for the associated state.
  • processor architecture described above is for purposes of example only.
  • An embodiment of the invention may be implemented in any type of processor or processing environment, including virtual processors or virtual machines that implement context switching.
  • the following is a general description of sample program instruction formats that may be implemented in a processor such as the embodiment of processor 200 previously described.
  • Program instructions are provided in binary form, with different processor architectures having different instruction widths.
  • An instruction typically comprises an operation code (or "opcode") bit-field and zero or more additional bit-fields. Examples of some possible instruction formats are illustrated as bit diagrams in Figure 4.
  • Bit diagram 400 shows an instruction of width Y bits, having an opcode of X bits followed by a data field of (Y-X) bits. The sizes of the opcode bit-field and the data field may vary for different instructions within a processor architecture's instruction set. Also, the number and type of the data fields are in accordance with the defined instruction set of the processor architecture.
  • the opcode specifies the type of the instruction, such as a load, store, add, subtract, or jump instruction or variation thereof.
  • the data field may be used to specify constant values, function flags and/ or address offsets for stack functions.
  • the data field may also comprise several address fields (addresses 1-M), as shown in bit diagram 401.
  • the address fields may specify source or destination address values for accessing the computing system's address space via buses 210 and 211, and/or the address fields may specify one or more source and/or destination register addresses for accessing the processor's register file 202.
  • Bit diagram 402 shows how an instruction specifying register operations with two source operands and one destination operand might be implemented in a 32-bit instruction, assuming that the processor has thirty-two addressable registers in its register file.
  • the instruction contains an opcode field of seventeen or fewer bits to provide sufficient room for the source and destination operands.
  • the opcode is followed by a first source operand (SRC 1), a second source operand (SRC 2) and a destination operand (DEST).
  • the destination operand may be the same as one of the source operands if the instruction is operating on the values of two registers and placing the result back into one of those same registers.
  • Each of the source and destination operands are five bits wide to permit any of registers 0 through 31 to be addressed.
  • Many current computing systems are expanding to use 64-bit instructions, which allows for more and /or wider data fields. For register operations, a wider source or destination operand allows for addressing of a larger register file.
  • a "NOP" instruction i.e., an instruction that otherwise performs no action
  • NOP instruction is used to specify registers in the register file that should be released from the current process context.
  • the NOP instruction is described in this embodiment because the NOP instruction typically contains surplus bits that may be used to specify "abandon register” operands. It will be obvious that other instructions that contain sufficient surplus bits to provide "abandon register” operands may be used in a similar manner.
  • the instruction comprises a NOP opcode and one or more register addresses as operands.
  • the NOP instruction is Y bits wide, with T bits provided for each operand for a register file of size less than or equal to 2 T registers. The remaining bits comprise the NOP opcode. If only one operand may be specified in a NOP instruction and more than one source register should be abandoned, multiple NOP instructions may be used.
  • the control unit asserts control signals to reset the dirty bits of those registers specified by the operands.
  • register zero is hardwired to a specific value (e.g.,
  • the "abandon register" o ⁇ erand(s) of the NOP instruction may be set to specify register zero so that no dirty bits are reset (i.e., implementing a true "no operation" NOP instruction).
  • FIG. 6 is a flow diagram of the control unit process for handling a NOP instruction in accordance with an embodiment of the invention. As previously indicated, the process of Figure 6 may be similarly applied to instructions other than NOP instructions.
  • a determination is made of whether the current instruction is a NOP instruction ( or other instruction used to implement "abandon register" operands). If the instruction is not a NOP instruction, no NOP handling is performed for that instruction. However, if the instruction is a NOP instruction, in step 601, the control unit determines which registers are specified by the operands of the instruction, e.g., by decoding the respective operands to generate register select signals.
  • the dirty bits of the designated register(s) are reset by the control unit asserting reset signals to those dirty bits.
  • the NOP-related steps are completed after step 602.
  • the NOP instruction occurs after the program instruction that specifies the source register that is no longer needed. The closer the NOP instruction is placed to the program instruction that specifies the unneeded source register, the less likely it is that a context switch will occur before the source register is removed from the process context.
  • Figure 7 is an example code fragment (in pseudo-assembly code) illustrating the placement of NOP instructions in accordance with an embodiment of the invention.
  • Fragment 700 represents unaltered program code
  • fragment 701 represents altered code (i.e., with NOP instructions inserted)
  • fragment 702 represents the altered code after optimization (i.e., NOP instructions combined).
  • the individual instructions of fragment 700 are intended to perform the following operations:
  • NOP instructions are inserted after the "add" and "store” instructions.
  • the NOP instructions specify address operands of register 3 and register 1, respectively.
  • the NOP instructions may be combined into a single NOP instruction after the "store” instruction.
  • the combined NOP instruction specifies both register addresses and handles the resetting of the dirty bits for registers 3 and 1 simultaneously.
  • the compiler may utilize only those NOP instructions already present in the code, e.g., to align code, balance instruction issue timing, etc. Registers are specified for abandonment based on the availability of NOP instructions in the code. Because there may be insufficient NOP instructions to account for every register that is a candidate for abandonment, and because NOP instructions may not be immediately available in the instruction stream, it is possible that some candidate registers will be included in a context switch. Thus, the conservative approach is less efficient at reducing context switch times than an embodiment that adds further NOP instructions. However, by foregoing the addition of NOP instructions into the instruction stream, code density is preserved. The number of required instruction fetches is reduced, saving fetch time and processor cycles, as well as space in the instruction cache (if implemented) and main memory.
  • Figure 8 is a flow diagram of a compiler process for setting fields in NOP instructions to abandon unneeded registers (or other instructions with available fields for such a purpose) in accordance with an embodiment of the invention. The steps described herein may be incorporated into any compiler process.
  • step 800 of Figure 8 the compiler determines whether the current instruction under consideration has source registers. If no source registers are specified, the process is complete for the current instruction. If source registers are specified, the compiler determines in step 801 whether the contents of the specified source register(s) will be needed after the instruction is executed. If the contents of the source register(s) will be needed, no registers are to be abandoned and the process is complete. However, if, in step 801, the contents of one or more source registers will not be needed, then, in step 802, the compiler determines whether any NOP instructions are available for specifying abandoned registers.
  • step 802 If, in step 802, a NOP instruction is available, the process continues at step 804, in which the unneeded source registers are identified in operands of the NOP instruction, completing the process for the current instruction. If, in step 802, no NOP instruction is available, the compiler may either append a NOP instruction to the code in step 803 and continue to step 804 (option A), or the process can complete for the current instruction (option B).
  • Options A and B represent different embodiments that may be implemented. Whereas option A insures that all unneeded registers are abandoned for best-case context switching, option B simplifies the compiler (no code insertion) and insures that code density is maintained.
  • the compiler may perform an additional optimizing step
  • instructions that specify registers as source operands are themselves utilized to identify those registers for abandonment (i.e., for removal from the current process context).
  • Bit-fields for indicating abandoned registers are encoded or embedded within the instruction, so that the addition and /or utilization of NOP instructions for this purpose is unnecessary. Optimized context switching is achieved without sacrificing code density or processor cycles.
  • a bit-field for one or more source operands is added, for example, to the opcode portion of the instruction.
  • the added bit-field is referred to as the "abandon register" bit, and, when set, indicates that the corresponding source register is to be removed from the current process context after the instruction is executed.
  • the processor is configured to decode the added bit-field or bit-fields when decoding the instruction, and, if indicated by the bit-field value, to remove the corresponding register from the process context as part of instruction execution( e.g., by resetting a respective dirty bit in the register file).
  • Bit diagram 501 of Figure 5 illustrates an example format for encoding information about unneeded registers into an instruction.
  • This example shows two source operands (SRC 1 and SRC 2) and one destination operand (DEST) specifying registers in a processor's register file.
  • the opcode portion of the instruction contains the standard opcode followed by the single-bit bit-fields Al and A2, representing the "abandon register" bits for SRC 1 and SRC 2, respectively.
  • a similar format can be used for any instruction having one or more register-based source operands.
  • Figure 9 is a flow diagram illustrating a method for a processor to process encoded instructions in accordance with an embodiment of the invention. It will be obvious to one of ordinary skill in the art that certain steps in the following method may be performed in parallel with other steps, or in a sequence differing from that shown, without departing from the scope of the invention.
  • step 900 of Figure 9 an instruction is fetched from main memory (or the instruction cache).
  • the current instruction is decoded by the processor's control unit in step 901, and, in step 902, the operation specified by the decoded opcode is carried out by the processor.
  • step 903 the processor determines whether the current instruction specifies a register in a destination operand. If a destination register is specified, then, in step 904, the dirty bit of the destination register is set before proceeding to step 905. If a register is not specified in a destination operand, the method bypasses step 904 and continues in step 905.
  • step 905 the processor determines whether any registers are specified in source operands of the current instruction. If no source registers are specified, the method continues at step 909. If source registers are specified in step 905, the steps shown in block 906 (i.e., steps 907-908) are executed for each source operand that has a corresponding "abandon register” bit in the instruction format. In step 907, the processor determines the value of the corresponding "abandon register” bit for the given source operand.
  • step 908 the dirty bit of the specified source register (i.e., the register whose address is specified in the source operand) is reset in step 908 and the method continues, returning to step 907 if another source operand remains to be processed, or proceeding in step 909 if no other source operands remain. If the "abandon register” bit is not set, the processor bypasses step 908, and performs no action with respect to the source register's dirty bit. In step 909, the program counter register is loaded with the next instruction address, and the method returns to step 900, where execution of the next instruction begins.
  • the specified source register i.e., the register whose address is specified in the source operand
  • the flow chart of Figure 9 illustrates how the steps for abandoning a register are embedded within the last instruction that uses that register.
  • the steps for abandoning a register comprise the resetting of dirty bits, and do not affect other operations carried out in connection with the instruction being executed, the "abandon register" steps may be performed in parallel with other operations of the given instruction.
  • program execution experiences little to no degradation in performance, and context switch times are reduced by the amount of time otherwise needed to store and load the abandoned registers.
  • Figure 10 illustrates how code fragment 700 is altered by the use of encoded instructions in accordance with an embodiment of the invention. Due to the encoded nature of the instructions, no additional instructions are added in the altered code fragment 1000. Rather, in each instruction that has one or more registers as source operands, a compiler-programmable "abandon register” bit is provided for one or more of the source operands. Optionally, registers that provide addressing data (e.g., pointers) for "load” and “store” instruction variations may also be provided with an “abandon register” bit (as shown in fragment 1000). In fragment 1000, "abandon register” bits are represented by a “0” or “1” in brackets at the end of the assembly instruction.
  • addressing data e.g., pointers
  • brackets The order of the values within the brackets indicates which source operand is to be abandoned, though it will be apparent that other embodiments may use other orderings or representations.
  • the "abandon register" value in the brackets is represented by a bit value in the "abandon register” bit-field(s) of the instruction, as shown by bit-fields Al and A2 in bit diagram 501, for example.
  • register 1 ($1) is the optional source operand. However, in this example, register 1 is not to be abandoned until the "store” instruction executes. Therefore, the "load” instructions are appended with "[0]", to indicate that the "abandon register” bit in the machine code instruction is not set (defaults to "0").
  • registers 3 and 4 ($3 and $4) represent the first and second source operands, respectively. In this example, it is desired to abandon register 3, but maintain register 4. Therefore, the "add” instruction is appended with "[1,0]” to set the "abandon register” bit-field Al while leaving bit-field A2 at its default "0" value.
  • register 4 represents a first source operand as the data source of the store operation
  • register 1 ($1) represents an optional second source operand providing addressing data.
  • the store instruction is thus appended with "[0,1]", to leave bit-field Al at its default value and to set bit-field A2.
  • Other assembly code formats may also be used to designate whether the "abandon register” bit-fields are set for a given instruction.
  • Figure 11 is a flow diagram of a compiler process for handling "abandon register” bits in accordance with an embodiment of the invention.
  • step 1100 of Figure 11 the compiler process determines whether the current instruction includes source registers (i.e., registers specified in source operands). If the current instruction does not specify source registers, the process is completed for the current instruction. If the current instruction does specify source registers, the first source register is identified for evaluation in step 1101. In step 1102, for the current source register, the compiler process determines whether the data in the source register is still useful (i.e., needed for future processing). If the data is no longer useful, in step 1104, the compiler process sets the corresponding "abandon register" bit in the instruction before proceeding to step 1105.
  • source registers i.e., registers specified in source operands
  • step 1103 the compiler process resets the "corresponding "abandon register” bit, or leaves the “abandon register” bit at its default value, and proceeds to step 1105.
  • step 1105 if the current source register is the last or only source register, the process is completed. Otherwise, the compiler process identifies the next source register in step 1106 and returns to step 1102.
  • the steps of Figure 11 may be applied as each instruction is generated during compilation of a higher-level program, or the compiler may perform these steps on the program instructions in an optimizing sweep after all instructions have been generated.
  • the compiler is able to use knowledge of later occurring program instructions in determining the usefulness of register values (e.g., if no later occurring instructions access a register value, that register may be abandoned).
  • the process of Figure 11 may be carried out as a modular process separate from normal compilation. Thus, even if a program is written directly in assembly or machine code form by a programmer without attention to abandoning registers, the optimization process of Figure 11 may still be applied to the program code to set the "abandon register" bit-field values for improved context switching. This is also true for the process of Figure 8 in connection with the "NOP" embodiment previously described.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Executing Machine-Instructions (AREA)
  • Advance Control (AREA)

Abstract

L'invention concerne un procédé et un appareil d'amélioration des temps de changement de contexte dans un système informatique. L'invention a trait à un mécanisme destiné à des registres contenant des données non requises à enlever d'un contexte de processus, ce qui permet ainsi de réduire le temps nécessaire pour réaliser un changement de contexte. On utilise des instructions de programme pour spécifier explicitement des registres qui doivent être abandonnés. Dans le cadre de l'installation d'un projet de bits modifiés par un processeur, ce dernier réinitialise les bits modifiés pour ces registres qui doivent être abandonnés. Un changement de contexte subséquent exclut des registres abandonnés. Selon un mode de réalisation, une instruction NOP (ou une autre instruction avec des bits de surplus pour des opérandes supplémentaires) présente au moins une opérande de spécification des registres à abandonner. Lors de l'exécution normale de l'instruction NOP, les registres spécifiés sont enlevés du contexte du processus (par exemple, par réinitialisation des bits modifiés associés). Dans un autre mode de réalisation, l'identification des registres destinés à l'abandon est codée au sein d'instructions existantes qui peuvent spécifier des registres comme des opérandes de source. Selon l'invention, un champ binaire de « registre d'abandon » est créé dans une instruction pour chaque opérande de source qui spécifie un registre de processeur. Si le bit du « registre d'abandon » est établi pour l'opérande de source donnée, le processeur élimine du contexte du processus le registre associé.
PCT/US2001/013530 2000-04-27 2001-04-26 Procede et appareil d'amelioration des temps de changement de contexte dans un systeme informatique WO2001082059A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001257323A AU2001257323A1 (en) 2000-04-27 2001-04-26 Method and apparatus to improve context switch times in a computing system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US56137800A 2000-04-27 2000-04-27
US09/561,378 2000-04-27

Publications (2)

Publication Number Publication Date
WO2001082059A2 true WO2001082059A2 (fr) 2001-11-01
WO2001082059A3 WO2001082059A3 (fr) 2002-04-11

Family

ID=24241692

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/013530 WO2001082059A2 (fr) 2000-04-27 2001-04-26 Procede et appareil d'amelioration des temps de changement de contexte dans un systeme informatique

Country Status (2)

Country Link
AU (1) AU2001257323A1 (fr)
WO (1) WO2001082059A2 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007059475A2 (fr) * 2005-11-10 2007-05-24 Qualcomm Incorporated Expansion d’un fichier d’un registre en pile en utilisant des registres shadow
US7962731B2 (en) 2005-10-20 2011-06-14 Qualcomm Incorporated Backing store buffer for the register save engine of a stacked register file

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4740893A (en) * 1985-08-07 1988-04-26 International Business Machines Corp. Method for reducing the time for switching between programs
US5485624A (en) * 1991-06-19 1996-01-16 Hewlett-Packard Company Co-processor monitoring address generated by host processor to obtain DMA parameters in the unused portion of instructions
EP0955583A2 (fr) * 1998-05-06 1999-11-10 Sun Microsystems, Inc. Technique de commutation de contexte pour processeurs avec grand tampon de régistres

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4740893A (en) * 1985-08-07 1988-04-26 International Business Machines Corp. Method for reducing the time for switching between programs
US5485624A (en) * 1991-06-19 1996-01-16 Hewlett-Packard Company Co-processor monitoring address generated by host processor to obtain DMA parameters in the unused portion of instructions
EP0955583A2 (fr) * 1998-05-06 1999-11-10 Sun Microsystems, Inc. Technique de commutation de contexte pour processeurs avec grand tampon de régistres

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"TECHNIQUE FOR REDUCING THE NUMBER OF REGISTERS SAVED AT A CONTEXT SWAP" IBM TECHNICAL DISCLOSURE BULLETIN, IBM CORP. NEW YORK, US, vol. 33, no. 3A, 1 August 1990 (1990-08-01), pages 234-235, XP000123918 ISSN: 0018-8689 *
SCHLAEPPI H P ET AL: "LIVE/DEAD TAGS ON COMPUTER REGISTERS" IBM TECHNICAL DISCLOSURE BULLETIN, IBM CORP. NEW YORK, US, vol. 25, no. 3A, August 1982 (1982-08), pages 1304-1307, XP000885132 ISSN: 0018-8689 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7962731B2 (en) 2005-10-20 2011-06-14 Qualcomm Incorporated Backing store buffer for the register save engine of a stacked register file
WO2007059475A2 (fr) * 2005-11-10 2007-05-24 Qualcomm Incorporated Expansion d’un fichier d’un registre en pile en utilisant des registres shadow
WO2007059475A3 (fr) * 2005-11-10 2007-08-23 Qualcomm Inc Expansion d’un fichier d’un registre en pile en utilisant des registres shadow
US7844804B2 (en) 2005-11-10 2010-11-30 Qualcomm Incorporated Expansion of a stacked register file using shadow registers

Also Published As

Publication number Publication date
AU2001257323A1 (en) 2001-11-07
WO2001082059A3 (fr) 2002-04-11

Similar Documents

Publication Publication Date Title
US5926646A (en) Context-dependent memory-mapped registers for transparent expansion of a register file
US8386754B2 (en) Renaming wide register source operand with plural short register source operands for select instructions to detect dependency fast with existing mechanism
US5404552A (en) Pipeline risc processing unit with improved efficiency when handling data dependency
US6594754B1 (en) Mapping destination logical register to physical register storing immediate or renamed source register of move instruction and using mapping counters
US20010010072A1 (en) Instruction translator translating non-native instructions for a processor into native instructions therefor, instruction memory with such translator, and data processing apparatus using them
US5727227A (en) Interrupt coprocessor configured to process interrupts in a computer system
US5995743A (en) Method and system for interrupt handling during emulation in a data processing system
EP2660715A2 (fr) Optimisation des opérations d'initialisation de registre
JP2002512399A (ja) 外部コプロセッサによりアクセス可能なコンテキストスイッチレジスタセットを備えたriscプロセッサ
US9652234B2 (en) Instruction and logic to control transfer in a partial binary translation system
US5940876A (en) Stride instruction for fetching data separated by a stride amount
US20120204008A1 (en) Processor with a Hybrid Instruction Queue with Instruction Elaboration Between Sections
US6950926B1 (en) Use of a neutral instruction as a dependency indicator for a set of instructions
US5502827A (en) Pipelined data processor for floating point and integer operation with exception handling
CN111782270B (zh) 一种数据处理方法及装置、存储介质
US6145075A (en) Apparatus and method for executing a single-cycle exchange instruction to exchange contents of two locations in a register file
JP5335440B2 (ja) オペランドの早期の条件付き選択
US6799266B1 (en) Methods and apparatus for reducing the size of code with an exposed pipeline by encoding NOP operations as instruction operands
JPH09152973A (ja) カウント/リンク・レジスタ変更命令の投機実行をサポートする方法及び装置
US6983359B2 (en) Processor and method for pre-fetching out-of-order instructions
US5768553A (en) Microprocessor using an instruction field to define DSP instructions
US5761469A (en) Method and apparatus for optimizing signed and unsigned load processing in a pipelined processor
US20070180220A1 (en) Processor system
US5841999A (en) Information handling system having a register remap structure using a content addressable table
US11269649B2 (en) Resuming beats of processing of a suspended vector instruction based on beat status information indicating completed beats

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP