WO2018193321A1 - Register context restoration based on rename register recovery - Google Patents

Register context restoration based on rename register recovery

Info

Publication number
WO2018193321A1
Authority
WO
WIPO (PCT)
Prior art keywords
snapshot
registers
register
memory
architected registers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IB2018/051646
Other languages
English (en)
French (fr)
Inventor
Michael Karl Gschwind
Chung-Lung Shum
Timothy Slegel
Valentina Salapura
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
IBM China Investment Co Ltd
IBM United Kingdom Ltd
International Business Machines Corp
Original Assignee
IBM China Investment Co Ltd
IBM United Kingdom Ltd
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by IBM China Investment Co Ltd, IBM United Kingdom Ltd, International Business Machines Corp
Priority to DE112018000848.7T (DE112018000848T5)
Priority to JP2019556276A (JP7046098B2)
Priority to CN201880025664.XA (CN110520837B)
Priority to GB1916132.2A (GB2575412B)
Publication of WO2018193321A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
        • G06F9/30003 Arrangements for executing specific machine instructions
            • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
                • G06F9/30043 LOAD or STORE instructions; Clear instruction
            • G06F9/3005 Arrangements for executing specific machine instructions to perform operations for flow control
                • G06F9/30058 Conditional branch instructions
        • G06F9/30098 Register arrangements
            • G06F9/30105 Register structure
                • G06F9/30116 Shadow registers, e.g. coupled registers, not forming part of the register space
        • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
            • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
                • G06F9/3838 Dependency mechanisms, e.g. register scoreboarding
                    • G06F9/384 Register renaming
                • G06F9/3842 Speculative instruction execution
            • G06F9/3861 Recovery, e.g. branch miss-prediction, exception handling
                • G06F9/3863 Recovery, e.g. branch miss-prediction, exception handling using multiple copies of the architectural state, e.g. shadow registers

Definitions

  • One or more aspects relate, in general, to processing within a computing environment, and in particular, to facilitating such processing.
  • Short functions, like any other functions, store the callee-saved registers that they modify on a stack as part of the function's prolog and restore them as part of the epilog.
  • The stack, also referred to as a call stack, is used by a computer program to store information about active functions of the computer program.
  • Callers of such functions save caller-saved registers on the stack as part of the function's call sequence and restore them upon return if the values live across the function call. Saving these registers is a significant expense of calling a function.
  • the computer program product comprises a storage medium readable by a processing circuit and storing instructions for performing a method.
  • the method includes, for instance, obtaining, by a processor, a load request to restore a plurality of architected registers; and restoring, based on obtaining the load request, one or more architected registers of the plurality of architected registers.
  • the restoring uses a snapshot that maps architected registers to physical registers to replace one or more physical registers currently assigned to the one or more architected registers with one or more physical registers of the snapshot corresponding to the one or more architected registers.
  • the restoring is performed using the snapshot, based on the determining indicating the snapshot is available.
  • the one or more architected registers are restored by loading values from memory into the one or more architected registers, based on the determining indicating a snapshot corresponding to the one or more architected registers is unavailable.
  • the determining includes using a snapshot stack to determine whether a snapshot corresponding to the one or more architected registers is available.
  • the snapshot stack includes, for instance, a plurality of entries.
  • an entry of the snapshot stack includes a snapshot identifier identifying the snapshot.
  • the entry of the snapshot stack includes additional information including at least one selected from a group consisting of: an address in memory of contents of the one or more architected registers, an indication of the one or more architected registers associated with the snapshot, and a validity indicator indicating whether the snapshot is valid.
  • the snapshot is created to save a mapping of the one or more physical registers to the one or more architected registers.
  • the creating the snapshot is performed, in one example, based on obtaining a save request requesting a saving of the one or more architected registers.
  • the load request includes a load multiple instruction.
  • the save request includes a store multiple instruction.
  • the one or more architected registers are restored absent a copying of values for the one or more architected registers from memory.
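The restore path described above (use a snapshot when one is available, otherwise load from memory) can be illustrated with a minimal Python sketch. Everything named here (`Core`, `SnapshotEntry`, the field names, word-indexed memory) is a hypothetical model chosen for illustration and is not taken from the patent. The sketch never frees physical registers, and the validity indicator is only cleared by hand rather than by tracking memory changes.

```python
from dataclasses import dataclass


@dataclass
class SnapshotEntry:
    """One snapshot-stack entry (hypothetical field names)."""
    snapshot_id: int
    address: int      # where the register contents were stored in memory
    regs: tuple       # architected registers covered by the snapshot
    mapping: dict     # architected register -> physical register at save time
    valid: bool = True


class Core:
    """Toy model: rename table, physical register file, memory, snapshot stack."""

    def __init__(self, num_regs=16):
        self.rename = {r: r for r in range(num_regs)}  # architected -> physical
        self.phys = {r: 0 for r in range(num_regs)}    # physical -> value
        self.next_phys = num_regs
        self.next_id = 0
        self.memory = {}                               # word-indexed, simplified
        self.snapshot_stack = []

    def write(self, reg, value):
        # A write allocates a fresh physical register, so the old one keeps its value.
        p = self.next_phys
        self.next_phys += 1
        self.phys[p] = value
        self.rename[reg] = p

    def read(self, reg):
        return self.phys[self.rename[reg]]

    def save(self, regs, address):
        """Save request: store values to memory and push a snapshot entry."""
        regs = tuple(regs)
        for i, r in enumerate(regs):
            self.memory[address + i] = self.read(r)
        mapping = {r: self.rename[r] for r in regs}
        self.snapshot_stack.append(
            SnapshotEntry(self.next_id, address, regs, mapping))
        self.next_id += 1

    def restore(self, regs, address):
        """Load request: returns which path was taken, 'snapshot' or 'memory'."""
        regs = tuple(regs)
        if self.snapshot_stack:
            top = self.snapshot_stack[-1]
            if top.valid and top.address == address and top.regs == regs:
                self.snapshot_stack.pop()
                self.rename.update(top.mapping)  # remap only, no values copied
                return "snapshot"
        for i, r in enumerate(regs):             # fallback: load from memory
            self.write(r, self.memory[address + i])
        return "memory"
```

In this model a matching, valid entry on top of the stack lets the restore complete by remapping alone; any mismatch falls back to ordinary loads, which gives the same architected values at a higher cost.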
  • FIG. 1A depicts one example of a computing environment to incorporate and use one or more aspects of the present invention
  • FIG. 1B depicts further details of a processor of FIG. 1A, in accordance with one or more aspects of the present invention
  • FIG. 1C depicts further details of one example of an instruction execution pipeline used in accordance with one or more aspects of the present invention
  • FIG. 1D depicts further details of one example of a processor of FIG. 1A, in accordance with an aspect of the present invention
  • FIG. 2A depicts one example of storing caller-saved registers, in accordance with an aspect of the present invention
  • FIG. 2B depicts one example of storing callee-saved registers, in accordance with an aspect of the present invention
  • FIG. 3 depicts one example of a mapping of architected registers to physical registers, in accordance with an aspect of the present invention
  • FIG. 4A depicts one example of processing associated with a bulk save request, in accordance with an aspect of the present invention
  • FIG. 4B depicts one example of processing associated with a bulk restore request, in accordance with an aspect of the present invention
  • FIG. 5A depicts one example of a register rename table, a plurality of snapshots, and a physical rename file used in accordance with one or more aspects of the present invention
  • FIG. 5B is a further example of a register rename table, a plurality of snapshots, and a physical rename file used in accordance with one or more aspects of the present invention
  • FIG. 5C pictorially depicts one example of rolling back a snapshot, in accordance with an aspect of the present invention.
  • FIG. 5D pictorially depicts another example of rolling back a snapshot, in accordance with an aspect of the present invention.
  • FIG. 6 depicts one example of a snapshot stack used in accordance with one or more aspects of the present invention.
  • FIG. 7A depicts one example of a Spill Multiple instruction, in accordance with an aspect of the present invention
  • FIG. 7B depicts one example of a Reload Multiple instruction, in accordance with an aspect of the present invention
  • FIG. 8A depicts another example of processing associated with a bulk restore request, in accordance with an aspect of the present invention
  • FIG. 8B depicts yet another example of processing associated with a bulk restore request, in accordance with an aspect of the present invention.
  • FIG. 9 pictorially depicts one example of reusing a snapshot, in accordance with an aspect of the present invention
  • FIGS. 10A-10E depict examples of processing associated with managing restoration snapshots, in accordance with one or more aspects of the present invention
  • FIG. 10F depicts one example of performing recovery using shared snapshots for recovery and/or restoration, in accordance with an aspect of the present invention
  • FIGS. 11A-11C depict embodiments of checking for memory changes and optionally recovering, in accordance with one or more aspects of the present invention
  • FIG. 12 depicts one example of processing associated with mismatched Spill Multiple/Reload Multiple pairs, in accordance with an aspect of the present invention
  • FIG. 13A depicts one example of entries in a data cache with associated indicators, in accordance with an aspect of the present invention
  • FIGS. 13B-13D depict examples of processing associated with the indicators depicted in FIG. 13A, in accordance with one or more aspects of the present invention
  • FIGS. 14A-14B depict examples of processing associated with register restoration, in accordance with one or more aspects of the present invention.
  • FIG. 15A depicts an example of processing associated with transactional memory and restoration, in accordance with one or more aspects of the present invention
  • FIG. 15B depicts one example of a Transaction Begin instruction, in accordance with one or more aspects of the present invention.
  • FIGS. 15C-15E depict aspects of processing associated with transactional memory and restoration, in accordance with one or more aspects of the present invention.
  • FIGS. 16A-16D depict examples of techniques used to track memory changes, in accordance with one or more aspects of the present invention.
  • FIG. 17 depicts one example of handling a restoration request, in accordance with an aspect of the present invention.
  • FIGS. 18A-18C depict examples of processing associated with context switches, in accordance with one or more aspects of the present invention.
  • FIG. 19A depicts one example of processing associated with managing snapshots based on executing a
  • FIG. 19B depicts one example of processing associated with a register save indication, in accordance with an aspect of the present invention
  • FIGS. 20A-20B depict examples of processing associated with coalescing store/load instructions, in accordance with one or more aspects of the present invention
  • FIG. 21A depicts one example of a store queue that includes write back logic, used in accordance with an aspect of the present invention
  • FIGS. 21B-21C depict examples of write back logic processing, in accordance with one or more aspects of the present invention.
  • FIG. 22A depicts one example of a recovery buffer, in accordance with an aspect of the present invention.
  • FIG. 22B depicts one example of a processor that includes a recovery buffer, in accordance with an aspect of the present invention
  • FIGS. 23A-23B depict examples of processing associated with register allocation requests, in accordance with one or more aspects of the present invention.
  • FIGS. 24A-24B depict one example of an aspect of facilitating processing within a computing environment, in accordance with an aspect of the present invention.
  • FIG. 25A depicts another example of a computing environment to incorporate and use one or more aspects of the present invention.
  • FIG. 25B depicts further details of the memory of FIG. 25A
  • FIG. 26 depicts one embodiment of a cloud computing environment
  • FIG. 27 depicts one example of abstraction model layers.
  • a capability is provided to optimize the saving and restoring of registers on function calls, thereby improving processing and reducing costs associated therewith.
  • the capability uses register renaming for the saving/restoring.
  • One embodiment of a computing environment to incorporate and use one or more aspects of the present invention is described with reference to FIG. 1A.
  • the computing environment is based on the z/Architecture, offered by International Business Machines Corporation, Armonk, New York.
  • One embodiment of the z/Architecture is described in "z/Architecture Principles of Operation," IBM Publication No. SA22-7832-10, March 2015, which is hereby incorporated herein by reference in its entirety.
  • Z/ARCHITECTURE is a registered trademark of International Business Machines Corporation, Armonk, New York, USA.
  • the computing environment is based on the Power Architecture, offered by International Business Machines Corporation, Armonk, New York.
  • One embodiment of the Power Architecture is described in "Power ISA™ Version 2.07B," International Business Machines Corporation, April 9, 2015, which is hereby incorporated herein by reference in its entirety.
  • POWER ARCHITECTURE is a registered trademark of International Business Machines Corporation, Armonk, New York, USA.
  • the computing environment may also be based on other architectures, including, but not limited to, the Intel x86 architectures. Other examples also exist.
  • a computing environment 100 includes, for instance, a computer system 102 shown, e.g., in the form of a general-purpose computing device.
  • Computer system 102 may include, but is not limited to, one or more processors or processing units 104 (e.g., central processing units (CPUs)), a memory 106 (referred to as main memory or storage, as examples), and one or more input/output (I/O) interfaces 108, coupled to one another via one or more buses and/or other connections 110.
  • Bus 110 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
  • Such bus architectures include, for example, the Industry Standard Architecture (ISA), the Micro Channel Architecture (MCA), the Enhanced ISA (EISA), the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI).
  • Memory 106 may include, for instance, a cache 120, such as a shared cache, which may be coupled to local caches 122 of processors 104. Further, memory 106 may include one or more programs or applications 130, an operating system 132, and one or more computer readable program instructions 134. Computer readable program instructions 134 may be configured to carry out functions of embodiments of aspects of the invention.
  • Computer system 102 may also communicate via, e.g., I/O interfaces 108 with one or more external devices 140, one or more network interfaces 142, and/or one or more data storage devices 144.
  • Example external devices include a user terminal, a tape drive, a pointing device, a display, etc.
  • Network interface 142 enables computer system 102 to communicate with one or more networks, such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet), providing communication with other computing devices or systems.
  • Data storage device 144 may store one or more programs 146, one or more computer readable program instructions 148, and/or data, etc.
  • the computer readable program instructions may be configured to carry out functions of embodiments of aspects of the invention.
  • Computer system 102 may include and/or be coupled to removable/non-removable, volatile/nonvolatile computer system storage media.
  • it may include and/or be coupled to non-removable, non-volatile magnetic media (typically called a "hard drive"), a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk"), and/or an optical disk drive for reading from or writing to a removable, non-volatile optical disk, such as a CD-ROM, DVD-ROM or other optical media.
  • other hardware and/or software components
  • Computer system 102 may be operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with computer system 102 include, but are not limited to, personal computer (PC) systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.
  • Processor 104 includes a plurality of functional components used to execute instructions. These functional components include, for instance, an instruction fetch component 150 to fetch instructions to be executed; an instruction decode unit 152 to decode the fetched instructions and to obtain operands of the decoded instructions; instruction execution components 154 to execute the decoded instructions; a memory access component 156 to access memory for instruction execution, if necessary; and a write back component 160 to provide the results of the executed instructions.
  • One or more of these components may, in accordance with an aspect of the present invention, be used to execute one or more register restoration operations and/or instructions 166, and/or other operations/instructions associated therewith.
  • Processor 104 also includes, in one embodiment, one or more registers 168 to be used by one or more of the functional components. Processor 104 may include additional, fewer and/or other components than the examples provided herein.
  • Further details regarding an execution pipeline of processor 104 are described with reference to FIG. 1C. Although various processing stages of the pipeline are depicted and described herein, it will be understood that additional, fewer and/or other stages may be used without departing from the spirit of aspects of the invention.
  • an instruction is fetched 170 from an instruction queue, and branch prediction 172 and/or decoding 174 of the instruction may be performed.
  • the decoded instruction may be added to a group of instructions 176 to be processed together.
  • the grouped instructions are provided to a mapper 178 that determines any dependencies, assigns resources and dispatches the group of instructions/operations to the appropriate issue queues.
  • an instruction/operation is issued to the appropriate execution unit. Any registers are read 182 to retrieve its sources, and the instruction/operation executes during an execute stage 184.
  • the execution may be for a branch, a load (LD) or a store (ST), a fixed point operation (FX), a floating point operation (FP), or a vector operation (VX), as examples. Any results are written to the appropriate register(s) during a write back stage 186. Subsequently, the instruction completes 188. If there is an interruption or flush 190, processing may return to instruction fetch 170.
  • Coupled to the decode unit is a register renaming unit 192, used in one or more aspects in the saving/restoring of registers.
  • a processor such as processor 104
  • the prediction hardware includes, for instance, a local branch history table (BHT) 105a, a global branch history table (BHT) 105b, and a global selector 105c.
  • instruction cache 109 may fetch a plurality of instructions referred to as a "fetch group".
  • Associated with instruction cache 109 is a directory 111.
  • the cache and prediction hardware are accessed at approximately the same time with the same address. If the prediction hardware has prediction information available for an instruction in the fetch group, that prediction is forwarded to an instruction sequencing unit (ISU) 113, which, in turn, issues instructions to execution units for execution.
  • the prediction may be used to update IFAR 107 in conjunction with branch target calculation 115 and branch target prediction hardware (such as a link register prediction stack 117a and a count register stack 117b). If no prediction information is available, but one or more instruction decoders 119 find a branch instruction in the fetch group, a prediction is created for that fetch group. Predicted branches are stored in the prediction hardware, such as in a branch information queue (BIQ) 125, and forwarded to ISU 113.
  • a branch execution unit (BRU) 121 operates in response to instructions issued to it by ISU 113.
  • BRU 121 has read access to a condition register (CR) file 123.
  • Branch execution unit 121 further has access to information stored by the branch scan logic in branch information queue 125 to determine the success of a branch prediction, and is operatively coupled to instruction fetch address register(s) (IFAR) 107 corresponding to the one or more threads supported by the microprocessor.
  • BIQ entries are associated with, and identified by an identifier, e.g., by a branch tag, BTAG. When a branch associated with a BIQ entry is completed, it is so marked.
  • BIQ entries are maintained in a queue, and the oldest queue entry (entries) is (are) de-allocated sequentially when they are marked as containing information associated with a completed branch.
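The BIQ bookkeeping described above (entries identified by a branch tag, marked on completion, and de-allocated from the oldest end in order) might be modeled as follows. This is a sketch under assumptions: the class name, the dictionary fields, and the integer tags are hypothetical and only illustrate the in-order de-allocation rule.

```python
from collections import deque


class BranchInfoQueue:
    """Toy BIQ: oldest entry sits at the left end of the deque."""

    def __init__(self):
        self.entries = deque()
        self.next_tag = 0

    def allocate(self, predicted_taken):
        """Record a predicted branch and return its branch tag (BTAG)."""
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append(
            {"btag": tag, "predicted_taken": predicted_taken, "completed": False})
        return tag

    def complete(self, btag):
        """Mark a branch completed, then free completed entries from the oldest end."""
        for entry in self.entries:
            if entry["btag"] == btag:
                entry["completed"] = True
                break
        freed = []
        # De-allocation is sequential: stop at the first entry not yet completed.
        while self.entries and self.entries[0]["completed"]:
            freed.append(self.entries.popleft()["btag"])
        return freed
```

A branch that completes out of order stays in the queue until every older branch has completed, which is why `complete` can free zero, one, or several entries per call.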
  • BRU 121 is further operatively coupled to cause a predictor update when BRU 121 discovers a branch misprediction.
  • BRU 121 detects if the prediction is wrong. If so, the prediction is to be updated.
  • the processor also includes predictor update logic 127.
  • Predictor update logic 127 is responsive to an update indication from branch execution unit 121 and configured to update array entries in one or more of the local BHT 105a, global BHT 105b, and global selector 105c.
  • the predictor hardware 105a, 105b, and 105c may have write ports distinct from the read ports used by the instruction fetch and prediction operation, or a single read/write port may be shared.
  • Predictor update logic 127 may further be operatively coupled to link stack 117a and count register stack 117b.
  • condition register file (CRF) 123 is read-accessible by BRU 121 and can be written to by the execution units, including but not limited to, a fixed point unit (FXU) 141, a floating point unit (FPU) 143, and a vector multimedia extension unit (VMXU) 145.
  • a condition register logic execution unit (CRL execution) 147 also referred to as the CRU
  • SPR special purpose register
  • CRU 147 performs logical operations on the condition registers stored in CRF 123.
  • FXU 141 is able to perform write updates to CRF 123.
  • Processor 104 further includes a load/store unit 151, various multiplexors 153 and buffers 155, as well as address translation tables 157, and other circuitry.
  • Executing within processor 104 are programs (also referred to as applications) that use hardware registers to store information.
  • programs that call routines such as functions, subroutines or other types of routines, are responsible for saving registers used by the caller and for restoring those registers upon return from the callee.
  • the callee is responsible for saving/restoring registers that it uses, as shown in the below code.
  • caller-saved registers are stored, STEP 200.
  • STMG Store Multiple instruction
  • function parameters are loaded (e.g., using the load instruction LGFI), STEP 202, and a function call is performed using, for instance, a branch instruction (BRASL), STEP 204 (i.e., the callee is called).
  • LMG Load Multiple instruction
  • a bulk save or bulk store includes a store of one or more registers
  • a bulk restore or bulk reload includes a load of one or more registers.
  • the bulk save (store) and the bulk restore (reload) are related to saving/restoring registers related to function calls.
  • the bulk save (store) and the bulk restore (reload) are related to saving values on a program stack and restoring the values from the program stack to the same registers from which they have been stored.
  • bit positions of the set of general registers starting with general register R1, specified by the instruction, and ending with general register R3, specified by the instruction, are placed in the storage area beginning at the location designated by the second operand address (e.g., provided by the contents of the register designated by B2 plus the contents of D2; both B2 and D2 are specified by the instruction) and continuing through as many locations as needed.
  • the contents of bit positions 32-63 of the general registers are stored in successive four-byte fields beginning at the second operand address.
  • the general registers are stored in the ascending order of their register numbers, starting with general register R1 and continuing up to and including general register R3, with general register 0 following general register 15.
  • bit positions of the set of general registers starting with general register R1, specified by the instruction, and ending with general register R3, specified by the instruction, are loaded from storage beginning at the location designated by the second operand address (e.g., provided by the contents of the register designated by B2 plus the contents of D2; both B2 and D2 are specified by the instruction) and continuing through as many locations as needed.
  • bit positions 32-63 of the general registers are loaded from successive four-byte fields beginning at the second operand address and bits 0-31 remain unchanged.
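The register ordering and field widths described above can be made concrete with a short Python sketch. It assumes 16 general registers of 64 bits, big-endian storage, and a second operand address that has already been computed; the function names and the `width` parameter are hypothetical. With `width=4` the sketch models the variant that stores or loads bit positions 32-63 (the low-order 32 bits) only.

```python
MASK32 = 0xFFFFFFFF


def register_range(r1, r3):
    """Registers from r1 up to and including r3; register 0 follows register 15."""
    count = (r3 - r1) % 16 + 1
    return [(r1 + i) % 16 for i in range(count)]


def store_multiple(gprs, memory, r1, r3, address, width=8):
    """Store registers r1..r3 in successive fields starting at address.

    width=8 stores all 64 bits; width=4 stores bit positions 32-63 only.
    """
    for i, reg in enumerate(register_range(r1, r3)):
        value = gprs[reg] if width == 8 else gprs[reg] & MASK32
        start = address + i * width
        memory[start:start + width] = value.to_bytes(width, "big")


def load_multiple(gprs, memory, r1, r3, address, width=8):
    """Load registers r1..r3 from successive fields starting at address.

    With width=4, bit positions 0-31 (the high-order half) remain unchanged.
    """
    for i, reg in enumerate(register_range(r1, r3)):
        start = address + i * width
        field = int.from_bytes(memory[start:start + width], "big")
        if width == 8:
            gprs[reg] = field
        else:
            high_half = (gprs[reg] >> 32) << 32
            gprs[reg] = high_half | field
```

For example, `register_range(14, 1)` yields registers 14, 15, 0, 1 in that order, so a single request can span the wraparound from register 15 to register 0.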
  • a set of callee-saved registers are stored, STEP 220.
  • STMG Store Multiple instruction
  • processing is performed as part of the function body, including loading the return address back to the caller, STEP 222.
  • the callee-saved registers are restored, STEP 224.
  • this occurs in the epilog, and includes a bulk restore of the callee-saved registers via, for instance, a Load Multiple instruction (LMG).
  • the registers that are saved/restored may be architected or logical registers that are mapped to physical registers, as shown in FIG. 3.
  • Based on, for instance, an instruction referring to an architected register 300, that architected register is associated with a physical register 302.
  • register renaming logic is used to look up a table (or other data structure) to determine what physical register corresponds to an architected register. For instance, for read accesses, an architected register is replaced with a physical register that is found in the table; and for write accesses, a new physical register is allocated out of a free list.
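The table lookup for reads and the free-list allocation for writes can be sketched as below. This is a toy model under stated assumptions: the class name `RenameTable`, its method names, and the identity mapping used as the initial state are hypothetical and are not taken from the source.

```python
from collections import deque


class RenameTable:
    """Toy rename table: architected register number -> physical register number."""

    def __init__(self, num_architected, num_physical):
        # Assumed initial state: architected register i maps to physical register i.
        self.mapping = list(range(num_architected))
        self.free_list = deque(range(num_architected, num_physical))

    def rename_source(self, arch_reg):
        """Read access: look up the physical register currently mapped."""
        return self.mapping[arch_reg]

    def rename_target(self, arch_reg):
        """Write access: allocate a new physical register from the free list."""
        phys_reg = self.free_list.popleft()
        self.mapping[arch_reg] = phys_reg
        return phys_reg
```

The sketch does not return the previously mapped physical register to the free list; in the description that step belongs to instruction completion, once no snapshot still needs the old mapping.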
  • the renaming logic may involve, in accordance with one or more aspects, one or more units of the processor.
  • a processor decode unit receives instructions; renames target instructions by, e.g., updating a lookup table associating a set of architected registers to physical registers obtained from a free list; updates a register rename table for source instructions; takes a rollback snapshot (e.g., copy of register rename table) when an instruction or group of instructions may trigger a rollback (e.g., due to the instruction being able to raise an exception or for a branch instruction that may be mispredicted); and includes rollback logic adapted to recover a snapshot corresponding to an event requiring a rollback, e.g., for an exception handler or a new branch target, or cause re-execution.
  • the renaming logic may involve an execution unit that includes a physical register file accessed by physical register numbers received by the decode unit; logic to execute instructions and write results to a specified physical register; and logic to indicate successful completion or a rollback in the event of, e.g., a branch misprediction or exception.
  • an instruction completion unit is used that receives reports indicating that instructions have completed; marks snapshots as no longer necessary; adds physical registers to the free list; and updates an in-order program counter or other in-order state.
  • registers may be saved and restored by either a caller function, a callee function, or both.
  • This capability includes, for instance, using a register snapshot to save and restore the registers, thereby avoiding, in at least one aspect, the use of memory in restoring (and optionally, saving) the registers.
  • a snapshot is taken of at least a portion of the register state (e.g., at least a portion of a register rename map, other register restoration information, or the full register state) when a store of bulk registers is recognized.
  • a request for a bulk store is obtained (e.g., received, determined, provided, retrieved, have, etc.), STEP 400.
  • the bulk store may be, for example, a Store Multiple Instruction that stores multiple registers.
  • the bulk save is performed, and the contents of the multiple registers are written to memory, STEP 402. Based thereon, a snapshot is created, STEP 404. (In another embodiment, the storing to memory is not performed.)
  • One example of a snapshot is shown in FIG. 5A.
  • a snapshot 500a is taken of the mapping of physical registers 502a to architected registers 504.
  • physical register 45 is assigned to architected register 0; physical register 88 is assigned to architected register 11; physical register 96 is assigned to architected register 12; physical register 67 is assigned to architected register 13; physical register 38 is assigned to architected register 14; and physical register 22 is assigned to architected register 15.
  • a mapping of these physical registers to the architected registers is captured by snapshot 500a.
  • a physical register file 506 indicates for each physical register 502 the value 505 stored within that register.
  • a snapshot identifier 508 (e.g., ID 4) is assigned to snapshot 500a. Further, in one example, there may also be a plurality of older snapshots 510 (e.g., snapshots 2 and 3).
  • a snapshot of the registers participating in the bulk save is taken (e.g., snapshot 500a). Then, processing continues, and as shown in FIG. 5B, new physical registers 502b are allocated, and the function is executed.
  • a bulk restore request is obtained (e.g., received, determined, provided, have, retrieved, etc.), STEP 450.
  • the bulk restore request may be, for example, a Load Multiple instruction that loads multiple registers.
  • a determination is made as to whether a corresponding snapshot is available, INQUIRY 452. If a corresponding snapshot is unavailable, then the values are reloaded from memory, STEP 454. However, if a corresponding snapshot is available, then a further determination is made as to whether the bulk restore matches the bulk save, INQUIRY 456.
  • as shown in FIG. 5C, the mapping of the physical registers is restored by recovering 520 the snapshot, resulting in a restored snapshot.
  • a restored snapshot 530 maps to the same architected registers that were previously saved.
  • p123 assigned to r0 is replaced with p45; p23 assigned to r11 is replaced with p88; p58 assigned to r12 is replaced with p96; p67 assigned to r13 is replaced with p67 (or no replace is performed); p245 assigned to r14 is replaced with p38; and p14 assigned to r15 is replaced with p22.
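The snapshot and recovery of the rename mapping can be traced with the register numbers given above. This is a minimal sketch: the dictionary representation and the function names `take_snapshot` and `restore_snapshot` are assumptions for illustration, and only the mapping is modeled, not the physical register file.

```python
def take_snapshot(rename_map, registers):
    """Copy the architected-to-physical mapping for the given registers."""
    return {r: rename_map[r] for r in registers}


def restore_snapshot(rename_map, snapshot):
    """Point each architected register back at its snapshotted physical register."""
    rename_map.update(snapshot)


# Mapping at the time of the bulk save (values from the snapshot 500a example).
rename_map = {0: 45, 11: 88, 12: 96, 13: 67, 14: 38, 15: 22}
snapshot_4 = take_snapshot(rename_map, [0, 11, 12, 13, 14, 15])

# The function body allocates new physical registers (r13 keeps p67).
rename_map.update({0: 123, 11: 23, 12: 58, 14: 245, 15: 14})

# The bulk restore recovers the snapshot without reading memory.
restore_snapshot(rename_map, snapshot_4)
```

After the restore, every architected register again points at the physical register that held its value at the time of the bulk save.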
  • a number of techniques may be used. One technique includes remembering the last snapshot that was taken. For instance, based on a store multiple, a snapshot is taken, the identifier of that snapshot is remembered, and the snapshot is marked as available. Then, if another store multiple is performed, another snapshot is taken, the snapshot id is incremented and that identifier is remembered, etc. Further, based on a bulk restore, the snapshot id of the last bulk save is recovered and that snapshot is marked as unavailable.
  • Leaf functions are a sizeable fraction of all functions (typically about 50%) and are also among the shortest functions; thus, save and restore processing represents a significant fraction of execution time for such functions, which is reduced using the snapshot.
  • a snapshot stack 600 includes one or more snapshot identifiers (snapshot ID) 602, one for each snapshot taken with the latest snapshot on top, as indicated by a top-of-stack (TOS) pointer 604.
  • the snapshot stack may optionally include additional information. For instance, the values of the registers are saved to memory (STEP 402) for a number of situations, including, for instance, in case the snapshot is lost (STEP 454), or if there is a need to confirm if the snapshot contains the latest values (STEP 460).
  • the additional information may include an address or address range 606 of where the value or values of the snapshot registers are stored in memory.
  • the snapshot may not be valid for all of the registers contained within the snapshot, but instead only for a subset of the registers.
  • the snapshot stack may include for each snapshot, a register from indication 608 and a register to indication 610 that provide the registers that are valid for the snapshot.
  • a valid indicator 612 may optionally be provided to indicate whether the snapshot is valid. Other, additional and/or less information may be provided in other embodiments.
  • the top-of-stack pointer is adjusted. For instance, based on creating a snapshot, a new entry is added to the stack, and the top-of-stack pointer is incremented. Further, when restoring a snapshot, the entry corresponding to the top-of-stack pointer is removed from the stack, and the top-of-stack pointer is decremented. If, for instance, there is a branch misprediction or an exception, then multiple entries may be removed and the top-of-stack pointer is appropriately adjusted.
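The snapshot stack and its top-of-stack adjustments can be sketched as follows. The field names mirror the items described above (address, register from, register to, valid indicator), but the Python types, the class names, and the assumption that snapshot IDs increase monotonically are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class SnapshotEntry:
    snapshot_id: int
    address: int        # where the register values were stored in memory
    reg_from: int       # first register valid for the snapshot
    reg_to: int         # last register valid for the snapshot
    valid: bool = True


class SnapshotStack:
    def __init__(self):
        self.entries = []

    @property
    def tos(self):
        """Index of the top-of-stack entry, or -1 when the stack is empty."""
        return len(self.entries) - 1

    def push(self, entry):
        """Creating a snapshot adds an entry and moves the TOS pointer up."""
        self.entries.append(entry)

    def pop(self):
        """Restoring a snapshot removes the TOS entry; None if nothing is left."""
        return self.entries.pop() if self.entries else None

    def rollback_to(self, snapshot_id):
        """Misprediction or exception: drop every entry newer than snapshot_id."""
        while self.entries and self.entries[-1].snapshot_id > snapshot_id:
            self.entries.pop()
```

A single `pop` corresponds to a matched bulk restore, while `rollback_to` models the case in which several entries are removed at once.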
  • the save and restore of the registers are based on performing Store Multiple or Load Multiple instructions (or similar instructions). However, these instructions may be used for many purposes, and therefore, checking and/or heuristics may be used to ensure correct execution is preserved. That is, a determination is made of the store/load pairs, which then may be optimized using the saving and restoring aspects of the present invention. Unmatching store/load pairs are not optimized using the saving/restoring aspects of the present invention.
  • new instructions are defined that behave differently than the load multiple/store multiple instructions.
  • the new instructions referred to herein as Spill Multiple (Spillm) and Reload multiple (Reloadm) are defined such that they do not consider modifications to memory that occur between the spill and reload. That is, in accordance with one architectural definition of those instructions, the user of Spillm/Reloadm is not to modify the in-memory values corresponding to those registers between the spill and the reload. Thus, if an in-memory image is modified, the new instructions are not obligated to consider that value.
  • a Spill Multiple (Spillm) instruction 700 includes at least one operation code field 702 that includes an operation code (opcode) indicating a spill multiple operation; a first register (R1) field 704; a second register (R3) field 706; a base field (B2) 708; and a displacement field (D2) 710 (e.g., a 12-bit unsigned binary integer).
  • the displacement field may include multiple fields (e.g., DL2 and DH2) and may be, e.g., a 20-bit unsigned binary integer (other sizes are also possible).
  • each of the fields is separate from one another, in one example. However, in other examples, one or more of the fields may be combined.
  • the subscript number associated with a field designates the operand to which that field corresponds. For instance, a field having a subscript number 1 corresponds to a first operand; a field having a subscript number 2 corresponds to a second operand; and so on.
  • bit positions of the set of general registers starting with general register R1 and ending with general register R3 are preserved for later restoration.
  • the storage area beginning at the location designated by the second operand address (e.g., provided by the contents of the register designated by B2 plus the contents of D2 or DL2 plus DH2) and continuing through as many locations as needed may be used as a buffer to store some or all of the registers.
  • the corresponding buffer storage address is to be specified for a corresponding recovery address.
  • the contents of bit positions 32-63 of the general registers are stored in successive four-byte fields beginning at the second operand address.
  • the general registers are preserved for later restoration.
  • bit positions 0-63 of the general registers are preserved for later restoration.
  • a buffer corresponding to 4 bytes (or in the other embodiment, 8 bytes) per register may be used and is to be accessible.
  • the content of the buffer is undefined and may change from system generation to system generation.
  • the buffer is defined and contains a value corresponding to the value of the storage area in accordance with the definition of a corresponding Store Multiple instruction.
  • a snapshot of the one or more registers indicated by the instruction is taken to have a mapping of the physical registers to the specified architected registers.
  • a Reload Multiple (Reloadm) instruction 750 includes at least one operation code field 752 that includes an operation code (opcode) indicating a reload multiple operation; a first register (R1) field 754; a second register (R3) field 756; a base field (B2) 758; and a displacement field (D2) 760 (e.g., a 12-bit unsigned binary integer).
  • the displacement field may include multiple fields (e.g., DL2 and DH2) and may be, e.g., a 20-bit unsigned binary integer (other sizes are possible).
  • each of the fields is separate from one another, in one example. However, in other examples, one or more of the fields may be combined. Further, in one example, the subscript number associated with the field designates the operand to which that field corresponds. For instance, a field having a subscript number 1 corresponds to a first operand; a field having a subscript number 2 corresponds to a second operand; and so on.
  • bit positions of the set of general registers starting with general register R1 and ending with general register R3 are restored from the most recent snapshot, removing the most recent snapshot and making its preceding snapshot available as the most recent snapshot for subsequent Reload Multiple instructions.
  • bit positions 32-63 of the general registers are reloaded from a previously stored value, and bits 0-31 remain unchanged.
  • bit positions 0-63 of the general registers are restored from a previously stored value.
  • the general registers are loaded in the ascending order of their register numbers, starting with general register R1 and continuing up to and including general register R3, with general register 0 following general register 15.
  • the registers are loaded from storage beginning at the location designated by the second operand address (e.g., provided by the contents of the register designated by B2 plus the contents of D2 (or DL2 plus DH2)).
  • the result of this operation can be undefined for a variety of reasons, including: a preceding Spill Multiple instruction did not specify the same register range to be prepared for restore. (In another embodiment, the result is undefined when a previous Spill Multiple instruction did not specify a superset of the register range to be prepared for restore); the Reload Multiple Instruction does not specify the same buffer (In one embodiment, this is to be the same address. In another embodiment, this is to be an adjusted address when a subset of registers are restored); or the buffer has been modified by intervening instructions.
  • the snapshot is not verified (as in FIG. 4B), since in accordance with the architectural definition of Reloadm, the user is not to modify the stored data corresponding to those registers.
  • in FIG. 8A, there is no verify snapshot step after the restore snapshot step.
  • a bulk restore request (e.g., a Reloadm instruction) is obtained (e.g., received, determined, provided, have, retrieved, etc.), STEP 800.
  • a determination is made as to whether a corresponding snapshot is available, INQUIRY 802. This determination may be made using the techniques described above, such as remembering the last snapshot id, using a snapshot stack, and/or other techniques.
  • if a corresponding snapshot is unavailable, then the values are reloaded from memory, using for instance, a Load Multiple or similar instruction, STEP 804. However, if a corresponding snapshot is available, then a further determination is made as to whether the bulk restore matches the bulk save (e.g., performed by Spillm), INQUIRY 806. That is, are the registers to be restored the same registers that were saved? If they are, then the snapshot is restored, STEP 808. For instance, the mapping of the physical registers to the architected registers is changed to reflect the last snapshot. Since Reloadm was used in the restoration, the snapshot is not verified, as it is when a Load Multiple is used.
  • a Reloadm instruction is architecturally guaranteed to match a previous Spillm instruction
  • the match verification may also be suppressed, as shown in FIG. 8B. More specifically, it is the programmer's responsibility to match corresponding pairs of Spillm and Reloadm, at the penalty of an undefined result when such a match is not guaranteed by the programmer.
  • a bulk restore request (e.g., Reloadm) is obtained (e.g., received, determined, provided, have, retrieved, etc.), STEP 820.
  • a determination is made as to whether a corresponding snapshot is available, INQUIRY 822. If a corresponding snapshot is unavailable, then the values are reloaded from memory (e.g., using Load Multiple), STEP 824. Otherwise, the snapshot is restored, STEP 826 (an inquiry corresponding to INQUIRY 806 is not performed).
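The two Reloadm flows (with the match check of FIG. 8A and without it, as in FIG. 8B) differ in one decision, which can be sketched as below. The function name `restore_path` and its return strings are hypothetical. The description above does not state what happens when the match check fails; falling back to memory in that case is an assumption of this sketch.

```python
def restore_path(requested_regs, snapshot_regs, verify_match):
    """Return 'snapshot' or 'memory' for a Reloadm-style bulk restore.

    snapshot_regs is None when no corresponding snapshot is available.
    verify_match=True follows FIG. 8A; verify_match=False follows FIG. 8B.
    """
    if snapshot_regs is None:
        return "memory"      # snapshot unavailable: reload from memory
    if verify_match and tuple(requested_regs) != tuple(snapshot_regs):
        return "memory"      # assumed fallback when the match check fails
    return "snapshot"        # restore the rename mapping from the snapshot
```

With `verify_match=False`, a mismatched pair still takes the snapshot path, which reflects the architectural rule that pairing Spillm and Reloadm correctly is the programmer's responsibility.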
  • support for bulk saves/restores in accordance with conventional store multiple and load multiple bulk requests may be combined with the new Spillm/Reloadm facility in at least one embodiment.
  • code in accordance with conventional ISA (instruction set architecture) definitions may be accelerated, but uses additional checking to ensure compliance with the conventional instruction definition, whereas code using the new Spillm/Reloadm instructions achieves even higher performance due to reduced checking, in accordance with aspects of the present invention.
  • snapshots may be shared between adjacent register restoration points. These snapshots are taken of at least a portion of a register rename map, of other register restoration information, or of the full register state, as examples.
  • a function call is often associated with two sets of matched register spill and register reload pairs (e.g., STMG and LMG, or Spillm and Reloadm): one associated with the saving of caller-saved registers at the call site, and another one associated with the saving of callee-saved registers in the called function.
  • the spills i.e., the saving of multiple architected registers
  • the restores or reloads of the registers are usually dynamically close, as well.
  • the registers are mutually exclusive, and the registers to be saved by the second spill operation are commonly not modified by code between the first and second spill.
  • Example code, in pseudocode notation, is below:
  • a register restoration snapshot may be shared between dynamically adjacent instances of spill operations (e.g., Spillm instructions).
  • a processor may create a restoration snapshot that includes a single snapshot of the register state used for restoring both r10-r15 of the caller function and r16-r20 of the callee function. It should be noted that the ranges of saved registers do not have to be adjacent.
  • a restoration snapshot includes two separate records: an address to which a spill has occurred and a register snapshot to be used in restoration. The register snapshot is shared, but separate address values are maintained for each Spillm/Reloadm pair: <address, snapshot-ID>.
  • the processor maintains, in accordance with an aspect of the present invention, a reference to the last restoration snapshot taken (which includes registers that may be referenced by a plurality of spill operations), in conjunction with a bitmap of register values that have not been written to since the last restoration snapshot.
  • the present spill operation can reuse the previous restoration snapshot. Otherwise, a new restoration snapshot may be created.
  • rr create_restoration_record (spill_to_address, last_snapshot);
  • snapshot 4 refers to a restoration snapshot; i.e., a snapshot taken based on execution of one or more bulk saves (e.g., Spillm and/or Store Multiple). Initially, one embodiment of sharing a restoration snapshot is described with reference to FIG. 10A.
  • snapshot_regs is set to those registers that are to be included in a restoration snapshot, STEP 1000. For instance, a determination is made of the registers to be included, such as those to be included in one or more Spillm or Store Multiple instructions, and an indication of those registers is provided in snapshot_regs.
  • prev_snapshot_ID is set equal to this_snapshot_ID, such that this snapshot is remembered for further use, STEP 1014.
  • the registers that are modified by an instruction are tracked.
  • a determination is made as to whether an instruction changes the contents of one or more registers, INQUIRY 1020. If one or more registers are modified by the instruction, then those registers are tracked, STEP 1022.
  • a new snapshot is forced.
  • an event e.g., flush, branch misprediction, exception, etc.
  • prev_snapshot_id is set to NONE indicating there is no previous snapshot that may be shared, STEP 1030, and unmodified_regs is set to no_regs, STEP 1032. That is, there is no indication of a usable previous snapshot, and there are no registers considered unmodified.
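The sharing of a restoration snapshot between adjacent spills, the tracking of modified registers, and the forcing of a new snapshot can be combined in one sketch. The variable names follow the description (`this_snapshot_ID`, `prev_snapshot_ID`, `unmodified_regs`); the class name `SnapshotSharing`, the 32-register file, and the use of Python sets in place of a hardware bitmap are assumptions.

```python
ALL_REGS = frozenset(range(32))  # assumed register file size


class SnapshotSharing:
    def __init__(self):
        self.this_snapshot_id = 0
        self.prev_snapshot_id = None       # None plays the role of NONE
        self.unmodified_regs = frozenset()

    def request(self, snapshot_regs, spill_address):
        """Return an <address, snapshot-ID> record for a spill request."""
        regs = frozenset(snapshot_regs)
        reusable = (self.prev_snapshot_id is not None
                    and regs <= self.unmodified_regs)
        if not reusable:
            self.this_snapshot_id += 1          # take a new snapshot
            self.unmodified_regs = ALL_REGS     # nothing written since it was taken
            self.prev_snapshot_id = self.this_snapshot_id
        return (spill_address, self.prev_snapshot_id)

    def note_write(self, written_regs):
        """Track registers modified by an instruction."""
        self.unmodified_regs -= frozenset(written_regs)

    def force_new_snapshot(self):
        """After a flush, misprediction, or exception, nothing may be shared."""
        self.prev_snapshot_id = None
        self.unmodified_regs = frozenset()
```

For example, a caller spill of r10-r15 followed by a callee spill of r16-r20 yields two records with different addresses but the same snapshot ID, provided no register in r16-r20 was written in between.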
  • rollback to a previous snapshot may be performed, as described with reference to FIG. 10D.
  • prev_snapshot_ID is set equal to rollback_snapshot_ID, which is the snapshot to which processing is rolled back, STEP 1042. Additionally, unmodified_regs is set to all of the registers, since at this point, no registers have been modified, STEP 1044.
  • prev_snapshot_ID is set to NONE, STEP 1046, and unmodified registers is set to no registers, STEP 1048.
  • A further example of processing associated with rolling back to a snapshot due to an event, such as a flush, branch misprediction, or exception, is described with reference to FIG. 10E.
  • a determination is made as to whether processing is to roll back to or beyond the last snapshot, INQUIRY 1050. If not, this processing is complete and the last snapshot may be used, STEP 1052.
  • the set of unmodified registers may contain a subset of the truly unmodified registers, as processing of rolled back instructions in accordance with the technique of FIG. 10B may have removed registers from the unmodified register set.
  • the set of snapshots that may be taken is smaller than that in accordance with FIG. 10C or FIG. 10D, in which each snapshot request triggers a snapshot being taken in accordance with the technique of FIG. 10C, and in accordance with choosing STEPS 1046 and 1048 of FIG. 10D.
  • prev_snapshot_ID is set equal to rollback_snapshot_ID, which is the snapshot to which processing is rolled back, STEP 1056.
  • unmodified_regs is set to all of the registers, since at this point, no registers have been modified, STEP 1058.
  • prev_snapshot_ID is set to NONE, STEP 1060, and unmodified registers is set to no registers, STEP 1062.
  • snapshots for restoration are those snapshots taken based on a bulk save (e.g., Store Multiple, Spillm, etc.); and snapshots for recovery are those snapshots taken based on a change to execution flow, such as a branch or a situation where an address may be mispredicted, as examples.
  • recovery snapshots are taken for the branch conditional (bcond) instruction and the jump (jsr) instruction
  • restoration snapshots are taken for the Spillm instruction.
  • the recovery snapshots e.g., for the branch conditional and the jump
  • One example of sharing a recovery snapshot is described with reference to FIG. 10F.
  • the processing of FIG. 10F is performed when a recovery snapshot is to be made, e.g., to create a rollback point for branch misprediction recovery, implementing precise exceptions, handling pipeline flushes, or other such events.
  • snapshot_regs (the set of registers to be snapshot for a recovery snapshot) is set to all_regs() (e.g., all of the registers to be saved for recovery in the event of a precise exception, branch misprediction, or other such event, such as the registers associated with the conditional branch bcond and subroutine call jsr, in this example), STEP 1070. A check is made as to whether a previous snapshot is usable, STEP 1072. This includes determining whether the registers to be used for the present snapshot (which correspond to all registers, in accordance with STEP 1070, in one example) include only registers that are unmodified since the previous snapshot was taken.
  • prev_snapshot_ID STEP 1076
  • the previous snapshot is usable for providing a snapshot for the present snapshot request. Otherwise, if a previous snapshot is not usable, then a snapshot is made, and this_snapshot_ID is updated (e.g., incremented by one), STEP 1078. Further, unmodified_regs is set equal to all registers, STEP 1080, and prev_snapshot_ID is set equal to this_snapshot_ID, STEP 1082, such that the present snapshot may be reused for future snapshot requests, e.g., at least with one or more of a recovery and a restoration snapshot request in accordance with embodiments of aspects of the present invention.
  • the processing of FIG. 10F, implemented for sharing recovery snapshots, may operate in conjunction with the processing of one or more of FIGS. 10A-10E, implemented for sharing restoration snapshots.
  • snapshots may be shared for recovery, for restoration or for a combination of recovery and restoration.
  • the technique of FIG. 10F, as one example, can be used for sharing snapshots, regardless of the type.
  • although FIGS. 10A-10F have been described with respect to a single snapshot for a register file, in accordance with embodiments of aspects of the present invention, the register snapshot techniques described herein may be performed for a variety of register types, including but not limited to general registers, integer registers, address registers, data registers, fixed point registers, floating point registers, vector registers, vector-scalar registers, condition registers, predicate registers, control registers, special purpose registers, etc.
  • multiple register types may be supported in a single implementation, so as to provide, for example, snapshots for general purpose and vector registers (or other combinations of register types).
  • snapshots for floating point and vector registers may be shared, e.g., with an implementation in accordance with the z/Architecture providing for shared vector and floating point registers.
  • snapshots for floating point, vector registers and vector-scalar registers may be shared, e.g., with an implementation in accordance with the Power ISA providing for shared vector-scalar, vector, and floating point registers.
  • Other examples are also possible.
  • changes to memory are tracked in order to determine, if desired, whether restored registers are correct (i.e., does the snapshot used to recover the registers have the most current information). For instance, in one embodiment in which Store Multiple and Load Multiple instructions (or other such instructions) are used, changes to memory that occur between the load and store are captured to be able to determine correctness, if desired, of the restored registers in accordance with the values previously stored by a store multiple request and to be reloaded by a load multiple instruction, when the registers are restored from a register restoration snapshot rather than from memory.
  • a register snapshot is restored based on receiving a register restoration request. Further, based on the request and restoring registers from a snapshot, each of the registers restored from a register snapshot is checked by loading the corresponding register value from the memory buffer corresponding to the register save/restore pair, and comparing the value restored from the register snapshot to the value loaded from the memory buffer. If a mismatch is detected, the restored registers are recovered from the in-memory buffer. Otherwise, the restored registers from the snapshot are used.
  • the checking of the registers can be performed in parallel to performing computation using the restored values, thereby enabling the application program to proceed with computation even if the checking has not completed.
  • a restore operation is obtained (e.g., received, provided, determined, retrieved, have, etc.), STEP 1100.
  • a Load Multiple instruction is received.
  • a determination is made as to whether the restore operation matches the previous save operation (e.g., do the registers of the Load Multiple match the registers of the Store Multiple paired with the Load Multiple; are the addresses the same; is a subset of the registers or addresses the same; etc.), INQUIRY 1102. If the registers to be restored correspond to those that were saved, then the last snapshot is obtained and used to restore the registers, STEP 1104.
  • Subsequent checking is performed to determine correctness of the restored values. Serialization for the subsequent checking is provided to ensure that checking commences after a snapshot has been restored, STEP 1106, and an indicator indicating whether a mismatch in which recovery of values from memory is to be performed, referred to as mismatch, is initialized to FALSE, STEP 1108.
  • a stored value corresponding to a selected register is loaded from memory using, e.g., a micro-operation, STEP 1110. That loaded value is compared to the restored value of the selected register being checked, STEP 1112. If the compare fails, i.e., the loaded value does not match the restored value, mismatch is set to TRUE.
  • the processor checks the mismatch indicator and if it indicates a failed compare, INQUIRY 1114, then the pipeline is flushed and a restart is performed after the restore operation, STEP 1116. Additionally, the values for the registers being restored are reloaded from memory, STEP 1118.
  • the flush is performed in order to cause all instructions that may have executed speculatively using the values from the restored snapshot to be re-executed with the values obtained from memory when a mismatch was detected.
  • the flush can be more selective, causing, e.g., only a flush and re-execution of instructions depending on the restored registers.
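The restore, check and recover sequence of FIG. 11A can be sketched in software as below. This is a sequential model under stated assumptions: the function name `restore_and_check` and the dictionary arguments are hypothetical, and the pipeline flush and restart are represented only by a comment, whereas the description performs the check in parallel with subsequent computation.

```python
def restore_and_check(snapshot_values, memory_values):
    """Restore from the snapshot, then verify each register against memory.

    Both arguments map register number -> value. Returns (registers, mismatch).
    """
    restored = dict(snapshot_values)        # restore registers from the snapshot
    mismatch = False                        # indicator initialized to FALSE
    for reg, value in restored.items():
        if memory_values[reg] != value:     # load the stored value and compare
            mismatch = True
    if mismatch:
        # A flush and restart after the restore operation would occur here;
        # the registers being restored are then reloaded from memory.
        restored = {reg: memory_values[reg] for reg in snapshot_values}
    return restored, mismatch
```

When the in-memory image is unchanged, the snapshot values are kept and no memory reload is needed; a single differing value causes all of the restored registers to be taken from memory.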
  • one or more steps of FIG. 11A are implemented by expanding a restore operation into one or more internal operations (iops) corresponding to one or more of the steps of FIG. 11A that may be executed out-of-order with respect to other instructions and/or internal operations corresponding to other instructions.
  • iops generated corresponding to the present instruction may be executed out-of-order relative to each other.
  • out-of-order execution logic provides suitable interlocks so as to ensure that subsequent operations are only executed when a snapshot has been restored, and further any speculatively executed instructions that have been executed based on a restored value are invalidated, flushed and re-executed when a mismatch is detected.
  • the steps of FIG. 11A are implemented as steps of dedicated circuitry for register restoration and validation.
  • the logic corresponding to FIG. 11A is performed in parallel to executing subsequent instructions using restored values while the circuitry implementing the technique herein continues to verify the restored registers.
  • Another embodiment of the restore, check and recover technique is described with reference to FIG. 11B.
  • a restore operation, such as a Load Multiple, is obtained, STEP 1130. A determination is made as to whether the restore operation matches the previous save operation (e.g., do the registers of the Load Multiple match the registers of the Store Multiple paired to the Load Multiple; are the addresses the same; is a subset of the registers or addresses the same; etc.), INQUIRY 1132. If the registers to be restored correspond to those that were saved, then the last snapshot is obtained and used to restore the registers, STEP 1136.
  • subsequent checking is performed to determine correctness of the restored values.
  • Serialization for the subsequent checking is provided to ensure that checking commences after a snapshot has been restored, STEP 1138, and first_mismatch is set to NONE, STEP 1140.
  • a stored value corresponding to a selected register is loaded from memory using, e.g., a micro-operation, STEP 1142. That loaded value is compared to the restored value of the selected register being checked, STEP 1144. If the compare fails, i.e., the loaded value does not match the restored value, first_mismatch is set to the register that failed the compare. If first_mismatch is no longer equal to NONE, the processor determines there is a mismatch, INQUIRY 1146, and the pipeline is flushed and a restart is performed after the restore operation, STEP 1150. Additionally, the values in the register of the failed compare and subsequent registers are reloaded from memory, STEP 1152.
  • the flush is performed in order to cause all instructions that may have executed speculatively using the values from the restored snapshot to be re-executed with the values obtained from memory when a mismatch was detected.
  • the flush can be more selective, causing, e.g., only a flush and re-execution of instructions depending on the restored registers, or depending on the restored registers that are recovered from memory, starting with the first_mismatch register.
  • one or more steps of FIG. 11B are implemented by expanding a restore operation into one or more internal operations (iops) corresponding to one or more of the steps of FIG. 11B that may be executed out-of-order with respect to other instructions and/or internal operations corresponding to other instructions.
  • iops generated corresponding to the present instruction may be executed out-of-order relative to each other.
  • out-of-order execution logic provides suitable interlocks so as to ensure that subsequent operations are only executed when a snapshot has been restored, and further any speculatively executed instructions that have been executed based on a restored value are invalidated, flushed and re-executed when a mismatch is detected.
  • the steps of FIG. 11B are implemented as steps of dedicated circuitry for register restoration and validation.
  • the logic corresponding to FIG. 11B is performed in parallel to executing subsequent instructions using restored values while the circuitry implementing the technique herein continues to verify the restored registers.
  • suitable interlocks so as to ensure that subsequent operations are only executed when a snapshot has been restored, and further any speculatively executed instructions that have been executed based on a restored value are invalidated, flushed and re-executed when a mismatch is detected.
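The restore, check and recover flow of FIG. 11B can be sketched in software terms as follows. This is a minimal illustrative model under stated assumptions, not the hardware implementation: the function name `restore_check_recover`, the dictionaries standing in for the snapshot and the in-memory save area, and the boolean that stands in for the pipeline flush are all hypothetical names introduced here.

```python
NONE = None  # sentinel meaning "no mismatch found yet"


def restore_check_recover(regs, snapshot, memory, order):
    """Model of restore (STEP 1136), check (STEPS 1142-1146), recover (STEPS 1150-1152).

    regs     -- dict of architected register values, updated in place
    snapshot -- dict of register values captured at the save operation
    memory   -- dict of register values currently held in the save area
    order    -- register names in the order the restore processes them
    Returns (first_mismatch, flushed).
    """
    # STEP 1136: restore every register from the snapshot.
    for r in order:
        regs[r] = snapshot[r]

    # STEP 1140: no mismatch has been seen so far.
    first_mismatch = NONE

    # STEPS 1142-1144: load each saved value and compare it with the restored one.
    for r in order:
        if memory[r] != regs[r]:
            first_mismatch = r
            break

    # INQUIRY 1146: nothing to recover when every compare succeeded.
    if first_mismatch is NONE:
        return NONE, False

    # STEP 1150 (flush and restart) is modeled only by the returned flag.
    # STEP 1152: reload the failing register and all subsequent ones from memory.
    start = order.index(first_mismatch)
    for r in order[start:]:
        regs[r] = memory[r]
    return first_mismatch, True
```

In this sketch, registers that precede the first mismatch keep their snapshot values, which mirrors the statement that only the register of the failed compare and subsequent registers are reloaded.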
  • Another embodiment of restoring, checking and recovering is described with reference to FIG. 11C.
  • individual registers are tracked and may be restored using the snapshot, while others may be restored from memory.
  • a restore operation (e.g., a Load Multiple) is obtained (e.g., received, provided, determined, retrieved, have, etc.), STEP 1160.
  • a determination is made as to whether the restore operation matches the previous save operation (e.g., do the registers of the Load Multiple match the registers of the Store Multiple paired to the Load Multiple; are the addresses the same; is a subset of the registers or addresses the same; etc.), INQUIRY 1162. If the registers to be restored correspond to those that were saved, then the last snapshot is obtained and used to restore the registers, STEP 1166. Thereafter, subsequent checking is performed to determine correctness of the restored values. Serialization for the subsequent checking is provided to ensure that checking commences after a snapshot has been restored, STEP 1168, and a mismatch set is set to an empty set, STEP 1170.
  • a stored value corresponding to a selected register is loaded from memory using, e.g., a micro-operation, STEP 1172. That loaded value is compared to a restored value of the selected register being checked, STEP 1174. If the compare fails, i.e., the loaded value does not match the restored value, INQUIRY 1176, then the miscompared register is added to the mismatch set, STEP 1178.
  • this is achieved by causing a full or partial flush in order to cause all instructions that may have executed speculatively using the values from the restored snapshot to be re-executed with the values obtained from memory when a mismatch was detected.
  • the flush is selective, causing, e.g., only a flush and re-execution of instructions depending on the restored registers of the instruction, or depending on the restored registers that are recovered from memory, as represented by the registers in the mismatch set.
  • one or more steps of FIG. 11C are implemented by expanding a restore operation into one or more internal operations (iops) corresponding to one or more of the steps of FIG. 11C that may be executed out-of-order with respect to other instructions and/or internal operations corresponding to other instructions.
  • iops generated corresponding to the present instruction may be executed out-of-order relative to each other.
  • out-of-order execution logic provides suitable interlocks so as to ensure that subsequent operations are only executed when a snapshot has been restored, and further any speculatively executed instructions that have been executed based on a restored value are invalidated, flushed and re-executed when a mismatch is detected.
  • the steps of FIG. 11C are implemented as steps of dedicated circuitry for register restoration and validation.
  • the logic corresponding to FIG. 11C is performed in parallel to executing subsequent instructions using restored values while the circuitry implementing the technique herein continues to verify the restored registers.
  • suitable interlocks so as to ensure that subsequent operations are only executed when a snapshot has been restored, and further any speculatively executed instructions that have been executed based on a restored value are invalidated, flushed and re-executed when a mismatch is detected.
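The per-register variant of FIG. 11C differs from FIG. 11B in that it collects a mismatch set rather than a single first mismatch. The sketch below illustrates that difference only; the function name `restore_with_mismatch_set` and the dictionary-based register model are assumptions made for illustration, and the selective flush is not modeled.

```python
def restore_with_mismatch_set(regs, snapshot, memory, order):
    """Model of FIG. 11C: restore from the snapshot, then repair only mismatches.

    regs, snapshot and memory are dicts keyed by register name; order lists
    the registers named by the restore operation. Returns the mismatch set.
    """
    # STEP 1166: restore all registers from the last snapshot.
    for r in order:
        regs[r] = snapshot[r]

    # STEP 1170: the mismatch set starts out empty.
    mismatch_set = set()

    # STEPS 1172-1178: compare every register; record each one that differs.
    for r in order:
        if memory[r] != regs[r]:
            mismatch_set.add(r)

    # Only the registers in the mismatch set are recovered from memory;
    # all other registers keep the values restored from the snapshot.
    for r in mismatch_set:
        regs[r] = memory[r]
    return mismatch_set
```

A selective flush, as described above, would then need to consider only instructions that depend on the registers in the returned set.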
  • the checking for memory changes is not performed, since the saving and restoring are performed using Spillm and Reloadm instructions (or similar instructions), which are architecturally defined to not allow, between the Spillm and the Reloadm, memory changes to the register values stored in memory.
  • the instruction definition indicates that the restored register values are undefined if the memory is modified.
  • the user is not to modify the corresponding stored area. If the user does modify the area, this is considered a programming error and correctness is not guaranteed.
  • Spillm saves the register values to memory so that they can be used as, for instance, a fallback in case of snapshot invalidation. Snapshot invalidation may occur, for example, if the processor runs out of physical registers, the processor runs out of storage for snapshots, there is a context switch, etc.
  • the verify snapshot of FIG. 4B is not needed. Therefore, as shown in FIG. 8A, there is no snapshot verification performed.
  • the bulk restore may be performed without using the matching inquiry (e.g., INQUIRY 806), since it may be architecturally defined that the Reloadm is to match the Spillm.
  • a capability is provided to ensure that a particular Reloadm matches a particular Spillm.
  • the capability includes invalidating, if need be, a snapshot to be used for recovery and/or ensuring that the caller's Reloadm is not satisfied using the callee's Spillm snapshot.
  • a number of techniques may be used, in accordance with aspects of the present invention, including performing a Reloadm into, for example, one single register simply to remove a snapshot; unstacking a Spillm snapshot using, for instance, an invalidate snapshot, Invsnap, instruction; or otherwise, removing a snapshot from the snapshot stack or ignoring a snapshot.
  • the number of Reloadm instructions that are skipped is determined by scanning the code of the function, and then that number of snapshots is invalidated.
  • code analysis for invalidated snapshots and snapshot invalidation may be performed in conjunction with conventional unwind processing, e.g., to restore registers in the presence of structured exception handling. This is shown in the below code provided in pseudocode notation: Unwind_and_invalidate ()
  • the processor scans the code of the function looking for Spillm/Reloadm pairs, STEP 1200.
  • the number of skipped Reloadm instructions is counted, STEP 1202, and a corresponding number of snapshots is invalidated, STEP 1204.
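The counting in STEPS 1200-1204 can be illustrated with a small sketch. It is not the pseudocode referred to above, which is not reproduced in this text. It assumes, purely for illustration, that the function body is available as a straight-line list of mnemonics and that the function is abandoned at a known instruction index; the names `count_skipped_reloadm`, `invalidate_skipped_snapshots` and `exit_index` are hypothetical.

```python
def count_skipped_reloadm(instructions, exit_index):
    """Count Spillm instructions executed before exit_index whose Reloadm was skipped.

    instructions -- list of lowercase mnemonics for one function (assumed straight-line)
    exit_index   -- index of the first instruction that is NOT executed
    """
    pending = 0
    for op in instructions[:exit_index]:          # STEP 1200: scan the code
        if op == "spillm":
            pending += 1
        elif op == "reloadm" and pending > 0:
            pending -= 1
    return pending                                # STEP 1202: skipped Reloadm count


def invalidate_skipped_snapshots(snapshot_stack, instructions, exit_index):
    """Remove one snapshot per skipped Reloadm (STEP 1204); returns the count."""
    skipped = count_skipped_reloadm(instructions, exit_index)
    for _ in range(min(skipped, len(snapshot_stack))):
        snapshot_stack.pop()                      # most recent snapshot first
    return skipped
```

For example, if a function with two Spillm instructions is left before either matching Reloadm runs, two snapshots are popped so that a caller's Reloadm is not satisfied from a callee's snapshot.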
  • changes are tracked as they occur, instead of performing the recovery and checking as described with reference to FIGS. 11 A-11C.
  • each time a processor updates memory, a check is made to determine if the update impacts the validity of a snapshot. If it does, then the requested values are obtained from memory, instead of the snapshot registers.
  • cache lines subject to a store multiple are marked as being in a write-set.
  • the interference is used to invalidate in-flight forwarding opportunities.
  • the cache lines of the write-set may be associated with a corresponding identifier to indicate which store/load pair is to be excluded from register restoration.
  • the write-set indication is not cleared until all intervening store memory operations (or other synchronizing operations) have completed.
  • the write-set for a buffer is not cleared until the corresponding load has completed. In at least one weak memory ordering embodiment, the write-set reservation is cleared immediately when the load has completed.
  • two register restoration sequences may be in-flight for the same memory location. For instance, a function is called, a store multiple is performed for callee-saved registers, the callee-saved registers are reloaded using load multiple, the function returns, the function is immediately called again, and another store multiple and load multiple occur to the same address.
  • write-set cache lines can be associated with multiple pairs. In another embodiment, when multiple cache lines are associated, a single bit is used to force the clearance of all store/load register restoration pairs. Various examples exist.
  • interference with stored register in-memory buffers from local accesses is to be considered. This may be accomplished by obtaining a base address and a range (either as a length or as an end address) for a memory buffer associated with a snapshot, and comparing the address of subsequent store operations against the range of the memory buffer to detect an interference.
  • this interference test is performed for instructions such as store-multiple bulk save, whereas interference for individual stores is tracked using an indicator associated with a cache line, or cache subline.
  • a single snapshot and associated in-memory buffer range for the most recent store/load pair is maintained. In other embodiments, additional snapshots and associated in-memory buffer ranges are supported to allow the maintenance of more pairs.
  • address checking can incur significant area, power and/or delay costs either by implementing concurrent checking logic or by forcing serial checking of interference at the risk of incurring queuing delays as requests are processed. To reduce these costs, a variety of approaches may be used in conjunction with embodiments of aspects of the present invention.
  • only remote accesses that hit in the first level data cache are compared against tracked memory ranges, when the L1 cache is inclusive.
  • additional filtering is provided by tracking the presence of buffers in specific cache lines, e.g., using marker bits.
  • marker bits may be used to indicate active buffers and buffers may be deactivated responsive to writes to cache lines and subcache lines responsive to a plurality of access types without comparing to tracked address ranges corresponding to snapshots, thereby invalidating snapshots without incurring the overhead of comparing tracked ranges corresponding to all snapshots.
  • snapshots corresponding to a cache line may be identified by the cache directory or a separate table, further reducing the number of comparisons that are to be performed.
  • memory addresses may be filtered using a memory address filter to reduce the number of memory accesses that are to be compared.
  • a variety of address filters may be used for address filtering in conjunction with one or more aspects of the present invention.
  • By using address filtering, more address ranges may be tracked without a commensurate cost in area, power and delay for memory checking.
  • more snapshots (e.g., corresponding to multiple ranges for deeper levels of a call hierarchy) may be tracked using a variety of filters and digests to give a conservative answer. In one embodiment, this is achieved by tracking the address range from the first buffer to the last buffer.
  • This range may, for example, correspond to a number of stack frames holding memory buffers for register save and restore in function calls, while filtering out memory requests corresponding to heap, static variables, file buffers, and so forth.
  • the filter may capture additional information to differentiate buffer accesses from local variable accesses, and filter local variable accesses to further reduce the number of accesses that are compared against tracked memory buffer ranges corresponding to in-memory register spill buffers.
  • one or more filters may be periodically reset, in conjunction with invalidating pending register restoration snapshots, or when no register restoration snapshots are active.
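One way to picture the conservative range filter described above is a single envelope that spans from the first tracked buffer to the last. The sketch below is an illustrative assumption, not a description of any particular snoop filter: the class name `EnvelopeFilter` and its methods are hypothetical. A negative answer means the access cannot touch any tracked buffer; a positive answer only means that the exact per-buffer comparison still has to be performed.

```python
class EnvelopeFilter:
    """Conservative address filter covering all tracked register save buffers."""

    def __init__(self):
        self.low = None    # lowest tracked address (inclusive)
        self.high = None   # highest tracked address (exclusive)

    def add_buffer(self, base, length):
        """Widen the envelope so that it covers a newly tracked buffer."""
        end = base + length
        self.low = base if self.low is None else min(self.low, base)
        self.high = end if self.high is None else max(self.high, end)

    def may_interfere(self, addr, size=1):
        """Return False only when the access certainly misses every buffer."""
        if self.low is None:
            return False
        return addr < self.high and self.low < addr + size

    def reset(self):
        """Reset the filter, e.g., when no restoration snapshots are active."""
        self.low = None
        self.high = None
```

Accesses to heap or static data that fall outside the envelope are filtered out cheaply, while an address between two stack buffers is reported as a possible interference, which is the conservative direction.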
  • a range filter is employed in accordance with one or more known snoop filter architectures.
  • the interference determination of transactional write sets to detect modifications of buffer memory for store/load instructions is used as a filter, and offending remote accesses are then compared against the exact buffer boundaries used in the core to check against interference from stores of the thread itself.
  • a data cache 1300 includes a plurality of cache lines 1302, and each cache line 1302 (or in another embodiment, selected portions of cache lines) is marked. For instance, each cache line or a portion of a cache line in those embodiments that mark cache line portions has an address tag 1304 indicating the memory address to which the cache line or cache line portion corresponds; a validity (V) indicator 1306 indicating whether the cache line or portion is valid; a dirty (D) indicator 1308 indicating whether the data from the cache line or portion is to be written back to memory; and a marking (M) indicator 1310, in accordance with an aspect of the present invention, used to indicate whether the snapshot is valid for the cache line or cache line portion. Further, the cache line or cache line portion includes the data 1312.
  • a request is obtained (e.g., received, provided, retrieved, have, determined, etc.) to fetch data from memory into a data cache, STEP 1320.
  • data is obtained from memory and stored into a cache line, STEP 1322.
  • An address tag is computed for the data obtained from memory and stored in the corresponding address tag field 1304, STEP 1324.
  • valid indicator 1306 is set to one, since the cache line is now valid; dirty indicator 1308 is set to zero, since the data was just loaded, and thus, not dirty; and marking indicator 1310 is set to zero, since registers have not been stored that have a corresponding snapshot, STEP 1326.
  • the indicators are also updated during a store into the cache, as described with reference to FIG. 13C.
  • data and an address are obtained (e.g., received, provided, retrieved, have, determined, etc.), STEP 1330.
  • a determination is made as to whether the store corresponds to an existing cache line within the cache, INQUIRY 1332. If not, then the cache reload procedure of FIG. 13B is performed, STEP 1334. However, if the store does correspond to a cache line, then the data is stored, STEP 1336, and the corresponding dirty indicator is set to one, STEP 1338.
  • If this is a bulk save (e.g., a STMG instruction to spill a plurality of caller-saved or callee-saved registers to a stack frame), INQUIRY 1340, then the marking indicator is set to one, STEP 1342. Otherwise, it is zero.
  • one or more of the indicators may be updated based on receiving an update request from another processor, as described with reference to FIG. 13D.
  • data and a memory address are obtained (e.g., received, provided, retrieved, have, determined, etc.), STEP 1350.
  • a determination is made as to whether the store corresponds to an existing cache line within the cache, INQUIRY 1352. If not, then the cache reload procedure of FIG. 13B is performed, STEP 1354. However, if the store does correspond to a cache line, then the data is stored, STEP 1356, and the marking indicator is set to zero, STEP 1358.
  • the marking indicator is set to zero, since a write access from a remote processor may have modified the in-memory register buffer, thereby making the register snapshot stale with respect to the in-memory buffer.
  • the request range may be compared against the tracked addresses for snapshots to determine whether a specific access interferes with a snapshot. In at least one embodiment, this reduces the number of snapshot invalidations when an update corresponds to a portion of a memory buffer's cache line that does not correspond to the memory buffer.
  • the cache directory or logic associated therewith may be used to identify specific snapshots of the plurality of active snapshots which correspond to a cache line to reduce the number of interference checks to be performed.
  • using marker bits as a filter to reduce the number of interference checks is used to filter updates from the local processor. Other possibilities also exist.
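The handling of the V, D and M indicators in FIGS. 13A-13D can be summarized with a small software model. It is a sketch under stated assumptions: the 256-byte line size, the class names `CacheLine` and `DataCache`, and the choice to perform the store after a miss has triggered the reload procedure are all illustrative and are not taken from the text.

```python
LINE_SIZE = 256  # assumed line size in bytes; the text does not specify one


class CacheLine:
    def __init__(self, tag, data):
        self.tag = tag        # address tag 1304
        self.valid = True     # V indicator 1306: set to one on reload
        self.dirty = False    # D indicator 1308: zero, data was just loaded
        self.marked = False   # M indicator 1310: zero, no snapshot yet
        self.data = data      # data 1312


class DataCache:
    def __init__(self, memory):
        self.memory = memory  # dict: line number -> bytes (backing store)
        self.lines = {}       # dict: line number -> CacheLine

    def reload(self, addr):
        """FIG. 13B: fetch a line and initialize its indicators (STEPS 1320-1326)."""
        tag = addr // LINE_SIZE
        data = bytearray(self.memory.get(tag, bytes(LINE_SIZE)))
        self.lines[tag] = CacheLine(tag, data)
        return self.lines[tag]

    def _line_for(self, addr):
        # On a miss the reload procedure is performed (STEP 1334 / STEP 1354).
        line = self.lines.get(addr // LINE_SIZE)
        return line if line is not None else self.reload(addr)

    def local_store(self, addr, byte, bulk_save=False):
        """FIG. 13C: store, set D, and set M only for a bulk save."""
        line = self._line_for(addr)
        line.data[addr % LINE_SIZE] = byte   # STEP 1336
        line.dirty = True                    # STEP 1338
        line.marked = bulk_save              # STEPS 1340-1342, otherwise zero

    def remote_store(self, addr, byte):
        """FIG. 13D: a remote update clears M, since the snapshot may be stale."""
        line = self._line_for(addr)
        line.data[addr % LINE_SIZE] = byte   # STEP 1356
        line.marked = False                  # STEP 1358
```

A restore operation would then consult `marked` for each line covering the save buffer to decide whether the snapshot can still be trusted.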
  • a Load Multiple (LM) instruction is obtained, STEP 1400.
  • A determination is made as to whether the Load Multiple instruction corresponds to a restoration request, INQUIRY 1402. This may be determined by checking, for instance, the additional fields of the snapshot stack (e.g., the address field) to determine if registers that were previously stored are being restored. If the Load Multiple instruction does not correspond to a restoration request, then the load multiple operation is performed, STEP 1404. If the Load Multiple instruction does correspond to a restoration request, then one or more register mappings are recovered, STEP 1406. For instance, one or more snapshots are used to recover the specified registers.
  • the marking indicator (M) is obtained from the cache line (or cache line portion) corresponding to the registers indicated by the load multiple, STEP 1408, and a determination is made as to whether the cache line is marked as unmodified, INQUIRY 1410. If it is marked as unmodified, then a further determination is made as to whether more cache lines or cache line portions are to be processed, INQUIRY 1412. If so, processing continues to STEP 1408. Otherwise, the register restoration processing is complete.
  • Another example of register restoration is described with reference to FIG. 14B.
  • a check is made as to the validity of the snapshot to be used for restoration. For instance, a Store Multiple may have overwritten another Store Multiple. When this occurs, the first Store Multiple is no longer a valid restoration candidate for a Store Multiple/Load Multiple used for register restoration.
  • the snapshot stack may be traversed to determine if the current snapshot overlaps a previous snapshot in response to a store multiple request. In other embodiments, this check is performed for other memory update operations, or for other memory update operations when one or more filter criteria indicate that a check is to be performed. If a match is found, the entry in the stack of the previous snapshot is invalidated.
  • the valid indicator for that snapshot stack entry is set to invalid. This indicator is then checked during restoration processing.
  • One example of this processing is described with reference to FIG. 14B.
  • a Load Multiple instruction (or similar instruction) is obtained (e.g., received, retrieved, determined, provided, have, etc.), STEP 1420.
  • a determination is made as to whether the Load Multiple instruction corresponds to a restoration request, INQUIRY 1422. This may be determined by checking, for instance, the additional fields of the snapshot stack (e.g., the address field) to determine if registers that were previously stored are being restored. If the Load Multiple instruction does not correspond to a restoration request, then the load multiple operation is performed by loading the plurality of registers to be loaded by the Load Multiple instruction from memory, STEP 1424.
  • If the Load Multiple instruction does correspond to a restoration request, then a determination is made as to whether one or more restoration snapshots to be used are valid (i.e., to confirm that the in-memory buffer has not been overwritten), INQUIRY 1426. If the one or more valid indicators indicate that the one or more restoration snapshots are valid, then one or more register mappings are recovered, STEP 1428. For instance, one or more snapshots are used to recover the specified registers. Thereafter, or in parallel, the marking indicator (M) is obtained from the cache line (or cache line portion) corresponding to the registers of the load multiple, STEP 1430, and a determination is made as to whether the cache line is marked as unmodified, INQUIRY 1432. If it is marked as unmodified, then a further determination is made as to whether more cache lines or cache line portions are to be processed, INQUIRY 1434. If so, processing continues to STEP 1430. Otherwise, the register restoration processing is complete.
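The decision sequence of FIG. 14B can be sketched as a single function. The sketch rests on several assumptions that are not stated in the text: the save area is modeled as a dictionary with one slot per register, the snapshot stack entry is a plain dictionary, and the function falls back to loading from memory when the snapshot is invalid or a cache line is no longer marked. The name `load_multiple` and all field names are hypothetical.

```python
def load_multiple(regs, names, address, entry, memory, marked_lines):
    """Model of FIG. 14B; returns "snapshot" or "memory" to show the source used.

    regs         -- dict of architected registers, updated in place
    names        -- list of register names to load
    address      -- base address of the save area (one slot per register, assumed)
    entry        -- snapshot stack entry (dict) or None
    memory       -- dict: slot address -> value
    marked_lines -- list of M indicators for the cache lines covering the buffer
    """
    def from_memory():
        for i, r in enumerate(names):
            regs[r] = memory[address + i]
        return "memory"

    # INQUIRY 1422: does this load restore registers that were previously stored?
    if entry is None or entry["address"] != address or entry["registers"] != names:
        return from_memory()                      # STEP 1424

    # INQUIRY 1426: is the snapshot still valid?
    if not entry["valid"]:
        return from_memory()                      # assumed fallback

    # STEP 1428: recover the register values from the snapshot.
    for r in names:
        regs[r] = entry["values"][r]

    # STEPS 1430-1434: every covering cache line is to be marked as unmodified.
    if all(marked_lines):
        return "snapshot"
    return from_memory()                          # assumed fallback
```

The second fallback overwrites the values just recovered from the snapshot, which corresponds to treating the in-memory buffer as authoritative whenever a modification may have occurred.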
  • another mechanism for tracking modifications to memory includes using transactional memory hardware to track changes to memory.
  • Transactional memory has the capability to track interference, to track access to a range of memory locations that correspond to transactional state, and that capability may be used to track whether the buffer corresponding to the in-memory image of the saved registers is being modified.
  • the transactional memory facility may track whether an in-memory change affects a register included in a particular snapshot.
  • a capability is provided, in one aspect, for saving registers for transactional memory rollback recovery and function call register preservation using a shared register restore capability.
  • the facility is initiated by a bulk-save indicating event, e.g., receiving a bulk-save indicating instruction.
  • a TBegin (begin transactional execution instruction) is a first indicating instruction
  • a Store Multiple or Spill Multiple instruction is a second indicating instruction.
  • a test is made as to whether the present request is compatible with pre-existing requests. When compatibility is determined, processing proceeds. Otherwise, if the initial request corresponds to a transactional memory rollback request, a bulk save is directly performed, and in-core register preservation is used exclusively for transactional memory. If the first request is a register save request, then in-core register preservation for a function call bulk restore is terminated, and transactional memory saving is initiated.
  • When a restore event occurs, the subset of tracked registers which have been saved are restored. In one embodiment, only modified registers are saved. In another embodiment, all tracked registers are saved.
  • register restoration is implemented as a modified transactional execution register rollback operation. For instance, when a bulk store is identified, a snapshot is made of the registers to be spilled into transactional memory (TM) register restoration state. Further, in one example, when a bulk restore is identified, the register snapshot is restored in a manner otherwise restored during a transaction abort.
  • a previous TM register snapshot is discarded when a new bulk store is identified, and the most recent bulk store can be received using the TM register restoration.
  • multiple spill snapshots are stored in register restoration snapshots corresponding to multiple nested transactions.
  • a nested transaction may be flattened into an outer transaction to avoid deallocating a bulk store snapshot.
  • a transaction rollback, transaction failure, transaction interference, transaction abort, or other operation terminating and invalidating an operation triggers a restore event, when the initiating event is a TBegin.
  • a Load Multiple or Reload multiple is considered a restore event, when the initiating event is a Store Multiple, Spill Multiple request, or similar request.
  • a register bulk reload is performed when the register bulk save has been initiated by a Store Multiple, Spill Multiple, or similar request.
  • the saving of registers occurs incrementally, and the incrementally saved registers may be restored.
  • Further details relating to transactional memory and register restoration are described with reference to FIGS. 15A-15E.
  • a TBegin instruction is obtained (e.g., received, provided, determined, retrieved, have, etc.), STEP 1500.
  • the TBegin instruction initiates a transaction, and, in one example, as shown in FIG. 15B, includes, for instance, an operation code (opcode) field 1510 that includes an opcode specifying a transaction begin operation; a base field (B1) 1512; a displacement field (D1) 1514; and an immediate field ( ) 1516.
  • the first operand address designates the location of a 256-byte transaction diagnostic block, called a TBEGIN-specified TDB, into which various diagnostic information may be stored if the transaction is aborted.
  • bits of field 1516 are defined as follows, in one example:
  • GRSM General Register Save Mask
  • Allow AR Modification A: The A control, bit 12 of the field, controls whether the transaction is allowed to modify an access register.
  • Allow Floating Point Operation F: The F control, bit 13 of the field, controls whether the transaction is allowed to execute specified floating point instructions.
  • Program Interruption Filtering Control Bits 14-15 of the field are the program interruption filtering control (PIFC). The PIFC controls whether certain classes of program exception conditions (e.g., addressing exception, data exception, operation exception, protection exception, etc.) that occur while the CPU is in the transactional execution mode result in an interruption.
  • the field may include more, fewer or different controls than described herein.
  • a register restoration facility e.g., the register restoration snapshot facility
  • a snapshot of the registers to be saved as indicated by the TBegin instruction (e.g., specified by GRSM) is taken. Further, the tracking of transactional state interference is initiated, STEP 1508. For instance, transactional state buffer and TBegins are tracked.
  • a snapshot may be taken based on a register save request, as described with reference to FIG. 15C.
  • a register save indication e.g., Store Multiple
  • a determination is made as to whether the register restoration facility is in use for transactional execution (e.g., by checking an indicator), INQUIRY 1522. If it is in use for transactional execution, then the register state is stored in memory, STEP 1530. However, if the register restoration facility is not in use for transactional execution, then a further determination is made as to whether the register restoration facility is in use for register restoration of incompatible requests (e.g., of different registers), INQUIRY 1524. If it is in use for incompatible requests, then processing continues to STEP 1530, in which the register state is stored in memory.
  • a register restoration snapshot is created (e.g., a snapshot of the registers specified by the Store Multiple), STEP 1526, and interference tracking for in-memory register restoration buffers is initiated using, for instance, logic of the transactional facility adapted to identify interference with a transaction's transactional state in memory, STEP 1528. Further, the register state is stored in memory, STEP 1530.
  • the in-memory register buffer of STEP 1530 is tracked for interference by the interference checking logic.
  • when a remote access to the in-memory buffer containing a copy of the saved registers is received, interference is registered.
  • no rollback occurs when the interference tracking logic is used to determine modification of register save buffers.
  • the in-processor register snapshot is not used when the registers are being restored, and the registers are instead retrieved from the in-memory register save buffer.
  • additional tracking is performed to track in-memory register save buffer modification by processor-local memory write instructions, e.g., by comparing writes to the address range of one or more memory buffers in accordance with one aspect of the present invention.
  • a snapshot is recovered based on a transactional rollback, as described with reference to FIG. 15D.
  • a rollback request (e.g., responsive to interference with a transaction's transactional state or execution of a Transaction Abort (TAbort) instruction) is obtained, STEP 1540.
  • The state is rolled back to the starting point of the transaction, STEP 1542.
  • the register state is restored to the state at the beginning of the transaction (i.e., where the TBegin was executed to create the rollback snapshot for transactional execution in accordance with FIG. 15A).
  • Performing a transactional rollback includes, for instance, restoring a program counter and canceling the in-memory effects of an aborted transaction in accordance with the known implementation of transactional memory.
  • Transactional rollback is indicated as inactive, STEP 1544, and transactional memory interference tracking is deactivated, STEP 1546.
  • a snapshot is recovered based on a register restoration restore request, as described with reference to FIG. 15E.
  • a register restoration restore request (e.g., Load Multiple) is obtained (e.g., received, provided, have, etc.), STEP 1550, and a determination is made as to whether an in-core register restoration facility is active, INQUIRY 1552. If it is not active, then recovery is performed from memory state, STEP 1554. However, if in-core register restoration is active, then a further determination is made as to whether there is or has been interference with an in-memory register restoration buffer, INQUIRY 1556. If there is interference, then register restoration in-memory tracking is deactivated, STEP 1558, and processing continues to STEP 1554.
  • the indicated registers are recovered from in-core state (e.g., a snapshot), STEP 1560.
  • the program counter and in-memory effect rollback are excluded.
  • Register restoration in-memory tracking is deactivated, STEP 1562.
  • the transactional memory facility may be used to track changes.
  • transactional state is re-used by mirroring the actions used for transactional execution to achieve the goals of register restoration; however, transactional rollback processing and register restoration are triggered by different instructions; they are mutually exclusive in that when one is active for restoration, the other is not; and register restoration (e.g., based on a LM) does not recover the program counter or undo in-memory changes, as does transactional rollback processing, as examples.
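The sharing of one in-core facility between transactional rollback and register restoration (FIGS. 15C and 15E) can be pictured with the following sketch. It is an illustrative model only. The class name `RestoreFacility`, the use of a dictionary keyed by register name as the in-memory save buffer, and the simplification that any already-active snapshot counts as an incompatible request are assumptions made here, not details from the text.

```python
class RestoreFacility:
    """Toy model of a register restoration facility shared with transactions."""

    def __init__(self):
        self.in_transaction = False  # facility in use for transactional execution
        self.snapshot = None         # in-core register snapshot, if any
        self.interference = False    # set when the save buffer is modified

    def store_multiple(self, regs, names, memory):
        """FIG. 15C: take a snapshot when the facility is free, always store."""
        compatible = self.snapshot is None            # simplification, see lead-in
        if not self.in_transaction and compatible:    # INQUIRIES 1522 and 1524
            self.snapshot = {r: regs[r] for r in names}   # STEP 1526
            self.interference = False                     # STEP 1528
        for r in names:                                   # STEP 1530
            memory[r] = regs[r]

    def remote_write(self, memory, name, value):
        """A remote access to the save buffer registers interference."""
        memory[name] = value
        if self.snapshot is not None and name in self.snapshot:
            self.interference = True

    def load_multiple(self, regs, names, memory):
        """FIG. 15E: recover from the snapshot unless inactive or interfered with."""
        usable = (self.snapshot is not None           # INQUIRY 1552
                  and not self.interference           # INQUIRY 1556
                  and set(names) <= set(self.snapshot))
        source = self.snapshot if usable else memory  # STEP 1560 or STEP 1554
        for r in names:
            regs[r] = source[r]
        self.snapshot = None                          # STEPS 1558 / 1562
        self.interference = False
        return "snapshot" if usable else "memory"
```

Unlike a transactional rollback, `load_multiple` in this model touches only the named registers: no program counter is restored and no memory update is undone.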
  • tracking of memory changes is performed in conjunction with a snapshot stack.
  • a snapshot stack provides a list of buffers since each entry includes an address or address range of its buffer. Thus, each time there is a write, the address or address range of the write is compared against the address or address range of the buffers in the stack.
  • the snapshot stack used for tracking memory save changes corresponds to and is shared with a snapshot stack in accordance with FIG. 6 used for storing snapshot IDs created by register save operations for corresponding register restore operations in accordance with aspects of the present invention. Examples of various techniques used to track memory changes are described with reference to FIGS. 16A-16D.
  • a first technique described with reference to FIG. 16A relates to taking a new snapshot.
  • a Store Multiple (STM) instruction (or similar instruction) is obtained (e.g., received, provided, have, retrieved, determined, etc.) by the processor, STEP 1600, and a determination is made as to whether there is an available entry in the snapshot stack, INQUIRY 1602. If there is no available entry, then a snapshot stack overflow is performed, STEP 1604. For example, an error is indicated. If there is an available entry, the top-of-stack pointer is updated (e.g., incremented by 1), STEP 1606.
  • a snapshot is created, STEP 1608, and a snapshot identifier is stored in the entry, STEP 1610. Additionally, the contents of the registers specified by the Store Multiple are stored in memory, STEP 1612, and the memory address range of where the contents are stored is included in the entry (e.g., address), STEP 1614. Further, the valid indicator is set (e.g., to 1) in the entry, STEP 1616, and other fields, if any, are also updated, STEP 1618.
  • a technique for tracking changes if executing an individual store request is described with reference to FIG. 16B.
  • a check of the stack is performed to determine whether there is any overlap. Initially, a memory write request with a store address is obtained (e.g., received, is provided, have, determined, retrieved, etc.) by this processor, STEP 1620. Then, for each entry in the snapshot stack, STEP 1622, a determination is made as to whether the address range for this entry matches the store address, INQUIRY 1624. If so, then the valid bit for the current entry is reset (e.g., to zero), STEP 1626.
  • a processor obtains (e.g., received, provided, retrieved, determined, have, etc.) a remote memory write request with a store address from another processor requesting exclusive access or a data update, STEP 1640. Then, for each entry in the snapshot stack, STEP 1642, a determination is made as to whether the address range for this entry matches the address of the store request, INQUIRY 1644. If so, then the valid bit for the current entry is reset (e.g., to zero), STEP 1646.
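The snapshot stack bookkeeping of FIGS. 16A-16C can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `SnapshotEntry`, `SnapshotStack`, `push` and `observe_write` are hypothetical, the stack is modeled as a bounded Python list, and an entry's buffer is treated as a half-open byte range. A local store and a remote write request are handled by the same overlap check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SnapshotEntry:
    snapshot_id: int
    start: int          # first byte of the in-memory save buffer
    end: int            # one past the last byte of the buffer
    valid: bool = True  # cleared when the buffer is written


class SnapshotStack:
    """Hypothetical model of the snapshot stack (names are assumptions)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: List[SnapshotEntry] = []

    def push(self, snapshot_id: int, start: int, length: int) -> SnapshotEntry:
        """Store Multiple path: record a new snapshot and its buffer range."""
        if len(self.entries) >= self.capacity:
            # The text only says an error is indicated on overflow.
            raise OverflowError("snapshot stack overflow")
        entry = SnapshotEntry(snapshot_id, start, start + length)
        self.entries.append(entry)
        return entry

    def observe_write(self, addr: int, length: int = 1) -> int:
        """Local or remote write path: reset the valid bit of every entry
        whose buffer overlaps the written range; return how many were reset."""
        invalidated = 0
        for e in self.entries:
            if e.valid and addr < e.end and e.start < addr + length:
                e.valid = False
                invalidated += 1
        return invalidated
```

A restore request would then consult `valid` on the matching entry and fall back to loading from memory when it has been cleared.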
  • FIGS. 16B and 16C are described with respect to checking the addresses of all entries in a snapshot stack
  • the number of writes to be compared and entries on the snapshot stack may be reduced to reduce the cost of performing the test for snapshot invalidation.
  • filtering techniques such as snoop filters
  • Some example filters may be range filters, filtering by way of mark bits associated with a data cache, e.g., in conjunction with a cache in accordance with FIG. 13A, and so forth.
  • a subset of stack entries may be identified, e.g., by determining which entries are to be tested based on an address received.
  • snapshot entries may have entry indicators associated to cache lines containing a corresponding memory buffer.
  • a technique for performing register restoration based on receipt of a bulk restore is described with reference to FIG. 16D.
  • a Load Multiple (LM) instruction or similar instruction is obtained (e.g., received, provided, retrieved, have, determined, etc.) by the processor, STEP 1660.
  • the processor obtaining the request determines whether the load multiple operation corresponds to a restoration request, INQUIRY 1662. If it does not correspond to a restoration request, then the load multiple operation is performed, STEP 1664. However, if the load multiple operation corresponds to a restoration request, then a determination is made as to whether the corresponding restoration snapshot is valid, INQUIRY 1666.
  • Register restoration from in-core values represents a technique to recover values from in-core data stores.
  • these data stores are shared with micro-architectural structures used to implement other processor mechanisms (e.g., branch misprediction recovery and precise exceptions).
  • Examples of the in-core data stores are recovery files and register rename maps (e.g., snapshots).
  • not all values can be restored. For example, some registers may no longer be available to recover from, because they have been reallocated to hold new architected values.
  • values that have been overwritten are tracked, e.g., by allocation time tracking or tracking liveness.
  • each physical register is associated with a time when it was last allocated (written) to hold an architected value. Then, if that allocated time (tag) is later than the time (tag) of the created restore point, the value is not available.
  • a bitmap of all (or a subset of) values is created, or a register rename map is updated.
  • when a register becomes unavailable, it is removed from the bit map, or the register rename map, so that during recovery, the unrestorable registers are known.
  • the bit map or register rename map corresponds to a register restoration snapshot.
  • register restoration from in-core values is provided when in-core values can be determined to be available, in accordance with a liveness/availability tracking mechanism. The remaining values are loaded from memory.
  • a restoration request (e.g., a Load Multiple instruction) is obtained (e.g., received, provided, retrieved, determined, have, etc.), STEP 1700, and based thereon, the processor determines whether the snapshot corresponding to the registers to be restored is valid, INQUIRY 1702. If the snapshot is invalid, then the values are restored from memory, STEP 1704. However, if the snapshot is valid, for each register to be restored, STEP 1706, a determination is made, e.g., via time tracking, a bit map, etc., as to whether the particular register can be restored from the snapshot, INQUIRY 1708.
  • if the register can be restored from the snapshot, then it is restored from the snapshot, by, for instance, updating a rename map, STEP 1710. However, if the register cannot be restored from the snapshot, then the register is restored from memory, e.g., by allocating a new rename register to the corresponding architected register and loading its value from memory, STEP 1712.
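The per-register decision of INQUIRY 1702 through STEP 1712 can be illustrated with allocation-time tags. The sketch below is an assumption-laden model: `restore_registers`, `snapshot_map`, `snapshot_tag` and `alloc_tag` are hypothetical names, tags are plain integers that grow over time, and "restoring from the snapshot" is represented by returning the rename-map updates rather than modifying hardware state.

```python
from typing import Dict, List, Tuple


def restore_registers(
    wanted: List[int],
    snapshot_map: Dict[int, int],  # architected reg -> physical reg at snapshot time
    snapshot_tag: int,             # time tag of the restore point
    alloc_tag: Dict[int, int],     # physical reg -> tag of its last allocation
    snapshot_valid: bool,
) -> Tuple[Dict[int, int], List[int]]:
    """Return (rename updates taken from the snapshot, registers to load from memory)."""
    if not snapshot_valid:
        # Invalid snapshot: every requested register comes from memory.
        return {}, list(wanted)
    from_snapshot: Dict[int, int] = {}
    from_memory: List[int] = []
    for reg in wanted:
        phys = snapshot_map.get(reg)
        # A physical register allocated after the restore point was created
        # no longer holds the saved value, so it is not available.
        if phys is not None and alloc_tag[phys] <= snapshot_tag:
            from_snapshot[reg] = phys
        else:
            from_memory.append(reg)
    return from_snapshot, from_memory
```

A bitmap of restorable registers would serve the same purpose as the tag comparison; the tag form is used here only because it needs no update on each register write.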
  • Register rename restoration captures processor state for later restoration based on explicit or inferred restoration point indicators (e.g., Spillm instruction, Store Multiple instruction, Store Multiple instructions using one of a well-defined base register, such as a frame pointer or stack pointer, etc.). Further, register restoration is performed upon an explicit or an inferred restoration request (e.g., Reloadm instruction, Load Multiple instruction, Load Multiple instructions using one of a well-defined base register, such as a frame pointer or stack pointer, etc.).
  • a restore point used by one application may be incorrectly used by another application after a context switch. In particular, this may even occur when the identification of snapshot locations identify a particular binary, e.g., even using detailed fingerprints of binaries, as multiple instances of the same binary or library may be executing, or a fingerprint may be matching, and a restoration point from a function in one context may be used to perform restoration in the function of another process of the same binary or library matching the fingerprint.
  • an explicit and/or an inferred context switch e.g., switch from one application or function to another application or function, etc.
  • cognizance is included within a processor. Based on detection of a possible context switch, all or a subset of restoration points are invalidated, in one embodiment.
  • an invalidation instruction is provided that is used to invalidate one or more restoration points, e.g., as part of a context switch routine.
  • this instruction is executed by an operating system's context switch code.
  • at least one restoration point is invalidated. In a further embodiment, this does not occur so as to allow register restoration points to be used in the presence of functions which make system calls, when such system calls may be short (e.g., the POSIX getpid system call).
  • a change of a value in a register indicative of a process is used to identify a task switch.
  • register snapshots may be associated with indicia values indicative of a particular process, and an indicia match between the indicia associated with a snapshot ID and the indicia of a current process requesting register restoration is to be confirmed before restoring registers using a register restoration snapshot.
  • indicia may be used in embodiments, such as the LPID and PID in one example embodiment.
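The indicia check can be pictured as a simple equality test made before a snapshot is used. In this sketch, `TaggedSnapshot` and `may_restore` are hypothetical names, and the indicia are assumed to be the (LPID, PID) pair mentioned above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaggedSnapshot:
    snapshot_id: int
    indicia: Tuple[int, int]  # (LPID, PID) captured when the snapshot was taken


def may_restore(snap: TaggedSnapshot, current_lpid: int, current_pid: int) -> bool:
    """Allow in-core restoration only for the process that created the snapshot."""
    return snap.indicia == (current_lpid, current_pid)
```

When the check fails, the restore request would fall back to loading the registers from memory.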
  • the invalidation instruction may also be used for non-traditional control flow in programs, e.g., for setjmp/longjmp handling, or C++/Java structured exception processing. Other examples also exist.
  • an invalidate restoration snapshot instruction is used.
  • This instruction is, for instance, an architected instruction having an operation code indicating it is an invalidate restoration snapshot instruction, and one or more fields used to indicate or determine one or more snapshots to be invalidated. Since this is a new instruction, the operating system, in one example, is modified to recognize and use the instruction.
  • the processor obtains (e.g., received, provided, retrieved, determined, have, etc.) an invalidate restoration snapshot instruction, STEP 1800.
  • This instruction may be initiated based on determining a context switch, as described herein.
  • the processor clears at least one entry on the snapshot stack based on an indication by the instruction of the one or more snapshots to be invalidated, STEP 1802. Additionally, in one example, one or more of the marking bits in the cache entries corresponding to the addresses indicated in the one or more snapshots that are invalidated are cleared (e.g., set to zero), STEP 1804.
  • with reference to FIG. 18B, instead of using an architected instruction that requires an update of the operating system, heuristics are used to determine whether there is a context switch, and therefore, whether one or more snapshots are to be invalidated.
  • a determination is made as to whether the processor detects changes in the processor state that are indicative of a context switch, INQUIRY 1820. For instance, the processor checks whether a program status word (PSW) has changed or whether the pointer to address translation tables has changed, either of which may indicate a context switch. If a context switch is indicated by one of these mechanisms or another mechanism, the processor clears at least one entry on the snapshot stack of the one or more snapshots to be invalidated, STEP 1822.
  • one or more of the marking bits in the cache entries corresponding to the addresses indicated in the one or more snapshots that are invalidated are cleared (e.g., set to zero), STEP 1824. Otherwise, if a context switch is not indicated, this processing ends.
  • the use of the invalidation instruction or heuristics may be dynamically selected by the processor. Referring to FIG. 18C, the processor dynamically selects either the invalidate restoration snapshot instruction or the heuristic technique for determining whether there has been a context switch, STEP 1830. For instance, if the operating system is at a version level that does not recognize such an instruction, then the heuristic approach is selected.
  • the operating system may wait for issuance of the instruction or use heuristics to determine if a snapshot is to be invalidated. If it is determined that invalidation is to be performed, either by receiving the instruction or heuristically, INQUIRY 1832, then the processor clears at least one entry on the snapshot stack to be invalidated, STEP 1834. Additionally, in one example, one or more of the marking bits in the cache entries corresponding to the addresses indicated in the one or more snapshots that are invalidated are cleared (e.g., set to zero), STEP 1836. Otherwise, if a context switch is not indicated, this processing ends.
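The two invalidation triggers (explicit instruction or heuristic detection) can be sketched as one decision function. Everything named here is an assumption made for illustration: `ProcessorState`, its two fields, and `apply_invalidation` do not appear in the source, and the sketch clears all snapshot valid bits, whereas the text allows clearing only a subset.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProcessorState:
    psw_context_bits: int  # hypothetical: PSW fields that change on a task switch
    translation_root: int  # pointer to the address translation tables


def context_switch_suspected(before: ProcessorState, after: ProcessorState) -> bool:
    """Heuristic: a change in either field may indicate a context switch."""
    return (before.psw_context_bits != after.psw_context_bits
            or before.translation_root != after.translation_root)


def apply_invalidation(valid: List[bool], explicit_instruction: bool,
                       before: ProcessorState, after: ProcessorState) -> List[bool]:
    """Return the new valid bits: all cleared if invalidation is requested
    by instruction or inferred by the heuristic, otherwise unchanged."""
    if explicit_instruction or context_switch_suspected(before, after):
        return [False] * len(valid)
    return list(valid)
```

A fuller model would also clear the cache marking bits for the buffers of the invalidated snapshots, as described for STEPs 1804, 1824 and 1836.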
  • register preservation occurs incrementally for function call bulk state preservation, and registers are not saved to memory immediately upon receiving, for instance, a register Spill Multiple instruction.
  • registers are saved when in-core preservation is terminated due to, e.g., a switch to transactional memory preservation. This may be implemented, for instance, using a state machine of transitions.
  • multiple bulk save requests may be received, and therefore, it is to be determined if a given request is compatible with processing that is being performed. For instance, if no pre-existing bulk save request is present, a new request that is received is compatible. If a pre-existing bulk save request is present, and a bulk save request is received, a further determination is made: If the registers are mutually exclusive, they may be considered compatible. If they refer to one or more registers, and an intervening modification has occurred, they may be considered compatible. If hardware supports multiple bulk save/restores, they may be considered compatible.
  • if a pre-existing transactional memory rollback request exists, and a transactional memory rollback request is received, a further determination is made: If nested transactions are implemented as flattened transactions, they are compatible. If nested transactions are true nested transactions, and a context (e.g., snapshot or other saving of state) is available, they are compatible. If no more storage to save additional state remains, flattening of nested transactions may be selected to achieve compatibility.
  • If a pre-existing transactional memory rollback request is present, and a register save request is received, further tests are performed: If multiple bulk requests are supported, and storage is available for additional state, they may be considered compatible. If no intervening modifications have occurred to registers that are shared between the transactional memory rollback set and the Store Multiple set, they are compatible.
  • a TBegin instruction is obtained (e.g., received, provided, have, retrieved, determined), STEP 1900.
  • a determination is made as to whether a register restoration facility is in active use, INQUIRY 1902. If a register restoration facility is not in active use, then a transactional rollback snapshot is created (e.g., a snapshot is taken of the registers indicated to be saved by the TBegin instruction), STEP 1912, and tracking of transactional state interference is initiated (e.g., tracking whether an in-memory write corresponds to one of the registers of the snapshot), STEP 1914.
  • if a register restoration facility is in active use, then a determination is made as to whether a snapshot compatible with the transactional request exists (e.g., are the registers the same), INQUIRY 1904. If the snapshot is compatible with the request, then the register restoration snapshot is used for transactional execution, STEP 1906. However, if the snapshot is not compatible with the request, then a further check is made as to whether more snapshots may be made (e.g., is there room in the snapshot stack), INQUIRY 1908. If more snapshots can be made, then processing continues to STEP 1912. Otherwise, the register restoration snapshot is deactivated, STEP 1910, and optionally, the snapshot is stored in memory if, for instance, it is not previously stored. In another embodiment, there are separate stacks for recovery snapshots and restoration snapshots.
  • a register save indication request (e.g., a Store Multiple) is obtained (e.g., received, provided, determined, retrieved, have, etc.), STEP 1920.
  • a determination is made as to whether the register restoration facility is in use for incompatible requests, INQUIRY 1922. If the facility is in use for such requests, a further determination is made as to whether storage is available for additional snapshots (referred to herein as snapshot contexts), INQUIRY 1924. If not, then the register state is stored in memory, STEP 1932.
  • a register restoration snapshot is created, STEP 1926. Further, interference tracking for an in-memory register restoration buffer is initiated, STEP 1928. Optionally, the register state is stored in memory, STEP 1932.
  • a capability is provided to coalesce a plurality of load and store instructions to determine a range of registers to be restored.
  • the processor is adapted to recognize a sequence of individual load and store operations which may be coalesced into a single restore and save operation, respectively. Coalescing may be performed using a variety of techniques.
  • coalescing sequences of loads and stores is used to enable the use of the register restoration techniques described herein in conjunction with legacy code sequences without bulk save and restore instructions, such as STMG and LMG for z/Architecture general purpose registers, or STMW and LMW for Power ISA fixed point registers.
  • this includes the bulk save of some register types in the z/Architecture and Power ISA, such as, inter alia, floating point registers in z/Architecture and Power ISA, and vector registers in Power ISA for which no store multiple and load multiple floating point instructions exist.
  • some architectures do not provide store multiple and load multiple instructions for any register types
  • each store request may start a store multiple coalescing sequence that may be recognized.
  • only certain store requests trigger a coalescing sequence that may be recognized. This is to, for instance, avoid power and other overhead associated with operating additional logic.
  • a coalescing sequence is started only by store requests that use a certain register, e.g., a frame pointer, stack pointer, or other distinguished register as a base register.
  • at least a first and second instruction with adjacent addresses start a sequence. Other examples are also possible.
  • the state of the register file (e.g., the register file map, etc.) is captured in a snapshot.
  • a bit mask is initialized, in one example, reflecting which registers may be restored from a snapshot. Subsequent writes to registers indicate in the bit mask that a particular register no longer corresponds to the value in the snapshot.
  • if a subsequent store refers to such a register, it may either be separately performed independent of the coalescing sequence, start a new coalescing sequence, or both.
  • a strict order may be imposed on the instruction sequence, e.g., each store is to store the register R(N+1) if the previous instruction stored register R(N), enabling a single counter to track the embodiment.
  • heuristics are used to limit the stores which may trigger the creation of a snapshot.
  • coalescing of individual stores and loads into groups of stores and loads which may then trigger state snapshotting and register restoration in accordance with an aspect of the present invention is performed in conjunction with group formation.
  • instructions are grouped to keep adjacent stores without intervening modifications of registers in the store range.
  • loads are coalesced in a similar manner as stores.
  • loads are executed singly wherein for each load a corresponding rename register is retrieved from a register snapshot individually because in at least one embodiment, the overhead of register restoration is primarily associated with storing and maintaining a mechanism to retrieve stored values for in-core restoration.
  • the recognizing is performed in one of a pre-decode, a group formation and a decode stage.
  • a trace cache, a loop cache, or other such cache, it may be operatively coupled to logic adapted to create and/or optimize a loop, trace or iop (internal operation) cache.
  • a technique for restoring registers from an in-core value pool includes recognizing a sequence of adjacent individual store instructions of adjacent registers, creating and maintaining a single snapshot for restoration, and using a single snapshot to bypass registers from the single snapshot.
  • a single store request is obtained (e.g., received, have, provided, retrieved, determined, etc.), STEP 2000.
  • a determination is made as to whether this is a possible start of a store sequence (e.g., a multi-store/register spill sequence), INQUIRY 2002. This may include checking for a subset of registers, addressing modes, addressing ranges, or another indication of a register spill sequence. If it is determined that it is not the start of a possible store sequence, then one or more stores are performed, STEP 2004. However, if it is a possible start of a store sequence, then a prospective register restoration snapshot request with the present store request is tracked, STEP 2006.
  • if the next instruction is not a continuation of a store request, INQUIRY 2008, a determination is made as to whether a register restoration snapshot is desirable, INQUIRY 2014. That is, does the prospective snapshot have enough registers to make the snapshot worthwhile?
  • if a snapshot is desirable, then a register restoration snapshot technique is performed to create a snapshot, STEP 2016. However, if a snapshot is not desirable, then one or more stores are performed, STEP 2018.
  • snapshots saving a certain minimum number of registers are desirable, so as to amortize the cost of creating and managing a snapshot.
  • desirability of a snapshot is estimated based on possible runtime improvement.
  • when a prospective snapshot offers more than a set threshold of runtime improvement it is considered desirable. Other possibilities also exist.
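Recognizing a spill sequence from individual stores can be sketched as a scan for a run of adjacent stores. The sketch assumes the strict-order variant described above (each store saves R(N+1) after R(N)), 8-byte registers, and a minimum run length as the desirability test; `find_spill_run`, `reg_bytes` and `min_regs` are hypothetical names, and the threshold of 4 is an arbitrary illustrative value.

```python
from typing import List, Tuple

Store = Tuple[int, int]  # (register number, memory address)


def find_spill_run(stores: List[Store], reg_bytes: int = 8,
                   min_regs: int = 4) -> List[Store]:
    """Return the leading run of stores that looks like a register spill
    sequence, or an empty list if the run is too short to justify a snapshot."""
    run: List[Store] = []
    for reg, addr in stores:
        if run:
            prev_reg, prev_addr = run[-1]
            # Strict order: R(N+1) must follow R(N) at the adjacent address.
            if reg != prev_reg + 1 or addr != prev_addr + reg_bytes:
                break
        run.append((reg, addr))
    return run if len(run) >= min_regs else []
```

When the function returns an empty list, the stores would simply be performed individually, matching STEP 2018. The same scan applies to loads when coalescing a restore sequence.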
  • a single load request is obtained (e.g., received, have, retrieved, determined, provided, etc.), STEP 2040.
  • INQUIRY 2042 includes a test whether the load request corresponds to the most recent register snapshot with respect to the register being restored and the specified in-memory storage location. If it is determined that it is not the start of a possible restore sequence, then one or more loads are performed, STEP 2044. However, if it is a possible start of a restore sequence, then a prospective register restoration request with the present load request is tracked, STEP 2046.
  • if the next instruction is not a continuation of the restore request, INQUIRY 2048, then a determination is made as to whether the restore request(s) match the register restoration snapshot, INQUIRY 2054. If so, then a register restoration snapshot restore technique is performed, STEP 2056. Otherwise, one or more loads are performed, STEP 2058.
  • an in-core register restoration snapshot is made. Additionally, spilled registers are stored to a temporary location (commonly the stack frame of the current function) in case the in-core register restoration snapshot is invalidated. Contents of the Spillm registers are placed on a store queue and written to memory.
  • The registers are restored from the in-core register restoration snapshot by, for instance, the Reloadm instruction, if the register snapshot is valid. Otherwise, Reloadm reloads the values from memory (e.g., the temporary storage area in the current function's stack frame).
  • values to be stored based on a Spillm instruction may still be queued in the store queue to be written to caches and eventually system memory. Once the Reloadm has completed, no further reads to the buffer will occur. Consequently, these values use up valuable space in the store queue and cause time delay for subsequent stores in the store queue as well as energy consumption while processing stores that are known to be useless.
  • store queue entries corresponding to the Spillm/Reloadm save/restore sequence are invalidated. For instance, they are removed from the store queue, or when they come to the head of the store queue to be committed to the memory hierarchy, they are not written. Other examples also exist.
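The second option above (entries are skipped when they reach the head of the queue) can be sketched with a small store queue model. `StoreEntry`, `StoreQueue`, `spill_id` and the method names are hypothetical; memory is modeled as a dictionary from address to value.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class StoreEntry:
    addr: int
    data: int
    spill_id: Optional[int] = None  # set for stores produced by a Spillm
    squashed: bool = False


class StoreQueue:
    """Hypothetical store queue that can drop dead Spillm stores."""

    def __init__(self) -> None:
        self.entries: Deque[StoreEntry] = deque()

    def enqueue(self, addr: int, data: int, spill_id: Optional[int] = None) -> None:
        self.entries.append(StoreEntry(addr, data, spill_id))

    def reload_completed(self, spill_id: int) -> None:
        """The matching Reloadm finished: the buffer will not be read again."""
        for e in self.entries:
            if e.spill_id == spill_id:
                e.squashed = True

    def drain(self, memory: Dict[int, int]) -> int:
        """Commit entries in order, skipping squashed ones; return writes performed."""
        written = 0
        while self.entries:
            e = self.entries.popleft()
            if not e.squashed:
                memory[e.addr] = e.data
                written += 1
        return written
```

Marking rather than removing entries keeps the queue order intact, at the cost of the squashed entries still occupying slots until they reach the head.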
  • a store request (SRQ) write back logic 2100 is located in a store queue 2102 that further includes address information 2104 and data 2106 for each store queue entry.
  • Store queue 2102 receives store requests 2108 from a load/store unit (LSU) 2110 of a central processing unit (CPU) 2112.
  • CPU 2112 further includes, for instance, an instruction fetch unit (IFU) 2114 that fetches instructions, which are decoded using an instruction decode unit 2116.
  • the decoded instructions may be executed via one or more execution units 2118.
  • Store request write back logic 2100 performs a write back to a memory subsystem 2122, which may include one or more caches 2124 and memory 2126.
  • the write back logic includes the following:
  • the write back logic includes:
  • the logic of FIGS. 21B-21C is performed in conjunction with the Spillm/Reloadm instructions, since those instructions indicate that the buffer is not programmer accessible for writes at a particular point in time (e.g., between store and load).
  • one or more instructions or other mechanisms are used to indicate that the storage buffer will no longer be accessed after a load multiple state restore or after another selected point in time.
  • data stored in the stack region (i.e., those pages in the address space allocated for holding the stack) below the stack pointer are considered to be no longer used, and are suppressed during write back from the store queue, and/or in responding to XI (cross-interrogate) requests.
  • optimizations wherein data is allocated below the stack pointer are not permissible.
  • the write back of data is suppressed when the data is written beyond the defined region wherein data may be allocated and accessed beyond the stack pointer, e.g., write back from a store queue may be suppressed for addresses more than 512 bytes below the stack pointer, in accordance with, e.g., the Power ELFv2 ABI.
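The address test for suppressing write back can be expressed as a single predicate. The sketch assumes a downward-growing stack, takes the 512-byte figure from the example above as a default, and uses hypothetical names (`suppress_write_back`, `stack_low`, `protected_bytes`).

```python
def suppress_write_back(addr: int, stack_pointer: int, stack_low: int,
                        protected_bytes: int = 512) -> bool:
    """True if a queued store targets the stack region more than
    `protected_bytes` below the stack pointer, where data is assumed dead."""
    in_stack_region = addr >= stack_low
    beyond_protected_zone = addr < stack_pointer - protected_bytes
    return in_stack_region and beyond_protected_zone
```

Stores at or above the stack pointer, or within the protected zone just below it, are always written back.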
  • an alternative to the register restoration snapshot may be used.
  • This alternative is a recovery buffer, also referred to as a recovery file.
  • the register values may be recovered from the recovery buffer rather than a snapshot.
  • the old values are stored in a recovery file queue in case they are needed for recovery.
  • One example of such a recovery file is depicted in FIG. 22A.
  • a recovery file 2200 is implemented as a queue, and includes one or more register recovery values corresponding to executed instructions 2202.
  • recovery buffer 2200 includes a plurality of registers 2204 having a register number to be recovered, Rn, Rk, Rm, and so forth, and a register value 2206 to be recovered.
  • An instruction tag 2208 is assigned to each register 2204.
  • the queue has a tail pointer 2209 pointing to the recovery register corresponding to the oldest instruction that may be rolled back by recovering the value overwritten by the instruction, and a head pointer 2210 pointing to the recovery value corresponding to the youngest instruction and indicating the position where additional recovery values will be inserted responsive to instructions being executed.
  • a stored recovery value is associated with an instruction to restore.
  • the stored value is overwritten by an instruction and recovery is performed when the instruction is flushed.
  • a state machine may be provided in which for each flushed instruction, the value is recovered from recovery buffer 2200 reading recovery values 2206 corresponding to instructions.
  • the state machine may have a forward or backward scan.
  • an instruction is fetched from an instruction cache 2220 by an instruction fetch unit 2222 and decoded by an instruction decode unit 2224. Operands are obtained from one or more register files 2226. If a value from a register file is modified, it is stored in recovery buffer 2228 at the head of the recovery buffer.
  • One or more execution units 2230 execute the instructions, and completion and recovery unit 2232 completes the instruction or if there is a misprediction or exception, processing proceeds to the recovery buffer 2228 that walks backwards, in one example, copying each value to be restored from recovery buffer 2228 to register files 2226 until the in-order register state at the misprediction or exception point is restored.
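The backward walk through the recovery buffer can be sketched as follows. This is an illustrative model only: `RecEntry` and `roll_back` are hypothetical names, the buffer is a Python list ordered oldest first with its head at the end, and instruction tags are assumed to increase monotonically.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RecEntry:
    tag: int    # tag of the instruction that overwrote the register
    reg: int    # register number to be recovered
    value: int  # value the register held before that instruction


def roll_back(regs: Dict[int, int], recovery: List[RecEntry], flush_tag: int) -> None:
    """Undo every instruction with tag >= flush_tag, youngest first, so the
    register state at the misprediction or exception point is restored."""
    while recovery and recovery[-1].tag >= flush_tag:
        entry = recovery.pop()
        regs[entry.reg] = entry.value
```

Walking from the youngest entry backward means that when a register was overwritten several times, the oldest flushed value is written last and therefore wins.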
  • the recovery values stored in fast processor memory are used to restore values responsive to a register restoration request corresponding to a load multiple, a coalesced load multiple sequence or Reloadm instructions.
  • the processor steps through the recovery buffer to retrieve the values present at entry to the store multiple.
  • the step through the recovery buffer to restore register values is performed via a backward scan or a forward scan. (In one particular example, the oldest recovery entry successive to a save request for a specified register is restored.)
  • a forward scan is indicated below:
  • Restore tracks for each register whether a register is still to be restored. It is initialized to restore all registers corresponding to the registers specified by the Load Multiple (or Reloadm) instruction. If the tag of a corresponding Store Multiple (STM) from which the Load Multiple (LM) is to restore the state cannot be located, special handling is performed. In at least one embodiment, the special handling corresponds to loading values from the memory location specified in the Load Multiple or Reloadm instruction.
  • the pseudocode then scans forward through the recovery buffer starting at the recovery buffer entry corresponding to the tag of the register save instruction (e.g., a STM Store Multiple or Spillm) up to the instruction restoring registers (e.g., the Load Multiple or Reloadm).
  • the recovery buffer entry is read (i.e., represented by the value RecFileQ), consisting of at least the fields Rec.reg indicating the register number (2204 of FIG. 22A) contained in the particular recovery buffer entry and the value to be restored to the register, Rec.value (field 2206 of FIG. 22A). If the register corresponds to one of the registers of the Load Multiple or Reloadm, the first (oldest) value overwritten after the STM Store Multiple (Spillm) is restored.
  • Any remaining registers in Restore[] that have not been restored from the recovery buffer are restored from memory.
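The forward scan described above can be sketched as below. This is not the source's pseudocode, which is not reproduced here; it is a reconstruction under stated assumptions. `forward_scan_restore`, `tail_tag` and `saved_in_memory` are hypothetical names, the recovery buffer is a list of `(tag, reg, value)` tuples ordered oldest first, and the save point counts as "located" when its tag is not older than the buffer's tail.

```python
from typing import Dict, Iterable, List, Tuple

RecEntry = Tuple[int, int, int]  # (instruction tag, register number, overwritten value)


def forward_scan_restore(
    regs: Dict[int, int],
    recovery: List[RecEntry],        # ordered oldest first
    tail_tag: int,                   # oldest instruction tag still covered
    stm_tag: int,                    # tag of the register save instruction
    lm_tag: int,                     # tag of the restoring instruction
    wanted: Iterable[int],
    saved_in_memory: Dict[int, int],
) -> None:
    restore = set(wanted)            # plays the role of Restore[]
    if stm_tag >= tail_tag:          # save point can be located in the buffer
        for tag, reg, value in recovery:
            if stm_tag < tag < lm_tag and reg in restore:
                regs[reg] = value    # first (oldest) value overwritten after the save
                restore.discard(reg)
    # Special handling and leftovers: load from the in-memory save area.
    for reg in restore:
        regs[reg] = saved_in_memory[reg]
```

Because a register is dropped from `restore` on its first match, later overwrites of the same register between the save and the restore are ignored.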
  • the logic includes:
  • values beyond the recovery file tail may be recovered. This may be performed if the value has not been overwritten, which can be determined by comparing against a highwater mark of the head/second tail that moves in response to the head overwriting the tail entries. If head is greater than or equal to the second tail, then the second tail is equal to the head.
  • the actual state restored by exception and branch misprediction recovery, as well as register restoration is contained, in one example, in physical registers.
  • a physical register is allocated, an architected register to be written to is mapped to the allocated physical register, and the physical register is written.
  • physical registers are allocated responsive to register write requests so as to maximize the utility of physical registers as a source for restoring registers responsive to a register restoration request. For instance, the allocation technique for new registers from the physical register file is modified to give preference to selecting registers not associated with a register restoration snapshot.
  • a selection is made so as to minimize the performance impact by selecting a physical register which may not be a part of the registers to be restored from a snapshot, or from a snapshot with the least performance impact.
  • a snapshot corresponding to the least performance impact is the oldest snapshot.
  • a register is selected, in accordance with an aspect of the present invention, STEP 2304. For instance, a register is selected that is not in an active snapshot for recovery (e.g., branch misprediction, exception, etc.); i.e., a register used in register restoration snapshots, but not recovery snapshots.
  • a register is selected from the oldest snapshot, since, for instance, the oldest snapshot may be more likely to have a register ready to be freed and it may be less costly to take a register from an older snapshot, since it may not be used as soon as a register in a younger snapshot.
  • a register from a register restoration snapshot is chosen that does not correspond to a register to be restored, i.e., a register outside the range of registers specified to be saved by the STM or Spillm instruction.
  • the register may be marked as invalid in the snapshot, STEP 2306, or the register restoration snapshot may be deallocated, STEP 2308.
  • When a register restoration snapshot is deallocated, the physical registers associated with that snapshot become available when they do not correspond to registers also used in another snapshot.
  • the selected register is then allocated, STEP 2310.
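One way to picture the selection preference of STEPs 2304 to 2310 is the sketch below. The `Snapshot` class, the `select_register` function, the `in_use` set for the current rename map, and the ordering of the two preferences (registers outside the saved range first, then oldest snapshot first) are assumptions for illustration; the text does not fix a precedence between them.

```python
from dataclasses import dataclass, field

@dataclass
class Snapshot:
    kind: str      # "restoration" or "recovery" (hypothetical labels)
    mapping: dict  # architected register -> physical register
    saved: set = field(default_factory=set)    # registers named by the STM/Spillm
    invalid: set = field(default_factory=set)  # registers no longer restorable here

def select_register(free, snapshots, in_use):
    """Pick a physical register; snapshots are ordered oldest first."""
    if free:
        return free.pop(0)
    # Registers in the current rename map or in recovery snapshots are off limits.
    pinned = set(in_use)
    for s in snapshots:
        if s.kind == "recovery":
            pinned.update(s.mapping.values())
    restoration = [s for s in snapshots if s.kind == "restoration"]
    # Pass 1 skips registers the save request asked for; pass 2 allows them.
    for skip_saved in (True, False):
        for snap in restoration:  # oldest snapshot first
            for arch, phys in snap.mapping.items():
                if arch in snap.invalid or phys in pinned:
                    continue
                if skip_saved and arch in snap.saved:
                    continue
                # Mark the register invalid in every restoration snapshot using it.
                for s in restoration:
                    s.invalid.update(a for a, p in s.mapping.items() if p == phys)
                return phys
    return None  # nothing can be taken; the caller would have to stall
```

Marking the register invalid corresponds to STEP 2306; deallocating the whole snapshot (STEP 2308) is the alternative and is not shown here.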
  • Another embodiment for allocating a register is described with reference to FIG. 23B.
  • a determination is made as to whether an unused physical register is available, INQUIRY 2330. If an unused physical register is available, an unused register is selected and allocated, STEP 2332. However, if an unused physical register is not available, then an oldest snapshot is selected, STEP 2334, and a determination is made as to whether the selected snapshot is a register restoration snapshot, INQUIRY 2336. If it is a register restoration snapshot, it is deallocated enabling the registers associated therewith to become available if they are not in another snapshot, STEP 2340. Processing continues to INQUIRY 2330.
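The loop of FIG. 23B might look like the following sketch. The tuple representation of snapshots and the name `allocate_register` are hypothetical. The excerpt does not say what happens when the oldest snapshot is not a register restoration snapshot; this sketch assumes the next-oldest snapshot is examined and that allocation fails (returns `None`) when no snapshot can be released. It also assumes the current rename map is represented as one of the non-restoration snapshots, so its registers are never freed.

```python
def allocate_register(free, snapshots):
    """free: list of unused physical registers.
    snapshots: list of (kind, set_of_physical_registers), oldest first."""
    i = 0
    while True:
        if free:                      # INQUIRY 2330
            return free.pop(0)        # STEP 2332
        if i >= len(snapshots):
            return None               # assumption: nothing left to release
        kind, regs = snapshots[i]     # STEP 2334: oldest remaining candidate
        if kind == "restoration":     # INQUIRY 2336
            del snapshots[i]          # STEP 2340: deallocate the snapshot
            still_referenced = set()
            for _, other in snapshots:
                still_referenced |= other
            # Only registers not held by another snapshot become available.
            free.extend(sorted(regs - still_referenced))
        else:
            i += 1                    # assumption: skip recovery snapshots
```

Releasing a snapshot does not guarantee a free register, since all of its registers may still be referenced elsewhere, which is why control returns to the availability check each time.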
  • rename registers and rename maps may be used for implementing branch misprediction recovery and precise exceptions.
  • the in-order program state can be recovered from the register rename map and the physical registers, and by flushing speculatively stored state in store queues (and possibly caches, e.g., for an embodiment using transactional memory).
  • the register rename maps and physical registers may be deallocated and reused.
  • rename registers and rename maps used for implementing branch misprediction recovery and precise exceptions are also used to implement register restoration for saving and restoring program state, e.g., in the context of function calls, for recovering caller and callee-saved registers in the caller and a callee, respectively.
  • register snapshots are created in order to implement branch misprediction recovery, and implement precise exceptions in the presence of out-of-order execution.
  • additional snapshots are made for recovering architected state using register restoration.
  • holding such register snapshots for register restoration may cause an insufficient number of free registers to become available, stopping processors from making progress during execution when no new target registers can be allocated.
  • register restoration snapshots may be allocated, but recovery may never occur.
  • structured C++/Java exception handling may cause a function to abort without ever restoring its state, or setjmp/longjmp may similarly prevent a register restore from being encountered that would otherwise deallocate a register snapshot allocated for register restoration.
  • register snapshots are maintained in a separate queue, rename registers referenced in snapshots are prevented from being deallocated, and register snapshots may be recycled based on ensuring a suitable supply of free registers.
  • register freelist i.e., the register rename pool used to satisfy new register allocation requests
  • register snapshots may be deallocated until the freelist reaches the target size.
  • the target size may be a fixed threshold, or based on an estimate of the number of new rename registers allocated by a current workload, possibly weighted by the number of cycles needed to deallocate snapshots and make available additional registers. In one embodiment, that threshold may be 0, and snapshots would only be deallocated to satisfy rename register allocation requests.
  • register rename snapshots and their associated registers are deallocated and recycled into the rename freelist in the FIFO (first in, first out) policy, where the snapshot having been allocated the earliest is also deallocated first.
  • register snapshots that have been used by register restoration are also immediately deallocated. This may take particular advantage of the stack model used for function calls, where the most recently entered function is exited first, and so its register restoration state may become available for deallocation first.
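The FIFO recycling policy with a target freelist size can be sketched as below. The function name `replenish_freelist` and the representation of each snapshot as a set of physical register numbers are assumptions; the sketch covers only the FIFO part of the policy, not the immediate release of snapshots already used for a restore.

```python
from collections import deque

def replenish_freelist(freelist, snapshot_queue, target_size):
    """Deallocate the oldest snapshots until the freelist reaches target_size.

    freelist: list of free physical registers.
    snapshot_queue: deque of sets of physical registers, oldest on the left.
    """
    while len(freelist) < target_size and snapshot_queue:
        oldest = snapshot_queue.popleft()  # first in, first out
        still_referenced = set()
        for snap in snapshot_queue:
            still_referenced |= snap
        # A register is recycled only if no remaining snapshot refers to it.
        for reg in sorted(oldest - still_referenced):
            if reg not in freelist:
                freelist.append(reg)
    return freelist
```

With `target_size` set to 0 the loop body never runs, which corresponds to the embodiment in which snapshots are released only on demand by an allocation request.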
  • register snapshots for register restoration are captured under mask control, so that a snapshot may only contain the registers listed by the Spillm/STM request, in order to prevent rename registers from being unnecessarily prevented from reallocation.
  • registers corresponding to register save/restore sequences that may be listed in register rename map snapshots made for register restoration are not independently retained. Instead, rename registers are deallocated based on their use for implementing branch misprediction recovery and precise exceptions. When registers are no longer needed, they are returned to the freelist pool (e.g., in accordance with an implementation, such as that of Buti et al.)
  • the freelist selection algorithm is modified to select registers from the freelist which are not referenced by a register restoration snapshot. In another embodiment, the freelist selection algorithm is modified to select registers from the freelist which were allocated to a register rename snapshot earlier than other rename registers. In yet another embodiment, the freelist selection algorithm is modified to select registers from the freelist which are not referenced by an active register rename snapshot (i.e., for example excluding most recently allocated snapshots that have already been used to restore the register state, e.g., for a corresponding function). In yet another embodiment, a combination of any of these three criteria and additional criteria may be used. In yet another embodiment, a combination of all of these three criteria and additional criteria may be used. Other possibilities also exist.
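A combination of the three criteria could be expressed as a sort key, as in the sketch below. The `FreeReg` record, its field names, and the priority order among the criteria are hypothetical; the text lists the criteria but leaves their weighting open.

```python
from dataclasses import dataclass

@dataclass
class FreeReg:
    phys: int              # physical register number
    restoration_ref: bool  # referenced by some register restoration snapshot
    active_ref: bool       # referenced by a snapshot not yet used for a restore
    snapshot_seq: int      # allocation order of the referencing snapshot (lower = earlier)

def pick_from_freelist(freelist):
    """Remove and return the most preferable physical register, or None."""
    if not freelist:
        return None
    # False sorts before True, so unreferenced registers win; ties fall
    # through to inactive snapshots, then to the earliest-allocated snapshot.
    best = min(freelist,
               key=lambda r: (r.restoration_ref, r.active_ref, r.snapshot_seq))
    freelist.remove(best)
    return best.phys
```

Under this ordering a register that no restoration snapshot references is always taken first, so snapshots keep their restorable registers for as long as the freelist allows.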
  • Described herein are various aspects and embodiments of register restoration processing. Although a number of examples and techniques are provided, variations and/or additions may be made without departing from a spirit of aspects of the present invention. One or more aspects of the present invention are inextricably tied to computer technology and facilitate processing within a computer, improving performance thereof. Further details of one embodiment of facilitating processing within a computing environment, as it relates to one or more aspects of the present invention, are described with reference to FIGS. 24A-24B.
  • a load request to restore a plurality of architected registers is obtained by a processor (2400). Based on obtaining the load request, one or more architected registers of the plurality of architected registers are restored (2402).
  • the restoring includes, e.g., using a snapshot that maps architected registers to physical registers to replace one or more physical registers currently assigned to the one or more architected registers with one or more physical registers of the snapshot corresponding to the one or more architected registers (2404).
  • the one or more architected registers are restored without a copying of values for the one or more architected registers from memory (2406).
  • the one or more architected registers are restored by loading values from memory into the one or more architected registers (2412).
  • the determining whether a snapshot is available includes using a snapshot stack to determine whether a snapshot corresponding to the one or more architected registers is available (2420).
  • the snapshot stack includes a plurality of entries (2422), and an entry of the snapshot stack includes a snapshot identifier identifying the snapshot (2424).
  • the entry of the snapshot stack may include additional information including at least one of an address in memory of contents of the one or more architected registers, an indication of the one or more architected registers associated with the snapshot, and/or a validity indicator indicating whether the snapshot is valid (2426).
  • the snapshot is created to save a mapping of the one or more physical registers to the one or more architected registers (2428).
  • the creating the snapshot is performed, e.g., based on obtaining a save request requesting a saving of the one or more architected registers (2430).
  • the load request includes a load multiple instruction
  • the save request includes a store multiple instruction (2432).
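The flow of FIGS. 24A-24B (use a snapshot when one is available, otherwise load from memory) can be sketched as follows. The `StackEntry` fields mirror the snapshot stack entry described above, but the matching rule (same address and covered registers), the function name `handle_load_request`, and the dictionary-based register file and memory are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class StackEntry:
    snapshot_id: int   # identifies the snapshot
    address: int       # memory address of the saved register contents
    registers: tuple   # architected registers associated with the snapshot
    valid: bool = True # whether the snapshot is still valid

def handle_load_request(address, registers, snapshot_stack, snapshots,
                        rename_map, phys_regs, memory, free):
    """Restore the given architected registers; returns which path was used."""
    entry = snapshot_stack[-1] if snapshot_stack else None
    if (entry is not None and entry.valid and entry.address == address
            and set(registers) <= set(entry.registers)):
        # Snapshot path: remap to the snapshot's physical registers,
        # without copying any values from memory.
        snap = snapshots[entry.snapshot_id]
        for r in registers:
            rename_map[r] = snap[r]
        snapshot_stack.pop()
        return "snapshot"
    # Fallback path: load each value from memory into a fresh physical register.
    for offset, r in enumerate(registers):
        p = free.pop()
        phys_regs[p] = memory[address + offset]
        rename_map[r] = p
    return "memory"
```

In the snapshot path the only state that changes is the rename map, which is what allows the restore to complete without memory traffic.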
  • a computing environment 2500 includes, for instance, a native central processing unit (CPU) 2502, a memory 2504, and one or more input/output devices and/or interfaces 2506 coupled to one another via, for example, one or more buses 2508 and/or other connections.
  • computing environment 2500 may include a PowerPC processor or a pSeries server offered by International Business Machines Corporation, Armonk, New York; and/or other machines based on architectures offered by International Business Machines Corporation, Intel, or other companies.
  • Native central processing unit 2502 includes one or more native registers 2510, such as one or more general purpose registers and/or one or more special purpose registers used during processing within the environment. These registers include information that represents the state of the environment at any particular point in time.
  • native central processing unit 2502 executes instructions and code that are stored in memory 2504.
  • the central processing unit executes emulator code 2512 stored in memory 2504.
  • This code enables the computing environment configured in one architecture to emulate another architecture.
  • emulator code 2512 allows machines based on architectures other than the z/Architecture, such as PowerPC processors, pSeries servers, or other servers or processors, to emulate the z/Architecture and to execute software and instructions developed based on the z/Architecture.
  • Guest instructions 2550 stored in memory 2504 comprise software instructions (e.g., correlating to machine instructions) that were developed to be executed in an architecture other than that of native CPU 2502.
  • guest instructions 2550 may have been designed to execute on a z/Architecture processor, but instead, are being emulated on native CPU 2502, which may be, for example, an Intel processor.
  • emulator code 2512 includes an instruction fetching routine 2552 to obtain one or more guest instructions 2550 from memory 2504, and to optionally provide local buffering for the instructions obtained. It also includes an instruction translation routine 2554 to determine the type of guest instruction that has been obtained and to translate the guest instruction into one or more corresponding native instructions 2556.
  • emulator code 2512 includes an emulation control routine 2560 to cause the native instructions to be executed.
  • Emulation control routine 2560 may cause native CPU 2502 to execute a routine of native instructions that emulate one or more previously obtained guest instructions and, at the conclusion of such execution, return control to the instruction fetch routine to emulate the obtaining of the next guest instruction or a group of guest instructions.
  • Execution of native instructions 2556 may include loading data into a register from memory 2504; storing data back to memory from a register; or performing some type of arithmetic or logic operation, as determined by the translation routine.
  • Each routine is, for instance, implemented in software, which is stored in memory and executed by native central processing unit 2502.
  • one or more of the routines or operations are implemented in firmware, hardware, software or some combination thereof.
  • the registers of the emulated processor may be emulated using registers 2510 of the native CPU or by using locations in memory 2504.
  • guest instructions 2550, native instructions 2556 and emulator code 2512 may reside in the same memory or may be dispersed among different memory devices.
  • firmware includes, e.g., the microcode or Millicode of the processor. It includes, for instance, the hardware-level instructions and/or data structures used in implementation of higher level machine code. In one embodiment, it includes, for instance, proprietary code that is typically delivered as microcode that includes trusted software or microcode specific to the underlying hardware and controls operating system access to the system hardware.
  • a guest instruction 2550 that is obtained, translated and executed is, for instance, one of the instructions described herein.
  • the instruction, which is of one architecture (e.g., the z/Architecture), is fetched from memory, translated and represented as a sequence of native instructions 2556 of another architecture (e.g., PowerPC, pSeries, Intel, etc.). These native instructions are then executed.
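The fetch, translate and execute cycle of the emulator can be illustrated with a toy example. The three-operation guest instruction set (`LOAD`, `ADD`, `STORE`), the use of Python closures as stand-ins for native instructions, and the function names are all invented for this sketch and do not correspond to any real architecture.

```python
def fetch(guest_program, pc):
    """Instruction fetching routine: obtain the next guest instruction."""
    return guest_program[pc]

def translate(guest_instr):
    """Instruction translation routine: map one guest instruction to a
    list of native operations (modelled here as Python callables)."""
    op, a, b = guest_instr
    if op == "LOAD":     # register a <- memory[b]
        def native(state):
            state["regs"][a] = state["mem"][b]
    elif op == "ADD":    # register a <- register a + register b
        def native(state):
            state["regs"][a] += state["regs"][b]
    elif op == "STORE":  # memory[b] <- register a
        def native(state):
            state["mem"][b] = state["regs"][a]
    else:
        raise ValueError(f"unknown guest operation: {op}")
    return [native]

def run(guest_program, state):
    """Emulation control routine: execute the native sequence for each
    guest instruction, then return to fetch the next one."""
    pc = 0
    while pc < len(guest_program):
        for native in translate(fetch(guest_program, pc)):
            native(state)
        pc += 1
    return state
```

Here the emulated registers live in `state["regs"]`, which corresponds to the option of emulating guest registers with locations in memory rather than with native registers.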
  • One or more aspects may relate to cloud computing.
  • Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service.
  • This cloud model may include at least five characteristics, at least three service models, and at least four deployment models.
  • On-demand self-service: a cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with the service's provider.
  • Resource pooling: the provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to demand. There is a sense of location independence in that the consumer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter).
  • Rapid elasticity: capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
  • Measured service: cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.
  • Software as a Service (SaaS): the capability provided to the consumer is to use the provider's applications running on a cloud infrastructure.
  • the applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email).
  • the consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
  • Platform as a Service (PaaS)
  • the consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.
  • Infrastructure as a Service (IaaS)
  • the consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
  • Private cloud: the cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on-premises or off-premises.
  • Public cloud: the cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
  • Hybrid cloud: the cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load-balancing between clouds).
  • a cloud computing environment is service oriented with a focus on statelessness, low coupling, modularity, and semantic interoperability.
  • An infrastructure comprising a network of interconnected nodes.
  • cloud computing environment 50 comprises one or more cloud computing nodes 10 with which local computing devices used by cloud consumers, such as, for example, personal digital assistant (PDA) or cellular telephone 54A, desktop computer 54B, laptop computer 54C, and/or automobile computer system 54N may communicate.
  • Nodes 10 may communicate with one another. They may be grouped (not shown) physically or virtually, in one or more networks, such as Private, Community, Public, or Hybrid clouds as described hereinabove, or a combination thereof.
  • This allows cloud computing environment 50 to offer infrastructure, platforms and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device.
  • computing devices 54A-N shown in FIG. 26 are intended to be illustrative only, and computing nodes 10 and cloud computing environment 50 can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser).
  • Hardware and software layer 60 includes hardware and software components. Examples of hardware components include mainframes 61; RISC (Reduced Instruction Set Computer) architecture based servers 62; servers 63; blade servers 64; storage devices 65; and networks and networking components 66. In some embodiments, software components include network application server software 67 and database software 68.
  • Virtualization layer 70 provides an abstraction layer from which the following examples of virtual entities may be provided: virtual servers 71; virtual storage 72; virtual networks 73, including virtual private networks; virtual applications and operating systems 74; and virtual clients 75.
  • management layer 80 may provide the functions described below.
  • Resource provisioning 81 provides dynamic procurement of computing resources and other resources that are utilized to perform tasks within the cloud computing environment.
  • Metering and Pricing 82 provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may comprise application software licenses.
  • Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources.
  • User portal 83 provides access to the cloud computing environment for consumers and system administrators.
  • Service level management 84 provides cloud computing resource allocation and management such that required service levels are met.
  • Service Level Agreement (SLA) planning and fulfillment 85 provides pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA.
  • Workloads layer 90 provides examples of functionality for which the cloud computing environment may be utilized. Examples of workloads and functions which may be provided from this layer include: mapping and navigation 91; software development and lifecycle management 92; virtual classroom education delivery 93; data analytics processing 94; transaction processing 95; and register restoration and associated processing 96.
  • the present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration
  • the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention
  • the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
  • the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
  • a computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
  • the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
  • Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and procedural programming languages, such as the "C" programming language or similar programming languages.
  • the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
  • These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures.
  • two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • one or more aspects may be provided, offered, deployed, managed, serviced, etc. by a service provider who offers management of customer environments.
  • the service provider can create, maintain, support, etc. computer code and/or a computer infrastructure that performs one or more aspects for one or more customers.
  • the service provider may receive payment from the customer under a subscription and/or fee agreement, as examples. Additionally or alternatively, the service provider may receive payment from the sale of advertising content to one or more third parties.
  • an application may be deployed for performing one or more embodiments.
  • the deploying of an application comprises providing computer infrastructure operable to perform one or more embodiments.
  • a computing infrastructure may be deployed comprising integrating computer readable code into a computing system, in which the code in combination with the computing system is capable of performing one or more embodiments.
  • a process for integrating computing infrastructure comprising integrating computer readable code into a computer system
  • the computer system comprises a computer readable medium, in which the computer medium comprises one or more embodiments.
  • the code in combination with the computer system is capable of performing one or more embodiments.
  • a data processing system suitable for storing and/or executing program code includes at least two processors coupled directly or indirectly to memory elements through a system bus.
  • the memory elements include, for instance, local memory employed during actual execution of the program code, bulk storage, and cache memory which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
  • I/O devices can be coupled to the system either directly or through intervening I/O controllers.
  • Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the available types of network adapters.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Executing Machine-Instructions (AREA)
  • Advance Control (AREA)
PCT/IB2018/051646 2017-04-18 2018-03-13 Register context restoration based on rename register recovery Ceased WO2018193321A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
DE112018000848.7T DE112018000848T5 (de) 2017-04-18 2018-03-13 Register context restoration based on the recovery of rename registers
JP2019556276A JP7046098B2 (ja) 2017-04-18 2018-03-13 Register context restoration based on rename register recovery
CN201880025664.XA CN110520837B (zh) 2017-04-18 2018-03-13 Method, system and medium for facilitating processing in a computing environment
GB1916132.2A GB2575412B (en) 2017-04-18 2018-03-13 Register context restoration based on rename register recovery

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15/490,013 2017-04-18
US15/490,013 US10838733B2 (en) 2017-04-18 2017-04-18 Register context restoration based on rename register recovery

Publications (1)

Publication Number Publication Date
WO2018193321A1 (en) 2018-10-25

Family

ID=63790045

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2018/051646 Ceased WO2018193321A1 (en) 2017-04-18 2018-03-13 Register context restoration based on rename register recovery

Country Status (6)

Country Link
US (1) US10838733B2 (en)
JP (1) JP7046098B2 (en)
CN (1) CN110520837B (en)
DE (1) DE112018000848T5 (en)
GB (1) GB2575412B (en)
WO (1) WO2018193321A1 (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10740108B2 (en) 2017-04-18 2020-08-11 International Business Machines Corporation Management of store queue based on restoration operation
US10545766B2 (en) 2017-04-18 2020-01-28 International Business Machines Corporation Register restoration using transactional memory register snapshots
US10489382B2 (en) 2017-04-18 2019-11-26 International Business Machines Corporation Register restoration invalidation based on a context switch
US10552164B2 (en) 2017-04-18 2020-02-04 International Business Machines Corporation Sharing snapshots between restoration and recovery
US10564977B2 (en) 2017-04-18 2020-02-18 International Business Machines Corporation Selective register allocation
US10540184B2 (en) 2017-04-18 2020-01-21 International Business Machines Corporation Coalescing store instructions for restoration
US10782979B2 (en) 2017-04-18 2020-09-22 International Business Machines Corporation Restoring saved architected registers and suppressing verification of registers to be restored
US10572265B2 (en) 2017-04-18 2020-02-25 International Business Machines Corporation Selecting register restoration or register reloading
US10963261B2 (en) 2017-04-18 2021-03-30 International Business Machines Corporation Sharing snapshots across save requests
US10649785B2 (en) 2017-04-18 2020-05-12 International Business Machines Corporation Tracking changes to memory via check and recovery
US11010192B2 (en) 2017-04-18 2021-05-18 International Business Machines Corporation Register restoration using recovery buffers
US11372970B2 (en) * 2019-03-12 2022-06-28 Hewlett Packard Enterprise Development Lp Multi-dimensional attestation
CN111159002B (zh) * 2019-12-31 2023-04-28 山东有人物联网股份有限公司 一种基于分组的数据边缘采集方法、边缘采集设备及系统
CN111552511B (zh) * 2020-05-14 2023-06-16 山东省计算中心(国家超级计算济南中心) 一种VxWorks系统物联网固件解包恢复文件名的方法
US11249757B1 (en) * 2020-08-14 2022-02-15 International Business Machines Corporation Handling and fusing load instructions in a processor
CN114489791B (zh) * 2021-01-27 2023-03-24 沐曦集成电路(上海)有限公司 处理器装置及其指令执行方法、计算设备
CN114741237B (zh) * 2022-04-20 2024-11-01 Oppo广东移动通信有限公司 一种任务切换方法、装置、设备及存储介质
CN115437691B (zh) * 2022-11-09 2023-01-31 进迭时空(杭州)科技有限公司 一种针对risc-v矢量与浮点寄存器的物理寄存器堆分配装置

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5794024A (en) * 1996-03-25 1998-08-11 International Business Machines Corporation Method and system for dynamically recovering a register-address-table upon occurrence of an interrupt or branch misprediction
WO2013026055A1 (en) * 2011-08-18 2013-02-21 Qualcomm Incorporated Link stack repair of erroneous speculative update
US20150227355A1 (en) * 2014-02-10 2015-08-13 Netflix, Inc. Automatically generating volume images and launching virtual computing instances

Family Cites Families (134)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3781810A (en) 1972-04-26 1973-12-25 Bell Telephone Labor Inc Scheme for saving and restoring register contents in a data processor
US4992938A (en) * 1987-07-01 1991-02-12 International Business Machines Corporation Instruction control mechanism for a computing system with register renaming, map table and queues indicating available registers
JPH0353328A (ja) 1989-07-20 1991-03-07 Hitachi Ltd レジスタ退避回復方法ならびに処理装置
US5444853A (en) 1992-03-31 1995-08-22 Seiko Epson Corporation System and method for transferring data between a plurality of virtual FIFO's and a peripheral via a hardware FIFO and selectively updating control information associated with the virtual FIFO's
US6047122A (en) 1992-05-07 2000-04-04 Tm Patents, L.P. System for method for performing a context switch operation in a massively parallel computer system
US5535397A (en) 1993-06-30 1996-07-09 Intel Corporation Method and apparatus for providing a context switch in response to an interrupt in a computer process
JP3169779B2 (ja) 1994-12-19 2001-05-28 日本電気株式会社 マルチスレッドプロセッサ
US5673426A (en) * 1995-02-14 1997-09-30 Hal Computer Systems, Inc. Processor structure and method for tracking floating-point exceptions
US6356918B1 (en) * 1995-07-26 2002-03-12 International Business Machines Corporation Method and system for managing registers in a data processing system supports out-of-order and speculative instruction execution
US5809522A (en) 1995-12-18 1998-09-15 Advanced Micro Devices, Inc. Microprocessor system with process identification tag entries to reduce cache flushing after a context switch
JPH09212371A (ja) 1996-02-07 1997-08-15 Nec Corp レジスタ退避及び復元システム
US5881305A (en) 1996-12-13 1999-03-09 Advanced Micro Devices, Inc. Register rename stack for a microprocessor
US6088779A (en) 1996-12-30 2000-07-11 Fujitsu Limited System and method for execution management of computer programs
US5918005A (en) 1997-03-25 1999-06-29 International Business Machines Corporation Apparatus region-based detection of interference among reordered memory operations in a processor
JPH10289113A (ja) * 1997-04-14 1998-10-27 Toshiba Corp 計算機のレジスタコンテキストの保存/復元方式
US5987495A (en) 1997-11-07 1999-11-16 International Business Machines Corporation Method and apparatus for fully restoring a program context following an interrupt
US6134653A (en) 1998-04-22 2000-10-17 Transwitch Corp. RISC processor architecture with high performance context switching in which one context can be loaded by a co-processor while another context is being accessed by an arithmetic logic unit
US6338137B1 (en) 1998-05-29 2002-01-08 Texas Instruments Incorporated Data processor having memory access unit with predetermined number of instruction cycles between activation and initial data transfer
US6298403B1 (en) 1998-06-02 2001-10-02 Adaptec, Inc. Host adapter having a snapshot mechanism
US6457021B1 (en) 1998-08-18 2002-09-24 Microsoft Corporation In-memory database system
US6421758B1 (en) 1999-07-26 2002-07-16 International Business Machines Corporation Method and system for super-fast updating and reading of content addressable memory with a bypass circuit
JP3739607B2 (ja) 1999-08-24 2006-01-25 富士通株式会社 情報処理装置
US6480931B1 (en) 1999-11-05 2002-11-12 International Business Machines Corporation Content addressable storage apparatus and register mapper architecture
US7085914B1 (en) 2000-01-27 2006-08-01 International Business Machines Corporation Methods for renaming stack references to processor registers
US7155599B2 (en) * 2000-12-29 2006-12-26 Intel Corporation Method and apparatus for a register renaming structure
US6751749B2 (en) * 2001-02-22 2004-06-15 International Business Machines Corporation Method and apparatus for computer system reliability
US6968476B2 (en) * 2001-06-27 2005-11-22 International Business Machines Corporation Checkpointing a superscalar, out-of-order processor for error recovery
US20030177342A1 (en) 2002-03-15 2003-09-18 Hitachi Semiconductor (America) Inc. Processor with register dirty bits and special save multiple/return instructions
WO2004023314A2 (en) 2002-09-03 2004-03-18 Koninklijke Philips Electronics N.V. Method and apparatus for handling nested interrupts
US7269719B2 (en) 2002-10-30 2007-09-11 Stmicroelectronics, Inc. Predicated execution using operand predicates
US7127592B2 (en) 2003-01-08 2006-10-24 Sun Microsystems, Inc. Method and apparatus for dynamically allocating registers in a windowed architecture
JP2004220070A (ja) 2003-01-09 2004-08-05 Japan Science & Technology Agency コンテキスト切り替え方法及び装置、中央演算装置、コンテキスト切り替えプログラム及びそれを記憶したコンピュータ読み取り可能な記憶媒体
US7107438B2 (en) 2003-02-04 2006-09-12 Via Technologies, Inc. Pipelined microprocessor, apparatus, and method for performing early correction of conditional branch instruction mispredictions
US7543284B2 (en) 2003-04-22 2009-06-02 Transitive Limited Partial dead code elimination optimizations for program code conversion
US7493621B2 (en) 2003-12-18 2009-02-17 International Business Machines Corporation Context switch data prefetching in multithreaded computer
EP1622009A1 (en) 2004-07-27 2006-02-01 Texas Instruments Incorporated JSM architecture and systems
US20060184771A1 (en) 2005-02-11 2006-08-17 International Business Machines Mini-refresh processor recovery as bug workaround method using existing recovery hardware
US8095915B2 (en) 2005-04-13 2012-01-10 Telefonaktiebolaget Lm Ericsson (Publ) Data value coherence in computer systems
GB2425622A (en) 2005-04-27 2006-11-01 Ncapsa Ltd Programming real-time systems using data flow diagrams
US7562200B1 (en) 2005-06-10 2009-07-14 American Megatrends, Inc. Method, system, apparatus, and computer-readable medium for locking and synchronizing input/output operations in a data storage system
US20070043934A1 (en) * 2005-08-22 2007-02-22 Intel Corporation Early misprediction recovery through periodic checkpoints
US7426618B2 (en) 2005-09-06 2008-09-16 Dot Hill Systems Corp. Snapshot restore method and apparatus
US7747841B2 (en) 2005-09-26 2010-06-29 Cornell Research Foundation, Inc. Method and apparatus for early load retirement in a processor system
US7962731B2 (en) 2005-10-20 2011-06-14 Qualcomm Incorporated Backing store buffer for the register save engine of a stacked register file
US20070130448A1 (en) 2005-12-01 2007-06-07 Intel Corporation Stack tracker
US8266609B2 (en) 2005-12-07 2012-09-11 Microsoft Corporation Efficient placement of software transactional memory operations around procedure calls
US8813052B2 (en) 2005-12-07 2014-08-19 Microsoft Corporation Cache metadata for implementing bounded transactional memory
TWI312112B (en) 2005-12-30 2009-07-11 Ind Tech Res Inst Data managing method, method and apparatus to snapshot data for multiple volumes to a single snapshot volume in a data processing system
EP2013809B1 (en) 2006-05-01 2018-11-21 MediaTek Inc. Method and apparatus for secure context switching in a system including a processor and cached virtual memory
US8219885B2 (en) 2006-05-12 2012-07-10 Arm Limited Error detecting and correcting mechanism for a register file
US20080016325A1 (en) 2006-07-12 2008-01-17 Laudon James P Using windowed register file to checkpoint register state
US20080077782A1 (en) 2006-09-26 2008-03-27 Arm Limited Restoring a register renaming table within a processor following an exception
US7950002B2 (en) 2006-10-02 2011-05-24 International Business Machines Corporation Testing a software product by initiating a breakpoint and executing a probe routine
EP2527972A3 (en) 2006-11-14 2014-08-06 Soft Machines, Inc. Apparatus and method for processing complex instruction formats in a multi- threaded architecture supporting various context switch modes and virtualization schemes
US20080148022A1 (en) 2006-12-13 2008-06-19 Arm Limited Marking registers as available for register renaming
US7802136B2 (en) 2006-12-28 2010-09-21 Intel Corporation Compiler technique for efficient register checkpointing to support transaction roll-back
US8010543B1 (en) 2007-05-25 2011-08-30 Emc Corporation Protecting a file system on an object addressable storage system
US8239633B2 (en) 2007-07-11 2012-08-07 Wisconsin Alumni Research Foundation Non-broadcast signature-based transactional memory
US8661204B2 (en) 2007-08-15 2014-02-25 University Of Rochester, Office Of Technology Transfer Mechanism to support flexible decoupled transactional memory
US20100031084A1 (en) 2008-08-04 2010-02-04 Sun Microsystems, Inc. Checkpointing in a processor that supports simultaneous speculative threading
US8078854B2 (en) 2008-12-12 2011-12-13 Oracle America, Inc. Using register rename maps to facilitate precise exception semantics
US8245018B2 (en) 2008-12-31 2012-08-14 International Business Machines Corporation Processor register recovery after flush operation
CN101788901B (zh) 2009-01-24 2013-09-25 世意法(北京)半导体研发有限责任公司 使用影子寄存器的高效硬件实现的设备及其方法
CN101819518B (zh) 2009-02-26 2013-09-11 国际商业机器公司 在事务内存中快速保存上下文的方法和装置
US9940138B2 (en) * 2009-04-08 2018-04-10 Intel Corporation Utilization of register checkpointing mechanism with pointer swapping to resolve multithreading mis-speculations
US8484438B2 (en) 2009-06-29 2013-07-09 Oracle America, Inc. Hierarchical bloom filters for facilitating concurrency control
US8356148B2 (en) * 2009-09-22 2013-01-15 Lsi Corporation Snapshot metadata management in a storage system
GB2474522B (en) * 2009-10-19 2014-09-03 Advanced Risc Mach Ltd Register state saving and restoring
US8516465B2 (en) 2009-12-04 2013-08-20 Oracle America, Inc. Register prespill phase in a compiler
US8972994B2 (en) 2009-12-23 2015-03-03 Intel Corporation Method and apparatus to bypass object lock by speculative execution of generated bypass code shell based on bypass failure threshold in managed runtime environment
US9009692B2 (en) 2009-12-26 2015-04-14 Oracle America, Inc. Minimizing register spills by using register moves
US20110238962A1 (en) 2010-03-23 2011-09-29 International Business Machines Corporation Register Checkpointing for Speculative Modes of Execution in Out-of-Order Processors
US9911008B2 (en) 2010-05-25 2018-03-06 Via Technologies, Inc. Microprocessor with on-the-fly switching of decryption keys
US8560816B2 (en) 2010-06-30 2013-10-15 Oracle International Corporation System and method for performing incremental register checkpointing in transactional memory
US8424015B2 (en) 2010-09-30 2013-04-16 International Business Machines Corporation Transactional memory preemption mechanism
US9626190B2 (en) 2010-10-07 2017-04-18 Advanced Micro Devices, Inc. Method and apparatus for floating point register caching
US9110691B2 (en) 2010-11-16 2015-08-18 Advanced Micro Devices, Inc. Compiler support technique for hardware transactional memory systems
US8966453B1 (en) 2010-11-24 2015-02-24 ECOLE POLYTECHNIQUE FéDéRALE DE LAUSANNE Automatic generation of program execution that reaches a given failure point
US9170818B2 (en) 2011-04-26 2015-10-27 Freescale Semiconductor, Inc. Register renaming scheme with checkpoint repair in a processing device
US9063747B2 (en) 2011-04-28 2015-06-23 Freescale Semiconductor, Inc. Microprocessor systems and methods for a combined register file and checkpoint repair register
US8656121B2 (en) 2011-05-17 2014-02-18 International Business Machines Corporation Facilitating data coherency using in-memory tag bits and tag test instructions
US8756591B2 (en) 2011-10-03 2014-06-17 International Business Machines Corporation Generating compiled code that indicates register liveness
US10078515B2 (en) 2011-10-03 2018-09-18 International Business Machines Corporation Tracking operand liveness information in a computer system and performing function based on the liveness information
US9081834B2 (en) 2011-10-05 2015-07-14 Cumulus Systems Incorporated Process for gathering and special data structure for storing performance metric data
US9569369B2 (en) 2011-10-27 2017-02-14 Oracle International Corporation Software translation lookaside buffer for persistent pointer management
WO2013099414A1 (ja) 2011-12-26 2013-07-04 インターナショナル・ビジネス・マシーンズ・コーポレーション レジスタ・マッピング方法
US9058201B2 (en) 2011-12-28 2015-06-16 Intel Corporation Managing and tracking thread access to operating system extended features using map-tables containing location references and thread identifiers
GB2498203B (en) * 2012-01-06 2013-12-04 Imagination Tech Ltd Restoring a register renaming map
US20130226878A1 (en) 2012-02-27 2013-08-29 Nec Laboratories America, Inc. Seamless context transfers for mobile applications
CN102662851A (zh) * 2012-04-12 2012-09-12 江苏中科芯核电子科技有限公司 一种堆栈入栈出栈装置及方法
US9740549B2 (en) 2012-06-15 2017-08-22 International Business Machines Corporation Facilitating transaction completion subsequent to repeated aborts of the transaction
US9361115B2 (en) 2012-06-15 2016-06-07 International Business Machines Corporation Saving/restoring selected registers in transactional processing
US9262170B2 (en) 2012-07-26 2016-02-16 International Business Machines Corporation Out-of-order checkpoint reclamation in a checkpoint processing and recovery core microarchitecture
US9229873B2 (en) 2012-07-30 2016-01-05 Soft Machines, Inc. Systems and methods for supporting a plurality of load and store accesses of a cache
US9672044B2 (en) 2012-08-01 2017-06-06 Nxp Usa, Inc. Space efficient checkpoint facility and technique for processor with integrally indexed register mapping and free-list arrays
EP2912579B1 (en) 2012-10-23 2020-08-19 IP Reservoir, LLC Method and apparatus for accelerated format translation of data in a delimited data format
US9411739B2 (en) 2012-11-30 2016-08-09 Intel Corporation System, method and apparatus for improving transactional memory (TM) throughput using TM region indicators
US9183399B2 (en) 2013-02-14 2015-11-10 International Business Machines Corporation Instruction set architecture with secure clear instructions for protecting processing unit architected state information
US9336004B2 (en) * 2013-02-28 2016-05-10 Advanced Micro Devices, Inc. Checkpointing registers for transactional memory
WO2014159123A1 (en) 2013-03-12 2014-10-02 Microchip Technology Incorporated Programmable cpu register hardware context swap mechanism
US8972992B2 (en) 2013-04-30 2015-03-03 Splunk Inc. Proactive monitoring tree with state distribution ring
JP6107485B2 (ja) 2013-07-04 2017-04-05 富士通株式会社 演算処理装置及び演算処理装置の制御方法
US9471325B2 (en) 2013-07-12 2016-10-18 Qualcomm Incorporated Method and apparatus for selective renaming in a microprocessor
US9311084B2 (en) * 2013-07-31 2016-04-12 Apple Inc. RDA checkpoint optimization
US9946666B2 (en) 2013-08-06 2018-04-17 Nvidia Corporation Coalescing texture access and load/store operations
US9465721B2 (en) 2013-08-19 2016-10-11 Microsoft Technology Licensing, Llc Snapshotting executing code with a modifiable snapshot definition
US9298626B2 (en) 2013-09-26 2016-03-29 Globalfoundries Inc. Managing high-conflict cache lines in transactional memory computing environments
US9471480B2 (en) 2013-12-02 2016-10-18 The Regents Of The University Of Michigan Data processing apparatus with memory rename table for mapping memory addresses to registers
GB2518022B (en) * 2014-01-17 2015-09-23 Imagination Tech Ltd Stack saved variable value prediction
GB2516999B (en) 2014-01-31 2015-07-22 Imagination Tech Ltd An improved return stack buffer
US9262206B2 (en) 2014-02-27 2016-02-16 International Business Machines Corporation Using the transaction-begin instruction to manage transactional aborts in transactional memory computing environments
US10198321B1 (en) 2014-04-01 2019-02-05 Storone Ltd. System and method for continuous data protection
US9305167B2 (en) 2014-05-21 2016-04-05 Bitdefender IPR Management Ltd. Hardware-enabled prevention of code reuse attacks
CN104021058A (zh) * 2014-06-30 2014-09-03 广州视源电子科技股份有限公司 一种测试板卡快速启动的方法
US10042710B2 (en) 2014-09-16 2018-08-07 Actifio, Inc. System and method for multi-hop data backup
US9501637B2 (en) 2014-09-26 2016-11-22 Intel Corporation Hardware shadow stack support for legacy guests
WO2016069029A1 (en) 2014-10-28 2016-05-06 Hewlett Packard Enterprise Development Lp Snapshot creation
GB2538237B (en) * 2015-05-11 2018-01-10 Advanced Risc Mach Ltd Available register control for register renaming
GB2538764B (en) 2015-05-28 2018-02-14 Advanced Risc Mach Ltd Register renaming
GB2538766B (en) 2015-05-28 2018-02-14 Advanced Risc Mach Ltd Register renaming
US20170249144A1 (en) 2016-02-26 2017-08-31 Qualcomm Incorporated Combining loads or stores in computer processing
CN106201656B (zh) * 2016-06-30 2019-06-07 无锡华云数据技术服务有限公司 一种对kvm虚拟机快照存储空间的统计方法
EP3373178B1 (en) 2017-03-08 2024-09-18 Secure-IC SAS Comparison of execution context data signatures with references
US10963261B2 (en) 2017-04-18 2021-03-30 International Business Machines Corporation Sharing snapshots across save requests
US11010192B2 (en) 2017-04-18 2021-05-18 International Business Machines Corporation Register restoration using recovery buffers
US10540184B2 (en) 2017-04-18 2020-01-21 International Business Machines Corporation Coalescing store instructions for restoration
US10545766B2 (en) 2017-04-18 2020-01-28 International Business Machines Corporation Register restoration using transactional memory register snapshots
US10552164B2 (en) 2017-04-18 2020-02-04 International Business Machines Corporation Sharing snapshots between restoration and recovery
US10564977B2 (en) 2017-04-18 2020-02-18 International Business Machines Corporation Selective register allocation
US10740108B2 (en) 2017-04-18 2020-08-11 International Business Machines Corporation Management of store queue based on restoration operation
US10572265B2 (en) 2017-04-18 2020-02-25 International Business Machines Corporation Selecting register restoration or register reloading
US10782979B2 (en) 2017-04-18 2020-09-22 International Business Machines Corporation Restoring saved architected registers and suppressing verification of registers to be restored
US10489382B2 (en) 2017-04-18 2019-11-26 International Business Machines Corporation Register restoration invalidation based on a context switch
US10649785B2 (en) 2017-04-18 2020-05-12 International Business Machines Corporation Tracking changes to memory via check and recovery

Also Published As

Publication number Publication date
US20180300157A1 (en) 2018-10-18
JP7046098B2 (ja) 2022-04-01
GB2575412B (en) 2021-10-20
JP2020518895A (ja) 2020-06-25
CN110520837A (zh) 2019-11-29
CN110520837B (zh) 2023-06-23
GB201916132D0 (en) 2019-12-18
GB2575412A (en) 2020-01-08
DE112018000848T5 (de) 2019-11-07
US10838733B2 (en) 2020-11-17

Similar Documents

Publication Publication Date Title
US11061684B2 (en) Architecturally paired spill/reload multiple instructions for suppressing a snapshot latest value determination
US10838733B2 (en) Register context restoration based on rename register recovery
US10592251B2 (en) Register restoration using transactional memory register snapshots
US10732981B2 (en) Management of store queue based on restoration operation
US10552164B2 (en) Sharing snapshots between restoration and recovery
US10564977B2 (en) Selective register allocation
US11010192B2 (en) Register restoration using recovery buffers
US10572265B2 (en) Selecting register restoration or register reloading
US10649785B2 (en) Tracking changes to memory via check and recovery
US10489382B2 (en) Register restoration invalidation based on a context switch
US10540184B2 (en) Coalescing store instructions for restoration
US10963261B2 (en) Sharing snapshots across save requests
US11579806B2 (en) Portions of configuration state registers in-memory
JP7190797B2 (ja) メモリ内構成状態レジスタの保護を提供するコンピュータ・プログラム、コンピュータ・システム、およびコンピュータ実装方法
JP7249717B2 (ja) メモリ・ポインタを変更することによるコンテキスト切り替えを提供するコンピュータ・プログラム、コンピュータ・システム、およびコンピュータ実装方法
JP7160447B2 (ja) 構成状態レジスタの一括格納および読み込み動作を提供するコンピュータ・プログラム、コンピュータ・システム、およびコンピュータ実装方法
JP7190798B2 (ja) 機能的類似性に基づいてグループ化された構成状態レジスタを提供するコンピュータ・プログラム、コンピュータ・システム、およびコンピュータ実装方法
US20200012427A1 (en) Automatic pinning of units of memory
JP7189935B2 (ja) 同時の分岐アドレスの予測およびレジスタの内容の更新

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 18787059; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2019556276; Country of ref document: JP; Kind code of ref document: A
ENP Entry into the national phase. Ref document number: 201916132; Country of ref document: GB; Kind code of ref document: A; Free format text: PCT FILING DATE = 20180313
122 EP: PCT application non-entry in European phase. Ref document number: 18787059; Country of ref document: EP; Kind code of ref document: A1