WO2023194702A1 - Technique for handling ordering constrained access operations - Google Patents

Technique for handling ordering constrained access operations

Info

Publication number
WO2023194702A1
Authority
WO
WIPO (PCT)
Prior art keywords
access
instruction
ordering
data values
operations
Prior art date
Legal status
Ceased
Application number
PCT/GB2023/050589
Other languages
English (en)
French (fr)
Inventor
Simon John Craske
Jacob Eapen
Current Assignee
ARM Ltd
Original Assignee
ARM Ltd
Advanced Risc Machines Ltd
Priority date
Filing date
Publication date
Application filed by ARM Ltd, Advanced Risc Machines Ltd
Priority to CN202380031691.9A (CN118974699A)
Priority to KR1020247036464A (KR20240167917A)
Priority to EP23711543.1A (EP4505292B1)
Priority to JP2024558321A (JP2025511310A)
Priority to US18/853,552 (US20250190217A1)
Priority to IL315456A (IL315456A)
Publication of WO2023194702A1
Anticipated expiration
Ceased (current legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G06F9/3013 Organisation of register space, e.g. banked or distributed register file according to data content, e.g. floating-point registers, address registers
    • G06F9/30043 LOAD or STORE instructions; Clear instruction
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/3016 Decoding the operand specifier, e.g. specifier format
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824 Operand accessing
    • G06F9/3834 Maintaining memory consistency

Definitions

  • the present technique relates to the handling of ordering constrained access operations.
  • TSO: total store order
  • RCpc: release consistency, processor consistent
  • an apparatus comprising: processing circuitry to perform operations; instruction decoder circuitry to decode instructions to control the processing circuitry to perform the operations specified by the instructions; and a set of registers to hold data values for access by the processing circuitry; wherein the instruction decoder circuitry is responsive to an ordering constrained access instruction used to access multiple data values, and providing register indication information and memory address information, to control the processing circuitry to perform a sequence of access operations, where each access operation causes a data value from amongst the multiple data values to be moved between an associated register determined from the register indication information and an associated memory address determined from the memory address information; and wherein an ordering indication is derived from the ordering constrained access instruction and used to determine an order in which the multiple data values are to be accessed when performing the sequence of access operations.
  • a method of handling ordering constrained access operations in an apparatus having processing circuitry to perform operations comprising: employing instruction decoder circuitry to decode instructions to control the processing circuitry to perform the operations specified by the instructions; employing a set of registers to hold data values for access by the processing circuitry; causing the instruction decoder circuitry, in response to an ordering constrained access instruction used to access multiple data values, and providing register indication information and memory address information, to control the processing circuitry to perform a sequence of access operations, where each access operation causes a data value from amongst the multiple data values to be moved between an associated register determined from the register indication information and an associated memory address determined from the memory address information; and determining, responsive to an ordering indication derived from the ordering constrained access instruction, an order in which the multiple data values are to be accessed when performing the sequence of access operations.
  • a computer program for controlling a host data processing apparatus to provide an instruction execution environment comprising: processing program logic to perform operations; instruction decode program logic to decode instructions to control the processing program logic to perform the operations specified by the instructions; and register emulating program logic to emulate a set of registers to hold data values for access by the processing program logic; wherein the instruction decode program logic is responsive to an ordering constrained access instruction used to access multiple data values, and providing register indication information and memory address information, to control the processing program logic to perform a sequence of access operations, where each access operation causes a data value from amongst the multiple data values to be moved between an associated register determined from the register indication information and an associated memory address determined from the memory address information; and wherein an ordering indication is derived from the ordering constrained access instruction and used to determine an order in which the multiple data values are to be accessed when performing the sequence of access operations.
  • Figure 1 is a block diagram of a system incorporating an apparatus in accordance with one example implementation
  • Figures 2A and 2B are diagrams schematically illustrating fields provided within an ordering constrained access instruction used to access multiple data values, in accordance with one example implementation
  • Figure 3 is a flow diagram illustrating how an ordering constrained access instruction used to access multiple data values is processed in accordance with one example implementation
  • Figures 4A and 4B are diagrams schematically illustrating how observability requirements are met when processing an ordering constrained access instruction used to access multiple data values, in accordance with one example implementation
  • Figure 5 is a flow diagram illustrating how an ordering indication is derived in accordance with one example implementation
  • Figure 6 is a flow diagram illustrating one example implementation where decoder circuitry decomposes an ordering constrained access instruction used to access multiple data values into a sequence of single access ordering constrained access instructions
  • Figure 7 illustrates an example simulator implementation that may be used
  • an apparatus has processing circuitry for performing operations, and instruction decoder circuitry to decode instructions in order to control the processing circuitry to perform the operations specified by those instructions.
  • a set of registers is provided, where each register can be used to hold a data value for access by the processing circuitry.
  • the data value may comprise one or more data elements, and the term “data value” as used herein refers to the block of data that can be held within a single register.
  • traditionally, a single access ordering constrained access instruction, such as a load acquire instruction or a store release instruction, may be used.
  • when executing such an instruction, a single register will be accessed (either as the source for a store operation or as the destination for a load operation).
  • the processing circuitry will ensure that an ordering constraint is met so as to satisfy certain observability requirements. For example, for a load acquire instruction, the processing circuitry will ensure that any access instruction (either load or store) appearing in program order after the load acquire instruction will only be observed (for example by any other processing element in the system) as being executed after execution of the load acquire instruction has completed. Expressed another way, the load operation associated with the load acquire instruction will be ordered before any access operation associated with another access instruction appearing in program order after the load acquire instruction.
  • for a store release instruction, the processing circuitry will ensure that any access instruction that is observed as having completed execution before the store release instruction is observed as being executed will be an access instruction appearing in program order prior to the store release instruction. Expressed another way, any access instruction appearing in program order prior to the store release instruction will have its associated access operation ordered before the store operation associated with the store release instruction.
  • the present technique provides an ordering constrained access instruction that is used to access multiple data values.
  • the instruction decoder circuitry is responsive to such an instruction to control the processing circuitry to perform a sequence of access operations.
  • Each access operation causes a data value from amongst the multiple data values to be moved between an associated register (determined from register indication information provided by the instruction) and an associated memory address (determined from memory address information provided by the instruction).
  • an ordering indication is derived from the ordering constrained access instruction and is used to determine an order in which the multiple data values are to be accessed when performing the sequence of access operations.
  • when a series of single access ordering constrained access instructions is executed, any observing entity in the system will observe the accesses performed by that series of instructions in a particular order, namely the order in which those single access ordering constrained access instructions appear in program order.
  • the ordering indication may in one example implementation be determined by the instruction decoder circuitry when decoding the ordering constrained access instruction.
  • the ordering indication may be determined by the processing circuitry based on the information provided to it by the instruction decoder circuitry.
  • the ordering indication can take a variety of forms.
  • the ordering indication is used to identify an order in which the memory addresses associated with the multiple data values are to be accessed, thereby determining the order in which the multiple data values are to be accessed.
  • the ordering indication may identify whether the lowest memory address should be accessed first, or the highest memory address should be accessed first.
  • the instruction decoder circuitry is arranged to control the processing circuitry to ensure that, for observing circuitry coupled to the apparatus and able to observe the access operations performed by the processing circuitry, a given access operation in the sequence of access operations performed when executing the ordering constrained access instruction is observable by the observing circuitry as having completed before performance of any subsequent access operation following the given access operation in the sequence of access operations is observable by the observing circuitry.
  • for example, if the sequence of access operations includes access operation A followed by access operation B, and an observer is able to observe any part of access operation B, the above requirement will ensure that the observer is also able to observe the entirety of access operation A. It is worth noting that this also implies the reverse observing condition, namely that if the observer cannot observe the entirety of access operation A, then it will not be able to observe any part of access operation B.
  • each data value may comprise a plurality of data elements, in which case the given access operation may be performed as multiple accesses.
  • in such cases, the processing circuitry may be arranged to ensure that all of the multiple accesses forming the given access operation are observable by the observing circuitry as having completed before performance of any subsequent access operation following the given access operation in the sequence of access operations is observable by the observing circuitry.
  • the ordering indication may be directly encoded within a field of the ordering constrained access instruction, and hence the ordering indication can be determined from an analysis of that field.
  • the ordering indication may be derived from other information encoded into the ordering constrained access instruction.
  • the ordering constrained access instruction is arranged to specify an addressing mode used to determine the memory addresses for the multiple data values from the memory address information, and the ordering indication may be derived in dependence on the addressing mode.
  • the memory address information may be arranged to provide a memory address indication used to determine one memory address (this may for example be, but does not need to be, the first memory address to be accessed in accordance with the ordering indication), the addressing mode may identify an adjustment direction used at least during determination of each other memory address, and the ordering indication may be determined in dependence on the adjustment direction.
  • in some implementations, the addressing mode may identify not just an adjustment direction, but also provide information enabling determination of an adjustment amount.
  • the adjustment direction and adjustment amount information can then be used to determine each memory address in the sequence. For example, based on such addressing mode information, it may be possible to construct a sequence of increasing or decreasing memory addresses, each separated from each other by the adjustment amount.
  • the adjustment direction and adjustment amount may also be used when determining the one memory address that is determined from the memory address indication, for example when the addressing mode identifies a pre-decrementing mechanism that causes the memory address to be determined by decrementing by the adjustment amount a memory address determined from the memory address indication.
  • although the order in which the memory addresses are to be accessed so as to meet the observability constraints could differ from the way in which the sequence of addresses is determined based on the addressing mode, it has been found that there is often an inherent link between the ordering of the accesses required to meet the observability constraints and the adjustment direction provided by the addressing mode. Hence the addressing mode information can often be re-used to determine the ordering indication, either by itself or in combination with other information provided by the instruction.
  • the ordering indication may be derived in dependence on an instruction type of the ordering constrained access instruction.
  • this instruction type information may be used in combination with the earlier-mentioned addressing mode information.
  • the instruction type is arranged to identify whether the ordering constrained access instruction is a load instruction seeking to load the multiple data values from memory into a plurality of the registers, or a store instruction seeking to store the multiple data values from the plurality of registers to memory.
  • the ordering indication is derived from information encoded into the ordering constrained access instruction identifying whether the ordering constrained access instruction is to be used to perform a stack-type access operation or is to be used to perform a non-stack-type access operation. This may be explicitly identified within the instruction, for example by identifying that the ordering constrained access instruction is using a stack pointer to identify the required memory addresses, or alternatively the fact that the ordering constrained access instruction is being used to perform a stack-type access operation may be inferred from other information within the instruction, for instance where a particular addressing mode is reserved for use when performing stack-type access operations.
  • a pre-decrementing addressing mode may be reserved for use in performing stack-type store operations, and hence the ordering indication may be determined based on whether the ordering constrained access instruction is performing a store operation and whether that store operation is a stack-type operation. If that is the case, then one form of ordering indication may be determined, whereas if that is not the case then an opposite ordering indication may be determined.
  • the ordering constrained access instruction is arranged to access a pair of data values, and provides register indication information sufficient to identify a register associated with each data value in the pair of data values.
  • the memory address information may be arranged to provide a memory address indication used to determine one memory address to be associated with one of the data values in the pair of data values, and then a further memory address to be associated with the other of the data values in the pair may be derived from the one memory address. For instance, the further memory address may be inferred once the one memory address has been determined, based on the addressing mode information.
  • the ordering constrained access instruction used to access multiple data values can take a variety of forms.
  • the ordering constrained access instruction is a store release instruction used to store multiple data values to memory, and the instruction decoder circuitry is arranged, on decoding the store release instruction, to control the processing circuitry to ensure:
  • alternatively, the ordering constrained access instruction may be a load acquire instruction used to load multiple data values into associated registers in the set of registers.
  • the instruction decoder circuitry may be arranged, on decoding the load acquire instruction, to control the processing circuitry to ensure:
  • the instruction decoder circuitry may handle the ordering constrained access instruction in order to appropriately control the processing circuitry to perform the specified access operations whilst meeting the required observability conditions.
  • the instruction decoder circuitry is arranged to decompose the ordering constrained access instruction used to access multiple data values into a sequence of single access ordering constrained access instructions, where each single access ordering constrained access instruction is arranged to access one of the data values amongst the multiple data values.
  • the instruction decoder circuitry is arranged to determine the order in which to control the processing circuitry to execute each single access ordering constrained access instruction in the sequence dependent on the ordering indication.
  • it may decompose the ordering constrained access instruction into a series of single access ordering constrained access instructions, and then either cause the processing circuitry to execute that series in a default order or in a reverse order, depending on the ordering indication determined from the ordering constrained access instruction.
  • FIG. 1 is a block diagram of a system incorporating an apparatus in accordance with one example implementation.
  • the apparatus may take the form of either the processor core 10 or the processor core 20 in the example of Figure 1, and as will be apparent from Figure 1 both of these processor cores may be constructed in an identical manner.
  • in the following discussion, it is assumed that processor core 10 is executing a sequence of instructions that includes one or more instances of the earlier-discussed ordering constrained access instruction used to access multiple data values, and that the processor core 20 is an observing entity for those accesses.
  • however, it could equally be the case that the processor core 20 is executing such instructions and the processor core 10 is an observer, and indeed both situations could occur within the same system.
  • a number of additional processing elements may also be provided within the system, that may operate to execute such instructions, and/or be observers of accesses made by other processing elements.
  • the processor cores 10, 20 are coupled to an interconnect 30 via which they share access to memory 45.
  • the interconnect can take a variety of forms, but in the example shown is a coherent interconnect that may include a system cache 35 accessible to both of the processor cores 10, 20, and associated cache coherency circuitry 40 to ensure that each of the processor cores has a coherent view of the data stored within the caches of the system.
  • there may also be one or more other levels of cache, for example one or more levels of local cache 15, 25 accessible to the respective processor cores 10, 20.
  • the cache coherency circuitry 40 can employ any of a number of known cache coherency schemes to ensure that each processor core 10, 20 will access the most up-to-date version of the data cached within the system in response to issuing a request to access that data.
  • the processor core 10 may include an instruction decoder 50 for decoding instructions fetched from memory or one of the caches, in order to generate control signals that are then used to control the processing circuitry 52 to perform the operations required by those instructions.
  • the processing circuitry 52 has access to a set of registers 54 in which data values to be used as inputs to the operations may be stored, and in which the output results generated by those operations may be stored.
  • Some of the instructions executed by the processor core 10 may cause access operations to be performed by the processing circuitry 52 in order to load data values from the memory/caches into the registers 54 (in this instance the access operations being load operations), and/or to store data values from the registers 54 to the memory/caches (in this instance the access operations being store operations).
  • when store operations are to be performed, they may be temporarily buffered within the store buffer 56, and the processor core 10 may be able to perform some reordering of the store operations held within the store buffer in order to seek to improve performance.
  • a load buffer 58 may be provided to temporarily buffer load operations to be performed by the processor core. In an out-of-order processor, it may be possible to reorder certain load operations in situations where the address computation time varies between different load instructions, and in such cases the presence of a load buffer 58 can be useful.
  • the processor core 20 is constructed in an identical manner to the processor core 10, and hence includes an instruction decoder 60, processing circuitry 62, a set of registers 64, a store buffer 66 and optionally a load buffer 68.
  • the system will also typically employ a memory consistency model in respect of memory in order to ensure that the results of reading, writing or updating memory will be predictable.
  • Some systems may employ a relatively weak consistency model to allow flexibility in the way in which accesses to memory may be reordered by particular processing elements within the system, but in some instances it may be desired to emulate the behaviour of a stronger consistency model than is inherently supported by the system.
  • one way of achieving this is to use load acquire and store release instructions instead of standard load and store instructions. In particular, when a processing element executes a load acquire instruction or a store release instruction, certain observability constraints are ensured so that another processing element in the system observing the accesses performed by a given processing element will observe those accesses as having occurred in a particular program order, even if some local reordering is performed by the given processing element.
  • new forms of load acquire and store release instructions (referred to herein as ordering constrained access instructions that are used to access multiple data values) are provided that are able to specify multiple data values to be accessed, and in particular which, when executed, will cause a series of access operations to be performed in respect of multiple registers, each access operation causing a data value from amongst the multiple data values to be moved between an associated register and an associated memory address (from the register to memory in the event of a store release instruction, and from the memory to a register in the event of a load acquire instruction).
  • an ordering indication is derived from the ordering constrained access instruction and is used to determine an order in which the multiple data values are to be accessed when performing the series of access operations.
  • by using such an ordering indication, it is possible to ensure that the individual accesses are externally observable in the required order, and in particular to meet the same observability requirements that would be met had multiple single access ordering constrained access instructions (i.e. a series of traditional load acquire or store release instructions, each accessing one data value) been executed instead of the new ordering constrained access instruction used to access multiple data values.
  • Figure 2A is a diagram schematically illustrating fields that may be provided within an ordering constrained access instruction 100 of the type used to access multiple data values, in accordance with one example implementation.
  • a first field 105 is used to specify the instruction type, and hence for example may identify whether the instruction is a load or store instruction. It may also optionally specify additional information, such as whether the instruction is operating on a stack in memory or instead is operating on a non-stack region of memory.
  • a further field 110 provides an addressing mode, and is used in combination with the memory address information in the memory address field 125 to determine the memory addresses associated with each of the data values to be accessed.
  • the memory address information in the field 125 may for instance give sufficient information to enable one of the addresses to be determined, for example by providing a stack pointer indication used to identify a stack pointer, or by identifying a register whose contents may be used to determine the memory address. In this latter case, it may for example be the case that the data value in that identified register is used as an offset to add to some base address in order to determine the memory address.
  • the addressing mode information can then be used to compute each of the other addresses required, and indeed in some instances can also be taken into account when computing the first memory address from the memory address information in the field 125.
  • the addressing mode may for example indicate an adjustment direction, such as whether each subsequent address is to be determined by incrementing the previously determined address, or by decrementing the previously determined address.
  • the addressing mode may also provide an adjustment amount in some implementations, so as to allow configurability as to the amount of the adjustment to be made when computing each subsequent address.
  • a register indication field 120 is also provided to store register indication information. This information can be used to determine a register identifier for each register to be accessed when executing the ordering constrained access instruction.
  • each register may be explicitly identified within the register indication field 120. However, in an alternative implementation, one register may be identified, and each additional register may be inferred, for example in situations where the instruction operates in respect of a series of adjacent registers, or registers separated by a predetermined amount.
  • an ordering indication field 115 is provided as an explicit field, in which ordering indication information may be stored to identify the order in which each of the access operations required to execute the ordering constrained access instruction are to be performed.
  • in the example of Figure 2A, an explicit ordering indication field is used.
  • however, in alternative implementations there may be no need for an explicit ordering indication field, and instead it may be possible to derive the ordering indication from other information provided within the instruction.
  • the ordering constrained access instruction 130 that is used to access multiple data values includes the earlier described fields 105, 110, 120, 125, but no explicit ordering indication field 115.
  • in this alternative implementation, the ordering indication can be inferred from other information in the instruction; in one example implementation, this is achieved with reference to both the instruction type information in the field 105 and the addressing mode information in the field 110.
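The address and register derivation described above can be sketched as a small Python model. This is a hypothetical illustration, not the actual instruction encoding: `AddressingMode`, `compute_addresses` and `infer_registers` are invented names, the adjustment amount is assumed to be expressed in bytes, and inferred registers are assumed to follow the first encoded register at a fixed stride.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AddressingMode:
    """Hypothetical decoded contents of the addressing mode field 110."""
    increment: bool  # adjustment direction: True = increment, False = decrement
    amount: int      # adjustment amount in bytes (assumed to be configurable)


def compute_addresses(first_address: int, mode: AddressingMode, count: int) -> List[int]:
    """Derive one memory address per data value from the first address (field 125)."""
    step = mode.amount if mode.increment else -mode.amount
    # Each subsequent address is the previously determined address adjusted by `step`.
    return [first_address + i * step for i in range(count)]


def infer_registers(first_register: int, count: int, stride: int = 1) -> List[int]:
    """Infer the additional registers when only one register is encoded (field 120)."""
    # stride=1 models a series of adjacent registers; a larger stride models
    # registers separated by a predetermined amount.
    return [first_register + i * stride for i in range(count)]


if __name__ == "__main__":
    mode = AddressingMode(increment=True, amount=8)
    print([hex(a) for a in compute_addresses(0x1000, mode, 2)])
    print(infer_registers(0, 2))
```

Keeping the direction and amount separate mirrors the text's distinction between an adjustment direction and an optional configurable adjustment amount.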
  • Figure 3 is a flow diagram illustrating how the ordering constrained access instruction may be processed in accordance with one example implementation.
  • at step 200, when it is determined that an ordering constrained access instruction for accessing multiple data values has been encountered, it is determined at step 205 what type of instruction is being executed, for example whether the instruction is a load acquire instruction or a store release instruction.
  • the ordering indication is then determined. As discussed earlier, this may be determined with reference to an explicit field within the instruction, or may instead be derived from other information, for example from the addressing mode and/or the instruction type.
  • the memory address for each data value is determined, using the memory address information in the field 125 and the addressing mode information in the addressing mode field 110.
  • the register associated with each data value is determined, using the information in the register indication field 120.
  • the processing circuitry is then constrained to perform the accesses to each data value in the order indicated by the ordering indication.
  • the instruction decoder 50 may be arranged, on decoding the store release instruction, to control the processing circuitry 52 to ensure:
  • the instruction decoder 50 may be arranged, on decoding the load acquire instruction, to control the processing circuitry 52 to ensure:
  • whilst steps 205, 210, 215 and 220 are shown sequentially, it will be appreciated that one or more of these steps may be performed in parallel, dependent on the implementation.
  • which steps are performed by the decoder and which are performed by the processing circuitry may also vary dependent on implementation.
  • the decoder may be arranged to determine which register operands are to be used, and the addressing mode.
  • the processing circuitry can then be arranged to implement the addressing mode in order to perform the required access operations whilst enforcing the required ordering as determined from the ordering indication.
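The flow of Figure 3 can be modelled in a few lines of Python. This is a sequential, single-threaded sketch only: memory is a dictionary keyed by address, `MultiAccess`, `plan` and `execute` are hypothetical names, and the ordering requirement is represented simply by performing the accesses one at a time in the indicated order, which is a simplification of real observability rules.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Kind(Enum):
    LOAD_ACQUIRE = "load acquire"
    STORE_RELEASE = "store release"


@dataclass
class MultiAccess:
    """Hypothetical decoded ordering constrained instruction accessing several values."""
    kind: Kind
    base: int             # first memory address
    step: int             # signed adjustment between consecutive addresses
    registers: List[int]  # one register per data value
    highest_first: bool   # ordering indication (explicit field or derived)


def plan(instr: MultiAccess) -> List[Tuple[int, int]]:
    """Pair each register with its address, then order the pairs by the ordering indication."""
    pairs = [(reg, instr.base + i * instr.step) for i, reg in enumerate(instr.registers)]
    return sorted(pairs, key=lambda p: p[1], reverse=instr.highest_first)


def execute(instr: MultiAccess, regs: Dict[int, int], mem: Dict[int, int]) -> List[int]:
    """Perform the accesses one at a time in the planned order; return the address order."""
    order = []
    for reg, addr in plan(instr):
        if instr.kind is Kind.LOAD_ACQUIRE:
            regs[reg] = mem.get(addr, 0)   # memory -> register
        else:
            mem[addr] = regs[reg]          # register -> memory
        order.append(addr)
    return order
```

For a store release with `highest_first=True`, `execute` writes the higher address first; a real processor would instead enforce the equivalent observability ordering, possibly while overlapping other work.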
  • Figures 4A and 4B are diagrams illustrating how the observability requirements are met between the individual access operations required to execute the above-described ordering constrained access instruction.
  • in these examples, the ordering constrained access instruction causes a pair of data values to be accessed: for a load acquire instruction, the data values are moved from memory into a pair of registers, and for a store release instruction, they are moved from a pair of registers to memory.
  • the two access operations required to access each data value in the pair of data values are referred to in Figures 4A and 4B as access operations 1 and 2.
  • Figure 4B illustrates a scenario where access operation 1 needs to be broken down into multiple separate accesses, in this example a first access and a second access. There are various reasons why this may occur, but in the example of Figure 4B it is assumed that the data values to be accessed are spread across two cache lines, and hence separate accesses to each of those cache lines are required. This means that the performance of access operation 1 is not itself atomic. Nevertheless, even in that scenario, the processing circuitry is constrained to ensure that the above observability constraint between access operation 1 and access operation 2 is met.
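The Figure 4B scenario can be approximated by splitting an access at cache line boundaries and keeping every piece of one access operation ahead of the next. This sketch assumes a 64-byte line size (the real size is implementation defined), and `split_access` and `schedule` are hypothetical names. It models a strict issue order, which is stronger than what observability alone requires.

```python
from typing import List, Tuple

LINE_SIZE = 64  # assumed cache line size in bytes; implementation defined in practice


def split_access(addr: int, size: int, line_size: int = LINE_SIZE) -> List[Tuple[int, int]]:
    """Split one access into (address, length) pieces that each stay within a cache line."""
    pieces = []
    end = addr + size
    while addr < end:
        line_end = (addr // line_size + 1) * line_size
        length = min(end, line_end) - addr
        pieces.append((addr, length))
        addr += length
    return pieces


def schedule(accesses: List[Tuple[int, int]]) -> List[Tuple[int, Tuple[int, int]]]:
    """Issue every piece of access operation N before any piece of access operation N+1.

    Access operation 1 may be non-atomic once it is split, but in this in-order model
    no piece of access operation 2 is issued until all pieces of access operation 1 are.
    """
    result = []
    for index, (addr, size) in enumerate(accesses, start=1):
        for piece in split_access(addr, size):
            result.append((index, piece))
    return result


if __name__ == "__main__":
    # Access operation 1 straddles the 64-byte boundary at 0x40; access operation 2 does not.
    for entry in schedule([(0x3C, 8), (0x44, 8)]):
        print(entry)
```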
  • Figure 5 is a flow diagram illustrating a particular example implementation used to determine the ordering indication from other information present within the ordering constrained access instruction.
  • at step 300, it is assumed that an ordering constrained access instruction used to access multiple data values has been encountered, and it is determined whether that instruction is a store release instruction. In this example implementation, if the instruction is not a store release instruction, the process proceeds directly to step 315, where the ordering indication is determined to identify that the lowest memory address should be accessed first.
  • if the instruction is a store release instruction, then at step 305 it is determined whether the instruction is performing a stack-type operation. This could be determined in a variety of ways. For example, it may be explicitly identified within the instruction, for instance by the store release instruction using a stack pointer to identify the required memory addresses. Alternatively, it may be possible to infer this information from other information provided within the instruction. For example, if a particular addressing mode is reserved for use when performing stack-type store operations, then the presence of that addressing mode may be used at step 305 to determine that the instruction is intended to perform a stack-type operation.
  • if the instruction is not performing a stack-type operation, the process proceeds to step 315, where the ordering indication is determined to identify that the lowest memory address should be accessed first.
  • if at step 305 the instruction is determined to be performing a stack-type operation, the process may proceed directly to step 320, where the ordering indication is determined to identify that the highest memory address should be accessed first.
  • alternatively, a further check may be made to determine whether the addressing mode is a pre-decrement addressing mode. If not, the process proceeds to step 315, where the ordering indication is determined to identify that the lowest memory address is to be accessed first; if the addressing mode is pre-decrement, the process proceeds to step 320, where the ordering indication is determined to identify that the highest memory address should be accessed first.
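The Figure 5 decisions reduce to a small function. The boolean inputs are assumptions about what the decoder has already extracted from the instruction (for example, whether a stack pointer is the base register), and `check_addressing_mode` is a hypothetical switch selecting the alternative in which the pre-decrement check is also applied.

```python
from enum import Enum


class Ordering(Enum):
    LOWEST_FIRST = "lowest memory address accessed first"
    HIGHEST_FIRST = "highest memory address accessed first"


def derive_ordering(is_store_release: bool, is_stack_operation: bool,
                    is_pre_decrement: bool, check_addressing_mode: bool = False) -> Ordering:
    """Derive the ordering indication following the decisions of Figure 5 (sketch)."""
    if not is_store_release:
        return Ordering.LOWEST_FIRST   # step 300 -> step 315
    if not is_stack_operation:
        return Ordering.LOWEST_FIRST   # step 305 -> step 315
    if check_addressing_mode and not is_pre_decrement:
        return Ordering.LOWEST_FIRST   # optional addressing mode check -> step 315
    return Ordering.HIGHEST_FIRST      # step 320
```

Because load acquire instructions always fall through to lowest-address-first here, only stack-style store release instructions change the access order.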
  • Execution of the first store release instruction would cause the data in register X1 to be stored to a location in a stack in memory determined by pre-decrementing the provided stack pointer value by eight bytes to generate a new stack pointer value.
  • Execution of the second store release instruction would then cause the data in register X0 to be stored to a location in a stack in memory determined by pre-decrementing the stack pointer value generated through execution of the first STLR instruction, again by eight bytes, to create an updated stack pointer value.
  • the “P” indicates that the store release instruction is to be executed on a pair of registers, namely the identified registers X0 and X1, and the ordering indication derived from the instruction identifies that the highest memory address should be accessed first (in one embodiment this can be determined from a combination of the instruction being a store release instruction, and the pre-decrement addressing mode being used).
  • the pre-decrement amount is 16 bytes, so the stack pointer is pre-decremented by 16 bytes, enabling the eight bytes of data from register X0 to be stored in the lower half of the 16-byte region on the stack and the eight bytes of data from register X1 to be stored in the upper half.
  • the stores will be processed such that the store of X1 is ordered before the store of X0.
  • Two separate store operations can be performed to implement the execution of this instruction, and the processing circuitry will be constrained to ensure that the earlier-mentioned observability requirements between the individual store operations are met.
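The equivalence claimed for the stack example can be checked with a small model. `two_single_stores` and `pair_store` are hypothetical names; the model only records (address, register, value) writes in the order they are performed and updates the stack pointer. It assumes 64-bit registers and a downward-growing stack, as described above.

```python
from typing import List, Tuple

Write = Tuple[int, str, int]  # (address, source register name, value)


def two_single_stores(sp: int, x0: int, x1: int) -> Tuple[int, List[Write]]:
    """Two single-register store releases, each pre-decrementing SP by 8 (X1 then X0)."""
    writes = []
    sp -= 8
    writes.append((sp, "X1", x1))
    sp -= 8
    writes.append((sp, "X0", x0))
    return sp, writes


def pair_store(sp: int, x0: int, x1: int) -> Tuple[int, List[Write]]:
    """One pair store release with a 16-byte pre-decrement, highest address first."""
    sp -= 16
    # X0 occupies the lower 8 bytes and X1 the upper 8 bytes of the 16-byte slot;
    # the ordering indication (highest address first) orders the X1 store first.
    writes = [(sp + 8, "X1", x1), (sp, "X0", x0)]
    return sp, writes


if __name__ == "__main__":
    assert two_single_stores(0x1000, 0xA, 0xB) == pair_store(0x1000, 0xA, 0xB)
    print("same final SP, same memory layout, same store order")
```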
  • STLR X1, [X2, 8] could be replaced by the single new store release instruction of the form:
  • the addressing mode may identify post increment changes to the address by 8 bytes, with a first store access operation being used to store the data in register X0 to a memory address determined from the contents of the register X2, and a second store access operation being used to store the data in register X1 to a memory address determined by incrementing the address determined for the first access operation by 8 bytes.
  • the ordering indication could in one embodiment be derived from the addressing mode, and may indicate that the accesses should be performed to the lowest memory address first.
  • when executing this new form of store release instruction, the processing circuitry will be constrained to ensure that the earlier-mentioned observability requirements between the individual store operations are met.
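The post-increment case can be modelled the same way. This sketch assumes the replaced sequence is a store of X0 to [X2] followed by STLR X1, [X2, 8], as the surrounding text suggests. `separate_stores` and `post_increment_pair_store` are hypothetical names, and any write-back of the incremented address to X2 is deliberately not modelled.

```python
from typing import List, Sequence, Tuple

Write = Tuple[int, str, int]  # (address, source register name, value)


def separate_stores(x2: int, x0: int, x1: int) -> List[Write]:
    """Two single store releases: X0 to [X2], then X1 to [X2 + 8]."""
    return [(x2, "X0", x0), (x2 + 8, "X1", x1)]


def post_increment_pair_store(x2: int, sources: Sequence[Tuple[str, int]],
                              amount: int = 8) -> List[Write]:
    """Hypothetical single store release with post-increment addressing, lowest address first."""
    writes = []
    addr = x2
    for name, value in sources:
        writes.append((addr, name, value))  # use the current address...
        addr += amount                      # ...then post-increment it by `amount`
    return writes
```

Because the ordering indication here is lowest-address-first, the single instruction performs its stores in the same order as the original two-instruction sequence.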
  • the instruction decoder 50 may handle an ordering constrained access instruction of the above type in order to appropriately control the processing circuitry to perform the specified access operations whilst meeting the required observability conditions.
  • the decoder may be arranged at step 400 to decompose the ordering constrained access instruction that is used to access multiple data values into a sequence of single access ordering constrained access instructions, each of which is used to access one of the data values.
  • the new form of load acquire or store release instructions that are used to access multiple data values can be broken down into a series of existing load acquire or store release instructions, each of which performs an access in respect of the data value associated with a single register.
  • the ordering indication can be determined using any of the techniques discussed earlier. Based on the determined ordering indication, then at step 410 the decoder can determine the order in which to control the processing circuitry to execute each of the single access ordering constrained access instructions. Hence, in one example implementation, the series may be executed in the originally determined decomposed order, for example if the ordering indication indicates that the lowest memory address should be accessed first, but if instead the ordering indication indicates that the highest memory address should be accessed first, the decoder may reverse the order in which the single access ordering constrained access instructions are executed. This provides a particularly simple and efficient mechanism for implementing the handling of these new load acquire and store release instructions.
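The decomposition performed by the decoder at steps 400 and 410 can be sketched as follows. This assumes the decomposed series is first emitted in ascending address order; `SingleAccess` and `decompose` are hypothetical names standing in for the existing single-register load acquire and store release instructions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SingleAccess:
    """Hypothetical single-register ordering constrained access produced by the decoder."""
    op: str        # e.g. "store release" or "load acquire"
    register: str
    address: int


def decompose(op: str, registers: List[str], base: int, size: int,
              highest_first: bool) -> List[SingleAccess]:
    """Split a multi-register access into single accesses and order them."""
    # Step 400: one single access per data value, emitted in ascending address order.
    series = [SingleAccess(op, reg, base + i * size) for i, reg in enumerate(registers)]
    # Step 410: keep that order for lowest-address-first, otherwise reverse it.
    return list(reversed(series)) if highest_first else series
```

Reversing an already-built list is the "particularly simple" mechanism the text describes: no per-access reordering logic is needed beyond a single direction flag.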
  • Figure 7 illustrates a simulator implementation that may be used. Whilst the earlier described examples implement the present invention in terms of apparatus and methods for operating specific processing hardware supporting the techniques concerned, it is also possible to provide an instruction execution environment in accordance with the examples described herein which is implemented through the use of a computer program. Such computer programs are often referred to as simulators, insofar as they provide a software based implementation of a hardware architecture. Varieties of simulator computer programs include emulators, virtual machines, models, and binary translators, including dynamic binary translators. Typically, a simulator implementation may run on a host processor 515, optionally running a host operating system 510, supporting the simulator program 505.
  • there may be multiple layers of simulation between the hardware and the provided instruction execution environment, and/or multiple distinct instruction execution environments provided on the same host processor.
  • historically, powerful processors have been required to provide simulator implementations which execute at a reasonable speed, but such an approach may be justified in certain circumstances, such as when there is a desire to run code native to another processor for compatibility or re-use reasons.
  • the simulator implementation may provide an instruction execution environment with additional functionality which is not supported by the host processor hardware, or provide an instruction execution environment typically associated with a different hardware architecture.
  • An overview of simulation is given in “Some Efficient Architecture Simulation Techniques”, Robert Bedichek, Winter 1990, USENIX Conference, Pages 53 to 63.
  • in a simulated implementation, equivalent functionality to the hardware constructs or features described above may be provided by suitable software constructs or features.
  • particular circuitry may be provided in a simulated implementation as computer program logic.
  • similarly, memory hardware, such as a register or cache, may be provided in a simulated implementation as a software data structure.
  • the physical address space used to access memory in the hardware apparatus could be emulated as a simulated address space which is mapped on to the virtual address space used by the host operating system 510 by the simulator 505.
  • some simulated implementations may make use of the host hardware, where suitable.
  • the simulator program 505 may be stored on a computer readable storage medium (which may be a non-transitory medium), and provides a virtual hardware interface (instruction execution environment) to the target code 500 (which may include applications, operating systems and a hypervisor) which is the same as the hardware interface of the hardware architecture being modelled by the simulator program 505.
  • the program instructions of the target code 500 may be executed from within the instruction execution environment using the simulator program 505, so that a host computer 515 which does not actually have the hardware features of the apparatus discussed above can emulate those features.
  • the simulator program may include processing program logic 520 to emulate the behaviour of the processing circuitry 52, 62, instruction decode program logic 525 to emulate the behaviour of the instruction decoder 50, 60, and register emulating program logic 522 to maintain data structures to emulate the set of registers 54, 64.
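A toy model of the simulator arrangement may help: register emulating logic becomes a Python list, and the simulated address space is a `bytearray` held in host memory. Everything here is hypothetical (`ToySimulator` is not an Arm simulator API). Only two illustrative pair instructions are modelled, index 31 is used as the stack pointer for convenience, and ordering is derived as in the examples above (pre-decrement store release accesses the highest address first).

```python
class ToySimulator:
    """Hypothetical toy model of simulator program 505 (illustration only)."""

    def __init__(self, mem_size: int = 4096) -> None:
        self.x = [0] * 32               # X0..X30, with index 31 used here as SP
        self.mem = bytearray(mem_size)  # simulated address space in host memory
        self.log = []                   # (operation, address) in the order performed

    def _store64(self, addr: int, value: int) -> None:
        self.mem[addr:addr + 8] = (value & (2**64 - 1)).to_bytes(8, "little")
        self.log.append(("store", addr))

    def _load64(self, addr: int) -> int:
        self.log.append(("load", addr))
        return int.from_bytes(self.mem[addr:addr + 8], "little")

    def store_release_pair(self, rt1: int, rt2: int, rn: int, offset: int,
                           pre_index: bool) -> None:
        """Store rt1 to the lower address and rt2 to the upper address."""
        base = self.x[rn] + offset if pre_index else self.x[rn]
        if pre_index:
            self.x[rn] = base           # write back the pre-indexed address
        accesses = [(base, rt1), (base + 8, rt2)]
        if pre_index and offset < 0:    # pre-decrement: highest address first
            accesses.reverse()
        for addr, rt in accesses:
            self._store64(addr, self.x[rt])

    def load_acquire_pair(self, rt1: int, rt2: int, rn: int) -> None:
        """Load two values from [rn] and [rn + 8], lowest address first."""
        base = self.x[rn]
        for addr, rt in [(base, rt1), (base + 8, rt2)]:
            self.x[rt] = self._load64(addr)
```

The access log stands in for observation order; a real simulator would also need to model the memory ordering seen by other simulated agents.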
  • the techniques described herein provide a particularly efficient mechanism for handling ordering constrained access operations such as load acquire and store release operations. By allowing a single load acquire or store release instruction to cause the data values associated with multiple registers to be accessed, they improve code density and hence performance, whilst ensuring that the required observability behaviour between the individual access operations used to implement that instruction is met.
  • the words “configured to...” are used to mean that an element of an apparatus has a configuration able to carry out the defined operation.
  • a “configuration” means an arrangement or manner of interconnection of hardware or software.
  • the apparatus may have dedicated hardware which provides the defined operation, or a processor or other processing device may be programmed to perform the function. “Configured to” does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)
  • Executing Machine-Instructions (AREA)
PCT/GB2023/050589 2022-04-07 2023-03-13 Technique for handling ordering constrained access operations Ceased WO2023194702A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
CN202380031691.9A CN118974699A (zh) 2022-04-07 2023-03-13 Technique for handling ordering-restricted access operations
KR1020247036464A KR20240167917A (ko) 2022-04-07 2023-03-13 Technique for handling ordering constraint access operations
EP23711543.1A EP4505292B1 (en) 2022-04-07 2023-03-13 Technique for handling ordering constrained access operations
JP2024558321A JP2025511310A (ja) 2022-04-07 2023-03-13 Technique for processing ordering-constrained access operations
US18/853,552 US20250190217A1 (en) 2022-04-07 2023-03-13 Technique for handling ordering constrained access operations
IL315456A IL315456A (en) 2022-04-07 2023-03-13 A technique for handling the arrangement of forced access operations

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB2205110.6A GB2617551B (en) 2022-04-07 2022-04-07 Technique for handling ordering constrained access operations
GB2205110.6 2022-04-07

Publications (1)

Publication Number Publication Date
WO2023194702A1 true WO2023194702A1 (en) 2023-10-12

Family

ID=81653133

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2023/050589 Ceased WO2023194702A1 (en) 2022-04-07 2023-03-13 Technique for handling ordering constrained access operations

Country Status (9)

Country Link
US (1) US20250190217A1
EP (1) EP4505292B1
JP (1) JP2025511310A
KR (1) KR20240167917A
CN (1) CN118974699A
GB (1) GB2617551B
IL (1) IL315456A
TW (1) TW202340938A
WO (1) WO2023194702A1

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2701463A (en) * 2024-10-15 2026-04-29 Arm Ltd Masked load/store instruction for gpu

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5887183A (en) * 1995-01-04 1999-03-23 International Business Machines Corporation Method and system in a data processing system for loading and storing vectors in a plurality of modes
US20030131205A1 (en) * 2002-01-04 2003-07-10 Huck Jerome C. Atomic transfer of a block of data
US20180095756A1 (en) * 2016-09-30 2018-04-05 Intel Corporation Processors, methods, systems, and instructions to load multiple data elements to destination storage locations other than packed data registers

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5201039A (en) * 1987-09-30 1993-04-06 Mitsubishi Denki Kabushiki Kaisha Multiple address-space data processor with addressable register and context switching
GB2549239A (en) * 2014-11-13 2017-10-18 Advanced Risc Mach Ltd Context sensitive barriers in data processing
US20190065199A1 (en) * 2017-08-31 2019-02-28 MIPS Tech, LLC Saving and restoring non-contiguous blocks of preserved registers

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ROBERT BEDICHEK: "Some Efficient Architecture Simulation Techniques", USENIX CONFERENCE, 1990, pages 53 - 63

Also Published As

Publication number Publication date
US20250190217A1 (en) 2025-06-12
CN118974699A (zh) 2024-11-15
KR20240167917A (ko) 2024-11-28
GB2617551A (en) 2023-10-18
EP4505292B1 (en) 2026-04-29
GB2617551B (en) 2024-08-28
JP2025511310A (ja) 2025-04-15
TW202340938A (zh) 2023-10-16
IL315456A (en) 2024-11-01
EP4505292A1 (en) 2025-02-12
GB202205110D0 (en) 2022-05-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23711543

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 315456

Country of ref document: IL

WWE Wipo information: entry into national phase

Ref document number: 202380031691.9

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 2024558321

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 18853552

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 202417074877

Country of ref document: IN

ENP Entry into the national phase

Ref document number: 20247036464

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2023711543

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2023711543

Country of ref document: EP

Effective date: 20241107

WWP Wipo information: published in national office

Ref document number: 18853552

Country of ref document: US