WO2017014752A1 - Implementation of load acquire/store release instructions using load/store operation with dmb operation - Google Patents
- Publication number
- WO2017014752A1 (PCT/US2015/041322)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- memory
- load
- operations
- instructions
- barrier
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G06F9/30043—LOAD or STORE instructions; Clear instruction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30087—Synchronisation or serialisation instructions
Abstract
Systems and methods are provided for simplifying load acquire and store release semantics that are used in reduced instruction set computing (RISC). Translating the semantics into micro-operations, or low-level instructions used to implement complex machine instructions, can avoid having to implement complicated new memory operations. Using one or more data memory barrier operations in conjunction with load and store operations can provide sufficient ordering, as a data memory barrier ensures that prior instructions are performed and completed before subsequent instructions are executed.
Description
IMPLEMENTATION OF LOAD ACQUIRE/STORE RELEASE INSTRUCTIONS USING LOAD/STORE OPERATION WITH DMB OPERATION
TECHNICAL FIELD
[0001] This disclosure relates to memory operation ordering in a computing environment.
BACKGROUND
[0002] In lock free computing, there are two ways in which threads can manipulate shared memory: they can compete with each other for a resource, or they can pass information co-operatively from one thread to another.
Acquire and release semantics are used to accomplish passing information cooperatively from one thread to another. Acquire and release semantics provide a structural system for ensuring that memory operations are ordered correctly to avoid errors. Store release instructions ensure that all previous instructions are completed, and load-acquire instructions ensure that all following instructions complete only after the load-acquire itself completes. To properly order memory operations using acquire and release semantics, complex combinations of store release and load acquire instructions are necessary.
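The cooperative hand-off that acquire and release semantics enable can be sketched with C11 atomics. This is an illustrative sketch, not the patent's implementation, and the function and variable names are hypothetical: a producer thread publishes a payload and then performs a store with release on a flag, while a consumer thread performs a load with acquire on the flag before reading the payload.

```c
#include <stdatomic.h>
#include <pthread.h>

/* Illustrative names, not from the patent. */
static int payload;
static atomic_int ready;

static void *producer(void *arg) {
    (void)arg;
    payload = 42;                                            /* plain store */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* store with release */
    return NULL;
}

static void *consumer(void *arg) {
    /* Spin until the load with acquire observes the flag. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;
    /* The acquire/release pair guarantees the payload store is visible here. */
    *(int *)arg = payload;
    return NULL;
}

int run_message_pass(void) {
    int out = 0;
    pthread_t p, c;
    atomic_store(&ready, 0);
    pthread_create(&c, NULL, consumer, &out);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return out;
}
```

Without the acquire/release pairing, the consumer could observe the flag before the payload store becomes visible; the ordering constraint is what makes the hand-off safe.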
SUMMARY
[0003] Disclosed herein is a system and method for simplifying load acquire and store release semantics that are used in reduced instruction set computing (RISC). Embodiments provide for ordering memory operations with respect to the instructions disclosed herein. A typical load with acquire instruction only requires that memory operations after the load with acquire are ordered after the load with acquire; it does not impose any order on the instructions before the load with acquire (both with respect to the load with acquire and to the subsequent instructions). In an embodiment, a load with acquire comprises a data memory barrier that is used in conjunction with a load operation, which guarantees that all accesses prior to and including the load with acquire are ordered before all accesses from instructions after the load with acquire.
[0004] Similarly, traditional store with release instructions impose ordering between the access from the store with release and the accesses of all prior instructions (but not subsequent instructions). In an embodiment, however, a data memory barrier at the beginning of the store with release provides a strong ordering between prior access and the access associated with the store with release.
[0005] In an embodiment, a system comprises a processor that executes computer-executable instructions to perform operations. The instructions can include a load with acquire instruction that performs memory operation ordering, wherein the load with acquire instruction comprises a load operation followed by a data memory barrier operation.
[0006] In another embodiment, a method comprises executing instructions in a processor. The method can include a load with acquire instruction for performing memory operation ordering, wherein executing the load with acquire instruction comprises executing a load operation followed by a data memory barrier operation.
[0007] In an embodiment, a system comprises a processor that executes computer-executable instructions to perform operations. The instructions can include a store with release instruction that performs memory operation ordering, wherein the store with release instruction comprises a first data memory barrier operation followed by a store operation followed by a second data memory barrier operation.
[0008] In an embodiment, a method comprises executing instructions in a processor. The method can include a store with release instruction for performing memory operation ordering, wherein executing the store with release instruction comprises executing a first data memory barrier operation followed by executing a store operation followed by executing a second data memory barrier operation.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] FIG. 1 is a block diagram illustrating an embodiment of a system that filters memory operations in accordance with various aspects described herein.
[0010] FIG. 2 is a block diagram illustrating an embodiment of a system that filters memory operations in accordance with various aspects described herein.
[0011] FIG. 3 is a block diagram illustrating an embodiment of a system that filters memory operations in accordance with various aspects described herein.
[0012] FIG. 4 is a block diagram illustrating an embodiment of a system that filters memory operations in accordance with various aspects described herein.
[0013] FIG. 5 illustrates a flow diagram of an embodiment of a method for executing a load with acquire instruction.
[0014] FIG. 6 illustrates a flow diagram of an embodiment of a method for executing a store with release instruction.
[0015] FIG. 7 illustrates a flow diagram of an embodiment of a method for filtering memory operations using a data memory barrier.
[0016] FIG. 8 illustrates a block diagram of an electronic computing environment that can be implemented in conjunction with one or more aspects described herein.
[0017] FIG. 9 illustrates a block diagram of a data communication network that can be operable in conjunction with various aspects described herein.
DETAILED DESCRIPTION
[0018] Various embodiments provide for a system that simplifies load acquire and store release semantics that are used in reduced instruction set computing (RISC). In lock free computing, there are two ways in which threads can manipulate shared memory: they can compete with each other for a resource, or they can pass information co-operatively from one thread to another. These semantics are complex, however, and replacing specialized semantics with simple data memory barriers can simplify the process of memory ordering. Translating semantics into micro-operations, or low-level instructions used to implement complex machine instructions, can avoid having to implement complicated new memory operations. Using a data memory barrier in conjunction with load and store instructions can provide sufficient ordering using simple brute force ordering operations.
[0019] As used in this disclosure, the terms "instruction", "operation", and "access" refer to separate processes and are not interchangeable. An instruction is composed of one or more operations, while an operation may include zero or more memory accesses or barriers. By way of example, a load with acquire instruction creates two operations (a load operation and a barrier operation). This barrier splits all memory accesses into two groups. The first group comprises accesses from all instructions prior to the load with acquire as well as the access from the load operation that belongs to the load with acquire. The second group comprises accesses from all instructions after the load with acquire instruction.
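The two-operation decomposition described above can be approximated in portable C11, where `atomic_thread_fence(memory_order_seq_cst)` stands in for the ARM DMB operation. This is an assumption made for illustration: the patent describes micro-operations inside the processor, not source-level code, and the function name here is hypothetical.

```c
#include <stdatomic.h>

/* Sketch of a load with acquire decomposed into its two operations:
 * a plain load followed by a barrier.  The seq_cst fence approximates DMB. */
static int load_with_acquire(atomic_int *addr) {
    /* Operation 1: the load itself, with no ordering of its own. */
    int value = atomic_load_explicit(addr, memory_order_relaxed);
    /* Operation 2: the barrier.  It splits memory accesses into two groups:
     * everything up to and including this load stays before the barrier,
     * and accesses from later instructions stay after it. */
    atomic_thread_fence(memory_order_seq_cst);
    return value;
}
```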
[0020] FIG. 1 illustrates a system 100 that filters memory operations using a data memory barrier in a RISC processor, processing environment, or architecture. The RISC processor can include variations of ARM processors, and specifically, in this embodiment, an ARMv8 processor. As illustrated, system 100 can include load/store component 102 that can be communicatively coupled and/or operationally coupled to processor 104 for facilitating operation and/or execution of computer executable instructions and/or components by system 100, memory 106 for storing data and/or computer executable instructions and/or components for execution by system 100 utilizing processor 104, for instance, and storage component 108 for providing longer term storage for data and/or computer executable instructions and/or components that can be executed by system 100 using processor 104, for example. Additionally, and as depicted, system 100 can receive input 110 that can be transformed by execution of one or more computer executable instructions and/or components, by processor 104, from a first state to a second state, wherein the first state can be distinguished and/or is discernible and/or is different from the second state. System 100 can also produce output 112 that can include an article that has been transformed, through processing by system 100, into a different state or thing.
[0021] FIG. 2 illustrates a block diagram of an embodiment of a system that filters memory operations in accordance with various aspects described herein. System 200 includes a data memory barrier 204 that enforces an ordering constraint on prior instructions 202 and subsequent instructions 206. Data memory barrier 204 is a type of barrier operation which causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier operation. This typically means that certain operations are guaranteed to be performed before the barrier, and others after. Data memory barrier 204 ensures that prior instructions 202 are performed and completed before subsequent instructions 206 are executed. Prior instructions 202 and subsequent instructions 206 can each include various combinations of basic load and store instructions plus more complex variants of these instructions (e.g., load-exclusive with acquire, store-exclusive with release, etc.).
[0022] In an embodiment, prior instructions 202 and subsequent instructions 206 can comprise load or store instructions that are configured for loading a first set of data from a memory and storing a second set of data to the memory. The data memory barrier 204 can be configured for ordering the memory operations associated with loading and storing the data, wherein the type of ordering accomplished is based on the position in a program order of the data memory barrier relative to the one or more load instructions and store instructions.
[0023] FIG. 3 is a block diagram illustrating an embodiment of a system that filters memory operations via a load with acquire instruction in
accordance with various aspects described herein. System 300 can include a data memory barrier 304 that orders load operation 302 that precedes the data memory barrier 304 in a program order. Data memory barrier 304 ensures that load operation 302 is performed and completed before subsequent instructions are executed. System 300 shows a simple load with acquire instruction that comprises a load operation and a data memory barrier operation. In other embodiments, other types of load operations can result in different load instructions, such as load exclusive with acquire and other variants.
[0024] FIG. 4 illustrates an embodiment of a system that performs a store with release instruction in accordance with various aspects described herein. System 400 can include data memory barriers 402 and 406 on either side of a store operation 404 in a program order. Data memory barrier 402 ensures that all prior instructions/operations have ceased before store operation 404 is initiated, while data memory barrier 406 ensures that store operation 404 is completed before any subsequent memory instructions/operations occur. In addition, first data memory barrier 402 and second data memory barrier 406 also create an ordering to ensure that store with release and load with acquire instructions are observed in program order.
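The barrier-store-barrier sequence of FIG. 4 can likewise be sketched in C11 atomics. As before, the sequentially consistent fence approximates DMB, and the function name `store_with_release` is illustrative rather than taken from the patent.

```c
#include <stdatomic.h>

/* Sketch of a store with release as a store operation bracketed by two
 * barrier operations, mirroring FIG. 4's 402 / 404 / 406 arrangement. */
static void store_with_release(atomic_int *addr, int value) {
    /* First barrier (402): prior accesses complete before the store begins. */
    atomic_thread_fence(memory_order_seq_cst);
    /* The store operation itself (404), with no ordering of its own. */
    atomic_store_explicit(addr, value, memory_order_relaxed);
    /* Second barrier (406): the store completes before any later access,
     * which also keeps store-release/load-acquire pairs in program order. */
    atomic_thread_fence(memory_order_seq_cst);
}
```

The second fence is what makes this decomposition stronger than a conventional store-release, which orders only prior accesses against the store.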
[0025] Methods that may be implemented in accordance with the described subject matter are shown and described with reference to the flow charts of FIGs. 5-7 as a series of blocks. It is to be understood that the claimed subject matter is not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Moreover, not all illustrated blocks may be required to implement the methods described hereinafter.
[0026] FIG. 5 illustrates a flow diagram of an embodiment of a method for executing a load with acquire instruction. Methodology 500 can start at 502, where a load operation is executed, wherein the load operation specifies an address for accessing a data from a memory.
[0027] At 504, a data memory barrier can be executed. The data memory barrier is a type of barrier operation which causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that certain operations are guaranteed to be performed before the barrier, and others after. The data memory barrier ensures that prior instructions are performed and completed before subsequent instructions are executed. In this instance, the data memory barrier operation ensures that the prior load operation is performed and completed before subsequent instructions are executed.
[0028] FIG. 6 illustrates a flow diagram of an embodiment of a method for executing a store with release instruction. Methodology 600 can start at 602, where a first data memory barrier operation is executed. The data memory barrier is a type of barrier instruction which causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction.
[0029] At 604, a store operation is executed. The store operation specifies an address for writing data to memory. At 606, a second data memory barrier operation is executed. Having a store operation between two data memory barrier operations ensures that all other memory operations have been performed and are completed before the store operation is executed, and then no other memory operations are allowed until the store operation is completed. In this way, the store with release instruction performs memory operation ordering using simple store and data memory barrier operations.
[0030] FIG. 7 is a flow diagram of an embodiment of a method for filtering memory operations using a data memory barrier. Methodology 700 can start at 702, where a first set of memory operations are executed before a barrier. The barrier ensures that all instructions are completed before step 704, where a second set of memory operations are executed after the data memory barrier.
[0031] The techniques described herein can be applied to any reduced instruction set computing environment where it is desirable to perform memory operation ordering or filtering. It is to be understood that handheld, portable and other computing devices and computing objects of all kinds are contemplated for use in connection with the various embodiments, i.e., anywhere that memory operation ordering may be performed. The general purpose remote computer described below in FIG. 8 is an example, and the disclosed subject matter can be implemented with any client having network/bus interoperability and interaction. Thus, the disclosed subject matter can be implemented on chips or systems in an environment of networked hosted services, e.g., a networked environment in which the client device serves merely as an interface to the network/bus, such as an object placed in an appliance.
[0032] FIG. 8 illustrates an example of a suitable computing system environment 800 in which aspects of the disclosed subject matter can be implemented. Computing system environment 800 is only one example of a suitable computing environment for a device and is not intended to suggest any limitation as to the scope of use or functionality of the disclosed subject matter.
[0033] An exemplary device for implementing the disclosed subject matter, illustrated in FIG. 8, includes a general-purpose computing device in the form of a
computer 810. Components of computer 810 may include a processing unit 820, a system memory 830, and a system bus 821 that couples various system components, including the system memory, to the processing unit 820. The system bus 821 may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. Computer 810 typically includes a variety of computer readable media.
[0034] The system memory 830 may include computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) and/or random access memory (RAM). A basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within computer 810, such as during start-up, may be stored in memory 830. The computer 810 may also include other removable/nonremovable, volatile/nonvolatile computer storage media.
[0035] A user can enter commands and information into the computer 810 through input devices such as a keyboard and pointing device, commonly referred to as a mouse, trackball, or touch pad.
[0036] The computer 810 can operate in a networked or distributed environment using logical connections to one or more other remote computer(s), such as remote computer 870, which can in turn have media capabilities different from device 810.
[0037] In addition to the foregoing, the disclosed subject matter can be implemented as a method, apparatus, or article of manufacture using typical manufacturing, programming or engineering techniques to produce hardware, firmware, software, or any suitable combination thereof to control an electronic device to implement the disclosed subject matter. Computer-readable media can include hardware media or software media; the media can include non-transitory media or transport media.
Claims
1. A processor that executes computer-executable instructions to perform operations, the instructions comprising:
a load with acquire instruction that performs memory operation ordering, wherein the load with acquire instruction comprises a load operation followed by a data memory barrier operation.
2. The processor of claim 1, wherein the data memory barrier operation orders memory operations comprising a first set of memory operations occurring before the barrier operation, and a second set of memory operations occurring after the barrier operation.
3. The processor of claim 1, wherein
the load operation specifies an address for accessing a first data from the memory;
the load with acquire instruction comprises at least one of a plurality of types of load with acquire instructions; and
the data memory barrier operation replaces a set of load acquire semantics for memory operation ordering.
4. A method for executing instructions in a processor, comprising: executing a load with acquire instruction for performing memory operation ordering, wherein the executing the load with acquire instruction comprises executing a load operation followed by a data memory barrier operation.
5. The method of claim 4, further comprising executing a plurality of types of load with acquire instructions; wherein
executing the data memory barrier operation replaces a set of load acquire semantics for memory operation ordering; and
the load operation specifies an address for accessing a first data from the memory.
6. The method of claim 4, wherein the data memory barrier operation orders memory operations comprising a first set of memory operations occurring before the barrier operation, and a second set of memory operations occurring after the barrier operation.
7. A processor that executes computer-executable instructions to perform operations, the instructions comprising:
a store with release instruction that performs memory operation ordering, wherein the store with release instruction comprises a first data memory barrier operation followed by a store operation followed by a second data memory barrier operation.
8. The processor of claim 7, wherein
the first and second data memory barrier operations order memory operations comprising a first set of memory operations occurring before the barrier operations, and a second set of memory operations occurring after the barrier operations;
the store operation specifies an address for writing a first data to memory; and
the instructions further comprise a plurality of types of store with release instructions.
9. A method for executing instructions in a processor, comprising: executing a store with release instruction for performing memory operation ordering, wherein executing the store with release instruction comprises executing a first data memory barrier operation followed by executing a store operation followed by executing a second data memory barrier operation.
10. The method of claim 9, further comprising executing a plurality of types of store with release instructions; wherein
executing the first and second data memory barrier operations orders memory operations comprising a first set of memory accesses occurring before the barrier operations, and a second set of memory accesses occurring after the barrier operations; and
executing the second data memory barrier operation before executing a load with acquire instruction ensures the instructions are observed in program order.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP15899072.1A EP3326059A4 (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operation with dmb operation |
JP2018502709A JP6739513B2 (en) | 2015-07-21 | 2015-07-21 | Implementation of load get/store release instructions using load/store operations with DMB operations |
CN201910999320.5A CN110795150A (en) | 2015-07-21 | 2015-07-21 | Implementation of load fetch/store release instruction by load/store operation according to DMB operation |
CN201580082189.6A CN108139903B (en) | 2015-07-21 | 2015-07-21 | Implement load acquisition/storage with load/store operations according to DMB operation to release order |
PCT/US2015/041322 WO2017014752A1 (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operation with dmb operation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2015/041322 WO2017014752A1 (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operation with dmb operation |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017014752A1 true WO2017014752A1 (en) | 2017-01-26 |
Family
ID=57835180
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2015/041322 WO2017014752A1 (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operation with dmb operation |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP3326059A4 (en) |
JP (1) | JP6739513B2 (en) |
CN (2) | CN110795150A (en) |
WO (1) | WO2017014752A1 (en) |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07302200A (en) * | 1994-04-28 | 1995-11-14 | Hewlett Packard Co <Hp> | Loading instruction method of computer provided with instruction forcing sequencing loading operation and sequencing storage |
JP2000181891A (en) * | 1998-12-18 | 2000-06-30 | Hitachi Ltd | Shared memory access sequence assurance system |
WO2005121948A1 (en) * | 2004-06-02 | 2005-12-22 | Sun Microsystems, Inc. | Method and apparatus for enforcing membar instruction semantics in an execute-ahead processor |
US7725618B2 (en) * | 2004-07-29 | 2010-05-25 | International Business Machines Corporation | Memory barriers primitives in an asymmetric heterogeneous multiprocessor environment |
US8060482B2 (en) * | 2006-12-28 | 2011-11-15 | Intel Corporation | Efficient and consistent software transactional memory |
WO2009050644A1 (en) * | 2007-10-18 | 2009-04-23 | Nxp B.V. | Data processing system with a plurality of processors, cache circuits and a shared memory |
US8935513B2 (en) * | 2012-02-08 | 2015-01-13 | International Business Machines Corporation | Processor performance improvement for instruction sequences that include barrier instructions |
US9442755B2 (en) * | 2013-03-15 | 2016-09-13 | Nvidia Corporation | System and method for hardware scheduling of indexed barriers |
-
2015
- 2015-07-21 CN CN201910999320.5A patent/CN110795150A/en active Pending
- 2015-07-21 EP EP15899072.1A patent/EP3326059A4/en active Pending
- 2015-07-21 JP JP2018502709A patent/JP6739513B2/en active Active
- 2015-07-21 CN CN201580082189.6A patent/CN108139903B/en not_active Expired - Fee Related
- 2015-07-21 WO PCT/US2015/041322 patent/WO2017014752A1/en unknown
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005332387A (en) * | 2004-05-04 | 2005-12-02 | Sun Microsyst Inc | Method and system for grouping and managing memory instruction |
US20100077143A1 (en) * | 2008-07-09 | 2010-03-25 | Arm Limited | Monitoring a data processing apparatus and summarising the monitoring data |
US20120198214A1 (en) * | 2009-09-25 | 2012-08-02 | Shirish Gadre | N-way memory barrier operation coalescing |
US20140089589A1 (en) * | 2012-09-27 | 2014-03-27 | Apple Inc. | Barrier colors |
US20150046652A1 (en) * | 2013-08-07 | 2015-02-12 | Advanced Micro Devices, Inc. | Write combining cache microarchitecture for synchronization events |
Non-Patent Citations (1)
Title |
---|
See also references of EP3326059A4 * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10394558B2 (en) | 2017-10-06 | 2019-08-27 | International Business Machines Corporation | Executing load-store operations without address translation hardware per load-store unit port |
US10572257B2 (en) | 2017-10-06 | 2020-02-25 | International Business Machines Corporation | Handling effective address synonyms in a load-store unit that operates without address translation |
US10572256B2 (en) | 2017-10-06 | 2020-02-25 | International Business Machines Corporation | Handling effective address synonyms in a load-store unit that operates without address translation |
US10606593B2 (en) | 2017-10-06 | 2020-03-31 | International Business Machines Corporation | Effective address based load store unit in out of order processors |
US10606591B2 (en) | 2017-10-06 | 2020-03-31 | International Business Machines Corporation | Handling effective address synonyms in a load-store unit that operates without address translation |
US10606590B2 (en) | 2017-10-06 | 2020-03-31 | International Business Machines Corporation | Effective address based load store unit in out of order processors |
US10606592B2 (en) | 2017-10-06 | 2020-03-31 | International Business Machines Corporation | Handling effective address synonyms in a load-store unit that operates without address translation |
US10628158B2 (en) | 2017-10-06 | 2020-04-21 | International Business Machines Corporation | Executing load-store operations without address translation hardware per load-store unit port |
US10776113B2 (en) | 2017-10-06 | 2020-09-15 | International Business Machines Corporation | Executing load-store operations without address translation hardware per load-store unit port |
US10963248B2 (en) | 2017-10-06 | 2021-03-30 | International Business Machines Corporation | Handling effective address synonyms in a load-store unit that operates without address translation |
US10977047B2 (en) | 2017-10-06 | 2021-04-13 | International Business Machines Corporation | Hazard detection of out-of-order execution of load and store instructions in processors without using real addresses |
US11175925B2 (en) | 2017-10-06 | 2021-11-16 | International Business Machines Corporation | Load-store unit with partitioned reorder queues with single cam port |
US11175924B2 (en) | 2017-10-06 | 2021-11-16 | International Business Machines Corporation | Load-store unit with partitioned reorder queues with single cam port |
Also Published As
Publication number | Publication date |
---|---|
CN110795150A (en) | 2020-02-14 |
CN108139903B (en) | 2019-11-15 |
EP3326059A1 (en) | 2018-05-30 |
CN108139903A (en) | 2018-06-08 |
JP6739513B2 (en) | 2020-08-12 |
JP2018523235A (en) | 2018-08-16 |
EP3326059A4 (en) | 2019-04-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017014752A1 (en) | Implementation of load acquire/store release instructions using load/store operation with dmb operation | |
US9152474B2 (en) | Context aware synchronization using context and input parameter objects associated with a mutual exclusion lock | |
US9720819B2 (en) | Concurrent, moving, garbage collector | |
CN107479981B (en) | Processing method and device for realizing synchronous call based on asynchronous call | |
US10592235B2 (en) | Generating an idempotent workflow | |
US9207967B2 (en) | Using nonspeculative operations for lock elision | |
DE102014003799A1 (en) | Systems and methods for transfer elimination with bypass multiple instantiation table | |
CN107643904B (en) | Method, device and medium for detecting code submission log and electronic equipment | |
US20210004212A1 (en) | Method and apparatus for compiling source code object, and computer | |
CN106716348A (en) | Shared resources in a data processing appartus for executing a plurality of threads | |
CN105094840A (en) | Atomic operation implementation method and device based on cache consistency principle | |
US9703905B2 (en) | Method and system for simulating multiple processors in parallel and scheduler | |
US10984150B2 (en) | Harness design change record and replay | |
CN104391754A (en) | Method and device for processing task exception | |
WO2014201885A1 (en) | Method and system for invoking plug-in function | |
US10338891B2 (en) | Migration between model elements of different types in a modeling environment | |
DE112013007703T5 (en) | Command and logic for identifying instructions for retirement in a multi-stranded out-of-order processor | |
US20160320984A1 (en) | Information processing device, parallel processing program and method for accessing shared memory | |
US9898301B2 (en) | Framework to provide time bound execution of co-processor commands | |
DE102015007423A1 (en) | Memory sequencing with coherent and non-coherent subsystems | |
US10310914B2 (en) | Methods and systems for recursively acquiring and releasing a spinlock | |
US11513798B1 (en) | Implementation of load acquire/store release instructions using load/store operation with DMB operation | |
US20170228304A1 (en) | Method and device for monitoring the execution of a program code | |
Khot | Parallelization in Python | |
WO2017210034A1 (en) | Asynchronous sequential processing execution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15899072 Country of ref document: EP Kind code of ref document: A1 |
ENP | Entry into the national phase |
Ref document number: 2018502709 Country of ref document: JP Kind code of ref document: A |
NENP | Non-entry into the national phase |
Ref country code: DE |