WO2017014752A1 - Implementation of load acquire/store release instructions using load/store operation with dmb operation - Google Patents
- Publication number
- WO2017014752A1 (PCT/US2015/041322)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- memory
- load
- operations
- instructions
- barrier
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G06F9/30043—LOAD or STORE instructions; Clear instruction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30087—Synchronisation or serialisation instructions
Abstract
Systems and methods are provided for simplifying load acquire and store release semantics that are used in reduced instruction set computing (RISC). Translating the semantics into micro-operations, or low-level instructions used to implement complex machine instructions, can avoid having to implement complicated new memory operations. Using one or more data memory barrier operations in conjunction with load and store operations can provide sufficient ordering, as a data memory barrier ensures that prior instructions are performed and completed before subsequent instructions are executed.
Description
IMPLEMENTATION OF LOAD ACQUIRE/STORE RELEASE INSTRUCTIONS USING LOAD/STORE OPERATION WITH DMB OPERATION
TECHNICAL FIELD
[0001] This disclosure relates to memory operation ordering in a computing environment.
BACKGROUND
[0002] In lock free computing, there are two ways in which threads can manipulate shared memory: they can compete with each other for a resource, or they can pass information co-operatively from one thread to another.
Acquire and release semantics are used to accomplish passing information cooperatively from one thread to another. Acquire and release semantics provide a structural system for ensuring that memory operations are ordered correctly to avoid errors. Store release instructions ensure that all previous instructions are completed, and load-acquire instructions ensure that all following instructions complete only after the load-acquire itself completes. To properly order memory operations using acquire and release semantics, complex combinations of store release and load acquire instructions are necessary.
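The cooperative hand-off that acquire and release semantics enable can be sketched with C11 atomics. This is an illustrative sketch, not the patent's implementation, and the function and variable names are hypothetical: a producer thread publishes a payload and then performs a store with release on a flag, while a consumer thread performs a load with acquire on the flag before reading the payload.

```c
#include <stdatomic.h>
#include <pthread.h>

/* Illustrative names, not from the patent. */
static int payload;
static atomic_int ready;

static void *producer(void *arg) {
    (void)arg;
    payload = 42;                                            /* plain store */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* store with release */
    return NULL;
}

static void *consumer(void *arg) {
    /* Spin until the load with acquire observes the flag. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;
    /* The acquire/release pair guarantees the payload store is visible here. */
    *(int *)arg = payload;
    return NULL;
}

int run_message_pass(void) {
    int out = 0;
    pthread_t p, c;
    atomic_store(&ready, 0);
    pthread_create(&c, NULL, consumer, &out);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return out;
}
```

Without the acquire/release pairing, the consumer could observe the flag before the payload store becomes visible; the ordering constraint is what makes the hand-off safe.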
SUMMARY
[0003] Disclosed herein is a system and method for simplifying load acquire and store release semantics that are used in reduced instruction set computing (RISC). Embodiments provide for ordering memory operations with respect to the instructions disclosed herein. A typical load with acquire instruction only requires that memory operations after the load with acquire are ordered after the load with acquire; it does not impose any order on the instructions before the load with acquire (both with respect to the load with acquire and to the subsequent instructions). In an embodiment, a load with acquire comprises a data memory barrier that is used in conjunction with a load operation, which guarantees that all accesses prior to and including the load with acquire are ordered before all accesses from instructions after the load with acquire.
[0004] Similarly, traditional store with release instructions impose ordering between the access from the store with release and the accesses of all prior instructions (but not subsequent instructions). In an embodiment, however, a data memory barrier at the beginning of the store with release provides a strong ordering between prior access and the access associated with the store with release.
[0005] In an embodiment, a system comprises a processor that executes computer-executable instructions to perform operations. The instructions can include a load with acquire instruction that performs memory operation ordering, wherein the load with acquire instruction comprises a load operation followed by a data memory barrier operation.
[0006] In another embodiment, a method comprises executing instructions in a processor. The method can include a load with acquire instruction for performing memory operation ordering, wherein executing the load with acquire instruction comprises executing a load operation followed by a data memory barrier operation.
[0007] In an embodiment, a system comprises a processor that executes computer-executable instructions to perform operations. The instructions can include a store with release instruction that performs memory operation ordering, wherein the store with release instruction comprises a first data memory barrier operation followed by a store operation followed by a second data memory barrier operation.
[0008] In an embodiment, a method comprises executing instructions in a processor. The method can include a store with release instruction for performing memory operation ordering, wherein executing the store with release instruction comprises executing a first data memory barrier operation followed by executing a store operation followed by executing a second data memory barrier operation.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] FIG. 1 is a block diagram illustrating an embodiment of a system that filters memory operations in accordance with various aspects described herein.
[0010] FIG. 2 is a block diagram illustrating an embodiment of a system that filters memory operations in accordance with various aspects described herein.
[0011] FIG. 3 is a block diagram illustrating an embodiment of a system that filters memory operations in accordance with various aspects described herein.
[0012] FIG. 4 is a block diagram illustrating an embodiment of a system that filters memory operations in accordance with various aspects described herein.
[0013] FIG. 5 illustrates a flow diagram of an embodiment of a method for executing a load with acquire instruction.
[0014] FIG. 6 illustrates a flow diagram of an embodiment of a method for executing a store with release instruction.
[0015] FIG. 7 illustrates a flow diagram of an embodiment of a method for filtering memory operations using a data memory barrier.
[0016] FIG. 8 illustrates a block diagram of an electronic computing environment that can be implemented in conjunction with one or more aspects described herein.
[0017] FIG. 9 illustrates a block diagram of a data communication network that can be operable in conjunction with various aspects described herein.
DETAILED DESCRIPTION
[0018] Various embodiments provide for a system that simplifies load acquire and store release semantics that are used in reduced instruction set computing (RISC). In lock free computing, there are two ways in which threads can manipulate shared memory: they can compete with each other for a resource, or they can pass information co-operatively from one thread to another. These semantics are complex, however, and replacing specialized semantics with simple data memory barriers can simplify the process of memory ordering. Translating semantics into micro-operations, or low-level instructions used to implement complex machine instructions, can avoid having to implement complicated new memory operations. Using a data memory barrier in conjunction with load and store instructions can provide sufficient ordering using simple brute force ordering operations.
[0019] As used in this disclosure, the terms "instruction", "operation", and "access" refer to separate processes and are not interchangeable. An instruction is composed of one or more operations, while an operation may include zero or more memory accesses or barriers. By way of example, a load with acquire instruction creates two operations (a load operation and a barrier operation). This barrier splits all memory accesses into two groups. The first group comprises accesses from all instructions prior to the load with acquire as well as the access from the load operation that belongs to the load with acquire. The second group comprises accesses from all instructions after the load with acquire instruction.
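The two-operation decomposition described above can be approximated in portable C11, where `atomic_thread_fence(memory_order_seq_cst)` stands in for the ARM DMB operation. This is an assumption made for illustration: the patent describes micro-operations inside the processor, not source-level code, and the function name here is hypothetical.

```c
#include <stdatomic.h>

/* Sketch of a load with acquire decomposed into its two operations:
 * a plain load followed by a barrier.  The seq_cst fence approximates DMB. */
static int load_with_acquire(atomic_int *addr) {
    /* Operation 1: the load itself, with no ordering of its own. */
    int value = atomic_load_explicit(addr, memory_order_relaxed);
    /* Operation 2: the barrier.  It splits memory accesses into two groups:
     * everything up to and including this load stays before the barrier,
     * and accesses from later instructions stay after it. */
    atomic_thread_fence(memory_order_seq_cst);
    return value;
}
```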
[0020] FIG. 1 illustrates a system 100 that filters memory operations using a data memory barrier in a RISC processor, processing environment, or architecture. The RISC processor can include variations of ARM processors, and specifically, in this embodiment, an ARMv8 processor. As illustrated, system 100 can include load/store component 102 that can be communicatively coupled and/or operationally coupled to processor 104 for facilitating operation and/or execution of computer executable instructions and/or components by system 100, memory 106 for storing data and/or computer executable instructions and/or components for execution by system 100 utilizing processor 104, for instance, and storage component 108 for providing longer term storage for data and/or computer executable instructions and/or components that can be executed by system 100 using processor 104, for example. Additionally, and as depicted, system 100 can receive input 110 that can be transformed by execution of one or more computer executable instructions and/or components, by processor 104, from a first state to a second state, wherein the first state can be distinguished and/or is discernible and/or is different from the second state. System 100 can also produce output 112 that can include an article that has been transformed, through processing by system 100, into a different state or thing.
[0021] FIG. 2 illustrates a block diagram of an embodiment of a system that filters memory operations in accordance with various aspects described herein. System 200 includes a data memory barrier 204 that enforces an ordering constraint on prior instructions 202 and subsequent instructions 206. Data memory barrier 204 is a type of barrier operation which causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier operation. This typically means that certain operations are guaranteed to be performed before the barrier, and others after. Data memory barrier 204 ensures that prior instructions 202 are performed and completed before subsequent instructions 206 are executed. Prior instructions 202 and subsequent instructions 206 can each include various combinations of basic load and store instructions plus more complex variants of these instructions (e.g., load-exclusive with acquire, store-exclusive with release, etc.).
[0022] In an embodiment, prior instructions 202 and subsequent instructions 206 can comprise load or store instructions that are configured for loading a first set of data from a memory and storing a second set of data to the memory. The data memory barrier 204 can be configured for ordering the memory operations associated with loading and storing the data, wherein the type of ordering accomplished is based on the position in a program order of the data memory barrier relative to the one or more load instructions and store instructions.
[0023] FIG. 3 is a block diagram illustrating an embodiment of a system that filters memory operations via a load with acquire instruction in
accordance with various aspects described herein. System 300 can include a data memory barrier 304 that orders load operation 302 that precedes the data memory barrier 304 in a program order. Data memory barrier 304 ensures that load operation 302 is performed and completed before subsequent instructions are executed. System 300 shows a simple load with acquire instruction that comprises a load operation and a data memory barrier operation. In other embodiments, other types of load operations can result in different load instructions, such as load exclusive with acquire and other variants.
[0024] FIG. 4 illustrates an embodiment of a system that performs a store with release instruction in accordance with various aspects described herein. System 400 can include data memory barriers 402 and 406 on either side of a store operation 404 in a program order. Data memory barrier 402 ensures that all prior instructions/operations have ceased before store operation 404 is initiated, while data memory barrier 406 ensures that store operation 404 is completed before any subsequent memory instructions/operations occur. In addition, first data memory barrier 402 and second data memory barrier 406 also create an ordering to ensure that store with release and load with acquire instructions are observed in program order.
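The barrier-store-barrier sequence of FIG. 4 can likewise be sketched in C11 atomics. As before, the sequentially consistent fence approximates DMB, and the function name `store_with_release` is illustrative rather than taken from the patent.

```c
#include <stdatomic.h>

/* Sketch of a store with release as a store operation bracketed by two
 * barrier operations, mirroring FIG. 4's 402 / 404 / 406 arrangement. */
static void store_with_release(atomic_int *addr, int value) {
    /* First barrier (402): prior accesses complete before the store begins. */
    atomic_thread_fence(memory_order_seq_cst);
    /* The store operation itself (404), with no ordering of its own. */
    atomic_store_explicit(addr, value, memory_order_relaxed);
    /* Second barrier (406): the store completes before any later access,
     * which also keeps store-release/load-acquire pairs in program order. */
    atomic_thread_fence(memory_order_seq_cst);
}
```

The second fence is what makes this decomposition stronger than a conventional store-release, which orders only prior accesses against the store.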
[0025] Methods that may be implemented in accordance with the described subject matter are shown and described with reference to the flow charts of FIGs. 5-7 as a series of blocks. It is to be understood that the claimed subject matter is not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Moreover, not all illustrated blocks may be required to implement the methods described hereinafter.
[0026] FIG. 5 illustrates a flow diagram of an embodiment of a method for executing a load with acquire instruction. Methodology 500 can start at 502, where a load operation is executed, wherein the load operation specifies an address for accessing a data from a memory.
[0027] At 504, a data memory barrier can be executed. The data memory barrier is a type of barrier operation which causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that certain operations are guaranteed to be performed before the barrier, and others after. The data memory barrier ensures that prior instructions are performed and completed before subsequent instructions are executed. In this instance, the data memory barrier operation ensures that the prior load operation is performed and completed before subsequent instructions are executed.
[0028] FIG. 6 illustrates a flow diagram of an embodiment of a method for executing a store with release instruction. Methodology 600 can start at 602, where a first data memory barrier operation is executed. The data memory barrier is a type of barrier instruction which causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction.
[0029] At 604, a store operation is executed. The store operation specifies an address for writing data to memory. At 606, a second data memory barrier operation is executed. Having a store operation between two data memory barrier operations ensures that all other memory operations have been performed and are completed before the store operation is executed, and then no other memory operations are allowed until the store operation is completed. In this way, the store with release instruction performs memory operation ordering using simple store and data memory barrier operations.
[0030] FIG. 7 is a flow diagram of an embodiment of a method for filtering memory operations using a data memory barrier. Methodology 700 can start at 702, where a first set of memory operations are executed before a barrier. The barrier ensures that all instructions are completed before step 704, where a second set of memory operations are executed after the data memory barrier.
[0031] The techniques described herein can be applied to any reduced instruction set computing environment where it is desirable to perform memory operation ordering or filtering. It is to be understood that handheld, portable and other computing devices and computing objects of all kinds are contemplated for use in connection with the various embodiments, i.e., anywhere that memory operation ordering may be performed. The general purpose remote computer described below in FIG. 8 is an example, and the disclosed subject matter can be implemented with any client having network/bus interoperability and interaction. Thus, the disclosed subject matter can be implemented on chips or systems in an environment of networked hosted services, e.g., a networked environment in which the client device serves merely as an interface to the network/bus, such as an object placed in an appliance.
[0032] FIG. 8 illustrates an example of a suitable computing system environment 800 in which aspects of the disclosed subject matter can be implemented. Computing system environment 800 is only one example of a suitable computing environment for a device and is not intended to suggest any limitation as to the scope of use or functionality of the disclosed subject matter.
[0033] An exemplary device for implementing the disclosed subject matter, illustrated in FIG. 8, includes a general-purpose computing device in the form of a
computer 810. Components of computer 810 may include a processing unit 820, a system memory 830, and a system bus 821 that couples various system components, including the system memory, to the processing unit 820. The system bus 821 may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. Computer 810 typically includes a variety of computer readable media.
[0034] The system memory 830 may include computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) and/or random access memory (RAM). A basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within computer 810, such as during start-up, may be stored in memory 830. The computer 810 may also include other removable/nonremovable, volatile/nonvolatile computer storage media.
[0035] A user can enter commands and information into the computer 810 through input devices such as a keyboard and pointing device, commonly referred to as a mouse, trackball, or touch pad.
[0036] The computer 810 can operate in a networked or distributed environment using logical connections to one or more other remote computer(s), such as remote computer 870, which can in turn have media capabilities different from device 810.
[0037] In addition to the foregoing, the disclosed subject matter can be implemented as a method, apparatus, or article of manufacture using typical manufacturing, programming or engineering techniques to produce hardware, firmware, software, or any suitable combination thereof to control an electronic device to implement the disclosed subject matter. Computer-readable media can include hardware media or software media; the media can include non-transitory media or transport media.
Claims
1. A processor that executes computer-executable instructions to perform operations, the instructions comprising:
a load with acquire instruction that performs memory operation ordering, wherein the load with acquire instruction comprises a load operation followed by a data memory barrier operation.
2. The processor of claim 1, wherein the data memory barrier operation orders memory operations comprising a first set of memory operations occurring before the barrier operation, and a second set of memory operations occurring after the barrier operation.
3. The processor of claim 1, wherein
the load operation specifies an address for accessing a first data from the memory;
the load with acquire instruction comprises at least one of a plurality of types of load with acquire instructions; and
the data memory barrier operation replaces a set of load acquire semantics for memory operation ordering.
4. A method for executing instructions in a processor, comprising: executing a load with acquire instruction for performing memory operation ordering, wherein the executing the load with acquire instruction comprises executing a load operation followed by a data memory barrier operation.
5. The method of claim 4, further comprising executing a plurality of types of load with acquire instructions; wherein
executing the data memory barrier operation replaces a set of load acquire semantics for memory operation ordering; and
the load operation specifies an address for accessing a first data from the memory.
6. The method of claim 4, wherein the data memory barrier operation orders memory operations comprising a first set of memory operations occurring before the barrier operation, and a second set of memory operations occurring after the barrier operation.
7. A processor that executes computer-executable instructions to perform operations, the instructions comprising:
a store with release instruction that performs memory operation ordering, wherein the store with release instruction comprises a first data memory barrier operation followed by a store operation followed by a second data memory barrier operation.
8. The processor of claim 7, wherein
the first and second data memory barrier operations order memory operations comprising a first set of memory operations occurring before the barrier operations, and a second set of memory operations occurring after the barrier operations;
the store operation specifies an address for writing a first data to memory; and
the instructions further comprise a plurality of types of store with release instructions.
9. A method for executing instructions in a processor, comprising: executing a store with release instruction for performing memory operation ordering, wherein executing the store with release instruction comprises executing a first data memory barrier operation followed by executing a store operation followed by executing a second data memory barrier operation.
10. The method of claim 9, further comprising executing a plurality of types of store with release instructions; wherein
executing the first and second data memory barrier operations orders memory operations comprising a first set of memory accesses occurring before the barrier operations, and a second set of memory accesses occurring after the barrier operations; and
executing the second data memory barrier operation before executing a load with acquire instruction ensures the instructions are observed in program order.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP15899072.1A EP3326059A4 (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operation with dmb operation |
JP2018502709A JP6739513B2 (en) | 2015-07-21 | 2015-07-21 | Implementation of load get/store release instructions using load/store operations with DMB operations |
CN201910999320.5A CN110795150A (en) | 2015-07-21 | 2015-07-21 | Implementation of load fetch/store release instruction by load/store operation according to DMB operation |
CN201580082189.6A CN108139903B (en) | 2015-07-21 | 2015-07-21 | Implement load acquisition/storage with load/store operations according to DMB operation to release order |
PCT/US2015/041322 WO2017014752A1 (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operation with dmb operation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2015/041322 WO2017014752A1 (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operation with dmb operation |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017014752A1 true WO2017014752A1 (en) | 2017-01-26 |
Family
ID=57835180
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2015/041322 WO2017014752A1 (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operation with dmb operation |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP3326059A4 (en) |
JP (1) | JP6739513B2 (en) |
CN (2) | CN110795150A (en) |
WO (1) | WO2017014752A1 (en) |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07302200A (en) * | 1994-04-28 | 1995-11-14 | Hewlett Packard Co <Hp> | Loading instruction method of computer provided with instruction forcing sequencing loading operation and sequencing storage |
JP2000181891A (en) * | 1998-12-18 | 2000-06-30 | Hitachi Ltd | Shared memory access sequence assurance system |
WO2005121948A1 (en) * | 2004-06-02 | 2005-12-22 | Sun Microsystems, Inc. | Method and apparatus for enforcing membar instruction semantics in an execute-ahead processor |
US7725618B2 (en) * | 2004-07-29 | 2010-05-25 | International Business Machines Corporation | Memory barriers primitives in an asymmetric heterogeneous multiprocessor environment |
US8060482B2 (en) * | 2006-12-28 | 2011-11-15 | Intel Corporation | Efficient and consistent software transactional memory |
WO2009050644A1 (en) * | 2007-10-18 | 2009-04-23 | Nxp B.V. | Data processing system with a plurality of processors, cache circuits and a shared memory |
US8935513B2 (en) * | 2012-02-08 | 2015-01-13 | International Business Machines Corporation | Processor performance improvement for instruction sequences that include barrier instructions |
US9442755B2 (en) * | 2013-03-15 | 2016-09-13 | Nvidia Corporation | System and method for hardware scheduling of indexed barriers |
-
2015
- 2015-07-21 CN CN201910999320.5A patent/CN110795150A/en active Pending
- 2015-07-21 EP EP15899072.1A patent/EP3326059A4/en active Pending
- 2015-07-21 JP JP2018502709A patent/JP6739513B2/en active Active
- 2015-07-21 CN CN201580082189.6A patent/CN108139903B/en not_active Expired - Fee Related
- 2015-07-21 WO PCT/US2015/041322 patent/WO2017014752A1/en unknown
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005332387A (en) * | 2004-05-04 | 2005-12-02 | Sun Microsyst Inc | Method and system for grouping and managing memory instruction |
US20100077143A1 (en) * | 2008-07-09 | 2010-03-25 | Arm Limited | Monitoring a data processing apparatus and summarising the monitoring data |
US20120198214A1 (en) * | 2009-09-25 | 2012-08-02 | Shirish Gadre | N-way memory barrier operation coalescing |
US20140089589A1 (en) * | 2012-09-27 | 2014-03-27 | Apple Inc. | Barrier colors |
US20150046652A1 (en) * | 2013-08-07 | 2015-02-12 | Advanced Micro Devices, Inc. | Write combining cache microarchitecture for synchronization events |
Non-Patent Citations (1)
Title |
---|
See also references of EP3326059A4 * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10394558B2 (en) | 2017-10-06 | 2019-08-27 | International Business Machines Corporation | Executing load-store operations without address translation hardware per load-store unit port |
US10572257B2 (en) | 2017-10-06 | 2020-02-25 | International Business Machines Corporation | Handling effective address synonyms in a load-store unit that operates without address translation |
US10572256B2 (en) | 2017-10-06 | 2020-02-25 | International Business Machines Corporation | Handling effective address synonyms in a load-store unit that operates without address translation |
US10606593B2 (en) | 2017-10-06 | 2020-03-31 | International Business Machines Corporation | Effective address based load store unit in out of order processors |
US10606591B2 (en) | 2017-10-06 | 2020-03-31 | International Business Machines Corporation | Handling effective address synonyms in a load-store unit that operates without address translation |
US10606590B2 (en) | 2017-10-06 | 2020-03-31 | International Business Machines Corporation | Effective address based load store unit in out of order processors |
US10606592B2 (en) | 2017-10-06 | 2020-03-31 | International Business Machines Corporation | Handling effective address synonyms in a load-store unit that operates without address translation |
US10628158B2 (en) | 2017-10-06 | 2020-04-21 | International Business Machines Corporation | Executing load-store operations without address translation hardware per load-store unit port |
US10776113B2 (en) | 2017-10-06 | 2020-09-15 | International Business Machines Corporation | Executing load-store operations without address translation hardware per load-store unit port |
US10963248B2 (en) | 2017-10-06 | 2021-03-30 | International Business Machines Corporation | Handling effective address synonyms in a load-store unit that operates without address translation |
US10977047B2 (en) | 2017-10-06 | 2021-04-13 | International Business Machines Corporation | Hazard detection of out-of-order execution of load and store instructions in processors without using real addresses |
US11175925B2 (en) | 2017-10-06 | 2021-11-16 | International Business Machines Corporation | Load-store unit with partitioned reorder queues with single cam port |
US11175924B2 (en) | 2017-10-06 | 2021-11-16 | International Business Machines Corporation | Load-store unit with partitioned reorder queues with single cam port |
Also Published As
Publication number | Publication date |
---|---|
CN110795150A (en) | 2020-02-14 |
CN108139903B (en) | 2019-11-15 |
EP3326059A1 (en) | 2018-05-30 |
CN108139903A (en) | 2018-06-08 |
JP6739513B2 (en) | 2020-08-12 |
JP2018523235A (en) | 2018-08-16 |
EP3326059A4 (en) | 2019-04-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017014752A1 (en) | Implementation of load acquire/store release instructions using load/store operation with dmb operation | |
US9152474B2 (en) | Context aware synchronization using context and input parameter objects associated with a mutual exclusion lock | |
US9720819B2 (en) | Concurrent, moving, garbage collector | |
CN107479981B (en) | Processing method and device for realizing synchronous call based on asynchronous call | |
US10592235B2 (en) | Generating an idempotent workflow | |
US9207967B2 (en) | Using nonspeculative operations for lock elision | |
DE102014003799A1 (en) | Systems and methods for transfer elimination with bypass multiple instantiation table | |
CN107643904B (en) | Method, device and medium for detecting code submission log and electronic equipment | |
US20210004212A1 (en) | Method and apparatus for compiling source code object, and computer | |
CN106716348A (en) | Shared resources in a data processing appartus for executing a plurality of threads | |
CN105094840A (en) | Atomic operation implementation method and device based on cache consistency principle | |
US9703905B2 (en) | Method and system for simulating multiple processors in parallel and scheduler | |
US10984150B2 (en) | Harness design change record and replay | |
CN104391754A (en) | Method and device for processing task exception | |
WO2014201885A1 (en) | Method and system for invoking plug-in function | |
US10338891B2 (en) | Migration between model elements of different types in a modeling environment | |
DE112013007703T5 (en) | Command and logic for identifying instructions for retirement in a multi-stranded out-of-order processor | |
US20160320984A1 (en) | Information processing device, parallel processing program and method for accessing shared memory | |
US9898301B2 (en) | Framework to provide time bound execution of co-processor commands | |
DE102015007423A1 (en) | Memory sequencing with coherent and non-coherent subsystems | |
US10310914B2 (en) | Methods and systems for recursively acquiring and releasing a spinlock | |
US11513798B1 (en) | Implementation of load acquire/store release instructions using load/store operation with DMB operation | |
US20170228304A1 (en) | Method and device for monitoring the execution of a program code | |
Khot | Parallelization in Python | |
WO2017210034A1 (en) | Asynchronous sequential processing execution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15899072 Country of ref document: EP Kind code of ref document: A1 |
ENP | Entry into the national phase |
Ref document number: 2018502709 Country of ref document: JP Kind code of ref document: A |
NENP | Non-entry into the national phase |
Ref country code: DE |