CN110795150A - Implementation of load acquire/store release instructions using load/store operations with DMB operation - Google Patents
Implementation of load acquire/store release instructions using load/store operations with DMB operation
- Publication number
- CN110795150A CN110795150A CN201910999320.5A CN201910999320A CN110795150A CN 110795150 A CN110795150 A CN 110795150A CN 201910999320 A CN201910999320 A CN 201910999320A CN 110795150 A CN110795150 A CN 110795150A
- Authority
- CN
- China
- Prior art keywords
- store
- load
- processor
- memory
- instruction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G06F9/30043—LOAD or STORE instructions; Clear instruction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30087—Synchronisation or serialisation instructions
Abstract
The present application relates to implementing load acquire/store release instructions with load/store operations combined with DMB operations. Systems and methods are provided for simplifying the load-acquire and store-release semantics used in reduced instruction set computing (RISC). Translating the semantics into micro-operations, the low-level instructions used to implement complex machine instructions, avoids having to implement complex new memory operations. One or more data memory barrier (DMB) operations used in conjunction with load and store operations may provide sufficient ordering, because the data memory barrier ensures that preceding instructions are performed and completed before subsequent instructions are executed.
Description
This application is a divisional application of a PCT patent application filed on July 21, 2015 and entering the Chinese national phase under application No. 201580082189.6, entitled "Implementation of load acquire/store release instructions using load/store operations with DMB operation".
Technical Field
The present application relates to memory operation ordering in a computing environment.
Background
In lock-free programming, there are two ways in which threads can interact through shared memory: they may compete for a resource, or they may pass information from one thread to another in a coordinated manner. Acquire and release semantics are used to achieve the coordinated passing of information from one thread to another. These semantics provide a structured system for ensuring that memory operations are properly ordered so as to avoid errors. A store-release instruction ensures that all preceding memory accesses have completed before the store becomes visible, while a load-acquire instruction ensures that all subsequent memory accesses begin only after the load has completed. Properly ordering memory operations with acquire and release semantics can require a complex combination of store-release and load-acquire instructions.
Disclosure of Invention
Disclosed herein are systems and methods for simplifying the load-acquire and store-release semantics used in reduced instruction set computing (RISC). Particular embodiments order memory operations with respect to the instructions disclosed herein. A typical load-acquire instruction only requires that memory operations following the load-acquire be ordered after it; no ordering is imposed between the load-acquire and the instructions that precede it. In one embodiment, the load-acquire comprises a load operation followed by a data memory barrier, which ensures that all accesses up to and including the load-acquire's load are performed before any access from an instruction following the load-acquire.
Similarly, a conventional store-release instruction imposes ordering between the access made by the store-release and the accesses of all preceding instructions (but not subsequent instructions). In a particular embodiment, a data memory barrier at the beginning of the store-release provides strong ordering between previous accesses and the access associated with the store-release.
In a particular embodiment, a system includes a processor that executes computer-executable instructions to perform operations. The instructions may include a load-acquire instruction that orders memory operations, where the load-acquire instruction comprises a load operation followed by a data memory barrier operation.
In another embodiment, a method includes executing instructions in a processor. The method may include executing a load-acquire instruction for memory operation ordering, wherein executing the load-acquire instruction comprises executing a load operation followed by a data memory barrier operation.
In a particular embodiment, a system includes a processor that executes computer-executable instructions to perform operations. The instructions may include a store-release instruction that orders memory operations, where the store-release instruction comprises a first data memory barrier operation, followed by a store operation, followed by a second data memory barrier operation.
In one embodiment, a method includes executing instructions in a processor. The method may include executing a store-release instruction for memory operation ordering, wherein executing the store-release instruction comprises performing a first data memory barrier operation, followed by a store operation, followed by a second data memory barrier operation.
Drawings
FIG. 1 is a block diagram depicting one embodiment of a system for ordering memory operations according to aspects described herein.
FIG. 2 is a block diagram depicting one embodiment of a data memory barrier ordering preceding and subsequent instructions according to aspects described herein.
FIG. 3 is a block diagram depicting one embodiment of a load-acquire instruction implemented as a load operation followed by a data memory barrier according to aspects described herein.
FIG. 4 is a block diagram depicting one embodiment of a store-release instruction implemented as a store operation between two data memory barriers according to aspects described herein.
FIG. 5 is a flow diagram of one embodiment of a method for executing a load-acquire instruction.
FIG. 6 is a flow diagram depicting one embodiment of a method for executing a store-release instruction.
FIG. 7 depicts a flow diagram of one embodiment of a method for ordering memory operations using data memory barriers.
FIG. 8 is a block diagram illustrating an electronic computing environment that may be implemented with one or more aspects described herein.
FIG. 9 is a block diagram of a data communication network that may operate in accordance with various aspects described herein.
Description of the main component symbols:
100,200,300,400 system
102 load/store component
104 processor
106 memory
108 storage component
110 input
112 output
202 preceding instruction
204,304,402,406 data memory barrier
206 subsequent instruction
302 load operation
404 store operation
500,600,700 method
502 to 504, 602 to 606, 702 to 704 method steps
800 computing system environment
810 computer
820 processing unit
821 system bus
830 System memory
870 remote computer.
Detailed Description
Embodiments are provided for a system that simplifies the load-acquire and store-release semantics used in reduced instruction set computing (RISC). In lock-free programming, threads can interact through shared memory in two ways: by competing for resources, or by passing information from one thread to another in a coordinated manner. The semantics that coordinate such transfers are complex, and replacing the dedicated semantics with simple data memory barriers simplifies memory ordering. Translating the semantics into micro-operations, the low-level instructions used to implement complex machine instructions, avoids having to implement complex new memory operations. Data memory barriers used in conjunction with plain load and store instructions can provide sufficient ordering using only these brute-force ordering operations.
The terms "instruction," "operation," and "access" as used in this application refer to distinct concepts and are not interchangeable. An instruction consists of one or more operations, and an operation may include zero or more memory accesses or barriers. For example, a load-acquire instruction comprises two operations: a load operation and a barrier operation. The barrier divides all memory accesses into two groups. The first group includes the accesses of all instructions prior to the load-acquire, plus the access of the load operation belonging to the load-acquire. The second group includes the accesses of all instructions after the load-acquire.
FIG. 1 shows a system 100 for ordering memory operations using data memory barriers in a RISC processor, processing environment, or architecture. The RISC processor may comprise a variant of an ARM processor; in this particular embodiment, it may comprise an ARMv8 processor. As shown, the system 100 may include a load/store component 102 that may be communicatively and/or operatively coupled to a processor 104 for facilitating the execution of computer-executable instructions and/or components by the system 100; a memory 106 for storing data and/or computer-executable instructions and/or components for execution by the system 100 utilizing the processor 104; and a storage component 108 for providing longer-term storage of data and/or computer-executable instructions and/or components that may be executed by the system 100 utilizing the processor 104. Additionally, the system 100 may receive an input 110 that may be transitioned from a first state to a second state, distinct from the first, through execution of one or more computer-executable instructions and/or components by the processor 104. The system 100 may also generate an output 112, which may include an item transformed into a different state or thing through processing by the system 100.
FIG. 2 depicts a block diagram of an embodiment of a system that orders memory operations, according to aspects described herein. The system 200 includes a data memory barrier 204 that imposes an ordering constraint on a preceding instruction 202 and a subsequent instruction 206. The data memory barrier 204 is a type of barrier operation that causes a CPU or compiler to impose ordering constraints on memory operations issued before and after the barrier operation. This generally means ensuring that certain operations are performed before the barrier and others after it. The data memory barrier 204 ensures that the preceding instruction 202 is performed and completed before the subsequent instruction 206 is executed. The preceding instruction 202 and the subsequent instruction 206 may each include various combinations of basic load and store instructions plus more complex variations of these instructions (e.g., loads without acquire semantics, stores without release semantics, etc.).
In one embodiment, the preceding instruction 202 and the subsequent instruction 206 may comprise load or store instructions configured to load a first set of data from memory and store a second set of data to memory. The data memory barrier 204 may be configured to order memory operations associated with loads and stores, where the type of ordering achieved depends on the position of the data memory barrier in program order relative to one or more load and store instructions.
FIG. 3 is a block diagram depicting one embodiment of a system for ordering memory operations via a load-acquire instruction in accordance with aspects described herein. The system 300 may include a data memory barrier 304 that orders a load operation 302 preceding the data memory barrier 304 in program order. The data memory barrier 304 ensures that the load operation 302 is performed and completed before subsequent instructions are executed. The system 300 shows a pure load-acquire instruction comprising a load operation and a data memory barrier operation. In other embodiments, other types of load operations may yield different load instructions, such as a load without acquire semantics and other variants.
FIG. 4 depicts one embodiment of a system for executing a store-release instruction, according to aspects described herein. The system 400 may include data memory barriers 402 and 406 on either side of a store operation 404 in program order. The data memory barrier 402 ensures that all preceding instructions/operations have been performed and completed before the store operation 404 is initiated, while the data memory barrier 406 ensures that the store operation 404 is completed before any subsequent memory instruction/operation occurs. Additionally, the first data memory barrier 402 and the second data memory barrier 406 also establish ordering to ensure that store-release and load-acquire instructions are observed in program order.
While the methods implemented in accordance with the disclosed subject matter may be shown and described as a series of blocks with reference to the flowcharts of FIGS. 5-7, it is to be understood that the disclosed subject matter is not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Moreover, not all illustrated blocks may be required to implement the methods described hereinafter.
FIG. 5 is a flow diagram of one embodiment of a method for executing a load-acquire instruction. The method 500 may begin at 502, where a load operation is performed; the load operation specifies an address for accessing data from memory.
At 504, a data memory barrier operation may be executed. The data memory barrier is a type of barrier operation that causes a CPU or compiler to impose ordering constraints on memory operations issued before and after the barrier. This generally means ensuring that certain operations are performed before the barrier and others after it. In this implementation, the data memory barrier operation ensures that the preceding load operation is performed and completed before any subsequent instruction is executed.
FIG. 6 is a flow diagram illustrating one embodiment of a method for executing a store-release instruction. The method 600 may begin at 602, where a first data memory barrier operation is performed. The data memory barrier is a type of barrier operation that causes a CPU or compiler to impose ordering constraints on memory operations issued before and after it.
At 604, a store operation is performed; the store operation specifies an address for writing data to memory. At 606, a second data memory barrier operation is performed. Placing the store operation between the two data memory barriers ensures that all other memory operations have been performed and completed before the store operation executes, and that no further memory operations are allowed until the store operation completes. In this manner, the store-release instruction uses only plain store and data memory barrier operations for memory operation ordering.
FIG. 7 is a flow diagram of one embodiment of a method for ordering memory operations using data memory barriers. The method 700 may begin at 702, where a first set of memory operations is performed before a barrier. The barrier ensures that all of those operations are completed before step 704, where a second set of memory operations is executed after the data memory barrier.
The techniques described herein may be applied to any reduced instruction set computing environment in which memory operation ordering is desired. It is understood that handheld, portable, and other computing devices and computing objects of all kinds are contemplated for use with the various embodiments, i.e., anywhere memory operations are performed. The general-purpose remote computer described below in FIG. 8 is one example, and the disclosed subject matter can be implemented with any client having network/bus interoperability and interaction. Thus, the disclosed subject matter may be implemented on a chip or system in a network-linked hosted-services environment, for example, where the client device acts merely as an interface to the network/bus, such as an object placed in an appliance.
FIG. 8 illustrates one embodiment of a suitable computing system environment 800 in which aspects of the disclosed subject matter may be implemented. The computing system environment 800 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the disclosed subject matter.
FIG. 8 depicts an exemplary device for implementing the disclosed subject matter, including a general-purpose computing device in the form of a computer 810. Components of computer 810 may include a processing unit 820, a system memory 830, and a system bus 821 that couples various system components, including the system memory, to the processing unit 820. The system bus 821 may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. Computer 810 typically includes a variety of computer-readable media.
The system memory 830 may include computer storage media in the form of volatile and/or nonvolatile memory such as Read Only Memory (ROM) and/or Random Access Memory (RAM). A basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within computer 810, may be stored in memory 830. The computer 810 may also include other removable/non-removable, volatile/nonvolatile computer storage media.
A user may enter commands and information into the computer 810 through input devices such as a keyboard and pointing device, commonly referred to as a mouse, trackball or touch pad.
The computer 810 may operate in a networked or distributed environment using logical connections to one or more other remote computers, such as a remote computer 870, which may in turn have media capabilities different from those of the computer 810.
In addition to the foregoing, the disclosed subject matter may be implemented as a method, apparatus, or article of manufacture using standard manufacturing, programming, or engineering techniques to produce hardware, firmware, software, or any suitable combination thereof that controls an electronic device to implement the disclosed subject matter. A computer-readable medium may include a hardware medium or a software medium, which may include a non-transitory medium or a transport medium.
Claims (13)
1. A processor, comprising:
a load/store component configured to facilitate memory operations, the load/store component further configured to receive a load-acquire instruction and to execute the load-acquire instruction as a load operation followed by a data memory barrier operation, the data memory barrier operation replacing the acquire semantics of the load-acquire instruction.
2. The processor as recited in claim 1, wherein the data memory barrier operation orders memory operations comprising a first set of memory operations occurring before the barrier operation and a second set of memory operations occurring after the barrier operation.
3. The processor of claim 1, wherein:
the load operation specifies an address for accessing first data from memory; and
the load-acquire instruction comprises at least one of a plurality of types of load-acquire instructions.
4. The processor of claim 1, wherein the processor is a Reduced Instruction Set Computing (RISC) processor.
5. The processor of claim 1, wherein the processor is an advanced reduced instruction set computing machine (ARM) processor.
6. The processor of claim 1, wherein the load operation of the load-acquire instruction is a load without acquire semantics.
7. The processor of claim 1, wherein the processor is a server processor configured to execute client requests associated with one or more network interoperable clients.
8. A processor, comprising:
a load/store component configured to facilitate memory operations, the load/store component further configured to receive a store-release instruction and to execute the store-release instruction, wherein executing the store-release instruction comprises a first data memory barrier operation, followed by a store operation, followed by a second data memory barrier operation, the second data memory barrier operation replacing the release semantics of the store-release instruction.
9. The processor of claim 8, wherein:
the first data memory barrier operation and the second data memory barrier operation order memory operations, the memory operations comprising a first set of memory operations occurring before the first data memory barrier operation and a second set of memory operations occurring after the second data memory barrier operation; and
the store operation specifies an address for writing first data to memory.
10. The processor of claim 8, wherein the processor is a Reduced Instruction Set Computing (RISC) processor.
11. The processor of claim 10, wherein the processor is an advanced reduced instruction set computing machine (ARM) processor.
12. The processor of claim 8, wherein the store operation of the store-release instruction is a store without release semantics.
13. The processor of claim 8, wherein the processor is a server processor configured to execute client requests associated with one or more network interoperable clients.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910999320.5A CN110795150A (en) | 2015-07-21 | 2015-07-21 | Implementation of load fetch/store release instruction by load/store operation according to DMB operation |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910999320.5A CN110795150A (en) | 2015-07-21 | 2015-07-21 | Implementation of load fetch/store release instruction by load/store operation according to DMB operation |
CN201580082189.6A CN108139903B (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operations with DMB operation
PCT/US2015/041322 WO2017014752A1 (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operation with dmb operation |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201580082189.6A Division CN108139903B (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operations with DMB operation
Publications (1)
Publication Number | Publication Date |
---|---|
CN110795150A true CN110795150A (en) | 2020-02-14 |
Family
ID=57835180
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910999320.5A Pending CN110795150A (en) | 2015-07-21 | 2015-07-21 | Implementation of load fetch/store release instruction by load/store operation according to DMB operation |
CN201580082189.6A Expired - Fee Related CN108139903B (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operations with DMB operation
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201580082189.6A Expired - Fee Related CN108139903B (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operations with DMB operation
Country Status (4)
Country | Link |
---|---|
EP (1) | EP3326059A4 (en) |
JP (1) | JP6739513B2 (en) |
CN (2) | CN110795150A (en) |
WO (1) | WO2017014752A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10417002B2 (en) | 2017-10-06 | 2019-09-17 | International Business Machines Corporation | Hazard detection of out-of-order execution of load and store instructions in processors without using real addresses |
US10572256B2 (en) | 2017-10-06 | 2020-02-25 | International Business Machines Corporation | Handling effective address synonyms in a load-store unit that operates without address translation |
US10606590B2 (en) | 2017-10-06 | 2020-03-31 | International Business Machines Corporation | Effective address based load store unit in out of order processors |
US11175924B2 (en) | 2017-10-06 | 2021-11-16 | International Business Machines Corporation | Load-store unit with partitioned reorder queues with single cam port |
US10394558B2 (en) | 2017-10-06 | 2019-08-27 | International Business Machines Corporation | Executing load-store operations without address translation hardware per load-store unit port |
US10606591B2 (en) | 2017-10-06 | 2020-03-31 | International Business Machines Corporation | Handling effective address synonyms in a load-store unit that operates without address translation |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000181891A (en) * | 1998-12-18 | 2000-06-30 | Hitachi Ltd | Shared memory access sequence assurance system |
US20050273583A1 (en) * | 2004-06-02 | 2005-12-08 | Paul Caprioli | Method and apparatus for enforcing membar instruction semantics in an execute-ahead processor |
CN101828173A (en) * | 2007-10-18 | 2010-09-08 | Nxp股份有限公司 | Data processing system with a plurality of processors, cache circuits and a shared memory |
US20150046652A1 (en) * | 2013-08-07 | 2015-02-12 | Advanced Micro Devices, Inc. | Write combining cache microarchitecture for synchronization events |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07302200A (en) * | 1994-04-28 | 1995-11-14 | Hewlett Packard Co <Hp> | Loading instruction method of computer provided with instruction forcing sequencing loading operation and sequencing storage |
US7552317B2 (en) * | 2004-05-04 | 2009-06-23 | Sun Microsystems, Inc. | Methods and systems for grouping instructions using memory barrier instructions |
US7725618B2 (en) * | 2004-07-29 | 2010-05-25 | International Business Machines Corporation | Memory barriers primitives in an asymmetric heterogeneous multiprocessor environment |
US8060482B2 (en) * | 2006-12-28 | 2011-11-15 | Intel Corporation | Efficient and consistent software transactional memory |
GB2461716A (en) * | 2008-07-09 | 2010-01-13 | Advanced Risc Mach Ltd | Monitoring circuitry for monitoring accesses to addressable locations in data processing apparatus that occur between the start and end events. |
US8997103B2 (en) * | 2009-09-25 | 2015-03-31 | Nvidia Corporation | N-way memory barrier operation coalescing |
US8935513B2 (en) * | 2012-02-08 | 2015-01-13 | International Business Machines Corporation | Processor performance improvement for instruction sequences that include barrier instructions |
US9582276B2 (en) * | 2012-09-27 | 2017-02-28 | Apple Inc. | Processor and method for implementing barrier operation using speculative and architectural color values |
US9442755B2 (en) * | 2013-03-15 | 2016-09-13 | Nvidia Corporation | System and method for hardware scheduling of indexed barriers |
-
2015
- 2015-07-21 CN CN201910999320.5A patent/CN110795150A/en active Pending
- 2015-07-21 EP EP15899072.1A patent/EP3326059A4/en active Pending
- 2015-07-21 JP JP2018502709A patent/JP6739513B2/en active Active
- 2015-07-21 CN CN201580082189.6A patent/CN108139903B/en not_active Expired - Fee Related
- 2015-07-21 WO PCT/US2015/041322 patent/WO2017014752A1/en unknown
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000181891A (en) * | 1998-12-18 | 2000-06-30 | Hitachi Ltd | Shared memory access sequence assurance system |
US20050273583A1 (en) * | 2004-06-02 | 2005-12-08 | Paul Caprioli | Method and apparatus for enforcing membar instruction semantics in an execute-ahead processor |
CN101828173A (en) * | 2007-10-18 | 2010-09-08 | Nxp股份有限公司 | Data processing system with a plurality of processors, cache circuits and a shared memory |
US20150046652A1 (en) * | 2013-08-07 | 2015-02-12 | Advanced Micro Devices, Inc. | Write combining cache microarchitecture for synchronization events |
Non-Patent Citations (2)
Title |
---|
JEFF PRESHING: ""Acquire and Release Semantics"", 《HTTPS://PRESHING.COM/20120913/ACQUIRE-AND-RELEASE-SEMANTICS/》, pages 1 - 18 * |
LISA HIGHAM et al.: "Programmer-Centric Conditions for Itanium Memory Consistency", pages 58 *
Also Published As
Publication number | Publication date |
---|---|
CN108139903B (en) | 2019-11-15 |
EP3326059A1 (en) | 2018-05-30 |
WO2017014752A1 (en) | 2017-01-26 |
CN108139903A (en) | 2018-06-08 |
JP6739513B2 (en) | 2020-08-12 |
JP2018523235A (en) | 2018-08-16 |
EP3326059A4 (en) | 2019-04-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110795150A (en) | Implementation of load fetch/store release instruction by load/store operation according to DMB operation | |
US11003489B2 (en) | Cause exception message broadcast between processing cores of a GPU in response to indication of exception event | |
DE102018126150A1 (en) | DEVICE, METHOD AND SYSTEMS FOR MULTICAST IN A CONFIGURABLE ROOM ACCELERATOR | |
JP5934094B2 (en) | Mapping across multiple processors of processing logic with data parallel threads | |
DE112017001825T5 (en) | PROCESSORS, METHODS, SYSTEMS AND INSTRUCTIONS FOR ATOMICALLY SAVING DATA WIDER THAN A NATIVELY SUPPORTED DATA WIDTH IN A MEMORY | |
US20130231912A1 (en) | Method, system, and scheduler for simulating multiple processors in parallel | |
CN111459618A (en) | Intelligent GPU scheduling in virtualized environments | |
CN107479981B (en) | Processing method and device for realizing synchronous call based on asynchronous call | |
US9170786B1 (en) | Composable context menus | |
US20150277405A1 (en) | Production plan display method, production plan support method, production plan display apparatus, production plan support apparatus, and recording medium | |
US9703905B2 (en) | Method and system for simulating multiple processors in parallel and scheduler | |
DE112013007703T5 (en) | Command and logic for identifying instructions for retirement in a multi-stranded out-of-order processor | |
DE102015007423A1 (en) | Memory sequencing with coherent and non-coherent subsystems | |
EP3200083A1 (en) | Resource scheduling method and related apparatus | |
US20180067859A1 (en) | Selective allocation of cpu cache slices to database objects | |
EP2988469B1 (en) | A method and apparatus for updating a user interface of one program unit in response to an interaction with a user interface of another program unit | |
US10713085B2 (en) | Asynchronous sequential processing execution | |
CN102867018A (en) | Method for analogue signal communication between threads in database system | |
CN103019844A (en) | Method and device supporting calling of MPI (Message Passing Interface) function through multiple threads | |
CN102736949A (en) | Scheduling of tasks to be performed by a non-coherent device | |
EP3131004A1 (en) | Processor and method | |
US20210055971A1 (en) | Method and node for managing a request for hardware acceleration by means of an accelerator device | |
US8584143B2 (en) | Collection access in a parallel environment | |
Khot | Parallelization in Python | |
CN103714511A (en) | GPU-based branch processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||