US20150317158A1 - Implementation of load acquire/store release instructions using load/store operation with dmb operation - Google Patents


Publication number
US20150317158A1
Authority
US
United States
Prior art keywords
memory
load
operations
processor
store
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/243,949
Inventor
Matthew Ashcraft
Christopher Nelson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ampere Computing LLC
Original Assignee
Applied Micro Circuits Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Applied Micro Circuits Corp
Priority to US14/243,949
Assigned to APPLIED MICRO CIRCUITS CORPORATION. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ASHCRAFT, MATTHEW, NELSON, CHRISTOPHER
Publication of US20150317158A1
Assigned to MACOM CONNECTIVITY SOLUTIONS, LLC. MERGER AND CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: MACOM CONNECTIVITY SOLUTIONS, LLC, APPLIED MICRO CIRCUITS CORPORATION
Assigned to GOLDMAN SACHS BANK USA, AS COLLATERAL AGENT. SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MACOM CONNECTIVITY SOLUTIONS, LLC (SUCCESSOR TO APPLIED MICRO CIRCUITS CORPORATION)
Assigned to MACOM CONNECTIVITY SOLUTIONS, LLC (SUCCESSOR TO APPLIED MICRO CIRCUITS CORPORATION). RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: GOLDMAN SACHS BANK USA, AS COLLATERAL AGENT
Assigned to PROJECT DENVER INTERMEDIATE HOLDINGS LLC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MACOM CONNECTIVITY SOLUTIONS, LLC
Assigned to AMPERE COMPUTING LLC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: PROJECT DENVER INTERMEDIATE HOLDINGS LLC
Priority to US16/424,138 (US11513798B1)
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043 LOAD or STORE instructions; Clear instruction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30076 Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/30087 Synchronisation or serialisation instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3824 Operand accessing
    • G06F9/3834 Maintaining memory consistency

Definitions

  • the subject disclosure relates to memory operation ordering in a reduced instruction set computing environment.
  • a system and method are provided for simplifying load acquire and store release semantics that are used in reduced instruction set computing (RISC).
  • Various embodiments also provide for ordering memory operations with respect to the instructions disclosed herein.
  • a typical load with acquire instruction only requires that memory operations after the load with acquire are ordered after the load with acquire—it does not impose any order on the instructions before the load with acquire (both with respect to the load with acquire and to the subsequent instructions).
  • a load with acquire comprises a data memory barrier that is used in conjunction with a load operation, which guarantees that all accesses prior to and including the load with acquire are ordered before all accesses from instructions after the load with acquire.
  • a system comprises a processor that executes computer-executable instructions to perform operations.
  • the instructions can include a load with acquire instruction that performs memory operation ordering, wherein the load with acquire instruction comprises a load operation followed by a data memory barrier operation.
  • a method comprises executing instructions in a processor.
  • the method can include a load with acquire instruction for performing memory operation ordering, wherein the executing the load with acquire instruction comprises executing a load operation followed by a data memory barrier operation.
  • a system comprises a processor that executes computer-executable instructions to perform operations.
  • the instructions can include a store with release instruction that performs memory operation ordering, wherein the store with release instruction comprises a first data memory barrier operation followed by a store operation followed by a second data memory barrier operation.
  • a method comprises executing instructions in a processor.
  • the method can include a store with release instruction for performing memory operation ordering, wherein the executing the store with release instruction comprises executing a first data memory barrier operation followed by executing a store operation followed by executing a second data memory barrier operation.
  • FIG. 1 is a block diagram illustrating an example, non-limiting embodiment of a system that filters memory operations in accordance with various aspects described herein.
  • FIG. 2 is a block diagram illustrating an example, non-limiting embodiment of a system that filters memory operations in accordance with various aspects described herein.
  • FIG. 3 is a block diagram illustrating an example, non-limiting embodiment of a system that filters memory operations in accordance with various aspects described herein.
  • FIG. 4 is a block diagram illustrating an example, non-limiting embodiment of a system that filters memory operations in accordance with various aspects described herein.
  • FIG. 5 illustrates a flow diagram of an example, non-limiting embodiment of a method for executing a load with acquire instruction.
  • FIG. 6 illustrates a flow diagram of an example, non-limiting embodiment of a method for executing a store with release instruction.
  • FIG. 7 illustrates a flow diagram of an example, non-limiting embodiment of a method for filtering memory operations using a data memory barrier.
  • FIG. 8 illustrates a block diagram of an example electronic computing environment that can be implemented in conjunction with one or more aspects described herein.
  • Various embodiments provide for a system that simplifies load acquire and store release semantics that are used in reduced instruction set computing (RISC).
  • in lock-free computing, threads can manipulate shared memory in two ways: they can compete with each other for a resource, or they can pass information cooperatively from one thread to another.
  • These semantics are complex, however, and replacing the specialized semantics with simple data memory barriers can simplify the process of memory ordering.
  • Translating the semantics into micro-operations, or low-level instructions used to implement complex machine instructions, can avoid having to implement complicated new memory operations.
  • Using a data memory barrier in conjunction with load and store instructions can provide sufficient ordering using simple brute force ordering operations.
  • an instruction is composed of one or more operations, while an operation may include zero or more memory accesses or barriers.
  • a load with acquire instruction creates two operations (a load operation and a barrier operation). This barrier splits all memory accesses into two groups. The first group comprises accesses from all instructions prior to the load with acquire as well as the access from the load operation that belongs to the load with acquire. The second group comprises accesses from all instructions after the load with acquire instruction.
  • FIG. 1 illustrates a system 100 that filters memory operations using a data memory barrier in a RISC processor, processing environment, or architecture.
  • the RISC processor can include variations of ARM processors, and specifically, in this embodiment, an ARMv8 processor.
  • system 100 can include load/store component 102 that can be communicatively coupled and/or operationally coupled to processor 104 for facilitating operation and/or execution of computer executable instructions and/or components by system 100 , memory 106 for storing data and/or computer executable instructions and/or components for execution by system 100 utilizing processor 104 , for instance, and storage component 108 for providing longer term storage for data and/or computer executable instructions and/or components that can be executed by system 100 using processor 104 , for example.
  • system 100 can receive input 110 that can be transformed by execution of one or more computer executable instructions and/or components, by the processor 104 , from a first state to a second state, wherein the first state can be distinguished and/or is discernible and/or is different from the second state.
  • System 100 can also produce output 112 that can include an article that has been transformed, through processing by system 100 , into a different state or thing.
  • System 200 includes a data memory barrier 204 that enforces an ordering constraint on prior instructions 202 and subsequent instructions 206 .
  • the data memory barrier 204 is a type of barrier operation which causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier operation. This typically means that certain operations are guaranteed to be performed before the barrier, and others after.
  • Data memory barrier 204 ensures that prior instructions 202 are performed and completed before subsequent instructions 206 are executed.
  • Prior instructions 202 and subsequent instructions 206 can each include various combinations of basic load and store instructions plus more complex variants of these instructions (e.g., load-exclusive with acquire, store-exclusive with release, etc.).
  • the prior instructions 202 and subsequent instructions 206 can comprise load or store instructions that are configured for loading a first set of data from a memory and storing a second set of data to the memory.
  • the data memory barrier 204 can be configured for ordering the memory operations associated with loading and storing the data, wherein the type of ordering accomplished is based on the position in a program order of the data memory barrier relative to the one or more load instructions and store instructions.
  • System 300 can include a data memory barrier 304 that orders load operation 302 that precedes the data memory barrier 304 in a program order.
  • Data memory barrier 304 ensures that load operation 302 is performed and completed before subsequent instructions are executed.
  • System 300 shows a simple load with acquire instruction that comprises a load operation and a data memory barrier operation. In other embodiments, other types of load operations can result in different load instructions, such as load exclusive with acquire and other variants.
  • System 400 can include data memory barriers 402 and 406 on either side of a store operation 404 in a program order.
  • Data memory barrier 402 ensures that all prior instructions/operations have ceased before store operation 404 is initiated, while data memory barrier 406 ensures that store operation 404 is completed before any subsequent memory instructions/operations occur.
  • the first data memory barrier 402 and the second data memory barrier 406 also create an ordering to ensure that store with release and load with acquire instructions are observed in program order.
  • Methodology 500 can start at 502 , where a load operation is executed, wherein the load operation specifies an address for accessing a data from a memory.
  • a data memory barrier can be executed.
  • the data memory barrier is a type of barrier operation which causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that certain operations are guaranteed to be performed before the barrier, and others after.
  • Data memory barrier ensures that prior instructions are performed and completed before subsequent instructions are executed. In this instance, the data memory barrier operation ensures that the prior load operation is performed and completed before subsequent instructions are executed.
  • Methodology 600 can start at 602 , where a first data memory barrier operation is executed.
  • the data memory barrier is a type of barrier instruction which causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction.
  • a store operation is executed.
  • the store operation specifies an address for writing data to memory.
  • a second data memory barrier operation is executed. Having a store operation between two data memory barrier operations ensures that all other memory operations have been performed and are completed before the store operation is executed, and that no other memory operations are allowed until the store operation is completed. In this way, the store with release instruction performs memory operation ordering using simple store and data memory barrier operations.
  • Methodology 700 can start at 702 , where a first set of memory operations are executed before a barrier.
  • the barrier ensures that all instructions are completed before step 704 , where a second set of memory operations are executed after the data memory barrier.
  • the techniques described herein can be applied to any reduced instruction set computing environment where it is desirable to perform memory operation ordering or filtering. It is to be understood, therefore, that handheld, portable and other computing devices and computing objects of all kinds are contemplated for use in connection with the various non-limiting embodiments, i.e., anywhere that memory operation ordering may be performed. Accordingly, the general purpose remote computer described below in FIG. 8 is but one example, and the disclosed subject matter can be implemented with any client having network/bus interoperability and interaction.
  • the disclosed subject matter can be implemented on chips or systems in an environment of networked hosted services in which very little or minimal client resources are implicated, e.g., a networked environment in which the client device serves merely as an interface to the network/bus, such as an object placed in an appliance.
  • aspects of the disclosed subject matter can partly be implemented via an operating system, for use by a developer of services for a device or object, and/or included within application software that operates in connection with the component(s) of the disclosed subject matter.
  • Software may be described in the general context of computer executable instructions, such as program modules or components, being executed by one or more computer(s), such as projection display devices, viewing devices, or other devices.
  • FIG. 8 thus illustrates an example of a suitable computing system environment 800 in which some aspects of the disclosed subject matter can be implemented, although as made clear above, the computing system environment 800 is only one example of a suitable computing environment for a device and is not intended to suggest any limitation as to the scope of use or functionality of the disclosed subject matter. Neither should the computing environment 800 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 800 .
  • an exemplary device for implementing the disclosed subject matter includes a general-purpose computing device in the form of a computer 810 .
  • Components of computer 810 may include, but are not limited to, a processing unit 820 , a system memory 830 , and a system bus 821 that couples various system components including the system memory to the processing unit 820 .
  • the system bus 821 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
  • Computer 810 typically includes a variety of computer readable media.
  • Computer readable media can be any available media that can be accessed by computer 810 .
  • Computer readable media can comprise computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CDROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 810 .
  • Communication media typically embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
  • the system memory 830 may include computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) and/or random access memory (RAM).
  • a basic input/output system (BIOS) containing the basic routines that help to transfer information between elements within computer 810 , such as during start-up, may be stored in memory 830 .
  • Memory 830 typically also contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 820 .
  • memory 830 may also include an operating system, application programs, other program modules, and program data.
  • the computer 810 may also include other removable/non-removable, volatile/nonvolatile computer storage media.
  • computer 810 could include a hard disk drive that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive that reads from or writes to a removable, nonvolatile magnetic disk, and/or an optical disk drive that reads from or writes to a removable, nonvolatile optical disk, such as a CD-ROM or other optical media.
  • Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like.
  • a hard disk drive is typically connected to the system bus 821 through a non-removable memory interface such as an interface
  • a magnetic disk drive or optical disk drive is typically connected to the system bus 821 by a removable memory interface, such as an interface.
  • a user can enter commands and information into the computer 810 through input devices such as a keyboard and pointing device, commonly referred to as a mouse, trackball, or touch pad.
  • Other input devices can include a microphone, joystick, game pad, satellite dish, scanner, wireless device keypad, voice commands, or the like.
  • user input 840 and associated interface(s) that are coupled to the system bus 821 , but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB).
  • a graphics subsystem can also be connected to the system bus 821 .
  • a projection unit in a projection display device, or a HUD in a viewing device or other type of display device can also be connected to the system bus 821 via an interface, such as output interface 850 , which may in turn communicate with video memory.
  • computers can also include other peripheral output devices such as speakers which can be connected through output interface 850 .
  • the computer 810 can operate in a networked or distributed environment using logical connections to one or more other remote computer(s), such as remote computer 870 , which can in turn have media capabilities different from device 810 .
  • the remote computer 870 can be a personal computer, a server, a router, a network PC, a peer device, personal digital assistant (PDA), cell phone, handheld computing device, a projection display device, a viewing device, or other common network node, or any other remote media consumption or transmission device, and may include any or all of the elements described above relative to the computer 810 .
  • LAN local area network
  • WAN wide area network
  • Such networking environments are commonplace in homes, offices, enterprise-wide computer networks, intranets and the Internet.
  • When used in a LAN networking environment, the computer 810 can be connected to the LAN 871 through a network interface or adapter. When used in a WAN networking environment, the computer 810 can typically include a communications component, such as a modem, or other means for establishing communications over the WAN, such as the Internet.
  • a communications component, such as a wireless communications component, a modem and so on, which can be internal or external, can be connected to the system bus 821 via the user input interface of input 840, or other appropriate mechanism.
  • program modules depicted relative to the computer 810 can be stored in a remote memory storage device. It will be appreciated that the network connections shown and described are exemplary and other means of establishing a communications link between the computers can be used.
  • NAND and NOR memory refer to two types of flash memory based on the NAND and NOR logic gates that they respectively use.
  • the NAND type is primarily used in main memory, memory cards, USB flash drives, solid-state drives, and similar products, for general storage and transfer of data.
  • the NOR type which allows true random access and therefore direct code execution, is used as a replacement for the older EPROM and as an alternative to certain kinds of ROM applications.
  • NOR flash memory can emulate ROM primarily at the machine code level; many digital designs need ROM (or PLA) structures for other uses, often at significantly higher speeds than (economical) flash memory may achieve.
  • NAND or NOR flash memory is also often used to store configuration data in numerous digital products, a task previously made possible by EEPROMs or battery-powered static RAM.
  • a component can be one or more transistors, a memory cell, an arrangement of transistors or memory cells, a gate array, a programmable gate array, an application specific integrated circuit, a controller, a processor, a process running on the processor, an object, executable, program or application accessing or interfacing with semiconductor memory, a computer, or the like, or a suitable combination thereof.
  • the component can include erasable programming (e.g., process instructions at least in part stored in erasable memory) or hard programming (e.g., process instructions burned into non-erasable memory at manufacture).
  • an architecture can include an arrangement of electronic hardware (e.g., parallel or serial transistors), processing instructions and a processor, which implement the processing instructions in a manner suitable to the arrangement of electronic hardware.
  • an architecture can include a single component (e.g., a transistor, a gate array, . . . ) or an arrangement of components (e.g., a series or parallel arrangement of transistors, a gate array connected with program circuitry, power leads, electrical ground, input signal lines and output signal lines, and so on).
  • a system can include one or more components as well as one or more architectures.
  • One example system can include a switching block architecture comprising crossed input/output lines and pass gate transistors, as well as power source(s), signal generator(s), communication bus(ses), controllers, I/O interface, address registers, and so on. It is to be appreciated that some overlap in definitions is anticipated, and an architecture or a system can be a stand-alone component, or a component of another architecture, system, etc.
  • the disclosed subject matter can be implemented as a method, apparatus, or article of manufacture using typical manufacturing, programming or engineering techniques to produce hardware, firmware, software, or any suitable combination thereof to control an electronic device to implement the disclosed subject matter.
  • the terms “apparatus” and “article of manufacture” where used herein are intended to encompass an electronic device, a semiconductor device, a computer, or a computer program accessible from any computer-readable device, carrier, or media.
  • Computer-readable media can include hardware media, or software media.
  • the media can include non-transitory media, or transport media.
  • non-transitory media can include computer readable hardware media.
  • Computer readable hardware media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), smart cards, and flash memory devices (e.g., card, stick, key drive . . . ).
  • Computer-readable transport media can include carrier waves, or the like.
  • the word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word exemplary is intended to present concepts in a concrete fashion.
  • the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from context, “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, if X employs A; X employs B; or X employs both A and B, then “X employs A or B” is satisfied under any of the foregoing instances.
  • the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form.
  • the terms (including a reference to a “means”) used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., a functional equivalent), even though not structurally equivalent to the disclosed structure, which performs the function in the herein illustrated exemplary aspects of the embodiments.
  • while a particular feature may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application.
  • the embodiments include a system as well as a computer-readable medium having computer-executable instructions for performing the acts and/or events of the various processes.
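
The decomposition described in the definitions above (a load with acquire as a load operation followed by a data memory barrier; a store with release as a barrier, a store operation, and a second barrier) can be sketched as a small decode routine. This is only an illustrative sketch: the enum names, `MAX_UOPS`, and `decode` are hypothetical and are not taken from the disclosure or from any real instruction decoder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction kinds, named for illustration only. */
typedef enum {
    INSTR_LOAD,
    INSTR_STORE,
    INSTR_LOAD_ACQUIRE,
    INSTR_STORE_RELEASE
} instr_t;

/* Hypothetical micro-operations: plain load, plain store, data memory barrier. */
typedef enum { UOP_LOAD, UOP_STORE, UOP_DMB } uop_t;

#define MAX_UOPS 3 /* store with release expands to three operations */

/* Expand one instruction into its operations, in program order.
 * Returns the number of entries written to out. */
static size_t decode(instr_t instr, uop_t out[MAX_UOPS])
{
    assert(out != NULL);
    switch (instr) {
    case INSTR_LOAD:
        out[0] = UOP_LOAD;
        return 1;
    case INSTR_STORE:
        out[0] = UOP_STORE;
        return 1;
    case INSTR_LOAD_ACQUIRE:
        /* load operation followed by a data memory barrier */
        out[0] = UOP_LOAD;
        out[1] = UOP_DMB;
        return 2;
    case INSTR_STORE_RELEASE:
        /* first barrier, store operation, second barrier */
        out[0] = UOP_DMB;
        out[1] = UOP_STORE;
        out[2] = UOP_DMB;
        return 3;
    }
    return 0; /* unknown instruction: no operations emitted */
}
```

Under these assumptions, no acquire-specific or release-specific memory operation is needed: the load/store path only has to handle plain loads, plain stores, and barriers.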

Abstract

A system and method are provided for simplifying load acquire and store release semantics that are used in reduced instruction set computing (RISC). Translating the semantics into micro-operations, or low-level instructions used to implement complex machine instructions, can avoid having to implement complicated new memory operations. Using one or more data memory barrier operations in conjunction with load and store operations can provide sufficient ordering as a data memory barrier ensures that prior instructions are performed and completed before subsequent instructions are executed.

Description

    TECHNICAL FIELD
  • The subject disclosure relates to memory operation ordering in a reduced instruction set computing environment.
  • BACKGROUND
  • In lock-free computing, there are two ways in which threads can manipulate shared memory: they can compete with each other for a resource, or they can pass information cooperatively from one thread to another. Acquire and release semantics are used to accomplish passing information cooperatively from one thread to another. Acquire and release semantics provide a structural system for ensuring that memory operations are ordered correctly to avoid errors. A store release instruction ensures that all previous instructions are completed before it completes, and a load acquire instruction ensures that all following instructions complete only after it completes. Accordingly, to properly order memory operations using acquire and release semantics, complex combinations of store release and load acquire instructions are necessary.
  • The above description is merely intended to provide a contextual overview of current techniques for performing memory operation ordering and is not intended to be exhaustive.
  • SUMMARY
  • The following presents a simplified summary in order to provide a basic understanding of some aspects described herein. This summary is not an extensive overview of the disclosed subject matter. It is intended neither to identify key or critical elements of the disclosure nor to delineate the scope thereof. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
  • A system and method are provided for simplifying load acquire and store release semantics that are used in reduced instruction set computing (RISC). Various embodiments also provide for ordering memory operations with respect to the instructions disclosed herein. A typical load with acquire instruction only requires that memory operations after the load with acquire are ordered after the load with acquire; it does not impose any order on the instructions before the load with acquire (both with respect to the load with acquire and to the subsequent instructions). In an embodiment of the disclosure, however, a load with acquire comprises a data memory barrier that is used in conjunction with a load operation, which guarantees that all accesses prior to and including the load with acquire are ordered before all accesses from instructions after the load with acquire.
  • Similarly, traditional store with release instructions impose ordering between the access from the store with release and the accesses of all prior instructions (but not subsequent instructions). In an embodiment of the disclosure, however, a data memory barrier at the beginning of the store with release provides a strong ordering between prior access and the access associated with the store with release.
  • In an example embodiment, a system comprises a processor that executes computer-executable instructions to perform operations. The instructions can include a load with acquire instruction that performs memory operation ordering, wherein the load with acquire instruction comprises a load operation followed by a data memory barrier operation.
  • In another example embodiment, a method comprises executing instructions in a processor. The method can include a load with acquire instruction for performing memory operation ordering, wherein the executing the load with acquire instruction comprises executing a load operation followed by a data memory barrier operation.
  • In an example embodiment, a system comprises a processor that executes computer-executable instructions to perform operations. The instructions can include a store with release instruction that performs memory operation ordering, wherein the store with release instruction comprises a first data memory barrier operation followed by a store operation followed by a second data memory barrier operation.
  • In an example embodiment, a method comprises executing instructions in a processor. The method can include a store with release instruction for performing memory operation ordering, wherein the executing the store with release instruction comprises executing a first data memory barrier operation followed by executing a store operation followed by executing a second data memory barrier operation.
  • The following description and the annexed drawings set forth in detail certain illustrative aspects of the subject disclosure. These aspects are indicative, however, of but a few of the various ways in which the principles of various disclosed aspects can be employed and the disclosure is intended to include all such aspects and their equivalents. Other advantages and novel features will become apparent from the following detailed description when considered in conjunction with the drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram illustrating an example, non-limiting embodiment of a system that filters memory operations in accordance with various aspects described herein.
  • FIG. 2 is a block diagram illustrating an example, non-limiting embodiment of a system that filters memory operations in accordance with various aspects described herein.
  • FIG. 3 is a block diagram illustrating an example, non-limiting embodiment of a system that filters memory operations in accordance with various aspects described herein.
  • FIG. 4 is a block diagram illustrating an example, non-limiting embodiment of a system that filters memory operations in accordance with various aspects described herein.
  • FIG. 5 illustrates a flow diagram of an example, non-limiting embodiment of a method for executing a load with acquire instruction.
  • FIG. 6 illustrates a flow diagram of an example, non-limiting embodiment of a method for executing a store with release instruction.
  • FIG. 7 illustrates a flow diagram of an example, non-limiting embodiment of a method for filtering memory operations using a data memory barrier.
  • FIG. 8 illustrates a block diagram of an example electronic computing environment that can be implemented in conjunction with one or more aspects described herein.
  • DETAILED DESCRIPTION
  • The disclosure herein is described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the subject innovation. It may be evident, however, that various disclosed aspects can be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing the subject innovation.
  • Various embodiments provide for a system that simplifies load acquire and store release semantics that are used in reduced instruction set computing (RISC). In lock-free computing, there are two ways in which threads can manipulate shared memory: they can compete with each other for a resource, or they can pass information cooperatively from one thread to another. Acquire and release semantics are used for the latter. These semantics are complex, however, and replacing the specialized semantics with simple data memory barriers can simplify the process of memory ordering. Translating the semantics into micro-operations, or low-level instructions used to implement complex machine instructions, can avoid having to implement complicated new memory operations. Using a data memory barrier in conjunction with load and store instructions can provide sufficient ordering using simple brute force ordering operations.
  • As used in this disclosure, the terms “instruction”, “operation”, and “access” refer to separate processes and are not interchangeable. An instruction is composed of one or more operations, while an operation may include zero or more memory accesses or barriers. By way of example, a load with acquire instruction creates two operations (a load operation and a barrier operation). This barrier splits all memory accesses into two groups. The first group comprises accesses from all instructions prior to the load with acquire as well as the access from the load operation that belongs to the load with acquire. The second group comprises accesses from all instructions after the load with acquire instruction.
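  • The instruction/operation distinction above can be sketched as a small decoder model. This is only an illustration under stated assumptions: the names `Instr`, `MicroOp`, `decode`, and `access_groups` are hypothetical and do not come from the disclosure, and the model tracks only the kind of each operation, not addresses or data.

```rust
// Hypothetical model of instruction-to-operation expansion; all names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MicroOp {
    Load,
    Store,
    Dmb, // data memory barrier operation
}

#[derive(Debug, Clone, Copy)]
enum Instr {
    Load,
    Store,
    LoadAcquire,
    StoreRelease,
}

/// Expand one instruction into the operations described in the text.
fn decode(instr: Instr) -> Vec<MicroOp> {
    match instr {
        Instr::Load => vec![MicroOp::Load],
        Instr::Store => vec![MicroOp::Store],
        // Load operation followed by a barrier operation.
        Instr::LoadAcquire => vec![MicroOp::Load, MicroOp::Dmb],
        // Barrier, store operation, barrier.
        Instr::StoreRelease => vec![MicroOp::Dmb, MicroOp::Store, MicroOp::Dmb],
    }
}

/// Split the memory accesses of a program into the groups separated by barriers.
fn access_groups(program: &[Instr]) -> Vec<Vec<MicroOp>> {
    let mut groups = vec![Vec::new()];
    for op in program.iter().flat_map(|i| decode(*i)) {
        if op == MicroOp::Dmb {
            groups.push(Vec::new()); // a barrier starts a new group
        } else {
            groups.last_mut().unwrap().push(op);
        }
    }
    groups
}

fn main() {
    let program = [Instr::Store, Instr::LoadAcquire, Instr::Load];
    println!("{:?}", access_groups(&program));
}
```

  • In this sketch, a store followed by a load with acquire and then a plain load yields two groups: the first holds the earlier store and the load belonging to the load with acquire, and the second holds the later load.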
  • Turning now to the illustrations, FIG. 1 illustrates a system 100 that filters memory operations using a data memory barrier in a RISC processor, processing environment, or architecture. The RISC processor can include variations of ARM processors, and specifically, in this embodiment, an ARMv8 processor. As illustrated, system 100 can include load/store component 102 that can be communicatively coupled and/or operationally coupled to processor 104 for facilitating operation and/or execution of computer executable instructions and/or components by system 100, memory 106 for storing data and/or computer executable instructions and/or components for execution by system 100 utilizing processor 104, for instance, and storage component 108 for providing longer term storage for data and/or computer executable instructions and/or components that can be executed by system 100 using processor 104, for example. Additionally, and as depicted, system 100 can receive input 110 that can be transformed by execution of one or more computer executable instructions and/or components, by the processor 104, from a first state to a second state, wherein the first state can be distinguished and/or is discernible and/or is different from the second state. System 100 can also produce output 112 that can include an article that has been transformed, through processing by system 100, into a different state or thing.
  • Turning now to FIG. 2, illustrated is a block diagram of an example, non-limiting embodiment of a system that filters memory operations in accordance with various aspects described herein. System 200 includes a data memory barrier 204 that enforces an ordering constraint on prior instructions 202 and subsequent instructions 206. The data memory barrier 204 is a type of barrier operation which causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier operation. This typically means that certain operations are guaranteed to be performed before the barrier, and others after. Data memory barrier 204 ensures that prior instructions 202 are performed and completed before subsequent instructions 206 are executed. Prior instructions 202 and subsequent instructions 206 can each include various combinations of basic load and store instructions plus more complex variants of these instructions (e.g., load-exclusive with acquire, store-exclusive with release, etc.).
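  • The ordering constraint described for FIG. 2 can be expressed as a simple check on a completion order. The following is a minimal sketch, not the disclosed hardware: `Op`, `epochs`, and `respects_barriers` are hypothetical names, each access is assumed to carry a unique id, and accesses between the same pair of barriers are assumed free to complete in any order.

```rust
// Hypothetical checker: does a completion order respect every barrier in program order?
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Access(u32), // a load or store, identified by an assumed-unique id
    Dmb,         // data memory barrier
}

/// Pair each access id with the number of barriers that precede it in program order.
fn epochs(program: &[Op]) -> Vec<(u32, usize)> {
    let mut epoch = 0;
    let mut out = Vec::new();
    for op in program {
        match op {
            Op::Access(id) => out.push((*id, epoch)),
            Op::Dmb => epoch += 1,
        }
    }
    out
}

/// True if `completed` (access ids in completion order) covers every access and
/// never completes an access before one that sits ahead of an earlier barrier.
fn respects_barriers(program: &[Op], completed: &[u32]) -> bool {
    let table = epochs(program);
    let epoch_of = |id: u32| table.iter().find(|(i, _)| *i == id).map(|(_, e)| *e);
    let mut seq = Vec::new();
    for id in completed {
        match epoch_of(*id) {
            Some(e) => seq.push(e),
            None => return false, // unknown access id
        }
    }
    // Epoch numbers must be non-decreasing along the completion order.
    seq.len() == table.len() && seq.windows(2).all(|w| w[0] <= w[1])
}

fn main() {
    let program = [Op::Access(1), Op::Access(2), Op::Dmb, Op::Access(3)];
    println!("{}", respects_barriers(&program, &[2, 1, 3]));
    println!("{}", respects_barriers(&program, &[1, 3, 2]));
}
```

  • Under these assumptions, reordering accesses 1 and 2 is accepted because both precede the barrier, while completing access 3 ahead of access 2 is rejected.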
  • In an embodiment, the prior instructions 202 and subsequent instructions 206 can comprise load or store instructions that are configured for loading a first set of data from a memory and storing a second set of data to the memory. The data memory barrier 204 can be configured for ordering the memory operations associated with loading and storing the data, wherein the type of ordering accomplished is based on the position in a program order of the data memory barrier relative to the one or more load instructions and store instructions.
  • Turning now to FIG. 3, a block diagram illustrating an example, non-limiting embodiment of a system that filters memory operations via a load with acquire instruction in accordance with various aspects described herein is shown. System 300 can include a data memory barrier 304 that orders load operation 302 that precedes the data memory barrier 304 in a program order. Data memory barrier 304 ensures that load operation 302 is performed and completed before subsequent instructions are executed. System 300 shows a simple load with acquire instruction that comprises a load operation and a data memory barrier operation. In other embodiments, other types of load operations can result in different load instructions, such as load exclusive with acquire and other variants.
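  • A software analogue of the load-then-barrier shape of FIG. 3 can be sketched with Rust atomics. This is an assumption-laden illustration rather than the disclosed micro-architecture: `load_with_barrier` and `run_demo` are hypothetical names, and a sequentially consistent fence stands in for the data memory barrier (how a compiler lowers that fence on a given target is not guaranteed here).

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Relaxed load followed by a full fence: load operation, then barrier operation.
fn load_with_barrier(flag: &AtomicBool) -> bool {
    let v = flag.load(Ordering::Relaxed); // plain load operation
    fence(Ordering::SeqCst);              // stand-in for the data memory barrier
    v
}

/// Hypothetical demo: returns the payload the consumer reads once the flag is set.
fn run_demo() -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, r) = (Arc::clone(&data), Arc::clone(&ready));
    let producer = thread::spawn(move || {
        d.store(42, Ordering::Relaxed);   // write the payload
        r.store(true, Ordering::Release); // conventional release store publishes it
    });

    // Spin until the flag is observed; the fence orders the later payload read after it.
    while !load_with_barrier(&ready) {
        std::hint::spin_loop();
    }
    let seen = data.load(Ordering::Relaxed);
    producer.join().unwrap();
    seen
}

fn main() {
    println!("consumer saw {}", run_demo());
}
```

  • Because the fence follows the load that observes the flag, the later read of the payload is ordered after the producer's writes, which is the acquire behavior the text attributes to the load operation plus barrier pair.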
  • Turning now to FIG. 4, illustrated is an example, non-limiting embodiment of a system that performs a store with release instruction in accordance with various aspects described herein. System 400 can include data memory barriers 402 and 406 on either side of a store operation 404 in a program order. Data memory barrier 402 ensures that all prior instructions/operations have ceased before store operation 404 is initiated, while data memory barrier 406 ensures that store operation 404 is completed before any subsequent memory instructions/operations occur. In addition, the first data memory barrier 402 and the second data memory barrier 406 also create an ordering to ensure that store with release and load with acquire instructions are observed in program order.
  • In view of the example systems described above, methods that may be implemented in accordance with the described subject matter may be better appreciated with reference to the flow charts of FIGS. 5-7. While for purposes of simplicity, the methods are shown and described as a series of blocks, it is to be understood and appreciated that the claimed subject matter is not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Moreover, not all illustrated blocks may be required to implement the methods described hereinafter.
  • Referring now to FIG. 5, illustrated is a flow diagram of an example, non-limiting embodiment of a method for executing a load with acquire instruction. Methodology 500 can start at 502, where a load operation is executed, wherein the load operation specifies an address for accessing data from a memory.
  • At 504, a data memory barrier can be executed. The data memory barrier is a type of barrier operation which causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that certain operations are guaranteed to be performed before the barrier, and others after. The data memory barrier ensures that prior instructions are performed and completed before subsequent instructions are executed. In this instance, the data memory barrier operation ensures that the prior load operation is performed and completed before subsequent instructions are executed.
  • Turning now to FIG. 6, illustrated is a flow diagram of an example, non-limiting embodiment of a method for executing a store with release instruction. Methodology 600 can start at 602, where a first data memory barrier operation is executed. The data memory barrier is a type of barrier instruction which causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction.
  • At 604, a store operation is executed. The store operation specifies an address for writing data to memory. At 606, a second data memory barrier operation is executed. Having a store operation between two data memory barrier operations ensures that all other memory operations have been performed and are completed before the store operation is executed, and that no other memory operations are allowed until the store operation is completed. In this way, the store with release instruction performs memory operation ordering using simple store and data memory barrier operations.
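  • The barrier, store, barrier sequence of methodology 600 can be mimicked in software as follows. This is a sketch under assumptions, not the disclosed implementation: `store_with_barriers` and `run_demo` are hypothetical names, and sequentially consistent fences stand in for the two data memory barrier operations.

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// First barrier, plain store, second barrier, in program order.
fn store_with_barriers(flag: &AtomicBool, value: bool) {
    fence(Ordering::SeqCst);              // orders all prior accesses before the store
    flag.store(value, Ordering::Relaxed); // plain store operation
    fence(Ordering::SeqCst);              // orders the store before later accesses
}

/// Hypothetical demo: returns the payload a consumer thread reads after seeing the flag.
fn run_demo() -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, r) = (Arc::clone(&data), Arc::clone(&ready));
    let consumer = thread::spawn(move || {
        // Conventional acquire load on the consumer side.
        while !r.load(Ordering::Acquire) {
            std::hint::spin_loop();
        }
        d.load(Ordering::Relaxed)
    });

    data.store(7, Ordering::Relaxed);  // payload written before the flag
    store_with_barriers(&ready, true); // publish the flag
    consumer.join().unwrap()
}

fn main() {
    println!("consumer saw {}", run_demo());
}
```

  • The first fence alone is enough to give the store its release behavior in this model; the second fence mirrors the stronger ordering the text describes, in which later accesses wait for the store.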
  • Turning now to FIG. 7, illustrated is a flow diagram of an example, non-limiting embodiment of a method for filtering memory operations using a data memory barrier. Methodology 700 can start at 702, where a first set of memory operations is executed before a barrier. The barrier ensures that all instructions are completed before step 704, where a second set of memory operations is executed after the data memory barrier.
  • Example Computing Environment
  • As mentioned, advantageously, the techniques described herein can be applied to any reduced instruction set computing environment where it is desirable to perform memory operation ordering or filtering. It is to be understood, therefore, that handheld, portable and other computing devices and computing objects of all kinds are contemplated for use in connection with the various non-limiting embodiments, i.e., anywhere that memory operation ordering may be performed. Accordingly, the general purpose remote computer described below in FIG. 8 is but one example, and the disclosed subject matter can be implemented with any client having network/bus interoperability and interaction. Thus, the disclosed subject matter can be implemented on chips or systems in an environment of networked hosted services in which very little or minimal client resources are implicated, e.g., a networked environment in which the client device serves merely as an interface to the network/bus, such as an object placed in an appliance.
  • Although not required, some aspects of the disclosed subject matter can partly be implemented via an operating system, for use by a developer of services for a device or object, and/or included within application software that operates in connection with the component(s) of the disclosed subject matter. Software may be described in the general context of computer executable instructions, such as program modules or components, being executed by one or more computer(s), such as projection display devices, viewing devices, or other devices. Those skilled in the art will appreciate that the disclosed subject matter may be practiced with other computer system configurations and protocols.
  • FIG. 8 thus illustrates an example of a suitable computing system environment 800 in which some aspects of the disclosed subject matter can be implemented, although as made clear above, the computing system environment 800 is only one example of a suitable computing environment for a device and is not intended to suggest any limitation as to the scope of use or functionality of the disclosed subject matter. Neither should the computing environment 800 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 800.
  • With reference to FIG. 8, an exemplary device for implementing the disclosed subject matter includes a general-purpose computing device in the form of a computer 810. Components of computer 810 may include, but are not limited to, a processing unit 820, a system memory 830, and a system bus 821 that couples various system components including the system memory to the processing unit 820. The system bus 821 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
  • Computer 810 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 810. By way of example, and not limitation, computer readable media can comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CDROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 810. Communication media typically embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
  • The system memory 830 may include computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) and/or random access memory (RAM). A basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within computer 810, such as during start-up, may be stored in memory 830. Memory 830 typically also contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 820. By way of example, and not limitation, memory 830 may also include an operating system, application programs, other program modules, and program data.
  • The computer 810 may also include other removable/non-removable, volatile/nonvolatile computer storage media. For example, computer 810 could include a hard disk drive that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive that reads from or writes to a removable, nonvolatile magnetic disk, and/or an optical disk drive that reads from or writes to a removable, nonvolatile optical disk, such as a CD-ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. A hard disk drive is typically connected to the system bus 821 through a non-removable memory interface such as an interface, and a magnetic disk drive or optical disk drive is typically connected to the system bus 821 by a removable memory interface, such as an interface.
  • A user can enter commands and information into the computer 810 through input devices such as a keyboard and pointing device, commonly referred to as a mouse, trackball, or touch pad. Other input devices can include a microphone, joystick, game pad, satellite dish, scanner, wireless device keypad, voice commands, or the like. These and other input devices are often connected to the processing unit 820 through user input 840 and associated interface(s) that are coupled to the system bus 821, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB). A graphics subsystem can also be connected to the system bus 821. A projection unit in a projection display device, or a HUD in a viewing device or other type of display device can also be connected to the system bus 821 via an interface, such as output interface 850, which may in turn communicate with video memory. In addition to a monitor, computers can also include other peripheral output devices such as speakers which can be connected through output interface 850.
  • The computer 810 can operate in a networked or distributed environment using logical connections to one or more other remote computer(s), such as remote computer 870, which can in turn have media capabilities different from device 810. The remote computer 870 can be a personal computer, a server, a router, a network PC, a peer device, personal digital assistant (PDA), cell phone, handheld computing device, a projection display device, a viewing device, or other common network node, or any other remote media consumption or transmission device, and may include any or all of the elements described above relative to the computer 810. The logical connections depicted in FIG. 8 include a network 871, such as a local area network (LAN) or a wide area network (WAN), but can also include other networks/buses, either wired or wireless. Such networking environments are commonplace in homes, offices, enterprise-wide computer networks, intranets and the Internet.
  • When used in a LAN networking environment, the computer 810 can be connected to the LAN 871 through a network interface or adapter. When used in a WAN networking environment, the computer 810 can typically include a communications component, such as a modem, or other means for establishing communications over the WAN, such as the Internet. A communications component, such as a wireless communications component, a modem and so on, which can be internal or external, can be connected to the system bus 821 via the user input interface of input 840, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 810, or portions thereof, can be stored in a remote memory storage device. It will be appreciated that the network connections shown and described are exemplary and other means of establishing a communications link between the computers can be used.
  • Reference throughout this specification to “one embodiment,” “an embodiment,” “a disclosed aspect,” or “an aspect” means that a particular feature, structure, or characteristic described in connection with the embodiment or aspect is included in at least one embodiment or aspect of the present disclosure. Thus, the appearances of the phrase “in one embodiment,” “in one aspect,” or “in an embodiment,” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in various disclosed embodiments.
  • As utilized herein, NAND and NOR memory refer to two types of flash memory based on the NAND and NOR logic gates that they respectively use. The NAND type is primarily used in main memory, memory cards, USB flash drives, solid-state drives, and similar products, for general storage and transfer of data. The NOR type, which allows true random access and therefore direct code execution, is used as a replacement for the older EPROM and as an alternative to certain kinds of ROM applications. However, NOR flash memory can emulate ROM primarily at the machine code level; many digital designs need ROM (or PLA) structures for other uses, often at significantly higher speeds than (economical) flash memory may achieve. NAND or NOR flash memory is also often used to store configuration data in numerous digital products, a task previously made possible by EEPROMs or battery-powered static RAM.
  • As utilized herein, terms “component,” “system,” “architecture” and the like are intended to refer to a computer or electronic-related entity, either hardware, a combination of hardware and software, software (e.g., in execution), or firmware. For example, a component can be one or more transistors, a memory cell, an arrangement of transistors or memory cells, a gate array, a programmable gate array, an application specific integrated circuit, a controller, a processor, a process running on the processor, an object, executable, program or application accessing or interfacing with semiconductor memory, a computer, or the like, or a suitable combination thereof. The component can include erasable programming (e.g., process instructions at least in part stored in erasable memory) or hard programming (e.g., process instructions burned into non-erasable memory at manufacture).
  • By way of illustration, both a process executed from memory and the processor can be a component. As another example, an architecture can include an arrangement of electronic hardware (e.g., parallel or serial transistors), processing instructions and a processor, which implement the processing instructions in a manner suitable to the arrangement of electronic hardware. In addition, an architecture can include a single component (e.g., a transistor, a gate array, . . . ) or an arrangement of components (e.g., a series or parallel arrangement of transistors, a gate array connected with program circuitry, power leads, electrical ground, input signal lines and output signal lines, and so on). A system can include one or more components as well as one or more architectures. One example system can include a switching block architecture comprising crossed input/output lines and pass gate transistors, as well as power source(s), signal generator(s), communication bus(ses), controllers, I/O interface, address registers, and so on. It is to be appreciated that some overlap in definitions is anticipated, and an architecture or a system can be a stand-alone component, or a component of another architecture, system, etc.
  • In addition to the foregoing, the disclosed subject matter can be implemented as a method, apparatus, or article of manufacture using typical manufacturing, programming or engineering techniques to produce hardware, firmware, software, or any suitable combination thereof to control an electronic device to implement the disclosed subject matter. The terms “apparatus” and “article of manufacture” where used herein are intended to encompass an electronic device, a semiconductor device, a computer, or a computer program accessible from any computer-readable device, carrier, or media. Computer-readable media can include hardware media, or software media. In addition, the media can include non-transitory media, or transport media. In one example, non-transitory media can include computer readable hardware media. Specific examples of computer readable hardware media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), smart cards, and flash memory devices (e.g., card, stick, key drive . . . ). Computer-readable transport media can include carrier waves, or the like. Of course, those skilled in the art will recognize many modifications can be made to this configuration without departing from the scope or spirit of the disclosed subject matter.
  • What has been described above includes examples of the subject innovation. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the subject innovation, but one of ordinary skill in the art can recognize that many further combinations and permutations of the subject innovation are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the disclosure. Furthermore, to the extent that a term “includes”, “including”, “has” or “having” and variants thereof is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.
  • Moreover, the word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word exemplary is intended to present concepts in a concrete fashion. As used in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from context, “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, if X employs A; X employs B; or X employs both A and B, then “X employs A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form.
  • Additionally, some portions of the detailed description have been presented in terms of algorithms or process operations on data bits within electronic memory. These process descriptions or representations are mechanisms employed by those cognizant in the art to effectively convey the substance of their work to others equally skilled. A process is here, generally, conceived to be a self-consistent sequence of acts leading to a desired result. The acts are those requiring physical manipulations of physical quantities. Typically, though not necessarily, these quantities take the form of electrical and/or magnetic signals capable of being stored, transferred, combined, compared, and/or otherwise manipulated.
  • It has proven convenient, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise or apparent from the foregoing discussion, it is appreciated that throughout the disclosed subject matter, discussions utilizing terms such as processing, computing, calculating, determining, or displaying, and the like, refer to the action and processes of processing systems, and/or similar consumer or industrial electronic devices or machines, that manipulate or transform data represented as physical (electrical and/or electronic) quantities within the registers or memories of the electronic device(s), into other data similarly represented as physical quantities within the machine and/or computer system memories or registers or other such information storage, transmission and/or display devices.
  • In regard to the various functions performed by the above-described components, architectures, circuits, processes and the like, the terms (including a reference to a “means”) used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., a functional equivalent), even though not structurally equivalent to the disclosed structure, which performs the function in the herein illustrated exemplary aspects of the embodiments. In addition, while a particular feature may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application. It will also be recognized that the embodiments include a system as well as a computer-readable medium having computer-executable instructions for performing the acts and/or events of the various processes.
  • Other than where otherwise indicated, all numbers, values and/or expressions referring to quantities of items such as memory size, etc., used in the specification and claims are to be understood as modified in all instances by the term “about.”

Claims (26)

What is claimed is:
1. A processor that executes computer-executable instructions to perform operations, the instructions comprising:
a load with acquire instruction that performs memory operation ordering, wherein the load with acquire instruction comprises a load operation followed by a data memory barrier operation.
2. The processor of claim 1, wherein the processor is an ARMv8 processor.
3. The processor of claim 1, wherein the data memory barrier operation orders memory operations comprising a first set of memory operations occurring before the barrier operation, and a second set of memory operations occurring after the barrier operation.
4. The processor of claim 1, wherein the load operation specifies an address for accessing a first data from the memory.
5. The processor of claim 1, wherein the load with acquire instruction comprises at least one of a plurality of types of load with acquire instructions.
6. The processor of claim 1, wherein the data memory barrier operation replaces a set of load acquire semantics for memory operation ordering.
7. A method for executing instructions in a processor, comprising:
executing a load with acquire instruction for performing memory operation ordering, wherein executing the load with acquire instruction comprises executing a load operation followed by a data memory barrier operation.
8. The method of claim 7, further comprising executing the instructions on an ARMv8 processor.
9. The method of claim 7, further comprising executing a plurality of types of load with acquire instructions.
10. The method of claim 7, wherein executing the data memory barrier operation replaces a set of load acquire semantics for memory operation ordering.
11. The method of claim 7, wherein the load operation specifies an address for accessing a first data from the memory.
12. The method of claim 7, wherein the data memory barrier operation orders memory operations comprising a first set of memory operations occurring before the barrier operation, and a second set of memory operations occurring after the barrier operation.
13. A processor that executes computer-executable instructions to perform operations, the instructions comprising:
a store with release instruction that performs memory operation ordering, wherein the store with release instruction comprises a first data memory barrier operation followed by a store operation followed by a second data memory barrier operation.
14. The processor of claim 13, wherein the processor is an ARMv8 processor.
15. The processor of claim 13, wherein the first and second data memory barrier operations order memory operations comprising a first set of memory operations occurring before the barrier operations, and a second set of memory operations occurring after the barrier operations.
16. The processor of claim 13, wherein the store operation specifies an address for writing a first data to memory.
17. The processor of claim 13, wherein the instructions further comprise a plurality of types of store with release instructions.
18. The processor of claim 13, wherein the second data memory barrier operation ensures that a following load with acquire instruction is observed in a program order.
19. The processor of claim 13, wherein the data memory barrier operations replace a set of store release semantics for memory operation ordering.
20. A method for executing instructions in a processor, comprising:
executing a store with release instruction for performing memory operation ordering, wherein executing the store with release instruction comprises executing a first data memory barrier operation followed by executing a store operation followed by executing a second data memory barrier operation.
21. The method of claim 20, further comprising executing the store with release instruction on an ARMv8 processor.
22. The method of claim 20, further comprising executing a plurality of types of store with release instructions.
23. The method of claim 20, wherein executing the data memory barrier operations replaces a set of store release semantics for memory operation ordering.
24. The method of claim 20, wherein executing the first and second data memory barrier operations orders memory operations comprising a first set of memory accesses occurring before the barrier operations, and a second set of memory accesses occurring after the barrier operations.
25. The method of claim 20, wherein executing the store operation specifies an address for writing a first data to memory.
26. The method of claim 20, wherein executing the second data memory barrier operation before executing a load with acquire instruction ensures the instructions are observed in a program order.
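The sequences recited in claims 1, 7, 13 and 20 can be sketched in software. The following is a minimal C++ illustration, not the claimed hardware implementation: it assumes that a sequentially consistent `std::atomic_thread_fence` is an acceptable stand-in for a data memory barrier (compilers targeting AArch64 commonly lower it to `dmb ish`, but that mapping is an assumption here). The names `full_barrier`, `load_acquire_via_barrier`, `store_release_via_barrier` and `message_passing_demo` are hypothetical and do not appear in the specification.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a data memory barrier (DMB): a full fence that
// orders memory operations before it against memory operations after it.
inline void full_barrier() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Load with acquire as in claims 1 and 7: a load operation followed by a
// data memory barrier operation.
inline int load_acquire_via_barrier(const std::atomic<int>& src) {
    int value = src.load(std::memory_order_relaxed);  // plain load
    full_barrier();                                   // barrier after the load
    return value;
}

// Store with release as in claims 13 and 20: a first barrier, then the
// store operation, then a second barrier.
inline void store_release_via_barrier(std::atomic<int>& dst, int value) {
    full_barrier();                                // first barrier
    dst.store(value, std::memory_order_relaxed);   // plain store
    full_barrier();                                // second barrier
}

// Message-passing check: the producer writes a payload and then publishes a
// flag; the consumer waits for the flag and then reads the payload.
inline int message_passing_demo() {
    int payload = 0;            // ordinary, non-atomic data
    std::atomic<int> flag{0};   // publication flag
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        store_release_via_barrier(flag, 1);
    });
    std::thread consumer([&] {
        while (load_acquire_via_barrier(flag) == 0) {
            std::this_thread::yield();  // spin until the flag is published
        }
        seen = payload;  // ordered after the flag load by the barrier
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling `message_passing_demo()` is expected to return 42, because the fence before the store and the fence after the load together order the payload write before the payload read. In this sketch the second barrier in `store_release_via_barrier` plays the role described in claims 18 and 26: it keeps a following load with acquire from being observed ahead of the store.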
US14/243,949 2014-04-03 2014-04-03 Implementation of load acquire/store release instructions using load/store operation with dmb operation Abandoned US20150317158A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/243,949 US20150317158A1 (en) 2014-04-03 2014-04-03 Implementation of load acquire/store release instructions using load/store operation with dmb operation
US16/424,138 US11513798B1 (en) 2014-04-03 2019-05-28 Implementation of load acquire/store release instructions using load/store operation with DMB operation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/243,949 US20150317158A1 (en) 2014-04-03 2014-04-03 Implementation of load acquire/store release instructions using load/store operation with dmb operation

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/424,138 Continuation US11513798B1 (en) 2014-04-03 2019-05-28 Implementation of load acquire/store release instructions using load/store operation with DMB operation

Publications (1)

Publication Number Publication Date
US20150317158A1 (en) 2015-11-05

Family

ID=54355292

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/243,949 Abandoned US20150317158A1 (en) 2014-04-03 2014-04-03 Implementation of load acquire/store release instructions using load/store operation with dmb operation
US16/424,138 Active US11513798B1 (en) 2014-04-03 2019-05-28 Implementation of load acquire/store release instructions using load/store operation with DMB operation

Country Status (1)

Country Link
US (2) US20150317158A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100199054A1 (en) * 2009-01-30 2010-08-05 Mips Technologies, Inc. System and Method for Improving Memory Transfer

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4106090A (en) 1977-01-17 1978-08-08 Fairchild Camera And Instrument Corporation Monolithic microcomputer central processor
US5652723A (en) 1991-04-18 1997-07-29 Mitsubishi Denki Kabushiki Kaisha Semiconductor memory device
JPH07302200A (en) 1994-04-28 1995-11-14 Hewlett Packard Co <Hp> Loading instruction method of computer provided with instruction forcing sequencing loading operation and sequencing storage
US6546462B1 (en) * 1999-12-30 2003-04-08 Intel Corporation CLFLUSH micro-architectural implementation method and system
US6678810B1 (en) * 1999-12-30 2004-01-13 Intel Corporation MFENCE and LFENCE micro-architectural implementation method and system
US6725340B1 (en) * 2000-06-06 2004-04-20 International Business Machines Corporation Mechanism for folding storage barrier operations in a multiprocessor system
US6681317B1 (en) * 2000-09-29 2004-01-20 Intel Corporation Method and apparatus to provide advanced load ordering
US7552317B2 (en) 2004-05-04 2009-06-23 Sun Microsystems, Inc. Methods and systems for grouping instructions using memory barrier instructions
WO2005121948A1 (en) 2004-06-02 2005-12-22 Sun Microsystems, Inc. Method and apparatus for enforcing membar instruction semantics in an execute-ahead processor
US8060482B2 (en) 2006-12-28 2011-11-15 Intel Corporation Efficient and consistent software transactional memory
EP2075696A3 (en) 2007-05-10 2010-01-27 Texas Instruments Incorporated Interrupt- related circuits, systems and processes
US7984202B2 (en) 2007-06-01 2011-07-19 Qualcomm Incorporated Device directed memory barriers
US7730248B2 (en) 2007-12-13 2010-06-01 Texas Instruments Incorporated Interrupt morphing and configuration, circuits, systems and processes
GB2461716A (en) 2008-07-09 2010-01-13 Advanced Risc Mach Ltd Monitoring circuitry for monitoring accesses to addressable locations in data processing apparatus that occur between the start and end events.
US8352682B2 (en) 2009-05-26 2013-01-08 Qualcomm Incorporated Methods and apparatus for issuing memory barrier commands in a weakly ordered storage system
US8997103B2 (en) 2009-09-25 2015-03-31 Nvidia Corporation N-way memory barrier operation coalescing
GB2474446A (en) 2009-10-13 2011-04-20 Advanced Risc Mach Ltd Barrier requests to maintain transaction order in an interconnect with multiple paths
US8332564B2 (en) 2009-10-20 2012-12-11 Arm Limited Data processing apparatus and method for connection to interconnect circuitry
US8984511B2 (en) * 2012-03-29 2015-03-17 Advanced Micro Devices, Inc. Visibility ordering in a memory model for a unified computing system
US9582276B2 (en) 2012-09-27 2017-02-28 Apple Inc. Processor and method for implementing barrier operation using speculative and architectural color values
US9477599B2 (en) 2013-08-07 2016-10-25 Advanced Micro Devices, Inc. Write combining cache microarchitecture for synchronization events

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Preshing; Jeff, 'Acquire and Release Semantics', Sep 13 2012, Preshing on Programming, http://preshing.com/20120913/acquire-and-release-semantics/ *
Terekhov; Alexander, Sewell; Peter, "C/C++11 mappings to processor", 12/22/2011, Archive Date: 9/7/2012, http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018057113A1 (en) * 2016-09-22 2018-03-29 Qualcomm Incorporated Instruction-based synchronization of operations including at least one simd scatter operation
CN109690474A (en) * 2016-09-22 2019-04-26 Qualcomm Incorporated Instruction-based synchronization of operations including at least one SIMD scatter operation
KR20190050989A (en) * 2016-09-22 2019-05-14 Qualcomm Incorporated Instruction-based synchronization of operations including at least one SIMD scatter operation
US10474461B2 (en) 2016-09-22 2019-11-12 Qualcomm Incorporated Instruction-based synchronization of operations including at least one SIMD scatter operation
KR102090947B1 (en) 2016-09-22 2020-03-19 Qualcomm Incorporated Command-based synchronization of operations including at least one SIMD scatter operation
AU2017330183B2 (en) * 2016-09-22 2020-11-12 Qualcomm Incorporated Instruction-based synchronization of operations including at least one SIMD scatter operation

Also Published As

Publication number Publication date
US11513798B1 (en) 2022-11-29

Similar Documents

Publication Publication Date Title
KR101817397B1 (en) Inter-architecture compatability module to allow code module of one architecture to use library module of another architecture
US9864702B2 (en) Techniques to prelink software to improve memory de-duplication in a virtual system
TWI486810B (en) Counter operation in a state machine lattice
US9785378B2 (en) Tracking transformed memory pages in virtual machine chain migration
US9208030B1 (en) Systems and methods of processing data associated with rapid snapshot and restore of guest operating system states
TW201602827A (en) Return-target restrictive return from procedure instructions, processors, methods, and systems
US20190324729A1 (en) Web Application Development Using a Web Component Framework
US10394561B2 (en) Mechanism for facilitating dynamic and efficient management of instruction atomicity volations in software programs at computing systems
CN102236621A (en) Computer interface information configuration system and method
US10462110B2 (en) System, apparatus and method for providing a unique identifier in a fuseless semiconductor device
US20220004668A1 (en) Lockable partition in nvme drives with drive migration support
US10162616B2 (en) System for binary translation version protection
US11513798B1 (en) Implementation of load acquire/store release instructions using load/store operation with DMB operation
US11113178B2 (en) Exposing and reproducing software race conditions
US10310857B2 (en) Systems and methods facilitating multi-word atomic operation support for system on chip environments
CN110249305B (en) Shell operation browser extension when browser crashes or hangs
EP4020216A1 (en) Performance circuit monitor circuit and method to concurrently store multiple performance monitor counts in a single register
US11138316B2 (en) Apparatus and method to provide secure fuse sense protection against power attacks
US10127064B2 (en) Read-only VM function chaining for secure hypervisor access
US11074200B2 (en) Use-after-free exploit prevention architecture
CN108292265B (en) Memory management for high performance memory
US9588814B2 (en) Fast approximate conflict detection
US10910025B2 (en) Flexible utilization of block storage in a computing system
WO2023116281A1 (en) Selective on-demand execution encryption
DE112017004783T5 (en) MAPPING OF SECURITY GUIDELINES GROUP REGISTERS

Legal Events

Date Code Title Description
AS Assignment

Owner name: APPLIED MICRO CIRCUITS CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ASHCRAFT, MATTHEW;NELSON, CHRISTOPHER;REEL/FRAME:032590/0334

Effective date: 20140401

AS Assignment

Owner name: MACOM CONNECTIVITY SOLUTIONS, LLC, MASSACHUSETTS

Free format text: MERGER AND CHANGE OF NAME;ASSIGNORS:APPLIED MICRO CIRCUITS CORPORATION;MACOM CONNECTIVITY SOLUTIONS, LLC;MACOM CONNECTIVITY SOLUTIONS, LLC;SIGNING DATES FROM 20170126 TO 20170127;REEL/FRAME:042176/0185

AS Assignment

Owner name: GOLDMAN SACHS BANK USA, AS COLLATERAL AGENT, NEW YORK

Free format text: SECURITY INTEREST;ASSIGNOR:MACOM CONNECTIVITY SOLUTIONS, LLC (SUCCESSOR TO APPLIED MICRO CIRCUITS CORPORATION);REEL/FRAME:042444/0891

Effective date: 20170504

AS Assignment

Owner name: MACOM CONNECTIVITY SOLUTIONS, LLC (SUCCESSOR TO APPLIED MICRO CIRCUITS CORPORATION), MASSACHUSETTS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:GOLDMAN SACHS BANK USA, AS COLLATERAL AGENT;REEL/FRAME:044652/0609

Effective date: 20171027

AS Assignment

Owner name: PROJECT DENVER INTERMEDIATE HOLDINGS LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MACOM CONNECTIVITY SOLUTIONS, LLC;REEL/FRAME:044798/0599

Effective date: 20171025

AS Assignment

Owner name: AMPERE COMPUTING LLC, CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:PROJECT DENVER INTERMEDIATE HOLDINGS LLC;REEL/FRAME:044717/0683

Effective date: 20171129

STCV Information on status: appeal procedure

Free format text: NOTICE OF APPEAL FILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION