CN110795150A - Implementation of load acquire/store release instructions using load/store operations with DMB operation - Google Patents
Implementation of load acquire/store release instructions using load/store operations with DMB operation
- Publication number
- CN110795150A CN110795150A CN201910999320.5A CN201910999320A CN110795150A CN 110795150 A CN110795150 A CN 110795150A CN 201910999320 A CN201910999320 A CN 201910999320A CN 110795150 A CN110795150 A CN 110795150A
- Authority
- CN
- China
- Prior art keywords
- store
- load
- processor
- memory
- instruction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G06F9/30043—LOAD or STORE instructions; Clear instruction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30087—Synchronisation or serialisation instructions
Abstract
The present application relates to implementing load acquire/store release instructions with load/store operations combined with DMB operations. Systems and methods are provided for simplifying the load-acquire and store-release semantics used in reduced instruction set computing (RISC). Translating the semantics into micro-operations, the low-level instructions used to implement complex machine instructions, avoids having to implement complex new memory operations. One or more data memory barrier (DMB) operations used in conjunction with load and store operations may provide sufficient ordering, because the data memory barrier ensures that preceding instructions are performed and completed before subsequent instructions are executed.
Description
This application is a divisional application of a PCT patent application filed on July 21, 2015 and entering the Chinese national phase under application No. 201580082189.6, entitled "Implementation of load acquire/store release instructions using load/store operations with DMB operation".
Technical Field
The present application relates to memory operation ordering in a computing environment.
Background
In lock-free programming, there are two ways in which threads can interact through shared memory: they may compete for a resource, or they may pass information from one thread to another in a coordinated manner. Acquire and release semantics are used to achieve the coordinated passing of information from one thread to another. These semantics provide a structured system for ensuring that memory operations are properly ordered so as to avoid errors. A store-release instruction ensures that all preceding memory accesses have completed before the store becomes visible, while a load-acquire instruction ensures that all subsequent memory accesses begin only after the load has completed. Properly ordering memory operations with acquire and release semantics can require a complex combination of store-release and load-acquire instructions.
Disclosure of Invention
Disclosed herein are systems and methods for simplifying the load-acquire and store-release semantics used in reduced instruction set computing (RISC). Particular embodiments order memory operations with respect to the instructions disclosed herein. A typical load-acquire instruction only requires that memory operations following the load-acquire be ordered after it; no ordering is imposed between the load-acquire and the instructions that precede it. In one embodiment, the load-acquire comprises a load operation followed by a data memory barrier, which ensures that all accesses up to and including the load-acquire's load are performed before any access from an instruction following the load-acquire.
Similarly, a conventional store-release instruction imposes ordering between the access made by the store-release and the accesses of all preceding instructions (but not subsequent instructions). In a particular embodiment, a data memory barrier at the beginning of the store-release provides strong ordering between previous accesses and the access associated with the store-release.
In a particular embodiment, a system includes a processor that executes computer-executable instructions to perform operations. The instructions may include a load-acquire instruction that orders memory operations, where the load-acquire instruction comprises a load operation followed by a data memory barrier operation.
In another embodiment, a method includes executing instructions in a processor. The method may include executing a load-acquire instruction for memory operation ordering, wherein executing the load-acquire instruction comprises executing a load operation followed by a data memory barrier operation.
In a particular embodiment, a system includes a processor that executes computer-executable instructions to perform operations. The instructions may include a store-release instruction that orders memory operations, where the store-release instruction comprises a first data memory barrier operation, followed by a store operation, followed by a second data memory barrier operation.
In one embodiment, a method includes executing instructions in a processor. The method may include executing a store-release instruction for memory operation ordering, wherein executing the store-release instruction comprises performing a first data memory barrier operation, followed by a store operation, followed by a second data memory barrier operation.
Drawings
FIG. 1 is a block diagram depicting one embodiment of a system for ordering memory operations according to aspects described herein.
FIG. 2 is a block diagram depicting one embodiment of a data memory barrier ordering preceding and subsequent instructions according to aspects described herein.
FIG. 3 is a block diagram depicting one embodiment of a load-acquire instruction implemented as a load operation followed by a data memory barrier according to aspects described herein.
FIG. 4 is a block diagram depicting one embodiment of a store-release instruction implemented as a store operation between two data memory barriers according to aspects described herein.
FIG. 5 is a flow diagram of one embodiment of a method for executing a load-acquire instruction.
FIG. 6 is a flow diagram depicting one embodiment of a method for executing a store-release instruction.
FIG. 7 depicts a flow diagram of one embodiment of a method for ordering memory operations using data memory barriers.
FIG. 8 is a block diagram illustrating an electronic computing environment that may be implemented with one or more aspects described herein.
FIG. 9 is a block diagram of a data communication network that may operate in accordance with various aspects described herein.
Description of the main component symbols:
100,200,300,400 system
102 load/store component
104 processor
106 memory
108 storage component
110 input
112 output
202 preceding instruction
204,304,402,406 data memory barrier
206 subsequent instruction
302 load operation
404 store operation
500,600,700 method
502 to 504, 602 to 606, 702 to 704 method steps
800 computing system environment
810 computer
820 processing unit
821 system bus
830 System memory
870 remote computer.
Detailed Description
Embodiments are provided for a system that simplifies the load-acquire and store-release semantics used in reduced instruction set computing (RISC). In lock-free programming, threads can interact through shared memory in two ways: by competing for resources, or by passing information from one thread to another in a coordinated manner. The semantics that coordinate such transfers are complex, and replacing the dedicated semantics with simple data memory barriers simplifies memory ordering. Translating the semantics into micro-operations, the low-level instructions used to implement complex machine instructions, avoids having to implement complex new memory operations. Data memory barriers used in conjunction with plain load and store instructions can provide sufficient ordering using only these brute-force ordering operations.
The terms "instruction," "operation," and "access" as used in this application refer to distinct concepts and are not interchangeable. An instruction consists of one or more operations, and an operation may include zero or more memory accesses or barriers. For example, a load-acquire instruction comprises two operations: a load operation and a barrier operation. The barrier divides all memory accesses into two groups. The first group includes the accesses of all instructions prior to the load-acquire, plus the access of the load operation belonging to the load-acquire. The second group includes the accesses of all instructions after the load-acquire.
FIG. 1 shows a system 100 for ordering memory operations using data memory barriers in a RISC processor, processing environment, or architecture. The RISC processor may comprise a variant of an ARM processor; in this particular embodiment, it may comprise an ARMv8 processor. As shown, the system 100 may include a load/store component 102 that may be communicatively and/or operatively coupled to a processor 104 for facilitating the execution of computer-executable instructions and/or components by the system 100; a memory 106 for storing data and/or computer-executable instructions and/or components for execution by the system 100 utilizing the processor 104; and a storage component 108 for providing longer-term storage of data and/or computer-executable instructions and/or components that may be executed by the system 100 utilizing the processor 104. Additionally, the system 100 may receive an input 110 that may be transitioned from a first state to a second state, distinct from the first, through execution of one or more computer-executable instructions and/or components by the processor 104. The system 100 may also generate an output 112, which may include an item transformed into a different state or thing through processing by the system 100.
FIG. 2 depicts a block diagram of an embodiment of a system that orders memory operations, according to aspects described herein. The system 200 includes a data memory barrier 204 that imposes an ordering constraint on a preceding instruction 202 and a subsequent instruction 206. The data memory barrier 204 is a type of barrier operation that causes a CPU or compiler to impose ordering constraints on memory operations issued before and after the barrier operation. This generally means ensuring that certain operations are performed before the barrier and others after it. The data memory barrier 204 ensures that the preceding instruction 202 is performed and completed before the subsequent instruction 206 is executed. The preceding instruction 202 and the subsequent instruction 206 may each include various combinations of basic load and store instructions plus more complex variations of these instructions (e.g., loads without acquire semantics, stores without release semantics, etc.).
In one embodiment, the preceding instruction 202 and the subsequent instruction 206 may comprise load or store instructions configured to load a first set of data from memory and store a second set of data to memory. The data memory barrier 204 may be configured to order memory operations associated with loads and stores, where the type of ordering achieved depends on the position of the data memory barrier in program order relative to one or more load and store instructions.
FIG. 3 is a block diagram depicting one embodiment of a system for ordering memory operations via a load-acquire instruction in accordance with aspects described herein. The system 300 may include a data memory barrier 304 that orders a load operation 302 preceding the data memory barrier 304 in program order. The data memory barrier 304 ensures that the load operation 302 is performed and completed before subsequent instructions are executed. The system 300 shows a pure load-acquire instruction comprising a load operation and a data memory barrier operation. In other embodiments, other types of load operations may yield different load instructions, such as a load without acquire semantics and other variants.
FIG. 4 depicts one embodiment of a system for executing a store-release instruction, according to aspects described herein. The system 400 may include data memory barriers 402 and 406 on either side of a store operation 404 in program order. The data memory barrier 402 ensures that all preceding instructions/operations have been performed and completed before the store operation 404 is initiated, while the data memory barrier 406 ensures that the store operation 404 is completed before any subsequent memory instruction/operation occurs. Additionally, the first data memory barrier 402 and the second data memory barrier 406 also establish ordering to ensure that store-release and load-acquire instructions are observed in program order.
While the methods implemented in accordance with the disclosed subject matter may be shown and described as a series of blocks with reference to the flowcharts of FIGS. 5-7, it is to be understood that the disclosed subject matter is not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Moreover, not all illustrated blocks may be required to implement the methods described hereinafter.
FIG. 5 is a flow diagram of one embodiment of a method for executing a load-acquire instruction. The method 500 may begin at 502, where a load operation is performed; the load operation specifies an address for accessing data from memory.
At 504, a data memory barrier operation may be executed. The data memory barrier is a type of barrier operation that causes a CPU or compiler to impose ordering constraints on memory operations issued before and after the barrier. This generally means ensuring that certain operations are performed before the barrier and others after it. In this implementation, the data memory barrier operation ensures that the preceding load operation is performed and completed before any subsequent instruction is executed.
FIG. 6 is a flow diagram illustrating one embodiment of a method for executing a store-release instruction. The method 600 may begin at 602, where a first data memory barrier operation is performed. The data memory barrier is a type of barrier operation that causes a CPU or compiler to impose ordering constraints on memory operations issued before and after it.
At 604, a store operation is performed; the store operation specifies an address for writing data to memory. At 606, a second data memory barrier operation is performed. Placing the store operation between the two data memory barriers ensures that all other memory operations have been performed and completed before the store operation executes, and that no further memory operations are allowed until the store operation completes. In this manner, the store-release instruction uses only plain store and data memory barrier operations for memory operation ordering.
FIG. 7 is a flow diagram of one embodiment of a method for ordering memory operations using data memory barriers. The method 700 may begin at 702, where a first set of memory operations is performed before a barrier. The barrier ensures that all of those operations are completed before step 704, where a second set of memory operations is executed after the data memory barrier.
The techniques described herein may be applied to any reduced instruction set computing environment in which memory operation ordering is desired. It is understood that handheld, portable, and other computing devices and computing objects of all kinds are contemplated for use with the various embodiments, i.e., anywhere memory operations are performed. The general-purpose remote computer described below in FIG. 8 is one example, and the disclosed subject matter can be implemented with any client having network/bus interoperability and interaction. Thus, the disclosed subject matter may be implemented on a chip or system in a network-linked hosted-services environment, for example, where the client device acts merely as an interface to the network/bus, such as an object placed in an appliance.
FIG. 8 illustrates one embodiment of a suitable computing system environment 800 in which aspects of the disclosed subject matter may be implemented. The computing system environment 800 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the disclosed subject matter.
FIG. 8 depicts an exemplary device for implementing the disclosed subject matter, including a general-purpose computing device in the form of a computer 810. Components of computer 810 may include a processing unit 820, a system memory 830, and a system bus 821 that couples various system components, including the system memory, to the processing unit 820. The system bus 821 may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. Computer 810 typically includes a variety of computer-readable media.
The system memory 830 may include computer storage media in the form of volatile and/or nonvolatile memory such as Read Only Memory (ROM) and/or Random Access Memory (RAM). A basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within computer 810, may be stored in memory 830. The computer 810 may also include other removable/non-removable, volatile/nonvolatile computer storage media.
A user may enter commands and information into the computer 810 through input devices such as a keyboard and pointing device, commonly referred to as a mouse, trackball or touch pad.
The computer 810 may operate in a networked or distributed environment using logical connections to one or more other remote computers, such as a remote computer 870, which may in turn have media capabilities different from those of the computer 810.
In addition to the foregoing, the disclosed subject matter may be implemented as a method, apparatus, or article of manufacture using standard manufacturing, programming, or engineering techniques to produce hardware, firmware, software, or any suitable combination thereof that controls an electronic device to implement the disclosed subject matter. A computer-readable medium may include a hardware medium or a software medium, which may include a non-transitory medium or a transport medium.
Claims (13)
1. A processor, comprising:
a load/store component configured to facilitate memory operations, the load/store component further configured to receive a load-acquire instruction and to execute the load-acquire instruction as a load operation followed by a data memory barrier operation, the data memory barrier operation replacing the acquire semantics of the load-acquire instruction.
2. The processor as recited in claim 1, wherein the data memory barrier operation orders memory operations comprising a first set of memory operations occurring before the barrier operation and a second set of memory operations occurring after the barrier operation.
3. The processor of claim 1, wherein:
the load operation specifies an address for accessing first data from memory; and
the load-acquire instruction comprises at least one of a plurality of types of load-acquire instructions.
4. The processor of claim 1, wherein the processor is a Reduced Instruction Set Computing (RISC) processor.
5. The processor of claim 1, wherein the processor is an advanced reduced instruction set computing machine (ARM) processor.
6. The processor of claim 1, wherein the load operation of the load-acquire instruction is a load without acquire semantics.
7. The processor of claim 1, wherein the processor is a server processor configured to execute client requests associated with one or more network interoperable clients.
8. A processor, comprising:
a load/store component configured to facilitate memory operations, the load/store component further configured to receive a store-release instruction and to execute the store-release instruction, wherein executing the store-release instruction comprises a first data memory barrier operation, followed by a store operation, followed by a second data memory barrier operation, the second data memory barrier operation replacing the release semantics of the store-release instruction.
9. The processor of claim 8, wherein:
the first data memory barrier operation and the second data memory barrier operation order memory operations, the memory operations comprising a first set of memory operations occurring before the first data memory barrier operation and a second set of memory operations occurring after the second data memory barrier operation; and
the store operation specifies an address for writing first data to memory.
10. The processor of claim 8, wherein the processor is a Reduced Instruction Set Computing (RISC) processor.
11. The processor of claim 10, wherein the processor is an advanced reduced instruction set computing machine (ARM) processor.
12. The processor of claim 8, wherein the store operation of the store-release instruction is a store without release semantics.
13. The processor of claim 8, wherein the processor is a server processor configured to execute client requests associated with one or more network interoperable clients.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910999320.5A CN110795150A (en) | 2015-07-21 | 2015-07-21 | Implementation of load fetch/store release instruction by load/store operation according to DMB operation |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910999320.5A CN110795150A (en) | 2015-07-21 | 2015-07-21 | Implementation of load fetch/store release instruction by load/store operation according to DMB operation |
CN201580082189.6A CN108139903B (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operations with DMB operation
PCT/US2015/041322 WO2017014752A1 (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operation with dmb operation |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201580082189.6A Division CN108139903B (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operations with DMB operation
Publications (1)
Publication Number | Publication Date |
---|---|
CN110795150A true CN110795150A (en) | 2020-02-14 |
Family
ID=57835180
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910999320.5A Pending CN110795150A (en) | 2015-07-21 | 2015-07-21 | Implementation of load fetch/store release instruction by load/store operation according to DMB operation |
CN201580082189.6A Expired - Fee Related CN108139903B (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operations with DMB operation
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201580082189.6A Expired - Fee Related CN108139903B (en) | 2015-07-21 | 2015-07-21 | Implementation of load acquire/store release instructions using load/store operations with DMB operation
Country Status (4)
Country | Link |
---|---|
EP (1) | EP3326059A4 (en) |
JP (1) | JP6739513B2 (en) |
CN (2) | CN110795150A (en) |
WO (1) | WO2017014752A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10417002B2 (en) | 2017-10-06 | 2019-09-17 | International Business Machines Corporation | Hazard detection of out-of-order execution of load and store instructions in processors without using real addresses |
US10572256B2 (en) | 2017-10-06 | 2020-02-25 | International Business Machines Corporation | Handling effective address synonyms in a load-store unit that operates without address translation |
US10606590B2 (en) | 2017-10-06 | 2020-03-31 | International Business Machines Corporation | Effective address based load store unit in out of order processors |
US11175924B2 (en) | 2017-10-06 | 2021-11-16 | International Business Machines Corporation | Load-store unit with partitioned reorder queues with single cam port |
US10394558B2 (en) | 2017-10-06 | 2019-08-27 | International Business Machines Corporation | Executing load-store operations without address translation hardware per load-store unit port |
US10606591B2 (en) | 2017-10-06 | 2020-03-31 | International Business Machines Corporation | Handling effective address synonyms in a load-store unit that operates without address translation |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000181891A (en) * | 1998-12-18 | 2000-06-30 | Hitachi Ltd | Shared memory access sequence assurance system |
US20050273583A1 (en) * | 2004-06-02 | 2005-12-08 | Paul Caprioli | Method and apparatus for enforcing membar instruction semantics in an execute-ahead processor |
CN101828173A (en) * | 2007-10-18 | 2010-09-08 | Nxp股份有限公司 | Data processing system with a plurality of processors, cache circuits and a shared memory |
US20150046652A1 (en) * | 2013-08-07 | 2015-02-12 | Advanced Micro Devices, Inc. | Write combining cache microarchitecture for synchronization events |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07302200A (en) * | 1994-04-28 | 1995-11-14 | Hewlett Packard Co <Hp> | Loading instruction method of computer provided with instruction forcing sequencing loading operation and sequencing storage |
US7552317B2 (en) * | 2004-05-04 | 2009-06-23 | Sun Microsystems, Inc. | Methods and systems for grouping instructions using memory barrier instructions |
US7725618B2 (en) * | 2004-07-29 | 2010-05-25 | International Business Machines Corporation | Memory barriers primitives in an asymmetric heterogeneous multiprocessor environment |
US8060482B2 (en) * | 2006-12-28 | 2011-11-15 | Intel Corporation | Efficient and consistent software transactional memory |
GB2461716A (en) * | 2008-07-09 | 2010-01-13 | Advanced Risc Mach Ltd | Monitoring circuitry for monitoring accesses to addressable locations in data processing apparatus that occur between the start and end events. |
US8997103B2 (en) * | 2009-09-25 | 2015-03-31 | Nvidia Corporation | N-way memory barrier operation coalescing |
US8935513B2 (en) * | 2012-02-08 | 2015-01-13 | International Business Machines Corporation | Processor performance improvement for instruction sequences that include barrier instructions |
US9582276B2 (en) * | 2012-09-27 | 2017-02-28 | Apple Inc. | Processor and method for implementing barrier operation using speculative and architectural color values |
US9442755B2 (en) * | 2013-03-15 | 2016-09-13 | Nvidia Corporation | System and method for hardware scheduling of indexed barriers |
-
2015
- 2015-07-21 CN CN201910999320.5A patent/CN110795150A/en active Pending
- 2015-07-21 EP EP15899072.1A patent/EP3326059A4/en active Pending
- 2015-07-21 JP JP2018502709A patent/JP6739513B2/en active Active
- 2015-07-21 CN CN201580082189.6A patent/CN108139903B/en not_active Expired - Fee Related
- 2015-07-21 WO PCT/US2015/041322 patent/WO2017014752A1/en unknown
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000181891A (en) * | 1998-12-18 | 2000-06-30 | Hitachi Ltd | Shared memory access sequence assurance system |
US20050273583A1 (en) * | 2004-06-02 | 2005-12-08 | Paul Caprioli | Method and apparatus for enforcing membar instruction semantics in an execute-ahead processor |
CN101828173A (en) * | 2007-10-18 | 2010-09-08 | Nxp股份有限公司 | Data processing system with a plurality of processors, cache circuits and a shared memory |
US20150046652A1 (en) * | 2013-08-07 | 2015-02-12 | Advanced Micro Devices, Inc. | Write combining cache microarchitecture for synchronization events |
Non-Patent Citations (2)
Title |
---|
JEFF PRESHING: ""Acquire and Release Semantics"", 《HTTPS://PRESHING.COM/20120913/ACQUIRE-AND-RELEASE-SEMANTICS/》, pages 1 - 18 * |
LISA HIGHAM et al.: "Programmer-Centric Conditions for Itanium Memory Consistency", pages 58 *
Also Published As
Publication number | Publication date |
---|---|
CN108139903B (en) | 2019-11-15 |
EP3326059A1 (en) | 2018-05-30 |
WO2017014752A1 (en) | 2017-01-26 |
CN108139903A (en) | 2018-06-08 |
JP6739513B2 (en) | 2020-08-12 |
JP2018523235A (en) | 2018-08-16 |
EP3326059A4 (en) | 2019-04-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110795150A (en) | Implementation of load fetch/store release instruction by load/store operation according to DMB operation | |
US11003489B2 (en) | Cause exception message broadcast between processing cores of a GPU in response to indication of exception event | |
DE102018126150A1 (en) | DEVICE, METHOD AND SYSTEMS FOR MULTICAST IN A CONFIGURABLE ROOM ACCELERATOR | |
JP5934094B2 (en) | Mapping across multiple processors of processing logic with data parallel threads | |
DE112017001825T5 (en) | PROCESSORS, METHODS, SYSTEMS AND INSTRUCTIONS FOR ATOMICALLY SAVING DATA WIDER THAN A NATIVELY SUPPORTED DATA WIDTH IN A MEMORY | |
US20130231912A1 (en) | Method, system, and scheduler for simulating multiple processors in parallel | |
CN111459618A (en) | Intelligent GPU scheduling in virtualized environments | |
CN107479981B (en) | Processing method and device for realizing synchronous call based on asynchronous call | |
US9170786B1 (en) | Composable context menus | |
US20150277405A1 (en) | Production plan display method, production plan support method, production plan display apparatus, production plan support apparatus, and recording medium | |
US9703905B2 (en) | Method and system for simulating multiple processors in parallel and scheduler | |
DE112013007703T5 (en) | Command and logic for identifying instructions for retirement in a multi-stranded out-of-order processor | |
DE102015007423A1 (en) | Memory sequencing with coherent and non-coherent subsystems | |
EP3200083A1 (en) | Resource scheduling method and related apparatus | |
US20180067859A1 (en) | Selective allocation of cpu cache slices to database objects | |
EP2988469B1 (en) | A method and apparatus for updating a user interface of one program unit in response to an interaction with a user interface of another program unit | |
US10713085B2 (en) | Asynchronous sequential processing execution | |
CN102867018A (en) | Method for analogue signal communication between threads in database system | |
CN103019844A (en) | Method and device supporting calling of MPI (Message Passing Interface) function through multiple threads | |
CN102736949A (en) | Scheduling of tasks to be performed by a non-coherent device | |
EP3131004A1 (en) | Processor and method | |
US20210055971A1 (en) | Method and node for managing a request for hardware acceleration by means of an accelerator device | |
US8584143B2 (en) | Collection access in a parallel environment | |
Khot | Parallelization in Python | |
CN103714511A (en) | GPU-based branch processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||