CN110795150A - Implementation of load acquire/store release instructions using load/store operations with DMB operation - Google Patents

Implementation of load acquire/store release instructions using load/store operations with DMB operation

Info

Publication number
CN110795150A
CN110795150A
Authority
CN
China
Prior art keywords
store
load
processor
memory
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910999320.5A
Other languages
Chinese (zh)
Inventor
M. Ashcraft
C. Nelson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MACOM Connectivity Solutions LLC
Original Assignee
Applied Micro Circuits Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Applied Micro Circuits Corp filed Critical Applied Micro Circuits Corp
Priority to CN201910999320.5A priority Critical patent/CN110795150A/en
Publication of CN110795150A publication Critical patent/CN110795150A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043LOAD or STORE instructions; Clear instruction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30076Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/30087Synchronisation or serialisation instructions

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Executing Machine-Instructions (AREA)
  • Advance Control (AREA)
  • Stored Programmes (AREA)

Abstract

The present application relates to implementing load acquire/store release instructions with load/store operations in accordance with a DMB operation. Systems and methods are provided for simplifying the load acquire and store release semantics used in reduced instruction set computing (RISC). Translating the semantics into micro-operations, or low-level instructions, that implement the complex machine instructions avoids having to implement complex new memory operations. Using one or more data memory barrier operations in conjunction with load and store operations can provide sufficient ordering, as the data memory barrier ensures that preceding instructions are performed and completed before subsequent instructions are executed.

Description

Implementation of load acquire/store release instructions using load/store operations with DMB operation
The present application is a divisional application of a PCT patent application filed on July 21, 2015, which entered the Chinese national phase under application number 201580082189.6 and is entitled "Implementation of load acquire/store release instructions using load/store operations with DMB operation".
Technical Field
The present application relates to memory operation ordering in a computing environment.
Background
In lock-free operation, there are two ways in which threads can manipulate shared memory: they can compete for resources, or they can pass information from one thread to another in a coordinated manner. Acquire and release semantics are used to achieve the coordinated passing of information from one thread to another. These semantics provide a structured system for ensuring that memory operations are properly ordered so as to avoid errors. A store release instruction ensures that all preceding instructions have completed before the store is performed, while a load acquire instruction ensures that all subsequent instructions execute only after the load has completed. Ordering memory operations correctly with acquire and release semantics requires a complex combination of store release and load acquire instructions.
Disclosure of Invention
Disclosed herein are systems and methods for simplifying the load acquire and store release semantics used in reduced instruction set computing (RISC). Particular embodiments order memory operations with respect to the instructions disclosed herein. A conventional load acquire instruction only requires that memory operations following it be ordered after it; it applies no ordering to instructions that precede the load acquire (either with respect to the load acquire itself or with respect to subsequent instructions). In one embodiment, the load acquire comprises a load operation followed by a data memory barrier, which ensures that all accesses up to and including the load acquire are completed before any accesses from instructions following the load acquire are performed.
Similarly, a conventional store release instruction imposes ordering between the access made by the store release and the accesses of all preceding instructions (but not subsequent ones). In a particular embodiment, however, a data memory barrier at the beginning of the store release provides strong ordering between previous accesses and the access associated with the store release.
In a particular embodiment, a system includes a processor that executes computer-executable instructions for performing operations. The instructions may include a load acquire instruction that orders memory operations, where the load acquire instruction comprises a load operation followed by a data memory barrier operation.
In another embodiment, a method includes executing instructions in a processor. The method may include executing a load acquire instruction for memory operation ordering, wherein executing the load acquire instruction comprises performing a load operation followed by a data memory barrier operation.
In a particular embodiment, a system includes a processor that executes computer-executable instructions for performing operations. The instructions may include a store release instruction that orders memory operations, where the store release instruction comprises a first data memory barrier operation, followed by a store operation, followed by a second data memory barrier operation.
In one embodiment, a method includes executing instructions in a processor. The method may include executing a store release instruction for memory operation ordering, wherein executing the store release instruction comprises performing a first data memory barrier operation, followed by a store operation, followed by a second data memory barrier operation.
Drawings
FIG. 1 is a block diagram depicting one embodiment of a system for sequencing memory operations according to aspects described herein.
FIG. 2 is a block diagram depicting one embodiment of a system for sequencing memory operations according to aspects described herein.
FIG. 3 is a block diagram depicting one embodiment of a system for sequencing memory operations according to aspects described herein.
FIG. 4 is a block diagram depicting one embodiment of a system for sequencing memory operations according to aspects described herein.
FIG. 5 is a flow diagram of one embodiment of a method for executing a load acquire instruction.
FIG. 6 is a flow diagram depicting one embodiment of a method for executing a store release instruction.
FIG. 7 depicts a flow diagram of one embodiment of a method for sequencing memory operations using data memory barriers.
FIG. 8 is a block diagram illustrating an electronic computing environment that may be implemented with one or more aspects described herein.
Fig. 9 is a block diagram of a data communication network that may operate in accordance with various aspects described herein.
Description of the main component symbols:
100,200,300,400 system
102 load/store component
104 processor
106 memory
108 storage component
110 input
112 output
202 preceding instruction
204,304,402,406 data store barrier
206 subsequent instruction
302 load operation
404 store operation
500,600,700 method
502-504, 602-606, 702-704 steps
800 computing system environment
810 computer
820 processing unit
821 system bus
830 System memory
870 remote computer.
Detailed Description
Embodiments are provided for a system that simplifies the load acquire and store release semantics used in reduced instruction set computing (RISC). In lock-free operation, there are two ways in which threads can manipulate shared memory: they can compete for resources, or they can pass information from one thread to another in a coordinated manner. The acquire and release semantics used for such coordination are complex, however, and replacing the dedicated semantics with simple data memory barriers simplifies memory ordering. Translating the semantics into micro-operations, or low-level instructions, that implement the complex machine instructions can avoid having to implement complex new memory operations. Using data memory barriers in conjunction with load and store instructions can provide sufficient ordering with purely brute-force ordering operations.
The terms "instruction," "operation," and "access" as used in this application refer to separate concepts and are not interchangeable. An instruction consists of one or more operations, and an operation may include zero or more memory accesses or barriers. For example, a load acquire instruction establishes two operations (a load operation and a barrier operation). The barrier divides all memory accesses into two groups. The first group includes the accesses from all instructions prior to the load acquire and the access from the load operation belonging to the load acquire. The second group includes the accesses from all instructions after the load acquire.
FIG. 1 shows a system 100 for sequencing memory operations using data memory barriers in a RISC processor, processing environment, or architecture. The RISC processor may be a variant of an ARM processor and, in this particular embodiment, may be an ARMv8 processor. As shown, the system 100 may include, for example: a load/store component 102 communicatively and/or operatively coupled to a processor 104 that facilitates the operation and/or execution of computer-executable instructions and/or components by the system 100; a memory 106 for storing data and/or computer-executable instructions and/or components to be executed by the system 100 using the processor 104; and a storage component 108 for providing longer-term storage of data and/or computer-executable instructions and/or components that may be executed by the system 100 using the processor 104. Additionally, and as shown, the system 100 may receive an input 110 that may be transitioned from a first state to a second state by the execution of one or more computer-executable instructions and/or components by the processor 104, where the first state may be distinct, distinguishable, and/or different from the second state. The system 100 may also generate an output 112, which may include an item that has been transformed into a different state or thing through processing by the system 100.
FIG. 2 depicts a block diagram of an embodiment of a system that sequences memory operations according to aspects described herein. The system 200 includes a data memory barrier 204 that imposes an ordering constraint on a preceding instruction 202 and a subsequent instruction 206. A data memory barrier is a type of barrier operation that causes a CPU or compiler to enforce ordering constraints on memory operations issued before and after the barrier operation; in general, this means ensuring that certain operations are performed before the barrier and others after it. The data memory barrier 204 ensures that the preceding instruction 202 is performed and completed before the subsequent instruction 206 is executed. The preceding instructions 202 and the subsequent instructions 206 may each include various combinations of basic load and store instructions as well as more complex variants of those instructions (e.g., loads without acquire, stores without release, etc.).
In one embodiment, the preceding instruction 202 and the subsequent instruction 206 may comprise load or store instructions configured to load a first set of data from memory and store a second set of data to the memory. The data memory barrier 204 may be configured to order memory operations associated with loads and stores of data, where the type of ordering achieved depends on the position of the data memory barrier, in program order, relative to the one or more load and store instructions.
FIG. 3 is a block diagram depicting one embodiment of a system for sequencing memory operations via a load acquire instruction in accordance with aspects described herein. The system 300 may include a data memory barrier 304 that orders a load operation 302 preceding the data memory barrier 304 in program order. The data memory barrier 304 ensures that the load operation 302 is performed and completed before subsequent instructions are executed. The system 300 shows a load acquire instruction implemented purely as a load operation and a data memory barrier operation. In other embodiments, other types of load operations may yield different load instructions, such as a load without acquire and other variants.
FIG. 4 depicts one embodiment of a system for executing a store release instruction according to aspects described herein. The system 400 may include data memory barriers 402 and 406 on either side of a store operation 404 in program order. The data memory barrier 402 ensures that all preceding instructions/operations have completed before the store operation 404 is initiated, while the data memory barrier 406 ensures that the store operation 404 completes before any subsequent memory instructions/operations occur. Additionally, the first data memory barrier 402 and the second data memory barrier 406 establish ordering that ensures store release and load acquire instructions are observed in program order.
While the methods implemented in accordance with the disclosed subject matter may be shown and described as a series of blocks with reference to the flowcharts of FIGS. 5-7, it is to be understood that the disclosed subject matter is not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Moreover, not all illustrated blocks may be required to implement the methodologies described hereinafter.
FIG. 5 is a flow diagram of one embodiment of a method for executing a load acquire instruction. The method 500 may begin at 502, where a load operation is performed; the load operation specifies an address for accessing data from memory.
At 504, a data memory barrier may be executed. A data memory barrier is a type of barrier operation that causes a CPU or compiler to enforce ordering constraints on memory operations issued before and after the barrier instruction; in general, this means ensuring that certain operations are performed before the barrier and others after it. In this implementation, the data memory barrier operation ensures that the prior load operation is performed and completed before subsequent instructions are executed.
FIG. 6 is a flow diagram illustrating one embodiment of a method for executing a store release instruction. The method 600 may begin at 602, where a first data memory barrier operation is performed. The data memory barrier is a type of barrier instruction that causes a CPU or compiler to enforce ordering constraints on memory operations issued before and after it.
At 604, a store operation is performed. The store operation specifies an address for writing data to memory. At 606, a second data memory barrier operation is performed. Placing the store operation between the two data memory barrier operations ensures that all other memory operations have been performed and completed before the store operation executes, and that no subsequent memory operations are allowed until the store operation completes. In this manner, the store release instruction uses only plain store and data memory barrier operations for memory operation ordering.
FIG. 7 is a flow diagram of one embodiment of a method for sequencing memory operations using data memory barriers. The method 700 may begin at 702, where a first set of memory operations is performed before a barrier. The barrier ensures that all of those operations are completed before step 704, where a second set of memory operations is executed after the data memory barrier.
The techniques described herein may be applied to any reduced instruction set computing environment in which memory operation ordering is desired. It is understood that handheld, portable, and other computing devices and computing objects of all kinds are contemplated for use with the various embodiments (i.e., any environment capable of memory operations). The general-purpose remote computer described below in FIG. 8 is one example, and the disclosed subject matter can be implemented with any client having network/bus interoperability and interaction. Thus, the disclosed subject matter may be implemented, for example, on a chip or system in a networked, hosted-services environment in which a client device acts merely as an interface to the network/bus, such as an object placed in an appliance.
FIG. 8 illustrates one embodiment of a suitable computing system environment 800 in which aspects of the disclosed subject matter may be implemented. The computing system environment 800 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the disclosed subject matter.
FIG. 8 shows an exemplary device for implementing the disclosed subject matter, including a general-purpose computing device in the form of a computer 810. Components of the computer 810 may include a processing unit 820, a system memory 830, and a system bus 821 that couples various system components, including the system memory, to the processing unit 820. The system bus 821 may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The computer 810 typically includes a variety of computer-readable media.
The system memory 830 may include computer storage media in the form of volatile and/or nonvolatile memory such as Read Only Memory (ROM) and/or Random Access Memory (RAM). A basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within computer 810, may be stored in memory 830. The computer 810 may also include other removable/non-removable, volatile/nonvolatile computer storage media.
A user may enter commands and information into the computer 810 through input devices such as a keyboard and pointing device, commonly referred to as a mouse, trackball or touch pad.
The computer 810 may operate in a networked or distributed environment using logical connections to one or more other remote computers, such as a remote computer 870, which may in turn have different media functions than the device 810.
In addition to the foregoing, the disclosed subject matter may be implemented as a method, apparatus, or article of manufacture using standard manufacturing, programming, or engineering techniques to produce hardware, firmware, software, or any suitable combination thereof to control an electronic device to implement the disclosed subject matter. A computer-readable medium may include a hardware or software medium, which may include a non-transitory medium or a transport medium.

Claims (13)

1. A processor, comprising:
a load/store component configured to facilitate memory operations, the load/store component further configured to receive a load acquire instruction and execute the load acquire instruction as a load operation followed by a data memory barrier operation, the data memory barrier operation replacing a set of load acquire semantics.
2. The processor as recited in claim 1, wherein the data memory barrier operation orders memory operations comprising a first set of memory operations occurring before the barrier operation and a second set of memory operations occurring after the barrier operation.
3. The processor of claim 1, wherein:
the load operation specifies an address for accessing first data from the memory; and
the load acquire instruction is at least one of a plurality of types of load acquire instructions.
4. The processor of claim 1, wherein the processor is a Reduced Instruction Set Computing (RISC) processor.
5. The processor of claim 1, wherein the processor is an advanced reduced instruction set computing machine (ARM) processor.
6. The processor of claim 1, wherein the load acquire instruction is a load instruction without acquire semantics.
7. The processor of claim 1, wherein the processor is a server processor configured to execute client requests associated with one or more network interoperable clients.
8. A processor, comprising:
a load/store component configured to facilitate memory operations, the load/store component further configured to receive a store release instruction and execute the store release instruction, wherein executing the store release instruction comprises a first data memory barrier operation, followed by a store operation, followed by a second data memory barrier operation, the second data memory barrier operation replacing a set of store release semantics.
9. The processor of claim 8, wherein:
the first data memory barrier operation and the second data memory barrier operation order memory operations, the memory operations comprising a first set of memory operations occurring before the first and second data memory barrier operations and a second set of memory operations occurring after the first and second data memory barrier operations; and
the store operation specifies an address for writing first data to the memory.
10. The processor of claim 8, wherein the processor is a Reduced Instruction Set Computing (RISC) processor.
11. The processor of claim 10, wherein the processor is an advanced reduced instruction set computing machine (ARM) processor.
12. The processor of claim 8, wherein the store release instruction is a store instruction without release semantics.
13. The processor of claim 8, wherein the processor is a server processor configured to execute client requests associated with one or more network interoperable clients.
CN201910999320.5A 2015-07-21 2015-07-21 Implementation of load acquire/store release instructions using load/store operations with DMB operation Pending CN110795150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910999320.5A CN110795150A (en) 2015-07-21 2015-07-21 Implementation of load acquire/store release instructions using load/store operations with DMB operation

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201910999320.5A CN110795150A (en) 2015-07-21 2015-07-21 Implementation of load acquire/store release instructions using load/store operations with DMB operation
CN201580082189.6A CN108139903B (en) 2015-07-21 2015-07-21 Implementation of load acquire/store release instructions using load/store operations with DMB operation
PCT/US2015/041322 WO2017014752A1 (en) 2015-07-21 2015-07-21 Implementation of load acquire/store release instructions using load/store operation with dmb operation

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201580082189.6A Division CN108139903B (en) 2015-07-21 2015-07-21 Implementation of load acquire/store release instructions using load/store operations with DMB operation

Publications (1)

Publication Number Publication Date
CN110795150A true CN110795150A (en) 2020-02-14

Family

ID=57835180

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910999320.5A Pending CN110795150A (en) 2015-07-21 2015-07-21 Implementation of load acquire/store release instructions using load/store operations with DMB operation
CN201580082189.6A Expired - Fee Related CN108139903B (en) 2015-07-21 2015-07-21 Implementation of load acquire/store release instructions using load/store operations with DMB operation

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201580082189.6A Expired - Fee Related CN108139903B (en) 2015-07-21 2015-07-21 Implementation of load acquire/store release instructions using load/store operations with DMB operation

Country Status (4)

Country Link
EP (1) EP3326059A4 (en)
JP (1) JP6739513B2 (en)
CN (2) CN110795150A (en)
WO (1) WO2017014752A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10417002B2 (en) 2017-10-06 2019-09-17 International Business Machines Corporation Hazard detection of out-of-order execution of load and store instructions in processors without using real addresses
US10572256B2 (en) 2017-10-06 2020-02-25 International Business Machines Corporation Handling effective address synonyms in a load-store unit that operates without address translation
US10606590B2 (en) 2017-10-06 2020-03-31 International Business Machines Corporation Effective address based load store unit in out of order processors
US11175924B2 (en) 2017-10-06 2021-11-16 International Business Machines Corporation Load-store unit with partitioned reorder queues with single cam port
US10394558B2 (en) 2017-10-06 2019-08-27 International Business Machines Corporation Executing load-store operations without address translation hardware per load-store unit port
US10606591B2 (en) 2017-10-06 2020-03-31 International Business Machines Corporation Handling effective address synonyms in a load-store unit that operates without address translation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000181891A (en) * 1998-12-18 2000-06-30 Hitachi Ltd Shared memory access sequence assurance system
US20050273583A1 (en) * 2004-06-02 2005-12-08 Paul Caprioli Method and apparatus for enforcing membar instruction semantics in an execute-ahead processor
CN101828173A (en) * 2007-10-18 2010-09-08 Nxp股份有限公司 Data processing system with a plurality of processors, cache circuits and a shared memory
US20150046652A1 (en) * 2013-08-07 2015-02-12 Advanced Micro Devices, Inc. Write combining cache microarchitecture for synchronization events

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07302200A (en) * 1994-04-28 1995-11-14 Hewlett Packard Co <Hp> Loading instruction method of computer provided with instruction forcing sequencing loading operation and sequencing storage
US7552317B2 (en) * 2004-05-04 2009-06-23 Sun Microsystems, Inc. Methods and systems for grouping instructions using memory barrier instructions
US7725618B2 (en) * 2004-07-29 2010-05-25 International Business Machines Corporation Memory barriers primitives in an asymmetric heterogeneous multiprocessor environment
US8060482B2 (en) * 2006-12-28 2011-11-15 Intel Corporation Efficient and consistent software transactional memory
GB2461716A (en) * 2008-07-09 2010-01-13 Advanced Risc Mach Ltd Monitoring circuitry for monitoring accesses to addressable locations in data processing apparatus that occur between the start and end events.
US8997103B2 (en) * 2009-09-25 2015-03-31 Nvidia Corporation N-way memory barrier operation coalescing
US8935513B2 (en) * 2012-02-08 2015-01-13 International Business Machines Corporation Processor performance improvement for instruction sequences that include barrier instructions
US9582276B2 (en) * 2012-09-27 2017-02-28 Apple Inc. Processor and method for implementing barrier operation using speculative and architectural color values
US9442755B2 (en) * 2013-03-15 2016-09-13 Nvidia Corporation System and method for hardware scheduling of indexed barriers

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000181891A (en) * 1998-12-18 2000-06-30 Hitachi Ltd Shared memory access sequence assurance system
US20050273583A1 (en) * 2004-06-02 2005-12-08 Paul Caprioli Method and apparatus for enforcing membar instruction semantics in an execute-ahead processor
CN101828173A (en) * 2007-10-18 2010-09-08 Nxp股份有限公司 Data processing system with a plurality of processors, cache circuits and a shared memory
US20150046652A1 (en) * 2013-08-07 2015-02-12 Advanced Micro Devices, Inc. Write combining cache microarchitecture for synchronization events

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jeff Preshing, "Acquire and Release Semantics", https://preshing.com/20120913/acquire-and-release-semantics/, pages 1-18 *
Lisa Higham et al., "Programmer-Centric Conditions for Itanium Memory Consistency", page 58 *

Also Published As

Publication number Publication date
CN108139903B (en) 2019-11-15
EP3326059A1 (en) 2018-05-30
WO2017014752A1 (en) 2017-01-26
CN108139903A (en) 2018-06-08
JP6739513B2 (en) 2020-08-12
JP2018523235A (en) 2018-08-16
EP3326059A4 (en) 2019-04-17

Similar Documents

Publication Publication Date Title
CN110795150A (en) Implementation of load acquire/store release instructions using load/store operations with DMB operation
US11003489B2 (en) Cause exception message broadcast between processing cores of a GPU in response to indication of exception event
DE102018126150A1 (en) Device, methods and systems for multicast in a configurable spatial accelerator
JP5934094B2 (en) Mapping across multiple processors of processing logic with data parallel threads
DE112017001825T5 (en) PROCESSORS, METHODS, SYSTEMS AND INSTRUCTIONS FOR ATOMICALLY SAVING DATA WIDER THAN A NATIVELY SUPPORTED DATA WIDTH IN A MEMORY
US20130231912A1 (en) Method, system, and scheduler for simulating multiple processors in parallel
CN111459618A (en) Intelligent GPU scheduling in virtualized environments
CN107479981B (en) Processing method and device for realizing synchronous call based on asynchronous call
US9170786B1 (en) Composable context menus
US20150277405A1 (en) Production plan display method, production plan support method, production plan display apparatus, production plan support apparatus, and recording medium
US9703905B2 (en) Method and system for simulating multiple processors in parallel and scheduler
DE112013007703T5 (en) Command and logic for identifying instructions for retirement in a multi-stranded out-of-order processor
DE102015007423A1 (en) Memory sequencing with coherent and non-coherent subsystems
EP3200083A1 (en) Resource scheduling method and related apparatus
US20180067859A1 (en) Selective allocation of cpu cache slices to database objects
EP2988469B1 (en) A method and apparatus for updating a user interface of one program unit in response to an interaction with a user interface of another program unit
US10713085B2 (en) Asynchronous sequential processing execution
CN102867018A (en) Method for analogue signal communication between threads in database system
CN103019844A (en) Method and device supporting calling of MPI (Message Passing Interface) function through multiple threads
CN102736949A (en) Scheduling of tasks to be performed by a non-coherent device
EP3131004A1 (en) Processor and method
US20210055971A1 (en) Method and node for managing a request for hardware acceleration by means of an accelerator device
US8584143B2 (en) Collection access in a parallel environment
Khot Parallelization in Python
CN103714511A (en) GPU-based branch processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination